Commit Graph

24 Commits

Author SHA1 Message Date
Michael Bolin
5cf1692782 Add new Thrift API: hgClearDirstate(mountPoint)
Summary:
When Hg tells the `dirstate` to `clear()`, we should also clear out any data we
have on the server for the `Dirstate`.

As it stands, the way we subclass `dirstate`, it does not appear like `clear()`
should be called, in practice, though one thing that could call it is
`hg debugrebuilddirstate`. It is probably good for us to have an RPC lying
around that we can use to reset the `Dirstate.`

Reviewed By: wez

Differential Revision: D5675298

fbshipit-source-id: 38926cfd93f4f83e4c28910f812a693cb32e423a
2017-08-22 16:50:24 -07:00
Michael Bolin
e050ffcce2 Add new Thrift API: hgDeleteDirstateTuple(mountPoint, relativePath)
Summary:
Previously, we were overloading `hgSetDirstateTuple()` to also make it possible
to delete an entry from the internal `hgDirstateTuples` map. Now we have an
explicit method to do this, which enables us to remove some hacks/TODOs.

Reviewed By: simpkins

Differential Revision: D5665170

fbshipit-source-id: bc0753d4990c8966fd9e6c50b29a954d5023292f
2017-08-18 21:49:59 -07:00
Michael Bolin
5a2246acbf Address a major outstanding TODO in Dirstate::hgGetDirstateTuple.
Summary:
Previously, `Dirstate::hgGetDirstateTuple()` was reporting a status of
`DirstateNonnormalFileStatus::NotTracked` even when the true status was
`Normal`.

Falsely reporting the status has serious consequences when running `hg log` on
an existing, tracked file. Specifically, it causes this `f not in wctx`
condition to be `True` here:

https://phab.mercurial-scm.org/diffusion/HG/browse/default/mercurial/cmdutil.py;da8bdeb1be28b976909a963c89e974264686e2bb$2316

which in turn causes the slow path to be selected:

https://phab.mercurial-scm.org/diffusion/HG/browse/default/mercurial/cmdutil.py;da8bdeb1be28b976909a963c89e974264686e2bb$2320

For large repositories like ours, this can be very, very slow.

There are still some TODOs in the new implementation, but this seems much more
faithful to the true implementation than what we had before.

Reviewed By: quark-zju

Differential Revision: D5655741

fbshipit-source-id: 07b953e23e4d74c480ac2d94dfc6a8df9df4fcbb
2017-08-18 21:49:59 -07:00
Michael Bolin
57f5d72a27 Reimplement dirstate used by Eden's Hg extension as a subclass of Hg's dirstate.
Summary:
This is a major change to Eden's Hg extension.

Our initial attempt to implement `edendirstate` was to create a "clean room"
implementation that did not share code with `mercurial/dirstate.py`. This was
helpful in uncovering the subset of the dirstate API that matters for Eden. It
also provided a better safeguard against upstream changes to `dirstate.py` in
Mercurial itself.

In this implementation, the state transition management was mostly done
on the server in `Dirstate.cpp`. We also made a modest attempt to make
`Dirstate.cpp` "SCM-agnostic" such that the same APIs could be used for
Git at some point.

However, as we have tried to support more of the sophisticated functionality
in Mercurial, particularly `hg histedit`, achieving parity between the clean room
implementation and Mercurial's internals has become more challenging.
Ultimately, the clean room implementation is likely the right way to go for Eden,
but for now, we need to prioritize having feature parity with vanilla Hg when
using Eden. Once we have a more complete set of integration tests in place,
we can reimplement Eden's dirstate more aggressively to optimize things.

Fortunately, the [[ https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/ | sqldirstate ]]
extension has already demonstrated that it is possible to provide a faithful
dirstate implementation that subclasses the original `dirstate` while using a different
storage mechanism. As such, I used `sqldirstate` as a model when implementing
the new `eden_dirstate` (distinguishing it from our v1 implementation, `edendirstate`).

In particular, `sqldirstate` uses SQL tables as storage for the following private fields
of `dirstate`: `_map`, `_dirs`, `_copymap`, `_filefoldmap`, `_dirfoldmap`. Because
`_filefoldmap` and `_dirfoldmap` exist to deal with case-insensitivity issues, we
do not support them in `eden_dirstate` and add code to ensure the codepaths that
would access them in `dirstate` never get exercised. Similarly, we also implemented
`eden_dirstate` so that it never accesses `_dirs`. (`_dirs` is a multiset of all directories in the
dirstate, which is an O(repo) data structure, so we do not want to maintain it in Eden.
It appears to be primarily used for checking whether a path to a file already exists in
the dirstate as a directory. We can protect against that in more efficient ways.)

That leaves only `_map` and `_copymap` to worry about. `_copymap` contains the set
of files that have been marked "copied" in the current dirstate, so it is fairly small and
can be stored on disk or in memory with little concern. `_map` is a bit trickier because
it is expected to have an entry for every file in the dirstate. In `sqldirstate`, it is stored
across two tables: `files` and `nonnormalfiles`. For Eden, we already represent the data
analogous to the `files` table in RocksDB/the overlay, so we do not need to create a new
equivalent to the `files` table. We do, however, need an equivalent to the `nonnormalfiles`
table, which we store in as Thrift-serialized data in an ordinary file along with the `_copymap`
data.

In our Hg extension, our implementation of `_map` is `eden_dirstate_map`, which is defined
in a Python file of the same name. Our implementation of `_copymap` is `dummy_copymap`,
which is defined in `eden_dirstate.py`. Both of these collections are simple pass-through data
structures that translate their method calls to Thrift server calls. I expect we will want to
optimize this in the future via some client-side caching, as well as creating batch APIs for talking
to the server via Thrift.

One advantage of this new implementation is that it enables us to delete
`eden/hg/eden/overrides.py`, which overrode the entry points for `hg add` and `hg remove`.
Between the recent implementation of `dirstate.walk()` for Eden and this switch
to the real dirstate, we can now use the default implementation of `hg add` and `hg remove`
(although we have to play some tricks, like in the implementation of `eden_dirstate.status()`
in order to make `hg remove` work).

In the course of doing this revision, I discovered that I had to make a minor fix to
`EdenMatchInfo.make_glob_list()` because `hg add foo` was being treated as
`hg add foo/**/*` even when `foo` was just a file (as opposed to a directory), in which
case the glob was not matching `foo`!

I also had to do some work in `eden_dirstate.status()` in which the `match` argument
was previously largely ignored. It turns out that `dirstate.py` uses `status()` for a number
of things with the `match` specified as a filter, so the output of `status()` must be filtered
by `match` accordingly. Ultimately, this seems like work that would be better done on the
server, but for simplicity, we're just going to do it in Python, for now.

For the reasons explained above, this revision deletes a lot of code `Dirstate.cpp`.
As such, `DirstateTest.cpp` does not seem worth refactoring, though the scenarios it was
testing should probably be converted to integration tests. At a high level, the role of
`DirstatePersistence` has not changed, but the exact data it writes is much different.
Its corresponding unit test is also disabled, for now.

Note that this revision does not change the name of the file where "dirstate data" is written
(this is defined as `kDirstateFile` in `ClientConfig.cpp`), so we should blow away any existing
instances of this file once this change lands. (It is still early enough in the project that it does
not seem worth the overhead of a proper migration.)

The true test of the success of this new approach is the ease with which we can write more
integration tests for things like `hg histedit` and `hg graft`. Ideally, these should require very
few changes to `eden_dirstate.py`.

Reviewed By: simpkins

Differential Revision: D5071778

fbshipit-source-id: e8fec4d393035d80f36516ac050cad025dc3ba31
2017-05-26 12:05:29 -07:00
Adam Simpkins
a521832407 simplify the Dirstate::removeAll() code
Summary:
This replaces the shouldFileBeDeletedByHgRemove() function with a much simpler
and safer check.

It turns out this code was more complicated than it needed to be: it looked up
the inode in question and confirmed that it still existed, even though it's
calling function had already previously looked up the inode object.

I believe this deletes the last place in the code that still assumes that an
InodeBase object must be loaded if the inode has previously been materialized.
This also removes one of the last remaining places that directly holds the
TreeInode contents_ lock outside of TreeInode.cpp.

Reviewed By: bolinfest

Differential Revision: D4968834

fbshipit-source-id: a37bc0d7889eb81cca5b045cbc82732a53ef53d4
2017-05-01 18:54:41 -07:00
Adam Simpkins
51df0c0663 refactor Dirstate::addAll() to clean up some old code
Summary:
This updates the Dirstate::addAll() code to use EdenMount::diff() internally,
instead of getStatusForExistingDirectory().

This lets us delete getStatusForExistingDirectory() and several other helper
functions that it used.  The logic in getStatusForExistingDirectory() was based
on the incorrect assumption that files can only be modified from their expected
source control state if they are materialized.  This could result in incorrect
results in a variety of cases (particularly after renames, or "hg reset"
operations).  The EdenMount::diff() code does not have this problem.

That said, this new addAll() code does still have some performance issues--it
currently does a full tree diff for each input path.  We should ideally fix
this in the future to only diff the necessary subtree for each path.  However,
in the short term this trade-off seems worth being able to delete all of this
older, buggy code.  diff() should be cheap enough in most cases that this won't
be a major problem unless a large number of paths are given as input.

Reviewed By: bolinfest

Differential Revision: D4968835

fbshipit-source-id: 1834aa98a26dcaa0e1c06c7ac25c57944fa1b5f7
2017-05-01 18:54:41 -07:00
Adam Simpkins
87cbfe142b update the in-memory snapshot correctly after a commit
Summary:
This fixes "hg commit" so that it correctly updates the in-memory snapshot.
This has been broken ever since I added the in-memory snapshot when
implementing checkout().  The existing scmMarkCommitted() method updated only
the Dirstate object and the on-disk SNAPSHOT file.

This diff fixes checkout() and resetCommit() to clear the Dirstate user
directives correctly, and then replaces calls to scmMarkCommitted() with
resetCommit().

Reviewed By: bolinfest

Differential Revision: D4935943

fbshipit-source-id: 5ffcfd5db99f30c730ede202c5e013afa682bac9
2017-04-24 18:06:59 -07:00
Adam Simpkins
a6ae3edab9 move eden/utils and eden/fuse into eden/fs
Summary:
This change makes it so that all of the C++ code related to the edenfs daemon
is now contained in the eden/fs subdirectory.

Reviewed By: bolinfest, wez

Differential Revision: D4889053

fbshipit-source-id: d0bd4774cc0bdb5d1d6b6f47d716ecae52391f37
2017-04-14 11:39:02 -07:00
Adam Simpkins
6cced7c449 implement "hg status" using EdenMount::diff()
Summary:
This is an initial pass at implementing "hg status" using the new EdenMount and
TreeInode diff() code.

More work still needs to be done to clean things up here, but this makes
"hg status" more correct in the face of modified files and directories that are
not modified, and vice-versa.

In particular, the following still needs to be cleaned up in future diffs:
- Update the "hg status" code to more correctly process user directives for
  paths that did not appear in the diff output.  This will probably be
  simplified some by correctly updating the user directive list on checkout and
  resetCommit() operations.  It may also make things simpler if we store the
  user directives in a hierarchical path map, rather than in a flat list.
- Update the addAll() code to also use the new diff logic, rather than the
  getStatusForExistingDirectory() function.
- Clean up the locking some.  I think it may be worth changing the code to use
  a single lock to manage both the current commit ID for the mount and the
  dirstate user directives.  We probably should hold this lock for the duration
  of a diff operation.  The diff operation should also return the commit ID
  that it is based against.

Reviewed By: wez

Differential Revision: D4752281

fbshipit-source-id: 6a6d7e2815321b09c75e95aa7460b0da41931dc0
2017-03-30 21:35:00 -07:00
Adam Simpkins
d1556c3601 implement Future-based recursive Inode lookup
Summary:
Rename the current EdenMount::getInodeBase() function to
EdenMount::getInodeBaseBlocking() and add a new getInodeBase() function that
performs the lookup asynchronously and returns a Future<InodePtr>.

This is implemented with a TreeInode::getChildRecursive() function that allows
asynchronous recursive lookups at any point in the tree.

Reviewed By: bolinfest

Differential Revision: D4427855

fbshipit-source-id: aca55e681a48c848b085b7fc6a13efe6cf0a4e3a
2017-01-25 16:56:12 -08:00
Adam Simpkins
251da81f36 update all copyright statements to "2016-present"
Summary:
Update copyright statements to "2016-present".  This makes our updated lint
rules happy and complies with the recommended license header statement.

Reviewed By: wez, bolinfest

Differential Revision: D4433594

fbshipit-source-id: e9ecb1c1fc66e4ec49c1f046c6a98d425b13bc27
2017-01-20 22:03:02 -08:00
Adam Simpkins
3a73253057 add new InodePtr classes
Summary:
This defines our own custom smart pointer type for Inode objects.  This will
provide us with more control over Inode lifetime, allowing us to decide if we
want to unload them immediately when they become unreferenced, or keep them
around for a while.

This will also allow us to fix some memory management issues around EdenMount
destruction.  Currently we destroy the EdenMount immediately when it is
unmounted.  This can cause issues if other parts of the code are still holding
references to Inode objects from this EdenMount.  Our custom InodePtr class
will also allow us to delay destroying the EdenMount until all of its Inodes
have been destroyed.

This diff adds the new pointer types and updates the code to use them, but does
not actually implement destroying unreferenced inodes yet.  The logic for that
has proven to be slightly subtle; I will split it out into its own separate
diff.

Reviewed By: wez

Differential Revision: D4351072

fbshipit-source-id: 7a9d81cbd226c9662a79a2f2ceda82fe2651f312
2017-01-17 15:03:20 -08:00
Michael Bolin
1ea4af26c5 Use the StatusCode from Thrift as the canonical representation.
Summary:
Most of the work for this diff was achieved via:

```
find eden -type f | xargs sed -i -e 's#ThriftHgStatusCode#StatusCode#g'
find eden -type f | xargs sed -i -e 's#HgStatusCode#StatusCode#g'
```

Reviewed By: simpkins

Differential Revision: D4237975

fbshipit-source-id: 2ee04a89101291c8972ac7bd3ff6cca92cbb0799
2016-12-16 17:49:05 -08:00
Michael Bolin
dfd1634170 Move batch processing of hg add <directory> to the server.
Summary:
This should make the common case of `hg add` or `hg add .` much more efficient
because it no longer performs a walk of the entire repository from the client
side.

Reviewed By: simpkins

Differential Revision: D4317871

fbshipit-source-id: 7061553fe0de0c4afa84b4f835316965088675e8
2016-12-15 13:02:38 -08:00
Michael Bolin
5e08c5e1a7 Introduce Dirstate::getStatusForDirectory().
Summary:
This will make it easier to implement `hg add <directory>` such that
`<directory>` is expanded on the server rather than on the client.

Reviewed By: simpkins

Differential Revision: D4318735

fbshipit-source-id: cf0e89bd95eb58304cd23e70beb77bc7151f2c5c
2016-12-13 12:00:22 -08:00
Adam Simpkins
89fb0f811b add InodePtr, TreeInodePtr, and FileInodePtr type names
Summary:
Define InodePtr, TreeInodePtr, and FileInodePtr as aliases for std::shared_ptr
of the underlying inode type.  This also updates all of the code to use these
new type names.

This will make it easier swap out std::shared_ptr with a custom pointer type in
the future.  (I believe we will need a custom type in the future so that we
can have more precise control of the reference counting so we can load and
unload Inode objects on demand.  std::shared_ptr::unique() doesn't quite
provide the flexibility we need, and is also being deprecated in C++17.)

Reviewed By: bolinfest

Differential Revision: D4297791

fbshipit-source-id: 1080945649290e676f62689592159f1166159b20
2016-12-12 17:50:35 -08:00
Michael Bolin
309d0da769 Make it possible to make a commit from Eden.
Summary:
In this revision, we override `committablectx.markcommitted()` to make a Thrift
call to Eden to record the new commit. For now, this defers the need for us to
implement `edendirstate.normal()`, though we expect we will have to provide a
proper implementation at some point.

Because `hg update` is not implemented yet, this puts us in a funny state where
we have to restart eden after `hg commit` to ensure all of our `TreeEntry` and
other in-memory data structures are in the correct state.

Reviewed By: simpkins

Differential Revision: D4249214

fbshipit-source-id: 8ec06dfee67070f008dd93a0ee6c810ce75d2faa
2016-12-10 01:07:06 -08:00
Michael Bolin
017b35bca6 Add support for hg remove for an individual file.
Summary:
This adds support for `hg remove <file>` in Eden and sets up some of the scaffolding
to support `hg remove <directory>`.

Note that the Thrift API for `scmRemove()` is slightly different than that of `scmAdd()`
in that it returns a list of error messages to display to the user rather than throwing
an exception. In practice, for batch operations, Mercurial will allow some operations
to succeed while others may fail, so it is possible to have multiple error messages to
return.

Unlike the current implementation of `hg add`, this does the directory traversal
on the server rather than on the client. Once we work out how to do this for
`hg remove`, we should figure out how to reuse the logic for `hg add`.

Reviewed By: simpkins

Differential Revision: D4263068

fbshipit-source-id: d084774d562c48c59664f313eba229d4197929fe
2016-12-09 19:36:17 -08:00
Adam Simpkins
74fa63d0d1 store a pointer to the EdenMount in the Dirstate
Summary:
Update the Dirstate to store a pointer to the EdenMount object that owns it,
rather than storing pointers to the lower-level MountPoint and ObjectStore
objects.

This change is necessary in order for me to move more functionality from
MountPoint to EdenMount.  (In particular, I plan to move the InodeDispatcher to
the EdenMount.)

As part of this change I also started moving some APIs from MountPoint to
EdenMount.  For now the EdenMount versions are just thin wrappers on top of the
MountPoint APIs.  I will move the functionality directly into EdenMount in a
future diff.

Reviewed By: bolinfest

Differential Revision: D4255675

fbshipit-source-id: 93749c6516c3cea4b4ae93de4ca49ddf05f4d260
2016-12-01 17:52:30 -08:00
Adam Simpkins
87c2d46ce3 write dirstate file atomically
Summary:
Write the dirstate data using the new folly::writeFileAtomic() function.
This ensures that the dirstate file will always contain full, valid contents,
even if we crash or run out of disk space partway through writing out the data.

This diff also includes a couple other minor tweaks:
- Update Dirstate to store the DirstatePersistence object directly inline,
  rather than allocating it separately on the heap.
- Update the DirstatePersistenceTest code to prefix temporary directories with
  "eden_test".  This just makes it easier to tell if the tests fail to clean up
  after themselves for any reason.

Reviewed By: bolinfest

Differential Revision: D4254027

fbshipit-source-id: 6b6601b65aeacdee998a6c4260e972d5fb2426ac
2016-12-01 17:52:30 -08:00
Adam Simpkins
aaa3332644 simplify EdenMount and Dirstate construction
Summary:
This cleans up construction of the EdenMount and Dirstate objects:

- The EdenMount constructor is now responsible for creating the Overlay and
  Dirstate objects.
- The Dirstate constructor is now responsible for loading the
  DirstatePersistence file.
- The EdenMount now takes ownership of the ClientConfig object, and stores it
  for later use.
- The ClientConfig object now has a method to get the path to the
  DirstatePersistence file.
- I added a ClientConfig::createTestConfig() method, so that the TestMount code
  can now use the same EdenMount constructor as the normal code.

This simplifies the logic in EdenServiceHandler and TestMount, and makes some
of the initialization dependencies a little bit simpler.

This change is necessary in order for me to move some logic from
fusell::MountPoint into EdenMount.  The Dirstate object will need a pointer
back to its EdenMount object, and this diff enables that.

Reviewed By: bolinfest

Differential Revision: D4249393

fbshipit-source-id: 439786accbf48c8696dbc6ca4fe77a4c6bdeab65
2016-12-01 17:52:30 -08:00
Michael Bolin
8c19620e62 Support directories in Dirstate::computeDelta().
Summary:
Previous to this commit, the `Dirstate` logic only worked correctly when the
changes occurred in the root directory. Obviously that is very limiting, so this
adds support for changes in arbitrary directories at arbitrary depths.

This also introduces support for things like a file being replaced by a
directory of same name or vice versa. The tests have been updated to verify
these cases.

One interesting design change that fell out of this was the addition of the
`removedDirectories` field to the `DirectoryDelta` struct. As you can see,
all entries in a removed directory need to be processed by the new
`addDeletedEntries()` function. These require special handling because deleted
directories do not show up in the traversal of modified directories.

In contrast, new directories do show up in the traversal, so they require a
different type of special handling. Specifically, this call will return `NULL`:

```
auto tree = getTreeForDirectory(directory, rootTree.get(), objectStore);
```

When this happens, we must pass an empty vector of tree entries to
`computeDelta()` rather than `&tree->getTreeEntries()`. Admittedly, the special
case of new directories is much simpler than the special case of deleted ones.

Reviewed By: simpkins

Differential Revision: D4219478

fbshipit-source-id: 4c805ba3d7688c4d12ab2ced003a7f5c19ca07eb
2016-11-29 06:51:14 -08:00
Michael Bolin
b078392a9c Add Thrift endpoints for Hg dirstate.
Summary:
This also introduces the change where the `EdenMount` creates
and takes ownership of the `Dirstate`.

To clean some of this up, I had to expose a `getEdenDir()` method on `EdenServer`
that returns an `AbsolutePathPiece`. This was previously stored internally as a
`std::string`, so I had to clean up a bunch of path construction that was using `edenDir_`.

Reviewed By: simpkins

Differential Revision: D4123763

fbshipit-source-id: 270b182521c1a84bb054832f4b5f92af849d67e4
2016-11-26 12:01:41 -08:00
Michael Bolin
0f834ea809 Flip Dirstate -> EdenMount dependency.
Summary:
Previously, `Dirstate` took a `std::shared_ptr<EdenMount>`, but now it takes
pointers to a `MountPoint` and an `ObjectStore` because it does not need the
entire `EdenMount`. Ultimately, this will enable us to have `EdenMount` create
the `Dirstate` itself, but that will be done in a follow-up commit.

Fortunately, it was pretty easy to remove the references to `edenMount_` in
`Dirstate.cpp` and rewrite them in terms of `mountPoint_` or `objectStore_`.
The one thing that I also decided to move was `getModifiedDirectoriesForMount()`
because I already needed to create an `EdenMounts` file (admittedly not a
great name) to collect some utility functions that use members of an `EdenMount`
while not having access to the `EdenMount` itself.

As part of this change, all of the code in `eden/fs/model/hg` has been moved to
`eden/fs/inodes` so that it is alongside `EdenMount`. We are going to change
the `Dirstate` from an Hg-specific concept to a more general concept.

`LocalDirstatePersistence` is no longer one of two implementations of
`DirstatePersistence`. (The other was `FakeDirstatePersistence`.) Now there is
just one concrete implementation called `DirstatePersistence` that takes its
implementation from `LocalDirstatePersistence`. Because there is no longer a
`FakeDirstatePersistence`, `TestMount` must create a `DirstatePersistence` that
uses a `TemporaryFile`.

Because `TestMount` now takes responsibility for creating the `Dirstate`, it
must also give callers the ability to specify the user directives. To that end,
`TestMountBuilder` got an `addUserDirectives()` method while `TestMount` got a
`getDirstate()` method. Surprisingly, `TestMountTest` did not need to be updated
as part of this revision, but `DirstateTest` needed quite a few updates
(which were generally mechanical).

Reviewed By: simpkins

Differential Revision: D4230154

fbshipit-source-id: 9b8cb52b45ef5d75bc8f5e62a58fcd1cddc32bfa
2016-11-26 12:01:41 -08:00