Summary:
This is a major change to Eden's Hg extension.
Our initial attempt to implement `edendirstate` was to create a "clean room"
implementation that did not share code with `mercurial/dirstate.py`. This was
helpful in uncovering the subset of the dirstate API that matters for Eden. It
also provided a better safeguard against upstream changes to `dirstate.py` in
Mercurial itself.
In this implementation, the state transition management was mostly done
on the server in `Dirstate.cpp`. We also made a modest attempt to make
`Dirstate.cpp` "SCM-agnostic" such that the same APIs could be used for
Git at some point.
However, as we have tried to support more of the sophisticated functionality
in Mercurial, particularly `hg histedit`, achieving parity between the clean room
implementation and Mercurial's internals has become more challenging.
Ultimately, the clean room implementation is likely the right way to go for Eden,
but for now, we need to prioritize having feature parity with vanilla Hg when
using Eden. Once we have a more complete set of integration tests in place,
we can reimplement Eden's dirstate more aggressively to optimize things.
Fortunately, the [[ https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/ | sqldirstate ]]
extension has already demonstrated that it is possible to provide a faithful
dirstate implementation that subclasses the original `dirstate` while using a different
storage mechanism. As such, I used `sqldirstate` as a model when implementing
the new `eden_dirstate` (distinguishing it from our v1 implementation, `edendirstate`).
In particular, `sqldirstate` uses SQL tables as storage for the following private fields
of `dirstate`: `_map`, `_dirs`, `_copymap`, `_filefoldmap`, `_dirfoldmap`. Because
`_filefoldmap` and `_dirfoldmap` exist to deal with case-insensitivity issues, we
do not support them in `eden_dirstate` and add code to ensure the codepaths that
would access them in `dirstate` never get exercised. Similarly, we also implemented
`eden_dirstate` so that it never accesses `_dirs`. (`_dirs` is a multiset of all directories in the
dirstate, which is an O(repo) data structure, so we do not want to maintain it in Eden.
It appears to be primarily used for checking whether a path to a file already exists in
the dirstate as a directory. We can protect against that in more efficient ways.)
That leaves only `_map` and `_copymap` to worry about. `_copymap` contains the set
of files that have been marked "copied" in the current dirstate, so it is fairly small and
can be stored on disk or in memory with little concern. `_map` is a bit trickier because
it is expected to have an entry for every file in the dirstate. In `sqldirstate`, it is stored
across two tables: `files` and `nonnormalfiles`. For Eden, we already represent the data
analogous to the `files` table in RocksDB/the overlay, so we do not need to create a new
equivalent to the `files` table. We do, however, need an equivalent to the `nonnormalfiles`
table, which we store as Thrift-serialized data in an ordinary file along with the `_copymap`
data.
In our Hg extension, our implementation of `_map` is `eden_dirstate_map`, which is defined
in a Python file of the same name. Our implementation of `_copymap` is `dummy_copymap`,
which is defined in `eden_dirstate.py`. Both of these collections are simple pass-through data
structures that translate their method calls to Thrift server calls. I expect we will want to
optimize this in the future via some client-side caching, as well as creating batch APIs for talking
to the server via Thrift.
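As a rough illustration of what "pass-through" means here, the sketch below forwards every dict-style access to a server object. The names `FakeThriftClient` and `PassThroughMap` are hypothetical and are not the actual classes in the extension; the in-memory client only stands in for the real Thrift calls.

```python
class FakeThriftClient(object):
    """Hypothetical stand-in for the Thrift client; keeps entries in memory."""

    def __init__(self):
        self._entries = {}
        self.calls = 0  # counts round trips, to show there is no caching

    def get(self, path):
        self.calls += 1
        return self._entries.get(path)

    def set(self, path, entry):
        self.calls += 1
        self._entries[path] = entry

    def delete(self, path):
        self.calls += 1
        self._entries.pop(path, None)


class PassThroughMap(object):
    """Dict-like wrapper that forwards every access to the server."""

    def __init__(self, client):
        self._client = client

    def __getitem__(self, path):
        entry = self._client.get(path)
        if entry is None:
            raise KeyError(path)
        return entry

    def __setitem__(self, path, entry):
        self._client.set(path, entry)

    def __delitem__(self, path):
        self._client.delete(path)

    def __contains__(self, path):
        return self._client.get(path) is not None

    def get(self, path, default=None):
        entry = self._client.get(path)
        return default if entry is None else entry
```

Because every lookup costs one round trip in this shape, it is easy to see why client-side caching and batch APIs would be natural follow-ups.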
One advantage of this new implementation is that it enables us to delete
`eden/hg/eden/overrides.py`, which overrode the entry points for `hg add` and `hg remove`.
Between the recent implementation of `dirstate.walk()` for Eden and this switch
to the real dirstate, we can now use the default implementation of `hg add` and `hg remove`
(although we have to play some tricks, such as in the implementation of `eden_dirstate.status()`,
in order to make `hg remove` work).
In the course of doing this revision, I discovered that I had to make a minor fix to
`EdenMatchInfo.make_glob_list()` because `hg add foo` was being treated as
`hg add foo/**/*` even when `foo` was just a file (as opposed to a directory), in which
case the glob was not matching `foo`!
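A minimal sketch of the distinction, assuming a hypothetical helper `globs_for_path()` (this is not the real `make_glob_list()` signature). `fnmatchcase()` is used only as a rough stand-in for Eden's glob matching.

```python
from fnmatch import fnmatchcase


def globs_for_path(path, is_dir):
    """Return glob patterns that select `path` (hypothetical helper)."""
    if is_dir:
        # A directory selects everything beneath it.
        return [path + '/**/*']
    # A plain file must be matched by its own name.
    return [path]


# The directory-style pattern cannot match the file itself, which is the
# symptom described above.
assert not fnmatchcase('foo', 'foo/**/*')
assert fnmatchcase('foo', globs_for_path('foo', is_dir=False)[0])
```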
I also had to do some work in `eden_dirstate.status()` in which the `match` argument
was previously largely ignored. It turns out that `dirstate.py` uses `status()` for a number
of things with the `match` specified as a filter, so the output of `status()` must be filtered
by `match` accordingly. Ultimately, this seems like work that would be better done on the
server, but for simplicity, we're just going to do it in Python, for now.
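The client-side filtering could look roughly like the sketch below. It assumes the status is a plain dict of path lists and that `match` is a callable returning a boolean for a path (Mercurial matchers are callable in this way); the function name `filter_status` is hypothetical.

```python
def filter_status(status, match):
    """Keep only the paths accepted by `match` in each status bucket."""
    return dict(
        (kind, [path for path in paths if match(path)])
        for kind, paths in status.items()
    )
```

For example, with a matcher that only accepts paths under `a/`, any entry under `b/` would be dropped from every bucket.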
For the reasons explained above, this revision deletes a lot of code from `Dirstate.cpp`.
As such, `DirstateTest.cpp` does not seem worth refactoring, though the scenarios it was
testing should probably be converted to integration tests. At a high level, the role of
`DirstatePersistence` has not changed, but the exact data it writes is much different.
Its corresponding unit test is also disabled, for now.
Note that this revision does not change the name of the file where "dirstate data" is written
(this is defined as `kDirstateFile` in `ClientConfig.cpp`), so we should blow away any existing
instances of this file once this change lands. (It is still early enough in the project that it does
not seem worth the overhead of a proper migration.)
The true test of the success of this new approach is the ease with which we can write more
integration tests for things like `hg histedit` and `hg graft`. Ideally, these should require very
few changes to `eden_dirstate.py`.
Reviewed By: simpkins
Differential Revision: D5071778
fbshipit-source-id: e8fec4d393035d80f36516ac050cad025dc3ba31
Summary:
I was working on a new test and I got an error that `directaccess` must be
enabled for `inhibit` to work.
Reviewed By: simpkins
Differential Revision: D5077133
fbshipit-source-id: cc5235c845e3f299f96e1c901ef4aea18ca57b76
Summary:
I had to add simple implementations to various things in `edendirstate`
in order to be able to run `hg histedit`. There is still a lot more to do, but
at least this gives us a starting point to iterate and a test to demonstrate
the most simple functionality.
Reviewed By: wez
Differential Revision: D5049308
fbshipit-source-id: 34727f633c003cacae44108eb3ece06590098c7b
Summary:
Note that we must specify quite a few extensions to get behavior that is
representative of how Hg works at Facebook.
Reviewed By: DurhamG
Differential Revision: D5057478
fbshipit-source-id: ee774a9b8dcebe82e4b19cc52f9b0b5a53e6420c
Summary:
Recall that we override `$HOME` in integration tests, so this will not overwrite
your personal `~/.hgrc` when you run an integration test.
An upcoming integration test for `hg histedit` that I am working on requires
this value to be set.
Reviewed By: wez
Differential Revision: D5051112
fbshipit-source-id: 2fd8541aa6504640b08337fdc22160e243beaae3
Summary:
`HgExtensionTestBase.assert_status()` was added in D4814422, but it was only
applied to `update_test.py`. This change updates the docstring (it appears to
have been copy/pasted from a nearby method), and makes use of it in the other
integration tests.
Reviewed By: wez
Differential Revision: D5050775
fbshipit-source-id: bb70740b6f455a84e7a22c3286c8ddbe2462f816
Summary:
Prior to this change, when I would add `import pudb; pudb.set_trace()` to do
some debugging, two annoying things would happen:
- I would have to edit the `TARGETS` file to add `pudb` as a dependency and
then `buck build eden/integration/hg` again.
- When I hit a breakpoint using `pudb`, I would have to go through the welcome
screen, change the theme, etc., because my settings were not found.
Now that I figured out what the problem was, I added instructions to the
`TARGETS` file to help others fall into the pit of success.
Reviewed By: wez
Differential Revision: D5050725
fbshipit-source-id: 1896f9f52eb056b3295b2d8e896dabb5d990ba22
Summary:
This fixes "hg commit" so that it correctly updates the in-memory snapshot.
This has been broken ever since I added the in-memory snapshot when
implementing checkout(). The existing scmMarkCommitted() method updated only
the Dirstate object and the on-disk SNAPSHOT file.
This diff fixes checkout() and resetCommit() to clear the Dirstate user
directives correctly, and then replaces calls to scmMarkCommitted() with
resetCommit().
Reviewed By: bolinfest
Differential Revision: D4935943
fbshipit-source-id: 5ffcfd5db99f30c730ede202c5e013afa682bac9
Summary:
This updates how we build and package the eden hg extension, and how we find it
during integration tests.
- Update the extension to always look relative to its current location to find
the other modules it depends on. This ensures that the integration tests
always find modules from the local repository, and do not use the modules
installed on the system.
- Add a buck rule to unpack the python archive at build time. This is needed
for integration tests to use the local version of the module.
- Ensure that we install a correct `hgext3rd/__init__.py` module in the eden
extension directory. This is required to correctly set up `hgext3rd` as a
namespace package. Unfortunately, this also needs to be a `.py` file, and not
just a `.pyc` file. (The `pkgutil.extend_path()` code looks specifically for
directories containing `__init__.py` files, and does not check for
`__init__.pyc`.)
- Update the extension to only try importing the native thrift modules if we
are running python 2.7.6 or greater. Python 2.7.6 is the first that supports
unicode arguments to `struct.pack()`, which thrift requires. Python 2.7.5 can
import the thrift modules, but throws errors when trying to run them.
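The version gate in the last item can be sketched as follows. The helper name `can_use_native_thrift` and the constant are hypothetical; only the 2.7.6 threshold comes from the description above.

```python
import sys

# First release whose struct.pack() accepts unicode arguments, per the
# description above.
MIN_NATIVE_THRIFT_VERSION = (2, 7, 6)


def can_use_native_thrift(version_info=None):
    """Return True if the native Thrift modules are safe to import."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro).
    return tuple(version_info[:3]) >= MIN_NATIVE_THRIFT_VERSION
```

A caller would check this before attempting the import and fall back to another code path when it returns False.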
Reviewed By: bolinfest
Differential Revision: D4935279
fbshipit-source-id: 9af81736124c55476a5eb5beba9474a4371a639b
Summary:
Fix a subtle crash during checkout when handling newly added entries that
already exist in the working directory: CheckoutAction passed the entry name to
checkoutUpdateEntry() as a PathComponentPiece. However, this
PathComponentPiece could refer to the entry name owned by newScmEntry_, and it
also passed newScmEntry_ into checkoutUpdateEntry() as an rvalue reference.
As a result, if the string data was invalidated by the move, the name
would no longer be valid when checkoutUpdateEntry() tried to use it.
This bug is triggered by doing an "hg update --clean", where a file added in
the destination commit already exists on disk, and has an entry name of 23
characters or fewer. (The 23 character limit is fbstring's upper bound on
small string optimizations, where it will store the string data inline in the
object, causing it to be invalidated on move.)
This also fixes a crash in a VLOG() statement when the verbose log level for
TreeInode.cpp was set to 4 or greater.
Reviewed By: bolinfest
Differential Revision: D4882544
fbshipit-source-id: 917ede6eeae2224aaa0724b8b30324f3c3a5c924
Summary:
Update the hg extension to implement dirstate.rebuild(). This is necessary for
the `hg reset` command. This also now implements dirstate.setparents() for
cases when there is only one parent.
Reviewed By: wez
Differential Revision: D4823780
fbshipit-source-id: 802de006e03860995095dc3af17acb2eb05f4e8b
Summary:
Add an assert_status() method to the hg integration tests that runs "hg
status", parses the output, then compares it to expected results.
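A minimal sketch of the parse-and-compare step, assuming the usual `hg status` line format of a one-letter code, a space, and the path. The function names are hypothetical and do not reflect the actual method on the test base class.

```python
def parse_status_output(output):
    """Map each path in `hg status` output to its one-letter status code."""
    result = {}
    for line in output.splitlines():
        if not line:
            continue
        # Each line is a one-letter code, a space, then the path.
        code, path = line[0], line[2:]
        result[path] = code
    return result


def check_status(output, expected):
    """Raise AssertionError if the parsed output differs from `expected`."""
    actual = parse_status_output(output)
    assert actual == expected, 'status mismatch: %r != %r' % (actual, expected)
```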
Reviewed By: wez
Differential Revision: D4814422
fbshipit-source-id: 24ebdc2e0239c4833953c31e5786cc320bcd9d62
Summary:
The hg_import_helper script that eden uses to import data from mercurial keeps
a long-lived repository object open. This caches some data about the
repository, and if new commits are added after it was created, it can fail to
see them.
This updates hg_import_helper.py to catch errors that occur when trying to use
the repository objects. The code will invalidate the repository object and
then retry the operation once, in the hopes that it will now succeed after
invalidation.
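The retry-once pattern can be sketched as below. `FakeRepo` is a hypothetical stand-in with a stale cache, and the broad `except Exception` is an assumption; the real script may catch narrower error types.

```python
class FakeRepo(object):
    """Hypothetical repository object with a stale cache of known commits."""

    def __init__(self, commits_on_disk):
        self._on_disk = commits_on_disk
        self._cached = set(commits_on_disk)

    def invalidate(self):
        # Drop cached state and re-read what is on disk.
        self._cached = set(self._on_disk)

    def lookup(self, commit):
        if commit not in self._cached:
            raise LookupError(commit)
        return commit


def run_with_one_retry(repo, operation):
    """Run operation(repo); on failure, invalidate the repo and retry once."""
    try:
        return operation(repo)
    except Exception:
        repo.invalidate()
        # A second failure propagates to the caller.
        return operation(repo)
```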
Reviewed By: bolinfest
Differential Revision: D4752659
fbshipit-source-id: 1c75c84766d6bbda0710882a338eaa09e0cb0030
Summary:
The kernel can return ENOENT in response to an invalidation notification if we
have never told the kernel about the inode in question. This resulted in
spurious errors during checkout when updating files that were loaded internally
by edenfs rather than via a FUSE call. For instance, this was commonly triggered
by .gitignore files, which eden loads on its own to perform ignore processing.
Reviewed By: bolinfest
Differential Revision: D4752630
fbshipit-source-id: d4e092643a8d33cf33709f7e3664289f167ac093
Summary:
I found it rather awkward in HgExtensionTestBase that self.repo is not actually
the repository being tested. It was instead the repository used as the backing
store for the mercurial data, and self.repo_for_mount was the repository being
tested.
This diff renames the two repository classes, so that self.backing_repo is now
the backing store repository, and self.repo is the repository being tested.
In order to do this I changed HgExtensionTestBase to derive directly from
EdenTestCase. Previously it derived from EdenHgTest, and was letting
EdenHgTest set up self.repo. It seemed more understandable to avoid deriving
from EdenHgTest now since self.repo is not the repository that needs to be set up
initially.
Reviewed By: bolinfest
Differential Revision: D4752631
fbshipit-source-id: d8b542b0ecead66b965af1a582085345e28b2908
Summary:
Previously the eden hg extension short-circuited the checkout operation if the
destination commit was the same as the one currently checked out. This was
incorrect if --clean was specified, since we do need to reset the working
directory state in this case.
This updates the extension code to always make the thrift checkout() call when
doing a force checkout.
This also avoids calling applyupdates() to resolve conflicts when force=True.
When doing a force checkout, eden reports files with conflicts that it
overwrote, but these do not need to be resolved by mercurial.
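The decision logic described above can be summarized in a small sketch. The function `plan_checkout` and its return shape are hypothetical and only model the two decisions; they are not the extension's actual code.

```python
def plan_checkout(current, destination, force):
    """Return (call_checkout, resolve_conflicts) for an update request."""
    # A forced checkout must always reach the server, even when the
    # destination is already checked out, so the working directory is reset.
    call_checkout = force or current != destination
    # Conflicts reported by a forced checkout were already overwritten, so
    # they are not handed to Mercurial for resolution.
    resolve_conflicts = call_checkout and not force
    return call_checkout, resolve_conflicts
```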
Additionally, this updates calls to a couple of other APIs that have recently
changed in upstream mercurial: merge.update() now takes an updatecheck
argument, and repo.join() should now be written as repo.vfs.join().
Reviewed By: bolinfest
Differential Revision: D4752510
fbshipit-source-id: e1ee92d086315e35a1378f674e668876a667c0ce
Summary:
Move the integration tests from eden/fs/integration up one directory, to
eden/integration.
The main benefit is that this makes it easy to run just the edenfs unit tests
by running "buck test eden/fs/...". These unit tests complete much more
quickly than the full set of integration tests, providing a faster test suite
to re-run repeatedly during development. The integration tests can be run with
"buck test eden/integration/...", and the full set of tests can still be run
with "buck test eden/..."
Reviewed By: wez
Differential Revision: D4490247
fbshipit-source-id: 5ceb5a19526f56e1cb926f352fa30ad2f1212c05