Summary:
Update the gitignore handling code to perform pattern matching the same way git
does. Previously the code just called the standard fnmatch() function, which
does not handle "**" in patterns the same way git does.
This includes our own new implementation of glob pattern matching. I did
evaluate several other options before writing our own implementation here:
- The wildmatch() code used by git (and watchman, and rsync) has a few
downsides: it is not distributed by itself as a library anywhere else.
Therefore we would probably have to include a copy of this code in our
repository. Making another copy is unfortunate, and somewhat undesirable
from a legal and licensing perspective. This code also only works with
nul-terminated strings, and our code deals primarily with non-terminated
StringPiece objects.
- I did look at translating glob patterns into regular expressions and using
re2 to perform matching. Unfortunately re2 turns out to be substantially
slower than wildmatch() for typical gitignore patterns.
This new implementation performs some preprocessing on the glob pattern, and
generates a pattern opcode buffer. Eden can perform this glob preprocessing
when it first loads a .gitignore file, and can then save and re-use this result
each time it needs to match a filename. Doing this preprocessing allows
matching to be done 50% to 100% faster than wildmatch() for typical glob
patterns.
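The compile-once, match-many idea can be sketched roughly as follows. This is a minimal illustration, not the implementation in this change: the opcode set, `compileGlob()` and `globMatch()` are hypothetical names, and the semantics are simplified (`*` and `?` never match `/`, `**` matches anything including `/`, and there is no support for character classes or escapes).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical opcode set; the real opcode buffer format is not described here.
enum class Op { Literal, AnyChar, Star, StarStar };

struct Instr {
  Op op;
  char ch;  // only meaningful for Op::Literal
};

using CompiledGlob = std::vector<Instr>;

// Preprocess the pattern once so that matching never re-parses it.
CompiledGlob compileGlob(std::string_view pattern) {
  CompiledGlob out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const char c = pattern[i];
    if (c == '*') {
      if (i + 1 < pattern.size() && pattern[i + 1] == '*') {
        out.push_back({Op::StarStar, '\0'});
        ++i;  // consume the second '*'
      } else {
        out.push_back({Op::Star, '\0'});
      }
    } else if (c == '?') {
      out.push_back({Op::AnyChar, '\0'});
    } else {
      out.push_back({Op::Literal, c});
    }
  }
  return out;
}

// Simple backtracking matcher over the compiled opcodes.
bool matchFrom(const CompiledGlob& prog, std::size_t pc, std::string_view text) {
  if (pc == prog.size()) {
    return text.empty();
  }
  const Instr& in = prog[pc];
  switch (in.op) {
    case Op::Literal:
      return !text.empty() && text[0] == in.ch &&
          matchFrom(prog, pc + 1, text.substr(1));
    case Op::AnyChar:
      return !text.empty() && text[0] != '/' &&
          matchFrom(prog, pc + 1, text.substr(1));
    case Op::Star:
    case Op::StarStar:
      // Try consuming 0, 1, 2, ... characters; "*" stops at a '/'.
      for (std::size_t n = 0;; ++n) {
        if (matchFrom(prog, pc + 1, text.substr(n))) {
          return true;
        }
        if (n == text.size()) {
          return false;
        }
        if (in.op == Op::Star && text[n] == '/') {
          return false;
        }
      }
  }
  return false;
}

bool globMatch(const CompiledGlob& prog, std::string_view text) {
  return matchFrom(prog, 0, text);
}
```

A caller would run `compileGlob()` once per pattern when the .gitignore file is loaded and then call `globMatch()` for each filename. The naive backtracking here can be slow on patterns with many wildcards.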
Reviewed By: bolinfest
Differential Revision: D4194573
fbshipit-source-id: 46bc6a61b6d8066f4bbdb5d3e74265a3e72e42cc
Summary:
This adds some initial code for handling gitignore files.
I did check to see if there were APIs from libgit2 that we could leverage for
this, but it does not look like we can easily use their functionality. The
libgit2 ignore code seems to be tightly coupled with their repository data
structures, and it requires that you actually have a git repository.
This code isn't quite 100% compatible with git's semantics yet. In particular:
- For now we are just using fnmatch() to do the matching. This is currently
inefficient as we have to do string allocations on each match attempt. This
also doesn't quite match git's behavior, particularly with regard to "**"
inside patterns.
- The code currently does not have a mechanism for indicating if a path refers
to a directory or not, so trailing slashes in the pattern are not honored
correctly.
We will probably need to implement our own fnmatch-like function in the future
to solve these issues.
Reviewed By: bolinfest
Differential Revision: D4156480
fbshipit-source-id: 8ceaefd3805358ae2edc29bfc316e5c8f2fb7d31
Summary:
For now, initial state is represented by a
`std::unordered_map<RelativePath, HgUserStatusDirective>`.
Reviewed By: simpkins
Differential Revision: D4123461
fbshipit-source-id: 83a99e1f504dd1efca1bc1ed33cbc3f116787a80
Summary:
This adds the logic to power `hg rm`. There are comprehensive tests that attempt to cover
all of the edge cases.
This evolved to become a complex change because I realized that I needed to change
my internal representation of the dirstate to implement it properly. Specifically, we now maintain
a map (`userDirectives`) of files that have been explicitly scheduled for change via `hg add` or `hg rm`.
To compute the result of `hg status`, we find the changes between the manifest/root tree
and the overlay and also consult `userDirectives`. `Dirstate::getStatus()` was updated
considerably as part of this commit due to the introduction of `userDirectives`.
As such, `Dirstate::remove()` must do several things:
* Defend the integrity of the dirstate by throwing appropriate exceptions for invalid inputs.
* Delete the specified file, if appropriate.
* Update `userDirectives`, if appropriate.
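A rough sketch of those three responsibilities, under loudly stated assumptions: `MiniDirstate` is a hypothetical stand-in for `Dirstate`, paths are plain strings, and the rules for what counts as "invalid" or "appropriate" are illustrative guesses rather than the rules this commit implements.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>

enum class Directive { Add, Remove };

// Hypothetical, simplified model of the dirstate.
struct MiniDirstate {
  std::set<std::string> manifest;     // files in the committed tree
  std::set<std::string> workingCopy;  // files currently on disk
  std::unordered_map<std::string, Directive> userDirectives;

  void remove(const std::string& path) {
    auto it = userDirectives.find(path);
    const bool pendingAdd =
        it != userDirectives.end() && it->second == Directive::Add;

    // 1. Defend integrity: reject paths that are not tracked at all
    //    (assumed rule, for illustration only).
    if (manifest.count(path) == 0 && !pendingAdd) {
      throw std::invalid_argument("not tracked: " + path);
    }

    // 2. Delete the file if it is present.
    workingCopy.erase(path);

    // 3. Update userDirectives: a pending add is simply cancelled,
    //    otherwise the file is scheduled for removal (assumed rule).
    if (pendingAdd) {
      userDirectives.erase(it);
    } else {
      userDirectives[path] = Directive::Remove;
    }
  }
};
```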
Although `Dirstate::add()` was not the focus of this commit, it also had to be updated to
match the pattern introduced by `Dirstate::remove()`.
Some important features that are still not supported are:
* Handling ignored files correctly.
* Storing copy/move information.
Reviewed By: simpkins
Differential Revision: D4104503
fbshipit-source-id: d5d45a279e16ded584c6cd4d528ba92d2c8e2993
Summary:
D4014598 changed this line but didn't change the test expectations.
Since it seems desirable for the mount point name to be used, I've reverted
back to the prior state for this line.
Reviewed By: bolinfest
Differential Revision: D4202265
fbshipit-source-id: bddc01436e0a5921a3b0b2c01c0fd2c32f5f1960
Summary:
Originally, D3858635 was going to introduce a scheme for hooks where the
repo type was included in the path:
/etc/eden/hooks/hg/post-clone <args...>
But over the course of the review, we decided to make the repo type a
parameter:
/etc/eden/hooks/post-clone hg <other-args...>
Unfortunately, `generate-hooks-dir` was not updated as part of that
change and it is not covered by unit tests. This error was particularly hard
to discover because of how `ENOENT` is handled, so I added a log statement for
that.
Reviewed By: simpkins
Differential Revision: D4200277
fbshipit-source-id: ffffd871cd78dcaeb717be8f1e01893ce9643a47
Summary:
This is a quick and dirty fix for an issue that was causing a
confusing bug: the memory for the `AbsolutePathPiece`
was getting reclaimed, so when it was later read as the value
for a path, it failed because it was binary garbage.
This is mainly caused by the `std::move(config)` that passes
the `ClientConfig` to the `EdenMount` constructor. I will do
some more general cleanup for that in a follow-up revision,
but I wanted to have this change in its own commit that makes
it clear where the failure/fix were coming from.
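The general shape of this kind of fix can be sketched as below. `Config` and `Mount` are hypothetical stand-ins for `ClientConfig` and `EdenMount`. The point is only that a non-owning view into an object must not be used after that object has been moved away, so the sketch takes an owned copy first.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for ClientConfig and EdenMount.
struct Config {
  std::string mountPath;
};

struct Mount {
  explicit Mount(Config c) : config(std::move(c)) {}
  Config config;
};

std::string mountAndReportPath(Config config) {
  // Risky variant (not done here): keep a non-owning view into
  // config.mountPath, then std::move(config). The view may then refer to
  // storage that has been released or reused.
  std::string path = config.mountPath;  // owned copy, safe after the move
  Mount mount(std::move(config));
  return path;
}
```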
Reviewed By: simpkins
Differential Revision: D4198939
fbshipit-source-id: 19e0423a1bee924fa6cc2edc8bae534ef472c988
Summary:
Use new, less confusing names for mentioned thrift methods.
Codemod with 'Yes to All'. Reverted changes in thrift/
Reviewed By: yfeldblum
Differential Revision: D4076812
fbshipit-source-id: 4962ec0aead1f6a45efc1ac7fc2778f39c72e1d0
Summary:
This is the start of the C++ dirstate implementation. It's possible that this
commit does too many things at once:
* Introduces `Dirstate` type.
* Includes logic for serializing/deserializing the dirstate's data so that it persists across Eden restarts.
* Includes logic for basic `hg add` calls.
* Includes unit tests where we model Eden usage via the TestMount utility.
I'm backing this up in Phabricator with `--plan-changes` to start until I get
some basic `hg add` functionality working end-to-end. When that looks good, I'll
determine if/how this should be split into smaller commits.
Reviewed By: wez
Differential Revision: D4023232
fbshipit-source-id: 7fc931d547ccadb34f7caae93bc4eb8f91f6ceb8
Summary:
This is a utility that should be generally useful in creating tests,
including the test of `TestMount` itself.
Reviewed By: simpkins
Differential Revision: D4073653
fbshipit-source-id: dda1d8ea8d29aa071a31f8e2afab324f9109e9b2
Summary:
This is a utility that should be generally useful in creating tests,
including the test of `TestMount` itself.
As you can see, this helped uncover a bug in the way we were
inserting blobs into `LocalStore`.
Reviewed By: simpkins
Differential Revision: D4073039
fbshipit-source-id: 42683fd0bfdb0a1e77df9324fcaa79091f45e83d
Summary: This is a follow-up revision from a comment on D4013464.
Reviewed By: wez
Differential Revision: D4050278
fbshipit-source-id: 1e46526f58a07e1eedd8ace1a6d84a919240d899
Summary:
This is analogous to the existing `getEntryForFile()` helper function that we
have, and I was able to rewrite `getEntryForFile()` in terms of
`getTreeForDirectory()`, which simplifies the code considerably.
Also moved things from `eden/fs/model/hg/misc.h` to
`eden/fs/store/ObjectStores.h`, which is much more appropriate.
Reviewed By: wez
Differential Revision: D4032817
fbshipit-source-id: ff4d32120fb050f8b5c5c53b7f2e94b524781648
Summary:
This is not a one-liner and this is needed for the upcoming `Dirstate` class,
so moving this code to a place where it is more easily reusable.
Reviewed By: simpkins
Differential Revision: D4032001
fbshipit-source-id: 7d8d87802665ac2993ec0a3ac73c5f645fe4a1aa
Summary:
Performs a depth-first traversal of the overlay to find modified
directories and returns them in that order.
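A minimal sketch of such a traversal, assuming a hypothetical in-memory `DirNode` tree with a `modified` flag and assuming pre-order output (parents before children). The real overlay types and the exact ordering are not described in this message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an overlay directory.
struct DirNode {
  std::string name;
  bool modified;
  std::vector<DirNode> children;
};

// Pre-order depth-first walk that records the path of each modified directory.
void collectModified(
    const DirNode& dir,
    const std::string& path,
    std::vector<std::string>& out) {
  if (dir.modified) {
    out.push_back(path);
  }
  for (const DirNode& child : dir.children) {
    collectModified(
        child, path.empty() ? child.name : path + "/" + child.name, out);
  }
}

std::vector<std::string> modifiedDirectories(const DirNode& root) {
  std::vector<std::string> out;
  collectModified(root, "", out);  // the root is reported as the empty path
  return out;
}
```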
Reviewed By: simpkins
Differential Revision: D4025309
fbshipit-source-id: 09d8ed41b250dddbfb3fe545643ec3fd755a430e
Summary:
Now that I've done all this work, I'm not sure whether it is a good idea or even
necessary. I'll keep it in my back pocket.
Reviewed By: simpkins
Differential Revision: D4014598
fbshipit-source-id: 6ded3cc29838e964b56833ac24dff19e9de040f5
Summary:
These are new helper methods we need to create test scenarios.
They will be used in upcoming revisions.
Reviewed By: wez
Differential Revision: D4046981
fbshipit-source-id: 9c66c456be57006173e4a65eed603de4a426a438
Summary: This should be useful for my upcoming unit tests for the Hg dirstate.
Reviewed By: simpkins
Differential Revision: D4013464
fbshipit-source-id: 46460186abfa104aa026894068cd160e52c94729
Summary: This will make it easier to compare a `TreeEntry` with a `TreeInode::Entry`.
Reviewed By: simpkins
Differential Revision: D4034298
fbshipit-source-id: 29674e2902661bf46394ea71b81537b35bd4b107
Summary: This should make some of the upcoming test harness work a little easier.
Reviewed By: simpkins
Differential Revision: D4011747
fbshipit-source-id: 87ee80a6d641a29be9027b163b1adee496f4452f
Summary:
I need this for the upcoming test harness so I can avoid creating a
`ClientConfig`, which is currently a huge pain to do from a unit test.
Reviewed By: simpkins
Differential Revision: D4010842
fbshipit-source-id: 03d1e1de9c3047340a6f26202d4b432f4a8620b4
Summary:
This was reported by ASAN.
The major issue was that `FakeObjectStore` was returning a copy of a `Tree`,
so it was not the case that the `TreeEntry*` returned by `getEntryForFile()`
was guaranteed to be "owned by" the `Tree* root` that was passed in. To address
this, we change `getEntryForFile()` to now return a copy of the `TreeEntry*`
that it gets back from `getEntryPtr()`. It really comes down to this line:
```
auto entry = currentDirectory->getEntryPtr(piece.basename());
```
because we cannot guarantee that `currentDirectory` will live past the end of
`getEntryForFile()`, so we cannot guarantee that the return value of
`currentDirectory->getEntryPtr()` will, either.
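The lifetime problem and the copy-based fix can be sketched as follows. `Entry`, `Dir` and `FakeStore` are simplified, hypothetical stand-ins for `TreeEntry`, `Tree` and `FakeObjectStore`, and returning `std::optional` is just one way to express "a copy, if found".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-ins for TreeEntry, Tree and FakeObjectStore.
struct Entry {
  std::string name;
  int mode;
};

struct Dir {
  std::vector<Entry> entries;

  // Returns a pointer into this Dir; valid only while the Dir is alive.
  const Entry* getEntryPtr(std::string_view name) const {
    for (const Entry& e : entries) {
      if (e.name == name) {
        return &e;
      }
    }
    return nullptr;
  }
};

struct FakeStore {
  Dir root;
  Dir getTree() const { return root; }  // returns a copy, like the fake store
};

std::optional<Entry> getEntryForFile(const FakeStore& store, std::string_view name) {
  Dir currentDirectory = store.getTree();  // local copy, destroyed on return
  const Entry* entry = currentDirectory.getEntryPtr(name);
  if (entry == nullptr) {
    return std::nullopt;
  }
  return *entry;  // copy the entry out before currentDirectory goes away
}
```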
Special thanks to meyering and yfeldblum for helping me debug this.
Reviewed By: simpkins
Differential Revision: D4024627
fbshipit-source-id: 6295e6f2b1d2f544271b2aebad27a4ad3ae04563
Summary:
Utility function that given a `Tree` and a `RelativePathPiece`, returns the
corresponding `TreeEntry` in the `ObjectStore`, if it exists.
Reviewed By: wez
Differential Revision: D3980261
fbshipit-source-id: 2808a4ca45be84e3a6bb91b0cf2db19a3bf88798
Summary:
In an upcoming revision, I am going to introduce a utility function that takes
an `ObjectStore` (well, now an `IObjectStore`) as a parameter and I want to be
able to test it. Having a `FakeObjectStore` should make this considerably easier
without having to resort to mocks.
Reviewed By: simpkins
Differential Revision: D3980580
fbshipit-source-id: 5886e2055c893e749cc898226e1baade776c3ea7
Summary:
Adds a very basic example of testing eden functionality with hypothesis.
We'll be building on this with stateful testing in a follow on diff tomorrow.
There's some prep/setup work in the base test class that can be removed when an updated version of hypothesis ships and is updated in our third-party repo.
Reviewed By: simpkins
Differential Revision: D3968250
fbshipit-source-id: 46382c3bf2d6a0edbd60ac2b048b1bae26ca2572
Summary: This is necessary so that it can be used as the key in an `unordered_map`.
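For illustration, this is the usual way a type becomes usable as an `unordered_map` key: it needs equality and a hash. `PathKey` and `Directive` below are hypothetical stand-ins, not the types touched by this change.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a path type.
struct PathKey {
  std::string value;
  bool operator==(const PathKey& other) const { return value == other.value; }
};

// std::unordered_map needs a hash for its key type in addition to operator==.
namespace std {
template <>
struct hash<PathKey> {
  size_t operator()(const PathKey& p) const noexcept {
    return hash<string>{}(p.value);
  }
};
} // namespace std

enum class Directive { Add, Remove };

using DirectiveMap = std::unordered_map<PathKey, Directive>;
```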
Reviewed By: simpkins
Differential Revision: D3980575
fbshipit-source-id: d225a98f957f9aae2f2f50a6cc365011d953c92e
Summary:
Apparently we did not have an existing unit test for `Tree`, so this adds one.
The other methods should be tested, as well, but I'm about to use `getEntryPtr()`
elsewhere, which is why I just focused on this one for the moment.
Reviewed By: simpkins
Differential Revision: D3980150
fbshipit-source-id: 33456fd621a1894606605af4fee06ba42d124752
Summary:
We want to use these with Eden.
Depends on D3961190
Depends on D3961193
Depends on D3961196
Depends on D3961208
Reviewed By: rhysparry
Differential Revision: D3961232
fbshipit-source-id: 56f5a1811625303514e4398a6d47ea90ba348724
Summary:
The default `casecollisionauditor` reads all of `dirstate._map`. This is
(1) expensive for a large repo and (2) accesses a private property that we would
prefer not to expose via `edendirstate`. Setting `portablefilenames = ignore` by
default avoids this check.
This means that Eden users are currently responsible for not creating directory
entries that would cause a case-insensitive collision. Ideally, we would just do
this check on the server.
Reviewed By: wez
Differential Revision: D3964461
fbshipit-source-id: f351bdeaad0fc06cd70cc637ca1b6fde249dde9c
Summary:
getMaterializedEntries() would previously try to dereference a null pointer
if the input mount path did not refer to a valid mount point.
Reviewed By: bolinfest, wez
Differential Revision: D3942600
fbshipit-source-id: 2a8c9aa87d2bd8175f7bc77f3d6293ad25e9c198
Summary:
Add some helper functions for constructing EdenError objects from a few
different types of arguments. Also update eden.thrift to indicate that most
functions can throw EdenErrors on failure.
Reviewed By: bolinfest, wez
Differential Revision: D3942588
fbshipit-source-id: 1b561c5310a8a218f88c38c70499e087fe47bbe0
Summary:
Python 2.x requires the current class name be passed into super().
Add arguments to super so that we can use this inside a mercurial extension.
(Mercurial only supports python 2.x.)
Reviewed By: bolinfest
Differential Revision: D3942573
fbshipit-source-id: 06df55f217631a398004c0d25448d3a612f772e9
Summary:
The keys in the config directory map are normalized, absolute paths to the
mount point. When trying to look up a mount point make sure we also always use
a normalized absolute path.
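The idea of normalizing on both insert and lookup can be sketched like this. It is an illustration only: `MountTable` and `normalizeMountPath()` are hypothetical, the sketch uses C++17 `std::filesystem` purely for convenience, and it does not resolve symlinks or strip trailing slashes.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <string>
#include <utility>

// Make the path absolute and collapse "." and ".." components lexically.
std::string normalizeMountPath(const std::filesystem::path& p) {
  return std::filesystem::absolute(p).lexically_normal().string();
}

// Hypothetical table keyed by normalized absolute mount path.
struct MountTable {
  std::map<std::string, std::string> byPath;

  void add(const std::filesystem::path& mountPoint, std::string name) {
    byPath[normalizeMountPath(mountPoint)] = std::move(name);
  }

  // Lookups normalize the same way, so equivalent spellings find the entry.
  const std::string* find(const std::filesystem::path& mountPoint) const {
    auto it = byPath.find(normalizeMountPath(mountPoint));
    return it == byPath.end() ? nullptr : &it->second;
  }
};
```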
Reviewed By: bolinfest
Differential Revision: D3942565
fbshipit-source-id: 63db838ffc7139d779925adf07c50f849d73bcc5
Summary:
We were hitting an assertion in the case where we did a `mkdir`
followed by a `rename` followed by `getMaterializedEntries`.
The issue is that our in-memory representation has a boolean to indicate
whether a dir inode is materialized, but our serialization format does
not have this bit. When we loaded the data we were not setting the
field to true and this was caught by the DCHECK.
If we have serialized data for a dir then it is, by definition, materialized
and we should just set that field to true.
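In sketch form, with a hypothetical `DirInodeState` and a made-up serialized representation (a plain list of entry names), the fix amounts to setting the flag unconditionally in the load path:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory state for a directory inode.
struct DirInodeState {
  std::vector<std::string> entries;
  bool materialized = false;
};

// The serialized form carries no "materialized" bit, so the loader sets it:
// having serialized data at all implies the directory is materialized.
DirInodeState loadFromOverlay(const std::vector<std::string>& serializedEntries) {
  DirInodeState state;
  state.entries = serializedEntries;
  state.materialized = true;
  return state;
}
```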
Reviewed By: bolinfest
Differential Revision: D3900795
fbshipit-source-id: 62d8281e7a1009056d274888c9aff87664d2e09f
Summary:
This design is inspired by that of Git hooks:
https://git-scm.com/docs/githooks
By default, `/etc/eden/hooks` should be the place where Eden looks for
hooks; however, this can be overridden in `~/.edenrc` on a per-`repository` basis.
This directory should be installed as part of installing Eden.
There is information in `eden/hooks/README.md` about this.
The first hook that is supported is for post-clone logic for a repository.
This change demonstrates the need for an `eden config --get <value>`
analogous to what Git has, as hooks should be able to leverage this in their
own scripts. It introduces a `TODO` in `post-clone.py` where such a
feature would be useful, so that I could add the following to my `~/.edenrc`
to develop the Eden extension for Hg:
```
[hooks]
hg.edenextension = /data/users/mbolin/fbsource/fbcode/eden/hg/eden
[repository fbsource]
path = /data/users/mbolin/fbsource
type = hg
hooks = /data/users/mbolin/eden-hooks
```
Note that this revision also introduces a `generate-hooks-dir` script that can be
used to generate the standard `/etc/eden/hooks` directory that we intend to
distribute with Eden. This is also useful in creating the basis for a custom `hooks`
directory that can be specified as shown above in an `~/.edenrc` file.
Reviewed By: simpkins
Differential Revision: D3858635
fbshipit-source-id: 215ca26379a4b3b0a07d50845fd645b4d9ccf0f2
Summary:
simpkins spotted this; we were passing the wrong path down to the overlay saving dir.
This adds a test to prove that the source and destination directory contents
are correct both immediately after performing the rename and after remounting,
where we just read the serialized data.
Reviewed By: simpkins
Differential Revision: D3888694
fbshipit-source-id: 7f5fb5be417db5c693ac8a07b85abbffdbfe0fff
Summary:
This is pretty straightforward; we just walk back until we hit the
boundary with the requested JournalPosition.sequenceNumber
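A minimal sketch of that walk, assuming a hypothetical `Delta` node that links to its predecessor and assuming the boundary is exclusive (deltas with a sequence number greater than the requested one are included):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical journal entry: each delta links back to the previous one.
struct Delta {
  std::uint64_t sequence;
  std::vector<std::string> changedFiles;
  std::shared_ptr<const Delta> previous;
};

// Walk back from the latest delta, merging changes, until we reach the
// requested sequence number (exclusive boundary is an assumption).
std::set<std::string> filesChangedSince(
    const std::shared_ptr<const Delta>& latest,
    std::uint64_t sinceSequence) {
  std::set<std::string> changed;
  for (const Delta* d = latest.get();
       d != nullptr && d->sequence > sinceSequence;
       d = d->previous.get()) {
    changed.insert(d->changedFiles.begin(), d->changedFiles.end());
  }
  return changed;
}
```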
Reviewed By: simpkins
Differential Revision: D3872970
fbshipit-source-id: 1405f05957346d7ac513070f0407a477548aff1d
Summary:
Populate the position from the latest journal delta.
To facilitate this, we also define the mountGeneration value to be a
combination of the pid and the time at which we created the EdenMount object,
as well as a global counter that we bump for each mount.
The precise value and meaning of these bits really doesn't matter, just that we
are unlikely to pick the same value for this same mountPoint path again if we
were to remount in the future.
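One way to combine the three inputs into a single 64-bit value is sketched below. The bit layout is purely an assumption for illustration, and `makeMountGeneration()` is a hypothetical helper that takes its inputs as parameters so the result is deterministic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: low 16 bits of the pid, low 32 bits of the creation
// time in seconds, and low 16 bits of a per-process mount counter.
std::uint64_t makeMountGeneration(
    std::uint64_t pid,
    std::uint64_t unixSeconds,
    std::uint64_t counter) {
  return ((pid & 0xFFFFull) << 48) | ((unixSeconds & 0xFFFFFFFFull) << 16) |
      (counter & 0xFFFFull);
}
```

In a real process the inputs would come from the process id, the clock, and an atomic counter that is bumped for each mount.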
Since we are now in a position to report JournalPosition values to clients, now
is also a good time to fill out the `currentPosition` field for the
`getMaterializedEntries` thrift call, and to check that this value is
consistent with the value we return via `getCurrentJournalPosition`.
Reviewed By: simpkins
Differential Revision: D3872952
fbshipit-source-id: 2fbc25d2e9711035b66ab1bf5d746507b72de265
Summary:
This just populates the initial snapshot hash in the journal.
The `addDelta` method will propagate this into subsequent deltas when the
delta being added still has the default 0-filled hash values, i.e. its hashes
have not been set explicitly.
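A rough sketch of that propagation, with hypothetical `Delta` and `Journal` types and an assumed rule for which value is carried forward (the previous delta's `toHash`):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Hash = std::array<std::uint8_t, 20>;
const Hash kZeroHash{};  // default, 0-filled hash

// Hypothetical, simplified journal entry.
struct Delta {
  Hash fromHash{};
  Hash toHash{};
};

class Journal {
 public:
  explicit Journal(const Hash& initialSnapshot) {
    latest_.fromHash = initialSnapshot;
    latest_.toHash = initialSnapshot;
  }

  void addDelta(Delta delta) {
    // Unset hashes inherit the current snapshot (assumed rule).
    if (delta.fromHash == kZeroHash) {
      delta.fromHash = latest_.toHash;
    }
    if (delta.toHash == kZeroHash) {
      delta.toHash = delta.fromHash;
    }
    latest_ = delta;
  }

  const Delta& latest() const { return latest_; }

 private:
  Delta latest_;
};
```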
Reviewed By: simpkins
Differential Revision: D3872936
fbshipit-source-id: d0014ded40488a2be04d5a381e1d9815c7f0a638
Summary:
This diff adds a couple more things to our thrift interface:
1. Introduces JournalPosition
2. Adds methods to query the current JournalPosition and obtain a
delta since a given JournalPosition
3. Augments getMaterializedFiles to also return the current JournalPosition
4. Adds a method to evaluate a `glob` against Eden
5. Adds a method using thrift streaming to subscribe to realtime changes
Could probably finesse the naming a little bit.
The JournalPosition allows reasoning about changes to files that are not part
of an Eden snapshot. Internally the journal position is just the
SequenceNumber from the journal data structures, but when we expose it to
clients we need to be able to distinguish between a sequence number from the
current instance of the eden service and a prior incarnation (e.g. if the
process has been restarted, and we have no way to recreate the journal we need
to be able to indicate this to the client if they ask about changes in that
range). For the convenience of the client we also include the `toHash` (the
most recent hash from the journal entry) which is likely useful for the `hg`
dirstate operations; it is useful to know that the snapshot may have changed
since the last query about the dirstate.
The `getFileInformation` method returns the instantaneously available `stat()`
like information about the requested list of files. Since we simply don't
have historical data on how files in the overlay looked (only how they look
now), this method does not allow passing in a JournalPosition. When it comes
to comparing historical data, we will need to add an API that accepts two
snapshot hashes and generates the results from there. This particular method
is geared up to understanding the current state of the world; the obvious use
case is plugging in the file list from `getFilesChangedSince` into this
function to figure out what's what.
* Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC?
Why is there a glob method? It's to support a use-case in the watchman/buck
integration. I'm just sketching it out in the thrift interface at this stage.
In the future we also need to be able to express how to carry out a tree walk,
but that will require some query predicates that I don't want to get hung up on
specifying immediately.
Why is the streaming stuff in its own thrift file? We can't generate code for
it in java or perhaps also python. It's only needed to plumb data into
watchman so it's broken out into its own definition. Nothing depends on that
file yet, so it's probably not specified quite right. The important thing is
how the subscribe method looks: it's essentially the same as the method to
query a delta, but it keeps emitting deltas as they are produced. This is
another API that will benefit from query predicates when we get around to
specifying them.
I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the
appropriate snapshot ids in the journal entry; this will allow us to indicate
when we've checked out a new snapshot, or created a new snapshot. We have
no way to populate these yet; I commented on D3762646 about storing the
`snapshotID` that we have during `EdenServiceHandler::mountImpl` into either
the `EdenMount` or the proposed `RootInode` class. Once we have that we
can simply sample it and store it as we generate `JournalDelta`s.
Reviewed By: simpkins
Differential Revision: D3860804
fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
Summary:
This is pretty simplistic: we just wlock and add a delta for the set
of file(s) that were changed in a given fuse operation (this is typically 1
file, but rename affects 2).
To reduce boilerplate very slightly, I've added an initializer_list constructor
for JournalDelta that makes it less cumbersome to create a JournalDelta for a
list of files.
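The convenience of such a constructor can be sketched as below. `Delta` is a hypothetical, simplified stand-in for `JournalDelta`, and `deltaForRename()` is a made-up helper showing the two-file rename case.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for JournalDelta.
struct Delta {
  Delta() = default;
  // Allows Delta{"a", "b"} instead of building the set by hand.
  Delta(std::initializer_list<std::string> files) : changedFiles(files) {}

  std::set<std::string> changedFiles;
};

// A rename touches both the source and the destination path.
Delta deltaForRename(const std::string& from, const std::string& to) {
  return Delta{from, to};
}
```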
Reviewed By: simpkins
Differential Revision: D3866053
fbshipit-source-id: cd918e2c98c022d5ef79430cd8ab4aef88875239
Summary:
This implements a pretty simple change Journal and associated
JournalDelta.
The Journal is intended to be held in memory and not persisted to disk.
The idea is that we'll hold a `Synchronized<Journal>` along with the
other mount data and grab a `wlock` on it each time we want to add
a change record.
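The locking pattern can be sketched with the standard library. This uses `std::mutex` as a stand-in for `Synchronized<Journal>` and its `wlock`, and `SynchronizedJournal` is a hypothetical name, so it shows the idea rather than the actual API used here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory journal guarded by a mutex.
class SynchronizedJournal {
 public:
  // Take the lock for the duration of each append.
  void addDelta(std::vector<std::string> changedFiles) {
    std::lock_guard<std::mutex> lock(mutex_);
    deltas_.push_back(std::move(changedFiles));
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return deltas_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::vector<std::string>> deltas_;  // never persisted to disk
};
```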
This diff doesn't change any other existing functionality.
Reviewed By: simpkins
Differential Revision: D3660162
fbshipit-source-id: a6b6fa28dd12e4d34718956167ee87f8cb2d89ca
Summary:
Adds a thrift call that returns the list of materialized entries from the whole tree.
This is intended to be plugged into the mercurial dirstate extension.
Reviewed By: simpkins
Differential Revision: D3851805
fbshipit-source-id: 8429fdb4eeccc32928e8abc154d4e6fd49343556