
29 Commits

Author SHA1 Message Date
Adam Simpkins
e96c3e1ca7 fully implement gitignore glob pattern matching
Summary:
Update the gitignore handling code to perform pattern matching the same way git
does.  Previously the code just called the standard fnmatch() function, which
does not handle "**" in patterns the same way git does.

This includes our own new implementation of glob pattern matching.  I did
evaluate several other options before writing our own implementation here:

- The wildmatch() code used by git (and watchman, and rsync) has a few
  downsides: it is not distributed by itself as a library anywhere else.
  Therefore we would probably have to include a copy of this code in our
  repository.  Making another copy is unfortunate, and somewhat undesirable
  from a legal and licensing perspective.  This code also only works with
  nul-terminated strings, and our code deals primarily with non-terminated
  StringPiece objects.

- I did look at translating glob patterns into regular expressions and using
  re2 to perform matching.  Unfortunately re2 turns out to be substantially
  slower than wildmatch() for typical gitignore patterns.

This new implementation performs some preprocessing on the glob pattern, and
generates a pattern opcode buffer.  Eden can perform this glob preprocessing
when it first loads a .gitignore file, and can then save and re-use this result
each time it needs to match a filename.  Doing this preprocessing allows
matching to be done 50% to 100% faster than wildmatch() for typical glob
patterns.
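
The "preprocess once, match many times" idea can be sketched as below. This is a hypothetical, simplified matcher, not Eden's implementation. The class name `GlobMatcher`, its opcode set and its recursive backtracking are all illustrative choices. It handles `*` and `?` (neither crosses `/`), backslash escapes, and git's `**` forms: a leading `**/`, an inner `/**/` and a trailing `/**`. Other runs of asterisks behave like `*`. Character classes such as `[a-z]` are omitted. Applying slash-less patterns to the basename at every directory level is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch: compile a glob into opcodes once, then reuse them.
class GlobMatcher {
 public:
  explicit GlobMatcher(std::string_view pattern) : program_(compile(pattern)) {}

  // Match a whole path, relative to the directory holding the .gitignore.
  bool match(std::string_view path) const { return matchFrom(0, path, 0); }

 private:
  enum class Op {
    Literal,        // one exact character
    AnyChar,        // '?': any single character except '/'
    Star,           // '*': any run of characters without '/'
    StarStarSlash,  // "**/": zero or more whole directories
    StarStarEnd,    // trailing "/**" (or a bare "**"): everything remaining
  };
  struct Instr {
    Op op;
    char c;  // only meaningful for Op::Literal
  };

  static std::vector<Instr> compile(std::string_view p) {
    std::vector<Instr> prog;
    std::size_t i = 0;
    while (i < p.size()) {
      char ch = p[i];
      if (ch == '*') {
        std::size_t j = i;
        while (j < p.size() && p[j] == '*') ++j;
        bool segmentStart = (i == 0 || p[i - 1] == '/');
        if (j - i >= 2 && segmentStart && j == p.size()) {
          prog.push_back({Op::StarStarEnd, 0});
          i = j;
        } else if (j - i >= 2 && segmentStart && j < p.size() && p[j] == '/') {
          prog.push_back({Op::StarStarSlash, 0});
          i = j + 1;  // this opcode consumes the slash as well
        } else {
          prog.push_back({Op::Star, 0});  // other asterisk runs act like '*'
          i = j;
        }
      } else if (ch == '?') {
        prog.push_back({Op::AnyChar, 0});
        ++i;
      } else if (ch == '\\' && i + 1 < p.size()) {
        prog.push_back({Op::Literal, p[i + 1]});  // escaped character
        i += 2;
      } else {
        prog.push_back({Op::Literal, ch});
        ++i;
      }
    }
    return prog;
  }

  // Run the program from instruction `pi` against `s` starting at `si`.
  bool matchFrom(std::size_t pi, std::string_view s, std::size_t si) const {
    while (pi < program_.size()) {
      const Instr& in = program_[pi];
      switch (in.op) {
        case Op::Literal:
          if (si >= s.size() || s[si] != in.c) return false;
          ++pi;
          ++si;
          break;
        case Op::AnyChar:
          if (si >= s.size() || s[si] == '/') return false;
          ++pi;
          ++si;
          break;
        case Op::Star:
          // Try each length, stopping at the end of input or at a '/'.
          for (std::size_t k = si;; ++k) {
            if (matchFrom(pi + 1, s, k)) return true;
            if (k >= s.size() || s[k] == '/') return false;
          }
        case Op::StarStarSlash:
          // Zero directories, or any prefix that ends right after a '/'.
          if (matchFrom(pi + 1, s, si)) return true;
          for (std::size_t k = si; k < s.size(); ++k) {
            if (s[k] == '/' && matchFrom(pi + 1, s, k + 1)) return true;
          }
          return false;
        case Op::StarStarEnd:
          return true;
      }
    }
    return si == s.size();
  }

  std::vector<Instr> program_;
};
```

The backtracking here is the simplest correct strategy; it can degrade on pathological patterns, which a production matcher would need to guard against.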

Reviewed By: bolinfest

Differential Revision: D4194573

fbshipit-source-id: 46bc6a61b6d8066f4bbdb5d3e74265a3e72e42cc
2016-11-21 15:26:07 -08:00
Adam Simpkins
b7ff172fc6 initial framework for gitignore file handling
Summary:
This adds some initial code for handling gitignore files.

I did check to see if there were APIs from libgit2 that we could leverage for
this, but it does not look like we can easily use their functionality.  The
libgit2 ignore code seems to be tightly coupled with their repository data
structures, and it requires that you actually have a git repository.

This code isn't quite 100% compatible with git's semantics yet.  In particular:

- For now we are just using fnmatch() to do the matching.  This is currently
  inefficient as we have to do string allocations on each match attempt.  This
  also doesn't quite match git's behavior, particularly with regard to "**"
  inside patterns.

- The code currently does not have a mechanism for indicating if a path refers
  to a directory or not, so trailing slashes in the pattern are not honored
  correctly.

We will probably need to implement our own fnmatch-like function in the future
to solve these issues.
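
Here is a sketch of one fix for the trailing-slash problem. It assumes the caller can report whether a path is a directory. Under that assumption, a pattern such as `build/` can be stored with its slash stripped, plus a directory-only flag. `IgnorePattern` is a hypothetical name. Matching still goes through POSIX fnmatch(3), which needs NUL-terminated strings, hence the std::string copies.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>
#include <string_view>

// Hypothetical helper for directory-only patterns such as "build/".
struct IgnorePattern {
  std::string glob;      // pattern text with any trailing '/' removed
  bool dirOnly = false;  // true if the original pattern ended in '/'

  explicit IgnorePattern(std::string_view pattern) {
    if (!pattern.empty() && pattern.back() == '/') {
      dirOnly = true;
      pattern.remove_suffix(1);
    }
    glob.assign(pattern);
  }

  // Without knowing whether `path` is a directory, "build/" cannot be honored.
  bool matches(const std::string& path, bool isDirectory) const {
    if (dirOnly && !isDirectory) {
      return false;
    }
    // FNM_PATHNAME keeps '*' and '?' from matching '/'.
    return fnmatch(glob.c_str(), path.c_str(), FNM_PATHNAME) == 0;
  }
};
```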

Reviewed By: bolinfest

Differential Revision: D4156480

fbshipit-source-id: 8ceaefd3805358ae2edc29bfc316e5c8f2fb7d31
2016-11-21 15:26:07 -08:00
Michael Bolin
27c8864401 Allow initial state for Dirstate to be passed to the constructor.
Summary:
For now, initial state is represented by a
`std::unordered_map<RelativePath, HgUserStatusDirective>`.

Reviewed By: simpkins

Differential Revision: D4123461

fbshipit-source-id: 83a99e1f504dd1efca1bc1ed33cbc3f116787a80
2016-11-18 19:26:04 -08:00
Michael Bolin
12eac0f5db Implement Dirstate::remove().
Summary:
This adds the logic to power `hg rm`. There are comprehensive tests that attempt to cover
all of the edge cases.

This evolved to become a complex change because I realized that I needed to change
my internal representation of the dirstate to implement it properly. Specifically, we now maintain
a map (`userDirectives`) of files that have been explicitly scheduled for change via `hg add` or `hg rm`.

To compute the result of `hg status`, we find the changes between the manifest/root tree
and the overlay and also consult `userDirectives`. `Dirstate::getStatus()` was updated
considerably as part of this commit due to the introduction of `userDirectives`.

As such, `Dirstate::remove()` must do several things:
* Defend the integrity of the dirstate by throwing appropriate exceptions for invalid inputs.
* Delete the specified file, if appropriate.
* Update `userDirectives`, if appropriate.
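
A much-simplified model of those three steps follows. Everything here is a hypothetical stand-in. `RelativePath` is a plain std::string, and the manifest and the working copy are sets of file names. The two error cases are assumptions loosely modeled on `hg rm`, not Eden's actual rules: refusing a file that is only scheduled for add, and refusing an untracked file. The constructor also accepts an initial `userDirectives` map.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for Eden's types.
using RelativePath = std::string;
enum class HgUserStatusDirective { Add, Remove };

class Dirstate {
 public:
  // `manifest`: files in the root tree; `workingCopy`: files currently on disk.
  Dirstate(std::set<RelativePath> manifest,
           std::set<RelativePath> workingCopy,
           std::unordered_map<RelativePath, HgUserStatusDirective> userDirectives = {})
      : manifest_(std::move(manifest)),
        workingCopy_(std::move(workingCopy)),
        userDirectives_(std::move(userDirectives)) {}

  // Simplified `hg rm`: validate the input, delete the file, record the directive.
  void remove(const RelativePath& path) {
    auto it = userDirectives_.find(path);
    if (it != userDirectives_.end() && it->second == HgUserStatusDirective::Add) {
      throw std::invalid_argument(path + ": file has been marked for add");
    }
    if (manifest_.count(path) == 0) {
      throw std::invalid_argument(path + ": not tracked");
    }
    workingCopy_.erase(path);  // no-op if the file is already gone
    userDirectives_[path] = HgUserStatusDirective::Remove;
  }

  bool isScheduledForRemoval(const RelativePath& path) const {
    auto it = userDirectives_.find(path);
    return it != userDirectives_.end() && it->second == HgUserStatusDirective::Remove;
  }

  bool existsOnDisk(const RelativePath& path) const {
    return workingCopy_.count(path) != 0;
  }

 private:
  std::set<RelativePath> manifest_;
  std::set<RelativePath> workingCopy_;
  std::unordered_map<RelativePath, HgUserStatusDirective> userDirectives_;
};
```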

Although `Dirstate::add()` was not the focus of this commit, it also had to be updated to
match the pattern introduced by `Dirstate::remove()`.

Some important features that are still not supported are:
* Handling ignored files correctly.
* Storing copy/move information.

Reviewed By: simpkins

Differential Revision: D4104503

fbshipit-source-id: d5d45a279e16ded584c6cd4d528ba92d2c8e2993
2016-11-18 19:26:04 -08:00
Michael Bolin
ff20e0f5c0 Introducing Hg Dirstate abstraction in C++.
Summary:
This is the start of the C++ dirstate implementation. It's possible that this
commit does too many things at once:
* Introduces `Dirstate` type.
* Includes logic for serializing/deserializing the dirstate's data so that it persists across Eden restarts.
* Includes logic for basic `hg add` calls.
* Includes unit tests where we model Eden usage via the TestMount utility.

I'm backing this up in Phabricator with `--plan-changes` to start until I get
some basic `hg add` functionality working end-to-end. When that looks good, I'll
determine if/how this should be split into smaller commits.

Reviewed By: wez

Differential Revision: D4023232

fbshipit-source-id: 7fc931d547ccadb34f7caae93bc4eb8f91f6ceb8
2016-11-01 17:49:08 -07:00
Michael Bolin
f5f9545bd3 Introduce getTreeForDirectory helper function.
Summary:
This is analogous to the existing `getEntryForFile()` helper function that we
have, and I was able to rewrite `getEntryForFile()` in terms of
`getTreeForDirectory()`, which simplifies the code considerably.
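
A toy sketch of that relationship follows, under some assumptions. `Tree` and `TreeEntry` are simplified stand-ins in which a directory entry holds its subtree through a std::shared_ptr instead of an object-store hash. `splitPath()` is a hypothetical helper. `getEntryForFile()` resolves the parent directory with `getTreeForDirectory()`, then looks up the basename and returns the entry by value.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Tree;

// Hypothetical toy model: a directory entry owns its subtree directly.
struct TreeEntry {
  std::string name;
  std::shared_ptr<const Tree> subtree;  // nullptr for regular files
};

struct Tree {
  std::vector<TreeEntry> entries;
  const TreeEntry* getEntryPtr(const std::string& name) const {
    for (const auto& entry : entries) {
      if (entry.name == name) return &entry;
    }
    return nullptr;
  }
};

// Split "a/b/c" into {"a", "b", "c"}, ignoring empty components.
std::vector<std::string> splitPath(const std::string& path) {
  std::vector<std::string> components;
  std::istringstream in(path);
  std::string component;
  while (std::getline(in, component, '/')) {
    if (!component.empty()) components.push_back(component);
  }
  return components;
}

// Walk `dirs` from a non-null `root`; nullptr if a component is missing or a file.
std::shared_ptr<const Tree> getTreeForDirectory(std::shared_ptr<const Tree> root,
                                                const std::vector<std::string>& dirs) {
  std::shared_ptr<const Tree> current = std::move(root);
  for (const auto& name : dirs) {
    const TreeEntry* entry = current->getEntryPtr(name);
    if (entry == nullptr || entry->subtree == nullptr) return nullptr;
    std::shared_ptr<const Tree> next = entry->subtree;  // copy before releasing `current`
    current = std::move(next);
  }
  return current;
}

// getEntryForFile() written in terms of getTreeForDirectory().
std::optional<TreeEntry> getEntryForFile(const std::shared_ptr<const Tree>& root,
                                         const std::string& path) {
  std::vector<std::string> components = splitPath(path);
  if (components.empty()) return std::nullopt;
  std::string basename = components.back();
  components.pop_back();
  auto dir = getTreeForDirectory(root, components);
  if (dir == nullptr) return std::nullopt;
  const TreeEntry* entry = dir->getEntryPtr(basename);
  if (entry == nullptr) return std::nullopt;
  return *entry;  // a copy, independent of the tree's lifetime
}
```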

Also moved things from `eden/fs/model/hg/misc.h` to
`eden/fs/store/ObjectStores.h`, which is much more appropriate.

Reviewed By: wez

Differential Revision: D4032817

fbshipit-source-id: ff4d32120fb050f8b5c5c53b7f2e94b524781648
2016-10-21 13:32:02 -07:00
Michael Bolin
1953391b36 Introduce TreeEntry.getMode() because getOwnerPermissions() was not doing the expected thing.
Summary: This will make it easier to compare a `TreeEntry` with a `TreeInode::Entry`.

Reviewed By: simpkins

Differential Revision: D4034298

fbshipit-source-id: 29674e2902661bf46394ea71b81537b35bd4b107
2016-10-19 10:54:11 -07:00
Michael Bolin
6f14b7f6d0 Fix "heap-use-after-free" issues in misc.cpp and miscTest.cpp.
Summary:
This was reported by ASAN.

The major issue was that `FakeObjectStore` was returning a copy of a `Tree`,
so it was not the case that the `TreeEntry*` returned by `getEntryForFile()`
was guaranteed to be "owned by" the `Tree* root` that was passed in. To address
this, we change `getEntryForFile()` to return a copy of the `TreeEntry` that
the `TreeEntry*` from `getEntryPtr()` points to. It really comes down to this line:

```
auto entry = currentDirectory->getEntryPtr(piece.basename());
```

because we cannot guarantee that `currentDirectory` will live past the end of
`getEntryForFile()`, so we cannot guarantee that the return value of
`currentDirectory->getEntryPtr()` will, either.
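
The ownership problem can be reproduced in miniature. In this hypothetical sketch, `FakeObjectStore::getTree()` hands back a fresh copy on every call, so a local variable is the tree's only owner. Returning a pointer into that tree would dangle. Returning a copy of the entry (here through std::optional) stays valid. The names and types are illustrative, not Eden's.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types.
struct TreeEntry {
  std::string name;
  std::string hash;
};

struct Tree {
  std::vector<TreeEntry> entries;
  const TreeEntry* getEntryPtr(const std::string& name) const {
    for (const auto& entry : entries) {
      if (entry.name == name) return &entry;
    }
    return nullptr;
  }
};

// Returns a new copy of the stored Tree on every call.
class FakeObjectStore {
 public:
  void putTree(const std::string& id, Tree tree) { trees_[id] = std::move(tree); }
  std::unique_ptr<Tree> getTree(const std::string& id) const {
    auto it = trees_.find(id);
    return it == trees_.end() ? nullptr : std::make_unique<Tree>(it->second);
  }

 private:
  std::map<std::string, Tree> trees_;
};

// Returning `const TreeEntry*` here would dangle once `tree` is destroyed at
// the end of this function, so the entry is copied out instead.
std::optional<TreeEntry> lookupEntry(const FakeObjectStore& store,
                                     const std::string& treeId,
                                     const std::string& name) {
  std::unique_ptr<Tree> tree = store.getTree(treeId);
  if (!tree) return std::nullopt;
  const TreeEntry* entry = tree->getEntryPtr(name);
  if (!entry) return std::nullopt;
  return *entry;  // copy made while `tree` is still alive
}
```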

Special thanks to meyering and yfeldblum for helping me debug this.

Reviewed By: simpkins

Differential Revision: D4024627

fbshipit-source-id: 6295e6f2b1d2f544271b2aebad27a4ad3ae04563
2016-10-14 17:53:18 -07:00
Michael Bolin
670b69cc6b Introduce getEntryForFile() utility function.
Summary:
Utility function that given a `Tree` and a `RelativePathPiece`, returns the
corresponding `TreeEntry` in the `ObjectStore`, if it exists.

Reviewed By: wez

Differential Revision: D3980261

fbshipit-source-id: 2808a4ca45be84e3a6bb91b0cf2db19a3bf88798
2016-10-14 10:41:29 -07:00
Michael Bolin
04932226b7 Make facebook::eden::Hash hashable.
Summary: This is necessary so that it can be used as the key in an `unordered_map`.
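
One common way to meet that requirement is a std::hash specialization. In this sketch `Hash` is reduced to a hypothetical 20-byte struct. Using the leading bytes of the digest as the hash value is an assumption: it relies on cryptographic digests already being well mixed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>

// Hypothetical minimal Hash: 20 raw digest bytes.
struct Hash {
  std::array<uint8_t, 20> bytes{};
  bool operator==(const Hash& other) const { return bytes == other.bytes; }
};

namespace std {
template <>
struct hash<Hash> {
  std::size_t operator()(const Hash& h) const noexcept {
    static_assert(sizeof(std::size_t) <= 20, "digest must cover a size_t");
    // Digest bytes are already well mixed, so reuse the leading ones.
    std::size_t result;
    std::memcpy(&result, h.bytes.data(), sizeof(result));
    return result;
  }
};
}  // namespace std
```

Keys also need operator== (defined above), since unordered_map uses it to resolve collisions.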

Reviewed By: simpkins

Differential Revision: D3980575

fbshipit-source-id: d225a98f957f9aae2f2f50a6cc365011d953c92e
2016-10-12 15:53:30 -07:00
Michael Bolin
ce7d1cdd3b Add a test for Tree.getEntryPtr().
Summary:
Apparently we did not have an existing unit test for `Tree`, so this adds one.
The other methods should be tested, as well, but I'm about to use `getEntryPtr()`
elsewhere, which is why I just focused on this one for the moment.

Reviewed By: simpkins

Differential Revision: D3980150

fbshipit-source-id: 33456fd621a1894606605af4fee06ba42d124752
2016-10-06 22:20:35 -07:00
Wez Furlong
8b41b90108 sample the snapshot id in the journal at mount time
Summary:
This just populates the initial snapshot hash in the journal.

The `addDelta` method will propagate this into subsequent deltas if the delta
to be added still has the default, zero-filled hash values (that is, if its
hash values have not been set).
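
Here is a hypothetical sketch of that propagation rule. The `JournalDelta` field names `fromHash` and `toHash` and the `Journal` interface are assumptions. A zero-filled hash is treated as "not set".

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Hash {
  std::array<uint8_t, 20> bytes{};  // default: all zero, meaning "not set"
  bool isZero() const {
    return std::all_of(bytes.begin(), bytes.end(), [](uint8_t b) { return b == 0; });
  }
  bool operator==(const Hash& other) const { return bytes == other.bytes; }
};

// Hypothetical delta record.
struct JournalDelta {
  Hash fromHash;  // snapshot before the change
  Hash toHash;    // snapshot after the change
  std::vector<std::string> changedFiles;
};

class Journal {
 public:
  void addDelta(JournalDelta delta) {
    if (!deltas_.empty()) {
      // Carry the most recent snapshot hash into any unset fields.
      const Hash latest = deltas_.back().toHash;
      if (delta.fromHash.isZero()) delta.fromHash = latest;
      if (delta.toHash.isZero()) delta.toHash = latest;
    }
    deltas_.push_back(std::move(delta));
  }

  const JournalDelta& latest() const { return deltas_.back(); }

 private:
  std::vector<JournalDelta> deltas_;
};
```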

Reviewed By: simpkins

Differential Revision: D3872936

fbshipit-source-id: d0014ded40488a2be04d5a381e1d9815c7f0a638
2016-09-26 13:52:25 -07:00
Andrew Gallagher
a0ad9681a2 codemod: add explicit headers parameter for C/C++ rules under
Summary:
This codemods `TARGETS` under `[a-d]*` directories in fbcode to make
the `headers` parameter explicitly refer to `AutoHeaders.RECURSIVE_GLOB`.

Reviewed By: yfeldblum

Differential Revision: D3801845

fbshipit-source-id: 715c753b6d4ca3a9779db1ff0a0e6632c56c0655
2016-09-01 10:26:38 -07:00
Caren Thomas
adc13d4ed6 make put and get for trees/blobs symmetric
Summary: This change updates LocalStore to perform serialization of trees and blobs internally so that its users don't need to be aware of the internal serialization format. Previously, the get and put APIs were asymmetric such that the get APIs returned deserialized Tree and Blob objects, while put required raw serialized bytes. After this change, put will also use deserialized Tree and Blob objects.

Reviewed By: simpkins

Differential Revision: D3589899

fbshipit-source-id: 2e572e6ec5af44d66206b178a03f7a9d619b2290
2016-07-25 12:34:25 -07:00
Wez Furlong
af0c18bd0d eden: ensure that TreeEntries are imported in sorted order
Summary:
Mercurial maintains its manifest in sorted order, but since the manifest only tracks file names we can end up with the following sequence:

```
some/path-foo/bar
some/path/bar
```

This is because the `-` sorts ahead of the `/`.

This diff defers passing the entries to the tree serializer, buffering them up
into a temporary vector and using `std::lower_bound` to find the appropriate
insertion point.
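
The buffering step can be sketched with std::lower_bound. This assumes entries are ordered by plain byte-wise name comparison, so `path` sorts before `path-foo`. `PendingEntry` and `insertSorted()` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an entry collected while importing a manifest.
struct PendingEntry {
  std::string name;  // a single path component, e.g. "path" or "path-foo"
};

// Keep `entries` sorted by name regardless of the order the manifest used.
void insertSorted(std::vector<PendingEntry>& entries, PendingEntry entry) {
  auto pos = std::lower_bound(
      entries.begin(), entries.end(), entry.name,
      [](const PendingEntry& e, const std::string& name) { return e.name < name; });
  entries.insert(pos, std::move(entry));
}
```

For `some/path-foo/bar` followed by `some/path/bar`, the directory `some` receives `path-foo` first and `path` second, yet the vector ends up as `path`, `path-foo`.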

Reviewed By: bolinfest

Differential Revision: D3529329

fbshipit-source-id: 395ed16a20c14d17717ec69192a38f0407b51e1d
2016-07-07 13:37:49 -07:00
Wez Furlong
798f4bda58 eden: introduce Tree::getEntryPtr(PathComponent)
Summary:
This eliminates a linear scan from TreeInode and replaces it with a
binary search, exploiting the sorted order of the entries vector.

Two new methods are introduced: getEntryPtr(), which returns a pointer to the
entry with the matching name, and getEntryAt(), which returns a reference
(throwing a range error if there is no such entry).
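
A minimal sketch of the two lookups follows, assuming the entries vector is already sorted by name. A plain std::string stands in for PathComponent. Throwing std::out_of_range is an assumption about which range error is meant.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct TreeEntry {
  std::string name;
};

class Tree {
 public:
  // `entries` must already be sorted by name.
  explicit Tree(std::vector<TreeEntry> entries) : entries_(std::move(entries)) {}

  // Binary search rather than a linear scan; nullptr if there is no match.
  const TreeEntry* getEntryPtr(const std::string& name) const {
    auto it = std::lower_bound(
        entries_.begin(), entries_.end(), name,
        [](const TreeEntry& e, const std::string& n) { return e.name < n; });
    if (it == entries_.end() || it->name != name) return nullptr;
    return &*it;
  }

  // Same lookup, but throws instead of returning nullptr.
  const TreeEntry& getEntryAt(const std::string& name) const {
    const TreeEntry* entry = getEntryPtr(name);
    if (entry == nullptr) throw std::out_of_range("no tree entry named " + name);
    return *entry;
  }

 private:
  std::vector<TreeEntry> entries_;
};
```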

I wanted to use the PathMap class here, but that would cause us to duplicate
the name string as both the key and value in the map.

Reviewed By: bolinfest

Differential Revision: D3515723

fbshipit-source-id: 4ee0371f3ec08cbcf110cf28f5c1e1529b120fb6
2016-07-05 17:42:14 -07:00
Adam Simpkins
c51e282dfb import git objects on demand
Summary:
This moves git import logic from the GitImporter class to GitBackingStore.
The logic is simpler now, since GitBackingStore only needs to import a single
Tree or Blob at a time.

Reviewed By: bolinfest

Differential Revision: D3448752

fbshipit-source-id: da2d59f953ada714d8512545ae83dd48e5d3e410
2016-06-20 11:45:09 -07:00
Yedidya Feldblum
d950fdeaed Wrappers for some of OpenSSL's crypto hash functions
Summary:
[Folly] Wrappers for some of OpenSSL's crypto hash functions.

Wraps some of the OpenSSL crypto hash functions with variants that take `ByteRange` for input and `MutableByteRange` for output, and also variants that take `const IOBuf&` for input as well.

These are a bit nicer to use than passing pointers and lengths separately.

Reviewed By: ivmaykov

Differential Revision: D3434562

fbshipit-source-id: 3688ef11680a029b7664ac417a7781e70f9c6926
2016-06-16 18:30:50 -07:00
Adam Simpkins
183b6f208e add some debug logging in ObjectStore.cpp
Summary:
Add some verbose logging about when trees and blobs are loaded in the object
store.

Reviewed By: bolinfest

Differential Revision: D3434182

fbshipit-source-id: 3e8d2617290604f119e6164d15d63324a4c9a2aa
2016-06-15 14:24:12 -07:00
Adam Simpkins
fab40060f1 unbreak gcc-4.9 builds
Summary:
D3406773 included a change which compiles on clang and gcc-5.x, but fails to
build with gcc-4.9.

This looks like a bug in gcc-4.9's handling of list initialization.  Overload
resolution for non-initializer-list constructors should be attempted if
no suitable initializer-list constructors are found, but gcc-4.9 does not
appear to do this.
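
The language rule in question can be illustrated with a hypothetical class that has both kinds of constructor. In direct-list-initialization, initializer-list constructors are tried first. Only if none is viable are the other constructors considered. So `Widget{3}` should select `Widget(int)`, because an int does not convert to std::string. This only illustrates the rule. It is not the D3406773 code and is not claimed to reproduce the gcc-4.9 failure.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical class with an initializer-list and a non-initializer-list constructor.
struct Widget {
  enum class Ctor { InitList, Count };
  Ctor chosen;

  Widget(std::initializer_list<std::string>) : chosen(Ctor::InitList) {}
  explicit Widget(int) : chosen(Ctor::Count) {}
};
```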

Reviewed By: bolinfest

Differential Revision: D3410142

fbshipit-source-id: f34125000eb3fa949c2427aa4ffbd4ef92942cd7
2016-06-09 22:15:05 -07:00
Adam Simpkins
e7a8605e0d update deserializeGitBlob() to accept an IOBuf
Summary:
Update deserializeGitBlob() to work on an IOBuf, rather than an rvalue
reference to a string.

The ugliness of having to wrap a std::string in a managed IOBuf is now
hidden inside the StoreResult class, rather than being something that the
GitBlob code has to know about.

Reviewed By: bolinfest

Differential Revision: D3403977

fbshipit-source-id: 0c58c019557050d6e201c1a462fa051c2526674a
2016-06-08 19:01:14 -07:00
Adam Simpkins
4147c7b937 make Hash objects assignable, and add a default constructor
Summary:
Previously Hash objects could not be assigned to after they were created, since
they contained a const member.  This makes the data non-const, so a Hash
variable can be replaced to contain new contents after it is created.

This also adds a default constructor, which zero-initializes the hash.  The
default constructor makes it possible to declare a Hash with a 0-value at one
location, and then set it to the desired value at some later point.
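
A reduced sketch of the resulting class follows, with hypothetical names. Because the digest member is not const, the compiler-generated copy assignment operator is available. The default constructor zero-fills the bytes. `getBytes()` returns the std::array here only to keep the example dependency-free.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

class Hash {
 public:
  static constexpr std::size_t RAW_SIZE = 20;
  using Storage = std::array<uint8_t, RAW_SIZE>;

  // Default constructor: all bytes zero.
  Hash() : bytes_{} {}
  explicit Hash(const Storage& bytes) : bytes_(bytes) {}

  const Storage& getBytes() const { return bytes_; }
  bool operator==(const Hash& other) const { return bytes_ == other.bytes_; }

 private:
  Storage bytes_;  // declaring this `const` would delete operator=
};
```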

Reviewed By: bolinfest

Differential Revision: D3406773

fbshipit-source-id: 41e2c7e3ad5bc4d14813be4adaa03866701380f6
2016-06-08 16:16:59 -07:00
Adam Simpkins
947dc27e3e use std::array when possible
Summary:
Update several places to use std::array rather than plain C arrays, using
folly::make_array() to automatically deduce the correct type when necessary.
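
For readers unfamiliar with folly::make_array(), the idea can be sketched with a hypothetical `makeArray()`. It deduces both the element type (the common type of the arguments) and the length. C++17 class template argument deduction covers the simplest case, `std::array a{1, 2, 3};`, but only when all elements have the same type.

```cpp
#include <array>
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical helper: element type is the common type of the arguments.
template <typename... Ts>
constexpr auto makeArray(Ts&&... values) {
  using T = std::common_type_t<std::decay_t<Ts>...>;
  // The explicit cast avoids narrowing errors inside the braced initializer.
  return std::array<T, sizeof...(Ts)>{{static_cast<T>(std::forward<Ts>(values))...}};
}
```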

Reviewed By: wez

Differential Revision: D3370445

fbshipit-source-id: b7642cf3a9b08eac817988bf95679bf5e584ef72
2016-06-08 00:15:22 -07:00
Adam Simpkins
5b65743a38 update deserializeGitTree() to work with IOBuf
Summary:
Update deserializeGitTree() to accept an IOBuf object.  IOBuf objects can
easily wrap other buffers, so this can still easily support ByteRange objects
as well.

Being able to use IOBuf's Cursor class ended up simplifying the logic a bit as
well.

Note that using IOBuf does require copying the name and mode data out of the
buffer when we read it (using the readTerminatedString() API).  This is
necessary since the data may not be stored contiguously in the IOBuf.  However,
this shouldn't impact performance much: we already need to copy the name data
into a std::string anyway.  For the mode, most modern platforms can avoid doing
a heap allocation for this small string.
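
The records being parsed follow git's tree format: each entry is `<octal mode> <name>\0<20-byte binary hash>`. The sketch below parses that layout from a std::string_view rather than an IOBuf cursor. `readTerminated()` is a hypothetical stand-in for readTerminatedString(). As described above, it copies the mode and name into std::string.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct GitTreeEntry {
  std::string mode;  // ASCII octal, e.g. "100644" or "40000"
  std::string name;
  std::array<uint8_t, 20> hash{};
};

// Copy bytes up to `terminator`, then advance `data` past the terminator.
std::string readTerminated(std::string_view& data, char terminator) {
  std::size_t end = data.find(terminator);
  if (end == std::string_view::npos) throw std::runtime_error("truncated git tree");
  std::string result(data.substr(0, end));
  data.remove_prefix(end + 1);
  return result;
}

// Parse the body of a tree object (everything after the "tree <size>\0" header).
std::vector<GitTreeEntry> parseGitTreeBody(std::string_view data) {
  std::vector<GitTreeEntry> entries;
  while (!data.empty()) {
    GitTreeEntry entry;
    entry.mode = readTerminated(data, ' ');
    entry.name = readTerminated(data, '\0');
    if (data.size() < entry.hash.size()) throw std::runtime_error("truncated hash");
    for (std::size_t i = 0; i < entry.hash.size(); ++i) {
      entry.hash[i] = static_cast<uint8_t>(data[i]);
    }
    data.remove_prefix(entry.hash.size());
    entries.push_back(std::move(entry));
  }
  return entries;
}
```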

Reviewed By: bolinfest

Differential Revision: D3357255

fbshipit-source-id: 5b6e1bc93199849327409a8039266d7dc4f3afdf
2016-06-08 00:15:22 -07:00
Adam Simpkins
d414ee1021 add logic for serializing git trees
Summary: Add a GitTreeSerializer class for serializing git tree data.
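
For reference, a git tree object is the header `tree <decimal body length>\0` followed by entry records of the form `<octal mode> <name>\0<20-byte hash>`. The hypothetical serializer below builds exactly that byte string. Its interface is an assumption, and callers are assumed to add entries in git's sort order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

struct GitTreeEntry {
  std::string mode;  // ASCII octal, e.g. "100644" for a file, "40000" for a tree
  std::string name;
  std::array<uint8_t, 20> hash;
};

// Hypothetical serializer; entries must already be in git's order.
class GitTreeSerializer {
 public:
  void addEntry(const GitTreeEntry& entry) {
    body_ += entry.mode;
    body_ += ' ';
    body_ += entry.name;
    body_ += '\0';
    body_.append(reinterpret_cast<const char*>(entry.hash.data()), entry.hash.size());
  }

  // Full object bytes: header, then all entry records.
  std::string finish() const {
    return "tree " + std::to_string(body_.size()) + '\0' + body_;
  }

 private:
  std::string body_;
};
```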

Reviewed By: bolinfest

Differential Revision: D3356770

fbshipit-source-id: d04bc9788117272504c2faa335b3648e4ac93e81
2016-06-08 00:15:21 -07:00
Adam Simpkins
c769088f16 add Hash::sha1() factory functions
Summary:
Add some static helper functions to create Hash objects by running a SHA1 hash
on input data.

Reviewed By: wez, bolinfest

Differential Revision: D3354594

fbshipit-source-id: 6d6bfb835175e7a25c1e6e2539438bee5887a863
2016-05-27 16:36:14 -07:00
Adam Simpkins
106717e4e7 update Hash::getBytes() to return a folly::ByteRange
Summary:
Change Hash::getBytes() to return a folly::ByteRange rather than a
std::array<uint8_t, 20>.  This makes Hash more convenient to use with existing
APIs that accept a ByteRange.  (For instance, IOBuf.)

There were only 2 call sites using the existing getBytes() functionality,
and they only used the data() method on the returned std::array, so they don't
have to be updated at all to use a ByteRange.

Reviewed By: bolinfest

Differential Revision: D3354581

fbshipit-source-id: 8f2a3c196e59620fb5b0fb2caf4d1d7f26e1d2c4
2016-05-27 16:36:14 -07:00
Facebook Github Bot 8
83f42a9fa6 Include build files that were inadvertently excluded from the initial export.
fbshipit-source-id: 2c76f0d5e55d84859ad9f4841cbe6994a62446f8
2016-05-12 16:08:34 -07:00
Facebook Github Bot 5
2eeea32117 Initial commit
fbshipit-source-id: 2bcefbd0cd127cc5ea982e074ea6819d7aac3d7a
2016-05-12 14:09:13 -07:00