162 Commits

Author SHA1 Message Date
Xavier Deguillard
3307c3ac79 inodes: no longer follow symlinks for gitignore
Summary:
Starting with Git 2.32
(https://github.com/git/git/blob/master/Documentation/RelNotes/2.32.0.txt#L7),
gitignore symlinks are no longer followed, and auditing the entire codebase
shows that this feature is not used (or easily replaced).

Removing support for symlinked gitignore makes us closer in behavior to Git,
but also allows the code to be simplified a bit. When diffing trees, inodes
were being loaded in order to read the gitignore files, which had a few
drawbacks:
 - Loading inodes causes slower updates on Windows and macOS,
 - The behavior was likely incorrect as the content of the gitignore read came
   from the working copy state, not from the trees diffed.
Since handling of symlinks was the major benefit of loading inodes and this
behavior is no longer needed, the store code can simply load the correct
gitignore from the ObjectStore, eliminating the drawbacks listed above.
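A minimal sketch of the simplified lookup described above, assuming C++17. `Entry`, `Tree`, `BlobStore`, and `loadGitIgnore` are hypothetical stand-ins for illustration, not the EdenFS API. The `.gitignore` is read from the tree being diffed, and a symlink entry is ignored rather than followed:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; not the real EdenFS types.
enum class EntryType { Regular, Symlink, Tree };

struct Entry {
  EntryType type;
  std::string objectId; // key into the blob store
};

using Tree = std::map<std::string, Entry>;            // name -> entry
using BlobStore = std::map<std::string, std::string>; // object id -> contents

// Returns the .gitignore contents recorded in `tree`, or nullopt if the
// tree has no regular-file .gitignore. Symlinks are not followed.
std::optional<std::string> loadGitIgnore(
    const Tree& tree,
    const BlobStore& store) {
  auto it = tree.find(".gitignore");
  if (it == tree.end() || it->second.type != EntryType::Regular) {
    return std::nullopt;
  }
  auto blob = store.find(it->second.objectId);
  if (blob == store.end()) {
    return std::nullopt;
  }
  return blob->second;
}
```

Because the lookup only touches the tree and the store, no inode has to be loaded, and the contents come from the tree being diffed rather than from the working copy.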

Reviewed By: chadaustin

Differential Revision: D37355747

fbshipit-source-id: 569478a958ff01cf9fbef01508008d8b29e0c056
2022-07-13 19:30:34 -07:00
Katie Mancini
4898fa44e5 fix error reporting
Summary: Errors were being bubbled up incorrectly. Fix this by changing thenValue to thenTry.

Reviewed By: chadaustin

Differential Revision: D37676816

fbshipit-source-id: b8722aa311829bfa3ba785bd1e55c458e03addba
2022-07-11 14:03:34 -07:00
Katie Mancini
7a6eea67ab get attributes for InodeOr
Summary:
We want to be able to expose attribute information for all the entries in a
directory with the readdir endpoint. This will allow Buck2 to access attributes
it needs while listing directories (filetype) and experiment with fetching all
the data it ever needs at once.

Now that we have an InodeOrTreeOrEntry for each child, we can retrieve attribute
information from it for each child.

Note: this two-phase collection of data looks safe with respect to
renames/deletions/additions. The InodeOrTreeOrEntry objects have sufficient
information to look up attributes directly, so we should not end up in a
situation where we get an entry but then cannot look up attributes for it. The
children we list will be from a consistent state of the directory, but the data
may be a bit stale if there are concurrent modifications to the inodes (again
renames, removals, additions).

Reviewed By: xavierd

Differential Revision: D37444846

fbshipit-source-id: 3c7f33bbec60fb83e4a46403dce715a188ab47f3
2022-07-11 14:03:34 -07:00
Chad Austin
b26418ef01 migrate to throw_ and throwf
Summary:
I've always been annoyed at needing to include folly/Conv.h or fmt to
throw a formatted exception. And while exceptions are rarely in the
hot path, it's still worth having the option to optimize the
formatting and exception allocation step at some point in the future.

I'm sure I missed some, but this diff migrates most of our
`throw T{fmt::format(...)}` patterns to `throwf` and
`throw T{folly::to<std::string>(...)}` to `throw_`.
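A rough sketch of what a `throw_`-style helper could look like, assuming C++17. This is an illustration built on `std::ostringstream`, not the actual EdenFS implementation; `parsePort` and `portError` are made-up call sites. A `throwf` counterpart would take a format string and depends on fmt, so it is left out here:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper modeled on the commit's description of `throw_`:
// stream every argument into one message, then throw E with that message.
template <typename E, typename... Args>
[[noreturn]] void throw_(Args&&... args) {
  std::ostringstream message;
  (message << ... << std::forward<Args>(args));
  throw E{message.str()};
}

// Example call site, replacing
//   throw std::invalid_argument{folly::to<std::string>(...)}
int parsePort(int value) {
  if (value < 0 || value > 65535) {
    throw_<std::invalid_argument>("port out of range: ", value);
  }
  return value;
}

// Returns the message of the exception parsePort throws, or "" if none.
std::string portError(int value) {
  try {
    parsePort(value);
  } catch (const std::invalid_argument& e) {
    return e.what();
  }
  return "";
}
```

Keeping the formatting behind one function is what leaves room to optimize the formatting and allocation step later without touching call sites.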

Reviewed By: genevievehelsel

Differential Revision: D37229822

fbshipit-source-id: c053b5cdaed99e58c485afd2a99be98828f07657
2022-07-08 13:30:53 -07:00
Katie Mancini
d708ea4229 add file type to attributes endpoint
Summary:
Directory listing is slower on NFS than FUSE because readdir returns basically
no info (names of children only). That means most of the time clients must
lookup and getattr each entry to actually learn anything about them.

Or we turn on readdirplus, which is currently logically equivalent.

Buck2 relies on directory listing, and this means Buck2 is less performant on
NFS than FUSE.

From prototyping, Thrift is significantly faster, and it allows us to avoid
loading inodes :). So checkout afterwards will also be much faster.

However, Buck2 will need filetypes exposed to be able to use Thrift, so we are
adding filetype to the attributes API.

We could expose mode bits directly, but Buck2 only needs the filetype
(regular/symlink/executable) currently.

Reviewed By: fanzeyi

Differential Revision: D35835655

fbshipit-source-id: a38f024edf74496aff0639bf509256700e21f308
2022-06-01 12:06:35 -07:00
Xavier Deguillard
db078643a3 model: use PathMap in Tree
Summary:
One of the unchecked and undocumented assumptions of Tree is that its entries
must be sorted the same way as in a TreeInode. Unfortunately, there is no
guarantee that the TreeEntry in a Tree are sorted, and the only place where this
guarantee is upheld... is in the testharness test library.

To solve this, the best way is to use the same underlying data structure in both
TreeInode and Tree: a PathMap. This is guaranteed to be sorted, and it properly
takes into account case sensitivity.
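To illustrate the idea of a single sorted, case-aware container, here is a small sketch using `std::map` with a custom comparator. It is a stand-in only: the real PathMap is a different data structure, and `NameLess`, `SortedEntries`, and `makeEntries` are hypothetical names. Case folding here is ASCII-only for brevity:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

enum class CaseSensitivity { Sensitive, Insensitive };

// Orders names bytewise, or ignoring ASCII case when Insensitive.
struct NameLess {
  CaseSensitivity mode;

  bool operator()(const std::string& a, const std::string& b) const {
    if (mode == CaseSensitivity::Sensitive) {
      return a < b;
    }
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
          return std::tolower(x) < std::tolower(y);
        });
  }
};

// Hypothetical sorted container that both a "tree" and a "tree inode"
// could share, so their iteration order always agrees.
using SortedEntries = std::map<std::string, int, NameLess>;

SortedEntries makeEntries(CaseSensitivity mode) {
  return SortedEntries(NameLess{mode});
}
```

With the comparator owned by the container, ordering and lookup follow the same rule, so a lookup for `a` finds `A` on a case-insensitive mount and misses on a case-sensitive one.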

Reviewed By: chadaustin

Differential Revision: D36257252

fbshipit-source-id: 0fadd3c84d8e95631a8293c6797544ea94046a10
2022-05-19 18:57:33 -07:00
Xavier Deguillard
1cff43c9ec store: make the ObjectStore aware of case sensitivity
Summary:
One of the poorly documented and already broken assumptions in EdenFS is that
the ordering of TreeEntry in a Tree and the ordering of DirEntry in a TreeInode
are the same. Both the diff code and the checkout code rely on them being the
same to decide whether entries are the same, added, or removed.
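The dependence on ordering can be seen in a simplified lockstep walk over two name lists. This is an illustrative sketch, not the EdenFS diff code; `Diff` and `diffSorted` are hypothetical names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Diff {
  std::vector<std::string> added;
  std::vector<std::string> removed;
  std::vector<std::string> common;
};

// Walks two name lists in lockstep. The result is only correct if both
// lists are sorted with the same ordering; otherwise entries are
// misreported as added or removed.
Diff diffSorted(
    const std::vector<std::string>& oldNames,
    const std::vector<std::string>& newNames) {
  Diff out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < oldNames.size() && j < newNames.size()) {
    if (oldNames[i] == newNames[j]) {
      out.common.push_back(oldNames[i]);
      ++i;
      ++j;
    } else if (oldNames[i] < newNames[j]) {
      out.removed.push_back(oldNames[i++]);
    } else {
      out.added.push_back(newNames[j++]);
    }
  }
  for (; i < oldNames.size(); ++i) {
    out.removed.push_back(oldNames[i]);
  }
  for (; j < newNames.size(); ++j) {
    out.added.push_back(newNames[j]);
  }
  return out;
}
```

Feeding it `{"b", "a"}` against `{"a", "b"}` reports `a` as both added and removed even though the two sides hold the same names, which is the kind of failure an ordering mismatch produces.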

The issue is that the ordering in a Tree is first not guaranteed, second never
checked and third fully dependent on the BackingStore, which may or may not
respect this property. Today, we're somewhat lucky that the ordering is more or
less the same. It however breaks down on case insensitive mounts, where
insertion of an entry into a TreeInode heavily depends on the order and the
case!

Fixing this is unfortunately not as simple as just the insertion order, as this
breaks the ordering assumption of TreeEntry, and thus both checkout and diff
fail spectacularly. The proper fix is to ensure that TreeEntry are stored in a
PathMap, and then the PathMap can be fixed. A first step towards this goal is
to make sure Trees are case aware and thus need to be constructed with a
CaseSensitivity argument. For now, this is only used in getEntryPtr, but in the
future, it'll be used to construct the PathMap.

One downside of adding an argument to the Tree constructor is that all call
sites need updating. For now, I've made the choice of not doing this, and
instead hacking it up in the entry point for the backingstore: the ObjectStore.
This is extremely ugly, but is very pragmatic: on Windows and macOS, conversion
of case sensitivity will almost never happen.

In subsequent diffs, the BackingStores, LocalStore and TreeCache will be
converted to build Tree with the right case from the beginning.

Reviewed By: chadaustin

Differential Revision: D36257251

fbshipit-source-id: 17590f9a9f3608fa3a3dbef39b10eefbece4c745
2022-05-19 18:57:33 -07:00
Xavier Deguillard
26de666c16 model: remove getName from TreeEntry
Summary:
Now that the name is returned from Tree::find and stored alongside the
TreeEntry, we no longer need the TreeEntry type to hold a PathComponent. Let's
thus get rid of it.

Reviewed By: genevievehelsel

Differential Revision: D36215508

fbshipit-source-id: 133d1eb640944c673fff870f1a5ef821313c288e
2022-05-11 20:24:35 -07:00
Xavier Deguillard
c4338e208b model: move Tree closer to the PathMap API
Summary:
With Tree bound to be backed by PathMap in the near future (see previous diff
for why), we need to unify their API to ease this transition.

One drawback of this diff is that we now store the PathComponent twice: once in
the std::vector, and once in the TreeEntry. In a future diff, the PathComponent
from TreeEntry will be removed, solving this issue.

Reviewed By: chadaustin

Differential Revision: D36215509

fbshipit-source-id: 12390c825ef706f13481d832f255a5a0d8770717
2022-05-11 20:24:35 -07:00
Xavier Deguillard
114afbad3a model: remove an unnecessary std::move in Tree::tryDeserialize
Summary:
The caller of this function wants to return a std::unique_ptr, thus instead of
using an optional to denote failure to deserialize, we can use a nullptr to
encode the same information. This prevents the caller from moving the Tree into
a unique_ptr if the deserialization succeeded.
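A minimal sketch of the pattern, using a made-up `Tree` stand-in and a made-up `tree:` format marker (neither reflects the real serialization):

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the real Tree type.
struct Tree {
  std::string payload;
};

// Returns nullptr when `data` is not in the expected format, so the
// caller can pass the pointer on directly instead of moving a Tree out
// of an optional into a unique_ptr.
std::unique_ptr<Tree> tryDeserialize(const std::string& data) {
  const std::string magic = "tree:"; // made-up format marker
  if (data.compare(0, magic.size(), magic) != 0) {
    return nullptr;
  }
  return std::make_unique<Tree>(Tree{data.substr(magic.size())});
}
```

A caller that itself returns `std::unique_ptr<Tree>` can simply check the result for null and return it as is.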

Reviewed By: genevievehelsel

Differential Revision: D36179488

fbshipit-source-id: 735fd6480faa8a520e538d065c68996cc3bb7615
2022-05-05 17:46:04 -07:00
Xavier Deguillard
a6a8e73b5d model: remove Tree::getEntryAt
Summary: This is only used in test, we can use getEntryPtr instead there.

Reviewed By: genevievehelsel

Differential Revision: D36144078

fbshipit-source-id: 3bb6a2d6df1904b154b4fe60312ea2e2cd33554e
2022-05-05 17:46:04 -07:00
Xavier Deguillard
c00eadbed8 model: remove Tree::getEntryNames
Summary: This is only used in a single test, no need to have this in the Tree class.

Reviewed By: genevievehelsel

Differential Revision: D36140131

fbshipit-source-id: 4d4a4641353129e7f239f6c5db08d54dae317e6f
2022-05-05 17:46:04 -07:00
Xavier Deguillard
aabecc7937 model: remove GitTreeSerializer
Summary:
We can serialize trees in a different format, and since the deserializer knows
how to properly detect whether a tree is in the git or custom format, we can
simply get rid of the GitTree serializer.

Reviewed By: genevievehelsel

Differential Revision: D36140129

fbshipit-source-id: b549e39aa10bf94be61ab88e151ed300a52d5269
2022-05-05 13:17:16 -07:00
Xavier Deguillard
e94cc122b2 model: remove Tree::getEntryAt(size_t)
Summary:
This is only used in tests and would prevent switching the Tree type to use
PathMap internally. Let's thus remove the method.

Reviewed By: chadaustin

Differential Revision: D36119092

fbshipit-source-id: eedc70c916bfadc16f0233a7ef430e2b6af83969
2022-05-05 08:45:09 -07:00
Chad Austin
1b34c8026e use BackingStore to translate object IDs to human readable representations
Summary:
The BackingStore is responsible for converting between ObjectId and
human readable representations. Add an ObjectIdCodec just like
RootIdCodec to perform that conversion, and migrate our Thrift
handlers to use it.
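As a rough illustration of the codec idea, the sketch below defines a small interface and one implementation that renders IDs as lowercase hex. The interface shape and the names `render`, `parse`, and `HexCodec` are assumptions for this example, not the actual EdenFS declarations:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical interface; names are illustrative, not the EdenFS API.
class ObjectIdCodec {
 public:
  virtual ~ObjectIdCodec() = default;
  // Raw id bytes -> text shown to humans.
  virtual std::string render(const std::string& raw) const = 0;
  // Text -> raw id bytes; throws std::invalid_argument on bad input.
  virtual std::string parse(const std::string& text) const = 0;
};

// Example store whose ids are shown as lowercase hex.
class HexCodec : public ObjectIdCodec {
 public:
  std::string render(const std::string& raw) const override {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : raw) {
      out += digits[c >> 4];
      out += digits[c & 0x0F];
    }
    return out;
  }

  std::string parse(const std::string& text) const override {
    if (text.size() % 2 != 0) {
      throw std::invalid_argument("odd-length hex id");
    }
    std::string out;
    for (std::size_t i = 0; i < text.size(); i += 2) {
      out += static_cast<char>(nibble(text[i]) * 16 + nibble(text[i + 1]));
    }
    return out;
  }

 private:
  static int nibble(char c) {
    if (c >= '0' && c <= '9') {
      return c - '0';
    }
    if (c >= 'a' && c <= 'f') {
      return c - 'a' + 10;
    }
    throw std::invalid_argument("invalid hex digit");
  }
};
```

Handlers that only hold an `ObjectIdCodec` reference never need to know how a particular store formats its IDs.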

Reviewed By: kmancini

Differential Revision: D33202497

fbshipit-source-id: cc499ec5e53c7626eb27528538c6e9d94404b468
2022-03-03 17:17:25 -08:00
Chad Austin
b1e51686ba fix some laziness in the include structure
Summary:
When we migrated from Hash20 to ObjectId, we didn't fix the #include structure.
Clean that up.

Reviewed By: genevievehelsel

Differential Revision: D32977635

fbshipit-source-id: 202b02f01f22bc174c7559c22af081deb2945caa
2022-03-03 12:11:31 -08:00
Xavier Deguillard
a29d465ee8 fs: fix license header
Summary:
With Facebook having been renamed Meta Platforms, we need to change the license
headers.

Reviewed By: fanzeyi

Differential Revision: D33407812

fbshipit-source-id: b11bfbbf13a48873f0cea75f212cc7b07a68fb2e
2022-01-04 15:00:07 -08:00
Chad Austin
ad1036bbc2 require explicit ObjectId when constructing Tree
Summary:
While we'd like to change this, Trees require an ObjectId today. Don't
provide a default, and make every ObjectId explicit.

Reviewed By: genevievehelsel

Differential Revision: D32895654

fbshipit-source-id: b45f348101269d80ac7e0dfd6f651be2a49ecc63
2021-12-08 18:54:06 -08:00
Michael Cuevas
57ca56df18 Move BlobMetadata.h from fs/store to fs/model
Summary: Used codemod to modify all references to eden/fs/store/BlobMetadata.h. Changed them to eden/fs/model/BlobMetadata.h and moved the file to the model directory

Reviewed By: genevievehelsel

Differential Revision: D32397203

fbshipit-source-id: 0c17b96d1aa33ba32fefe43ea5c29a645c577765
2021-11-17 11:31:26 -08:00
Andrey Chursin
fd1743308f introduce new format for tree serialization [proxy hash removal 8/n]
Summary:
This diff introduces new format for serializing trees in local store.

Unlike the git serialization format currently in use, the new format supports variable-length object ids.

Long term this will be replaced with using mercurial as a source of truth, but for now we need this to roll out proxy hash replacement.
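The commit does not spell out the byte layout here, so the following is only a sketch of how a length-prefixed format can carry variable-length ids, assuming C++17. The layout, `Entry`, `serializeEntries`, and `deserializeEntries` are all made up for illustration and are not the format this diff introduces:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// (object id bytes, entry name); ids may have different lengths.
using Entry = std::pair<std::string, std::string>;

// Made-up layout, per entry:
//   [1-byte id length][id][1-byte name length][name]
// Lengths above 255 are not representable; returns nullopt in that case.
std::optional<std::string> serializeEntries(const std::vector<Entry>& entries) {
  std::string out;
  for (const auto& [id, name] : entries) {
    if (id.size() > 255 || name.size() > 255) {
      return std::nullopt;
    }
    out.push_back(static_cast<char>(id.size()));
    out += id;
    out.push_back(static_cast<char>(name.size()));
    out += name;
  }
  return out;
}

// Returns nullopt if the buffer is truncated.
std::optional<std::vector<Entry>> deserializeEntries(const std::string& data) {
  std::vector<Entry> entries;
  std::size_t pos = 0;
  // Reads one length-prefixed field starting at pos.
  auto readField = [&](std::string& field) {
    if (pos >= data.size()) {
      return false;
    }
    std::size_t len = static_cast<unsigned char>(data[pos++]);
    if (data.size() - pos < len) {
      return false;
    }
    field = data.substr(pos, len);
    pos += len;
    return true;
  };
  while (pos < data.size()) {
    Entry e;
    if (!readField(e.first) || !readField(e.second)) {
      return std::nullopt;
    }
    entries.push_back(std::move(e));
  }
  return entries;
}
```

The git tree format stores a fixed 20-byte hash per entry, which is why a format with an explicit length per id is needed once ids can vary in size.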

Reviewed By: chadaustin

Differential Revision: D31998935

fbshipit-source-id: 9aacfbea631e75c0ea1421094a0b1ae255adb04a
2021-11-10 09:40:45 -08:00
Andrey Chursin
08f337f7ab embed proxy hashes into object id [proxy hash removal 7/n]
Summary:
This diff introduces config store:embed-proxy-hashes.

When this config is set, we store HgId directly into ObjectId, instead of using proxy hash object.
This allows to bypass proxy hash rocks db storage when reading files.

**Compatibility notes**

This diff is compatible with previous versions unless store:embed-proxy-hashes config is set.

Once config is set, new ObjectId format is used and serialized into inodes. Once this is done previous versions of eden fs won't be able to read overlay inodes created by this version.

This means we need to be careful with setting this config - once set we won't be able to roll back eden fs version easily, it will basically require re-creating eden checkout.

Inodes created prior to this config being set will remain written in old format, only when new inode is written is when new format is used.

**Git tree format issue**

We use git tree serialization format in the LocalStore to serialize trees.
This format assumes 20-byte hashes and is not compatible with variable length ObjectId.

In this diff we bypass this issue by not storing trees into local store. This seems ok in terms of correctness, because tree information can always be fetched from mercurial.

However, this seems to impose a performance penalty on some workloads (see below).

We can solve this by either introducing a new format that supports var length object ids (short term), or by getting rid of the tree cache and efficiently getting the data directly from mercurial (long term).

**Performance numbers**

Hot file access latency is reduced by about 31% (0.2331 ms to 0.1611 ms), and throughput rises by about 45%:
```
$ fsprobe.sh run cat.targets

Before:
lat: 0.2331 ms, qps: 4, dur: 28.697384178s, 123092 files, 217882490 bytes, 1641 errors, rate 7.59 Mb/s

After:
lat: 0.1611 ms, qps: 6, dur: 19.835917353s, 123092 files, 217882490 bytes, 1641 errors, rate 10.98 Mb/s
```

However, we do not see an improvement with arc focus, most likely due to bypassing tree serialization, so we will need to figure out that issue.

We can still merge this diff and see if enabling this feature on other workloads like sandcastle is beneficial.

Reviewed By: chadaustin

Differential Revision: D31777929

fbshipit-source-id: fc4b678477d0737c9f242968f0be99ed04f4f58a
2021-11-05 17:05:43 -07:00
Andrey Chursin
7e2bffe4ac Remove ObjectId(hex) constructor [proxy hash removal 6/n]
Summary: As discussed in previous diff, this removes constructor from hex string and adds ObjectId(fbstring) constructor

Reviewed By: chadaustin

Differential Revision: D31841234

fbshipit-source-id: c36ae315ad3a6eaecfd47889588c2bd18928aafb
2021-10-25 20:06:30 -07:00
Andrey Chursin
edc522575e Support var length hashes in tree metadata [proxy hash removal 5/n]
Summary: This diff supports variable length hashes in serialized tree metadata. See comments to serialize/deserialize functions for description of the format

Reviewed By: chadaustin

Differential Revision: D31701196

fbshipit-source-id: 24d7630d4574842e3888b56b7291e313d8e35e55
2021-10-22 17:52:02 -07:00
Andrey Chursin
c24bb6096b Make ObjectId a variable length hash [proxy hash removal 4/n]
Summary:
This diff modifies ObjectId structure to support storing more information directly in ObjectId, avoiding proxy objects like HgProxyHash and ReCasDigestProxyHash.

Storing this information directly in ObjectId allows to avoid additional db access for loading/storing those proxy hash objects, reducing file access latency in hot case.

**New ObjectId format**

ObjectId can now contain a variable length hash.

The variable length can be used in following cases:
(1) To replace ReCasDigestProxyHash, extra 8 bytes can be used to store size portion of `remote_execution::TDigest`
(2) To replace HgProxyHash, extra 1 byte can be used to identify whether ObjectId represents ProxyHashId or an HgId.

In the future, ObjectId can contain path or other information needed to implement ACL
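Case (2) could be pictured roughly as below. This is a hypothetical sketch: the tag values, the position of the tag byte, and the `ObjectId` layout are assumptions, since the commit only states that one extra byte distinguishes a proxy hash id from an HgId:

```cpp
#include <optional>
#include <string>

// Hypothetical tag values; not taken from the EdenFS source.
enum class IdKind : unsigned char { ProxyHash = 1, HgId = 2 };

struct ObjectId {
  std::string bytes; // variable length

  // Legacy ids are exactly 20 bytes with no tag.
  static ObjectId legacy(const std::string& hash20) {
    return ObjectId{hash20};
  }

  // Tagged ids are 1 tag byte followed by the 20-byte hash.
  static ObjectId tagged(IdKind kind, const std::string& hash20) {
    std::string b(1, static_cast<char>(kind));
    b += hash20;
    return ObjectId{b};
  }

  // Kind of a tagged id, or nullopt for a legacy 20-byte id.
  std::optional<IdKind> kind() const {
    if (bytes.size() != 21) {
      return std::nullopt;
    }
    return static_cast<IdKind>(static_cast<unsigned char>(bytes[0]));
  }
};
```

Since the length alone separates the old and new shapes in this sketch, 20-byte ids written by older versions stay readable.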

**Compatibility notes**

For compatibility reasons, we only currently initialize ObjectId with 20-bytes content.

This is essential to allow smooth migration to a new format
(a) Until new hash format is actually used(e.g. we switch HgBackingStore to store HgId inside ObjectId), this code is backwards compatible with other EdenFs versions. This revision can be safely rolled back and previous version of EdenFs can still read inode data written by this version
(b) When we introduce support for new ObjectId into HgBackingStore, we can gate its usage behind config, allowing controlled slow roll out of new ObjectId format.

**ToDo**

Just to track progress of removing proxy hashes, few things are still left:
* We need different format for SerializedTreeMetadata
* We need to support new format for thriftHash
* We need to actually switch HgBackingStore to embed HgId information into ObjectId, instead of using ProxyHash

Not planned:
* Migration of ReCasDigestProxyHash - I don't know how to test it, so someone else should probably do that

Reviewed By: chadaustin

Differential Revision: D31668130

fbshipit-source-id: 720127354c648651bb35e850beb8dd252a5566b2
2021-10-22 17:52:01 -07:00
Chad Austin
dc37a97e19 rename ObjectId::toString to toLogString
Summary:
ObjectId::toString was too ambiguous of a name which makes it hard to
reason about the correctness of all of its uses as we generalize the
meaning of object identifiers.

Rename toString to toLogString, and change a bunch of call sites to
use the iostream, folly::to, or fmt formatters.

For the one place that knows the format of the ID but wants its hex
representation, use asHexString. This gives us the freedom to change
the log representation of ObjectIds in the future without worrying
about breakage.

Reviewed By: andll

Differential Revision: D31776897

fbshipit-source-id: 8160ab581f260b3bb955afbad65d6dbe8486e3bc
2021-10-19 18:58:52 -07:00
Andrey Chursin
ae684f3993 explicit Hash20 instead of Hash [proxy hash removal 2/n]
Summary:
This is fairly mechanical diff that finalizes split of Hash into ObjectId and Hash20.

More specifically this diff does two things:
* Replaces `Hash` with `Hash20`
* Removes alias `using Hash = Hash20`

Reviewed By: chadaustin

Differential Revision: D31324202

fbshipit-source-id: 780b6d2a422ddf6d0f3cfc91e3e70ad10ebaa8b4
2021-10-01 12:43:26 -07:00
Andrey Chursin
0af2511a3f separate out ObjectId [proxy hash removal 1/n]
Summary:
The goal of this stack is to remove the Proxy Hash type, but to achieve that we first need to address some tech debt in the Eden codebase.

For a long time EdenFs had a single Hash type that was used for many different use cases.

One of the major uses of the Hash type is to identify internal EdenFs objects such as blobs, trees, and others.

We seem to have reached agreement that we need a different type for those identifiers, so we introduce a separate ObjectId type in this diff to denote the new identifier type and replace _some_ usage of Hash with ObjectId.

We still retain original Hash type for other use cases.

Roughly speaking, this is how this diff separates between Hash and ObjectId:

**ObjectId**:
* Everything that is stored in local store(blobs, trees, commits)

**Hash20**:
* Explicit hashes(Sha1 of the blob)
* Hg identifiers: manifest id and blob hg id

For now, in this diff ObjectId has exactly the same content as Hash, but this will change in future diffs. Doing it this way keeps the diff size manageable, while migrating to the new ObjectId right away would produce an insanely large diff that would be hard both to make and to review.

There are a few more things that need to be done before we can get to the meat of removing proxy hashes:

1) Replace include Hash.h with ObjectId.h where needed
2) Remove Hash type, explicitly rename rest of Hash usages to Hash20
3) Modify content of ObjectId to support new use cases
4) Modify serialized metadata and possibly other places that assume ObjectId size is fixed and equal to Hash20 size

Reviewed By: chadaustin

Differential Revision: D31316477

fbshipit-source-id: 0d5e4460a461bcaac6b9fd884517e129aeaf4baf
2021-10-01 10:25:46 -07:00
Chad Austin
855e94b4df add missing headers
Summary:
VC++ 2019 is pickier about which standard library includes include
each other. Be explicit.

Reviewed By: zhengchaol

Differential Revision: D31186916

fbshipit-source-id: 95cfa8848d0e2e312e2024923fa166db5f68dde0
2021-09-27 17:01:18 -07:00
Chad Austin
49e49f9fc2 replace most folly::format uses
Summary:
folly::format is deprecated in favor of fmt and std::format. Migrate
most of EdenFS to fmt instead.

Differential Revision: D31025948

fbshipit-source-id: 82ed674d5e255ac129995b56bc8b9731a5fbf82e
2021-09-20 16:23:22 -07:00
Chad Austin
7ef95f6d82 add an ObjectId type
Summary:
To eliminate the need for proxy hashes, we need variable-width object
IDs. Introduce an ObjectId type much like RootId.

Reviewed By: genevievehelsel

Differential Revision: D30819412

fbshipit-source-id: 07a185ba6b866b475c92f811e70aa00a8a9f895f
2021-09-13 17:21:01 -07:00
Chad Austin
a4ba22dc48 rename Hash to Hash20
Summary:
In preparation for expanding to variable-width hashes, rename the
existing hash type to Hash20.

Reviewed By: genevievehelsel

Differential Revision: D28967365

fbshipit-source-id: 8ca8c39bf03bd97475628545c74cebf0deb8e62f
2021-09-08 16:27:10 -07:00
Xavier Deguillard
fe0ea26fdf store: avoid copying proxy hashes during prefetch
Summary:
Looking at strobelight when performing an `eden prefetch` shows that a lot of
time is spent copying data around. The list of hashes to prefetch is for
instance copied 4 times. Let's reduce this to a single copy, made when
converting Hash to a ByteRange.

Reviewed By: chadaustin

Differential Revision: D30433285

fbshipit-source-id: 922e6e5c095bd700ee133e9bb219904baf2ae1ac
2021-08-23 11:05:02 -07:00
Chad Austin
29c5aef912 model: namespace facebook::eden
Summary: C++17

Reviewed By: fanzeyi

Differential Revision: D28966916

fbshipit-source-id: baf8bec7b211ecf18d3fc3edf79a7c2de6b5aa68
2021-06-08 19:29:37 -07:00
Chad Austin
bb1cccac89 introduce a variable-width RootId type that identifies the root of an EdenFS checkout's contents
Summary:
Backing stores differentiate between individual tree objects and the
root of a checkout. For example, Git and Mercurial roots are commit
hashes. Allow EdenFS to track variable-width roots to better support
arbitrary backing stores.

Reviewed By: genevievehelsel

Differential Revision: D28619584

fbshipit-source-id: d94f1ecd21a0c416c1b4933341c70deabf386496
2021-06-07 17:25:31 -07:00
Chad Austin
894eaa9840 move root ID parsing and rendering into BackingStore
Summary:
The meaning of the root ID is defined by the BackingStore, so move
parsing and rendering into the BackingStore interface.

Reviewed By: xavierd

Differential Revision: D28560426

fbshipit-source-id: 7cfed4870d48016811b604348742754f6cdbd842
2021-06-03 11:07:14 -07:00
Chad Austin
df90f5626e stop tracking parent2
Summary:
EdenFS goes out of its way to track the second working copy parent,
but it never uses it. Stop writing it to the SNAPSHOT file.

Reviewed By: genevievehelsel

Differential Revision: D28453213

fbshipit-source-id: d7d36a1c67553f92234bec911051f4f1d4ef1d4a
2021-05-21 10:53:16 -07:00
Katie Mancini
85942cfaad use portable version of gtest
Summary:
gtest includes some windows headers that will have conflicts with the
folly portability versions. This caused some issues in my in-memory tree
cache diffs (D27050310 (8a1a529fcc)).

We should probably generally be using the folly portable gtests so we can
avoid such issues in the future.

see here for more details: bd600cd4e8/folly/portability/GTest.h (L19)

I ran this with codemod yes to all

- convert all the includes with quotes:
`codemod -d eden/fs --extensions cpp,h '\#include\ "gtest/gtest\.h"' '#include <folly/portability/GTest.h>'`

- convert all the includes with brackets
`codemod -d eden/fs --extensions cpp,h '\#include\ <gtest/gtest\.h>' '#include <folly/portability/GTest.h>'`

- convert the test template
`codemod -d eden/facebook --extensions template '\#include\ <gtest/gtest\.h>' '#include <folly/portability/GTest.h>'`

then used `arc lint` to clean up all the targets files

Reviewed By: genevievehelsel, xavierd

Differential Revision: D28035146

fbshipit-source-id: c3b88df5d4e7cdf4d1e51d9689987ce039f47fde
2021-05-12 15:58:27 -07:00
Katie Mancini
90072e0f4e add unit tests for TreeCache
Summary:
This introduces some basic unit tests to ensure correctness of the cache.
We are adding tests to cover the simple methods of the object cache since we
are using that code path here. And adding a few sanity check tests to make sure
the cache works with trees.

Reviewed By: chadaustin

Differential Revision: D27050296

fbshipit-source-id: b5f0577c1662483f732bb962c5b40bca8e1dcb40
2021-04-27 17:38:40 -07:00
Katie Mancini
1a02401df9 create a custom in memory tree cache
Summary:
Chad first noted that deserializing trees from the local store can be expensive.
From the thrift side EdenFS does not have a copy of trees in memory. This
means for glob files each of the trees that have not been materialized will be
read from the local store. Since reading and deserializing trees from the local
store can be expensive, let's add an in memory cache so that some of these
reads can be satisfied from here instead.

This introduces the class for the in memory cache and is based on the existing
BlobCache. note that we keep the minimum number of entries functionality from
the blob cache. This is unlikely to be needed as trees are much less likely
than blobs to exceed a reasonable cache size limit, but kept since we already
have it.
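The shape of such a cache can be sketched as a size-bounded LRU with a minimum entry count. This is a simplified illustration, not the real TreeCache or BlobCache: the `Tree` stand-in, the byte accounting, and the method names are assumptions, and locking is omitted:

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in; the real cache stores deserialized Tree objects.
struct Tree {
  std::size_t sizeBytes;
};

class TreeCache {
 public:
  TreeCache(std::size_t maxBytes, std::size_t minEntries)
      : maxBytes_(maxBytes), minEntries_(minEntries) {}

  // Returns the cached tree, or nullptr on a miss.
  std::shared_ptr<const Tree> get(const std::string& id) {
    auto it = index_.find(id);
    if (it == index_.end()) {
      return nullptr;
    }
    lru_.splice(lru_.begin(), lru_, it->second); // mark most recently used
    return it->second->second;
  }

  void insert(const std::string& id, std::shared_ptr<const Tree> tree) {
    if (index_.count(id) != 0) {
      return; // already cached
    }
    totalBytes_ += tree->sizeBytes;
    lru_.emplace_front(id, std::move(tree));
    index_[id] = lru_.begin();
    // Evict least recently used entries, but keep at least minEntries_.
    while (totalBytes_ > maxBytes_ && lru_.size() > minEntries_) {
      totalBytes_ -= lru_.back().second->sizeBytes;
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  std::size_t size() const {
    return lru_.size();
  }

 private:
  using Item = std::pair<std::string, std::shared_ptr<const Tree>>;

  std::size_t maxBytes_;
  std::size_t minEntries_;
  std::size_t totalBytes_ = 0;
  std::list<Item> lru_; // front = most recently used
  std::unordered_map<std::string, std::list<Item>::iterator> index_;
};
```

The minimum entry count means a single oversized tree still stays cached instead of being evicted the moment it is inserted.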

Reviewed By: chadaustin

Differential Revision: D27050285

fbshipit-source-id: 9dd46419761d32387b6f55ff508b60105edae3af
2021-04-27 17:38:39 -07:00
Katie Mancini
53d3f1e6cd Templatize ObjectCache
Summary:
We would like to use a limited size LRU cache for trees as well as blobs,
so I am templatizing this to allow us to use this cache for trees.

Trees will not need to use interest handles, but in the future we could use
this cache for blob metadata, which might want to use interest handles.
Additionally, if we at some point change the inode tree scheme in a way that
removes the tree content from the inodes themselves, interest handles might be
useful for trees. We could also use this cache for proxy hashes, which may or
may not use interest handles. Since some caches may want interest handles and
others will not, I am creating get/insert functions that work with and without
interest handles.

Reviewed By: chadaustin

Differential Revision: D27797025

fbshipit-source-id: 6db3e6ade56a9f65f851c01eeea5de734371d8f0
2021-04-27 17:38:39 -07:00
Chad Austin
4b9a93230b optimize HgProxyHash some and make loading from LocalStore explicit
Summary:
It's always annoyed me that HgProxyHash has a constructor which knows
how to load itself from a LocalStore. Add an explicit load() function,
and clean up some other stuff about the class while I'm in there.

Reviewed By: xavierd

Differential Revision: D26769231

fbshipit-source-id: f0ea9f16c3f1fbcd3d4361bcc34845901094b282
2021-03-12 10:42:46 -08:00
Xavier Deguillard
8853701e91 path: forbid building non-utf8 paths
Summary:
The world has moved on to utf-8 as the default encoding for files and data, but
EdenFS still accepts non utf-8 filenames to be written to it. In fact, most of
the time when a non utf-8 file is written to the working copy, and even though
EdenFS handles it properly, Mercurial ends up freaking out and crashing. In all
of these cases, non-utf8 files were not intentional, and thus refusing to create
them wouldn't be a loss of functionality.

Note that this diff makes the assumption that Mercurial's manifest only accepts
utf8 paths, and thus we only have to protect against files being created in the
working copy that aren't utf8.

The unfortunate part of this diff is that it makes importing trees a bit more
expensive as testing that a path is utf8 valid is not free.
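For a sense of what the per-path check costs, here is a self-contained sketch of a strict UTF-8 validator (C++17). It is an illustration only and not the function EdenFS uses; `isValidUtf8` is a hypothetical name. It rejects truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns true if `s` is well-formed UTF-8.
bool isValidUtf8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;   // number of continuation bytes expected
    std::uint32_t cp = 0;    // decoded code point
    std::uint32_t min = 0;   // smallest code point allowed for this length
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      extra = 1;
      cp = lead & 0x1F;
      min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2;
      cp = lead & 0x0F;
      min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3;
      cp = lead & 0x07;
      min = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (s.size() - i - 1 < extra) {
      return false; // truncated sequence
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80) {
        return false;
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false; // overlong, out of range, or surrogate
    }
    i += extra + 1;
  }
  return true;
}
```

The check is a single linear pass over the bytes, which is cheap per path but adds up when every entry of every imported tree is validated.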

Reviewed By: chadaustin

Differential Revision: D25442975

fbshipit-source-id: 89341a004272736a61639751da43c2e9c673d5b3
2021-02-23 11:35:12 -08:00
Xavier Deguillard
7d6d6f3714 model: remove test-only constructor for TreeEntry
Summary:
The StringPiece constructor is untyped, and was only used in test. We can
afford to build the PathComponent in tests instead to avoid future headaches.

Reviewed By: genevievehelsel

Differential Revision: D25434556

fbshipit-source-id: 4b10bf2576870e81412d76c4b9755b45e26986b3
2021-01-05 14:08:14 -08:00
Xavier Deguillard
978cd4549c hg: ignore invalid filename when importing manifest
Summary:
Mercurial supports files with `\` in their name, which can't be represented on
Windows due to `\` being the path separator. Currently, EdenFS will throw
errors at the user when such files are encountered. Let's simply warn and
continue.
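The warn-and-continue behavior could look roughly like this. `ImportResult` and `importNames` are hypothetical names for illustration; the real import code logs the warning instead of collecting it:

```cpp
#include <string>
#include <vector>

struct ImportResult {
  std::vector<std::string> imported;
  std::vector<std::string> warnings;
};

// Skips names containing '\' instead of failing the whole import.
ImportResult importNames(const std::vector<std::string>& names) {
  ImportResult result;
  for (const auto& name : names) {
    if (name.find('\\') != std::string::npos) {
      result.warnings.push_back("ignoring invalid filename: " + name);
      continue;
    }
    result.imported.push_back(name);
  }
  return result;
}
```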

Reviewed By: chadaustin

Differential Revision: D25430523

fbshipit-source-id: 4167b4cd81380226aead8e4f4850a7738087fd95
2021-01-05 14:08:14 -08:00
Xavier Deguillard
77f3f239a2 store: replace use of ctreemanifest with small manifest parser
Summary:
The code still took a dependency on Mercurial's old manifest code to parse
manifests. It turns out the manifests have a very simple format that we could
parse directly.

This avoids various copies, conversions, std::list, removes ~1k lines of code,
at the expense of adding ~100 lines of code (some of them being C++
boilerplate).
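To show how small such a parser can be, here is a sketch assuming C++17. The line layout used (`<name> NUL <40 hex characters> [one flag character] LF`) is an assumption based on the commonly described Mercurial manifest text format and is not taken from this commit. `ManifestEntry` and `parseManifest` are hypothetical names:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

struct ManifestEntry {
  std::string name;
  std::string hexNode; // 40 hex characters
  char flag;           // '\0' when the line carries no flag
};

// Assumed line layout: <name> NUL <40-hex node> [flag] LF
// Returns nullopt on any malformed line.
std::optional<std::vector<ManifestEntry>> parseManifest(std::string_view data) {
  std::vector<ManifestEntry> entries;
  while (!data.empty()) {
    std::size_t nul = data.find('\0');
    if (nul == std::string_view::npos) {
      return std::nullopt;
    }
    std::size_t eol = data.find('\n', nul + 1);
    if (eol == std::string_view::npos) {
      return std::nullopt;
    }
    std::string_view rest = data.substr(nul + 1, eol - nul - 1);
    if (rest.size() != 40 && rest.size() != 41) {
      return std::nullopt;
    }
    char flag = rest.size() == 41 ? rest[40] : '\0';
    entries.push_back(ManifestEntry{
        std::string(data.substr(0, nul)),
        std::string(rest.substr(0, 40)),
        flag});
    data.remove_prefix(eol + 1);
  }
  return entries;
}
```

Working on a `std::string_view` lets the parser walk the buffer in place and copy only the fields it keeps.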

Reviewed By: fanzeyi

Differential Revision: D25385018

fbshipit-source-id: 90d4cda2b7797584bc48c086d5592a7ecaa05dfc
2020-12-09 08:40:38 -08:00
Xavier Deguillard
34598d4337 remove dependency on glog
Summary:
The EdenFS codebase uses folly/logging/xlog to log, but we were still relying
on glog for the various CHECK macros. Since xlog also contains equivalent CHECK
macros, let's just rely on them instead.

This is mostly codemodded + arc lint + various fixes to get it to compile.

Reviewed By: chadaustin

Differential Revision: D24871174

fbshipit-source-id: 4d2a691df235d6dbd0fbd8f7c19d5a956e86b31c
2020-11-10 16:31:15 -08:00
Xavier Deguillard
a75af7a63d PathFuncs: allow paths on Windows to be '\' separated
Summary:
Previously, when that code was ported to Windows, path separators were
converted from '\' to '/' when a wide string was provided; all the other paths
were treated as is.

The main issue with this strategy is that not all paths can be converted, the
non-stored ones for instance are immutable, which leads to some subtle bugs
down the line. For instance, the paths: "Z:/foo/bar/baz" and "Z:\foo/bar\baz"
would not be equal as the path separator isn't the same, but both of these are
actually the same path underneath.

To solve this, this diff first introduces a Windows path separator, and then
modifies the path comparison functions to ignore the path separator and only
compare the components.
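A simplified sketch of separator-agnostic comparison, assuming C++17. `components` and `samePath` are hypothetical helpers that split on both separators and compare the pieces. The actual diff changes the comparison functions in PathFuncs rather than materializing a vector:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits on both '/' and '\'. Empty components are kept, so "a//b" and
// "a/b" still compare as different.
std::vector<std::string_view> components(std::string_view path) {
  std::vector<std::string_view> out;
  std::size_t start = 0;
  while (start <= path.size()) {
    std::size_t end = path.find_first_of("/\\", start);
    if (end == std::string_view::npos) {
      end = path.size();
    }
    out.push_back(path.substr(start, end - start));
    start = end + 1;
  }
  return out;
}

// True when both paths have the same components, whatever the separator.
bool samePath(std::string_view a, std::string_view b) {
  return components(a) == components(b);
}
```

With this, the two spellings from the example above, `Z:/foo/bar/baz` and `Z:\foo/bar\baz`, compare equal.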

I'm definitely not a fan of the pattern I use for searching for both / and \
in paths, suggestions are welcome for how to improve that.

Reviewed By: chadaustin

Differential Revision: D24376980

fbshipit-source-id: 0702bf775c7c3937b2138abd5a63d339ac80aaed
2020-10-22 16:24:17 -07:00
Chad Austin
9f651a8f28 Remove dead includes in eden
Reviewed By: simpkins

Differential Revision: D23864216

fbshipit-source-id: e1e3803ee47398639bf3cf85e3347d54ecaff424
2020-10-09 15:25:47 -07:00
Zeyi (Rice) Fan
671f931d30 model: add toByteString to Hash
Summary:
Thrift represents `binary` data type as `std::string` in C++. This method will
help us to convert `Hash` into a byte string.
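A minimal sketch of such a conversion, with a hypothetical 20-byte `Hash` stand-in (the real class has more to it). The explicit length matters because the hash may contain zero bytes:

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the 20-byte Hash type.
struct Hash {
  std::array<std::uint8_t, 20> bytes;
};

// Copies the raw bytes into a std::string, the C++ type Thrift uses for
// `binary` fields. Passing the size keeps embedded zero bytes intact.
std::string toByteString(const Hash& hash) {
  return std::string(
      reinterpret_cast<const char*>(hash.bytes.data()), hash.bytes.size());
}
```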

Reviewed By: xavierd

Differential Revision: D24083621

fbshipit-source-id: ae50088db7727d98ca11a017f82b71e942217a17
2020-10-05 15:51:18 -07:00
Xavier Deguillard
c6b9788af8 win: move win/utils onto utils/
Summary: This will make it easier to build with Buck.

Reviewed By: fanzeyi

Differential Revision: D23827754

fbshipit-source-id: bf3bf4d607a08b9831f9dfea172b2e923a219561
2020-09-22 09:09:56 -07:00