Summary:
To support better telemetry and logging in Watchman, we want to use Eden's components. Let's migrate and detangle the needed pieces.
This change moves Throw.h and its related tests from eden to edencommon.
Reviewed By: genevievehelsel
Differential Revision: D54046153
fbshipit-source-id: 669d702c13e70536d9c0b58ff8ff17f826237851
The internal and external repositories are out of sync. This Pull Request attempts to bring them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging.
Summary:
See why on previous diff
Now with this diff GlobTree is independent of inode
Reviewed By: kmancini
Differential Revision: D49933175
fbshipit-source-id: 1551a2b7e054df5df88ac37fbf0bf45f91e34548
Summary:
Some tools seem to query EdenFS directly about the type of certain files. If queried directly, EdenFS might answer that some files are symlinks regardless of whether symlink support for Windows is enabled or not. This adds some additional gating to make sure that EdenFS only answers that files are symlinks if symlink support is enabled for a checkout.
Split between this diff and D47829752 for clarity
Reviewed By: xavierd
Differential Revision: D47326227
fbshipit-source-id: 6e60b3d434ba45dd529286436526350ee0ac9dcf
Summary:
It is a minor historical mistake that Blob knew its own ID. There can
be multiple ID schemes that reference a single blob, and sometimes we
want to construct a blob without an ID available.
Now that blob IDs are never read directly, remove the field and
simplify a variety of construction sites.
Reviewed By: kmancini
Differential Revision: D39188561
fbshipit-source-id: 72b59b744aac42c312816d568fa563629575267a
Summary:
In preparation for removing the blob ID from Blob, update all of the
call sites that previously read it.
Reviewed By: xavierd
Differential Revision: D39188082
fbshipit-source-id: 7bf0b652a929cb24957e617e499fc8228807f99d
Summary:
Partially enables symlink support on Windows by making symlinks appear on Windows as actual symlinks, as opposed to regular files containing only the path the symlink would point to on other systems.
Creating new commits with symlinks also works (after editing `fscap.py` on hg's side as well as the requirements file for the current directory for enabling symlinks on Windows EdenFS) when the symlinks are in the same directory.
Reviewed By: xavierd
Differential Revision: D44218035
fbshipit-source-id: 0e3094dc5a13cabef1cd24f8fe18cc73ca40d4a8
Summary:
The methods mentioned in the title changed a bit on Windows, now allowing them to respond that a TreeEntry can be a symlink.
In order to gate this change, on Windows there is now a helper method that reverts `TreeEntry::getType` to its previous behavior, depending on whether symlinks are enabled for the current Eden checkout. Where possible (which is in most places), the config for whether symlinks are enabled on the Eden checkout is passed down.
The non-gated changes are kept since in those cases we actually want to register that a TreeEntry is actually a symlink, even when symlinks are "not enabled".
Originally this was intended to be part of D44218035, but was split for clarity.
Reviewed By: xavierd
Differential Revision: D47326228
fbshipit-source-id: be6cfae6626bf3a32aa119d25bf8b5fe6a549898
Summary:
clang-tidy had some automated suggestions for our code. Apply the ones
that make sense.
Some of them didn't, like removal of all uses of `volatile`. I
manually reverted those changes.
Reviewed By: genevievehelsel
Differential Revision: D41051052
fbshipit-source-id: 3fe22a91e929d3bb8e6346126c2c7bf9f027eb32
Summary: Removing the previous default ctor to ensure that we always compute and pass blake3.
Reviewed By: chadaustin
Differential Revision: D46268716
fbshipit-source-id: d9bfbc7bfd07b61dbb2e915c3fe72d7526d919e1
Summary:
Adding blake3 support into TreeEntry and backing stores.
Note: Http and RE CAS stores don't provide blake3 hash so far. While it would be pretty easy to add support for RE, not sure how hard it would be for the http store.
Reviewed By: chadaustin
Differential Revision: D46268715
fbshipit-source-id: db66e63fe0348eb582a8050f22cdc0ff720ccf85
Summary:
For tools that want to take advantage of the same fast-path logic when
directories don't change across updates, expose a semistable ID they
can use to cache derived data or get a rough understanding of when a
directory has changed its contents.
Reviewed By: kmancini, xavierd
Differential Revision: D45974142
fbshipit-source-id: 7b2b482876b07e73514a936e198de2dc31ed1597
Summary:
In both the ObjectStore and in the hg BackingStore, copies of the unique_ptr
were being made. For large blobs this is particularly inefficient as
potentially several MB (if not more) of data needs to be copied. Let's fix this
by changing the BackingStore API to return a shared_ptr.
In order to make the code easier to read and write, also define 3 types:
TreePtr, BlobPtr and BlobMetadataPtr and use them in the BackingStore code.
Future changes should be done at a later point to convert the whole codebase to
using these.
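As a rough illustration of why the pointer type matters, the sketch below uses a simplified stand-in `Blob` and a guessed `BlobPtr` alias (the real definitions in EdenFS may differ, and `makeBlob` is a hypothetical helper). Copying a `shared_ptr` only bumps a reference count, so two holders share one allocation instead of duplicating the blob contents.

```cpp
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for EdenFS's Blob; the real class holds an IOBuf.
struct Blob {
  std::string contents;
};

// Assumed shape of the alias named in this summary; not the real definition.
using BlobPtr = std::shared_ptr<const Blob>;

// Hypothetical helper that builds a blob once and hands out shared ownership.
inline BlobPtr makeBlob(std::string data) {
  return std::make_shared<const Blob>(Blob{std::move(data)});
}
```

Copying a `BlobPtr` from the ObjectStore to a caller then costs one atomic increment, regardless of how many MB the blob holds.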
Reviewed By: chadaustin
Differential Revision: D45967102
fbshipit-source-id: 6086f95456232db48a5cbec47b7cf8b14e4424ed
Summary:
Having a strict ObjectID format is quite inconvenient. We will need to introduce a new ObjectID format for Eden x Sparse, so it's in our best interest to remove the ObjectID format restrictions beforehand.
This will allow us to place the high entropy data (proxy hash in our case) in any location in the ObjectId without causing a ton of hash collisions. This will enable us to introduce FilteredObjectIDs in the form:
`<tree_or_blob_byte><filterset_id><path><ObjectId>`
where the `<ObjectId>` contains the high entropy bits we need to hash.
Reviewed By: xavierd
Differential Revision: D45793298
fbshipit-source-id: 77385e32f63d5f3d1fc37b72b9971f5717cbd872
Summary:
I'm about to make some changes to VirtualInode. In advance, refactor a
few things I noticed.
Reviewed By: kmancini
Differential Revision: D45672797
fbshipit-source-id: 7edf67ac14fb9b98324d3ed20eecbebb5ff903c6
Summary:
I need to use `std::optional<ObjectId>::operator==`, which uses
`ObjectId::operator==`.
I tried to make it hard to accidentally compare ObjectId, but that's
fighting a losing battle, so bring it back.
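A small sketch of the dependency described above, using a simplified stand-in `ObjectId` (the real class is not a `std::string` wrapper, and `sameId` is a hypothetical helper). Comparing two `std::optional<T>` values only compiles when `T` itself provides `operator==`.

```cpp
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in for ObjectId: an opaque byte string.
class ObjectId {
 public:
  explicit ObjectId(std::string bytes) : bytes_(std::move(bytes)) {}

  // Bytewise comparison. Note that two different IDs can still refer to the
  // same underlying object, which is why this is easy to misuse.
  friend bool operator==(const ObjectId& a, const ObjectId& b) {
    return a.bytes_ == b.bytes_;
  }
  friend bool operator!=(const ObjectId& a, const ObjectId& b) {
    return !(a == b);
  }

 private:
  std::string bytes_;
};

// std::optional's operator== forwards to ObjectId::operator== when both
// sides hold a value, and compares engaged-ness otherwise.
inline bool sameId(
    const std::optional<ObjectId>& a,
    const std::optional<ObjectId>& b) {
  return a == b;
}
```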
Reviewed By: genevievehelsel
Differential Revision: D45629405
fbshipit-source-id: 83c88afff0fb2954e40dd59204d0cfc76155fa70
Summary: Backing out the revert of D44173515 to unblock RE with their development. I will revert my own diff instead.
Reviewed By: genevievehelsel
Differential Revision: D45190063
fbshipit-source-id: 2b14fdc2fa118719aed0f3215393d172a162639f
Summary:
Eden's readdir and getfileAttribute endpoints return an error when an entry in
a directory has a type that is not: regular file, directory, or symlink.
This causes issues for Buck2 because it propagates this error to the user. For
Buck2 a directory having a file that isn't a regular file, directory, or
symlink isn't an error case, it's just a file Buck2 wants to skip over. Buck2
would like to be able to differentiate real errors getting the filetype (like
say a network error) and having a weird file in some directory.
From chatting with Thomas, Buck2 is unlikely to ever care what type the file
is (if it's not a file, dir, or symlink). So it's sufficient to just let Buck2
know it's some "other" type of file. I think it makes sense to just add a non
source control type here. I also considered adding dtype as an attribute, but
I don't think we need it, but we could add that too.
In some cases it can be dangerous to add values to a Thrift enumeration
(the SourceControlType enum we change below)
(reference post: https://fb.workplace.com/groups/thriftusers/permalink/785884732120941/).
But in our case, Rust + Buck2 handles new enum types gracefully
(and with exactly the behavior we want):
https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/buck2/app/buck2_common/src/io/eden.rs?lines=157
so adding a value to the enum is safe (for buck2).
Hack is our other client. They are going to handle it less gracefully:
https://www.internalfb.com/code/fbsource/[65673fd318750984372aeb5b44036a259a0d85d2]/fbcode/hphp/hack/src/facebook/hh_distc/package/package.rs?lines=441 but from what I can tell Hack would also
error if they tried to list a directory with a socket in it without this
change. Will confirm with them that this change is OK with them.
Reviewed By: chadaustin
Differential Revision: D44794698
fbshipit-source-id: 4e3ab7964fa2c0932b0363fb9ad62f24af74480c
Summary:
Generalizing the Hash structure and adding Hash32 support.
Also, added a few basic keyed blake3 methods for that.
This is mainly a preparation to start supporting blake3.
bypass-github-export-checks
Reviewed By: chadaustin
Differential Revision: D44173515
fbshipit-source-id: 87c55d47dabe50c7104f09ee0078f29513068862
Summary:
Our C++ benchmarks were disabled on Windows because of
CLOCK_MONOTONIC, so add a std::chrono fallback on Windows.
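A minimal sketch of this kind of fallback, assuming a hypothetical helper named `monotonicNanos` (not the name used in the benchmark harness): POSIX platforms keep using `clock_gettime(CLOCK_MONOTONIC)`, and Windows goes through `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <cstdint>
#ifndef _WIN32
#include <time.h>
#endif

// Hypothetical helper: returns a monotonic timestamp in nanoseconds.
inline std::uint64_t monotonicNanos() {
#ifdef _WIN32
  // CLOCK_MONOTONIC does not exist on Windows; steady_clock is monotonic.
  auto now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#else
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
      static_cast<std::uint64_t>(ts.tv_nsec);
#endif
}
```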
Reviewed By: genevievehelsel
Differential Revision: D41569853
fbshipit-source-id: 2c34f94bcc8cbc7d4a2b8704022287e497112b34
Summary:
I want to be able to query the state of a blob in the hgcache to inspect when
blob contents don't match the cached sizes we have for them. This is to help
catch an attempted repro for corrupt file sizes we have seen from user reports.
However, I don't want this to do any network fetching (because it is slow), and we don't
cache blobs in the local store in production, so I need to be able to get at
the hgcache.
I could go the `hg debugscmstore` route but I think it will be useful anyways
for `eden debug blob` to expose this information for ease of future debugging.
While I am adding it, I am setting up the plumbing to be able to fetch blobs from
multiple places.
Since we are changing the return type, we have to create a new endpoint. While
the endpoint changes I will also clean them up to follow more standard thrift
practices.
Reviewed By: chadaustin
Differential Revision: D41013986
fbshipit-source-id: af543fe9952dc15f28ef6cc13f6bf2eab95f753e
Summary:
std::string_view has noexcept accessors and folly::Range doesn't, so
this allows us to make Path and PathPiece noexcept.
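To illustrate the point, here is a sketch with a hypothetical, heavily simplified `PathPieceSketch` (not the real PathPiece). Because the `std::string_view` accessors it forwards to are declared `noexcept` by the standard, the wrapper's accessors can be `noexcept` too, and that can be checked at compile time.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical, simplified stand-in for PathPiece backed by string_view.
class PathPieceSketch {
 public:
  constexpr explicit PathPieceSketch(std::string_view s) noexcept : path_(s) {}

  constexpr std::string_view view() const noexcept { return path_; }
  constexpr bool empty() const noexcept { return path_.empty(); }
  constexpr std::size_t size() const noexcept { return path_.size(); }

 private:
  std::string_view path_;
};

// The standard declares these string_view accessors noexcept.
static_assert(noexcept(std::declval<std::string_view>().size()));
static_assert(noexcept(std::declval<std::string_view>().empty()));
// So the wrapper can make the same guarantee.
static_assert(noexcept(std::declval<PathPieceSketch>().size()));
```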
Reviewed By: kmancini
Differential Revision: D41145426
fbshipit-source-id: 046f6f6a532d8d0da8508ccf7896c914e19a25ec
Summary: Now that we format with fmt, we don't need folly/Conv here.
Reviewed By: xavierd
Differential Revision: D41105006
fbshipit-source-id: 3685fe4f06f998996d0e88fb39fb5578a57c85f8
Summary: We're migrating to fmt, so add a formatter for Hash20.
Reviewed By: xavierd
Differential Revision: D41104916
fbshipit-source-id: 51ecbd61f1b5ba2fe257cd6ef5835bc9030b1a0a
Summary:
We can use C++17 and operator""sv replaces the need for `literal` in
GlobMatcherTest.cpp.
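What the old `literal` helper did is not spelled out here, so the following only shows a general property of `operator""sv`: the suffix captures the literal's full length, so embedded NUL bytes survive, whereas constructing from a plain `const char*` stops at the first NUL. The names `kWithNul` and `fromCString` are illustrative only.

```cpp
#include <string_view>

using namespace std::literals::string_view_literals;

// The sv suffix records the literal's length, so the embedded NUL is kept.
constexpr std::string_view kWithNul = "a\0b"sv;

// Constructing from a const char* measures the length with a strlen-style
// scan, which stops at the first NUL.
inline std::string_view fromCString() {
  return std::string_view("a\0b");
}
```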
Reviewed By: genevievehelsel
Differential Revision: D40108769
fbshipit-source-id: dcce96fd76cff212d93d175ee69e44f9b098ae8a
Summary:
Eden's globber is currently hardcoded to be *mostly* case-sensitive ("mostly", because some code paths are already case-insensitive - see D39892007 (05b2300f73)). Here we augment the implementation to consistently perform case-insensitive globbing in case-insensitive mounts.
As a possible followup, we may want to expose this as an argument to `globFiles`. At the moment it is solely controlled by the checkout config.
Reviewed By: chadaustin, xavierd
Differential Revision: D39906585
fbshipit-source-id: 5b266cb75cf073651f6e2c66e7f1ebdd7026312f
Summary:
The fuzzer discovered an off-by-one in character class handling. Add a
test and fix.
Reviewed By: simpkins
Differential Revision: D40089224
fbshipit-source-id: b5f02530d156220698f4f38c394ee48a1600571d
Summary:
I don't know the difference between our prior magic incantations and
the new magic incantations, but the old ones caused crazy errors when writing
variadic wrappers that eventually call into fmt::format or fmt::join.
Reviewed By: vitaut
Differential Revision: D39991902
fbshipit-source-id: 8ca3215267912b3bb0ddf586681d3eb5294f52ff
Summary:
The ObjectCache is an extremely hot code path, dominating the glob query times.
When profiling it, hashing ObjectId shows up as a big component of the
ObjectCache runtime. By convention, we know that ObjectIds contain a hash
already, and thus we can use that as the ObjectId hash directly.
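The idea can be sketched as follows, assuming a simplified `ObjectIdSketch` whose leading bytes are already hash output (both type names and the choice of leading bytes are assumptions for illustration, not the real implementation). The hasher just reinterprets those bytes instead of running a hash function over the whole ID.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Simplified stand-in for ObjectId: raw bytes assumed to start with a hash.
struct ObjectIdSketch {
  std::string bytes;
  bool operator==(const ObjectIdSketch& o) const { return bytes == o.bytes; }
};

// Hypothetical hasher: reuse the leading bytes of the ID as the hash value.
struct ObjectIdHasher {
  std::size_t operator()(const ObjectIdSketch& id) const noexcept {
    std::size_t h = 0;
    // Copy at most sizeof(size_t) bytes; shorter IDs leave the rest zero.
    std::memcpy(&h, id.bytes.data(), std::min(sizeof(h), id.bytes.size()));
    return h;
  }
};

using ObjectCacheSketch =
    std::unordered_map<ObjectIdSketch, int, ObjectIdHasher>;
```

This is only sound while the leading bytes really are well distributed; an ID scheme that puts low-entropy data first would collide heavily.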
Reviewed By: chadaustin
Differential Revision: D39399209
fbshipit-source-id: b8113102ac94445efed74dcdd88bb0b1f3c62591
Summary:
Now that both Tree and DirContents share a PathMap, the API for both is very
similar. We can thus remove a bunch of functions that were intended to abstract
the slight difference between these two.
As a bonus, an unnecessary path copy is avoided.
Reviewed By: kmancini
Differential Revision: D38664359
fbshipit-source-id: c09962b062a5a95531d46d086410e4331ad92d26
Summary:
Previously, the new Windows Fsck logic was seeing dtype_t::Unknown for symlink files, which it then stored in the overlay. This caused later `hg status` calls to see the file as having been modified.
This diff makes TreeEntry report dtype_t::Regular for symlinks on Windows. Thanks to xavierd for pointing this out.
Reviewed By: xavierd
Differential Revision: D38793312
fbshipit-source-id: dfff9ddac7a6c02503c01ea83a4486bc2687722b
Summary:
Fixes https://github.com/facebookexperimental/edencommon/issues/3
`utils` is a bit too generic and Gentoo seems to be building edencommon into a shared library.
Reviewed By: chadaustin
Differential Revision: D38719753
fbshipit-source-id: fb46b6a7c9d3bcc3034765cb47e997a80c646b3d
Summary:
From the folly format docs
> Use fmt::format instead of folly::format for better performance, build times and compatibility with std::format
Eden build times have gotten a bit high; cutting out folly format will help reduce build time, so let's start
by banishing it from eden
see the last change in this group for the difference
Reviewed By: xavierd
Differential Revision: D38251005
fbshipit-source-id: 62f70896da0e852707ecf10198b1e14eaadb94a2
Summary:
From the folly format docs
> Use fmt::format instead of folly::format for better performance, build times and compatibility with std::format
Eden build times have gotten a bit high; cutting out folly format will help reduce build time, so let's start
by banishing it from eden
see the last change in this group for the difference
Reviewed By: xavierd
Differential Revision: D38251003
fbshipit-source-id: eace8cee2a1c02ecf95cc386b16c3ae595a4f188
Summary:
I was iterating on a single unit test and noticed our compilation times have
crossed the threshold from bad to horrific. I was seeing 30+ seconds per unit
test .cpp file.
After playing around with -ftime-trace, I found some obvious low-hanging fruit:
- forward declaration opportunities
- pImpl
- moving some implementation to cpp files
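For readers unfamiliar with the second and third items, a compact pImpl sketch follows. `NameList` is a hypothetical class, and both halves are shown in one block for brevity; in practice the first half would be the header and the second half the .cpp file, so includers never see `<vector>` or the member layout.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector> // in a real split, only the .cpp would include this

// ---- would live in NameList.h ----
class NameList {
 public:
  NameList();
  ~NameList(); // defined where Impl is complete
  void add(const std::string& name);
  std::size_t size() const;

 private:
  struct Impl; // forward declaration only; layout hidden from includers
  std::unique_ptr<Impl> impl_;
};

// ---- would live in NameList.cpp ----
struct NameList::Impl {
  std::vector<std::string> names;
};

NameList::NameList() : impl_(std::make_unique<Impl>()) {}
NameList::~NameList() = default;

void NameList::add(const std::string& name) {
  impl_->names.push_back(name);
}

std::size_t NameList::size() const {
  return impl_->names.size();
}
```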
Some bigger opportunities remain:
The folly/portability/GTest.h and folly/portability/Windows.h header situation
isn't great. They pull in winsock2.h which alone takes two seconds to compile.
We also probably don't need the folly/portability/Unistd.h compatibility
wrappers in EdenFS or Watchman.
Also, folly/Format.h is quite expensive, and there are other dependencies that
pull in Boost MPL.
Reviewed By: xavierd
Differential Revision: D38195736
fbshipit-source-id: 9c64bab5ff5851d5a896674712aec71d6780c79a
Summary:
We had no unit tests for dematerialization, so introduce two,
especially after the subtleties of D37904727 (c46461bbfb).
Reviewed By: xavierd
Differential Revision: D38020803
fbshipit-source-id: 854011118ff7a426d3b98bc75a536d8910e41ee3
Summary:
The hg backing store supports three object ID schemes: proxy hashes,
revhash+path, and revhash alone. When migrating between two schemes,
it looks to EdenFS as if the entire working copy has changed, and
EdenFS will then fetch and compare every tree.
To avoid this traversal, add an areObjectsKnownIdentical function to
BackingStore which the hg backing store can use to short-circuit
recursion, if two different ObjectIds point to the same object.
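A toy sketch of the short-circuit, under loudly stated assumptions: IDs here are strings of the made-up form `<scheme>:<hash>`, `TreeSketch` and `diffTrees` are hypothetical, and the free function below only borrows the name `areObjectsKnownIdentical` without matching the real BackingStore signature. The diff returns how many trees it had to visit.

```cpp
#include <string>
#include <vector>

// Hypothetical tree node: a name, an object ID, and child trees.
struct TreeSketch {
  std::string name;
  std::string id;
  std::vector<TreeSketch> children;
};

// Toy rule: IDs look like "<scheme>:<hash>" and two IDs name the same object
// when the part after the colon matches, whatever the scheme.
inline bool areObjectsKnownIdentical(
    const std::string& a,
    const std::string& b) {
  return a.substr(a.find(':') + 1) == b.substr(b.find(':') + 1);
}

// Returns the number of trees that had to be visited and compared.
inline int diffTrees(const TreeSketch& a, const TreeSketch& b) {
  if (areObjectsKnownIdentical(a.id, b.id)) {
    return 0; // short-circuit: no need to recurse into this subtree
  }
  int visited = 1;
  for (const auto& childA : a.children) {
    for (const auto& childB : b.children) {
      if (childA.name == childB.name) {
        visited += diffTrees(childA, childB);
      }
    }
  }
  return visited;
}
```

Without the check, a scheme migration would make every ID differ bytewise and the traversal would visit every tree.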
Reviewed By: xavierd
Differential Revision: D37904727
fbshipit-source-id: 4d0b7c528b5b96bfe7cb4e8ca76d0e432e003249
Summary:
Let's use OptionSet to represent the bitset of desired attributes instead of a
raw int. This makes the type of the bitset much more clear and makes the
bit comparisons a bit easier to read.
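The readability gain can be sketched with a small typed flag set. This is not the edencommon OptionSet API; `Attr`, `AttrSet`, and the attribute names are hypothetical and only show how a dedicated type replaces raw `int` masking.

```cpp
#include <cstdint>

// Hypothetical attribute bits.
enum class Attr : std::uint32_t {
  Sha1 = 1u << 0,
  Size = 1u << 1,
  Type = 1u << 2,
};

// Minimal typed flag set; callers never touch the raw integer.
class AttrSet {
 public:
  constexpr AttrSet() = default;
  constexpr AttrSet(Attr a) : bits_(static_cast<std::uint32_t>(a)) {}

  // True when every bit in other is also set in this set.
  constexpr bool contains(AttrSet other) const {
    return (bits_ & other.bits_) == other.bits_;
  }
  constexpr bool empty() const { return bits_ == 0; }

  friend constexpr AttrSet operator|(AttrSet a, AttrSet b) {
    AttrSet r;
    r.bits_ = a.bits_ | b.bits_;
    return r;
  }

 private:
  std::uint32_t bits_ = 0;
};

// Lets two bare enumerators be combined directly.
constexpr AttrSet operator|(Attr a, Attr b) {
  return AttrSet(a) | AttrSet(b);
}
```

A call site then reads `requested.contains(Attr::Sha1)` rather than `(requested & 1) != 0`, and passing an unrelated integer no longer compiles.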
Reviewed By: chadaustin
Differential Revision: D38037647
fbshipit-source-id: 02721bf09a1235bbe3d4a17039671e904748b79f
Summary:
readdir and getEntryAttributesForPath allow collecting all the attributes for a
directory or a list of files, but this currently always fetches all the
attributes no matter which are requested.
Buck2 will only need a subset of the attributes on the readdir call.
So it's worth optimizing here to only fetch requested attributes.
There are still some optimizations worth making:
- we should avoid looking up InodeOrTreeOrEntry when no attributes are requested
- we can even probably avoid looking up some data when requesting the
TreeEntryType only.
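The optimization amounts to guarding each lookup with the request mask. The sketch below is illustrative only: `RequestedAttr`, `EntryAttributes`, `FakeEntry`, and `getEntryAttributes` are hypothetical names, and the counters exist purely to show which lookups ran.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical request bits.
enum RequestedAttr : std::uint32_t {
  kSize = 1u << 0,
  kType = 1u << 1,
};

// Each attribute is present only if it was requested.
struct EntryAttributes {
  std::optional<std::uint64_t> size;
  std::optional<char> type;
};

// Stand-in for an inode or tree entry; counts how often each lookup runs.
struct FakeEntry {
  std::string contents;
  int sizeLookups = 0;
  int typeLookups = 0;
};

inline EntryAttributes getEntryAttributes(
    FakeEntry& entry,
    std::uint32_t requested) {
  EntryAttributes out;
  if (requested & kSize) {
    ++entry.sizeLookups; // stands in for a potentially expensive fetch
    out.size = entry.contents.size();
  }
  if (requested & kType) {
    ++entry.typeLookups;
    out.type = 'f'; // placeholder value meaning "regular file"
  }
  return out;
}
```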
Reviewed By: xavierd
Differential Revision: D37557706
fbshipit-source-id: 894dd9907e55f7255473fd9564812e2bb773c45d