Summary:
Passing shared_ptr by copy everywhere can be expensive as it forces an atomic
operation to be performed. Since the caller of the glob code can easily
guarantee that the data will outlive the globbing code, let's just pass
references/pointers to it so that only references and pointers are copied.
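A minimal sketch of the difference, assuming a hypothetical `GlobData` type
and helper names that are not taken from the EdenFS code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the data the glob code reads; not an EdenFS type.
struct GlobData {
  std::vector<std::string> patterns;
};

// By value: copying the shared_ptr atomically bumps the reference count.
long useCountByValue(std::shared_ptr<const GlobData> data) {
  return data.use_count();
}

// By reference: no refcount traffic; the caller must keep `data` alive.
long useCountByRef(const std::shared_ptr<const GlobData>& data) {
  return data.use_count();
}

// Plain reference to the pointee: the callee never sees the shared_ptr.
std::size_t countPatterns(const GlobData& data) {
  return data.patterns.size();
}
```

With a single owner, `useCountByValue` observes a count of 2 during the call
while `useCountByRef` observes 1, which is the atomic increment being avoided.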
Reviewed By: genevievehelsel
Differential Revision: D31344889
fbshipit-source-id: cee797202470aa123381d9ee22e11780722f5b33
Summary:
Writing to the LocalStore is purely the responsibility of the
LocalStoreCachedBackingStore and not of the individual BackingStores. Thus, they
cannot assume that the root Tree is actually stored in it and should just
import it directly.
Reviewed By: chadaustin
Differential Revision: D31340206
fbshipit-source-id: 0f485ceb9fa71f7a7bdc8aaefaa850540075c88c
Summary:
Looking at Instruments, when issuing tons of glob queries to EdenFS, EdenFS
appears to be spending a very large amount of time adding tasks to the
UnboundedTaskExecutor. Since globs are expected to be fast, we can afford to
execute them inline, reducing this overhead and speeding up glob queries.
Reviewed By: chadaustin
Differential Revision: D31289485
fbshipit-source-id: 428fff9f5fea65073b2a061dc7070d63ae36d95d
Summary:
The globbing algorithm is recursive and returns its own glob results merged
with its children's glob results. The merging is done by simply copying the
children's glob result and returning it. What this means is that a single
GlobResult will be copied K times, with K being the recursion depth at which it
was created. This makes the total number of copies be O(K*N) with N the result
length.
We can avoid these copies entirely by creating each GlobResult directly in a
shared vector, at the expense of taking a lock.
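The idea can be sketched as follows. This is an illustration only: `Dir`,
`ResultList` and `globInto` are hypothetical names, the tree walk is
simplified, and a plain `std::mutex` stands in for whatever synchronization
the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types; the real GlobResult and tree walk differ.
struct GlobResult {
  std::string path;
};

struct Dir {
  std::string name;
  std::vector<std::string> files;
  std::vector<Dir> children;
};

// Shared output list: every recursion level appends here under a lock.
class ResultList {
 public:
  void add(GlobResult result) {
    std::lock_guard<std::mutex> guard(mutex_);
    results_.push_back(std::move(result));
  }

  std::vector<GlobResult> take() {
    std::lock_guard<std::mutex> guard(mutex_);
    return std::move(results_);
  }

 private:
  std::mutex mutex_;
  std::vector<GlobResult> results_;
};

// Each result is constructed once and moved into the shared list, instead of
// being copied into the parent's result at every level of the recursion.
void globInto(const Dir& dir, const std::string& prefix, ResultList& out) {
  for (const auto& file : dir.files) {
    out.add(GlobResult{prefix + file});
  }
  for (const auto& child : dir.children) {
    globInto(child, prefix + child.name + "/", out);
  }
}
```

Each result is written exactly once regardless of recursion depth, so the
copying cost drops from O(K*N) to O(N) plus one lock acquisition per result.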
Reviewed By: chadaustin
Differential Revision: D31288036
fbshipit-source-id: ae8a98a01eab2ba7f23908d347d7a4ec199cdfab
Summary:
One layer above getChildRecursive is getInode; let's make this one use an
ImmediateFuture too.
Reviewed By: chadaustin
Differential Revision: D31283397
fbshipit-source-id: 8bc524bea857d6ec5bc045d6e3383d38133c3b38
Summary:
This is a fairly mechanical diff that finalizes the split of Hash into ObjectId and Hash20.
More specifically this diff does two things:
* Replaces `Hash` with `Hash20`
* Removes alias `using Hash = Hash20`
Reviewed By: chadaustin
Differential Revision: D31324202
fbshipit-source-id: 780b6d2a422ddf6d0f3cfc91e3e70ad10ebaa8b4
Summary:
The goal of this stack is to remove the Proxy Hash type, but to achieve that we first need to address some tech debt in the Eden codebase.
For a long time, EdenFS had a single Hash type that was used for many different use cases.
One of the major uses of the Hash type is to identify internal EdenFS objects such as blobs, trees, and others.
We seem to have reached agreement that we need a different type for those identifiers, so this diff introduces a separate ObjectId type to denote the new identifier type and replaces _some_ usages of Hash with ObjectId.
We still retain the original Hash type for other use cases.
Roughly speaking, this is how this diff separates between Hash and ObjectId:
**ObjectId**:
* Everything that is stored in the local store (blobs, trees, commits)
**Hash20**:
* Explicit hashes (SHA-1 of the blob)
* Hg identifiers: manifest id and blob hg id
For now, in this diff, ObjectId has exactly the same content as Hash, but this will change in future diffs. Doing it this way keeps the diff size manageable, whereas migrating to the new ObjectId right away would produce an insanely large diff that would be hard both to write and to review.
There are a few more things that need to be done before we can get to the meat of removing proxy hashes:
1) Replace include Hash.h with ObjectId.h where needed
2) Remove Hash type, explicitly rename rest of Hash usages to Hash20
3) Modify content of ObjectId to support new use cases
4) Modify serialized metadata and possibly other places that assume ObjectId size is fixed and equal to Hash20 size
Reviewed By: chadaustin
Differential Revision: D31316477
fbshipit-source-id: 0d5e4460a461bcaac6b9fd884517e129aeaf4baf
Summary:
In advance of Thrift servers defaulting the queue timeout to 100 ms,
which is quite low for EdenFS's needs, explicitly set our queue
timeout to 5 seconds.
Reviewed By: zhengchaol
Differential Revision: D31218348
fbshipit-source-id: 35a109fb6848f7c81c4b58d70e2beae90557e1c8
Summary: We can just use getBackingStores, as is done for `startRecordingBackingStoreFetch`, and only record non-empty fileAccesses. This enables fetch logging for LocalCacheBackingStores, which use an HgQueuedBackingStore under the hood.
Reviewed By: zhengchaol
Differential Revision: D31215109
fbshipit-source-id: 443d28a57144fdcf078bd653ecf5726825f55740
Summary: Fix the dynamic cast used to get a tracebus for the `trace hg` entry point. A dynamic cast still makes sense at this point, since `trace hg` should only be called on hg-backed mounts.
Reviewed By: chadaustin
Differential Revision: D31214737
fbshipit-source-id: 65e018e6658d934d8ecd3434bdfc3d72f6873d2b
Summary: Instead of dynamic casting to find the repo name, all backing stores can return an optional repo name, and callers can check whether the optional is set.
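A minimal sketch of this pattern, assuming a simplified class hierarchy. The
class names and `repoNameOrDefault` are hypothetical, and the method name
`getRepoName` is an assumption rather than the confirmed EdenFS signature:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified hierarchy; not the actual EdenFS interface.
class BackingStore {
 public:
  virtual ~BackingStore() = default;

  // Stores with no notion of a repository keep this default.
  virtual std::optional<std::string> getRepoName() const {
    return std::nullopt;
  }
};

class HgLikeStore : public BackingStore {
 public:
  explicit HgLikeStore(std::string repoName)
      : repoName_(std::move(repoName)) {}

  std::optional<std::string> getRepoName() const override {
    return repoName_;
  }

 private:
  std::string repoName_;
};

class EmptyStore : public BackingStore {};

// The caller checks the optional instead of dynamic_cast-ing to a subclass.
std::string repoNameOrDefault(const BackingStore& store) {
  return store.getRepoName().value_or("<unknown>");
}
```

The caller no longer needs to know which concrete store it holds, so wrapping
one store in another does not break the lookup the way a failed cast would.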
Reviewed By: zhengchaol
Differential Revision: D31214723
fbshipit-source-id: 9d10114ff6bde13254d3a3caaf2401f87d07ffd7
Summary: add more information to the runtime error thrown by the dynamic cast failure in `eden trace hg` and predictive fetch
Reviewed By: zhengchaol
Differential Revision: D31212247
fbshipit-source-id: 982901dfd2eb05db9ca6e7366277a07b6b29872f
Summary:
While debugging the unlinked inode unloading for NFS, I have re-added these
logs a couple of times. They seem valuable to keep in eden so that we don't
have to add them every time we debug this, and so that we can debug a bit in a
production eden rather than a dev-built eden.
Reviewed By: xavierd
Differential Revision: D30971151
fbshipit-source-id: 58172079dfe4f4e4ba31bae30bf982e2cbe0fd29
Summary:
We run periodic inode unloading for unlinked inodes on NFS because we get no
information from the client on when inodes are no longer needed, and we have to
clean them up at some point for memory and disk reasons. See previous commit
summaries for more details on this (D30144901 (ffa558bf84)).
Let's add some counters on this so we have a bit more visibility into the
process. This counter is meant to mimic the PeriodicUnloadCounter counter.
Reviewed By: chadaustin
Differential Revision: D30966688
fbshipit-source-id: cfc8d769b53073d9f4c0c27b6bee20e222c6c8d2
Summary:
Some of EdenFS's backing stores require EdenFS to cache objects locally to avoid
potentially expensive network fetches, while others already have some form of
local caching. In the past, all backing stores fell in the first category, but
thanks to Mercurial's native backing store implementation the LocalStore
caching has become pure overhead for it. Previously, this was worked around by
configuring the LocalStore to not cache blobs locally, but this wasn't done for
trees. This config also conflicts with the need to cache blobs and trees
locally for backing stores in the first category (such as ReCas).
Since we know at construction time which backing stores need local caching, we
can simply wrap these in the newly introduced LocalStoreCachedBackingStore
store.
For now, since the Mercurial backing store always writes a proxy hash to the
LocalStore, bypassing the LocalStore for trees would be a regression due to the
added disk IO. Once proxy hashes are gone for Mercurial, we can remove the
LocalStoreCachedBackingStore wrapper.
Reviewed By: chadaustin
Differential Revision: D31118905
fbshipit-source-id: 4a2958eafeeb8144ee4421ec44dbd30cedceee29
Summary:
folly::format is deprecated in favor of fmt and std::format. Migrate
most of EdenFS to fmt instead.
Differential Revision: D31025948
fbshipit-source-id: 82ed674d5e255ac129995b56bc8b9731a5fbf82e
Summary:
Having tons of boolean parameters in a function is very error-prone from the
caller's perspective. Using a structure to pass in the same information can
mitigate some of this issue.
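As an illustration, assuming hypothetical option names (the summary does not
say which function or flags are affected):

```cpp
#include <cassert>
#include <string>

// Hypothetical option names for illustration; not the actual EdenFS structure.
struct GlobOptions {
  bool includeDotfiles = false;
  bool prefetchFiles = false;
  bool listOnlyFiles = false;
};

// With three positional bools, a call like describeGlob(true, false, true)
// gives the reader no hint which flag is which. Named fields fix that.
std::string describeGlob(const GlobOptions& options) {
  std::string description = "glob";
  if (options.includeDotfiles) {
    description += " +dotfiles";
  }
  if (options.prefetchFiles) {
    description += " +prefetch";
  }
  if (options.listOnlyFiles) {
    description += " +files-only";
  }
  return description;
}
```

Call sites then name each flag they set, and adding a new option with a
default value does not require touching existing callers.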
Reviewed By: kmancini
Differential Revision: D30883743
fbshipit-source-id: dcf38d29bfe2cb5155879f7ae4eab5cea31f798a
Summary:
We periodically need to dereference inodes on NFS because we get no other info
from the kernel on when we should dereference them.
This can be disruptive to a user's workflow because open files that were rm'ed
or removed on checkout will no longer have their old content. (On a native
filesystem or FUSE, applications that had the file open prior to the removal
would still be able to access it.) For most editors this is not a problem
because they read the file on open (seems fine for vim and vscode from testing).
However, folks could theoretically have a workflow this does not jibe with.
Let's make it configurable how often this runs, so users can control how
much we disrupt their workflow.
Reviewed By: xavierd
Differential Revision: D30144899
fbshipit-source-id: 59cf5faea70b3aea216ca2bcb45b96e34f5e72b5
Summary:
NFSv3 has no inode invalidation flow built into the protocol. The kernel does not
send us forget messages like we get in FUSE. The kernel also does not send us
notifications when a file is closed. Thus EdenFS cannot easily tell when
all handles to a file have been closed.
As it stands, we never clean up inodes. This is bad for memory and disk usage.
We never unload an inode, so we always keep it in memory once it's created.
Additionally, we never remove a materialized inode from the overlay. This means
we have unbounded memory and disk usage :/
We need to clean up these inodes at some point. There are a couple of high-level
options:
1. Support NFSv4. NFSv4 sends us a close message when a file handle is closed.
This would allow us to actually keep track of reference counts on an inode.
However, this is a lot of work. There are a lot of other things we would have to
support before we can move to NFSv4.
2. Run background inode cleanups.
NFSv4 is probably the right long-term solution. But for now we should be able to
get by with periodic unloads.
I considered a couple of options for unloads:
1. Unload inodes immediately when files are removed.
2. Delay cleaning up inodes until a while after they are removed. (i.e. clean
up inodes n seconds after an `unlink`, `rename`, `rmdir`, or checkout)
3. Run periodic inode unloading. (i.e. once a day unload inodes).
Option 1 feels a bit too hostile to applications that hold files open.
Option 3 means we will build up a lot of cruft over the course of the day, but it
is probably the most application friendly.
I decided to try out option 2 first and see if it works well with the common
developer tools. It seems to work (see below), so I am going with it.
This diff only does inode cleanup after checkout. We might want to run inode
cleanup after unlink/remove dir as well, but this would be more expensive.
Batch unloading on checkout seems better to me and should happen
frequently enough to clean up space for people.
There is one known "broken" behavior in this diff. We unload all unlinked
inodes, which means we will erase more inodes than we should. Sometimes EdenFS
crashes or has bugs and unlinks legit inodes. Normally we let those live in the
overlay so we could go in and recover them. My plan to fix this is to mark inodes
for unloading instead of just unloading all unlinked inodes.
Reviewed By: xavierd
Differential Revision: D30144901
fbshipit-source-id: 345d0c04aa386e9fb2bd40906d6f8c41569c1d05
Summary: This adds inode number to NFS trace event so that we can use it in ActivityRecorder to show the filename of the FS request.
Reviewed By: xavierd
Differential Revision: D30849770
fbshipit-source-id: 580faf5fccb1a225399d9aec843e23eae1874e87
Summary:
`eden prefetch` and `eden glob` return lists that, despite being called
"matching files", actually contain both files and directories.
In some cases we only want the list of files, and it introduces unnecessary
overhead on our clients to have to stat every entry in the list to
filter out the dirs. Let's add an option to list only files.
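A rough sketch of server-side filtering, assuming the glob code already knows
each entry's type. `GlobEntry`, `matchingPaths` and the `listOnlyFiles` flag
name are hypothetical here:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical entry type: the glob walk already knows whether each match is
// a directory, so no extra stat is needed to filter.
struct GlobEntry {
  std::string path;
  bool isDirectory;
};

// Returns matching paths, optionally dropping directories.
std::vector<std::string> matchingPaths(
    const std::vector<GlobEntry>& entries,
    bool listOnlyFiles) {
  std::vector<std::string> paths;
  for (const auto& entry : entries) {
    if (listOnlyFiles && entry.isDirectory) {
      continue;
    }
    paths.push_back(entry.path);
  }
  return paths;
}
```

Filtering where the entry type is already known saves each client one stat
per returned path.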
Reviewed By: chadaustin
Differential Revision: D30816193
fbshipit-source-id: 6e264142162ce03e560c969a0c0dbbc2f418d7b9
Summary:
Put code using the usage service behind an `EDEN_HAVE_USAGE_SERVICE` macro.
Previously the C++ code was simply guarded by a `__linux__` check, and the
CMake code did not have a guard at all. This caused builds from the GitHub
repository to fail on Linux, since the code attempted to use the usage service
client which was not available.
Reviewed By: xavierd
Differential Revision: D30797846
fbshipit-source-id: 32a0905d0e1d594c3cfb04a466aea456d0bd6ca1
Summary:
In preparation for expanding to variable-width hashes, rename the
existing hash type to Hash20.
Reviewed By: genevievehelsel
Differential Revision: D28967365
fbshipit-source-id: 8ca8c39bf03bd97475628545c74cebf0deb8e62f
Summary:
Since the background condition comes before the actual prefetching of files,
specifying the background option would just glob files but not prefetch them,
which is equivalent to prefetching all the trees.
Reviewed By: genevievehelsel
Differential Revision: D30618753
fbshipit-source-id: 5533b1c78d614342ac3341ce033795be3850750a
Summary:
This change has the unintended effect of causing any Thrift calls to
potentially issue a recursive EdenFS call due to symlink resolution requiring
running `readlink` on the root of the repo itself.
Fixing this isn't really possible, so let's revert the change altogether. We
can force clients to issue a realpath before issuing EdenFS Thrift calls.
Reviewed By: kmancini
Differential Revision: D30550796
fbshipit-source-id: 9494c8e08c8af2392eeb344879f156cb56f93ea6
Summary:
Made changes to ensure that numResults is always a 32-bit unsigned int, and that startTime and endTime are 64-bit unsigned ints. This ensures consistency between the smartservice and the endpoint in the daemon.
Also, updated the scuba query in the smartservice to only consider dirs with > 1 access (may update this later to accept a configurable lower bound on access count, but for now, including access=1 doesn't make sense).
Reviewed By: genevievehelsel
Differential Revision: D30396526
fbshipit-source-id: 10e7bd969928da91ab29d413280a1ff956db438c
Summary:
Looking at strobelight when performing an `eden prefetch` shows that a lot of
time is spent copying data around. The list of hashes to prefetch is, for
instance, copied 4 times. Let's reduce this to a single copy, made when
converting Hash to a ByteRange.
Reviewed By: chadaustin
Differential Revision: D30433285
fbshipit-source-id: 922e6e5c095bd700ee133e9bb219904baf2ae1ac
Summary: This diff renames `SetPathRootId` to `SetPathObjectId` as we want to support BLOB
Reviewed By: chadaustin
Differential Revision: D30404536
fbshipit-source-id: f34446ec20aeaf87f5f61e29e421a9bceb0b2a4a
Summary:
The SmartPlatform service that queries for a user's most used directories allows optional parameters of: os, startTime, endTime, and sandcastleAlias instead of user. This diff extends the current predictive prefetch option which queries based on the current user, mount repository, and a default numResults, to allow specification of all parameters including the optional ones.
If a user and/or repo is not specified these are determined from the server state and mount, respectively. If numResults is not specified, a default value is used (predictivePrefetchProfileSize, currently 10,000).
For sandcastle aliases, we check if the SANDCASTLE_ALIAS environment variable is set, and if so, use the value as a parameter. If a sandcastle alias is specified, the smartservice will ignore the user and query based on the alias, otherwise a user is assumed.
Differential Revision: D30160507
fbshipit-source-id: 174797f0a6f840bb33f669c8d1bb61d76ff7a309
Summary:
In the case where the path to the mount has symlinks, EdenFS would only accept
the path to it that was specified at mount time, even though another path may
refer to the same directory.
To solve this, we can simply normalize paths in all the Thrift endpoints to make
sure that EdenFS always refers to a mount point under its non-symlinked path.
Reviewed By: chadaustin
Differential Revision: D30320515
fbshipit-source-id: e578d059a3b1307d6b24c4b9bdb1ceb3b534c460
Summary: The journal stream is disconnected at Watchman shutdown, which is the expected behavior. This changes the log level to INFO.
Reviewed By: chadaustin
Differential Revision: D30231657
fbshipit-source-id: 94909daeba786b1bed7497e4a21ffcfc52d6d9cb
Summary: Due to lifetime issues of FetchContext& w.r.t. background prefetching, we can just create the helper in `_globFiles` and use that to maintain lifetime
Reviewed By: xavierd
Differential Revision: D30175224
fbshipit-source-id: b2fccb76f9d4011139e80bd5bc52c40bbab08b94
Summary: Added a Thrift method that tells EdenFS to prefetch files from a user's most used directories using an endpoint that talks to the edenfs/edenfs_service SmartPlatform service to get the directory list. The default number of directories is set to 10,000.
Reviewed By: genevievehelsel
Differential Revision: D29909976
fbshipit-source-id: bfb1a411d50d7355ff604de5bc090a9e2c3100a0
Summary:
This adds the options to `eden stats` for collecting only fast stats and printing in JSON.
`eden stats` can be slow especially due to collecting fb303 counters and private bytes. An example use case of this new lightweight endpoint is that Buck can poll it to display Eden related info in its cli (see [post](https://fb.workplace.com/groups/132499338763090/permalink/210396380973385/) for context).
Reviewed By: xavierd
Differential Revision: D29687041
fbshipit-source-id: a663e71231527c5dfb822acbf238af0ac6ce4a00
Summary: Currently we have to manually save the id returned from `start_recording`. After this, we can simply ask for the list of all active recorder sessions.
Reviewed By: genevievehelsel
Differential Revision: D30056117
fbshipit-source-id: 7fd69b70e7b04fcd0b3724f4ee16c5e5e86badaf
Summary: For `future_` endpoints, we wrap the final return statement with `wrapFuture(std::move(helper), ...` to ensure we keep the ThriftLogHelper alive through the whole call.
Differential Revision: D30018980
fbshipit-source-id: 2c63fe5d7b4504912cc46a32ca04f16e98b0805f
Summary:
If Mercurial asks EdenFS to update to a commit that it has just created, this
can cause a long delay while EdenFS tries to import the commit.
EdenFS needs to resolve the commit to a root manifest. It does this via the
import helper, but the import helper won't know about the commit until it is
restarted, which takes a long time.
To fix this, we add an optional "root manifest" parameter to the checkout or
reset parents thrift calls. This allows the Mercurial client to inform EdenFS
of the root manifest that it already knows about, allowing EdenFS to skip this
step.
Reviewed By: chadaustin
Differential Revision: D29845604
fbshipit-source-id: 61736d84971cd2dd9a8fdaa29a1578386246e4bf
Summary:
This adds debug commands for ActivityRecorder:
```
eden debug start_recording --output-dir <DIR>
* stdout: the id of the profile
eden debug stop_recording --unique <ID>
* stdout: the output file path
```
Users can record multiple profiles concurrently. Each profile is identified by the timestamp when it started.
Reviewed By: genevievehelsel
Differential Revision: D29666359
fbshipit-source-id: 487ca67de77378a8141bc4ac46b9abd1375ffd23
Summary:
We want to introduce two debug commands to record perf profiles such as files read. This can later be integrated to CI so that we can have this data for troubleshooting perf issues.
* `eden debug start_recording` starts recording perf metrics such as files read/written and fetch counts/latency for a given mount.
* `eden debug end_recording` stops recording and dumps the recorded profile to a local file.
This diff adds the boilerplate `ActivityRecorder` (borrowed heavily from `HiveLogger`'s implementation). The start command would create an instance of the recorder; the end command would destroy the recorder. The recording and dumping are handled by the implementing class.
Reviewed By: genevievehelsel
Differential Revision: D29506895
fbshipit-source-id: a927a363942a041d5ae54186a265576325dfeed5
Summary: This adds counters for memory and disk counts in addition to import count so that we can understand cache hit rates during local investigation or output this in ActivityRecorder.
Reviewed By: genevievehelsel
Differential Revision: D29805637
fbshipit-source-id: 34261f91c33d6bd4bcb4b85b17d2e68360410896
Summary: We already have AccessType for FUSE, this adds the same categorization for NFS. This allows us to easily filter events in trace stream and ActivityRecorder.
Reviewed By: chadaustin
Differential Revision: D29771074
fbshipit-source-id: a437f0693f9062fb2df3b6f618a9d8860a05df12
Summary: Extended eden doctor to check if the PrivHelper is accessible and report when it is not.
Reviewed By: genevievehelsel
Differential Revision: D29593250
fbshipit-source-id: 2390e75b91c9d6f713db4b6084868af91a0b6623