Summary:
One of the largest contributors to EdenFS memory usage is the internal
IndexedLog buffers that hold data in memory until a threshold is reached. Since
the main benefit of these buffers is to make good use of disk bandwidth, very
large buffers aren't necessary, and much smaller ones can achieve similar
results.
A default 50MB buffer is used, which caps the memory usage at 50MB * 3 across:
- File IndexedLogDataStore
- Tree IndexedLogDataStore
- File LFS
The aux and history stores are also reduced to 10MB.
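As a back-of-the-envelope check of the resulting bound (assuming one write buffer per store listed above, and one each for the aux and history stores; the exact per-store accounting is an assumption, not something stated by this diff):

```rust
/// Rough upper bound on IndexedLog write-buffer memory under the new defaults.
fn buffer_cap_mb() -> u32 {
    let main_stores = 50 * 3; // file data, tree data, and file LFS stores
    let small_stores = 10 * 2; // aux and history stores (assumed one buffer each)
    main_stores + small_stores
}
```

That works out to roughly 170MB of buffer memory in the worst case, versus an unbounded amount before.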
Reviewed By: DurhamG
Differential Revision: D30948343
fbshipit-source-id: 74e789856ac995a5672b6aefe8a68c9580f69613
Summary:
We periodically need to dereference inodes on NFS because we get no other info
from the kernel on when we should dereference them.
This means the NFS kernel might have references to inodes after we delete them.
An unknown inode number is not a bug on NFS. It's just stale, so the error should
reflect that.
Reviewed By: xavierd
Differential Revision: D30144898
fbshipit-source-id: 3d448e94aea5acb02908ea443bcf3adae80eb975
Summary:
We periodically need to dereference inodes on NFS because we get no other info
from the kernel on when we should dereference them.
This can be disruptive to a user's workflow because open files that were rm'ed
or removed on checkout will no longer have their old content. (On a native
filesystem, or with FUSE, applications that had the file open prior to the
removal would still be able to access it.) For most editors this is not a
problem because they read the file on open (it seems fine for vim and vscode
from testing). However, folks could theoretically have a workflow this does not
jive with.
Let's make it configurable how often this runs, so users can control how
much we disrupt their workflow.
Reviewed By: xavierd
Differential Revision: D30144899
fbshipit-source-id: 59cf5faea70b3aea216ca2bcb45b96e34f5e72b5
Summary:
NFSv3 has no inode invalidation flow built into the protocol. The kernel does not
send us forget messages like we get in FUSE. The kernel also does not send us
notifications when a file is closed. Thus EdenFS cannot easily tell when
all handles to a file have been closed.
As it is now, we never clean up inodes. This is bad for memory & disk usage.
We never unload an inode, so we keep it in memory forever once it's created.
Additionally, we never remove a materialized inode from the overlay. This means
we have unbounded memory and disk usage :/
We need to clean up these inodes at some point. There are a couple of high level
options:
1. Support NFSv4. NFSv4 sends us a close message when a file handle is closed.
This would allow us to actually keep track of reference counts on an inode.
However, this is a lot of work. There are a lot of other things we would have to
support before we could move to NFSv4.
2. Run background inode cleanups.
NFSv4 is probably the right long-term solution. But for now we should be able to
get by with periodic unloads.
I considered a couple of options for unloads:
1. Unload inodes immediately when files are removed.
2. Delay cleaning up inodes until a while after they are removed. (i.e. clean
up inodes n seconds after an `unlink`, `rename`, `rmdir`, or checkout)
3. Run periodic inode unloading. (i.e. once a day unload inodes).
Option 1 feels a bit too hostile to applications that hold files open.
Option 3 means we will build up a lot of cruft over the course of the day, but
is probably the most application friendly.
I decided to try out option 2 first and see if it works well with the common
developer tools. It seems to work (see below), so I am going with it.
This diff only does inode cleanup after checkout. We might want to run inode
cleanup after unlink/rmdir as well, but this would be more expensive. Batch
unloading on checkout seems better to me and should happen frequently enough
to clean up space for people.
There is one known "broken" behavior in this diff. We unload all unlinked
inodes, which means we will erase more inodes than we should. Sometimes EdenFS
crashes or has bugs and unlinks legit inodes. Normally we let those live in the
overlay so we can go in and recover them. My plan to fix this is to mark inodes
for unloading instead of just unloading all unlinked inodes.
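Option 2 can be sketched as a small delay queue: removals are timestamped, and a periodic pass unloads only the entries whose delay has elapsed. All names and structure below are illustrative, not EdenFS's actual implementation:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Queue of unlinked inode numbers awaiting delayed unload.
struct UnloadQueue {
    delay: Duration,
    pending: VecDeque<(Instant, u64)>, // (time of unlink, inode number)
}

impl UnloadQueue {
    fn new(delay: Duration) -> Self {
        UnloadQueue { delay, pending: VecDeque::new() }
    }

    /// Record that an inode was unlinked (or orphaned by a checkout).
    fn mark_unlinked(&mut self, ino: u64, now: Instant) {
        self.pending.push_back((now, ino));
    }

    /// Return the inodes whose delay has elapsed; they are safe to unload.
    fn drain_expired(&mut self, now: Instant) -> Vec<u64> {
        let mut ready = Vec::new();
        while let Some(&(t, ino)) = self.pending.front() {
            if now.duration_since(t) >= self.delay {
                ready.push(ino);
                self.pending.pop_front();
            } else {
                break; // queue is time-ordered, nothing further is expired
            }
        }
        ready
    }
}
```

An application that still holds a file open gets `delay` seconds of grace before its inode disappears, which is the application-friendliness trade-off described above.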
Reviewed By: xavierd
Differential Revision: D30144901
fbshipit-source-id: 345d0c04aa386e9fb2bd40906d6f8c41569c1d05
Summary:
Deleting a non-existing file is fine, and deleting a file when a directory
with the same name exists just ignores the delete.
This diff adds tests to cover these cases. Overall it seems like a bug, but I'm
not sure it's worth fixing - who knows if we have bonsai changesets that rely on
that!
Reviewed By: yancouto
Differential Revision: D30990826
fbshipit-source-id: b04992817469abe2fa82056c4fddac3689559855
Summary:
This method allows appending a value instead of just replacing it.
It will be used in the next diff when we derive manifest for a stack of commits
in one go.
Reviewed By: yancouto
Differential Revision: D30989889
fbshipit-source-id: dd9a574609b4d289c01d6eebcc6f5c76a973a96b
Summary:
The NFS protocol needs to know if a read reached the end-of-file to avoid a
subsequent read and thus reduce the chattiness of the protocol.
On top of avoiding RPC calls, this should also halve the amount of data read
from Mercurial due to the BlobCache freeing the in-memory cached blob when the
FS has read the file in its entirety. This meant that the second READ would
always force the blob to be reloaded from the Mercurial store, which would also
force that blob to be kept in memory until being evicted (due to it not being
fully read).
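The eof computation itself is a one-liner; a minimal sketch (a hypothetical helper, not EdenFS's actual code):

```rust
/// Compute the NFSv3 READ reply's eof flag: true when this read returned
/// the last byte of the file, so the client need not issue another READ.
fn read_reached_eof(offset: u64, bytes_read: u64, file_size: u64) -> bool {
    offset + bytes_read >= file_size
}
```

With eof set in the READ reply, a well-behaved client stops reading once the returned range covers the file, instead of probing with one more READ that returns zero bytes.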
Reviewed By: kmancini
Differential Revision: D30871422
fbshipit-source-id: 8acf4e21ea22b2dfd7f81d2fdd1b137a6e90cc8f
Summary:
Changes:
- Limit simultaneous open git-repo objects to the number of CPUs.
- Put a semaphore limit in place so we wait inside the tokio::task domain instead of the tokio::blocking domain (the latter is more expensive and has a hard upper limit).
Reviewed By: mitrandir77
Differential Revision: D30976034
fbshipit-source-id: 3432983b5650bac6aa5178d98d8fd241398aa682
Summary:
This allows the mononoke_api user to choose whether the skiplists
should be used to speed up the ancestry checks or not.
The skiplist crate is already prepared for the situation where skiplist
entries are missing, and falls back to traversing the graph in that case.
Reviewed By: yancouto
Differential Revision: D30958909
fbshipit-source-id: 7773487b78ac6641fa2a427c55f679b49f99ac8d
Summary:
Allow the mononoke_api user to choose whether they want
operations to be sped up using WBC or not.
Reviewed By: yancouto
Differential Revision: D30958908
fbshipit-source-id: 038cf77735e7c655f6801d714762e316b6817df5
Summary:
Some crates like mononoke_api depend on the warm bookmark cache to speed up
bookmark operations. This prevents them from being used in cases requiring
quick, low-overhead startup, like CLIs.
This diff makes it possible to swap out the warm bookmark cache for an
implementation that doesn't cache anything. (See the next diffs for how it's
used in the mononoke_api crate.)
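The shape of the change can be sketched as a trait with two implementations: the existing warm cache and a pass-through that always consults the backing store. All names below are illustrative stand-ins, not the real mononoke types:

```rust
use std::collections::HashMap;

/// Minimal stand-in for the bookmark lookup interface.
trait BookmarksCache {
    fn get(&self, name: &str) -> Option<String>; // bookmark -> changeset id
}

/// Warm cache: serves reads from an in-memory map kept warm in the background.
struct WarmCache {
    warm: HashMap<String, String>,
}

impl BookmarksCache for WarmCache {
    fn get(&self, name: &str) -> Option<String> {
        self.warm.get(name).cloned()
    }
}

/// No-op "cache": every lookup goes straight to the source of truth. Cheap to
/// construct, so it suits short-lived CLI processes that can't pay warm-up cost.
struct NoopCache {
    storage: HashMap<String, String>, // stands in for the bookmarks store
}

impl BookmarksCache for NoopCache {
    fn get(&self, name: &str) -> Option<String> {
        self.storage.get(name).cloned() // no caching, direct read
    }
}
```

A CLI would construct the pass-through variant and skip the warm-up entirely, while the server keeps the warm implementation.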
Reviewed By: yancouto
Differential Revision: D30958910
fbshipit-source-id: 4d09367217a66f59539b566e48c8d271b8cc8c8e
Summary:
This method was added before the more generic list method existed. Let's get
rid of it for simplicity and to discourage listing all the bookmarks.
Reviewed By: yancouto
Differential Revision: D30958911
fbshipit-source-id: f4518da3f34591c313657161f69af96d15482e6c
Summary:
0.4.24 is incompatible with crates that use `deny(warnings)` on a compiler 1.55.0 or newer.
Example error:
```
error: unused borrow that must be used
--> common/rust/shed/futures_ext/src/stream/return_remainder.rs:22:1
|
22 | #[pin_project]
| ^^^^^^^^^^^^^^
|
= note: this error originates in the derive macro `::pin_project::__private::__PinProjectInternalDerive` (in Nightly builds, run with -Z macro-backtrace for more info)
```
The release notes for 0.4.28 call out this issue. https://github.com/taiki-e/pin-project/releases/tag/v0.4.28
Reviewed By: krallin
Differential Revision: D30858380
fbshipit-source-id: 98e98bcb5a6b795b93ed1efd706a1711f15c57db
Summary:
Move optional line handling logic into a separate function and simplify.
This diff is intended to be a pure refactoring with no observable changes in behavior. In particular, all the code dealing with the "optional" list appears to be dead code because if the line is optional, linematch will return "retry", so that branch is never reachable.
Reviewed By: DurhamG
Differential Revision: D30849757
fbshipit-source-id: 17283f9217466b3f85d913da66222b9a6779abe4
Summary:
This line was iterating over a list of files and looking each one up in the
manifest. This results in serial manifest reads, which can cause serial
network requests.
Let's instead use manifest.matches() to test them all at once via the underlying
BFS, which does bulk fetching.
Differential Revision: D30938359
fbshipit-source-id: 1af7d417288b82efdd537a4afeaf93c1b55eaf49
Summary:
Demonstrate issues with the vertex-to-path resolution. Basically, the
vertex-to-path resolution logic did not check whether the "parent of merge"
being used is actually valid (i.e. an ancestor of the provided heads) or not.
Reviewed By: DurhamG
Differential Revision: D30911150
fbshipit-source-id: 83d215910d5ba67ac0d5749927018a7aefcc6730
Summary:
The tree metadata fetching evolution goes as follows:
(1) (commit, path) scs query
(2) tree manifest scs query [we are here]
(3) eden api manifest query [in development]
Option (1) is no longer used and is the only place that required the scs proxy hash.
Removing it will simplify the transition from (2) to (3) and also cleans up a bunch of unused code.
It also comes with a minor performance improvement, saving about 5% on file access time.
To be precise, this is measured by running fsprobe [the gain is probably too small to measure in a high-noise benchmark like running arc focus]:
```
fsprobe.sh run cat.targets --parallel 24
```
Results:
```
W/ scshash:
P24: 0.1044 0.1007 0.1005 (hot) 0.1019 avg
W/o scshash:
P24: 0.0954 0.0964 0.1008 (hot) 0.0975 avg
```
This performance improvement comes from the fact that even though the scs hash was never created or used, we still attempted to load it from the scs table, and even though this load always failed, it contributed to execution time.
Reviewed By: xavierd
Differential Revision: D30942663
fbshipit-source-id: af84f1e5658e7d8d9fb6853cbb88f02b49cd050b
Summary: File access latency can actually be less than 1 ms, so it's good to show more digits
Reviewed By: DurhamG
Differential Revision: D30942905
fbshipit-source-id: 2fc8d48dbc08c55b89d829d1474ae11c2c3df1c3
Summary:
Since fsprobe itself requires a 'plan' to run, we need a separate script to standardize the list of plans we think are relevant.
This script allows generating fsprobe plans and running them.
Reviewed By: DurhamG
Differential Revision: D30908892
fbshipit-source-id: eb722fe1f6d982e42b66614f08bc73345e04f9e6