sapling/eden/fs
Katie Mancini ffa558bf84 implement inode unloading after checkout
Summary:
NFSv3 has no inode invalidation flow built into the protocol. The kernel does not
send us forget messages like it does with FUSE. The kernel also does not send us
notifications when a file is closed. Thus EdenFS cannot easily tell when
all handles to a file have been closed.

As it stands, we never clean up inodes. This is bad for memory and disk usage.
We never unload an inode, so it stays in memory once it's created.
Additionally, we never remove a materialized inode from the overlay. This means
we have unbounded memory and disk usage :/

We need to clean up these inodes at some point. There are a couple of high-level
options:
1. Support NFSv4. NFSv4 sends us a close message when a file handle is closed.
This would allow us to actually keep track of reference counts on an inode.
However, this is a lot of work. There are a lot of other things we would have to
support before we can move to NFSv4.
2. Run background inode cleanups.

NFSv4 is probably the right long-term solution. But for now we should be able to
get by with periodic unloads.

I considered a couple of options for unloads:
1. Unload inodes immediately when files are removed.
2. Delay cleaning up inodes until a while after they are removed. (i.e. clean
up inodes n seconds after an `unlink`, `rename`, `rmdir`, or checkout)
3. Run periodic inode unloading. (i.e. once a day unload inodes).

Option 1 feels a bit too hostile to applications that hold files open.
Option 3 means we will build up a lot of cruft over the course of the day, but it is
probably the most application friendly.

I decided to try out option 2 first and see if it works well with the common
developer tools. It seems to work (see below) so I am going with it.
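
To make option 2 concrete, here is a minimal sketch of delayed cleanup. It is
not the EdenFS implementation: `DelayedUnloadQueue`, `schedule`, and `pop_due`
are hypothetical names, time is passed in explicitly to keep the example
deterministic, and inodes are plain integers.

```python
import heapq


class DelayedUnloadQueue:
    """Hypothetical helper: holds inodes until a delay has passed since removal."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        # Min-heap of (due_time, inode_number), earliest due time first.
        self._pending = []

    def schedule(self, inode_number, now):
        """Record that an inode was removed at time `now`."""
        heapq.heappush(self._pending, (now + self.delay_seconds, inode_number))

    def pop_due(self, now):
        """Return inodes whose delay has elapsed, removing them from the queue."""
        due = []
        while self._pending and self._pending[0][0] <= now:
            _, inode_number = heapq.heappop(self._pending)
            due.append(inode_number)
        return due


queue = DelayedUnloadQueue(delay_seconds=10)
queue.schedule(42, now=0)   # e.g. removed by an unlink
queue.schedule(43, now=5)   # e.g. removed by a checkout
print(queue.pop_due(now=9))   # nothing is old enough yet
print(queue.pop_due(now=10))  # inode 42 has waited the full delay
```

The delay is what gives applications holding a file open some time to finish
before the inode goes away.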

This diff only does inode cleanup after checkout. We might want to run inode
cleanup after unlink/remove dir as well, but this would be more expensive.
Batch unloading on checkout seems better to me and should happen
frequently enough to clean up space for people.

There is one known "broken" behavior in this diff. We unload all unlinked
inodes, which means we will erase more inodes than we should. Sometimes EdenFS
crashes or hits a bug and unlinks legitimate inodes. Normally we let those live in the
overlay so we can go in and recover them. My plan to fix this is to mark inodes
for unloading instead of just unloading all unlinked inodes.
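
A rough sketch of that marking idea, under assumptions: `OverlaySketch` and its
methods are hypothetical and do not mirror the real overlay code. Only inodes
that were explicitly marked get unloaded, so an inode unlinked by a crash or
bug stays behind and can still be recovered.

```python
class OverlaySketch:
    """Hypothetical model of an overlay that tracks unlinked and marked inodes."""

    def __init__(self):
        self.unlinked = set()  # every inode that is currently unlinked
        self.marked = set()    # subset that is safe to unload

    def unlink(self, inode_number, mark_for_unload):
        """Record an unlink; mark it only when the removal was intentional."""
        self.unlinked.add(inode_number)
        if mark_for_unload:
            self.marked.add(inode_number)

    def unload_marked(self):
        """Unload only marked inodes and return them in sorted order."""
        removed = sorted(self.marked)
        self.unlinked -= self.marked
        self.marked.clear()
        return removed


overlay = OverlaySketch()
overlay.unlink(7, mark_for_unload=True)   # intentional removal
overlay.unlink(8, mark_for_unload=False)  # unlinked by a crash or bug
print(overlay.unload_marked())  # only the marked inode is unloaded
print(overlay.unlinked)         # the unmarked inode is still recoverable
```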

Reviewed By: xavierd

Differential Revision: D30144901

fbshipit-source-id: 345d0c04aa386e9fb2bd40906d6f8c41569c1d05
2021-09-16 14:35:04 -07:00
..
benchharness config: namespace facebook::eden 2021-06-08 19:29:37 -07:00
benchmarks benchmarks: add HgImportRequestQueue::dequeue benchmark 2021-08-26 12:28:51 -07:00
cli raise indulde dot files to a command arg 2021-09-14 10:02:33 -07:00
cli_rs third-party/rust: bump all the tracing packages 2021-09-15 16:52:25 -07:00
config implement inode unloading after checkout 2021-09-16 14:35:04 -07:00
docs Fix typo in inode documentation 2021-07-30 15:27:59 -07:00
fuse assign FS events to sampling groups 2021-09-08 11:40:22 -07:00
inodes implement inode unloading after checkout 2021-09-16 14:35:04 -07:00
journal Remove direct uses of gmock.h 2021-07-07 13:32:31 -07:00
model add an ObjectId type 2021-09-13 17:21:01 -07:00
monitor migrate from LockedPtr::getUniqueLock 2021-06-13 18:53:58 -07:00
nfs add inode number to NFS trace event 2021-09-14 10:44:46 -07:00
notifications notifications: support Windows 2020-11-11 09:37:56 -08:00
prjfs windows: invalidate negative path cache during start 2021-09-09 10:48:53 -07:00
py suppress errors in fbcode/eden - batch 1 2021-08-24 14:30:57 -07:00
rocksdb Remove dead includes in eden 2019-10-11 16:45:01 -07:00
scripts move eden/scripts/ into eden/fs/ 2020-11-04 18:29:49 -08:00
service implement inode unloading after checkout 2021-09-16 14:35:04 -07:00
sqlite overlay: use PersistentSqliteStatement in TreeOverlayStore 2021-03-15 12:01:48 -07:00
store deprecate scs proxy hash 2021-09-14 19:52:15 -07:00
takeover Remove direct uses of gmock.h 2021-07-07 13:32:31 -07:00
telemetry log FS trace events with HiveLogger 2021-09-02 10:32:03 -07:00
testharness rename Hash to Hash20 2021-09-08 16:27:10 -07:00
third-party fs: update fuse_kernel_linux.h 2021-03-17 20:55:43 -07:00
utils utils: mark SpawnedProcess as being waited when waitpid fails 2021-09-13 20:00:45 -07:00
CMakeLists.txt nfs: make it compile with getdeps builds 2021-02-03 17:54:54 -08:00