zed/crates/project_panel
Max Brunsfeld f128cf4a33
Defer scanning some worktree subdirectories until they are expanded in the project panel (#2622)
Closes
https://linear.app/zed-industries/issue/Z-352/high-memory-usage-from-fs-scanning-if-project-contains-symlinks-that

### Background

Currently, when you open a project, Zed eagerly scans the directory,
building an in-memory representation of all of the files and directories
within. This scanning includes all git-ignored files and follows any
symlinks. When any directory changes on disk, Zed recursively rescans it
in order to keep its in-memory representation up-to-date. When
collaborating, all of these files are replicated to all guests.

Right now, there are some performance problems associated with the
maintenance of this filesystem state:
* For various reasons, some projects contain symlinks that point out to
large folders like `$HOME`, which itself contains many symlinks that
point to the same large directory. When these projects are opened, the
worktree scans endlessly, using more and more memory.
* Some git-ignored directories (like `target` in a rust project) contain
*many* more files than are actually tracked in the git repository. These
files often change as a result of saving, (e.g. because the compiler
runs). Maintaining in memory all of these paths isn't useful to the
user, and causes significant CPU usage on every save. Most importantly,
when collaborating sending all of these changes to guests can be slow,
and can delay all other RPC messages.

### Change

This PR changes the worktree's filesystem-scanning logic to be *lazy*
about scanning two types of directories:
* git ignored directories
* "external" directories (those that are canonically located outside of
the worktree root, but accessed via symlinks)

The laziness works as follows. When, during a recursive scan, a
directory is found that falls into one of the above 2 categories, that
directory is marked as "unloaded". The directory might later be scanned,
if some explicit operation is performed within it (like opening a
buffer, or creating a file), if any collaborator expands that directory
in their project panel, or if an LSP requests that it be watched.

### Results

When collaborating on the `zed` folder:

| metric | before | after |
|-------|--------|------|
| # `worktree_entries` in collab db initially | 154,763 |  77,679 |
| # `worktree_entries` in collab db after 5 saves | 181,952 | 77,679
(nothing new to scan) |
| app memory footprint (host) | 260MB | 228.5 MB  |

The db thing is a win, because reading and writing to the
`worktree_entries` table is one of the most expensive thing that the
`collab` server does.

There's also generally lower background CPU usage after every save,
because we don't need to recursively rescan directories inside of
`target`.

### Limitations

We still end up scanning some unnecessary directories (like
`target/debug/build/zed-b612db829aeac16e/out`) because the LSP instructs
us to watch those.

### To do:

* [x] Expand parent directories of any path opened via LSP
* [x] Avoid creating orphaned entries when FS events happen inside of
unscanned directories
* [x] Scan any newly-non-ignored directories after gitignore changes
* [x] Emit correct events for newly-discovered paths when expanding dirs
* [x] GC the set of expanded directory ids when dirs are removed
* [x] Don't include "external" entries in file-finder
* [x] Expand any directories watched by LSP
* [ ] manual testing and profiling

### Release Notes:

- Fixed a bug where Zed would use excessive memory when a project folder
contained symlinks pointing to directories outside of the project.
- Reduced Zed's memory and CPU usage when working in folders containing
many git-ignored files.
2023-06-27 17:07:23 -07:00
..
src Defer scanning some worktree subdirectories until they are expanded in the project panel (#2622) 2023-06-27 17:07:23 -07:00
Cargo.toml Merge remote-tracking branch 'origin/main' into panels 2023-05-23 08:24:28 +02:00