sapling/lib/pathmatcher
Jun Wu 3ffa0f28e2 gitignore: avoid quadratic behavior
Summary:
The correct gitignore matcher needs O(N^2) time to check a path which is N
directory deep. For example, to check "a/b/c/d", it needs to check:

  - Whether .gitignore matches a/b/c/d
  - Whether a/.gitignore matches b/c/d
  - Whether a/b/.gitignore matches c/d
  - Whether a/b/c/.gitignore matches d

  - Whether .gitignore matches a/b/c
  - Whether a/.gitignore matches b/c
  - Whether a/b/.gitignore matches c

  - Whether .gitignore matches a/b
  - Whether a/.gitignore matches b

  - Whether .gitignore matches a

It might not look that bad because N=4 for the above example. But when N is
larger (ex. node_modules/../node_modules/../node_modules/..), things get much
worse.

This patch adds "caching" about whether a directory is ignored or not. For
example, if "a/b/" is ignored, the new code would skip checking subdirectories
(ex. "a/b/c/"). The time complexity is now roughly O(N) gitignore tests instead
of O(N^2), since we only did a gitignore check for a parent directory of a path
being tested once, and then cache the parent directory result in a boolean
value.

To be clear, for the first time checking a path which is not ignored, it still
needs O(N^2) for initializing the trees. But once it's initialized, the next
time checking a file in a same directory, will be O(N).

`LruCache` is replaced by `HashMap` since it does not support `.get` and the
code needs that to work.

The perf issue was previously documented as a "PERF" comment.
This diff removes it.

Reviewed By: DurhamG

Differential Revision: D7496058

fbshipit-source-id: f10895b8f0d7dcdde6faf9daeec5cd78a1f15a2b
2018-04-13 21:51:48 -07:00
..
src gitignore: avoid quadratic behavior 2018-04-13 21:51:48 -07:00
Cargo.toml gitignore: avoid quadratic behavior 2018-04-13 21:51:48 -07:00