sapling/eden/mononoke/blobrepo
Thomas Orozco 0b083a74b1 mononoke/blobrepo_hg: optimize case conflict check performance
Summary:
Our case conflict checking is very inefficient on large changesets. The root
cause is that we traverse the parent manifest for every single file we are
modifying in the new changeset.

This results in very poor performance on large changes, since we end up
reparsing manifests and doing case comparisons far more often than necessary. In
some pathological cases, it results in us taking several *minutes* to do a case
conflict check, with all of that time being spent on CPU lower-casing strings
and deserializing manifests.
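To illustrate why the per-file traversal is expensive, here is a minimal Rust sketch of a single-pass alternative. It is not the Mononoke implementation. It assumes the parent manifest has already been flattened into a list of `/`-separated path strings, ignores deletions, and uses Rust's Unicode `str::to_lowercase` as the case-folding rule. The helper names (`prefixes`, `build_index`, `find_case_conflict`) are hypothetical. The parent paths and their directory prefixes are lowercased and indexed once, so each added path costs one hash lookup per path component instead of a full walk of the parent manifest.

```rust
use std::collections::HashMap;

/// Hypothetical helper: every prefix of a `/`-separated path,
/// e.g. "a/b/c" -> ["a", "a/b", "a/b/c"]. Directories can conflict too.
fn prefixes(path: &str) -> Vec<&str> {
    let mut out = Vec::new();
    for (i, c) in path.char_indices() {
        if c == '/' {
            out.push(&path[..i]);
        }
    }
    out.push(path);
    out
}

/// Hypothetical helper: index every parent path and directory prefix by its
/// lowercased form. This walks the (flattened) parent manifest exactly once.
fn build_index<'a>(paths: &[&'a str]) -> HashMap<String, &'a str> {
    let mut index = HashMap::new();
    for &path in paths {
        for prefix in prefixes(path) {
            index.entry(prefix.to_lowercase()).or_insert(prefix);
        }
    }
    index
}

/// Hypothetical check: returns the first (new, existing) pair of paths that
/// differ only in case, or None. Deletions are ignored in this sketch.
fn find_case_conflict<'a>(
    parent_paths: &[&'a str],
    added: &[&'a str],
) -> Option<(&'a str, &'a str)> {
    let mut index = build_index(parent_paths);
    for &path in added {
        for prefix in prefixes(path) {
            let key = prefix.to_lowercase();
            if let Some(&existing) = index.get(&key) {
                // Same lowercased form but different spelling: a case conflict.
                if existing != prefix {
                    return Some((prefix, existing));
                }
            } else {
                // Record new paths so conflicts within the changeset are caught.
                index.insert(key, prefix);
            }
        }
    }
    None
}
```

For example, `find_case_conflict(&["Docs/a.md"], &["docs/b.md"])` returns `Some(("docs", "Docs"))` because the two directories differ only in case. Added paths are also inserted into the index, so two new files such as `README` and `readme` in the same changeset are reported as well.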

This check runs after all the data for a commit has been uploaded, so it is
pure overhead added to the push process (note, however, that it is not part of
the pushrebase critical section).

I ended up looking at this issue because it is contributing to the high
latencies we are seeing in commit cloud right now. Some of the bundles I
checked had 300+ seconds of on-CPU time being spent to check for case
conflicts. The hope is that this change will make these pathological cases
rarer, and that it will either fix the latency outright or make the remaining
instances easier to root-cause.

This is pretty easy to repro.

I added a binary that runs case conflict checks on an arbitrary commit, and
tested it on `38c845c90d59ba65e7954be001c1eda1eb76a87d` (a commit that I noted
was slow to ingest in commit cloud, despite all its data being present already,
meaning it was basically a no-op). The old code takes ~3 minutes. The new one
takes a second.

I also backtested this by rigging up the hook tailer to do case conflict checks
instead (P145550763). The new code runs at about the same speed for most commits
(perhaps marginally slower on some, but we're talking microseconds here), but
for some pathological commits, it is indeed much faster.

This notably revealed one interesting case:

473b6e21e910fcdf7338df66ee0cbeb4b8d311989385745151fa7ac38d1b46ef (~8K files)
took 118329us in the new code (~0.1s), and 86676677us in the old (~87 seconds).

There are also commits with more files in recent history, but they're
deletions, so they are just as fast in both (< 0.1 s).

Reviewed By: StanislavGlebik

Differential Revision: D24305563

fbshipit-source-id: eb548b54be14a846554fdf4c3194da8b8a466afe
2020-10-15 09:49:39 -07:00
blobrepo_hg mononoke/blobrepo_hg: optimize case conflict check performance 2020-10-15 09:49:39 -07:00
blobsync mononoke: update Memblob::new callsites to ::default() 2020-10-07 12:11:10 -07:00
changeset_fetcher segmented_changelog: add on-demand updating dag implementation 2020-09-02 17:20:42 -07:00
common Update formatter to rustfmt 2.0 2020-09-09 07:52:33 -07:00
errors mononoke/types: indicate what path conflicted in a case conflict 2020-10-15 09:49:39 -07:00
factory mononoke: update Memblob::new callsites to ::default() 2020-10-07 12:11:10 -07:00
override blobrepo: move ChangesetFetcher to attributes 2020-09-02 17:20:41 -07:00
repo_blobstore mononoke: remove assert_present from Blobstore trait 2020-10-01 01:23:52 -07:00
src Update formatter to rustfmt 2.0 2020-09-09 07:52:33 -07:00
test mononoke/blobrepo_hg: optimize case conflict check performance 2020-10-15 09:49:39 -07:00
Cargo.toml move existing changeset derivation logic to mercurial_derived_data 2020-09-09 07:56:32 -07:00