sapling/eden/mononoke/manifest/src
Stanislau Hlebik 35a7998c07 mononoke: add a method to derive a simple stack of manifests
Summary:
Background: I've been looking into derived data performance and found that
while overall performance is good, it depends quite a lot on the blobstore
latency, i.e. the higher the latency, the slower the derivation. What's worse,
increasing blobstore latency by as little as 100ms might increase the
derivation time for 100 commits from 12 to 65 secs! [1]

However, we have ways to mitigate it:
* **Option 1** "backfill" mode makes derived data derivation less
sensitive to the put() latency.
* **Option 2** "parallel" mode makes derived data derivation less
sensitive to the get() latency.

We can use "backfill" mode for almost all derived data types (the only exception
is filenodes), however "parallel" mode is only enabled for a few derived data
types (e.g. fsnodes, skeleton manifests, filenodes).

In particular, we didn't have a way to do batch derived data derivation for
unodes, and so unodes derivation might get quite sensitive to the blobstore
get() latency. So this diff tries to address that.

I considered three options:
* **Option 1** The simplest option of implementing "parallel" mode for unodes is to just
do a unode warmup before we start a sequential derivation for a stack of commits. After the
warmup all necessary entries should be in cache, so derivation should be less latency sensitive.
This could work, but it has a few disadvantages, namely:
  * We do an additional traversal - not the end of the world, but it might get
    expensive for large commits.
  * We might fetch large directories that don't fit in cache more often than we
    need to.

That said, because of its simplicity it might be a reasonable option to keep
in mind, and I might get back to it later.

* **Option 2** Do a derivation for a stack of commits. We have a function to derive a
manifest for a single commit, but we could write a similar function to derive the whole stack at once.
That means for each changed file or directory we generate not a single change
but a stack of changes.
I was able to implement it, but the code was too complicated. There were quite
a few corner cases (particularly when a file was replaced with a directory, or
when deriving a merge commit), and dealing with all of them was a pain.
Moreover, we need to make sure it works correctly in all scenarios, and that
wouldn't be an easy thing to do.

* **Option 3** Do a derivation for a "simple" stack of commits. That's basically the
simplified version of option #2. Let's allow doing batch derivation only for
stacks that have no
a) merges
b) path changes that are ancestors of each other (which cause file/dir
conflicts).

This implementation is significantly simpler than option #2, and it should
cover most of the cases and hopefully bring perf benefits (though this is
something I have yet to measure). So this is what this diff implements.
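
To make the "simple" stack conditions from option #3 concrete, here is a
minimal sketch of such a check. It is not the code from this diff: the `Commit`
struct and `is_simple_stack` function are hypothetical stand-ins, paths are
plain `/`-separated strings, and the sketch assumes the ancestor check applies
to the changed paths of the whole stack taken together.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a commit in the stack (not a Mononoke type).
pub struct Commit {
    /// Number of parents; more than one means a merge commit.
    pub parent_count: usize,
    /// Paths changed by this commit, '/'-separated, e.g. "dir/file".
    pub changed_paths: Vec<String>,
}

/// Returns true if the stack satisfies both "simple" conditions:
/// a) it contains no merges;
/// b) no changed path is an ancestor of another changed path.
pub fn is_simple_stack(stack: &[Commit]) -> bool {
    // a) Reject the stack if any commit is a merge.
    if stack.iter().any(|c| c.parent_count > 1) {
        return false;
    }

    // Assumption: collect the changed paths of all commits in the stack.
    let paths: BTreeSet<&str> = stack
        .iter()
        .flat_map(|c| c.changed_paths.iter().map(String::as_str))
        .collect();

    // b) For every path, walk up its parent directories and reject the
    // stack if any of them is also a changed path (file/dir conflict).
    for path in &paths {
        let mut current: &str = path;
        while let Some(idx) = current.rfind('/') {
            current = &current[..idx];
            if paths.contains(current) {
                return false;
            }
        }
    }
    true
}
```

In this sketch a stack that changes both `a` and `a/b/c` is rejected, while one
that changes `a-b` and `a/b` is accepted, since neither path is a parent
directory of the other. A rejected stack would presumably fall back to the
existing one-commit-at-a-time derivation.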

Reviewed By: yancouto

Differential Revision: D30989888

fbshipit-source-id: 2c50dfa98300a94a566deac35de477f18706aca7
2021-10-13 08:48:27 -07:00
bonsai.rs bounded_traversal: require futures to be boxed 2021-03-12 08:12:57 -08:00
derive_batch.rs mononoke: add a method to derive a simple stack of manifests 2021-10-13 08:48:27 -07:00
derive.rs Increase available CPU parallelism in Mercurial derived data 2021-06-07 01:51:36 -07:00
implicit_deletes.rs mononoke/blobstore: make Blobstore generic over lifetime 2020-11-20 05:51:52 -08:00
lib.rs mononoke: add a method to derive a simple stack of manifests 2021-10-13 08:48:27 -07:00
ops.rs bounded_traversal: require futures to be boxed 2021-03-12 08:12:57 -08:00
ordered_ops.rs bounded_traversal: require futures to be boxed 2021-03-12 08:12:57 -08:00
select.rs admin: add --ordered to skeleton manifest tree command 2021-02-02 09:00:17 -08:00
tests.rs mononoke: add a method to derive a simple stack of manifests 2021-10-13 08:48:27 -07:00
types.rs mononoke: add new PathTree insert_and_merge method 2021-09-16 13:58:03 -07:00