Summary:
Background: I've been looking into derived data performance and found that
while overall performance is good, it depends quite a lot on the blobstore
latency, i.e. the higher the latency, the slower the derivation. What's worse,
increasing blobstore latency by even 100ms can increase the time to derive
100 commits from 12 to 65 secs! [1]
However, we have ways to mitigate it:
* **Option 1** "backfill" mode makes derived data derivation less
sensitive to put() latency.
* **Option 2** "parallel" mode makes derived data derivation less
sensitive to get() latency.
We can use "backfill" mode for almost all derived data types (the only exception is
filenodes), however "parallel" is only enabled for a few derived data types (e.g.
fsnodes, skeleton manifests, filenodes).
In particular, we didn't have a way to do batch derived data derivation for
unodes, and so unodes derivation might get quite sensitive to the blobstore
get() latency. So this diff tries to address that.
I considered three options:
* **Option 1** The simplest way of implementing "parallel" mode for unodes is to just
do a unode warmup before we start a sequential derivation for a stack of commits. After the
warmup all necessary entries should be in cache, so derivation should be less latency sensitive.
This could work, but it has a few disadvantages, namely:
* We do additional traversal - not the end of the world, but it might get
expensive for large commits
* We might fetch large directories that don't fit in cache more often than we
need to.
That said, because of its simplicity it might be a reasonable option to keep
in mind, and I might get back to it later.
* **Option 2** Do a derivation for a stack of commits. We have a function to derive a
manifest for a single commit, but we could write a similar function to derive the whole stack at once.
That means for each changed file or directory we generate not a single change
but a stack of changes.
I was able to implement it, but the code was too complicated. There were quite
a few corner cases (particularly when a file was replaced with a directory, or
when deriving a merge commit), and dealing with all of them was a pain.
Moreover, we need to make sure it works correctly in all scenarios, and that
wouldn't be an easy thing to do.
* **Option 3** Do a derivation for a "simple" stack of commits. That's basically a
simplified version of option #2. Let's allow batch derivation only for
stacks that have no:
a) merges
b) path changes that are ancestors of each other (which cause file/dir
conflicts).
This implementation is significantly simpler than option #2, and it should
cover most of the cases and hopefully bring perf benefits (though that's
something I have yet to measure). So this is what this diff implements; a rough
sketch of the "simple stack" check is below.
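For illustration only, assuming each commit exposes its parent count and its list of changed paths (the names and types here are made up, not the actual Mononoke API):
```
use std::collections::HashSet;

// Illustrative commit representation: just its parent count and changed paths.
struct StackCommit {
    parent_count: usize,
    changed_paths: Vec<String>,
}

// Returns true if the stack qualifies for batch derivation: no merges, and no
// changed path that is an ancestor directory of another changed path.
fn is_simple_stack(stack: &[StackCommit]) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    for commit in stack {
        if commit.parent_count > 1 {
            return false; // merge commit
        }
        for path in &commit.changed_paths {
            seen.insert(path.as_str());
        }
    }
    // Quadratic check is fine for a sketch; real code would use a sorted walk.
    let paths: Vec<&str> = seen.into_iter().collect();
    for &a in &paths {
        for &b in &paths {
            if a != b && b.starts_with(a) && b.as_bytes().get(a.len()) == Some(&b'/') {
                return false; // `a` is an ancestor directory of `b`
            }
        }
    }
    true
}
```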
Reviewed By: yancouto
Differential Revision: D30989888
fbshipit-source-id: 2c50dfa98300a94a566deac35de477f18706aca7
Summary: I'd like to reuse it in the next diff, so let's move it to a separate function
Reviewed By: yancouto
Differential Revision: D31603911
fbshipit-source-id: 69ec0553022aabe662f75a50321423c54aafd196
Summary:
We don't really need it, and InnerRepo is more expensive to create, so let's
not do that.
Reviewed By: yancouto
Differential Revision: D31602266
fbshipit-source-id: 269bf50f8b4ee99e22888d91af2ac392078d32fa
Summary:
We were `hg adding` all files that were modified, however, we only really need to add files that didn't exist before the snapshot changes.
This gets rid of the annoying "already tracked!" messages.
Reviewed By: markbt
Differential Revision: D31438120
fbshipit-source-id: ca3545bc5881dcf01283abfa5ec9eca6309ff607
Summary: Updated library.sh timeout handling to be clearer by checking the time change directly, and to use the health_check endpoint.
Reviewed By: yancouto
Differential Revision: D31536407
fbshipit-source-id: 43cdf4260c10dfd4d3097dcd92d071c3d18b18f8
Summary:
We don't actually use bundle2 to deliver manifests anymore. Everything
is done via on-demand fetching, which won't have the problem covered in this
test. So let's delete it.
Reviewed By: quark-zju
Differential Revision: D31309313
fbshipit-source-id: 312508fa1b5e903314b92c048d23525c2194ab91
Summary:
This is part of removing filepeer. I also enabled treemanifest and
modernclient (i.e. lazy changelog) on a few tests.
Reviewed By: quark-zju
Differential Revision: D31032055
fbshipit-source-id: 6822274ad07303ed86b2ee5dd4e09979f1e215d5
Summary:
This test covers 'hg clone -r REV' which isn't really a supported clone
case since we now depend on cloning particular remote bookmarks. The test is
also heavily dependent on revision numbers. So let's just delete it.
This is needed as part of removing the tests' dependency on hg server logic like
filepeer.
Reviewed By: quark-zju
Differential Revision: D31309312
fbshipit-source-id: e4620186e3eda3114686de36d06710747439ae18
Summary:
Ports test-bundle.t to use modernclient configs, meaning it no longer
uses filepeer and instead uses treemanifest and lazychangelog. I deleted the
parts of the test that depend on mounting a bundle file on top of a repo, since
that logic is unused in production.
Reviewed By: quark-zju
Differential Revision: D31309310
fbshipit-source-id: a535ed9a21253fd258f70088e7436480957afb2a
Summary:
We need to pass the file content blob with rename header to eagerepo,
not the one without the header. This resulted in a SHA1 mismatch assertion when
trying to make test-bundle.t use eagerepo.
Reviewed By: quark-zju
Differential Revision: D31309314
fbshipit-source-id: afaf3af3423b3f3006c1a95ddbf0da20056d9581
Summary: This is a step towards moving test-bundle.t to modernclient configs.
Reviewed By: quark-zju
Differential Revision: D31309318
fbshipit-source-id: 9680e85031797088d624a6c85e2d7316b108818e
Summary:
This test depends on a precomputed, legacy-formatted, checked-in bundle
file. Bundle files are on their way to being replaced with commit cloud, and we
don't use a lot of the features in this test anyway (branches, merges), so let's
delete this test.
Reviewed By: quark-zju
Differential Revision: D31309316
fbshipit-source-id: b5729bed33a8c84fa75792528630d99dbc1996be
Summary:
Add a way to test EdenAPI endpoints.
The goal is to deduplicate functionality with the `make_req` and `read_res` tools and make it easier to
write new edenapi endpoints (no need to update `make_req` or `read_res`).
Reviewed By: DurhamG, yancouto
Differential Revision: D31465801
fbshipit-source-id: 5127941d0820ce737a4958a1d124f420acbaf771
Summary: Add a binding so Python code can use the Rust pprint implementation.
Reviewed By: DurhamG
Differential Revision: D31465826
fbshipit-source-id: b8f49f0d0587f82fae5906577acda95a09953e69
Summary:
I found that I need to "print" a value that might contain bytes and SHA1
hashes. JSON cannot do this job well because it does not support bytes.
Rust debug print can be too verbose.
Originally I tried:
- Python json: Not easily extendable to support binary data.
- Python repr: Not pretty.
- Python pprint: Lots of complexity on line wrapping, not easy to extend.
I ended up with an ad-hoc version of pprint (in D31465801):
```
# Similar to pprint, but much simpler (no textwrap) and rewrites
# binary nodes to hex(hexnode).
def format(o, indent, sort=sort):
    if isinstance(o, bytes) and len(o) == 20:
        yield 'bin("%s")' % hex(o)
    elif isinstance(o, (bytes, str, bool, int)) or o is None:
        yield repr(o)
    elif isinstance(o, list):
        yield "["
        if sort:
            o = sorted(o, key=lambda o: repr(o))
        yield from formatitems(o, indent + 1)
        yield "]"
    elif isinstance(o, tuple):
        yield "("
        yield from formatitems(o, indent + 1)
        yield ")"
    elif isinstance(o, dict):
        yield "{"

        def fmt(kv, indent=indent + 1):
            k, v = kv
            kfmt = "".join(format(k, indent + 1)) + ": "
            yield kfmt
            yield from format(v, indent + len(kfmt))

        items = sorted(o.items())
        yield from formatitems(items, indent + 1, fmt)
        yield "}"
    else:
        yield "?"


def formatitems(items, indent, fmt=None):
    if fmt is None:
        fmt = lambda o: format(o, indent)
    total = len(items)
    for i, o in enumerate(items):
        if i > 0:
            yield "\n"
            yield " " * indent
        yield from fmt(o)
        if i + 1 < total:
            yield ","
```
Later I found I need this feature in Rust too (D31465805 and D31465802).
So I translated the above Python code to Rust.
The Python syntax means the printed content can be copy-pasted to
Python files to form some quick scripts. Python code can also use
`eval` or `ast.literal_eval` (for safety, but no `bin` support) to
deserialize pprint output.
Reviewed By: DurhamG
Differential Revision: D31465797
fbshipit-source-id: ef4d17df84590075f74a0298ac89f4a963d8ed3c
Summary:
Serde deserializer is usually more permissive than restrictive. Make it a bit
more permissive in `deserialize_bytes` when we see a string.
Practically this means the Python bindings are a bit more permissive. However
the Rust interfaces are still strong typed so I think it is okay.
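For illustration only, a minimal sketch (using serde's standard `Visitor` machinery, not the actual diff) of how a visitor backing `deserialize_bytes` could accept a string as well as bytes:
```
use serde::de::{self, Visitor};
use std::fmt;

// Hypothetical visitor: accepts either raw bytes or a UTF-8 string and
// normalizes both to Vec<u8>, which is the kind of leniency described above.
struct BytesOrStringVisitor;

impl<'de> Visitor<'de> for BytesOrStringVisitor {
    type Value = Vec<u8>;

    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "bytes or a string")
    }

    fn visit_bytes<E: de::Error>(self, v: &[u8]) -> Result<Self::Value, E> {
        Ok(v.to_vec())
    }

    fn visit_str<E: de::Error>(self, v: &str) -> Result<Self::Value, E> {
        // Treat a string as its UTF-8 bytes instead of rejecting it.
        Ok(v.as_bytes().to_vec())
    }
}
```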
Reviewed By: DurhamG
Differential Revision: D31465810
fbshipit-source-id: a339b662f00a16953ce849ed8c2d65a1c3365081
Summary:
Use serde to decode HgId from Python. This allows us to get pure Rust types
earlier and get flexible hash support (binary or hex) from `HgId` for free.
Reviewed By: ahornby
Differential Revision: D31465821
fbshipit-source-id: 18c459d5fab6d508d0ec37f136bd45a7baf1e473
Summary:
See previous diff for context. This makes the endpoint a bit easier to use.
Also add retry, since it's now easier with the Vec type.
Reviewed By: yancouto
Differential Revision: D31416217
fbshipit-source-id: 40de6d14cf5cd088cd69156758699706bc7b8b8b
Summary:
See previous diff for context. This makes the endpoint a bit easier to use.
Also add retry, since it's now easier with the Vec type.
Reviewed By: yancouto
Differential Revision: D31416215
fbshipit-source-id: 53c169d17b2e16c285b5ea6d5c90ce0ee5d0b280
Summary:
See D31407278 (b68ef5e195) for a similar change. Basically, for lightweight metdata,
`Vec<T>` is more convenient than `Respoonse`. The prefix lookup endpoint
is lightweight. Let's switch to `Vec` fot the `EdenApi` trait.
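As a rough illustration of why `Vec<T>` makes retries easy (not the real `EdenApi` code; the helper name is made up), a lightweight call that returns a collected `Vec` can simply be retried wholesale:
```
use std::future::Future;

// Illustrative sketch: retry a call that yields a fully collected Vec<T>.
async fn call_with_retry<T, E, F, Fut>(mut retries: usize, mut call: F) -> Result<Vec<T>, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<Vec<T>, E>>,
{
    loop {
        match call().await {
            Ok(items) => return Ok(items),
            Err(_) if retries > 0 => retries -= 1, // transient failure: try again
            Err(err) => return Err(err),
        }
    }
}
```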
Reviewed By: yancouto
Differential Revision: D31416216
fbshipit-source-id: 260235b57ddedcd7be8accb263a0090330445f7f
Summary:
See D31407278 (b68ef5e195) for a similar change. Basically, for lightweight metdata,
`Vec<T>` is more convenient than `Respoonse`. The commit graph endpoint
outputs commit hashes, which are lightweight. So let's switch to `Vec`
fot the `EdenApi` trait.
Reviewed By: yancouto
Differential Revision: D31410099
fbshipit-source-id: 0966b9afb47264c34ebe88355dc6df669dfb803b
Summary:
Previously the `clone.force-edenapi-clonedata` config decides whether to use
segments clone. That can be error prone - it will be a crash if the server
doesn't have the segments index.
Change it so we ask about the segments index and automatically use the lazy
changelog clone.
`clone.force-edenapi-clonedata` will be no longer necessary.
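A hypothetical sketch of the decision this describes (the capability name and types are assumptions, not the real clone code):
```
// Pick the clone mode from what the server says it supports, instead of
// trusting a client-side config.
enum CloneMode {
    LazyChangelog, // segmented changelog clone
    Full,          // fall back to a full changelog clone
}

fn choose_clone_mode(server_capabilities: &[String]) -> CloneMode {
    if server_capabilities.iter().any(|c| c == "segmented-changelog") {
        CloneMode::LazyChangelog
    } else {
        CloneMode::Full
    }
}
```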
Differential Revision: D31358367
fbshipit-source-id: 9fa639d0349d00938c89cc091ea793f20dd714c8
Summary:
Expose the Rust capabilities endpoint to Python.
Note: I avoided some complexities here:
- Not implementing a `capabilities_py` function just for rustfmt purposes.
This avoids repeating the `capabilities` signature two extra times.
- Not supporting the legacy progress callback at all.
Reviewed By: yancouto
Differential Revision: D31358368
fbshipit-source-id: 76d0d71e627adbc57ed853922c4f826f3edfccb4
Summary:
Mononoke side implementation was done by D30831346 (8aa676ada0).
Note: I avoided some complexities here:
- Not using `paths::` constant since the constant isn't useful elsewhere. It
saves repeating the "capabilities" name a few times.
Reviewed By: yancouto
Differential Revision: D31358370
fbshipit-source-id: d75d057d1fdc44fffac9d136dbd10241d78b5cfd
Summary:
The capabilities method reports what optional features a repo has.
It was first introduced on the Mononoke side. See D30831346 (8aa676ada0).
Reviewed By: yancouto
Differential Revision: D31358369
fbshipit-source-id: 673c30f660621279f84d451898dc3707974c1cae
Summary:
`EdenApi` is implemented in a few places including some test utilities.
Changing function signatures would require changing the `unimplemented!()`
test implementation too. That's a bit annoying.
Add default implementations and drop the `unimplemented!()` implementations
to make it easier to change edenapi interfaces.
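A minimal sketch of the pattern, with illustrative names rather than the real `EdenApi` trait: default method bodies let test doubles compile with an empty impl block instead of stubbing everything with `unimplemented!()`:
```
use std::future::Future;
use std::pin::Pin;

trait EdenApiLike {
    fn capabilities(
        &self,
    ) -> Pin<Box<dyn Future<Output = Result<Vec<String>, String>> + Send + '_>> {
        // Default: report "not supported" instead of panicking.
        Box::pin(async { Err("not supported by this implementation".to_string()) })
    }
}

// A test utility now compiles with an empty impl block.
struct FakeClient;
impl EdenApiLike for FakeClient {}
```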
Reviewed By: yancouto
Differential Revision: D31408643
fbshipit-source-id: 602ccd3ce545d1ab646bc32eb84976417f0df9f8
Summary:
When reading related code I found "No need to create new indexes" confusing.
Clarify it by stating that the index is wrong but no business logic depends on
it.
Differential Revision: D31146157
fbshipit-source-id: 4c73b4958ac6fb286bfc5b8256c8c03a26cda7b0
Summary:
When eden is not running on Windows, `eden rm` will spit out some error
messages and exit with error code 1. But it does actually successfully
remove the repo.
On Linux, removing a repo while eden is not running behaves just as if
eden were running.
Let's make the removal more graceful on Windows.
Reviewed By: xavierd
Differential Revision: D31519805
fbshipit-source-id: d393922a9474e64251142207ae38a602766f17bf
Summary: Removes some boilerplate and implements the trait more consistently for Arc<X> as well as Arc<dyn X>
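A hypothetical illustration of the kind of blanket impl this refers to (the trait name is made up): implement the trait once for `Arc<T>` by delegating to the inner value, so both `Arc<X>` and `Arc<dyn X>` pick it up:
```
use std::sync::Arc;

// Hypothetical trait; the point is the single blanket impl for Arc<T>.
trait SqlShardable {
    fn shard_name(&self) -> &'static str;
}

// One impl covers Arc<Concrete> and Arc<dyn SqlShardable> alike.
impl<T: SqlShardable + ?Sized> SqlShardable for Arc<T> {
    fn shard_name(&self) -> &'static str {
        (**self).shard_name()
    }
}
```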
Reviewed By: Croohand
Differential Revision: D31558567
fbshipit-source-id: 503e745ccf4eba6dcf03b8a3f4ace49310c1c319
Summary: This simplifies test setup and means we can stop throwing away creation SQL errors
Reviewed By: StanislavGlebik
Differential Revision: D31536408
fbshipit-source-id: 4d3df0f69c5b49719196c8a1b20d65c965d88869
Summary: sys.platform is "win32", not the path to python.
Reviewed By: fanzeyi
Differential Revision: D31554106
fbshipit-source-id: 64b388d2fb8a493f811a0cf22fe2471a25bfbf7e
Summary: Use atomic rename instead of fs::write when writing out the hgrc.dynamic config file. fs::write re-writes the file if it already exists which opens a race condition if two hg processes try to write the config at the same time.
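A minimal sketch of the atomic-write pattern being described, assuming plain std APIs rather than the exact helper used here:
```
use std::fs;
use std::io::Write;
use std::path::Path;

// Write to a temporary file next to the target, then rename over it, so a
// concurrent reader never sees a half-written file. A real implementation
// would use a unique temp name per writer to avoid racing on the temp file.
fn write_atomic(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // flush data before the rename makes it visible
    }
    fs::rename(&tmp, path) // atomic on the same filesystem
}
```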
Reviewed By: DurhamG
Differential Revision: D31548054
fbshipit-source-id: c5216bf6f9f2ff78d37a2c0e23c8c22fdf1d0897
Summary: Rebase (and other commands) print the mappings between old and new commits via the tweakdefaults extension. Previously we would sort the node mapping dictionary, which yields an apparently random ordering. Now instead we maintain dictionary order, which by default gives us the dictionary insertion order (which in rebase's case, at least, is revision order).
Reviewed By: DurhamG
Differential Revision: D31551290
fbshipit-source-id: 0b5d9c4846ba42154ae387bdf4777a61e5e1e605
Summary:
I've investigated the deadlock we hit recently and attempted to fix it with D31310626 (8ca8816fdd).
Now there's a bunch of facts, an unproven theory and a potential fix. Even though the theory is not proven and hence the fix might be wrong, the theory sounds plausible and the fix should be safe to land even if the theory is wrong.
But first, the facts:
### Facts
a) There's a repro for a deadlock - running `buck-out/gen/eden/mononoke/git/gitimport/gitimport#binary/gitimport --use-mysql-client --repo-name aosp --mononoke-config-path configerator://scm/mononoke/repos/tiers/prod --panic-fate=exit --cachelib-only-blobstore /data/users/stash/gitrepos/boot_images git-range 46cdf6335e1c737f6cf22b89f3438ffabe13b8ae b4106a25a4d8a1168ebe9b7f074740237932b82e` very often deadlocks (maybe not every time, but at least every 3rd run). So even though D31310626 (8ca8816fdd) unblocked the mega_grepo_sync, it doesn't seem like it fixed or even mitigated the issue consistently. Note that I'm running it on devbig which has quite a lot of cpus - this may or may not make deadlock more likely.
b) The code deadlocks on [`find_file_changes()`](https://fburl.com/code/7i6tt7om) function, and inside this function we do two things - first we find what needs to be uploaded using [`bonsai_diff()`](https://fburl.com/code/az3v3sbk) function, and then we use [`do_upload()`](https://fburl.com/code/kgb98kg9) function to upload contents to the mononoke repo. Note that both `bonsai_diff()` and `do_upload()` use git_pool. Bonsai_diff produces a stream, and then we apply do_upload to each element of the stream and finally we apply [`buffer_unordered(100)`](https://fburl.com/code/6tuqp3jd) to upload them all in parallel.
In simplified pseudo-code it looks like
```
bonsai_diff(..., git_pool, ...)
    .map(|entry| do_upload(..., git_pool, entry, ...))
    .try_buffer_unordered(100)
```
c) I've added a few printlns in P462159232, and ran the repro command until it, well, repros. These printlns just show when there was an attempt to grab the semaphore and when it was successfully grabbed, and also who the caller was - the `bonsai_diff()` or `do_upload()` function. This is the example output I'm getting: P462159671. The output is a mess, but what's interesting is that there are exactly 100 more entries of "grabbing semaphore uploading" than "grabbed semaphore uploading". And 100 is exactly the size of the buffer in buffer_unordered.
### Theory
So above are the facts, and below is the theory that fits these facts (but as I said above, the theory is not proven yet). Let me start with a few assumptions:
1) Fact `c)` seems to suggest that when we run into a deadlock there was a full buffer of `do_upload()` futures that were all waiting on a semaphore.
2) If we look at the [`buffer_unordered` code](https://docs.rs/futures-util/0.3.17/src/futures_util/stream/stream/buffer_unordered.rs.html#71) we can see that if the buffer is full then it stops polling the underlying stream until any of the buffered futures is done.
3) Once the semaphore is released it [wakes up just a handful of futures](https://docs.rs/tokio/1.12.0/src/tokio/sync/batch_semaphore.rs.html#242) that wait for this semaphore.
Given these assumptions I believe the following situation is happening:
1) We get into a state where we have 100 `do_upload()` futures in the buffer_unordered stream, all waiting for the semaphore. At the same time all the semaphore permits are grabbed by the `bonsai_diff()` stream (because of assumption #1).
2) Because of assumption #2 we don't poll the underlying `bonsai_diff()` stream. However this stream has already [spawned a few tasks](https://fburl.com/code/9iq3tfad) which successfully grabbed semaphore permits and are executing.
3) Once these `bonsai_diff()` tasks are finished they [drop the semaphore permit](https://fburl.com/code/sw5fwccw), which in turn causes the semaphore to be released and one of the waiting tasks to be woken up (see assumption #3). Now two things can happen - either a `do_upload()` future is woken up or a `bonsai_diff()` future is woken up. If the former happens then `buffer_unordered` polls the `do_upload()` future and continues executing. However if the latter happens (i.e. a `bonsai_diff()` future was woken up) then we get a deadlock - buffer_unordered() isn't going to poll the `bonsai_diff()` stream, so the bonsai_diff() stream can't make any progress. At the same time the `do_upload()` futures aren't going to be polled at all because they weren't woken up in the first place, and so we deadlock.
There are a few things that seem unlikely in this theory (e.g. why we never wake up any of the `do_upload()` futures), so I do think it's worth validating. We could either add a bunch of println! in the buffer_unordered and semaphore code, or try to set up [tokio-console](https://github.com/tokio-rs/console).
### Potential fix
The potential fix is simple - let's just add another spawn around the `GitPool::with()` function. If the theory is right, then this fix should work. Once the semaphore is released and a `bonsai_diff()` future is woken up, that future will be spawned and hence will be able to complete regardless of whether `buffer_unordered()` has its buffer full or not. And once `bonsai_diff()` fully completes, the `do_upload()` futures will finally be woken up and `buffer_unordered()` will progress until completion.
If the theory is wrong then we just added one useless `tokio::spawn()`. We have a lot of spawns all over the place and `tokio::spawn` is cheap. So landing this fix shouldn't make things worse, but it might actually improve them (and the testing I did suggests that it indeed should improve them - see the `Test plan` section below). It shouldn't be seen as a reason not to find the root cause, though.
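A minimal sketch of the shape of the fix, with illustrative names rather than the real `GitPool`: wrapping the semaphore-guarded work in `tokio::spawn` lets it complete even while the stream that created it isn't being polled:
```
use std::sync::Arc;
use tokio::sync::Semaphore;

// The permit-holding work runs in its own task, so it makes progress even if
// buffer_unordered's buffer is full and the originating stream is not polled.
async fn with_pool<F, Fut, T>(semaphore: Arc<Semaphore>, f: F) -> T
where
    F: FnOnce() -> Fut + Send + 'static,
    Fut: std::future::Future<Output = T> + Send + 'static,
    T: Send + 'static,
{
    tokio::spawn(async move {
        // The permit is released when it's dropped at the end of the spawned task.
        let _permit = semaphore.acquire_owned().await.expect("semaphore closed");
        f().await
    })
    .await
    .expect("spawned task panicked")
}
```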
Reviewed By: mojsarn
Differential Revision: D31541432
fbshipit-source-id: 0260226a21e6e5e0b41736f86fb54a3abccd0a0b
Summary:
Incorporated Diffs:
* Fix blank lines between structured annotated fields D31372368
* Fix format for empty struct D31489644
Manual component version update
Bump Schedule: https://www.internalfb.com/intern/msdk/bump/?schedule_fbid=326075138621961
Package: https://www.internalfb.com/intern/msdk/package/608249566967196/
Oncall Team: nuclide
NOTE: This build is expected to expire at 2022/10/08 03:04PM PDT
---------
New project source changes since last bump based on D30698375 (e5f5b589b458e473ab300550737172d43167f2bf at 2021/09/01 04:03PM IST):
no related diffs found since last bump
---------
build-break (bot commits are not reviewed by a human)
Reviewed By: zertosh
Differential Revision: D31517842
fbshipit-source-id: 9f04c0ad1f418ba8ec3c96b05366683413d93f97
Summary: This appears to have broken all the tests on Linux.
Reviewed By: zhengchaol
Differential Revision: D31505082
fbshipit-source-id: 610eb941d0f0eb536a4688ac2637a8466be0b05c
Summary:
xterm-color terminals don't support hiding the cursor the way curses likes to.
this crashes eden top for users.
eden top mostly functions fine otherwise in xterm-color, so let's make
this a non fatal error.
Note: the highlighting behavior is a little wierd (the styling after the
highlighted column does not work, no bolding etc). This looks like a wierd
curses bug on these terminals, so not sure if we can really fix this. I figure
a little wierd styling is still better than a crash.
Reviewed By: fanzeyi
Differential Revision: D31480121
fbshipit-source-id: 581ef7c548fd1f7986c4f93d8b797d7f7588c351
Summary: This will let us identify clients in scuba logs. This will be useful to identify whether a client has some specific feature enabled. We can use this instead of dedicated boolean columns like is_eden, or is_using_feature_X.
Reviewed By: krallin
Differential Revision: D31501373
fbshipit-source-id: 0e63b73659fd145f01098d60ced510e464730982
Summary:
The hg client provides a header which contains `ClientInfo` data. This
includes the sandcastle alias and nonce.
Update Mononoke to parse this header and then log the sandcastle alias and
nonce to scuba.
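As a rough sketch of the parsing step (the header format and field names here are assumptions, not the actual `ClientInfo` definition):
```
use serde::Deserialize;

// Hypothetical shape of the client-provided header payload.
#[derive(Deserialize)]
struct ClientInfo {
    sandcastle_alias: Option<String>,
    sandcastle_nonce: Option<String>,
}

// Parse the header value leniently: a malformed header is simply ignored.
fn parse_client_info(header_value: &str) -> Option<ClientInfo> {
    serde_json::from_str(header_value).ok()
}
```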
Reviewed By: krallin
Differential Revision: D31208450
fbshipit-source-id: f0971b668887de47fbab29b7ce9b0173cbdeafe4
Summary:
The `Metadata::new` function has grown quite a few optional arguments
which can make it difficult to read.
Simplify the `new` method and add new methods to record optional information
(such as encoded CATs).
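An illustrative sketch of the builder-style refactor described above (field and method names are assumptions, not the real `Metadata` API):
```
// A small constructor plus chainable methods for optional information,
// instead of a long list of optional arguments on new().
#[derive(Default)]
struct Metadata {
    identities: Vec<String>,
    raw_encoded_cats: Option<String>,
    client_info: Option<String>,
}

impl Metadata {
    fn new(identities: Vec<String>) -> Self {
        Self {
            identities,
            ..Default::default()
        }
    }

    fn add_raw_encoded_cats(mut self, cats: String) -> Self {
        self.raw_encoded_cats = Some(cats);
        self
    }

    fn add_client_info(mut self, info: String) -> Self {
        self.client_info = Some(info);
        self
    }
}
```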
Reviewed By: krallin
Differential Revision: D31500788
fbshipit-source-id: 9675c39f3061fef614e792e6ab6e365e9b423b2a
Summary:
The hg client provides extra info in a header that now includes the
sandcastle alias and nonce.
Update the LFS server to parse the header and log the sandcastle alias and
nonce to scuba.
Reviewed By: krallin
Differential Revision: D31208449
fbshipit-source-id: 773f0ec22060203c2c74a20090bd4893885506f8
Summary:
The hg client uses a version of reqwest that relies on tokio-0.2. In
the next diff I will add `clientinfo` as a dependency of Mononoke LFS server,
which also pulls in `configparser`. However, Mononoke LFS server uses
tokio-1.0.
Update the reqwest dependency to be the latest version.
Reviewed By: quark-zju
Differential Revision: D31389559
fbshipit-source-id: acf4c3b5c9df2a8bc8cfa134a2d314fbb96c3354