Summary:
There is inevitably interaction between caching, deduplication and rate
limiting:
- You don't want the rate limiting to be above caching (in the blobstore stack,
that is), because you shouldn't rate limit cache hits (this is where we are
today).
- You don't want the rate limiting to be below deduplication, because then you
get priority inversion where a low-priority rate-limited request might hold the
semaphore while a higher-priority, non-rate-limited request wants to do the
same fetch (we could have moved rate limiting here prior to introducing
deduplication, but I didn't do it earlier because I wanted to eventually
introduce deduplication).
So, now that we have caching and deduplication in the same blobstore, let's
also incorporate rate limiting there!
Note that this also brings a potential motivation for moving Memcache into this
blobstore, in case we don't want rate limiting to apply to requests before they
go to the _actual_ blobstore (I did not do this in this diff).
The design here when accessing the blobstore is as follows:
- Get the semaphore
- Check if the data is in cache; if so, release the semaphore and return the
data.
- Otherwise, check if we are rate limited.
Then, if we are rate limited:
- Release the semaphore
- Wait for our turn
- Acquire the semaphore again
- Check the cache again (someone might have put the data we want while we were
waiting).
- If the data is there, then return our rate limit token.
- If the data isn't there, then proceed to query the blobstore.
If we aren't rate limited, then we just proceed to query the blobstore.
There are a couple subtle aspects of this:
- If we have a "late" cache hit (i.e. after we waited for rate limiting), then
we'll have waited but we won't need to query the blobstore.
- This is important when a large number of requests for the same key
arrive at the same time and get rate limited. If we don't do this second
cache check or if we don't return the token, then we'll consume a rate
limiting token for each request (instead of 1 for the first request).
- If a piece of data isn't cacheable, we should treat it like a cache hit with
regard to semaphores (i.e. release early), but like a miss with regard to
rate limits (i.e. wait).
Both of those are captured in the code by having the cache lookup return the
`Ticket`: we can then choose to either return the ticket on a cache hit, or
wait for it on a cache miss (see the sketch below).
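Here is a minimal, self-contained sketch of that flow, assuming tokio and using
a `Semaphore` both for deduplication and as a stand-in token bucket (the
limiter's permit plays the role of the `Ticket`). All names here (`Cache`,
`rate_limited_get`, `fetch`) are illustrative, not the real Mononoke API; the
actual logic lives in `Shards::acquire`:
```
use std::collections::HashMap;
use std::sync::Mutex;
use tokio::sync::Semaphore;

struct Cache(Mutex<HashMap<String, Vec<u8>>>);

impl Cache {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.0.lock().unwrap().insert(key.to_owned(), value);
    }
}

async fn rate_limited_get(
    dedup: &Semaphore,   // deduplication semaphore
    limiter: &Semaphore, // rate limit tokens; dropping a permit gives it back
    cache: &Cache,
    fetch: impl Fn() -> Vec<u8>, // stand-in blobstore query, sync for brevity
    key: &str,
) -> Vec<u8> {
    // Get the semaphore.
    let mut guard = dedup.acquire().await.unwrap();

    // Cache hit: release the semaphore (the guard drops) and return the data.
    if let Some(data) = cache.get(key) {
        return data;
    }

    // Otherwise, check if we are rate limited.
    let token = match limiter.try_acquire() {
        Ok(token) => token, // not rate limited: just proceed to the query
        Err(_) => {
            // Rate limited: release the semaphore, then wait for our turn.
            drop(guard);
            let token = limiter.acquire().await.unwrap();

            // Acquire the semaphore again and re-check the cache: someone
            // might have put the data we want while we were waiting.
            guard = dedup.acquire().await.unwrap();
            if let Some(data) = cache.get(key) {
                // Late cache hit: dropping `token` returns it, so a burst of
                // identical requests only consumes one token overall.
                drop(token);
                return data;
            }
            token
        }
    };

    // Cache miss: consume the token for real and query the blobstore.
    token.forget();
    let data = fetch();
    cache.put(key, data.clone());
    drop(guard);
    data
}
```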
(all of this logic is covered by unit tests: remove any of the blocks in
`Shards::acquire` and a test will fail)
Reviewed By: farnz
Differential Revision: D22374606
fbshipit-source-id: c3a48805d3cdfed2a885bec8c47c173ee7ebfe2d
Summary:
Sometimes we take a token and then realize we don't want it. In this case, giving it back is convenient.
This diff adds that (see the sketch below)!
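As a tiny illustration of the idea (hypothetical, semaphore-backed; not the
actual `Ticket` API), dropping an unconsumed permit is what "giving it back"
looks like:
```
use tokio::sync::Semaphore;

fn take_then_give_back(limiter: &Semaphore) {
    if let Ok(token) = limiter.try_acquire() {
        // We took a token but realized we don't want it: dropping the
        // permit hands it back instead of consuming it.
        drop(token);
    }
}
```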
Reviewed By: farnz
Differential Revision: D22374607
fbshipit-source-id: ccf47e6c75c37d154704645c9e826f514d6f49f6
Summary:
This is a mirror image of the diff that made backsyncer use `LiveCommitSyncConfig`: we want to use configerator-based live configs when we run in continuous tailing mode.
As the no-op iteration time used to be 10s, which is a bit wasteful for tests, this diff makes it configurable.
Finally, because of instantiating various additional `CommitSyncerArgs` structs, this diff globs out some of the `using repo` logs (which aren't very useful as test signals anyway, IMO).
Reviewed By: StanislavGlebik
Differential Revision: D22209205
fbshipit-source-id: fa46802418a431781593c41ee36f468dee9eefba
Summary: Tidy up some comments in this file.
Reviewed By: ikostia
Differential Revision: D22376165
fbshipit-source-id: ce4760776048aa8e72809b4f828d0ea426fcf878
Summary: This diff actually starts to use the option
Reviewed By: krallin
Differential Revision: D22373943
fbshipit-source-id: fe23da9c3daa1f9f91a5ee5e368b33e0091aa9c1
Summary:
Previously if a blame request was rejected (e.g. because a file was too large)
then we returned BlameError::Error.
This doesn't look correct, because there's `BlameError::Rejected`. This diff
makes the `fetch_blame` function return `BlameError::Rejected` instead (see
the sketch below).
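In spirit, the change looks like this simplified sketch (hypothetical types
and limit check, not the exact Mononoke definitions):
```
enum BlameError {
    Rejected(String), // the request was refused, e.g. the file is too large
    Error(String),    // something actually went wrong
}

fn check_blame_request(file_size: u64, limit: u64) -> Result<(), BlameError> {
    if file_size > limit {
        // Previously this surfaced as BlameError::Error; now the rejection
        // is reported explicitly.
        return Err(BlameError::Rejected(format!(
            "file size {} exceeds limit {}",
            file_size, limit
        )));
    }
    Ok(())
}
```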
Reviewed By: aslpavel
Differential Revision: D22373948
fbshipit-source-id: 4859809dc315b8fd66f94016c6bd5156cffd7cc2
Summary:
In the next diffs we'll need to read override_blame_filesize_limit from the
derived data config, and this config is stored in BlobRepo.
This diff makes a small refactoring to pass BlobRepo to fetch_full_file_content.
Reviewed By: krallin
Differential Revision: D22373946
fbshipit-source-id: b209abce82c0279d41173b5b25f6761659a92f3d
Summary: This will make adding the blame file size limit override in the next diffs easier
Reviewed By: krallin
Differential Revision: D22373945
fbshipit-source-id: 4857e43c5d80596340878753ea90bf31d7bb3367
Summary:
We're always yielding zero or one child during traversal, so bounded traversal
is unnecessary here (see the sketch below).
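A minimal illustration of the idea (not the Mononoke code): when each step
yields at most one child, the traversal degenerates into a linear walk, which
`std::iter::successors` expresses directly:
```
fn main() {
    // Walk 10 -> 5 -> 2 -> 1 by repeated halving: each step has exactly
    // zero or one successor, so no bounded traversal machinery is needed.
    let walk: Vec<u32> = std::iter::successors(Some(10u32), |&n| {
        if n > 1 { Some(n / 2) } else { None }
    })
    .collect();
    assert_eq!(walk, vec![10, 5, 2, 1]);
}
```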
Differential Revision: D22242148
fbshipit-source-id: b4c8a1279ef7bd15e9d0b3b2063683f45e30a97a
Summary:
Let's use the new option in the CLI. Unfortunately, we can't easily accept
commit ids in named params, so it has to be a positional one.
Differential Revision: D22234412
fbshipit-source-id: a9c27422fa65ae1c42cb1c243c7694507a957437
Summary:
If anything were to go wrong, we'd be happy to know which puts we ignored. So,
let's log them.
Reviewed By: farnz
Differential Revision: D22356714
fbshipit-source-id: 5687bf0fc426421c5f28b99a9004d87c97106695
Summary:
Eventually, I plan to make this the default, but for now I'd like to make it
something we can choose to turn on or off as a cmd argument (so we can start
with the experimental tier and Fastreplay).
Note that this mixes volatile vs. non-volatile pools when accessing the pools
for cacheblob. In practice, those pools are actually volatile; it's just that
things don't break if you access them as non-volatile.
Reviewed By: farnz
Differential Revision: D22356537
fbshipit-source-id: 53071b6b21ca5727d422e10f685061c709114ae7
Summary:
I canaried this on Fastreplay, but unfortunately that showed that sometimes we
just deadlock, or get so slow we might as well be deadlocked (and it happens
pretty quickly, after ~20 minutes). I tried spawning all the `get()` futures,
and that fixes the problem (but it makes gettreepack noticeably slower), so
that suggests something somewhere is creating futures, polling them a little
bit, then never driving them to completion.
For better or worse, I'd experienced the exact same problem with the
ContextConcurrencyBlobstore (my initial attempt at QOS, which also used a
semaphore), so I was kinda expecting this to happen.
In a sense, this is nice, because we've suspected there were things like that in
the codebase for a while (e.g. the occasional SQL timeout we see where it
looks like MySQL responds fast but we don't actually poll it until past the
timeout), and it gives us a somewhat convenient repro.
In another sense, it's annoying because it blocks this work :)
So, to work around the problem, for now, let's spawn futures to force the work
to complete when a semaphore is held (see the sketch below). I originally had
an unconditional spawn here, but that is too expensive for the cache-hit code
path and slows things down (by about ~2x).
However, having it only when we'll actually query the blobstore isn't as
expensive, and that seems to be fine (in fact it is a ~20% p99 perf improvement,
though the exact number depends on the number of shards we use for this, which I've had to tweak a bit).
https://pxl.cl/1c18H
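Sketched out (assuming tokio; `Cache` and `fetch_from_blobstore` are stand-ins,
not the real code), the workaround only pays the spawn cost on the miss path:
```
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::Semaphore;

type Cache = Mutex<HashMap<String, Vec<u8>>>;

async fn fetch_from_blobstore(key: &str) -> Option<Vec<u8>> {
    // Stand-in for the real blobstore query.
    Some(key.as_bytes().to_vec())
}

async fn get_spawned(
    dedup: Arc<Semaphore>,
    cache: Arc<Cache>,
    key: String,
) -> Option<Vec<u8>> {
    // Hold the deduplication semaphore for the duration of the call.
    let _guard = dedup.acquire_owned().await.ok()?;

    if let Some(data) = cache.lock().unwrap().get(&key).cloned() {
        // Cache hit: stay on the cheap, unspawned path (an unconditional
        // spawn here costs ~2x).
        return Some(data);
    }

    // Cache miss: spawn the fetch so it is driven to completion even if
    // the caller never polls this future again while we hold the semaphore.
    tokio::spawn(async move { fetch_from_blobstore(&key).await })
        .await
        .ok()?
}
```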
I did find what I think is one potential instance of this problem in
`bounded_traversal_stream`, which is that we never try to poll `scheduled` to
completion. Instead, we just poll for the next ready future in our
FuturesUnordered, and if that turns out to be synchronous work then we'll just
re-enqueue more stuff (and sort of starve async work in this FuturesUnordered).
I tried updating bounded traversal to use a fairer implementation (which polls
everything), but that wasn't sufficient to make the problem go away, so I think
this is something we have to just accept for now (note that this change
actually has some interesting perf impact in isolation: it's a free ~20% perf
improvement on p95+: https://pxl.cl/1c192; see
976b6b92293a0912147c09aa222b2957873ef0df if you're curious).
Reviewed By: farnz
Differential Revision: D22332478
fbshipit-source-id: 885b84cda1abc15c51fbc5dd34473e49338e13f4
Summary: Like it says in the title. Those are useful!
Reviewed By: farnz
Differential Revision: D22332479
fbshipit-source-id: f9bddad75fcbed2593c675f9ba45965bd87f1575
Summary:
The goal of this blobstore is to dedupe reads by making duplicate readers wait
for the first read to finish and hit cache instead (and also to dedupe writes,
but that's not relevant here).
However, this is not a desirable feature if a blob cannot be stored in cache,
because then we're serializing accesses for no good reason. So, when that
happens, we store "this cannot be stored in cache", and we release reads
immediately (see the sketch below).
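An illustrative sketch of the marker (hypothetical types, not the actual
implementation):
```
use std::collections::HashMap;

enum CacheEntry {
    Data(Vec<u8>),
    NotStorable, // this blob cannot be stored in cache
}

/// Decide whether a read should wait behind an in-flight read for `key`.
fn should_wait(cache: &HashMap<String, CacheEntry>, key: &str) -> bool {
    match cache.get(key) {
        // Cache hit: no need to wait, just return the data.
        Some(CacheEntry::Data(_)) => false,
        // Waiting would buy nothing: the blob will never land in cache,
        // so release the read immediately.
        Some(CacheEntry::NotStorable) => false,
        // Dedupe: wait for the in-flight read, then hit cache.
        None => true,
    }
}
```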
Reviewed By: farnz
Differential Revision: D22285269
fbshipit-source-id: be7f1c73dc36b6d58c5075172e5e3c5764eed894
Summary:
I'm going to store things that aren't quite the exact blobs in here, so on the
off chance that we somehow have two caching blobstores (the old one and this
one) that use the same pools, we should avoid collisions by using a prefix.
And, since I'm going to use a prefix, I'm adding a newtype wrapper to not use
the prefixed key as the blobstore key by accident.
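Something like this (the prefix and names here are hypothetical, not the
actual ones):
```
/// Newtype so the prefixed cache key can't be passed somewhere a plain
/// blobstore key is expected by accident.
struct CacheKey(String);

impl CacheKey {
    fn from_blobstore_key(key: &str) -> Self {
        // The prefix avoids collisions if another caching blobstore ever
        // shares the same pools.
        CacheKey(format!("cache.{}", key))
    }
}
```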
Differential Revision: D22285271
fbshipit-source-id: e352ba107f205958fa33af829c8a46896c24027e
Summary:
This introduces a caching blobstore that deduplicates reads and writes. The
underlying motivation is to improve performance for processes that might find
themselves inadvertently reading the same data concurrently from a bunch of
independent callsites (most of Mononoke), or writing the same bit of data over
and over again.
The latter is particularly useful for things like commit cloud backfilling in
WWW, where some logger commits include the same blob being written hundreds or
thousands of times, and cause us to overload the underlying Zippy shard in
Manifold. This is however a problem we've also encountered in the past in e.g.
the deleted files manifest and had to solve there. This blobstore is a little
different in the sense that it solves that problem for all writers.
This comes at the cost of writes being dropped if they're known to be
redundant, which prevents updates through this blobstore. This is desirable for
most of Mononoke, but not all (notably, for skiplist updates it's not great).
For now, I'm going to add this behind an opt-in flag, and later on I'm planning
to make it opt-out and turn it off there (I'm thinking to use the CoreContext
for this).
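For the write side, the effect is roughly this (an illustrative sketch, not
the actual code): a put for a key we know was already written is dropped,
which is exactly why updates through this blobstore don't work:
```
use std::collections::HashSet;

struct DedupedWrites {
    already_written: HashSet<String>,
}

impl DedupedWrites {
    /// Returns true if the put should be forwarded to the underlying
    /// blobstore, false if it is known to be redundant and can be dropped.
    fn should_put(&mut self, key: &str) -> bool {
        self.already_written.insert(key.to_owned())
    }
}
```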
Reviewed By: farnz
Differential Revision: D22285270
fbshipit-source-id: 4e3502ab2da52a3a0e0e471cd9bc4c10b84a3cc5
Summary: This allowed me to compare two alternative approaches to queue draining, and generally seems like a useful thing to do.
Reviewed By: krallin
Differential Revision: D22364733
fbshipit-source-id: b6c76295c85b4dec6f0bfd7107c30bb4e4a28942
Summary: It's useful to derive all enabled derived data at once
Reviewed By: krallin
Differential Revision: D22336338
fbshipit-source-id: 54bc27ab2c23c175913fc02e6bf05d18a54c249c
Summary:
We've recently added an option to perform a stack move in megarepolib. A "stack
move" is a stack of commits that moves files according to a mover. Now let's
expose it in the megarepotool.
Reviewed By: ikostia
Differential Revision: D22312486
fbshipit-source-id: 878d4b2575ed2930bbbf0b9b35e51bb41393e622
Summary:
Printing it can take too much time.
Use a larger threshold than for `list` or `set`, since `hg` commonly uses a
dict for command line options, and that can exceed 8 items.
While we're here, also fix the traceback order so it's "most recent call last".
Reviewed By: kulshrax
Differential Revision: D22362003
fbshipit-source-id: 3d2f4664bec6b4cfaf42b8e5d2fc47b0f3d96411
Summary: It was moved by D22305173 (fdba0b98c2). Skip testing it.
Reviewed By: kulshrax
Differential Revision: D22364199
fbshipit-source-id: 3e205daa5aac517020664005a6f95d0292674bc3
Summary:
In order to do what the title says, this diff does:
1. Add the `eden/oss/.../third-party/rust/.../Cargo.toml` files. As mentioned in the previous diff, those are required by GitHub so that the third-party dependencies that are local in fbsource are properly defined with a "git" dependency in order for Cargo to "link" crates properly.
2. Changes to `eden/scm/Makefile` to add build/install commands for getdeps to invoke. Those commands, knowing that they are called from within a getdeps context, link the dependencies brought by getdeps into the proper places that match their folder layout in fbsource. They also pass a GETDEPS_BUILD env to the setup.py invocations so that setup.py knows it is being called within a getdeps build.
3. Changes to `eden/scm/setup.py` that add a "thriftasset" that makes use of the getdeps.py provided "thrift" binary to build .py files out of thrift files.
4. Changes to `distutils_rust` to use the vendored crates dir provided by getdeps.
5. Changes to `getdeps/builder.py` and `getdeps/manifest.py` that enable more fine-grained configuration of how Makefile builds are invoked.
6. Changes to `getdeps/buildopts.py` and `getdeps/manifest.py` to disable overriding the PATH and pkgconfig env, so that "eden/scm" builds in getdeps using system libraries rather than getdeps-provided ones (NOTE: I've tried to use getdeps-provided libraries, but the trickiest bit was that Rust links with Python, which is currently not providable by getdeps, so if you try to build everything the system-provided Python libraries will collide with the getdeps-provided ones).
7. Added `opensource/fbcode_builder/manifests/eden_scm` for the getdeps build.
Reviewed By: quark-zju
Differential Revision: D22336485
fbshipit-source-id: 244d10c9e06ee83de61e97e62a1f2a2184d2312f
Summary:
The test case assumed that clone would return data in order from the server.
That is not a valid assumption and Mononoke doesn't return data in order.
Reviewed By: xavierd
Differential Revision: D22364636
fbshipit-source-id: abfcbe0074a08c9a76c42d351ce5c792eb65e24f
Summary:
Enable mutation, remotenames and narrow-heads (for clients).
Some `x` commits are shown because the server repo cannot hide them.
Reviewed By: DurhamG
Differential Revision: D22200501
fbshipit-source-id: 35abe5198025598f4f5ef8807c2eaa6f3b2f2318
Summary:
D21894320 (64585a5895) tries to disable features from hgsql. However, tests can
run hg commands with the hgsql extension turned on or off, and that messes
things up.
Let's test the hgsql requirement and disable related features unconditionally
so we have more confidence that the related features won't be turned on
accidentally.
Previously, the `initclient` helper function would create repos with the `hgsql`
requirement. Since we're using the repo requirement to disable features, disable
`hgsql` temporarily during `initclient` so that the client repos can still have
modern features like visibility and narrow-heads.
The globalrevs test is affected and temporarily disabled. It will be fixed in
follow-ups.
Reviewed By: DurhamG
Differential Revision: D22200507
fbshipit-source-id: 3715464430a9115bb00122a8dfa03b1abf9d03ab
Summary: This helps debugging some hgsql issues especially in test-globalrevs.
Reviewed By: DurhamG
Differential Revision: D22200509
fbshipit-source-id: cddf79a00951d135dee20a5e9fb3a486abad2ff7
Summary:
Without the change the traceback looks like:
```
Traceback (most recent call last):
File "eden/scm/tests/testutil/dott/shobj.py", line 103, in __del__
out = self.output
File "eden/scm/tests/testutil/dott/shobj.py", line 67, in output
self._output = func(*args[1:], **kwargs) or ""
File "eden/scm/tests/test-globalrevs-t.py", line 499, in checkglobalrevs
isgreaterglobalrev("desc('h2')", "desc('g1')") == ""
File "eden/scm/tests/test-globalrevs-t.py", line 480, in isgreaterglobalrev
if getglobalrev(left) > getglobalrev(right):
File "eden/scm/tests/test-globalrevs-t.py", line 469, in getglobalrev
return int(s)
ValueError: invalid literal for int() with base 10: ''
```
There is no clue about who calls `checkglobalrevs`.
With this change, the extra context is provided and it's easy to find out that the calling code is around line 517:
```
Before executing:
File "eden/scm/tests/test-globalrevs-t.py", line 517, in <module>
sh % "cd ../master2"
File "eden/scm/tests/testutil/dott/shobj.py", line 161, in __mod__
return LazyCommand(command)
File "eden/scm/tests/testutil/dott/shobj.py", line 38, in __init__
_checkdelayedexception()
File "eden/scm/tests/testutil/dott/shobj.py", line 204, in _checkdelayedexception
traceback.print_stack()
```
Note: before D19649475 (a634526801) the traceback looked more useful, but it's
hard to create that traceback on Python 3:
```
Traceback (most recent call last):
File "eden/scm/tests/test-globalrevs-t.py", line 517, in <module>
sh % "cd ../master2"
File "eden/scm/tests/testutil/dott/shobj.py", line 161, in __mod__
return LazyCommand(command)
File "eden/scm/tests/testutil/dott/shobj.py", line 38, in __init__
_checkdelayedexception()
File "eden/scm/tests/testutil/dott/shobj.py", line 202, in _checkdelayedexception
exec("raise exctype, excvalue, tb")
File "eden/scm/tests/testutil/dott/shobj.py", line 103, in __del__
out = self.output
File "eden/scm/tests/testutil/dott/shobj.py", line 67, in output
self._output = func(*args[1:], **kwargs) or ""
File "eden/scm/tests/test-globalrevs-t.py", line 499, in checkglobalrevs
isgreaterglobalrev("desc('h2')", "desc('g1')") == ""
File "eden/scm/tests/test-globalrevs-t.py", line 480, in isgreaterglobalrev
if getglobalrev(left) > getglobalrev(right):
File "eden/scm/tests/test-globalrevs-t.py", line 469, in getglobalrev
return int(s)
ValueError: invalid literal for int() with base 10: ''
```
Reviewed By: DurhamG
Differential Revision: D22200508
fbshipit-source-id: 07088eac72763f890cc847b9991d79fed18ee0ef
Summary:
Similar to D7121487 (af8ecd5f80), but works for the mutation store. This makes
sure that, at the Rust layer, mutation entries won't get lost after rebasing or
metaediting a set of commits where a subset of the commits being edited has
mutation relations.
Unlike the Python layer, the Rust layer works for mutation chains. Therefore
some of the tests change.
Reviewed By: markbt
Differential Revision: D22174991
fbshipit-source-id: d62f7c1071fc71f939ec8771ac5968b992aa253c
Summary:
Enforcing 1:1 handling is causing trouble for multiple reasons:
- Test cases like "Metaedit with descendant folded commits" in
test-mutation.t require handling non-1:1 entries.
- Currently operations like `metaedit-copy` use two predecessors (edited,
copied). It is considered a 1:1 rewrite (rewriting 1 commit to
another 1 commit). However, it does not use multiple records, and therefore
cannot be distinguished from `fold`, which is not a 1:1 rewrite and also
has multiple predecessors. We want to include the `metaedit-copy`
entries in the 1:1 DAG too.
Therefore, let's lift the 1:1 restriction and see how it works.
Reviewed By: markbt
Differential Revision: D22175008
fbshipit-source-id: 84d870dbcc433a0d0e5611a83c93781bfa59d035
Summary:
This makes it easier to remove cycles in other places.
There are probably fancier and more efficient algorithms for this.
For now I just wrote one that is easy to verify for correctness (see the
sketch below).
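The general shape (an illustrative sketch, not the actual code) is a DFS that
drops back edges, since a directed graph is acyclic exactly when a DFS finds
no back edges:
```
use std::collections::{HashMap, HashSet};

fn dfs(
    node: u32,
    graph: &HashMap<u32, Vec<u32>>,
    visited: &mut HashSet<u32>,
    on_stack: &mut HashSet<u32>,
    result: &mut HashMap<u32, Vec<u32>>,
) {
    visited.insert(node);
    on_stack.insert(node);
    for &next in graph.get(&node).into_iter().flatten() {
        if on_stack.contains(&next) {
            // Back edge into the current DFS stack: dropping it breaks
            // the cycle.
            continue;
        }
        result.entry(node).or_default().push(next);
        if !visited.contains(&next) {
            dfs(next, graph, visited, on_stack, result);
        }
    }
    on_stack.remove(&node);
}

/// Return a copy of `graph` with back edges removed, so it is acyclic.
fn remove_cycles(graph: &HashMap<u32, Vec<u32>>) -> HashMap<u32, Vec<u32>> {
    let (mut visited, mut on_stack, mut result) =
        (HashSet::new(), HashSet::new(), HashMap::new());
    for &node in graph.keys() {
        if !visited.contains(&node) {
            dfs(node, graph, &mut visited, &mut on_stack, &mut result);
        }
    }
    result
}
```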
Reviewed By: markbt
Differential Revision: D22174975
fbshipit-source-id: 8a2dc755e4bc0b066eda5f42a51208c92409f2f9
Summary: Showing that it does not work for a chain of amends (or metaedits).
Reviewed By: DurhamG
Differential Revision: D22174968
fbshipit-source-id: ff942042b69f96e1cc7092a7003cf4608730a66f