Summary:
This updates the virtually_sharded_blobstore to deduplicate puts only if the
data being put is actually the data we have put in the past. This is done by
keeping track of the hash of things we've put in the presence cache.
This has 2 benefits:
- This is safer. We only dedupe puts we 100% know succeeded (because this
particular instance was the one to attempt the put).
- This is creates less surprises, notably it lets us overwrite data in the
backing store (if we are writing something different).
Reviewed By: StanislavGlebik
Differential Revision: D22392809
fbshipit-source-id: d76a49baa9a5749b0fb4865ee1fc1aa5016791bc
Summary:
Running those on my devserver, I noticed they can be a bit flaky. They're are
racy on the purpose, but let's relax them a bit.
We have a lot of margin here — our blobstore is rate limited at once request
every 10ms, and we need to do 100 requests (the goal is to show that they don't
all wait), so 100ms is fine to prove that they're not rate limited when sharing
the same data.
Reviewed By: StanislavGlebik
Differential Revision: D22392810
fbshipit-source-id: 2e3c9cdf19b0e4ab979dfc000fbfa8da864c4fd6
Summary:
There is inevitably interaction between caching, deduplication and rate
limiting:
- You don't want the rate limiting to be above caching (in the blobstore stack,
that is), because you shouldn't rate limits cache hits (this is where we are
today).
- You don't want the rate limiting to below deduplication, because then you get
priority inversion where a low-priority rate-limited request might hold the
semaphore while a higher-priority, non rate limited request wants to do the
same fetch (we could have moved rate limiting here prior to introducing
deduplication, but I didn't do it earlier because I wanted to eventually
introduce deduplication).
So, now that we have caching and deduplication in the same blobstore, let's
also incorporate rate limiting there!.
Note that this also brings a potential motivation for moving Memcache into this
blobstore, in case we don't want rate limiting to apply to requests before they
go to the _actual_ blobstore (I did not do this in this diff).
The design here when accessing the blobstore is as follows:
- Get the semaphore
- Check if the data is in cache, if so release the semaphore and return the
data.
- Otherwise, check if we are rater limited.
Then, if we are rate limited:
- Release the semaphore
- Wait for our turn
- Acquire the semaphore again
- Check the cache again (someone might have put the data we want while we were
waiting).
- If the data is there, then return our rate limit token.
- If the data isn't there, then proceed to query the blobstore.
If we aren't rate limited, then we just proceed to query the blobstore.
There are a couple subtle aspects of this:
- If we have a "late" cache hit (i.e. after we waited for rate limiting), then
we'll have waited but we won't need to query the blobstore.
- This is important when a large number of requests from the same key
arrive at the same time and get rate limited. If we don't do this second
cache check or if we don't return the token, then we'll consume a rate
limiting token for each request (instead of 1 for the first request).
- If a piece of data isn't cacheable, we should treat it like a cache hit with
regard to semaphores (i.e. release early), but like a miss with regard to
rate limits (i.e. wait).
Both of those are addressed captured in the code by returning the `Ticket` on a
cache hit. We can then choose to either return the ticket on a cache hit, or wait
for it on a cache miss.
(all of this logic is captured in unit tests, we can remove any of the blocks
there in `Shards::acquire` and a test will fail)
Reviewed By: farnz
Differential Revision: D22374606
fbshipit-source-id: c3a48805d3cdfed2a885bec8c47c173ee7ebfe2d
Summary:
If anything were to go wrong, we'd be happy to know which puts we ignored. So,
let's log them.
Reviewed By: farnz
Differential Revision: D22356714
fbshipit-source-id: 5687bf0fc426421c5f28b99a9004d87c97106695
Summary:
I canaried this on Fastreplay, but unfortunately that showed that sometimes we
just deadlock, or get so slow we might as well be deadlocked (and it happens
pretty quickly, after ~20 minutes). I tried spawning all the `get()` futures,
and that fixes the problem (but it makes gettreepack noticeably slower), so
that suggests something somewhere is creating futures, polling them a little
bit, then never driving them to completion.
For better or worse, I'd experienced the exact same problem with the
ContextConcurrencyBlobstore (my initial attempt at QOS, which also used a
semaphore), so I was kinda expecting this to happen.
In a sense, this nice because I we've suspected there were things like that in
the codebase for a while (e.g. with the occasional SQL timeout we see where it
looks like MySQL responds fast but we don't actually poll it until past the
timeout), and it gives us a somewhat convenient repro.
In another sense, it's annoying because it blocks this work :)
So, to work around the problem, for now, let's spawn futures to force the work
to complete when a semaphore is held. I originally had an unconditional spawn
here, but that is too expensive for the cache-hit code path and slows things
down (by about ~2x).
However, having it only if we'll query the blobstore isn't not as expensive,
and that seems to be fine (in fact it is a ~20% p99 perf improvement,
though the exact number depends on the number of shard we use for this, which I've had to tweak a bit).
https://pxl.cl/1c18H
I did find what I think is one potential instance of this problem in
`bounded_traversal_stream`, which is that we never try to poll `scheduled` to
completion. Instead, we just poll for the next ready future in our
FuturesUnordered, and if that turns out to be synchronous work then we'll just
re-enqueue more stuff (and sort of starve async work in this FuturesUnordered).
I tried updating bounded traversal to try a fairer implementation (which polls
everything), but that wasn't sufficient to make the problem go away, so I think
this is something we have to just accept for now (note that this actually has
some interesting perf impact in isolation: it's a free ~20% perf improvement on
p95+: https://pxl.cl/1c192
see 976b6b92293a0912147c09aa222b2957873ef0df if you're curious
Reviewed By: farnz
Differential Revision: D22332478
fbshipit-source-id: 885b84cda1abc15c51fbc5dd34473e49338e13f4
Summary: Like it says in the title. Those are useful!
Reviewed By: farnz
Differential Revision: D22332479
fbshipit-source-id: f9bddad75fcbed2593c675f9ba45965bd87f1575
Summary:
The goal of this blobstore is to dedupe reads by waiting for them to finish and
hit cache instead (and also to dedupe writes, but that's not relevant here).
However, this is not a desirable feature if a blob cannot be stored in cache,
because then we're serializing accesses for no good reason. So, when that
happens, we store "this cannot be stored in cache", and we release reads
immediately.
Reviewed By: farnz
Differential Revision: D22285269
fbshipit-source-id: be7f1c73dc36b6d58c5075172e5e3c5764eed894
Summary:
I'm going to store things that aren't quite the exact blobs in here, so on the
off chance that we somehow have two caching blobstores (the old one and this
one) that use the same pools, we should avoid collisions by using a prefix.
And, since I'm going to use a prefix, I'm adding a newtype wrapper to not use
the prefixed key as the blobstore key by accident.
Differential Revision: D22285271
fbshipit-source-id: e352ba107f205958fa33af829c8a46896c24027e
Summary:
This introduces a caching blobstore that deduplicates reads and writes. The
underlying motivation is to improve performance for processes that might find
themsleves inadvertently reading the same data concurrently from a bunch of
independent callsites (most of Mononoke), or writing the same bit of data over
and over again.
The latter is particularly useful for things like commit cloud backfilling in
WWW, where some logger commits include the same blob being written hundreds or
thousands of times, and cause us to overload the underlying Zippy shard in
Manifold. This is however a problem we've also encountered in the past in e.g.
the deleted files manifest and had to solve there. This blobstore is a little
different in the sense that it solves that problem for all writers.
This comes at the cost of writes being dropped if they're known to be
redundant, which prevents updates through this blobstore. This is desirable for
most of Mononoke, but not all (notably, for skiplist updates it's not great).
For now, I'm going to add this behind an opt-in flag, and later on I'm planning
to make it opt-out and turn it off there (I'm thinking to use the CoreContext
for this).
Reviewed By: farnz
Differential Revision: D22285270
fbshipit-source-id: 4e3502ab2da52a3a0e0e471cd9bc4c10b84a3cc5
Summary: We designed the schema to make this simple to implement - it's literally a metadata read and a metadata write.
Reviewed By: ikostia
Differential Revision: D22233922
fbshipit-source-id: b392b4a3a23859c6106934f73ef60084cc4de62c
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch the remaining lobstore traits to new-style futures.
This just pushes the `.compat()` out to old-style futures, but it makes the move to non-'static lifetimes easier, as all the compile errors will relate to lifetime issues.
Reviewed By: krallin
Differential Revision: D22183228
fbshipit-source-id: 3fe3977f4469626f55cbf5636d17fff905039827
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch some of the blobstore interfaces to new-style `BoxFuture` with a `'static` lifetime.
This does not enable any fixes at this point, but does mean that `.compat()` moves to the places that need old-style futures instead of new. It also means that the work needed to make the transition fully complete is changed from a full conversion to new futures, to simply changing the lifetimes involved and fixing the resulting compile failures.
Reviewed By: krallin
Differential Revision: D22164315
fbshipit-source-id: dc655c36db4711d84d42d1e81b76e5dddd16f59d
Summary: Add link support to CountedBlobstore
Reviewed By: StanislavGlebik
Differential Revision: D22090644
fbshipit-source-id: 36dc5454f1ca12c91d0eac6e5059f554ac5cb352
Summary: If a value is above cachelib limit let's try to compress it.
Reviewed By: krallin
Differential Revision: D22139644
fbshipit-source-id: 9eb366e8ec94fe66529d27892a988b035989332a
Summary: A bunch of our dependencies weren't really used, and this fact has recently became a source of hard failures. This diff is an attempt to fix it.
Reviewed By: StanislavGlebik
Differential Revision: D22136288
fbshipit-source-id: 4ae265a93e155312ae086647a27ecf1cbbd57e9c
Summary:
Tooling can't handle named_deps yet, but it can warn about them
P133451794
Reviewed By: StanislavGlebik
Differential Revision: D22083499
fbshipit-source-id: 46de533c19b13b2469e912165c1577ddb63d15cd
Summary:
Remove unused dependencies for Rust targets.
This failed to remove the dependencies in eden/scm/edenscmnative/bindings
because of the extra macro layer.
Manual edits (named_deps) and misc output in P133451794
Reviewed By: dtolnay
Differential Revision: D22083498
fbshipit-source-id: 170bbaf3c6d767e52e86152d0f34bf6daa198283
Summary:
Add optional compress on put controlled by a command line option.
Other than costing some CPU time, this may be a good option when populating repos from existing uncompressed stores to new stores.
Reviewed By: farnz
Differential Revision: D22037756
fbshipit-source-id: e75190ddf9cfd4ed3ea9a18a0ec6d9342a90707b
Summary: Add zstd decoding support to packblob so that if store contains individually zstd compressed blobs we can load them on get()
Reviewed By: farnz
Differential Revision: D22037755
fbshipit-source-id: 41d85be6fcccf14fb198f6ea33a7ca26c4527a46
Summary:
Add a regex for repo prefix detection, and use it in prefixblob for removal of repo prefix from blob-embedded keys.
This is important to keep blobs copyable between repos, and to allow dedupe between same blob in two repos.
I looked at alternate approaches of passing down the prefix from PrefixBlob::new(), but that was fragile and didn't cover the use cases from things like blobstore_healer where no prefixblob is configured at all but packblob will be in the blobstore stack. Using a pattern is the only real option in such "all repo" cases.
The aim of binding the pattern and its tests closely to the prefix generation is to make it hard for someone to get them out of sync, and provide a clear local test failure if they do.
Reviewed By: farnz
Differential Revision: D22034638
fbshipit-source-id: 95a1c2e1ef81432cba606c22afa16b026f59fd5f
Summary:
Add pack and unpack logic to packblob.
Loading a packed form is possible from a regular get(), as the store may contain packed data.
Storing a packed form is only possible from a new put_packed() method, intended for use from the packer (and tests).
NB As noted in the TODO in get, this does not yet handle prefix removal on get, will address that in a separate diff.
Reviewed By: StanislavGlebik
Differential Revision: D21980498
fbshipit-source-id: f534b0e754aa29c42bf00bb7e764f93f1446c14c
Summary: Add blobstore link trait so we can use hardlink style links in fileblob and memblob for testing and later sqlblob et al for prod.
Reviewed By: StanislavGlebik
Differential Revision: D21935647
fbshipit-source-id: f76eaca26b6b226c77d6e39e9c64e02b4145b614
Summary:
async the blobstore test functions, in preparation for adding test coverage for BlobstoreWithLink::link
The boxable test was duplicating the simple roundtrip test which seemed unnecessary, now it just checks that we can Box the blobstore type.
Reviewed By: krallin
Differential Revision: D21974471
fbshipit-source-id: a3cd71a7503a3a670b3c6223812a2e870a20a16e
Summary: update memblob imports to futures_old in preparation for using new futures to implement BlobstoreWithLink
Reviewed By: krallin
Differential Revision: D21973642
fbshipit-source-id: dea4a686db41a0d618395c4dc411603f922a9e22
Summary: Update imports in preparation for using new futures for the BlobstoreWithLink mixin trait
Reviewed By: StanislavGlebik
Differential Revision: D21973587
fbshipit-source-id: 094aa82abc21848d43f50d6d169087e19339ba04
Summary: Switch blobstore imports to use futures_old naming in preparation for using new futures in BlobstoreWithLink (and later BlobstoreWithEnumeration) traits.
Reviewed By: krallin
Differential Revision: D21972842
fbshipit-source-id: 6022c98e7b7254f7c7beb46dc1c7f83609810853
Summary: Add basic packblob store which wraps blobs in the thrift StorageEnvelope type on put() and unwraps them on get()
Reviewed By: farnz
Differential Revision: D21924405
fbshipit-source-id: 81eccf5da6b5b50bdb9aae13622301a20dca0eac
Summary:
Thift definitions for a new blobstore layer known as packblob, which is an enabling part of compression at the mononoke blobstore layer. Its storage formats will support raw data, individual blob compression and blob delta compression.
The included README.md has more detail on the expected use of these types.
Reviewed By: farnz
Differential Revision: D21761723
fbshipit-source-id: a277800f4b4aed5972c4a5bb277e984ff12660f8
Summary: Without this, the most lagged shard will slow down all shards; with this, a single lagging shard doesn't slow down other shards.
Reviewed By: ahornby
Differential Revision: D22018264
fbshipit-source-id: c23f25ea1d0b0b6891d2a6f76dfed467101d2f4d
Summary: I'll want the delay operation to respect shard_id - move it into the stores. While here, do some refactoring that will reduce the size of the next diff.
Reviewed By: ahornby
Differential Revision: D22018317
fbshipit-source-id: d2333dfc6749b2a13bc8a67bfa953ed25cedf847
Summary: Prep for 1.44 but also general cleanups.
Reviewed By: dtolnay
Differential Revision: D22024428
fbshipit-source-id: 8e1d39a1e78289129b38554674d3dbf80681f4c3
Summary: There's over 150 lines of code here, or around 1/3rd of lib.rs. Move to a separate module, so that growth of tests won't make lib.rs too long
Reviewed By: ahornby
Differential Revision: D21952798
fbshipit-source-id: 4e4edf4fde5a6626e2c33c8ea30ec85e6f3c6bf1
Summary: These were minimal conversions of old-style tests - use `.compat()` so that we have "true" async tests.
Reviewed By: ahornby
Differential Revision: D21950533
fbshipit-source-id: 71608201c4fe475d07b9ff3fafa138036f3dad5b
Summary: We used to inline smaller blobs into the data table, and only chunk bigger blobs. Performance tests show that we don't have significant issues with round trips to our MySQL infrastructure, so let's simplify the problems of deduplication and hard linking by always storing the data in chunks, and having the keys directly return significant data.
Reviewed By: ahornby
Differential Revision: D21939500
fbshipit-source-id: 6ad73b25ac859fb1b4f067a3016516b713e8b2f5
Summary:
I observed that for whatever reason our setting of `use_try_shorthand = true` in rustfmt.toml was causing entire functions to not get processed by rustfmt. Even files that contain neither `try` nor `?`. Remove it and reformat fbsource.
Documentation of that config:
- https://github.com/rust-lang/rustfmt/blob/master/Configurations.md#use_try_shorthand
We don't particularly care about the value anymore because nobody writes `r#try!(...)` in 2018 edition code.
Minimized:
```
fn f() {
g(
)
// ...
.h
}
```
This function gets formatted only if use_try_shorthand is not set.
The bug is fixed in the rustfmt 2.0 release candidate.
Reviewed By: jsgf
Differential Revision: D21878162
fbshipit-source-id: b028673c0eb703984d24bf0d2983453fc2a8c212
Summary: This cache didn't improve performance, and is a reasonable amount of extra code. Just rip it out, and improve higher caching layers if needed.
Reviewed By: StanislavGlebik
Differential Revision: D21818412
fbshipit-source-id: 3ca58bff180c6da91d451754b67b23e61b736059
Summary:
We can kill a single shard's replication with weight of puts; by tracking lag once a second, we can slow down to a point where no shard overloads.
This is insufficient by itself, as it slows all shards down equally (whereas we only want to slow down the laggy shard), but this now avoids high replication lag
Reviewed By: StanislavGlebik
Differential Revision: D21720833
fbshipit-source-id: a819f641206641e80f8edde92006fb08cdcf36a9
Summary:
CacheBlob logs hit and miss stastics to Scuba. Let's add the same for
ODS.
Reviewed By: krallin
Differential Revision: D21640922
fbshipit-source-id: 8f7d17f048bf53bdc6cd8bda0384a51cae7b6a30
Summary:
CountedBlobstore is a Blobstore that wraps another blobstore in order
to report metrics about it, such as the number of gets or errors. It's commonly
used to wrap a CacheBlob blobstore, which itself is a caching wrapper over an
inner blobstore.
CountedBlobstore exposes metrics that are supposed to track the number of hits
or misses when fetching blobs. However, these metrics don't make sense as the
CountedBlobstore has no view into cache activity. These metrics actually report
the number of requests and the number of missing blobs rather than hits and
misses.
Remove these misleading counters.
Reviewed By: krallin
Differential Revision: D21640923
fbshipit-source-id: 07b9fc9864c70991415c2b84f35d631b702c17d1
Summary:
D21573455 accidentally removed the scuba sampling for scrub_get
operations. Add this back in.
Reviewed By: StanislavGlebik
Differential Revision: D21638972
fbshipit-source-id: eee66dbce161de69246f4da0a15dc2cf00e1ba01
Summary:
Currently, we have a gap between updating and subsequently reading the
value of `write_order`. If another blobstore's put operation completes before
we have called `record_put_stats` then we may have an incorrect `write_order`
value.
Whilst this is only a minor issue, let's fix it anyway. :-)
Reviewed By: krallin
Differential Revision: D21619669
fbshipit-source-id: 1b8bacbcb4c195e6765ebdfaa68425f286f88c06
Summary:
The blobstore multiplexer contains logic to log blobstore operations to
scuba, as well as updating `PerfCounters`. There are some cases where we don't use the
multiplexed blobstore, which means that we're missing this important logging.
Factor out the logging to a separate crate and implement `LogBlob`, which wraps
another blobstore and performs logging to both scuba and PerfCounters.
Reviewed By: StanislavGlebik
Differential Revision: D21573455
fbshipit-source-id: 490ffd347f1ad19effc93b93f880836467b87651