Summary:
Implement BlobstorePutOps for S3Blob. This uses is_present to check the various put behaviours
While implementing this I noticed get_sharded_key could be updated to take a reference, so I did that as well.
Differential Revision: D24079253
fbshipit-source-id: 16e194076dbdb4da8a7a9b779e0bd5fb60f550a6
Summary: Now that fileblob and memblob support put behaviour logic, update the overwrite test to check the overwrite result.
Differential Revision: D24021167
fbshipit-source-id: d9578630205cf5d79999a459cc29481968d5717d
Summary: Update memblob to be PutBehaviour aware by changing implementation from Blobstore to BlobstoreOps
Differential Revision: D24021166
fbshipit-source-id: 04dd25c5535769ea507120c1886592b808a7bbc6
Summary: Update Memblob::new callsites to ::default() in preparation for adding arguments to ::new() to specify the put behaviour desired
Differential Revision: D24021173
fbshipit-source-id: 07bf4e6c576ba85c9fa0374d5aac57a533132448
Summary: Add put behaviour handling to fileblob so that it can prevent overwrites if requested.
Differential Revision: D23933228
fbshipit-source-id: 8e74ac96b232be841174f6ad2bd2fccf92aaa90d
Summary:
Add put behaviour to BlobstoreOptions in preparation for passing in the put behaviour through blobstore_factory.
Later in the stack a command line option is added to set this non-None so that we can turn on overwrite logging for particular jobs.
Reviewed By: StanislavGlebik
Differential Revision: D24021169
fbshipit-source-id: 5692e2d3912ebde07b0d7bcce54b79df188a9f16
Summary:
This is the first part of allowing us to update mononoke blobstore put behaviour to optionally a) log when it is overwriting keys, and b) not overwrite existing keys.
Introduce BlobstorePutOps for blobstore implementations so we can track overwrite status of a put, and force an explicit PutBehaviour if required. Its intended that only blobstore implementation code and special admin tooling will need to access BlobstorePutOps methods.
Reviewed By: farnz
Differential Revision: D24021168
fbshipit-source-id: 56ae34f9995a93cf1e47fbcfa2565f236c28ae12
Summary: Remove unnecessary clone in packblob along with the Clone constraint on the inner blobstore.
Reviewed By: krallin
Differential Revision: D24109293
fbshipit-source-id: b47e68e63b6ffda95d28d974ed6883e4ae31b3a1
Summary:
We want to end up with two `put` behaviours - overwrite and do not overwrite.
Currently, SQLBlob only implements the latter, but some users assume that `put` always overwrites. Change to match Manifold
Reviewed By: aslpavel
Differential Revision: D24079501
fbshipit-source-id: f75cac81acf874337c38f82597aae645c41a319b
Summary:
Remove assert_present from Blobstore trait as it had only one callsite other than the various blobstore layers/impls.
Replaced that one last call in repo_commit.rs/assert_in_blobstore() with an equivalent call to is_present.
Reviewed By: farnz
Differential Revision: D24016927
fbshipit-source-id: 764fddbebeb4b1192d196078b8824cf8a08e9691
Summary:
This diff introduces Mysql client for Rust to Mononoke as a one more backend in the same row with raw xdb connections and myrouter. So now Mononoke can use new Mysql client connections instead of Myrouter.
To run Mononoke with the new backend, pass `--use-mysql-client` options (conflicts with `--myrouter-port`).
I also added a new target for integration tests, which runs mysql tests using mysql client.
Now to run mysql tests using raw xdb connections, you can use `mononoke/tests/integration:integration-mysql-raw-xdb` and using mysql client `mononoke/tests/integration:integration-mysql`
Reviewed By: ahornby
Differential Revision: D23213228
fbshipit-source-id: c124ccb15747edb17ed94cdad2c6f7703d3bf1a2
Summary: SomeFailedOthersNone should not consider write mostly blobstores None if all other stores Error
Reviewed By: farnz
Differential Revision: D23840334
fbshipit-source-id: 9838bead6fec0d5f920e4a788387025d0dacf80b
Summary: Add a test for SomeFailedOthersNone when write mostly blobstore is None
Reviewed By: farnz
Differential Revision: D23840685
fbshipit-source-id: 81834663169b3a522b9c08e0a36f0b91354916c7
Summary:
Implemented S3 blobstore
Isilon implements S3 as 1:1 mapping into filesystem, and it limits the maximum number of blobs in the single directory. To overcome it lets shard the keys using base64 encoding and making 2 level dir structure with 2 chars dir names.
Reviewed By: krallin
Differential Revision: D23562541
fbshipit-source-id: c87aca2410381a07babb191cbd8cf28233556e03
Summary:
I've re-backfilled some of blame values for configerator. But old values might
still be in memcache. To make sure that's not the case let's bump the memcache
key.
Reviewed By: krallin
Differential Revision: D23810971
fbshipit-source-id: c333a51ffb2babf7da808b276f9cfa31baaa105c
Summary:
We are currently logging only the outermost underlying error or context, not any of the lower level causes. This makes mononoke_blobstore_trace less useful!
This changes to use anyhow's alternate format that includes causes
Reviewed By: krallin
Differential Revision: D23708577
fbshipit-source-id: fa2e71734841e2b75d824c456dccf61c1fb13fd2
Summary:
In preparation of moving away from SSH as an intermediate entry point for
Mononoke, let Mononoke work with newly introduced Metadata. This removes any
assumptions we now make about how certain data is presented to us, making the
current "ssh preamble" no longer central.
Metadata is primarily based around identities and provides some
backwards-compatible entry points to make sure we can satisfy downstream
consumers of commits like hooks and logs.
Simarly we now do our own reverse DNS resolving instead of relying on what's
been provided by the client. This is done in an async matter and we don't rely
on the result, so Mononoke can keep functioning in case DNS is offline.
Reviewed By: farnz
Differential Revision: D23596262
fbshipit-source-id: 3a4e97a429b13bae76ae1cdf428de0246e684a27
Summary:
This imports the async-compression crate. We have an equivalent-ish in
common/rust, but it targets Tokio 0.1, whereas this community-supported crate
targets Tokio 0.2 (it offers a richer API, notably in the sense that we
can use it for Streams, whereas the async-compression crate we have is only for
AsyncWrite).
In the immediate term, I'd like to use this for transfer compression in
Mononoke's LFS Server. In the future, we might also use it in Mononoke where we
currently use our own async compression crate when all that stuff moves to
Tokio 0.2.
Finally, this also updates zstd: the version we link to from tp2 is actually
zstd 1.4.5, so it's a good idea to just get the same version of the zstd crate.
The zstd crate doesn't keep a great changelog, so it's hard to tell what has changed.
At a glance, it looks like the answer is not much, but I'm going to look to Sandcastle
to root out potential issues here.
Reviewed By: StanislavGlebik
Differential Revision: D23652335
fbshipit-source-id: e250cef7a52d640bbbcccd72448fd2d4f548a48a
Summary:
shed/sql library used mainly to communicate with Mysql db and to have a nice abstraction layer around mysql (which is used in production) and sqlite (integration tests). The library provided an interface, that was backed up from Mysql side my raw connections and by MyRouter.
This diff introduces a new backend - new Mysql client for Rust.
New backend is exposed as a third variant for the current model: sqlite, mysql (raw conn and myrouter) and mysql2 (new client). The main reason for that is the fact that the current shed/sql interface for Mysql
(1) heavily depends on mysql_async crate, (2) introduces much more complexity than needed for the new client and (3) it seems like this will be refactored and cleaned up later, old things will be deprecated.
So to not overcomplicate things by trying to implement the given interface for the new Mysql client, I tried to simplify things by adding it as a third backend option.
Reviewed By: farnz
Differential Revision: D22458189
fbshipit-source-id: 4a484b5201a38cc017023c4086e9f57544de68b8
Summary: When we're scrubbing blobstores, it's not actually a success state if a scrub fails to write. Report this back to the caller - no-one will usually be scrubbing unless they expect repair writes to succeed, and a failure is a sign that we need to investigate further
Reviewed By: mitrandir77
Differential Revision: D23601541
fbshipit-source-id: d328935af9999c944719a6b863d0c86b28c54f59
Summary:
With three blobstores in play, we have issues working out exactly what's wrong during a manual scrub. Make the error handling better:
1. Manual scrub adds the key as context for the failure.
2. Scrub error groups blobstores by content, so that you can see which blobstore is most likely to be wrong.
Reviewed By: ahornby, krallin
Differential Revision: D23565906
fbshipit-source-id: a199e9f08c41b8e967d418bc4bc09cb586bbb94b
Summary:
Previously we were not logging a redacted access if previous access was logged
less < MIN_REPORT_TIME_DIFFERENCE_NS ago. That doesn't work well with our
tests.
Let's instead add a sampling tunable.
Reviewed By: krallin
Differential Revision: D23595067
fbshipit-source-id: 47f6152945d9fdc2796fd1e74804e8bcf7f34940
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).
This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.
---
*Why now?* **:** The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.
---
Reviewed By: StanislavGlebik
Differential Revision: D23568780
fbshipit-source-id: b4b4a0aa683d236e2fdeb5b96d723ac2d84b9faf
Summary:
Before redacting something it would be good to check that this file is not
accessed by anything. Having log-only mode would help with that.
Reviewed By: ikostia
Differential Revision: D23503666
fbshipit-source-id: ae492d4e0e6f2da792d36ee42a73f591e632dfa4
Summary: When running manual scrub for a large repo with one empty store, we are doing one peek per key. For keys that have existed for some time this is unnecessary as we know the key should exist and slows down the scrub.
Reviewed By: farnz
Differential Revision: D23054582
fbshipit-source-id: d2222350157ca37aa31b7792214af4446129c692
Summary: Ctime is an Option<i64>, so rather than as_ctime()/into_ctime() use the fact that it's fairly small and Copy to just .ctime()
Reviewed By: krallin
Differential Revision: D23081739
fbshipit-source-id: be62912eca02e5c29d7473d6f386d98df11000dd
Summary: Add a non-thrift header to packblob so we can vary thrift protocol in future.
Reviewed By: farnz
Differential Revision: D22953758
fbshipit-source-id: a114a350105e75cbe57f6c824295d863c723f32f
Summary: Move it from `'static` BoxFutures to async_trait and lifetimes
Reviewed By: markbt
Differential Revision: D22927171
fbshipit-source-id: 637a983fa6fa91d4cd1e73d822340cb08647c57d
Summary:
Backfillers and other housekeeping processes can run so far ahead of the blobstore sync queue that we can't empty it from the healer task as fast as the backfillers can fill it.
Work around this by providing a new mode that background tasks can use to avoid filling the queue if all the blobstores are writing successfully. This has a side-effect of slowing background tasks to the speed of the slowest blobstore, instead of allowing them to run ahead at the speed of the fastest blobstore and relying on the healer ensuring that all blobs are present.
Future diffs will add this mode to appropriate tasks
Reviewed By: ikostia
Differential Revision: D22866818
fbshipit-source-id: a8762528bb3f6f11c0ec63e4a3c8dac08d0b4d8e
Summary: We were using a git snapshot of auto_impl from somewhere between 0.3 and 0.4; 0.4 fixes a bug around Self: 'lifetime constraints on methods that blocks work I'm doing in Mononoke, so update.
Reviewed By: dtolnay
Differential Revision: D22922790
fbshipit-source-id: 7bb68589a1d187393e7de52635096acaf6e48b7e
Summary:
- Enumerate API now provided via trait BlobstoreKeySource
- Implementation for Fileblob and ManifoldBlob
- Modified populate_healer to use new api
- Modified fixrepocontents to use new api
Reviewed By: ahornby
Differential Revision: D22763274
fbshipit-source-id: 8ee4503912bf40d4ac525114289a75d409ef3790
Summary:
There are two reasons to want a write quorum:
1. One or more blobstores in the multiplex are experimental, and we don't want to accept a write unless the write is in a stable blobstore.
2. To reduce the risk of data loss if one blobstore loses data at a bad time.
Make it possible
Reviewed By: krallin
Differential Revision: D22850261
fbshipit-source-id: ed87d71c909053867ea8b1e3a5467f3224663f6a
Summary:
With upcoming write quorum work, it'll be interesting to know all the failures that prevent a put from succeeding, not just the most recent, as the most recent may be from a blobstore whose reliability is not yet established.
Store and return all errors, so that we can see exactly why a put failed
Reviewed By: ahornby
Differential Revision: D22896745
fbshipit-source-id: a3627a04a46052357066d64135f9bf806b27b974
Summary:
We're going to add an SQL blobstore to our existing multiplex, which won't have all the blobs initially.
In order to populate it safely, we want to have normal operations filling it with the latest data, and then backfill from Manifold; once we're confident all the data is in here, we can switch to normal mode, and never have an excessive number of reads of blobs that we know aren't in the new blobstore.
Reviewed By: krallin
Differential Revision: D22820501
fbshipit-source-id: 5f1c78ad94136b97ae3ac273a83792ab9ac591a9
Summary:
Add a cmdlib argument to control cachelib zstd compression. The default behaviour is unchanged, in that the CachelibBlobstore will attempted compression when putting to the cache if the object is larger than the cachelib max size.
To make the cache behaviour more testable, this change also adds an option to do an eager put to cache without the spawn. The default remains to do a lazy fire and forget put into the cache with tokio::spawn.
The motivation for the change is that when running the walker the compression putting to cachelib can dominate CPU usage for part of the walk, so it's best to turn it off and let those items be uncached as the walker is unlikely to visit them again (it only revisits items that were not fully derived).
Reviewed By: StanislavGlebik
Differential Revision: D22797872
fbshipit-source-id: d05f63811e78597bf3874d7fd0e139b9268cf35d
Summary:
We have two deficiencies to correct in here; modernise the code without changing behaviour first to make it easier to later fix them.
Deficiency 1 is that we always call the `on_put` handler; we need a mode that doesn't do that unless a blobstore returns an error, for jobs not waiting on a human.
Deficiency 2 is that we accept a write after one blobstore accepts it; there's a risk of that being the only copy if a certain set of race conditions are met
Reviewed By: StanislavGlebik
Differential Revision: D22701961
fbshipit-source-id: 0990d3229153cec403717fcd4383abcdf7a52e58
Summary:
This diff adds a minimal workflow for running integrations tests for Mononoke. Currently only one test is run and it fails.
This also splits the regular Mononoke CI into separate files for Linux and Mac to match the current style in Eden repo.
There are the "scopeguard::defer" fixes here that somehow escaped the CI tests.
Some tweaks have been made to "integration_runner_real.py" to make it runnable outside FB context.
Lastly the change from using "[[ -v ... ]" to "[[ -n "${...:-}" ]]; in "library.sh" was made because the former is not supported by the default Bash version preinstalled on modern MacOS.
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/26
Reviewed By: krallin
Differential Revision: D22541344
Pulled By: lukaspiatkowski
fbshipit-source-id: 5023d147823166a8754be852c29b1e7b0e6d9f5f
Summary: Stage 1 of a migration - next step is to make all users of this trait use new futures, and then I can come back, add lifetimes and references, and leave it modernised
Reviewed By: StanislavGlebik
Differential Revision: D22460164
fbshipit-source-id: 94591183912c0b006b7bcd7388a3d7c296e60577
Summary: D22381744 updated the version of `futures` in third-party/rust to 0.3.5, but did not regenerate the autocargo-managed Cargo.toml files in the repo. Although this is a semver-compatible change (and therefore should not break anything), it means that affected projects would see changes to all of their Cargo.toml files the next time they ran `cargo autocargo`.
Reviewed By: dtolnay
Differential Revision: D22403809
fbshipit-source-id: eb1fdbaf69c99549309da0f67c9bebcb69c1131b
Summary:
This updates the virtually_sharded_blobstore to deduplicate puts only if the
data being put is actually the data we have put in the past. This is done by
keeping track of the hash of things we've put in the presence cache.
This has 2 benefits:
- This is safer. We only dedupe puts we 100% know succeeded (because this
particular instance was the one to attempt the put).
- This is creates less surprises, notably it lets us overwrite data in the
backing store (if we are writing something different).
Reviewed By: StanislavGlebik
Differential Revision: D22392809
fbshipit-source-id: d76a49baa9a5749b0fb4865ee1fc1aa5016791bc
Summary:
Running those on my devserver, I noticed they can be a bit flaky. They're are
racy on the purpose, but let's relax them a bit.
We have a lot of margin here — our blobstore is rate limited at once request
every 10ms, and we need to do 100 requests (the goal is to show that they don't
all wait), so 100ms is fine to prove that they're not rate limited when sharing
the same data.
Reviewed By: StanislavGlebik
Differential Revision: D22392810
fbshipit-source-id: 2e3c9cdf19b0e4ab979dfc000fbfa8da864c4fd6
Summary:
There is inevitably interaction between caching, deduplication and rate
limiting:
- You don't want the rate limiting to be above caching (in the blobstore stack,
that is), because you shouldn't rate limits cache hits (this is where we are
today).
- You don't want the rate limiting to below deduplication, because then you get
priority inversion where a low-priority rate-limited request might hold the
semaphore while a higher-priority, non rate limited request wants to do the
same fetch (we could have moved rate limiting here prior to introducing
deduplication, but I didn't do it earlier because I wanted to eventually
introduce deduplication).
So, now that we have caching and deduplication in the same blobstore, let's
also incorporate rate limiting there!.
Note that this also brings a potential motivation for moving Memcache into this
blobstore, in case we don't want rate limiting to apply to requests before they
go to the _actual_ blobstore (I did not do this in this diff).
The design here when accessing the blobstore is as follows:
- Get the semaphore
- Check if the data is in cache, if so release the semaphore and return the
data.
- Otherwise, check if we are rater limited.
Then, if we are rate limited:
- Release the semaphore
- Wait for our turn
- Acquire the semaphore again
- Check the cache again (someone might have put the data we want while we were
waiting).
- If the data is there, then return our rate limit token.
- If the data isn't there, then proceed to query the blobstore.
If we aren't rate limited, then we just proceed to query the blobstore.
There are a couple subtle aspects of this:
- If we have a "late" cache hit (i.e. after we waited for rate limiting), then
we'll have waited but we won't need to query the blobstore.
- This is important when a large number of requests from the same key
arrive at the same time and get rate limited. If we don't do this second
cache check or if we don't return the token, then we'll consume a rate
limiting token for each request (instead of 1 for the first request).
- If a piece of data isn't cacheable, we should treat it like a cache hit with
regard to semaphores (i.e. release early), but like a miss with regard to
rate limits (i.e. wait).
Both of those are addressed captured in the code by returning the `Ticket` on a
cache hit. We can then choose to either return the ticket on a cache hit, or wait
for it on a cache miss.
(all of this logic is captured in unit tests, we can remove any of the blocks
there in `Shards::acquire` and a test will fail)
Reviewed By: farnz
Differential Revision: D22374606
fbshipit-source-id: c3a48805d3cdfed2a885bec8c47c173ee7ebfe2d
Summary:
If anything were to go wrong, we'd be happy to know which puts we ignored. So,
let's log them.
Reviewed By: farnz
Differential Revision: D22356714
fbshipit-source-id: 5687bf0fc426421c5f28b99a9004d87c97106695
Summary:
I canaried this on Fastreplay, but unfortunately that showed that sometimes we
just deadlock, or get so slow we might as well be deadlocked (and it happens
pretty quickly, after ~20 minutes). I tried spawning all the `get()` futures,
and that fixes the problem (but it makes gettreepack noticeably slower), so
that suggests something somewhere is creating futures, polling them a little
bit, then never driving them to completion.
For better or worse, I'd experienced the exact same problem with the
ContextConcurrencyBlobstore (my initial attempt at QOS, which also used a
semaphore), so I was kinda expecting this to happen.
In a sense, this nice because I we've suspected there were things like that in
the codebase for a while (e.g. with the occasional SQL timeout we see where it
looks like MySQL responds fast but we don't actually poll it until past the
timeout), and it gives us a somewhat convenient repro.
In another sense, it's annoying because it blocks this work :)
So, to work around the problem, for now, let's spawn futures to force the work
to complete when a semaphore is held. I originally had an unconditional spawn
here, but that is too expensive for the cache-hit code path and slows things
down (by about ~2x).
However, having it only if we'll query the blobstore isn't not as expensive,
and that seems to be fine (in fact it is a ~20% p99 perf improvement,
though the exact number depends on the number of shard we use for this, which I've had to tweak a bit).
https://pxl.cl/1c18H
I did find what I think is one potential instance of this problem in
`bounded_traversal_stream`, which is that we never try to poll `scheduled` to
completion. Instead, we just poll for the next ready future in our
FuturesUnordered, and if that turns out to be synchronous work then we'll just
re-enqueue more stuff (and sort of starve async work in this FuturesUnordered).
I tried updating bounded traversal to try a fairer implementation (which polls
everything), but that wasn't sufficient to make the problem go away, so I think
this is something we have to just accept for now (note that this actually has
some interesting perf impact in isolation: it's a free ~20% perf improvement on
p95+: https://pxl.cl/1c192
see 976b6b92293a0912147c09aa222b2957873ef0df if you're curious
Reviewed By: farnz
Differential Revision: D22332478
fbshipit-source-id: 885b84cda1abc15c51fbc5dd34473e49338e13f4