Summary:
This modifies our object popularity mechanism to account for the size of the
objects being downloaded. Indeed, considering our bottleneck is bandwidth, we
forcing similar routing for 10 downloads of a 10MB object and 10 downloads of a
1GB object doesn't make that much sense.
This diffs updates our counting so that we now record the object size instead
of a count. I'll set up routing so that we disallow consistent routing when a
single object exceeds 250MiB/s of throughput ( = 1/4th of a task).
It's worth noting that this will be equivalent to what we have right now for
our most problematic objects (GraphQL schemas in Fbsource, 35M each), given
that we "unroute" at 150 requests / 20 seconds (`150 * 35 / 20 = 262`).
The key difference here is that this will work for all objects.
This does mean LFS needs to cache and know about content metadata. That's not
actually a big deal. Indeed, over a week, we serve 25K distinct objects
(https://fburl.com/scuba/mononoke_lfs/a2d26s1a), so considering content
metadata is a hundred bytes (and so is the key), we are looking at a few MBs of
cache space here.
As part of this, I've also refactored our config handling to stop duplicating
structures in Configerator and LFS by using the Thrift objects directly (we
still have a few dedicated structs when post-processing is necessary, but we
no longer have anything that deserializes straight from JSON).
Note that one further refinement here would be to consistently route but to
more tasks (i.e. return one of 2 routing tokens for an object that is being downloaded
at 500MiB/s). We'll see if we need that.
Reviewed By: HarveyHunt
Differential Revision: D24361314
fbshipit-source-id: 49e1f86cf49357f60f1eac298a753e0c1fcbdbe5
Summary: Convert `ChangesetFetcher` to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25244213
fbshipit-source-id: 4207386d81397a930a566db008019bb8f31bf602
Summary:
Move expected_item_size_byte into CachelibSettings, seems like it should be there.
To enable its use also exposes a parse_and_init_cachelib method for callers that have different defaults to default cachelibe settings.
Reviewed By: krallin
Differential Revision: D24761024
fbshipit-source-id: 440082ab77b5b9f879c99b8f764e71bec68d270e
Summary: convert mercurial_derived_data to new type futures
Reviewed By: ahornby
Differential Revision: D25220329
fbshipit-source-id: c2532a12e915b315fe6eb72f122dbc37822bbb2a
Summary:
This diff prepares the Mononoke codebase for composition-based extendability of
`ScubaSampleBuilder`. Specifically, in the near future I will add:
- new methods for verbose scuba logging
- new data field (`ObservabilityContext`) to check if verbose logging should
be enabled or disabled
The higher-level goal here is to be able to enable/disable verbose Scuba
logging (either overall or for certain slices of logs, like for a certain
session id) in real time, without restarting Mononoke. To do so, I plan to
expose the aforementioned verbose logging methods, which will run a check
against the stored `ObservabilityContext` and make a decision of whether the
logging is enabled or not. `ObservabilityContext` will of course hide
implementation details from the renamed `ScubaSampleBuilderExt`, and just provide a yes/no
answer based on the current config and sample fields.
At the moment this should be a completely harmless change.
Reviewed By: krallin
Differential Revision: D25211089
fbshipit-source-id: ea03dda82fadb7fc91a2433e12e220582ede5fb8
Summary:
- convert save_bonsai_changesets to new type futures
- `blobrepo:blobrepo` is free from old futures deps
Reviewed By: StanislavGlebik
Differential Revision: D25197060
fbshipit-source-id: 910bd3f9674094b56e1133d7799cefea56c84123
Summary: convert `BlobRepo::get_bonsai_bookmark` to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25188577
fbshipit-source-id: fb6f2b592b9e9f76736bc1af5fa5a08d12744b5f
Summary: Prefix old futures with `Old` so it would be possible to start conversion of BlobRepo to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25187882
fbshipit-source-id: d66debd2981564b289d50292888f3f6bd3343e94
Summary:
We have `HgBlobChangeset`, `HgFileEnvelope`, `HgManifestEnvelope` ... but we
also have `BlobManifest`. Let's be a little consistent.
Reviewed By: markbt
Differential Revision: D25122288
fbshipit-source-id: 9ae0be49986dbcc31cee9a46bd30093b07795c62
Summary:
We have 4 different ways of awaiting futures in there: sometimes we create a
runtime, sometimes we use async-unit, sometimes we use `fbinit::test` and
sometimes we use `fbinit::compat_test`. Let's be consistent.
While in there, let's also get rid of `should_panic`, since that's not a very
good way of running tests.
Reviewed By: HarveyHunt
Differential Revision: D25186195
fbshipit-source-id: b64bb61935fb2132d2e5d8dff66fd4efdae1bf64
Summary:
HgBlobEntry is kind of a problematic type right now:
- It has no typing to indicate if it's a file or a manifest
- It always has a file type, but we only sometimes care about it
This results in us using `HgBlobEntry` to sometimes represent
`Entry<HgManifestId, (FileType, HgFileNodeId)>`, and some other times to
represent `Entry<HgManifestId, HgFileNodeId>`.
This makes code a) confusing, b) harder to refactor because you might be having
to change unrelated code that cares for the use case you don't care about (i.e.
with or without the FileType), and c) results in us sometimes just throwing in
a `FileType::Normal` because we don't care about the type, which is prone to
breaking stuff.
So, this diff just removes it, and replaces it with the appropriate types.
Reviewed By: farnz
Differential Revision: D25122291
fbshipit-source-id: e9c060c509357321a8059d95daf22399177140f1
Summary: Remove 'static requirement for async methods of Blobstore, propagate this change and fixup low hanging fruits where the code can become 'static free easily.
Reviewed By: ahornby, farnz
Differential Revision: D24839054
fbshipit-source-id: 5d5daa04c23c4c9ae902b669b0a71fe41ee6dee6
Summary:
This isn't actually being consulted anywhere save for a single test, so let's
just remove it (it's not like the test checks anything important — that field
might not as well exist given we never read it).
Reviewed By: farnz
Differential Revision: D25093494
fbshipit-source-id: 5f4a53f8666fc0e8a89ceade44baa96e71fb813f
Summary: Now that `derive03` is the only version available, rename it to `derive`.
Reviewed By: krallin
Differential Revision: D24900106
fbshipit-source-id: c7fbf9a00baca7d52da64f2b5c17e3fe1ddc179e
Summary:
Under this configuration SegmentedChangelog Dags (IdDag + IdMap) are always
downloaded from saves. There is no real state kept in memory.
It's a simple configuration and somewhat flexible with treaks to blobstore
caching.
Reviewed By: krallin
Differential Revision: D24808330
fbshipit-source-id: 450011657c4d384b5b42e881af8a1bd008d2e005
Summary:
Allow users of `tests_utils` to create paths that are not `String`, by supporting any type
that can be converted into `MPath`.
Reviewed By: StanislavGlebik
Differential Revision: D24887002
fbshipit-source-id: 47ad567507185863c1cfa3c6738f30aa9266901a
Summary: It is possible that hash of newly created bonsai_changeset will be different from what is in prod repo. In this case let's fetch bonsai from the prod, to make backup repo consistent with prod.
Reviewed By: StanislavGlebik
Differential Revision: D24593003
fbshipit-source-id: 70496c59927dae190a8508d67f0e3d5bf8d32e5c
Summary:
This updates the external facing API of the filestore to use 0.3 streams.
Internally, there is still a bit of 0.3 streams, but as of this change, it's
all 0.3 outside.
This required a few changes here and there in places where it was simpler to
just update them to use 0.3 futures instead of `compat()`-ing everything.
Reviewed By: ikostia
Differential Revision: D24731298
fbshipit-source-id: 18a1dc58b27d129970a6aa2d0d23994d5c5de6aa
Summary: Like it says in the title.
Reviewed By: StanislavGlebik
Differential Revision: D24731300
fbshipit-source-id: b9c44fc1e4bd4cfe8655e1024a0547e40fb99424
Summary:
Like it says in the title. This required quite a lot of changes at callsites,
as you'd expect.
Reviewed By: StanislavGlebik
Differential Revision: D24731299
fbshipit-source-id: e58447e88dcc3ba1ab3c951f87f7042e2b03eb2c
Summary: Like it says in the title. This updates `store()` and its (many) callsites.
Reviewed By: ahornby
Differential Revision: D24728658
fbshipit-source-id: 5fccf76d25e58eaf069f3f0cf5a31d2c397687ea
Summary:
This updates the metadata APIs in the filestore to futures 0.3 & async / await.
This changes the external API of the filestore, so there's quite a bit of churn
outside of that module.
Reviewed By: markbt
Differential Revision: D24727255
fbshipit-source-id: 59833f185abd6ab9c609c6bcc22ca88ada6f1b42
Summary: As part of the effort to deprecate futures 0.1 in favor of 0.3 I want to create a new futures_ext crate that will contain some of the extensions that are applicable from the futures_01_ext. But first I need to reclame this crate name by renaming the old futures_ext crate. This will also make it easier to track which parts of codebase still use the old futures.
Reviewed By: farnz
Differential Revision: D24725776
fbshipit-source-id: 3574d2a0790f8212f6fad4106655cd41836ff74d
Summary: We're getting rid of old futures - remove them as a dep here
Reviewed By: StanislavGlebik
Differential Revision: D24705787
fbshipit-source-id: 83ae938be0c9f7f485c74d3e26d041e844e94a43
Summary:
Extend MononokeApp so admin apps can have a special default put behaviour (typically
overwrite) vs the soon to be new default of IfAbsent
Use it from the admin tools.
Reviewed By: farnz
Differential Revision: D24623094
fbshipit-source-id: 5709c68429f8e1de0535eec132998d20411fc0e6
Summary: SQLBlob GC (next diff in stack) will need a ConfigStore in SQLBlob. Make one available to blobstore creation
Reviewed By: krallin
Differential Revision: D24460586
fbshipit-source-id: ea2d5149e0c548844f1fd2a0d241ed0647e137ae
Summary:
- convert ManifestOps to new style futures
- at this point `//eden/manifest:manifest` crate is completely free from old style futures
Reviewed By: krallin
Differential Revision: D24502214
fbshipit-source-id: f1cdb11bd8234f22af5c905243f71e1e9fca11f1
Summary:
Our case conflict checking is very inefficient on large changesets. The root
cause is that we traverse the parent manifest for every single file we are
modifying in the new changeset.
This results in very poor performance on large changes since we end up
reparsing manifests and doing case comparisons a lot more than we should. In
some pathological cases, it results in us taking several *minutes* to do a case
conflict check, with all of that time being spent on CPU lower-casing strings
and deserializing manifests.
This is actually a step we do after having uploaded all the data for a commit,
so this is pure overhead that is being added to the push process (but note it's
not part of the pushrebase critical section).
I ended up looking at this issue because it is contributing to the high
latencies we are seeing in commit cloud right now. Some of the bundles I
checked had 300+ seconds of on-CPU time being spent to check for case
conflicts. The hope is that with this change, we'll get fewer pathological
cases, and might be able to root cause remaining instances of latency (or have
that finally fixed).
This is pretty easy to repro.
I added a binary that runs case conflict checks on an arbitrary commit, and
tested it on `38c845c90d59ba65e7954be001c1eda1eb76a87d` (a commit that I noted
was slow to ingest in commit cloud, despite all its data being present already,
meaning it was basically a no-op). The old code takes ~3 minutes. The new one
takes a second.
I also backtested this by rigging up the hook tailer to do case conflict checks
instead (P145550763). It is about the same speed for most commits (perhaps
marginally slower on some, but we're talking microseconds here), but for some
pathological commits, it is indeed much faster.
This notably revealed one interesting case:
473b6e21e910fcdf7338df66ee0cbeb4b8d311989385745151fa7ac38d1b46ef (~8K files)
took 118329us in the new code (~0.1s), and 86676677us in the old (~87 seconds).
There are also commits with more files in recent history, but they're
deletions, so they are just as fast in both (< 0.1 s).
Reviewed By: StanislavGlebik
Differential Revision: D24305563
fbshipit-source-id: eb548b54be14a846554fdf4c3194da8b8a466afe
Summary:
I'm reworking some of our case conflict handling, and as part of this, I'm
going to be using check_case_conflicts for all our checking of case conflicts,
and notably for the case where we introduce a new commit and check it against
its parent (which, right now, does not check for case conflicts).
To do this and provide a good user experience (i.e. indicate which files
conflicted and with what), I need `check_case_conflicts` to report what files
the change conflicts with. This is what this diff does.
This does mean a few more allocations, so I "paid those off" by updating our
case lowering to allocate one fewer Vec and one fewer String per MPathElement
being lowercased.
Reviewed By: StanislavGlebik
Differential Revision: D24305562
fbshipit-source-id: 8ac14466ba3e84a3ee3d9216a84c2d9125a51b86
Summary:
This trait is no longer used all that much outsides of a handful of tests, the
walker, and an admin subcommand, as it has been replaced by the `Manifest`
trait, which works over all kinds of Manifests, and has stronger typing (its
sub-entries always have a path, and they are wrapped in an enum that knows if
they're leaves or trees).
This left a bunch of old legacy code here or there, which is worth removing
to make sure we don't introduce any new callsites to this. Another motivation
is that this legacy code is often not very compatible with new code, and has
historically made it a bit tricky (everything owns a blobstore in this code,
which is pretty awkward and not at all how we do things nowadays).
There is, I think, a bit more potential here since we could also perhaps try to
remove the `HgBlobEntry` struct, but that has a callsites still, so I'm not
doing this here.
Reviewed By: StanislavGlebik
Differential Revision: D24306946
fbshipit-source-id: 8a73dbbf40a904ce19ac65d791b732091c206263
Summary: Update Memblob::new callsites to ::default() in preparation for adding arguments to ::new() to specify the put behaviour desired
Differential Revision: D24021173
fbshipit-source-id: 07bf4e6c576ba85c9fa0374d5aac57a533132448
Summary:
Remove assert_present from Blobstore trait as it had only one callsite other than the various blobstore layers/impls.
Replaced that one last call in repo_commit.rs/assert_in_blobstore() with an equivalent call to is_present.
Reviewed By: farnz
Differential Revision: D24016927
fbshipit-source-id: 764fddbebeb4b1192d196078b8824cf8a08e9691
Summary:
This completely converts mercurial changeset to be an instance of derived data:
- Custom lease logic is removed
- Custom changeset traversal logic is removed
Naming scheme of keys for leases has been changed to conform with other derived data types. This might cause temporary spike of cpu usage during rollout.
Reviewed By: farnz
Differential Revision: D23575777
fbshipit-source-id: 8eb878b2b0a57312c69f865f4c5395d98df7141c
Summary:
This change move logic associated with mercurial changeset derivation to `mercurial_derived_data` crate.
NOTE: it is not converted to derived data infrastructure at this point, it is a preparation step to actually do this
Reviewed By: farnz
Differential Revision: D23573610
fbshipit-source-id: 6e8cbf7d53ab5dbd39d5bf5e06c3f0fc5a8305c8
Summary: Moving some of the functionality (which is required for mercurial changeset derivation) into a separate crate. This is required to convert mercurial changeset to derived data to avoid circular dependency it would create otherwise.
Reviewed By: StanislavGlebik
Differential Revision: D23566293
fbshipit-source-id: 9d30b4b3b7d8a922f72551aa5118c43104ef382c
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).
This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.
---
*Why now?* **:** The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.
---
Reviewed By: StanislavGlebik
Differential Revision: D23568780
fbshipit-source-id: b4b4a0aa683d236e2fdeb5b96d723ac2d84b9faf
Summary:
Before redacting something it would be good to check that this file is not
accessed by anything. Having log-only mode would help with that.
Reviewed By: ikostia
Differential Revision: D23503666
fbshipit-source-id: ae492d4e0e6f2da792d36ee42a73f591e632dfa4
Summary:
In the next diff I'm going to add log-only mode to redaction, and it would be
good to have a way of testing it (i.e. testing that it actually logs accesses
to bad keys).
In this diff let's use a config option that allows logging censored scuba
accesses to file, and let's update redaction integration test to use it
Reviewed By: ikostia
Differential Revision: D23537797
fbshipit-source-id: 69af2f05b86bdc0ff6145979f211ddd4f43142d2