Summary:
Paying the setup and teardown overhead of multiple processes seems silly when we can pack in parallel within a single process.
Make it possible to run multiple packing runs from a single packer process.
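As a hedged sketch of the shape of this change (the real packer's types and entry points differ; `pack_one_run` is a hypothetical stand-in for a single packing run):
```
use anyhow::Result;
use futures::future::try_join_all;

// Hypothetical stand-in for a single packing run over a set of keys.
async fn pack_one_run(keys: Vec<String>) -> Result<()> {
    // ... fetch blobs for `keys`, compress them into a pack, write it out ...
    let _ = keys;
    Ok(())
}

// Drive several packing runs concurrently from a single process, paying the
// process setup and teardown costs only once.
async fn pack_all_runs(runs: Vec<Vec<String>>) -> Result<()> {
    try_join_all(runs.into_iter().map(pack_one_run)).await?;
    Ok(())
}
```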
Reviewed By: ahornby
Differential Revision: D28508527
fbshipit-source-id: eab07d028db46d62731f06effbde2f5bc5579000
Summary:
manual_scrub was using the ordered form of buffered so that the checkpoint was written correctly.
This diff switches to buffered_unordered, which can give better throughput. To keep checkpointing correct, it uses a tracker to know which keys have completed, so it can save the latest completed key that has no preceding keys still executing.
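A minimal sketch of the tracker idea, using sequence numbers to stand in for scrub keys (illustrative only, not the actual scrub code):
```
use std::collections::BTreeSet;

// Keys are issued in order; `complete` marks one as done and reports the
// newest key that is safe to checkpoint (no earlier key still executing).
struct CheckpointTracker {
    issued: u64,         // keys 0..issued have been handed out, in order
    done: BTreeSet<u64>, // completed keys at or beyond the frontier
    frontier: u64,       // every key < frontier has completed
}

impl CheckpointTracker {
    fn new() -> Self {
        Self { issued: 0, done: BTreeSet::new(), frontier: 0 }
    }

    fn issue(&mut self) -> u64 {
        let key = self.issued;
        self.issued += 1;
        key
    }

    // Mark `key` complete; if the contiguous-completion frontier advanced,
    // return the new safe checkpoint.
    fn complete(&mut self, key: u64) -> Option<u64> {
        self.done.insert(key);
        let old = self.frontier;
        while self.done.remove(&self.frontier) {
            self.frontier += 1;
        }
        (self.frontier > old).then(|| self.frontier - 1)
    }
}
```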
Reviewed By: farnz
Differential Revision: D28438371
fbshipit-source-id: 274aa371a0c33d37d0dc7779b04daec2b5e1bc15
Summary:
This struct is intended to be a single entry-point to the megarepo logic. It is also intended to be owned directly by scs_server, without the `Mononoke` struct (from `mononoke_api`) as an intermediary. In effect, this means that mononoke server won't be affected by `MegarepoApi` at all.
Apart from adding this struct, this diff also adds instantiation of prod `AsyncMethodRequestQueue` and wires it up to the scs_server to enqueue and poll requests.
Reviewed By: StanislavGlebik
Differential Revision: D28356563
fbshipit-source-id: b67ee48387d794fd333d106d3ffd40124086c97e
Summary:
Adding a mapping to keep track of two things:
1) the latest source commit that was synced into a given target - this will be used by the sync_changeset() method to validate whether a parent changeset of a given changeset was already synced
2) which source commit maps to what target commit
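A rough sketch of the mapping interface this implies (hypothetical names and types, not the actual crate's API):
```
use anyhow::Result;
use async_trait::async_trait;

// Placeholder types standing in for the real ones.
pub struct Target { pub repo_id: i64, pub bookmark: String }
#[derive(Clone, Copy)]
pub struct ChangesetId([u8; 32]);

#[async_trait]
pub trait MegarepoMapping: Send + Sync {
    // (2) record that `source_cs` was synced into `target_cs` for `target`.
    async fn insert(
        &self,
        target: &Target,
        source_cs: ChangesetId,
        target_cs: ChangesetId,
    ) -> Result<()>;

    // (1) the latest source commit synced into `target`; sync_changeset()
    // uses this to validate that a parent changeset was already synced.
    async fn get_latest_synced(&self, target: &Target) -> Result<Option<ChangesetId>>;
}
```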
Reviewed By: ikostia
Differential Revision: D28319908
fbshipit-source-id: f776d294d779695e99d644bf5f0a5a331272cc14
Summary:
This is going to be used to rewrite (or transform) commits from source to
target. This diff does a few things:
1) adds a MultiMover type and a function that produces a mover given a config. This is similar to the Mover type we used for the fbsource <-> ovrsource megarepo sync, though this time it can produce several target paths for a given source path.
2) moves the `rewrite_commit` function from cross_repo_sync to megarepo_api, and makes it work with MultiMover.
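As a sketch, the relationship between the two mover shapes (an `MPath` placeholder stands in for Mononoke's path type; signatures are illustrative):
```
use anyhow::Result;
use std::sync::Arc;

// Placeholder standing in for Mononoke's MPath.
#[derive(Clone)]
pub struct MPath(pub String);

// The existing fbsource <-> ovrsource mover: one source path maps to at
// most one target path.
pub type Mover = Arc<dyn Fn(&MPath) -> Result<Option<MPath>> + Send + Sync>;

// The new multi-mover: one source path can map to several target paths, so
// a single source file may end up in multiple places in the target.
pub type MultiMover = Arc<dyn Fn(&MPath) -> Result<Vec<MPath>> + Send + Sync>;
```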
Reviewed By: ikostia
Differential Revision: D28259214
fbshipit-source-id: 16ba106dc0c65cb606df10c1a210578621c62367
Summary:
This crate is a foundation for async request support in the megarepo service.
The idea is to be able to store serialized parameters in the blobstore upon
request arrival, and to be able to query request results from the blobstore
while polling.
This diff manipulates the following classes of types:
- param types for async methods: self-explanatory
- response types: these contain only a resulting value of a completed successful execution
- stored result types: these contain a result value of a completed execution. It may either be successful or failed. These types exist for the purpose of preserving execution result in the blobstore.
- poll-response types: these contain an optional response. If the optional value is empty, the request is not yet ready
- polling tokens: these are used by the client to ask about the processing status for a submitted request
Apart from that, some of these types have both Rust and Thrift counterparts, mainly so that we can implement traits for the Rust types.
Relationships between these types are encoded in various traits and their associated types.
The lifecycle of an async request is therefore as follows:
1. the request is submitted by the client, and enqueued:
   1. params are serialized and saved into a blobstore
   2. an entry is created in the SQL table
   3. the key from that table is used to create a polling token
2. some external system processes the request [completely absent from this diff]:
   1. it notices a new entry in the queue
   2. it reads the request's params from the blobstore
   3. it processes the request
   4. it preserves either a success or a failure of the request into the blobstore
   5. it updates the SQL table to mark that the request is now ready to be polled
3. the client polls the request:
   1. the queue struct receives a polling token
   2. out of that token it constructs DB keys
   3. it looks up the request row and checks if it is in the ready state
   4. if that is the case, it reads the result_blobstore_key value and fetches the serialized result object
   5. now it has to turn this serialized result into a poll response (see the sketch below):
      - if the result is absent, the poll response is a success with an empty payload
      - if the result is present and successful, the poll response is a success with the result's successful variant as a payload
      - if the result is present and is a failure, the polling call throws a thrift exception with that failure
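A sketch of that last branching step (hypothetical Rust stand-ins; the real types are Thrift-generated):
```
// Hypothetical stand-ins for the Thrift-generated types.
enum StoredResult<R, E> {
    Success(R),
    Failure(E),
}

struct PollResponse<R> {
    // None means "not ready yet"; Some(r) is a completed success.
    response: Option<R>,
}

fn poll_response<R, E>(stored: Option<StoredResult<R, E>>) -> Result<PollResponse<R>, E> {
    match stored {
        // Result absent: the request is still running.
        None => Ok(PollResponse { response: None }),
        // Completed successfully: the successful variant becomes the payload.
        Some(StoredResult::Success(r)) => Ok(PollResponse { response: Some(r) }),
        // Completed with a failure: surfaced as a thrift exception to the poller.
        Some(StoredResult::Failure(e)) => Err(e),
    }
}
```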
Note: Why is there yet another .thrift file introduced in this diff? I felt like these types aren't a part of the scs interface, so they don't belong in `source_control.thrift`. On the other hand, they wrap things defined in `source_control.thrift`, so I needed to include it.
Reviewed By: StanislavGlebik
Differential Revision: D27964822
fbshipit-source-id: fc1a33a799d01c908bbe18a5394eba034b780174
Summary:
Because cachelib is not initialised at this point, it returns `None` unconditionally.
I'm refactoring the cachelib bindings so that this will return an error - take it out completely for now, leaving room to add it back in if caching proves useful here.
Reviewed By: sfilipco
Differential Revision: D28286986
fbshipit-source-id: cd9f43425a9ae8f0eef6fd32b8cd0615db9af5f6
Summary:
This is where async requests are logged to be processed, and from where they
are polled later.
It will acquire more functionality when the actual request processing business
logic is implemented.
Reviewed By: StanislavGlebik
Differential Revision: D28092910
fbshipit-source-id: 00e45229aa2db73fa0ae5a1cf99b8f2d3a162006
Summary: Upstream crate has landed my PR for zstd 1.4.9 support and made a release, so we can remove this patch now.
Reviewed By: ikostia
Differential Revision: D28221163
fbshipit-source-id: b95a6bee4f0c8d11f495dc17b2737c9ac9142b36
Summary:
We used to carry patches for Tokio 0.2 to add support for disabling Tokio coop
(which was necessary to make Mononoke work with it), but this was upstreamed
in Tokio 1.x (as a different implementation), so that's no longer needed. Nobody
else besides Mononoke was using this.
For Hyper we used to carry a patch with a bugfix. This was also fixed in Tokio
1.x-compatible versions of Hyper. There are still users of hyper-02 in fbcode.
However, the bug only affected servers and only when accepting websocket
connections, and those remaining users are just using Hyper as an HTTP client.
Reviewed By: farnz
Differential Revision: D28091331
fbshipit-source-id: de13b2452b654be6f3fa829404385e80a85c4420
Summary:
This used to be used by Mononoke, but we're now on Tokio 1.x and on
corresponding versions of Gotham so it's not needed anymore.
Reviewed By: farnz
Differential Revision: D28091091
fbshipit-source-id: a58bcb4ba52f3f5d2eeb77b68ee4055d80fbfce2
Summary:
Keeping the `Changesets` trait as well as its implementations in the same crate means that users of `Changesets` also transitively depend on everything that is needed to implement it.
Flatten the dependency graph a little by splitting it into two crates: most users of `Changesets` will only depend on the trait definition. Only the factories need depend on the implementations.
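Schematically, the split looks like this (crate and method names illustrative, not the exact definitions):
```
use anyhow::Result;
use async_trait::async_trait;

pub struct ChangesetEntry; // placeholder for the real entry type
pub struct ChangesetId;    // placeholder for the real id type

// Crate `changesets` (trait only): what most users depend on.
#[async_trait]
pub trait Changesets: Send + Sync {
    async fn add(&self, entry: ChangesetEntry) -> Result<bool>;
    async fn get(&self, cs_id: ChangesetId) -> Result<Option<ChangesetEntry>>;
}

// Crate `changesets_impl` (implementations): only the factories depend on
// this, keeping SQL and caching machinery out of most dependency graphs.
pub struct SqlChangesets { /* SQL connections, etc. */ }
```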
Reviewed By: krallin
Differential Revision: D27430612
fbshipit-source-id: 6b45fe4ae6b0fa1b95439be5ab491b1675c4b177
Summary:
NOTE: there is one final pre-requisite here, which is that we should default all Mononoke binaries to `--use-mysql-client` because the other SQL client implementations will break once this lands. That said, this is probably the right time to start reviewing.
There's a lot going on here, but Tokio updates being what they are, it has to happen as just one diff (though I did try to minimize churn by modernizing a bunch of stuff in earlier diffs).
Here's a detailed list of what is going on:
- I had to add a number of `cargo_toml_dir`s for binaries in `eden/mononoke/TARGETS`, because we have to use 2 versions of Bytes concurrently at this time, and the two cannot co-exist in the same Cargo workspace.
- Lots of little Tokio changes (see the sketch after this list):
  - Stream abstractions moving to `tokio-stream`
  - `tokio::time::delay_for` became `tokio::time::sleep`
  - `tokio::sync::watch::Sender::broadcast` became `tokio::sync::watch::Sender::send`
  - `tokio::sync::Semaphore::acquire` returns a `Result` now.
  - `tokio::runtime::Runtime::block_on` no longer takes a `&mut self` (just a `&self`).
  - `Notify` grew a few more methods with different semantics. We only use this in tests; I used what seemed logical given the use case.
- Runtime builders have changed quite a bit:
  - My `no_coop` patch is gone in Tokio 1.x, but it has a new `tokio::task::unconstrained` wrapper (also from me), which I included on `MononokeApi::new`.
  - Tokio now detects your logical CPUs, not physical CPUs, so we no longer need to use `num_cpus::get()` to figure it out.
- Tokio 1.x now uses Bytes 1.x:
  - At the edges (i.e. streams returned to Hyper or emitted by RepoClient), we need to return Bytes 1.x. However, internally we still use Bytes 0.5 in some places (notably: Filestore).
  - In LFS, this means we make a copy. We used to do that a while ago anyway (in the other direction) and it was never a meaningful CPU cost, so I think this is fine.
  - In Mononoke Server it doesn't really matter, because that still generates ... Bytes 0.1 anyway, so there was a copy before from 0.1 to 0.5, and now it's from 0.1 to 1.x.
  - In the very few places where we read stuff using Tokio from the outside world (historical import tools for LFS), we copy.
- tokio-tls changed a lot: they removed all the convenience methods around connecting. This resulted in updates to:
  - How we listen in Mononoke Server & LFS
  - How we connect in hgcli.
  - Note: all this stuff has test coverage.
- The child process API changed a little bit. We used to have a ChildWrapper around the hg sync job to make a Tokio 0.2.x child look more like a Tokio 1.x Child, so now we can just remove this.
- Hyper changed their Websocket upgrade mechanism (you now need the whole `Request` to upgrade, whereas before you needed just the `Body`), so I changed up our code a little bit in Mononoke's HTTP acceptor to defer splitting up the `Request` into parts until after we know whether we plan to upgrade it.
- I removed the MySQL tests that didn't use mysql client, because we're leaving that behind and don't intend to support it on Tokio 1.x.
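A small sketch of a few of the mechanical API changes above, with the Tokio 0.2 forms noted in comments (illustrative, not code from the diff):
```
use std::time::Duration;
use tokio::sync::Semaphore;

async fn examples() -> Result<(), Box<dyn std::error::Error>> {
    // Tokio 0.2: tokio::time::delay_for(Duration::from_secs(1)).await;
    tokio::time::sleep(Duration::from_secs(1)).await;

    // Tokio 0.2: Semaphore::acquire was infallible.
    // Tokio 1.x: it returns a Result (Err only if the semaphore is closed).
    let semaphore = Semaphore::new(10);
    let _permit = semaphore.acquire().await?;

    Ok(())
}

fn block_on_example() {
    // Tokio 1.x: Runtime::block_on takes &self rather than &mut self.
    let runtime = tokio::runtime::Runtime::new().unwrap();
    runtime.block_on(async { /* ... */ });
}
```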
Reviewed By: mitrandir77
Differential Revision: D26669620
fbshipit-source-id: acb6aff92e7f70a7a43f32cf758f252f330e60c9
Summary:
Knowing the prepushrebase changeset id is required for retroactive_review.
retroactive_review checks landed commits, but the verify_integrity hook runs on a commit before landing. Because of that, the landed commit has no straightforward connection with the original one, and retroactive_review can't tell whether verify_integrity has seen it.
Reviewed By: krallin
Differential Revision: D27911317
fbshipit-source-id: f7bb0cfbd54fa6ad2ed27fb9d4d67b9f087879f1
Summary:
Split the pushrebase crate into the pushrebase hook definition and the pushrebase implementation.
Before this change it was impossible to store an attribute in BlobRepo that would depend on PushrebaseHook, as it would create a circular dependency `pushrebase -> blobrepo -> pushrebase`.
Reviewed By: krallin
Differential Revision: D27968029
fbshipit-source-id: 030ef1b02216546cd3658de1f417bc52b8dd9843
Summary:
Update the zstd crates.
This also patches the async-compression crate to point at my fork until upstream PR https://github.com/Nemo157/async-compression/pull/117 (updating to zstd 1.4.9) can land.
Reviewed By: jsgf, dtolnay
Differential Revision: D27942174
fbshipit-source-id: 26e604d71417e6910a02ec27142c3a16ea516c2b
Summary:
`MononokeMegarepoConfig` is going to be a single point of access to the
config storage system, providing both writes and reads. It is also a trait, to
allow for unit-test implementations later.
This diff introduces the trait and implements the write side of the
configerator-based implementor. The read side, the oss impl, and the test impl
are left `unimplemented`; the read side and the test impl will be implemented in the future.
Things I had to consider while implementing this:
- I wanted to store each version of `SyncTargetConfig` in an individual
`.cconf` in configerator
- at the same time, I did not want all of them to live in the same dir, to
avoid having dirs with thousands of files in it
- dir sharding uses sha1 of the target repo + target bookmark + version name,
then separates it into a dir name and a file name, like git does
- this means that these `.cconf` files are not "human-addressable" in the
configerator repo
- to help this, each new config creation also creates an entry in one of the
"index" files: human-readable maps from target + version name to a
corresponding `.cconf`
- using a single index file is also impractical, so these are separated by an
asciification of the repo_id + bookmark name
Note: this design means that there's no automatic way to fetch the list of all
targets in use. This can be bypassed by maintaining an extra index layer, which
will list all the targets. I don't think this is very important atm.
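A sketch of the git-style dir sharding described above (the exact hash input and split point are assumptions; the shape is two-level sharding like git's object store):
```
use sha1::{Digest, Sha1};

// Build the storage path for a config version, sharded like git's objects/
// directory: the first two hex chars become the dir, the rest the file name.
fn config_path(repo_id: i64, bookmark: &str, version: &str) -> String {
    let mut hasher = Sha1::new();
    hasher.update(format!("{}_{}_{}", repo_id, bookmark, version));
    let hex: String = hasher
        .finalize()
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    format!("{}/{}.cconf", &hex[..2], &hex[2..])
}
```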
Reviewed By: StanislavGlebik
Differential Revision: D27795663
fbshipit-source-id: 4d824ee4320c8be5187915b23e9c9d261c198fe1
Summary: Remove gotham and hyper patches, as the referenced PRs have been released.
Reviewed By: krallin
Differential Revision: D27905248
fbshipit-source-id: a2b25562614b71b25536b29bb1657a3f3a5de83c
Summary:
We want to distinguish user vs system errors in `configo_client` and its users
(`mononoke_config` for instance). The reason is to allow `scs_server` to
distinguish the two types of errors. Normally user errors would only ever be
instantiated fairly "shallowly" in the `scs_server` itself, but `configo_client`
is a transactional client (by this I mean that it calls user-provided transformation functions on fetched data), so we need to allow for a user error to originate from these updater functions.
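A minimal sketch of the distinction (a hypothetical error type, not the actual one):
```
use thiserror::Error;

// Tag every error with who is at fault, so scs_server can map user errors
// and system errors to different response/exception types.
#[derive(Debug, Error)]
enum MegarepoError {
    #[error("user error: {0}")]
    User(anyhow::Error),
    #[error("system error: {0}")]
    System(anyhow::Error),
}
```
An updater function handed to `configo_client` can then return the `User` variant from deep inside the transactional flow, and it propagates out to `scs_server` unchanged.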
Reviewed By: StanislavGlebik
Differential Revision: D27880928
fbshipit-source-id: e00a5580a339fdabc4b858235da9cd7e9fc7a376
Summary:
This is a baseline packer, to let us experiment with packing.
It chooses the dictionary that gives us the smallest size, but does nothing intelligent around key handling.
Note that at this point, packing is not always a win - e.g. the alias blobs are 432 bytes in total, but the resulting pack is 1528 bytes.
Reviewed By: ahornby
Differential Revision: D27795486
fbshipit-source-id: 05620aadbf7b680cbfcb0932778f5849eaab8a48
Summary:
This is not used on its own, but in subsequent diffs I will add a use-case in
the megarepo configs crate.
When built in non-fbcode mode, this crate does not export anything. I chose this approach, as opposed to exporting no-op stubs, to force the clients to pay attention and implement gating on their side too. This seems reasonable for a rather generic configo client.
Reviewed By: StanislavGlebik
Differential Revision: D27790753
fbshipit-source-id: d6dcec884ed7aa88abe5796ef0e58be8525893e2
Summary: Now that we're on rustc 1.51, the fork is no longer needed.
Reviewed By: dtolnay
Differential Revision: D27827632
fbshipit-source-id: 131841590d3987d53f5f8afb5ebc205cd36937fb
Summary:
There is a very frustrating operation that happens often when working on the
Mononoke code base:
- You want to add a flag
- You want to consume it in the repo somewhere
Unfortunately, when we need to do this, we end up having to thread it through from a
million places and parse it out in every single main() we have.
This is a mess, and it results in every single Mononoke binary starting with
heaps of useless boilerplate:
```
let matches = app.get_matches();
let (caching, logger, mut runtime) = matches.init_mononoke(fb)?;
let config_store = args::init_config_store(fb, &logger, &matches)?;
let mysql_options = args::parse_mysql_options(&matches);
let blobstore_options = args::parse_blobstore_options(&matches)?;
let readonly_storage = args::parse_readonly_storage(&matches);
```
So, this diff updates us to just use MononokeEnvironment directly in
RepoFactory, which means none of that has to happen: we can now add a flag,
parse it into MononokeEnvironment, and get going.
While we're at it, we can also remove blobstore options and all that jazz from
MononokeApiEnvironment since now it's there in the underlying RepoFactory.
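For contrast, after this diff the intent is that the same setup collapses to something like this (a hypothetical sketch; `init_mononoke_environment` is an illustrative name, not the actual API):
```
let matches = app.get_matches();
let env: MononokeEnvironment = matches.init_mononoke_environment(fb)?;
let factory = RepoFactory::new(env);
// New flags get parsed into MononokeEnvironment once, and RepoFactory
// consumes them - no per-binary threading and parsing required.
```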
Reviewed By: HarveyHunt
Differential Revision: D27767700
fbshipit-source-id: e1e359bf403b4d3d7b36e5f670aa1a7dd4f1d209
Summary:
I'd like to be able to track time series for access within Mononoke. The
underlying use case here is that I want to be able to track the max count of
connections in our SQL connection pools over time (and possibly other things in
the future).
Now, the obvious question is: why am I rolling my own? Well, as it turns out,
there isn't really an implementation of this that I can reuse:
- You might expect to be able to track the max of a value via fb303, but you
can't:
https://www.internalfb.com/intern/diffusion/FBS/browse/master/fbcode/fb303/ExportType.h?commit=0405521ec858e012c0692063209f3e13a2671043&lines=26-29
- You might go look in Folly, but you'll find that the time series there only
supports tracking Sum & Average, but I want my timeseries to track Max (and
in fact I'd like it to be sufficiently flexible to track anything I want):
https://www.internalfb.com/intern/diffusion/FBS/browse/master/fbcode/folly/stats/BucketedTimeSeries.h
It's not the first time I've run into a need for something like this. I needed
it in RendezVous to track connections over the last 2 N millisecond intervals,
and we needed it in metagit for host draining as well (note that the
implementation here is somewhat inspired by the implementation there).
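A cut-down sketch of the idea: a bucketed time series whose buckets fold samples with an arbitrary aggregation (here, max), rather than only sum/average. This is illustrative, not the actual crate; a real implementation would also expire stale buckets when the window wraps.
```
use std::time::{Duration, Instant};

// A ring of time buckets; each bucket folds samples with `fold`, so the same
// structure can track max, min, sum, or anything else.
struct TimeSeries<T, F: Fn(Option<T>, T) -> T> {
    buckets: Vec<Option<T>>,
    bucket_len: Duration,
    start: Instant,
    fold: F,
}

impl<T: Copy, F: Fn(Option<T>, T) -> T> TimeSeries<T, F> {
    fn new(num_buckets: usize, bucket_len: Duration, fold: F) -> Self {
        Self {
            buckets: vec![None; num_buckets],
            bucket_len,
            start: Instant::now(),
            fold,
        }
    }

    // Fold `value` into the bucket that `now` falls in.
    fn add(&mut self, now: Instant, value: T) {
        let elapsed = now.duration_since(self.start).as_millis();
        let idx = (elapsed / self.bucket_len.as_millis()) as usize % self.buckets.len();
        self.buckets[idx] = Some((self.fold)(self.buckets[idx], value));
    }
}

// Tracking the max SQL connection count per one-second bucket:
// let mut ts = TimeSeries::new(60, Duration::from_secs(1),
//     |acc: Option<u64>, v| acc.map_or(v, |a| a.max(v)));
// ts.add(Instant::now(), current_connection_count);
```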
Reviewed By: mzr
Differential Revision: D27678388
fbshipit-source-id: ba6d244b8bb848d4e1a12f9c6f54e3aa729f6c9c
Summary: Update the `curl` and `curl-sys` crates to use a patched version that supports `CURLOPT_SSLCERT_BLOB` and similar config options that allow the use of in-memory TLS credentials. These options were added last year in libcurl version `7.71.0`, but the Rust bindings have not yet been updated to support them. I intend to upstream this patch, but until then, this will allow us to use these options in fbcode.
Reviewed By: quark-zju
Differential Revision: D27633208
fbshipit-source-id: 911e0b8809bc0144ad8b32749e71208bd08458fd
Summary: This has been superseded by `RepoFactory`.
Reviewed By: krallin
Differential Revision: D27400617
fbshipit-source-id: e029df8be6cd2b7f3a5917050520b83bce5630e9
Summary: Use the equivalent function from `repo_factory`.
Reviewed By: krallin
Differential Revision: D27363470
fbshipit-source-id: dce3cf843174caa2f9ef7083409e7935749be4cd
Summary:
Add a factory for building development and production repositories.
This factory can be re-used to build many repositories, and they will share
metadata database factories and blobstores if their configs match.
Similarly, the factory will only load redacted blobs once per metadata
database config, rather than once per repo.
Reviewed By: krallin
Differential Revision: D27323369
fbshipit-source-id: 951f7343af97f5e507b76fb822ad2e66f4d8f3bd
Summary: Reduce the number of manual steps needed to restart a manual scrub by checkpointing its progress to a file.
Differential Revision: D27588450
fbshipit-source-id: cb0eda7d6ff57f3bb18a6669d38f5114ca9196b0
Summary:
Some manual scrub runs can take a long time. Provide progress feedback via logging.
Includes a --quiet option for when progress reporting is not required.
Reviewed By: farnz
Differential Revision: D27588449
fbshipit-source-id: 00840cdf2022358bc10398f08b3bbf3eeec2b299
Summary: Use the test factory for the remaining existing tests.
Reviewed By: StanislavGlebik
Differential Revision: D27169443
fbshipit-source-id: 00d62d7794b66f5d3b053e8079f09f2532d757e7
Summary:
Create a factory that can be used to build repositories in tests.
The test repo factory will be kept in a separate crate to the production repo factory, so that it can depend on a smaller set of dependencies: just those needed for in-memory test repos. This should eventually help make compilation speeds faster for tests.
A notable difference between the test repos produced by this factory and the ones produced by `blobrepo_factory` is that the new repos share the in-memory metadata database. This is closer to what we use in production, and in a couple of places it is relied upon and existing tests must use `dangerous_override` to make it happen.
Reviewed By: ahornby
Differential Revision: D27169441
fbshipit-source-id: 82541a2ae71746f5e3b1a2a8a19c46bf29dd261c
Summary:
We need to bump SCS counters expressing Mononoke's QPS. They will look something like:
`requests:mononoke:oregon:carolina` for requests coming from proxygen in prn to mononoke in frc.
CSLB expects regions' full names.
We're getting the src region from proxygen as a header.
Reviewed By: krallin
Differential Revision: D27082868
fbshipit-source-id: 12accb8a9df5cf6a80c2c281d2f61ac1e68176d1
Summary:
Add `repo_derived_data`. This is a new struct that encapsulates derived data
configuration and lease operations, and will be used in the facet-based
construction of repositories.
Reviewed By: ahornby
Differential Revision: D27169431
fbshipit-source-id: dee7c032deb93db8934736c111ba7238a6aaf935
Summary:
Add `repo_identity`. This is a new struct that encapsulates repository
identity and will be used in the facet-based construction of repositories.
Reviewed By: ahornby
Differential Revision: D27169445
fbshipit-source-id: 02a435bba54a633190c6d2e4316e86726aecfdf0
Summary:
Resolve a circular dependency whereby `BlobRepo` needs to depend on
`Arc<dyn SegmentedChangelog>`, but the segmented changelog implementation
depends on `BlobRepo`, by moving the trait definition to its own crate.
Reviewed By: sfilipco
Differential Revision: D27169423
fbshipit-source-id: 5bf7c632607dc8baba40d7a9d65e96e265d58496
Summary:
This introduces a basic building block for query batching. I called this
rendezvous, since it's about multiple queries meeting up in the same place :)
There are a few (somewhat conflicting) goals this tries to satisfy, so let's go
over them:
1), we'd like to reduce the total number of queries made by batch jobs. For
example, group hg bonsai lookups made by the walker. Those jobs are
characterized by the fact that they have a lot of queries to make, all the
time. Here's an example: https://fburl.com/ods/zuiep7yh.
2), we'd like to reduce the overall number of connections held to MySQL by
our tasks. The main way we achieve this is by reducing the maximum number of
concurrent queries. Indeed, a high total number of queries doesn't necessarily
result in a lot of connections as long as they're not concurrent, because we
can reuse connections. On the other hand, if you dispatch 100 concurrent
queries, that _does_ use 100 connections. This is something that applies to
batch jobs due to their query volume, but also to "interactive" jobs like
Mononoke Server or SCS, just not all the time. Here's an example:
https://fburl.com/ods/o6gp07qp (you can see the query count is overall low, but
sometimes spikes substantially).
2.1) It's also worth noting that concurrent queries are often the result of
many clients wanting the same data, so deduplication is also useful here.
3), we also don't want to impact the latency of interactive jobs when they
need to do a little query here or there (i.e. it's largely fine if our jobs all
hold a few connections to MySQL and use them somewhat consistently).
4), we'd like this to make it easier to do batching right. For example, if
you have 100 Bonsais to map to hg, you should be able to just map and call
`future::try_join_all` and have that do the right thing.
5), we don't want "bad" queries to affect other queries negatively. One
example would be the occasional queries we make to Bonsai <-> Hg mapping in
`known` for thousands (if not more) of rows.
6), we want this to be easy to incorporate into the codebase.
So, how do we try to address all of this? Here's how:
- We ... do batching, and we deduplicate requests in a batch. This is the
easier bit and should address #1, #2, #2.1, and #4.
- However, batching is conditional. We notably don't batch very large requests
with the rest (addresses #5). We also don't batch small queries all the time:
we only batch if we are observing a throughput of queries that suggests we
can find some benefit in batching (this targets #3).
- Finally, we have some utilities for common cases like having to group by repo
id (this is `MultiRendezVous`), and this is all configurable via tunables
(and the default is to not do anything).
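A greatly simplified sketch of the dispatch decision described above (the real implementation is driven by tunables and observed query throughput; names here are illustrative):
```
use std::collections::HashSet;

// Core decisions of a rendezvous-style batcher, reduced to their essence.
struct RendezVous {
    max_keys_per_dispatch: usize, // very large requests bypass batching (#5)
    min_qps_for_batching: u64,    // only batch under sustained load (#3)
}

impl RendezVous {
    // Deduplicate keys within a batch (#2.1): concurrent callers asking for
    // the same key end up sharing one query.
    fn dedupe(&self, keys: Vec<u64>) -> HashSet<u64> {
        keys.into_iter().collect()
    }

    // Decide whether a request joins a shared batch or dispatches alone.
    fn should_batch(&self, num_keys: usize, observed_qps: u64) -> bool {
        num_keys <= self.max_keys_per_dispatch && observed_qps >= self.min_qps_for_batching
    }
}
```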
Reviewed By: StanislavGlebik
Differential Revision: D27010317
fbshipit-source-id: 4a2397255f9785c6722c02e4d419438fd0aafa07
Summary:
Our current streaming changelog updater logic is written in python, and it has a
few downsides:
1) It writes directly to manifold, which means it bypasses all the multiplexed
blobstore logic...
2) ...more importantly, we can't write to non-manifold blobstores at all.
3) There are no tests for the streaming changelogs
This diff moves the logic for the initial creation of a streaming changelog entry to
rust, which should fix the issues mentioned above. I want to highlight that
this implementation only works for the initial creation case, i.e. when there are no
entries in the database. Next diffs will add incremental update functionality.
Reviewed By: krallin
Differential Revision: D27008485
fbshipit-source-id: d9583bb1b98e5c4abea11c0a43c42bc673f8ed48
Summary:
BonsaiChangesets are rarely mutated, and their maps are stored in sorted order,
so we can use `SortedVectorMap` to load them more efficiently.
In the cases where mutable maps of filechanges are needed, we can use `BTreeMap`
during the mutation and then convert them to `SortedVectorMap` to store them.
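The pattern, roughly (a sketch assuming `SortedVectorMap` collects from an iterator the way `BTreeMap` does; the real call sites live in the bonsai types):
```
use sorted_vector_map::SortedVectorMap;
use std::collections::BTreeMap;

fn main() {
    // Mutate freely in a BTreeMap while building the changeset...
    let mut file_changes: BTreeMap<String, u64> = BTreeMap::new();
    file_changes.insert("dir/file".to_string(), 1);

    // ...then freeze into a SortedVectorMap for storage: a sorted Vec of
    // pairs deserializes and iterates more cheaply than a node-based tree.
    let frozen: SortedVectorMap<String, u64> = file_changes.into_iter().collect();
    assert_eq!(frozen.get("dir/file"), Some(&1));
}
```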
Reviewed By: mitrandir77
Differential Revision: D25615279
fbshipit-source-id: 796219c1130df5cb025952bb61002e8d2ae898f4
Summary:
Add a microbenchmark for deriving data with large directories.
This benchmark creates a commit with 100k files in a single directory, and then
derives data for that commit and 10 descendant commits, each of which adds,
modifies, and removes some files.
Reviewed By: ahornby
Differential Revision: D26947361
fbshipit-source-id: 4215f1ac806c53a112217ceb10e50cfad56f4f28
Summary: Rename this benchmark to a specific name so that we can add new benchmarks.
Differential Revision: D26947362
fbshipit-source-id: a1d060ee79781aa0ead51f284517471431418659