Summary:
We'll use this command to change the mapping version we use when doing push
redirection.
Reviewed By: ikostia
Differential Revision: D24392308
fbshipit-source-id: 4dab01c0e58a8953a0c6c84c7c166977a6baf00f
Summary:
This diff adds a new tailing mode based on the derived data graph; it uses the same functionality as backfill.
- `tail_batch_iteration` uses `bounded_traversal_dag` directly instead of leveraging `DeriveGraph::derive`, so that we can warm up dependencies for each node before calling `DerivedUtils::backfill_batch_dangerous`
Reviewed By: StanislavGlebik
Differential Revision: D24306156
fbshipit-source-id: 006cb6d4df9424cd6501cb4a381b95f096e70551
Summary:
This updates ManifoldBlob to log the aforementioned data points to perf
counters. There's a bit of refactoring that also had to go into this to make
`ctx` available everywhere it's needed.
Reviewed By: aslpavel
Differential Revision: D24333040
fbshipit-source-id: 1b63bcd1e1ee36bae4dbbc1da053c7f1bdf96675
Summary: Convert derived data utils to use new style futures
Reviewed By: StanislavGlebik
Differential Revision: D24331068
fbshipit-source-id: ad658b278802afa1e4ecd44c5a24164135748790
Summary:
This trait is no longer used all that much outside of a handful of tests, the
walker, and an admin subcommand, as it has been replaced by the `Manifest`
trait, which works over all kinds of Manifests, and has stronger typing (its
sub-entries always have a path, and they are wrapped in an enum that knows if
they're leaves or trees).
This left a bunch of old legacy code here and there, which is worth removing
to make sure we don't introduce any new callsites to it. Another motivation
is that this legacy code is often not very compatible with new code, and has
historically made things a bit tricky (everything owns a blobstore in this code,
which is pretty awkward and not at all how we do things nowadays).
There is, I think, a bit more potential here since we could also perhaps try to
remove the `HgBlobEntry` struct, but that still has callsites, so I'm not
doing this here.
Reviewed By: StanislavGlebik
Differential Revision: D24306946
fbshipit-source-id: 8a73dbbf40a904ce19ac65d791b732091c206263
Summary: Add support for deriving all types of derived data for a given repository
Reviewed By: StanislavGlebik
Differential Revision: D23842909
fbshipit-source-id: 2fa5c4a9444169b26c5cf70d91a6cc707cca8022
Summary:
Add predicate-based PutBehaviour logic to manifoldblob.
This will prevent overwrites of keys when in IfAbsent mode, and will generate useful logging in OverwriteAndLog and IfAbsent mode.
This change factors out part of the put logic into put_check_conflict, so that it can be reused from each of the PutBehaviour cases.
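As a rough illustration of the behaviour described above, here is a minimal sketch over an in-memory map. The `PutOutcome` enum, the `put_with_behaviour` function and the `HashMap` store are hypothetical stand-ins, not the actual manifoldblob code, and the `Overwrite` variant is assumed.

```rust
use std::collections::HashMap;

/// Put modes named in this summary (`Overwrite` is an assumption).
#[derive(Clone, Copy)]
enum PutBehaviour {
    Overwrite,
    OverwriteAndLog,
    IfAbsent,
}

/// Hypothetical result type describing what a put did.
#[derive(Debug, PartialEq)]
enum PutOutcome {
    New,
    Overwritten,
    Prevented,
}

/// Hypothetical helper: checks for a conflict once, then applies the behaviour.
fn put_with_behaviour(
    store: &mut HashMap<String, Vec<u8>>,
    behaviour: PutBehaviour,
    key: &str,
    value: Vec<u8>,
) -> PutOutcome {
    let exists = store.contains_key(key);
    match (behaviour, exists) {
        // IfAbsent never overwrites; it only logs the attempt.
        (PutBehaviour::IfAbsent, true) => {
            eprintln!("put prevented: key {} already exists", key);
            PutOutcome::Prevented
        }
        // OverwriteAndLog overwrites but leaves a trace.
        (PutBehaviour::OverwriteAndLog, true) => {
            eprintln!("overwriting existing key {}", key);
            store.insert(key.to_string(), value);
            PutOutcome::Overwritten
        }
        // Plain Overwrite replaces the value silently.
        (PutBehaviour::Overwrite, true) => {
            store.insert(key.to_string(), value);
            PutOutcome::Overwritten
        }
        // No conflict: every mode simply writes.
        (_, false) => {
            store.insert(key.to_string(), value);
            PutOutcome::New
        }
    }
}
```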
Reviewed By: StanislavGlebik
Differential Revision: D24021170
fbshipit-source-id: d2e71afadada3d5e661634449108e6c9f8dc5907
Summary:
We don't have any Preserved entries anymore - now all preserved entries will be
rewritten with "noop" mapping.
This diff removes it completely.
Reviewed By: mitrandir77, ikostia
Differential Revision: D24173538
fbshipit-source-id: f2d6238633cea8dc3c06f2e607b2abd76edfca6b
Summary:
Mononoke command for running the SegmentedChangelogSeeder for an existing
repository. The result is going to be a new IdMap version in the metadata
store and a new IdDag stored in the blobstore, resulting in a brand new
SegmentedChangelog bundle.
Reviewed By: krallin
Differential Revision: D24096963
fbshipit-source-id: 1eaf78392d66542d9674a99ad0a741f24bc2cb1b
Summary: Update Memblob::new callsites to ::default() in preparation for adding arguments to ::new() to specify the put behaviour desired
Differential Revision: D24021173
fbshipit-source-id: 07bf4e6c576ba85c9fa0374d5aac57a533132448
Summary: Add put behaviour handling to fileblob so that it can prevent overwrites if requested.
Differential Revision: D23933228
fbshipit-source-id: 8e74ac96b232be841174f6ad2bd2fccf92aaa90d
Summary:
This diff adds two things:
- the ability to compute the reverse of a `CommitSyncDataProvider::Test`, useful when creating both small-to-large and large-to-small `CommitSyncer` structs in tests
- the ability to set a current `CommitSyncConfigVersion` in the provider, which can also be useful, when simulating current version changes.
NB: I ended up not needing the set version functionality in my tests (further in the stack) in the end, so I can remove it, but I do think it will prove useful eventually.
Reviewed By: StanislavGlebik
Differential Revision: D24103206
fbshipit-source-id: 389169b2984684d83b0f6fdeb3be597d84cc0f12
Summary: `clippy` often complains about the use of `.len() != 0`, `.len() > 0` or `.len() == 0` and proposes to use `.is_empty()` instead. This diff does that across Mononoke.
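For illustration, a minimal sketch of the kind of rewrite involved. The function names are hypothetical; the relevant clippy lint is `len_zero`.

```rust
/// Returns true when there is nothing to process.
fn has_no_work(queue: &[u32]) -> bool {
    // Preferred over `queue.len() == 0`.
    queue.is_empty()
}

/// Returns true when at least one item is queued.
fn has_work(queue: &[u32]) -> bool {
    // Preferred over `queue.len() != 0` and `queue.len() > 0`.
    !queue.is_empty()
}
```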
Reviewed By: aslpavel
Differential Revision: D24099427
fbshipit-source-id: 1bba2f958485b7efb3f41bf3eae820879c92b0e5
Summary:
`get_one` is a deprecated method, because it uses incorrect logic to resolve ambiguities of multi-mapped commits: it just selects the very first of the potentially many mappings.
The correct resolution is to either handle the ambiguity at the call site, or rely on the resolution logic provided in commit_sync_outcome.rs.
Therefore, I am removing the uses of this method in this and a few surrounding commits.
In this case, the simplest thing is to replace it with `.get` and deal with multi-mappings on the client side:
- for `crossrepo map` subcommand we just print all mappings
- for `update_large_repo_bookmarks` we just fail on multi-mapping, as it seems dangerous to proceed without human intervention
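A minimal sketch of the "fail on multi-mapping" approach described above. The `Mapping` struct and `resolve_unambiguous` function are hypothetical stand-ins for illustration and do not mirror the real mapping API.

```rust
use std::collections::HashMap;

/// Hypothetical mapping: one source commit may map to several target commits.
struct Mapping {
    entries: HashMap<String, Vec<String>>,
}

impl Mapping {
    /// Returns every mapping for `source`, leaving ambiguity to the caller.
    fn get(&self, source: &str) -> &[String] {
        match self.entries.get(source) {
            Some(targets) => targets.as_slice(),
            None => &[],
        }
    }
}

/// Caller-side resolution: accept zero or one mapping, refuse to guess otherwise.
fn resolve_unambiguous<'a>(
    mapping: &'a Mapping,
    source: &str,
) -> Result<Option<&'a str>, String> {
    match mapping.get(source) {
        [] => Ok(None),
        [only] => Ok(Some(only.as_str())),
        many => Err(format!(
            "{} maps to {} commits; refusing to pick one",
            source,
            many.len()
        )),
    }
}
```

A caller that wants to print all mappings instead can iterate over the slice returned by `get` directly.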
Reviewed By: farnz
Differential Revision: D24030033
fbshipit-source-id: c84613579fbf8a5f6bac3c06da0cd4e0ad6c3fb0
Summary:
Remove assert_present from Blobstore trait as it had only one callsite other than the various blobstore layers/impls.
Replaced that one last call in repo_commit.rs/assert_in_blobstore() with an equivalent call to is_present.
Reviewed By: farnz
Differential Revision: D24016927
fbshipit-source-id: 764fddbebeb4b1192d196078b8824cf8a08e9691
Summary:
This diff introduces the Mysql client for Rust to Mononoke as one more backend, alongside raw xdb connections and myrouter. Mononoke can now use the new Mysql client connections instead of Myrouter.
To run Mononoke with the new backend, pass the `--use-mysql-client` option (it conflicts with `--myrouter-port`).
I also added a new target for integration tests, which runs the mysql tests using the mysql client.
To run the mysql tests using raw xdb connections, use `mononoke/tests/integration:integration-mysql-raw-xdb`; to run them using the mysql client, use `mononoke/tests/integration:integration-mysql`.
Reviewed By: ahornby
Differential Revision: D23213228
fbshipit-source-id: c124ccb15747edb17ed94cdad2c6f7703d3bf1a2
Summary:
This diff makes the blobstore healer use MyAdmin to get the replication lag for a DB shard and removes the "laggable" interface for connections.
The old "laggable" API worked this way: we maintained potential connections to each possible region, then tried to query replica status on all of them. If there were no replica hosts in some of the regions, we ignored those regions by handling a specific error type.
This is legacy and makes the logic more complicated. We want the new code to use MyAdmin instead.
Reviewed By: krallin
Differential Revision: D23767442
fbshipit-source-id: 9f85f07bd318ad020d203d2bcd1c8898061f7572
Summary: Finally remove version_name from CommitSyncRepos. Note that this diff adds a few TODOs that we'd need to fix later.
Reviewed By: ikostia
Differential Revision: D23929010
fbshipit-source-id: c72130af548ac7b26bc20ddaac9a59562cc75e0b
Summary: Just as in the previous diff, but this time remove bookmark_renamers from CommitSyncRepos
Reviewed By: ikostia
Differential Revision: D23910295
fbshipit-source-id: 0c2d147057c8d3e0749d5b31ef98ab5022255d95
Summary: Just as the previous diff, but this time it removes reverse_mover
Reviewed By: ikostia
Differential Revision: D23879509
fbshipit-source-id: ed111ca2d106120229c4facc0bb2435913c27966
Summary:
This diff starts to use CommitSyncDataProvider introduced in the previous diff
and removes Mover from CommitSyncRepos struct.
Reviewed By: ikostia
Differential Revision: D23878683
fbshipit-source-id: 0d54f889781aebe4726b3388343a87df783c17d4
Summary:
At the moment we have a weird setup where cross repo sync configuration is
stored in both live commit sync configuration and in normal mononoke config.
The latter is deprecated, however there are still a few parts of the codebase
that rely on it. This diff fixes one such place.
Reviewed By: ikostia
Differential Revision: D23903578
fbshipit-source-id: 2bf4b3d17c34fe2eb6330cd862f7b0f5cd6ffa40
Summary:
CommitSyncer is a struct that we use to remap a commit from one repo to another. It uses commit sync map to figure out which paths need to be changed. Commit sync mapping might change, and each commit sync mapping has a version associated with it.
At the moment CommitSyncer doesn't work correctly if a commit sync mapping is changed. Consider the following DAG
```
large repo
A' <- remapped with mapping V1
|
O B' <- remapped with mapping V1
| /
...
small repo
A
|
O B
| /
...
```
We have commits A and B from a small repo remapped into a large repo as commits A' and B'. They were remapped with commit sync mapping V1, which for example remaps files in "dir/" into "smallrepo/dir".
Now let's say we start to use a new mapping V2 which remaps "dir/" into "otherdir/". After this point every commit will be created with the new mapping. But this is incorrect - if we create a commit on top of B in the small repo that touches "dir/file.txt" then it will be remapped into "otherdir/file.txt" in the large repo, even though every other file is still in "smallrepo/dir"!
The fix for this issue is to always use the same mapping as the commit's parent was using (there are a few tricky cases with merge commits and commits with no parents, but those will be dealt with separately).
This diff is the first step - it threads through LiveCommitSyncConfig all the way to the CommitSyncer object, so that CommitSyncer can always fetch whatever mapping it needs.
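To make the intended fix concrete, here is a minimal sketch of "use the parent's mapping version". The function names, the string-based versions and the hard-coded V1/V2 prefixes are assumptions taken from the example above, not the real CommitSyncer code.

```rust
use std::collections::HashMap;

/// Picks the mapping version for a new commit from its single parent.
/// Root and merge commits return `None`: they are handled separately.
fn version_for_child(
    parents: &[&str],
    version_of: &HashMap<String, String>,
) -> Option<String> {
    match parents {
        [parent] => version_of.get(*parent).cloned(),
        _ => None,
    }
}

/// Remaps a small-repo path using the example mappings from this summary.
fn remap_path(path: &str, version: &str) -> Option<String> {
    let rest = path.strip_prefix("dir/")?;
    match version {
        "V1" => Some(format!("smallrepo/dir/{}", rest)),
        "V2" => Some(format!("otherdir/{}", rest)),
        _ => None,
    }
}
```

With this, a commit created on top of B keeps using V1 even after V2 becomes current, so "dir/file.txt" still lands in "smallrepo/dir/file.txt".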
Reviewed By: ikostia
Differential Revision: D23845720
fbshipit-source-id: 555cc31fd4ce09f0a6fa2869bfcee2c7cdfbcc61
Summary: New-style async/await can mutate variables, so we no longer need synchronization for these counters
Reviewed By: ikostia
Differential Revision: D23704765
fbshipit-source-id: eb2341cb0c82b8a49c28ad3c8fd811ed3af73436
Summary: Useful when looking into blobstore corruption - you can compare all the blobstore versions by manual fetches.
Reviewed By: krallin
Differential Revision: D23604436
fbshipit-source-id: 7b56947b0188536499514bae6615c6e81b9106c3
Summary: Going to add more features, so simplify by asyncifying first
Reviewed By: krallin
Differential Revision: D23604437
fbshipit-source-id: 52b2b372e4d3fbf1d59168c6c11311d9edf4ff0f
Summary:
This change moves the logic associated with mercurial changeset derivation to the `mercurial_derived_data` crate.
NOTE: it is not converted to the derived data infrastructure at this point; this is a preparation step for actually doing so
Reviewed By: farnz
Differential Revision: D23573610
fbshipit-source-id: 6e8cbf7d53ab5dbd39d5bf5e06c3f0fc5a8305c8
Summary:
There are blobs that fail to scrub and terminate the process early for a variety of reasons; when this is running as a background task, it'd be nice to get the remaining keys scrubbed, so that you don't have a large number of keys to fix up later.
Instead of simply outputting to stdout, write keys to one of three files in the format accepted on stdin:
1. Success; you can use `sort` and `comm -3` to remove these keys from the input data, thus ensuring that you can continue scrubbing.
2. Missing; you can look at these keys to determine which blobs are genuinely lost from all blobstores, and fix up.
3. Error; these will need running through scrub again to determine what's broken.
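A minimal sketch of routing each key to one of three outputs, one key per line. The `ScrubResult` and `ScrubOutputs` types are hypothetical; the writers can be files or any other `Write` implementation.

```rust
use std::io::{self, Write};

/// Hypothetical per-key scrub outcome.
enum ScrubResult {
    Success,
    Missing,
    Error,
}

/// Hypothetical set of three outputs, one per outcome.
struct ScrubOutputs<W: Write> {
    success: W,
    missing: W,
    error: W,
}

impl<W: Write> ScrubOutputs<W> {
    /// Writes the key on its own line to the output matching the result,
    /// so each output can be fed back in as a list of keys.
    fn record(&mut self, key: &str, result: ScrubResult) -> io::Result<()> {
        let out = match result {
            ScrubResult::Success => &mut self.success,
            ScrubResult::Missing => &mut self.missing,
            ScrubResult::Error => &mut self.error,
        };
        writeln!(out, "{}", key)
    }
}
```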
Reviewed By: krallin
Differential Revision: D23574855
fbshipit-source-id: a613e93a38dc7c3465550963c3b1c757b7371a3b
Summary:
With three blobstores in play, we have issues working out exactly what's wrong during a manual scrub. Make the error handling better:
1. Manual scrub adds the key as context for the failure.
2. Scrub error groups blobstores by content, so that you can see which blobstore is most likely to be wrong.
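A minimal sketch of grouping blobstores by the content they returned. The `group_by_content` function and the numeric blobstore ids are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Groups blobstore ids by the value each returned for a key.
/// `None` means the blobstore did not have the key at all.
fn group_by_content(
    results: &[(u32, Option<Vec<u8>>)],
) -> BTreeMap<Option<Vec<u8>>, Vec<u32>> {
    let mut groups = BTreeMap::new();
    for (blobstore_id, content) in results {
        groups
            .entry(content.clone())
            .or_insert_with(Vec::new)
            .push(*blobstore_id);
    }
    groups
}
```

With three blobstores, the one that sits alone in its group is the most likely to be wrong.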
Reviewed By: ahornby, krallin
Differential Revision: D23565906
fbshipit-source-id: a199e9f08c41b8e967d418bc4bc09cb586bbb94b
Summary:
Another preparatory step for the actual mapping model fix. This just renames
the `get` method to `get_one` to emphasize its use case and to ease the search later.
At the end of this change, I expect there to be no use cases for `get_one` and expect it to be gone.
Reviewed By: mitrandir77
Differential Revision: D23574116
fbshipit-source-id: f5015329b15f3f08961006607d0f9bf10f499a88
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).
This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.
---
*Why now?* **:** The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.
---
Reviewed By: StanislavGlebik
Differential Revision: D23568780
fbshipit-source-id: b4b4a0aa683d236e2fdeb5b96d723ac2d84b9faf
Summary:
Before redacting something it would be good to check that this file is not
accessed by anything. Having log-only mode would help with that.
Reviewed By: ikostia
Differential Revision: D23503666
fbshipit-source-id: ae492d4e0e6f2da792d36ee42a73f591e632dfa4
Summary:
In the next diff I'm going to add log-only mode to redaction, and it would be
good to have a way of testing it (i.e. testing that it actually logs accesses
to bad keys).
In this diff let's use a config option that allows logging censored scuba
accesses to a file, and let's update the redaction integration test to use it.
Reviewed By: ikostia
Differential Revision: D23537797
fbshipit-source-id: 69af2f05b86bdc0ff6145979f211ddd4f43142d2
Summary:
Fsnodes have a lot of data about files, but right now we can't access it
through a Fsnode lookup or a manifest walk, because the LeafId for a Fsnode is
just the content id and the file type.
This is a bit sad, because it means we e.g. cannot dump a manifest with file
sizes (D23471561 (179e4eb80e)).
Just changing the LeafId is easy, but that brings a new problem with Fsnode
derivation.
Indeed, deriving manifests normally expects us to have the "derive leaf"
function produce a LeafId (so we'd want to produce a `FsnodeFile`), but in
Fsnodes, this currently happens in deriving trees instead.
Unfortunately, we cannot easily just move the code that produces `FsnodeFile`
from the tree derivation to the leaf derivation, that is, do:
```
fn check_fsnode_leaf(
leaf_info: LeafInfo<FsnodeFile, (ContentId, FileType)>,
) -> impl Future<Item = (Option<FsnodeSummary>, FsnodeFile), Error = Error>
```
Indeed, the performance of Fsnode derivation relies on all the leaves for a
given tree being derived together with the tree and its parents in context.
So, we'd need the ability for deriving a new leaf to return something different
from the actual leaf id. This means we want to return a `(ContentId,
FileType)`, even though our `LeafId` is a `FsnodeFile`.
To do this, this diff introduces a new `IntermediateLeafId` type in the
derivation. This represents the type of the leaf that is passed from deriving a
leaf to deriving a tree. We need to be able to turn a real `LeafId` into it,
because sometimes we don't re-derive leaves.
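A minimal sketch of the relationship between the two leaf types. The simplified `ContentId`, `FileType` and `FsnodeFile` definitions and the `From` conversion are assumptions for illustration; the real derivation code expresses this through its own types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum FileType {
    Regular,
    Executable,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct ContentId(u64);

/// Simplified stand-in for the real leaf id, which carries extra file data.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FsnodeFile {
    content_id: ContentId,
    file_type: FileType,
    size: u64,
}

/// What leaf derivation hands to tree derivation.
type IntermediateLeafId = (ContentId, FileType);

/// A real leaf id must convert into the intermediate form, because
/// leaves that are not re-derived still flow into tree derivation.
impl From<FsnodeFile> for IntermediateLeafId {
    fn from(leaf: FsnodeFile) -> Self {
        (leaf.content_id, leaf.file_type)
    }
}
```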
I think we could also refactor some of the code that passes a context here to
just do this through the `IntermediateLeafId`, but I didn't look into this too
much.
So, this diff does that, and uses it in Mononoke Admin so we can print file
sizes.
Reviewed By: StanislavGlebik
Differential Revision: D23497754
fbshipit-source-id: 2fc480be0b1e4d3d261da1d4d3dcd9c7b8501b9b
Summary:
In the next diff I'm going to add log_only mode for redaction.
In this diff I make a small refactoring that makes the next diff simpler.
find_files_with_given_content_id_blobstore_keys doesn't accept tasks anymore,
just content keys.
Reviewed By: aslpavel
Differential Revision: D23535829
fbshipit-source-id: 1dac37f5ea7038fc779ad51192a290fcc23e6556
Summary:
I'm going to change this function soon, so it's nice to asyncify it to make
next diffs simpler and also remove duplicated logic.
Also remove unnecessary `logger` parameter - we can always get logger from CoreContext
Reviewed By: krallin
Differential Revision: D23501634
fbshipit-source-id: 7ad2fc17167e4107481ceb230e0b7cb3e7f2549a
Summary:
I pattern matched off of this for the previous diff in this stack, and spotted
a bit of clean up that might make sense here:
- Using `.help()` for a subcommand overrides the whole help text. We meant to
use `.about()` here. I fixed this in some copy-pasted code as well.
- Printing debug output alongside real output makes it harder to select the
real output. I fixed this by logging debug output to stderr instead.
Reviewed By: StanislavGlebik
Differential Revision: D23471560
fbshipit-source-id: 7900cfe65613c48abd77faad6d6a45a7aa523b36
Summary:
This adds a subcommand for dumping all the paths in a repository. This is
helpful when you have a Content ID, limited imagination and time on your hands,
and you'd like to turn those into a file path where that Content ID lives.
This uses fsnodes for the traversal because that's O(# directories) as opposed
to O(# files). I had an earlier implementation that used unodes, but that was
really slow.
Reviewed By: StanislavGlebik
Differential Revision: D23471561
fbshipit-source-id: 948bfd20939adf4de0fb1e4b2852ad4d12182f16
Summary:
That's one of the sev followups. Before redacting a file content let's check if
it exists in "main-bookmark" (which is by default master), and refuse to redact
if it actually exists.
If this check passes (i.e. the content we are about to redact is not reachable
from master) that doesn't mean that we are 100% safe. E.g. this content can be
in an ancestor of master, or in any other repo, or it can be added in the next
commit.
This check is a best-effort check to prevent shooting ourselves in the foot.
Reviewed By: aslpavel
Differential Revision: D23476278
fbshipit-source-id: 5a4cd10964a65b8503ba9a6391f17319f0ce37d8
Summary: Will change it in the next diff, so let's asyncify it now.
Reviewed By: aslpavel
Differential Revision: D23475332
fbshipit-source-id: f25fb7dc16f99cb140df9374f435e071401c2b90
Summary: This is streaming clone warmup binary as per https://fb.quip.com/hfuBAdYnzr9M
Reviewed By: StanislavGlebik
Differential Revision: D23347029
fbshipit-source-id: f187a2f3529a7eae5998bab199228bfbe6057e6e
Summary:
We might need to rebackfill blame for configerator (see
https://fburl.com/hfylxmag). It's good to have a command that shows how many
files with rejected blame we have.
Reviewed By: farnz
Differential Revision: D23267648
fbshipit-source-id: 33e658b53391285461890bda3a94b391e6063c12
Summary:
In a repository with files with large histories we run into a lot of SqlTimeout
errors while fetching file history to serve getpack calls. However fetching the
whole file history is not really necessary - client knows how to work with
partial history i.e. if client misses some portion of history then it would
just fetch it on demand.
This diff adds a way to limit how many entries are going to be fetched; if more entries are fetched, we return FilenodeRangeResult::TooBig. The downside of this diff is that we'd have to do more sequential database
queries.
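A minimal sketch of the limit check over an in-memory history. The `Present` variant, the `fetch_history_with_limit` function and the "fetch one entry past the limit" strategy are assumptions for illustration; only `FilenodeRangeResult::TooBig` is named in this summary.

```rust
/// Result of a bounded history fetch (`Present` is an assumed variant name).
#[derive(Debug, PartialEq)]
enum FilenodeRangeResult<T> {
    Present(T),
    TooBig,
}

/// Fetches at most `limit + 1` entries so that exceeding the limit is
/// detectable without loading the whole history.
fn fetch_history_with_limit(
    all_entries: &[u64],
    limit: Option<usize>,
) -> FilenodeRangeResult<Vec<u64>> {
    let fetched: Vec<u64> = match limit {
        Some(limit) => all_entries
            .iter()
            .copied()
            .take(limit.saturating_add(1))
            .collect(),
        None => all_entries.to_vec(),
    };
    match limit {
        Some(limit) if fetched.len() > limit => FilenodeRangeResult::TooBig,
        _ => FilenodeRangeResult::Present(fetched),
    }
}
```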
Reviewed By: krallin
Differential Revision: D23025249
fbshipit-source-id: ebed9d6df6f8f40e658bc4b83123c75f78e70d93