Summary: Add context to show the affected key if there are problems peeking a key.
Reviewed By: farnz
Differential Revision: D23003001
fbshipit-source-id: b46b7626257f49d6f11e80a561820e4b37a5d3b0
Summary:
Now that the previous diff has pre-computed the hash value using EagerHashMemo, it's less expensive to try a read-lock-only get() first before committing to a write-lock-acquiring insert().
The combination of this and the previous diff moved WalkState::visit from dominating the CPU profile to no longer dominating it (path interning dominates now).
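The pattern can be sketched with the standard library (a hypothetical sketch: WalkState's real map is a DashMap, so a `RwLock<HashMap>` stands in here, and the names are invented):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::RwLock;

struct SeenSet {
    inner: RwLock<HashMap<String, ()>>,
}

impl SeenSet {
    fn new() -> Self {
        SeenSet { inner: RwLock::new(HashMap::new()) }
    }

    /// Returns true the first time a key is recorded.
    fn record(&self, key: &str) -> bool {
        // Fast path: a shared read lock suffices for already-seen keys.
        if self.inner.read().unwrap().contains_key(key) {
            return false;
        }
        // Slow path: take the exclusive write lock, re-checking because
        // another thread may have inserted since we dropped the read lock.
        match self.inner.write().unwrap().entry(key.to_string()) {
            Entry::Occupied(_) => false,
            Entry::Vacant(v) => {
                v.insert(());
                true
            }
        }
    }
}
```

Since most visits hit already-seen nodes, the cheap shared-lock path handles the common case and the exclusive lock is only taken for genuinely new keys.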
Reviewed By: krallin
Differential Revision: D22975881
fbshipit-source-id: 90b2be83282ee2095c517c0d4f13536ddadf6267
Summary:
DashMap takes the hash of its keys multiple times: once outside the lock, and then once or twice inside the lock depending on whether the key is present in the shard.
Pre-computing the hash value using EagerHashMemo means it's done only once and, more importantly, outside the lock.
To use EagerHashMemo one needs to supply the BuildHasher, so it's added as a struct member and the record method is made a member function.
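The core idea can be sketched as follows (the names here are illustrative, not EagerHashMemo's real API): hash the key exactly once, outside any lock, and carry the result alongside the key.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

struct Memoized<K> {
    key: K,
    hash: u64,
}

// Compute the hash eagerly using the map's own BuildHasher, so later
// lookups inside the shard lock can reuse it instead of re-hashing.
fn memoize<K: Hash, S: BuildHasher>(build_hasher: &S, key: K) -> Memoized<K> {
    let mut hasher = build_hasher.build_hasher();
    key.hash(&mut hasher);
    Memoized { hash: hasher.finish(), key }
}
```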
Reviewed By: farnz
Differential Revision: D22975878
fbshipit-source-id: c2ca362fdfe31e5dca329e6200029207427cd9a1
Summary:
Matches the `getcommitdata` SSH endpoint.
This is going to be used to remove the requirement that client repositories
need to have all commits locally.
Reviewed By: krallin
Differential Revision: D22979458
fbshipit-source-id: 75d7265daf4e51d3b32d76aeac12207f553f8f61
Summary:
The query we use to select blobs to heal is naturally expensive, due to the use of a subquery. This means that finding the perfect queue limit is hard, and we depend on task restarts to handle brief overload of MySQL.
Give us a fast fall in batch size (halve on each failure), a slow climb back (10% on each success), and a random delay after each failure before retrying.
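A minimal sketch of the fast-fall / slow-climb adjustment (the bounds and rounding are assumptions, and the random retry delay is omitted here):

```rust
fn next_batch_size(current: usize, succeeded: bool, min: usize, max: usize) -> usize {
    if succeeded {
        // Slow climb: grow by 10% on success (at least by 1), capped at max.
        (current + (current / 10).max(1)).min(max)
    } else {
        // Fast fall: halve on failure, floored at min.
        (current / 2).max(min)
    }
}
```

This is the same multiplicative-decrease shape TCP congestion control uses: repeated failures shed load quickly, while recovery is gradual enough not to re-trigger the overload.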
Reviewed By: StanislavGlebik
Differential Revision: D23028518
fbshipit-source-id: f2909fe792280f81d604be99fabb8b714c1e6999
Summary:
`is_tree` wasn't part of the cache key, which means we could have returned
incorrect history if we had a file and a directory with the same name.
This diff fixes it.
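A toy illustration of the bug class (field and type names are invented): with `is_tree` in the cache key, a file and a directory at the same path get distinct entries instead of overwriting each other.

```rust
use std::collections::HashMap;

#[derive(Hash, PartialEq, Eq)]
struct HistoryCacheKey {
    path: String,
    is_tree: bool, // without this field, the two inserts below would collide
}

fn distinct_entries() -> usize {
    let mut cache: HashMap<HistoryCacheKey, &str> = HashMap::new();
    cache.insert(HistoryCacheKey { path: "a/b".into(), is_tree: false }, "file history");
    cache.insert(HistoryCacheKey { path: "a/b".into(), is_tree: true }, "tree history");
    cache.len()
}
```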
Reviewed By: krallin
Differential Revision: D23028527
fbshipit-source-id: 98a3b2028fa62231dfb570a76fb836374ce1eed0
Summary:
I noticed that fastreplay doesn't init tunables, and that means that it doesn't
get the updates, and more importantly it doesn't use default values of
tunables.
That doesn't look intentional (but lmk if I'm wrong!)
Reviewed By: krallin
Differential Revision: D23027311
fbshipit-source-id: ee43d02457d2240ebeb1530c672cb3847bc3afd4
Summary: This has my into_key() PR https://github.com/xacrimon/dashmap/pull/91 merged so the patch pointing to my fork is also removed.
Reviewed By: farnz
Differential Revision: D22896911
fbshipit-source-id: 188d438ce2aa20cfb3c466a62227d1cd27625f74
Summary:
Vendor ahash 0.4.4. In tests I haven't found this update to be significant for mononoke walker performance, but we might as well be current now that I've tried it.
I have found that wrapping ahash in a memoizing hasher helps, but that is for another diff.
Reviewed By: farnz
Differential Revision: D22864635
fbshipit-source-id: 5019259273ae3bd2df95cdd18adceed895baf8f2
Summary: Add a non-thrift header to packblob so we can vary thrift protocol in future.
Reviewed By: farnz
Differential Revision: D22953758
fbshipit-source-id: a114a350105e75cbe57f6c824295d863c723f32f
Summary:
This is more complex than previous libraries, mainly because `dag` defines APIs
(traits) used by other code, which might raise error types that `dag` itself is
not interested in. `BackendError::Other(anyhow::Error)` is currently used to
capture types that do not fit in `dag`'s predefined error types.
Reviewed By: sfilipco
Differential Revision: D22883865
fbshipit-source-id: 3699e14775f335620eec28faa9a05c3cc750e1d1
Summary:
Prefix some `Result` with `dag::Result`. Since `dag::Result` is just
`anyhow::Result` for now, this does not change anything but makes
it more compatible with upcoming changes.
Reviewed By: sfilipco
Differential Revision: D22883864
fbshipit-source-id: 95a26897ed026f1bb8000b7caddeb461dcaad0e7
Summary:
To allow EdenFS to get aux manifest data from Mononoke without needing to derive fsnodes, provide
a mechanism to list a manifest using the hg manifest id that returns the size and content hashes
of each of the files.
NOTE: this is temporary until the EdenAPI server is fully online and serving this data.
Reviewed By: krallin
Differential Revision: D22975967
fbshipit-source-id: 0a25da6d74534d42fc3b5f38ba3b72107b209681
Summary: Previously it was opened twice, even though there was no reason to do so.
Reviewed By: krallin
Differential Revision: D22976149
fbshipit-source-id: 426858da4548f1eaffe1d989e5424937af2583a5
Summary:
Factor out the walkers state internals to BuildStateHasher and StateMap
This change keeps the defaults the same, using DashMap and ahash::RandomState, and uses the same ahash version that DashMap defaults to internally.
This is in preparation for the next diff, where the ahash dependency is updated to 0.4.4. It was clearer not to combine the refactoring and the update of the hasher used in the same diff.
Reviewed By: ikostia
Differential Revision: D22851585
fbshipit-source-id: 84fa0dc73ff9d32f88ad390243903812a4a48406
Summary:
Only emit NodeData from the walker if required, to save some memory. Each of the walks can now specify which NodeData it is interested in observing in the output stream.
We still need to emit Some as part of the Option<NodeData> in the output stream, as it is used in things like the final count of loaded objects. Rather than stream over Option<Option<NodeData>>, we instead add a NodeData::NotRequired variant.
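The shape of this can be sketched as follows (variant names other than `NotRequired` are invented for illustration):

```rust
#[derive(Debug)]
enum NodeData {
    Loaded(String),
    NotRequired,
}

// The stream item stays Option<NodeData>; a node that was loaded but whose
// data the walk did not ask for is reported as NotRequired, so it still
// counts as a loaded object without nesting Option<Option<NodeData>>.
fn emit(data: String, required: bool) -> Option<NodeData> {
    if required {
        Some(NodeData::Loaded(data))
    } else {
        Some(NodeData::NotRequired)
    }
}
```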
Reviewed By: markbt
Differential Revision: D22849831
fbshipit-source-id: ef212103ac2deb9d66b017b8febe233eb53c9ed3
Summary:
Extract a verify_working_copy_inner function, which lets one directly specify
source/target repo, hash and movers. It can be useful for verifying the equivalence of
two commits even if they are not in the commit equivalence mapping.
Reviewed By: krallin
Differential Revision: D22950840
fbshipit-source-id: ab30be7190e29db3343b846b48333d7c7339d043
Summary: Move it from `'static` BoxFutures to async_trait and lifetimes
Reviewed By: markbt
Differential Revision: D22927171
fbshipit-source-id: 637a983fa6fa91d4cd1e73d822340cb08647c57d
Summary:
This is a backout of D22912569 (34760b5164), which is breaking opt-clang-thinlto builds on platform007 (S206790).
Original commit changeset: 5ffdc48adb1f
Reviewed By: aaronabramov
Differential Revision: D22956288
fbshipit-source-id: 45940c288d6f10dfe5457d295c405b84314e6b21
Summary:
Added more logs when running the binary to be able to track the progress more easily.
Saved bonsai hashes into a file. In case we fail at deriving data types, we can still try to derive them manually with the saved hashes and avoid running the whole tool again.
Reviewed By: StanislavGlebik
Differential Revision: D22943309
fbshipit-source-id: e03a74207d76823f6a2a3d92a1e31929a39f39a5
Summary:
Large commits and many hooks can mean that checking 100 commits at a time overloads
the system. Reduce the default concurrency to something more reasonable.
While we're here, let's use the proper mechanism for default values in clap.
Reviewed By: ikostia
Differential Revision: D22945597
fbshipit-source-id: 0f0a086c3b74bec614ada44a66409c8d2b91fe69
Summary:
Argument names should be `snake_case`. Long options should be `--kebab-case`.
Retain the old long options as aliases for compatibility.
Reviewed By: HarveyHunt
Differential Revision: D22945600
fbshipit-source-id: a290b3dc4d9908eb61b2f597f101b4abaf3a1c13
Summary:
Add `--log-interval` to log every N commits, so that it can be seen to be
making progress in the logs.
The default is set to 500, which logs about once every 10 seconds on my devserver.
Reviewed By: HarveyHunt
Differential Revision: D22945599
fbshipit-source-id: 7fc09b907793ea637289c9018958013d979d6809
Summary: Commitcloud fillers use wishlist priority because we want them to wait their turn behind other users; let's also stop them from flooding the blobstore healer queue by making them background priority.
Reviewed By: ahornby
Differential Revision: D22867338
fbshipit-source-id: 5d16438ea185b580f3537e3c4895a545483eca7a
Summary:
Backfillers and other housekeeping processes can run so far ahead of the blobstore sync queue that we can't empty it from the healer task as fast as the backfillers can fill it.
Work around this by providing a new mode that background tasks can use to avoid filling the queue if all the blobstores are writing successfully. This has a side-effect of slowing background tasks to the speed of the slowest blobstore, instead of allowing them to run ahead at the speed of the fastest blobstore and relying on the healer ensuring that all blobs are present.
Future diffs will add this mode to appropriate tasks
Reviewed By: ikostia
Differential Revision: D22866818
fbshipit-source-id: a8762528bb3f6f11c0ec63e4a3c8dac08d0b4d8e
Summary:
This operation is useful immediately after a small repo is merged into a large repo.
See example below
```
B' <- manually synced commit from small repo (in small repo it is commit B)
|
BM <- "big merge"
/ \
... O <- big move commit i.e. commit that moves small repo files in correct location
|
A <- commit that was copied from small repo. It is identical between small and large repos.
```
Immediately after a small repo is merged into a large one, we need to indicate that commit B and all of
its ancestors from the small repo need to be based on top of the "big merge" commit in the large repo rather than on top of
commit A.
The function below can be used to achieve exactly that.
Reviewed By: ikostia
Differential Revision: D22943294
fbshipit-source-id: 33638a6e2ebae13a71abd0469363ce63fb6b014f
Summary: We were using a git snapshot of auto_impl from somewhere between 0.3 and 0.4; 0.4 fixes a bug around Self: 'lifetime constraints on methods that blocks work I'm doing in Mononoke, so update.
Reviewed By: dtolnay
Differential Revision: D22922790
fbshipit-source-id: 7bb68589a1d187393e7de52635096acaf6e48b7e
Summary:
An EdenAPI endpoint for segmented changelog. It translates a path in the
graph to the hash of the commit that the path lands on.
It is expected that paths point to unique commits.
This change looks to go through the plumbing of getting the request from
the edenapi side through mononoke internals and to the segmented changelog
crate. The request used is an example. Follow-up changes will look more at
what shape the request and response should have.
Reviewed By: kulshrax
Differential Revision: D22702016
fbshipit-source-id: 9615a0571f31a8819acd2b4dc548f49e36f44ab2
Summary:
This functionality is going to be used in EdenApi. The translation is required
to unblock removing the changelog from the local copy of the repositories.
However the functionality is not going to be turned on in production just yet.
Reviewed By: kulshrax
Differential Revision: D22869062
fbshipit-source-id: 03a5a4ccc01dddf06ef3fb3a4266d2bfeaaa8bd2
Summary:
To start the only configuration available is whether the functionality provided
by this component is available in any shape or form. By default the component
is going to be disabled to all repositories. We will enable it first to
bootstrapped repositories and after additional tooling is added to production
repositories.
Reviewed By: kulshrax
Differential Revision: D22869061
fbshipit-source-id: fbaed88f2f45e064c0ae1bc7762931bd780c8038
Summary:
- Enumerate API now provided via trait BlobstoreKeySource
- Implementation for Fileblob and ManifoldBlob
- Modified populate_healer to use new api
- Modified fixrepocontents to use new api
Reviewed By: ahornby
Differential Revision: D22763274
fbshipit-source-id: 8ee4503912bf40d4ac525114289a75d409ef3790
Summary: Update all the remaining steps in the walker to use the new early checks, so as to prune unnecessary edges earlier in the walk.
Reviewed By: farnz
Differential Revision: D22847412
fbshipit-source-id: 78c499a1870f97df7b641ee828fb8ec58303ebef
Summary:
Check whether to emit an edge from the walker earlier, to reduce Vec allocation for unnecessary edges that would immediately be dropped in WalkVisitor::visit.
The VisitOne trait is introduced as a simpler API to the Visitor that can be used to check whether a single edge needs to be visited; the Checker struct in walk.rs is a helper around it that only calls the VisitOne API when necessary. Checker also takes on responsibility for respecting keep_edge_paths when returning paths, so that parameter has been removed for migrated steps.
To keep the diff size reasonable, this change has all the necessary Checker/VisitOne changes but only converts hg_manifest_step, with the remainder of the steps converted next in the stack. The todos marking unmigrated types as always-emit types will be removed as part of converting the remaining steps.
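A rough sketch of the early-check idea (the trait name follows the description above; the bodies and types are invented): consult the check before an edge is ever pushed, instead of allocating edges the visitor would immediately drop.

```rust
trait VisitOne {
    fn needs_visit(&self, edge: &str) -> bool;
}

struct Checker<V: VisitOne> {
    visitor: V,
}

impl<V: VisitOne> Checker<V> {
    // Early check: skip the allocation entirely for pruned edges.
    fn add_edge(&self, out: &mut Vec<String>, edge: &str) {
        if self.visitor.needs_visit(edge) {
            out.push(edge.to_string());
        }
    }
}

// A trivial visitor that only wants tree edges, for demonstration.
struct OnlyTrees;
impl VisitOne for OnlyTrees {
    fn needs_visit(&self, edge: &str) -> bool {
        edge.starts_with("tree:")
    }
}
```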
Reviewed By: farnz
Differential Revision: D22864136
fbshipit-source-id: 431c3637634c6a02ab08662261b10815ea6ce293
Summary:
This tool can be used in tandem with the pre_merge_delete tool to merge one large
repository into another in a controlled manner - the size of the working copy
will be increased gradually.
Reviewed By: ikostia
Differential Revision: D22894575
fbshipit-source-id: 0055d3e080c05f870cfd0026174365813b0eb253
Summary:
There are two reasons to want a write quorum:
1. One or more blobstores in the multiplex are experimental, and we don't want to accept a write unless the write is in a stable blobstore.
2. To reduce the risk of data loss if one blobstore loses data at a bad time.
This diff makes that possible.
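The quorum decision itself is simple; a minimal sketch (the function shape is an assumption): a multiplexed put succeeds once at least `quorum` of the underlying blobstore writes have succeeded.

```rust
fn put_meets_quorum(write_results: &[bool], quorum: usize) -> bool {
    // Count successful underlying writes and compare against the quorum.
    write_results.iter().filter(|&&ok| ok).count() >= quorum
}
```

With quorum 1 this degenerates to the old any-write-succeeds behavior; raising it keeps experimental blobstores from being the sole holder of a blob.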
Reviewed By: krallin
Differential Revision: D22850261
fbshipit-source-id: ed87d71c909053867ea8b1e3a5467f3224663f6a
Summary: A couple of features stabilized, so drop their `#![feature(...)]` lines.
Reviewed By: eugeneoden, dtolnay
Differential Revision: D22912569
fbshipit-source-id: 5ffdc48adb1f57a1b845b1b611f34b8a7ceff216
Summary:
In several places in `library.sh` we had `--mononoke-config-path
mononoke-config`. This ensured that we could not run such commands from
non-`$TESTTMP` directories. Let's fix that.
Reviewed By: StanislavGlebik
Differential Revision: D22901668
fbshipit-source-id: 657bce27ce6aee8a88efb550adc2ee5169d103fa
Summary: The more contexts the better. Makes debugging errors much more pleasant.
Reviewed By: StanislavGlebik
Differential Revision: D22890940
fbshipit-source-id: 48f89031b4b5f9b15f69734d784969e2986b926d
Summary:
An extremely thin wrapper around existing APIs: just a way to create merge commits from the command line.
This is needed to make the merge strategy work:
```
C
|
M3
| \
. \
| \
M2 \
| \ \
. \ \
| \ \
M1 \ \
| \ \ \
. TM3 \ \
. / | |
. D3 (e7a8605e0d) TM2 |
. | / /
. D2 (33140b117c) TM1
. | /
. D1 (733961456f)
| |
| \
| DAG to merge
|
main DAG
```
When we're creating `M2` as a result of merging `TM2` into the main DAG, some files are deleted in the `TM3` branch, but not deleted in the `TM2` branch. Executing the merge by running `hg merge` causes these files to be absent in `M2`. To make Mercurial work, we would need to execute `hg revert` for each such file prior to `hg merge`. Bonsai merge semantics, however, just produce the correct behavior for us. Let's therefore just expose a way to create bonsai merges via the `megarepotool`.
Reviewed By: StanislavGlebik
Differential Revision: D22890787
fbshipit-source-id: 1508b3ede36f9b7414dc4d9fe9730c37456e2ef9
Summary:
This adds a CLI for the functionality, added in the previous diff. In addition, this adds an integration test, which tests this deletion functionality.
The output of this tool is meant to be stored in a file. It simulates a simple DAG, and it should be fairly easy to automatically parse the "to-merge" commits out of this output. In theory, it would have been enough to just print the "to-merge" commits alone, but it felt like it may sometimes be convenient to quickly examine the delete commits.
Reviewed By: StanislavGlebik
Differential Revision: D22866930
fbshipit-source-id: 572b754225218d2889a3859bcb07900089b34e1c
Summary:
This implements a new strategy of creating pre-merge delete commits.
As a reminder, the higher-level goal is to gradually merge two independent DAGs together. One of them is the main repo DAG, the other is an "import". It is assumed that the import DAG is already "moved", meaning that all files are at the right paths to be merged.
The strategy is as follows: create a stack of delete commits with gradually decreasing working copy size. Merge them into `master` in reverse order.
Reviewed By: StanislavGlebik
Differential Revision: D22864996
fbshipit-source-id: bfc60836553c656b52ca04fe5f88cdb1f15b2c18
Summary:
With upcoming write quorum work, it'll be interesting to know all the failures that prevent a put from succeeding, not just the most recent, as the most recent may be from a blobstore whose reliability is not yet established.
Store and return all errors, so that we can see exactly why a put failed.
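A sketch of keeping every failure (simplified types; the real multiplex keys errors by blobstore id): collect all errors for the final report instead of remembering only the most recent one.

```rust
use std::collections::HashMap;

fn collect_put_errors(
    results: Vec<(&'static str, Result<(), String>)>,
) -> HashMap<&'static str, String> {
    results
        .into_iter()
        // Keep only the failures, keyed by which blobstore produced them.
        .filter_map(|(id, result)| result.err().map(|e| (id, e)))
        .collect()
}
```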
Reviewed By: ahornby
Differential Revision: D22896745
fbshipit-source-id: a3627a04a46052357066d64135f9bf806b27b974
Summary:
"Chunking hint" is a string (expected to be in a file) of the following format:
```
prefix1, prefix2, prefix3
prefix4,
prefix5, prefix6
```
Each line represents a single chunk: if a path starts with any of the prefixes in the line, it belongs to the corresponding chunk. Prefixes are comma-separated. Any path that does not start with any prefix in the hint goes to an extra chunk.
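The hint semantics above can be sketched like this (the parsing details are assumptions): each line is one chunk of comma-separated prefixes, and unmatched paths fall into an extra chunk numbered after the last hint line.

```rust
fn chunk_for_path(hint: &str, path: &str) -> usize {
    // One Vec of prefixes per hint line; trailing commas yield empty
    // fragments, which are filtered out.
    let chunks: Vec<Vec<&str>> = hint
        .lines()
        .map(|line| {
            line.split(',')
                .map(str::trim)
                .filter(|prefix| !prefix.is_empty())
                .collect()
        })
        .collect();
    chunks
        .iter()
        .position(|prefixes| prefixes.iter().any(|p| path.starts_with(p)))
        .unwrap_or(chunks.len()) // the extra chunk for unmatched paths
}
```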
This hint will be used in a new pre-merge-delete approach, to be introduced further in the stack.
Reviewed By: StanislavGlebik
Differential Revision: D22864999
fbshipit-source-id: bbc87dc14618c603205510dd40ee5c80fa81f4c3
Summary:
We need to use a different type of pre-merge deletes, it seems, as the one proposed requires a huge number of commits. Namely, if we have `T` files in total in the working copy and we're happy to delete at most `D` files per commit, while merging at most `S` files per deletion stack:
```
#stacks = T/S
#delete_commits_in_stack = (T-X)/D
#delete_commits_total = T/S * (T-X)/D = (T^2 - TX)/SD ~ T^2/SD
T ~= 3*10^6
If D~=10^4 and X~=10^4:
#delete_commits_total ~= 9*10^12 / 10^8 = 9*10^4
If D~=10^5 and X~=10^5:
#delete_commits_total ~= 9*10^12 / 10^10 = 9*10^2
```
So either 90K or 900 delete commits. 90K is clearly too big. 900 may be tolerable, but it's still hard to manage and make sense of. What's more, there seems to be a way to produce fewer of these, see further in the stack.
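The order-of-magnitude arithmetic above can be checked mechanically, using the T^2/SD approximation with the per-stack merge limit and per-commit delete limit as parameters:

```rust
// Approximate total delete commits: #stacks * #delete_commits_per_stack
// ~ (T/S) * (T/D) = T^2 / (S*D).
fn approx_delete_commits_total(t: u64, s: u64, d: u64) -> u64 {
    t * t / (s * d)
}
```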
Reviewed By: StanislavGlebik
Differential Revision: D22864998
fbshipit-source-id: e615613a34e0dc0d598f3178dde751e9d8cde4da
Summary:
We're going to add an SQL blobstore to our existing multiplex, which won't have all the blobs initially.
In order to populate it safely, we want to have normal operations filling it with the latest data, and then backfill from Manifold; once we're confident all the data is in here, we can switch to normal mode, and never have an excessive number of reads of blobs that we know aren't in the new blobstore.
Reviewed By: krallin
Differential Revision: D22820501
fbshipit-source-id: 5f1c78ad94136b97ae3ac273a83792ab9ac591a9