Summary:
We've had a few SEVs where expensive derivation for a single repo overloaded
our backends. This diff adds a tunable that makes it possible to quickly cancel
all derivations for a given repo. One important caveat: it doesn't only cancel
the derivations themselves, it also cancels the initial commit traversal stage,
which can be quite expensive.
Note that it cancels all derivations for a given repo; at the moment there's
no way to cancel deriving just a single derived data type. I'd like to add
that, but in later diffs, since it might require changing the tunables - they
don't support tuning a list of strings for a given repo. Besides, having a way
to quickly disable all derivations for a given repo might be useful anyway.
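As a rough sketch of how such a per-repo kill switch might be consulted before any derivation work starts (all names here are illustrative, not the real Mononoke tunables API):

```rust
use std::collections::HashSet;

// Hypothetical model of the tunable: a set of repo names for which all
// derivation is disabled. The real tunables plumbing differs.
struct Tunables {
    all_derived_data_disabled_repos: HashSet<String>,
}

impl Tunables {
    fn derivation_disabled_for(&self, repo: &str) -> bool {
        self.all_derived_data_disabled_repos.contains(repo)
    }
}

// Checked up front, so it also covers the initial commit traversal stage,
// which is cancelled along with the derivations themselves.
fn derive_for_repo(tunables: &Tunables, repo: &str) -> Result<&'static str, String> {
    if tunables.derivation_disabled_for(repo) {
        return Err(format!("derivation cancelled for repo {}", repo));
    }
    Ok("derived")
}
```

Because the check is per repo rather than per derived data type, flipping it cancels everything for that repo at once, which is exactly the quick-mitigation behaviour this diff is after.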
Reviewed By: farnz
Differential Revision: D29028444
fbshipit-source-id: 559917e0e25eb060fe5be77a412bfec4ac3f1779
Summary: It will be used in the next diff to disable only certain derived data types
Reviewed By: yancouto
Differential Revision: D29031082
fbshipit-source-id: 9621a5eec522e369fef2b78e94599a056eda5317
Summary:
Mercurial has a lot of complex logic to try and create the "best" possible filenode for a given merge commit.
However, it works fine with imperfect filenodes, as long as they are semantically correct. Generate imperfect filenodes to speed up data derivation.
We are importing git repos to use here, which have some significantly mergy history; this change both reduces the number of filenodes we end up creating in data derivation, and speeds up data derivation by an order of magnitude overall.
Reviewed By: StanislavGlebik
Differential Revision: D28903515
fbshipit-source-id: 436ca4f43158533b32cf0d41db72228dda2688c3
Summary: revert the zstd crates back to previous version
Reviewed By: johansglock
Differential Revision: D29038514
fbshipit-source-id: 3cbc31203052034bca428441d5514557311b86ae
Summary: Separate `content_metadata` into a new method so that it can be used for both file and tree aux data fetching.
Reviewed By: DurhamG
Differential Revision: D28898226
fbshipit-source-id: 2dbca95fd02384e434ea80cad78793c8d8b2f489
Summary:
Previously, if someone had included a service definition like the following:
```
service Foo {
void match(1: string bar);
}
```
rustc could not compile the generated code because functions like `fn match(...)` would be emitted. This commit emits [raw identifiers](https://rust-lang.github.io/rfcs/2151-raw-identifiers.html) when possible (i.e. we still mangle `crate`, `self`, `super`, and `Self` -- so in the previous example we now emit `fn r#match(...)`). The thrift compiler will also now throw an exception when trying to emit a keyword that participates in name resolution (i.e. `crate`, `super`, `self`, and `Self`).
This commit also implements the `rust.name` annotation for enums, enum values, fields, functions, types, and typedefs.
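For illustration, this hand-written sketch (not actual generated code) shows the shape of what the compiler now emits: the Thrift function `match` becomes the raw identifier `r#match`, which rustc accepts even though `match` is a keyword:

```rust
// Illustrative stand-in for the generated service trait; the real
// generated code has different signatures and result types.
trait Foo {
    fn r#match(&self, bar: String) -> usize;
}

struct FooImpl;

impl Foo for FooImpl {
    // Callers invoke this as `foo.r#match(...)`; the `r#` prefix is part
    // of Rust's raw-identifier syntax, not of the function's name.
    fn r#match(&self, bar: String) -> usize {
        bar.len()
    }
}
```

Raw identifiers cannot be used for `crate`, `self`, `super`, or `Self`, which is why those four still need mangling (or, when they participate in name resolution, a compile-time error).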
Reviewed By: yfeldblum, jsgf
Differential Revision: D28612277
fbshipit-source-id: a86ec1b3fd880c42a82d0e2239f616e3364edcba
Summary: I'm going to touch these files soon, found this easy improvement to do before.
Reviewed By: StanislavGlebik
Differential Revision: D28993265
fbshipit-source-id: 89fe9ac52f6d99e1b5ce7cb24d949e048226436a
Summary:
It turned out we still don't process the copyfile attribute correctly. copyfile
creates two overrides like
{
"copyfile source" -> default prefix + "/copyfile source"
"copyfile source" -> "destination"
}
However, we also have a check that validates that no paths (including the
default path, overrides, linkfiles etc) are prefixes of each other. That check
fails in the copyfile case, even though the path maps to exactly the same path
as if the default prefix had been applied.
Let's fix it. Also note that we hadn't seen this failure in the integration
tests because we didn't run verify_paths in tests. This diff fixes that as well.
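A minimal sketch of the relaxed check, assuming the validation boils down to a pairwise prefix test over mapped paths (the helper name and string-based paths are illustrative, not the real verify_paths code):

```rust
// Two mapped paths conflict only when one is a strict path-component
// prefix of the other. Identical paths - as produced by a copyfile
// override that maps to the same place the default prefix would - are
// explicitly allowed.
fn paths_conflict(a: &str, b: &str) -> bool {
    a != b
        && (a.starts_with(&(b.to_string() + "/"))
            || b.starts_with(&(a.to_string() + "/")))
}
```

With this rule, the redundant copyfile override (same source, same resulting path as the default prefix) no longer trips the validation, while genuine prefix shadowing still fails.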
Reviewed By: mitrandir77
Differential Revision: D28992456
fbshipit-source-id: 5fd993914b189cf768ba03010194b1c26026f7a8
Summary:
This makes sure that there is no caching when fetching skiplists from blobstores, by creating a blobstore that bypasses caching and using it when creating skiplists.
Because of D28898839 (6f05b85534) and its stack, this is the only place that creates skiplists, so no creation uses the cache.
The alternative solution is D28901116, which is incomplete, as I went with this one since it's much smaller; but I'm open to suggestions, and can polish up that one if needed.
Here are some pros and cons for this solution vs D28901116:
- Pro: much much smaller and simpler. Only needed to touch repo factory to create a blobstore without caching and feed that into Skiplist.
- Con: The "bypass-cache" feature of the skiplist is not baked into the type system, so it's harder to enforce. For example, this diff by itself might seem like the full solution, but if I had not done D28898839 (6f05b85534) and stack, those places would still use caching blobstores, while D28901116's solution wouldn't have that problem as it could enforce that skiplists can only be created with special blobstores that allow cache bypass on read (BlobstoreGetOps).
Reviewed By: StanislavGlebik
Differential Revision: D28903524
fbshipit-source-id: 09a09651a8b4003b03ca55e04235a8a75ec1f7bd
Summary:
Extracting `raw_blobstore` that creates a blobstore without caches, and `raw_repo_blobstore` which allows creating a repo_blobstore given a blobstore.
This diff is refactor only, the behaviour is still the same.
Reviewed By: StanislavGlebik
Differential Revision: D28903525
fbshipit-source-id: 20c7a0df6d00a26b2a8ec17a104af9cf1fa88e82
Summary:
Same as previous diff, but for a different command.
Building stuff through RepoFactory makes it easier to change how the build works *everywhere* later.
See D28877887 for goal
**This is the last diff of this type**. With this, I am able to delete `fetch_skiplist_index`, and now all skiplist building goes through factories.
Reviewed By: StanislavGlebik
Differential Revision: D28898839
fbshipit-source-id: af17d87b5890e09dbce317d52798bf78b14bc95c
Summary:
Same as previous diff, but for a different command.
Building stuff through RepoFactory makes it easier to change how the build works *everywhere* later.
See D28877887 for goal
Reviewed By: StanislavGlebik
Differential Revision: D28898653
fbshipit-source-id: 3db00cd99e50044d1acfa3d65065e5305280ec02
Summary:
Same as previous diff, but for a different command.
Building stuff through RepoFactory makes it easier to change how the build works *everywhere* later.
See D28877887 for goal
Reviewed By: StanislavGlebik
Differential Revision: D28898240
fbshipit-source-id: ede29b190c7bcf3f0a9c53c80149764a23cc4ef6
Summary:
Same as previous diff, but for a different command.
Building stuff through RepoFactory makes it easier to change how the build works *everywhere* later.
I changed from passing down a `BlobRepo` and a `LeastCommonAncestorsHint` to simply passing down an `InnerRepo`.
See D28877887 for goal
Reviewed By: StanislavGlebik
Differential Revision: D28897045
fbshipit-source-id: af546ae4702813bac2a0f938e7dec9d570c8cbf7
Summary:
Same as previous diff, but for a different command.
Building stuff through RepoFactory makes it easier to change how the build works *everywhere* later.
Had to move some code around in order to only build `BlobRepo` or `InnerRepo` once, and conditionally depending on `generate_bundles`.
See D28877887 for goal
Reviewed By: StanislavGlebik
Differential Revision: D28896888
fbshipit-source-id: 420dad831bd1759086ced72ed6a8b5726ca956d7
Summary:
Using the previous diff, this diff will now build `InnerRepo` instead of `BlobRepo`, as a way to build `Skiplist` without having to do it "manually" by calling `fetch_skiplist_index`.
See D28877887 for goal
Reviewed By: StanislavGlebik
Differential Revision: D28880352
fbshipit-source-id: 73e05864a0c0ffaba454a27ea521b4d3dc6ee78b
Summary:
BlobRepo is already cheaply cloneable (it actually holds an Arc inside).
This should also make the code a little prettier in places where we needed to use `(*blob_repo).clone()` instead of just `blob_repo.clone()`.
Reviewed By: StanislavGlebik
Differential Revision: D28930384
fbshipit-source-id: 59f95d10576a3f71808d0d26d36358421673351e
Summary:
The important change in this diff is in this file: `eden/mononoke/cmdlib/src/args/mod.rs`
In this diff I change that file's repo-building functions to be able to build both `BlobRepo` and `InnerRepo` (added in D28748221 (e4b6fd3751)). In fact, they are now able to build any facet container that can be built by the `RepoFactory` factory, so each binary can specify its own subset of needed "attributes" and only build those.
For now, they all still use BlobRepo; this diff is only a refactor that enables easily changing the repo attributes you need.
The rest of the diff is mostly giving hints to the compiler, as in several places it couldn't infer it should use `BlobRepo` directly, so I had to add type hints.
## High level goal
This is part of the blobrepo refactoring effort.
I am also doing this in order to:
1. Make sure every place that builds `SkiplistIndex` uses `RepoFactory` for that.
2. Then add a `BlobstoreGetOps` trait for blobstores, and use the factory to feed it to skiplist index, so it can query the blobstore while skipping cache. (see [this thread](https://www.internalfb.com/diff/D28681737 (850a1a41b7)?dst_version_fbid=283910610084973&transaction_fbid=106742464866346))
Reviewed By: StanislavGlebik
Differential Revision: D28877887
fbshipit-source-id: b5e0093449aac734591a19d915b6459b1779360a
Summary: Update to latest version. This includes a patch to the async-compression crate from [my PR updating it](https://github.com/Nemo157/async-compression/pull/125); I will remove it once the crate is released.
Reviewed By: mitrandir77
Differential Revision: D28897019
fbshipit-source-id: 07c72f2880e7f8b85097837d084178c6625e77be
Summary: I need it for a test in the next diff
Reviewed By: StanislavGlebik
Differential Revision: D28959433
fbshipit-source-id: c67ca7eec03f94425332e446f6f97038edff598d
Summary:
I'll be using it more in the next diff, so why not have it in its own method.
Reviewed By: StanislavGlebik
Differential Revision: D28887333
fbshipit-source-id: 35accb495a577e1c01ec8114fc60acf38ed11fee
Summary:
When syncing merge commits with two parents, it would be nice if the first
parent were the one that comes from the unified branch. In the **case of octopus
merges** we really don't want the parent from the unified branch to be third
(that would turn the sync into a non-forward move!). Let's add a way to tell
the commit rewriter which parent needs to be first.
Reviewed By: farnz
Differential Revision: D28885488
fbshipit-source-id: 57a081ce2d285ba2b6d6d98110cd1c64a241548e
Summary: It can be useful to understand how many ancestors are not derived yet.
Reviewed By: Croohand
Differential Revision: D28902194
fbshipit-source-id: 87c11b3e35ba7f67122990318ff07408c47d4d6c
Summary:
The position of the lines with "count:" is nondeterministic. Let's skip
matching them for now to unblock the test.
Reviewed By: mzr
Differential Revision: D28940741
fbshipit-source-id: c05521722f838e7b72572a0cf67d3ebbc3b26868
Summary: The update timestamp field of the walker_checkpoints table wasn't being set on updates.
Reviewed By: Croohand
Differential Revision: D28871609
fbshipit-source-id: b8873ec494f6671ef27b0243a8b2fb373a7313e8
Summary: Code inspection of derived data shows several places that are already set up to be I/O parallel; by adding `spawn` calls and looking for performance changes, I found that these two also gain from being CPU parallel. The others are not helped (or hindered) by a `spawn`, so leave them alone for now.
Reviewed By: ahornby
Differential Revision: D28901878
fbshipit-source-id: f774bdf93a11e9c0f0370612968f4f32179f3eb1
Summary:
Implement filenode and tree lookup in EdenAPI.
This is a simple lookup in the blobstore without any additional validation of
blob content, assuming all the validation happens inside the upload logic;
that is, if a key is known, the content of the blob is assumed to be valid.
Reviewed By: markbt
Differential Revision: D28868028
fbshipit-source-id: 590cc404f33adbec69f8adafd33365a0249d3241
Summary:
Files upload will be executed in 2 stages:
* check if content is already present
* upload missing files
The check API is generic; it can be used for any id type. It is called the 'lookup' API.
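As an illustrative sketch of the two-stage flow (the function names, the `HashSet` standing in for server-side state, and the generic key type are hypothetical stand-ins, not the real EdenAPI surface):

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Stage 1: the generic "lookup" - return the subset of keys the server
// already knows about. Works for any id type.
fn lookup<T: Eq + Hash + Clone>(known: &HashSet<T>, keys: &[T]) -> Vec<T> {
    keys.iter().filter(|k| known.contains(*k)).cloned().collect()
}

// Stage 2: upload only the content that lookup reported as missing.
// Returns how many items were actually uploaded.
fn upload_missing<T: Eq + Hash + Clone>(known: &mut HashSet<T>, keys: &[T]) -> usize {
    let mut uploaded = 0;
    for k in keys {
        if !known.contains(k) {
            known.insert(k.clone()); // stand-in for the real upload
            uploaded += 1;
        }
    }
    uploaded
}
```

The point of separating the stages is that clients with mostly-present content pay only for the cheap lookup, not for re-sending bytes the server already has.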
Reviewed By: markbt
Differential Revision: D28708934
fbshipit-source-id: 654c73b054790d5a4c6e76f7dac6c97091a4311f
Summary:
Recently we had an issue with the `connectivity-lab` repo where 3 keys P416141335 had different values because of parent ordering P416094337.
The walker can detect differences between keys in the multiplex inner blobstores and repair them; however, it has no notion of copying keys (there is no concept of a source and a target). We have a copy_blobstore_keys tool, which is used for restoring keys from backups, and with a small modification it can handle copying between inner blobstores.
Reviewed By: StanislavGlebik
Differential Revision: D28707364
fbshipit-source-id: 3d5a4f39999623023539b9159fa7310d430f0ee4
Summary:
All megarepo methods write to a repository, so we need to check if writes are
allowed to a given repo, and previously we weren't checking that.
Let's start doing so now by trying to get a RepoWriteContext for the target
repo. If writes are not allowed then getting the RepoWriteContext fails.
Reviewed By: farnz
Differential Revision: D28838994
fbshipit-source-id: e45d4fe72603e7fe2755141874fc4125998bfed8
Summary:
Time 0.2 is current, and 0.1 is long obsolete. Unfortunately there's a
large 0.1 -> 0.2 API change, so I preserved 0.1 and updated the targets of its
users. It's also unfortunate that `chrono` has `oldtime` as a default feature,
which makes it use `time-0.1`'s `Duration` type. Excluding it from the features
doesn't help because every other user specifies it by default.
Reviewed By: dtolnay
Differential Revision: D28854148
fbshipit-source-id: 0c41ac6b998dfbdcddc85a22178aadb05e2b2f2b
Summary:
The worker should be able to process requests from the queue no matter
which repo they are for and what its ACLs are. It's during request scheduling
that we should check the identity of the entity scheduling the request.
Reviewed By: StanislavGlebik
Differential Revision: D28866807
fbshipit-source-id: 5d57eb9ba86e10d477be5cfc51dfb8f62ea16b9e
Summary:
Currently bookmark warmers receive a `BlobRepo`. This diff makes them receive an `InnerRepo` instead, but makes no logic changes.
This will be useful as fields are moved from `BlobRepo` to `InnerRepo`, and will also be useful for accessing skiplists from warmers, as I plan to do on the next diff.
Reviewed By: StanislavGlebik
Differential Revision: D28796543
fbshipit-source-id: dbe5bec9fc34da3ae51e645ea09b03e2bb620445
Summary:
This diff simply extracts `InnerRepo` object type to a separate target.
This will be used in the next diff, where we need to access `InnerRepo` from `BookmarkWarmer`; it had to be split into a different target in order to not create a circular dependency in buck.
Reviewed By: StanislavGlebik
Differential Revision: D28796406
fbshipit-source-id: 4fdadbbde31719b809abb6b8a9ba8fa24b426299
Summary:
This diff creates a new `InnerRepo` container, that contains `BlobRepo` as well as the skiplist index.
The plan here is:
- As for code organisation, `InnerRepo` will eventually contain most of the fields currently in `Repo`, as well as the fields of `BlobRepo` that are only used in binaries that use `Repo`. This way each binary will only build the "attribute fields" it needs, but the code to build them can still be neatly shared.
- As for `SkiplistIndex`, the plan is to be able to modify it inside `WarmBookmarksCache`, that's why I'm moving it to `InnerRepo` as well. I'll make bookmark warmers receive `InnerRepo` instead of `BlobRepo`, so they can access the skiplist index if wanted, and then modify it (this is an attempt to try to make skiplists faster on bookmarks).
Reviewed By: StanislavGlebik
Differential Revision: D28748221
fbshipit-source-id: bca31c14a6789a715a215cc69ad0a69b5e73404c
Summary: The run_start in the pack info logging is the run start for a given checkpoint name when checkpointing, so let's include that to provide context.
Reviewed By: farnz
Differential Revision: D28867962
fbshipit-source-id: 113b9e10f5b8e1869702b3ea83374d0d08a8792e
Summary: This makes it easier to understand what each move and merge commit is for.
Reviewed By: mitrandir77
Differential Revision: D28839677
fbshipit-source-id: 1a42205c164224b64c773cff80b690b251a48381
Summary:
When deriving `hgchangesets` for merge commits, we try to get filenodes right.
Speed this up a little by recognising that history checks are unnecessary if both parents have the same filenode.
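A hedged sketch of the shortcut, with an illustrative stand-in for the filenode id type (the real derivation code is structured quite differently):

```rust
// Illustrative filenode id; the real type is a hash.
#[derive(PartialEq, Clone, Copy)]
struct FilenodeId(u64);

// If both parents carry the same filenode, the file is unchanged across
// the merge and the expensive history checks can be skipped entirely;
// otherwise fall back to the full check.
fn needs_history_check(p1: Option<FilenodeId>, p2: Option<FilenodeId>) -> bool {
    match (p1, p2) {
        (Some(a), Some(b)) => a != b,
        _ => true,
    }
}
```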
Reviewed By: StanislavGlebik
Differential Revision: D28866024
fbshipit-source-id: 6da4c162abce5b426269630f82e9e0b84eea2b33
Summary: Sometimes it is useful to search/group by commit hash.
Reviewed By: krallin
Differential Revision: D28834644
fbshipit-source-id: 93f650e19ae512450e33542cf74b8aa3333c6c35
Summary:
Fetching all history of both filenodes to see if there's common history either side of a merge is wasteful, and in some megarepo work is causing long delays deriving merge changesets.
Where we have already derived filenodes for a given merge's ancestors, we can go faster; we can use the linknodes to determine the older of the two filenodes, and fetch only history for the newer of the two.
This is imperfect for the import use case, since filenodes depend on hgchangesets, and the batching in use at the moment prefers to generate all derived data of a given type before moving onto another type, but it's an improvement for cases where some filenodes are already derived (e.g. due to import of a repo with a similar history).
Reviewed By: StanislavGlebik
Differential Revision: D28796253
fbshipit-source-id: 5384b5d2841844794a518c321dbf995891374d3a
Summary:
This is a precaution. add_sync_target can create a very branchy repository, and
I'm not sure how well Mononoke is going to handle deriving these commits on
all Mononoke hosts at once (in particular, I'm worried about the traversal that
has to be done by all hosts in parallel). In theory it should work fine, but
deriving data during the add_sync_target call should be a reasonable thing to
do anyway.
While working on this I noticed that the "git" derived data type is not supported by derived data utils, and it was causing test failures. I don't think there's any reason not to have TreeHandle in derived data utils, so I'm adding it now.
Reviewed By: farnz
Differential Revision: D28831052
fbshipit-source-id: 5f60ac5b93d2c38b4afa0f725c6908edc8b98b18
Summary:
Previously, even if two commits were unrelated (e.g. they are in two branches
that are not related to each other), we'd still derive them sequentially. This
diff makes it possible to derive them in parallel to speed up derivation.
A few notes:
1) There's a killswitch to disable parallel derivation
2) By default we derive at most 10 commits at the same time, but this can be
configured.
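A rough sketch of bounded parallel derivation using plain threads (the real implementation is async; the function and parameter names here are illustrative, and both the killswitch and the limit of 10 would come from tunables):

```rust
use std::sync::mpsc;
use std::thread;

// Derive `commits` with at most `limit` running at once. When the
// killswitch disables parallelism, the limit falls back to 1, i.e.
// sequential derivation as before.
fn derive_in_parallel(commits: Vec<u64>, parallel_enabled: bool, limit: usize) -> Vec<u64> {
    let limit = if parallel_enabled { limit.max(1) } else { 1 };
    let mut derived = Vec::new();
    for chunk in commits.chunks(limit) {
        let (tx, rx) = mpsc::channel();
        for &c in chunk {
            let tx = tx.clone();
            thread::spawn(move || {
                // Stand-in for the real derivation work on commit `c`.
                tx.send(c).unwrap();
            });
        }
        drop(tx); // close the channel so the receiver terminates
        let mut batch: Vec<u64> = rx.iter().collect();
        batch.sort_unstable(); // threads may finish in any order
        derived.extend(batch);
    }
    derived
}
```

Only unrelated commits may be batched this way; commits related by ancestry still have to wait on their parents' derived data.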
Reviewed By: farnz
Differential Revision: D28830311
fbshipit-source-id: b5499ad5ac179f73dc94ca09927ec9c906592460
Summary:
Make source_name unique even if the same git repo and project name is mapped several times in the same manifest.
source_name needs to match between all the different places we use it (`add_target_config`, `sync_changeset`, `changesets_to_merge`); we were not consistent.
Reviewed By: StanislavGlebik
Differential Revision: D28845869
fbshipit-source-id: 54e96dcdeaf22ec68f626e9c30e5e60c54ec149b
Summary:
We are going to create quite a lot of move commits at the same time, and it can
be slow. Let's instead create them in parallel, and then call
`save_bonsai_changesets` for all the commits in one go.
Reviewed By: mitrandir77
Differential Revision: D28795525
fbshipit-source-id: f6b6420c2fe30bb98680ac7e25412c55c99883e0
Summary:
Previously the add sync target method was creating a single merge commit. That
means we might create a bonsai commit with hundreds of parents. This is
not ideal, because mercurial can only work correctly with 2 parents - for a
bonsai changeset with 3 or more parents, mercurial file histories might be lost.
So instead of creating a single giant merge commit, let's create a stack of
merge commits.
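A small sketch of turning an n-way merge into a stack of two-parent merges (commit ids are illustrative `u64` stand-ins for changeset ids; `next_id` just numbers the new merge commits):

```rust
// Fold the parent list into a chain of merge commits, each with exactly
// two parents: the previous merge in the stack and one more branch head.
// Returns (new commit id, its two parents) for each merge created; a
// single parent needs no merge, so it yields an empty stack.
fn merge_stack(parents: Vec<u64>, mut next_id: u64) -> Vec<(u64, Vec<u64>)> {
    let mut commits = Vec::new();
    let mut iter = parents.into_iter();
    let mut base = match iter.next() {
        Some(p) => p,
        None => return commits,
    };
    for p in iter {
        commits.push((next_id, vec![base, p]));
        base = next_id;
        next_id += 1;
    }
    commits
}
```

For n parents this produces n-1 merge commits, and every one of them stays within Mercurial's two-parent limit, so file histories survive.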
Reviewed By: mitrandir77
Differential Revision: D28792581
fbshipit-source-id: 2f8ff6b49db29c4692b7385f1d1ab57986075d57