Summary:
This diff fixes how syncing of merge commits decides on the `CommitSyncConfigVersion` to use. Old and incorrect behavior just always uses current version from `LiveCommitSyncConfig`. The desired behavior is to reuse the version with which parent commits are synced, and manually sync commits when version changes are needed.
For merges it is more interesting, as merges have multiple parents. The overarching idea is to force all of the parents to have the same version and bail a merge if this is not the case. However, that is an ideal, and we are not there yet, because:
- there are `NotSyncCandidate` parents, which can (and should at the moment) be safely excluded from the list of parents of the synced commit.
- there are `Preserved` parents (which will turn into the ones synced with a `noop` version)
- there are `RewrittenAs` and `EquivalentWorkingCopy` parents, which don't have an associated version.
So until the problems above are solved:
- absent `RewrittenAs`/`EquivalentWorkingCopy` versions are replaced with the current version
- `Preserved` merge parents cause merge sync to fail.
Reviewed By: StanislavGlebik
Differential Revision: D24033905
fbshipit-source-id: c1c98b3e7097513af980b5a9f00cc62d248fc03b
Summary:
Our higher-level goal is to get rid of `CommitSyncOutcome::Preserved` altogether. This diff is a step in that direction. Specifically, this diff removes the creation of "accidental" Preserved commits: the ones where the hashes are identical, although a `Mover` of some version have been applied. There are a few sides to this fix:
- `get_commit_sync_outcome` now returns `Preserved` only when the source and target hashes are identical, plus stored version is `None` (previously it would only look at hashes).
- `sync_commit_no_parents` now records the `Mover` version it used to rewrite the commit (previously it did not, which would sometimes create `Preserved` roots)
- there are now just two ways to sync commits as `Preserved`:
- `unsafe_preserve_commit` (when the caller explicitly asks for it). The idea is to only remove it once we remove the callers of this methods, of course.
- `sync_commit_single_parent` when the parent is also `Preserved`. Note that automatically upgrading from `Preserved` parent to a rewritten changeset is incorrect for now: `Preserved` does not have an associated version by definition, so we would have to use a current version, which may corrupt the repo. Once we get rid of `Preserved`, this case will naturally go away.
- as we now have `update_mapping_with_version` and `update_mapping` (which consumes current version), we need to add explicit `update_mapping_no_version` for preserved commits we are still creating (again, recording a current version is a mistake here, same reason as above)
NB: I've added/changed a bunch of `println`s in tests, leaving them here, as they are genuinely useful IMO and not harmful.
Reviewed By: StanislavGlebik
Differential Revision: D24142837
fbshipit-source-id: 2153d3c5cc406b3410eadbdfca370f79d01471f9
Summary:
This diff adds two things:
- the ability to compute the reverse of a `CommitSyncDataProvider::Test`, useful when creating both small-to-large and large-to-small `CommitSyncer` structs in tests
- the ability to set a current `CommitSyncConfigVersion` in the provider, which can also be useful, when simulating current version changes.
NB: I ended up not needing the set version functionality in my tests (further in the stack) in the end, so I can remove it, but I do think it will prove useful eventually.
Reviewed By: StanislavGlebik
Differential Revision: D24103206
fbshipit-source-id: 389169b2984684d83b0f6fdeb3be597d84cc0f12
Summary:
This is one more fix to use correct commit sync config version. In particular,
this diff fixes a case where a single parent commit was rewritten out. E.g.
if a large repo commit touches only files that do not remap in a small repo. In
that case we still want to record correct mapping so that all descendants used
the correct mapping as well.
Reviewed By: ikostia
Differential Revision: D24109221
fbshipit-source-id: bcdbb01b964d70227dff8363e77964716a345261
Summary:
Let's move initialization into a separate function. I'm planning to use it in
the next diff for another test
Reviewed By: ikostia
Differential Revision: D24109222
fbshipit-source-id: 73142dd46ef3de15ff381670ed6d5e31653c5dd4
Summary:
Normally, sync logic infers `CommitSyncConfigVersion` to use from parent commits (or from current version for root commits). However, for test purposes it is convenient to force a version override This logic does not change any of the production behaviors, and will be used in a later diff.
TODO: can it ever be needed beyond tests? I've thought about using this for "version boundary" commits, but those would probably just be constructed while completely bypassing the sync logic.
TBH, I am not certain this diff is a good change. But I've spend a very large amount of time crafting the repos used in the `sync_merge` tests later in this stack, so I am proposing to land this, then spend some time refactoring sync tests (and hopefully making it easier to craft test repos), then removing this logic. Obviously, this logic should only be landed if we land the tests in the first place.
Reviewed By: StanislavGlebik
Differential Revision: D24104101
fbshipit-source-id: 0825f04ed74532e89fd5f1fbebeee5f2001fedcd
Summary: No need to allocate a new vector if we just need to remove items from the current one.
Reviewed By: StanislavGlebik
Differential Revision: D24088319
fbshipit-source-id: 10804d925f20fe8dd1e2bb8500aa06d30bd367c1
Summary:
This just adds a single fn. I did not come up with a better place/name to put
it, suggestions are welcome. Seems generic enough to belong at the top-level
common location.
I've already needed this twice, so decided to extract. Second callsite will be further in the stack.
Reviewed By: StanislavGlebik
Differential Revision: D24080193
fbshipit-source-id: c3e0646f263562f3eed93f1fdbab9a076729f33c
Summary: `clippy` often complains about the use of `.len() != 0`, `.len() > 0` or `.len() == 0`and proposes to use `.is_empty()` instead. This diff does that across Mononoke.
Reviewed By: aslpavel
Differential Revision: D24099427
fbshipit-source-id: 1bba2f958485b7efb3f41bf3eae820879c92b0e5
Summary: Now that there are no more use-cases of `get_one`, let's remove it completely.
Reviewed By: farnz
Differential Revision: D24027990
fbshipit-source-id: 47baa6b1e28eedd94d95808efca0a98007a1d388
Summary:
This is a bit of a cargo-cult diff: it replaces the uses of `get_one` with `get` in tests, just to make the same wrong decisions later - use the first item from the produced list of items. So the only thing it does it removes a call site for `get_one`.
The reason it is ok to do `.into_iter().next()` here is because these are tests and we control the situation precisely - we know that there will be one mapping. Same reason we use `.unwrap()` in tests.
Reviewed By: farnz
Differential Revision: D24027785
fbshipit-source-id: 1c11acadfc9f7c6c4af658b414589c32008a6cce
Summary:
`get_one` is a deprecated method, because it uses incorrect logic to resolve ambiguities of multi-mapped commits: if just selects the very first of the potentially many mappings.
Correct resolution is to either handle the ambiguity at the caller site, or rely on provided resolution logic in `commit_sync_outcome.rs`.
Therefore, I am removing the uses of this method in this and a few surrounding commits.
In this case, we can just rely on a provided `CommitSyncer::commit_sync_outcome_exists` method.
Reviewed By: farnz
Differential Revision: D24026470
fbshipit-source-id: 9f150eb3d6c39a58bb4b0d16d4cf18c324359013
Summary:
This diff adds an ability to optionally pass a `CandidateSelectionHint` to `scs` implementation of the `xrepo-lookup` call, which would help in cases when ancestor commits have multiple mappings in the large repo.
Adding this functionality to `scsc xrepo-lookup` is essentially a way to manually fix multi-mapping problems, which could otherwise block Mononoke progress.
For more information on multi-mapping problems, see https://fburl.com/gmywf2d6.
TLDR is that `synced_commit_mapping` is `1:n` with `n` on the large repo side. When syncing commits, we need a way to disambiguate multi-mapped ancestors. `CandidateSelectionHint` is our way of doing this: it expressed desired properties of the commit we would like Mononoke to choose among the list of multi-mapped candidates.
Reviewed By: markbt
Differential Revision: D23991178
fbshipit-source-id: 29c90b7910ad1b84ff71964d6609521fded2f987
Summary:
Let's start actually fixing what commit sync config version is used to remap a commit i.e. we
should use a commit sync config version that was used to remap a parent instead
of using a current version. See more details in
https://fb.quip.com/VYqAArwP0nr1
This diff fixes one particular case and also leaves a few TODOs that we need to
do later
Reviewed By: krallin
Differential Revision: D23953213
fbshipit-source-id: 021da04b0f431767fec5d1c4408287870cb83de1
Summary:
TestLiveCommitSyncConfig is supposed to be a test replacement of
CfgrLiveCommitSyncConfig, however it was quite a different semantic. In
particular, it wasn't even possible to have two versions of the mapping for the
single repo.
This diff changes that. Now we'll have a method to add commit sync config
version, and mark/remove a version as current
Reviewed By: krallin
Differential Revision: D23951202
fbshipit-source-id: 242b4f088f67dac504544987e484cc290ee4e400
Summary: Instead of always fetching the current version name to verify working copy let's instead fetch whatever the version was actually used to create this commit.
Reviewed By: krallin
Differential Revision: D23936503
fbshipit-source-id: 811e427eb62741401b866970b4a0de0c1753edb3
Summary:
Turned out validation didn't report an error if source repo contained an entry
that was supposed to be present in target repo but was actually missing.
This diff fixes it.
Reviewed By: krallin
Differential Revision: D23949909
fbshipit-source-id: 17813b4ad924470c2e8dcd9d3dc0852c79473c61
Summary: Since now we store it in the db, let's also expose it in CommitSyncOutcome enum
Reviewed By: krallin
Differential Revision: D23936502
fbshipit-source-id: a0758143ceaa8f5706f1d9cfe3040ac91c7bac49
Summary:
See motivation for the change in D23845720 (5de500bb99).
We'll need to store version name even for commits that weren't rewritten, but that have an equivalent working copy in another repo.
Reviewed By: ikostia
Differential Revision: D23864571
fbshipit-source-id: 408b68c3b0aa9885a9cd248b0b4abc2b87cd4cca
Summary:
get_source_target_mover likely awaits the same fate as
get_current_mover_DEPRECATED functions i.e. get_source_target_mover will likely
be removed.
This diff just removes a few intances of this function.
Reviewed By: ikostia
Differential Revision: D23929748
fbshipit-source-id: 2ac09da164de3916a552757acf0c39387f6126e4
Summary:
get_mover() and get_reverse_mover() functions return the mover for the
"current" version of the commit sync config, which means these are movers for the version
of the config that's used to create the latest commits on master branch.
So this function returns correct mover only for the latest master commit, but
for all other commits it returns an incorrect mover! This is wrong and it
happened to work just by change, and that's why these functions are marked as deprecated
now, and later we'll add functions 'get_mover_by_version()' which could be used to
replace deprecated functions.
Note that the story for get_bookmark_renamer()/get_reverse_bookmark_renamer()
functions seems to be different. If we can always figure out what's the correct
mover for a commit by e.g. look at its parent we can't really do the same for
bookmarks. Because of that I suggest to keep using the current version for
get_bookmark_renamer() function.
Reviewed By: ikostia
Differential Revision: D23929582
fbshipit-source-id: 3e5e9b46224aca0b75cf2d981ea21c4f9a378ba9
Summary: Finally remove version_name from CommitSyncRepos. Note that this diff adds a few TODOs that we'd need to fix later.
Reviewed By: ikostia
Differential Revision: D23929010
fbshipit-source-id: c72130af548ac7b26bc20ddaac9a59562cc75e0b
Summary: Just as in the previous diff, but this time remove bookmark_renamers from CommitSyncRepos
Reviewed By: ikostia
Differential Revision: D23910295
fbshipit-source-id: 0c2d147057c8d3e0749d5b31ef98ab5022255d95
Summary: Just as the previous diff, but this time it removes reverse_mover
Reviewed By: ikostia
Differential Revision: D23879509
fbshipit-source-id: ed111ca2d106120229c4facc0bb2435913c27966
Summary:
This diff starts to use CommitSyncDataProvider introduced in the previous diff
and removes Mover from CommitSyncRepos struct.
Reviewed By: ikostia
Differential Revision: D23878683
fbshipit-source-id: 0d54f889781aebe4726b3388343a87df783c17d4
Summary:
As described in D23845720 (5de500bb99) we are doing a pretty significant change in the
CommitSyncer. Previously it stored static Movers and BookmarkRenamers, but it
needs to change and they would need to fetch config from LiveCommitSyncConfig.
Unfortunately we already have a bunch of tests that create weird movers that
would be very hard to model via LiveCommitSyncConfig. So we have 2 options:
1) delete or rewrite all these tests
2) Create a wrapper that would return a mover/bookmark renamers.
Option #2 is preferable, and that's what this diff does. In production it would
use LiveCommitSyncConfig, but it also let's tests specify whatever weird movers
they need.
Reviewed By: ikostia
Differential Revision: D23909432
fbshipit-source-id: 83fb627812f625e07f7e40044e2f69274cd2d768
Summary:
A higher-level goal is to provide an interface for the manual remediation of
various xrepo-sync blockages. The nature of a candidate selection is such that
it may fail if the hint is not sufficient to decide which remapping changeset
is the best candidate. This is especially true about the `Only` hint variant:
it is designed to fail when there's more than one candidate. But even with
bookmark ancestorship hints, there are corner cases when the algorithm cannot
make a decision (as well as there may just be bugs in the algorithm). These
cases should be **extremely** rare. Nevertheless, we want to be able to unblock
ourselves. To do so, it is proposed to acccept parent selection hints in the `xrepo-lookup` scs
method. By default, it will use `Only` as a hint and be semantically equivalent
to the current behavior. But we'll provide CLI options to select other hints.
In order to make this work, we need the `sync_commit` method of the
`CommitSyncer` to accept hints too.
Reviewed By: StanislavGlebik
Differential Revision: D23913216
fbshipit-source-id: 05e1ff99cd2c6522829a6e8569040b226600af60
Summary:
This diff adds the use of candidate selection hints to `cross_repo_sync` code sights, which need to query `CommitSyncOutcome` in the small-to-large direction. Specifically: `unsafe_sync_commit` and `unsafe_sync_commit_pushrebase` are the two main functions.
One will now get `CandidateSelectionHint` from the callsight (most notably: `push_redirector`), the other one will build a bookmark-based hint itself.
Reviewed By: StanislavGlebik
Differential Revision: D23715259
fbshipit-source-id: 3f4924f1337b09f3762cc050c4017c5d2bd6cab6
Summary:
Previously we had just an empty TestLiveCommitSyncConfig in tests. Since we are
not using it at all right now, it was fine, but we are planning to start using
it later. To do that let's configure TestLiveCommitSyncConfig so that it's not
empty but actually stores a real content.
Reviewed By: ikostia
Differential Revision: D23903579
fbshipit-source-id: af05a377f730c1824b03327749e6f824361e23e2
Summary:
In D23845720 (5de500bb99) I described what changes we need to make in our commit syncer. One
part of it is that we should remove get_mover() method, as this method always
uses current version of commit sync map even, and that's incorrect.
This diff removes it from commit validator
Reviewed By: ikostia
Differential Revision: D23864350
fbshipit-source-id: 3f650a32835dda9f82949002d63b52cc36cf04e0
Summary:
CommitSyncer is a struct that we use to remap a commit from one repo to another. It uses commit sync map to figure out which paths need to be changed. Commit sync mapping might change, and each commit sync mapping has a version associated with it.
At the moment CommitSyncer doesn't work correctly if a commit sync mapping is changed. Consider the following DAG
```
large repo
A' <- remapped with mapping V1
|
O B' <- remapped with mapping V1
| /
...
small repo
A
|
O B
| /
...
```
We have commit A and B from a small repo remapped into a large repo into commits A' and B'. They were remapped with commit sync mapping V1, which for example remaps files in "dir/" into "smallrepo/dir".
Now let's say we start to use a new mapping v2 which remaps "dir/" into "otherdir/". After this point every commit will be created with new mapping. But this is incorrect - if we create a commit on top of B in a small repo that touches "dir/file.txt" then it will be remapped into "otherdir/file.txt" in the large repo, even though every other file is still in "smallrepo/dir"!
The fix for this issue is to always use the same mapping as commit parent was using (there are a few tricky cases with merge commits and commits with no parents, but those will be dealt with separately).
This diff is the first step - it threads through LiveCommitSyncConfig all the way to the CommitSyncer object, so that CommitSyncer can always fetch whatever mapping it needs.
Reviewed By: ikostia
Differential Revision: D23845720
fbshipit-source-id: 555cc31fd4ce09f0a6fa2869bfcee2c7cdfbcc61
Summary:
Our current megarepo configuration is in a bit of a mess:
1) We have LiveCommitSyncConfig, which fetches the latest version of configs
from configerator and should be used in all cases
2) However we still have an old commit that's stored in mononoke config. It
shouldn't really be used at all.
Unfortunately there are a few places where #2 is still used. This diff removes
one of them.
Reviewed By: ikostia
Differential Revision: D23845297
fbshipit-source-id: aa2d591223cc4b8fe5ef264147457fcb3d1faad7
Summary:
At the moment CommitSyncConfig can be set in two ways:
1) Set in the normal mononoke config. That means that config can be updated
only after a service is restarted. This is an outdated way, and won't be used
in prod.
2) Set in separate config which can be updated on demand. This is what we are
planning to use in production.
create_commit_syncer_from_matches was used to build a CommitSyncer object via
normal mononoke config (i.e. outdated option #1). According to the comments it
was done so that we can read configs from the disk instead of configerator, but
this doesn't make sense because you already can read configerator configs
from disk. So create_commit_syncer_from_matches doesn't look particularly
useful, and besides it also make further refactorings harder. Let's remove it.
Reviewed By: ikostia
Differential Revision: D23811090
fbshipit-source-id: 114d88d9d9207c831d98dfa1cbb9e8ede5adeb1d
Summary:
There's a [flaw](https://fb.workplace.com/groups/scm.mononoke/permalink/1220069065022333) in the current `synced_commit_mapping` data model. In a nutshell, the flaw is in the assumption that the `RewrtittenAs` relationship is `1:1`, while in fact it is `1:n` with `n` on the large repo side.
To address this flaw I propose to:
- relax the DB constraints to represent the semantically correct model
- select all the synced candidates from the DB
- for places in code, which require a single mapping for a candidate, use the provided hint to resolve any ambiguity
More concretely:
- instead of a single `CommitSyncOutcome` struct, I propose to have the "canonical" `PluralCommitSyncOutcome` and the "resolved" `CommitSyncOutcome`
- every variant of `PluralCommitSyncOutcome` that is not `RewrittenAs` just maps to an identical variant of `CommitSyncOutcome`
- have a `CandidateSelectionHint` passed from the clients, which would help resolve `PluralCommitSyncOutcome::RewrittenAs` into a `CommitSyncOutcome::RewrittenAs`
- if the hint does not help to resolve `PluralCommitSyncOutcome::RewrittenAs` into an unambiguous `CommitSyncOutcome::RewrittenAs`, just fail the request and require human intervention to deal with things
- within the hint, have for the following variants for the resolution algorithm:
- `Only` which fails the resolution if there's more than one candidate
- `Exact` behaves like `Only` if there's one candidate, otherwise selects a provided candidate
- `OnlyOr(Ancestor|Descendant)Of(Commit|Bookmark)` behave like `Only` if there's one candidate, otherwise select a candidate in the expected topological relationship
Note some important decisions, that may be surprising at first:
- if there's just one candidate, resolutions with all types of hints succeed, even if this candidate does not fit the hint otherwise (for example, if the hint is `Exact(A)`, and the list of candidates is `[B]`, the resolution succeeds.
- for bookmark-related hints, if the bookmark does not exist at the time of resolution, the hint just "downgrades" itself to be `Only`
Both of these emphasize the fact that if the mapping has only one `RewrittenAs` candidate for a given changeset, the behavior does not change.
Reviewed By: StanislavGlebik
Differential Revision: D23670180
fbshipit-source-id: 1cee1f65fc8020e0ae8a7da789b2532d2e436b77
Summary:
Let's add a command that validates that the created catchup commit is correct.
For now it validates that unodes are the same between catchup commit and commit
that we are merging in.
Later we can add more invariants that we want to check.
Reviewed By: krallin
Differential Revision: D23782369
fbshipit-source-id: 61d19aa73777d5fbb3e1b127bdcf39f5e6309b52
Summary:
In the later diffs we are going to change how CommitSyncer is initialized. In
order to make it simpler let's refactor cross_repo_sync_test to move
CommitSyncer creation in a single function.
There are a few tests that have very peculiar initialization - for example they
have movers that fail. For those tests I combined the new function for creation
of CommitSyncer with manual initialization of CommitSyncRepos struct.
Reviewed By: krallin
Differential Revision: D23811507
fbshipit-source-id: 682ab30aa09c9189fcd02850a19f1ddf021c0329
Summary:
This makes deletion commits a bit less confusing, but it also have another
benefit.
Without the sort some directories might have been changed multiple times in
deletion commits e.g. if a directory had 5 files, and these files were deleted
in 5 different deletion commits then the directory would be changed 5 times.
This was not good, because it made some data derivation slower (in particular,
fastlog), because it had to regenerate the same data over and over again.
Reviewed By: ikostia
Differential Revision: D23780066
fbshipit-source-id: d5c52b13f58dcaf2012d9c12bf77398561cf10ef
Summary:
Use smallvec for the internal storage of MPathElement.
The previous Bytes had a stack size of 32 bytes plus the text it pointed to.
Using SmallVec we can store up to 24 bytes without allocation keeping the same space as the previous Bytes object.
Given most path elements are directory names and directory names are usually short it is expected that this will save both space and allocations.
Reviewed By: farnz
Differential Revision: D23703344
fbshipit-source-id: 39ffc3bd3bb765bd1dbb757b4b1a7782382db909
Summary:
There was a bug. If an entry was skipped then we haven't updated the counter.
That means we might skip the same entry over and over again.
Let's fix it
Reviewed By: ikostia
Differential Revision: D23728790
fbshipit-source-id: f323d14c4deba5736ceb8ada7cb7ee48a69c1272
Summary:
Previously we were able to add a backpressure to the x-repo-sync job that waits
until backsync queue gets empty. However it's also useful to wait until hg sync
queue drains for the large repo. This diff makes it possible to do so.
Reviewed By: aslpavel
Differential Revision: D23728201
fbshipit-source-id: 6b198c8d9c35179169a46f2b804f83838edeff1e
Summary:
Instead of using force_set and force_delete let's use create() update() and
delete() calls.
Reviewed By: ikostia
Differential Revision: D23704245
fbshipit-source-id: 40bcfd906c4f61a860e5ec8312cddc0d80ea94ae
Summary:
This would let us allow only a certain bookmarks to be remapped from a small
repo to a large repo.
Reviewed By: krallin
Differential Revision: D23701341
fbshipit-source-id: cf17a1a21b7594a94c5fb117065f7d9298c8d1af
Summary:
Previously we used target repo for a commit from a source repo. This diff fixes
it.
Reviewed By: krallin
Differential Revision: D23685171
fbshipit-source-id: 4aa105aec244ebcff92b7b71a6cb22dd8a10d2e5
Summary:
This method is a future of synced-commit-mapping: there can be multiple query
results and we should make a decision of whether it is acceptable for the
business logic in the business logic, rather than pick a random one.
In later diffs I will introduce the consumers for this method.
Reviewed By: mitrandir77
Differential Revision: D23574165
fbshipit-source-id: f256f82c9848f54e5096c6e50d42600bfd260081