Summary:
# Goal of the stack
The goal of this stack is to make the megarepo API safer to use. In particular, we want to achieve:
1) If the same request is executed a few times, then it won't corrupt the repo in any way (i.e. it won't create commits the client didn't intend to create, and it won't move bookmarks to unpredictable places)
2) If a request finished successfully but we failed to deliver the success to the client, then repeating the same request will finish successfully.
Achieving #1 is necessary because async_requests_worker might execute a few requests at the same time (though this should be rare). Achieving #2 is necessary because if we fail to send a successful response to the client (e.g. because of network issues), we want the client to retry and get this successful response back, so that the client can continue with their next request.
In order to achieve #1 we make all bookmark moves conditional, i.e. we move a bookmark only if its current location is where the client expects it. This helps achieve goal #1 because even if two requests are executing at the same time, only one of them will successfully move the bookmark.
However, once we achieve #1 we have a problem with #2: if a request was successful but we failed to send the successful reply back, the client will retry the request, and the retry will fail because the bookmark is already at the new location (the previous request succeeded), while the client expects it at the old location (the client doesn't know the request was successful). To fix this issue, before executing the request we check whether this request was already successful. We do it heuristically, by checking the request parameters and verifying the commit remapping state. This doesn't protect against malicious clients, but it should protect from issue #2 described above.
So the whole stack of diffs is the following:
1) take a method from megarepo api
2) implement a diff that makes bookmark moves conditional
3) Fix the problem #2 by checking if a previous request was successful or not
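The conditional bookmark move in step 2 is essentially a compare-and-swap. Here is a toy Rust sketch of that idea; `Bookmarks`, `ChangesetId`, and `move_if_expected` are illustrative stand-ins, not Mononoke's real types:

```rust
use std::collections::HashMap;

type ChangesetId = u64;

#[derive(Default)]
struct Bookmarks {
    entries: HashMap<String, ChangesetId>,
}

impl Bookmarks {
    /// Move `name` to `new_target`, but only if it currently points where the
    /// client expects (`expected`, with `None` meaning "does not exist yet").
    /// Returns false if another request already moved the bookmark, so this
    /// request must not clobber it.
    fn move_if_expected(
        &mut self,
        name: &str,
        expected: Option<ChangesetId>,
        new_target: ChangesetId,
    ) -> bool {
        if self.entries.get(name).copied() != expected {
            return false;
        }
        self.entries.insert(name.to_string(), new_target);
        true
    }
}
```

With this shape, two concurrent requests that both expect the old location can race freely: whichever commits first wins, and the loser's move is rejected instead of moving the bookmark to an unpredictable place.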
# This diff
If a previous add_sync_target() call was successful on mononoke side, but we
failed to deliver this result to the client (e.g. network issues), then client
would just retry this call. Before this diff that wouldn't work (i.e. we'd
just fail to create the bookmark because it already exists). This diff fixes
it by checking a commit this bookmark points to and checking if it looks like
it was created by a previous add_sync_target call. In particular, it checks
that remapping state file matches the request parameters, and that config
version is the same.
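A rough sketch of this heuristic; the `RemappingState` fields here are made up for illustration and do not match Mononoke's real remapping state file:

```rust
// State we can recover from the commit the bookmark points to, and the state
// the incoming request asks for. Fields are hypothetical.
struct RemappingState {
    config_version: String,
    // (source name, changeset the request asked to sync)
    latest_synced: Vec<(String, u64)>,
}

/// Returns true if the state recorded in the commit matches the request's
/// parameters, i.e. an identical add_sync_target call most likely already
/// succeeded, so the retry can be answered with success instead of a
/// "bookmark already exists" error.
fn request_already_applied(in_commit: &RemappingState, requested: &RemappingState) -> bool {
    let mut a = in_commit.latest_synced.clone();
    let mut b = requested.latest_synced.clone();
    a.sort();
    b.sort();
    in_commit.config_version == requested.config_version && a == b
}
```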
Differential Revision: D29848377
fbshipit-source-id: 16687d975748929e5eea8dfdbc9e206232ec9ca6
Summary:
High-level goal of this diff:
We have a problem in long_running_request_queue - if a tw job dies in the
middle of processing a request then this request will never be picked up by any
other job, and will never be completed.
The idea of the fix is fairly simple - while a job is executing a request it
needs to constantly update inprogress_last_updated_at field with the current
timestamp. In case a job dies then other jobs would notice that timestamp
hasn't been updated for a while and mark this job as "new" again, so that
somebody else can pick it up.
Note that it obviously doesn't prevent all possible race conditions - the worker
might just be too slow and not update the inprogress timestamp in time, but
that race condition we'd handle on other layers i.e. our worker guarantees that
every request will be executed at least once, but it doesn't guarantee that it will
be executed exactly once.
Now a few notes about implementation:
1) I intentionally separated the methods for finding abandoned requests and for marking them new again. I did so to make it easier to log which requests were abandoned (logging will come in the next diffs).
2) My original idea (D29821091) had an additional field called execution_uuid, which would be changed each time a new worker claims a request. In the end I decided it's not worth it - while execution_uuid can reduce the likelihood of two workers running at the same time, it doesn't eliminate it completely. So I decided that execution_uuid doesn't really give us much.
3) It's possible that two workers will be executing the same request and updating the same inprogress_last_updated_at field. As I mentioned above, this is expected, and the request implementation needs to handle it gracefully.
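The two-step recovery from note 1 might look roughly like this. This is an illustrative in-memory model; the real queue lives in SQL and the field and status names here are assumptions:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Status {
    New,
    InProgress,
}

struct Request {
    id: u64,
    status: Status,
    // Heartbeat the executing worker keeps bumping (seconds since epoch).
    inprogress_last_updated_at: u64,
}

/// Step 1: find in-progress requests whose heartbeat is older than
/// `abandoned_timeout`. Kept separate from step 2 so the abandoned
/// requests can be logged before being reset.
fn find_abandoned(reqs: &[Request], now: u64, abandoned_timeout: u64) -> Vec<u64> {
    reqs.iter()
        .filter(|r| {
            r.status == Status::InProgress
                && now.saturating_sub(r.inprogress_last_updated_at) > abandoned_timeout
        })
        .map(|r| r.id)
        .collect()
}

/// Step 2: mark the abandoned requests as new so another worker can claim them.
fn mark_new(reqs: &mut [Request], abandoned: &[u64]) {
    for r in reqs.iter_mut() {
        if abandoned.contains(&r.id) {
            r.status = Status::New;
        }
    }
}
```

As the notes above say, this gives at-least-once execution: a slow worker whose heartbeat lapses may race with the new claimant, and the request implementation has to tolerate that.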
Reviewed By: krallin
Differential Revision: D29845826
fbshipit-source-id: 9285805c163b57d22a1936f85783154f6f41df2f
Summary:
Currently they get zeros by default, but having NULL here seems like a nicer
option.
Reviewed By: krallin
Differential Revision: D29846254
fbshipit-source-id: 981d979055eca91594ef81f0d6dc4ba571a2e8be
Summary:
The patches to these crates have been upstreamed.
allow-large-files
Reviewed By: jsgf
Differential Revision: D29891894
fbshipit-source-id: a9f2ee0744752b689992b770fc66b6e66b3eda2b
Summary:
Do a similar change to change_target_config as we've done for add_sync_target
in D29848378: move the bookmark only if it points to an expected commit. That
makes it safer to deal with cases where the same change_target_config is
executed twice.
Reviewed By: mojsarn
Differential Revision: D29874803
fbshipit-source-id: d21a3029ee58e2a8acc41e37284d0dd03d2803a3
Summary:
This is the first diff that tries to make megarepo asynchronous methods
idempotent - replaying the same request twice shouldn't cause corruption on the
server. At the moment this is not the case - if we have a runaway
add_sync_target call, then in the end it moves a bookmark to a random place,
even if an identical add_sync_target call already succeeded and a few more
landed on top.
add_sync_target should create a new bookmark, and if a bookmark already exists
it's better to not move it to a random place.
This diff does it, however it creates another problem - if a request was successful on mononoke side, but we failed to deliver the successful result to the client (e.g. network issues), then retrying this request would fail because bookmark already exists. This problem will be addressed in the next diff.
Reviewed By: mojsarn
Differential Revision: D29848378
fbshipit-source-id: 8a58e35c26b989a7cbd4d4ac4cbae1691f6e9246
Summary: It's nice to be able to keep track of what's going on
Reviewed By: mwdevine
Differential Revision: D29790543
fbshipit-source-id: b855d72efe8826a99b3a6a562722e299e9cbfece
Summary:
This is not used. Even though this method has the "right intention" (i.e. we
need to start marking long running requests as new), I'm not sure we can use it
as is. So let's just delete it for now.
Reviewed By: farnz
Differential Revision: D29817068
fbshipit-source-id: 84d392fea01dfb5fb7bc56f0072baf2cf70b39f4
Summary:
In the previous diff we started creating deletion commits on the megarepo mainline.
This is not great since it breaks bisects, and this diff avoids that.
The way it does it is the following:
1) First do the same thing we did before - create a deletion commit, and then
create a merge commit with p1 as the deletion commit and p2 as an addition commit.
Let's call it a "fake merge", since this commit won't be used for our mainline.
2) Generate manifest for our "fake merge", and then use this manifest to
generate a bonsai diff. But this time make p1 the old target commit (i.e. remove
the deletion commit as if it never existed).
3) Use generated bonsai diff to create a commit.
So in short we split the procedure in two - first generate and validate the
resulting manifest (this is what we use "fake merge" commit for), and then
generate bonsai changeset using this manifest. It's unfortunate that in order
to generate resulting manifest we actually need to create commit and save it to
blobstore. If we had in-memory manifests we could have avoided that, but alas
we don't have them yet.
This way of creating bonsai changesets is a bit unconventional, but I think it has the benefit of relying on tools we have confidence in (i.e. bonsai_diff), so we don't need to reimplement all the bonsai logic again.
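The two-phase idea can be illustrated with a toy manifest model, where a manifest is just a path-to-content map and `manifest_diff` plays the role of bonsai_diff. This is a simplification for illustration, not the real manifest machinery:

```rust
use std::collections::BTreeMap;

type Manifest = BTreeMap<String, String>;

/// Changes needed to turn `old` into `new`: Some(content) means add/modify,
/// None means delete. This stands in for bonsai_diff in the description above.
fn manifest_diff(old: &Manifest, new: &Manifest) -> BTreeMap<String, Option<String>> {
    let mut diff = BTreeMap::new();
    for (path, content) in new {
        if old.get(path) != Some(content) {
            diff.insert(path.clone(), Some(content.clone()));
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            diff.insert(path.clone(), None);
        }
    }
    diff
}

/// Apply the diff directly on top of the old target, producing the mainline
/// commit's manifest without ever having the deletion commit as a parent.
fn apply_diff(base: &Manifest, diff: &BTreeMap<String, Option<String>>) -> Manifest {
    let mut out = base.clone();
    for (path, change) in diff {
        match change {
            Some(content) => {
                out.insert(path.clone(), content.clone());
            }
            None => {
                out.remove(path);
            }
        }
    }
    out
}
```

Phase one builds the "fake merge" manifest (deletion commit + addition commit) and validates it; phase two diffs it against the old target and replays that diff on the old target, so the result is identical but the deletion commit never appears on the mainline.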
Reviewed By: mitrandir77
Differential Revision: D29633340
fbshipit-source-id: eebdb0e4db5abbab9346c575b662b7bb467497c4
Summary:
Initially I just wanted to address comments from D29515737 (fa8796ae19) about unnecessary
manifest retraversals, but there were a few more problems:
1) We didn't detect file conflicts in the final merge commit correctly. For
example, if the additions_merge commit added a file "dir/1.txt" but there's
already a file "dir" in the target changeset, then we wouldn't detect this problem.
2) What's worse is that we might produce an invalid bonsai merge changeset
ourselves. Say, if we delete "source_1/dir/file.txt" and then add a file
"source_1/dir" in the additions merge commit, then the resulting bonsai
changeset should have a "source_1/dir" entry.
This diff does the following:
1) Adds more tests to cover different corner cases - some of them were failing
before this diff.
2) Improves logic to verify file conflicts
3) Instead of trying to generate correct merge bonsai changeset it simplifies
the task and creates a separate deletion commit.
Note that creating a deletion commit on the mainline is something we want to
avoid so as not to break bisects. This will be addressed in the next diff.
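The improved conflict check (point 2 above) boils down to a path-prefix test. A simplified sketch with plain string paths, for illustration only:

```rust
/// True if `prefix` is a strict directory-prefix of `path`,
/// e.g. "dir" is a prefix of "dir/1.txt" but not of "dirx".
fn is_path_prefix(prefix: &str, path: &str) -> bool {
    path.starts_with(prefix) && path.as_bytes().get(prefix.len()) == Some(&b'/')
}

/// Returns true if `added` conflicts with any existing file path: either the
/// exact path exists, an existing file sits where `added` needs a directory,
/// or `added` sits where an existing file needs a directory.
fn has_file_conflict(existing: &[&str], added: &str) -> bool {
    existing
        .iter()
        .any(|e| *e == added || is_path_prefix(e, added) || is_path_prefix(added, e))
}
```

Note the byte check after `starts_with`: without it, "dirx" would wrongly be treated as conflicting with "dir", which is exactly the kind of corner case the new tests cover.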
Reviewed By: mitrandir77
Differential Revision: D29633341
fbshipit-source-id: 8f755d852212fbce8f9331049bf836c1d0a4ef42
Summary: This new method will allow the megarepo customers to create a sync target that's branching off the existing target. This feature is meant to be used for release branches.
Reviewed By: StanislavGlebik
Differential Revision: D29275281
fbshipit-source-id: 7b58d5cc49c99bbc5f7e01814178376aa3abfcdf
Summary: needed to set up tw health check
Reviewed By: StanislavGlebik
Differential Revision: D29580808
fbshipit-source-id: 6a3833d652979915fd44dc6d89511192397d8b96
Summary:
I got frustrated with the fact that half of the functions in
megarepo_api required the source name to be wrapped into a newtype and the
other half didn't. This refactor unifies it everywhere except the thrift
data structure itself - not sure if we can affect thrift codegen in this way.
Reviewed By: StanislavGlebik
Differential Revision: D29515474
fbshipit-source-id: 2d55a03cf396b174b0228c3fcc627b2296600400
Summary:
The merge commit in the case of change_target_sync_config won't represent any
consistent state of the target, so we don't want to write the remapping state
file there.
Reviewed By: StanislavGlebik
Differential Revision: D29515476
fbshipit-source-id: b0703be1127af6582785510fde51ff8501fb4f17
Summary:
In the case of change_target_sync_config we'll be creating move commits only for
a subset of sources, so let's change the function signature to make it possible
to specify such a subset.
Reviewed By: StanislavGlebik
Differential Revision: D29515475
fbshipit-source-id: 31002ec56dad872948bcbc79b0ed5fdb794e1f10
Summary:
The `change_target_config` method's responsibilities have a huge intersection
with `add_target_config`: the change method needs to know how to merge-in new
sources into the target and the whole "create move commits, then create merge
commits" flow can be reused.
Reviewed By: StanislavGlebik
Differential Revision: D29515301
fbshipit-source-id: c15f95875cbcbf5aad00e5047f6a8ffb55c4da31
Summary:
This reads the config added in D29305462. It populates it into the `CommonConfig` struct, and also adds it to `RepoFactory`, but doesn't use it anywhere yet; this will be done in the next diff.
There is a single behaviour change in this diff, which I believe should be harmless but is noted in the comments in case it isn't.
Reviewed By: markbt
Differential Revision: D29272581
fbshipit-source-id: 62cd7dc78478c1d8cb212eafdd789527ead50ef6
Summary:
Previously it wasn't possible because the symlink target was a key in the map
that mega_grepo_sync was sending to scs, and so we couldn't have two different
symlinks for the same symlink target. However, we actually need this - some
aosp repos have different symlink sources that point to the same symlink target.
This diff fixes it by reversing the key and value in the `linkfiles` map.
Differential Revision: D29359634
fbshipit-source-id: da74d6e934350822d82d2135ab06c754824525c9
Summary:
This is just updating the os_info crate to my fork with a fix for Centos
Stream: https://github.com/stanislav-tkach/os_info/pull/267
Reviewed By: quark-zju
Differential Revision: D29410043
fbshipit-source-id: 3642e704f5a056e75fee4421dc59020fde13ed5e
Summary: I think someone landed a dependency change or something and forgot to update autocargo
Reviewed By: dtolnay
Differential Revision: D29402335
fbshipit-source-id: e9a4906bf249470351c2984ef64dfba9daac8891
Summary:
1) Turned out it's possible to have non-prefix-free paths in aosp manifests. So
we have to remove this check for now.
2) Also let's verify the config earlier so that we can return an error to the
user faster.
Differential Revision: D29335602
fbshipit-source-id: 3dd72d63a370515eca5d356b3b98bb2ac2245aee
Summary: There is a regression in 1.7.0 (which we're on at the moment) so we might as well update.
Reviewed By: zertosh, farnz
Differential Revision: D29358047
fbshipit-source-id: 226393d79c165455d27f7a09b14b40c6a30d96d3
Summary:
The path should be relative to the symlink path, not to the repo root. This diff
fixes it.
Reviewed By: farnz
Differential Revision: D29327682
fbshipit-source-id: a51161a8039a88263fe941562f2c2134aa5d4fef
Summary:
Pull in a patch which fixes writing out an incorrect entsize for the
`SHT_GNU_versym` section:
ddbae72082
Reviewed By: igorsugak
Differential Revision: D29248208
fbshipit-source-id: 90bbaa179df79e817e3eaa846ecfef5c1236073a
Summary: Update versions for several of the crates we depend on.
Reviewed By: danobi
Differential Revision: D29165283
fbshipit-source-id: baaa9fa106b7dad000f93d2eefa95867ac46e5a1
Summary: revert the zstd crates back to previous version
Reviewed By: johansglock
Differential Revision: D29038514
fbshipit-source-id: 3cbc31203052034bca428441d5514557311b86ae
Summary:
Turned out we still don't process copyfile attribute correctly. copyfile
creates two overrides like
{
"copyfile source" -> default prefix + "/copyfile source"
"copyfile source" -> "destination"
}
However we also have a check that validates that there are no paths (including
the default path, overrides, linkfiles etc.) that are prefixes of each other. And
this check fails in the copyfile case, even though the path maps to exactly the
same path as if the default prefix was applied.
Let's fix it. Also note that we haven't seen this failure in the integration
test because we didn't run verify_paths in tests. This diff fixes it as well.
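The fix can be sketched as follows, with hypothetical names: overrides that map a path to exactly where the default prefix would put it anyway (as copyfile does) are excluded before running the prefix check.

```rust
/// True if the override is redundant: it maps `source` to the same place the
/// default prefix would have put it.
fn maps_to_default(default_prefix: &str, source: &str, dest: &str) -> bool {
    dest == format!("{}/{}", default_prefix, source)
}

/// Validate overrides, reporting a conflict only for overrides that genuinely
/// remap a path; copyfile-style identity mappings are ignored.
fn verify_overrides(default_prefix: &str, overrides: &[(&str, &str)]) -> Result<(), String> {
    let real: Vec<&str> = overrides
        .iter()
        .filter(|(src, dst)| !maps_to_default(default_prefix, src, dst))
        .map(|(src, _)| *src)
        .collect();
    for i in 0..real.len() {
        for j in (i + 1)..real.len() {
            if real[i].starts_with(real[j]) || real[j].starts_with(real[i]) {
                return Err(format!("conflicting overrides: {} vs {}", real[i], real[j]));
            }
        }
    }
    Ok(())
}
```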
Reviewed By: mitrandir77
Differential Revision: D28992456
fbshipit-source-id: 5fd993914b189cf768ba03010194b1c26026f7a8
Summary: Update to latest version. This includes a patch to async-compression crate from [my PR updating it](https://github.com/Nemo157/async-compression/pull/125), I will remove once the crate is released.
Reviewed By: mitrandir77
Differential Revision: D28897019
fbshipit-source-id: 07c72f2880e7f8b85097837d084178c6625e77be
Summary: I need it for test in next diff
Reviewed By: StanislavGlebik
Differential Revision: D28959433
fbshipit-source-id: c67ca7eec03f94425332e446f6f97038edff598d
Summary:
I'll be using it more in the next diff, so why not have it in its own method.
Reviewed By: StanislavGlebik
Differential Revision: D28887333
fbshipit-source-id: 35accb495a577e1c01ec8114fc60acf38ed11fee
Summary:
When syncing merge commits with two parents it would be nice if the first
parent were the one that comes from the unified branch. In **case of octopus
merges** we really don't want the parent in the unified branch to be third
(that would turn the sync into a non-forward move!). Let's add a way to tell
the commit rewriter which parent needs to be first.
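The knob could look something like this; a toy model where commits' parents are just ids, not the real commit rewriter API:

```rust
type ChangesetId = u64;

/// Reorder `parents` so that `first` (the unified-branch parent) becomes p1,
/// keeping the relative order of the remaining parents. If `first` is not a
/// parent at all, the list is left unchanged.
fn put_parent_first(parents: &mut Vec<ChangesetId>, first: ChangesetId) {
    if let Some(pos) = parents.iter().position(|p| *p == first) {
        let p = parents.remove(pos);
        parents.insert(0, p);
    }
}
```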
Reviewed By: farnz
Differential Revision: D28885488
fbshipit-source-id: 57a081ce2d285ba2b6d6d98110cd1c64a241548e
Summary:
The worker should be able to process requests from the queue no matter
which repo they are for and what its ACLs are. It's during request scheduling
that we should check the identity of the entity scheduling the request.
Reviewed By: StanislavGlebik
Differential Revision: D28866807
fbshipit-source-id: 5d57eb9ba86e10d477be5cfc51dfb8f62ea16b9e
Summary: This makes it easier to understand what each move and merge commit is for.
Reviewed By: mitrandir77
Differential Revision: D28839677
fbshipit-source-id: 1a42205c164224b64c773cff80b690b251a48381
Summary:
This is a precaution. add_sync_target can create a very branchy repository, and
I'm not sure how well Mononoke is going to handle deriving of these commits by
all mononoke hosts at once (in particular, I'm worried about traversal that has
to be done by all hosts in parallel). In theory it should work fine, but
deriving data during add_sync_target call should be a reasonable thing to do
anyway.
While working on that I noticed that "git" derived data is not supported by derived data utils, and it was causing test failures. I don't think there's any reason to not have TreeHandle in derived data utils, so I'm adding it now.
Reviewed By: farnz
Differential Revision: D28831052
fbshipit-source-id: 5f60ac5b93d2c38b4afa0f725c6908edc8b98b18
Summary:
Make source_name unique even if the same git-repo and project name is mapped
several times in the same manifest. source_name needs to match between all the
different places we use it (`add_target_config`, `sync_changeset`,
`changesets_to_merge`) - we were not consistent.
Reviewed By: StanislavGlebik
Differential Revision: D28845869
fbshipit-source-id: 54e96dcdeaf22ec68f626e9c30e5e60c54ec149b
Summary:
We are going to create quite a lot of move commits at the same time, and it can
be slow. Let's instead create them in parallel, and then call
`save_bonsai_changesets` for all the commits in one go.
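A minimal sketch of the idea, using threads in place of the real async machinery and a counting stand-in for `save_bonsai_changesets`; all names here are illustrative:

```rust
use std::thread;

// Stand-in for building one move commit for a source.
fn create_move_commit(source: &str) -> String {
    format!("move commit for {}", source)
}

// Stand-in for save_bonsai_changesets: persists the whole batch with a
// single write, returning how many commits were saved.
fn save_all(commits: Vec<String>, write_count: &mut usize) -> usize {
    *write_count += 1; // one write for the whole batch, not one per commit
    commits.len()
}

/// Create all move commits in parallel, preserving input order.
fn create_in_parallel(sources: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = sources
        .into_iter()
        .map(|s| thread::spawn(move || create_move_commit(&s)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```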
Reviewed By: mitrandir77
Differential Revision: D28795525
fbshipit-source-id: f6b6420c2fe30bb98680ac7e25412c55c99883e0
Summary:
Previously the add_sync_target method was creating a single merge commit. That
means that we might create a bonsai commit with hundreds of parents. This is
not ideal, because mercurial can only work correctly with 2 parents - for
bonsai changeset with 3 parents or more mercurial file histories might be lost.
So instead of creating a single giant merge commit let's create a stack of
merge commits.
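The stacking itself is just repeated two-parent merges: merge the first two heads, then merge each remaining head into the previous result. A toy sketch, with commits modeled as (id, parents) pairs rather than real bonsai changesets:

```rust
type ChangesetId = u64;

struct MergeCommit {
    id: ChangesetId,
    parents: Vec<ChangesetId>,
}

/// Build a stack of merge commits over `heads`, each with at most 2 parents.
/// `next_id` hands out fresh changeset ids.
fn merge_stack(
    heads: &[ChangesetId],
    mut next_id: impl FnMut() -> ChangesetId,
) -> Vec<MergeCommit> {
    let mut stack = Vec::new();
    let mut current = match heads.first() {
        Some(h) => *h,
        None => return stack,
    };
    for head in &heads[1..] {
        let id = next_id();
        stack.push(MergeCommit { id, parents: vec![current, *head] });
        current = id; // the next merge builds on top of this one
    }
    stack
}
```

Every commit in the resulting stack has exactly two parents, which keeps mercurial file histories intact.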
Reviewed By: mitrandir77
Differential Revision: D28792581
fbshipit-source-id: 2f8ff6b49db29c4692b7385f1d1ab57986075d57