Summary:
`Node` is an overloaded name, which means that in our conversations we
constantly have to clarify which Node we are talking about:
the type in Mercurial, or the generic term in graphs/trees?
It is just an identifier, so we rename it as such.
Note that this doesn't remove the Node term entirely yet. I will take care of the
long tail of uses incrementally.
Things that are not renamed:
* linknode
* nodeinfo
* WireHistoryEntry
* cdatapack
Reviewed By: quark-zju
Differential Revision: D18010149
fbshipit-source-id: 83b3911e231e4544848391cd3deb6e44ec2b809d
Summary:
We have most of the support for ChangegroupV3 in place already (we've been
using it for commit cloud for a while, and we have tests for pushrebase). However,
we didn't really support push, because we were looking for the wrong field in the
wrong set of headers (and we didn't advertise V3 in capabilities).
This fixes that. It does make us a little more lenient by allowing the version
in either the advisory or the mandatory params (mainly because our commit cloud sync
and Mercurial itself look like they're doing something different there).
Note: I removed the constant names, since the field name is really something that
only makes sense for each individual part type, so it seemed to make more
sense to have the two right next to each other.
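The lenient lookup described above could be sketched like this (a simplified sketch: `PartHeaders` and the plain `String` maps are illustrative stand-ins, not the actual bundle2 types in Mononoke):

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a bundle2 part's headers.
struct PartHeaders {
    mandatory: HashMap<String, String>,
    advisory: HashMap<String, String>,
}

/// Accept the changegroup "version" field from either set of params,
/// checking mandatory params first.
fn changegroup_version(headers: &PartHeaders) -> Option<&str> {
    headers
        .mandatory
        .get("version")
        .or_else(|| headers.advisory.get("version"))
        .map(|s| s.as_str())
}
```

The point of the leniency is that a peer advertising the version in advisory params still parses successfully.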
Reviewed By: ikostia
Differential Revision: D17982127
fbshipit-source-id: 32f822f7aa5fcb7ca5eded5d0cd5fff4af36e606
Summary: Earlier commits add a Mononoke API call for cross repo lookups, and a thrift interface for it. Provide the last piece.
Reviewed By: ikostia
Differential Revision: D17856468
fbshipit-source-id: c5c5714a5cea14b550954af8b19f676a89260342
Summary:
A unode batch has only up to 110 entries, while the initial request may need to skip or fetch many more commits.
Using `bounded_traversal_dag`, I fetch as many changesets as needed and then sort them in BFS order before returning the result.
The algorithm: fetch the first unode batch for the changeset we have, add `<changeset -> parents>` edges to the commit graph, then for each changeset in the batch that has parents which are not in the batch (i.e. `FastlogParent::Unknown`), fetch a new unode batch. Finally, sort in BFS order.
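The final BFS ordering step could be sketched as follows (a sketch only: changesets are plain strings here, and the real code works over async unode batches rather than a prebuilt map):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Sort changesets in BFS order starting from `root`, given the
/// `<changeset -> parents>` edges accumulated while fetching batches.
fn bfs_order(root: &str, parents: &HashMap<String, Vec<String>>) -> Vec<String> {
    let mut order = Vec::new();
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(root.to_string());
    seen.insert(root.to_string());
    while let Some(cs) = queue.pop_front() {
        order.push(cs.clone());
        for p in parents.get(&cs).into_iter().flatten() {
            // Enqueue each parent exactly once.
            if seen.insert(p.clone()) {
                queue.push_back(p.clone());
            }
        }
    }
    order
}
```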
Reviewed By: StanislavGlebik
Differential Revision: D17490344
fbshipit-source-id: a272223e667a0b989c1113824500b43e207454e8
Summary:
`bool` (and its literals) can be confusing without context; it can prompt
people to go and look at function definitions. An explicitly-named enum with
well-named variants has an immediately clear meaning.
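The shape of the change could be sketched like this (the enum and variant names here are illustrative, not the ones actually introduced by the diff):

```rust
/// With a bare bool, the call site is opaque:
///     fetch_bookmark(ctx, name, true)   // true... what does it mean?
/// An explicitly-named enum makes the call site self-documenting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Freshness {
    MostRecent,
    MaybeStale,
}

fn describe(freshness: Freshness) -> &'static str {
    match freshness {
        Freshness::MostRecent => "read from master",
        Freshness::MaybeStale => "read from replica",
    }
}
```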
Reviewed By: farnz
Differential Revision: D17978343
fbshipit-source-id: 5ac3f39c2bc19da504b94ed80aaf9c427d1deaac
Summary:
Currently, it is possible to combine a normal, pure push and an infinitepush push in the same `unbundle` wireproto command. I would argue that this is a bad thing, as they are pretty different beasts. It has led to a number of functions that have to deal with multiple variants of enums and, IMO, has made the code harder to understand. This diff tries to separate these concerns from the perspective of `PostResolveAction` and below: it separates all the verification, conversion and processing functions down until we hit the database layer.
It also gets rid of "resolve" naming in one place, as it has no clear semantic meaning (compared to "save to db" at least) IMO.
Note that this is not just a tech debt thing: for push redirector we need to know which scenario we are in to apply the correct redirection semantics.
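The separation could be sketched as one variant per unbundle scenario (the four scenario names come from this stack; the payloads the real variants carry, such as resolved changesets and bookmark pushes, are elided here):

```rust
/// One variant per unbundle scenario, so handling code can't
/// accidentally mix them.
enum PostResolveAction {
    Push,
    InfinitePush,
    PushRebase,
    BookmarkOnlyPushRebase,
}

/// With separate variants, each scenario gets its own processing path,
/// and the push redirector can decide which redirection semantics apply.
fn describe(action: &PostResolveAction) -> &'static str {
    match action {
        PostResolveAction::Push => "plain push",
        PostResolveAction::InfinitePush => "infinitepush",
        PostResolveAction::PushRebase => "pushrebase",
        PostResolveAction::BookmarkOnlyPushRebase => "bookmark-only pushrebase",
    }
}
```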
Reviewed By: farnz
Differential Revision: D17966069
fbshipit-source-id: e7fe8384a35a0f5945e89b91166e11f2f675e79f
Summary:
When we generate bundles, we choose which files to send as LFS to the remote. Right now, we're configured so that the Mononoke and client thresholds match, and our LFS server is designed to replicate blobs into Dewey synchronously, so this should be safe.
That said, to further minimize the risk of breakage, we can verify that the blobs we are about to send as LFS are indeed present in the other backend's LFS server. This does that.
Reviewed By: StanislavGlebik
Differential Revision: D17957232
fbshipit-source-id: bfc77470377afcc94537fb627c214a94e8f08c0b
Summary:
We'd like to have the sync job validate that the blobs it's syncing over in
Dewey are present there (that should always be the case, but we should be
careful). Extracting our LFS protocol implementation into its own crate will
make it easier to do so.
Reviewed By: ikostia
Differential Revision: D17957233
fbshipit-source-id: fcef1b72503d4e8d7f2072d6a61e58289ce8e009
Summary:
Now find new commits (i.e. commits from source_repo that are not in the target
repo) and backsync them.
Reviewed By: krallin
Differential Revision: D17930706
fbshipit-source-id: ca08f2b8c644a4dbab6edec5019e4445f5c5b011
Summary:
This new method syncs all log entries up to a certain log entry.
It will first read from a replica; if it can't find new entries in the replica, it will try to fetch from master.
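The replica-first fallback could be sketched like this (a sketch under assumptions: the real method queries SQL replicas and master, which are modeled here as plain closures returning entry ids):

```rust
/// Read new log entries, preferring the replica and falling back to
/// master only when the replica returns nothing (e.g. due to lag).
fn read_entries<F, G>(read_replica: F, read_master: G) -> Vec<u64>
where
    F: Fn() -> Vec<u64>,
    G: Fn() -> Vec<u64>,
{
    let from_replica = read_replica();
    if !from_replica.is_empty() {
        from_replica
    } else {
        // Replica may be lagging behind: try the master.
        read_master()
    }
}
```

This keeps read load on the replicas in the common case while still making progress when they lag.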
Reviewed By: krallin
Differential Revision: D17929152
fbshipit-source-id: 77fa0ef28ae3c530d427e079079d2d4d6b6c6b62
Summary:
This has been enabled for a while now. We won't be going back to using
loosefiles, so let's make fetchpacks the default in the code. A future step
will remove the code paths that are no longer exercised.
Reviewed By: quark-zju
Differential Revision: D17919275
fbshipit-source-id: 0614f5710b630690de92cdb43ec07d3a2888aa1e
Summary:
The file hook cache stores the result of running a hook on a specific
file. It allows the result of expensive hooks to be reused. However, there are
some limitations that mean the cache is practically useless. The hook cache
will be populated when a push happens to the server. The only case in which the
cached result is used is if someone pushes a file that exactly matches a
previous push to the exact same server (which hasn't been restarted). In
practice, this is very unlikely to happen.
To accommodate the hook cache, the code for file and changeset hooks is
quite different. Removing the cache allows for some unification (such as a
single run_hook function) and simplification. I've also added logging to
the changeset hooks, as previously we only printed debug output for the file
hooks.
Further, it allows the removal of asyncmemo in a few places.
Reviewed By: StanislavGlebik
Differential Revision: D17932986
fbshipit-source-id: df8220aea7511a00aeb6b9de615e15d657bf4602
Summary:
This diff adds a missing test for D17952529. Unfortunately, a unit test won't
work because the problem happens only on MySQL, so I've added a MySQL
integration test.
Also, to make the problem visible, setting the counter now errors out if the
transaction failed.
Reviewed By: HarveyHunt
Differential Revision: D17954122
fbshipit-source-id: 17b6bc563bf30fe1808e6301ee83a209759e674d
Summary:
In the multi-directional-sync world it is not enough to have a single direction
specified for all small repos; let's move it to the small repo config level.
This prompts some changes in how we do verification:
- no overlapping prefixes are allowed among all small repos with small-to-large
sync direction (when small repos tail into the large repo, we cannot deny
a pushrebase because of the path conflicts (since the small repo has already
accepted the commit in the past), so let's prohibit path conflicts in the first place)
- no overlapping prefixes are allowed between *all* small-to-large repos and
*each* large-to-small repo (same reason, we cannot reject tailed commits,
so we need to protect ourselves from path conflicts)
- overlapping *is* allowed between large-to-small repos, but only when the used
  prefixes are identical, not when one prefix is a prefix of another.
To be clear, I am not certain that we cannot accommodate the latter case;
I just want to avoid additional complexity for now. Overlapping prefixes
represent the "locked across repos" directories: ones that can be changed
from each of the small repos as part of a push redirector pushrebase.
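The core prefix check behind these rules could be sketched like this (a sketch only: prefixes are modeled as `/`-separated strings, and `prefixes_conflict` is a hypothetical helper name; the validation layer would then permit equality only between large-to-small repos, as described above):

```rust
/// Two path prefixes conflict when one is a component-wise prefix of
/// the other (including when they are identical). "foo" conflicts with
/// "foo/bar", but not with "foobar".
fn prefixes_conflict(a: &str, b: &str) -> bool {
    let a: Vec<&str> = a.split('/').collect();
    let b: Vec<&str> = b.split('/').collect();
    let n = a.len().min(b.len());
    a[..n] == b[..n]
}
```

Comparing path components rather than raw strings matters: a raw `starts_with` would wrongly treat `foobar` as conflicting with `foo`.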
Reviewed By: StanislavGlebik
Differential Revision: D17930877
fbshipit-source-id: 97f0ab8f5975f9635e84716851fe42c8cc8800f5
Summary:
While we have the dedicated `RepositoryId` type, the `RepoConfig` struct used
`i32` for its `repoid` field, so in a lot of places that referenced or used
this config, we continued to use `i32`. This is sad, so let's fix it.
This diff changes `RepoConfig` and everything that depends on it.
Ideally, we would also remove the `.id()` method of the `RepositoryId`
struct, so that our clients have to use the struct itself, but in a few places it
is deserialized into thrift/SQL types, so unless we move those transformations
into the same file (which also seems bad and not future-proof), we cannot get
rid of the method.
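The pattern in question is the classic newtype wrapper (a sketch mirroring `RepositoryId` in spirit; the actual definition and derives live in Mononoke's types crate and may differ):

```rust
/// Wrapping the raw i32 means repo ids can no longer be confused with
/// arbitrary integers at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct RepositoryId(i32);

impl RepositoryId {
    fn new(id: i32) -> Self {
        RepositoryId(id)
    }

    /// The escape hatch the diff would like to remove, kept only for
    /// the thrift/SQL serialization boundary.
    fn id(&self) -> i32 {
        self.0
    }
}
```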
Reviewed By: farnz
Differential Revision: D17928574
fbshipit-source-id: 3d9355272cfcd20af787edd6417cc529be640356
Summary:
Some observations:
- `bundle2_resolver` isn't actually used by anything, except for the `repo_client`
- `pushrebase`, on the other hand, is used by a few things and feels like an independent piece of core logic
- `bundle2_resolver` only uses structs from `pushrebase`, not any actual logic
The proposal then is:
- make `pushrebase` a separate top-level crate
- move `bundle2_resolver` to be a subcrate of `repo_client`
Reviewed By: krallin
Differential Revision: D17908254
fbshipit-source-id: 1f412c5f91cc26393a44a06de2a05268da740a8e
Summary:
This is part of work on the push redirector, which works in this scenario:
1. an engineer tries to push a commit to `fbsource`/`ovrsource` (aka small repo)
2. push redirector makes sure that this push is redirected to `megarepo_test` (aka large repo). This means:
   a. saving every uploaded changeset from the bundle2 into the original repo
   b. syncing all of the uploaded changesets in the target (large) repo
   c. parsing the instructions of what has to be done with the uploaded stuff
   d. transforming these instructions so that they apply to the target repo
   e. running these instructions in the target repo, noticing which changesets and bookmarks they produce
   f. syncing these produced changesets back into the original (small) repo
Steps 2.b, 2.d, 2.e and 2.f require `RepoClient` to know about the target repo and to know how to sync commits back and forth and how to change bookmark names.
The current `RepoSyncTarget` is basically my best guess at what may be needed, but the important bit is that it's threaded through: to add or remove something, it is enough to change some `repo_handlers` code, and that's it.
Reviewed By: StanislavGlebik
Differential Revision: D17900899
fbshipit-source-id: 5bc2b5dac5ecc7dcef855b9c8f91fa8a618fbbad
Summary:
This error reports bookmark hooks that are referenced but do not exist, but it
doesn't tell you which hooks don't exist, which is a bit inconvenient. This
fixes that.
Reviewed By: StanislavGlebik
Differential Revision: D17930267
fbshipit-source-id: 46ba3a2d448db66f3c49770283f22020074501b0
Summary:
This updates the repo client and hook tailer to use the text only store.
This does represent a slight behavior change in hooks for all repos (towards allowing more things) since we no longer run things like conflict marker checks on files that are too big. That seems like a reasonable thing to do, though.
Reviewed By: StanislavGlebik
Differential Revision: D17930266
fbshipit-source-id: 0abdf2382ec6b45558002c6aeed46da0acb840ea
Summary: This adds a new content store that doesn't return file contents for files that are larger than a certain threshold, or contain null bytes (which is how we've historically approximated binary-ness).
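The gating logic could be sketched like this (a sketch under assumptions: `ThresholdStore` and `get_file_text` are illustrative names, and contents are plain byte slices rather than Mononoke's streaming file contents):

```rust
/// A content store wrapper that withholds file contents when the file
/// exceeds a size threshold or contains null bytes (the historical
/// approximation of "binary").
struct ThresholdStore {
    max_size: usize,
}

impl ThresholdStore {
    fn get_file_text<'a>(&self, content: &'a [u8]) -> Option<&'a [u8]> {
        if content.len() > self.max_size || content.contains(&0u8) {
            // Too big or binary: report no text, so text-oriented
            // hooks (e.g. conflict marker checks) skip the file.
            None
        } else {
            Some(content)
        }
    }
}
```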
Reviewed By: StanislavGlebik
Differential Revision: D17930268
fbshipit-source-id: 2736a9bd0713114b8992ac922792e08315f092bd
Summary: This is a minor refactor to clean things up by moving the in-memory content stores to the content-stores crate, instead of having them inline in hooks/lib.rs.
Reviewed By: StanislavGlebik
Differential Revision: D17930269
fbshipit-source-id: dc55b48a19c1b3f4b06f656613933f63063f0c69
Summary:
This diff moves us towards having content stores that may choose to not return contents if those contents don't appear meaningful (e.g. because they are large files, or binary files).
This also updates some lua hooks accordingly, and removes some functions we're not using that wouldn't really work once contents are optional (i.e. `contains_string`).
Reviewed By: StanislavGlebik
Differential Revision: D17930265
fbshipit-source-id: e3de1ae58b3728debbe4568f588a011bf858efe9
Summary:
Revision numbers are deprecated, let's not print them in the rebase output.
The tests were automatically fixed with run-tests.py -i
Reviewed By: quark-zju
Differential Revision: D17936451
fbshipit-source-id: a8f0403b6af4573421ca874e9311f26931eba697
Summary:
It handles only a very specific case of merges (repo imports), i.e. one where no
ancestor of p2 has a parent that is an ancestor of p1. Also, at the moment it
doesn't support a Mover that removes an ancestor commit of p2.
There's no reason not to add support for both of these cases later, though.
Reviewed By: ikostia
Differential Revision: D17899741
fbshipit-source-id: 7d96c9dd13790e1fd8ee96a9d2f486d56e25abf9
Summary:
Debug logging each iteration of the background bookmark cache updater is
distracting. Instead, just log when it starts.
Reviewed By: krallin
Differential Revision: D17928810
fbshipit-source-id: 77cecc8641b145e921acc4fc91c1be14c22576c9
Summary:
Add a oneshot channel to the background bookmarks updater that terminates the
updates when it is triggered. Use this to terminate the background updater
when the warm bookmarks cache is dropped.
Reviewed By: krallin
Differential Revision: D17928811
fbshipit-source-id: 46b55450d803d26f36c12bda30e9b8d9dd561573
Summary: Use `for_each` to iterate over the infinite stream, rather than `collect`.
Reviewed By: krallin
Differential Revision: D17928812
fbshipit-source-id: 21cb196af080e4b9ba87237a2705dfcb94eb5a93
Summary:
Extract the cache of warmed-up bookmarks (those for which derived data has been
computed) from the apiserver code so that we can re-use it.
Reviewed By: krallin
Differential Revision: D17913440
fbshipit-source-id: 97bf980d4c9dc3fff93000e3d30f6ea3b783bd43
Summary: Print the calculated statistics in CSV format, so we can generate a CSV file by redirecting output to a file. As a TODO, we may want to add new fields to the RepoStatistics struct, refactor the code, and create the CSV file using e.g. serde serialization.
Reviewed By: krallin
Differential Revision: D17907650
fbshipit-source-id: 0e7f0af522cc72c067d59431039e44998d5dd354
Summary:
Changes to blobrepo to make it more reusable from graph_walker:
* Add an open method that receives the underlying blobstore. Planning to use this from graph_walker.
* Only wait for myrouter in xdb mode.
Reviewed By: krallin
Differential Revision: D17113943
fbshipit-source-id: 6794c2e1b8ec7ff1ebba2ecdf7c5cc7963ca7b32
Summary: They are always used together, so let's just bundle them.
Reviewed By: krallin
Differential Revision: D17910811
fbshipit-source-id: 21afc43814d9fd513fc02498e7dae1e68606e389
Summary: Removes the dependency of `//common/rust/thrift/runtime:fbthrift` on `//common/rust/bytes-ext`. This is a move toward a self-contained runtime that is easy to open source.
Reviewed By: yfeldblum
Differential Revision: D17891341
fbshipit-source-id: ddc53735c3ecde32e16a10bf98ae24a68aec9d82
Summary: The BufExt trait only exists for Thrift. This change moves it into our fbthrift runtime which already contains a similar BufMutExt. This is a move toward a self-contained runtime that is easy to open source.
Reviewed By: yfeldblum
Differential Revision: D17891338
fbshipit-source-id: 17fe9d672ebb866e47c47bbd7c3b7c3da8d327ef
Summary:
Some backfill types, e.g. fsnodes, require access to the unredacted repo in
order to correctly compute their values. For example, fsnodes require the
content hashes of the redacted blobs to correctly form the fsnode graph.
This kind of access is ok, as we are only deriving summary information
(hashes), and won't reveal the content of the redacted blob.
Add the ability for `backfill_derive_data` to access the unredacted repo,
limited only to whitelisted derived data types.
Reviewed By: ikostia
Differential Revision: D17877191
fbshipit-source-id: 6c9b1dfdfb2e6f431815ddf3de60029fbb180454
Summary: This lets a user of a megarepo look up subrepo hashes
Reviewed By: ikostia
Differential Revision: D17831869
fbshipit-source-id: 16962fd7b09053e4a7be196ccd0f52c700626bdd
Summary:
Follow up from D17765160. Let's unify logview initialization in one place.
This will let us log to logview from all binaries we have in mononoke
Note - the diff touches a lot of files, but main changes are in
cmdlib/src/args.rs, apiserver/src/main.rs and server/src/main.rs.
Reviewed By: krallin
Differential Revision: D17895480
fbshipit-source-id: c922adfb385461ff168bd788e42ea1b88891f7cf
Summary: I'll add a WireprotoLogging struct, so I need to rename this one
Reviewed By: ikostia
Differential Revision: D17878970
fbshipit-source-id: 2966f5cb8c8d2399e1691ef1c710dd9362a976ee
Summary:
This diff updates all license headers to use the new text and style.
Also, a few internal files were missing the header, but now they have it.
`fbcode/common/rust/netstring/` had the internal header, but now it has
GPLV2PLUS, since that code goes to Mononoke's GitHub too.
Differential Revision: D17881539
fbshipit-source-id: b70d2ee41d2019fc7c2fe458627f0f7c01978186
Summary:
Similarly to `Mover`, which acts on paths according to the sync config,
we need a `BookmarkRenamer`, which acts on bookmark names.
The logic now supports two types of bookmark configs:
- common pushrebase bookmarks: they are common for all small repos and they
don't get renamed. `master` is one example of those, and I don't know if
there can be any other :)
- prefixes: when the bookmark is updated in the small repo, an equivalent,
but prefixed bookmark is updated in the large repo. Correspondingly, when
a prefixed bookmark is updated in a large repo, an equivalent, but unprefixed
bookmark is updated in the small repo.
There is also an "error" case:
- the bookmark is neither a common pushrebase bookmark, nor prefixed with the
  correct small repo's prefix. In that case we plan not to update that bookmark,
  although we will still create the corresponding commit (we may have to be
  careful and make it draft instead of public, or something like that).
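The renaming rules above could be sketched as a pair of inverse functions (a sketch only: bookmarks are plain strings here, and the function names are illustrative rather than the actual `BookmarkRenamer` API):

```rust
/// Small-to-large: common pushrebase bookmarks pass through unchanged,
/// everything else gets the small repo's prefix.
fn small_to_large(common: &[&str], prefix: &str, bookmark: &str) -> String {
    if common.contains(&bookmark) {
        bookmark.to_string()
    } else {
        format!("{}/{}", prefix, bookmark)
    }
}

/// Large-to-small: the inverse. Returns None for the "error" case,
/// i.e. a bookmark that is neither common nor correctly prefixed.
fn large_to_small(common: &[&str], prefix: &str, bookmark: &str) -> Option<String> {
    if common.contains(&bookmark) {
        Some(bookmark.to_string())
    } else {
        bookmark
            .strip_prefix(&format!("{}/", prefix))
            .map(|s| s.to_string())
    }
}
```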
Reviewed By: krallin
Differential Revision: D17877749
fbshipit-source-id: 6e2f67456de354c2358376e5762fabffe0de3a42
Summary: Added an option to calculate statistics for old changesets saved in a file. Currently it prints the calculated statistics.
Reviewed By: StanislavGlebik
Differential Revision: D17811305
fbshipit-source-id: 162946941b9e153ffedc1fc28539eebacab77132
Summary:
Now that we don't fire this error from bundle2_resolver, let's also not
create it from there.
Reviewed By: StanislavGlebik
Differential Revision: D17856088
fbshipit-source-id: 4e0ec896dcb4047c01bc9d858927d6e432b43962
Summary:
See bottom diff of the stack for the entire stack's purpose.
The goal of this particular diff is to address step 2:
Things that are only used by `repo_client::unbundle` and not by `bundle2_resolver`, should be moved out of `bundle2_resolver`. While moving such things, it will turn out that some things that `Bundle2Resolver` struct owned (and some things that `bundle2_resolver::resolver::resolve` function accepted as arguments) are only threaded down to whichever post-resolve action is appropriate. Such things should just not be passed into `resolve` in the first place.
In theory, after this diff it is already possible to introduce something in between the `resolve` and `run_post_resolve_action` calls.
Reviewed By: StanislavGlebik
Differential Revision: D17853603
fbshipit-source-id: 0442b478dad82723383896666096eec2aec64e58
Summary:
The goal of this stack is:
- extract the actual "business logic" of `unbundle` wireproto command out of `bundle2_resolver`
- make `bundle2_resolver` only parse the bundle, save uploaded blobs/changesets and return the desired next action with the required arguments
- run the action (`PostResolveAction`) in the `repo_client::RepoClient::unbundle`
This allows us to later insert the push redirection logic into the `unbundle()` fn like this:
1. `bundle2_resolver` resolves the uploaded bundle and saves all the changesets. At this point, instead of dealing with the stream of bytes we deal with uploaded changesets and a full knowledge of what needs to happen next (push, infinitepush, pushrebase or bookmark-only pushrebase)
1. check if push redirection needs to happen. If it does not, just go to the last step.
1. Run commit sync on all of the uploaded changesets, thus creating them in the target repo. Then replace source repo commit ids with target repo commit ids in action arguments
1. Run the action in the target repo
1. Wait until the created changesets are synced into the source repo, generate and record (hg) changeset ids in the source repo
1. Construct a correct reply part
This is a cumbersome refactoring, therefore to make it easier to review (and write and think about) I break it down into multiple steps:
1. (this diff) Introduce the `PostResolveAction` `enum` and make sure `bundle2_resolver::resolve` returns it instead of a created response. Make `repo_client::RepoClient::unbundle` look at this `enum` and run the appropriate action from the newly-created `repo_client::unbundle` module. No business logic change, everything is completely mechanical, just copy-pastes. Because I don't yet move the functions/structs from `bundle2_resolver::resolver`, I need to make some things `pub` that I wouldn't otherwise. The goal is to move them into `unbundle` later and remove `pub` as I go. The goal is also to eventually never leak `BundleResolver` struct out of `bundle2_resolver::resolver` file.
1. Things that are only used by `repo_client::unbundle` and not by `bundle2_resolver`, should be moved out of `bundle2_resolver`. While moving such things, it will turn out that some things that `Bundle2Resolver` struct owned (and some things that `bundle2_resolver::resolver::resolve` function accepted as arguments) are only threaded down to whichever post-resolve action is appropriate. Such things should just not be passed into `resolve` in the
first place.
1. Separate `Push` and `InfinitePush` processing (simultaneously introducing a new variant to `PostResolveAction`)
Reviewed By: StanislavGlebik
Differential Revision: D17831366
fbshipit-source-id: f19f9d7a7abed28d1c6bf519ef81942699b892bc