Summary: There were already helpers to make this code less copy-pasty; this diff just uses them.
Reviewed By: markbt
Differential Revision: D30933408
fbshipit-source-id: acc27a0904425eccfc71fee884a8f2035ed0c37f
Summary:
The `backfill_batch_dangerous` method requires the caller to ensure
that all dependencies of the batch have already been derived; otherwise
errors can occur, such as mappings being written out before the things
they map to.
When the derived data manager takes over batch derivation, it will enforce this
requirement, so that it is no longer dangerous. However, the backfiller tests
were not ensuring the invariant, so the tests will fail with the new derivation
implementation.
Fix the tests by ensuring the parent commits are always derived before a
batch is started. The test is also extended to expose the failure mode
of accidentally deriving batch parents. This will be fixed in the next
commit.
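The invariant described above can be sketched as a precondition check. This is a hypothetical, simplified stand-in (commit ids are plain integers, the parent map and function names are illustrative, and the real code is async and works with bonsai changesets):

```rust
use std::collections::{HashMap, HashSet};

/// Sketch of the precondition of a `backfill_batch_dangerous`-style call:
/// every parent of a batch commit that is not itself in the batch must
/// already be derived. Returns the first underived dependency found.
fn check_batch_dependencies(
    parents: &HashMap<u32, Vec<u32>>,
    derived: &HashSet<u32>,
    batch: &[u32],
) -> Result<(), u32> {
    let in_batch: HashSet<u32> = batch.iter().copied().collect();
    for commit in batch {
        for parent in parents.get(commit).into_iter().flatten() {
            if !in_batch.contains(parent) && !derived.contains(parent) {
                // An out-of-batch parent has not been derived yet:
                // starting the batch now would be unsafe.
                return Err(*parent);
            }
        }
    }
    Ok(())
}
```

Under this sketch, the fixed tests correspond to deriving all out-of-batch parents until the check passes before starting the batch.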
Reviewed By: yancouto
Differential Revision: D30959132
fbshipit-source-id: 8489a5d0b375692a903854294e3810846c9e13de
Summary:
Implement `DerivedUtils` using the `DerivedDataManager`.
This is just for migration. In the future `DerivedUtils` will be replaced by the manager.
Reviewed By: yancouto
Differential Revision: D30944568
fbshipit-source-id: 32376e3b4aeb959e63f66e989a663c21dee30ba5
Summary:
Implement a new version of data derivation in the derived data manager. This is different from the old version in a few ways:
* `derived_data::BonsaiDerivable` is replaced by `derived_data_manager::BonsaiDerivable`. This trait defines both how to perform derivation and how to store and retrieve mapping values. Derivation is performed with reference to the derived data manager, rather than `BlobRepo`.
* The old `Mapping` structs and traits are replaced with a direct implementation in the derived data manager, using the `BonsaiDerivable` trait to handle the derived-data-type-specific parts.
* The new implementation assumes we will stick with parallel derivation, and doesn't implement serial derivation.
Code is copied from the `derived_data` crate, as it is intended to be a replacement once all the derived data types are migrated, and re-using code would create a circular dependency during migration.
This only covers the basic derivation implementation used during production. The derived data manager will also take over backfilling, but that will happen in a later diff.
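The shape of the new trait can be sketched roughly as follows. This is a hypothetical, synchronous stand-in: the real `derived_data_manager::BonsaiDerivable` is async, takes the derivation context rather than a bare map, and the `GenNumber` type here is purely illustrative:

```rust
use std::collections::HashMap;

/// Sketch of the combined trait: derivation and mapping storage live on
/// one trait, replacing the old separate `Mapping` structs and traits.
trait BonsaiDerivable: Sized {
    const NAME: &'static str;
    /// Derive this data type for one changeset, given already-derived parents.
    fn derive(changeset_id: u64, parents: &[Self]) -> Self;
    /// Store the mapping from changeset id to derived value.
    fn store(mapping: &mut HashMap<u64, Self>, cs_id: u64, value: Self);
    /// Fetch a previously stored derived value.
    fn fetch(mapping: &HashMap<u64, Self>, cs_id: u64) -> Option<&Self>;
}

/// Toy derived data type: a generation number (max parent generation + 1).
#[derive(Clone, Debug, PartialEq)]
struct GenNumber(u64);

impl BonsaiDerivable for GenNumber {
    const NAME: &'static str = "gen_number";
    fn derive(_changeset_id: u64, parents: &[Self]) -> Self {
        GenNumber(parents.iter().map(|p| p.0 + 1).max().unwrap_or(0))
    }
    fn store(mapping: &mut HashMap<u64, Self>, cs_id: u64, value: Self) {
        mapping.insert(cs_id, value);
    }
    fn fetch(mapping: &HashMap<u64, Self>, cs_id: u64) -> Option<&Self> {
        mapping.get(&cs_id)
    }
}
```

The manager then only needs the trait to handle the derived-data-type-specific parts, with the generic derivation and mapping logic implemented once.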
Reviewed By: yancouto
Differential Revision: D30805046
fbshipit-source-id: b9660dd957fdf762f621b2cb37fc2eea7bf03074
Summary:
The `find_oldest_underived` method of `DerivedUtils` is used outside tests by
exactly one client (the backfiller in tailing mode). Simplify the
`DerivedUtils` trait by extracting this method from the trait and replacing it
with a more general one that will be easier to implement in terms of the
derived data manager.
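The idea can be sketched like this. The types and function names below are illustrative stand-ins (changeset ids as integers, "oldest" approximated by smallest generation number), not the real `DerivedUtils` signatures:

```rust
use std::collections::{HashMap, HashSet};

/// Sketch of the more general method: return all underived changesets
/// among the candidates, rather than special-casing "oldest".
fn find_underived(derived: &HashSet<u64>, candidates: &[u64]) -> Vec<u64> {
    candidates
        .iter()
        .copied()
        .filter(|cs| !derived.contains(cs))
        .collect()
}

/// The single caller (the backfiller in tailing mode) can then compute
/// the oldest underived commit itself, e.g. by generation number.
fn find_oldest_underived(
    derived: &HashSet<u64>,
    candidates: &[u64],
    generation: &HashMap<u64, u64>,
) -> Option<u64> {
    find_underived(derived, candidates)
        .into_iter()
        .min_by_key(|cs| generation.get(cs).copied().unwrap_or(u64::MAX))
}
```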
Reviewed By: yancouto
Differential Revision: D30944567
fbshipit-source-id: a1d408e091d145297241a5eebc02a87155bc3765
Summary:
Split the `BonsaiDerived` type in two:
* `BonsaiDerived` is now just the interface which is used by callers
who want to derive some derived data type. It will be implemented by
both old and new derivation.
* `BonsaiDerivedOld` is the interface that old derivation uses to
determine the default mapping for derivation. This will not be
implemented by new derivation, and will be removed once migration is
complete.
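A minimal sketch of the split, with illustrative names and signatures (the real traits are async and the `RootUnode` type here is just a stand-in):

```rust
/// Sketch: the caller-facing interface. Both old and new derivation
/// will implement this.
trait BonsaiDerived: Sized {
    fn derive(cs_id: u64) -> Self;
}

/// Sketch: the legacy-only extension that pins down the default mapping.
/// Removed once migration to the derived data manager is complete.
trait BonsaiDerivedOld: BonsaiDerived {
    type DefaultMapping: Default;
    fn default_mapping() -> Self::DefaultMapping {
        Self::DefaultMapping::default()
    }
}

/// Illustrative derived data type.
struct RootUnode(u64);

impl BonsaiDerived for RootUnode {
    fn derive(cs_id: u64) -> Self {
        RootUnode(cs_id) // stand-in derivation
    }
}

impl BonsaiDerivedOld for RootUnode {
    type DefaultMapping = std::collections::HashMap<u64, u64>;
}
```

Callers that only need `BonsaiDerived` keep compiling unchanged when the old trait goes away.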
Reviewed By: yancouto
Differential Revision: D30944566
fbshipit-source-id: 5d30a44da22bcf290ed3123844eb712c7b37dea4
Summary:
The builder pattern turned out to be unnecessary, as mappings don't need to be
stored in the manager after all.
Reviewed By: StanislavGlebik
Differential Revision: D30944565
fbshipit-source-id: 4300cdcc871c89f98e42d5b47600ac640b4b94eb
Summary:
Make the derivation process for mercurial filenodes not depend on `BlobRepo`.
Instead, use the repo attributes (`RepoBlobstore` and `Filenodes`) directly.
This will allow us to migrate to using `DerivedDataManager` in preparation
for removing `BlobRepo` from derivation entirely.
The existing use of `changesets` for determining the commit's parents is
changed to use the parents from the bonsai changeset. For normal derivation,
the bonsai changeset is already loaded, so this saves a database round-trip.
For batch derivation we currently need to load the changeset, but it should
be in cache anyway, as other derived data types will also have loaded it.
We still need to keep a `BlobRepo` reference at the moment. This is because
filenodes depend on the mercurial derived data. The recursive derivation is
hidden in the call to `repo.get_hg_from_bonsai_changeset`. When derivation
is migrated to the derived data manager, we can replace this with a direct
derivation.
Reviewed By: StanislavGlebik
Differential Revision: D30765254
fbshipit-source-id: 20cc17c2eb611544869e5f1c15d858663cd60fd1
Summary:
Let's give them more descriptive names so that it's easier to understand
what's going on.
Reviewed By: markbt
Differential Revision: D31022612
fbshipit-source-id: 8e4f516f3d0b1cd661b1a8fceba80a8f85a2ed4f
Summary:
This adds a new option to `split_batch_in_linear_stacks` that controls whether
file changes are aggregated from all ancestors in the stack or not. Currently
all of our callsites want Aggregate, but in the next diff we'll add a new
callsite that doesn't.
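The two behaviours can be sketched as follows. This is a hypothetical stand-in: the enum and function names are illustrative, not the real `split_batch_in_linear_stacks` API:

```rust
/// Sketch of the new option for a linear stack of commits.
#[derive(Clone, Copy)]
enum FileChangesMode {
    /// Each stack item carries the file changes of all its stack ancestors.
    Aggregate,
    /// Each stack item carries only its own file changes.
    PerCommit,
}

/// Compute the per-commit file-change view of a linear stack,
/// where each commit is (id, files changed in that commit).
fn stack_file_changes(
    commits: &[(u64, Vec<&'static str>)],
    mode: FileChangesMode,
) -> Vec<(u64, Vec<&'static str>)> {
    let mut seen: Vec<&'static str> = Vec::new();
    commits
        .iter()
        .map(|(id, files)| match mode {
            FileChangesMode::PerCommit => (*id, files.clone()),
            FileChangesMode::Aggregate => {
                // Accumulate ancestor changes as we walk down the stack.
                for f in files {
                    if !seen.contains(f) {
                        seen.push(f);
                    }
                }
                (*id, seen.clone())
            }
        })
        .collect()
}
```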
Reviewed By: markbt
Differential Revision: D31022444
fbshipit-source-id: ce0613863855163f26ab18c7f35142ae569eb31a
Summary:
This relies on local changes to make `cargo metadata` actually find this
binary: https://github.com/tokio-rs/console/pull/146 is where I'm trying to
upstream it.
Reviewed By: jsgf
Differential Revision: D30944630
fbshipit-source-id: 5d34a32a042f83eff7e7ae7445e23badf10fffe3
Summary: For the time being we don't have checksums in saved states. As a temporary workaround, add the ability to derive the checksum from the naming table.
Differential Revision: D30967637
fbshipit-source-id: 4ac34d988d08c9af6f08f7ce46206f756cf1cf0c
Summary: Without this bit of information we can't tell where the sync came from (i.e. from which of two repos) so we can't reliably find a commit "source" for a landed commit.
Reviewed By: StanislavGlebik
Differential Revision: D30902774
fbshipit-source-id: d85d0d028fbd6bfb2d64bce89bc7934bad2e242b
Summary:
This is a very basic command that uses debug-printing to display all the
request details. In the future we might want to make it more elaborate, but
as it is, it works.
Reviewed By: StanislavGlebik
Differential Revision: D30965076
fbshipit-source-id: 561c64597b94359843e575550be0ae6f39fad7bf
Summary:
This debug command will allow the user to see and interact with currently
running async requests.
Reviewed By: StanislavGlebik
Differential Revision: D30965077
fbshipit-source-id: 259f1af0eb6ade4a34f6004c7b1ad63cd5f0bc9f
Summary:
It makes it a bit hard to do experiments and compare derivation results.
It's easy to compare these types, so let's do it.
Reviewed By: mitrandir77
Differential Revision: D31017823
fbshipit-source-id: 6173bba53c7ee254198e023dde57564fe9c3efed
Summary:
This will be used in the next diffs to add batch derivation for unodes.
It also makes it symmetrical to `create_manifest_unode`.
Reviewed By: mitrandir77
Differential Revision: D31015719
fbshipit-source-id: 65e12901c6a004375c7c0e3b07f1632ac9c6eaa8
Summary:
In some cases (e.g. when master bookmark moves backwards) there might be
commits in segmented changelog that are not ancestors of master. When reseeding
we still want to build segments for these changesets, and this is what this
diff does (see D30898955 for more details about why we want to build segments
for these changesets).
Reviewed By: quark-zju
Differential Revision: D30996484
fbshipit-source-id: 864aaaacfc04d6169afd3d04ebcb6096ae2514e5
Summary:
Deleting a non-existent file is fine, and deleting a file when a directory
with the same name exists just ignores the delete.
This diff adds tests to cover these cases. Overall it seems like a bug, but I'm
not sure it's worth fixing - who knows if we have bonsai changesets that rely on
that!
Reviewed By: yancouto
Differential Revision: D30990826
fbshipit-source-id: b04992817469abe2fa82056c4fddac3689559855
Summary:
This method allows appending a value instead of just replacing it.
It will be used in the next diff when we derive manifests for a stack of commits
in one go.
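The difference can be sketched with a toy map of manifest entries. This is an illustrative stand-in (paths as strings, values as integers), not the real manifest types:

```rust
use std::collections::HashMap;

/// Sketch of "append": a later commit's value for a path is added on
/// top of earlier ones, which matters when deriving manifests for a
/// whole stack of commits in one go.
fn append_entry(entries: &mut HashMap<String, Vec<u64>>, path: &str, value: u64) {
    entries.entry(path.to_string()).or_default().push(value);
}

/// Sketch of the old "replace" behaviour: the previous value is lost.
fn replace_entry(entries: &mut HashMap<String, Vec<u64>>, path: &str, value: u64) {
    entries.insert(path.to_string(), vec![value]);
}
```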
Reviewed By: yancouto
Differential Revision: D30989889
fbshipit-source-id: dd9a574609b4d289c01d6eebcc6f5c76a973a96b
Summary:
Changes:
- Limit simultaneous open git-repo objects to the number of CPUs.
- Add a semaphore so that we wait in the tokio::task domain instead of the tokio::blocking domain (the latter is more expensive and has a hard upper limit).
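The limiting idea can be sketched with a token channel acting as a semaphore. This is a hypothetical, thread-based stand-in: the real change uses an async semaphore so waiting happens on the tokio task side, and all names here are illustrative:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Sketch: a channel pre-filled with `max_open` tokens bounds how many
/// stand-in "git repo" handles are open at the same time.
fn process_with_limit(items: Vec<u32>, max_open: usize) -> u32 {
    let (token_tx, token_rx) = mpsc::channel();
    for _ in 0..max_open {
        token_tx.send(()).unwrap(); // one token per allowed open repo
    }
    let token_rx = Arc::new(Mutex::new(token_rx));
    let handles: Vec<_> = items
        .into_iter()
        .map(|item| {
            let token_rx = Arc::clone(&token_rx);
            let token_tx = token_tx.clone();
            thread::spawn(move || {
                token_rx.lock().unwrap().recv().unwrap(); // acquire a token
                let processed = item * 2; // stand-in for opening the repo
                token_tx.send(()).unwrap(); // release the token
                processed
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

In the real code, `max_open` would be the CPU count, e.g. from `num_cpus`.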
Reviewed By: mitrandir77
Differential Revision: D30976034
fbshipit-source-id: 3432983b5650bac6aa5178d98d8fd241398aa682
Summary:
This allows the mononoke_api user to choose whether the skiplists
should be used to speed up the ancestry checks or not.
The skiplists crate is already prepared for the situation where skiplist
entries are missing, and falls back to traversing the graph in that case.
Reviewed By: yancouto
Differential Revision: D30958909
fbshipit-source-id: 7773487b78ac6641fa2a427c55f679b49f99ac8d
Summary:
Allow the mononoke_api user to choose whether they want
operations to be sped up using WBC or not.
Reviewed By: yancouto
Differential Revision: D30958908
fbshipit-source-id: 038cf77735e7c655f6801d714762e316b6817df5
Summary:
Some crates like mononoke_api depend on the warm bookmark cache to speed up
bookmark operations. This prevents them from being used in cases requiring
quick, low-overhead startup, like CLIs.
This diff makes it possible to swap out the warm bookmark cache for an
implementation that doesn't cache anything. (See the next diffs for how it's
used in the mononoke_api crate.)
Reviewed By: yancouto
Differential Revision: D30958910
fbshipit-source-id: 4d09367217a66f59539b566e48c8d271b8cc8c8e
Summary:
This method was added before the more generic list method was added. Let's get
rid of it for simplicity and to discourage listing all the bookmarks.
Reviewed By: yancouto
Differential Revision: D30958911
fbshipit-source-id: f4518da3f34591c313657161f69af96d15482e6c
Summary:
0.4.24 is incompatible with crates that use `deny(warnings)` on a compiler 1.55.0 or newer.
Example error:
```
error: unused borrow that must be used
--> common/rust/shed/futures_ext/src/stream/return_remainder.rs:22:1
|
22 | #[pin_project]
| ^^^^^^^^^^^^^^
|
= note: this error originates in the derive macro `::pin_project::__private::__PinProjectInternalDerive` (in Nightly builds, run with -Z macro-backtrace for more info)
```
The release notes for 0.4.28 call out this issue. https://github.com/taiki-e/pin-project/releases/tag/v0.4.28
Reviewed By: krallin
Differential Revision: D30858380
fbshipit-source-id: 98e98bcb5a6b795b93ed1efd706a1711f15c57db
Summary:
There's no real equivalent of an hg changeset for a snapshot, so let's not derive it.
Closes task T97939172
Reviewed By: liubov-dmitrieva
Differential Revision: D30902073
fbshipit-source-id: 8128597c25e12e40e719cdd7800d4b9b792391c9
Summary: We use it as a unique key for the detector.
Reviewed By: ginfung
Differential Revision: D30703470
fbshipit-source-id: cb8e7dae5dc4192402530b2cfe564b86aa23c7c8
Summary:
EdenAPI lookup (for file content, filenodes and trees): check all the multiplexed blobstores when we check is_present.
This will help us avoid undesired behaviour for commit cloud blobs that haven't been replicated to all blobstores. The healer currently doesn't check commit cloud blobs.
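The check can be sketched as requiring presence in every inner blobstore. This is an illustrative stand-in (blobstores as string sets), not the real multiplex blobstore API:

```rust
use std::collections::HashSet;

/// Sketch: during a lookup, only report a blob as present when every
/// inner blobstore of the multiplex has it, since commit cloud blobs may
/// not have been replicated everywhere yet and the healer skips them.
fn is_present_in_all(blobstores: &[HashSet<String>], key: &str) -> bool {
    blobstores.iter().all(|store| store.contains(key))
}
```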
Reviewed By: StanislavGlebik
Differential Revision: D30839608
fbshipit-source-id: d13cd4500f7b14731d8b75c763c14a698399ba02
Summary:
Make it more detailed, especially about corner cases. Avoid ambiguous words
like "valid" etc.
Reviewed By: farnz
Differential Revision: D30876339
fbshipit-source-id: a45ca643c6454645f7729053a7ea5dd78016fc68
Summary:
Some time ago (see D25910464 (fca761e153)) we started using the Background session class
while deriving data. This was done to avoid overloading the blobstore sync queue: if the Background
session class is set, then the multiplex blobstore waits for all blobstores to
finish instead of writing to the blobstore sync queue right away. However, if any of the
blobstores fails, then we start writing to the blobstore sync queue. In theory this should have avoided the problem of overloading the blobstore sync queue while keeping the same multiplex reliability (i.e. if only a single blobstore fails, the whole multiplex put doesn't fail).
Unfortunately there was a flaw: if the put to a single blobstore wasn't
failing but was just too slow, then the whole multiplexed put operation became
too slow as well. This diff fixes the flaw by adding a timeout: if the
multiplexed put is taking too long, then we fall back to writing entries to the
blobstore sync queue.
Note that I added a new session class - BackgroundUnlessTooSlow -
because in some cases we are ok with waiting a long time but not with writing
to the sync queue. The skiplist builder might be a good example: since it does
overwrites, we don't want to write to the blobstore sync queue at all, because
the healer doesn't process overwrites correctly.
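The timeout-then-fallback behaviour can be sketched as follows. This is a hypothetical, thread-based stand-in: the real implementation is async inside the multiplex blobstore, and the names, durations, and return values here are illustrative:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Sketch of BackgroundUnlessTooSlow: wait for the multiplexed put, but
/// only up to a deadline; if the put is still running when the deadline
/// expires, record the write in the sync queue for the healer instead.
fn put_with_deadline(
    put_duration: Duration,
    deadline: Duration,
    sync_queue: &mut Vec<&'static str>,
    key: &'static str,
) -> &'static str {
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(put_duration); // stand-in for the multiplexed put
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(deadline) {
        Ok(()) => "waited-for-all-blobstores",
        Err(_) => {
            sync_queue.push(key); // too slow: let the healer reconcile later
            "wrote-to-sync-queue"
        }
    }
}
```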
Reviewed By: farnz
Differential Revision: D30892377
fbshipit-source-id: 69ac1795002b124e11daac13d8bfe59895191168
Summary:
I added logging in D30805504 (d5e2624fbb), however it wasn't really logging anything,
because I forgot to pass scuba sample builder to CoreContext (facepalm).
This diff fixes it.
Reviewed By: HarveyHunt
Differential Revision: D30899642
fbshipit-source-id: 6e20f1e84fc96175be8ca7a6f91c0fc61caf8e49
Summary:
It looks like the comment is misleading (we don't really derive anything in
this block, we just find underived commits), and this CoreContext override
doesn't seem necessary anymore. Let's remove it.
Reviewed By: farnz
Differential Revision: D30899641
fbshipit-source-id: 2850905891a9bd8b01f3f6fa9ef15c572fc2f07a
Summary:
Add an endpoint to provide repo configuration information, such as whether
segmented changelog is supported by the server or not. This helps the client
make decisions without hitting actual (expensive) endpoints and without having
to distinguish real failures from unrelated server errors. It would allow us to
remove the error-prone client-side config deciding whether to use segmented clone.
Reviewed By: krallin
Differential Revision: D30831346
fbshipit-source-id: 872e20a32879e075c75481f622b2a49000059d04
Summary:
In a future diff, we want an endpoint to test if segmented changelog is
supported for a repo without doing any real computation using segmented
changelog. This would be useful for the client to decide whether it can
use segmented changelog clone or not, instead of relying on fragile
per-repo configuration.
Reviewed By: farnz
Differential Revision: D30825920
fbshipit-source-id: 16dc5bf762da2d2b9cd808c129e1830285023f3d
Summary:
It's nice to have these functions to open the source and target repos.
Previously we always had to get the repo id first, and then call
`open_repo_internal_with_repo_id`.
Reviewed By: yancouto
Differential Revision: D30866314
fbshipit-source-id: dd74822da755de232f4701f8523088e0bb612cb9
Summary:
D30829928 (fd03bff2e2) updated some of Mononoke's integration tests to take into
account whitespace changes. However, it removed the globs from some parts of
the tests.
As the assigned port changes on each test run, the globs are required. Add them
back in again, as well as fix up some whitespace in a test.
Reviewed By: markbt
Differential Revision: D30866884
fbshipit-source-id: 1557eee2143a2459a6412b8649e7e3dce5a607a4
Summary:
It's nice to have a command that can quickly count and print stats about a commit. I'm
using it now to understand performance of derived data.
Reviewed By: ahornby
Differential Revision: D30865267
fbshipit-source-id: 26b91c3c05a1c417015b5be228796589348bf064
Summary:
`rust_include_srcs` is supported on `thrift_library` as a way of including other Rust code in the generated crate, generally used to implement other traits on the generated types.
Add support for this in autocargo by copying these files into the output dir and making sure the corresponding option is passed to the thrift compiler.
Reviewed By: ahornby
Differential Revision: D30789835
fbshipit-source-id: 325cb59fdf85324dccfff20a559802c11816769f
Summary:
Add impls for Layer for Box/Arc<L: Layer> and <dyn Layer>. Also a pile of other
updates in git which haven't been published to crates.io yet, including proper
level filtering of trace events being fed into log.
Reviewed By: dtolnay
Differential Revision: D30829927
fbshipit-source-id: c01c9369222df2af663e8f8bf59ea78ee12f7866
Summary:
Bump all the versions on crates.io to highest to make migration to github
versions in next diff work.
Reviewed By: dtolnay
Differential Revision: D30829928
fbshipit-source-id: 09567c26f275b3b1806bf8fd05417e91f04ba2ef
Summary:
We don't need to pass the bubble id to the server, it can find it from the changeset id.
This fixes a TODO I added previously, and should make the `restore` command complete.
Reviewed By: ahornby
Differential Revision: D30609423
fbshipit-source-id: d1c8eb0e0556069fa408520a0aea91a0f865fbe1
Summary:
This diff adds an endpoint `/download/file` that allows downloading a file given an upload token.
This will be used for snapshots, as we need to download the snapshot changes, and there's no way to do that right now.
Other options, and why I didn't do them:
- Using the existing `/files` endpoint: not possible, as it needs hg filenodes and we don't have those.
- Returning the file contents in the fetch_snapshot response: might make the response too big.
- Returning just a single Bytes instead of a stream: I thought streaming would be preferred, and more future-proof. In the stack I still put everything in memory in the client, but maybe in the future it should be possible to stream it directly to the file. I'm happy to remove it if preferred, though.
Reviewed By: StanislavGlebik
Differential Revision: D30582411
fbshipit-source-id: f9423bc42867402d380e831bc45d3ce3b3825434