Commit Graph

246 Commits

Author SHA1 Message Date
Alex Hornby
e3f3849d0b mononoke: add enum representing the types the walker interns
Summary: Add a new enum representing the types the walker interns so that we can optionally clear them between chunks. This change adds the enum and command line parsing, actual clearing follows in the next diff.

Reviewed By: krallin

Differential Revision: D25910622

fbshipit-source-id: 0226b4009bf8199498e21e52f734a9529ee7afaa
2021-01-27 01:15:24 -08:00
Daniel Xu
5715e58fce Add version specification to internal dependencies
Summary:
Lots of generated code in this diff. Only code change was in
`common/rust/cargo_from_buck/lib/cargo_generator.py`.

Path/git-only dependencies (i.e. `mydep = { path = "../foo/bar" }`) are not
publishable to crates.io. However, we are allowed to specify both a path/git
_and_ a version. When building locally, the path/git is chosen. When publishing,
the version on crates.io is chosen.

See https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations .

Note that I understand that not all autocargo projects are published on crates.io (yet).
The point of this diff is to allow projects to slowly start getting uploaded.
The end goal is autocargo generated `Cargo.toml`s that can be `cargo publish`ed
without further modification.

Reviewed By: lukaspiatkowski

Differential Revision: D26028982

fbshipit-source-id: f7b4c9d4f4dd004727202bd98ab10e201a21e88c
2021-01-25 22:10:24 -08:00
Alex Hornby
217c07c65a mononoke: introduce tail params to walker
Summary:
We are going to pass more params into tail.rs as part of being able to clear state between chunks.

This prepares by creating a struct for them and adding command line args for include/exclude node types to clear.

Reviewed By: krallin

Differential Revision: D25910615

fbshipit-source-id: 610a884c17da7af1e23cfa81d4f495fe03bad9a3
2021-01-25 09:11:07 -08:00
Thomas Orozco
4dd3461824 third-party/rust: update Tokio 0.2.x to 0.2.24 & futures 1.x to 1.30
Summary:
When we tried to update to Tokio 0.2.14, we hit lots of hangs. Those were due
to incompatibilities between Tokio 0.2.14 and Futures 1.29. We fixed some of
the bugs (and others had been fixed and were pending a release), and Futures
1.30 has now been released, which unblocks our update.

This diff updates Tokio accordingly (the previous diff in the stack fixes an
incompatibility).

The underlying motivation here is to ease the transition to Tokio 1.0.
Ultimately we'll be pulling in those changes one way or another, so let's
get started on this incremental first step.

Reviewed By: farnz

Differential Revision: D25952428

fbshipit-source-id: b753195a1ffb404e0b0975eb7002d6d67ba100c2
2021-01-25 08:06:55 -08:00
Alex Hornby
5285dd1b3c mononoke: move enum ScrubAction to multiplexedblob
Summary: This doesn't need to be in metaconfig anymore, so move it to multiplexedblob.

Reviewed By: krallin

Differential Revision: D25928061

fbshipit-source-id: 8aa6ce6aafa16f84730cf388ebf7eab6d5bf2c53
2021-01-22 06:04:54 -08:00
Alex Hornby
80fe031070 mononoke: walk hg filenodes in chunks
Summary:
The test added in previous diff showed that hg filenodes weren't being deferred between chunks in the expected way.

This is because we can't tell if a hg filenode is in a given chunk until it is loaded. This is similar to unodes, but the linked changeset in this case is HgChangesetId rather than the bonsai ChangesetId. This change therefore introduces an hg_to_bcs mapping in the walker state, which is used to look up whether the filenode's linked HgChangesetId is in the chunk; if it is not, the edge is deferred.

Reviewed By: krallin

Differential Revision: D25742276

fbshipit-source-id: 1f92452d012aab5b9fdf29f43fc05ebc043b2c7a
2021-01-22 03:13:30 -08:00
Alex Hornby
2d7459e61b mononoke: walk hg in chunks
Summary:
Add test for walking hg non-filenode data in chunks. Expect some deferred edges to the next chunk, as parents point into history.

Done by deferring hg_changeset_via_bonsai_step if the bonsai is outside the range of the chunk.

Reviewed By: krallin

Differential Revision: D25742288

fbshipit-source-id: 385c9261151d10f7a7029f86ec10470226fc993c
2021-01-22 03:13:30 -08:00
Alex Hornby
a6adb65083 mononoke: walk deleted file manifest in chunks
Summary:
Add test to walk deleted manifests in chunks. No deferring is expected, as these manifests don't point into history.

The test showed that chunking was missing handling for this manifest type, so this fixes it.

Reviewed By: krallin

Differential Revision: D25742285

fbshipit-source-id: 5411f904510f9b4fd9028c7d0dde6c652a784796
2021-01-22 03:13:29 -08:00
Alex Hornby
929522eeae mononoke: walk unode in deferred chunks
Summary:
For Unodes we can't determine before loading them whether they fall within the current chunk, as the linked changeset id value is not visible until the step is executed.

This change adds the ability to defer an already executing step, and uses it for unodes to defer if the linked changeset is not in the chunk being processed.

Deferred edges are stored in the walker state, and are checked on each chunk so that any deferred edges can be run.
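
The deferral described above can be sketched roughly as follows. This is an illustration only, not the walker's code: `StepOutput`, `unode_step`, `WalkState`, the `u64` changeset ids and the half-open `(start, end)` chunk bounds are all hypothetical names and simplifications chosen for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical result of executing one walk step.
#[derive(Debug, PartialEq)]
pub enum StepOutput {
    /// The node was loaded and its linked changeset is in the current chunk.
    Done(String),
    /// The linked changeset (only known after loading) is outside the chunk.
    Deferred(u64),
}

/// Decide, after "loading" a unode, whether it belongs to the chunk [start, end).
pub fn unode_step(node: &str, linked_cs: u64, chunk: (u64, u64)) -> StepOutput {
    if chunk.0 <= linked_cs && linked_cs < chunk.1 {
        StepOutput::Done(node.to_string())
    } else {
        StepOutput::Deferred(linked_cs)
    }
}

/// Hypothetical walker state holding deferred nodes keyed by linked changeset id.
#[derive(Default)]
pub struct WalkState {
    deferred: BTreeMap<u64, Vec<String>>,
}

impl WalkState {
    /// Remember a node whose linked changeset is outside the current chunk.
    pub fn defer(&mut self, linked_cs: u64, node: &str) {
        self.deferred
            .entry(linked_cs)
            .or_default()
            .push(node.to_string());
    }

    /// Remove and return the deferred nodes that fall inside `chunk`.
    pub fn take_deferred(&mut self, chunk: (u64, u64)) -> Vec<String> {
        let keys: Vec<u64> = self
            .deferred
            .range(chunk.0..chunk.1)
            .map(|(k, _)| *k)
            .collect();
        let mut out = Vec::new();
        for k in keys {
            if let Some(nodes) = self.deferred.remove(&k) {
                out.extend(nodes);
            }
        }
        out
    }
}
```

In this sketch, a caller would check `take_deferred` at the start of each chunk and run whatever it returns alongside that chunk's own roots.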

Reviewed By: StanislavGlebik

Differential Revision: D25742280

fbshipit-source-id: 8a0e7d96b8bf10889bf5e83fe4bee829a1a5cb4c
2021-01-22 03:13:28 -08:00
Alex Hornby
9e9975dd8a mononoke: add enum for walker step output
Summary: Add an enum for the walker step output in preparation for adding a Deferred variant to it in next diff

Reviewed By: StanislavGlebik

Differential Revision: D25742293

fbshipit-source-id: 6aabacb1cd39d16f4d36998908048fd2a10eba4d
2021-01-22 03:13:28 -08:00
Alex Hornby
6c8199cea6 mononoke: walk changeset_info in chunks
Summary: Allow scrubbing of ChangesetInfo in chunks of public commit ids

Reviewed By: markbt

Differential Revision: D25742286

fbshipit-source-id: a5e2faed16eb60c5b7054261a74595a945e68c15
2021-01-22 03:13:28 -08:00
Alex Hornby
cc3eb431b6 mononoke: walk repos by public changeset chunks
Summary:
For large repos it is desirable to walk them in chunks, as a prerequisite for being able to clear state caches between chunks to reduce memory usage, and to checkpoint between chunks so that an interrupted scrub can resume.

Chunks are fetched from the repo bounds of changeset id in newest-first order, which means that we scrub the newest data first. Any edges discovered from the walk that point outside the chunk are deferred until the later chunk that covers them.

This change adds chunking and tests it for core bonsai data; following diffs add it for other types.
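
As a rough illustration of the chunking scheme described above (not the walker's actual implementation), the sketch below splits an id range into newest-first chunks and separates edges that stay inside a chunk from those that must be deferred. The `Edge` type, the function names, plain `u64` ids and half-open `(start, end)` bounds are assumptions made for the example.

```rust
/// Hypothetical edge: walking changeset `from` discovered a reference to `to`.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: u64,
    pub to: u64,
}

/// Split the id range [lower, upper) into chunks of at most `chunk_size` ids,
/// highest ids first. Each chunk is a half-open (start, end) pair.
pub fn newest_first_chunks(lower: u64, upper: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut chunks = Vec::new();
    let mut end = upper;
    while end > lower {
        // Clamp the start so the final chunk never goes below `lower`.
        let start = end.saturating_sub(chunk_size).max(lower);
        chunks.push((start, end));
        end = start;
    }
    chunks
}

/// Partition edges into (walk now, deferred), based on whether the target
/// id falls inside the current chunk.
pub fn partition_edges(chunk: (u64, u64), edges: Vec<Edge>) -> (Vec<Edge>, Vec<Edge>) {
    edges
        .into_iter()
        .partition(|e| chunk.0 <= e.to && e.to < chunk.1)
}
```

Because chunks are produced newest first and parents point back into history, a deferred edge in this model always targets a chunk that has not been processed yet.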

Reviewed By: StanislavGlebik

Differential Revision: D25742295

fbshipit-source-id: b989abdf2ca367cf9b10f45d9f932eba55ee6dae
2021-01-22 03:13:27 -08:00
Alex Hornby
8490ead952 mononoke: add options to walk repos by public changeset chunks
Summary: New command line args to allow scrubbing a repo in chunks of N changesets.  Used in a later diff.

Reviewed By: StanislavGlebik

Differential Revision: D25742282

fbshipit-source-id: 4bcf74d26f8c2863c6e96f25eca69e01f9c2c0d5
2021-01-22 03:13:27 -08:00
Alex Hornby
e5307b654d mononoke: include root count in walker queue stats
Summary:
The main thing this change does is make sure pending roots to visit are represented in the difference between Walked and Children.  Children is the sum of all child nodes discovered, both visited and unvisited.  Walked is a measure of number of nodes visited.  Children-Walked is used as a measure of queue depth remaining to be processed.

When not chunking this is a minor issue, as usually just one bookmark root node is not counted in Children. When chunking, however, not counting the roots means the chunk of several hundred thousand roots is not visible as waiting to be processed.
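
The Children minus Walked arithmetic can be illustrated with a small sketch. `QueueStats` and its methods are hypothetical names for this example; the only point is that seeding Children with the root count makes pending roots show up in the queue depth.

```rust
/// Hypothetical counters mirroring the Walked and Children stats.
#[derive(Debug, Default, Clone, Copy)]
pub struct QueueStats {
    /// Number of nodes visited so far.
    pub walked: u64,
    /// Number of nodes discovered so far, visited or not.
    pub children: u64,
}

impl QueueStats {
    /// Count the roots as discovered nodes up front.
    pub fn with_roots(root_count: u64) -> Self {
        Self {
            walked: 0,
            children: root_count,
        }
    }

    /// Record one visited node and the children it revealed.
    pub fn record_visit(&mut self, discovered_children: u64) {
        self.walked += 1;
        self.children += discovered_children;
    }

    /// Children - Walked: nodes still waiting to be processed.
    pub fn queue_depth(&self) -> u64 {
        self.children.saturating_sub(self.walked)
    }
}
```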

Reviewed By: StanislavGlebik

Differential Revision: D25852526

fbshipit-source-id: df5f21a37be152f0baee40d33fd7dfb7aaa763de
2021-01-22 03:13:27 -08:00
Alex Hornby
4c99a8e49a mononoke: fix divide by zero if walk is really fast to complete
Summary: If progress is logged less than one millisecond apart it gives a divide by zero.  Fix it.
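
The commit text does not say how the fix was made. One common way to avoid this kind of division by zero is to clamp the elapsed time to at least one millisecond, as in the sketch below; `per_second` is a hypothetical helper, not a function from the walker.

```rust
use std::time::Duration;

/// Rate per second over `elapsed`, computed at millisecond resolution.
/// Elapsed times under 1ms are treated as 1ms so the divisor is never zero.
pub fn per_second(delta_count: u64, elapsed: Duration) -> u64 {
    let millis = elapsed.as_millis().max(1);
    ((delta_count as u128) * 1000 / millis) as u64
}
```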

Reviewed By: farnz

Differential Revision: D25997768

fbshipit-source-id: 65dcba2dc7a789540a8e4fce6aeca0ee9668895d
2021-01-21 10:01:00 -08:00
Alex Hornby
94010f8539 mononoke: remove BlobConfig::Scrub variant
Summary:
Scrub was being enabled by mutating the BlobConfig into the BlobConfig::Scrub.

This change removes the BlobConfig::Scrub variant, as it's not present in the thrift config any more, so there is no need for it to exist. Scrubbing is an optional mode of multiplexed store construction, so instead add ScrubOptions to BlobstoreOptions.

Reviewed By: StanislavGlebik

Differential Revision: D25927253

fbshipit-source-id: 29fceae59be8f7068300a7b8b298c6edbe4da04b
2021-01-20 02:31:36 -08:00
Radu Szasz
5fb5d23ec8 Make tokio-0.2 include test-util feature
Summary:
This feature is useful for testing time-dependent stuff (e.g. it
allows you to stop/forward time). It's already included in the buck build.

Reviewed By: SkyterX

Differential Revision: D25946732

fbshipit-source-id: 5e7b69967a45e6deaddaac34ba78b42d2f2ad90e
2021-01-18 10:38:08 -08:00
Alex Hornby
bd4c512f2c mononoke: fix walker stats delta per s in last log of run
Summary:
Include changes since the last progress update (if any) in the final delta and total time logged.

When not chunking this was a minor inaccuracy, but with small chunks many of each chunk's stats were missing: if a chunk took less than 5 seconds, there might not have been any progress update before this end state.

Reviewed By: StanislavGlebik

Differential Revision: D25852274

fbshipit-source-id: 6cea76e3abd37908475052947794eed442a1ac82
2021-01-15 03:13:28 -08:00
Alex Hornby
8542996b64 mononoke: include root types in walker reachable_graph_elements
Summary:
When the root type is e.g. Changeset, allow edges from ChangesetToX.

This is important for chunking. Doing it independently to check that it doesn't break non-chunked behaviour.

Reviewed By: krallin

Differential Revision: D25742294

fbshipit-source-id: ea4c989e9f61b30094d0fd83e543fe14a38254fd
2021-01-15 03:13:28 -08:00
Alex Hornby
1d1c2a293f mononoke: update walker loaded count log ready for chunking
Summary:
Calling the count Final doesn't make sense when chunking as it can appear more than once per tail, so update it to something more appropriate.

Seen is the count of Nodes seen in the stream. Loaded is the count of nodes where NodeData was loaded.  When chunking some deferred Nodes will be seen but not Loaded.

Reviewed By: krallin

Differential Revision: D25742283

fbshipit-source-id: 1f10007d94ad2dbd750bfa53bab3e46a2caad7fa
2021-01-15 03:13:27 -08:00
Alex Hornby
5f13fe1dec mononoke: make walker logging optional via tags
Summary: Add the ability to turn various types of logging on or off, which allows removing boilerplate in test cases.

Reviewed By: krallin

Differential Revision: D25742284

fbshipit-source-id: bbbe7f477156fc49ff6779f9a09e1b397ff6f618
2021-01-15 03:13:27 -08:00
Alex Hornby
cbe292e69e mononoke: remove walker root logging
Summary: This logging was always being globbed away, so remove it

Reviewed By: krallin

Differential Revision: D25742292

fbshipit-source-id: 75e004d3fdadc617f479beee44999692c267d2a9
2021-01-15 03:13:27 -08:00
Alex Hornby
2d0b7db627 mononoke: allow cmdlib init_logging to return a Result
Summary: Allow us to return arg parsing errors rather than panicking

Reviewed By: krallin

Differential Revision: D25837626

fbshipit-source-id: 87e39de140b1dcd3b13a529602fdafc31233175d
2021-01-14 09:52:40 -08:00
Simon Farnsworth
b4a234bbe5 convert changesets to new type futures
Summary: Convert `Changesets` trait and all its uses to new type futures

Reviewed By: krallin

Differential Revision: D25638875

fbshipit-source-id: 947423e2ee47a463861678b146641bcc6b899a4a
2021-01-06 07:11:36 -08:00
Daniel Xu
1e78d023e7 Update regex to v1.4.2
Summary: Update so libbpf-cargo doesn't need to downgrade regex version.

Reviewed By: kevin-vigor

Differential Revision: D25719327

fbshipit-source-id: 5781871a359f744e2701a34df1931f0c37958c27
2020-12-29 22:59:52 -08:00
Alex Hornby
8d7020f07d mononoke: many repos at once in the walker
Summary: Will reduce the number of jobs needed for small repos

Reviewed By: StanislavGlebik

Differential Revision: D25492059

fbshipit-source-id: de11c06615857ad43f3337e58973849d2026a114
2020-12-23 02:08:22 -08:00
Alex Hornby
92ce1e74d2 mononoke: allow passing of repo name through walker
Summary: preparation for multi repo, get the repo name into ErrorKind::NotTraversable

Reviewed By: StanislavGlebik

Differential Revision: D25541444

fbshipit-source-id: 8fd99d5d3f144d8a3a72c7c33205ae58bd5f1ae2
2020-12-23 02:08:22 -08:00
Alex Hornby
c71c9c1459 mononoke: factor out per repo params in walker
Summary:
In preparation for having the walker able to scrub multiple repos at once, define parameter structs.  This also simplifies the code in tail.rs.

The param objects are:

* RepoSubcommandParams - per repo params that can be set up in setup_common and are consumed in the subcommand.  They don't get passed through to the walk

* RepoWalkParams - per repo params that can be set up in setup_common and will get passed all the way into the walk.rs methods

* JobWalkParams - per job params that can be set up in setup_common and will get passed all the way into the walk.rs methods

* TypeWalkParams - per repo params that need to be set up in the subcommands, and are passed all the way into walk.rs

Reviewed By: StanislavGlebik

Differential Revision: D25524256

fbshipit-source-id: bfc8e087e386b6ed45121908b48b6535f65debd3
2020-12-23 02:08:22 -08:00
Alex Hornby
a36ca7efa8 mononoke: factor out arg parsing functions in walker
Summary: Parsing of progress options and sampling options was the same in each subcommand, so move it to functions in setup.rs

Reviewed By: StanislavGlebik

Differential Revision: D25524255

fbshipit-source-id: a2f48814f24aa9b3a158cb7d4abbfc2c0c338305
2020-12-23 02:08:22 -08:00
Alex Hornby
45d9b20949 mononoke: simplify open_blobrepo_given_datasources parameters
Summary: Simplify open_blobrepo_given_datasources parameters to pass fewer arguments, and make it possible to pass the sql_factory by reference.

Reviewed By: krallin

Differential Revision: D25524254

fbshipit-source-id: c324127f42c53a52f388d303e310014f4fa0d7bb
2020-12-23 02:08:22 -08:00
Alex Hornby
ef501ba297 mononoke: update walker setup_common to async fn
Summary: Shifts things left a bit

Reviewed By: krallin

Differential Revision: D25524257

fbshipit-source-id: ab0979b7e5370c1ad142ecabc1d27fea549a3342
2020-12-23 02:08:22 -08:00
Alex Hornby
9985458fa1 mononoke: prepare walker blobstore for multiple repo jobs
Summary: Allows the walker blobstore code to be used by more than one blobrepo.  This is a step to reduce the number of jobs needed to scrub small repos.

Reviewed By: StanislavGlebik

Differential Revision: D25422937

fbshipit-source-id: e2d11239f172f50680bb6e10dd60026c9e6c3c3d
2020-12-23 02:08:22 -08:00
Alex Hornby
a7658c112e mononoke: step to hg changesets via bonsai in walker
Summary:
Doing the hg to hg steps via bonsai lets me later introduce a check for whether the bonsai is in the current chunk of commits to be processed, as part of allowing walker checkpoint and restart.

On its own this is a minor change to the number of nodes the walk will cover as seen in the updated tests.

Reviewed By: krallin

Differential Revision: D25394085

fbshipit-source-id: 3e50cf76c7032635ce9e6a7375228979b2e9c930
2020-12-23 02:08:21 -08:00
Alex Hornby
422774f46d mononoke: track if hg is derived in walker bonsai steps
Summary: This is in preparation for all walker hg to hg steps (e.g. HgChangeset to Parent HgChangeset) going via Bonsai, which without this would continually check if the filenodes are derived

Reviewed By: krallin

Differential Revision: D25394086

fbshipit-source-id: bb75e7ddf5b09f9d13a0f436627f4c3c95e24430
2020-12-23 02:08:21 -08:00
Aida Getoeva
e9f3284b5b mononoke/mysql: make mysql options not copyable
Summary:
In the next diff I'm going to add Mysql connection object to `MysqlOptions` in order to pass it down from `MononokeAppData` to the code that works with sql.
This change will make MysqlOptions un-copyable.

This diff fixed all issues produced by the change.

Reviewed By: ahornby

Differential Revision: D25590772

fbshipit-source-id: 440ae5cba3d49ee6ccd2ff39a93829bcd14bb3f1
2020-12-17 15:46:30 -08:00
Pavel Aslanov
0fc5c3aca7 convert BlobRepoHg to new type futures
Summary: Convert all BlobRepoHg methods to new type futures

Reviewed By: StanislavGlebik

Differential Revision: D25471540

fbshipit-source-id: c8e99509d39d0e081d082097cbd9dbfca431637e
2020-12-17 07:45:26 -08:00
Alex Hornby
b94b1ba21e mononoke: add "all" as an option for walker NodeType and EdgeType args
Summary: This makes it easier to run full walks on small repos.

Reviewed By: StanislavGlebik

Differential Revision: D25469485

fbshipit-source-id: 6e5b1426837a396d939e47a5b353e615437ae7cb
2020-12-15 00:48:03 -08:00
Mark Juggurnauth-Thomas
73cdac45e3 derived_data: use new derived data configuration format
Summary:
Change derived data config to have "enabled" config and "backfilling" config.

The `Mapping` object has the responsibility of encapsulating the configuration options
for the derived data type.  Since it is only possible to obtain a `Mapping` from
appropriate configuration, ownership of a `Mapping` means derivation is permitted,
and so the `DeriveMode` enum is removed.

Most callers will use `BonsaiDerived::derive`, or a default `derived_data_utils` implementation
that requires the derived data to be enabled and configured on the repo.

Backfillers can additionally use `derived_data_utils_for_backfill` which will use the
`backfilling` configuration in preference to the default configuration.

Reviewed By: ahornby

Differential Revision: D25246317

fbshipit-source-id: 352fe6509572409bc3338dd43d157f34c73b9eac
2020-12-14 09:24:58 -08:00
Mark Juggurnauth-Thomas
9e1b1448e6 derived_data: split BonsaiDerived trait
Summary:
The `BonsaiDerived` trait is split in two:

* The new `BonsaiDerivable` trait encapsulates the process of deriving the data, either
  a single item from its parents, or a batch.
* The `BonsaiDerived` trait is used only as an entry point for deriving with the default
  mapping and config.

This split will allow us to use `BonsaiDerivable` in batch backfilling with non-default
config, for example when backfilling a new version of a derived data type.

Reviewed By: krallin

Differential Revision: D25371964

fbshipit-source-id: 5874836bc06c18db306ada947a690658bf89723c
2020-12-14 09:24:57 -08:00
Alex Hornby
f2ca14b1bf mononoke: add hg derived data shortcut to walker node types
Summary: Could already specify "bonsai"; it is useful to be able to pass "hg" as well.

Reviewed By: farnz

Differential Revision: D25367322

fbshipit-source-id: aca6d22f98394af49e3d94d5fd533bc9a25a6869
2020-12-11 01:36:06 -08:00
Alex Hornby
22d77348cd mononoke: remove unnecessary static lifetime in walker constants
Summary: also makes ERROR_MSG a constant

Reviewed By: farnz

Differential Revision: D25422756

fbshipit-source-id: e2f2b9122e2b90c7cb07b7d64156055d55c8c653
2020-12-11 01:36:06 -08:00
Alex Hornby
8d997846e3 mononoke: remove fsnode from default walker params
Summary: Switching to specify derived data types other than hg explicitly on the command line

Reviewed By: farnz

Differential Revision: D25367323

fbshipit-source-id: 0e0aea1aab46b43b325486ed6161ea322f7cec4b
2020-12-10 05:28:45 -08:00
Alex Hornby
28d4471f75 mononoke: no need to collect walker iterators
Summary: Can just pass on the iterator

Reviewed By: ikostia

Differential Revision: D25216892

fbshipit-source-id: 79c08737477ac7ed1f824c50105d5977ee592126
2020-12-04 03:07:05 -08:00
Alex Hornby
591363e1c4 mononoke: allow binaries to specify a default for cachelib-only-blobstore
Summary: Reduces boilerplate for binaries usually run in this mode, notably the walker

Reviewed By: ikostia

Differential Revision: D25216883

fbshipit-source-id: e31d2a6aec7da3baafd8bcf208cf79cc696752c0
2020-12-04 03:07:04 -08:00
Alex Hornby
54bda6537d mononoke: allow binaries to default a blobstore read qps
Summary: This is useful to prevent accidentally consuming too much.  Enabled it for the walker

Reviewed By: ikostia

Differential Revision: D25216880

fbshipit-source-id: e80f490d6ece40d64cc8609e7d6b80d0ecbb1671
2020-12-04 03:07:04 -08:00
Alex Hornby
f814075cee mononoke: allow binaries to default blobstore-cachelib-attempt-zstd option
Summary: Reduces boilerplate on the command line for binaries like the walker that want a different default

Reviewed By: krallin

Differential Revision: D25216876

fbshipit-source-id: 0df474568d28e0726be223e9dc0a760523063d21
2020-12-04 03:07:04 -08:00
Alex Hornby
b458ae4217 mononoke: remove --readonly-storage from walker test cmdlines
Summary: Remove this now that it is the walker default.  Makes command lines shorter

Reviewed By: ikostia

Differential Revision: D25219551

fbshipit-source-id: bc5ad4237cad35218a0b4c54aa81eb20edb3f3e1
2020-12-02 07:27:24 -08:00
Alex Hornby
99fb41c5bd mononoke: allow binaries to default readonly-storage option
Summary:
This will reduce boilerplate command line for the walker, as most of the time we want to run it with readonly storage

Because the existing --readonly-storage flag can't take a value, this introduces a new --with-readonly-storage=<true|false> option.
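
A minimal sketch of how a per-binary default could combine with the two flags is shown below. It uses only the standard library rather than the real argument parser, and `readonly_storage` is a hypothetical helper. The rule that the last flag on the command line wins is an assumption of this example; the commit does not state how the two flags interact.

```rust
/// Resolve the readonly-storage setting from raw arguments.
/// `binary_default` is the value the binary wants when no flag is passed.
/// Assumption: if both flags appear, the last one on the command line wins.
pub fn readonly_storage(args: &[&str], binary_default: bool) -> Result<bool, String> {
    let prefix = "--with-readonly-storage=";
    let mut result = binary_default;
    for arg in args {
        if *arg == "--readonly-storage" {
            // The older flag takes no value; its presence means true.
            result = true;
        } else if let Some(value) = arg.strip_prefix(prefix) {
            // The newer option carries an explicit true/false value.
            result = value
                .parse::<bool>()
                .map_err(|e| format!("invalid value {:?}: {}", value, e))?;
        }
    }
    Ok(result)
}
```

With a default of `true`, as described for the walker, passing `--with-readonly-storage=false` is the only way to turn the setting off in this model.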

Reviewed By: krallin

Differential Revision: D25216871

fbshipit-source-id: e1b83b428a9c3787f48c18fd396d23ac95991b77
2020-12-02 07:27:23 -08:00
Alex Hornby
935a7ddfc8 mononoke: remove the need to pass in cachelib settings twice
Summary:
Previously we needed to pass in cachelib settings once to MononokeAppBuilder and once to parse_and_init_cachelib.

This change adds a MononokeClapApp and MononokeMatches that preserve the settings, removing the need to pass them in twice (and thus avoiding possible inconsistency).

MononokeMatches uses MaybeOwned to hold the inner ArgMatches, which allows us to hold both the usual reference case from get_matches and an owned case for get_matches_from which is used in test cases.
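
The borrowed-or-owned idea can be sketched with a hand-rolled enum. This is not the `MaybeOwned` type the commit uses, nor the real MononokeMatches; the enum below and the `arg_count` helper are illustrative stand-ins, with a `Vec<String>` standing in for the parsed matches.

```rust
use std::ops::Deref;

/// Hand-rolled stand-in: holds either a reference to a value or the value itself.
pub enum MaybeOwned<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

impl<'a, T> Deref for MaybeOwned<'a, T> {
    type Target = T;

    /// Both variants expose the inner value through the same `&T`.
    fn deref(&self) -> &T {
        match self {
            MaybeOwned::Borrowed(r) => *r,
            MaybeOwned::Owned(v) => v,
        }
    }
}

/// Hypothetical consumer that does not care which variant it was given.
pub fn arg_count(matches: &MaybeOwned<'_, Vec<String>>) -> usize {
    matches.len()
}
```

The benefit of this shape is that code reading the matches is identical for the borrowed case and for the owned case used in tests.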

Reviewed By: krallin

Differential Revision: D24788450

fbshipit-source-id: aad5fff2edda305177dcefa4b3a98ab99bc2d811
2020-12-02 07:27:23 -08:00
Alex Hornby
31bcf94df7 mononoke: set a default cache_size in the walker
Summary: Shorten command lines by setting a default in code.

Reviewed By: ikostia

Differential Revision: D24761025

fbshipit-source-id: 13deb1622ee1b97135ee787f6b6ffeed2f05813b
2020-12-02 07:27:23 -08:00