Summary: Add a new enum representing the types the walker interns so that we can optionally clear them between chunks. This change adds the enum and command line parsing; the actual clearing follows in the next diff.
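A minimal sketch of the enum plus parsing idea, for illustration only. The name `InternedKind`, its variants, and the comma separated argument format are assumptions, not taken from the diff.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the kinds of state the walker interns.
/// The variant names are illustrative, not taken from the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum InternedKind {
    FileNames,
    ManifestPaths,
    ChangesetIds,
}

impl FromStr for InternedKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "FileNames" => Ok(InternedKind::FileNames),
            "ManifestPaths" => Ok(InternedKind::ManifestPaths),
            "ChangesetIds" => Ok(InternedKind::ChangesetIds),
            other => Err(format!("unknown interned type: {}", other)),
        }
    }
}

/// Parse a comma separated command line value such as "FileNames,ChangesetIds".
/// Empty entries are skipped; the first unknown name aborts with an error.
pub fn parse_interned_kinds(arg: &str) -> Result<Vec<InternedKind>, String> {
    arg.split(',')
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
        .map(InternedKind::from_str)
        .collect()
}
```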
Reviewed By: krallin
Differential Revision: D25910622
fbshipit-source-id: 0226b4009bf8199498e21e52f734a9529ee7afaa
Summary:
Lots of generated code in this diff. Only code change was in
`common/rust/cargo_from_buck/lib/cargo_generator.py`.
Path/git-only dependencies (i.e. `mydep = { path = "../foo/bar" }`) are not
publishable to crates.io. However, we are allowed to specify both a path/git
_and_ a version. When building locally, the path/git is chosen. When publishing,
the version on crates.io is chosen.
See https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations .
Note that I understand that not all autocargo projects are published on crates.io (yet).
The point of this diff is to allow projects to slowly start getting uploaded.
The end goal is autocargo generated `Cargo.toml`s that can be `cargo publish`ed
without further modification.
Reviewed By: lukaspiatkowski
Differential Revision: D26028982
fbshipit-source-id: f7b4c9d4f4dd004727202bd98ab10e201a21e88c
Summary:
We are going to pass more params into tail.rs as part of being able to clear state between chunks.
This prepares by creating a struct for them and adding command line args for include/exclude node types to clear.
Reviewed By: krallin
Differential Revision: D25910615
fbshipit-source-id: 610a884c17da7af1e23cfa81d4f495fe03bad9a3
Summary:
When we tried to update to Tokio 0.2.14, we hit lots of hangs. Those were due
to incompatibilities between Tokio 0.2.14 and Futures 1.29. We fixed some of
the bugs (and others had been fixed and were pending a release), and Futures
1.30 has now been released, which unblocks our update.
This diff updates Tokio accordingly (the previous diff in the stack fixes an
incompatibility).
The underlying motivation here is to ease the transition to Tokio 1.0.
Ultimately we'll be pulling in those changes one way or another, so let's
get started on this incremental first step.
Reviewed By: farnz
Differential Revision: D25952428
fbshipit-source-id: b753195a1ffb404e0b0975eb7002d6d67ba100c2
Summary: This no longer needs to be in metaconfig, so move it to multiplexedblob.
Reviewed By: krallin
Differential Revision: D25928061
fbshipit-source-id: 8aa6ce6aafa16f84730cf388ebf7eab6d5bf2c53
Summary:
The test added in previous diff showed that hg filenodes weren't being deferred between chunks in the expected way.
This is because we can't tell if a hg filenode is in a given chunk until it is loaded. This is similar to unodes, but the linked changeset in this case is an HgChangesetId rather than the bonsai ChangesetId. This change therefore introduces an hg_to_bcs mapping in the walker state, which is used to look up whether the filenode's linked HgChangesetId is in the chunk; if it is not, the edge is deferred.
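A rough sketch of the lookup described above. The integer id types, the `WalkState` layout and the method name are simplifications assumed for illustration, and treating an unknown mapping as "outside the chunk" is also an assumption.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified id types; the real ones are hashes.
type HgChangesetId = u64;
type ChangesetId = u64;

/// Minimal stand-in for the part of the walker state described above.
#[derive(Default)]
pub struct WalkState {
    /// Mapping from hg changeset id to bonsai changeset id.
    pub hg_to_bcs: HashMap<HgChangesetId, ChangesetId>,
    /// Bonsai changesets that belong to the chunk being processed.
    pub chunk_bcs: HashSet<ChangesetId>,
}

impl WalkState {
    /// True if the filenode's linked hg changeset maps to a bonsai
    /// changeset in the current chunk. Unknown mappings count as outside
    /// the chunk (an assumption of this sketch), so the edge would be deferred.
    pub fn linknode_in_chunk(&self, linknode: HgChangesetId) -> bool {
        self.hg_to_bcs
            .get(&linknode)
            .map_or(false, |bcs| self.chunk_bcs.contains(bcs))
    }
}
```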
Reviewed By: krallin
Differential Revision: D25742276
fbshipit-source-id: 1f92452d012aab5b9fdf29f43fc05ebc043b2c7a
Summary:
Add a test for walking hg non-filenode data in chunks. Expect some edges to be deferred to the next chunk, as parents point into history.
This is done by deferring hg_changeset_via_bonsai_step if the bonsai is outside the range of the chunk.
Reviewed By: krallin
Differential Revision: D25742288
fbshipit-source-id: 385c9261151d10f7a7029f86ec10470226fc993c
Summary:
Add a test to walk deleted manifests in chunks. No deferring is expected, as these manifests don't point into history.
The test showed that chunking was missing handling for this manifest type, so this fixes it.
Reviewed By: krallin
Differential Revision: D25742285
fbshipit-source-id: 5411f904510f9b4fd9028c7d0dde6c652a784796
Summary:
For unodes we can't determine before loading them whether they fall within the current chunk, as the linked changeset id is not visible until the step is executed.
This change adds the ability to defer an already executing step, and uses it for unodes to defer if the linked changeset is not in the chunk being processed.
Deferred edges are stored in the walker state and are checked on each chunk so that any deferred edges that are now in range can be run.
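The deferral flow can be sketched roughly as below. All names (`StepOutput`, `WalkState`, `unode_step`, `take_runnable`) and the integer ids are hypothetical simplifications, not the walker's real types.

```rust
use std::collections::HashSet;

// Hypothetical simplified id type for illustration.
type ChangesetId = u64;

#[derive(Debug, PartialEq)]
pub enum StepOutput {
    /// The step ran and produced child node ids to visit.
    Done(Vec<u64>),
    /// The step found its linked changeset is outside the current chunk.
    Deferred(ChangesetId),
}

#[derive(Default)]
pub struct WalkState {
    /// Edges waiting for a later chunk, stored as (linked changeset, unode id).
    pub deferred: Vec<(ChangesetId, u64)>,
}

impl WalkState {
    /// At the start of a chunk, remove and return the deferred unodes whose
    /// linked changeset is covered by that chunk. The rest stay deferred.
    pub fn take_runnable(&mut self, chunk: &HashSet<ChangesetId>) -> Vec<u64> {
        let (run, keep): (Vec<_>, Vec<_>) = std::mem::take(&mut self.deferred)
            .into_iter()
            .partition(|(linked, _)| chunk.contains(linked));
        self.deferred = keep;
        run.into_iter().map(|(_, unode)| unode).collect()
    }
}

/// Run a unode step. `linked` is only known once the unode has been loaded,
/// which is why the decision to defer happens inside the step.
pub fn unode_step(
    state: &mut WalkState,
    chunk: &HashSet<ChangesetId>,
    unode_id: u64,
    linked: ChangesetId,
    children: Vec<u64>,
) -> StepOutput {
    if chunk.contains(&linked) {
        StepOutput::Done(children)
    } else {
        state.deferred.push((linked, unode_id));
        StepOutput::Deferred(linked)
    }
}
```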
Reviewed By: StanislavGlebik
Differential Revision: D25742280
fbshipit-source-id: 8a0e7d96b8bf10889bf5e83fe4bee829a1a5cb4c
Summary: Add an enum for the walker step output in preparation for adding a Deferred variant to it in next diff
Reviewed By: StanislavGlebik
Differential Revision: D25742293
fbshipit-source-id: 6aabacb1cd39d16f4d36998908048fd2a10eba4d
Summary: Allow scrubbing of ChangesetInfo in chunks of public commit ids
Reviewed By: markbt
Differential Revision: D25742286
fbshipit-source-id: a5e2faed16eb60c5b7054261a74595a945e68c15
Summary:
For large repos it is desirable to walk them in chunks, as a prerequisite for being able to clear state caches to reduce memory usage between chunks and to checkpoint between chunks so that an interrupted scrub can resume.
Chunks are fetched from the repo bounds of changeset id in newest-first order, which means that we scrub the newest data first. Any edges discovered from the walk that point outside the chunk are deferred until the later chunk that covers them.
This change adds chunking and tests it for core bonsai data; following diffs add it for other types.
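As an illustration of newest-first chunking, here is a small sketch. It assumes changesets can be addressed by a sequential numeric id in a half-open range `[lower, upper)`; the function names are hypothetical.

```rust
/// Split the half-open id range [lower, upper) into chunks of at most
/// `chunk_size` ids, newest (highest ids) first.
pub fn newest_first_chunks(lower: u64, upper: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunks = Vec::new();
    let mut hi = upper;
    while hi > lower {
        // Clamp the lower end of the chunk to the repo's lower bound.
        let lo = hi.saturating_sub(chunk_size).max(lower);
        chunks.push((lo, hi));
        hi = lo;
    }
    chunks
}

/// An edge is deferred when its target id falls outside the current chunk.
pub fn is_deferred(chunk: (u64, u64), target: u64) -> bool {
    target < chunk.0 || target >= chunk.1
}
```

With bounds 0..10 and a chunk size of 4, the chunks come out as (6, 10), (2, 6), (0, 2), so an edge to id 3 found while walking the first chunk waits for the second.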
Reviewed By: StanislavGlebik
Differential Revision: D25742295
fbshipit-source-id: b989abdf2ca367cf9b10f45d9f932eba55ee6dae
Summary: New command line args to allow scrubbing a repo in chunks of N changesets. Used in a later diff.
Reviewed By: StanislavGlebik
Differential Revision: D25742282
fbshipit-source-id: 4bcf74d26f8c2863c6e96f25eca69e01f9c2c0d5
Summary:
The main thing this change does is make sure pending roots to visit are represented in the difference between Walked and Children. Children is the sum of all child nodes discovered, both visited and unvisited. Walked is a measure of number of nodes visited. Children-Walked is used as a measure of queue depth remaining to be processed.
When not chunking this is a minor issue, as usually just one bookmark root node is not counted in Children. When chunking, however, not counting the roots means the chunk of several hundred thousand roots is not visible as waiting to be processed.
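The accounting can be sketched as follows. The struct and method names are hypothetical; only the Children minus Walked relationship comes from the description above.

```rust
/// Hypothetical progress counters for illustration.
#[derive(Default)]
pub struct ProgressCounts {
    /// Number of nodes visited so far.
    pub walked: u64,
    /// All child nodes discovered, visited or not, including roots.
    pub children: u64,
}

impl ProgressCounts {
    /// Count the roots of a walk (or chunk) as pending work.
    pub fn add_roots(&mut self, roots: u64) {
        self.children += roots;
    }

    /// Record one visited node and the children it discovered.
    pub fn record_visit(&mut self, discovered_children: u64) {
        self.walked += 1;
        self.children += discovered_children;
    }

    /// Children - Walked, used as a measure of remaining queue depth.
    pub fn queue_depth(&self) -> u64 {
        self.children.saturating_sub(self.walked)
    }
}
```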
Reviewed By: StanislavGlebik
Differential Revision: D25852526
fbshipit-source-id: df5f21a37be152f0baee40d33fd7dfb7aaa763de
Summary: If progress is logged less than one millisecond apart it gives a divide by zero. Fix it.
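One way such a guard might look, as a sketch. The function name is hypothetical, and reporting a rate of 0 for a sub-millisecond interval is an assumption rather than the behaviour of the actual fix.

```rust
use std::time::Duration;

/// Nodes per second over the interval since the last progress update.
/// Intervals shorter than one millisecond would truncate to zero and
/// divide by zero, so they report a rate of 0 (assumption of this sketch).
pub fn nodes_per_second(delta_nodes: u64, elapsed: Duration) -> u64 {
    let millis = elapsed.as_millis();
    if millis == 0 {
        return 0;
    }
    ((delta_nodes as u128) * 1000 / millis) as u64
}
```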
Reviewed By: farnz
Differential Revision: D25997768
fbshipit-source-id: 65dcba2dc7a789540a8e4fce6aeca0ee9668895d
Summary:
Scrub was being enabled by mutating the BlobConfig into the BlobConfig::Scrub.
This change removes the BlobConfig::Scrub variant, as it's no longer present in the thrift config, so there is no need for it to exist. Scrubbing is an optional mode of multiplexed store construction, so instead add ScrubOptions to BlobstoreOptions.
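The shape of this change can be sketched as below. The field names, the `ScrubAction` variants and the `Store` enum are assumptions made for illustration; only the idea of carrying optional scrub settings in the options instead of a config variant comes from the description.

```rust
/// Hypothetical scrub behaviour, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub enum ScrubAction {
    ReportOnly,
    Repair,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ScrubOptions {
    pub scrub_action: ScrubAction,
}

/// Scrubbing is an optional mode, so it is an Option on the options struct
/// rather than a separate variant of the blob config.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct BlobstoreOptions {
    pub scrub_options: Option<ScrubOptions>,
}

/// Stand-in for the constructed store.
#[derive(Debug, PartialEq)]
pub enum Store {
    Multiplexed,
    ScrubbingMultiplexed(ScrubAction),
}

/// The same config builds either store; the options pick the mode.
pub fn make_multiplexed(options: &BlobstoreOptions) -> Store {
    match &options.scrub_options {
        Some(scrub) => Store::ScrubbingMultiplexed(scrub.scrub_action.clone()),
        None => Store::Multiplexed,
    }
}
```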
Reviewed By: StanislavGlebik
Differential Revision: D25927253
fbshipit-source-id: 29fceae59be8f7068300a7b8b298c6edbe4da04b
Summary:
This feature is useful for testing time-dependent stuff (e.g. it
allows you to stop/forward time). It's already included in the buck build.
Reviewed By: SkyterX
Differential Revision: D25946732
fbshipit-source-id: 5e7b69967a45e6deaddaac34ba78b42d2f2ad90e
Summary:
Include changes since the last progress update (if any) in the final delta and total time logged.
When not chunking this was a minor inaccuracy. When chunking in small chunks, however, it meant a lot of stats for each chunk were missing: if a chunk took less than 5 seconds, there might not even have been a progress update before the end state.
Reviewed By: StanislavGlebik
Differential Revision: D25852274
fbshipit-source-id: 6cea76e3abd37908475052947794eed442a1ac82
Summary:
When the root type is e.g. Changeset, allow edges from ChangesetToX.
This is important for chunking; it is done independently to check that it doesn't break non-chunked behaviour.
Reviewed By: krallin
Differential Revision: D25742294
fbshipit-source-id: ea4c989e9f61b30094d0fd83e543fe14a38254fd
Summary:
Calling the count Final doesn't make sense when chunking as it can appear more than once per tail, so update it to something more appropriate.
Seen is the count of Nodes seen in the stream. Loaded is the count of nodes where NodeData was loaded. When chunking some deferred Nodes will be seen but not Loaded.
Reviewed By: krallin
Differential Revision: D25742283
fbshipit-source-id: 1f10007d94ad2dbd750bfa53bab3e46a2caad7fa
Summary: Add the ability to turn various types of logging on or off, which allows removing boilerplate in test cases.
Reviewed By: krallin
Differential Revision: D25742284
fbshipit-source-id: bbbe7f477156fc49ff6779f9a09e1b397ff6f618
Summary: This logging was always being globbed away, so remove it
Reviewed By: krallin
Differential Revision: D25742292
fbshipit-source-id: 75e004d3fdadc617f479beee44999692c267d2a9
Summary: Allow us to return arg parsing errors rather than panicking
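A small sketch of the pattern, assuming a hypothetical `ArgError` type and a hypothetical chunk size argument. The point is that a bad value becomes an `Err` the caller can report, instead of an `unwrap()` or `expect()` panic.

```rust
use std::fmt;

/// Hypothetical error type for argument parsing failures.
#[derive(Debug, PartialEq)]
pub struct ArgError(pub String);

impl fmt::Display for ArgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "argument error: {}", self.0)
    }
}

impl std::error::Error for ArgError {}

/// Parse a numeric argument value, returning an error instead of panicking.
pub fn parse_chunk_size(raw: &str) -> Result<usize, ArgError> {
    raw.parse::<usize>()
        .map_err(|e| ArgError(format!("invalid chunk size {:?}: {}", raw, e)))
}
```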
Reviewed By: krallin
Differential Revision: D25837626
fbshipit-source-id: 87e39de140b1dcd3b13a529602fdafc31233175d
Summary: Convert `Changesets` trait and all its uses to new type futures
Reviewed By: krallin
Differential Revision: D25638875
fbshipit-source-id: 947423e2ee47a463861678b146641bcc6b899a4a
Summary: Will reduce the number of jobs needed for small repos
Reviewed By: StanislavGlebik
Differential Revision: D25492059
fbshipit-source-id: de11c06615857ad43f3337e58973849d2026a114
Summary: preparation for multi repo, get the repo name into ErrorKind::NotTraversable
Reviewed By: StanislavGlebik
Differential Revision: D25541444
fbshipit-source-id: 8fd99d5d3f144d8a3a72c7c33205ae58bd5f1ae2
Summary:
In preparation for having the walker able to scrub multiple repos at once, define parameter structs. This also simplifies the code in tail.rs.
The param objects are:
* RepoSubcommandParams - per repo params that can be setup in setup_common and are consumed in the subcommand. They don't get passed through to the walk
* RepoWalkParams - per repo params that can be setup in setup_common and will get passed all the way into the walk.rs methods
* JobWalkParams - per job params that can be setup in setup_common and will get passed all the way into the walk.rs methods
* TypeWalkParams - per repo params that need to be setup in the subcommands, and are passed all the way into walk.rs
Reviewed By: StanislavGlebik
Differential Revision: D25524256
fbshipit-source-id: bfc8e087e386b6ed45121908b48b6535f65debd3
Summary: Parsing of progress options and sampling options was the same in each subcommand, so move it to functions in setup.rs
Reviewed By: StanislavGlebik
Differential Revision: D25524255
fbshipit-source-id: a2f48814f24aa9b3a158cb7d4abbfc2c0c338305
Summary: Simplify open_blobrepo_given_datasources parameters to pass fewer arguments, and allow the sql_factory to be passed by reference.
Reviewed By: krallin
Differential Revision: D25524254
fbshipit-source-id: c324127f42c53a52f388d303e310014f4fa0d7bb
Summary: Allows the walker blobstore code to be used by more than one blobrepo. This is a step to reduce the number of jobs needed to scrub small repos.
Reviewed By: StanislavGlebik
Differential Revision: D25422937
fbshipit-source-id: e2d11239f172f50680bb6e10dd60026c9e6c3c3d
Summary:
Doing the hg to hg steps via bonsai will later let me introduce a check for whether the bonsai is in the current chunk of commits to be processed, as part of allowing walker checkpoint and restart.
On its own this is a minor change to the number of nodes the walk will cover as seen in the updated tests.
Reviewed By: krallin
Differential Revision: D25394085
fbshipit-source-id: 3e50cf76c7032635ce9e6a7375228979b2e9c930
Summary: This is in preparation for all walker hg to hg steps (e.g. HgChangeset to Parent HgChangeset) going via Bonsai, which without this would continually check if the filenodes are derived.
Reviewed By: krallin
Differential Revision: D25394086
fbshipit-source-id: bb75e7ddf5b09f9d13a0f436627f4c3c95e24430
Summary:
In the next diff I'm going to add Mysql connection object to `MysqlOptions` in order to pass it down from `MononokeAppData` to the code that works with sql.
This change will make MysqlOptions un-copyable.
This diff fixed all issues produced by the change.
Reviewed By: ahornby
Differential Revision: D25590772
fbshipit-source-id: 440ae5cba3d49ee6ccd2ff39a93829bcd14bb3f1
Summary: Convert all BlobRepoHg methods to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25471540
fbshipit-source-id: c8e99509d39d0e081d082097cbd9dbfca431637e
Summary: This makes it easier to run full walks on small repos.
Reviewed By: StanislavGlebik
Differential Revision: D25469485
fbshipit-source-id: 6e5b1426837a396d939e47a5b353e615437ae7cb
Summary:
Change derived data config to have "enabled" config and "backfilling" config.
The `Mapping` object has the responsibility of encapsulating the configuration options
for the derived data type. Since it is only possible to obtain a `Mapping` from
appropriate configuration, ownership of a `Mapping` means derivation is permitted,
and so the `DeriveMode` enum is removed.
Most callers will use `BonsaiDerived::derive`, or a default `derived_data_utils` implementation
that requires the derived data to be enabled and configured on the repo.
Backfillers can additionally use `derived_data_utils_for_backfill` which will use the
`backfilling` configuration in preference to the default configuration.
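The "owning a `Mapping` means derivation is permitted" idea can be sketched like this. The struct fields, constructor names and the string based type names are assumptions for illustration, and the fallback from `backfilling` to `enabled` is one reading of "in preference to".

```rust
use std::collections::HashSet;

/// Hypothetical simplified config: names of derived data types.
#[derive(Default)]
pub struct DerivedDataConfig {
    pub enabled: HashSet<String>,
    pub backfilling: HashSet<String>,
}

/// Can only be built from appropriate configuration, so holding one is
/// proof that derivation is permitted and no separate mode enum is needed.
#[derive(Debug)]
pub struct Mapping {
    type_name: String,
    backfill: bool,
}

impl Mapping {
    /// Default path: the type must be enabled on the repo.
    pub fn new(config: &DerivedDataConfig, type_name: &str) -> Result<Mapping, String> {
        if config.enabled.contains(type_name) {
            Ok(Mapping { type_name: type_name.to_string(), backfill: false })
        } else {
            Err(format!("{} is not enabled", type_name))
        }
    }

    /// Backfill path: prefer the backfilling config, else fall back to default.
    pub fn for_backfill(config: &DerivedDataConfig, type_name: &str) -> Result<Mapping, String> {
        if config.backfilling.contains(type_name) {
            Ok(Mapping { type_name: type_name.to_string(), backfill: true })
        } else {
            Mapping::new(config, type_name)
        }
    }

    pub fn is_backfill(&self) -> bool {
        self.backfill
    }

    pub fn type_name(&self) -> &str {
        &self.type_name
    }
}

/// Derivation takes a Mapping and needs no further permission check.
pub fn derive(mapping: &Mapping, cs_id: u64) -> String {
    format!("{}@{}", mapping.type_name(), cs_id)
}
```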
Reviewed By: ahornby
Differential Revision: D25246317
fbshipit-source-id: 352fe6509572409bc3338dd43d157f34c73b9eac
Summary:
The `BonsaiDerived` trait is split in two:
* The new `BonsaiDerivable` trait encapsulates the process of deriving the data, either
a single item from its parents, or a batch.
* The `BonsaiDerived` trait is used only as an entry point for deriving with the default
mapping and config.
This split will allow us to use `BonsaiDerivable` in batch backfilling with non-default
config, for example when backfilling a new version of a derived data type.
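A heavily simplified, synchronous sketch of the split. The real traits are not reproduced here: the method signatures, the integer `ChangesetId`, the in-memory graph and the toy `Depth` type are all assumptions chosen to keep the example self-contained.

```rust
use std::collections::HashMap;

// Hypothetical simplified id type.
type ChangesetId = u64;

/// How to derive the data: one item from its parents, or a batch.
pub trait BonsaiDerivable: Sized + Clone {
    fn derive_from_parents(cs_id: ChangesetId, parents: Vec<Self>) -> Self;

    /// Derive a batch listed in parents-before-children order.
    /// Parents outside the batch are ignored in this sketch.
    fn batch_derive(batch: &[(ChangesetId, Vec<ChangesetId>)]) -> HashMap<ChangesetId, Self> {
        let mut out: HashMap<ChangesetId, Self> = HashMap::new();
        for (cs_id, parent_ids) in batch {
            let parents: Vec<Self> = parent_ids
                .iter()
                .filter_map(|p| out.get(p).cloned())
                .collect();
            let derived = Self::derive_from_parents(*cs_id, parents);
            out.insert(*cs_id, derived);
        }
        out
    }
}

/// Entry point only: derive with the default setup (here, a plain graph).
pub trait BonsaiDerived: BonsaiDerivable {
    fn derive(graph: &HashMap<ChangesetId, Vec<ChangesetId>>, cs_id: ChangesetId) -> Self {
        let parent_ids = graph.get(&cs_id).cloned().unwrap_or_default();
        let parents: Vec<Self> = parent_ids
            .into_iter()
            .map(|p| Self::derive(graph, p))
            .collect();
        Self::derive_from_parents(cs_id, parents)
    }
}

/// Toy derived data type: depth of a changeset in the graph.
#[derive(Clone, Debug, PartialEq)]
pub struct Depth(pub u64);

impl BonsaiDerivable for Depth {
    fn derive_from_parents(_cs_id: ChangesetId, parents: Vec<Self>) -> Self {
        Depth(1 + parents.iter().map(|d| d.0).max().unwrap_or(0))
    }
}

impl BonsaiDerived for Depth {}
```

A batch backfiller in this sketch would call `batch_derive` directly and never go through the `BonsaiDerived` entry point.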
Reviewed By: krallin
Differential Revision: D25371964
fbshipit-source-id: 5874836bc06c18db306ada947a690658bf89723c
Summary: Could already specify "bonsai"; it is useful to be able to pass "hg" as well.
Reviewed By: farnz
Differential Revision: D25367322
fbshipit-source-id: aca6d22f98394af49e3d94d5fd533bc9a25a6869
Summary: Also makes ERROR_MSG a constant
Reviewed By: farnz
Differential Revision: D25422756
fbshipit-source-id: e2f2b9122e2b90c7cb07b7d64156055d55c8c653
Summary: Switching to specify derived data types other than hg explicitly on the command line
Reviewed By: farnz
Differential Revision: D25367323
fbshipit-source-id: 0e0aea1aab46b43b325486ed6161ea322f7cec4b
Summary: Can just pass on the iterator
Reviewed By: ikostia
Differential Revision: D25216892
fbshipit-source-id: 79c08737477ac7ed1f824c50105d5977ee592126
Summary: Reduces boilerplate for binaries usually run in this mode, notably the walker
Reviewed By: ikostia
Differential Revision: D25216883
fbshipit-source-id: e31d2a6aec7da3baafd8bcf208cf79cc696752c0
Summary: This is useful to prevent accidentally consuming too much. Enabled it for the walker
Reviewed By: ikostia
Differential Revision: D25216880
fbshipit-source-id: e80f490d6ece40d64cc8609e7d6b80d0ecbb1671
Summary: Reduces boilerplate on the command line for binaries like the walker that want a different default
Reviewed By: krallin
Differential Revision: D25216876
fbshipit-source-id: 0df474568d28e0726be223e9dc0a760523063d21
Summary: Remove this now that it is the walker default. Makes command lines shorter
Reviewed By: ikostia
Differential Revision: D25219551
fbshipit-source-id: bc5ad4237cad35218a0b4c54aa81eb20edb3f3e1
Summary:
This will reduce boilerplate on the command line for the walker, as most of the time we want to run it with readonly storage.
Because the existing --readonly-storage flag can't take a value, this introduces a new --with-readonly-storage=<true|false> option.
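A hand-rolled sketch of how the two spellings might combine with a per-binary default. It does not use the real argument parser; the function name and the "last occurrence wins" rule are assumptions.

```rust
/// Resolve readonly storage from raw args and a per-binary default.
/// `--readonly-storage` is a plain flag, while
/// `--with-readonly-storage=<true|false>` can also turn the mode off.
/// If both appear, the last one wins (an assumption of this sketch).
pub fn readonly_storage(args: &[&str], default: bool) -> Result<bool, String> {
    let mut value = default;
    for arg in args {
        if *arg == "--readonly-storage" {
            value = true;
        } else if let Some(raw) = arg.strip_prefix("--with-readonly-storage=") {
            value = raw
                .parse::<bool>()
                .map_err(|_| format!("expected true or false, got {:?}", raw))?;
        }
    }
    Ok(value)
}
```

A binary such as the walker would pass `default = true`, so readonly storage applies unless `--with-readonly-storage=false` is given.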
Reviewed By: krallin
Differential Revision: D25216871
fbshipit-source-id: e1b83b428a9c3787f48c18fd396d23ac95991b77
Summary:
Previously we needed to pass in cachelib settings twice: once to MononokeAppBuilder and once to parse_and_init_cachelib.
This change adds a MononokeClapApp and MononokeMatches that preserve the settings, which removes the need to pass them in twice and avoids possible inconsistency.
MononokeMatches uses MaybeOwned to hold the inner ArgMatches, which allows us to hold both the usual reference case from get_matches and an owned case for get_matches_from which is used in test cases.
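The borrowed-or-owned wrapper can be sketched with a local enum. The `MaybeOwned` below is a minimal stand-in written for this example rather than the crate type, `ArgMatches` is a placeholder struct, and the `cachelib_enabled` field is a hypothetical example of a preserved setting.

```rust
use std::ops::Deref;

/// Minimal stand-in: either a reference to a value or the value itself.
pub enum MaybeOwned<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

impl<'a, T> Deref for MaybeOwned<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            MaybeOwned::Borrowed(r) => *r,
            MaybeOwned::Owned(v) => v,
        }
    }
}

/// Placeholder for the parsed command line matches.
pub struct ArgMatches {
    pub values: Vec<String>,
}

/// Wraps the matches together with settings captured at app build time,
/// so they do not have to be passed in a second time.
pub struct MononokeMatches<'a> {
    matches: MaybeOwned<'a, ArgMatches>,
    cachelib_enabled: bool,
}

impl<'a> MononokeMatches<'a> {
    /// The usual case: matches borrowed from the app.
    pub fn from_ref(matches: &'a ArgMatches, cachelib_enabled: bool) -> Self {
        MononokeMatches { matches: MaybeOwned::Borrowed(matches), cachelib_enabled }
    }

    /// The test case: matches built from an explicit argument list and owned.
    pub fn from_owned(matches: ArgMatches, cachelib_enabled: bool) -> Self {
        MononokeMatches { matches: MaybeOwned::Owned(matches), cachelib_enabled }
    }

    pub fn matches(&self) -> &ArgMatches {
        &self.matches
    }

    pub fn cachelib_enabled(&self) -> bool {
        self.cachelib_enabled
    }
}
```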
Reviewed By: krallin
Differential Revision: D24788450
fbshipit-source-id: aad5fff2edda305177dcefa4b3a98ab99bc2d811