Summary:
Bounded traversal's internal book-keeping moves the futures returned from fold and unfold callbacks around while they are being queued to be scheduled. If these futures are large, then this can result in a significant portion of bounded traversal's CPU time being spent on `memcpy`ing these futures around.
This can be prevented by always boxing the futures that are returned to bounded traversal. Make this a requirement by changing the type from `impl Future<...>` to `BoxFuture<...>`.
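A minimal std-only sketch of the effect (the sizes and function names here are illustrative, not bounded_traversal's actual code): a future that captures a large buffer is itself that large, so every move of it copies all those bytes, while a boxed future moves only a fat pointer.

```rust
use std::future::Future;
use std::mem::{size_of, size_of_val};
use std::pin::Pin;

// Stand-in for a fold/unfold callback result that captures a large
// buffer: the future's own size includes that buffer.
fn big_future() -> impl Future<Output = usize> {
    let buf = [0u8; 4096];
    async move { buf.len() }
}

// Boxing erases the concrete type; what gets queued and moved around
// is now a fat pointer, regardless of the captured state's size.
fn boxed_big_future() -> Pin<Box<dyn Future<Output = usize> + Send>> {
    Box::pin(big_future())
}

fn main() {
    let f = big_future();
    let b = boxed_big_future();
    // The unboxed future is at least as large as its captured buffer...
    assert!(size_of_val(&f) >= 4096);
    // ...while the boxed one is two words (data pointer + vtable pointer).
    assert!(size_of_val(&b) <= 2 * size_of::<usize>());
}
```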
Reviewed By: mitrandir77
Differential Revision: D26997706
fbshipit-source-id: 23a3583adc23c4e7d3607a78e82fc9d1056691c3
Summary:
Deleted files manifests are immutable, and their maps are stored in sorted
order, so we can use `SortedVectorMap` to load them more efficiently.
During derivation, maps are constructed out of order, so BTreeMap continues
to be used there.
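A minimal std-only sketch of the idea behind `SortedVectorMap` (the type and method names here are illustrative, not the real crate's API): when the input is already sorted, loading is a single O(n) collect with no per-entry tree rebalancing, and lookups are a binary search.

```rust
// Illustrative stand-in for `SortedVectorMap`: a sorted Vec of pairs.
struct SortedVecMap<K: Ord, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord, V> SortedVecMap<K, V> {
    // Assumes `sorted` is already in ascending key order, as the
    // stored maps are; this is one linear pass and one allocation.
    fn from_sorted_iter(sorted: impl IntoIterator<Item = (K, V)>) -> Self {
        SortedVecMap { entries: sorted.into_iter().collect() }
    }

    // Lookup is a binary search over the contiguous vector.
    fn get(&self, key: &K) -> Option<&V> {
        self.entries
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }
}

fn main() {
    let m = SortedVecMap::from_sorted_iter(vec![(1, "a"), (2, "b"), (3, "c")]);
    assert_eq!(m.get(&2), Some(&"b"));
    assert_eq!(m.get(&4), None);
}
```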
Reviewed By: mitrandir77
Differential Revision: D26275362
fbshipit-source-id: 0d159f4f53ff3ec345ac9bd34a70018e151d4e37
Summary:
BonsaiChangesets are rarely mutated, and their maps are stored in sorted order,
so we can use `SortedVectorMap` to load them more efficiently.
In the cases where mutable maps of filechanges are needed, we can use `BTreeMap`
during the mutation and then convert them to `SortedVectorMap` to store them.
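The mutate-then-freeze pattern looks roughly like this (sketched with a plain sorted `Vec` standing in for `SortedVectorMap`; since `BTreeMap` iterates in key order, the conversion is a linear walk):

```rust
use std::collections::BTreeMap;

fn main() {
    // Mutate out of order in a BTreeMap...
    let mut changes = BTreeMap::new();
    changes.insert("b/file2", 2);
    changes.insert("a/file1", 1);

    // ...then flatten into a sorted vector for storage: BTreeMap
    // iteration is already in key order, so no sort is needed.
    let stored: Vec<(&str, i32)> = changes.into_iter().collect();
    assert!(stored.windows(2).all(|w| w[0].0 < w[1].0));
    assert_eq!(stored, vec![("a/file1", 1), ("b/file2", 2)]);
}
```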
Reviewed By: mitrandir77
Differential Revision: D25615279
fbshipit-source-id: 796219c1130df5cb025952bb61002e8d2ae898f4
Summary:
Unodes are immutable, and their maps are stored in sorted order, so we can
use `SortedVectorMap` to load them more efficiently.
Reviewed By: mitrandir77
Differential Revision: D25615276
fbshipit-source-id: ff8c93e8cde50aa7d3565550c0eade38b67017ca
Summary:
ChangesetInfo is immutable and its maps are stored in sorted order, so we
can use `SortedVectorMap` to load them more efficiently.
Reviewed By: mitrandir77
Differential Revision: D25615278
fbshipit-source-id: 2078adeddc8bd049985fb4efafe3090858a1bc0a
Summary:
Thrift maps are stored in sorted order, so we can use `SortedVectorMap` to
load them more efficiently.
Reviewed By: mitrandir77
Differential Revision: D25615277
fbshipit-source-id: 509a9d63df6a6a793a4b0b2935e9ed86e7c6eaaf
Summary: Rename this benchmark to a specific name so that we can add new benchmarks.
Differential Revision: D26947362
fbshipit-source-id: a1d060ee79781aa0ead51f284517471431418659
Summary:
When generating hg changesets from git changesets we can run into a situation
where two git commits are almost completely identical except for the
committer field. Currently this results in us generating identical Mercurial
changesets for different bonsai changesets, and this is something we do not
allow.
Let's do the same thing hggit does, i.e. record the committer field in extras if
the committer is different from the author.
In case of ambiguities (i.e. committer extra is already set, or committer date is set but committer name is not) I opted for returning an error instead of trying to guess what the correct field should be.
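The extras convention can be sketched as follows (hypothetical helper name; Mononoke's real serialization of the extra may differ):

```rust
// Record a "committer" extra only when committer and author disagree,
// mirroring the hggit convention described above.
fn committer_extra(author: &str, committer: &str) -> Option<(String, String)> {
    if committer != author {
        Some(("committer".to_string(), committer.to_string()))
    } else {
        None
    }
}

fn main() {
    // Identical author and committer: no extra, hashes unaffected.
    assert_eq!(committer_extra("Alice <a@x>", "Alice <a@x>"), None);
    // Differing committer: recorded, so the two otherwise-identical
    // commits now produce distinct Mercurial changesets.
    assert_eq!(
        committer_extra("Alice <a@x>", "Bob <b@x>"),
        Some(("committer".to_string(), "Bob <b@x>".to_string()))
    );
}
```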
Reviewed By: farnz
Differential Revision: D26867740
fbshipit-source-id: 6271e2f7ad421bec296c60ff211723c2162878c6
Summary:
AsyncVfs provides an async VFS interface.
It will be used in native checkout, replacing the current approach of spawning blocking tokio tasks for VFS actions.
Reviewed By: quark-zju
Differential Revision: D26801250
fbshipit-source-id: bb26c4fc8acac82f4b55bb3f2f3964a6d0b64014
Summary:
This diff adds a layer of indirection between fbinit and tokio, thus allowing
us to use fbinit with tokio 0.2 or tokio 1.x.
The way this works is that you specify the Tokio you want by adding it as an
extra dependency alongside `fbinit` in your `TARGETS` (before this, you had to
always include `tokio-02`).
If you use `fbinit-tokio`, then `#[fbinit::main]` and `#[fbinit::test]` get you
a Tokio 1.x runtime, whereas if you use `fbinit-tokio-02`, you get a Tokio 0.2
runtime.
This diff is big, because it needs to change all the TARGETS that reference
this in the same diff that introduces the mechanism. I also didn't produce it
by hand.
Instead, I scripted the transformation using this script: P242773846
I then ran it using:
```
{ hg grep -l "fbinit::test"; hg grep -l "fbinit::main" } | \
sort | \
uniq | \
xargs ~/codemod/codemod.py \
&& yes | arc lint \
&& common/rust/cargo_from_buck/bin/autocargo
```
Finally, I grabbed the files returned by `hg grep`, then fed them to:
```
arc lint-rust --paths-from ~/files2 --apply-patches --take RUSTFIXDEPS
```
(I had to modify the file list a bit: notably I removed stuff from scripts/ because
some of that causes Buck to crash when running lint-rust, and I also had to add
fbcode/ as a prefix everywhere).
Reviewed By: mitrandir77
Differential Revision: D26754757
fbshipit-source-id: 326b1c4efc9a57ea89db9b1d390677bcd2ab985e
Summary:
Previously we were just logging whether derivation was successful or not.
Let's start logging the error as well.
Reviewed By: krallin
Differential Revision: D26750165
fbshipit-source-id: 03983cfa7bfbd5aeeed27ffac5a7df96c05e0967
Summary:
With the upcoming rollout of C++ Manifold client, it's useful to know which blobstores are which in our Scuba log, `mononoke_blobstore_trace`.
Pass a debug name down to help identify them.
Reviewed By: krallin
Differential Revision: D26673162
fbshipit-source-id: e7c16ad217d8daf21565939a45ac82204459055a
Summary:
For dependencies, V2 puts "version" as the first attribute of the dependency, or just after "package" if present.
The workspace section comes after the patch section in V2, and since V2 autoformats the patch section, the manual entries from third-party/rust/Cargo.toml had to be formatted manually, since V1 takes them as-is.
The thrift files are now marked "generated by autocargo", not just "generated", on their first line. This diff also removes some previously generated thrift files that were incorrectly left behind when the corresponding Cargo.toml was removed.
Reviewed By: ikostia
Differential Revision: D26618363
fbshipit-source-id: c45d296074f5b0319bba975f3cb0240119729c92
Summary: Reordered some fields in Cargo.toml, added `test = false` and `doctest = false`, included the name of the target that generated the Cargo.toml file, and sorted the cratemap.
Reviewed By: ahornby
Differential Revision: D26581275
fbshipit-source-id: 4c363369438c72d43d8ccf4799f103ff092457cc
Summary:
The on demand update code we have is the most basic logic that we could have.
The main problem is that it has long and redundant write locks. This change
reduces the write lock strictly to the section that has to update the in memory
IdDag.
Updating the Dag has 3 phases:
* loading the data that is required for the update;
* updating the IdMap;
* updating the IdDag;
The Dag can function well for serving requests as long as the commits involved
have been built so we want to have easy read access to both the IdMap and the
IdDag. The IdMap is a very simple structure and because it's described as an
Arc<dyn IdMap> we push the update locking logic to the storage. The IdDag is a
complicated structure that we ask to update itself. Those functions take
mutable references. Updating the storage of the iddag to hide the complexities
of locking is more difficult. We deal with the IdDag directly by wrapping it in
a RwLock. The RwLock allows for easy read access which we expect to be the
predominant access pattern.
Updates to the dag are not completely stable so racing updates can have
conflicting results. In case of conflicts one of the update processes would have
to restart. It's easier to reason about the process if we just allow one
"thread" to start an update process. The update process is locked by a sync
mutex. The "threads" that lose the race to update are asked to wait until the
ongoing update is complete. The waiters will poll on a shared future that
tracks the ongoing dag update. After the update is complete the waiters will go
back to checking if the data they have is available in the dag. It is possible
that the dag is updated in between determining that an update is needed and
acquiring the ongoing_update lock. This is fine because the update building
process checks the state of the dag beforehand and updates only what is
necessary, if anything.
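A much-simplified, single-threaded sketch of the locking scheme above (a plain mutex guard stands in for the shared future, and a `Vec<u64>` for the IdDag; all names are illustrative):

```rust
use std::sync::{Mutex, RwLock};

struct Dag {
    // Read-mostly structure: many readers, one short write per update.
    iddag: RwLock<Vec<u64>>,
    // Only one "thread" may run the update process at a time.
    ongoing_update: Mutex<()>,
}

impl Dag {
    fn contains(&self, id: u64) -> bool {
        self.iddag.read().unwrap().contains(&id)
    }

    fn ensure(&self, id: u64) {
        if self.contains(id) {
            return;
        }
        let _guard = self.ongoing_update.lock().unwrap();
        // Re-check: the dag may have been updated between our first
        // check and acquiring the lock; update only what is missing.
        if !self.contains(id) {
            // Loading data and updating the IdMap would happen here,
            // outside the write lock; only the in-memory IdDag update
            // holds it, keeping the write section short.
            self.iddag.write().unwrap().push(id);
        }
    }
}

fn main() {
    let dag = Dag { iddag: RwLock::new(vec![1, 2]), ongoing_update: Mutex::new(()) };
    assert!(!dag.contains(3));
    dag.ensure(3);
    dag.ensure(3); // idempotent: second call re-checks and returns early
    assert!(dag.contains(3));
    assert_eq!(dag.iddag.read().unwrap().len(), 3);
}
```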
Reviewed By: krallin
Differential Revision: D26508430
fbshipit-source-id: cd3bceed7e0ffb00aee64433816b5a23c0508d3c
Summary:
The earlier diffs in this stack have removed all our dependencies on the Tokio
0.1 runtime environment (so, basically, `tokio-executor` and `tokio-timer`), so
we don't need this anymore.
We do still have some deps on `tokio-io`, but this is just traits + helpers,
so this doesn't actually prevent us from removing the 0.1 runtime!
Note that we still have a few transitive dependencies on Tokio 0.1:
- async-unit uses tokio-compat
- hg depends on tokio-compat too, and we depend on it in tests
This isn't the end of the world though, we can live with that :)
Reviewed By: ahornby
Differential Revision: D26544410
fbshipit-source-id: 24789be2402c3f48220dcaad110e8246ef02ecd8
Summary:
This was a thing that was only ever used in Mononoke, and we don't think it's
usable and haven't been using it. Let's get rid of it. As-is, it won't even work
for most people due to its (indirect) dependency on Tokio 0.1.
Reviewed By: StanislavGlebik
Differential Revision: D26512243
fbshipit-source-id: faa16683f2adb20dfba43c4768486b982bc02ff9
Summary:
The changes (and fixes) needed were:
- Ignore rules that are not rust_library or thrift_library (previously only rust_bindgen_library was ignored, so binary and test dependencies were incorrectly added to Cargo.toml)
- Thrift package name to match escaping logic of `tools/build_defs/fbcode_macros/build_defs/lib/thrift/rust.bzl`
- Rearrange some attributes, like features, authors, edition etc.
- Authors to use " instead of '
- Features to be sorted
- Sort all dependencies as one instead of grouping third party and fbcode dependencies together
- Manually format certain entries from third-party/rust/Cargo.toml, since V2 formats third party dependency entries and V1 just takes them as is.
Reviewed By: zertosh
Differential Revision: D26544150
fbshipit-source-id: 19d98985bd6c3ac901ad40cff38ee1ced547e8eb
Summary:
Autocargo V2 will use a more structured format for the autocargo field;
with the help of the `cargo_toml` crate it will be easy to deserialize and
handle.
Also, the "include" field is apparently obsolete, as it is used for cargo-publish (see https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields). From what I know this might often be wrong, especially if someone tries to publish a package from fbcode, where the private facebook folders might be shipped. Let's just not set it; in the new system one will be able to set it explicitly via an autocargo parameter on a rule.
Reviewed By: ahornby
Differential Revision: D26339606
fbshipit-source-id: 510a01a4dd80b3efe58a14553b752009d516d651
Summary: Limit durations in derived data logging to two decimal places.
Reviewed By: krallin
Differential Revision: D26311139
fbshipit-source-id: 0f317d9c15c8425ad0591cbe7f07063c74e081b7
Summary:
For fsnodes and skeleton manifests it should be possible to allow gaps in the
commits that are backfilled. Any access to a commit in one of these gaps can
be quickly derived from the nearest ancestor that is derived. Since each
commit's derived data is independent, there is no loss from omitting them.
Add the `--gap-size` option to `backfill_derived_data`. This allows `fsnodes`,
`skeleton_manifests` and other derived data types that implement it in the
future to skip some of the commits during backfill. This will speed up
backfilling and reduce the amount of data that is stored for infrequently
accessed historical commits.
Reviewed By: StanislavGlebik
Differential Revision: D25997854
fbshipit-source-id: bf4df3f5c16a913c13f732f6837fc4c817664e29
Summary:
Lots of generated code in this diff. Only code change was in
`common/rust/cargo_from_buck/lib/cargo_generator.py`.
Path/git-only dependencies (ie `mydep = { path = "../foo/bar" }`) are not
publishable to crates.io. However, we are allowed to specify both a path/git
_and_ a version. When building locally, the path/git is chosen. When publishing,
the version on crates.io is chosen.
See https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations .
Note that I understand that not all autocargo projects are published on crates.io (yet).
The point of this diff is to allow projects to slowly start getting uploaded.
The end goal is autocargo generated `Cargo.toml`s that can be `cargo publish`ed
without further modification.
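The resulting dependency entry looks roughly like this (the crate name, path, and version below are placeholders extending the example above):

```toml
[dependencies]
# Local builds resolve the in-repo path; `cargo publish` strips the
# path and falls back to the crates.io version.
mydep = { path = "../foo/bar", version = "0.1" }
```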
Reviewed By: lukaspiatkowski
Differential Revision: D26028982
fbshipit-source-id: f7b4c9d4f4dd004727202bd98ab10e201a21e88c
Summary:
When we tried to update to Tokio 0.2.14, we hit lots of hangs. Those were due
to incompatibilities between Tokio 0.2.14 and Futures 1.29. We fixed some of
the bugs (and others had been fixed and were pending a release), and Futures
1.30 have now been released, which unblocks our update.
This diff updates Tokio accordingly (the previous diff in the stack fixes an
incompatibility).
The underlying motivation here is to ease the transition to Tokio 1.0.
Ultimately we'll be pulling in those changes one way or another, so let's
get started on this incremental first step.
Reviewed By: farnz
Differential Revision: D25952428
fbshipit-source-id: b753195a1ffb404e0b0975eb7002d6d67ba100c2
Summary:
This feature is useful for testing time-dependent stuff (e.g. it
allows you to stop/forward time). It's already included in the buck build.
Reviewed By: SkyterX
Differential Revision: D25946732
fbshipit-source-id: 5e7b69967a45e6deaddaac34ba78b42d2f2ad90e
Summary:
override_ctx() sets a different SessionClass while data is derived. This should
reduce the number of entries we write to blobstore sync queue. See previous
diff for more motivation.
This diff uses override_ctx() for batch_derive() method
Reviewed By: krallin
Differential Revision: D25910465
fbshipit-source-id: b8a3e729c059cad5716b1b09bd2f1cc618273627
Summary:
We had issues with mononoke writing too much blobstore sync queue entries while
deriving data for large commits. We've contemplated a few solutions, and
decided to give this one a go.
This approach forces derive data to use Background SessionClass which has an
effect of not writing data to blobstore sync queue if the write to blobstore
was successful (it still writes data to the queue otherwise). This should
reduce the number of entries we write to the blobstore sync queue
significantly. The downside is that writes might get a bit slower - our
assumption is that this slowdown is acceptable. If that's not the case we can
always disable this option.
This diff overrides SessionClass for normal ::derive() method. However there's
also batch_derive() - this one will be addressed in the next diff.
One thing to note - we still write the derived data mapping to the blobstore sync queue. That should be fine, as we have a constant number of writes per commit.
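The intended behavior can be sketched as follows (the SessionClass variants and queue API here are illustrative, not Mononoke's real types):

```rust
#[derive(PartialEq, Clone, Copy)]
enum SessionClass {
    UserRequest,
    Background,
}

// Background sessions only enqueue to the sync queue when the direct
// blobstore write failed; other sessions enqueue unconditionally.
fn blobstore_put(
    class: SessionClass,
    write_ok: bool,
    sync_queue: &mut Vec<&'static str>,
    key: &'static str,
) {
    if !write_ok || class == SessionClass::UserRequest {
        sync_queue.push(key);
    }
}

fn main() {
    let mut queue = Vec::new();
    // Successful background write: no sync queue entry.
    blobstore_put(SessionClass::Background, true, &mut queue, "k1");
    assert!(queue.is_empty());
    // Failed background write still lands on the queue, as does a
    // normal user-session write.
    blobstore_put(SessionClass::Background, false, &mut queue, "k2");
    blobstore_put(SessionClass::UserRequest, true, &mut queue, "k3");
    assert_eq!(queue, vec!["k2", "k3"]);
}
```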
Reviewed By: krallin
Differential Revision: D25910464
fbshipit-source-id: 4113d00bc0efe560fd14a5d4319b743d0a100dfa
Summary:
In this diff I've replaced the non-transparent error definition `error("{0}")` with `error(transparent)`.
The reason is that non-transparent errors print the same message as the original error, duplicating it in the chain:
```
Error: failed to complete task
Caused by: other error description <-- this duplicate shouldn't be here
Caused by: other error description
```
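A std-only sketch of roughly what the two `thiserror` attribute forms expand to (the manual impls here are illustrative; the real derive generates equivalent code):

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct Inner;
impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "other error description")
    }
}
impl Error for Inner {}

// Roughly `error("{0}")`: Display repeats the inner message while
// `source()` still returns the inner error, so the same text shows
// up twice in a "Caused by" chain.
#[derive(Debug)]
struct Opaque(Inner);
impl fmt::Display for Opaque {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl Error for Opaque {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

// Roughly `error(transparent)`: both Display and `source()` are
// forwarded to the inner error, so no link in the chain duplicates
// another.
#[derive(Debug)]
struct Transparent(Inner);
impl fmt::Display for Transparent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}
impl Error for Transparent {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.0.source()
    }
}

fn main() {
    let opaque = Opaque(Inner);
    // The duplicate: the wrapper's message equals its source's message.
    assert_eq!(opaque.to_string(), opaque.source().unwrap().to_string());

    let transparent = Transparent(Inner);
    // Transparent forwarding: same message, no extra chain link.
    assert_eq!(transparent.to_string(), "other error description");
    assert!(transparent.source().is_none());
}
```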
Reviewed By: krallin
Differential Revision: D25899411
fbshipit-source-id: e586af86b635a7e2fbf8952297171c546b859300
Summary:
Depending on the thrift definition, `thrift_library` targets may also depend on `ref-cast`.
Add this to the `Cargo.toml`.
Reviewed By: lukaspiatkowski
Differential Revision: D25636872
fbshipit-source-id: 8263395db2bb31127528f5c66c4cc5dd9180d89f
Summary:
Like it says in the title. This is nice to do because we had old futures
wrapping new futures here, so this lets us get rid of a lot of cruft.
Reviewed By: ahornby
Differential Revision: D25502648
fbshipit-source-id: a34973b32880d859b25dcb6dc455c42eec4c2f94
Summary: Convert all BlobRepoHg methods to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25471540
fbshipit-source-id: c8e99509d39d0e081d082097cbd9dbfca431637e
Summary:
ChangesetInfo derivation doesn't depend on the parent ChangesetInfo being
available, so we can derive them in batches very easily.
Reviewed By: krallin
Differential Revision: D25470721
fbshipit-source-id: cc8ce305990eb6c9846158f0e9e3917cf35e169d
Summary:
Add documentation comments to the derived data crate to describe how they fit
together.
Reviewed By: krallin
Differential Revision: D25432449
fbshipit-source-id: b62440bcecae900ad75d74245ce175bd9e07a894
Summary:
The old case-conflict checks were more lenient, and only triggered if a commit
introduced a case conflict compared to its first parent.
This means that commits could still be landed to bookmarks that already had
pre-existing case conflicts.
Relax the new case-conflict checks to allow this same scenario.
Note that we're still a bit more strict: the previous checks ignored other
parents, and would not reject a commit if the act of merging introduces a case
conflict. The new case conflict checks only permit case conflicts in the case
where all conflicting files were present in one of the parents.
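The relaxed rule can be sketched as a predicate (hypothetical helper; the real checks operate on manifests, not plain file lists):

```rust
// A case conflict in the new commit is permitted only if all of the
// conflicting files were already present together in some parent.
fn conflict_allowed(conflicting: &[&str], parents: &[Vec<&str>]) -> bool {
    parents
        .iter()
        .any(|p| conflicting.iter().all(|f| p.contains(f)))
}

fn main() {
    let conflicting = ["README", "readme"];
    // Pre-existing conflict in a parent: allowed.
    assert!(conflict_allowed(&conflicting, &[vec!["README", "readme"]]));
    // Conflict introduced by merging parents that each had only one
    // side of it: rejected (stricter than the old checks).
    assert!(!conflict_allowed(&conflicting, &[vec!["README"], vec!["readme"]]));
}
```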
Reviewed By: StanislavGlebik
Differential Revision: D25508845
fbshipit-source-id: 95f4db1300ee73b8e6495ba8b5c1c2ce5a957d1a
Summary: Spotted this in passing. Was able to remove a call to fetch_root_manifest_id.
Reviewed By: StanislavGlebik
Differential Revision: D25472678
fbshipit-source-id: d450cb97630464be13d22fb37c3356611dc2e1b6
Summary:
Now that the mapping is separated from BonsaiDerivable, it becomes clear where
batch derivation is incorrectly using the default mapping, rather than the
mapping that has been provided for batch-derivation.
This could mean, for example, if we are backfilling a v2 of these derived
data types, we could accidentally base them on a v1 parent that we obtained
from the default mapping.
Instead, ensure that we don't use `BonsaiDerived` and thus can't use the
default mapping.
Reviewed By: krallin
Differential Revision: D25371963
fbshipit-source-id: fb71e1f1c4bd7a112d3099e0e5c5c7111d457cd2
Summary:
The backfiller may read or write to the blobstore too quickly. Apply QPS
limits to the backfill batch context to keep the read or write rate acceptable.
Reviewed By: ahornby
Differential Revision: D25371966
fbshipit-source-id: 276bf2dd428f7f66f7472aabd9e943eec5733afe
Summary:
When fetching many derived data mappings, the use of `FuturesUnordered` means
we may fetch many blobs concurrently, which may overload the blobstore.
Switch to using `buffered` to reduce the number of concurrent blob fetches.
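A std-only toy model of the scheduling difference (hypothetical; the real code uses `futures` stream combinators): each step completes everything in flight, and we track the peak concurrency the blobstore would see.

```rust
// Simulate dispatching `total` fetches either all at once
// (FuturesUnordered-style) or with an in-flight cap (buffered(n)).
fn peak_in_flight(total: usize, cap: Option<usize>) -> usize {
    let mut pending = total;
    let mut peak = 0;
    while pending > 0 {
        let in_flight = match cap {
            Some(n) => pending.min(n), // buffered(n): at most n at once
            None => pending,           // unbounded: everything at once
        };
        peak = peak.max(in_flight);
        pending -= in_flight;
    }
    peak
}

fn main() {
    // 1000 mapping fetches: unbounded hits the blobstore with 1000
    // concurrent reads; buffered(100) never exceeds 100.
    assert_eq!(peak_in_flight(1000, None), 1000);
    assert_eq!(peak_in_flight(1000, Some(100)), 100);
}
```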
Reviewed By: ahornby
Differential Revision: D25371965
fbshipit-source-id: 30417e86bc33defbb821f214a5520ab1b8a8c18c
Summary:
Re-introduce parallel backfilling of changesets in a batch using `batch_derive`,
however keep it under the control of a flag, so we can enable or disable it as
necessary.
Reviewed By: ahornby
Differential Revision: D25401207
fbshipit-source-id: f9aeef3415be48fc03220c18fa547e05538ed479
Summary:
Change derived data config to have "enabled" config and "backfilling" config.
The `Mapping` object has the responsibility of encapsulating the configuration options
for the derived data type. Since it is only possible to obtain a `Mapping` from
appropriate configuration, ownership of a `Mapping` means derivation is permitted,
and so the `DeriveMode` enum is removed.
Most callers will use `BonsaiDerived::derive`, or a default `derived_data_utils` implementation
that requires the derived data to be enabled and configured on the repo.
Backfillers can additionally use `derived_data_utils_for_backfill` which will use the
`backfilling` configuration in preference to the default configuration.
Reviewed By: ahornby
Differential Revision: D25246317
fbshipit-source-id: 352fe6509572409bc3338dd43d157f34c73b9eac
Summary:
Currently, data derivation for types that have options (currently unode
version and blame filesize limit) take the value of the option from the
repository configuration.
This is a side-effect, and means it's not possible to have data derivation
types with different configs active in the same repository (e.g. to
serve unodes v1 while backfilling unodes v2). To have data derivation
with different options, e.g. in tests, we must use `repo.dangerous_override`.
The first step to resolve this is to make the data derivation options a parameter.
Depending on the type of derived data, these options are passed into
`derive_from_parents` so that the right kind of derivation can happen.
The mapping is responsible for storing the options and providing it at the time
of derivation. In this diff it just gets it from the repository config, the same
as was done previously. In a future diff we will change this so that there
can be multiple configurations.
Reviewed By: krallin
Differential Revision: D25371967
fbshipit-source-id: 1cf4c06a4598fccbfa93367fc1f1c2fa00fd8235
Summary: Take the parameters to `derived_data_utils` and `derived_data_utils_unsafe` by reference.
Reviewed By: krallin
Differential Revision: D25371970
fbshipit-source-id: d260650c2398e33667e1bc5779fbabdff04f1f98
Summary:
The `BonsaiDerived` trait is split in two:
* The new `BonsaiDerivable` trait encapsulates the process of deriving the data, either
a single item from its parents, or a batch.
* The `BonsaiDerived` trait is used only as an entry point for deriving with the default
mapping and config.
This split will allow us to use `BonsaiDerivable` in batch backfilling with non-default
config, for example when backfilling a new version of a derived data type.
Reviewed By: krallin
Differential Revision: D25371964
fbshipit-source-id: 5874836bc06c18db306ada947a690658bf89723c