Summary:
Convert `BlobRepo` to a `facet::container`. This will allow it to be built
from an appropriate facet factory.
This only changes the definition of the structure: we still use
`blobrepo_factory` to construct it. The main difference is in the types
of the attributes, which change from `Arc<dyn Trait>` to
`Arc<dyn Trait + Send + Sync + 'static>`, specified by the `ArcTrait` alias
generated by the `#[facet::facet]` macro.
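As a rough illustration of the type change, the alias can be written by hand as below. This is only a sketch: `BonsaiHgMapping`, `InMemoryMapping`, `Repo` and `build_repo` are hypothetical stand-ins, and the real alias is generated by the macro rather than written out.

```rust
use std::sync::Arc;

/// Hypothetical trait standing in for one of the repo attributes.
pub trait BonsaiHgMapping {
    fn name(&self) -> &'static str;
}

/// Hand-written equivalent of the alias shape described above.
pub type ArcBonsaiHgMapping = Arc<dyn BonsaiHgMapping + Send + Sync + 'static>;

/// Hypothetical implementation used only for this example.
pub struct InMemoryMapping;

impl BonsaiHgMapping for InMemoryMapping {
    fn name(&self) -> &'static str {
        "in-memory"
    }
}

/// Hypothetical container; not the real `BlobRepo` definition.
pub struct Repo {
    pub mapping: ArcBonsaiHgMapping,
}

/// Compile-time check that a type can be shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

pub fn build_repo() -> Repo {
    // This only compiles because the alias carries the Send + Sync bounds.
    assert_send_sync::<Repo>();
    Repo {
        mapping: Arc::new(InMemoryMapping),
    }
}
```

With a plain `Arc<dyn BonsaiHgMapping>` field the `assert_send_sync::<Repo>()` call would fail to compile, which is the practical difference between the two attribute types.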
Reviewed By: StanislavGlebik
Differential Revision: D27169437
fbshipit-source-id: 3496b6ee2f0d1e72a36c9e9eb9bd3d0bb7beba8b
Summary:
Allow using a custom accumulator inside gitimport so we can selectively decide what to save.
This was triggered mainly because we ran out of memory: previously the large BonsaiChangesets were always collected, even when they were not needed.
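A minimal sketch of the idea, assuming a hypothetical `Accumulator` trait (the real gitimport interface may look different). `KeepAll`, `CountOnly` and `import_all` are illustrative names only, and changesets are represented as raw bytes.

```rust
/// Hypothetical accumulator interface; the real gitimport trait may differ.
pub trait Accumulator {
    fn record(&mut self, git_sha: &str, changeset: Vec<u8>);
}

/// Keeps every changeset, which is the memory-hungry behaviour.
#[derive(Default)]
pub struct KeepAll {
    pub items: Vec<(String, Vec<u8>)>,
}

impl Accumulator for KeepAll {
    fn record(&mut self, git_sha: &str, changeset: Vec<u8>) {
        self.items.push((git_sha.to_string(), changeset));
    }
}

/// Keeps only a counter and drops the payload immediately.
#[derive(Default)]
pub struct CountOnly {
    pub seen: usize,
}

impl Accumulator for CountOnly {
    fn record(&mut self, _git_sha: &str, _changeset: Vec<u8>) {
        self.seen += 1;
    }
}

/// Stand-in for the import loop: feeds every imported item to the accumulator.
pub fn import_all<A: Accumulator>(acc: &mut A, commits: &[(&str, Vec<u8>)]) {
    for (sha, changeset) in commits {
        acc.record(sha, changeset.clone());
    }
}
```

The caller picks the accumulator, so a job that only needs progress can use something like `CountOnly` and never hold the changesets in memory.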
Reviewed By: krallin
Differential Revision: D27117686
fbshipit-source-id: 99ce33562e76470f91ff8c0c46391bd513801afa
Summary:
Handle (ignore) git-submodules in gitimport.
git-submodules are represented as ObjectType::Commit inside the tree. For now we do not support git-submodules, but we still need to import repositories that have submodules in them (just not synchronized), so we ignore any submodule for now.
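The filtering can be sketched as follows. The `ObjectType` enum and `TreeEntry` struct here are local stand-ins defined for the example, not the types of the git library that gitimport actually uses.

```rust
/// Local stand-in for the git object kinds that can appear in a tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjectType {
    Blob,
    Tree,
    Commit,
}

/// Hypothetical tree entry: a name plus the kind of object it points at.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TreeEntry {
    pub name: String,
    pub kind: ObjectType,
}

/// Drop submodule entries (those pointing at a commit) and keep the rest.
pub fn importable_entries(entries: Vec<TreeEntry>) -> Vec<TreeEntry> {
    entries
        .into_iter()
        .filter(|entry| entry.kind != ObjectType::Commit)
        .collect()
}
```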
Reviewed By: StanislavGlebik
Differential Revision: D26999625
fbshipit-source-id: eb32247d4ad0325ee433e21a516ac4a92469fd90
Summary:
BonsaiChangesets are rarely mutated, and their maps are stored in sorted order,
so we can use `SortedVectorMap` to load them more efficiently.
In the cases where mutable maps of filechanges are needed, we can use `BTreeMap`
during the mutation and then convert them to `SortedVectorMap` to store them.
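The mutate-then-freeze pattern looks roughly like this. `SortedVec` is a minimal stand-in written for the example and is not the real `SortedVectorMap` API; the file names and values in `example` are made up.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a sorted-vector map; not the real `SortedVectorMap`.
pub struct SortedVec<K, V>(Vec<(K, V)>);

impl<K: Ord, V> SortedVec<K, V> {
    /// A `BTreeMap` iterates in key order, so collecting keeps the data sorted.
    pub fn from_btree(map: BTreeMap<K, V>) -> Self {
        SortedVec(map.into_iter().collect())
    }

    /// Lookups use binary search over the sorted backing vector.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.0
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|index| &self.0[index].1)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Mutate in a `BTreeMap`, then freeze into the sorted vector for storage.
pub fn example() -> SortedVec<String, u32> {
    let mut changes = BTreeMap::new();
    changes.insert("b.txt".to_string(), 2);
    changes.insert("a.txt".to_string(), 1);
    SortedVec::from_btree(changes)
}
```

Loading already-sorted data into a vector is a single append pass, whereas a tree map has to build its node structure, which is where the efficiency gain is expected to come from.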
Reviewed By: mitrandir77
Differential Revision: D25615279
fbshipit-source-id: 796219c1130df5cb025952bb61002e8d2ae898f4
Summary:
Switch to using smallvec::SmallVec for the thrift types that contain small
amounts of binary data.
Reviewed By: farnz
Differential Revision: D26461215
fbshipit-source-id: 8751a14bebdac3ea59fbd95197aa489e9f8e1d87
Summary:
I ran into an interesting issue - git and Mononoke/mercurial store timezones
differently.
Git - From https://fburl.com/utwmsmcu:
```
Git internal format
It is <unix timestamp> <time zone offset>, where <unix timestamp> is the number of seconds since the UNIX epoch. <time zone offset> is a positive or negative offset from UTC. **For example CET (which is 1 hour ahead of UTC) is +0100.**
```
Note that CET (which is to the east of utc) is stored as +0100.
Hg - now from `hg help dates`
```
This is the internal representation format for dates. The first number is
the number of seconds since the epoch (1970-01-01 00:00 UTC). The second
is the offset of the local timezone, in seconds **west of UTC (negative if the timezone is east of UTC)**.
```
That means that CET will be stored as -0100, i.e. with a negative sign.
Mononoke - see https://fburl.com/diffusion/zf59f76j
We use FixedOffset::west_opt, and from docs (https://docs.rs/chrono/0.4.19/chrono/offset/struct.FixedOffset.html#method.west_opt)
```
Makes a new FixedOffset for the Western Hemisphere with given timezone difference. The negative secs means the Eastern Hemisphere.
Returns None on the out-of-bound secs.
```
So in order for mercurial and git to actually mean the same timezone, we need to multiply the offset by -1.
(Note that hggit seems to be doing the same thing - https://fburl.com/code/pgdj5f2s).
You might wonder why mercurial's "hg log" now outputs the same timezone value as git - it converts it before outputting (https://fburl.com/code/ltmc66a1).
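The sign flip can be sketched in plain Rust, without chrono, as below. Both function names are hypothetical; the only facts relied on are the two formats quoted above (git: `+HHMM` east of UTC, hg: seconds west of UTC).

```rust
/// Parse a git offset such as "+0100" into seconds east of UTC.
/// Returns None if the string is not of the form sign + four digits.
pub fn git_offset_to_secs_east(offset: &str) -> Option<i32> {
    if offset.len() != 5 || !offset.is_ascii() {
        return None;
    }
    let sign = match &offset[..1] {
        "+" => 1,
        "-" => -1,
        _ => return None,
    };
    if !offset[1..].bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let hours: i32 = offset[1..3].parse().ok()?;
    let minutes: i32 = offset[3..5].parse().ok()?;
    Some(sign * (hours * 3600 + minutes * 60))
}

/// Convert "seconds east of UTC" (git) into "seconds west of UTC" (hg).
pub fn secs_east_to_secs_west(secs_east: i32) -> i32 {
    -secs_east
}
```

For CET, git's `+0100` parses to 3600 seconds east, which becomes -3600 seconds west, the value that a `west_opt`-style constructor expects for a timezone east of UTC.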
Reviewed By: krallin
Differential Revision: D26848463
fbshipit-source-id: fbd8c370565f5b663b438d0c11bddf39d090a16b
Summary:
AsyncVfs provides async vfs interface.
It will be used in the native checkout instead of the current approach, which spawns blocking tokio tasks for VFS actions.
Reviewed By: quark-zju
Differential Revision: D26801250
fbshipit-source-id: bb26c4fc8acac82f4b55bb3f2f3964a6d0b64014
Summary: This is used in Metagit and I'd like to decouple those 2 Tokio 1.x migrations.
Reviewed By: HarveyHunt
Differential Revision: D26813352
fbshipit-source-id: 7bc34e1cad00c83bf66edce559b07104d44a7357
Summary:
Better GitRepo name in gitimport logs.
When several git repositories were handled by the same running instance, all of them were called `.git` in the logs, which was not that useful.
Examples (when no name explicitly provided and git-import derives a name from the path to the git repository):
path: `/home/matshakanson/.cache/git_cache/aosp/device/generic/goldfish`
Note that the opened path is resolved to `/home/matshakanson/.cache/git_cache/aosp/device/generic/goldfish/.git` due to not being a bare repository.
derived name previous revision: `.git`
derived name this revision: `/home/matshakanson/.cache/git_cache/aosp/device/generic/goldfish`
Now why not just remove the `.git` portion if present (to handle bare repositories)? Well, we often have a big tree of git repositories we operate on, and the final directory name in the tree is not unique. So instead we use the full path to remove any ambiguity.
Note that this is just the default naming derived if the caller does not specify their own name.
Reviewed By: krallin
Differential Revision: D26697498
fbshipit-source-id: 3de893406525f88556f5bcd87abe238b7f2d8929
Summary:
For dependencies, V2 puts "version" as the first attribute of a dependency, or just after "package" if present.
The workspace section comes after the patch section in V2. Since V2 autoformats the patch section while V1 takes it as it is, the manual entries in third-party/rust/Cargo.toml had to be formatted manually.
The thrift files should have "generated by autocargo", and not only "generated", on their first line. This diff also removes some previously generated thrift files that were incorrectly left behind when the corresponding Cargo.toml was removed.
Reviewed By: ikostia
Differential Revision: D26618363
fbshipit-source-id: c45d296074f5b0319bba975f3cb0240119729c92
Summary: Reordered some fields in Cargo.toml, added test and doctest = false, added the name of the target that generated the Cargo.toml file, and sorted the cratemap.
Reviewed By: ahornby
Differential Revision: D26581275
fbshipit-source-id: 4c363369438c72d43d8ccf4799f103ff092457cc
Summary:
The on demand update code we have is the most basic logic that we could have.
The main problem is that it has long and redundant write locks. This change
reduces the write lock strictly to the section that has to update the in memory
IdDag.
Updating the Dag has 3 phases:
* loading the data that is required for the update;
* updating the IdMap;
* updating the IdDag;
The Dag can function well for serving requests as long as the commits involved
have been built so we want to have easy read access to both the IdMap and the
IdDag. The IdMap is a very simple structure and because it's described as an
Arc<dyn IdMap> we push the update locking logic to the storage. The IdDag is a
complicated structure that we ask to update itself. Those functions take
mutable references. Updating the storage of the iddag to hide the complexities
of locking is more difficult. We deal with the IdDag directly by wrapping it in
a RwLock. The RwLock allows for easy read access which we expect to be the
predominant access pattern.
Updates to the dag are not completely stable so racing updates can have
conflicting results. In case of conflicts one of the update processes would have
to restart. It's easier to reason about the process if we just allow one
"thread" to start an update process. The update process is locked by a sync
mutex. The "threads" that fail the race to update are asked to wait until the
ongoing update is complete. The waiters will poll on a shared future that
tracks the ongoing dag update. After the update is complete the waiters will go
back to checking if the data they have is available in the dag. It is possible
that the dag is updated in between determining that an update is needed and
acquiring the ongoing_update lock. This is fine because the update building
process checks the state of the dag before updating it and updates only what is
necessary, if anything.
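The locking scheme can be sketched with blocking `std::sync` primitives as below. This is a simplification under stated assumptions: a `BTreeSet<u64>` stands in for the IdDag, the IdMap phase is omitted, the "load" phase is faked, and waiters block on the mutex instead of polling a shared future. `OnDemandDag` and `ensure` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::sync::{Mutex, RwLock};

/// Hypothetical wrapper: a set of ids stands in for the real IdDag.
pub struct OnDemandDag {
    dag: RwLock<BTreeSet<u64>>,
    ongoing_update: Mutex<()>,
}

impl OnDemandDag {
    pub fn new() -> Self {
        OnDemandDag {
            dag: RwLock::new(BTreeSet::new()),
            ongoing_update: Mutex::new(()),
        }
    }

    /// Cheap read access, expected to be the predominant access pattern.
    pub fn contains(&self, id: u64) -> bool {
        self.dag.read().unwrap().contains(&id)
    }

    /// Make sure `id` is in the dag. Returns true if this call did the update.
    pub fn ensure(&self, id: u64) -> bool {
        if self.contains(id) {
            return false;
        }
        // Only one "thread" runs an update at a time; the others wait here.
        let _guard = self.ongoing_update.lock().unwrap();
        // The dag may have been updated while we waited, so check again.
        if self.contains(id) {
            return false;
        }
        // Phase 1: load the required data without holding the write lock.
        let loaded = vec![id];
        // Final phase: take the write lock only for the in-memory update.
        self.dag.write().unwrap().extend(loaded);
        true
    }
}
```

Readers are blocked only for the duration of the final `extend`, which mirrors the goal of reducing the write lock strictly to the in-memory update.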
Reviewed By: krallin
Differential Revision: D26508430
fbshipit-source-id: cd3bceed7e0ffb00aee64433816b5a23c0508d3c
Summary:
The earlier diffs in this stack have removed all our dependencies on the Tokio
0.1 runtime environment (so, basically, `tokio-executor` and `tokio-timer`), so
we don't need this anymore.
We do still have some deps on `tokio-io`, but this is just traits + helpers,
so this doesn't actually prevent us from removing the 0.1 runtime!
Note that we still have a few transitive dependencies on Tokio 0.1:
- async-unit uses tokio-compat
- hg depends on tokio-compat too, and we depend on it in tests
This isn't the end of the world though, we can live with that :)
Reviewed By: ahornby
Differential Revision: D26544410
fbshipit-source-id: 24789be2402c3f48220dcaad110e8246ef02ecd8
Summary:
The changes (and fixes) needed were:
- Ignore rules that are not rust_library or thrift_library (previously we only ignored rust_bindgen_library, so binary and test dependencies were incorrectly added to Cargo.toml)
- Thrift package name to match escaping logic of `tools/build_defs/fbcode_macros/build_defs/lib/thrift/rust.bzl`
- Rearrange some attributes, like features, authors, edition etc.
- Authors to use " instead of '
- Features to be sorted
- Sort all dependencies as one instead of grouping third party and fbcode dependencies together
- Manually format certain entries from third-party/rust/Cargo.toml, since V2 formats third party dependency entries and V1 just takes them as is.
Reviewed By: zertosh
Differential Revision: D26544150
fbshipit-source-id: 19d98985bd6c3ac901ad40cff38ee1ced547e8eb
Summary:
krallin noted that this test takes 60 seconds to finish. This happens because
of r2d2 connection timeout - by default it's 30 seconds, and
fast_validate_repository test fails while we are trying to get git connection.
Instead of failing immediately it waits for 30 seconds (I looked into the r2d2
implementation - when trying to get a connection it waits on a condition variable which never gets
notified because of a failure).
To mitigate the issue let's return Result<..., Error> now.
Reviewed By: krallin
Differential Revision: D26462872
fbshipit-source-id: 93bb38da20459dca4e3737352ab3b638e8e88a84
Summary:
Autocargo V2 will use a more structured format for the autocargo field;
with the help of the `cargo_toml` crate it will be easy to deserialize and handle
it.
Also the "include" field is apparently obsolete as it is used for cargo-publish (see https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields). From what I know this might often be wrong, especially if someone tries to publish a package from fbcode, in which case the private facebook folders might be shipped. Let's just not set it; in the new system one will be able to set it explicitly via an autocargo parameter on a rule.
Reviewed By: ahornby
Differential Revision: D26339606
fbshipit-source-id: 510a01a4dd80b3efe58a14553b752009d516d651
Summary:
For some repositories (e.g. configerator-sitevars) running "git gc" is too
slow. At the same time after initial rsync it would be good to run at least
some kind of validation to check that repo is not completely corrupt.
This diff adds that. This code checks that the objects referenced by HEAD and
100 of its ancestors are reachable.
Reviewed By: krallin
Differential Revision: D26423769
fbshipit-source-id: a3fcd9fc5c30e5bf0fdc1cd0fb9e03bdc2e1371d
Summary:
Add a command to import the whole git tree as a single bonsai changeset. This
command may be useful if we don't need the whole git history.
Reviewed By: krallin
Differential Revision: D26076645
fbshipit-source-id: 21b712776af1906ca7b06af088d98848bab907b8
Summary: I'm going to need this functionality in the next diff, so let's move it.
Reviewed By: krallin
Differential Revision: D26105565
fbshipit-source-id: bc31242713a4a37a3a48e17f6aae7690da7087f3
Summary:
Let's use this function since it removes copy-paste and adds additional checks
- for example it checks that the parents are in the blobstore.
Reviewed By: krallin
Differential Revision: D26079176
fbshipit-source-id: 9cd9bd170b929fa66c691432417a952ce11028ab
Summary:
Lots of generated code in this diff. Only code change was in
`common/rust/cargo_from_buck/lib/cargo_generator.py`.
Path/git-only dependencies (ie `mydep = { path = "../foo/bar" }`) are not
publishable to crates.io. However, we are allowed to specify both a path/git
_and_ a version. When building locally, the path/git is chosen. When publishing,
the version on crates.io is chosen.
See https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations .
Note that I understand that not all autocargo projects are published on crates.io (yet).
The point of this diff is to allow projects to slowly start getting uploaded.
The end goal is autocargo generated `Cargo.toml`s that can be `cargo publish`ed
without further modification.
Reviewed By: lukaspiatkowski
Differential Revision: D26028982
fbshipit-source-id: f7b4c9d4f4dd004727202bd98ab10e201a21e88c
Summary:
When we tried to update to Tokio 0.2.14, we hit lots of hangs. Those were due
to incompatibilities between Tokio 0.2.14 and Futures 1.29. We fixed some of
the bugs (and others had been fixed and were pending a release), and Futures
1.30 has now been released, which unblocks our update.
This diff updates Tokio accordingly (the previous diff in the stack fixes an
incompatibility).
The underlying motivation here is to ease the transition to Tokio 1.0.
Ultimately we'll be pulling in those changes one way or another, so let's
get started on this incremental first step.
Reviewed By: farnz
Differential Revision: D25952428
fbshipit-source-id: b753195a1ffb404e0b0975eb7002d6d67ba100c2
Summary:
This feature is useful for testing time-dependent stuff (e.g. it
allows you to stop/forward time). It's already included in the buck build.
Reviewed By: SkyterX
Differential Revision: D25946732
fbshipit-source-id: 5e7b69967a45e6deaddaac34ba78b42d2f2ad90e
Summary:
We had issues with mononoke writing too many blobstore sync queue entries while
deriving data for large commits. We've contemplated a few solutions, and
decided to give this one a go.
This approach forces derive data to use Background SessionClass which has an
effect of not writing data to blobstore sync queue if the write to blobstore
was successful (it still writes data to the queue otherwise). This should
reduce the number of entries we write to the blobstore sync queue
significantly. The downside is that writes might get a bit slower - our
assumption is that this slowdown is acceptable. If that's not the case we can
always disable this option.
This diff overrides SessionClass for normal ::derive() method. However there's
also batch_derive() - this one will be addressed in the next diff.
One thing to note - we still write derived data mapping to blobstore sync queue. That should be fine as we have a constant number of writes per commit.
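The decision described above boils down to something like the sketch below. The `should_enqueue` helper is hypothetical, and the behaviour shown for the non-background class (always enqueue) is an assumption inferred from the description rather than taken from the code.

```rust
/// Session classes relevant to this sketch; the variant set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionClass {
    UserWaiting,
    Background,
}

/// Decide whether a write should also add a blobstore sync queue entry.
pub fn should_enqueue(class: SessionClass, write_succeeded: bool) -> bool {
    match class {
        // Assumed default behaviour: always record the write in the queue.
        SessionClass::UserWaiting => true,
        // Background: only fall back to the queue when the write failed.
        SessionClass::Background => !write_succeeded,
    }
}
```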
Reviewed By: krallin
Differential Revision: D25910464
fbshipit-source-id: 4113d00bc0efe560fd14a5d4319b743d0a100dfa
Summary: Allow us to return arg parsing errors rather than panicking
Reviewed By: krallin
Differential Revision: D25837626
fbshipit-source-id: 87e39de140b1dcd3b13a529602fdafc31233175d
Summary:
Depending on the thrift definition, `thrift_library` targets may also depend on `ref-cast`.
Add this to the `Cargo.toml`.
Reviewed By: lukaspiatkowski
Differential Revision: D25636872
fbshipit-source-id: 8263395db2bb31127528f5c66c4cc5dd9180d89f
Summary: Convert `Changesets` trait and all its uses to new type futures
Reviewed By: krallin
Differential Revision: D25638875
fbshipit-source-id: 947423e2ee47a463861678b146641bcc6b899a4a
Summary:
Like it says in the title. This is nice to do because we had old futures
wrapping new futures here, so this lets us get rid of a lot of cruft.
Reviewed By: ahornby
Differential Revision: D25502648
fbshipit-source-id: a34973b32880d859b25dcb6dc455c42eec4c2f94
Summary: Convert all BlobRepoHg methods to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25471540
fbshipit-source-id: c8e99509d39d0e081d082097cbd9dbfca431637e
Summary:
Change derived data config to have "enabled" config and "backfilling" config.
The `Mapping` object has the responsibility of encapsulating the configuration options
for the derived data type. Since it is only possible to obtain a `Mapping` from
appropriate configuration, ownership of a `Mapping` means derivation is permitted,
and so the `DeriveMode` enum is removed.
Most callers will use `BonsaiDerived::derive`, or a default `derived_data_utils` implementation
that requires the derived data to be enabled and configured on the repo.
Backfillers can additionally use `derived_data_utils_for_backfill` which will use the
`backfilling` configuration in preference to the default configuration.
Reviewed By: ahornby
Differential Revision: D25246317
fbshipit-source-id: 352fe6509572409bc3338dd43d157f34c73b9eac
Summary:
Currently, data derivation for types that have options (currently unode
version and blame filesize limit) takes the value of the option from the
repository configuration.
This is a side-effect, and means it's not possible to have data derivation
types with different configs active in the same repository (e.g. to
serve unodes v1 while backfilling unodes v2). To have data derivation
with different options, e.g. in tests, we must use `repo.dangerous_override`.
The first step to resolve this is to make the data derivation options a parameter.
Depending on the type of derived data, these options are passed into
`derive_from_parents` so that the right kind of derivation can happen.
The mapping is responsible for storing the options and providing it at the time
of derivation. In this diff it just gets it from the repository config, the same
as was done previously. In a future diff we will change this so that there
can be multiple configurations.
Reviewed By: krallin
Differential Revision: D25371967
fbshipit-source-id: 1cf4c06a4598fccbfa93367fc1f1c2fa00fd8235
Summary:
The `BonsaiDerived` trait is split in two:
* The new `BonsaiDerivable` trait encapsulates the process of deriving the data, either
a single item from its parents, or a batch.
* The `BonsaiDerived` trait is used only as an entry point for deriving with the default
mapping and config.
This split will allow us to use `BonsaiDerivable` in batch backfilling with non-default
config, for example when backfilling a new version of a derived data type.
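The shape of the split can be sketched with simplified, synchronous traits as below. The real traits are async and take a repo and context; every signature here, the `Options` associated type, and the `Unode` example type are assumptions made for illustration.

```rust
/// Simplified stand-in: how to derive one item from its parents.
pub trait BonsaiDerivable: Sized {
    type Options;

    fn derive_from_parents(changeset: u64, parents: &[Self], options: &Self::Options) -> Self;
}

/// Simplified stand-in: entry point that derives with the default config.
pub trait BonsaiDerived: BonsaiDerivable {
    fn default_options() -> Self::Options;

    fn derive(changeset: u64, parents: &[Self]) -> Self {
        Self::derive_from_parents(changeset, parents, &Self::default_options())
    }
}

/// Hypothetical derived data type whose only option is a version number.
#[derive(Debug, PartialEq)]
pub struct Unode {
    pub version: u32,
    pub depth: usize,
}

impl BonsaiDerivable for Unode {
    type Options = u32;

    fn derive_from_parents(_changeset: u64, parents: &[Self], options: &u32) -> Self {
        // Depth is one more than the deepest parent, or zero for a root.
        let depth = parents.iter().map(|p| p.depth + 1).max().unwrap_or(0);
        Unode { version: *options, depth }
    }
}

impl BonsaiDerived for Unode {
    fn default_options() -> u32 {
        1
    }
}
```

Regular callers go through `Unode::derive`, while a backfiller can call `Unode::derive_from_parents` directly with non-default options, for example version 2.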
Reviewed By: krallin
Differential Revision: D25371964
fbshipit-source-id: 5874836bc06c18db306ada947a690658bf89723c
Summary: Allows a binary to specify if the repo args are required on the command line, and if so whether OnlyOne or AtLeastOne is the requirement.
Reviewed By: farnz
Differential Revision: D25422757
fbshipit-source-id: 44d27c954bd1e0fa38b2d44c1c3b2eac3e50bd0c
Summary:
The following features were added to gitimport, in both the library routine and the command-line version.
* Define your own parts of a git-library to import by implementing the `GitimportTarget` trait.
* Added `GitimportTarget` implementation
`ImportMissingForCommit` that will search for any missing reference in Mononoke for the specified git-commit and import them. Note that it will not import commits unreachable by the specified commit history.
* Added support to update the bonsai<->git commit mapping while importing commits.
* Commit import progress is now shown, making it a bit easier to estimate how big an import job is and how long it will take.
* Adding optional git-repo name. This is useful when using gitimport as a library to import missing commits from many repositories simultaneously.
* Email to author is now added in the author field.
* Committer information is now also exported.
* Optimized the blob-store import by checking if a blob already exists prior to importing it.
* Added brief functions to basic hash structs, this is to get only the first 4 bytes (8 hex chars) for easier human inspection and debugging.
* Added support to suppress the long ref->BonsaiID mapping (on by default to match old behavior).
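As an illustration of the "brief" functions mentioned in the list above, a hash struct can expose a shortened hex form like this. `GitSha1` here is a local stand-in and the method names are hypothetical; only the "first 4 bytes (8 hex chars)" behaviour comes from the description.

```rust
/// Local stand-in for a 20-byte hash struct.
pub struct GitSha1([u8; 20]);

impl GitSha1 {
    pub fn new(bytes: [u8; 20]) -> Self {
        GitSha1(bytes)
    }

    /// Full lowercase hex representation (40 characters).
    pub fn to_hex(&self) -> String {
        self.0.iter().map(|b| format!("{:02x}", b)).collect()
    }

    /// Only the first 4 bytes (8 hex characters), for logs and debugging.
    pub fn to_brief(&self) -> String {
        self.0[..4].iter().map(|b| format!("{:02x}", b)).collect()
    }
}
```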
Reviewed By: krallin
Differential Revision: D25445974
fbshipit-source-id: 6dc7f977b61ceec1a95b5f3c38548ac8eddbea27
Summary:
Move expected_item_size_byte into CachelibSettings, seems like it should be there.
To enable its use, this also exposes a parse_and_init_cachelib method for callers whose defaults differ from the default cachelib settings.
Reviewed By: krallin
Differential Revision: D24761024
fbshipit-source-id: 440082ab77b5b9f879c99b8f764e71bec68d270e
Summary:
It has a build() method and later in stack it will build a mononoke
specific type rather than the clap::App
Differential Revision: D25216827
fbshipit-source-id: 24a531856405a702e7fecf54d60be1ea3d2aa6e7
Summary: convert mercurial_derived_data to new type futures
Reviewed By: ahornby
Differential Revision: D25220329
fbshipit-source-id: c2532a12e915b315fe6eb72f122dbc37822bbb2a
Summary: convert `BlobRepo::get_bonsai_bookmark` to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25188577
fbshipit-source-id: fb6f2b592b9e9f76736bc1af5fa5a08d12744b5f
Summary: Remove 'static requirement for async methods of Blobstore, propagate this change and fixup low hanging fruits where the code can become 'static free easily.
Reviewed By: ahornby, farnz
Differential Revision: D24839054
fbshipit-source-id: 5d5daa04c23c4c9ae902b669b0a71fe41ee6dee6
Summary: Now that `derive03` is the only version available, rename it to `derive`.
Reviewed By: krallin
Differential Revision: D24900106
fbshipit-source-id: c7fbf9a00baca7d52da64f2b5c17e3fe1ddc179e
Summary: Like it says in the title.
Reviewed By: StanislavGlebik
Differential Revision: D24731300
fbshipit-source-id: b9c44fc1e4bd4cfe8655e1024a0547e40fb99424