Summary:
Eventually, I plan to make this the default, but for now I'd like to make it
something we can choose to turn on or off as a cmd argument (so we can start
with the experimental tier and Fastreplay).
Note that this mixes volatile vs. non-volatile pools when accessing the pools
for cacheblob. In practice, those pools are actually volatile, it's just that
things don't break if you access them as non-volatile.
Reviewed By: farnz
Differential Revision: D22356537
fbshipit-source-id: 53071b6b21ca5727d422e10f685061c709114ae7
Summary:
Add a new parameter, `pagination`, to the `list` method of the `Bookmarks` trait.
This restricts the returned bookmarks to those lexicographically after the
given bookmark name (exclusive). This can be use to implement pagination:
callers can provide the last bookmark in the previous page to fetch the
next page of bookmarks.
Reviewed By: krallin
Differential Revision: D22333943
fbshipit-source-id: 686df545020d936095e29ae5fee24258511f4083
Summary:
Rework the bookmarks traits:
* Split out log functions into a separate `BookmarkUpdateLog` trait. The cache doesn't care about these methods.
* Simplify `list` down to a single method with appropriate filtering parameters. We want to add more filtering types, and adding more methods for each possible combination will be messier.
* The `Bookmarks` and `BookmarkUpdateLog` traits become `attributes` on `BlobRepo`, rather than being a named member.
Reorganise the bookmarks crate to separate out the bookmarks log and transactions into their own modules.
Reviewed By: krallin
Differential Revision: D22307781
fbshipit-source-id: 4fe514df8b7ef92ed3def80b21a16e196d916c64
Summary: Convert the bookmarks traits to use new-style `BoxFuture<'static>` and `BoxStream<'static>`. This is a step along the path to full `async`/`await`.
Reviewed By: farnz
Differential Revision: D22244489
fbshipit-source-id: b1bcb65a6d9e63bc963d9faf106db61cd507e452
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch the remaining lobstore traits to new-style futures.
This just pushes the `.compat()` out to old-style futures, but it makes the move to non-'static lifetimes easier, as all the compile errors will relate to lifetime issues.
Reviewed By: krallin
Differential Revision: D22183228
fbshipit-source-id: 3fe3977f4469626f55cbf5636d17fff905039827
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch some of the blobstore interfaces to new-style `BoxFuture` with a `'static` lifetime.
This does not enable any fixes at this point, but does mean that `.compat()` moves to the places that need old-style futures instead of new. It also means that the work needed to make the transition fully complete is changed from a full conversion to new futures, to simply changing the lifetimes involved and fixing the resulting compile failures.
Reviewed By: krallin
Differential Revision: D22164315
fbshipit-source-id: dc655c36db4711d84d42d1e81b76e5dddd16f59d
Summary: This got broken in D22115015 — this fixes it.
Reviewed By: farnz
Differential Revision: D22186138
fbshipit-source-id: 54c05466cdbd3be4f6887a852f099351ea5e891e
Summary: DangerousOverride is moved into a separate crate. Not only it is usually not needed but it was introducing dependencies on mercurial crate.
Reviewed By: StanislavGlebik
Differential Revision: D22115015
fbshipit-source-id: c9646896f906ea54d11aa83a8fbd8490a5b115ea
Summary: This change will ensure that cloning blobrepo is cheap, even if someone adds field that is expensive to clone. Plus it will result in just one arc-clone instead of cloning all the fields one by one.
Reviewed By: mitrandir77
Differential Revision: D22114066
fbshipit-source-id: ca0c3c78033b4c74872da314a32deb37c05b70ca
Summary: Globalrev does not have any dependencies on mercurial so it can be moved to mononoke_types since it is used in BlobRepo
Reviewed By: StanislavGlebik
Differential Revision: D22092491
fbshipit-source-id: 1dded88eb2ace08e8c6c3673e2d50ae1fbb9850d
Summary: Move all mercurial changeset generation logic to `blobrepo_hg`. This is preliminary step is required to decouples BlobRepo from mercurial, and in later stages it will be moved to derived data infra once blobrepo is free of mercurial.
Reviewed By: StanislavGlebik
Differential Revision: D22089677
fbshipit-source-id: bca28dedda499f80899e729e4142e373d8bec0b8
Summary: move HgMutationStore to attributes, and all related methods to BlobRepoHg
Reviewed By: StanislavGlebik
Differential Revision: D22089657
fbshipit-source-id: 8fe87418ccb8a7ad43828758844bdbd73dc0573d
Summary: Move `Filenodes` to `BlobRepo::attributes` as it is mercurial specific.
Reviewed By: ikostia
Differential Revision: D21662418
fbshipit-source-id: 87648a3e6fd7382437424df3ee60e1e582b6b958
Summary: This diff introduces `BlobRepoHg` extension trait for `BlobRepo` object. Which contains mercurial specific methods that were previously part of `BlobRepo`. This diff also stars moving some of the methods from BlobRepo to BlobRepoHg.
Reviewed By: ikostia
Differential Revision: D21659867
fbshipit-source-id: 1af992915a776f6f6e49b03e4156151741b2fca2
Summary:
This diff adds additional filed `BlobRepo::attributes` which can store attributes of arbitrary type. This will help store opaque types inside blobrepo without creating dependency on a crate which contains type definition for this attribute. This diff also moves `BonsaiHgMapping` inside attributes set.
- This work will allow to move mercurial changeset generation logic to derive data infrastructure
Reviewed By: ikostia
Differential Revision: D21640438
fbshipit-source-id: 3abd912e7227738a73ea9b17aabdda72a33059aa
Summary:
Tooling can't handle named_deps yet, but it can warn about them
P133451794
Reviewed By: StanislavGlebik
Differential Revision: D22083499
fbshipit-source-id: 46de533c19b13b2469e912165c1577ddb63d15cd
Summary:
Remove unused dependencies for Rust targets.
This failed to remove the dependencies in eden/scm/edenscmnative/bindings
because of the extra macro layer.
Manual edits (named_deps) and misc output in P133451794
Reviewed By: dtolnay
Differential Revision: D22083498
fbshipit-source-id: 170bbaf3c6d767e52e86152d0f34bf6daa198283
Summary: asyncify get_file_history_using_prefetched. Initially I was planning to change this function in the next diff, but then changed my mind. However asyncifying this function won't hurt to land.
Reviewed By: krallin
Differential Revision: D21881121
fbshipit-source-id: 0b2e92ae015b72e1f4578153c56708faabc4bf13
Summary: Prep for 1.44 but also general cleanups.
Reviewed By: dtolnay
Differential Revision: D22024428
fbshipit-source-id: 8e1d39a1e78289129b38554674d3dbf80681f4c3
Summary: Found this wasn't used while looking at prefixblob usages
Reviewed By: farnz
Differential Revision: D22034647
fbshipit-source-id: a0d7dc02dbee9fdf685df464539323d4c200a818
Summary:
Some of the hg changeset structures use `comments` as the name for commit
message. But this is confusing - let's rename
Reviewed By: HarveyHunt
Differential Revision: D21974571
fbshipit-source-id: e7c5c5ad8db9b2f1343abe9101fc56a6d4287548
Summary:
Similar to get_all_filenodes_maybe_stale() make this method return
FilenodeResult if filenodes are disabled.
Note: this diff adds one TODO in fetch_root_filenode, which will be removed
together with other TODOs in the next diff.
Reviewed By: ahornby
Differential Revision: D21904399
fbshipit-source-id: 1569579699c02eb07021f8143aa652aa192d23bc
Summary:
Let's return FilenodeResult from get_all_filenodes_maybe_stale and change
callers to deal with that.
The change is straightforward with the exception of `file_history.rs`.
get_all_filenodes_maybe_stale() is used here to prefetch a lot filenodes in one
go. This diff changes it to return an empty vec in case filenodes are disabled.
Unfortunately this is not a great solution - since prefetched files are empty
get_file_history_using_prefetched() falls back to fetching filenodes
sequentially from the blobstore. that might be too slow, and the next diffs in
the stack will address this problem.
Reviewed By: krallin
Differential Revision: D21881082
fbshipit-source-id: a86dfd48a92182381ab56994f6b0f4b14651ea14
Summary:
I observed that for whatever reason our setting of `use_try_shorthand = true` in rustfmt.toml was causing entire functions to not get processed by rustfmt. Even files that contain neither `try` nor `?`. Remove it and reformat fbsource.
Documentation of that config:
- https://github.com/rust-lang/rustfmt/blob/master/Configurations.md#use_try_shorthand
We don't particularly care about the value anymore because nobody writes `r#try!(...)` in 2018 edition code.
Minimized:
```
fn f() {
g(
)
// ...
.h
}
```
This function gets formatted only if use_try_shorthand is not set.
The bug is fixed in the rustfmt 2.0 release candidate.
Reviewed By: jsgf
Differential Revision: D21878162
fbshipit-source-id: b028673c0eb703984d24bf0d2983453fc2a8c212
Summary: I will modify in the next diffs, it's nice to have some test coverage
Reviewed By: krallin
Differential Revision: D21880996
fbshipit-source-id: 266622982e5d7d6d19ab3ac921ddc51079e51457
Summary:
The max_depth parameter was confusing - one might have thought that it returns
all entries that are `max_depth` hops away from `startnode`. This is not the
case - in reality it just limits what's the total number of entries can be
returned.
In fact, I think it can be replaced with Stream::take() method, however that's
a larger refactoring, and I'm not going to do it now.
It doesn't matter much because usually we have linear histories, but I think
it's still worth to make it clearer.
Reviewed By: krallin
Differential Revision: D21880220
fbshipit-source-id: d209e1e129383f9181ae11b489992dc4c3ce2d5c
Summary:
The motivation for the whole stack:
At the moment if mysql is down then Mononoke is down as well, both for writes
and for reads. However we can relatively easily improve the situation.
During hg update client sends getpack() requests to fetch files, and currently
for each file fetch we also fetch file's linknode. However hg client knows how
to deal with null linknodes [1], so when mysql is unavailable we can disable
filenode fetching completely and just return null linknodes. So the goal of this stack is to
add a knob (i.e. a tunable) that can turn things filenode fetches on and off, and make
sure the rest of the code deals nicely with this situation.
Now, about this diff. In order to force callers to deal with the fact that
filenodes might unavailable I suggest to add a special type of result, which (in
later diffs) will be returned by every filenodes methods.
This diff just introduces the FilenodeResult and convert BlobRepo filenode
methods to return it. The reason why I converted BlobRepo methods first
is to limit the scope of changes but at the same time show how the callers' code will look
like after FilenodeResult is introduced, and get people's thoughts of whether
it's reasonable or not.
Another important change I'd like to introduce in the next diffs is modifying FilenodesOnlyPublic
derived data to return success if filenodes knob is off. If we don't do that
then any attempt to derive filenodes might fail which in turn would lead to the
same problem we have right now - people won't be able to do hg update/hg
pull/etc if mysql is down.
[1] null linknodes might make some client side operation slower (e.g. hg rebase/log/blame),
so we should use it only in sev-like situations
Reviewed By: krallin
Differential Revision: D21787848
fbshipit-source-id: ad48d5556e995af09295fa43118ff8e3c2b0e53e
Summary: Add the mutation store to blobrepo.
Reviewed By: krallin
Differential Revision: D20871336
fbshipit-source-id: 777cba6c2bdcfb16b711dbad61fc6d0d2f337117
Summary:
This updates our blobrepo factory code to async / await. The underlying
motivation is to make this easier to modify. I've ran into this a few times
now, and I'm sure others have to, so I think it's time.
In doing so, I've simplified the code a little bit to stop passing futures
around when values will do. This makes the code a bit more sequential, but
considering none of those futures were eager in any way, it shouldn't really
make any difference.
Reviewed By: markbt
Differential Revision: D21427290
fbshipit-source-id: e70500b6421a95895247109cec75ca7fde317169
Summary:
At least let's tell the use what to do about the problem and, where we can,
what the conflicting file was (see the attached task).
Reviewed By: farnz
Differential Revision: D21459412
fbshipit-source-id: 52b90cf7d41ebe6550083c6673b4e93b10edf5e2
Summary:
I initially wanted to modify this and it'll be easier to do so if it's
async-await. While in there, add tests and update the code to bail early if any
conflict is hit.
In writing the tests, I noted that the code that we need is already there and
his does work as expected, so I'm not actually going to modify this more, but
it's probably stil worth it to land the tests.
Reviewed By: StanislavGlebik
Differential Revision: D21457899
fbshipit-source-id: 91350962fa2d96a88e4595d1ae47ef7678dad8cb
Summary: I'm going to asyncify some things here. Let's start with this.
Reviewed By: farnz
Differential Revision: D21451761
fbshipit-source-id: 64c78de4ab640b826a3ec1d6d84149d46f225024
Summary:
- Change get return value for `Blobstore` from `BlobstoreBytes` to `BlobstoreGetData` which include `ctime` metadata
- Update the call sites and tests broken due to this change
- Change `ScrubHandler::on_repair` to accept metadata and log ctime
- `Fileblob` and `Manifoldblob` attach the ctime metadata
- Tests for fileblob in `mononoke:blobstore-test` and integration test `test-walker-scrub-blobstore.t`
- Make cachelib based caching use `BlobstoreGetData`
Reviewed By: ahornby
Differential Revision: D21094023
fbshipit-source-id: dc597e888eac2098c0e50d06e80ee180b4f3e069
Summary:
This is helpful. Also, while in there, I removed an error that wasn't used at
all.
Reviewed By: StanislavGlebik
Differential Revision: D21399489
fbshipit-source-id: 0e5ef20b842afa9ffc0bb8530c48eb48339c558e
Summary:
This removes our own (Mononoke's) implementation of failure chains, and instead
replaces them with usage of Anyhow. This doesn't appear to be used anywhere
besides Mononoke.
The historical motivation for failure chains was to make context introspectable
back when we were using Failure. However, we're not using Failure anymore, and
Anyhow does that out of the box with its `context` method, which you can
downcast to the original error or any of the context instances:
https://docs.rs/anyhow/1.0.28/anyhow/trait.Context.html#effect-on-downcasting
Reviewed By: StanislavGlebik
Differential Revision: D21384015
fbshipit-source-id: 1dc08b4b38edf8f9a2c69a1e1572d385c7063dbe
Summary:
This adds support for running Gitimport with `--readonly-storage`. The way we
do this is by masking the various storages we use (blobstore, changesets,
bonsai).
Reviewed By: markbt
Differential Revision: D21347939
fbshipit-source-id: 68084ba0d812dc200776c761afdfe41bab9a6d82
Summary:
This is needed because the tonic crate (see the diff stack) relies on tokio ^0.2.13
We can't go to a newer version because a bug that affects mononoke was introduced on 0.2.14 (discussion started on T65261126). The issue was reported upstream https://github.com/tokio-rs/tokio/issues/2390
This diff simply changed the version number on `fbsource/third-party/rust/Cargo.toml` and ran `fbsource/third-party/rust/reindeer/vendor`.
Also ran `buck run //common/rust/cargo_from_buck:cargo_from_buck` to fix the tokio version on generated cargo files
Reviewed By: krallin
Differential Revision: D21043344
fbshipit-source-id: e61797317a581aa87a8a54e9e2ae22655f22fb97
Summary:
We had accumulated lots of unused dependendencies, and had several test_deps in deps instead. Clean this all up to reduce build times and speed up autocargo processing.
Net removal is of around 500 unneeded dependency lines, which represented false dependencies; by removing them, we should get more parallelism in dev builds, and less overbuilding in CI.
Reviewed By: krallin, StanislavGlebik
Differential Revision: D20999762
fbshipit-source-id: 4db3772cbc3fb2af09a16601bc075ae8ed6f0c75
Summary:
RepoBlobstore is currently a type alias for the underlying blobstore type. This
is a bit unideal for a few reasons:
- It means we can't add convenience methods on it. Notably, getting access to
the underlying blobstore can be helpful in tests, but as-is we cannot do that
(see the test that I updated in the LFS server change in this diff for an
example).
- Since the various blobstores we use for wrapping are blobstores themselves,
it is possible when deconstructing the repo blobstore to accidentally forget
to remove one layer. By making the internal blobstore a `T`, we can let the
compiler prove that deconstructing the `RepoBlobstore` is done properly.
Most of the changes in this diff are slight refactorings to make this compile
(e.g. removing obsolete trait bounds, etc.), but there are a couple functional
changes:
- I've extracted the RedactedBlobstore configuration into its own Arc. This
enables us to pull it back out of a RedactedBlobstore without having to copy
the actual data that's in it.
- I've removed `as_inner()` and `into_inner()` from `RedactedBlobstore`. Those
methods didn't really make sense. They had 2 use cases:
- Deconstruct the `RedactedBlobstore` (to rebuild a new blobstore). This is
better handled by `as_parts()`.
- Get the underlying blobstore to make a request. This is better handled by
yielding the blobstore when checking for access, which also ensures you
cannot accidentally bypass redaction by using `as_inner()` (this which also
allowed me to remove a clone on blobstore in the process).
Reviewed By: farnz
Differential Revision: D20941351
fbshipit-source-id: 9fa566702598b916cb87be6b3f064cd7e8e0b3e0
Summary:
Right now we have a couple functions, but they're not easily composable. I'd
like to make the redacted blobs configurable when creating a test repo, but I
also don't want to have 2 new variants, so let's create a little builder for
test repos.
This should make it easier to extend in the future to add more customizability
to test repos, which should in turn make it easier to write unit tests :)
Reviewed By: HarveyHunt
Differential Revision: D20897253
fbshipit-source-id: 3cb9b52ffda80ccf5b9a328accb92132261616a1
Summary: Some methods that were unused or barely used outside of the cmdlib crate were made non-public (parse_caching, CachelibSettings, init_cachelib_from_settings).
Reviewed By: krallin
Differential Revision: D20671251
fbshipit-source-id: 232e786fa5af5af543239aca939cb15ca2d6bc10
Summary:
Our phases caching wasn't great. If you tried to ask for a draft commit then
we'd call mark_reachable_as_public method, bu this method was bypassing
caches.
The reason why we had this problem was because we had caching on a higher level
than necessary - we had SqlPhases struct which was "smarter" (i.e. it has a
logic of traversing ancestors of public heads and marking these ancestors and
public) and SqlPhasesStore which just did sql access. Previously we had our
caching layer on top of SqlPhases, meaning that when SqlPhases calls
`mark_reachable_as_public` it can't use caches anymore.
This diff fixes it by moving caching one layer lower - now we have a cache
right on top of SqlPhasesStore. Because of this change we no longer need
CachingPhases, and they were removed. Also `ephemeral_derive` logic was
simplified a bit
Reviewed By: krallin
Differential Revision: D20834740
fbshipit-source-id: 908b7e17d6588ce85771dedf51fcddcd2fabf00e
Summary:
Migrate the configuration of sql data managers from the old configuration using `sql_ext::SqlConstructors` to the new configuration using `sql_construct::SqlConstruct`.
In the old configuration, sharded filenodes were included in the configuration of remote databases, even when that made no sense:
```
[storage.db.remote]
db_address = "main_database"
sharded_filenodes = { shard_map = "sharded_database", shard_num = 100 }
[storage.blobstore.multiplexed]
queue_db = { remote = {
db_address = "queue_database",
sharded_filenodes = { shard_map = "valid_config_but_meaningless", shard_num = 100 }
}
```
This change separates out:
* **DatabaseConfig**, which describes a single local or remote connection to a database, used in configuration like the queue database.
* **MetadataDatabaseConfig**, which describes the multiple databases used for repo metadata.
**MetadataDatabaseConfig** is either:
* **Local**, which is a local sqlite database, the same as for **DatabaseConfig**; or
* **Remote**, which contains:
* `primary`, the database used for main metadata.
* `filenodes`, the database used for filenodes, which may be sharded or unsharded.
More fields can be added to **RemoteMetadataDatabaseConfig** when we want to add new databases.
New configuration looks like:
```
[storage.metadata.remote]
primary = { db_address = "main_database" }
filenodes = { sharded = { shard_map = "sharded_database", shard_num = 100 } }
[storage.blobstore.multiplexed]
queue_db = { remote = { db_address = "queue_database" } }
```
The `sql_construct` crate facilitates this by providing the following traits:
* **SqlConstruct** defines the basic rules for construction, and allows construction based on a local sqlite database.
* **SqlShardedConstruct** defines the basic rules for construction based on sharded databases.
* **FbSqlConstruct** and **FbShardedSqlConstruct** allow construction based on unsharded and sharded remote databases on Facebook infra.
* **SqlConstructFromDatabaseConfig** allows construction based on the database defined in **DatabaseConfig**.
* **SqlConstructFromMetadataDatabaseConfig** allows construction based on the appropriate database defined in **MetadataDatabaseConfig**.
* **SqlShardableConstructFromMetadataDatabaseConfig** allows construction based on the appropriate shardable databases defined in **MetadataDatabaseConfig**.
Sql database managers should implement:
* **SqlConstruct** in order to define how to construct an unsharded instance from a single set of `SqlConnections`.
* **SqlShardedConstruct**, if they are shardable, in order to define how to construct a sharded instance.
* If the database is part of the repository metadata database config, either of:
* **SqlConstructFromMetadataDatabaseConfig** if they are not shardable. By default they will use the primary metadata database, but this can be overridden by implementing `remote_database_config`.
* **SqlShardableConstructFromMetadataDatabaseConfig** if they are shardable. They must implement `remote_database_config` to specify where to get the sharded or unsharded configuration from.
Reviewed By: StanislavGlebik
Differential Revision: D20734883
fbshipit-source-id: bb2f4cb3806edad2bbd54a47558a164e3190c5d1
Summary:
Right now, ContextConcurrencyBlobstore is instantiated in make_blobstore, which
makes it a lot more effective (3 times more effective, in fact) than we want it
to be, since a ticket is acquired by 3 blobstores in the chain in order to
complete a put:
- The multiplex
- The two underlying blobstores
This also has the potential to deadlock if all tickets are held by the
multiplex, which results in an eventual timeout after 600s of waiting in the
multiplex (this looks like it might be happening at least once or twice per
hour right now on the experimental tier).
In any case, the intention had always been to have one of those per repo, not
one per sub-blobstore, so let's do that. The more natural place to put this
seems to be the RepoBlobstore instantiation.
Since I anticipate I might not be the only one who gets tripped up by this at
some point, I also added a comment about this. I also updated the blobsync
tests to stop re-implementing `RepoBlobstoreArgs::new()` so that adding new
blobstores in RepoBlobstoreArgs will have minimal friction.
Reviewed By: HarveyHunt
Differential Revision: D20467346
fbshipit-source-id: a6ad2d8f04bff1c6fcaa151e947cb8af919eec07
Summary: I'm about to introduce one more usecase of it so let's rename it first.
Reviewed By: farnz
Differential Revision: D20393776
fbshipit-source-id: d74146fa212cdc4989a18c2cbd28307f58994759
Summary:
A lot of callsites want to know repo name. Currently they need to pass it from
the place where repo was initialized, and that's quite awkward, and in some
places even impossible (i.e. in derived data, where I want to log reponame).
This diff adds reponame in BlobRepo
Reviewed By: krallin
Differential Revision: D20363065
fbshipit-source-id: 5e2eb611fb9d58f8f78638574fdcb32234e5ca0d
Summary:
The goal of the whole stack is quite simple (add reponame field to BlobRepo), but
this stack also tries to make it easier to initialize BlobRepo.
To do that BlobrepoBuilder was added. It now accepts RepoConfig instead of 6
different fields from RepoConfig - that makes it easier to pass a field from
config into BlobRepo. It also allows to customize BlobRepo. Currently it's used
just to add redaction override, but later we can extend it for other use cases
as well, with the hope that we'll be able to remove a bunch of repo-creation
functions from cmdlib.
Because of BlobrepoBuilder we no longer need open_blobrepo function. Later we
might consider removing open_blobrepo_given_datasources as well.
Note that this diff *adds* a few new clones. I don't consider it being a big
problem, though I'm curious to hear your thoughts folks.
Note that another option for the implementation would be to take a reference to objects
instead of taking them by value. I briefly looked into how they used, and lot of them are passed to the
objects that actually take ownership of what's inside these config fields. I.e. Blobstore essentially takes ownership
of BlobstoreOptions, because it needs to store manifold bucket name.
Same for scuba_censored_table, filestore_params, bookmarks_cache_ttl etc. So unless I'm missing anything, we can
either pass them as reference and then we'll have to copy them, or we can
just pass a value from BlobrepoBuilder directly.
Reviewed By: krallin
Differential Revision: D20312567
fbshipit-source-id: 14634f5e14f103b110482557254f084da1c725e1
Summary:
Note that comparing to many other asyncifying efforts, this one actually adds
one more clone instead of removing them. This is the clone of a logger field.
That shouldn't matter much because it can be cleaned up later and because this
function will be called once per repo.
Reviewed By: krallin
Differential Revision: D20311122
fbshipit-source-id: ace2a108790b1423f8525d08bdea9dc3a2e3c37c
Summary:
This updates the store_bytes method to chunk incoming data instead of uploading
it as-is. This is unfortunately a bit hacky (but so was the previous
implementation), since it means we have to hash the data before it has gone
through the Filestore's preparation.
That said, one of the invariants of the filestore is that chunk size shouldn't
affect the Content ID (and there is fairly extensive test coverage for this),
so, notionally, this does work.
Performance-wise, it does mean we are hashing the object twice. That actually
was the case before as well anyway (since obtain the ContentId for FileContents
would clone them then hash them).
The upshot of this change is that large files uploaded through unbundle will
actually be chunked (whereas before, they wouldn't be).
Long-term, we should try and delete this method, as it is quite unsavory to
begin with. But, for now, we don't really have a choice since our content
upload path does rely on its existence.
Reviewed By: StanislavGlebik
Differential Revision: D20281937
fbshipit-source-id: 78d584b2f9eea6996dd1d4acbbadc10c9049a408
Summary:
This is a very small struct (2 u64s) that really doesn't need to be passed by
reference. Might as well just pass it by value.
Differential Revision: D20281936
fbshipit-source-id: 2cc64c8ab6e99ee50b2e493eff61ea34d6eb54c1
Summary: separate out the Facebook-specific pieces of the sql_ext crate
Reviewed By: ahornby
Differential Revision: D20218219
fbshipit-source-id: e933c7402b31fcd5c4af78d5e70adafd67e91ecd
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces all dependencies on `//common/rust/renamed:tokio-preview` with `fbsource//third-party/rust:tokio-preview` and their uses in Rust code from `tokio_preview::` to `tokio::`.
This does not introduce any collisions with `tokio::` meaning 0.1 tokio because D20235404 previously renamed all of those to `tokio_old::` in crates that depend on both 0.1 and 0.2 tokio.
This is the tokio version of what D20213432 did for futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:tokio-preview, 1))" \
| xargs sed -i 's,\btokio_preview::,tokio::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:tokio-preview,fbsource//third-party/rust:tokio-preview,'
```
Reviewed By: k21
Differential Revision: D20236557
fbshipit-source-id: 15068b93a0a944d6249a1d9f63840a4c61c9c1ba
Summary:
This updates microwave to also support changesets, in addition to filenodes.
Those create a non-trivial amount of SQL load when we warm up the cache (due to
sequential reads), which we can eliminate by loading them through microwave.
They're also a bottleneck when manifests are loaded already.
Note: as part of this, I've updated the Microwave wrapper methods to panic if
we try to access a method that isn't instrumented. Since we'd be running
the Microwave builder in the background, this feels OK (because then we'd find
out if we call them during cache warmup unexpectedly).
Reviewed By: farnz
Differential Revision: D20221463
fbshipit-source-id: 317023677af4180007001fcaccc203681b7c95b7
Summary:
Implementation of derivation logic for the changeset info.
BonsaiDerived is implemented for the ChangesetInfo. `derive_from_parents` just derives an info and BonsaiDerivedMapping then puts it into the blobstore.
```
ChangesetInfo::derive(..) -> ChacgesetInfo
```
Reviewed By: krallin
Differential Revision: D20185954
fbshipit-source-id: afe609d1b2711aed7f2740714df6b9417c6fe716
Summary:
Previously warm bookmark cache tried to derive all bookmarks on startup. It slows down the startup time and in some cases it might prevent scs server from starting up at all.
Let's change how warm bookmark cache initializes the bookmarks - instead of trying to derive all of them let's move underived bookmarks back in history.
Reviewed By: krallin
Differential Revision: D20195211
fbshipit-source-id: 5cb5d8599d3035973175d3063186a7c01536889a
Summary:
This removes the Extend implementation for FileBytes, which was incorrect (it
discarded existing data!). I had introduced this as a backwards compatibility
shim when doing the Bytes 0.4 to Bytes 0.5 migration :/
We don't really need this shim, considering:
- The only place that really matters that uses this is the remotefilelog crate,
where we have a content id, and where we should use `filestore::fetch_concat`
instead.
- The other places are tests (or close to abandonware...), which can do their
own folding.
Longer term, I'd like to remove the whole `Content` stream in hg entries, so
those callsites can use the filestore methods, which a) have test coverage
(unlike ad-hoc folds, which don't always do), and b) are more efficient since
they know how large the destination buffer needs to be ahead of time, and don't
need to re-allocate.
To make sure this fixes the bug, I also introduced tests for the remotefilelog
crate. As expected, the chunked variant fails without this fix.
Reviewed By: mitrandir77
Differential Revision: D20248978
fbshipit-source-id: 1b554d3e595eb867b6b6cf4204d31f27dd90a111
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces *all* dependencies on `//common/rust/renamed:futures-preview` with `fbsource//third-party/rust:futures-preview` and their uses in Rust code from `futures_preview::` to `futures::`.
This does not introduce any collisions with `futures::` meaning 0.1 futures because D20168958 previously renamed all of those to `futures_old::` in crates that depend on *both* 0.1 and 0.3 futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:futures-preview, 1))" \
| xargs sed -i 's,\bfutures_preview::,futures::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:futures-preview,fbsource//third-party/rust:futures-preview,'
```
Reviewed By: k21
Differential Revision: D20213432
fbshipit-source-id: 07ee643d350c5817cda1f43684d55084f8ac68a6
Summary:
In targets that depend on *both* 0.1 and 0.3 futures, this codemod renames the 0.1 dependency to be exposed as futures_old::. This is in preparation for flipping the 0.3 dependencies from futures_preview:: to plain futures::.
rs changes performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs,
rdeps(%Ss, fbsource//third-party/rust:futures-old, 1)
intersect
rdeps(%Ss, //common/rust/renamed:futures-preview, 1)
)" \
| xargs sed -i 's/\bfutures::/futures_old::/'
```
Reviewed By: jsgf
Differential Revision: D20168958
fbshipit-source-id: d2c099f9170c427e542975bc22fd96138a7725b0
Summary:
Our previous implementation of unodes had a problem with diamond merges -
essentially because p1 and p2 might have the same file but with different
content unode will always create a merge unode which can be unexpected.
(code comment in unodes/derive.rs has more info about it).
This diff fixes the problem by introducing unodes v2. This allows us to import
new repos with new unode implementation while keeping the old repos with unode
v1.
This implementation uses a heuristic which should be fast and should do the
correct thing most of the time. In some cases it might exclude some parts of
the history completely. For example:
O <- merge commit, doesn't change anything
/ \
P1 | <- modified "file.txt" to "B"
| P2 <- modified "file.txt" to "B"
\ /
ROOT <- created "file.txt" with content "A"
In that case history of "file.txt" starting from merge commit will contain only (P1, ROOT),
but it won't contain P2.
We also considered other options:
1) Move this heuristic to fastlog batch derived data. See D19973553 for more
details about why we decided not to do it.
2) Filter out parent unodes that are ancestors of other parent unodes. This should
always be correct, but it will be hard to implement, it wil be even harder to make
sure it always have good performance.
Reviewed By: krallin
Differential Revision: D19978157
fbshipit-source-id: 445ddd5629669d987e7aa88c35fecf0b34a40da0
Summary:
This updates our filenodes implementation to use different types for writing
(`PreparedFilenode`) and reading `(FilenodeInfo`).
The bottom line is that this avoids a bunch of cloning of paths on the read
path, which doesn't need to return the path to the caller, since the caller
already knows it! We can also take it out of Memcache, since we don't need
Memcache to tell us the path for a blob we could only possibly have found by
having the path to begin with.
This does update our filenodes serialization format. I bumped MC_CODEVER
accordingly.
Reviewed By: StanislavGlebik
Differential Revision: D19905400
fbshipit-source-id: 6037802c1773de564cade8e264d36087382ee15a
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.
See the test plan for detailed performance results supporting why I'm making
this change.
Reviewed By: StanislavGlebik
Differential Revision: D19905394
fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
Summary:
The API expects a stream of filenodes to insert, but we actually never used
that ability. Instead, every single callsites has a `Vec`, which it converts to
a stream and passes that in.
I'd like to change this for two reasons:
- It's un-necessary
- It makes the code more complex on the Filenodes implementation side, and less
efficient, since we need to `chunk()` there in small chunks, which might not
all be in the same shard. If we get the entire `Vec` at once, we can chunk on a
per-shard basis (this happens later in this stack).
Besides, if we end up having a stream and wanting the old behavior, we can
always call `chunk()` the stream and call `add_filenodes` on each batch (which
is actually nicer because if you have a futures 0.2 stream that isn't static,
you can do this, but you can't turn it into a `BoxStream`!).
Reviewed By: StanislavGlebik
Differential Revision: D19902537
fbshipit-source-id: a4c030c4a51afbb6e9db133b32464009eed197af
Summary:
The Bytes 0.5 update left us in a somewhat undesirable position where every
access to our blobstore incurs an extra copy whenever we fetch data out of our
cache (by turning it from Bytes 0.5 into Bytes 0.4) — we also have quite a few
place where we convert in one direction then immediately into the other.
Internally, we can start using Bytes 0.5 now. For example, this is useful when
pulling data out of our blobstore and deserializing as Thrift (or conversely,
when serializing and putting it into our blobstore).
However, when we interface with Tokio (i.e. decoders & encoders), we still have
to use Bytes 0.4. So, when needed, we convert our Bytes 0.5 to 0.4 there.
The tradeoff idea is that we deal with more bytes internally than we end up
sending to clients, so doing the Bytes conversion closer to the point of
sending data to clients means less copies.
We can also start removing those once we migrate to Tokio 0.2 (and newer
versions of Hyper for HTTP services).
Changes that were required:
- You can't extend new bytes (because that implicitly copies). You need to use
BytesMut instead, which I did where that was necessary (I also added calls in
the Filestore to do that efficiently).
- You can't create bytes from a `&'a [u8]`, unless `'a` is `'static`. You need
to use `copy_from_slice` instead.
- `slice_to` and `slice_from` have been replaced by a `slice()` function that
takes ranges.
Reviewed By: StanislavGlebik
Differential Revision: D20121350
fbshipit-source-id: eb31af2051fd8c9d31c69b502e2f6f1ce2190cb1
Summary:
Nearly all of the Mononoke SQL stores are instantiated once per repo but they don't store the `RepositoryId` anywhere so every method takes it as argument. And because providing the repo_id on every call is not ergonomical we tend to add methods to blob_repo that just call the right method with the right repo_id in on of the underlying stores (see `get_bonsai_from_globalrev` on blobrepo for example).
Because my reviewers [pushed back](https://our.intern.facebook.com/intern/diff/D19972871/?transaction_id=196961774880671&dest_fbid=1282141621983439) when I've tried to do the same for bonsai_git_mapping I've decided to make it right by adding the repo_id to the BonsaiGitMapping.
Reviewed By: krallin
Differential Revision: D20029485
fbshipit-source-id: 7585c3bf9cc8fa3cbe59ab1e87938f567c09278a
Summary:
During our tests we noticed that we can send too many blobstore read requests to the
mapping. Let's add exponential backoff to prevent that
Reviewed By: ikostia
Differential Revision: D20116043
fbshipit-source-id: 6fecbda4c36a5065b77ba9df561c6d9c6a969089
Summary:
During S196197 lease expired and we were rederiving the same derived data over and over again for a big commit.
this diff adds lease renewal that should help with this problem.
Reviewed By: HarveyHunt
Differential Revision: D20093323
fbshipit-source-id: d139abf6659722f47ea40d9b2f279daa03623ff4
Summary:
This updates our multiplexed blobstore configuration to carry its own DB
config. The upshot of this change is that we can move the blobstore sync queue
(a fairly unruly table) to its own DB.
Another nice side effect of this is that it cleans up a bunch of other code, by
finally decoupling the blobstore config from the DB config. For examples,
places that need to instantiate a blobstore can now to do even without a DB
config (such as wireproto logging).
Obviously, this cannot land until we update the configs to include this. I'll
do so in Configerator prior to landing the diff.
Reviewed By: HarveyHunt
Differential Revision: D19973905
fbshipit-source-id: 79e4ff92cdb989aab4532decd3fe4fd6c55e2bb2
Summary:
By having it in blobrepo we can ensure that all parts of mononoke can access it
easily
Reviewed By: StanislavGlebik
Differential Revision: D19949474
fbshipit-source-id: ac3831d61177c4ef0ad7db248f2a0cc5edb933b1
Summary:
This allows code that is being exercised under async_unit to call into code
that expects a Tokio 0.2 environment (e.g. 0.2 timers).
Unfortunately, this requires turning off LSAN for the async_unit tests, since
it looks like LSAN and Tokio 0.2 don't work very well together, resulting in
LSAN reporting leaked memory for some TLS structures that were initialized by
tokio-preview (regardless of whether the Runtime is being dropped):
https://fb.workplace.com/groups/rust.language/permalink/3249964938385432/
Considering async_unit is effectively only used in Mononoke, and Mononoke
already turns off LSAN in tests for precisely this reason ... it's probably
reasonable to do the same here.
The main body of changes here is also about updating the majority of our
changes to stop calling wait(), and use this new async unit everywhere. This is
effectively a pretty big batch conversion of all of our tests to use async fns
instead of the former approaches. I've also updated a substantial number of
utility functions to be async fns.
A few notable changes here:
- Some pushrebase tests were pretty flaky — the race they look for isn't
deterministic. I added some actual waiting (using pushrebase hooks) to make
it more deterministic. This is kinda copy pasted from the globalrev hook
(where I had introduced this first), but this will do for now.
- The multiplexblob tests don't work at all with new futures, because they call
`poll()` all over the place. I've updated them to new futures, which required
a bit of reworking.
- I took out a couple tests in async unit that were broken anyway.
Reviewed By: StanislavGlebik
Differential Revision: D19902539
fbshipit-source-id: 352b4a531ef5fa855114c1dd8bb4d70ed967dd55
Summary:
Now let's fail if we try to derive data that's not enabled in the config.
Note that this diff also adds `derive_unsafe()` method which should be used in backfiller.
Reviewed By: krallin
Differential Revision: D19807956
fbshipit-source-id: c39af8555164314cafa9691197629ab9ddb956f1
Summary: remove the need to pass mapping to `::derive` method
Reviewed By: StanislavGlebik
Differential Revision: D19856560
fbshipit-source-id: 219af827ea7e077a4c3e678a85c51dc0e3822d79
Summary:
This commit manually synchronizes the internal move of
fbcode/scm/mononoke under fbcode/eden/mononoke which couldn't be
performed by ShipIt automatically.
Reviewed By: StanislavGlebik
Differential Revision: D19722832
fbshipit-source-id: 52fbc8bc42a8940b39872dfb8b00ce9c0f6b0800
Summary:
Before we start blocking generation of derived data let's start with logging if
derived data is not specified.
Reviewed By: farnz
Differential Revision: D19791523
fbshipit-source-id: 15bed8463f8a021de76a2878f06ec95d9fef877f
Summary:
See D19787960 for more details why we need to do it.
This diff just adds a struct in BlobRepo
Reviewed By: HarveyHunt
Differential Revision: D19788395
fbshipit-source-id: d609638432db3061f17aaa6272315f0c2efe9328
Summary:
Fetching things from MySQL sequentially in a buffered fashion is a bad
practice, since we might end up saturating the underlying MySQL pool, and
starving other MySQL clients.
Instead, let's make fewer, bigger queries.
Reviewed By: ahornby
Differential Revision: D19766787
fbshipit-source-id: 1cf9102eaca8cc1ab55b7b85039ca99627a86b71
Summary:
Fetching things from MySQL sequentially in a buffered fashion is a bad
practice, since we might end up saturating the underlying MySQL pool with a lot
of requests. Doing so will result in other queries being delayed as they wait
behind our batch of queries, which results in higher dispatch latency.
Instead, let's make fewer, bigger queries. Also, while we're in here, let's
update blobrepo to have an up-to-date comment.
Reviewed By: StanislavGlebik
Differential Revision: D19766788
fbshipit-source-id: 318ec4778ca259b210d431fc2add8b327bfce99a
Summary:
Suggestions come in the error message as it is currently implemented in
Mercurial code. Format of suggestions also stays the same.
We give the hash, time, author and the title.
All suggestions are ordered (most recent go first).
We don't show them if there are two many.
Reviewed By: krallin
Differential Revision: D19732053
fbshipit-source-id: b94154cbc5a4f440a0053fc3fac2bca2ae0b7119
Summary:
Jump from "generating filenodes while generating hg changeset" to "generate
filenodes separately" is tricky to do without breaking production. This diff
adds additional logic in IncompleteFilenodes that should make this transition
smoother. See code comment for more details.
Reviewed By: krallin
Differential Revision: D19741913
fbshipit-source-id: 48987c15fc4144c50afcee7ae34072f6cd634271