Summary: we need an option to rebuild skiplists from scratch in case of corruptions.
Reviewed By: farnz
Differential Revision: D21863639
fbshipit-source-id: 56d8360a6c2c38aeb35f534758f5cde410fef421
Summary:
Replace the use of `RepoConfigs::read*` associated functions with free
functions. These didn't really need to be associated functions (and in the
case of the common and storage configs, really didn't belong there either).
Reviewed By: krallin
Differential Revision: D21837270
fbshipit-source-id: 2dc73a880ed66e11ea484b88b749582ebdf8a73f
Summary:
Out `CommitSyncConfig` struct now contains a `version_name` field, which is intended to be used as an identifier of an individual version of the `commitsyncmap` in use. We want to record this value in the `synced_commit_mapping` table, so that later it is possible to attribute a commit sync map to a given commit sync.
This is part of a broader goal of adding support for sync map versioning. The broader goal is important, as it allows us to move faster (and troubleshoot better) when sync maps need changing.
Note that when commit is preserved across repos, we set `version_name` to `NULL`, as it makes no sense to attribute commit sync maps to those case.
Reviewed By: farnz
Differential Revision: D21765408
fbshipit-source-id: 11a77cc4d926e4a4d72322b51675cb78eabcedee
Summary:
The motivation for the whole stack:
At the moment if mysql is down then Mononoke is down as well, both for writes
and for reads. However we can relatively easily improve the situation.
During hg update client sends getpack() requests to fetch files, and currently
for each file fetch we also fetch file's linknode. However hg client knows how
to deal with null linknodes [1], so when mysql is unavailable we can disable
filenode fetching completely and just return null linknodes. So the goal of this stack is to
add a knob (i.e. a tunable) that can turn things filenode fetches on and off, and make
sure the rest of the code deals nicely with this situation.
Now, about this diff. In order to force callers to deal with the fact that
filenodes might unavailable I suggest to add a special type of result, which (in
later diffs) will be returned by every filenodes methods.
This diff just introduces the FilenodeResult and convert BlobRepo filenode
methods to return it. The reason why I converted BlobRepo methods first
is to limit the scope of changes but at the same time show how the callers' code will look
like after FilenodeResult is introduced, and get people's thoughts of whether
it's reasonable or not.
Another important change I'd like to introduce in the next diffs is modifying FilenodesOnlyPublic
derived data to return success if filenodes knob is off. If we don't do that
then any attempt to derive filenodes might fail which in turn would lead to the
same problem we have right now - people won't be able to do hg update/hg
pull/etc if mysql is down.
[1] null linknodes might make some client side operation slower (e.g. hg rebase/log/blame),
so we should use it only in sev-like situations
Reviewed By: krallin
Differential Revision: D21787848
fbshipit-source-id: ad48d5556e995af09295fa43118ff8e3c2b0e53e
Summary:
This diff adds support for the `version_name` field, coming from the
`commitsyncmap` config, stored in the configerator.
Note: ATM, this field is optional in the thrift config, but once we get past
the initial deployment stage, I expect it to be present always. This is why
in `CommmitSyncConfig` I make it `String` (with a default value of `""`) rather
than `Option<String>`. The code, which will be writing this value into
`synced_commit_mapping` should not ever care whether it's present or not, since
every mapping should always have a `version_name` present.
Reviewed By: StanislavGlebik
Differential Revision: D21764641
fbshipit-source-id: 35a7f487acf0562b309fd6d1b6473e3b8024722d
Summary:
I want to reuse the functionality provided by `fetch_all_public_changesets`
in building Segmented Changelog. To share the code I am adding a new crate
intended to store utilities in dealing with bulk fetches.
Reviewed By: krallin
Differential Revision: D21471477
fbshipit-source-id: 609907c95b438504d3a0dee64ab5a8b8b3ab3f24
Summary: Add the mutation store to blobrepo.
Reviewed By: krallin
Differential Revision: D20871336
fbshipit-source-id: 777cba6c2bdcfb16b711dbad61fc6d0d2f337117
Summary:
subcommand_single calls `derived_data_utils.regenerate(vec![cs_id])` with the
intention that derived data for this commit will be regenerated. However
previously it didn't work because DerivedDataUtils::derive() was ignoring
regenerate parameter. This diff fixes it.
Reviewed By: krallin
Differential Revision: D21527344
fbshipit-source-id: 56d93135071a7f3789262b7a9d9ad84a0896c895
Summary:
Limits on concurrent calls are a bit hard to reason about, and it's not super
obvious what a good limit when all our underlying limits are expressed in QPS
(and when our data sets don't have peak concurrency - instead they have
completion time + # blob accesses).
Considering our past experience with ThrottledBlob has been quite positive
overall, I'd like to just use the same approach in ContextConcurrencyBlobstore.
To be safe, I've also updated this to be driven by tunables, which make it
easier to rollout and rollback.
Note that I removed `Debug` on `CoreContext` as part of this because it wasn't
used anywhere. We can bring back a meaningful implementation of `Debug` there
in the future if we want to. That triggered some warnings about unused fields,
which for now I just silenced.
Reviewed By: farnz
Differential Revision: D21449405
fbshipit-source-id: 5ca843694607888653a75067a4396b36e572f070
Summary:
The motivation for making this function async is that it needs to spawn things,
so it should only ever execute while polled by an executor. If we don't do
this, then it can panic if there is no executor, which is annoying.
I've been wanting to do this for a while but hadn't done it because it required
refactoring a lot of things (see the rest of this stack). But, now, it's done.
Reviewed By: mitrandir77
Differential Revision: D21427348
fbshipit-source-id: bad077b90bcf893f38b90e5c470538d2781c51e9
Summary:
This updates our blobrepo factory code to async / await. The underlying
motivation is to make this easier to modify. I've ran into this a few times
now, and I'm sure others have to, so I think it's time.
In doing so, I've simplified the code a little bit to stop passing futures
around when values will do. This makes the code a bit more sequential, but
considering none of those futures were eager in any way, it shouldn't really
make any difference.
Reviewed By: markbt
Differential Revision: D21427290
fbshipit-source-id: e70500b6421a95895247109cec75ca7fde317169
Summary:
This diff logs the delay in deriving data. In particular it logs how much time
has left since an underived commit was created.
Note that this code makes an assumption about monotonic dates - for repos with pushrebase
repos that should be the case.
Reviewed By: krallin
Differential Revision: D21427265
fbshipit-source-id: bfddf594467dfd2424f711f895275fb54a4e1c60
Summary:
Two things will be simplified:
1) Do not pass sqlbookmarks, we can always get them from blobrep
2) Instead of passing repo per derived data type let's just always pass
unredacted repo
Add a very simple unittest
Differential Revision: D21426885
fbshipit-source-id: 712ef23340466438bf34a086517f7ba33d4eabed
Summary:
- Change get return value for `Blobstore` from `BlobstoreBytes` to `BlobstoreGetData` which include `ctime` metadata
- Update the call sites and tests broken due to this change
- Change `ScrubHandler::on_repair` to accept metadata and log ctime
- `Fileblob` and `Manifoldblob` attach the ctime metadata
- Tests for fileblob in `mononoke:blobstore-test` and integration test `test-walker-scrub-blobstore.t`
- Make cachelib based caching use `BlobstoreGetData`
Reviewed By: ahornby
Differential Revision: D21094023
fbshipit-source-id: dc597e888eac2098c0e50d06e80ee180b4f3e069
Summary:
We have a number of error enums that wrap an existing errors, but fail to
register the underlying error as a `#[source]`. This results in truncated
context chains when we print the error. This fixes that. It also removes a
bunch of manual `From` implementation that can be provided by thiserror's
`#[from]`.
This also required updating the `Display` implementation for those errors. I've
opted for not printing the underlying error, since the context chain will
include it. This does mean that if we print one of those errors without the
context chain (i.e. `{}` as opposed to `{:#}` or `{:?}`), then we'll lose out a
bit of context. That said, this should be OK, as we really shouldn't ever being
do this, because we'd be missing the rest of the chain anyways.
Reviewed By: StanislavGlebik
Differential Revision: D21399490
fbshipit-source-id: a970a7ef0a9404e51ea3b59d783ceb7bf33f7328
Summary:
This removes our own (Mononoke's) implementation of failure chains, and instead
replaces them with usage of Anyhow. This doesn't appear to be used anywhere
besides Mononoke.
The historical motivation for failure chains was to make context introspectable
back when we were using Failure. However, we're not using Failure anymore, and
Anyhow does that out of the box with its `context` method, which you can
downcast to the original error or any of the context instances:
https://docs.rs/anyhow/1.0.28/anyhow/trait.Context.html#effect-on-downcasting
Reviewed By: StanislavGlebik
Differential Revision: D21384015
fbshipit-source-id: 1dc08b4b38edf8f9a2c69a1e1572d385c7063dbe
Summary:
Currently we need to specify which derived data we need to derive, however they
are already specified in the configerator configs. Let's just read it from
there.
That means that we no longer need to update tw spec to add new derived data types - we'll just need to add them to configerator and restart the backfiller.
Reviewed By: krallin
Differential Revision: D21378640
fbshipit-source-id: f97c3f0b8bb6dbd23d5a50f479ecfccbebd33897
Summary:
Currently, Mononoke's configs are loaded at startup and only refreshed
during restart. There are some exceptions to this, including throttling limits.
Other Mononoke services (such as the LFS server) have their own implementations
of hot reloadable configs, however there isn't a universally agreed upon method.
Static configs makes it hard to roll out features gradually and safely. If a
bad config option is enabled, it can't be rectified until the entire tier is
restarted. However, Mononoke's code is structured with static configs in mind
and doesn't support hot reloading. Changing this would require a lot of work
(imagine trying to swap out blobstore configs during run time) and wouldn't
necessarily provide much benefit.
Instead, add a subset of hot reloadable configs called tunables. Tunables are
accessible from anywhere in the code and are cheap to read as they only require
reading an atomic value. This means that they can be used even in hot code
paths.
Currently tunables support reloading boolean values and i64s. In the future,
I'd like to expand tunables to include more functionality, such as a rollout
percentage.
The `--tunables-config` flag points to a configerator spec that exports a
Tunables thrift struct. This allows differents tiers and Mononoke services to
have their own tunables. If this isn't provided, `MononokeTunables::default()`
will be used.
This diff adds a proc_macro that will generate the relevant `get` and `update`
methods for the fields added to a struct which derives `Tunables`. This struct is
then stored in a `once_cell` and can be accessed using `tunables::tunables()`.
To add a new tunable, add a field to the `MononokeTunables` struct that is of
type `AtomicBool` or `AtomicI64`. Update the relevant tunables configerator
config to include your new field, with the exact same name.
Removing a tunable from `MononokeTunables` is fine, as is removing a tunable
from configerator.
If the `--tunables-config` path isn't passed, then a default tunables config
located at `scm/mononoke/tunables/default` will be loaded. There is also the
`--disable-tunables` flag that won't load anything from configerator, it
will instead use the `Tunable` struct's `default()` method to initialise it.
This is useful in integration tests.
Reviewed By: StanislavGlebik
Differential Revision: D21177252
fbshipit-source-id: 02a93c1ceee99066019b23d81ea308e4c565d371
Summary:
Add a new function that initialises common parts of Mononoke:
- cachelib
- logging
- runtime
In the next diff, I will also update `init_mononoke` to initialise tunables
too.
I left some users of `init_runtime` unchanged (such as `mononoke_admin`) as
they conditionally enable cachelib.
Reviewed By: krallin
Differential Revision: D21177253
fbshipit-source-id: 40b534d1b244fd8dcd05e1cff1c9f3edfb32a4b9
Summary:
This updates our bounds checking in hg_sync to be a bit more explicit. I
actually didn't realize at all that we were checking for > 0 somewhere for a
reason (StanislavGlebik noticed in D21228409).
Anyhow. This makes the checking a bit more explicit / obvious.
Reviewed By: ikostia
Differential Revision: D21229755
fbshipit-source-id: 88071b9bfbab428785fcb929ae2c04fee942067f
Summary:
It can be quite convenient when you have a bundle ID to be able to quickly
translate it to a Bonsai and get information about said Bonsai in order to e.g.
identify whether it was particularly large. This adds that functionality.
Reviewed By: StanislavGlebik
Differential Revision: D21228409
fbshipit-source-id: fc2ff938ff16e99c88b3e522b7ac39c4f39d60f2
Summary:
This code had grown into a pretty big monster of a future. This makes it a bit
easier to modify and work with.
Reviewed By: StanislavGlebik
Differential Revision: D21227210
fbshipit-source-id: 5982daac4d77d60428e80dc6a028cb838e6fade0
Summary:
In the next diffs I'd like to introduce cleaner for unodes. This diff just
moves a bunch of code around to make reviewing next diffs easier
Reviewed By: krallin
Differential Revision: D21226921
fbshipit-source-id: c9f9b37bf9b11f36f8fc070dfa293fd8e6025338
Summary:
This diff adds a special dry-run mode of backfilling (for now only fsnodes are
supported). It does by keeping all derived data in memory (i.e. nothing is
written to blobstore) and periodically cleaning entries that can no longer
be referenced.
This mode can be useful to e.g. estimate size of derived data before actually
running the derivation.
Note that it requires --readonly-storage in order to make sure that we don't
accidentally write anything to e.g. mysql.
Reviewed By: ahornby
Differential Revision: D21088989
fbshipit-source-id: aeb299d5dd90a7da1e06a6be0b6d64b814bc7bde
Summary: File is getting too large - let's split it
Reviewed By: farnz
Differential Revision: D21180807
fbshipit-source-id: 43f0af8e17ed9354a575b8f4dac6a9fe888e8b6f
Summary:
Now that subcommand building is extracted into separate files, it feels logical
to also put arg definitions there.
Reviewed By: StanislavGlebik
Differential Revision: D21143851
fbshipit-source-id: fee7ce72a544cf66e6bc26b7128aa95b2b9ea5f3
Summary:
This feels like a more natural place to store them. Also, it will make
`main.rs` more readable.
Reviewed By: StanislavGlebik
Differential Revision: D21143850
fbshipit-source-id: 6ab3ec268beea92d7f897860f7688a775d60c4bf
Summary: See comments for more details
Reviewed By: krallin
Differential Revision: D21088800
fbshipit-source-id: b4c187b5d4d476602e69d26d71d3fe1252fd78e6
Summary: I think it's more readable this way
Reviewed By: krallin
Differential Revision: D21088598
fbshipit-source-id: 1608c250701ae6870094f0f61c0c2ce4e2c12ebf
Summary:
In the next few diffs I'm going to add more functionality to backfill derived
data. The file has grown quite big already, so I'd rather put this new
functionality in a separate file. This diff does the first step - it just moves
a file to a separte directory.
Reviewed By: farnz
Differential Revision: D21087813
fbshipit-source-id: 4a8e3eac4b8d478aa4ceca6bb55fa0d2973068ba
Summary:
I was getting weird results from SQLBlob, which I traced to connections being closed a certain amount of time after the first query.
Fix this by recreating the blob store (thus using new connections) each time.
Reviewed By: StanislavGlebik
Differential Revision: D21089041
fbshipit-source-id: e94f8993d64dbd81d9f122f92d64aa92dad8514f
Summary:
We used to implicitly do this when creating the sync queue (though it wasn't
needed there - if we don't wait we crash later when checking for replication
lag), but we no longer do after the SqlConstruct refactor.
This fixes that so now we can start the healer again.
Reviewed By: farnz
Differential Revision: D21063118
fbshipit-source-id: 24f236d10b411bc9a5694b42c19bf2afa352a54c
Summary:
Being told `Input/output error: Connection refused (os error 111)` isn't very
helpful when things are broken. However, being told:
```
Execution error: While waiting for replication
Caused by:
0: While fetching repliction lag for altoona
1: Input/output error: Connection refused (os error 111)
```
Is nicer.
Reviewed By: farnz
Differential Revision: D21063120
fbshipit-source-id: 1408b9eca025b120790a95d336895d2f50be3d5d
Summary:
This turns out quite nice because we had some futures there that were always
`Ok`, and now we can use `Output` instead of `Item` and `Error`.
Reviewed By: ahornby
Differential Revision: D21063119
fbshipit-source-id: ab5dc67589f79c898d742a276a9872f82ee7e3f9
Summary:
I'd like to do a bit of work on this, so might as well convert it to async /
await first.
Reviewed By: ahornby
Differential Revision: D21063121
fbshipit-source-id: e388d59cecf5ba68d9bdf551868cea79765606f7
Summary:
This is generally something I wanted to have for a long time: instead of having to open a writable db shell, now we can just use the admin command. Also, this will be easier to document in the oncall wikis.
NB: this is lacking the `delete` functionality atm, but that one is almost never needed.
Reviewed By: krallin
Differential Revision: D21039606
fbshipit-source-id: 7b329e1782d1898f1a8a936bc711472fdc118a96
Summary: It makes it backfill a great deal faster
Reviewed By: krallin
Differential Revision: D21040292
fbshipit-source-id: f6d06cbc76e710b4812f15e85eba73b24cdbbd3e
Summary:
This is important for various syncs.
Note: there's an obvious race condition, TOCTTOU is non-zero for existing bookmark locations. I don't think this is a problem, as we can always re-run the admin.
Reviewed By: StanislavGlebik
Differential Revision: D21017448
fbshipit-source-id: 1e89df0bb33276a5a314301fb6f2c5049247d0cf
Summary:
Some time ago we decided on the "redaction" naming for this feature. A few
places were left unfixed.
Reviewed By: xavierd
Differential Revision: D21021354
fbshipit-source-id: 18cd86ae9d5c4eb98b843939273cfd4ab5a65a3a
Summary:
This diff may not have quite the right semantics.
It switches `prefetch_content` to async syntax,
in the process getting rid of the old function `spawn_future`,
which assumes old-style futures, in favor of using
`try_for_each_concurrent` to handle concurrency.
In the process, we were able to remove a couple levels of clones.
I *think* that the old code - in which each call to `spawn_future`
would spin off its own future on the side but then also wait
for completion, and then we buffered - would run at most 256
versions of `prefetch_content_node` at a time, and the current
code is the same. But it's possible that I've either halved or
doubled the concurrency somehow here, if I lost track of the
details.
Reviewed By: krallin
Differential Revision: D20665559
fbshipit-source-id: d95d50093f7a9ea5a04c835baea66e07a7090d14
Summary:
By switching to the new futures api, we can save a few heap allocations
and reduce indentation of the code.
Reviewed By: krallin
Differential Revision: D20666338
fbshipit-source-id: 730a97e0365c31ec1a8ab2995cba6dcbf7982ecd
Summary:
RepoBlobstore is currently a type alias for the underlying blobstore type. This
is a bit unideal for a few reasons:
- It means we can't add convenience methods on it. Notably, getting access to
the underlying blobstore can be helpful in tests, but as-is we cannot do that
(see the test that I updated in the LFS server change in this diff for an
example).
- Since the various blobstores we use for wrapping are blobstores themselves,
it is possible when deconstructing the repo blobstore to accidentally forget
to remove one layer. By making the internal blobstore a `T`, we can let the
compiler prove that deconstructing the `RepoBlobstore` is done properly.
Most of the changes in this diff are slight refactorings to make this compile
(e.g. removing obsolete trait bounds, etc.), but there are a couple functional
changes:
- I've extracted the RedactedBlobstore configuration into its own Arc. This
enables us to pull it back out of a RedactedBlobstore without having to copy
the actual data that's in it.
- I've removed `as_inner()` and `into_inner()` from `RedactedBlobstore`. Those
methods didn't really make sense. They had 2 use cases:
- Deconstruct the `RedactedBlobstore` (to rebuild a new blobstore). This is
better handled by `as_parts()`.
- Get the underlying blobstore to make a request. This is better handled by
yielding the blobstore when checking for access, which also ensures you
cannot accidentally bypass redaction by using `as_inner()` (this which also
allowed me to remove a clone on blobstore in the process).
Reviewed By: farnz
Differential Revision: D20941351
fbshipit-source-id: 9fa566702598b916cb87be6b3f064cd7e8e0b3e0
Summary:
Filenode envelopes have metadata, let's display it as well.
Althouth I've never seen it being non-empty, whenever I investigate some
filenode difference, I would like to know for sure.
Reviewed By: StanislavGlebik
Differential Revision: D20951954
fbshipit-source-id: 188321591e0d591d31e1ca765994f953dc23221c
Summary: It says I was doing it the slow way. Do it the fast way
Reviewed By: krallin
Differential Revision: D20926911
fbshipit-source-id: 65790d510d626e70a402c22a2df5d7606427aa7f
Summary: In production, we'll never look at blobstores on their own. Use the standard cachelib and memcache layers in benchmarks to test with caching.
Reviewed By: krallin
Differential Revision: D20926910
fbshipit-source-id: 030dcf7ced76293eda269a31adc153eb6d51b48a
Summary: This lets us look at a blobstore's behaviour for repeated single reads, parallel same-blob reads, and parallel reads to multiple blobs.
Reviewed By: krallin
Differential Revision: D20920206
fbshipit-source-id: 24d9a58024318ff3454fbbf44d6f461355191c55
Summary:
We were exluding warmup, which might take a noticeable amount of time. Let's
measure everything
Reviewed By: krallin
Differential Revision: D20920211
fbshipit-source-id: f48b0c2425eb2bae2991fa537dde1bc61b5e44ac