Summary:
Before this diff, we did DNS lookups using a crate called `dns_lookup`. This crate is a thin layer over libc DNS lookups. Those lookups are blocking (i.e. they hold a thread), so they're not very friendly to asynchronous code. We currently offload them to a dedicated thread pool to mitigate the issue, but this isn't ideal: if we experience e.g. slow DNS responses, we could saturate this thread pool pretty easily.
I updated it to use the trust-dns crate, which provides an asynchronous implementation of DNS lookups, and is currently used in other parts of Mononoke.
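The offloading pattern described above can be sketched with nothing but the standard library. This is a minimal, std-only illustration of the old mitigation, not the actual Mononoke code: `ToSocketAddrs` goes through the blocking system resolver (as libc-backed crates like `dns_lookup` do), and the lookup is pushed onto a dedicated thread so the async executor's threads stay free.

```rust
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;

// Blocking resolution via the system resolver: this call holds the thread
// until the resolver answers, which is exactly what made the old code
// unfriendly to async callers.
fn blocking_lookup(host: &str) -> std::io::Result<Vec<SocketAddr>> {
    (host, 80).to_socket_addrs().map(|iter| iter.collect())
}

fn main() {
    // The old mitigation: run the blocking call on a dedicated thread and
    // receive the result over a channel. Slow answers still pin one thread
    // each, which is how the pool can saturate under slow DNS.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A numeric address avoids real DNS traffic in this sketch.
        let _ = tx.send(blocking_lookup("127.0.0.1"));
    });
    let addrs = rx.recv().unwrap().expect("numeric address resolves locally");
    assert!(!addrs.is_empty());
    println!("resolved {} address(es)", addrs.len());
}
```

A natively async resolver such as trust-dns removes the need for this hand-off entirely, since the lookup future yields instead of holding a thread.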
Reviewed By: krallin
Differential Revision: D25849872
fbshipit-source-id: 826ab4e5618844f3b48e5def4ad9bd163753ebb1
Summary:
The `pull` command has a lot of tech debt (with issues like inefficiency, race
conditions, named branches, etc). The new `repo.pull` API isn't designed to
support all legacy use cases.
This diff switches the subset of the `pull` command that the new API can support
to the new API. It should solve the race condition and `visibility.add`
inefficiency issues (ex. 20s on addgroup, 187s on visibility.add - P154836357).
Logic related to remotenames was ported from the remotenames extension.
The selectivepull tests seem strong enough to check their behaviors.
The new pull API has been used by commit cloud for many months, so I turned the
new code path on by default. It can be turned off by changing
`commands.new-pull`.
There are a few test changes. The new behavior seems more correct overall:
- test-commitcloud-switch-workspace.t
- "M/F" swap is caused by rev number: both are correct
- "S: public remote/stable" is more correct than "S: draft"
- test-commitcloud-sync-rb-deletion.t
- "draft1: draft remote/scratch/draft1" is more correct because
"remote/scratch/draft1" is listed in the "book --list-subs"
command above.
- test-commitcloud-sync-rb-enabling.t
- "public1: draft" not showing up is more correct.
- test-fb-hgext-remotefilelog-lfs-prefetch.t
- Difference seems to be caused by rev order.
Reviewed By: markbt
Differential Revision: D25562089
fbshipit-source-id: ac22b2f0492ab53517d580d706dfbc823fd0e0cc
Summary: Add a method to get just the ids. This is a new best case for fetching, so the benchmark is updated as well.
Reviewed By: StanislavGlebik
Differential Revision: D25804027
fbshipit-source-id: ccc4573c8a4ebc07db854a0ffa737f572087019e
Summary: Simplify the tests for bulkops; there is no need to map to hg to check the expected changesets. We can compare to the expected bonsai changesets directly instead, as it's simpler.
Reviewed By: StanislavGlebik
Differential Revision: D25804020
fbshipit-source-id: eb4381c37ed6a4fb1e213f0397ffb2136ddee473
Summary: When scrubbing repos it is preferable to scrub newest data first. This diff adds Direction::NewestFirst to bulkops for use in scrubbing and updates existing call sites to Direction::OldestFirst so as not to change behaviour
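As a rough, std-only sketch of what a direction enum buys the bulk fetcher (the names `Direction`, `NewestFirst`, and `OldestFirst` come from the summary above; everything else here, including `chunk_bounds`, is invented for illustration and is not the real bulkops API):

```rust
// Illustrative only: a direction enum drives whether id ranges are visited
// newest-first (for scrubbing) or oldest-first (the old behaviour).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Direction {
    NewestFirst,
    OldestFirst,
}

// Split the inclusive id range [lo, hi] into chunks of `step` ids, ordered
// according to the requested direction.
fn chunk_bounds(lo: u64, hi: u64, step: u64, dir: Direction) -> Vec<(u64, u64)> {
    let mut chunks: Vec<(u64, u64)> = (lo..=hi)
        .step_by(step as usize)
        .map(|start| (start, (start + step - 1).min(hi)))
        .collect();
    if dir == Direction::NewestFirst {
        chunks.reverse();
    }
    chunks
}

fn main() {
    assert_eq!(
        chunk_bounds(0, 9, 5, Direction::OldestFirst),
        vec![(0, 4), (5, 9)]
    );
    assert_eq!(
        chunk_bounds(0, 9, 5, Direction::NewestFirst),
        vec![(5, 9), (0, 4)]
    );
    println!("ok");
}
```

Keeping existing call sites on `OldestFirst` makes the change behaviour-preserving, while scrubbing can opt into `NewestFirst`.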
Reviewed By: StanislavGlebik
Differential Revision: D25742279
fbshipit-source-id: 363a4854b14e9aa970b2c1ec491dcaccac7a6ec9
Summary: Preparation for adding OldestFirst vs NewestFirst direction of fetching in next diff.
Reviewed By: StanislavGlebik
Differential Revision: D25742281
fbshipit-source-id: 655f297efc2094d4325717d97cce53e697c35597
Summary: Add a benchmark tool for bulkops that uses the criterion benchmarking framework. This is so we can measure effect of optimizations later in stack.
Differential Revision: D25804026
fbshipit-source-id: 71b8addf1145c0ecb69d6392b4602172f2b52080
Summary:
We had a mix of callsites using fetch_all_public_changesets directly vs those using the PublicChangesetBulkFetch::fetch.
Update callsites to use PublicChangesetBulkFetch::fetch for consistency. This also has the nice side effect of removing some explicit config_store usage.
Reviewed By: StanislavGlebik
Differential Revision: D25804019
fbshipit-source-id: 5a88888dd915d1d693fb26ffe3bb359c9e918d5c
Summary:
Right now, we have zero visibility into HTTP-level errors (such as broken pipes),
because this is all set up by Gotham with no logging. This diff moves the
bootstrapping from Gotham to us to fix this.
There's a bit of code I'd like to deduplicate here, but it's tricky to do given
that the code is a little different in the HTTP vs. HTTPS branches. For now,
this will give us some logging we need without too much effort. We can make it
more robust (and route it to Scuba or give it session IDs) if this proves
useful.
Reviewed By: StanislavGlebik
Differential Revision: D25851426
fbshipit-source-id: 4ca5d1ecb3931715f04af735aa1b7cfdac87846d
Summary:
This updates Gotham. Under the hood I rebased our fork, you can see the diff
here: P161171514.
The stuff that is relevant is that Gotham got rid of its dependency on
`failure` and now uses `anyhow` instead, and I also added a little bit to our
existing socket data patch by making a few things public so that we can get
access to a few more internals.
Reviewed By: StanislavGlebik
Differential Revision: D25850262
fbshipit-source-id: 25ebf5d63be39e3e93208705d91abc5c61c90453
Summary:
Depending on the thrift definition, `thrift_library` targets may also depend on `ref-cast`.
Add this to the `Cargo.toml`.
Reviewed By: lukaspiatkowski
Differential Revision: D25636872
fbshipit-source-id: 8263395db2bb31127528f5c66c4cc5dd9180d89f
Summary:
This diff adds a debug command that allows inserting different kinds of mapping
entries:
1) Rewritten, meaning that source repo commit rewrites into a target repo
commit
2) Equivalent working copy, meaning that source repo commit doesn't rewrite
into a target repo commit, but one of its ancestors does
3) NotSyncCandidate, meaning that large repo commit shouldn't be remapped into
a small repo
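The three entry kinds above can be pictured as one enum. This is a hypothetical shape written for illustration; the real Mononoke cross-repo sync types (and their fields) differ:

```rust
// Hypothetical modelling of the three mapping-entry kinds; `target_cs_id` is
// simplified to a String here, not a real changeset id type.
#[derive(Debug, PartialEq)]
enum CommitMappingEntry {
    // 1) Source repo commit rewrites into this target repo commit.
    Rewritten { target_cs_id: String },
    // 2) Source commit doesn't rewrite itself, but an ancestor's rewrite
    //    gives an equivalent working copy in the target repo.
    EquivalentWorkingCopy { target_cs_id: String },
    // 3) Large repo commit that shouldn't be remapped into the small repo.
    NotSyncCandidate,
}

fn describe(entry: &CommitMappingEntry) -> &'static str {
    match entry {
        CommitMappingEntry::Rewritten { .. } => "rewritten",
        CommitMappingEntry::EquivalentWorkingCopy { .. } => "equivalent working copy",
        CommitMappingEntry::NotSyncCandidate => "not a sync candidate",
    }
}

fn main() {
    let entry = CommitMappingEntry::NotSyncCandidate;
    assert_eq!(describe(&entry), "not a sync candidate");
    println!("ok");
}
```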
Reviewed By: ahornby
Differential Revision: D25844996
fbshipit-source-id: 1ba64540cf511da8cc50c80a5bee822a950707be
Summary: Let's be a bit more consistent and use ARG_VERSION_NAME
Reviewed By: krallin
Differential Revision: D25844995
fbshipit-source-id: c09be5a38ef97bc491b324f49a2c7d0b6a47212e
Summary:
Right now we don't log handler errors to Scuba. This can make debugging a
little tricky: if a client is sending an invalid request, we'll see that we
sent them a 400, but we won't know what was invalid about the request.
This diff updates our logging to actually log that.
Reviewed By: sfilipco
Differential Revision: D25826522
fbshipit-source-id: 89486014e0eeaac5c9b149224601db54a26080d9
Summary:
Cross repo sync has two tables to store mapping between commits (this is
something we should probably change, but this is what we have right now).
"map" subcommand was a bit useless because it searched only in a single table,
so for lots of commits it would just return empty response.
Let's instead return CommitSyncOutcome which gives more useful information
Reviewed By: krallin
Differential Revision: D25823629
fbshipit-source-id: afc14f48b6c30bec3714dc9b79cfc4a7d67e38a9
Summary:
The unbundlereplay command was not implemented in Mononoke, but it is used by the sync job, so let's add the command here,
together with an additional integration test for sync between two Mononoke repos. In addition, I'm adding non-fast-forward bookmark movements by specifying a key to the sync job.
Reviewed By: StanislavGlebik
Differential Revision: D25803375
fbshipit-source-id: 6be9e8bfed8976d47045bc425c8c796fb0dff064
Summary:
The segmented changelog tailer is going to run with multiple instances which
may race to update the database. This change adds a test that checks that
concurrent updates keep the IdMap correct.
Reviewed By: ahornby
Differential Revision: D25684783
fbshipit-source-id: a09f6e6c915bde38158d9737dcfdc7adc3f15cb7
Summary:
Fixed `README.md` so commands in it work now.
Fixed integration_runner.
Reviewed By: lukaspiatkowski
Differential Revision: D25823461
fbshipit-source-id: 0d6784758c9f86bca38beafe014af4766169bee3
Summary:
This unbreaks the test. The reversefiller needs access to SMC to talk to
scmquery (we could set up our own scmquery instance, but I don't think it's
worth it).
Reviewed By: krallin
Differential Revision: D25824395
fbshipit-source-id: 676b3ac1e3af95e8e02bd272f7cb25250e047eed
Summary:
Sometimes we want to rechunk just a few file contents, this diff makes it
possible to do so.
Reviewed By: ahornby
Differential Revision: D25804144
fbshipit-source-id: 6ce69f7cee8616a872531bdf5a48746dd401442d
Summary: There is only one implementation of the trait so remove it and use that impl directly. Removing the trait makes it simpler to work on bulkops in the rest of this stack.
Reviewed By: farnz
Differential Revision: D25804021
fbshipit-source-id: 22fe797cf87656932d383ae236f2f867e788a832
Summary:
Right now we get zero logs from the blobstore healer, which is pretty annoying
because it makes it impossible to really tell what it's doing.
This fixes that.
Reviewed By: HarveyHunt
Differential Revision: D25823800
fbshipit-source-id: ded420753ba809626d6e4291eb3d900dcfbff3d1
Summary:
In some cases we might have chunked file content in one blobstore component and
unchunked file content in another. Rechunking the second component was awkward,
since we never knew which version the filestore would fetch - the filestore can
fetch a chunked version and decide that rechunking is not necessary.
This diff makes it possible to rechunk only a single component of a multiplexed
blobstore. It does so by manually creating BlobRepo with the single-component
blobstore.
Reviewed By: krallin
Differential Revision: D25803821
fbshipit-source-id: f2a992b73d0c5fc9d389a4b81e0f3e312c17fdea
Summary: Convert the `Changesets` trait and all its uses to new type futures
Reviewed By: krallin
Differential Revision: D25638875
fbshipit-source-id: 947423e2ee47a463861678b146641bcc6b899a4a
Summary:
We should not filter based on the parsed level when passing an inner drain into
the `DynamicLevelDrain`: when the binary is run with
`--with-dynamic-observability=true`, this would default the level to `INFO` and
make the inner drain filter on that level, which would essentially make debug
logging impossible. Instead, we should pass an unfiltered inner drain into
`DynamicLevelDrain`, as `DynamicLevelDrain` actually uses
`ObservabilityContext`, which, when the binary is called with `--debug` or
`--level=SOMETHING`, would [instantiate](https://fburl.com/diffusion/sib8ayrn) a `Static` variant, behaving just like the
current static level filtering.
Note also that this bug does not affect production, as we never actually try to
control the logging levels dynamically: we always run either with `--debug` or
with `--level=SOMETHING`, which again uses the `Static` variant of
`ObservabilityContext`, which in turn filters the same way as the inner drain.
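A toy model of the bug, with drains reduced to level thresholds (slog specifics and the real `DynamicLevelDrain` wiring are omitted; this just shows why pre-filtering the inner drain defeats dynamic debug logging):

```rust
// Severity ordering: a filter set at level L passes records at or above L,
// so a filter at Info drops Debug records.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
enum Level {
    Debug = 0,
    Info = 1,
}

// Buggy wiring: the inner drain was already filtered at the parsed level
// (defaulted to Info), so even when the dynamic outer stage allows Debug,
// the record is dropped by the inner stage.
fn buggy_passes(record: Level, dynamic_allows: Level) -> bool {
    let inner_allows = Level::Info; // pre-filtered inner drain
    record >= dynamic_allows && record >= inner_allows
}

// Fixed wiring: the inner drain is unfiltered; only the dynamic stage decides.
fn fixed_passes(record: Level, dynamic_allows: Level) -> bool {
    record >= dynamic_allows
}

fn main() {
    // The dynamic stage says Debug should pass...
    assert!(!buggy_passes(Level::Debug, Level::Debug)); // ...buggy wiring drops it anyway
    assert!(fixed_passes(Level::Debug, Level::Debug));  // ...fixed wiring lets it through
    println!("ok");
}
```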
Reviewed By: krallin
Differential Revision: D25783488
fbshipit-source-id: 8054863fb655dd66747b6d2306a38c13cbc64443
Summary:
This diff adds an (as yet unused) option to log verbose scuba samples.
Here's the high-level overview.
In addition to doing `scuba_sample.log_with_msg` and `scuba_sample.log()`, you can now do `scuba_sample.log_with_msg_verbose()` and `scuba_sample.log_verbose()`. These two methods indicate that the intended sample is verbose and should go through some filtering prior to logging.
By default, verbose samples are just not logged, but there are ways to override this via `ScubaObservabilityConfig`. Namely, the config has a "system" `ScubaVerbosityLevel`, which is either `Normal` or `Verbose`. When the level is `Verbose`, all samples are logged (those triggered by `.log_with_msg()`, `.log()`, `.log_with_msg_verbose()` and `.log_verbose()`). In addition to the "system" verbosity level, `ScubaObservabilityConfig` supports a few filtering overrides: a list of verbose sessions, a list of verbose unixnames and a list of verbose hostnames. Whenever a verbose sample's session, unixname or source hostname belongs to a corresponding list, the sample is logged.
`ScubaObservabilityConfig` is a struct, queried from `configerator` without the need to restart a service. Querying/figuring out whether logging is needed is done by the `ObservabilityContext` struct, which was introduced a few diffs earlier.
Note: I also want to add regex-based filtering for hostnames, as it's likely to be more useful than exact-match filtering, but I will do that later.
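The gating decision can be sketched as a single predicate. Field and type names here are invented stand-ins for illustration (the real `ScubaObservabilityConfig` lives in configerator and differs), but the logic mirrors the description above: verbose samples pass when the system level is `Verbose` or when the sample matches one of the override lists.

```rust
#[derive(PartialEq)]
enum ScubaVerbosityLevel {
    Normal,
    Verbose,
}

// Stand-in for the config struct; fields are illustrative.
struct ObservabilityConfig {
    system_level: ScubaVerbosityLevel,
    verbose_sessions: Vec<String>,
    verbose_unixnames: Vec<String>,
    verbose_hostnames: Vec<String>,
}

// Should a sample marked verbose actually be logged?
fn should_log_verbose(
    cfg: &ObservabilityConfig,
    session: &str,
    unixname: &str,
    hostname: &str,
) -> bool {
    // System-wide Verbose logs everything...
    cfg.system_level == ScubaVerbosityLevel::Verbose
        // ...otherwise only the per-item overrides let a verbose sample through.
        || cfg.verbose_sessions.iter().any(|s| s == session)
        || cfg.verbose_unixnames.iter().any(|u| u == unixname)
        || cfg.verbose_hostnames.iter().any(|h| h == hostname)
}

fn main() {
    let cfg = ObservabilityConfig {
        system_level: ScubaVerbosityLevel::Normal,
        verbose_sessions: vec![],
        verbose_unixnames: vec!["alice".to_string()],
        verbose_hostnames: vec![],
    };
    // Dropped by default...
    assert!(!should_log_verbose(&cfg, "sess1", "bob", "host1"));
    // ...logged when the unixname is in the override list.
    assert!(should_log_verbose(&cfg, "sess1", "alice", "host1"));
    println!("ok");
}
```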
Reviewed By: StanislavGlebik
Differential Revision: D25232429
fbshipit-source-id: 057af95fc31f70d796063cefac5b8f7c69d7b3ef
Summary: This was a bit triggering while looking at logs :p
Reviewed By: StanislavGlebik
Differential Revision: D25781047
fbshipit-source-id: 22ebf1273b8b8d0b765c1bc7df2ba93752bf45e8
Summary:
See D25780870 for a bit of context. Our admin server was failing to start up
because of changesets warmup taking too long, but that's not easy to figure out
if all you have are the logs that don't tell you what we are doing (you'd have
to look at counters to work this out).
Let's just log this stuff.
Reviewed By: StanislavGlebik
Differential Revision: D25781048
fbshipit-source-id: 57a783dadc618956f577f32df3d2ec92ee729d56
Summary:
Like it says in the title. This is helpful with e.g. Mononoke server where the
"server" handle includes a long winded startup sequence. Right now, if we get
an error, then we don't get an error message immediately, even if we have one.
This leaves you with logs like this:
```
0105 04:20:48.563924 995374 [main] eden/mononoke/cmdlib/src/helpers.rs:229] Server has exited! Starting shutdown...
I0105 04:20:48.564076 995374 [main] eden/mononoke/cmdlib/src/helpers.rs:240] Waiting 0s before shutting down server
I0105 04:20:48.564238 995374 [main] eden/mononoke/cmdlib/src/helpers.rs:248] Shutting down...
E0105 04:20:48.564315 995374 [main] eden/mononoke/server/src/main.rs:119] could not send termination signal: ()
```
This isn't great because you might have to wait for a while to see the error,
and if something hangs in the shutdown sequence later, then you might not see
it at all.
The downside is we might log twice if we have a server that crashes like this,
but I guess that's probably better than not logging at all.
Reviewed By: StanislavGlebik
Differential Revision: D25781095
fbshipit-source-id: bf5bf016d7aa36e3ff6302175bef1aab826977bc
Summary:
After the refactoring in the previous diff let's stop using CommitSyncConfig in
PushRedirectorArgs and start using get_common_pushrebase_bookmarks() method.
Reviewed By: mitrandir77
Differential Revision: D25636577
fbshipit-source-id: 126b38860b011c5a9506a38d4568e5d51b2af648
Summary:
At the moment we are in the bit of a mess with cross repo sync configuration,
and this diff will try to clean it up a bit.
In particular, we have LiveCommitSyncConfig which is refreshed automatically,
and also we have CommitSyncConfig which is stored in RepoConfig. The latter is
deprecated and is not supposed to be used, however there are still a few places
that do that. This stack is an attempt to clean it up.
In particular deprecated CommitSyncConfig is used to fetch common pushrebase
bookmarks i.e. bookmarks where pushes from both repos are sent. This diff adds
get_common_pushrebase_bookmarks() method to CommitSyncer so that in the later
diffs we can avoid using CommitSyncConfig for that.
Reviewed By: mitrandir77
Differential Revision: D25636394
fbshipit-source-id: 09b049eb8a54834881d215bc6b9c4150377e387f
Summary:
Like it says in the title. Judging by an earlier similar change (D21092866 (15f98fe58c)),
this kind of flakiness in walker tests occurs when a node's children are
reachable via other paths.
Reviewed By: HarveyHunt
Differential Revision: D25756891
fbshipit-source-id: 05bc0697381e068d466ea6dfe85529dbd9ef1a50
Summary: Extended git-import test to include both `full-repo` and `missing-for-commit` import modes.
Reviewed By: ahornby
Differential Revision: D25675361
fbshipit-source-id: b93e2db963c2060540308bf0477cd891d40e5810
Summary:
Managing tailer processes that run continuously is different from managing those
that run only once. We want separate code paths for continuous runs and one-shot runs.
Reviewed By: quark-zju
Differential Revision: D25684782
fbshipit-source-id: 354b32c1dd73f867d6a7b1bd4518d9dd98e6b9a3
Summary:
Comments for why we don't need a lock when updating the SqlIdMap with multiple
writers. Structure can definitely be improved but I'll live with this for a
short time.
No fundamental change in logic. I added extra checks to the insert function and
changed from an optimistic insert race logic to a pessimistic version. I
explain in the comments that it's to have an easier time reasoning about what
happens and that theoretically doesn't matter.
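The pessimistic discipline can be illustrated with a `HashMap` standing in for the SqlIdMap table (the real code talks to SQL and uses proper id/changeset types; everything here is a sketch of the insert logic only):

```rust
use std::collections::HashMap;

// HashMap stands in for the SqlIdMap table; the interesting part is the
// insert discipline under racing writers, not the storage.
struct IdMap {
    map: HashMap<u64, String>, // id -> changeset hash
}

impl IdMap {
    // Pessimistic insert: read first, then write. A racing writer that
    // assigned the same id to the same commit is harmless; a disagreement is
    // surfaced as an error instead of being silently clobbered. This is
    // easier to reason about than an optimistic insert-then-verify race.
    fn insert_pessimistic(&mut self, id: u64, cs: &str) -> Result<(), String> {
        match self.map.get(&id) {
            // Concurrent writer already stored the same answer: fine.
            Some(existing) if existing.as_str() == cs => Ok(()),
            // Conflicting answer: refuse rather than overwrite.
            Some(existing) => Err(format!(
                "id {} already maps to {}, not {}",
                id, existing, cs
            )),
            None => {
                self.map.insert(id, cs.to_string());
                Ok(())
            }
        }
    }
}

fn main() {
    let mut idmap = IdMap { map: HashMap::new() };
    assert!(idmap.insert_pessimistic(1, "abc").is_ok());
    // Same mapping again (a racing writer winning first) is harmless...
    assert!(idmap.insert_pessimistic(1, "abc").is_ok());
    // ...but a conflicting mapping is rejected.
    assert!(idmap.insert_pessimistic(1, "def").is_err());
    println!("ok");
}
```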
Reviewed By: quark-zju
Differential Revision: D25606290
fbshipit-source-id: ea21915fc797fe759b3fe481e8ad9e8cb594fb6a
Summary: Will reduce the number of jobs needed for small repos
Reviewed By: StanislavGlebik
Differential Revision: D25492059
fbshipit-source-id: de11c06615857ad43f3337e58973849d2026a114
Summary: Preparation for multi-repo: get the repo name into ErrorKind::NotTraversable
Reviewed By: StanislavGlebik
Differential Revision: D25541444
fbshipit-source-id: 8fd99d5d3f144d8a3a72c7c33205ae58bd5f1ae2
Summary:
In preparation for having the walker able to scrub multiple repos at once, define parameter structs. This also simplifies the code in tail.rs.
The param objects are:
* RepoSubcommandParams - per-repo params that can be set up in setup_common and are consumed in the subcommand. They don't get passed through to the walk
* RepoWalkParams - per-repo params that can be set up in setup_common and get passed all the way into the walk.rs methods
* JobWalkParams - per-job params that can be set up in setup_common and get passed all the way into the walk.rs methods
* TypeWalkParams - per-repo params that need to be set up in the subcommands, and are passed all the way into walk.rs
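The per-repo vs per-job split above can be sketched as follows. Only the struct names and the split come from the list; the fields, the `setup_common` signature, and the omission of `TypeWalkParams` are invented for brevity:

```rust
// Illustrative field choices only; the real walker structs differ.
struct RepoSubcommandParams {
    progress_interval_secs: u64,
}
struct RepoWalkParams {
    repo_name: String,
}
struct JobWalkParams {
    scheduled_max: usize,
}

// setup_common splits parsed args into per-repo and per-job params, so one
// job can carry a single JobWalkParams plus a Vec of per-repo param pairs.
fn setup_common(
    repo_names: &[&str],
) -> (Vec<(RepoSubcommandParams, RepoWalkParams)>, JobWalkParams) {
    let per_repo = repo_names
        .iter()
        .map(|name| {
            (
                RepoSubcommandParams { progress_interval_secs: 5 },
                RepoWalkParams { repo_name: name.to_string() },
            )
        })
        .collect();
    (per_repo, JobWalkParams { scheduled_max: 10 })
}

fn main() {
    let (per_repo, job) = setup_common(&["repo_a", "repo_b"]);
    assert_eq!(per_repo.len(), 2);
    assert_eq!(per_repo[1].1.repo_name, "repo_b");
    assert_eq!(job.scheduled_max, 10);
    println!("ok");
}
```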
Reviewed By: StanislavGlebik
Differential Revision: D25524256
fbshipit-source-id: bfc8e087e386b6ed45121908b48b6535f65debd3
Summary: Parsing of progress options and sampling options was the same in each subcommand; move them to functions in setup.rs
Reviewed By: StanislavGlebik
Differential Revision: D25524255
fbshipit-source-id: a2f48814f24aa9b3a158cb7d4abbfc2c0c338305
Summary: Simplify open_blobrepo_given_datasources to pass fewer arguments, and make it possible to pass the sql_factory by reference.
Reviewed By: krallin
Differential Revision: D25524254
fbshipit-source-id: c324127f42c53a52f388d303e310014f4fa0d7bb
Summary: Allows the walker blobstore code to be used by more than one blobrepo. This is a step to reduce the number of jobs needed to scrub small repos.
Reviewed By: StanislavGlebik
Differential Revision: D25422937
fbshipit-source-id: e2d11239f172f50680bb6e10dd60026c9e6c3c3d
Summary:
By doing the hg to hg steps via bonsai I will later introduce a check if the bonsai is in the current chunk of commits to be processed as part of allowing walker checkpoint and restart.
On its own this is a minor change to the number of nodes the walk will cover as seen in the updated tests.
Reviewed By: krallin
Differential Revision: D25394085
fbshipit-source-id: 3e50cf76c7032635ce9e6a7375228979b2e9c930
Summary: This is in preparation for all walker hg to hg steps (e.g. HgChangeset to parent HgChangeset) going via bonsai, which without this would continually check whether the filenodes are derived
Reviewed By: krallin
Differential Revision: D25394086
fbshipit-source-id: bb75e7ddf5b09f9d13a0f436627f4c3c95e24430