Commit Graph

783 Commits

Author SHA1 Message Date
Thomas Orozco
cb48ff2092 mononoke/revset: buffer loading generations
Summary:
In the hg sync job, we need to load up the ancestors for all bookmarks known to
the server we are pushhing to, and for e.g. fbsource that might be > 10K
bookmarks. If we fetch those 1 by 1 (because e.g. cold cache), that will take a
very long time.

Unfortunately, we don't currently have a way of buffering access to changesets,
so for now let's mitigate by buffering.

Reviewed By: ikostia, HarveyHunt

Differential Revision: D21860228

fbshipit-source-id: 90977a9e00689c1df5ae53d149c267de9b2f973e
2020-06-04 01:24:08 -07:00
Thomas Orozco
1f030cf603 mononoke/revset: don't poll() things manually
Summary:
When you call poll() from within an async context (as opposed to calling
await), awkward things happen, like this:

```
thread 'validation::test::slow_ready_validates' panicked at 'no Task is currently running',
```

So, let's stop doing that by instead checking that our stream eventually
yields. We can't count the polls this way, but that feels a bit immaterial so
that's probably OK. Also, let's clean up the code a little bit by removing a
bunch of conditional compilation.

Reviewed By: ahornby

Differential Revision: D21862830

fbshipit-source-id: 3eb49575d940ca85f59c49295dd6b0dcfb2e5d15
2020-06-04 01:24:08 -07:00
Stanislau Hlebik
ad514c1e2a mononoke: rename filenodes tunables to sql_timeout_knobs
Summary:
We are going to start using tunables in Mononoke in the next diffs, and the
name clash between "tunables" and "newfilenodes::tunables" makes it confusing.
Let's rename newfilenodes::tunables to sql_timeout_knobs

Reviewed By: krallin

Differential Revision: D21879093

fbshipit-source-id: ab0bae4be3c319dcb6afeecdd1c13df395e79e3b
2020-06-04 01:18:18 -07:00
Mateusz Kwapich
0184270052 add option to rebuild skiplists
Summary: we need an option to rebuild skiplists from scratch in case of corruptions.

Reviewed By: farnz

Differential Revision: D21863639

fbshipit-source-id: 56d8360a6c2c38aeb35f534758f5cde410fef421
2020-06-03 14:24:16 -07:00
Lukas Piatkowski
8efc16b157 common/rust/identity_ext: unify identity parsing into a single crate
Summary: The `secure_utils` crate from common/rust/secure_utils was moved to rust-shed, the remaining crates in that folder are being refactored here into a single crate `identity_ext` for clarity.

Reviewed By: StanislavGlebik

Differential Revision: D21549861

fbshipit-source-id: 4da6566a09ba7a772e8062632f9d7520af2e09e6
2020-06-03 13:16:24 -07:00
Lukas Piatkowski
3d1587c3da rust-shed: add secure_utils to the shed
Reviewed By: StanislavGlebik

Differential Revision: D21549859

fbshipit-source-id: 0e143354a60578732ae1eed8c3c71b9f859e3958
2020-06-03 13:16:23 -07:00
Stanislau Hlebik
a4b21e589f mononoke: return dummy FilenodesOnlyPublic value if filenodes are disabled
Summary:
In the next diffs we'll make it possible to disable filenodes in Mononoke. See
D21787848 and attached task for more details, but TL;DR is that if xdb is down
we still want to serve "hg update" traffic.

If filenodes are disabled we obviously can't generate filenodes for new
commits. So one option would be to just return an error from
FilenodesOnlyPublic::derive(...) call. But that would mean that any attempt to
call derivation would fail, and e.g. Mononoke servers won't be able to start up
- (see https://fburl.com/diffusion/roau028d). We could change callers to always
process errors from FilenodesOnlyPublic, but I think it would be harder to
enforce and easier to forget.

So this diff changes FilenodesOnlyPublic to be an enum, and
FilenodesOnlyPublic::Disabled is returned immediately if filenodes are
disabled. For callers it means that they can't rely on filenodes being present
in db even after FilenodesOnlyPublic were derived. That's the whole of the
stack, and the next diffs will update the callers to properly deal with missing
filenodes.

One caveat is that when we re-enable filenodes back we might need to derive
them for a lot of commits.
I don't expect it to happen often (i.e. if xdb is down then we probably can't
commit anyway), but if somehow it happened, then we should be a bit more
careful with re-enabling them after the problem was fixed. For example, we can
first derive all the filenodes locally by e.g. running backfill_derived_data,
and only after that has finished successfully we can re-enable them.

Reviewed By: krallin

Differential Revision: D21840328

fbshipit-source-id: ce9594d4a21110a5cb392c3049ccaede064c1e66
2020-06-03 04:21:23 -07:00
Stefan Filip
da6a88756e mononoke: add IdMap::insert_many
Summary: Bulk insertion to the SQL backed IdMap.

Reviewed By: StanislavGlebik

Differential Revision: D21655847

fbshipit-source-id: 937cb910edbe399fed4e6b0c4013e18378cea82f
2020-06-02 19:17:53 -07:00
Stefan Filip
134a7a425c mononoke: update SegmentedChangelog building to pick up where it left off
Summary:
Instead of always building from scratch, continue assiging Vertexes and
Segments from the last commit that was processed.

Reviewed By: StanislavGlebik

Differential Revision: D21634699

fbshipit-source-id: 9f8b890dcf65c59a66651343f0ccc1487efc2394
2020-06-02 19:17:53 -07:00
Arun Kulshreshtha
3223d12f99 edenapi: add data subcommand to read_res
Summary: Previously, `read_res` was called `data_util` and only dealt with EdenAPI data responses. Support for history responses was added later as a `history` subcommand. For consistency, let's move the top-level commands for data responses underneath a new `data` subcommand. When support for addition response types is added in the future, those can also go under their own subcommands.

Reviewed By: quark-zju

Differential Revision: D21825197

fbshipit-source-id: f5cb759a68324e7d0f98e3448bd5d1cba6417bad
2020-06-02 12:49:18 -07:00
Arun Kulshreshtha
735b112d97 edenapi: rename data_util to read_res
Summary: Give this tool a more descriptive name. (It reads EdenAPI responses, so `read_res` seemed fitting.)

Reviewed By: quark-zju

Differential Revision: D21796964

fbshipit-source-id: 8a4ee365aa3bcf115fc7a3452406ed96b4a25edc
2020-06-02 12:49:18 -07:00
Mark Thomas
26b1d53e69 metaconfig/parser: clean up conversion functions
Summary:
Clean up some of the conversion functions by renaming variables that are
keywords in other languages, and simplifying error handling code.

Differential Revision: D21839019

fbshipit-source-id: d8945a14a230caa744040e134203a908ad9cef20
2020-06-02 09:30:04 -07:00
Mark Thomas
5bf8469e9d metaconfig/parser: apply clippy suggestions
Reviewed By: farnz

Differential Revision: D21838128

fbshipit-source-id: 10ba30c9f15b7119e6a6537a7e3c605f6d0698aa
2020-06-02 09:30:04 -07:00
Mark Thomas
c6d694b758 metaconfig/parser: rename ErrorKind to ConfigurationError
Summary: `ErrorKind` is not meaningful, and is an artifact of older-style error handling crates.  A better name is `ConfigurationError`.

Reviewed By: krallin

Differential Revision: D21837271

fbshipit-source-id: 709d9e2ab7f18dd2f7cb2489f24e91612bc378db
2020-06-02 09:30:04 -07:00
Mark Thomas
6ad7e5c96e metaconfig/parser: use free functions for loading configuration
Summary:
Replace the use of `RepoConfigs::read*` associated functions with free
functions.  These didn't really need to be associated functions (and in the
case of the common and storage configs, really didn't belong there either).

Reviewed By: krallin

Differential Revision: D21837270

fbshipit-source-id: 2dc73a880ed66e11ea484b88b749582ebdf8a73f
2020-06-02 09:30:03 -07:00
Mark Thomas
c1a54a1e21 metaconfig/parser: refactor repo config
Summary:
Refactor parsing of repo config using a new `Convert` trait to allow
definition of each part of parsing separately.

The wireproto logging args require access to the storage definitions, so need
to be parsed by their own special function for now.

Differential Revision: D21837269

fbshipit-source-id: 7ab0e3f4b3b8549aaefb45201388c3dfc7633ef7
2020-06-02 09:30:03 -07:00
Mark Thomas
9e7badc92c metaconfig/parser: refactor storage config
Summary:
Refactor parsing of storage config using a new `Convert` trait to allow
definition of each part of parsing separately.

Differential Revision: D21766761

fbshipit-source-id: 7e224e9d322a3a16a64f5ebba2243bbe6341c8f0
2020-06-02 09:30:02 -07:00
Mark Thomas
5ff4803e81 metaconfig/parser: refactor commit sync config
Summary:
Refactor parsing of commit sync config using a new `Convert` trait to allow
definition of each part of parsing separately.

Differential Revision: D21766760

fbshipit-source-id: 3c95d70788753316d3c1f36280e7d6dbb52a9710
2020-06-02 09:30:02 -07:00
Stanislau Hlebik
af1f3a0d8d mononoke: add filenodes_disable tunable
Summary:
We'd like to serve read traffic even if filenodes are disabled. Let's add a
tunable that can control it.

Reviewed By: HarveyHunt

Differential Revision: D21839672

fbshipit-source-id: 4ec4dd16b9e6e3ffb1ada0d812e1153e1a33a268
2020-06-02 09:22:43 -07:00
Stanislau Hlebik
cdbb734b29 mononoke: remove scs_enable_history_across_deletions tunable
Summary: It was replaced with a parameter

Reviewed By: HarveyHunt

Differential Revision: D21839397

fbshipit-source-id: e75900b3da80985cd762659993b8b285411fe928
2020-06-02 09:22:43 -07:00
Stanislau Hlebik
a47388f536 mononoke: remove DefferedDerivedMapping
Summary:
DefferedDerivedMapping was added so that we can make deriving stack of commits faster - it does it by postponing updating
derived data mapping (e.g. writing to a blobstore) until the whole stack is derived.
While it probably makes derivation a bit faster, we now think it's better to remove it. A few reasons:

1) It's confusing to understand and it already caused us ubns before
2) It's increases write amplification - because we release the lease before we wrote to a blobstore, writers will try to rederive the same commit a few times. That has caused us a ubn today

Reviewed By: farnz

Differential Revision: D20113854

fbshipit-source-id: 169e05febcd382334bf4da209a20aace0b7c2333
2020-06-02 01:46:08 -07:00
Stanislau Hlebik
094bf1d44f mononoke: process wantslfspointers from clienttelemtry
Summary:
See D21765065 for more context. TL;DR is that we want to control
lfs rollout from client side to make sure we don't put lfs pointers in the
shared memcache

Reviewed By: xavierd

Differential Revision: D21822159

fbshipit-source-id: daea6078d95eb4e9c040d353a20bcdf1b6ae07b1
2020-06-01 15:19:36 -07:00
Simon Farnsworth
bac2a66965 Rip out internal cache from SQLBlob
Summary: This cache didn't improve performance, and is a reasonable amount of extra code. Just rip it out, and improve higher caching layers if needed.

Reviewed By: StanislavGlebik

Differential Revision: D21818412

fbshipit-source-id: 3ca58bff180c6da91d451754b67b23e61b736059
2020-06-01 10:09:55 -07:00
Simon Farnsworth
c7f5edd7d1 Use MyAdmin to account for replication lag during puts
Summary:
We can kill a single shard's replication with weight of puts; by tracking lag once a second, we can slow down to a point where no shard overloads.

This is insufficient by itself, as it slows all shards down equally (whereas we only want to slow down the laggy shard), but this now avoids high replication lag

Reviewed By: StanislavGlebik

Differential Revision: D21720833

fbshipit-source-id: a819f641206641e80f8edde92006fb08cdcf36a9
2020-06-01 10:09:54 -07:00
Kostia Balytskyi
5cd4583c5a cross_repo: record sync map version_name in synced_commit_mapping
Summary:
Out `CommitSyncConfig` struct now contains a `version_name` field, which is intended to be used as an identifier of an individual version of the `commitsyncmap` in use. We want to record this value in the `synced_commit_mapping` table, so that later it is possible to attribute a commit sync map to a given commit sync.

This is part of a broader goal of adding support for sync map versioning. The broader goal is important, as it allows us to move faster (and troubleshoot better) when sync maps need changing.

Note that when commit is preserved across repos, we set `version_name` to `NULL`, as it makes no sense to attribute commit sync maps to those case.

Reviewed By: farnz

Differential Revision: D21765408

fbshipit-source-id: 11a77cc4d926e4a4d72322b51675cb78eabcedee
2020-06-01 07:16:52 -07:00
Egor Tkachenko
84586b342e Replace hardcoded hgsql db address in mononoke_hg_sync_job
Summary: Parse config inside mononoke_hg_sync_job to get proper hgsql address for repo, instead using address from command-line argument.

Reviewed By: krallin

Differential Revision: D21638740

fbshipit-source-id: 1319126535c2c148be47172663e86a82b5e92b40
2020-06-01 06:05:36 -07:00
Stanislau Hlebik
fde6230ea2 RFC: introduce FilenodeResult
Summary:
The motivation for the whole stack:
At the moment if mysql is down then Mononoke is down as well, both for writes
and for reads. However we can relatively easily improve the situation.
During hg update client sends getpack() requests to fetch files, and currently
for each file fetch we also fetch file's linknode. However hg client knows how
to deal with null linknodes [1], so when mysql is unavailable we can disable
filenode fetching completely and just return null linknodes. So the goal of this stack is to
add a knob (i.e. a tunable) that can turn things filenode fetches on and off, and make
sure the rest of the code deals nicely with this situation.

Now, about this diff. In order to force callers to deal with the fact that
filenodes might unavailable I suggest to add a special type of result, which (in
later diffs) will be returned by every filenodes methods.

This diff just introduces the FilenodeResult and convert BlobRepo filenode
methods to return it. The reason why I converted BlobRepo methods first
is to limit the scope of changes but at the same time show how the callers' code will look
like after FilenodeResult is introduced, and get people's thoughts of whether
it's reasonable or not.

Another important change I'd like to introduce in the next diffs is modifying FilenodesOnlyPublic
derived data to return success if filenodes knob is off. If we don't do that
then any attempt to derive filenodes might fail which in turn would lead to the
same problem we have right now - people won't be able to do hg update/hg
pull/etc if mysql is down.

[1] null linknodes might make some client side operation slower (e.g. hg rebase/log/blame),
so we should use it only in sev-like situations

Reviewed By: krallin

Differential Revision: D21787848

fbshipit-source-id: ad48d5556e995af09295fa43118ff8e3c2b0e53e
2020-06-01 05:27:34 -07:00
Thomas Orozco
6215d7daa4 common/rust/{scuba, scribe}: use a Duration for flush
Summary:
`flush()` takes a timeout, which is in milliseconds. However, that's not super
obvious, so that means whoever is using this has to re-discover it. Let's make
it explicit, and use a `Duration` argument instead.

(In fact, some places even got it wrong, notably common/rust/cli_usage)

Reviewed By: farnz

Differential Revision: D21783991

fbshipit-source-id: 6e3ac7c22b5c3297b41d5d61373ee077f12d5dd4
2020-06-01 01:38:58 -07:00
Arun Kulshreshtha
10fa44290e edenapi: use array to specify keys in history requests
Summary: Update the JSON format for history requests to use an array rather than an object to represent keys, for the same reason as D21412989. (Namely, that it's possible for two keys to share the same path, making the path unsuitable for use as a field name in a JSON object.)

Reviewed By: xavierd

Differential Revision: D21782763

fbshipit-source-id: eb04013795d1279ecbf00a8a0be106318695bd05
2020-05-29 15:45:10 -07:00
Kostia Balytskyi
78bf80c13a metaconfig: learn about version_name for CommitSyncConfig
Summary:
This diff adds support for the `version_name` field, coming from the
`commitsyncmap` config, stored in the configerator.

Note: ATM, this field is optional in the thrift config, but once we get past
the initial deployment stage, I expect it to be present always. This is why
in `CommmitSyncConfig` I make it `String` (with a default value of `""`) rather
than `Option<String>`. The code, which will be writing this value into
`synced_commit_mapping` should not ever care whether it's present or not, since
every mapping should always have a `version_name` present.

Reviewed By: StanislavGlebik

Differential Revision: D21764641

fbshipit-source-id: 35a7f487acf0562b309fd6d1b6473e3b8024722d
2020-05-29 09:31:33 -07:00
Egor Tkachenko
f12a6d9a77 Revert "rust/thrift: add an option to stop processing requests if client disconnected"
Summary: Reverting as D20763778 is a suspect in causing thrift_server_overload T67609407

Reviewed By: StanislavGlebik

Differential Revision: D21762641

fbshipit-source-id: 545b448afc0954271a2e29d1d3b48fdb959e3d3d
2020-05-28 07:41:09 -07:00
Stanislau Hlebik
7ef7fca580 mononoke: support ManualMove in hg sync job
Reviewed By: krallin

Differential Revision: D21762295

fbshipit-source-id: 4ab66634a0e9976ca2c0de4b2841c0b9bf4afd35
2020-05-28 07:24:05 -07:00
Mark Thomas
354c42f32c tests_utils: add_file content can take ownership
Summary:
Change the signature of `CreateCommitContext::as_file` and its associated
functions so that content is `impl Into<String>`, rather than
`impl AsRef<str>`.  The content will immediately be converted to a `String`
anyway, so we can avoid a string copy if the caller already has a string that
can be moved.

Reviewed By: krallin

Differential Revision: D21743429

fbshipit-source-id: d54914386439489fe4e47e37ff9a75c52b1a0443
2020-05-28 06:22:33 -07:00
Mark Thomas
807b7a4261 test_utils: add drawdag support
Summary:
Add support for drawdag in Mononoke unit tests.  Tests can use ASCII DAGs to construct
commit graphs, and can optionally customize the content of each commit.

Reviewed By: krallin

Differential Revision: D21743431

fbshipit-source-id: 9e6a52d1efe67ef4a5519ed7783f953fef7358f1
2020-05-28 06:22:33 -07:00
Mark Thomas
e0228de44a config: sync from configerator
Reviewed By: mitrandir77

Differential Revision: D21742559

fbshipit-source-id: f5e8e8802ddd2a5c760ece341b47b64da1b61277
2020-05-28 03:03:44 -07:00
Mark Thomas
ce103a7886 metaconfig: don't pattern destructure RawInfinitepushParams
Summary:
The parser currently uses pattern destructuring for `RawInfinitepushParams`.  This will break
if new fields are added to this structure.  Instead, use field access like the other raw
params parsers.

Reviewed By: mitrandir77

Differential Revision: D21742558

fbshipit-source-id: 6bfbb080a5e5cdbb02519855472f4df80f9d7453
2020-05-28 03:03:44 -07:00
Stanislau Hlebik
57f593f988 configerator-thrift-updater: sync D21743489
Reviewed By: ikostia

Differential Revision: D21743524

fbshipit-source-id: f320c19d5451942ffa3bab73148a657c322ce637
2020-05-27 18:47:01 -07:00
Stanislau Hlebik
e4023eaf19 mononoke: remove UseExistingIfAvailable from hg sync job
Summary:
It was used only once for testing push redirection. We no longer need it, so
I'd like to delete it to remove this old code and also to make it easier to
support ManualMove bookmarks.

Differential Revision: D21745630

fbshipit-source-id: 362952d95edb923cc4b60359321b563c1e4961de
2020-05-27 13:38:02 -07:00
Stefan Filip
e44681e307 mononoke: add IdMap::get_last_entry
Summary: Useful for determining where an incremental building step left off.

Reviewed By: StanislavGlebik

Differential Revision: D21634698

fbshipit-source-id: e9b0473003c529d5c934754f1ece23df69c4be66
2020-05-26 07:37:11 -07:00
Kostia Balytskyi
01695d59e8 commitcloud_backfiller: add tests for forward and reverse fillers
Summary:
This diff extends the integration test for the forward filler to execute queue operations, as well as the core business logic.
It also adds a test for the reverse filler, which does the same, but in a different difection.

Reviewed By: krallin

Differential Revision: D21628705

fbshipit-source-id: fb4ee0ecacc990d073425f3f37f794f74c057ea2
2020-05-26 07:00:12 -07:00
Kostia Balytskyi
56eea5b6f6 commitcloud_backfiller: add reverse filler
Summary:
This diff finally introduce the continuous reverse filler. Specifically, this adds a cli (and underlying wiring) to operate the filler logic in the `reversefillerqueue` table.

To achieve this:
- the filler class is turned into a base class with two subclasses for the forward and reverse fillers
- the main file is renamed from `forwardfiller.py` into `filler.py`, to better reflect the independence of direction.

Reviewed By: krallin

Differential Revision: D21628259

fbshipit-source-id: 5676a162a62f0dc6fe80e6300b72d30370fc80b4
2020-05-26 07:00:12 -07:00
Stanislau Hlebik
7ccd28de14 scs_server: add track_history_across_deletions parameter
Reviewed By: krallin

Differential Revision: D21718738

fbshipit-source-id: b7dd7e05b1773ac1a442d89991549f0f97e1e55b
2020-05-26 05:38:55 -07:00
Alex Hornby
a7a4dcd046 mononoke: add devdb support to integration test runner
Summary: Add devdb support to integration test runner so that one can more easily inspect mysql test data, also makes it easier to run tests with locally modified schema.

Reviewed By: HarveyHunt

Differential Revision: D21645234

fbshipit-source-id: ec75d70ef59f04548c7346a122298567dd09c264
2020-05-26 02:36:12 -07:00
Stefan Filip
a2254714c3 mononoke: update bulkops::fetch_all_public_changesets to return commits in order
Summary:
At first glance people will assume that changesets are returned in the same
order that they were added in the database or that at least commits are
returned in a deterministic fashion. That didn't happen because the both
changeset ids and changeset entries were received without any order.
This diff updates the function to returns results in order they were added
to the database.

Reviewed By: krallin

Differential Revision: D21676663

fbshipit-source-id: 912e6bea0532796b1d8e44e47d832c0420d97bc1
2020-05-21 20:43:45 -07:00
Stefan Filip
cafbbcd9c8 mononoke: add MemIdMap
Summary:
This structure has similar functionality to the IdMap that is backed by SQL.
It is probably going to be useful for caching in the case of batch operations.

Reviewed By: quark-zju

Differential Revision: D21601820

fbshipit-source-id: 9c3ebc3e9dc92a59ce0908fc241bd2b97da88dca
2020-05-21 17:19:16 -07:00
Stefan Filip
b9a93d49e5 mononoke: bulk construction of the segmented changelog Dag
Summary:
`Dag::build_all_graph` will load the whole graph for a given repository and
construct the segmented changelog from it.

Reviewed By: StanislavGlebik

Differential Revision: D21538029

fbshipit-source-id: b4ba846bb2870ba73257bed6128b8e198a0aab3e
2020-05-21 17:19:15 -07:00
Kostia Balytskyi
471a6a71f6 clone: increase timeout to 4 hours
Reviewed By: krallin

Differential Revision: D21684250

fbshipit-source-id: 703adb039f839f5c04b592f3ae20eaf76c3f876e
2020-05-21 04:19:13 -07:00
Alex Hornby
1cf88010c2 mononoke: walker: switch to DashMap for walk state
Summary:
Change from CHashMap to DashMap for the walk state tracking.

DashMap is using slightly less memory in testing, and is slightly quicker in walk rate (number of graph nodes processed per second ).

Reviewed By: HarveyHunt

Differential Revision: D21662210

fbshipit-source-id: ea0601df17e0e596fd59b67d9d01d0dc4e90799b
2020-05-21 04:08:08 -07:00
Harvey Hunt
a2111e06e3 mononoke: Add cache hit and miss statistics to CacheBlob
Summary:
CacheBlob logs hit and miss stastics to Scuba. Let's add the same for
ODS.

Reviewed By: krallin

Differential Revision: D21640922

fbshipit-source-id: 8f7d17f048bf53bdc6cd8bda0384a51cae7b6a30
2020-05-21 03:19:12 -07:00
Harvey Hunt
0ee0131454 mononoke: Remove hit and miss metrics from CountedBlobstore
Summary:
CountedBlobstore is a Blobstore that wraps another blobstore in order
to report metrics about it, such as the number of gets or errors. It's commonly
used to wrap a CacheBlob blobstore, which itself is a caching wrapper over an
inner blobstore.

CountedBlobstore exposes metrics that are supposed to track the number of hits
or misses when fetching blobs. However, these metrics don't make sense as the
CountedBlobstore has no view into cache activity. These metrics actually report
the number of requests and the number of missing blobs rather than hits and
misses.

Remove these misleading counters.

Reviewed By: krallin

Differential Revision: D21640923

fbshipit-source-id: 07b9fc9864c70991415c2b84f35d631b702c17d1
2020-05-21 03:19:11 -07:00