Commit Graph

505 Commits

Author SHA1 Message Date
Thomas Orozco
c55140f290 mononoke/lfs_server: download: handle redaction
Summary:
When a client requests a blob that is redacted, we should tell them that
instead of returning a 500. This diff does that: we now return a `410 Gone` when
redacted content is accessed.
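
A minimal sketch of the mapping, with a made-up error type (the real LFS server error enum differs):

```
use http::StatusCode;

// Hypothetical error kinds, for illustration only.
enum ErrorKind {
    Redacted,
    Internal,
}

// Redacted content maps to 410 Gone rather than a generic 500.
fn status_for_error(e: &ErrorKind) -> StatusCode {
    match e {
        ErrorKind::Redacted => StatusCode::GONE,
        ErrorKind::Internal => StatusCode::INTERNAL_SERVER_ERROR,
    }
}
```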

Reviewed By: farnz

Differential Revision: D20897251

fbshipit-source-id: fc6bd75c82e0cc92a5dbd86e95805d0a1c8235fb
2020-04-08 11:58:09 -07:00
Thomas Orozco
0a21ab46c4 mononoke/lfs_server: ignore redaction errors in batch
Summary:
If a blob is redacted, we shouldn't crash in batch. Instead, we should return
that the blob exists, and let the download path return to the client the
information that the blob is redacted. This diff does that.

Reviewed By: HarveyHunt

Differential Revision: D20897247

fbshipit-source-id: 3f305dfd9de4ac6a749a9eaedce101f594284d16
2020-04-08 11:58:09 -07:00
Thomas Orozco
77149d7ee8 mononoke/lfs_server: don't return a 502 on batch error
Summary:
502 made a bit of sense since we can occasionally proxy things to upstream, but
it's not very meaningful because our inability to service a batch request is
never fully upstream's fault (it would not be a failure if we had everything
internally).

So, let's just return a 500, which makes more sense.

Reviewed By: farnz

Differential Revision: D20897250

fbshipit-source-id: 239c776d04d2235c95e0fc0c395550f9c67e1f6a
2020-04-08 11:58:09 -07:00
Thomas Orozco
ee45e76fcf mononoke/lfs_server: ignore failures from upstream if internal can satisfy
Summary:
I noticed this while doing some unrelated work on this code. Basically, if we
get an error from upstream, then we shouldn't return an error to the client
*unless* upstream being down means we are unable to satisfy their request
(meaning, we are unable to say whether a particular piece of content is
definitely present or definitely missing).

This diff fixes that. Instead of checking for a success when hearing from
upstream _then_ running our routing logic, let's instead only fail if in the
course of trying to route the client, we discover that we need a URL from
upstream AND upstream has failed.

Concretely, this means that if upstream blew up but internal has all the data
we want, we ignore the fact that upstream is down. In practice, internal is
usually very fast (because it's typically all locally-cached) so this is
unlikely to really occur in real life, but it's still a good idea to account
for this failure scenario.
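
A minimal sketch of that decision, with toy types standing in for the real blobstore plumbing:

```
// If internal can satisfy the request, an upstream failure is irrelevant;
// upstream's answer only becomes load-bearing when internal cannot decide.
fn resolve(
    internal_has: bool,
    upstream: Result<bool, String>, // Err means upstream is down
) -> Result<bool, String> {
    if internal_has {
        return Ok(true);
    }
    upstream
}
```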

Reviewed By: HarveyHunt

Differential Revision: D20897252

fbshipit-source-id: f5a8598e8a9da382d0d7fa6ea6a61c2eee8ae44c
2020-04-08 11:58:08 -07:00
Thomas Orozco
368d43cb71 mononoke_types: add Sha256 stubs
Summary: Like it says in the title.

Reviewed By: farnz

Differential Revision: D20897248

fbshipit-source-id: bf17ee8bdec85153eed3c8265304af79ec9a8877
2020-04-08 11:58:08 -07:00
Thomas Orozco
6130f1290f mononoke/blobrepo_factory: add a builder for test repos
Summary:
Right now we have a couple functions, but they're not easily composable. I'd
like to make the redacted blobs configurable when creating a test repo, but I
also don't want to have 2 new variants, so let's create a little builder for
test repos.

This should make it easier to extend in the future to add more customizability
to test repos, which should in turn make it easier to write unit tests :)

Reviewed By: HarveyHunt

Differential Revision: D20897253

fbshipit-source-id: 3cb9b52ffda80ccf5b9a328accb92132261616a1
2020-04-08 11:58:08 -07:00
Steven Troxler
10bf48e871 Extract async fn tail_one_iteration
Summary:
This asyncifies the internals of `subcommand_tail`, which
loops over a stream, by taking the operation performed in
the loop and making it an async function.

The resulting code saves a few heap allocations by reducing
clones, and is also *much* less indented, which helps with
readability.
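
A sketch of the resulting shape, with simplified types (the real loop processes tailer stream entries, not integers):

```
use futures::stream::{self, StreamExt};

// The loop body, extracted into its own async fn.
async fn tail_one_iteration(item: u64) -> Result<(), String> {
    let _ = item; // ... one iteration of work ...
    Ok(())
}

// The outer loop becomes a flat while-let over the stream.
async fn subcommand_tail(items: Vec<u64>) -> Result<(), String> {
    let mut items = stream::iter(items);
    while let Some(item) = items.next().await {
        tail_one_iteration(item).await?;
    }
    Ok(())
}
```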

Reviewed By: krallin

Differential Revision: D20664511

fbshipit-source-id: 8e81a1507e37ad2cc59e616c739e19574252e72c
2020-04-08 11:19:35 -07:00
Lukas Piatkowski
c7d12b648f mononoke/mercurial: make revlog crate OSS buildable
Reviewed By: krallin

Differential Revision: D20869309

fbshipit-source-id: bc234b6cfcb575a5dabdf154969db7577ebdb5c5
2020-04-08 09:49:11 -07:00
Simon Farnsworth
4135c567a8 Port over all hooks whose behaviour doesn't change from Mercurial form to Bonsai form
Summary: These hooks behave the same way in Mercurial and Bonsai form. Port them over to operate on the Bonsai form.

Reviewed By: krallin

Differential Revision: D20891165

fbshipit-source-id: cbcdf217398714642d2f2d6669376defe8b944d7
2020-04-08 08:59:01 -07:00
Simon Farnsworth
da7cbd7f36 Run Bonsai hooks as well as old-style hooks
Summary: Running hooks on Mercurial changesets isn't scalable long term - move the consumers of hooks to run on both forms for a transition period

Reviewed By: krallin

Differential Revision: D20879136

fbshipit-source-id: 4630cafaebbf6a26aa6ba92bd8d53794a1d1c058
2020-04-08 08:59:00 -07:00
Simon Farnsworth
c59ae3274b Teach hook loader to load new (Bonsai) form hooks
Summary: To use Bonsai-based hooks, we need to be able to load them. Make it possible.

Reviewed By: krallin

Differential Revision: D20879135

fbshipit-source-id: 9b44d7ca83257c8fc30809b4b65ec27a8e9a8209
2020-04-08 08:59:00 -07:00
Simon Farnsworth
b66d875fa5 Move hooks over from an internal representation based on HgChangesets to BonsaiChangesets
Summary: We want all hooks to run against the Bonsai form, not a Mercurial form. Create a second form of hooks (currently not used) which acts on Bonsai changesets. Later diffs in the stack will move us over to Bonsai only, and remove support for Mercurial-changeset-derived hooks

Reviewed By: krallin

Differential Revision: D20604846

fbshipit-source-id: 61eece8bc4ec5dcc262059c19a434d5966a8d550
2020-04-08 08:59:00 -07:00
Steven Troxler
afdb247802 Swap out a while loop instead of .and_then + .fold
Summary:
Thanks to StanislavGlebik for this idea: we can make the looping over
upload changesets into straightforward imperative code instead
of using `.and_then` + `.fold` by taking the next chunk in a
while loop.

The resulting code is probably easier to understand (depends whether
you come from a functional background, I guess), and it's less indented,
which is definitely more readable.

Reviewed By: StanislavGlebik

Differential Revision: D20881862

fbshipit-source-id: 7ecf76a2fae3eb0e6c24a1ee14e0684b6334b087
2020-04-08 08:19:32 -07:00
Steven Troxler
aabbd3b66a Minor cleanups of blobimport_lib/lib.rs
Summary:
A couple of minor improvements, removing some overhead:
 - We don't need to pass cloned structs to `derive_data_for_csids`,
   refs work just fine
 - We can strip out one of the boxing blocks by directly assigning
   an `async` block to `globalrevs_work`
   - We can't do the same for `synced_commit_mapping_work` because
     we have to iterate over `chunk` in synchronous code, so that
     `chunk` can later be consumed by the line defining `changesets`.

Reviewed By: StanislavGlebik

Differential Revision: D20863304

fbshipit-source-id: 14cad3324978a66bcf325b77df7803d77468d30b
2020-04-08 08:19:32 -07:00
Steven Troxler
814f428f03 Asyncify the max_rev code
Summary:
This wound up being a little tricky, because
`async move` blocks capture any data they use,
and most of the fields of the `Blobimport` struct
are values rather than refs.

The easiest solution that I came up with, which looks
a little weird but works better than anything else
I tried, is to just inject a little block of code
(which I commented so it will hopefully be clear to
future readers) taking refs of anything that we need
to use in an async block but also have available later.

In the process, we are able to strip out a layer of
clones, which should improve efficiency a bit.
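
The pattern in question, sketched with stand-in fields (the real `Blobimport` struct owns repo handles and config, not strings):

```
struct Blobimport {
    repo: String,
    prefix: String,
}

impl Blobimport {
    async fn import(&self) {
        // Take refs of anything we need both inside the async block and later,
        // so the `async move` captures the refs rather than the owned values.
        let repo = &self.repo;
        let prefix = &self.prefix;

        let max_rev = async move { format!("{}/{}", repo, prefix).len() }.await;

        // The owned fields are still available here; nothing was moved.
        println!("{}: {}", self.repo, max_rev);
    }
}
```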

Reviewed By: StanislavGlebik

Differential Revision: D20862358

fbshipit-source-id: 186bf9939b9496c432ff0d9a01e602da47f4b5d4
2020-04-08 08:19:32 -07:00
Lukas Piatkowski
2e7baa454b mononoke: cover more crates with OSS buildability that depend on cmdlib crate
Reviewed By: krallin

Differential Revision: D20735803

fbshipit-source-id: d4159d16384ff795717f6ccdd278b6b4af45d1ab
2020-04-08 03:09:07 -07:00
Lukas Piatkowski
8e9df760c5 mononoke: make cmdlib OSS buildable
Summary: Some methods that were unused or barely used outside of the cmdlib crate were made non-public (parse_caching, CachelibSettings, init_cachelib_from_settings).

Reviewed By: krallin

Differential Revision: D20671251

fbshipit-source-id: 232e786fa5af5af543239aca939cb15ca2d6bc10
2020-04-08 03:09:06 -07:00
Stefan Filip
d1ba21803a version: warn users when they are running an old build
Summary:
Old is defined by being based on a commit that is more than 30 days old.
The build date is taken from the version string.
One observation is that if we fail to release in more than 30 days then all
users will start seeing this message without any way of turning it off. It doesn't
seem worthwhile to add a config for silencing it though.
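
The check itself is a simple age comparison; a sketch, assuming the build timestamp has already been parsed out of the version string:

```
use std::time::{Duration, SystemTime};

const THIRTY_DAYS: Duration = Duration::from_secs(30 * 24 * 60 * 60);

// Warn if the build is based on a commit more than 30 days old.
fn is_old_build(build_time: SystemTime) -> bool {
    match SystemTime::now().duration_since(build_time) {
        Ok(age) => age > THIRTY_DAYS,
        Err(_) => false, // build time in the future: don't warn
    }
}
```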

Reviewed By: quark-zju

Differential Revision: D20825399

fbshipit-source-id: f97518031bbda5e2c49226f3df634c5b80651c5b
2020-04-07 14:25:38 -07:00
Stanislau Hlebik
b2a8862a9a mononoke: add a test backfill derived data
Summary:
I decided to go with an integration test because backfilling derived data at the
moment requires two separate calls - a first one to prefetch changesets, and a
second one to actually run the backfill. So an integration test is better suited
for this case than unit tests.

While doing so I noticed that fetch_all_public_changesets actually won't fetch
all changesets - it loses the last commit because t_bs_cs_id_in_range was
returning an exclusive range (i.e. max_id was not included). I fixed the bug and made the name clearer.
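
A toy illustration of the off-by-one (the real query runs in SQL over the changesets table, not over a slice):

```
// The buggy version used an exclusive upper bound (id < max_id), silently
// dropping the last commit; the fix makes the bound inclusive.
fn ids_in_range_inclusive(ids: &[u64], min_id: u64, max_id: u64) -> Vec<u64> {
    ids.iter()
        .copied()
        .filter(|id| (min_id..=max_id).contains(id))
        .collect()
}
```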

Reviewed By: krallin

Differential Revision: D20891457

fbshipit-source-id: f6c115e3fcc280ada26a6a79e1997573f684f37d
2020-04-07 08:44:25 -07:00
Aida Getoeva
2df76d79c8 mononoke/scs-log: add history stream terminator
Summary:
`log_v2` supports time filters, which means it needs to be able to drop the history stream once the commits get older than the given time frame (otherwise it just traverses the whole history...).
However, this cannot be done from the SCS commit_path API or from changeset_path, because they already receive a history stream where commits are not ordered by creation time. And the naive solution "if the next commit in the stream is older than `after_ts` then drop" won't work: there might be another branch (a commit after the current one) which is still **in** the time frame.

I added a terminator function to `list_file_history` that is called on a changeset id for which a new fastlog batch is going to be fetched. If the terminator returns true, then the fastlog is not fetched and the current history branch is dropped. All ready nodes are still streamed.
For example, if we have a history of the file changes like this:

```
      A 03/03    ^|
      |           |
      B 02/03     |
      |           |  - one fastlog batch
      C 01/03     |
      | \         |
02/01 D  E 10/02 _|  - let's assume that fastlog batches for D and E ancestors need to be prefetched
      |  |
01/01 F  G 05/02
```

# Example 1

We query "history from A after time 01/02"

The old version would fetch all the commits and then filter them in `commit_path`. We would fetch both fastlog batches for the D branch and E branch.

With the terminator, `list_file_history` will call terminator on commit D and get `true` in return and then will drop the D branch,
then it will call terminator on E and get `false` and proceed with fetching fastlog for the E branch.

# Example 2

We query "history from A after time 01/04"

The old version would fetch all the commits and then filter them in `commit_path`, despite the fact that
the very first commit is already older than needed.

With the terminator it will call terminator on A and get `true` and won't proceed any further.
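
A toy sketch of the terminator's shape (names are made up; the real function works on changeset ids and their fastlog batches):

```
type ChangesetId = u64;

// Called just before a fastlog batch would be fetched for `cs_id`;
// a terminator returning true drops that history branch instead of fetching.
fn should_fetch_batch(cs_id: ChangesetId, terminator: &dyn Fn(ChangesetId) -> bool) -> bool {
    !terminator(cs_id)
}

fn main() {
    let after_ts = 100;
    // Toy terminator: the id doubles as the commit timestamp here.
    let terminator = move |cs_id: ChangesetId| cs_id < after_ts;
    assert!(!should_fetch_batch(50, &terminator)); // too old: branch dropped
    assert!(should_fetch_batch(150, &terminator)); // in the time frame: fetch
}
```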

Reviewed By: StanislavGlebik

Differential Revision: D20801029

fbshipit-source-id: e637dcfb6fddceb4a8cfc29d08b427413bf42e79
2020-04-07 07:08:24 -07:00
Aida Getoeva
2dcfcbac62 mononoke/fastlog: asyncify part of ops
Summary: Asyncified the main functions of fastlog/ops, so it'd be easier to modify them and proceed with the new features.

Reviewed By: StanislavGlebik

Differential Revision: D20801028

fbshipit-source-id: 2a03eedca776c6e1048a72c7bd613a6ef38c5c17
2020-04-07 07:08:24 -07:00
Thomas Orozco
7e2ad0b529 mononoke/fastreplay: handle Gettreepack for designated nodes
Summary: We need to parse `directories` here. Let's do so.

Reviewed By: HarveyHunt

Differential Revision: D20869830

fbshipit-source-id: 74830aa0045b801fba089812447fb61d7d09ad14
2020-04-07 04:36:07 -07:00
Thomas Orozco
edadb9307a mononoke/repo_client: record depth
Summary: As it says in the title!

Reviewed By: HarveyHunt

Differential Revision: D20869828

fbshipit-source-id: df7728ce548739ef2dadad1629817fb56c166b66
2020-04-07 04:36:06 -07:00
Thomas Orozco
0e7cbcf453 mononoke/repo_client: use wireproto encoding for directories
Summary:
We use the logged arguments directly for wireproto replay, and then we replay
this directly in traffic replay, but just joining a list with `,` doesn't
actually work for directories:

- We need trailing commas
- We need wireproto encoding

This does that. It also clarifies that this encoding is for debug purposes by
updating function names, and relaxes a bunch of types (since hgproto uses
bytes_old).
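
A hedged sketch of the two concrete points named above (per-entry escaping plus a trailing comma); the escape table follows hg's batch-argument escaping and is an assumption:

```
fn escape_arg(s: &str) -> String {
    s.replace(':', "::")
        .replace(',', ":,")
        .replace(';', ":;")
        .replace('=', ":=")
}

// A plain `join(",")` is not enough: every entry, including the last,
// is followed by a comma, and entries must be escaped first.
fn encode_directories<'a>(dirs: impl Iterator<Item = &'a str>) -> String {
    let mut out = String::new();
    for dir in dirs {
        out.push_str(&escape_arg(dir));
        out.push(',');
    }
    out
}
```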

Reviewed By: StanislavGlebik

Differential Revision: D20868630

fbshipit-source-id: 3b805c83505aefecd639d4d2375e0aa9e3c73ab9
2020-04-07 04:36:06 -07:00
Thomas Orozco
1c982d5258 mononoke/unbundle_replay: report size of the unbundle
Summary: This is helpful to draw conclusions as to how fast it is.

Reviewed By: StanislavGlebik

Differential Revision: D20872108

fbshipit-source-id: d323358bbba29de310d6dfb4c605e72ce550a019
2020-04-07 01:05:32 -07:00
Aida Getoeva
514a72e835 mononoke/fastlog: yield all ready nodes
Summary:
`list_file_history` implements BFS on the commit graph and returns a stream of changeset ids using `bounded_traversal_stream`.
The old version iterated BFS "levels" and each iteration streamed all nodes on the current level. For example, for the commit graph:
```
1 <- start # 1 level
|
2          # 2
| \
3  4       # 3
|  |
```
there would be 3 iterations, and on each one nodes would be yielded: [1], [2], [3, 4]. If fastlog batches needed to be prefetched, they were prefetched in parallel for the changesets on the same level.

The implementation was a bit hacky and it is a bit unfortunate that we need to make 100 iterations to stream changesets that are ready and do not require fetching fastlog. I also needed some simplification so I could then add a terminator function (3rd diff in the stack) on the fastlog batch prefetching stage (and add Deleted Manifest integration).
So now `bounded_traversal_stream` keeps a BFS queue as its state and on each iteration streams all nodes in the queue until it hits a node for which it needs to prefetch a fastlog batch, then goes to the next iteration.
```
state - [queue, prefetch_cs_id]
* on each iteration:
  1. If prefetch_cs_id.is_some() =>
        - fetch fastlog batch for prefetch_cs_id
        - fill the commit graph
        - add parents of the prefetch_cs_id to the bfs queue
  2. Get from the queue all nodes until we meet changeset without fetched parents.
      Mark this node as `prefetch_cs_id` for the next iteration.
  3. Stream ready nodes and go to the next iteration.
```
Thus
- we still fetch fastlog batches on demand and not before we really need them
- if we have 100 ready-to-be-yielded commits in the queue, we stream them in one go instead of doing 100 iterations
- if we now need to prefetch fastlog batches for 2 branches on the same "bfs level", we will do it one by one and not in parallel, but this situation is pretty uncommon
- the code is simpler and allows integrating Deleted Manifest and adding a terminator function.
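
A sketch of one iteration of that state machine, with toy types:

```
use std::collections::VecDeque;

type ChangesetId = u64;

struct State {
    queue: VecDeque<ChangesetId>,
    prefetch_cs_id: Option<ChangesetId>,
}

// Drain every ready node from the queue in one go; stop at the first node
// whose parents still need a fastlog prefetch and defer it to the next
// iteration as `prefetch_cs_id`.
fn step(
    state: &mut State,
    has_fetched_parents: impl Fn(ChangesetId) -> bool,
) -> Vec<ChangesetId> {
    let mut ready = Vec::new();
    while let Some(&cs_id) = state.queue.front() {
        if has_fetched_parents(cs_id) {
            state.queue.pop_front();
            ready.push(cs_id);
        } else {
            state.prefetch_cs_id = Some(cs_id);
            break;
        }
    }
    ready
}
```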

Reviewed By: StanislavGlebik

Differential Revision: D20768401

fbshipit-source-id: cdba40539a842b3628826f6c72a29514da0d539e
2020-04-06 21:13:08 -07:00
Steven Troxler
ddcb9109ba Asyncify outermost logic in Blobimport::import
Summary:
In the previous diff we asyncified the signature of Blobimport::import,
but the body remained an old-style future with a compat and await at
the end.

This diff asyncifies the outermost logic from within the function,
slightly improving readability and removing one layer of clones
to cut down on heap allocations. The derivation of `max_rev` still
currently uses old-style streams and futures.

Reviewed By: StanislavGlebik

Differential Revision: D20861230

fbshipit-source-id: 1b462f17581c764e77a0a0a163c86ffa894df742
2020-04-06 15:26:01 -07:00
Steven Troxler
17717db851 Push compat down one layer in Blobimport::import
Summary:
Switch the Blobimport struct to take a reference to a ctx,
and have `import` be an `async fn`.

Reviewed By: StanislavGlebik

Differential Revision: D20861165

fbshipit-source-id: eda9d599af2e525ec3142facc1eeb6b5b433ab06
2020-04-06 15:26:01 -07:00
Alex Hornby
d7be844de5 mononoke: add default implementations for samplingblobs put and is_present handlers
Summary: Add default implementations for samplingblobs put and is_present handlers to save some boilerplate

Reviewed By: farnz

Differential Revision: D20868507

fbshipit-source-id: 40275cc832870019238c0635e097e53671b76783
2020-04-06 10:01:38 -07:00
Mateusz Kwapich
4163e2937f use operation keys for selects
Summary:
This way we'll never select more than (no_of_stores * limit) rows, rather than
a potentially unbounded output.

NOTE: This diff has to be landed and rolled out **after** D20557702 is rolled out. I'm assuming that some time after the D20557702 rollout all the rows in the production db will have a proper `operation_key` value set, so we can make queries based on them.

Reviewed By: krallin

Differential Revision: D20557700

fbshipit-source-id: 5a1d4b69949b425915214f5227c5c0dcce374360
2020-04-06 09:57:24 -07:00
Mateusz Kwapich
549eb41059 run blobstore healer integration test with mysql
Summary: So we're sure that all the queries work not only in sqlite.

Reviewed By: krallin

Differential Revision: D20839958

fbshipit-source-id: 9d05cc175d65396af7495b31f8c6958ac7bd8fb6
2020-04-06 09:57:24 -07:00
Mateusz Kwapich
975ddc043f test showing the problem with repeating blob_keys
Summary:
When we have multiple entries with a single blob key we always select all of them
regardless of when and how they were added. That's why we need to select based on
operation_key.

Reviewed By: krallin

Differential Revision: D20557699

fbshipit-source-id: 77ccf992bb24d1a46ea28a13ab0780e6c92935ae
2020-04-06 09:57:24 -07:00
Alex Hornby
ba8e6e0d1c mononoke: walker: log errors to scuba regardless of the error_as_data setting
Summary: Log errors to scuba regardless of the error_as_data setting, as finding the logs is much easier in scuba than on stderr.

Reviewed By: farnz

Differential Revision: D20838462

fbshipit-source-id: b78e3a3213ed4aee4e4b2feb871ad7e42e25ed00
2020-04-06 03:17:40 -07:00
Stanislau Hlebik
bd9fe4db1d mononoke: add missing telemetry to phases
Reviewed By: krallin

Differential Revision: D20839750

fbshipit-source-id: ea1f329cb0cc015461146428601a22685293bfc4
2020-04-03 10:38:42 -07:00
Stanislau Hlebik
0100cd75ee mononoke: asyncify all phases except for trait
Reviewed By: krallin

Differential Revision: D20839242

fbshipit-source-id: 75c5a8f9967c2c71b7e36b74abe151df142fcbab
2020-04-03 10:38:42 -07:00
Stanislau Hlebik
7b26350b74 mononoke: asyncify get_public_derive
Reviewed By: krallin

Differential Revision: D20838069

fbshipit-source-id: 3cda39571fd191f40663da3f1dd51bc03d86e250
2020-04-03 10:38:41 -07:00
Stanislau Hlebik
9ed388e34c mononoke: move phases store to new futures
Reviewed By: krallin

Differential Revision: D20837913

fbshipit-source-id: ddcbce9518255d9dda2cf09b61fdd4939cef5258
2020-04-03 10:38:41 -07:00
Stanislau Hlebik
5bc98d60db mononoke: asyncify test
Reviewed By: farnz

Differential Revision: D20837893

fbshipit-source-id: 43ca705ce2ada1532330da89d69392c6b97b5129
2020-04-03 09:12:18 -07:00
Kostia Balytskyi
717d82a828 unbundle processing: add stats reporting
Summary:
Combined with the unbundle resolver stats, we will be able to say what
percentage of pushrebases fails, for example.

Reviewed By: StanislavGlebik

Differential Revision: D20818224

fbshipit-source-id: 70888b1cb90ffae8b11984bb024ec1db0e0542f7
2020-04-03 09:05:59 -07:00
Kostia Balytskyi
cfadb57637 resolver.rs: report which kind of unbundle was resolved if any
Summary:
We need this to be able to monitor how frequently we get pushes vs
infinitepushes, etc. A further diff will add similar reporting to
`processing.rs`, so that we can compute a percentage of successful pushes to
all pushes, for example.

Reviewed By: StanislavGlebik

Differential Revision: D20818225

fbshipit-source-id: 7945dc285560d1357bdc6aef8e5fe50b61622254
2020-04-03 09:05:58 -07:00
Stanislau Hlebik
a47bb8c5e1 mononoke: use caching in phases more efficiently
Summary:
Our phases caching wasn't great. If you tried to ask for a draft commit then
we'd call the mark_reachable_as_public method, but this method was bypassing
caches.

The reason why we had this problem was that we had caching on a higher level
than necessary - we had the SqlPhases struct, which was "smarter" (i.e. it has
the logic for traversing ancestors of public heads and marking these ancestors
as public), and SqlPhasesStore, which just did sql access. Previously we had our
caching layer on top of SqlPhases, meaning that when SqlPhases calls
`mark_reachable_as_public` it can't use caches anymore.

This diff fixes it by moving caching one layer lower - now we have a cache
right on top of SqlPhasesStore. Because of this change we no longer need
CachingPhases, and they were removed. Also, the `ephemeral_derive` logic was
simplified a bit.
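
The layering change, sketched with in-memory stand-ins for both the SQL store and the cache:

```
use std::collections::HashMap;

struct SqlPhasesStore {
    rows: HashMap<u64, bool>, // cs_id -> is_public
}

// The cache now sits directly on top of the SQL store, so any higher-level
// logic (including mark_reachable_as_public) goes through it.
struct CachedPhasesStore {
    cache: HashMap<u64, bool>,
    inner: SqlPhasesStore,
}

impl CachedPhasesStore {
    fn is_public(&mut self, cs_id: u64) -> bool {
        if let Some(&cached) = self.cache.get(&cs_id) {
            return cached; // hit: no SQL round trip
        }
        let value = *self.inner.rows.get(&cs_id).unwrap_or(&false);
        self.cache.insert(cs_id, value);
        value
    }
}
```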

Reviewed By: krallin

Differential Revision: D20834740

fbshipit-source-id: 908b7e17d6588ce85771dedf51fcddcd2fabf00e
2020-04-03 08:23:38 -07:00
Stanislau Hlebik
016f24b93e mononoke: asyncify mark_reachable_as_public
Reviewed By: krallin

Differential Revision: D20836348

fbshipit-source-id: 1f30e69f9b3f47967a54ab0bf70c6f40944098b1
2020-04-03 08:23:38 -07:00
Stanislau Hlebik
74c7d0b11f mononoke: use MemcacheHandler
Summary:
Very small refactoring to store a MemcacheHandler (i.e. an enum which can either
be a real Memcache client or a mock) instead of a memcache client.
It will be used in the next diff to create mock caches.
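
The shape of that enum, sketched with a placeholder for the real client type:

```
use std::collections::HashMap;

struct RealClient; // stand-in for the actual memcache client

enum MemcacheHandler {
    Real(RealClient),
    Mock(HashMap<String, Vec<u8>>),
}

impl MemcacheHandler {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        match self {
            MemcacheHandler::Real(_client) => unimplemented!("would query memcache"),
            MemcacheHandler::Mock(map) => map.get(key).cloned(),
        }
    }
}
```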

Reviewed By: krallin

Differential Revision: D20834916

fbshipit-source-id: cb1e3e8f0ae0e2c0f7018d3a003ada56725c65c6
2020-04-03 04:20:53 -07:00
Stanislau Hlebik
8afcb5aaa3 mononoke: remove SelectPhase method
Summary: SelectPhases does the same thing - no need to keep two queries

Reviewed By: krallin

Differential Revision: D20817379

fbshipit-source-id: 8cc56ea4a94e81f110a286899a8f5c596566a142
2020-04-03 04:20:53 -07:00
Stanislau Hlebik
8ff854c2dc mononoke: move SqlPhasesStore to a separate file
Summary: I'm going to refactor it soon, for now just move it to another file.

Reviewed By: krallin

Differential Revision: D20817293

fbshipit-source-id: 6fb44b4be858ebbd0e8c9dfee160b91806f78285
2020-04-03 04:20:53 -07:00
David Tolnay
1a86366f0e third-party/rust: Turn off async-trait/support_old_nightly
Summary:
This diff turns off the support_old_nightly feature of async-trait (https://github.com/dtolnay/async-trait/blob/0.1.24/Cargo.toml#L28-L32) everywhere in fbcode. I am getting ready to remove the feature upstream. It was an alternative implementation of async-trait that produces worse error messages but supports some older toolchains dating back to before stabilization of async/await that the default implementation does not support.

This diff includes updating async-trait from 0.1.24 to 0.1.29 to pull in fixes for some patterns that used to work in the support_old_nightly implementation but not the default implementation.

Differential Revision: D20805832

fbshipit-source-id: cd34ce55b419b5408f4f7efb4377c777209e4a6d
2020-04-02 17:01:24 -07:00
Xavier Deguillard
29727102db memcache: don't panic if Memcache fails to initialize
Summary: Simply return an error when that happens.

Reviewed By: dtolnay

Differential Revision: D20808660

fbshipit-source-id: 94ca1c6de5739e4e67f2db6be547ed92c5696e43
2020-04-02 10:07:23 -07:00
Alex Hornby
a156633c1f mononoke: add sampling_fingerprint to hash types
Summary:
Add a fingerprint method that returns a subset of the hash.

This will allow us to see compression benefit, or write out a corpus, sampling 1 in N of a group of keys.
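
A sketch of what such a fingerprint might look like, assuming it is simply a fixed-size prefix of the hash bytes:

```
use std::convert::TryInto;

struct Sha256([u8; 32]);

impl Sha256 {
    // A cheap, deterministic subset of the hash.
    fn sampling_fingerprint(&self) -> u64 {
        u64::from_le_bytes(self.0[..8].try_into().unwrap())
    }
}

// Sample roughly 1 in N keys deterministically.
fn sample_1_in_n(hash: &Sha256, n: u64) -> bool {
    hash.sampling_fingerprint() % n == 0
}
```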

Reviewed By: krallin

Differential Revision: D20541312

fbshipit-source-id: 93bd44ba9c14285daf50d8cd18eeb4b6dcc38d82
2020-04-02 09:08:05 -07:00
Alex Hornby
7060cd47d6 mononoke: walker: use sampling blobstore in compression-benefit
Summary:
Use the new sampling blobstore and sampling key in the existing compression-benefit subcommand and check the new vs old reported sizes.

The overall idea for these changes is that the walker uses a CoreContext tagged with a SamplingKey to correlate walker steps for a node to the underlying blobstore reads. This allows us to track overall byte size (used in scrub stats) or the data itself (used in compression benefit) per node type.

The SamplingVisitor and NodeSamplingHandler cooperate to gather the sampled data into the maps in NodeSamplingHandler, which the output stream from the walk then operates on, e.g. to compress the blobs and report on compression benefit.

The main new logic sits in sampling.rs, it is used from sizing.rs (and later in stack from scrub.rs)

Reviewed By: krallin

Differential Revision: D20534841

fbshipit-source-id: b20e10fcefa5c83559bdb15b86afba209c63119a
2020-04-02 09:08:05 -07:00
Mark Thomas
5fb25674ea sql_ext: remove SqlConstructors
Summary: Now that everything is using `sql_construct`, we can remove the old `SqlConstructors` trait.

Reviewed By: StanislavGlebik

Differential Revision: D20734881

fbshipit-source-id: af46b41d17b40f6eb0839cdb9e85b00067360fe9
2020-04-02 05:27:16 -07:00
Mark Thomas
640f272598 migrate from sql_ext::SqlConstructors to sql_construct
Summary:
Migrate the configuration of sql data managers from the old configuration using `sql_ext::SqlConstructors` to the new configuration using `sql_construct::SqlConstruct`.

In the old configuration, sharded filenodes were included in the configuration of remote databases, even when that made no sense:
```
[storage.db.remote]
db_address = "main_database"
sharded_filenodes = { shard_map = "sharded_database", shard_num = 100 }

[storage.blobstore.multiplexed]
queue_db = { remote = {
    db_address = "queue_database",
    sharded_filenodes = { shard_map = "valid_config_but_meaningless", shard_num = 100 }
} }
```

This change separates out:
* **DatabaseConfig**, which describes a single local or remote connection to a database, used in configuration like the queue database.
* **MetadataDatabaseConfig**, which describes the multiple databases used for repo metadata.

**MetadataDatabaseConfig** is either:
* **Local**, which is a local sqlite database, the same as for **DatabaseConfig**; or
* **Remote**, which contains:
    * `primary`, the database used for main metadata.
    * `filenodes`, the database used for filenodes, which may be sharded or unsharded.

More fields can be added to **RemoteMetadataDatabaseConfig** when we want to add new databases.

New configuration looks like:
```
[storage.metadata.remote]
primary = { db_address = "main_database" }
filenodes = { sharded = { shard_map = "sharded_database", shard_num = 100 } }

[storage.blobstore.multiplexed]
queue_db = { remote = { db_address = "queue_database" } }
```

The `sql_construct` crate facilitates this by providing the following traits:

* **SqlConstruct** defines the basic rules for construction, and allows construction based on a local sqlite database.
* **SqlShardedConstruct** defines the basic rules for construction based on sharded databases.
* **FbSqlConstruct** and **FbShardedSqlConstruct** allow construction based on unsharded and sharded remote databases on Facebook infra.
* **SqlConstructFromDatabaseConfig** allows construction based on the database defined in **DatabaseConfig**.
* **SqlConstructFromMetadataDatabaseConfig** allows construction based on the appropriate database defined in **MetadataDatabaseConfig**.
* **SqlShardableConstructFromMetadataDatabaseConfig** allows construction based on the appropriate shardable databases defined in **MetadataDatabaseConfig**.

Sql database managers should implement:

* **SqlConstruct** in order to define how to construct an unsharded instance from a single set of `SqlConnections`.
* **SqlShardedConstruct**, if they are shardable, in order to define how to construct a sharded instance.
* If the database is part of the repository metadata database config, either of:
    * **SqlConstructFromMetadataDatabaseConfig** if they are not shardable.  By default they will use the primary metadata database, but this can be overridden by implementing `remote_database_config`.
    * **SqlShardableConstructFromMetadataDatabaseConfig** if they are shardable.  They must implement `remote_database_config` to specify where to get the sharded or unsharded configuration from.
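
A hedged sketch of what a manager's implementation might look like; the trait items shown are assumptions based on the description above, not the crate's exact API:

```
struct SqlConnections; // stand-in for the real connection bundle

trait SqlConstruct: Sized {
    const LABEL: &'static str;
    fn from_sql_connections(connections: SqlConnections) -> Self;
}

struct SqlPhasesStore {
    connections: SqlConnections,
}

impl SqlConstruct for SqlPhasesStore {
    const LABEL: &'static str = "phases";
    fn from_sql_connections(connections: SqlConnections) -> Self {
        Self { connections }
    }
}
```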

Reviewed By: StanislavGlebik

Differential Revision: D20734883

fbshipit-source-id: bb2f4cb3806edad2bbd54a47558a164e3190c5d1
2020-04-02 05:27:16 -07:00
Mark Thomas
662f8bfb1c sql_construct: add crate for constructing sql data managers
Summary:
Refactor `sql_ext::SqlConstructors` and its related traits into a separate
crate.  The new `SqlConstruct` trait is joined by `SqlShardedConstruct` which
allows construction of sharded databases.

The new crate will support a new configuration model where the distinction
between the database configuration for different repository metadata types
will be made clear.

Reviewed By: StanislavGlebik

Differential Revision: D20734882

fbshipit-source-id: b44cf9d1efd014c29df88e2ad933025e440119dc
2020-04-02 05:27:16 -07:00
Jun Wu
2aec2dbcb6 commitcloud: migrate to tech-debt-free repo.pull for pulling
Summary:
The new API does nothing that cloud sync does not want: bookmarks, obsmarkers,
prefetch, etc. Wrappers to disable features are removed.

This solves a "lagged master" issue where selectivepull adds `-B master` to
pull extra commits but cloud sync cannot hide them without narrow-heads. Now
cloud sync just does not pull the extra commits.

Reviewed By: sfilipco

Differential Revision: D20808884

fbshipit-source-id: 0e60d96f6bbb9d4ce02c04e8851fc6bda442c764
2020-04-01 19:40:57 -07:00
Stanislau Hlebik
fb9f7fe931 mononoke: use rollout_smc_tier option
Summary:
For the initial rollout of lfs on fbsource we want to roll it out just for our
team using the rollout_smc_tier option. This diff adds support for that in
Mononoke.

It spawns a future that periodically updates the list of enabled hosts in the smc tier.
I had a slight concern about listing all the available services and storing
them in memory - what if the smc tier has too many services? I decided to go ahead
with that because
1) The [Smc antipatterns](https://fburl.com/wiki/ox43ni3a) wiki page doesn't seem
to list it as a concern.
2) We are unlikely to use it for a large tier - most likely we'll use it just for
hg-dev, which contains < 100 hosts.
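
A sketch of that background refresh, with tokio stand-ins rather than Mononoke's actual runtime and SMC client:

```
use std::collections::HashSet;
use std::sync::{Arc, RwLock};
use std::time::Duration;

async fn fetch_smc_tier_hosts() -> HashSet<String> {
    HashSet::new() // stand-in for the SMC lookup
}

// Keeps the shared host set fresh; would be spawned once at startup,
// e.g. with `tokio::spawn(refresh_loop(hosts))`.
async fn refresh_loop(hosts: Arc<RwLock<HashSet<String>>>) {
    loop {
        let latest = fetch_smc_tier_hosts().await;
        *hosts.write().unwrap() = latest;
        tokio::time::sleep(Duration::from_secs(60)).await;
    }
}
```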

Reviewed By: krallin

Differential Revision: D20789751

fbshipit-source-id: d35323e49530df6983e159e2ed5bce205cc5666d
2020-04-01 10:00:52 -07:00
Stanislau Hlebik
c65db9c551 mononoke: configerator-thrift-update
Reviewed By: farnz

Differential Revision: D20789686

fbshipit-source-id: 13033d5cb4239d97db70a5f4a89014ea8c9f07c4
2020-04-01 10:00:51 -07:00
Stanislau Hlebik
03c73035cb mononoke: use more efficient copy fetching in scs server
Summary:
This is a followup from D20766465. In D20766465 we've avoided re-traversing
fsnodes for all entries except for copy/move sources. This diff makes copy/move
source fetching more efficient as well.

It does so by sending a find_entries() request to prefetch all fsnodes.

Reviewed By: mitrandir77

Differential Revision: D20770182

fbshipit-source-id: 7e4a68a2ded20b2895ee4d1c4f8fd897dbe1c850
2020-04-01 06:00:26 -07:00
Pavel Aslanov
b2c81c6d63 validate chunks sizes in the streaming clone implementation
Summary:
We are currently having problems with streaming clone:
```
$ hg --config 'extensions.fsmonitor=!' clone --shallow -U --config 'ui.ssh=ssh -oControlMaster=no' --configfile /etc/mercurial/repo-specific/fbsource.rc 'ssh://hg.vip.facebook.com//data/scm/fbsource?force_mononoke' "$(pwd)/fbsource-clone-test"
remote: server: https://fburl.com/mononoke
remote: session: vJ3qkiQIm9FT7mCp
connected to twshared11499.02.cln2.facebook.com
streaming all changes
2 files to transfer, 5.42 GB of data
abort: unexpected response from remote server:
'\x00\x01B?AB\x00\x00\x00\x00\x02U\x00\x00\x02\xc7\x00b\xf0\xd5\x00b\xf0\xd5\x00b\xf0\xd4\xff\xff\xff\xff\xa8z\xc7W\xd0&\xab\xb2\xf1{\xbfq\xac<\xaf6W\x06q\x81\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?C\x97\x00\x00\x00\x00\x053\x00\x00\x06\xce\x00b\xf0\xd6\x00b\xf0\xd6\x00b\xf0\xd5\xff\xff\xff\xff\xa3I\x19+\xe2\x0f\xae\xd2\x95\x14\x8a\xde\x19\x18\xf0\x8cUQu\xf1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?H\xca\x00\x00\x00\x00\x02\xe4\x00\x00\x03\x9e\x00b\xf0\xd7\x00b\xf0\xd7\x00b\xf0\xd6\xff\xff\xff\xffx\xd6}\x12nt\xb9\xbc(\x83\xfb\xfa\xcc\xc1o?\xde\xcc\x06L\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?K\xae\x00\x00\x00\x00\x02j\x00\x00\x02\xb5\x00b\xf0\xd8\x00b\xf0\xd8\x00b\xf0\xd7\xff\xff\xff\xff\x04"\xfcw6\'M\xba\xf1f\xdb\x02\xbeE\x93:\xc8\x17\x88P\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?N\x18\x00\x00\x00\x00\x03\xbb\x00\x00\x04\xb8\x00b\xf0\xd9\x00b\xf0\xd9\x00b\xf0\xd8\xff\xff\xff\xff\xb9\x15*p/\xa4*\x00\x9dZw\x01B\x87L\x8f\x08\x11\x89\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0000changelog.d\x005406413267\n'
```
As a result of the debugging, it turned out that we are sending more data than expected. Validating the chunk sizes means we will get a better error next time there is any corruption of the `streaming_changelog_chunks` table.
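
The validation itself boils down to a length check against the recorded chunk size; a sketch:

```
// Compare the bytes actually read for a chunk against the size recorded in
// streaming_changelog_chunks, and fail loudly instead of streaming bad data.
fn validate_chunk(expected_len: usize, chunk: &[u8]) -> Result<(), String> {
    if chunk.len() != expected_len {
        return Err(format!(
            "chunk size mismatch: expected {} bytes, got {}",
            expected_len,
            chunk.len()
        ));
    }
    Ok(())
}
```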

Reviewed By: StanislavGlebik

Differential Revision: D20763738

fbshipit-source-id: 6f6fa9f9a29909e044d9ba42fe84916ddcb62e8f
2020-04-01 04:55:17 -07:00
Simon Farnsworth
39f5aacf9e Allow filtering of blobstore benchmarks
Summary: Each benchmark takes about 3 minutes to run. We've already got 16 benchmarks, and we're going to grow. Allow you to limit the number of benchmarks we run at once.

Reviewed By: ahornby

Differential Revision: D20735795

fbshipit-source-id: 241184085b35da8ab85314fef1c6a08404bdb769
2020-04-01 03:13:11 -07:00
Simon Farnsworth
cd77fd6c21 Teach the blobstore benchmark to use saved baselines
Summary: We're going to be doing a variety of changes to sqlblob - let's enable working against a known baseline each time, instead of incremental changes.

Reviewed By: ahornby

Differential Revision: D20735796

fbshipit-source-id: 86f15dac1f004b2f3c83ced829a65f3f6e111d6b
2020-04-01 03:13:11 -07:00
Simon Farnsworth
aa86f24204 Add a blobstore benchmark tool
Summary:
We want to be able to benchmark blobstore stacks so that we can compare them, and ensure that we're not regressing as we bring SQLblob up to production state.

Use Criterion to benchmark a few basic workflows - starting with writes, but reads can be added later.
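
A minimal Criterion write benchmark of the kind described (the real harness drives an actual blobstore stack rather than this stub):

```
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn put_benchmark(c: &mut Criterion) {
    c.bench_function("put 1KiB", |b| {
        b.iter(|| {
            // Stand-in for a blobstore put of a fixed-size payload.
            let payload = vec![0u8; 1024];
            black_box(payload);
        })
    });
}

criterion_group!(benches, put_benchmark);
criterion_main!(benches);
```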

Reviewed By: ahornby

Differential Revision: D20720789

fbshipit-source-id: e8b10664a9d08a1aa7e646e1ebde251bec0db991
2020-04-01 03:13:10 -07:00
Arun Kulshreshtha
e236ef9df3 edenapi_server: use client identity middleware
Summary: Use the client identity middleware from gotham_ext in the EdenAPI server. This middleware parses validated client identities from HTTP headers inserted by Proxygen; these identities can then be used to enforce repo ACLs.

Reviewed By: HarveyHunt

Differential Revision: D20744887

fbshipit-source-id: 651e171d1b20448b3e99bfc938d118fb6dddea91
2020-03-31 14:07:14 -07:00
Mark Thomas
b331b355ef sync configerator thrift update
Reviewed By: StanislavGlebik

Differential Revision: D20769371

fbshipit-source-id: 46f476adee8abcc8248f89f768d3ee43ad29466f
2020-03-31 11:44:04 -07:00
Stanislau Hlebik
2eebab89c5 mononoke: make scs diff path-only call faster
Reviewed By: markbt

Differential Revision: D20766465

fbshipit-source-id: fa78fa66da32caddcd582a6500b9a8393904f687
2020-03-31 07:40:14 -07:00
Stanislau Hlebik
11f891a178 mononoke: remove allow dead_code
Summary: Looks like it's not dead anymore

Reviewed By: krallin

Differential Revision: D20766497

fbshipit-source-id: c49ae3b6c8a660b33e61e65adda94f78addd1498
2020-03-31 07:30:32 -07:00
Lukas Piatkowski
fa9d734ad1 mononoke: remove direct usages of common/rust/configerator
Summary:
It is preferable to use the higher-level API of cached_config instead of ConfigeratorAPI whenever possible since the higher-level API supports OSS builds.

For `ConfigStore` let `poll_interval` be None so that for one-off reading of configs the ConfigStore doesn't needlessly spawn an updating thread.

Also, this update complies with the discussion in D19026190.

Reviewed By: ahornby

Differential Revision: D20670224

fbshipit-source-id: 24fc124d440fd458a9fa88a906fc3a1cfdbd827e
2020-03-31 04:02:46 -07:00
Lukas Piatkowski
7fa825d40c rust-shed: move cached_config to the shed
Reviewed By: ahornby

Differential Revision: D20650304

fbshipit-source-id: 5fc704ce2964b9595722c3cd9c6f1dbd395a52ee
2020-03-31 04:02:46 -07:00
Lukas Piatkowski
bf34f084d0 mononoke: make blobrepo and its dependencies OSS buildable
Reviewed By: markbt

Differential Revision: D20495840

fbshipit-source-id: 3bbefae1923dc84e3daea158a24c0d2a802cc9a9
2020-03-31 04:02:45 -07:00
Lukas Piatkowski
1bee1993a3 mononoke: make newfilenodes and blobstore/factory OSS buildable
Summary: In the process the blobstore/factory/lib.rs was split into submodules - this way it was easier to untangle the dependencies and refactor it, so I've left the split in this diff.

Reviewed By: markbt

Differential Revision: D20302068

fbshipit-source-id: caa3a2b5487c30198c62f7e4f4e9cb7c488dc8de
2020-03-31 04:02:45 -07:00
Kostia Balytskyi
5858dc309e resolver.rs: make Bundle2Resolver contain refs to ctx and repo
Summary:
As suggested in D20680173, we can reduce the overall need to copy things by
storing refs in the resolver.

Reviewed By: krallin

Differential Revision: D20696588

fbshipit-source-id: 9456e2e208cfef6faed57fc52ca59fafdccfc68c
2020-03-30 12:21:09 -07:00
Kostia Balytskyi
d7af87342b upload_changesets: migrate the main fn to async/await
Summary:
See bottom diff of this stack for overview.

This diff in particular asyncifies the `upload_changeset` fn. Apart from that,
it also makes sure it can accept `&RevlogChangeset` instead of
`RevlogChangeset`, which helps us to get rid of cloning.

Reviewed By: krallin

Differential Revision: D20693932

fbshipit-source-id: b0e5e1604cbfb6f6b6e269c85a79208115325734
2020-03-30 12:21:09 -07:00
Kostia Balytskyi
19fff610d7 upload_changesets: rename Future into OldFuture
Summary: Same as the bottom diff of this stack, but for another file.

Reviewed By: krallin

Differential Revision: D20693934

fbshipit-source-id: 4c2d12bf9d9ab272898a7830ece6d9f563adb8fb
2020-03-30 12:21:08 -07:00
Kostia Balytskyi
014e19fbed resolver.rs: simplify a few post-asyncifying things
Summary:
This diff focuses on the following:
- replaces clones with references, both when this decreases the total number of
  clones, and when it causes the only clone to be on the boundary with the
  compat code. This way, when those boundaries are pushed further, we only need
  to fix one place in the resolver
- removes a weird wrapping of a closure into an `Arc` and just calls
  `upload_changesets` directly instead
- in cases when `BundleResolver` methods take `ctx` as an argument, removes it
  and makes those methods use the one stored in the struct

Reviewed By: StanislavGlebik

Differential Revision: D20680173

fbshipit-source-id: c397c4ade57a07cbbc9206fa8a44f4225426778c
2020-03-30 12:21:08 -07:00
Kostia Balytskyi
e2bc1b7f36 resolver.rs: remove unneeded res local vars
Reviewed By: StanislavGlebik

Differential Revision: D20678513

fbshipit-source-id: 73ea5fbf028634fe18bba2690d65e7baf5bca512
2020-03-30 12:21:08 -07:00
Kostia Balytskyi
0a47a018f4 resolver.rs: replace stream with loops in upload_changesets
Reviewed By: krallin

Differential Revision: D20678301

fbshipit-source-id: 955cee628feb51639216366d09c2ffafbe31ac69
2020-03-30 12:21:07 -07:00
Thomas Orozco
11af551491 mononoke/benchmark_filestore: make it work again
Summary:
This bitrotted with two different changes:

- D19473960 put it on a v2 runtime, but the I/O is v1 so that doesn't work (it
  panics).
- The clap update a couple months ago made duplicate arguments illegal, and a
  month before that we had put `debug` in the logger args (arguably where it
belongs), so this binary was now setting `debug` twice, which would now panic.

Evidently, just static typing wasn't quite enough to keep this working through
time (though that's perhaps due to the fact that both of those changes were
invisible to the type system), so I also added a smoke test for this.

Reviewed By: farnz

Differential Revision: D20618785

fbshipit-source-id: a1bf33783885c1bb2fe99d3746d1b73853bcdf38
2020-03-30 07:32:20 -07:00
Thomas Orozco
8315336b2c mononoke/unbundle_replay: run hooks
Summary:
As the name indicates, this updates unbundle_replay to run hooks. Hook failures
don't block the replay, but they're logged to Scuba.

Differential Revision: D20693851

fbshipit-source-id: 4357bb0d6869a658026dbc5421a694bc4b39816f
2020-03-30 06:25:08 -07:00
Thomas Orozco
fd546edbad mononoke/unbundle_replay: don't derive filenodes
Summary:
Setting up a derived data tailer for this is a better approach (see D20668301
for context).

Reviewed By: StanislavGlebik

Differential Revision: D20693270

fbshipit-source-id: 7a06ffe059c41c4e100f8b0f8837978717293829
2020-03-30 06:25:08 -07:00
Thomas Orozco
dfcaca8077 mononoke/unbundle_replay: move unbundle & filenodes derivation to their own task
Summary:
Since we do those concurrently, it makes sense to do them on their own task.
Besides, since those are still old futures that need ownership, there is
effectively no tradeoff here.

Differential Revision: D20691373

fbshipit-source-id: 1a45e43ec857d91bed1614568b4354d56a2b0848
2020-03-30 06:25:08 -07:00
Thomas Orozco
066cdcfb3d mononoke/unbundle_replay: also report recorded duration
Summary: This will make it easier to compare performance.

Differential Revision: D20674164

fbshipit-source-id: eb1a037b0b060c373c1e87635f52dd228f728c89
2020-03-30 06:25:07 -07:00
Thomas Orozco
213276eff5 mononoke/unbundle_replay: add Scuba reporting
Summary: This adds some Scuba reporting to unbundle_replay.

Differential Revision: D20674162

fbshipit-source-id: 59e12de90f5fca8a7c341478048e68a53ff0cdc1
2020-03-30 06:25:07 -07:00
Thomas Orozco
13f24f7425 mononoke/unbundle_replay: unbundle concurrently, derive filenodes concurrently
Summary:
This updates unbundle_replay to do things concurrently where possible.
Concretely, this means we do ingest unbundles concurrently, and filenodes
derivation concurrently, and only do the actual pushrebase sequentially. This
lets us get ahead on work wherever we can, and makes the process faster.

Doing unbundles concurrently isn't actually guaranteed to succeed, since it's
*possible* that an unbundle coming in immediately after a pushrebase actually
depends on the commits created in said pushrebase. In this case, we simply retry
the unbundle when we're ready to proceed with the pushrebase (in the code, this
is the `Deferred` variant). This is fine from a performance perspective.

As part of this, I've also moved the loading of the bundle to processing, as
opposed to the hg recording client (the motivation for this is that we want to
do this loading in parallel as well).

This will also let us run hooks in parallel once I add this in.
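
A sketch of the `Deferred` scheme with toy payload types:

```
enum UnbundleOutcome {
    // Unbundled ahead of time; safe to pushrebase directly.
    Ready(Vec<u64>),
    // Raced a preceding pushrebase; holds the raw bundle to retry.
    Deferred(Vec<u8>),
}

fn process(outcome: UnbundleOutcome) -> Vec<u64> {
    match outcome {
        UnbundleOutcome::Ready(changesets) => changesets,
        // Retry now that the preceding pushrebase has landed.
        UnbundleOutcome::Deferred(bundle) => retry_unbundle(&bundle),
    }
}

fn retry_unbundle(_bundle: &[u8]) -> Vec<u64> {
    Vec::new() // stand-in for re-running the unbundle
}
```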

Reviewed By: StanislavGlebik

Differential Revision: D20668301

fbshipit-source-id: fe2c62ca543f29254b4c5a3e138538e8a3647daa
2020-03-30 06:25:07 -07:00
Thomas Orozco
60d427e93c mononoke/unbundle_replay: log when pushrebase is starting
Summary: More logging is always helpful!

Reviewed By: HarveyHunt

Differential Revision: D20668303

fbshipit-source-id: 776f41491c4108e5f5ab9caa9351584150d7b626
2020-03-30 06:25:06 -07:00
Thomas Orozco
d18cd74f7d mononoke/unbundle_replay: ignore entries with conflicts
Summary:
pushrebase_errmsg is NULL when we have conflicts, but we still shouldn't replay
the entry (because it'll fail, with conflicts). Let's exclude those.

Reviewed By: StanislavGlebik

Differential Revision: D20668304

fbshipit-source-id: a058bb466e0a8a53ec81e41db7ba138d6aedf3f9
2020-03-30 06:25:06 -07:00
Thomas Orozco
7dd1717f7d mononoke/unbundle_replay: log the age of the commit we just replayed
Summary: It's helpful.

Reviewed By: HarveyHunt

Differential Revision: D20668302

fbshipit-source-id: 0f8e8cc72363aed337fd6fa4c3950c17eb1f92b7
2020-03-30 06:25:06 -07:00
Thomas Orozco
58eeb318aa mononoke/unbundle_replay: log when we start deriving hg changesets
Summary: This is helpful.

Reviewed By: StanislavGlebik

Differential Revision: D20645576

fbshipit-source-id: b08ec151232e46dbde1a33010c6852e9563f6a1a
2020-03-30 06:25:05 -07:00
Thomas Orozco
259e096841 mononoke/unbundle_replay: sleep when watching bookmark
Summary:
This updates unbundle_replay to support sleeping when watching for updates in a
bookmark and said bookmark isn't moving. This will be useful so it can run as a
service.

Reviewed By: StanislavGlebik

Differential Revision: D20645157

fbshipit-source-id: 6edeb66b65b2ef8b88c8db5e664982756acbfaf1
2020-03-30 06:25:05 -07:00
Thomas Orozco
d1cce10ea7 mononoke/unbundle_replay: fixup incomplete test
Summary:
I accidentally forgot to insert the entry, so that made this test a bit
useless. Let's make it not useless.

Reviewed By: StanislavGlebik

Differential Revision: D20645158

fbshipit-source-id: 0f0eb0cf9d16e8c346897088891aa3277b4d9c07
2020-03-30 06:25:05 -07:00
Thomas Orozco
8ce3d94187 mononoke/unbundle_replay: add support for replaying a bookmark
Summary:
This adds support for replaying the updates to a bookmark through unbundle
replay. The goal is to be able to run this as a process that keeps a bookmark
continuously updated.

There is still a bit of work here, since we don't yet allow the stream to pause
until bookmark update becomes available (i.e. once caught up, it will exit).
I'll introduce this in another diff.

Note that this is only guaranteed to work if there is a single bookmark in the
repo. With more, it could fail if a commit is first introduced in a bookmark that
isn't the one being replayed here, and later gets introduced in said bookmark.

Reviewed By: StanislavGlebik

Differential Revision: D20645159

fbshipit-source-id: 0aa11195079fa6ac4553b0c1acc8aef610824747
2020-03-30 06:25:04 -07:00
Thomas Orozco
7cd5eb6774 mononoke/unbundle_replay: get a stream of requests to replay
Summary:
I'm going to update this to run in a loop, so to do that it would be nice to
represent the things to replay as a stream. This does that change, but for now
all our streams have just one element.

Reviewed By: StanislavGlebik

Differential Revision: D20645156

fbshipit-source-id: fce7536d0ccbc1911335704816b71c17e80f2116
2020-03-30 06:25:04 -07:00
Thomas Orozco
6b1894cec9 mononoke/unbundle_replay: derive filenodes
Summary:
We normally derive those lazily when accepting pushrebase, but we do derive
them eagerly in blobimport. For now, let's be consistent with blobimport.

This ensures that we don't lazily generate them, which would require read traffic,
and gives a picture a little more consistent with what an actual push would look like.

Reviewed By: ikostia

Differential Revision: D20623966

fbshipit-source-id: 2209877e9f07126b7b40561abf3e6067f7a613e6
2020-03-30 06:25:04 -07:00
Thomas Orozco
8b0f92e84b mononoke/unbundle_replay: report missing Bonsai onto_rev in hg replay
Summary:
This makes it easier to realize if you used the wrong entry ID when replaying
(instead of telling you the bookmark isn't at `None` as expected, it tells you
the Hg Changeset could not be mapped to a Bonsai).

Reviewed By: ikostia

Differential Revision: D20623847

fbshipit-source-id: aaa66e7825f12373742efd4f779ae20ff21f0b46
2020-03-30 06:25:03 -07:00
Thomas Orozco
90cf5df340 mononoke/unbundle_replay: add a little more logging
Summary: More logging is nice!

Reviewed By: ikostia

Differential Revision: D20623846

fbshipit-source-id: 61eb3d17f5fb3b2bf94ef3f946b1d90d725cfece
2020-03-30 06:25:03 -07:00
Thomas Orozco
7ca14665a2 mononoke/unbundle_replay: use repo pushrebase hooks
Summary:
This updates unbundle_replay to account for pushrebase hooks, notably to assign
globalrevs.

To do so, I've extracted the creation of pushrebase hooks in repo_client and
reused it in unbundle_replay. I also had to update unbundle_replay to no longer
use `args::get_repo` since that doesn't give us access to the config (which we
need to know what pushrebase hooks to enable).

Reviewed By: ikostia

Differential Revision: D20622723

fbshipit-source-id: c74068c920822ac9d25e86289a28eeb0568768fc
2020-03-30 06:25:03 -07:00
Thomas Orozco
3804f1ca16 mononoke: introduce unbundle_replay
Summary:
This adds an unbundle_replay Rust binary. Conceptually, this is similar to the
old unbundle replay Python script we used to have, but there are a few
important differences:

- It runs fully in-process, as opposed to pushing to a Mononoke host.
- It will validate that the pushrebase being produced is consistent with what
  is expected before moving the bookmark.
- It can find sources to replay from the bookmarks update log (which is
  convenient for testing).

Basically, this is to writes and to the old unbundle replay mechanism what
Fastreplay is to reads and to the traffic replay script.

There is still a bit of work to do here, notably:

- Make it possible to run this in a loop to ingest updates iteratively.
- Run hooks.
- Log to Scuba!
- Add the necessary hooks (notably globalrevs)
- Set up pushrebase flags.

I would also like to see if we can disable the presence cache here, which would
let us also use this as a framework for benchmarking work on push performance,
if / when we need that.

Reviewed By: StanislavGlebik

Differential Revision: D20603306

fbshipit-source-id: 187c228832fc81bdd30f3288021bba12f5aca69c
2020-03-30 06:25:03 -07:00
Thomas Orozco
4a62a3e629 mononoke/bookmarks: expose access to owned replay data
Summary: I'd like to get the timestamps here without needing to clone them.

Reviewed By: StanislavGlebik

Differential Revision: D20603308

fbshipit-source-id: 2d8f72b4fb3a3eed33b58dc2f0fb1a857bb3f5b9
2020-03-30 06:25:02 -07:00
Thomas Orozco
beb18f5113 mononoke/pushrebase: make into_transaction_hook async + accept context
Summary:
This updates pushrebase hooks to allow into_transaction_hook to be async (the
reason I hadn't made it async is because it hadn't been needed yet).

Currently, this is a no-op, but I'm going to use this later in this stack.
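
The shape of the change, sketched with async-trait and heavily simplified types:

```
use async_trait::async_trait;

struct CoreContext;
struct TransactionHook;

#[async_trait]
trait PushrebaseCommitHook: Sized {
    // Now async and handed a context, per this diff.
    async fn into_transaction_hook(self, ctx: &CoreContext) -> TransactionHook;
}

struct GlobalrevHook;

#[async_trait]
impl PushrebaseCommitHook for GlobalrevHook {
    async fn into_transaction_hook(self, _ctx: &CoreContext) -> TransactionHook {
        TransactionHook
    }
}
```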

Reviewed By: StanislavGlebik

Differential Revision: D20603307

fbshipit-source-id: 79651184dbe08322c4cab03d7119a31036391852
2020-03-30 06:25:02 -07:00
Stanislau Hlebik
b86b4fd627 mononoke: log if skiplist failed
Summary:
A few of our tasks failed on startup, and most likely it was during warmup,
though we are not sure (see attached task).

Let's add more logging.

Reviewed By: farnz

Differential Revision: D20698273

fbshipit-source-id: 4facd21a94d2917103e417a014b820c893da4718
2020-03-27 23:49:03 -07:00
Stanislau Hlebik
2742bea611 mononoke: fix warning
Reviewed By: krallin

Differential Revision: D20698518

fbshipit-source-id: 53550e2d3afb49a4a3bc8c940f37175ff7ee89c0
2020-03-27 23:44:42 -07:00
Stefan Filip
ea89b541e1 segmented_changelog: add Dag struct and location_to_name functionality
Summary:
The IdDag provides graph algorithms using Segments.
The IdMap allows converting from the SegmentedChangelogId domain to the
ChangesetId domain.
The Dag struct wraps IdDag and IdMap in order to provide graph algorithms using
the common application level identifiers for commits (ChangesetId).

The construction of the Dag is currently mocked with something that can only be
used in a test environment (unit tests but also integration tests).

This diff also implements a location_to_name function. This is the most
important new functionality that segmented changelog clients require. It
recovers the hash of a commit for which the client only has a segmented
changelog Id. The current assumption is that clients have identifiers for all
merge commit parents, so the path to a known commit always follows a set
of first parents.

The IdMap queries will have to be changed to async in the future, but we expect
the IdDag queries to stay sync.
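
A toy sketch of the first-parent walk at the heart of location_to_name (the real code resolves through the IdDag segments and the IdMap):

```
use std::collections::HashMap;

type ChangesetId = u64;

// Resolve "the commit `distance` first-parent steps above `known`".
fn location_to_name(
    first_parent: &HashMap<ChangesetId, ChangesetId>,
    known: ChangesetId,
    distance: u64,
) -> Option<ChangesetId> {
    let mut current = known;
    for _ in 0..distance {
        current = *first_parent.get(&current)?;
    }
    Some(current)
}
```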

Reviewed By: quark-zju

Differential Revision: D20635577

fbshipit-source-id: 4f9bd8dd4a5bd9b0de55f51086f3434ff507963c
2020-03-27 13:48:52 -07:00
Stefan Filip
a853c7a92b segmented_changelog: use [fbinit::compat_test] for idmap tests
Summary: Modernizing the codebase.

Reviewed By: krallin

Differential Revision: D20655252

fbshipit-source-id: c97fd46f1a224ca74606f4b42d5fa6b1a00c8ea8
2020-03-27 13:48:52 -07:00