Commit Graph

232 Commits

Thomas Orozco
9493a05e7b mononoke/filestore: update store_bytes to chunk content
Summary:
This updates the store_bytes method to chunk incoming data instead of uploading
it as-is. This is unfortunately a bit hacky (but so was the previous
implementation), since it means we have to hash the data before it has gone
through the Filestore's preparation.

That said, one of the invariants of the filestore is that chunk size shouldn't
affect the Content ID (and there is fairly extensive test coverage for this),
so, notionally, this does work.

Performance-wise, it does mean we are hashing the object twice. That was
already the case before anyway (since obtaining the ContentId for FileContents
would clone the contents and then hash them).

The upshot of this change is that large files uploaded through unbundle will
actually be chunked (whereas before, they wouldn't be).

Long-term, we should try to delete this method, as it is quite unsavory to
begin with. But, for now, we don't really have a choice, since our content
upload path does rely on its existence.
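
To make the invariant concrete, here is a minimal sketch of the chunking step (a hypothetical helper, assuming Bytes 0.5): the Content ID is computed over the whole buffer, so the chunk size must not affect it.

```
use bytes::Bytes;

/// Hypothetical sketch: split an incoming buffer into fixed-size
/// chunks. Bytes::slice is cheap (it refcounts the underlying buffer),
/// so no data is copied here.
fn chunk_content(data: Bytes, chunk_size: usize) -> Vec<Bytes> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let end = std::cmp::min(offset + chunk_size, data.len());
        chunks.push(data.slice(offset..end));
        offset = end;
    }
    chunks
}
```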

Reviewed By: StanislavGlebik

Differential Revision: D20281937

fbshipit-source-id: 78d584b2f9eea6996dd1d4acbbadc10c9049a408
2020-03-06 07:43:07 -08:00
Thomas Orozco
56a7ce8697 mononoke/filestore: make FilestoreConfig Copy and pass it by value
Summary:
This is a very small struct (2 u64s) that really doesn't need to be passed by
reference. Might as well just pass it by value.
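
For illustration, a sketch of why pass-by-value is fine here (field names are assumptions, not the real FilestoreConfig):

```
/// Hypothetical fields: two u64s, i.e. 16 bytes, trivially Copy.
#[derive(Copy, Clone)]
struct FilestoreConfig {
    chunk_size: u64,
    concurrency: u64,
}

/// Taking the struct by value is as cheap as passing a reference and
/// avoids lifetime bookkeeping at every call site.
fn store(config: FilestoreConfig) {
    let _ = (config.chunk_size, config.concurrency);
}
```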

Differential Revision: D20281936

fbshipit-source-id: 2cc64c8ab6e99ee50b2e493eff61ea34d6eb54c1
2020-03-06 02:00:23 -08:00
Lukas Piatkowski
7ddcdd818c mononoke: make sql_ext OSS buildable
Summary: separate out the Facebook-specific pieces of the sql_ext crate

Reviewed By: ahornby

Differential Revision: D20218219

fbshipit-source-id: e933c7402b31fcd5c4af78d5e70adafd67e91ecd
2020-03-06 01:33:38 -08:00
David Tolnay
754a755eee rust: Rename tokio_preview:: to tokio::
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/

This codemod replaces all dependencies on `//common/rust/renamed:tokio-preview` with `fbsource//third-party/rust:tokio-preview` and their uses in Rust code from `tokio_preview::` to `tokio::`.

This does not introduce any collisions with `tokio::` meaning 0.1 tokio because D20235404 previously renamed all of those to `tokio_old::` in crates that depend on both 0.1 and 0.2 tokio.

This is the tokio version of what D20213432 did for futures.

Codemod performed by:

```
rg \
    --files-with-matches \
    --type-add buck:TARGETS \
    --type buck \
    --glob '!/experimental' \
    --regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
    -x \
    buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:tokio-preview, 1))" \
| xargs sed -i 's,\btokio_preview::,tokio::,'

rg \
    --files-with-matches \
    --type-add buck:TARGETS \
    --type buck \
    --glob '!/experimental' \
    --regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:tokio-preview,fbsource//third-party/rust:tokio-preview,'
```

Reviewed By: k21

Differential Revision: D20236557

fbshipit-source-id: 15068b93a0a944d6249a1d9f63840a4c61c9c1ba
2020-03-05 14:25:10 -08:00
Thomas Orozco
3ee98c82e2 mononoke/microwave: add support for changesets
Summary:
This updates microwave to also support changesets, in addition to filenodes.
Those create a non-trivial amount of SQL load when we warm up the cache (due to
sequential reads), which we can eliminate by loading them through microwave.

They're also a bottleneck once manifests are already loaded.

Note: as part of this, I've updated the Microwave wrapper methods to panic if
we try to access a method that isn't instrumented. Since we'd be running
the Microwave builder in the background, this feels OK (because then we'd find
out if we call them during cache warmup unexpectedly).
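
A sketch of the panic-if-not-instrumented idea (type and method names are assumptions):

```
/// Hypothetical sketch: instrumented methods are supported; anything
/// that is not instrumented panics, so an unexpected call during cache
/// warmup shows up immediately in the background Microwave builder
/// instead of going unnoticed.
struct MicrowaveChangesets;

impl MicrowaveChangesets {
    /// Instrumented read: the real implementation records the fetched
    /// changeset so it can be served from the snapshot later.
    fn get_changeset(&self, id: u64) -> u64 {
        id
    }

    /// Not instrumented: fail fast rather than silently diverge.
    fn add_changeset(&self, _id: u64) {
        panic!("microwave: add_changeset is not instrumented")
    }
}
```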

Reviewed By: farnz

Differential Revision: D20221463

fbshipit-source-id: 317023677af4180007001fcaccc203681b7c95b7
2020-03-05 11:57:43 -08:00
Aida Getoeva
db19504972 mononoke: derive changeset info
Summary:
Implementation of derivation logic for the changeset info.

BonsaiDerived is implemented for ChangesetInfo: `derive_from_parents` just derives the info, and the BonsaiDerivedMapping then puts it into the blobstore.

```
ChangesetInfo::derive(..) -> ChangesetInfo
```
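
As a rough illustration of the shape described above (the trait and signatures here are assumptions, not the actual Mononoke API):

```
/// Hypothetical trait shape, for illustration only.
trait BonsaiDerived: Sized {
    /// Derive the data for one commit, given already-derived parents;
    /// the BonsaiDerivedMapping then persists the result to the
    /// blobstore.
    fn derive_from_parents(commit: u64, parents: Vec<Self>) -> Self;
}

struct ChangesetInfo {
    commit: u64,
}

impl BonsaiDerived for ChangesetInfo {
    fn derive_from_parents(commit: u64, _parents: Vec<Self>) -> Self {
        // Changeset info depends only on the commit itself.
        ChangesetInfo { commit }
    }
}
```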

Reviewed By: krallin

Differential Revision: D20185954

fbshipit-source-id: afe609d1b2711aed7f2740714df6b9417c6fe716
2020-03-05 08:24:38 -08:00
Stanislau Hlebik
dded155135 mononoke: do not derive while initializing warm bookmark cache
Summary:
Previously the warm bookmark cache tried to derive all bookmarks on startup. That slows down startup, and in some cases it might prevent the scs server from starting up at all.

Let's change how the warm bookmark cache initializes bookmarks: instead of trying to derive all of them, let's move underived bookmarks back in history.
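
Conceptually, initialization becomes a walk back through each bookmark's history (a sketch with a hypothetical is_derived check):

```
/// Hypothetical sketch: rather than deriving data for the bookmark tip
/// at startup, walk the history (newest first) until we find a commit
/// whose derived data already exists, and point the cache there.
fn warm_bookmark(
    history_newest_first: &[u64],
    is_derived: impl Fn(u64) -> bool,
) -> Option<u64> {
    history_newest_first.iter().copied().find(|cs| is_derived(*cs))
}
```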

Reviewed By: krallin

Differential Revision: D20195211

fbshipit-source-id: 5cb5d8599d3035973175d3063186a7c01536889a
2020-03-04 13:14:32 -08:00
Thomas Orozco
275e4eff76 mononoke/mercurial: remove incorrect FileBytes Extend implementation
Summary:
This removes the Extend implementation for FileBytes, which was incorrect (it
discarded existing data!). I had introduced this as a backwards compatibility
shim when doing the Bytes 0.4 to Bytes 0.5 migration :/

We don't really need this shim, considering:

- The only place that really matters that uses this is the remotefilelog crate,
  where we have a content id, and where we should use `filestore::fetch_concat`
  instead.
- The other places are tests (or close to abandonware...), which can do their
  own folding.

Longer term, I'd like to remove the whole `Content` stream in hg entries, so
those callsites can use the filestore methods, which a) have test coverage
(unlike ad-hoc folds, which often don't), and b) are more efficient, since they
know how large the destination buffer needs to be ahead of time and don't need
to re-allocate.

To make sure this fixes the bug, I also introduced tests for the remotefilelog
crate. As expected, the chunked variant fails without this fix.
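
The efficiency point, as a minimal self-contained sketch (not the actual `fetch_concat` implementation): because the filestore knows the total size ahead of time, the destination buffer can be allocated once.

```
use bytes::{Bytes, BytesMut};

/// Hypothetical sketch: concatenate chunks into a pre-sized buffer
/// instead of folding an Extend implementation over a chunk stream.
fn concat_chunks(total_size: usize, chunks: &[Bytes]) -> Bytes {
    let mut out = BytesMut::with_capacity(total_size);
    for chunk in chunks {
        out.extend_from_slice(chunk);
    }
    out.freeze()
}
```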

Reviewed By: mitrandir77

Differential Revision: D20248978

fbshipit-source-id: 1b554d3e595eb867b6b6cf4204d31f27dd90a111
2020-03-04 08:51:42 -08:00
David Tolnay
e988a88be9 rust: Rename futures_preview:: to futures::
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/

This codemod replaces *all* dependencies on `//common/rust/renamed:futures-preview` with `fbsource//third-party/rust:futures-preview` and their uses in Rust code from `futures_preview::` to `futures::`.

This does not introduce any collisions with `futures::` meaning 0.1 futures because D20168958 previously renamed all of those to `futures_old::` in crates that depend on *both* 0.1 and 0.3 futures.

Codemod performed by:

```
rg \
    --files-with-matches \
    --type-add buck:TARGETS \
    --type buck \
    --glob '!/experimental' \
    --regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
    -x \
    buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:futures-preview, 1))" \
| xargs sed -i 's,\bfutures_preview::,futures::,'

rg \
    --files-with-matches \
    --type-add buck:TARGETS \
    --type buck \
    --glob '!/experimental' \
    --regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:futures-preview,fbsource//third-party/rust:futures-preview,'
```

Reviewed By: k21

Differential Revision: D20213432

fbshipit-source-id: 07ee643d350c5817cda1f43684d55084f8ac68a6
2020-03-03 11:01:20 -08:00
David Tolnay
fe65402e46 rust: Move futures-old rdeps to renamed futures-old
Summary:
In targets that depend on *both* 0.1 and 0.3 futures, this codemod renames the 0.1 dependency to be exposed as futures_old::. This is in preparation for flipping the 0.3 dependencies from futures_preview:: to plain futures::.

rs changes performed by:

```
rg \
    --files-with-matches \
    --type-add buck:TARGETS \
    --type buck \
    --glob '!/experimental' \
    --regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
    -x \
    buck query "labels(srcs,
        rdeps(%Ss, fbsource//third-party/rust:futures-old, 1)
        intersect
        rdeps(%Ss, //common/rust/renamed:futures-preview, 1)
    )" \
| xargs sed -i 's/\bfutures::/futures_old::/'
```

Reviewed By: jsgf

Differential Revision: D20168958

fbshipit-source-id: d2c099f9170c427e542975bc22fd96138a7725b0
2020-03-02 21:02:50 -08:00
Stanislau Hlebik
638e637ef6 RFC: mononoke: introduce unodes v2
Summary:
Our previous implementation of unodes had a problem with diamond merges:
essentially, because p1 and p2 might have the same file but different content
unodes, we would always create a merge unode, which can be unexpected.
(The code comment in unodes/derive.rs has more info about it.)

This diff fixes the problem by introducing unodes v2. This allows us to import
new repos with new unode implementation while keeping the old repos with unode
v1.

This implementation uses a heuristic which should be fast and should do the
correct thing most of the time. In some cases it might exclude some parts of
the history completely. For example:

     O <- merge commit, doesn't change anything
    / \
   P1  |  <- modified "file.txt" to "B"
   |   P2    <- modified "file.txt" to "B"
   \  /
    ROOT <- created "file.txt" with content "A"

In that case history of "file.txt" starting from merge commit will contain only (P1, ROOT),
but it won't contain P2.

We also considered other options:
1) Move this heuristic to fastlog batch derived data. See D19973553 for more
details about why we decided not to do it.

2) Filter out parent unodes that are ancestors of other parent unodes. This should
always be correct, but it would be hard to implement, and even harder to make
sure it always has good performance.

Reviewed By: krallin

Differential Revision: D19978157

fbshipit-source-id: 445ddd5629669d987e7aa88c35fecf0b34a40da0
2020-03-02 05:27:31 -08:00
Thomas Orozco
95d463ce47 mononoke/filenodes: Remove path from FilenodeInfo
Summary:
This updates our filenodes implementation to use different types for writing
(`PreparedFilenode`) and reading (`FilenodeInfo`).

The bottom line is that this avoids a bunch of cloning of paths on the read
path, which doesn't need to return the path to the caller, since the caller
already knows it! We can also take it out of Memcache, since we don't need
Memcache to tell us the path for a blob we could only possibly have found by
having the path to begin with.

This does update our filenodes serialization format. I bumped MC_CODEVER
accordingly.
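
Schematically, the write/read split looks something like this (field names and types are assumptions for illustration):

```
type FilenodeId = [u8; 20];
type MPath = String; // stand-in for the real path type

/// Write side: the caller supplies the path along with the info.
struct PreparedFilenode {
    path: MPath,
    info: FilenodeInfo,
}

/// Read side: no path, because the caller could only have found this
/// entry by path in the first place; returning it would force clones
/// (and bloat the Memcache entry) for no benefit.
struct FilenodeInfo {
    filenode: FilenodeId,
    p1: Option<FilenodeId>,
    p2: Option<FilenodeId>,
}
```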

Reviewed By: StanislavGlebik

Differential Revision: D19905400

fbshipit-source-id: 6037802c1773de564cade8e264d36087382ee15a
2020-02-27 12:34:21 -08:00
Thomas Orozco
f6866eb97d mononoke: switch to new filenodes implementation
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.

See the test plan for detailed performance results supporting why I'm making
this change.

Reviewed By: StanislavGlebik

Differential Revision: D19905394

fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
2020-02-27 12:34:20 -08:00
Thomas Orozco
341b4f1bc3 mononoke/filenodes: expect a Vec of filenodes to insert
Summary:
The API expects a stream of filenodes to insert, but we actually never used
that ability. Instead, every single callsite has a `Vec`, which it converts to
a stream and passes in.

I'd like to change this for two reasons:

- It's unnecessary
- It makes the code more complex on the Filenodes implementation side, and less
  efficient, since we need to `chunk()` there in small chunks, which might not
  all be in the same shard. If we get the entire `Vec` at once, we can chunk on a
  per-shard basis (this happens later in this stack).

Besides, if we end up having a stream and wanting the old behavior, we can
always call `chunk()` on the stream and call `add_filenodes` on each batch
(which is actually nicer, because if you have a futures 0.2 stream that isn't
static, you can do this, but you can't turn it into a `BoxStream`!).
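
The per-shard argument, sketched (hypothetical shard_of function):

```
use std::collections::HashMap;

/// Hypothetical sketch: with the whole Vec in hand we can group
/// filenodes by shard first and insert per-shard batches, instead of
/// blindly chunking a stream into fixed-size pieces that may straddle
/// shards.
fn group_by_shard<T>(
    filenodes: Vec<T>,
    shard_of: impl Fn(&T) -> usize,
) -> HashMap<usize, Vec<T>> {
    let mut shards: HashMap<usize, Vec<T>> = HashMap::new();
    for f in filenodes {
        shards.entry(shard_of(&f)).or_default().push(f);
    }
    shards
}
```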

Reviewed By: StanislavGlebik

Differential Revision: D19902537

fbshipit-source-id: a4c030c4a51afbb6e9db133b32464009eed197af
2020-02-27 12:34:18 -08:00
Thomas Orozco
26ae726af5 mononoke: update internals to Bytes 0.5
Summary:
The Bytes 0.5 update left us in a somewhat undesirable position where every
access to our blobstore incurs an extra copy whenever we fetch data out of our
cache (by turning it from Bytes 0.5 into Bytes 0.4) — we also have quite a few
places where we convert in one direction and then immediately into the other.

Internally, we can start using Bytes 0.5 now. For example, this is useful when
pulling data out of our blobstore and deserializing as Thrift (or conversely,
when serializing and putting it into our blobstore).

However, when we interface with Tokio (i.e. decoders & encoders), we still have
to use Bytes 0.4.  So, when needed, we convert our Bytes 0.5 to 0.4 there.

The idea behind this tradeoff is that we deal with more bytes internally than
we end up sending to clients, so doing the Bytes conversion closer to the point
of sending data to clients means fewer copies.

We can also start removing those once we migrate to Tokio 0.2 (and newer
versions of Hyper for HTTP services).

Changes that were required (each is sketched in the code after this list):

- You can't extend new bytes (because that implicitly copies). You need to use
  BytesMut instead, which I did where that was necessary (I also added calls in
  the Filestore to do that efficiently).
- You can't create bytes from a `&'a [u8]`, unless `'a` is `'static`. You need
  to use `copy_from_slice` instead.
- `slice_to` and `slice_from` have been replaced by a `slice()` function that
  takes ranges.
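
Concretely, those three patterns look like this in Bytes 0.5 (a sketch assuming `bytes = "0.5"`):

```
use bytes::{Bytes, BytesMut};

fn bytes_05_patterns(input: &[u8]) -> Bytes {
    // 1) Extending: Bytes is immutable now; build up a BytesMut and
    //    freeze it at the end.
    let mut buf = BytesMut::with_capacity(input.len() * 2);
    buf.extend_from_slice(input);
    buf.extend_from_slice(input);

    // 2) Creating from a non-'static slice: copy explicitly.
    let copied = Bytes::copy_from_slice(input);

    // 3) Slicing: slice_to / slice_from are gone; slice() takes a range.
    let _head = copied.slice(..input.len() / 2);

    buf.freeze()
}
```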

Reviewed By: StanislavGlebik

Differential Revision: D20121350

fbshipit-source-id: eb31af2051fd8c9d31c69b502e2f6f1ce2190cb1
2020-02-27 08:08:28 -08:00
Mateusz Kwapich
3ff29a8810 make BonsaiGitMapping repo-specific
Summary:
Nearly all of the Mononoke SQL stores are instantiated once per repo, but they don't store the `RepositoryId` anywhere, so every method takes it as an argument. And because providing the repo_id on every call is not ergonomic, we tend to add methods to blob_repo that just call the right method with the right repo_id on one of the underlying stores (see `get_bonsai_from_globalrev` on blobrepo for example).

Because my reviewers [pushed back](https://our.intern.facebook.com/intern/diff/D19972871/?transaction_id=196961774880671&dest_fbid=1282141621983439) when I tried to do the same for bonsai_git_mapping, I've decided to make it right by adding the repo_id to the BonsaiGitMapping.
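
The shape of the change, sketched (names and fields are assumptions):

```
/// Hypothetical sketch: the mapping is constructed once per repo and
/// remembers its RepositoryId, so callers no longer pass it to every
/// method.
#[derive(Clone, Copy)]
struct RepositoryId(i32);

struct SqlBonsaiGitMapping {
    repo_id: RepositoryId,
}

impl SqlBonsaiGitMapping {
    fn new(repo_id: RepositoryId) -> Self {
        Self { repo_id }
    }

    /// Queries scope themselves to self.repo_id internally; there is
    /// no repo_id argument in the signature anymore.
    fn repo_id(&self) -> RepositoryId {
        self.repo_id
    }
}
```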

Reviewed By: krallin

Differential Revision: D20029485

fbshipit-source-id: 7585c3bf9cc8fa3cbe59ab1e87938f567c09278a
2020-02-27 08:05:13 -08:00
Stanislau Hlebik
7076fac933 mononoke: add exponential backoff
Summary:
During our tests we noticed that we can send too many blobstore read requests to the
mapping. Let's add exponential backoff to prevent that.
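
A minimal sketch of the backoff pattern (tokio 0.2 style; the base delay, cap, and attempt count are made-up parameters):

```
use std::time::Duration;

/// Hypothetical sketch: retry a flaky operation with exponentially
/// growing delays, capped so the wait never becomes unbounded.
async fn with_backoff<T, E>(mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut delay = Duration::from_millis(100); // assumed base delay
    let max_delay = Duration::from_secs(5); // assumed cap
    for _ in 0..10 {
        match op() {
            Ok(v) => return Ok(v),
            Err(_) => {
                tokio::time::delay_for(delay).await;
                delay = std::cmp::min(delay * 2, max_delay);
            }
        }
    }
    op() // final attempt
}
```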

Reviewed By: ikostia

Differential Revision: D20116043

fbshipit-source-id: 6fecbda4c36a5065b77ba9df561c6d9c6a969089
2020-02-26 05:05:33 -08:00
Stanislau Hlebik
19e1e94984 mononoke: add lease renewing to derived data
Summary:
During S196197 a lease expired and we were re-deriving the same derived data over and over again for a big commit.
This diff adds lease renewal, which should help with this problem.
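
Conceptually (a sketch with hypothetical callbacks, not the actual lease API):

```
use std::time::Duration;

/// Hypothetical sketch: renew the lease at half its TTL for as long as
/// the derivation is still running, so the lease cannot expire
/// mid-derivation and let another worker start re-deriving the same
/// commit.
async fn renew_lease_while(running: impl Fn() -> bool, renew: impl Fn(), ttl: Duration) {
    while running() {
        tokio::time::delay_for(ttl / 2).await;
        renew();
    }
}
```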

Reviewed By: HarveyHunt

Differential Revision: D20093323

fbshipit-source-id: d139abf6659722f47ea40d9b2f279daa03623ff4
2020-02-25 09:22:46 -08:00
Thomas Orozco
b3bebee0b4 mononoke: include DB config in multiplexed blobstore configuration
Summary:
This updates our multiplexed blobstore configuration to carry its own DB
config. The upshot of this change is that we can move the blobstore sync queue
(a fairly unruly table) to its own DB.

Another nice side effect of this is that it cleans up a bunch of other code, by
finally decoupling the blobstore config from the DB config. For example,
places that need to instantiate a blobstore can now do so even without a DB
config (such as wireproto logging).

Obviously, this cannot land until we update the configs to include this. I'll
do so in Configerator prior to landing the diff.

Reviewed By: HarveyHunt

Differential Revision: D19973905

fbshipit-source-id: 79e4ff92cdb989aab4532decd3fe4fd6c55e2bb2
2020-02-24 11:54:45 -08:00
Mateusz Kwapich
c2be00c45e add git mappings to blobrepo
Summary:
By having it in blobrepo, we can ensure that all parts of mononoke can access
it easily.

Reviewed By: StanislavGlebik

Differential Revision: D19949474

fbshipit-source-id: ac3831d61177c4ef0ad7db248f2a0cc5edb933b1
2020-02-21 05:41:44 -08:00
Thomas Orozco
16384599a8 mononoke (+ rust/shed/async_unit): update async_unit to expect async fn's
Summary:
This allows code that is being exercised under async_unit to call into code
that expects a Tokio 0.2 environment (e.g. 0.2 timers).

Unfortunately, this requires turning off LSAN for the async_unit tests, since
it looks like LSAN and Tokio 0.2 don't work very well together, resulting in
LSAN reporting leaked memory for some TLS structures that were initialized by
tokio-preview (regardless of whether the Runtime is being dropped):
https://fb.workplace.com/groups/rust.language/permalink/3249964938385432/

Considering async_unit is effectively only used in Mononoke, and Mononoke
already turns off LSAN in tests for precisely this reason ... it's probably
reasonable to do the same here.

The main body of changes here is also about updating the majority of our
tests to stop calling wait(), and use this new async unit everywhere. This is
effectively a pretty big batch conversion of all of our tests to use async fns
instead of the former approaches. I've also updated a substantial number of
utility functions to be async fns.

A few notable changes here:

- Some pushrebase tests were pretty flaky — the race they look for isn't
  deterministic. I added some actual waiting (using pushrebase hooks) to make
  it more deterministic.  This is kinda copy pasted from the globalrev hook
  (where I had introduced this first), but this will do for now.
- The multiplexblob tests don't work at all with new futures, because they call
  `poll()` all over the place. I've updated them to new futures, which required
  a bit of reworking.
- I took out a couple tests in async unit that were broken anyway.

Reviewed By: StanislavGlebik

Differential Revision: D19902539

fbshipit-source-id: 352b4a531ef5fa855114c1dd8bb4d70ed967dd55
2020-02-18 01:55:00 -08:00
Stanislau Hlebik
ee7c0a8d26 backfill_derived_data mononoke: fail if derived data disabled
Summary:
Now let's fail if we try to derive data that's not enabled in the config.
Note that this diff also adds a `derive_unsafe()` method, which should be used in the backfiller.

Reviewed By: krallin

Differential Revision: D19807956

fbshipit-source-id: c39af8555164314cafa9691197629ab9ddb956f1
2020-02-16 04:56:34 -08:00
Stanislau Hlebik
975d226cf2 mononoke: disable filenodes generation during hg changeset generation, take 2
Reviewed By: aslpavel

Differential Revision: D19856419

fbshipit-source-id: 0d1bcecf9f700d0789bba1227ef8a58748f04bbf
2020-02-13 03:18:13 -08:00
Pavel Aslanov
c902acbc3f remove the need to pass mapping to ::derive method
Summary: remove the need to pass mapping to `::derive` method

Reviewed By: StanislavGlebik

Differential Revision: D19856560

fbshipit-source-id: 219af827ea7e077a4c3e678a85c51dc0e3822d79
2020-02-12 10:22:39 -08:00
Lukasz Piatkowski
542d1f93d3 Manual synchronization of fbcode/eden and facebookexperimental/eden
Summary:
This commit manually synchronizes the internal move of
fbcode/scm/mononoke under fbcode/eden/mononoke which couldn't be
performed by ShipIt automatically.

Reviewed By: StanislavGlebik

Differential Revision: D19722832

fbshipit-source-id: 52fbc8bc42a8940b39872dfb8b00ce9c0f6b0800
2020-02-11 11:42:43 +01:00
Stanislau Hlebik
2b7e7e5676 mononoke: log if derived data is not enabled
Summary:
Before we start blocking generation of derived data, let's start by logging when
derived data is not enabled.

Reviewed By: farnz

Differential Revision: D19791523

fbshipit-source-id: 15bed8463f8a021de76a2878f06ec95d9fef877f
2020-02-10 01:44:09 -08:00
Stanislau Hlebik
8abe1af621 mononoke: add DerivedDataConfig
Summary:
See D19787960 for more details on why we need to do it.
This diff just adds the struct to BlobRepo.

Reviewed By: HarveyHunt

Differential Revision: D19788395

fbshipit-source-id: d609638432db3061f17aaa6272315f0c2efe9328
2020-02-10 01:44:09 -08:00
Thomas Orozco
d39eea991b blobrepo: don't fetch Hg Changeset IDs sequentially
Summary:
Fetching things from MySQL sequentially in a buffered fashion is a bad
practice, since we might end up saturating the underlying MySQL pool, and
starving other MySQL clients.

Instead, let's make fewer, bigger queries.
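
The batching pattern, sketched (hypothetical query_many helper):

```
/// Hypothetical sketch: instead of one query per id issued through a
/// buffered stream (which can saturate the shared MySQL pool), chunk
/// the ids and issue one query per chunk.
fn fetch_in_batches<T>(
    ids: Vec<u64>,
    batch_size: usize,
    query_many: impl Fn(&[u64]) -> Vec<T>,
) -> Vec<T> {
    ids.chunks(batch_size)
        .flat_map(|batch| query_many(batch))
        .collect()
}
```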

Reviewed By: ahornby

Differential Revision: D19766787

fbshipit-source-id: 1cf9102eaca8cc1ab55b7b85039ca99627a86b71
2020-02-06 12:11:22 -08:00
Thomas Orozco
ce8b9a0fbe getbundle_response: don't fetch Hg Changeset IDs sequentially
Summary:
Fetching things from MySQL sequentially in a buffered fashion is a bad
practice, since we might end up saturating the underlying MySQL pool with a lot
of requests. Doing so will result in other queries being delayed as they wait
behind our batch of queries, which results in higher dispatch latency.

Instead, let's make fewer, bigger queries. Also, while we're in here, let's
update blobrepo to have an up-to-date comment.

Reviewed By: StanislavGlebik

Differential Revision: D19766788

fbshipit-source-id: 318ec4778ca259b210d431fc2add8b327bfce99a
2020-02-06 12:11:21 -08:00
Liubov Dmitrieva
8228f84a60 Short hashes lookup: implement suggestions the same way as in Mercurial.
Summary:
Suggestions are included in the error message, the same way as currently
implemented in the Mercurial code. The format of the suggestions also stays the same.

We give the hash, time, author, and title.

All suggestions are ordered (most recent go first).

We don't show them if there are too many.

Reviewed By: krallin

Differential Revision: D19732053

fbshipit-source-id: b94154cbc5a4f440a0053fc3fac2bca2ae0b7119
2020-02-06 07:43:51 -08:00
Stanislau Hlebik
c97ceda175 mononoke: update IncompleteFilenodes to make transition to FilenodesOnlyPublic
Summary:
Jumping from "generating filenodes while generating hg changesets" to
"generating filenodes separately" is tricky to do without breaking production.
This diff adds additional logic to IncompleteFilenodes that should make this
transition smoother. See the code comment for more details.

Reviewed By: krallin

Differential Revision: D19741913

fbshipit-source-id: 48987c15fc4144c50afcee7ae34072f6cd634271
2020-02-06 07:26:14 -08:00
Lukasz Piatkowski
e8d62b64d5 mononoke: move the codebase under eden/ directory
fbshipit-source-id: 43a0252cb3ec42aa365f20d1b6faa4d24d74c9b8
2020-02-06 13:46:04 +01:00