Commit Graph

174 Commits

Author SHA1 Message Date
Arun Kulshreshtha
8ec76a0bce mononoke_api: add hg module
Summary:
Add a new `hg` module to the `mononoke_api` crate that provides a `HgRepoContext` type, which can be used to query the repo for data in Mercurial-specific formats. This will be used in the EdenAPI server.

Initially, the `HgRepoContext`'s functionality is limited to just getting the content of individual files. It will be expanded to support querying more things in later diffs.
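
To make the shape of this concrete, here is a minimal sketch of how such a context might be used; the types, method name, and return type below are illustrative stand-ins, not the actual `mononoke_api` API:

```rust
// Hypothetical sketch only: the real HgRepoContext lives in mononoke_api's
// hg module, and its types and method names may differ.
struct HgFileNodeId([u8; 20]);

struct HgRepoContext;

impl HgRepoContext {
    // Fetch the raw content of a single file by its Mercurial filenode id,
    // returning None when the filenode is unknown.
    async fn file_content(&self, _id: HgFileNodeId) -> Option<Vec<u8>> {
        None // placeholder: the real implementation queries the repo's stores
    }
}

#[tokio::main]
async fn main() {
    let repo = HgRepoContext;
    let content = repo.file_content(HgFileNodeId([0; 20])).await;
    println!("fetched {:?} bytes", content.map(|c| c.len()));
}
```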

Reviewed By: markbt

Differential Revision: D20117038

fbshipit-source-id: 23dd0c727b9e3d80bd6dc873804e41c7772f3146
2020-03-02 09:41:20 -08:00
Thomas Orozco
0dadca26e7 mononoke/gotham_ext: make MononokeHttpHandler middleware async & allow preemption
Summary:
This updates our middleware stack and introduces two new pieces of functionality:

- Middleware can now be async.
- Middleware can now preempt requests and dispatch a response.

The underlying motivation for this is to allow implementing Mononoke LFS's rate
limiting middleware in our existing middleware stack.
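
As a rough illustration of what "async + preemption" means for a middleware stack (a sketch with made-up types, not the actual gotham_ext trait):

```rust
use std::future::Future;
use std::pin::Pin;

// Made-up request/response types standing in for the real HTTP ones.
type Request = String;
type Response = String;

trait Middleware: Send + Sync {
    // Async entry point; returning Some(response) preempts the request
    // and dispatches that response instead of running the handler.
    fn inbound<'a>(
        &'a self,
        req: &'a Request,
    ) -> Pin<Box<dyn Future<Output = Option<Response>> + Send + 'a>>;
}

struct RateLimit {
    limit_reached: bool,
}

impl Middleware for RateLimit {
    fn inbound<'a>(
        &'a self,
        _req: &'a Request,
    ) -> Pin<Box<dyn Future<Output = Option<Response>> + Send + 'a>> {
        Box::pin(async move {
            if self.limit_reached {
                Some("429 Too Many Requests".to_string()) // preempt
            } else {
                None // fall through to the next middleware / the handler
            }
        })
    }
}

fn main() {
    let mw = RateLimit { limit_reached: true };
    let preempted = futures::executor::block_on(mw.inbound(&"GET /".to_string()));
    assert_eq!(preempted, Some("429 Too Many Requests".to_string()));
}
```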

Reviewed By: kulshrax

Differential Revision: D20191213

fbshipit-source-id: fc1df7a14eb0bbefd965e32c1fca5557124076b5
2020-03-02 09:28:08 -08:00
Arun Kulshreshtha
615d8392bc mononoke_api: update doc comments on file content methods
Summary: D20121350 changed the methods for accessing file content on `FileContext` to no longer return `Stream`s. We should update the comments accordingly.

Reviewed By: ahornby

Differential Revision: D20160128

fbshipit-source-id: f5bfd7e31bc7e6db63f56b8f4fc238893aa09a90
2020-03-02 09:21:08 -08:00
Thomas Orozco
2d04773c23 mononoke/hg_sync_job: update Globalrevs in hgsql
Summary:
This updates the hg_sync_job to update Globalrevs in hgsql before attempting to
sync bundles. This means that if we're syncing successfully, hg is in sync with
Mononoke, and if we fail (which should be very uncommon to begin with!), hg
might skip a little bit ahead, but that's OK.

This only makes sense when generating bundles — when doing pushrebase, hg would
be updating its own globalrevs.

Reviewed By: StanislavGlebik

Differential Revision: D20159262

fbshipit-source-id: 6736f8592682da1001c7c9c4c9444462b71913c2
2020-03-02 08:24:16 -08:00
Stanislau Hlebik
638e637ef6 RFC: mononoke: introduce unodes v2
Summary:
Our previous implementation of unodes had a problem with diamond merges:
essentially, because p1 and p2 might have the same file content but different
unodes, we would always create a merge unode, which can be unexpected
(the code comment in unodes/derive.rs has more info about it).

This diff fixes the problem by introducing unodes v2. This allows us to import
new repos with the new unode implementation while keeping the old repos on
unode v1.

This implementation uses a heuristic which should be fast and should do the
correct thing most of the time. In some cases it might exclude some parts of
the history completely. For example:

     O <- merge commit, doesn't change anything
    / \
   P1  |  <- modified "file.txt" to "B"
   |   P2    <- modified "file.txt" to "B"
   \  /
    ROOT <- created "file.txt" with content "A"

In that case, the history of "file.txt" starting from the merge commit will contain only (P1, ROOT),
but it won't contain P2.
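
The actual heuristic lives in unodes/derive.rs; as a toy illustration of the reuse idea (with simplified, made-up types), a merge can skip minting a new unode whenever every parent already resolves the path to the same unode:

```rust
// Toy sketch only: not the actual unodes v2 code in derive.rs.
// If all parents of a merge resolve a path to the same unode, reuse it
// instead of minting a new merge unode.
#[derive(Clone, PartialEq, Eq, Debug)]
struct UnodeId(u64);

fn unode_for_merge(
    parent_unodes: &[UnodeId],
    mint_merge_unode: impl Fn() -> UnodeId,
) -> UnodeId {
    match parent_unodes {
        [first, rest @ ..] if rest.iter().all(|u| u == first) => first.clone(),
        _ => mint_merge_unode(),
    }
}

fn main() {
    // P1 and P2 both resolve "file.txt" to the same unode:
    let parents = [UnodeId(42), UnodeId(42)];
    assert_eq!(unode_for_merge(&parents, || UnodeId(99)), UnodeId(42));
}
```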

We also considered other options:
1) Move this heuristic to fastlog batch derived data. See D19973553 for more
details about why we decided not to do it.

2) Filter out parent unodes that are ancestors of other parent unodes. This should
always be correct, but it would be hard to implement, and even harder to make
sure it always has good performance.

Reviewed By: krallin

Differential Revision: D19978157

fbshipit-source-id: 445ddd5629669d987e7aa88c35fecf0b34a40da0
2020-03-02 05:27:31 -08:00
Stanislau Hlebik
d7a4ff29b5 mononoke: log derivations to separate scuba table
Summary: I'd like to log all derivations to a single place so that it's easier to understand what was derived and where

Reviewed By: aslpavel

Differential Revision: D20140004

fbshipit-source-id: 305ea533031a04ff95995a6fe2a6e57e95a87026
2020-03-02 04:30:12 -08:00
Alex Hornby
63937e3030 mononoke: walker: log the source node when validating
Summary: Log the source node when validating so that we can more quickly reproduce any issues in a single step via the --walk-root option, rather than needing to run the entire walk again.

Differential Revision: D20098200

fbshipit-source-id: 6b0d7d151c97f25080953d6c0fbf431dc2cec6a8
2020-03-02 02:29:34 -08:00
Stanislau Hlebik
168b74e38c mononoke: fix logging in bookmarks
Reviewed By: ahornby

Differential Revision: D20161053

fbshipit-source-id: 7c69bf9421dd9e55bc2ca805c2f14b9c4cd0e669
2020-02-28 13:24:29 -08:00
Stanislau Hlebik
9cf34d97ca mononoke: asyncify WarmBookmarksCache
Reviewed By: ikostia

Differential Revision: D20159967

fbshipit-source-id: dab201530416f17da4b4a3be6c4ecc04b2c10950
2020-02-28 13:24:28 -08:00
Thomas Orozco
82027505a0 mononoke/mercurial: add tests for metadata extraction
Summary:
I noticed in my earlier Bytes 0.5 diff that this doesn't have local test
coverage (there might be things somewhere else in the test suite that look for
it). Let's add some.

Reviewed By: ahornby

Differential Revision: D20139437

fbshipit-source-id: c17e4516574d674bb0b009cd1f322008fb3c1a79
2020-02-28 10:54:04 -08:00
Alex Hornby
938830d3f6 mononoke: walker: add ability to track route to node
Summary: Add the ability to track the route to a node, so that one can report the node from which a failing step started.

Reviewed By: ikostia

Differential Revision: D20097615

fbshipit-source-id: 4f2c000f54bd212225533e7f3570178020f34a9d
2020-02-28 09:01:35 -08:00
Kostia Balytskyi
cec057adc5 mononoke: add some perf counters for hydrated getbundle responses
Summary:
In case this starts to cause problems, let's have a way to correlate those
problems with some exported metrics.

Reviewed By: StanislavGlebik

Differential Revision: D20158822

fbshipit-source-id: 6ac9e25861dbedaecdf04fd92bda835ae66535eb
2020-02-28 08:30:43 -08:00
Kostia Balytskyi
7ed52ee31b mononoke: return hydrated bundles for infinitepush, if config says so
Summary:
## Wider goal
See D20068839

## This diff
This diff actually implements the conditional hydration of `getbundle`
responses, as described in D20068839.

Note that as well as implementing support for hydrated `getbundle` responses, this diff also implements support for changegroup v3 and lfs in such responses, which is needed if we are to do this kind of thing in an LFS-enabled repository.

Reviewed By: StanislavGlebik

Differential Revision: D20068838

fbshipit-source-id: fbdd3f8f5fb7cd2cb60473a94094553a1d4b4d2f
2020-02-28 08:30:43 -08:00
Alex Hornby
7f09703c4c mononoke: walker: log per-run session id to scuba for validate
Summary:
Extend the session id logging to the validate command by adding the ability to set
the progress reporter's scuba builder.

Reviewed By: ikostia

Differential Revision: D20074153

fbshipit-source-id: ceaeebdb7eb976080061ad3b76b22d7a0f7bd891
2020-02-28 04:57:09 -08:00
Alex Hornby
7baf1066ab mononoke: walker: fix performance regression in loading file data for compression-benefit
Summary: Fix performance regression in loading file data in compression-benefit subcommand

Reviewed By: StanislavGlebik

Differential Revision: D20142143

fbshipit-source-id: 0b9d93feaddab1df4b9d5777e0637f35aed2feda
2020-02-28 04:57:08 -08:00
Thomas Orozco
c6957c1f1e mononoke/newfilenodes: use for_sharded_connection()
Summary: I canaried with this but I forgot to fold it in -_-

Reviewed By: HarveyHunt

Differential Revision: D20158157

fbshipit-source-id: 4a570bbca421d8c3e1e66605f164f2b8e2a433f6
2020-02-28 04:53:03 -08:00
Kostia Balytskyi
d5080d20ce mononoke: asyncify get_manifest_and_filenodes in getbundle_response
Summary:
## Wider goal
See D20068839

## This diff
Modernize this particular function

Reviewed By: StanislavGlebik

Differential Revision: D20097802

fbshipit-source-id: fe76aaf2c0b65cf9b47a1dedc66d417d22cad255
2020-02-28 04:36:38 -08:00
Kostia Balytskyi
7755c4c4e6 mononoke: asyncify prepare_filenode_entries_stream in getbundle_response
Summary:
## Wider goal
See D20068839

## This diff
Modernize this particular function.

Reviewed By: krallin

Differential Revision: D20097805

fbshipit-source-id: bbcf371921d3a709cc7178ec50b7729bddf1f630
2020-02-28 02:49:57 -08:00
Thomas Orozco
c680696e40 mononoke: defer hook loading
Summary:
Most binaries don't need hooks. Let's not require them. This might not be very
long lived since Simon is working on removing lua hooks, but this was a trivial
fix.

Reviewed By: johansglock

Differential Revision: D20140026

fbshipit-source-id: cc74b37459f63c5dd550c5779b72aa1d6531202c
2020-02-28 02:03:07 -08:00
Thomas Orozco
515f4a507d mononoke/cacheblob: remove Memcache blob write leases
Summary:
(this doesn't remove ad-hoc leases, like derived data)

Let's see if this has any impact on performance. We no longer fail Manifold
writes on conflicts.

Reviewed By: StanislavGlebik

Differential Revision: D20038572

fbshipit-source-id: 4a972ff09ceb65e69a1d22a643a8f2d9b2ab1b17
2020-02-28 01:59:36 -08:00
David Tolnay
37a8401761 rust/thrift: Un-rename futures-preview dependency
Summary: The Thrift generated code depends only on futures 0.3, not 0.1. Thus it isn't necessary to depend on renamed:futures-preview and we can depend on futures-preview directly, which is exposed to Rust code as `futures::`.

Reviewed By: jsgf

Differential Revision: D20145921

fbshipit-source-id: 5cae94ec6747a374c2bf05f124ab237c798de005
2020-02-27 22:27:58 -08:00
David Tolnay
d8bd00ce36 rust/thrift: Drop unused dependencies on old futures in various places
Summary:
The last uses of futures 0.1 were removed in D18411564 and D18392252.

A later diff will switch thrift from using renamed:futures-preview to plain futures-preview to prepare for eliminating the -preview suffix.

Reviewed By: jsgf

Differential Revision: D20143832

fbshipit-source-id: b7fd79f18368ade59eeba6ed0ac09613000c046b
2020-02-27 22:24:10 -08:00
Jeff Zhang
c517e81329 Push compat down deeper into subcommands & make subcommand functions async in eden/mononoke/cmds/admin/main.rs
Summary: Continue to push `compat()` deeper into subcommands. This enables us to refactor each file one at a time and ultimately remove the old futures from our code base.
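
For reference, this is the general shape of the pattern (a sketch that assumes futures 0.3 with its `compat` feature enabled, and futures 0.1 renamed to `futures_01` in Cargo.toml; it is not the actual admin subcommand code):

```rust
// Assumes Cargo.toml renames:
//   futures_01 = { package = "futures", version = "0.1" }
//   futures    = { version = "0.3", features = ["compat"] }
use futures::compat::Future01CompatExt;

async fn subcommand_example() -> Result<u32, ()> {
    // An old-style futures 0.1 future coming from legacy code:
    let legacy = futures_01::future::ok::<u32, ()>(42);
    // compat() converts it into a std::future::Future we can await,
    // so the subcommand body itself is plain async/await:
    legacy.compat().await
}

fn main() {
    let v = futures::executor::block_on(subcommand_example()).unwrap();
    println!("got {}", v);
}
```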

Reviewed By: farnz

Differential Revision: D20132126

fbshipit-source-id: cc10dde6eda7ddcbf911dbe8d3ebe1713f8ec2ab
2020-02-27 12:39:28 -08:00
Thomas Orozco
b7dfbdd09d mononoke/newfilenodes: stop using i8 internally for is_tree
Summary: Makes the code a little nicer to work with.

Reviewed By: HarveyHunt

Differential Revision: D20138720

fbshipit-source-id: 19f228782ab3582739e35fddcb2b0bf952110641
2020-02-27 12:34:23 -08:00
Thomas Orozco
ed602e6009 mononoke/newfilenodes: retry on master when paths are missing
Summary:
Paths are in a different replica, so they can be missing even if copy info is
present. Let's fall back to master in this case.

Differential Revision: D20098902

fbshipit-source-id: 838ab1c70a74420c431a2f442f1504c8edd29a2e
2020-02-27 12:34:23 -08:00
Thomas Orozco
4d2932c43b mononoke/newfilenodes: switch to a virtual sharding strategy
Summary:
Locking by physical shard worked earlier in this stack as indicated in the
benchmarks, but after Ondemand restored their fetching for www, it proved
insufficient in terms of parallelism, and resulted in substantially slower
gettreepacks.

Besides, with the "physical sharding" approach, we found ourselves between a rock and a hard place in terms of what to do with paths:

- We could keep holding the semaphore for a filenode while fetching paths. This is undesirable because it further limits our level of concurrency (because fetching a filenode + paths is going to be at least 2x as slow as fetching a filenode).
- We could fetch them without holding a lease at all. This is even more undesirable, because it means that when we release the semaphore for a given shard, we haven't filled the cache yet. This means that if we have a queue of 2 requests for the same bit of data, we're going to fetch twice (task A acquires the lock, goes to MySQL for the filenode, releases the lock and starts going to paths, at which point task B acquires the lock and goes to MySQL again since the filenode hasn't been filled yet).

To fix this, I had to add a dedicated cache for paths, and put it behind semaphores as well. In the example above, this would ensure task B finds a "partial filenode" in the cache and doesn't go to MySQL (instead, it goes straight up to queuing for access to paths, where it will wait behind task A and also won't hit MySQL).

There are a few problems with this:

- It's a lot of extra complexity (because we need to handle half misses where we have the filenode but not the path).
- It ties together our level of concurrency a second time to that of the underlying number of physical shards, which is kinda meaningless when some of this data can be provided by Memcache to begin with.

This diff fixes both problems.

The root cause of our problem is that we're tying our level of concurrency to physical
MySQL shards, whereas what we actually want is a tunable level of concurrency
that matches our workload, yet effectively deduplicates queries.

In this diff, I'm updating our exclusive locking to be purely virtual. This
means that we're still not over-fetching, but we are no longer constrained by
the parallelism of the underlying DB (this does mean we might queue up requests
there, but they won't be duplicate requests).

This also results in simpler code, and opens up the way for further
improvements in the future, such as using Memcache lease-get operations to
further deduplicate calls, if we'd like.

As part of that, I've also updated our remote_cache to use the same CacheKey
entity as the local cache, to avoid spending time producing new keys when we
have perfectly good ones available.
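
A minimal sketch of the virtual-sharding idea (illustrative only; the shard count, semaphore capacity, and types are assumptions, not the real newfilenodes code):

```rust
// Concurrency is bounded by hashing each cache key onto one of N
// semaphores, where N is a tunable that no longer has to match the
// number of physical MySQL shards.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;
use tokio::sync::Semaphore;

struct VirtualShards {
    semaphores: Vec<Arc<Semaphore>>,
}

impl VirtualShards {
    fn new(shard_count: usize) -> Self {
        Self {
            semaphores: (0..shard_count)
                .map(|_| Arc::new(Semaphore::new(1))) // one in-flight query per key-shard
                .collect(),
        }
    }

    fn shard_for<K: Hash>(&self, key: &K) -> Arc<Semaphore> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() as usize) % self.semaphores.len();
        self.semaphores[idx].clone()
    }
}

#[tokio::main]
async fn main() {
    let shards = VirtualShards::new(128);
    let sem = shards.shard_for(&"some/file/path");
    let _permit = sem.acquire().await.unwrap();
    // ... check the cache, then query MySQL while holding the permit ...
}
```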

Reviewed By: StanislavGlebik

Differential Revision: D20097821

fbshipit-source-id: 03d7be9082982fc1c6ef365d541c1ed8ae3e6e8d
2020-02-27 12:34:23 -08:00
Thomas Orozco
b4e8201d4c mononoke/newfilenodes: track perf counters appropriately
Summary: Let's record perf counters properly.

Reviewed By: StanislavGlebik

Differential Revision: D20097823

fbshipit-source-id: 0daed281d3c080fcbe7b4fac996fb265bdd6d408
2020-02-27 12:34:22 -08:00
Thomas Orozco
500baffb5c mononoke/newfilenodes: add tests for cache fill behavior
Summary:
This adds a test for our cache fill behavior, which is to fill the remote cache
if we miss in local cache. I hadn't added this earlier, and it's a little easier
to add now that the refactor for FilenodeInfo is through.

Reviewed By: ahornby

Differential Revision: D19905396

fbshipit-source-id: 88b5fd83f5d2213e91efc3c5dfb91dfe4e395136
2020-02-27 12:34:22 -08:00
Thomas Orozco
95d463ce47 mononoke/filenodes: Remove path from FilenodeInfo
Summary:
This updates our filenodes implementation to use different types for writing
(`PreparedFilenode`) and reading (`FilenodeInfo`).

The bottom line is that this avoids a bunch of cloning of paths on the read
path, which doesn't need to return the path to the caller, since the caller
already knows it! We can also take it out of Memcache, since we don't need
Memcache to tell us the path for a blob we could only possibly have found by
having the path to begin with.

This does update our filenodes serialization format. I bumped MC_CODEVER
accordingly.
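
Schematically, the split looks something like this (field layout simplified and illustrative, not the actual crate definitions):

```rust
// Illustrative sketch of the write/read type split; the real types carry
// more fields (e.g. the linknode).
struct RepoPath(String);
struct HgFileNodeId([u8; 20]);

// What callers hand to filenodes when inserting: the path comes along.
struct PreparedFilenode {
    path: RepoPath,
    info: FilenodeInfo,
}

// What callers get back when reading: no path, since the caller could only
// have looked this up by path to begin with.
struct FilenodeInfo {
    filenode: HgFileNodeId,
    p1: Option<HgFileNodeId>,
    p2: Option<HgFileNodeId>,
    copyfrom: Option<(RepoPath, HgFileNodeId)>,
}
```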

Reviewed By: StanislavGlebik

Differential Revision: D19905400

fbshipit-source-id: 6037802c1773de564cade8e264d36087382ee15a
2020-02-27 12:34:21 -08:00
Thomas Orozco
7fa9607859 mononoke/newfilenodes: remove sqlfilenodes
Summary:
This removes the old sqlfilenodes implementation, since we're now using the new
one. There's also a bit of cruft here and there we can get rid of.

Reviewed By: StanislavGlebik

Differential Revision: D19905395

fbshipit-source-id: 2526b6d65eeb981f5aedda9951b44b389ecec29d
2020-02-27 12:34:21 -08:00
Thomas Orozco
149e15f2ad mononoke: use spawn_future in getpack to fetch history
Summary:
The former implementation would eagerly query Memcache when fetching history
for files in getpack (due to how old futures work), but the new one does not.
This means the new one loses out on a lot of buffering, which the old one used
to do.

This diff emulates the old behavior by eagerly querying filenodes in getpack,
which improves performance on a very big getpack (32K files) by about 3x, and
makes it 30% faster than the old code, instead of > 2x slower.

Note that I'm not certain we really want to do this kind of aggressive
buffering in getpack long term, but for now, I'd like to keep this unchanged.

Reviewed By: StanislavGlebik

Differential Revision: D19905398

fbshipit-source-id: 49f9a2cd505a98123fd1dabb835e8e378d45c930
2020-02-27 12:34:21 -08:00
Thomas Orozco
f6866eb97d mononoke: switch to new filenodes implementation
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.

See the test plan for detailed performance results supporting why I'm making
this change.

Reviewed By: StanislavGlebik

Differential Revision: D19905394

fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
2020-02-27 12:34:20 -08:00
Thomas Orozco
a039745642 mononoke/newfilenodes: introduce timeouts talking to Memcache, MySQL
Summary:
Since we have one connection per shard, it's a good idea to make sure we don't
keep those locked for too long. This diffs adds generous timeouts to protect
against this, as well as ODS reporting to track errors.
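
The general shape of such a timeout wrapper, as a sketch using `tokio::time::timeout` (the duration and error handling here are illustrative, not the deployed values):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Stand-in for a query that runs while holding a shard's connection.
async fn query_shard() -> Result<Vec<u8>, String> {
    Ok(vec![])
}

async fn query_with_timeout() -> Result<Vec<u8>, String> {
    match timeout(Duration::from_secs(5), query_shard()).await {
        Ok(res) => res,
        Err(_elapsed) => {
            // Report to monitoring and fail, rather than holding the
            // per-shard connection indefinitely.
            Err("filenodes: query timed out".to_string())
        }
    }
}
```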

Reviewed By: StanislavGlebik

Differential Revision: D19905393

fbshipit-source-id: ee4f4d3e33cf48a9002b016e31d37a401c6578f2
2020-02-27 12:34:20 -08:00
Thomas Orozco
c31b7d9ef9 mononoke/newfilenodes: introduce remote caching
Summary:
This introduces caching of filenodes to Memcache as in the old filenodes
implementation. The code is mostly was ported over from the existing filenodes
implementation, and converted to async / await. However, one key difference is
that the lookups happen once we hold the semaphore to talk to the underlying
MySQL shard.

The reason for this is:

- Reads to Memcache are really fast. They're often under 1ms. If you're going
  to miss in Memcache and have to go to SQL, it won't make you much slower.
- Reads to Memcache are kinda expensive CPU-wise. Data in Memcache is
  compressed, and we often see a lot of our CPU cycles spent talking to Memcache
  when we're under load.
- Memcache isn't an infinite resource. If we're reading the exact same
  key a hundred times, that's going to hit the same Memcache box. A bit of
  deduplication on our end is a nice thing to strive for. Besides, our own
  thread pool we use to talk to Memcache is limited in size.

From a performance perspective, this doesn't make things any slower, but
reduces CPU usage when we'd otherwise have a lot of duplicate fetching.

Finally, note that this update also includes support for dirty-tracking in our
local cache. We use this to know if we should fill the remote cache (if we 100%
hit in local cache, we don't fill the remote cache).
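
A simplified sketch of that fill policy (stand-in functions, not the real crate code): a pure local hit returns immediately and fills nothing remote, while a local miss consults Memcache and only fills it after going to MySQL:

```rust
use std::collections::HashMap;

struct Caches {
    local: HashMap<String, Vec<u8>>,
}

// Stand-ins for the real Memcache / MySQL clients.
async fn memcache_get(_key: &str) -> Option<Vec<u8>> { None }
async fn memcache_set(_key: &str, _val: &[u8]) {}
async fn mysql_get(_key: &str) -> Vec<u8> { vec![] }

async fn get(caches: &mut Caches, key: &str) -> Vec<u8> {
    if let Some(hit) = caches.local.get(key) {
        return hit.clone(); // pure local hit: nothing dirty, no remote fill
    }
    // (the real code acquires the shard's semaphore before this point)
    let value = match memcache_get(key).await {
        Some(v) => v,
        None => {
            let v = mysql_get(key).await;
            memcache_set(key, &v).await; // dirty: fill the remote cache
            v
        }
    };
    caches.local.insert(key.to_string(), value.clone());
    value
}
```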

Reviewed By: StanislavGlebik

Differential Revision: D19905390

fbshipit-source-id: 363f638bb24cf488c7cd3a8ecea43e93f8391d3f
2020-02-27 12:34:19 -08:00
Thomas Orozco
1c94a586f0 mononoke/newfilenodes: introduce local caching
Summary:
This is the meat of the change I'm trying to make here. This updates
newfilenodes to check their cache before dispatching queries to MySQL once they
acquire the connection.

Since we only get one connection per shard, this ensures that we don't query
several times for the same piece of data.

Note that the caching structure is a little different from the old one, which
cached entire filenode info. Instead, this now caches the exact data we'd get
out of MySQL, since we want to map MySQL queries 1-1 to cache lookups.

With this change, we also now have a local cache for file history queries.
Historically, we hadn't cached those at all, but with this change, we can get a
lot of value out of caching them even for a small period of time, in order to
de-amplify reads to MySQL and Memcache.

However, they are in separate cache pools to make sure they don't evict point
filenodes, which we use for gettreepack (and have a good hit rate, unlike
history blocks, which have a pretty poor hit rate).

Note that having those semaphored connections might feel a little scary, but
it's worth noting that the exact same bottleneck is implicitly present in the
existing filenodes implementation, since we can only have one active query to
any given shard a given time. That said, this approach also gives us a little
more future flexibility, if we'd like, since we could map multiple semaphores to
"sub shards" that map N-to-1 to real, physical shards.

Reviewed By: HarveyHunt

Differential Revision: D19905391

fbshipit-source-id: 02b5efaa44789e6afcccdeb9ee2b4791f7c3c824
2020-02-27 12:34:19 -08:00
Thomas Orozco
ab4f7adaeb mononoke/newfilenodes: introduce a queue-conscious filenodes implementation
Summary:
This introduces a new implementation of filenodes that maintains its own
queuing on top of the queuing enforced by the SQL crate.

Later in this stack, the goal is for this implementation to avoid dispatching
duplicate queries when there is a lot of contention talking to MySQL, which
happens when large changes land and suddenly everyone wants the updated code.

The underlying goal is to avoid dispatching a lot of duplicate queries when
there is contention. Indeed, if there is contention, then the latency between
query and response increases. As a result, without visibility in the queue, the
following can happen:

- Task 1 looks for A in the cache. It misses
- Task 1 dispatches a SQL query
- Task 2 looks for A in the cache. It misses
- Task 2 dispatches a SQL query
- Task 3 looks for A in the cache. It misses
- Task 3 dispatches a SQL query
- ...
- Task 1's SQL query finally executes and fills the cache.
- All other queries execute anyway.

The longer the dispatch queue, the longer it takes to run those queries.
Looking at Mononoke's stats in prod, this happens pretty often:
https://pxl.cl/10xxmo (the spike at 3pm was a 10K-files change in fbsource, for
example).

The goal of this stack is to avoid this effect, by checking the cache only once
we know we're ready to go to SQL.
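
In code, the intended ordering looks roughly like this (a sketch with tokio primitives and stand-in types, not the actual implementation): the cache lookup happens only once the shard's semaphore is held, so a task that queued behind an identical query finds the freshly filled cache instead of dispatching a duplicate:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, Semaphore};

// Stand-in for the real SQL query.
async fn sql_fetch(key: &str) -> String {
    format!("row for {key}")
}

async fn fetch_dedup(
    sem: Arc<Semaphore>,
    cache: Arc<Mutex<HashMap<String, String>>>,
    key: &str,
) -> String {
    let _permit = sem.acquire().await.unwrap();
    // Only now do we look in the cache: whoever held the permit before us
    // has already filled it if they were fetching the same key.
    if let Some(hit) = cache.lock().await.get(key) {
        return hit.clone();
    }
    let value = sql_fetch(key).await;
    cache.lock().await.insert(key.to_string(), value.clone());
    value
}
```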

In this particular diff, what's added is:

- The SQL read and write implementation. This is all implemented using new
  futures, but the logic should be largely unchanged from before (i.e. we store
  filenodes and their associated copy info in shards by the filenode's path —
  not the source path if there is copy info — and paths in their own shard).
  The queries themselves are largely unchanged from the existing filenodes, with
  only a few tweaks:
  - Filenodes and copy info are now selected in one go.
  - There are types to distinguish path hashes and paths.
- The structs to support this implementation.

Reviewed By: StanislavGlebik

Differential Revision: D19905397

fbshipit-source-id: bec981e7bfb396d62eb06e5ce249c21555afc64b
2020-02-27 12:34:19 -08:00
Thomas Orozco
341b4f1bc3 mononoke/filenodes: expect a Vec of filenodes to insert
Summary:
The API expects a stream of filenodes to insert, but we actually never used
that ability. Instead, every single callsite has a `Vec`, which it converts to
a stream and passes in.

I'd like to change this for two reasons:

- It's unnecessary
- It makes the code more complex on the Filenodes implementation side, and less
  efficient, since we need to `chunk()` there in small chunks, which might not
  all be in the same shard. If we get the entire `Vec` at once, we can chunk on a
  per-shard basis (this happens later in this stack).

Besides, if we end up having a stream and wanting the old behavior, we can
always call `chunk()` on the stream and call `add_filenodes` on each batch (which
is actually nicer because if you have a futures 0.2 stream that isn't static,
you can do this, but you can't turn it into a `BoxStream`!).
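
A small sketch of the per-shard chunking this enables (the shard-routing function here is a made-up stand-in for the real path hash):

```rust
use std::collections::HashMap;

// Stand-in for the real path-hash -> shard routing.
fn shard_of(path: &str, shard_count: usize) -> usize {
    path.len() % shard_count
}

// With the whole Vec in hand, inserts can be grouped by destination shard
// and then chunked per shard, instead of blindly chunking a stream.
fn group_by_shard(
    filenodes: Vec<(String, u64)>,
    shard_count: usize,
) -> HashMap<usize, Vec<(String, u64)>> {
    let mut by_shard: HashMap<usize, Vec<(String, u64)>> = HashMap::new();
    for fnode in filenodes {
        by_shard
            .entry(shard_of(&fnode.0, shard_count))
            .or_default()
            .push(fnode);
    }
    by_shard
}

fn main() {
    let batch = vec![("a/b.txt".to_string(), 1), ("c.txt".to_string(), 2)];
    for (shard, group) in group_by_shard(batch, 4) {
        println!("shard {shard}: {} filenodes", group.len());
    }
}
```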

Reviewed By: StanislavGlebik

Differential Revision: D19902537

fbshipit-source-id: a4c030c4a51afbb6e9db133b32464009eed197af
2020-02-27 12:34:18 -08:00
Stanislau Hlebik
cc8be5997e mononoke: asyncify derived data
Reviewed By: krallin

Differential Revision: D20139701

fbshipit-source-id: 7f1c8370707eb415dd7e23d94eb923846f7ef59b
2020-02-27 12:17:54 -08:00
Alex Hornby
e70f3dc76c mononoke: walker: log per-run session id to scuba for scrub
Summary:
Log a per-run session id to distinguish runs more easily.

This diff adds the session id to scrub logging; a following one extends this to validate/progress logging.

So that each tail has a separate session id logged, setup is delayed until the start of each tail by passing it in as a function.

Differential Revision: D19907398

fbshipit-source-id: 8e5470918112321866c67c9f94e703fd46e6a16b
2020-02-27 09:00:44 -08:00
Thomas Orozco
f1121ccef6 mononoke: add a @nocommit hook
Reviewed By: HarveyHunt

Differential Revision: D20139540

fbshipit-source-id: 0be6d1aa8ad7ad1197197ec886f0cf44bd6b864d
2020-02-27 08:28:05 -08:00
Thomas Orozco
26ae726af5 mononoke: update internals to Bytes 0.5
Summary:
The Bytes 0.5 update left us in a somewhat undesirable position where every
access to our blobstore incurs an extra copy whenever we fetch data out of our
cache (by turning it from Bytes 0.5 into Bytes 0.4) — we also have quite a few
places where we convert in one direction and then immediately back in the other.

Internally, we can start using Bytes 0.5 now. For example, this is useful when
pulling data out of our blobstore and deserializing as Thrift (or conversely,
when serializing and putting it into our blobstore).

However, when we interface with Tokio (i.e. decoders & encoders), we still have
to use Bytes 0.4. So, when needed, we convert our Bytes 0.5 to 0.4 there.

The idea behind this tradeoff is that we deal with more bytes internally than
we end up sending to clients, so doing the Bytes conversion closer to the
point of sending data to clients means fewer copies.

We can also start removing those once we migrate to Tokio 0.2 (and newer
versions of Hyper for HTTP services).

Changes that were required:

- You can't extend the new `Bytes` (because that implicitly copies). You need to use
  BytesMut instead, which I did where that was necessary (I also added calls in
  the Filestore to do that efficiently).
- You can't create bytes from a `&'a [u8]`, unless `'a` is `'static`. You need
  to use `copy_from_slice` instead.
- `slice_to` and `slice_from` have been replaced by a `slice()` function that
  takes ranges.
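
A compact illustration of those three changes against the bytes 0.5 API:

```rust
use bytes::{Bytes, BytesMut};

fn main() {
    // 1. Extending: build up in BytesMut, then freeze into Bytes.
    let mut buf = BytesMut::new();
    buf.extend_from_slice(b"hello ");
    buf.extend_from_slice(b"world");
    let bytes: Bytes = buf.freeze();

    // 2. Non-'static slices must be copied in explicitly.
    let local = vec![1u8, 2, 3];
    let copied = Bytes::copy_from_slice(&local);

    // 3. slice() with a range replaces slice_to / slice_from.
    let head = bytes.slice(..5); // was slice_to(5) in Bytes 0.4
    let tail = bytes.slice(6..); // was slice_from(6)

    println!("{:?} {:?} {:?}", head, tail, copied);
}
```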

Reviewed By: StanislavGlebik

Differential Revision: D20121350

fbshipit-source-id: eb31af2051fd8c9d31c69b502e2f6f1ce2190cb1
2020-02-27 08:08:28 -08:00
Thomas Orozco
7698cded43 mononoke/hooks: add a signed source hook
Reviewed By: HarveyHunt

Differential Revision: D20139152

fbshipit-source-id: a0a48d447444cf969162f5f9655ab003e7ca2f76
2020-02-27 08:05:14 -08:00
Mateusz Kwapich
6f9f82767c add git identifiers to Source Control Service
Summary: This allows us to translate git hashes

Reviewed By: markbt

Differential Revision: D19972870

fbshipit-source-id: 871a4cf94d468d987221cb08fe7b6135050bac93
2020-02-27 08:05:14 -08:00
Mateusz Kwapich
5825db21c6 add the git<->bonsai translation to mononoke_api crate
Reviewed By: markbt

Differential Revision: D19972871

fbshipit-source-id: 79c0c59f0bd1bd033bf2a8999dbe56b60a7ac085
2020-02-27 08:05:13 -08:00
Mateusz Kwapich
3ff29a8810 make BonsaiGitMapping repo-specific
Summary:
Nearly all of the Mononoke SQL stores are instantiated once per repo, but they don't store the `RepositoryId` anywhere, so every method takes it as an argument. And because providing the repo_id on every call is not ergonomic, we tend to add methods to blob_repo that just call the right method with the right repo_id on one of the underlying stores (see `get_bonsai_from_globalrev` on blobrepo for example).

Because my reviewers [pushed back](https://our.intern.facebook.com/intern/diff/D19972871/?transaction_id=196961774880671&dest_fbid=1282141621983439) when I tried to do the same for bonsai_git_mapping, I've decided to make it right by adding the repo_id to the BonsaiGitMapping.
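
Schematically, the change has this shape (names and fields simplified for illustration):

```rust
// Illustrative sketch: once the store carries its RepositoryId, callers
// stop threading it through every method call.
#[derive(Clone, Copy)]
struct RepositoryId(i32);
struct GitSha1([u8; 20]);
struct ChangesetId([u8; 32]);

struct BonsaiGitMapping {
    repo_id: RepositoryId, // bound once at construction
}

impl BonsaiGitMapping {
    fn new(repo_id: RepositoryId) -> Self {
        Self { repo_id }
    }

    // No repo_id argument: the mapping already knows which repo it serves.
    fn get_bonsai_from_git(&self, _git: GitSha1) -> Option<ChangesetId> {
        let _ = self.repo_id; // would scope the real SQL query to this repo
        None
    }
}
```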

Reviewed By: krallin

Differential Revision: D20029485

fbshipit-source-id: 7585c3bf9cc8fa3cbe59ab1e87938f567c09278a
2020-02-27 08:05:13 -08:00
Kostia Balytskyi
7ee657f124 mononoke: asyncify signatures of two fns in getbundle_response
Summary:
## Wider goal
See D20068839

## This diff
Asyncifying only the signatures allows us to work on the function bodies independently, without touching the callsites later in the diff.

Reviewed By: StanislavGlebik

Differential Revision: D20097804

fbshipit-source-id: f1391a055947c7802f719bc99b9eae71a4ac39cd
2020-02-27 05:01:52 -08:00
Kostia Balytskyi
bd90a843a7 mononoke: asyncify diff_with_parents in getbundle_response
Summary:
## Wider goal
See D20068839

## This diff
Let's modernize this particular function.

Reviewed By: StanislavGlebik

Differential Revision: D20097800

fbshipit-source-id: a919b5ad1b544a7b784668ca265e24c375100fa3
2020-02-27 05:01:51 -08:00
Kostia Balytskyi
90b03f5a0d mononoke: call old-style Future OldFuture in getbundle_response
Summary:
## Wider goal
See D20068839

## This diff
This file contains a mix of old and new-style futures. It even has futures
whose items are themselves futures. To be able to convert at one of the
levels and not the other, we need to deal with the confusion.

Let's have old things have `Old` in the name.

Reviewed By: StanislavGlebik

Differential Revision: D20097803

fbshipit-source-id: fedb3669ef34a8328ec389a30ff2c512ab363818
2020-02-27 05:01:51 -08:00
Kostia Balytskyi
4f2993c765 mononoke: move bundle generation bits from hg_sync_job into getbundle_response
Summary:
## Wider goal
We want the flexibility to return hydrated responses for `getbundle` wireproto
requests for draft commits. This means that the responses will contain not
only the commit data (as they do now), but also trees and files.
For context, when an "unhydrated" response is returned for the `getbundle`
request for a draft commit, we expect one of two things to happen later
in the e2e scenario:
- either `hg` client would immediately make another wireproto request
  (`gettreepack`, `getpackv1`) within the same client `hg` command execution
- or a subsequent `hg update` call will cause another wireproto request

In any case, another request is needed before the pulled commit can be used.
This request can hit a different server; sometimes it can even hit Mercurial
instead of Mononoke. Specifically, it can hit Mercurial instead of Mononoke if the
`fallback` path markers are configured incorrectly. In that case we have a
problem, as Mercurial is incapable of serving `gettreepack` or `getpackv1` for
infinitepush commits.

One way to deal with this is to always have correct path markers, which is
prone to human mistakes. Another way is to guarantee that Mononoke returns
everything in the original `getbundle` request. We don't want to do this for
public commits, as `pull`s of public commits typically fetch thousands of those
commits and never care about tree or file data for all but one of them. Draft
commits are different however, as they are usually exactly what the client
intends to use, so hydrating those is fine. Still, we want this behavior to
be gated behind a config flag.

## This diff
A lot of the needed code is already implemented in the hg-sync job, bundle
generating variant. So prior to implementing the actual behavior described
above, let's move the relevant bits to `getbundle_response`. Later we can comb
them up a bit (asyncify) and use them to implement the needed behavior.

Reviewed By: StanislavGlebik

Differential Revision: D20068839

fbshipit-source-id: 0ab63d57b2d167401b7ee8864fe7760f5f65f8ec
2020-02-27 05:01:51 -08:00
Kostia Balytskyi
aac7bff59d mononoke: pull config schema changes from configerator
Summary:
This is the moral equivalent of D20115877 in fbcode. See that diff for
motivation.

Reviewed By: StanislavGlebik

Differential Revision: D20118575

fbshipit-source-id: 8f77f572068e611003b1344be3434f2d04ec56ca
2020-02-27 05:01:50 -08:00