Commit Graph

2334 Commits

Thomas Orozco
5537bc0097 mononoke/blobimport: parallelize filenode computations
Summary: This updates blobimport to run filenode computations on their own Tokio tasks. This is particularly helpful when dealing with a repo with large files: recomputing filenodes can be fairly expensive if a large number of large files have moved. Running those computations on separate tasks allows for better utilization of multiple CPUs.
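
To illustrate the idea (a sketch, not blobimport's actual code; `compute_filenode` is a hypothetical stand-in for the real hashing work, and this uses current Tokio idioms rather than the futures-0.1 APIs in use at the time):

```rust
use tokio::task::JoinHandle;

// Hypothetical stand-in for the real filenode computation (hashing etc.).
fn compute_filenode(data: Vec<u8>) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

async fn compute_all(files: Vec<Vec<u8>>) -> Vec<u64> {
    // One task per computation, so the runtime can spread work across cores.
    let handles: Vec<JoinHandle<u64>> = files
        .into_iter()
        .map(|data| tokio::task::spawn_blocking(move || compute_filenode(data)))
        .collect();

    let mut out = Vec::with_capacity(handles.len());
    for handle in handles {
        out.push(handle.await.expect("filenode task panicked"));
    }
    out
}
```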

Reviewed By: HarveyHunt

Differential Revision: D16667040

fbshipit-source-id: 0746af4da876c0b4313b86aca32c5e261a316a8c
2019-08-07 08:33:44 -07:00
Thomas Orozco
98f94de92a mononoke/blobimport: lfs retries
Summary: This updates blobimport to retry on failures to fetch through its LFS helper.
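
A minimal sketch of the retry shape (assumed, not the actual blobimport code; the real version wraps the LFS helper fetch):

```rust
use std::future::Future;
use std::time::Duration;

// Retry a fallible async operation a fixed number of times, sleeping
// between attempts. `attempts` must be at least 1.
async fn with_retries<T, E, F, Fut>(mut op: F, attempts: usize) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut last_err = None;
    for _ in 0..attempts {
        match op().await {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = Some(e);
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        }
    }
    Err(last_err.expect("attempts must be > 0"))
}
```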

Reviewed By: HarveyHunt

Differential Revision: D16667042

fbshipit-source-id: dfaec8fc029030a7d90806018301d22525d094df
2019-08-07 08:33:43 -07:00
Thomas Orozco
1a83e1eb49 mononoke/blobimport: remove extern crate statements
Summary: This updates blobimport to the newer Rust 2018 style. Rust 2018 is nicer.
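
For readers unfamiliar with the difference, the change is essentially this (illustrative only):

```rust
// Rust 2015: external crates had to be declared before use.
// extern crate futures;
// use futures::Future;

// Rust 2018: the declaration is implied by Cargo.toml; just import.
use futures::Future;
```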

Reviewed By: StanislavGlebik

Differential Revision: D16667043

fbshipit-source-id: a217cbef0337e7d7cf288f972e9b4af0340a8e0b
2019-08-07 08:33:43 -07:00
Thomas Orozco
10116e3a04 mononoke/blobimport: use shared limits for blob uploads and LFS imports
Summary:
This updates blobimport to avoid using a per-changeset limit for the number of blob uploads (which used to be 100), and instead to use a global blob upload limit and a global LFS import limit.

This allows for finer-grained control over the resource utilization (notably in terms of Manifold quota) of blobimport.

While I was in there, I also eliminated an `Arc<BlobRepo>` that was lying around, since we don't use those anymore.
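
A sketch of what a global (rather than per-changeset) limit can look like, assuming a semaphore-based approach with made-up names; the actual blobimport code may differ:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn upload_one(_blob: Vec<u8>) { /* put to the blobstore */ }

// One process-wide semaphore bounds concurrent uploads across all
// changesets, instead of a per-changeset cap of 100.
async fn upload_all(blobs: Vec<Vec<u8>>, max_concurrent: usize) {
    let limit = Arc::new(Semaphore::new(max_concurrent));
    let mut handles = Vec::new();
    for blob in blobs {
        let limit = Arc::clone(&limit);
        handles.push(tokio::spawn(async move {
            let _permit = limit.acquire_owned().await.expect("semaphore closed");
            upload_one(blob).await;
        }));
    }
    for handle in handles {
        handle.await.expect("upload task panicked");
    }
}
```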

Reviewed By: HarveyHunt

Differential Revision: D16667044

fbshipit-source-id: 9fc2f347969c7ca9472ce8dd3d4e2f1acb175b66
2019-08-07 08:33:43 -07:00
Thomas Orozco
97b0763798 mononoke/blobimport: check for reusable LFS blobs
Summary: Updates blobimport to check for existing LFS blobs. This is perfectly safe since at worst it'll fail closed (i.e. blobs will be missing and blobimport will hang).

Reviewed By: HarveyHunt

Differential Revision: D16648560

fbshipit-source-id: ed9da7f2fa69c28451ac058a4e1adc937d111b31
2019-08-07 08:33:42 -07:00
Thomas Orozco
f792f983b7 mononoke/blobimport: make commit limit configurable
Summary: This updates blobimport to allow specifying a different commit limit than the default of 100. This makes it a little more convenient when tuning limits.

Reviewed By: HarveyHunt

Differential Revision: D16648558

fbshipit-source-id: 2e8c8496487a79bec84ec964302b1d6e871caf2a
2019-08-07 08:33:42 -07:00
Thomas Orozco
6d0601db38 mononoke/blobimport: log more often
Summary:
Let's log a bit more often than once every 5,000 commits. Logging 10 times per
run seems reasonable.

Reviewed By: HarveyHunt

Differential Revision: D16621623

fbshipit-source-id: b5ae481f5d899a12c736bb2de74efc8b370002df
2019-08-07 08:33:42 -07:00
Thomas Orozco
823e7a14ee mononoke/blobimport: log LFS helper context
Summary:
If you give blobimport a bad path for the LFS helper, it just tells you the
file doesn't exist, which is not very useful if you don't know what file we're
talking about. This fixes that.

Reviewed By: HarveyHunt

Differential Revision: D16621624

fbshipit-source-id: ff7fb5c8800604f8799c682e5957c888d2b01647
2019-08-07 08:33:41 -07:00
Thomas Orozco
8f0e05dc1b mononoke: remove glusterblob
Reviewed By: HarveyHunt, farnz

Differential Revision: D16687877

fbshipit-source-id: 0cf9f785adb603eac180b4b7a35d3f6b6b3fcf2a
2019-08-07 07:37:17 -07:00
Thomas Orozco
74b47b28da mononoke/admin: don't process::exit
Summary:
Sometimes, our Mononoke CLI tests report errors in mode/opt where the binary
doesn't actually output any error message at all when returning an error. Those
error messages actually get buffered by our logger, and are normally emitted
when said logger is dropped.

However, I have a suspicion that by calling process::exit(), we're not giving
Rust a chance to drop the logger (because that eventually just calls the exit
syscall), and as a result our buffer isn't getting flushed.

This patch attempts to fix this by removing all calls to process::exit in the
admin CLI, and instead returning an exit code from main, which I hope will
allow main to drop everything before it returns.

More broadly speaking, I don't think it's very good practice to exit from all
over the place in the admin CLI, so this fixes that :)
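
The pattern being adopted, sketched with current Rust (`std::process::ExitCode`); the real CLI's error type differs:

```rust
use std::process::ExitCode;

fn run() -> Result<(), String> {
    // ... the actual subcommand logic ...
    Ok(())
}

fn main() -> ExitCode {
    // Returning from main lets destructors (e.g. a buffering logger) run;
    // std::process::exit would terminate before they get a chance to flush.
    match run() {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("error: {}", e);
            ExitCode::FAILURE
        }
    }
}
```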

Reviewed By: HarveyHunt

Differential Revision: D16687747

fbshipit-source-id: 38e987463363e239d4f9166050a8dd26a4bef985
2019-08-07 06:43:10 -07:00
Alex Hornby
6011988595 mononoke: blobstore healer code cleanups
Summary:
Address nits from D16599270:
* use `map` instead of `and_then`
* use `&[T]` instead of `&Vec<T>`
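
Illustrative before/after for both nits (made-up functions):

```rust
// `map` is enough when the closure always produces a value:
fn plus_one(x: Option<i32>) -> Option<i32> {
    x.map(|v| v + 1) // instead of x.and_then(|v| Some(v + 1))
}

// Borrowing a slice instead of a Vec keeps callers flexible:
fn total(xs: &[i64]) -> i64 { // instead of xs: &Vec<i64>
    xs.iter().sum()
}
```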

Reviewed By: StanislavGlebik

Differential Revision: D16666838

fbshipit-source-id: ad0afa2d44d7a713409ac75bab599a35f5cf1a87
2019-08-06 10:44:41 -07:00
Thomas Orozco
16f19442a1 mononoke/filestore: remove ::* usage
Summary:
fanzeyi pointed out that `get_metadata` calling a different `get_metadata` because
of `use metadata::*` was confusing. This cleans that up.

Reviewed By: fanzeyi

Differential Revision: D16669239

fbshipit-source-id: f1e56fa9691c885ed56475c911f9407454afc4bb
2019-08-06 10:35:14 -07:00
Pavel Aslanov
06c8fae85b added bounded_traversal_dag
Summary:
This diff introduces `bounded_traversal_dag`, which can handle arbitrary DAGs and detect invalid DAGs with cycles, but it has some limitations in comparison to `bounded_traversal`:
  - `bounded_traversal_dag` keeps the `Out` result of the computation for all nodes,
     while `bounded_traversal` only keeps results for nodes that have not been completely
     evaluated
  - `In` has the additional constraints `Eq + Hash + Clone`
  - `Out` has the additional constraint `Clone`
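
The constraints follow from memoization: shared nodes are looked up by value and their results cloned out. A synchronous sketch of that idea (this is not the `bounded_traversal_dag` API, and it omits the cycle detection the real version performs):

```rust
use std::collections::HashMap;
use std::hash::Hash;

fn traverse_dag<In, Out>(
    node: In,
    children: &impl Fn(&In) -> Vec<In>,
    fold: &impl Fn(&In, Vec<Out>) -> Out,
    memo: &mut HashMap<In, Out>,
) -> Out
where
    In: Eq + Hash + Clone, // needed to key the memo table
    Out: Clone,            // needed to hand out cached results
{
    if let Some(out) = memo.get(&node) {
        return out.clone(); // a node reachable via two paths is evaluated once
    }
    let child_outs = children(&node)
        .into_iter()
        .map(|c| traverse_dag(c, children, fold, memo))
        .collect();
    let out = fold(&node, child_outs);
    memo.insert(node, out.clone());
    out
}
```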

Reviewed By: krallin

Differential Revision: D16621004

fbshipit-source-id: b9f60e461d5d50e060be4f5bb6b970f16a9b99f9
2019-08-05 05:41:17 -07:00
Thomas Orozco
e192f295df mononoke/admin: add filestore debug subcommands
Summary: This adds the debug subcommands `metadata` and `verify` for the Filestore. These respectively output the metadata for a file, and verify that the file is reachable through all of its aliases.

Reviewed By: ahornby

Differential Revision: D16621789

fbshipit-source-id: 4a2156bfffb9d9641ce58f6d5f691364ba9dc145
2019-08-05 05:28:53 -07:00
Thomas Orozco
07f3fa1a88 mononoke/integration: un-blackhole apiserver tests
Summary:
Johan fixed retry logic in Mercurial, so those tests can now succeed even if
the blackhole is enabled (though we haven't fully understood why the blackhole
was breaking them in the first place).

Differential Revision: D16646032

fbshipit-source-id: 8b7ff2d8d284e003e49681e737367e9942370fa1
2019-08-05 05:21:14 -07:00
Alex Hornby
e2e9f35211 mononoke: Ensure blobstore_healer can still make progress when queue contains unknown stores
Summary:
Update blobstore_healer handling of unknown stores to re-queue and delete original entries.

Make sure we still make progress in the case where there are lots of unknown blobstore entries on the queue.

The previous diff in the stack took the approach of not deleting, which could keep loading and logging the same entries if there were more than blobstore_sync_queue_limit of them.  Better to reinsert with a new timestamp and delete the old entries.

Reviewed By: krallin

Differential Revision: D16599270

fbshipit-source-id: efa3e5602f0ab3a037d0534e1fe8e3d42fbb52e6
2019-08-05 03:50:55 -07:00
Alex Hornby
d17d3475ed mononoke: make blob store healer preserve queue entries for unknown blobstores
Summary: Make the blobstore healer preserve queue entries for unknown blobstores rather than erroring.

Reviewed By: ikostia

Differential Revision: D16586816

fbshipit-source-id: 3d4987a95adcddd0329b9ededdf95887aa11286e
2019-08-05 03:50:54 -07:00
Alex Hornby
f864348558 mononoke: add healer logic to fetch from all source blobstores on the queue
Summary:
Add healer logic to fetch from all source blobstores on the queue

Add tests for the healer queue state including put failures

Reviewed By: krallin

Differential Revision: D16549013

fbshipit-source-id: 6aa55b3cb2ed7fa9a1630edd5bc5b2ad2c6f5011
2019-08-05 03:50:54 -07:00
Alex Hornby
00d855084d mononoke/blobstore_healer: fixes for the handling of blobstores after heal
Summary:
Fixes for the handling of blobstores after heal:
1. If all blobstores are successfully healed for a key, no need to requeue it
2. Where all heal puts fail, make sure we requeue with at least the original source blobstore we loaded the blob from
3. When we do write to the queue, write with all blobstore ids where we know we have good data, so that when it is read later it is not considered missing.
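
A sketch of that decision logic with illustrative names (not the healer's actual types):

```rust
// Given which stores now hold good data, decide what to put back on the queue.
fn requeue_targets(
    put_succeeded: Vec<String>, // stores the healer managed to write to
    source: String,             // store we originally read the blob from
    lagging_stores: usize,      // stores that were missing the blob
) -> Vec<String> {
    if put_succeeded.len() == lagging_stores {
        // (1) Every lagging store was healed: nothing to requeue.
        Vec::new()
    } else if put_succeeded.is_empty() {
        // (2) All heal puts failed: requeue at least the known-good source.
        vec![source]
    } else {
        // (3) Partial heal: requeue every store known to hold good data, so a
        // later queue read doesn't consider the blob missing from them.
        let mut good = put_succeeded;
        good.push(source);
        good
    }
}
```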

Reviewed By: krallin

Differential Revision: D15911853

fbshipit-source-id: 1c81ce4ec5f975e5230b27934662e02ec515cb8f
2019-08-05 03:50:54 -07:00
Alex Hornby
d1a8c487ae mononoke: make blobstore_healer auto-heal missing source blobstores where possible
Summary: Make blobstore_healer auto-heal source blobstores found to be missing data, so long as at least one other source blobstore from the queue has the data for the missing key.

Reviewed By: krallin

Differential Revision: D16464895

fbshipit-source-id: 32549e58933f39bb20c173caf02a35c91123fe8d
2019-08-05 03:50:54 -07:00
Alex Hornby
a06468acd6 mononoke: add key filter option to blobstore healer
Summary: Add blobstore key filter option to blobstore healer to allow easier reproduction of healer issues for particular keys.

Reviewed By: StanislavGlebik

Differential Revision: D16457530

fbshipit-source-id: 23201e45fbdf14fa7fdccbaf8e0f4b29355aa906
2019-08-05 03:50:53 -07:00
Alex Hornby
82f2a70c3b mononoke/blobstore_healer: remove ratelimiter
Summary: Since we're only running a single healer in the process for a single blobstore, it's easy to bound the concurrency by limiting the number of entries we deal with at once. As a result, we don't need a separate mechanism for overall control.
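
In stream terms (futures 0.3 shown for illustration; the code at the time used futures 0.1 equivalents), bounding the batch is enough to bound overall concurrency:

```rust
use futures::stream::{self, StreamExt};

async fn heal_one(_key: String) { /* fetch blob, put to lagging stores */ }

// At most `n` heals are in flight at any time; no separate rate
// limiter is needed on top.
async fn heal_batch(keys: Vec<String>, n: usize) {
    stream::iter(keys)
        .for_each_concurrent(n, |key| heal_one(key))
        .await;
}
```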

Reviewed By: StanislavGlebik

Differential Revision: D15912818

fbshipit-source-id: 3087b88cfdfed2490664cd0df10bd6f126267b83
2019-08-05 03:50:53 -07:00
Alex Hornby
3d27faba08 mononoke/blobstore_healer: add comments and type annotations
Summary: Basically notes I took for myself to truly understand the code.

Reviewed By: StanislavGlebik

Differential Revision: D15908406

fbshipit-source-id: 3f21f7a1ddce8e15ceeeffdb5518fd7f5b1749c4
2019-08-05 03:50:53 -07:00
Alex Hornby
978242fb35 mononoke/repoconfig: update to using chain_err
Reviewed By: StanislavGlebik

Differential Revision: D15072797

fbshipit-source-id: 5339d78de265463ad800d7fb8db8a1444e3fdd6b
2019-08-05 03:50:52 -07:00
Alex Hornby
4322423811 mononoke: Update blobstore_healer for new storage config model
Summary:
Allow blobstore_healer to be directly configured to operate on a blobstore.
This makes two changes:
- The blobstore to operate on is defined in storage.toml (this doesn't
  currently support server.toml-local storage configs)
- Only heal one blobstore at a time. We can run multiple separate instances of the
  healer to heal multiple blobstores.

Reviewed By: HarveyHunt

Differential Revision: D15065422

fbshipit-source-id: 5bc9f1a16fc83ca5966d804b5715b09d359a3832
2019-08-05 03:50:52 -07:00
Alex Hornby
b43dc6e5a7 mononoke: Migrate populate_healer for new storage config data model
Summary: Update populate_healer to act directly on a blobstore config rather than indirectly via a repo config.

Reviewed By: StanislavGlebik

Differential Revision: D15065424

fbshipit-source-id: 638778a61283dc9ed991c49936a21d02b8d2e3f3
2019-08-05 03:50:52 -07:00
Alex Hornby
59b47cf4fa mononoke: Drop repoid from healer structures
Summary:
The healer is a blobstore-level operation, which is orthogonal to the concept of a repo; therefore, there should be no mention of repoid in any of the healer's structures or tables.

For now this leaves the schema unmodified, and fills the repoid with a dummy value (0). We can clean that up later.

Reviewed By: lukaspiatkowski, HarveyHunt

Differential Revision: D15051896

fbshipit-source-id: 438b4c6885f18934228f43d85cdb8bf2f0e542f1
2019-08-05 03:50:51 -07:00
Alex Hornby
f4e304eb09 mononoke/sqlblob: drop repo_id from everywhere
Summary: RepositoryId shouldn't leak into the blobstore layer. This leaves repoid in the schema, but just populates it with a dummy value (0). We can clean up the schema and this code in a later diff.

Reviewed By: StanislavGlebik

Differential Revision: D15021285

fbshipit-source-id: 3ecb04a76ce74409ed0cced3d2a0217eacd0e2fb
2019-08-05 03:50:51 -07:00
Kostia Balytskyi
bc985480e9 mononoke: add the filenode subcommand to the admin tool
Summary:
This is useful to inspect the Mercurial filenodes in Mononoke, like in S183272.

For example, I intend to use this subcommand to verify how well the future linknode healing works.

Reviewed By: krallin

Differential Revision: D16621516

fbshipit-source-id: 4266f85bce29b59072bf9c4f3e63777dae09a4f1
2019-08-02 12:45:57 -07:00
Kostia Balytskyi
1420897ff8 mononoke: separate cs_id fetching from filenode fetching in the admin tool
Summary: Let's separate some concerns.

Reviewed By: krallin

Differential Revision: D16621518

fbshipit-source-id: 2d6ca96b72d5ffbc0fac4a4f9643ecc2acde0ca2
2019-08-02 12:45:57 -07:00
Kostia Balytskyi
432f1f6401 mononoke: in admin, move get_file_nodes to common
Summary: This is needed in the following diff.

Reviewed By: krallin

Differential Revision: D16621517

fbshipit-source-id: 5a50cae7c8b761d7578bcbe5caf302a5ee2578a3
2019-08-02 12:45:57 -07:00
Thomas Orozco
ea059ef2c7 mononoke/benchmark_filestore: add support for testing with caches
Summary: This updates benchmark_filestore to allow testing with caches (notably, Memcache & Cachelib). It also reads twice now, which is nice for caches that aren't filled by us (e.g. Manifold CDN).

Reviewed By: ahornby

Differential Revision: D16584952

fbshipit-source-id: 48ceaa9f2ea393626ac0e5f3988672df020fbc28
2019-08-02 05:40:29 -07:00
Thomas Orozco
bea4a85117 mononoke/types: clean up ContentMetadata out of FileContents
Summary: There's a lot of stuff in file_contents.rs that's not actually about file contents per se. This fixes that.

Reviewed By: ahornby

Differential Revision: D16598905

fbshipit-source-id: 9832b96261264c54809e0c32980cf449f8537517
2019-08-02 03:43:16 -07:00
Thomas Orozco
68569e5d0c mononoke/{types,filestore}: use a separate type for File Chunks
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).

This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to having them be just FileContents.

The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:

- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't contain pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).

Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it no longer is convenient, so it's worth changing :)
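
Roughly, the shape of the new data model (illustrative Rust stand-ins, not the actual Thrift definitions):

```rust
// A file is either inline bytes or a list of pointers to Chunks.
enum FileContents {
    Bytes(Vec<u8>),
    Chunked(Vec<ChunkPointer>),
}

struct ChunkPointer {
    chunk_id: [u8; 32], // content-addressed id of a Chunk blob
    size: u64,
}

// A Chunk is only ever raw bytes: since it is a distinct type from
// FileContents, it cannot point at further chunks, so recursion is
// impossible by construction and chunks need no alias metadata.
struct Chunk(Vec<u8>);
```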

Reviewed By: HarveyHunt

Differential Revision: D16598906

fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
2019-08-02 03:43:16 -07:00
Thomas Orozco
cfa4c8332f mononoke/integration: disable blackhole for apiserver tests
Summary:
The network blackhole is causing the API server to occasionally hang while serving requests, which has broken some LFS tests. This appears to have happened in the last month or so, but unfortunately, I haven't been able to root cause why this is happening.

From what I can tell, we have an hg client that tries an upload to the API Server, and uploads everything... and then the API server just hangs. If I kill the hg client, then the API server responds with a 400 (so it's not completely stuck), but otherwise it seems like the API server is waiting for something to happen on the client-side, but the client isn't sending that.

As far as I can tell, the API Server isn't actually trying to make outbound requests (strace does report that it has a Scribe client that's trying to connect, but Scuba logging isn't enabled, and this is just trying to connect without sending anything), but something about the blackhole is causing this hg / API server interaction to fail.

In the meantime, this diff disables the blackhole for those tests that definitely don't work when it's enabled ...

Reviewed By: HarveyHunt

Differential Revision: D16599929

fbshipit-source-id: c6d77c5428e206cd41d5466e20405264622158ab
2019-08-01 07:36:02 -07:00
Thomas Orozco
7ba44d737c mononoke: add filestore params to configuration
Summary: This updates our repo config to allow passing through Filestore params. This will be useful to conditionally enable Filestore chunking for new repos.

Reviewed By: HarveyHunt

Differential Revision: D16580700

fbshipit-source-id: b624bb524f0a939f9ce11f9c2983d49f91df855a
2019-07-31 11:48:18 -07:00
Thomas Orozco
9b90fd63cb mononoke/blobimport: allow running without re-creating store
Summary:
This allows running blobimport multiple times over the same path locally (with file-based blob storage, for example), which is how we use it in prod (though there we don't use the file blobstore, so it already works).

This is helpful when playing around with local changes to blobimport.

Reviewed By: HarveyHunt

Differential Revision: D16580697

fbshipit-source-id: 4a62ff89542f67ce6396948c666244ef40ffe5e7
2019-07-31 11:42:36 -07:00
Thomas Orozco
ac19446c03 mononoke/filestore: dont boxify
Summary: This updates the Filestore to avoid boxifying in its chunking code. The upshot is that passing a Send Stream into the Filestore gives you a Send Future back, and passing a non-Send Stream into the Filestore gives you a non-Send Future back (I hinted at this earlier, in the diff that introduced faster writes).
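
The property at stake, sketched with a made-up function: returning `impl Future` (instead of a boxed trait object) lets Send-ness propagate from the input:

```rust
use std::future::Future;
use futures::stream::{Stream, StreamExt};

// The returned future is Send exactly when `S` is Send. Boxing as
// Box<dyn Future + Send> would instead force Send on every caller.
fn total_len<S>(stream: S) -> impl Future<Output = usize>
where
    S: Stream<Item = Vec<u8>>,
{
    stream.fold(0, |acc, bytes| async move { acc + bytes.len() })
}
```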

Reviewed By: aslpavel

Differential Revision: D16560768

fbshipit-source-id: b77766380f2eaed5919f78cef6fbc02afeead0b9
2019-07-31 05:19:42 -07:00
Thomas Orozco
6c07d8da97 mononoke/filestore: make reads faster
Summary: As noted earlier in this stack, the Filestore read implementation was very inefficient, since it required reading a chunk before moving on to the next one. Since our chunked files will typically have just one level of chunking, this left performance on the table (we could be fetching additional chunks ahead of time). This new implementation lets us take advantage of buffering, so we can load arbitrarily many chunks in parallel.
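
The gist in stream form (futures 0.3 for illustration; names are made up):

```rust
use futures::stream::{self, StreamExt, TryStreamExt};

async fn fetch_chunk(_ptr: u64) -> Result<Vec<u8>, std::io::Error> {
    Ok(Vec::new()) // stand-in for a blobstore fetch
}

// Keep up to `n` chunk fetches in flight while preserving chunk order.
async fn read_chunks(ptrs: Vec<u64>, n: usize) -> Result<Vec<Vec<u8>>, std::io::Error> {
    stream::iter(ptrs)
        .map(fetch_chunk)
        .buffered(n)
        .try_collect()
        .await
}
```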

Reviewed By: aslpavel

Differential Revision: D16560767

fbshipit-source-id: c02c10c5de0fc5fdc3ee3897ae855b316ea34605
2019-07-31 05:19:42 -07:00
Thomas Orozco
1c6ca01a25 mononoke/filestore: make writes fast
Summary:
This updates the Filestore to make writes faster by farming out all hashing to
separate Tokio tasks. This lets us increase throughput of the Filestore
substantially, since we're no longer limited by the ability of a single core to
hash data.

On my dev server, when running on a 1MB file, this lets us improve the
throughput of the Filestore for writes from 36.50 MB/s (0.29 Gb/s) to 152.61
MB/s (1.19 Gb/s) when using a chunk size of 1MB and a concurrency level of 10
(i.e. 10 concurrent chunk uploads).

Note that the chunk size has a fairly limited impact on performance (e.g.
making it 10KB instead has a <10% impact on performance).

Of course, this doesn't reflect performance when uploading to a remote
blobstore, but note that we can tune that by tweaking our upload concurrency
(making uploads faster at the expense of more memory).
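
A sketch of the fan-out shape (illustrative; the real code uses futures-0.1 Sinks and real hash functions): each hasher runs on its own task and consumes the chunk stream independently:

```rust
use tokio::sync::mpsc;
use tokio::task::JoinHandle;

// One hasher task per algorithm; the "hash" here is a placeholder.
fn spawn_hasher(mut rx: mpsc::Receiver<Vec<u8>>) -> JoinHandle<u64> {
    tokio::spawn(async move {
        let mut acc: u64 = 0;
        while let Some(chunk) = rx.recv().await {
            acc = acc.wrapping_add(chunk.len() as u64);
        }
        acc
    })
}

async fn hash_all(chunks: Vec<Vec<u8>>, num_hashers: usize) -> Vec<u64> {
    let (txs, handles): (Vec<_>, Vec<_>) = (0..num_hashers)
        .map(|_| {
            let (tx, rx) = mpsc::channel(16);
            (tx, spawn_hasher(rx))
        })
        .unzip();

    for chunk in &chunks {
        for tx in &txs {
            tx.send(chunk.clone()).await.expect("hasher exited early");
        }
    }
    drop(txs); // close the channels so hashers can finish

    let mut results = Vec::new();
    for handle in handles {
        results.push(handle.await.expect("hasher panicked"));
    }
    results
}
```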

 ---

Note that as part of this change, I updated the implementation away from stream
splitting to an implementation that fans out to Sinks. I actually had
implementations of a higher-performance filestore for both, but went with this
approach because it doesn't require the incoming Stream to be Send (and I have
a forthcoming diff to make the whole Filestore not require a Send input), which
will be useful when incorporating with the API Server, which unfortunately does
not provide us with a Send input.

Reviewed By: aslpavel

Differential Revision: D16560769

fbshipit-source-id: b2e414ea3b47cc4db17f82d982618bbd837f93a9
2019-07-31 05:19:42 -07:00
Thomas Orozco
9bc7076a6a mononoke/filestore: default to no chunking
Summary:
This is a very trivial patch simply intended to make it easier to safely roll
out the Filestore. With this, we can roll out the Filestore with chunking fully
disabled and make sure all hosts know how to read chunked data before we turn
it on.

Reviewed By: aslpavel, HarveyHunt

Differential Revision: D16560844

fbshipit-source-id: 30c49c27e839dfb06b417050c0fcde13296ddade
2019-07-31 05:19:42 -07:00
Thomas Orozco
6a9007e9e0 mononoke: add a filestore benchmark + configurable concurrency
Summary: This adds a filestore benchmark that allows for playing around with Filestore parameters and makes it easier to measure performance improvements.

Reviewed By: aslpavel

Differential Revision: D16559941

fbshipit-source-id: 50a4e91ad07bf6f9fc1efab14aa1ea6c81b9ca27
2019-07-31 05:19:41 -07:00
Thomas Orozco
b4a2d2f36a mononoke/apiserver: make DownloadLargeFile a streaming response
Summary: This updates the API server's DownloadLargeFile method to return a streaming response, instead of buffering all file contents in a Bytes blob.

Reviewed By: aslpavel

Differential Revision: D16494240

fbshipit-source-id: bdbece99215d87be6a65e67f8f2d920933109e15
2019-07-31 05:19:41 -07:00
Thomas Orozco
4e30164506 mononoke/blobimport: support LFS filenodes
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that provides an LFS blob's contents on stdout when called with the OID and size of the blob. My thinking is to `curl` directly from Dewey when running this in prod.

Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.
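
A sketch of what invoking such a helper could look like; the exact argument convention here is an assumption for illustration:

```rust
use std::io;
use tokio::process::Command;

// Run the --lfs-helper executable with the blob's OID and size, and
// read the blob's bytes from its stdout.
async fn fetch_lfs_blob(helper: &str, oid: &str, size: u64) -> io::Result<Vec<u8>> {
    let output = Command::new(helper)
        .arg(oid)
        .arg(size.to_string())
        .output()
        .await?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("lfs helper failed for oid {}", oid),
        ));
    }
    Ok(output.stdout)
}
```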

Reviewed By: farnz

Differential Revision: D16494241

fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
2019-07-31 05:19:41 -07:00
Thomas Orozco
7c25a6010e mononoke: UploadHgFileContents: don't buffer contents to compute a Filenode ID
Summary: This updates our `UploadHgFileContents::ContentUploaded` implementation to not require buffering file contents in order to produce a Mercurial filenode ID.

Reviewed By: farnz

Differential Revision: D16457833

fbshipit-source-id: ce2c5577ffbe91dfd0de1cac7d85b8d90ded140e
2019-07-31 05:19:40 -07:00
Thomas Orozco
f9360cab9d mononoke/filestore: incorporate in Mononoke
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.

This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:

- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams.

This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.

However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).

The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally; it just updates the underlying API we use to access files. This is also important to get a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.

This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints in

Reviewed By: farnz

Differential Revision: D16440671

fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
2019-07-31 05:19:40 -07:00
Thomas Orozco
16801112a2 mononoke/filestore: add store_bytes for compatibility
Summary:
This adds a `store_bytes` call in the Filestore that can be used to store a set of Bytes *without chunking* and return a `FileContents` blob. This is intended as a transitional API while we incorporate the Filestore throughout the codebase. It's useful for 2 reasons:

- It lets us roll out chunked writes gradually where we need them. My goal is to use the Filestore's chunked write API for LFS, but keep using `store_bytes` initially in other places. This means content submitted through regular Mercurial bundles won't be chunked until we feel comfortable with chunking it.
- It lets us use the Filestore in places where we were relying on the assumption that you can immediately turn Bytes into a ContentId (notably: in the `UploadHgFileContents::RawBytes` code path).

This is intended to be removed later.

Reviewed By: aslpavel

Differential Revision: D16440670

fbshipit-source-id: e591f89bb876d08e6b6f805e35f0b791e61a6474
2019-07-31 05:19:40 -07:00
Thomas Orozco
4faaa6b2f7 mononoke/filestore: introduce peek()
Summary:
This adds a new `peek(.., N, ..)` call that allows reading at least N bytes from a file in the Filestore. This is helpful for generating Mercurial metadata blobs.

This is implemented using the same ChunkStream we use to write to the Filestore (but that needed a little fixing to support streams of empty bytes as well).
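
Conceptually (a sketch, not the real signature), `peek` just accumulates from the content stream until it has at least N bytes:

```rust
use futures::stream::{Stream, StreamExt};

// Read from the content stream until at least `n` bytes are buffered,
// or the stream ends (the file may be shorter than `n`).
async fn peek<S>(mut stream: S, n: usize) -> Vec<u8>
where
    S: Stream<Item = Vec<u8>> + Unpin,
{
    let mut buf = Vec::new();
    while buf.len() < n {
        match stream.next().await {
            Some(bytes) => buf.extend_from_slice(&bytes),
            None => break,
        }
    }
    buf
}
```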

Reviewed By: aslpavel

Differential Revision: D16440672

fbshipit-source-id: a584099f87ab34e2151b9c3f5c9f1289575f024b
2019-07-31 05:19:39 -07:00
Thomas Orozco
6d675846bf mononoke/filestore: switch to functions (config is only needed in writes)
Summary: This updates the Filestore API to be a set of functions instead of a struct. The rationale is that only one Filestore call needs anything on top of a blobstore (the write call), so using functions makes it much easier to incorporate the Filestore in the various places that need it, without having to thread a Filestore instance all the way through.
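
Sketch of the resulting API shape (illustrative types and names):

```rust
struct Blobstore; // stand-in for the real blobstore handle
struct FilestoreConfig { chunk_size: u64 }
struct ContentId([u8; 32]);

// Reads need only a blobstore...
fn fetch(_blobstore: &Blobstore, _id: &ContentId) -> Option<Vec<u8>> {
    None
}

// ...while only the write path needs configuration (chunking decisions),
// so no Filestore struct has to be threaded through call sites.
fn store(_blobstore: &Blobstore, config: &FilestoreConfig, data: &[u8]) -> usize {
    (data.len() as u64 / config.chunk_size.max(1)) as usize // chunk count
}
```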

Reviewed By: aslpavel

Differential Revision: D16440668

fbshipit-source-id: 4b4bc8872e205a66a12ec96a478f0f1811f2e6b1
2019-07-31 05:19:39 -07:00
Thomas Orozco
5e8148a968 mononoke: UploadHgFileContents: optimistically reuse HG filenodes
Summary:
This adds supporting for reusing Mercurial filenodes in
`UploadHgFileContents::execute`.

The motivation behind this is that given file contents, copy info, and parents,
the file node ID is deterministic, but computing it requires fetching and
hashing the body of the file. This implementation implements a lookup through
the blobstore to find a pre-computed filenode ID.

Doing so is in general a little inefficient (but it's not entirely certain that
the implementation I'm proposing here is faster -- more on this below), but
it's particularly problematic for large files. Indeed, fetching a multiple-GB file
to recompute the filenode even if we received it from the client can be fairly
slow (and use up quite a bit of RAM, though that's something we can mitigate by
streaming file contents).

One thing worth noting here (hence the RFC flag) is that there is a bit of a
potential for performance regression. Indeed, we could have a cache miss when
looking up the filenode ID, and then we'll have to fetch the file.

*At this time*, this is also somewhat inefficient, since we'll have to fetch
the file anyway to peek at its contents in order to generate metadata. This
is fixed later in this Filestore stack.

That said, an actual regression seems a little unlikely to happen since in
practice we'll write out the lookup entry when accepting a pushrebase, and
then do a lookup on it later when converting the pushrebased Bonsai changeset
to a Mercurial changeset.

If we're worried, then perhaps adding hit / miss stats on the lookup might make
sense. Let me know what you think.

 ---

Finally, there's a bit I don't love here, which is trusting LFS clients with the size of their uploads.
I'm thinking of fixing this when I finish the Filestore work.

Reviewed By: aslpavel

Differential Revision: D16345248

fbshipit-source-id: 6ce8a191efbb374ff8a1a185ce4b80dc237a536d
2019-07-31 05:19:39 -07:00