Summary: This updates blobimport to run filenode computations on their own Tokio task. This is particularly helpful when dealing with a repo with large files: recomputing filenodes can be a fairly expensive process if a large number of large files have moved. Running those on separate tasks allows for better utilization of multiple CPUs.
Reviewed By: HarveyHunt
Differential Revision: D16667040
fbshipit-source-id: 0746af4da876c0b4313b86aca32c5e261a316a8c
Summary: This updates blobimport to retry on failures to fetch through its LFS helper.
Reviewed By: HarveyHunt
Differential Revision: D16667042
fbshipit-source-id: dfaec8fc029030a7d90806018301d22525d094df
Summary: This updates blobimport to a newer Rust-2018-style. Rust 2018 is nicer.
Reviewed By: StanislavGlebik
Differential Revision: D16667043
fbshipit-source-id: a217cbef0337e7d7cf288f972e9b4af0340a8e0b
Summary:
This updates blobimport to avoid using a per-changeset limit for the number of blob uploads (which used to be 100), and instead to use a global blob upload limit, and a global LFS import limit.
This allows for finer-grained control over the resource utilization (notably in terms of Manifold quota) of blobimport.
While I was in there, I also eliminated an `Arc<BlobRepo>` that was lying around, since we don't use those anymore.
Reviewed By: HarveyHunt
Differential Revision: D16667044
fbshipit-source-id: 9fc2f347969c7ca9472ce8dd3d4e2f1acb175b66
Summary: Updates blobimport to check for existing LFS blobs. This is perfectly safe since at worst it'll fail closed (i.e. blobs will be missing and blobimport will hang).
Reviewed By: HarveyHunt
Differential Revision: D16648560
fbshipit-source-id: ed9da7f2fa69c28451ac058a4e1adc937d111b31
Summary: This updates blobimport to allow specifying a different commit limit than the default of 100. This makes it a little more convenient when tuning limits.
Reviewed By: HarveyHunt
Differential Revision: D16648558
fbshipit-source-id: 2e8c8496487a79bec84ec964302b1d6e871caf2a
Summary:
Let's log a bit more often than once every 5,000 commits. Logging 10 times per
run seems reasonable.
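The sizing logic is trivial but worth pinning down; a minimal sketch (the helper name is hypothetical, not the actual blobimport code):

```rust
/// Pick a logging interval so that roughly `target_logs` progress
/// messages are emitted over `total_commits` commits. Hypothetical
/// helper illustrating the "log ~10 times per run" idea.
fn log_interval(total_commits: usize, target_logs: usize) -> usize {
    std::cmp::max(1, total_commits / target_logs)
}

fn main() {
    // 50,000 commits -> log every 5,000; tiny repos still log every commit.
    assert_eq!(log_interval(50_000, 10), 5_000);
    assert_eq!(log_interval(3, 10), 1);
}
```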
Reviewed By: HarveyHunt
Differential Revision: D16621623
fbshipit-source-id: b5ae481f5d899a12c736bb2de74efc8b370002df
Summary:
If you give blobimport a bad path for the LFS helper, it just tells you the
file doesn't exist, which is not very useful if you don't know what file we're
talking about. This fixes that.
Reviewed By: HarveyHunt
Differential Revision: D16621624
fbshipit-source-id: ff7fb5c8800604f8799c682e5957c888d2b01647
Summary:
Sometimes, our Mononoke CLI tests report errors in mode/opt without actually
outputting any error messages at all when returning an error. Those error
messages actually get buffered by our logger, and are normally emitted when
said logger is dropped.
However, I have a suspicion that by calling process::exit(), we're not giving
Rust a chance to drop the logger (because that eventually just calls the exit
syscall), and as a result our buffer isn't getting flushed.
This patch attempts to fix this by removing all calls to process::exit in the
admin CLI, and instead returning an exit code from main, which I hope will
allow main to drop everything before it returns.
More broadly speaking, I don't think it's very good practice to exit from all
over the place in the admin CLI, so this fixes that :)
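The pattern described, returning an exit code from `main` so destructors get a chance to run, can be sketched with `std::process::ExitCode`. This is a generic illustration, not the actual admin CLI code; the flush-on-drop guard is a hypothetical stand-in for the buffered logger:

```rust
use std::process::ExitCode;

// A guard whose Drop impl stands in for the buffered logger:
// process::exit() would skip it, but returning from main() runs it.
struct FlushOnDrop;

impl Drop for FlushOnDrop {
    fn drop(&mut self) {
        eprintln!("flushing buffered log output");
    }
}

fn run() -> Result<(), String> {
    Err("something went wrong".to_string())
}

fn main() -> ExitCode {
    let _logger = FlushOnDrop;
    match run() {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("{}", e);
            // _logger is dropped (and flushed) before the process exits.
            ExitCode::from(1)
        }
    }
}
```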
Reviewed By: HarveyHunt
Differential Revision: D16687747
fbshipit-source-id: 38e987463363e239d4f9166050a8dd26a4bef985
Summary:
Address nits from D16599270:
* use map instead of and_then
* use &[T] instead of &Vec<T>
Reviewed By: StanislavGlebik
Differential Revision: D16666838
fbshipit-source-id: ad0afa2d44d7a713409ac75bab599a35f5cf1a87
Summary:
fanzeyi pointed out that `get_metadata` calling a different `get_metadata` because
of `use metadata::*` was confusing. This cleans this up.
Reviewed By: fanzeyi
Differential Revision: D16669239
fbshipit-source-id: f1e56fa9691c885ed56475c911f9407454afc4bb
Summary:
This diff introduces `bounded_traversal_dag`, which can handle arbitrary DAGs and detect invalid DAGs with cycles, but it has limitations in comparison to `bounded_traversal`:
- `bounded_traversal_dag` keeps the `Out` result of computation for all the nodes,
but `bounded_traversal` only keeps results for nodes that have not been completely
evaluated
- `In` has additional constraints to be `Eq + Hash + Clone`
- `Out` has additional constraint to be `Clone`
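The extra constraints fall out of memoization: every node's `Out` is kept in a map keyed by `In`, which also enables cycle detection. A synchronous sketch of the same idea (an analogue only, not the actual futures-based `bounded_traversal_dag` API):

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Memoizing every node's `Out` requires `In: Eq + Hash + Clone` (map
// keys) and `Out: Clone` (shared results). Returns None on a cycle.
fn traverse_dag<In, Out>(
    node: In,
    unfold: &impl Fn(&In) -> Vec<In>,
    fold: &impl Fn(&In, Vec<Out>) -> Out,
    memo: &mut HashMap<In, Option<Out>>, // None marks "in progress"
) -> Option<Out>
where
    In: Eq + Hash + Clone,
    Out: Clone,
{
    if let Some(entry) = memo.get(&node) {
        // An "in progress" entry means we re-entered this node: a cycle.
        return entry.clone();
    }
    memo.insert(node.clone(), None);
    let mut child_outs = Vec::new();
    for child in unfold(&node) {
        child_outs.push(traverse_dag(child, unfold, fold, memo)?);
    }
    let out = fold(&node, child_outs);
    memo.insert(node, Some(out.clone()));
    Some(out)
}

fn main() {
    // Diamond DAG 0 -> {1, 2} -> 3: node 3 is evaluated only once.
    let unfold = |n: &u32| match *n {
        0 => vec![1, 2],
        1 | 2 => vec![3],
        _ => vec![],
    };
    let fold = |n: &u32, outs: Vec<u32>| *n + outs.iter().sum::<u32>();
    let mut memo = HashMap::new();
    assert_eq!(traverse_dag(0, &unfold, &fold, &mut memo), Some(9));

    // A self-loop is detected as an invalid (cyclic) DAG.
    let cyclic = |_: &u32| vec![0];
    let mut memo2 = HashMap::new();
    assert_eq!(traverse_dag(0, &cyclic, &fold, &mut memo2), None);
}
```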
Reviewed By: krallin
Differential Revision: D16621004
fbshipit-source-id: b9f60e461d5d50e060be4f5bb6b970f16a9b99f9
Summary: This adds debug subcommands metadata and verify in the Filestore. Those respectively output the metadata for a file and verify that the file is reachable through all aliases.
Reviewed By: ahornby
Differential Revision: D16621789
fbshipit-source-id: 4a2156bfffb9d9641ce58f6d5f691364ba9dc145
Summary:
Johan fixed retry logic in Mercurial, so those tests can now succeed even if
the blackhole is enabled (though we haven't fully understood why the blackhole
was breaking them in the first place).
Differential Revision: D16646032
fbshipit-source-id: 8b7ff2d8d284e003e49681e737367e9942370fa1
Summary:
Update blobstore_healer handling of unknown stores to re-queue and delete original entries.
Make sure we still make progress in the case where there are lots of unknown blobstore entries on the queue.
The previous diff in this stack took the approach of not deleting, which could keep loading and logging the same entries if there were more than blobstore_sync_queue_limit of them. Better to reinsert with a new timestamp and delete the old entries.
Reviewed By: krallin
Differential Revision: D16599270
fbshipit-source-id: efa3e5602f0ab3a037d0534e1fe8e3d42fbb52e6
Summary: make the blobstore healer preserve queue entries for unknown blobstores rather than erroring
Reviewed By: ikostia
Differential Revision: D16586816
fbshipit-source-id: 3d4987a95adcddd0329b9ededdf95887aa11286e
Summary:
Add healer logic to fetch from all source blobstores on the queue
Add tests for the healer queue state including put failures
Reviewed By: krallin
Differential Revision: D16549013
fbshipit-source-id: 6aa55b3cb2ed7fa9a1630edd5bc5b2ad2c6f5011
Summary:
Fixes for the handling of blobstores after heal:
1. If all blobstores are successfully healed for a key, no need to requeue it
2. Where all heal puts fail, make sure we requeue with at least the original source blobstore we loaded the blob from
3. When we do write to the queue, write with all blobstore ids where we know we have good data, so that when it is read later it is not considered missing.
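The three rules above can be sketched as a pure decision function (ids and names are hypothetical, not the actual healer types):

```rust
/// Given the heal attempt's outcome and the source blobstore we read
/// the blob from, decide which blobstore ids (if any) to requeue.
/// Rules sketched from the summary above; names are hypothetical.
fn requeue_entries(source: u64, succeeded: &[u64], failed: &[u64]) -> Vec<u64> {
    if failed.is_empty() {
        // 1. Everything healed successfully: nothing to requeue.
        return Vec::new();
    }
    // 3. Requeue with every blobstore known to hold good data...
    let mut good: Vec<u64> = succeeded.to_vec();
    // 2. ...and always include at least the original source.
    if !good.contains(&source) {
        good.push(source);
    }
    good.sort();
    good
}

fn main() {
    // All puts succeeded: no requeue needed.
    assert_eq!(requeue_entries(1, &[1, 2, 3], &[]), Vec::<u64>::new());
    // All puts failed: requeue with at least the source.
    assert_eq!(requeue_entries(1, &[], &[2, 3]), vec![1]);
    // Partial failure: requeue with source plus the successful stores.
    assert_eq!(requeue_entries(1, &[2], &[3]), vec![1, 2]);
}
```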
Reviewed By: krallin
Differential Revision: D15911853
fbshipit-source-id: 1c81ce4ec5f975e5230b27934662e02ec515cb8f
Summary: make blobstore_healer auto-heal source blobstores found to be missing data so long as at least one other source blobstore from the queue has the data for the missing key
Reviewed By: krallin
Differential Revision: D16464895
fbshipit-source-id: 32549e58933f39bb20c173caf02a35c91123fe8d
Summary: Since we're only running a single healer in the process for a single blobstore, it's easy to bound the concurrency by limiting the number of entries we deal with at once. As a result, we don't need a separate mechanism for overall control.
Reviewed By: StanislavGlebik
Differential Revision: D15912818
fbshipit-source-id: 3087b88cfdfed2490664cd0df10bd6f126267b83
Summary: Basically notes I took for myself to truly understand the code.
Reviewed By: StanislavGlebik
Differential Revision: D15908406
fbshipit-source-id: 3f21f7a1ddce8e15ceeeffdb5518fd7f5b1749c4
Summary:
Allow blobstore_healer to be directly configured to operate on a blobstore.
This makes two changes:
- The blobstore to operate on is defined in storage.toml (doesn't
currently support server.toml-local storage configs)
- Only heal one blobstore at a time. We can run multiple separate instances of the
healer to heal multiple blobstores.
Reviewed By: HarveyHunt
Differential Revision: D15065422
fbshipit-source-id: 5bc9f1a16fc83ca5966d804b5715b09d359a3832
Summary: Update populate_healer to act directly on a blobstore config rather than indirectly via a repo config.
Reviewed By: StanislavGlebik
Differential Revision: D15065424
fbshipit-source-id: 638778a61283dc9ed991c49936a21d02b8d2e3f3
Summary:
The healer is a blobstore-level operation, which is orthogonal to the concept of a repo; therefore, there should be no mention of repoid in any of the healer's structures or tables.
For now this leaves the schema unmodified, and fills the repoid with a dummy value (0). We can clean that up later.
Reviewed By: lukaspiatkowski, HarveyHunt
Differential Revision: D15051896
fbshipit-source-id: 438b4c6885f18934228f43d85cdb8bf2f0e542f1
Summary: RepositoryId shouldn't leak into the blobstore layer. This leaves repoid in the schema, but just populates it with a dummy value (0). We can clean up the schema and this code in a later diff.
Reviewed By: StanislavGlebik
Differential Revision: D15021285
fbshipit-source-id: 3ecb04a76ce74409ed0cced3d2a0217eacd0e2fb
Summary:
This is useful to inspect the Mercurial filenodes in Mononoke, like in S183272.
For example, I intend to use this subcommand to verify how well the future linknode healing works.
Reviewed By: krallin
Differential Revision: D16621516
fbshipit-source-id: 4266f85bce29b59072bf9c4f3e63777dae09a4f1
Summary: This is needed in the following diff.
Reviewed By: krallin
Differential Revision: D16621517
fbshipit-source-id: 5a50cae7c8b761d7578bcbe5caf302a5ee2578a3
Summary: This updates benchmark_filestore to allow testing with caches (notably, Memcache & Cachelib). It also reads twice now, which is nice for caches that aren't filled by us (e.g. Manifold CDN).
Reviewed By: ahornby
Differential Revision: D16584952
fbshipit-source-id: 48ceaa9f2ea393626ac0e5f3988672df020fbc28
Summary: There's a lot of stuff in file_contents.rs that's not actually about file contents per se. This fixes that.
Reviewed By: ahornby
Differential Revision: D16598905
fbshipit-source-id: 9832b96261264c54809e0c32980cf449f8537517
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).
This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to having them be just FileContents.
The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:
- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't be pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).
Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it no longer is convenient, so it's worth changing :)
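The type-level separation described above can be illustrated with a Rust analogue (the real storage format is Thrift; these names are illustrative only):

```rust
// Illustrative analogue of the schema change: a file is either inline
// bytes or a list of pointers to *raw* chunks, so a chunk can never
// recursively contain more chunks.
#[allow(dead_code)]
struct ChunkId(String);

enum FileContents {
    Bytes(Vec<u8>),
    Chunked { total_size: u64, chunks: Vec<ChunkId> },
}

// A Chunk is only ever raw bytes: no variant points back at files or
// at other chunks, so recursion is impossible by construction.
#[allow(dead_code)]
struct Chunk(Vec<u8>);

fn file_len(f: &FileContents) -> u64 {
    match f {
        FileContents::Bytes(b) => b.len() as u64,
        FileContents::Chunked { total_size, .. } => *total_size,
    }
}

fn main() {
    let inline = FileContents::Bytes(vec![1, 2, 3]);
    let chunked = FileContents::Chunked {
        total_size: 6,
        chunks: vec![ChunkId("c1".into()), ChunkId("c2".into())],
    };
    assert_eq!(file_len(&inline), 3);
    assert_eq!(file_len(&chunked), 6);
}
```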
Reviewed By: HarveyHunt
Differential Revision: D16598906
fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
Summary:
The network blackhole is causing the API server to occasionally hang while serving requests, which has broken some LFS tests. This appears to have happened in the last month or so, but unfortunately, I haven't been able to root cause why this is happening.
From what I can tell, we have an hg client that tries an upload to the API Server, and uploads everything... and then the API server just hangs. If I kill the hg client, then the API server responds with a 400 (so it's not completely stuck), but otherwise it seems like the API server is waiting for something to happen on the client-side, but the client isn't sending that.
As far as I can tell, the API Server isn't actually trying to make outbound requests (strace does report that it has a Scribe client that's trying to connect, but Scuba logging isn't enabled, and this is just trying to connect but not send anything), but something with the blackhole is causing this hg - API server interaction to fail.
In the meantime, this diff disables the blackhole for those tests that definitely don't work when it's enabled ...
Reviewed By: HarveyHunt
Differential Revision: D16599929
fbshipit-source-id: c6d77c5428e206cd41d5466e20405264622158ab
Summary: This updates our repo config to allow passing through Filestore params. This will be useful to conditionally enable Filestore chunking for new repos.
Reviewed By: HarveyHunt
Differential Revision: D16580700
fbshipit-source-id: b624bb524f0a939f9ce11f9c2983d49f91df855a
Summary:
This allows running blobimport multiple times over the same path locally (with a file blobstore, for example). We already run it multiple times over the same path in prod, but that works because prod doesn't use the file blobstore.
This is helpful when playing around with local changes to blobimport.
Reviewed By: HarveyHunt
Differential Revision: D16580697
fbshipit-source-id: 4a62ff89542f67ce6396948c666244ef40ffe5e7
Summary: This updates the Filestore to avoid boxifying in its chunking code. The upshot is that this gets us to a place where passing a Send Stream into the Filestore gives you a Send Future back, and passing in a non-Send Stream in the Filestore gives you a non-Send future back (I hinted at this earlier in the diff that introduced faster writes).
Reviewed By: aslpavel
Differential Revision: D16560768
fbshipit-source-id: b77766380f2eaed5919f78cef6fbc02afeead0b9
Summary: As noted earlier in this stack, the Filestore read implementation was very inefficient, since it required reading a Chunk before moving on to the next one. Since our Chunked files will typically have just one level of chunking, this is very inefficient (we could be fetching additional chunks ahead of time). This new implementation lets us take advantage of buffering, so we can load as many chunks in parallel as we'd like.
Reviewed By: aslpavel
Differential Revision: D16560767
fbshipit-source-id: c02c10c5de0fc5fdc3ee3897ae855b316ea34605
Summary:
This updates the Filestore to make writes faster by farming out all hashing to
separate Tokio tasks. This lets us increase throughput of the Filestore
substantially, since we're no longer limited by the ability of a single core to
hash data.
On my dev server, when running on a 1MB file, this lets us improve the
throughput of the Filestore for writes from 36.50 MB/s (0.29 Gb/s) to 152.61
MB/s (1.19 Gb/s) when using a chunk size of 1MB and a concurrency level of 10
(i.e. 10 concurrent chunk uploads).
Note that the chunk size has a fairly limited impact on performance (e.g.
making it 10KB instead has a <10% impact on performance).
Of course, this doesn't reflect performance when uploading to a remote
blobstore, but note that we can tune that by tweaking our upload concurrency
(making uploads faster at the expense of more memory).
---
Note that as part of this change, I updated the implementation away from stream
splitting, and towards an implementation that fans out to Sinks. I actually had
higher-performance implementations for both approaches, but went with this
one because it doesn't require the incoming Stream to be Send (and I have
a forthcoming diff to make the whole Filestore not require a Send input), which
will be useful when integrating with the API Server, which unfortunately does
not provide us with a Send input.
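The fan-out of hashing work can be sketched with plain threads; the real implementation uses Tokio tasks and real content hashes (SHA-1, SHA-256, etc.), whereas this sketch substitutes std's `DefaultHasher` purely as a stand-in:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

// Hash each chunk on its own thread so no single core serializes the
// hashing work; `DefaultHasher` stands in for real content hashes.
fn hash_chunks_in_parallel(chunks: Vec<Vec<u8>>) -> Vec<u64> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                let mut h = DefaultHasher::new();
                h.write(&chunk);
                h.finish()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let chunks = vec![vec![1u8, 2, 3], vec![4, 5, 6]];
    // Parallel hashing must agree with a plain sequential pass.
    let parallel = hash_chunks_in_parallel(chunks.clone());
    let sequential: Vec<u64> = chunks
        .iter()
        .map(|c| {
            let mut h = DefaultHasher::new();
            h.write(c);
            h.finish()
        })
        .collect();
    assert_eq!(parallel, sequential);
}
```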
Reviewed By: aslpavel
Differential Revision: D16560769
fbshipit-source-id: b2e414ea3b47cc4db17f82d982618bbd837f93a9
Summary:
This is a very trivial patch simply intended to make it easier to safely roll
out the Filestore. With this, we can roll out the Filestore with chunking fully
disabled and make sure all hosts know how to read chunked data before we turn
it on.
Reviewed By: aslpavel, HarveyHunt
Differential Revision: D16560844
fbshipit-source-id: 30c49c27e839dfb06b417050c0fcde13296ddade
Summary: This adds a filestore benchmark that allows for playing around with Filestore parameters and makes it easier to measure performance improvements.
Reviewed By: aslpavel
Differential Revision: D16559941
fbshipit-source-id: 50a4e91ad07bf6f9fc1efab14aa1ea6c81b9ca27
Summary: This updates the API server's DownloadLargeFile method to return a streaming response, instead of buffering all file contents in a Bytes blob.
Reviewed By: aslpavel
Differential Revision: D16494240
fbshipit-source-id: bdbece99215d87be6a65e67f8f2d920933109e15
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that can provide a LFS blob's contents on stdout when called with the OID and size of a LFS blob. My thinking is to `curl` directly from Dewey when running this in prod.
Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.
Reviewed By: farnz
Differential Revision: D16494241
fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
Summary: This updates our UploadHgFileContents::ContentUploaded implementation to not require buffering file contents in order to produce a Mercurial Filenode ID.
Reviewed By: farnz
Differential Revision: D16457833
fbshipit-source-id: ce2c5577ffbe91dfd0de1cac7d85b8d90ded140e
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.
This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:
- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams.
This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.
However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).
The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally: it just updates the underlying API we use to access files. This is also important for a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.
This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints in
Reviewed By: farnz
Differential Revision: D16440671
fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
Summary:
This adds a `store_bytes` call in the Filestore that can be used to store a set of Bytes *without chunking* and return a `FileContents` blob. This is intended as a transitional API while we incorporate the Filestore throughout the codebase. It's useful for 2 reasons:
- It lets us roll out chunked writes gradually where we need it. My goal is to use the Filestore chunking writes API for LFS, but keep using `store_bytes` initially in other places. This means content submitted through regular Mercurial bundles won't be chunked until we feel comfortable with chunking it.
- It lets us use the Filestore in places where we were relying on the assumption that you can immediately turn Bytes into a ContentId (notably: in the `UploadHgFileContents::RawBytes` code path).
This is intended to be removed later.
Reviewed By: aslpavel
Differential Revision: D16440670
fbshipit-source-id: e591f89bb876d08e6b6f805e35f0b791e61a6474
Summary:
This adds a new `peek(.., N, ..)` call in the Filestore that allows reading at least N bytes from a file in the Filestore. This is helpful for generating Mercurial metadata blobs.
This is implemented using the same ChunkStream we use to write to the Filestore (but that needed a little fixing to support streams of empty bytes as well).
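A synchronous sketch of the `peek` semantics, including the empty-chunk edge case mentioned above (a plain iterator stands in for the real Stream, and the function name is illustrative):

```rust
/// Pull chunks until at least `n` bytes are buffered (or the stream
/// ends), returning the peeked prefix. Empty chunks are harmless:
/// they add nothing and the loop simply moves on, mirroring the
/// empty-bytes fix mentioned above.
fn peek_at_least(chunks: impl IntoIterator<Item = Vec<u8>>, n: usize) -> Vec<u8> {
    let mut buf = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(&chunk);
        if buf.len() >= n {
            break;
        }
    }
    buf
}

fn main() {
    let chunks = vec![vec![1u8, 2], vec![], vec![3, 4], vec![5]];
    // Stops after the third chunk: 4 bytes >= the 3 requested.
    assert_eq!(peek_at_least(chunks, 3), vec![1, 2, 3, 4]);
    // A short stream just yields everything it has.
    assert_eq!(peek_at_least(vec![vec![9u8]], 10), vec![9]);
}
```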
Reviewed By: aslpavel
Differential Revision: D16440672
fbshipit-source-id: a584099f87ab34e2151b9c3f5c9f1289575f024b
Summary: This updates the Filestore API to be a set of functions, instead of a Struct. The rationale here is that there is only one Filestore call that needs anything on top of a blobstore (the write call), so making those functions makes it much easier to incorporate Filestore in various places that need it without having to thread a Filestore all the way through.
Reviewed By: aslpavel
Differential Revision: D16440668
fbshipit-source-id: 4b4bc8872e205a66a12ec96a478f0f1811f2e6b1
Summary:
This adds support for reusing Mercurial filenodes in
`UploadHgFileContents::execute`.
The motivation behind this is that given file contents, copy info, and parents,
the file node ID is deterministic, but computing it requires fetching and
hashing the body of the file. This implementation implements a lookup through
the blobstore to find a pre-computed filenode ID.
Doing so is in general a little inefficient (though it's not entirely certain
that the implementation I'm proposing here is faster -- more on this below),
but it's particularly problematic for large files. Indeed, fetching a multiple-GB file
to recompute the filenode even if we received it from the client can be fairly
slow (and use up quite a bit of RAM, though that's something we can mitigate by
streaming file contents).
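The lookup-before-recompute idea can be sketched as follows (all names hypothetical; the real key also covers copy info, and the lookup goes through the blobstore rather than an in-memory map):

```rust
use std::collections::HashMap;

type ContentId = String;
type FilenodeId = u64;

// Hypothetical cache of precomputed filenode ids, keyed by the inputs
// that determine them: content id plus parents (copy info elided here).
type FilenodeLookup = HashMap<(ContentId, Vec<FilenodeId>), FilenodeId>;

/// Return a precomputed filenode id when the lookup hits; only fall
/// back to fetching and hashing the (possibly huge) file on a miss.
fn filenode_id(
    lookup: &FilenodeLookup,
    content_id: &ContentId,
    parents: &[FilenodeId],
    fetch_and_hash: impl FnOnce() -> FilenodeId,
) -> FilenodeId {
    match lookup.get(&(content_id.clone(), parents.to_vec())) {
        Some(id) => *id,          // cache hit: no fetch needed
        None => fetch_and_hash(), // miss: pay the fetch + hash cost
    }
}

fn main() {
    let mut lookup: FilenodeLookup = HashMap::new();
    lookup.insert(("content-1".to_string(), vec![10]), 42);

    // Hit: the precomputed id is returned, the fetch closure never runs.
    let hit = filenode_id(&lookup, &"content-1".to_string(), &[10], || unreachable!());
    assert_eq!(hit, 42);
    // Miss: we fall back to fetching and hashing the file.
    let miss = filenode_id(&lookup, &"content-2".to_string(), &[10], || 99);
    assert_eq!(miss, 99);
}
```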
One thing worth noting here (hence the RFC flag) is that there is a bit of a
potential for performance regression. Indeed, we could have a cache miss when
looking up the filenode ID, and then we'll have to fetch the file.
*At this time*, this is also somewhat inefficient, since we'll have to fetch
the file anyway to peek at its contents in order to generate metadata. This
is fixed later in this Filestore stack.
That said, an actual regression seems a little unlikely to happen, since in
practice we'll write out the lookup entry when accepting a pushrebase,
then do a lookup on it later when converting the pushrebased Bonsai changeset
to a Mercurial changeset.
If we're worried, then perhaps adding hit / miss stats on the lookup might make
sense. Let me know what you think.
---
Finally, there's a bit I don't love here, which is trusting LFS clients with the size of their uploads.
I'm thinking of fixing this when I finish the Filestore work.
Reviewed By: aslpavel
Differential Revision: D16345248
fbshipit-source-id: 6ce8a191efbb374ff8a1a185ce4b80dc237a536d