Summary: This updates blobimport to run filenode computations on their own Tokio task. This is particularly helpful when dealing with a repo with large files: recomputing filenodes can be a fairly expensive process if a large number of large files have moved. Running those on separate tasks allows for better utilization of multiple CPUs.
Reviewed By: HarveyHunt
Differential Revision: D16667040
fbshipit-source-id: 0746af4da876c0b4313b86aca32c5e261a316a8c
Summary: This updates blobimport to retry on failures to fetch through its LFS helper.
Reviewed By: HarveyHunt
Differential Revision: D16667042
fbshipit-source-id: dfaec8fc029030a7d90806018301d22525d094df
Summary: This updates blobimport to a newer Rust-2018-style. Rust 2018 is nicer.
Reviewed By: StanislavGlebik
Differential Revision: D16667043
fbshipit-source-id: a217cbef0337e7d7cf288f972e9b4af0340a8e0b
Summary:
This updates blobimport to avoid using a per-changeset limit for the number of blob uploads (which used to be 100), and instead to use a global blob upload limit, and a global LFS import limit.
This allows for finer-grained control over the resource utilization (notably in terms of Manifold quota) of blobimport.
While I was in there, I also eliminated an `Arc<BlobRepo>` that was lying around, since we don't use those anymore.
Reviewed By: HarveyHunt
Differential Revision: D16667044
fbshipit-source-id: 9fc2f347969c7ca9472ce8dd3d4e2f1acb175b66
Summary: Updates blobimport to check for existing LFS blobs. This is perfectly safe since at worst it'll fail closed (i.e. blobs will be missing and blobimport will hang).
Reviewed By: HarveyHunt
Differential Revision: D16648560
fbshipit-source-id: ed9da7f2fa69c28451ac058a4e1adc937d111b31
Summary: This updates blobimport to allow specifying a different commit limit than the default of 100. This makes it a little more convenient when tuning limits.
Reviewed By: HarveyHunt
Differential Revision: D16648558
fbshipit-source-id: 2e8c8496487a79bec84ec964302b1d6e871caf2a
Summary:
Let's log a bit more often than once every 5,000 commits. Logging 10 times per
run seems reasonable.
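The sizing logic is trivial but worth pinning down; a minimal sketch (the helper name is hypothetical, not the actual blobimport code):

```rust
/// Pick a logging interval so that roughly `target_logs` progress
/// messages are emitted over `total_commits` commits. Hypothetical
/// helper illustrating the "log ~10 times per run" idea.
fn log_interval(total_commits: usize, target_logs: usize) -> usize {
    std::cmp::max(1, total_commits / target_logs)
}

fn main() {
    // 50,000 commits -> log every 5,000; tiny repos still log every commit.
    assert_eq!(log_interval(50_000, 10), 5_000);
    assert_eq!(log_interval(3, 10), 1);
}
```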
Reviewed By: HarveyHunt
Differential Revision: D16621623
fbshipit-source-id: b5ae481f5d899a12c736bb2de74efc8b370002df
Summary:
If you give blobimport a bad path for the LFS helper, it just tells you the
file doesn't exist, which is not very useful if you don't know what file we're
talking about. This fixes that.
Reviewed By: HarveyHunt
Differential Revision: D16621624
fbshipit-source-id: ff7fb5c8800604f8799c682e5957c888d2b01647
Summary:
Sometimes, our Mononoke CLI tests report errors in mode/opt without actually
outputting any error messages at all when returning an error. Those error
messages actually get buffered by our logger, and are normally emitted when
said logger is dropped.
However, I have a suspicion that by calling process::exit(), we're not giving
Rust a chance to drop the logger (because that eventually just calls the exit
syscall), and as a result our buffer isn't getting flushed.
This patch attempts to fix this by removing all calls to process::exit in the
admin CLI, and instead returning an exit code from main, which I hope will
allow main to drop everything before it returns.
More broadly speaking, I don't think it's very good practice to exit from all
over the place in the admin CLI, so this fixes that :)
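The pattern described, returning an exit code from `main` so destructors get a chance to run, can be sketched with `std::process::ExitCode`. This is a generic illustration, not the actual admin CLI code; the flush-on-drop guard is a hypothetical stand-in for the buffered logger:

```rust
use std::process::ExitCode;

// A guard whose Drop impl stands in for the buffered logger:
// process::exit() would skip it, but returning from main() runs it.
struct FlushOnDrop;

impl Drop for FlushOnDrop {
    fn drop(&mut self) {
        eprintln!("flushing buffered log output");
    }
}

fn run() -> Result<(), String> {
    Err("something went wrong".to_string())
}

fn main() -> ExitCode {
    let _logger = FlushOnDrop;
    match run() {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("{}", e);
            // _logger is dropped (and flushed) before the process exits.
            ExitCode::from(1)
        }
    }
}
```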
Reviewed By: HarveyHunt
Differential Revision: D16687747
fbshipit-source-id: 38e987463363e239d4f9166050a8dd26a4bef985
Summary:
Address nits from D16599270:
* use map instead of and_then
* use &[T] instead of &Vec<T>
Reviewed By: StanislavGlebik
Differential Revision: D16666838
fbshipit-source-id: ad0afa2d44d7a713409ac75bab599a35f5cf1a87
Summary:
fanzeyi pointed out that `get_metadata` calling a different `get_metadata` because
of `use metadata::*` was confusing. This cleans this up.
Reviewed By: fanzeyi
Differential Revision: D16669239
fbshipit-source-id: f1e56fa9691c885ed56475c911f9407454afc4bb
Summary:
This diff introduces `bounded_traversal_dag`, which can handle arbitrary DAGs and detect invalid DAGs with cycles, but it has limitations in comparison to `bounded_traversal`:
- `bounded_traversal_dag` keeps the `Out` result of computation for all the nodes,
but `bounded_traversal` only keeps results for nodes that have not been completely
evaluated
- `In` has additional constraints to be `Eq + Hash + Clone`
- `Out` has additional constraint to be `Clone`
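The extra constraints fall out of memoization: every node's `Out` is kept in a map keyed by `In`, which also enables cycle detection. A synchronous sketch of the same idea (an analogue only, not the actual futures-based `bounded_traversal_dag` API):

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Memoizing every node's `Out` requires `In: Eq + Hash + Clone` (map
// keys) and `Out: Clone` (shared results). Returns None on a cycle.
fn traverse_dag<In, Out>(
    node: In,
    unfold: &impl Fn(&In) -> Vec<In>,
    fold: &impl Fn(&In, Vec<Out>) -> Out,
    memo: &mut HashMap<In, Option<Out>>, // None marks "in progress"
) -> Option<Out>
where
    In: Eq + Hash + Clone,
    Out: Clone,
{
    if let Some(entry) = memo.get(&node) {
        // An "in progress" entry means we re-entered this node: a cycle.
        return entry.clone();
    }
    memo.insert(node.clone(), None);
    let mut child_outs = Vec::new();
    for child in unfold(&node) {
        child_outs.push(traverse_dag(child, unfold, fold, memo)?);
    }
    let out = fold(&node, child_outs);
    memo.insert(node, Some(out.clone()));
    Some(out)
}

fn main() {
    // Diamond DAG 0 -> {1, 2} -> 3: node 3 is evaluated only once.
    let unfold = |n: &u32| match *n {
        0 => vec![1, 2],
        1 | 2 => vec![3],
        _ => vec![],
    };
    let fold = |n: &u32, outs: Vec<u32>| *n + outs.iter().sum::<u32>();
    let mut memo = HashMap::new();
    assert_eq!(traverse_dag(0, &unfold, &fold, &mut memo), Some(9));

    // A self-loop is detected as an invalid (cyclic) DAG.
    let cyclic = |_: &u32| vec![0];
    let mut memo2 = HashMap::new();
    assert_eq!(traverse_dag(0, &cyclic, &fold, &mut memo2), None);
}
```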
Reviewed By: krallin
Differential Revision: D16621004
fbshipit-source-id: b9f60e461d5d50e060be4f5bb6b970f16a9b99f9
Summary: This adds debug subcommands metadata and verify in the Filestore. Those respectively output the metadata for a file and verify that the file is reachable through all aliases.
Reviewed By: ahornby
Differential Revision: D16621789
fbshipit-source-id: 4a2156bfffb9d9641ce58f6d5f691364ba9dc145
Summary:
Johan fixed retry logic in Mercurial, so those tests can now succeed even if
the blackhole is enabled (though we haven't fully understood why the blackhole
was breaking them in the first place).
Differential Revision: D16646032
fbshipit-source-id: 8b7ff2d8d284e003e49681e737367e9942370fa1
Summary:
Update blobstore_healer handling of unknown stores to re-queue and delete original entries.
Make sure we still make progress in the case where there are lots of unknown blobstore entries on the queue.
The previous diff in this stack took the approach of not deleting, which could keep loading and logging the same entries if there were more than blobstore_sync_queue_limit of them. Better to reinsert with a new timestamp and delete the old entries.
Reviewed By: krallin
Differential Revision: D16599270
fbshipit-source-id: efa3e5602f0ab3a037d0534e1fe8e3d42fbb52e6
Summary: make the blobstore healer preserve queue entries for unknown blobstores rather than erroring
Reviewed By: ikostia
Differential Revision: D16586816
fbshipit-source-id: 3d4987a95adcddd0329b9ededdf95887aa11286e
Summary:
Add healer logic to fetch from all source blobstores on the queue
Add tests for the healer queue state including put failures
Reviewed By: krallin
Differential Revision: D16549013
fbshipit-source-id: 6aa55b3cb2ed7fa9a1630edd5bc5b2ad2c6f5011
Summary:
Fixes for the handling of blobstores after heal:
1. If all blobstores are successfully healed for a key, no need to requeue it
2. Where all heal puts fail, make sure we requeue with at least the original source blobstore we loaded the blob from
3. When we do write to the queue, write with all blobstore ids where we know we have good data, so that when it is read later it is not considered missing.
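The three rules above can be sketched as a pure decision function (ids and names are hypothetical, not the actual healer types):

```rust
/// Given the heal attempt's outcome and the source blobstore we read
/// the blob from, decide which blobstore ids (if any) to requeue.
/// Rules sketched from the summary above; names are hypothetical.
fn requeue_entries(source: u64, succeeded: &[u64], failed: &[u64]) -> Vec<u64> {
    if failed.is_empty() {
        // 1. Everything healed successfully: nothing to requeue.
        return Vec::new();
    }
    // 3. Requeue with every blobstore known to hold good data...
    let mut good: Vec<u64> = succeeded.to_vec();
    // 2. ...and always include at least the original source.
    if !good.contains(&source) {
        good.push(source);
    }
    good.sort();
    good
}

fn main() {
    // All puts succeeded: no requeue needed.
    assert_eq!(requeue_entries(1, &[1, 2, 3], &[]), Vec::<u64>::new());
    // All puts failed: requeue with at least the source.
    assert_eq!(requeue_entries(1, &[], &[2, 3]), vec![1]);
    // Partial failure: requeue with source plus the successful stores.
    assert_eq!(requeue_entries(1, &[2], &[3]), vec![1, 2]);
}
```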
Reviewed By: krallin
Differential Revision: D15911853
fbshipit-source-id: 1c81ce4ec5f975e5230b27934662e02ec515cb8f
Summary: make blobstore_healer auto-heal source blobstores found to be missing data so long as at least one other source blobstore from the queue has the data for the missing key
Reviewed By: krallin
Differential Revision: D16464895
fbshipit-source-id: 32549e58933f39bb20c173caf02a35c91123fe8d
Summary: Since we're only running a single healer in the process for a single blobstore, it's easy to bound the concurrency by limiting the number of entries we deal with at once. As a result, we don't need a separate mechanism for overall control.
Reviewed By: StanislavGlebik
Differential Revision: D15912818
fbshipit-source-id: 3087b88cfdfed2490664cd0df10bd6f126267b83
Summary: Basically notes I took for myself to truly understand the code.
Reviewed By: StanislavGlebik
Differential Revision: D15908406
fbshipit-source-id: 3f21f7a1ddce8e15ceeeffdb5518fd7f5b1749c4
Summary:
Allow blobstore_healer to be directly configured to operate on a blobstore.
This makes two changes:
- The blobstore to operate on is defined in storage.toml (doesn't
currently support server.toml-local storage configs)
- Only heal one blobstore at a time. We can run multiple separate instances of the
healer to heal multiple blobstores.
Reviewed By: HarveyHunt
Differential Revision: D15065422
fbshipit-source-id: 5bc9f1a16fc83ca5966d804b5715b09d359a3832
Summary: Update populate_healer to act directly on a blobstore config rather than indirectly via a repo config.
Reviewed By: StanislavGlebik
Differential Revision: D15065424
fbshipit-source-id: 638778a61283dc9ed991c49936a21d02b8d2e3f3
Summary:
The healer is a blobstore-level operation, which is orthogonal to the concept of a repo; therefore, there should be no mention of repoid in any of the healer's structures or tables.
For now this leaves the schema unmodified, and fills the repoid with a dummy value (0). We can clean that up later.
Reviewed By: lukaspiatkowski, HarveyHunt
Differential Revision: D15051896
fbshipit-source-id: 438b4c6885f18934228f43d85cdb8bf2f0e542f1
Summary: RepositoryId shouldn't leak into the blobstore layer. This leaves repoid in the schema, but just populates it with a dummy value (0). We can clean up the schema and this code in a later diff.
Reviewed By: StanislavGlebik
Differential Revision: D15021285
fbshipit-source-id: 3ecb04a76ce74409ed0cced3d2a0217eacd0e2fb
Summary:
This is useful to inspect the Mercurial filenodes in Mononoke, like in S183272.
For example, I intend to use this subcommand to verify how well the future linknode healing works.
Reviewed By: krallin
Differential Revision: D16621516
fbshipit-source-id: 4266f85bce29b59072bf9c4f3e63777dae09a4f1
Summary: This is needed in the following diff.
Reviewed By: krallin
Differential Revision: D16621517
fbshipit-source-id: 5a50cae7c8b761d7578bcbe5caf302a5ee2578a3
Summary: This updates benchmark_filestore to allow testing with caches (notably, Memcache & Cachelib). It also reads twice now, which is nice for caches that aren't filled by us (e.g. Manifold CDN).
Reviewed By: ahornby
Differential Revision: D16584952
fbshipit-source-id: 48ceaa9f2ea393626ac0e5f3988672df020fbc28
Summary: There's a lot of stuff in file_contents.rs that's not actually about file contents per se. This fixes that.
Reviewed By: ahornby
Differential Revision: D16598905
fbshipit-source-id: 9832b96261264c54809e0c32980cf449f8537517
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).
This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to having them be just FileContents.
The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:
- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't be pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).
Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it no longer is convenient, so it's worth changing :)
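The type-level separation described above can be illustrated with a Rust analogue (the real storage format is Thrift; these names are illustrative only):

```rust
// Illustrative analogue of the schema change: a file is either inline
// bytes or a list of pointers to *raw* chunks, so a chunk can never
// recursively contain more chunks.
#[allow(dead_code)]
struct ChunkId(String);

enum FileContents {
    Bytes(Vec<u8>),
    Chunked { total_size: u64, chunks: Vec<ChunkId> },
}

// A Chunk is only ever raw bytes: no variant points back at files or
// at other chunks, so recursion is impossible by construction.
#[allow(dead_code)]
struct Chunk(Vec<u8>);

fn file_len(f: &FileContents) -> u64 {
    match f {
        FileContents::Bytes(b) => b.len() as u64,
        FileContents::Chunked { total_size, .. } => *total_size,
    }
}

fn main() {
    let inline = FileContents::Bytes(vec![1, 2, 3]);
    let chunked = FileContents::Chunked {
        total_size: 6,
        chunks: vec![ChunkId("c1".into()), ChunkId("c2".into())],
    };
    assert_eq!(file_len(&inline), 3);
    assert_eq!(file_len(&chunked), 6);
}
```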
Reviewed By: HarveyHunt
Differential Revision: D16598906
fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
Summary:
The network blackhole is causing the API server to occasionally hang while serving requests, which has broken some LFS tests. This appears to have happened in the last month or so, but unfortunately, I haven't been able to root cause why this is happening.
From what I can tell, we have an hg client that tries an upload to the API Server, and uploads everything... and then the API server just hangs. If I kill the hg client, then the API server responds with a 400 (so it's not completely stuck), but otherwise it seems like the API server is waiting for something to happen on the client-side, but the client isn't sending that.
As far as I can tell, the API Server isn't actually trying to make outbound requests (strace does report that it has a Scribe client that's trying to connect, but Scuba logging isn't enabled, and this is just trying to connect but not send anything), but something with the blackhole is causing this hg - API server interaction to fail.
In the meantime, this diff disables the blackhole for those tests that definitely don't work when it's enabled ...
Reviewed By: HarveyHunt
Differential Revision: D16599929
fbshipit-source-id: c6d77c5428e206cd41d5466e20405264622158ab
Summary: This updates our repo config to allow passing through Filestore params. This will be useful to conditionally enable Filestore chunking for new repos.
Reviewed By: HarveyHunt
Differential Revision: D16580700
fbshipit-source-id: b624bb524f0a939f9ce11f9c2983d49f91df855a
Summary:
This allows running blobimport multiple times over the same path locally (with a file blobstore, for example). We already run it multiple times over the same path in prod, but that works because prod doesn't use the file blobstore.
This is helpful when playing around with local changes to blobimport.
Reviewed By: HarveyHunt
Differential Revision: D16580697
fbshipit-source-id: 4a62ff89542f67ce6396948c666244ef40ffe5e7
Summary: This updates the Filestore to avoid boxifying in its chunking code. The upshot is that this gets us to a place where passing a Send Stream into the Filestore gives you a Send Future back, and passing in a non-Send Stream in the Filestore gives you a non-Send future back (I hinted at this earlier in the diff that introduced faster writes).
Reviewed By: aslpavel
Differential Revision: D16560768
fbshipit-source-id: b77766380f2eaed5919f78cef6fbc02afeead0b9
Summary: As noted earlier in this stack, the Filestore read implementation was very inefficient, since it required reading a Chunk before moving on to the next one. Since our Chunked files will typically have just one level of chunking, this is very inefficient (we could be fetching additional chunks ahead of time). This new implementation lets us take advantage of buffering, so we can load as many chunks in parallel as we'd like.
Reviewed By: aslpavel
Differential Revision: D16560767
fbshipit-source-id: c02c10c5de0fc5fdc3ee3897ae855b316ea34605
Summary:
This updates the Filestore to make writes faster by farming out all hashing to
separate Tokio tasks. This lets us increase throughput of the Filestore
substantially, since we're no longer limited by the ability of a single core to
hash data.
On my dev server, when running on a 1MB file, this lets us improve the
throughput of the Filestore for writes from 36.50 MB/s (0.29 Gb/s) to 152.61
MB/s (1.19 Gb/s) when using a chunk size of 1MB and a concurrency level of 10
(i.e. 10 concurrent chunk uploads).
Note that the chunk size has a fairly limited impact on performance (e.g.
making it 10KB instead has a <10% impact on performance).
Of course, this doesn't reflect performance when uploading to a remote
blobstore, but note that we can tune that by tweaking our upload concurrency
(making uploads faster at the expense of more memory).
---
Note that as part of this change, I updated the implementation away from stream
splitting, and towards an implementation that fans out to Sinks. I actually had
higher-performance implementations for both approaches, but went with this
one because it doesn't require the incoming Stream to be Send (and I have
a forthcoming diff to make the whole Filestore not require a Send input), which
will be useful when integrating with the API Server, which unfortunately does
not provide us with a Send input.
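The fan-out of hashing work can be sketched with plain threads; the real implementation uses Tokio tasks and real content hashes (SHA-1, SHA-256, etc.), whereas this sketch substitutes std's `DefaultHasher` purely as a stand-in:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

// Hash each chunk on its own thread so no single core serializes the
// hashing work; `DefaultHasher` stands in for real content hashes.
fn hash_chunks_in_parallel(chunks: Vec<Vec<u8>>) -> Vec<u64> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                let mut h = DefaultHasher::new();
                h.write(&chunk);
                h.finish()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let chunks = vec![vec![1u8, 2, 3], vec![4, 5, 6]];
    // Parallel hashing must agree with a plain sequential pass.
    let parallel = hash_chunks_in_parallel(chunks.clone());
    let sequential: Vec<u64> = chunks
        .iter()
        .map(|c| {
            let mut h = DefaultHasher::new();
            h.write(c);
            h.finish()
        })
        .collect();
    assert_eq!(parallel, sequential);
}
```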
Reviewed By: aslpavel
Differential Revision: D16560769
fbshipit-source-id: b2e414ea3b47cc4db17f82d982618bbd837f93a9
Summary:
This is a very trivial patch simply intended to make it easier to safely roll
out the Filestore. With this, we can roll out the Filestore with chunking fully
disabled and make sure all hosts know how to read chunked data before we turn
it on.
Reviewed By: aslpavel, HarveyHunt
Differential Revision: D16560844
fbshipit-source-id: 30c49c27e839dfb06b417050c0fcde13296ddade
Summary: This adds a filestore benchmark that allows for playing around with Filestore parameters and makes it easier to measure performance improvements.
Reviewed By: aslpavel
Differential Revision: D16559941
fbshipit-source-id: 50a4e91ad07bf6f9fc1efab14aa1ea6c81b9ca27
Summary: This updates the API server's DownloadLargeFile method to return a streaming response, instead of buffering all file contents in a Bytes blob.
Reviewed By: aslpavel
Differential Revision: D16494240
fbshipit-source-id: bdbece99215d87be6a65e67f8f2d920933109e15
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that can provide a LFS blob's contents on stdout when called with the OID and size of a LFS blob. My thinking is to `curl` directly from Dewey when running this in prod.
Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.
Reviewed By: farnz
Differential Revision: D16494241
fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
Summary: This updates our UploadHgFileContents::ContentUploaded implementation to not require buffering file contents in order to produce a Mercurial Filenode ID.
Reviewed By: farnz
Differential Revision: D16457833
fbshipit-source-id: ce2c5577ffbe91dfd0de1cac7d85b8d90ded140e
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.
This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:
- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams.
This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.
However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).
The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally: it just updates the underlying API we use to access files. This is also important for a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.
This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints in
Reviewed By: farnz
Differential Revision: D16440671
fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
Summary:
This adds a `store_bytes` call in the Filestore that can be used to store a set of Bytes *without chunking* and return a `FileContents` blob. This is intended as a transitional API while we incorporate the Filestore throughout the codebase. It's useful for 2 reasons:
- It lets us roll out chunked writes gradually where we need it. My goal is to use the Filestore chunking writes API for LFS, but keep using `store_bytes` initially in other places. This means content submitted through regular Mercurial bundles won't be chunked until we feel comfortable with chunking it.
- It lets us use the Filestore in places where we were relying on the assumption that you can immediately turn Bytes into a ContentId (notably: in the `UploadHgFileContents::RawBytes` code path).
This is intended to be removed later.
Reviewed By: aslpavel
Differential Revision: D16440670
fbshipit-source-id: e591f89bb876d08e6b6f805e35f0b791e61a6474
Summary:
This adds a new `peek(.., N, ..)` call in the Filestore that allows reading at least N bytes from a file in the Filestore. This is helpful for generating Mercurial metadata blobs.
This is implemented using the same ChunkStream we use to write to the Filestore (but that needed a little fixing to support streams of empty bytes as well).
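A synchronous sketch of the `peek` semantics, including the empty-chunk edge case mentioned above (a plain iterator stands in for the real Stream, and the function name is illustrative):

```rust
/// Pull chunks until at least `n` bytes are buffered (or the stream
/// ends), returning the peeked prefix. Empty chunks are harmless:
/// they add nothing and the loop simply moves on, mirroring the
/// empty-bytes fix mentioned above.
fn peek_at_least(chunks: impl IntoIterator<Item = Vec<u8>>, n: usize) -> Vec<u8> {
    let mut buf = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(&chunk);
        if buf.len() >= n {
            break;
        }
    }
    buf
}

fn main() {
    let chunks = vec![vec![1u8, 2], vec![], vec![3, 4], vec![5]];
    // Stops after the third chunk: 4 bytes >= the 3 requested.
    assert_eq!(peek_at_least(chunks, 3), vec![1, 2, 3, 4]);
    // A short stream just yields everything it has.
    assert_eq!(peek_at_least(vec![vec![9u8]], 10), vec![9]);
}
```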
Reviewed By: aslpavel
Differential Revision: D16440672
fbshipit-source-id: a584099f87ab34e2151b9c3f5c9f1289575f024b
Summary: This updates the Filestore API to be a set of functions, instead of a Struct. The rationale here is that there is only one Filestore call that needs anything on top of a blobstore (the write call), so making those functions makes it much easier to incorporate Filestore in various places that need it without having to thread a Filestore all the way through.
Reviewed By: aslpavel
Differential Revision: D16440668
fbshipit-source-id: 4b4bc8872e205a66a12ec96a478f0f1811f2e6b1
Summary:
This adds support for reusing Mercurial filenodes in
`UploadHgFileContents::execute`.
The motivation behind this is that given file contents, copy info, and parents,
the file node ID is deterministic, but computing it requires fetching and
hashing the body of the file. This implementation implements a lookup through
the blobstore to find a pre-computed filenode ID.
Doing so is in general a little inefficient (though it's not entirely certain
that the implementation I'm proposing here is faster -- more on this below),
but it's particularly problematic for large files. Indeed, fetching a multiple-GB file
to recompute the filenode even if we received it from the client can be fairly
slow (and use up quite a bit of RAM, though that's something we can mitigate by
streaming file contents).
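The lookup-before-recompute idea can be sketched as follows (all names hypothetical; the real key also covers copy info, and the lookup goes through the blobstore rather than an in-memory map):

```rust
use std::collections::HashMap;

type ContentId = String;
type FilenodeId = u64;

// Hypothetical cache of precomputed filenode ids, keyed by the inputs
// that determine them: content id plus parents (copy info elided here).
type FilenodeLookup = HashMap<(ContentId, Vec<FilenodeId>), FilenodeId>;

/// Return a precomputed filenode id when the lookup hits; only fall
/// back to fetching and hashing the (possibly huge) file on a miss.
fn filenode_id(
    lookup: &FilenodeLookup,
    content_id: &ContentId,
    parents: &[FilenodeId],
    fetch_and_hash: impl FnOnce() -> FilenodeId,
) -> FilenodeId {
    match lookup.get(&(content_id.clone(), parents.to_vec())) {
        Some(id) => *id,          // cache hit: no fetch needed
        None => fetch_and_hash(), // miss: pay the fetch + hash cost
    }
}

fn main() {
    let mut lookup: FilenodeLookup = HashMap::new();
    lookup.insert(("content-1".to_string(), vec![10]), 42);

    // Hit: the precomputed id is returned, the fetch closure never runs.
    let hit = filenode_id(&lookup, &"content-1".to_string(), &[10], || unreachable!());
    assert_eq!(hit, 42);
    // Miss: we fall back to fetching and hashing the file.
    let miss = filenode_id(&lookup, &"content-2".to_string(), &[10], || 99);
    assert_eq!(miss, 99);
}
```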
One thing worth noting here (hence the RFC flag) is that there is a bit of a
potential for performance regression. Indeed, we could have a cache miss when
looking up the filenode ID, and then we'll have to fetch the file.
*At this time*, this is also somewhat inefficient, since we'll have to fetch
the file anyway to peek at its contents in order to generate metadata. This
is fixed later in this Filestore stack.
That said, an actual regression seems a little unlikely to happen, since in
practice we'll write out the lookup entry when accepting a pushrebase,
then do a lookup on it later when converting the pushrebased Bonsai changeset
to a Mercurial changeset.
If we're worried, then perhaps adding hit / miss stats on the lookup might make
sense. Let me know what you think.
---
Finally, there's a bit I don't love here, which is trusting LFS clients with the size of their uploads.
I'm thinking of fixing this when I finish the Filestore work.
Reviewed By: aslpavel
Differential Revision: D16345248
fbshipit-source-id: 6ce8a191efbb374ff8a1a185ce4b80dc237a536d