Commit Graph

2322 Commits

Pavel Aslanov
06c8fae85b added bounded_traversal_dag
Summary:
This diff introduces `bounded_traversal_dag`, which can handle arbitrary DAGs and detect invalid DAGs with cycles, but it has limitations in comparison to `bounded_traversal`:
  - `bounded_traversal_dag` keeps the `Out` result of the computation for all nodes,
     while `bounded_traversal` only keeps results for nodes that have not been completely
     evaluated
  - `In` has the additional constraint of being `Eq + Hash + Clone`
  - `Out` has the additional constraint of being `Clone` (see the sketch below)
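
As a rough illustration of where those extra bounds come from (this is a hedged sketch, not the actual `bounded_traversal_dag` code): memoizing per-node results and detecting cycles means nodes are used as hash-map keys and cached outputs are handed back to every parent that reaches the same node.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

/// Toy recursive DAG evaluation: `unfold` lists a node's children, `fold`
/// combines their results. Returns `None` if a cycle is detected.
fn evaluate<In, Out>(
    node: In,
    unfold: &impl Fn(&In) -> Vec<In>,
    fold: &impl Fn(&In, Vec<Out>) -> Out,
    done: &mut HashMap<In, Out>,
    in_progress: &mut HashSet<In>,
) -> Option<Out>
where
    In: Eq + Hash + Clone, // nodes are HashMap/HashSet keys
    Out: Clone,            // cached results are shared with every parent
{
    if let Some(out) = done.get(&node) {
        return Some(out.clone()); // node already evaluated: reuse its result
    }
    if !in_progress.insert(node.clone()) {
        return None; // node reachable from itself: not a valid DAG
    }
    let mut child_outs = Vec::new();
    for child in unfold(&node) {
        child_outs.push(evaluate(child, unfold, fold, done, in_progress)?);
    }
    in_progress.remove(&node);
    let out = fold(&node, child_outs);
    done.insert(node, out.clone());
    Some(out)
}
```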

Reviewed By: krallin

Differential Revision: D16621004

fbshipit-source-id: b9f60e461d5d50e060be4f5bb6b970f16a9b99f9
2019-08-05 05:41:17 -07:00
Thomas Orozco
e192f295df mononoke/admin: add filestore debug subcommands
Summary: This adds the debug subcommands `metadata` and `verify` to the Filestore admin tooling. Those respectively output the metadata for a file and verify that the file is reachable through all its aliases.

Reviewed By: ahornby

Differential Revision: D16621789

fbshipit-source-id: 4a2156bfffb9d9641ce58f6d5f691364ba9dc145
2019-08-05 05:28:53 -07:00
Thomas Orozco
07f3fa1a88 mononoke/integration: un-blackhole apiserver tests
Summary:
Johan fixed retry logic in Mercurial, so those tests can now succeed even if
the blackhole is enabled (though we haven't fully understood why the blackhole
was breaking them in the first place).

Differential Revision: D16646032

fbshipit-source-id: 8b7ff2d8d284e003e49681e737367e9942370fa1
2019-08-05 05:21:14 -07:00
Alex Hornby
e2e9f35211 mononoke: Ensure blobstore_healer can still make progress when queue contains unknown stores
Summary:
Update blobstore_healer's handling of unknown stores to re-queue them and delete the original entries.

Make sure we still make progress in the case where there are lots of unknown blobstore entries on the queue.

The previous diff in the stack took the approach of not deleting, which could keep loading and logging the same entries if there were more than blobstore_sync_queue_limit of them. It is better to reinsert with a new timestamp and delete the old entries.

Reviewed By: krallin

Differential Revision: D16599270

fbshipit-source-id: efa3e5602f0ab3a037d0534e1fe8e3d42fbb52e6
2019-08-05 03:50:55 -07:00
Alex Hornby
d17d3475ed mononoke: make blob store healer preserve queue entries for unknown blobstores
Summary: Make the blobstore healer preserve queue entries for unknown blobstores rather than erroring.

Reviewed By: ikostia

Differential Revision: D16586816

fbshipit-source-id: 3d4987a95adcddd0329b9ededdf95887aa11286e
2019-08-05 03:50:54 -07:00
Alex Hornby
f864348558 mononoke: add healer logic to fetch from all source blobstores on the queue
Summary:
Add healer logic to fetch from all source blobstores on the queue

Add tests for the healer queue state including put failures

Reviewed By: krallin

Differential Revision: D16549013

fbshipit-source-id: 6aa55b3cb2ed7fa9a1630edd5bc5b2ad2c6f5011
2019-08-05 03:50:54 -07:00
Alex Hornby
00d855084d mononoke/blobstore_healer: fixes for the handling of blobstores after heal
Summary:
Fixes for the handling of blobstores after heal:
1. If all blobstores are successfully healed for a key, no need to requeue it
2. Where all heal puts fail, make sure we requeue with at least the original source blobstore we loaded the blob from
3. When we do write to the queue, write with all blobstore ids where we know we have good data, so that when it is read later it is not considered missing.

Reviewed By: krallin

Differential Revision: D15911853

fbshipit-source-id: 1c81ce4ec5f975e5230b27934662e02ec515cb8f
2019-08-05 03:50:54 -07:00
Alex Hornby
d1a8c487ae mononoke: make blobstore_healer auto-heal missing source blobstores where possible
Summary: Make blobstore_healer auto-heal source blobstores that are found to be missing data, so long as at least one other source blobstore from the queue has the data for the missing key.

Reviewed By: krallin

Differential Revision: D16464895

fbshipit-source-id: 32549e58933f39bb20c173caf02a35c91123fe8d
2019-08-05 03:50:54 -07:00
Alex Hornby
a06468acd6 mononoke: add key filter option to blobstore healer
Summary: Add blobstore key filter option to blobstore healer to allow easier reproduction of healer issues for particular keys.

Reviewed By: StanislavGlebik

Differential Revision: D16457530

fbshipit-source-id: 23201e45fbdf14fa7fdccbaf8e0f4b29355aa906
2019-08-05 03:50:53 -07:00
Alex Hornby
82f2a70c3b mononoke/blobstore_healer: remove ratelimiter
Summary: Since we're only running a single healer in the process for a single blobstore, it's easy to bound the concurrency by limiting it to the number of entries we deal with at once. As a result, we don't need a separate mechanism for overall control.
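
A minimal sketch of the idea (assumed names, not the healer's actual code): instead of a global rate limiter, each iteration only drains a bounded number of queue entries, which caps the amount of work in flight.

```rust
/// Drain at most `limit` entries from the queue and heal them; overall
/// concurrency is bounded by the batch size rather than by a rate limiter.
fn heal_iteration(queue: &mut Vec<String>, limit: usize) {
    let batch: Vec<String> = queue.drain(..limit.min(queue.len())).collect();
    for key in batch {
        // Placeholder for healing a single queue entry.
        println!("healing {}", key);
    }
}
```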

Reviewed By: StanislavGlebik

Differential Revision: D15912818

fbshipit-source-id: 3087b88cfdfed2490664cd0df10bd6f126267b83
2019-08-05 03:50:53 -07:00
Alex Hornby
3d27faba08 mononoke/blobstore_healer: add comments and type annotations
Summary: Basically notes I took for myself to truly understand the code.

Reviewed By: StanislavGlebik

Differential Revision: D15908406

fbshipit-source-id: 3f21f7a1ddce8e15ceeeffdb5518fd7f5b1749c4
2019-08-05 03:50:53 -07:00
Alex Hornby
978242fb35 mononoke/repoconfig: update to using chain_err
Reviewed By: StanislavGlebik

Differential Revision: D15072797

fbshipit-source-id: 5339d78de265463ad800d7fb8db8a1444e3fdd6b
2019-08-05 03:50:52 -07:00
Alex Hornby
4322423811 mononoke: Update blobstore_healer for new storage config model
Summary:
Allow blobstore_healer to be directly configured to operate on a blobstore.
This makes two changes:
- The blobstore to operate on is defined in storage.toml (this doesn't
  currently support server.toml-local storage configs)
- Only heal one blobstore at a time. We can run multiple separate instances of the
  healer to heal multiple blobstores.

Reviewed By: HarveyHunt

Differential Revision: D15065422

fbshipit-source-id: 5bc9f1a16fc83ca5966d804b5715b09d359a3832
2019-08-05 03:50:52 -07:00
Alex Hornby
b43dc6e5a7 mononoke: Migrate populate_healer for new storage config data model
Summary: Update populate_healer to act directly on a blobstore config rather than indirectly via a repo config.

Reviewed By: StanislavGlebik

Differential Revision: D15065424

fbshipit-source-id: 638778a61283dc9ed991c49936a21d02b8d2e3f3
2019-08-05 03:50:52 -07:00
Alex Hornby
59b47cf4fa mononoke: Drop repoid from healer structures
Summary:
The healer is a blobstore-level operation, which is orthogonal to the concept of a repo; therefore, there should be no mention of repoid in any of the healer's structures or tables.

For now this leaves the schema unmodified, and fills the repoid with a dummy value (0). We can clean that up later.

Reviewed By: lukaspiatkowski, HarveyHunt

Differential Revision: D15051896

fbshipit-source-id: 438b4c6885f18934228f43d85cdb8bf2f0e542f1
2019-08-05 03:50:51 -07:00
Alex Hornby
f4e304eb09 mononoke/sqlblob: drop repo_id from everywhere
Summary: RepositoryId shouldn't leak into the blobstore layer. This leaves repoid in the schema, but just populates it with a dummy value (0). We can clean up the schema and this code in a later diff.

Reviewed By: StanislavGlebik

Differential Revision: D15021285

fbshipit-source-id: 3ecb04a76ce74409ed0cced3d2a0217eacd0e2fb
2019-08-05 03:50:51 -07:00
Kostia Balytskyi
bc985480e9 mononoke: add the filenode subcommand to the admin tool
Summary:
This is useful to inspect the Mercurial filenodes in Mononoke, like in S183272.

For example, I intend to use this subcommand to verify how well the future linknode healing works.

Reviewed By: krallin

Differential Revision: D16621516

fbshipit-source-id: 4266f85bce29b59072bf9c4f3e63777dae09a4f1
2019-08-02 12:45:57 -07:00
Kostia Balytskyi
1420897ff8 mononoke: separate cs_id fetching from filenode fetching in the admin tool
Summary: Let's separate some concerns.

Reviewed By: krallin

Differential Revision: D16621518

fbshipit-source-id: 2d6ca96b72d5ffbc0fac4a4f9643ecc2acde0ca2
2019-08-02 12:45:57 -07:00
Kostia Balytskyi
432f1f6401 mononoke: in admin, move get_file_nodes to common
Summary: This is needed in the following diff.

Reviewed By: krallin

Differential Revision: D16621517

fbshipit-source-id: 5a50cae7c8b761d7578bcbe5caf302a5ee2578a3
2019-08-02 12:45:57 -07:00
Thomas Orozco
ea059ef2c7 mononoke/benchmark_filestore: add support for testing with caches
Summary: This updates benchmark_filestore to allow testing with caches (notably, Memcache & Cachelib). It also reads twice now, which is nice for caches that aren't filled by us (e.g. Manifold CDN).

Reviewed By: ahornby

Differential Revision: D16584952

fbshipit-source-id: 48ceaa9f2ea393626ac0e5f3988672df020fbc28
2019-08-02 05:40:29 -07:00
Thomas Orozco
bea4a85117 mononoke/types: clean up ContentMetadata out of FileContents
Summary: There's a lot of stuff in file_contents.rs that's not actually about file contents per se. This fixes that.

Reviewed By: ahornby

Differential Revision: D16598905

fbshipit-source-id: 9832b96261264c54809e0c32980cf449f8537517
2019-08-02 03:43:16 -07:00
Thomas Orozco
68569e5d0c mononoke/{types,filestore}: use a separate type for File Chunks
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).

This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to have them be just FileContents.

The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:

- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't contain pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).

Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it is no longer convenient, so it's worth changing :)

Reviewed By: HarveyHunt

Differential Revision: D16598906

fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
2019-08-02 03:43:16 -07:00
Thomas Orozco
cfa4c8332f mononoke/integration: disable blackhole for apiserver tests
Summary:
The network blackhole is causing the API server to occasionally hang while serving requests, which has broken some LFS tests. This appears to have happened in the last month or so, but unfortunately, I haven't been able to root-cause why this is happening.

From what I can tell, we have an hg client that tries an upload to the API Server, and uploads everything... and then the API server just hangs. If I kill the hg client, then the API server responds with a 400 (so it's not completely stuck), but otherwise it seems like the API server is waiting for something to happen on the client-side, but the client isn't sending that.

As far as I can tell, the API Server isn't actually trying to make outbound requests (strace does report that it has a Scribe client that's trying to connect, but Scuba logging isn't enabled, and this is just trying to connect but not send anything), but something with the blackhole is causing this hg / API server interaction to fail.

In the meantime, this diff disables the blackhole for those tests that definitely don't work when it's enabled ...

Reviewed By: HarveyHunt

Differential Revision: D16599929

fbshipit-source-id: c6d77c5428e206cd41d5466e20405264622158ab
2019-08-01 07:36:02 -07:00
Thomas Orozco
7ba44d737c mononoke: add filestore params to configuration
Summary: This updates our repo config to allow passing through Filestore params. This will be useful to conditionally enable Filestore chunking for new repos.

Reviewed By: HarveyHunt

Differential Revision: D16580700

fbshipit-source-id: b624bb524f0a939f9ce11f9c2983d49f91df855a
2019-07-31 11:48:18 -07:00
Thomas Orozco
9b90fd63cb mononoke/blobimport: allow running without re-creating store
Summary:
This allows running blobimport multiple times over the same path locally (with a file blobstore, for example), which is how we use it in prod (though there we don't use the file blobstore, so it already works).

This is helpful when playing around with local changes to blobimport.

Reviewed By: HarveyHunt

Differential Revision: D16580697

fbshipit-source-id: 4a62ff89542f67ce6396948c666244ef40ffe5e7
2019-07-31 11:42:36 -07:00
Thomas Orozco
ac19446c03 mononoke/filestore: dont boxify
Summary: This updates the Filestore to avoid boxifying in its chunking code. The upshot is that this gets us to a place where passing a Send Stream into the Filestore gives you a Send Future back, and passing a non-Send Stream into the Filestore gives you a non-Send Future back (I hinted at this earlier in the diff that introduced faster writes).

Reviewed By: aslpavel

Differential Revision: D16560768

fbshipit-source-id: b77766380f2eaed5919f78cef6fbc02afeead0b9
2019-07-31 05:19:42 -07:00
Thomas Orozco
6c07d8da97 mononoke/filestore: make reads faster
Summary: As noted earlier in this stack, the Filestore read implementation was very inefficient, since it required reading a Chunk before moving on to the next one. Since our Chunked files will typically have just one level of chunking, this is very inefficient (we could be fetching additional chunks ahead of time). This new implementation lets us take advantage of buffering, so we can load arbitrarily as many chunks in parallel as we'd like.
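
A sketch of the buffering idea in modern futures 0.3 syntax (the Filestore code of this era used futures 0.1, and `fetch_chunk` here is just a stand-in for a blobstore fetch):

```rust
use futures::stream::{self, Stream, StreamExt};

/// Stand-in for fetching a single chunk blob from the blobstore.
async fn fetch_chunk(_chunk_id: u64) -> Vec<u8> {
    vec![0u8; 1024]
}

/// Keep up to `concurrency` chunk fetches in flight while preserving order,
/// instead of fetching one chunk at a time.
fn read_chunks(chunk_ids: Vec<u64>, concurrency: usize) -> impl Stream<Item = Vec<u8>> {
    stream::iter(chunk_ids).map(fetch_chunk).buffered(concurrency)
}
```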

Reviewed By: aslpavel

Differential Revision: D16560767

fbshipit-source-id: c02c10c5de0fc5fdc3ee3897ae855b316ea34605
2019-07-31 05:19:42 -07:00
Thomas Orozco
1c6ca01a25 mononoke/filestore: make writes fast
Summary:
This updates the Filestore to make writes faster by farming out all hashing to
separate Tokio tasks. This lets us increase throughput of the Filestore
substantially, since we're no longer limited by the ability of a single core to
hash data.

On my dev server, when running on a 1MB file, this lets us improve the
throughput of the Filestore for writes from 36.50 MB/s (0.29 Gb/s) to 152.61
MB/s (1.19 Gb/s) when using a chunk size of 1MB and a concurrency level of 10
(i.e. 10 concurrent chunk uploads).

Note that the chunk size has a fairly limited impact on performance (e.g.
making it 10KB instead has a <10% impact on performance).

Of course, this doesn't reflect performance when uploading to a remote
blobstore, but note that we can tune that by tweaking our upload concurrency
(making uploads faster at the expense of more memory).

 ---

Note that as part of this change, I updated the implementation away from stream
splitting and into an implementation that fans out to Sinks. I actually had
implementations of a higher-performance filestore for both approaches, but went
with this one because it doesn't require the incoming Stream to be Send (and I
have a forthcoming diff to make the whole Filestore not require a Send input),
which will be useful when integrating with the API Server, which unfortunately
does not provide us with a Send input.
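
As a toy illustration of the fan-out (using OS threads and std's `DefaultHasher`; the actual change spawns Tokio tasks and computes the real content hashes):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Stand-in for hashing one chunk of data.
fn hash_chunk(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(chunk);
    hasher.finish()
}

/// Hash each chunk on its own worker so a single core no longer limits throughput.
fn hash_chunks_in_parallel(chunks: Vec<Vec<u8>>) -> Vec<u64> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || hash_chunk(&chunk)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```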

Reviewed By: aslpavel

Differential Revision: D16560769

fbshipit-source-id: b2e414ea3b47cc4db17f82d982618bbd837f93a9
2019-07-31 05:19:42 -07:00
Thomas Orozco
9bc7076a6a mononoke/filestore: default to no chunking
Summary:
This is a very trivial patch simply intended to make it easier to safely roll
out the Filestore. With this, we can roll out the Filestore with chunking fully
disabled and make sure all hosts know how to read chunked data before we turn
it on.

Reviewed By: aslpavel, HarveyHunt

Differential Revision: D16560844

fbshipit-source-id: 30c49c27e839dfb06b417050c0fcde13296ddade
2019-07-31 05:19:42 -07:00
Thomas Orozco
6a9007e9e0 mononoke: add a filestore benchmark + configurable concurrency
Summary: This adds a filestore benchmark that allows for playing around with Filestore parameters and makes it easier to measure performance improvements.

Reviewed By: aslpavel

Differential Revision: D16559941

fbshipit-source-id: 50a4e91ad07bf6f9fc1efab14aa1ea6c81b9ca27
2019-07-31 05:19:41 -07:00
Thomas Orozco
b4a2d2f36a mononoke/apiserver: make DownloadLargeFile a streaming response
Summary: This updates the API server's DownloadLargeFile method to return a streaming response, instead of buffering all file contents in a Bytes blob.

Reviewed By: aslpavel

Differential Revision: D16494240

fbshipit-source-id: bdbece99215d87be6a65e67f8f2d920933109e15
2019-07-31 05:19:41 -07:00
Thomas Orozco
4e30164506 mononoke/blobimport: support LFS filenodes
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that can provide a LFS blob's contents on stdout when called with the OID and size of a LFS blob. My thinking is to `curl` directly from Dewey when running this in prod.

Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.
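
A hypothetical illustration of the `--lfs-helper` contract described above (the argument order here is an assumption): the helper is invoked with the OID and size, and the blob is read from its stdout.

```rust
use std::process::Command;

/// Ask the LFS helper executable for a blob, identified by OID and size.
fn fetch_lfs_blob(helper: &str, oid: &str, size: u64) -> std::io::Result<Vec<u8>> {
    let output = Command::new(helper).arg(oid).arg(size.to_string()).output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("lfs helper failed for oid {}", oid),
        ));
    }
    Ok(output.stdout)
}
```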

Reviewed By: farnz

Differential Revision: D16494241

fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
2019-07-31 05:19:41 -07:00
Thomas Orozco
7c25a6010e mononoke: UploadHgFileContents: don't buffer contents to compute a Filenode ID
Summary: This updates our UploadHgFileContents::ContentUploaded implementation to not require buffering file contents in order to produce a Mercurial Filenode ID.

Reviewed By: farnz

Differential Revision: D16457833

fbshipit-source-id: ce2c5577ffbe91dfd0de1cac7d85b8d90ded140e
2019-07-31 05:19:40 -07:00
Thomas Orozco
f9360cab9d mononoke/filestore: incorporate in Mononoke
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.

This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:

- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams.

This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.

However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).

The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally; it just updates the underlying API we use to access files. This is also important for a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.

This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints in

Reviewed By: farnz

Differential Revision: D16440671

fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
2019-07-31 05:19:40 -07:00
Thomas Orozco
16801112a2 mononoke/filestore: add store_bytes for compatibility
Summary:
This adds a `store_bytes` call in the Filestore that can be used to store a set of Bytes *without chunking* and return a `FileContents` blob. This is intended as a transitional API while we incorporate the Filestore throughout the codebase. It's useful for 2 reasons:

- It lets us roll out chunked writes gradually where we need it. My goal is to use the Filestore chunking writes API for LFS, but keep using `store_bytes` initially in other places. This means content submitted through regular Mercurial bundles won't be chunked until we feel comfortable with chunking it.
- It lets us use the Filestore in places where we were relying on the assumption that you can immediately turn Bytes into a ContentId (notably: in the `UploadHgFileContents::RawBytes` code path).

This is intended to be removed later.

Reviewed By: aslpavel

Differential Revision: D16440670

fbshipit-source-id: e591f89bb876d08e6b6f805e35f0b791e61a6474
2019-07-31 05:19:40 -07:00
Thomas Orozco
4faaa6b2f7 mononoke/filestore: introduce peek()
Summary:
This adds a new `peek(.., N, ..)` call in the Filestore that allows reading at least N bytes from a file in the Filestore. This is helpful for generating Mercurial metadata blobs.

This is implemented using the same ChunkStream we use to write to the Filestore (but that needed a little fixing to support streams of empty bytes as well).
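
A hedged illustration of the "at least N bytes" behaviour (not the actual ChunkStream code): keep consuming chunks, including empty ones, until enough bytes have been gathered.

```rust
/// Gather chunks until at least `n` bytes are available; callers may get more.
fn peek_at_least(chunks: impl IntoIterator<Item = Vec<u8>>, n: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for chunk in chunks {
        // Empty chunks simply contribute nothing and the loop keeps going.
        out.extend_from_slice(&chunk);
        if out.len() >= n {
            break;
        }
    }
    out
}
```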

Reviewed By: aslpavel

Differential Revision: D16440672

fbshipit-source-id: a584099f87ab34e2151b9c3f5c9f1289575f024b
2019-07-31 05:19:39 -07:00
Thomas Orozco
6d675846bf mononoke/filestore: switch to functions (config is only needed in writes)
Summary: This updates the Filestore API to be a set of functions, instead of a Struct. The rationale here is that there is only one Filestore call that needs anything on top of a blobstore (the write call), so turning the API into functions makes it much easier to incorporate the Filestore in the various places that need it without having to thread a Filestore instance all the way through.

Reviewed By: aslpavel

Differential Revision: D16440668

fbshipit-source-id: 4b4bc8872e205a66a12ec96a478f0f1811f2e6b1
2019-07-31 05:19:39 -07:00
Thomas Orozco
5e8148a968 mononoke: UploadHgFileContents: optimistically reuse HG filenodes
Summary:
This adds support for reusing Mercurial filenodes in
`UploadHgFileContents::execute`.

The motivation behind this is that given file contents, copy info, and parents,
the file node ID is deterministic, but computing it requires fetching and
hashing the body of the file. This implementation implements a lookup through
the blobstore to find a pre-computed filenode ID.

Doing so is in general a little inefficient (but it's not entirely certain that
the implementation I'm proposing here is faster -- more on this below), but
it's particularly problematic for large files. Indeed, fetching a multiple-GB file
to recompute the filenode even if we received it from the client can be fairly
slow (and use up quite a bit of RAM, though that's something we can mitigate by
streaming file contents).

One thing worth noting here (hence the RFC flag) is that there is a bit of a
potential for performance regression. Indeed, we could have a cache miss when
looking up the filenode ID, and then we'll have to fetch the file.

*At this time*, this is also somewhat inefficient, since we'll have to fetch
the file anyway to peek at its contents in order to generate metadata. This
is fixed later in this Filestore stack.

That said, an actual regression seems a little unlikely to happen, since in
practice we'll write out the lookup entry when accepting a pushrebase and
then do a lookup on it later when converting the pushrebased Bonsai changeset
to a Mercurial changeset.

If we're worried, then perhaps adding hit / miss stats on the lookup might make
sense. Let me know what you think.

 ---

Finally, there's a bit I don't love here, which is trusting LFS clients with the size of their uploads.
I'm thinking of fixing this when I finish the Filestore work.

Reviewed By: aslpavel

Differential Revision: D16345248

fbshipit-source-id: 6ce8a191efbb374ff8a1a185ce4b80dc237a536d
2019-07-31 05:19:39 -07:00
Thomas Orozco
386d2025fb mononoke/filestore: add invariant testing (lots of calls, simulated failures)
Summary: This adds invariant testing in the Filestore. Specifically, this tests for the fact that if you can read a file through any alias, you can read it through all aliases, and if you can read a file, you can also read its backmapping.

Reviewed By: StanislavGlebik

Differential Revision: D16440677

fbshipit-source-id: 737f736d99ec91bd6219d145380582341af755ae
2019-07-31 05:19:38 -07:00
Thomas Orozco
064b6a501c mononoke/filestore: rebuild metadata on read
Summary:
NOTE: This diff updates Thrift serialization for chunked files. Normally, this wouldn't be OK, but we never stored anything in this format, so that's fine.

This updates the Filestore to rebuild backmappings on read. The rationale for this is that since we store backmappings *after* writing FileContents, it is possible to have FileContents that logically exist but don't have backmappings. This should be a fairly exceptional case, but we can handle it by recomputing the missing backmappings on the fly.

As part of this change, I'm updating our Thrift serialization for chunked files and backmappings (as mentioned earlier, this is normally not a good idea, but it should be fine here):

- Chunked files now contain chunk lengths, which lets us derive offsets as well as total size for a chunked file.
- Backmappings don't contain the size anymore (since FileContents contains it now).

This was necessary because we need to have a file's size in order to recompute backmappings.
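
A small sketch of what per-chunk lengths buy us (a hypothetical type, not the real Thrift schema): both a file's total size and each chunk's offset can be derived without fetching any chunk data.

```rust
struct ChunkedFile {
    chunk_sizes: Vec<u64>, // length of each chunk, in order
}

impl ChunkedFile {
    /// Total logical file size.
    fn total_size(&self) -> u64 {
        self.chunk_sizes.iter().sum()
    }

    /// Byte offset at which each chunk starts within the logical file.
    fn offsets(&self) -> Vec<u64> {
        self.chunk_sizes
            .iter()
            .scan(0u64, |offset, len| {
                let start = *offset;
                *offset += len;
                Some(start)
            })
            .collect()
    }
}
```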

In this change, I also updated GitSha1 to stop embedding its blob type and length. This lets us reconstruct a strongly-typed GitSha1 object from just the hash (and was therefore necessary to avoid duplicating file size into backmappings), which seems sufficient at this stage (and if we do need the size, we can obtain it through the Filestore by querying the file).

Reviewed By: aslpavel

Differential Revision: D16440676

fbshipit-source-id: 23b66caf40fde2a2f756fef89af9fe0bb8bdadef
2019-07-31 05:19:38 -07:00
Thomas Orozco
c995b9f481 mononoke/filestore: Make ContentMetadata contents mandatory
Summary:
NOTE: This makes Thrift changes, but we never stored anything in this format.

This updates `ContentAliasBackmapping` to make all fields we are currently computing mandatory. We never stored anything using the old format, so this is safe to change for now.

In general, this makes code a little simpler, since we can rely on those fields to exist. Later on, if we add new backmappings, then we'll have to handle cases where they might not exist, but there's no reason to force this work upon ourselves for now if we don't need to.

Reviewed By: aslpavel

Differential Revision: D16440673

fbshipit-source-id: 6f5d0e4a687a2641a9b5e8e518859b796997e22c
2019-07-31 05:19:38 -07:00
Thomas Orozco
455ff23a1d mononoke/filestore: remove unused shared()
Summary: This removes unused calls to `shared()`. This lets us leverage the never type a little more and avoid having to spell out some `unreachable!`'s (instead, the compiler can prove them for us).

Reviewed By: StanislavGlebik

Differential Revision: D16440675

fbshipit-source-id: e53c496962ba9d3920dfae9953f6dc8778295509
2019-07-31 05:19:38 -07:00
Thomas Orozco
051000a6b8 mononoke/filestore: verify hashes before committing a store
Summary:
This adds support for verifying the hashes that were provided by writers (if any) when committing a store. This lets writers do conditional writes (i.e. if the writer knows the SHA-256 of their content, they can ask the blobstore to verify said SHA-256 when uploading).

Note that any uploaded chunks will not be cleaned up if a conditional write fails.
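
A minimal sketch of the conditional-write check, using the `sha2` crate for illustration (the real code verifies whichever hashes the writer supplied):

```rust
use sha2::{Digest, Sha256};

/// Refuse to commit the store if the writer-supplied SHA-256 doesn't match
/// what we actually hashed while consuming the content.
fn verify_sha256(content: &[u8], expected: &[u8; 32]) -> Result<(), String> {
    let actual = Sha256::digest(content);
    if &actual[..] == &expected[..] {
        Ok(())
    } else {
        Err("sha256 mismatch: refusing to commit this store".to_string())
    }
}
```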

Reviewed By: StanislavGlebik

Differential Revision: D16440669

fbshipit-source-id: 88bc99e646616997a4e9d7e59d59315c18f47da9
2019-07-31 05:19:37 -07:00
Thomas Orozco
d7a318f6fc mononoke/filestore: use a strongly typed ExpectedSize to avoid confusing hints and real sizes
Summary: This updates the Filestore implementation to use an ExpectedSize type to differentiate sizes we know to be correct from sizes that were given to us by writers. This ensures we can't accidentally use the ExpectedSize we received from the writer when we should be using the actual observed size.
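
A minimal sketch of the newtype idea (names are assumptions, not the real API): a writer-supplied size hint gets its own type, so it can't silently be used where a verified, observed size is required.

```rust
/// Size claimed by the writer; only a hint until checked against reality.
#[derive(Debug, Clone, Copy)]
pub struct ExpectedSize(u64);

impl ExpectedSize {
    pub fn new(hint: u64) -> Self {
        ExpectedSize(hint)
    }

    /// Compare the hint with the size we actually observed while streaming.
    pub fn check_equals(self, observed: u64) -> Result<u64, String> {
        if self.0 == observed {
            Ok(observed)
        } else {
            Err(format!("expected {} bytes, got {}", self.0, observed))
        }
    }
}
```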

Reviewed By: aslpavel

Differential Revision: D16440674

fbshipit-source-id: 8bf03b9a962339ea2896f2f60d4a52417ca0327e
2019-07-31 05:19:37 -07:00
Thomas Orozco
ff10fff199 mononoke/filestore: introduce chunking support
Summary:
This adds support for chunking in the Filestore. To do so, this reworks writes to be a 2 stage process:

- First, you prepare your write. This puts everything into place, but doesn't upload the aliases, backmappings, or the logical FileContents blob representing your file. If you're uploading a file that fits in a single chunk, preparing your write effectively makes no blobstore changes. If you're uploading chunks, then you upload the individual chunks (those are uploaded essentially as if they were small files).
- Then, you commit your write. At that point, the prepare step gave you `FileContents` that represent your write, and a set of aliases. To commit your write, you write the aliases, then the file contents, and then a backmapping (see the sketch below).
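
A rough sketch of the two-stage shape (names and types are assumptions, not the real Filestore API): `prepare` only uploads chunk data, while `commit` writes the aliases, the logical FileContents blob, and the backmapping.

```rust
struct Prepared {
    contents: Vec<u8>,    // stand-in for the logical FileContents value
    aliases: Vec<String>, // stand-in for the content aliases (e.g. sha256)
}

/// Stage 1: put everything in place. For a single-chunk file this makes no
/// blobstore changes; for a chunked file the individual chunks go up here.
fn prepare(data: &[u8]) -> Prepared {
    Prepared {
        contents: data.to_vec(),
        aliases: vec![format!("len:{}", data.len())],
    }
}

/// Stage 2: write the aliases, then the file contents, then a backmapping.
fn commit(prepared: Prepared) {
    for alias in &prepared.aliases {
        println!("put alias {}", alias);
    }
    println!("put file contents ({} bytes)", prepared.contents.len());
    println!("put backmapping");
}
```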

Note that we never create hierarchies when writing to the Filestore (i.e. chunked files always reference concrete chunks that contain file data), but that's not a guarantee we can rely on when reading. Indeed, since chunks and files are identical as far as blobstore storage is concerned, we have to handle the case where a chunked file references chunks that are themselves chunked (as I mentioned, we won't write it that way, but it could happen if we uploaded a file, then later on reduced the chunk size and wrote an identical file).

Note that this diff also updates the FileContents enum and adds unimplemented methods there. These are all going away later in this stack (in the diff where I incorporate the Filestore into the rest of Mononoke).

Reviewed By: StanislavGlebik

Differential Revision: D16440678

fbshipit-source-id: 07187aa154f4fdcbd3b3faab7c0cbcb1f8a91217
2019-07-31 05:19:37 -07:00
Thomas Orozco
1e18736a6e mononoke/filestore: initial implementation of the FileStore API
Summary:
NOTE: This was Jeremy's original API design for the Filestore. I've left this diff mostly unchanged (since I didn't feel super comfortable commandeering a diff then substantially changing it!), but note that the implementations for reads and writes were largely updated in my next diff, so it's probably not necessary to review them here except for context.

This is a basic implementation of the API. While the API is expressed in terms
of streaming, there's no chunking yet, either in memory or in storage.

Questions:
- Should `store` accept the aliases, or should it only take `ContentId` and always recompute the aliases?
- Should it check the ContentId is correct? It could do so easily now since it's walking the entire stream anyway, but it seems like an expensive API guarantee to make.

TODO:
- Interop testing with existing blobs
- Integrate into the rest of the code
- Spawn hashers into their own tasks for parallelism
- Chunking in later diff(s)

Reviewed By: StanislavGlebik

Differential Revision: D15795778

fbshipit-source-id: e56aad086ae9e3bba0227cf3c6206faac8c97f5e
2019-07-31 05:19:36 -07:00
Thomas Orozco
874e110c3d mononoke: move typed fetch to Blobstore Trait
Summary:
This moves typed fetches from the blobrepo to the blobstore. The upshot is that this allows consumers of a blobstore to do typed fetches, instead of forcing them to get bytes, then cast the bytes to a blob, then cast the blob to the thing they want.

This required refactoring our crate hierarchy a little bit by moving BlobstoreBytes into Mononoke Types, since we now need the Blobstore crate to depend on Mononoke types, whereas it was the other way around before (since BlobstoreValue was already there, that seems reasonable).
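
A hedged sketch of what a typed fetch looks like (trait and method names are assumptions): the caller asks for a concrete type and the bytes-to-value decoding lives in one place instead of at every call site.

```rust
use std::collections::HashMap;

/// Anything that knows how to decode itself from raw blob bytes.
trait BlobstoreValue: Sized {
    fn from_bytes(bytes: &[u8]) -> Result<Self, String>;
}

struct Blobstore {
    blobs: HashMap<String, Vec<u8>>,
}

impl Blobstore {
    fn get_raw(&self, key: &str) -> Option<&[u8]> {
        self.blobs.get(key).map(|b| b.as_slice())
    }

    /// Fetch a blob and decode it straight into the caller's type.
    fn fetch_typed<T: BlobstoreValue>(&self, key: &str) -> Result<Option<T>, String> {
        match self.get_raw(key) {
            None => Ok(None),
            Some(bytes) => T::from_bytes(bytes).map(Some),
        }
    }
}
```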

Reviewed By: StanislavGlebik

Differential Revision: D16486774

fbshipit-source-id: 05751986ce3cb7273d68a8b4ebe9957bb892bcd6
2019-07-31 05:19:36 -07:00
Thomas Orozco
6352d98a3e mononoke: don't require MononokeId impls to know how to recompute themselves from a value
Summary:
Currently, we require in the MononokeId trait that the Id know how to recompute itself from a value (and this has to return `Self`, not `Result<Self>`, so it also can't fail).

This isn't ideal for a few reasons:

- It means we cannot support the MononokeId trait (and therefore typed blobstore fetches) for things that aren't purely a hash of their contents
- One workaround (which jsgf had implemented for ContentAliasBackmappingId) is to peek at the contents and embed an ID in there. A downside of this approach is that we end up parsing the content twice when loading a value from a blobstore (once to get the ID, and once to get the contents later).
- It's a little inefficient in general, since it means we recompute hashes of things we just fetched just to know what their hash should be (which we often proceed to immediately discard afterwards). This could be worth doing if we verified that the ID we got is the ID we wanted, but we don't actually do this right now.

Reviewed By: StanislavGlebik

Differential Revision: D16486775

fbshipit-source-id: a75864eed3efa7e07b8bf642dbac3ada00cadc7c
2019-07-31 05:19:36 -07:00
Thomas Orozco
1871ddc473 mononoke/mononoke_types: Add a ContentAlias and ContentMetadata serialized types
Summary:
ContentAlias - the content of a blob mapping from an alias to the canonical id
ContentMetadata - metadata (aliases, size) for a piece of content

Reviewed By: StanislavGlebik

Differential Revision: D15404826

fbshipit-source-id: 7c5284a73caa873e7655858aa31f4817a1cb648b
2019-07-31 05:19:35 -07:00
Thomas Orozco
68b77f8fd1 mononoke/mononoke_types: add Sha1 and GitSha1 hash types
Summary:
These will be needed as object aliases, so put them on the same
footing as Sha256.

Refactor the implementation of these secondary hash functions so they are
implemented by macro.

Reviewed By: StanislavGlebik

Differential Revision: D15404825

fbshipit-source-id: 2ca65f96003e2b68875fad6f34e7c30d0dc6a8b1
2019-07-31 05:19:35 -07:00