sapling/eden/mononoke/gotham_ext
Thomas Orozco ce7f53422f mononoke/lfs_server: support the client not having the data it wants to send us
Summary:
This diff is probably going to sound weird ... but xavierd and I both think
this is the best approach for where we are right now. Here is why this is
necessary.

Consider the following scenario

- A client creates a LFS object. They upload it to Mononoke LFS, but not
  upstream.
- The client shares this (e.g. with Sandcastle), and includes a LFS pointer.
- The client tries to push this commit

When this happens, the client might not actually have the object locally.
Indeed, the only pieces of data the client is guaranteed to have is
locally-authored data.

Even if the client does have the blob, that's going to be in the hgcache, and
uploading from the hgcache is a bit sketchy (because, well, it's a cache, so
it's not like it's normally guaranteed to just hold data there for us to push
it to the server).

The problem boils down to a mismatch of assumptions between client and server:

- The client assumes that if the data wasn't locally authored, then the server
  must have it, and will never request this piece of data again.
- The server assumes that if the client offers a blob for upload, it can
  request this blob from the client (and the client will send it).

Those assumptions are obviously not compatible, since we can serve
not-locally-authored data from LFS and yet want the client to upload it, either
because it is missing in upstream or locally.

This leaves us with a few options:

- Upload from the hg cache. As noted above, this isn't desirable, because the
  data might not be there to begin with! Populating the cache on demand (from
  the server) just to push data back to the server would be quite messy.
- Skip the upload entirely, either by having the server not request the upload
  if the data is missing, by having the server report that the upload is
  optional, or by having the client not offer LFS blobs it doens't have to the
  server, or finally by having the client simply disobey the server if it
  doesn't have the data the server is asking for.

So, why can we not just skip the upload? The answer is: for the same reason we
upload to upstream to begin with. Consider the following scenario:

- Misconfigured client produces a commit, and upload it to upstream.
- Misconfigured client shares the commit with Sandcastle, and includes a LFS
  pointer.
- Sandcastle wants to push to master, so it goes to check if the blob is
  present in LFS. It isn't (Mononoke LFS checks both upstream and internal, and
  only finds the blob in upstream, so it requests that the client submit the
  blob), but it's also not not locally authored, so we skip the push.
- The client tries to push to Mononoke

This push will fail, because it'll reference LFS data that is not present in
Mononoke (it's only in upstream).

As for how we fix this: the key guarantee made by our proxying mechanism is
that if you write to either LFS server, your data is readable in both (the way
we do this is that if you write to Mononoke LFS, we write it to upstream too,
and if you write to upstream, we can read it from Mononoke LFS too).

What does not matter there is where the data came from. So, when the client
uploads, we simply let it submit a zero-length blob, and if so, we take that to
mean that the client doesn't think it authored data (and thinks we have it), so
we try to figure out where the blob is on the server side.

Reviewed By: xavierd

Differential Revision: D22192005

fbshipit-source-id: bf67e33e2b7114dfa26d356f373b407f2d00dc70
2020-06-24 10:02:01 -07:00
..
src mononoke/lfs_server: support the client not having the data it wants to send us 2020-06-24 10:02:01 -07:00
Cargo.toml eden: remove unused Rust dependencies 2020-06-17 17:55:03 -07:00