sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-11 09:17:30 +03:00

History

..
src	mononoke/lfs_server: support the client not having the data it wants to send us	2020-06-24 10:02:01 -07:00
Cargo.toml	eden: remove unused Rust dependencies	2020-06-17 17:55:03 -07:00

Thomas Orozco ce7f53422f mononoke/lfs_server: support the client not having the data it wants to send us

Summary:
This diff is probably going to sound weird ... but xavierd and I both think
this is the best approach for where we are right now. Here is why this is
necessary.

Consider the following scenario:

- A client creates a LFS object. They upload it to Mononoke LFS, but not
  upstream.
- The client shares this (e.g. with Sandcastle), and includes a LFS pointer.
- The client tries to push this commit.

When this happens, the client might not actually have the object locally.
Indeed, the only data the client is guaranteed to have is locally-authored
data.

Even if the client does have the blob, that's going to be in the hgcache, and
uploading from the hgcache is a bit sketchy (because, well, it's a cache, so
it's not like it's normally guaranteed to just hold data there for us to push
it to the server).

The problem boils down to a mismatch of assumptions between client and server:

- The client assumes that if the data wasn't locally authored, then the server
  must have it, and will never request this piece of data again.
- The server assumes that if the client offers a blob for upload, it can
  request this blob from the client (and the client will send it).

Those assumptions are obviously not compatible, since we can serve
not-locally-authored data from LFS and yet want the client to upload it, either
because it is missing in upstream or locally.

This leaves us with a few options:

- Upload from the hg cache. As noted above, this isn't desirable, because the
  data might not be there to begin with! Populating the cache on demand (from
  the server) just to push data back to the server would be quite messy.
- Skip the upload entirely, either by having the server not request the upload
  if the data is missing, by having the server report that the upload is
  optional, or by having the client not offer LFS blobs it doesn't have to the
  server, or finally by having the client simply disobey the server if it
  doesn't have the data the server is asking for.

So, why can we not just skip the upload? The answer is: for the same reason we
upload to upstream to begin with. Consider the following scenario:

- Misconfigured client produces a commit, and uploads it to upstream.
- Misconfigured client shares the commit with Sandcastle, and includes a LFS
  pointer.
- Sandcastle wants to push to master, so it goes to check if the blob is
  present in LFS. It isn't (Mononoke LFS checks both upstream and internal, and
  only finds the blob in upstream, so it requests that the client submit the
  blob), but it's also not locally authored, so we skip the push.
- The client tries to push to Mononoke.

This push will fail, because it'll reference LFS data that is not present in
Mononoke (it's only in upstream).

As for how we fix this: the key guarantee made by our proxying mechanism is
that if you write to either LFS server, your data is readable in both (the way
we do this is that if you write to Mononoke LFS, we write it to upstream too,
and if you write to upstream, we can read it from Mononoke LFS too).

What does not matter there is where the data came from. So, when the client
uploads, we simply let it submit a zero-length blob, and if so, we take that to
mean that the client doesn't think it authored the data (and thinks we have it), so
we try to figure out where the blob is on the server side.
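
The zero-length upload handling described above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the code
in this diff: `BlobStore`, `UploadError` and `handle_upload` are hypothetical
names, both LFS servers are replaced by in-memory maps, and a genuinely empty
LFS object is not treated specially.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one LFS backend (internal or upstream).
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    fn get(&self, oid: &str) -> Option<Vec<u8>> {
        self.blobs.get(oid).cloned()
    }

    fn put(&mut self, oid: &str, data: Vec<u8>) {
        self.blobs.insert(oid.to_string(), data);
    }
}

#[derive(Debug, PartialEq)]
enum UploadError {
    /// The client sent no data and neither backend has the blob.
    NotFound,
}

/// Handle an upload of `oid` (assumed behaviour, simplified).
///
/// An empty body is taken to mean "the client believes the server already has
/// this blob", so the data is looked up on the server side instead.
fn handle_upload(
    internal: &mut BlobStore,
    upstream: &mut BlobStore,
    oid: &str,
    body: Vec<u8>,
) -> Result<(), UploadError> {
    let data = if body.is_empty() {
        // Where the data comes from does not matter: try internal first,
        // then upstream.
        internal
            .get(oid)
            .or_else(|| upstream.get(oid))
            .ok_or(UploadError::NotFound)?
    } else {
        body
    };

    // Keep the proxying guarantee: after a write, the blob is readable
    // from both servers.
    internal.put(oid, data.clone());
    upstream.put(oid, data);
    Ok(())
}
```

In this sketch, a zero-length upload for a blob that exists only in upstream
ends with the blob present in both stores, which is what lets the subsequent
push succeed.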

Reviewed By: xavierd

Differential Revision: D22192005

fbshipit-source-id: bf67e33e2b7114dfa26d356f373b407f2d00dc70

2020-06-24 10:02:01 -07:00
