Commit Graph

58576 Commits

Xavier Deguillard
c1a6d6fd21 dirstate: paths in hg are unicode, not bytes
Summary:
This was causing `hg mv` to fail: it tried to hash a unicode path, but
Python 3 refuses to hash anything but bytes.
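
A minimal illustration of the issue (hypothetical helper, not the actual dirstate code): hashing in Python 3 needs bytes, so the unicode path has to be encoded first.

```python
import hashlib

def _hash_path(path: str) -> bytes:
    # hashlib only accepts bytes in Python 3; encode the unicode path first.
    return hashlib.sha1(path.encode("utf-8")).digest()
```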

Reviewed By: DurhamG

Differential Revision: D22235561

fbshipit-source-id: 3eb80b8e02d442a4036ab7be7ea5c139bd24ff5e
2020-06-25 15:48:00 -07:00
Jun Wu
66a0d4b42d treemanifest: avoid using revision numbers in repo.revs
Summary: This removes revnum warning when autopull happens.

Reviewed By: DurhamG

Differential Revision: D22200149

fbshipit-source-id: 1e002c4a8b263352e8b53f9144472ddaf9d915d4
2020-06-25 14:04:14 -07:00
Jun Wu
b1978f2ea5 revlogindex: use util::path::atomic_write_symlink to write nodemap
Summary:
The new `atomic_write_symlink` API handles platform weirdness, especially on
Windows with mmap. Use it to avoid issues.

Reviewed By: DurhamG

Differential Revision: D22225317

fbshipit-source-id: c04a3948c30834e1025a541fc66b371654ed77e4
2020-06-25 13:56:12 -07:00
Jun Wu
5c66d921c6 util: implement a symlink-based atomic_write
Summary:
This diff aims to solve `atomic_write` issues on Windows. Namely:
- `tempfile` leaves files behind if the temp files are not deleted on `Drop`.
- `tempfile` does an unnecessary `chmod`.
- An mmap-ed file has to be deleted before `atomic_write`, giving readers a
  chance to see inconsistent data.

This diff solves the above issues by:
- Using extra GC to clean up older files, instead of relying on a successful `Drop`.
- Not using `tempfile` and not setting permissions.
- Using a symlink, so the symlink can still be atomically replaced while the
  real content is being mmap-ed (sketched below).
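
A rough Python sketch of the symlink-swap idea (illustrative only; the real code is the Rust `util::path::atomic_write_symlink`, and all names below are made up):

```python
import os

def atomic_write_symlink(link_path: str, data: bytes) -> None:
    # Write the real content under a fresh, unique name next to the link.
    target = f"{link_path}.{os.getpid()}.{os.urandom(4).hex()}"
    with open(target, "wb") as f:
        f.write(data)
    # Point a temporary symlink at it, then atomically rename that symlink over
    # the old one. Readers that still have the previous target mmap-ed keep
    # seeing consistent data; stale targets are cleaned up by the GC pass.
    tmp_link = target + ".lnk"
    os.symlink(os.path.basename(target), tmp_link)
    os.replace(tmp_link, link_path)
```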

Reviewed By: DurhamG

Differential Revision: D22225039

fbshipit-source-id: d45bb198a53f8beeef71798cdb9ae57f9b4b8cd3
2020-06-25 13:56:12 -07:00
Jun Wu
570dbef1db util: add PathLock
Summary: RAII lock on a filesystem path.
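
A POSIX-only Python analogue of the concept (not the actual Rust API): the lock is held for the duration of a scope and released when the scope exits.

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def path_lock(path):
    # Hold an exclusive lock on the given path while the with-block runs.
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        yield f
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        f.close()
```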

Reviewed By: DurhamG

Differential Revision: D22226017

fbshipit-source-id: 4e26358bf0b6d4b2440fb77058032dbde8b6d02c
2020-06-25 13:56:12 -07:00
Simon Farnsworth
454de31134 Switch Loadable and Storable interfaces to new-style futures
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch some of the blobstore interfaces to new-style `BoxFuture` with a `'static` lifetime.

This does not enable any fixes at this point, but does mean that `.compat()` moves to the places that need old-style futures instead of new. It also means that the work needed to make the transition fully complete is changed from a full conversion to new futures, to simply changing the lifetimes involved and fixing the resulting compile failures.

Reviewed By: krallin

Differential Revision: D22164315

fbshipit-source-id: dc655c36db4711d84d42d1e81b76e5dddd16f59d
2020-06-25 08:45:37 -07:00
Carolyn Busch
9dd6f8f820 crecord: Fix --interactive mode for py3
Summary: Make crecord Python 3 compatible by using bytes and floor division.
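
For illustration, the kind of Python 3 differences this covers (generic examples, not the crecord diff itself):

```python
content = b"+added line\n"
half = len(content) // 2   # "//" keeps an int; "/" yields a float in Python 3
first = content[0:1]       # slicing bytes gives bytes: b"+"
value = content[0]         # indexing bytes gives an int: 43
```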

Reviewed By: quark-zju

Differential Revision: D22201151

fbshipit-source-id: b7a69aa9cfaa30c75d016f2e0d51f5b955fcc4c0
2020-06-25 08:29:44 -07:00
Mark Thomas
9a6ed4b6ca mutationstore: deal with history being extended backwards
Summary:
If the first client to send mutation data for a commit is only aware of partial
history for that commit, the primordial commit that is determined will be the
earliest of those commits.  If another client comes along later with a longer
history, the new set of commits will be assigned a different primordial commit.

Make sure that when this happens, we still fetch the full history.  We do this
by including the successor in the search-by-primordial case, which allows us
to join together disconnected histories at the cost of one extra round-trip to
the database.

Note that the fast path for addition of a single mutation will not fill in the
missing history.  This is an acceptable trade-off for the faster performance
in the usual case.

Reviewed By: mitrandir77

Differential Revision: D22206317

fbshipit-source-id: 49141d38844d6cddc543b6388f0c31dbc70dcbc5
2020-06-25 06:29:15 -07:00
Mark Thomas
0f229aff4c mutationstore: deal with cycles when determining primordial changesets
Summary:
By design, the mutation history of a commit should not have any cycles.  However,
synthetic entries created by backfilling obsmarkers may inadvertently create
erroneous cycles, which must be correctly ignored by the mutation store.

The mutation store is affected by cycles in two ways:

* Self-referential entries (created by backfilling "revive" obsmarkers) must
  be dropped early on, as these will overwrite any real mutation data for
  that successor.

* Larger cycles will prevent determination of the primordial commit for
  primordial optimization.  Here we drop all entries that are part of the cycle.
  These entries will not be shareable via the mutation store.

Note that it is still possible for cycles to form in the store if they are
added in multiple requests - the first request with a partial cycle will
allow determination of a primordial commit which is then used in subsequent
requests.  That's ok, as client-side cycle detection will break the cycle in
these entries.

As we move away from history that has been backfilled from obsmarkers, this
will become less of a concern, as cycles in pure mutation data are impossible
to create.

Reviewed By: mitrandir77

Differential Revision: D22206318

fbshipit-source-id: a57f30a19c482c7cde01cbd26deac53b7bb5973f
2020-06-25 06:29:15 -07:00
Stanislau Hlebik
b0e910655a mononoke: allow pushing only a single bookmark during push
Summary:
Push supported multiple bookmarks in theory, but in practice we never used it.
Since we want to start logging pushed commits in the next diffs, we need to decide what to do with
bookmarks: at the moment we can log only a single bookmark to scribe.

So let's just allow a single bookmark per push.

Reviewed By: farnz

Differential Revision: D22212674

fbshipit-source-id: 8191ee26337445ce2ef43adf1a6ded3e3832cc97
2020-06-25 05:51:30 -07:00
Stanislau Hlebik
a8209eb712 mononoke: pass PushParams to MononokeRepo
Summary:
In the next diffs it will be passed to unbundle processing so that we can use
the scribe category to log pushed commits.

Reviewed By: krallin

Differential Revision: D22212616

fbshipit-source-id: 17552bda11f102041a043f810125dc381e478611
2020-06-25 05:51:29 -07:00
Stanislau Hlebik
c3276b4c5e mononoke: sync configerator update
Reviewed By: krallin

Differential Revision: D22211347

fbshipit-source-id: 2bb2d02277d06462e4cda9347bfd8a2ae3fe7222
2020-06-25 05:51:29 -07:00
Mark Thomas
96a78404a4 rage: remove obsmarker data collection
Summary:
Remove data collection for obsmarker-related things:

* The obsstore size.

* The last 100 lines of `hg debugobsolete`.

* The unfiltered smartlog.  The data normally available here is replaced by the
  `hg debugmetalog` and `hg debugmutation` output.  This is also usually a very
  slow command.

Reviewed By: quark-zju

Differential Revision: D22207980

fbshipit-source-id: 4f7c0fe6571ad06ac331ced2540752c1937fb0eb
2020-06-25 05:46:11 -07:00
Thomas Orozco
ea734ae0af mononoke/repo_client: log perf counters for long running command
Summary: That was like 50% of the point of this change, and somehow I forgot to do it.

Reviewed By: farnz

Differential Revision: D22231923

fbshipit-source-id: 4a4daaeaa844acd219680907c0b5a5fdacdf535c
2020-06-25 04:13:22 -07:00
Kostia Balytskyi
016b101be9 xrepo: add CommitSyncerArgs
Summary:
Similarly to how we have `PushRedirectorArgs`, we need `CommitSyncerArgs`: a struct which a long-lived process can own, periodically creating a real `CommitSyncer` out of it by consuming a freshly reloaded `CommitSyncConfig`.

It is a little unfortunate that I am introducing yet another struct to `commit_rewriting/cross_repo_sync`, as it's already pretty confusing with `CommitSyncer` and `CommitSyncRepos`, but hopefully `CommitSyncerArgs`'s purpose is simple enough that it can be inferred from the name. Note that this struct does have a few convenience methods, as we need to access things like `target_repo` and various repo ids before we even create a real `CommitSyncer`. This makes its purpose a little less singular, but still fine IMO.

Reviewed By: StanislavGlebik

Differential Revision: D22197123

fbshipit-source-id: e2d993e186075e33acec00200d2aab10fb893ffd
2020-06-25 03:28:08 -07:00
Kostia Balytskyi
7be4b2ee1c backsyncer: get rid of backsync_many
Summary:
This fn is not used anywhere except tests, and its only difference from
`backsync_all_latest` is that it accepts a limit. So let's rename
`backsync_all_latest` to `backsync_latest` and make it accept a limit arg.

I decided to use a custom enum instead of `Option` so that people don't have to
open the fn definition to understand what `BacksyncLimit::Limit(2)` or
`BacksyncLimit::NoLimit` mean.

Reviewed By: StanislavGlebik

Differential Revision: D22187118

fbshipit-source-id: 6bd97bd6e6f3776e46c6031f775739ca6788ec8c
2020-06-25 03:28:08 -07:00
Kostia Balytskyi
8c50e0d870 unbundle: use live_commit_sync_config for push redirection
Summary:
This diff enables `unbundle` flow to start creating `push_redirector` structs from hot-reloaded `CommitSyncConfig` (by using the `LiveCommitSyncConfig` struct).

Using `LiveCommitSyncConfig` unfortunately means we need to make sure that tests which don't use standard fixtures have both the `.toml` and the `.json` commit sync configs present, which is a little verbose. But it's not too horrible.

Reviewed By: StanislavGlebik

Differential Revision: D21962960

fbshipit-source-id: d355210b5dac50d1b3ad277f99af5bab56c9b62e
2020-06-25 03:28:08 -07:00
Kostia Balytskyi
ed34e343c5 commmit_rewriting: introduce live_commit_sync_config
Summary:
`LiveCommitSyncConfig` is intended to be a fundamental struct, on which live push-redirection and the commit sync config for the push-redirector, x-repo sync job, backsyncer, and commit and bookmark validators are based.

The struct wraps a few `ConfigStore` handles, which allows it to query the latest values every time one of the public methods is called. Callers receive parsed structs/values (`true`/`false` for push redirection config, `CommitSyncConfig` for the rest), which they later need to use to build things like `Mover`, `BookmarkRenamer`, `CommitSyncer`, `CommitRepos` and so on. For now the idea is to rebuild these derived structs every time, but we can later add a memoization layer, if the overhead is going to be large.

Reviewed By: StanislavGlebik

Differential Revision: D22095975

fbshipit-source-id: 58e1f1d8effe921b0dc264fffa785593ef188665
2020-06-25 03:28:08 -07:00
Jun Wu
cde14847ee localrepo: remove left-over .tmp files in svfs
Summary:
It seems the `tempfile` crate sometimes fails to delete temporary files.
Work around this by scanning for and deleting them on Windows.
Add logging so we know when to remove the bandaid.
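
A sketch of the workaround (hypothetical names; the real change lives in localrepo and only runs on Windows):

```python
import logging
import os

def _cleanup_tmp_files(svfs_dir: str) -> None:
    # Best effort: remove *.tmp files left behind when tempfile cleanup failed.
    for name in os.listdir(svfs_dir):
        if not name.endswith(".tmp"):
            continue
        path = os.path.join(svfs_dir, name)
        try:
            os.unlink(path)
            logging.info("removed stale temp file: %s", path)
        except OSError:
            # A concurrent writer may still own the file; try again next time.
            pass
```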

Reviewed By: xavierd

Differential Revision: D22222339

fbshipit-source-id: c322134a1e3425294d85578f4649ca75a0e18a76
2020-06-24 23:13:29 -07:00
svcscm
02a0a394aa Updating submodules
Summary:
GitHub commits:

0f064f1117
7265e4e2e8
7dc2620790
ad5cb3c7ea

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: d25d10361574a04296f7f4be2f1a53c93ec5686d
2020-06-24 23:13:29 -07:00
svcscm
f192a2bb6b Updating submodules
Summary:
GitHub commits:

8a3d5b9ce6
be41c61f22

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 28f99fc16e2fc5349f52c5696f6fc1739dea4c4b
2020-06-24 20:40:18 -07:00
Xavier Deguillard
d6ca62a1d7 win32: when removing file, open it as O_TEMPORARY as a last effort
Summary:
When a file is mmap'ed, removing it will always fail, even with all the rename
magic. The only option that works is to ask the OS to remove the file when
there are no other file handles to it. In Python, we can use O_TEMPORARY for
that.
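
Roughly, the last-effort path looks like this (a sketch; the exact flag combination in the real change may differ):

```python
import os

def _remove_when_last_handle_closes(path: str) -> None:
    # Windows only: re-open the file with O_TEMPORARY so the OS deletes it
    # once the last handle to it (including mmaps) goes away.
    fd = os.open(path, os.O_RDWR | os.O_TEMPORARY)
    os.close(fd)
```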

Reviewed By: quark-zju

Differential Revision: D22224572

fbshipit-source-id: bee564a3006c8389f506633da5622aa7a27421ac
2020-06-24 20:34:47 -07:00
Xavier Deguillard
a935fc38b4 inodes: fix casing issue on Windows
Summary:
On Windows, paths are case insensitive (but the filesystem is case preserving),
and thus `open("FILE.TXT")` and `open("file.txt")` refer to the same file. When
that file is not materialized and its parent directory isn't yet enumerated,
PrjFS will call the PRJ_GET_PLACEHOLDER_INFO_CB with the file name passed in to
the `open` call. In this callback, if the passed in name refers to a valid
file, it needs to call PrjWritePlaceholderInfo to populate the directory entry.
Here is what the documentation for that function states:

"For example, if the PRJ_GET_PLACEHOLDER_INFO_CB callback specifies
dir1\dir2\FILE.TXT in callbackData->FilePathName, and the provider's backing
store contains a file called File.txt in the dir1\dir2 directory, and
PrjFileNameCompare returns 0 when comparing the names FILE.TXT and
File.txt, then the provider specifies dir1\dir2\File.txt as the value of
this parameter."

While the documentation doesn't state how that name is used internally, we can
infer (and test) that the returned case will be used as the canonical
representation of that file, ie: the one that a directory listing will see.

Since the PathMap code already does a case insensitive search, we just need to
make sure to use what it returns instead of re-using the name used for the search.

The only caveat to all of this is the original comment describing that
`metadata.name` can't be used as it causes crashes. From what I can tell, this
was written in late 2018, and I believe it is no longer relevant: the
`metadata.name` field was simply not populated.

Reviewed By: wez

Differential Revision: D21799627

fbshipit-source-id: aee877cc2d5f057944fcd39b1d59f0e97de6315c
2020-06-24 18:59:16 -07:00
svcscm
5a0a867e79 Updating submodules
Summary:
GitHub commits:

1a1a47353f
9cc25190e1
e110cdec69
4dadf624f0

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: bdd8a5950b36bb8cfe635bde5fccbf192cf518d6
2020-06-24 18:59:16 -07:00
svcscm
ea5c81cb94 Updating submodules
Summary:
GitHub commits:

78f5a09090
7888ca216e

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 05e54cafa9cbc6d1b5cc4240f2945adbe6750ae2
2020-06-24 16:17:31 -07:00
Stefan Filip
cd49247df6 hgext: remove the churn extension
Summary: Not that useful and does not align with the direction we are headed.

Reviewed By: quark-zju

Differential Revision: D22213796

fbshipit-source-id: ffd86fc1a9207c134448836d0e54e48510a11135
2020-06-24 16:11:39 -07:00
Ailin Zhang
dde49c1c44 add a new column FUSE FETCH to eden top to display fetchCounts
Summary:
updated `eden top` to:
- obtain PID-fetchCounts data from the `getAccessCounts` thrift call updated in the previous diff
- display that data in a new column, `FUSE FETCH`

Reviewed By: kmancini

Differential Revision: D22101430

fbshipit-source-id: 6584e71ce3a4629c73469607ca0a4c6ffd63e46f
2020-06-24 15:56:14 -07:00
Kostia Balytskyi
fbf1564559 config: add commit_sync validation to its Convert impl
Summary:
This diff does three things:
- moves existing `CommitSyncConfig` validation from `config.rs` into
  `convert/commit_sync.rs`, so that any user of `impl Convert for
  RawCommitSyncConfig` gets it for free
- adds another check for `CommitSyncConfig` (the large repo must not be one of
  the small repos; see the sketch after this list)
- adds `RawCommitSyncConfig` validation for something that can be lost when
  looking at `CommitSyncConfig` (no duplicate small repos).
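
A sketch of the two extra checks in Python pseudocode over a hypothetical config shape (the real validation lives in the Rust `Convert` impl):

```python
def validate_commit_sync(config: dict) -> None:
    small_ids = [repo["repoid"] for repo in config["small_repos"]]
    # The large repo must not also be listed as a small repo.
    if config["large_repo_id"] in small_ids:
        raise ValueError("large repo cannot be one of the small repos")
    # No small repo may be listed more than once.
    if len(small_ids) != len(set(small_ids)):
        raise ValueError("duplicate small repos in commit sync config")
```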

Reviewed By: markbt

Differential Revision: D22211897

fbshipit-source-id: a9820cc8baf427da66ce7dfc943e25eb67e1fd6e
2020-06-24 15:45:59 -07:00
Ailin Zhang
c33f8ba494 add PID-fetchCount map data to the result of getAccessCounts
Summary:
This diff updates `getAccessCounts` to
- obtain the PID-fetchCount map data added in the previous diff
- put that data into its `result`

Reviewed By: kmancini

Differential Revision: D22101232

fbshipit-source-id: 1d41715339d418b03c17f6c93a7a497b432973ae
2020-06-24 15:36:51 -07:00
Stefan Filip
f77fedfd80 python3: fix test-hghave.t
Summary: Binary vs string difference for compression input.

Reviewed By: quark-zju

Differential Revision: D22212333

fbshipit-source-id: ffbaab337ef4f1b4518a508814808ba7cbda690a
2020-06-24 15:31:37 -07:00
Xavier Deguillard
bd26254f79 eden: fix windows build
Summary:
Neither `optional` nor `pid_t` was found, so the right includes needed to be
provided. On Windows, the ProcessNameCache isn't compiled (yet), and since it
looks like the process name is optional in the BackingStoreLogger, let's not
provide it for now.

Reviewed By: fanzeyi

Differential Revision: D22215581

fbshipit-source-id: 31a7e7be62cd3d14108dc437d3dfabfb9e62f8d5
2020-06-24 15:12:47 -07:00
svcscm
0a151203c0 Updating submodules
Summary:
GitHub commits:

11256f0b16
df45ceb830
a4cfa8c75e
50d6969816

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 626cbdfe1e5d871cbc5ef81bf83c870dabf8fe6d
2020-06-24 14:39:03 -07:00
Durham Goode
81ec0a4138 py3: fix double invocations of commands when spawning sshpeers
Summary:
The sshaskpass extension forks the current process and attempts to do
some tty magic. If there was an exception though, the exception could go up the
stack and trigger the resumption of the pull logic, resulting in pull executing
twice.

The fix is to move the `_silentexception(terminate=True)` context to wrap the
entire child process, so we're guaranteed to exit in all cases.

I also fixed the str-vs-byte issue that caused the original exception.

Reviewed By: xavierd

Differential Revision: D22211476

fbshipit-source-id: 5f5ca6b33b425e517650f9a83cab605a4c9783de
2020-06-24 14:16:36 -07:00
svcscm
1d8151b758 Updating submodules
Summary:
GitHub commits:

bf9d2d931c
295b1db585
6a80873ddb
aeb4ecf8fb
4353384581
e2e050f00c
92b5ace907
efb32d5979
be29934154
dff4e40c5d
bdcceddb12
d8205f4b6e

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: d3c35813006182bf1ad36980a7d5190156b33538
2020-06-24 13:51:31 -07:00
Meyer Jacobs
08533c2b4f Back out "rage: open subprocesses in text mode"
Summary:
Original commit changeset: d91cee7909f4

This change broke hg rage, see T69004770.

Reviewed By: ikostia, xavierd

Differential Revision: D22210509

fbshipit-source-id: 79c68a41d198e770b1077da7365078d5c1653829
2020-06-24 13:16:56 -07:00
Lukas Piatkowski
14f7dd70e4 Re-sync with internal repository 2020-06-24 21:35:50 +02:00
svcscm
38deb2b0a6 Updating submodules
Summary:
GitHub commits:

132b03b682
80ea0dd98d
c98c0d6515
4485ef0c25
797c70ce2a
58f098e37f
83a4dd1a67
c66438f37b
8584c5cec2
12ae21f1fc
d733ca6bfe
f7c9277f97
a456e2aec9
93e4b6ba82

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 557daa08468f47ceb2f60f99fe6642e9a3d7cad5
2020-06-24 12:29:14 -07:00
Jun Wu
b4d3329bef win32text: remove the extension
Summary: The extension was deprecated in favor of the eol extension.

Reviewed By: DurhamG

Differential Revision: D22129826

fbshipit-source-id: 293a57b4039f424154955454e0a7a74dc7d23069
2020-06-24 11:51:27 -07:00
Xavier Deguillard
5aaebd6e2c eden: decode the manifest body before using it
Summary:
The revision is passed down as a 40-byte ASCII string, and therefore it needs
decoding before being usable.

Reviewed By: DurhamG

Differential Revision: D22210203

fbshipit-source-id: b84bca5f89cbe4f267de1281c1a9ed55409174d2
2020-06-24 11:20:35 -07:00
Xavier Deguillard
136463ca75 eden: fix binary stdin/stdout
Summary:
In Python 3, sys.std{in,out} are opened in text mode, but when actual fds are
passed in, we fdopen them in bytes mode. Since the code expects bytes, let's
use bytes mode for sys.std{in,out} too.
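
A minimal sketch of the resulting setup (hypothetical function and argument names):

```python
import os
import sys

def open_binary_stdio(in_fd=None, out_fd=None):
    # Explicit fds are fdopen'd in bytes mode; otherwise fall back to the
    # binary buffers of sys.std{in,out} so both paths hand the code bytes.
    stdin = os.fdopen(in_fd, "rb") if in_fd is not None else sys.stdin.buffer
    stdout = os.fdopen(out_fd, "wb") if out_fd is not None else sys.stdout.buffer
    return stdin, stdout
```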

Reviewed By: DurhamG

Differential Revision: D22196541

fbshipit-source-id: b913267918af1c3bdf819e243a384312a2a27df0
2020-06-24 11:20:35 -07:00
Viet Hung Nguyen
ebd041b0ec mononoke/tests: modified paths to absolute
Summary: When running integration tests we should make the paths absolute, but so far we have kept them relative, which breaks the tests.

Reviewed By: krallin

Differential Revision: D22209498

fbshipit-source-id: 54ca3def84abf313db32aecfac503c7a42ed6576
2020-06-24 11:17:07 -07:00
Chad Austin
24a1dfc13a define an exit code for abandoned transactions
Summary:
So that automation can detect abandoned transactions and run `hg recover`,
define an exit code for them. None of the codes in sysexits.h seemed to fit
this case, so start with 90.
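
For example, automation could react to the new code like this (a sketch; only the value 90 comes from this change, the constant name is made up):

```python
import subprocess

EX_ABANDONED_TRANSACTION = 90  # exit code defined by this change

result = subprocess.run(["hg", "pull"])
if result.returncode == EX_ABANDONED_TRANSACTION:
    subprocess.run(["hg", "recover"], check=True)
```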

Reviewed By: quark-zju

Differential Revision: D22198980

fbshipit-source-id: 5f267a2671c843f350668daaa14de34752244d4b
2020-06-24 10:55:50 -07:00
Shrikrishna Khare
d4f7e156c7 fbcode_builder: getdeps: Update OpenNSA to 6.5.19
Summary: 6.5.19 is now available; switch OSS to pick that instead of the old 6.5.17.

Reviewed By: rsunkad

Differential Revision: D22199286

fbshipit-source-id: 231346df8d2f918d2226cfe17b01bde12c18a5a7
2020-06-24 10:51:38 -07:00
svcscm
d0b64777e7 Updating submodules
Summary:
GitHub commits:

0c5ddd8cde
1ac36af79f
3f48ab21b3
d0c01e8758
0c17da58ca
4560da98db
842c0f4a1a
e958a833a6
ed4fe5c108
a0f4da5ab3
6a619ec621
c41ab8ee68
2b2ab6a018

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 071bf217dab0dad6073a0d88bab15d0576a8831a
2020-06-24 10:12:23 -07:00
Thomas Orozco
76606260c2 mononoke/lfs_server: automatically consume HTTP response bodies when dropped
Summary:
If we don't read the body for a response, then Hyper cannot return the
connection to the pool. So, let's do it automatically upon dropping. This will
typically happen when we send a request to upstream then don't read the
response.

I seem to remember this used to work fine at some point, but looking at the
code I think it's actually broken now and we don't reuse upstream connections
if we skip waiting for upstream in a batch request. So, let's fix it once and
for all with a more robust abstraction.

Reviewed By: HarveyHunt

Differential Revision: D22206742

fbshipit-source-id: 2da1c008556e1d964c1cc337d58f06f8d691a916
2020-06-24 10:02:02 -07:00
Thomas Orozco
b60ff4403f mononoke/lfs_server: clean up a bit of spawning code
Summary:
This was old Tokio 0.1 code that needed channels for spawns, but in 0.2 that
actually is built into `tokio::spawn`, so let's use that.

Reviewed By: HarveyHunt

Differential Revision: D22206738

fbshipit-source-id: 8f89ca4f7afc8dd888fe289e8d597148976cc54c
2020-06-24 10:02:01 -07:00
Thomas Orozco
e6d8747347 mononoke/lfs_server: don't require reading data streams to drop them
Summary:
This fixes a bit of a tech debt item in the LFS Server. We've had this
discard_stream function for a while, which was necessary because if you just
drop the data stream, you get an error on the sending end.

This makes the code more complex than it needs to be, since you need to always
explicitly discard data streams you don't want instead of just dropping them.

This fixes that by letting us support a sender that tolerates the receiver
being closed, and just ignores those errors.

Reviewed By: HarveyHunt

Differential Revision: D22206739

fbshipit-source-id: d209679b20a3724bcd2e082ebd0d2ce10e9ac481
2020-06-24 10:02:01 -07:00
Thomas Orozco
7f48790fb4 mononoke/lfs_server: refactor upload to make it easier to unit test
Summary:
We have a lot of integration tests for LFS, but a handful of unit tests don't
hurt for some simpler changes. Let's make it easier to write those.

Reviewed By: HarveyHunt

Differential Revision: D22206741

fbshipit-source-id: abcb73b35c01f28dd54cc543cd0a746327d3787b
2020-06-24 10:02:01 -07:00
Thomas Orozco
ce7f53422f mononoke/lfs_server: support the client not having the data it wants to send us
Summary:
This diff is probably going to sound weird ... but xavierd and I both think
this is the best approach for where we are right now. Here is why this is
necessary.

Consider the following scenario

- A client creates a LFS object. They upload it to Mononoke LFS, but not
  upstream.
- The client shares this (e.g. with Sandcastle), and includes a LFS pointer.
- The client tries to push this commit

When this happens, the client might not actually have the object locally.
Indeed, the only data the client is guaranteed to have is its locally-authored
data.

Even if the client does have the blob, that's going to be in the hgcache, and
uploading from the hgcache is a bit sketchy (because, well, it's a cache, so
it's not like it's normally guaranteed to just hold data there for us to push
it to the server).

The problem boils down to a mismatch of assumptions between client and server:

- The client assumes that if the data wasn't locally authored, then the server
  must have it, and will never request this piece of data again.
- The server assumes that if the client offers a blob for upload, it can
  request this blob from the client (and the client will send it).

Those assumptions are obviously not compatible, since we can serve
not-locally-authored data from LFS and yet want the client to upload it, either
because it is missing in upstream or locally.

This leaves us with a few options:

- Upload from the hg cache. As noted above, this isn't desirable, because the
  data might not be there to begin with! Populating the cache on demand (from
  the server) just to push data back to the server would be quite messy.
- Skip the upload entirely, either by having the server not request the upload
  if the data is missing, by having the server report that the upload is
  optional, or by having the client not offer LFS blobs it doesn't have to the
  server, or finally by having the client simply disobey the server if it
  doesn't have the data the server is asking for.

So, why can we not just skip the upload? The answer is: for the same reason we
upload to upstream to begin with. Consider the following scenario:

- Misconfigured client produces a commit, and uploads it to upstream.
- Misconfigured client shares the commit with Sandcastle, and includes a LFS
  pointer.
- Sandcastle wants to push to master, so it goes to check if the blob is
  present in LFS. It isn't (Mononoke LFS checks both upstream and internal, and
  only finds the blob in upstream, so it requests that the client submit the
  blob), but it's also not locally authored, so we skip the upload.
- The client tries to push to Mononoke

This push will fail, because it'll reference LFS data that is not present in
Mononoke (it's only in upstream).

As for how we fix this: the key guarantee made by our proxying mechanism is
that if you write to either LFS server, your data is readable in both (the way
we do this is that if you write to Mononoke LFS, we write it to upstream too,
and if you write to upstream, we can read it from Mononoke LFS too).

What does not matter there is where the data came from. So, when the client
uploads, we simply let it submit a zero-length blob, and if so, we take that to
mean that the client doesn't think it authored data (and thinks we have it), so
we try to figure out where the blob is on the server side.

Reviewed By: xavierd

Differential Revision: D22192005

fbshipit-source-id: bf67e33e2b7114dfa26d356f373b407f2d00dc70
2020-06-24 10:02:01 -07:00
Xavier Deguillard
ecee8f404a remotefilelog: properly decode the meta-flag
Summary:
In Python 3, array indexing into a byte string returns an int, not a string.
Let's instead use the struct module to extract a byte string out of it that we
can then decode afterwards.
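
For illustration (a generic example, not the actual remotefilelog code):

```python
import struct

record = b"\x01rest-of-record"
as_int = record[0]                             # Python 3: 1, an int
(flag,) = struct.unpack_from(">c", record, 0)  # b"\x01", a one-byte bytes value
```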

Reviewed By: DurhamG

Differential Revision: D22097226

fbshipit-source-id: e6b306b4d3bcf2ba08422296603b56fcadbb636e
2020-06-24 09:41:23 -07:00