Summary:
This was causing `hg mv` to fail because it tried to hash a unicode path, and
Python 3's hashlib refuses to hash anything but bytes.
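A minimal illustration of the Python 3 behavior in question (generic values, not the actual `hg mv` code path):
```python
import hashlib

path = "dir/file.txt"
# hashlib.sha1(path) raises TypeError in Python 3: strings must be encoded
digest = hashlib.sha1(path.encode("utf-8")).hexdigest()  # hashing bytes works
```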
Reviewed By: DurhamG
Differential Revision: D22235561
fbshipit-source-id: 3eb80b8e02d442a4036ab7be7ea5c139bd24ff5e
Summary:
The new `atomic_write_symlink` API handles platform weirdness, especially
around mmap on Windows. Use it to avoid those issues.
Reviewed By: DurhamG
Differential Revision: D22225317
fbshipit-source-id: c04a3948c30834e1025a541fc66b371654ed77e4
Summary:
This diff aims to solve `atomic_write` issues on Windows. Namely:
- `tempfile` leftovers if temp files are not deleted on `Drop`.
- `tempfile` does an unnecessary `chmod`.
- mmap-ed files have to be deleted before `atomic_write`, giving readers a
  chance to see inconsistent data.
This diff solves the above issues by:
- Using an extra GC to clean up older files, rather than relying on a
  successful `Drop`.
- Not using `tempfile` and not setting permissions.
- Using a symlink, so the symlink can still be atomically replaced while the
  real content is being mmap-ed (see the sketch below).
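The real implementation is in Rust; the following is only a minimal, POSIX-flavored Python sketch of the symlink-swap idea, with hypothetical names:
```python
import os
import tempfile

def atomic_write_symlink(link_path, data):
    # Write the real content to a fresh, uniquely named file first.
    fd, content_path = tempfile.mkstemp(dir=os.path.dirname(link_path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # Atomically swap a symlink over the old one. Readers that still have the
    # previous content file mmap-ed keep a consistent view; only the symlink
    # changes. The old content file is left behind for a later GC pass.
    tmp_link = content_path + ".lnk"
    os.symlink(os.path.basename(content_path), tmp_link)
    os.replace(tmp_link, link_path)
```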
Reviewed By: DurhamG
Differential Revision: D22225039
fbshipit-source-id: d45bb198a53f8beeef71798cdb9ae57f9b4b8cd3
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch some of the blobstore interfaces to new-style `BoxFuture` with a `'static` lifetime.
This does not enable any fixes at this point, but does mean that `.compat()` moves to the places that need old-style futures instead of new. It also means that the work needed to make the transition fully complete is changed from a full conversion to new futures, to simply changing the lifetimes involved and fixing the resulting compile failures.
Reviewed By: krallin
Differential Revision: D22164315
fbshipit-source-id: dc655c36db4711d84d42d1e81b76e5dddd16f59d
Summary: Make crecord Python 3 compatible by using bytes and floor division.
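As a quick reminder of the semantics that changed (a generic illustration, not crecord code):
```python
# Python 3: "/" is true division and returns a float; "//" floors to an int,
# which is what curses coordinate arithmetic needs.
assert 5 / 2 == 2.5
assert 5 // 2 == 2
```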
Reviewed By: quark-zju
Differential Revision: D22201151
fbshipit-source-id: b7a69aa9cfaa30c75d016f2e0d51f5b955fcc4c0
Summary:
If the first client to send mutation data for a commit is only aware of partial
history for that commit, the primordial commit that is determined will be the
earliest of those commits. If another client comes along later with a longer
history, the new set of commits will be assigned a different primordial commit.
Make sure that when this happens, we still fetch the full history. We do this
by including the successor in the search-by-primordial case, which allows us
to join together disconnected histories at the cost of one extra round-trip to
the database.
Note that the fast path for addition of a single mutation will not fill in the
missing history. This is an acceptable trade-off for the faster performance
in the usual case.
Reviewed By: mitrandir77
Differential Revision: D22206317
fbshipit-source-id: 49141d38844d6cddc543b6388f0c31dbc70dcbc5
Summary:
By design, the mutation history of a commit should not have any cycles. However,
synthetic entries created by backfilling obsmarkers may inadvertently create
erroneous cycles, which must be correctly ignored by the mutation store.
The mutation store is affected by cycles in two ways:
* Self-referential entries (created by backfilling "revive" obsmarkers) must
be dropped early on, as these will overwrite any real mutation data for
that successor.
* Larger cycles will prevent determination of the primordial commit for
primordial optimization. Here we drop all entries that are part of the cycle.
These entries will not be shareable via the mutation store.
Note that it is still possible for cycles to form in the store if they are
added in multiple requests: the first request with a partial cycle will
allow determination of a primordial commit, which is then used in subsequent
requests. That's OK, as client-side cycle detection will break the cycle in
these entries.
As we move away from history that has been backfilled from obsmarkers, this
will become less of a concern, as cycles in pure mutation data are impossible
to create.
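A hedged Python sketch of the two checks described above; the entry shape (`successor`, `predecessors`) is illustrative, not the real mutation store types:
```python
def filter_cycles(entries):
    # Drop self-referential entries (a backfilled "revive" obsmarker can list
    # a commit as its own predecessor) so they can't overwrite real data.
    entries = [e for e in entries if e.successor not in e.predecessors]

    # For larger cycles, walk predecessor edges: any entry reachable from
    # itself has no determinable primordial commit, so drop it as well.
    preds = {e.successor: set(e.predecessors) for e in entries}

    def in_cycle(start):
        seen, stack = set(), list(preds.get(start, ()))
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(preds.get(node, ()))
        return False

    return [e for e in entries if not in_cycle(e.successor)]
```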
Reviewed By: mitrandir77
Differential Revision: D22206318
fbshipit-source-id: a57f30a19c482c7cde01cbd26deac53b7bb5973f
Summary:
Push supported multiple bookmarks in theory, but in practice we never used it.
Since we want to start logging pushed commits in the next diffs, we need to
decide what to do with bookmarks. Since at the moment we can only log a single
bookmark to Scribe, let's just allow a single-bookmark push.
Reviewed By: farnz
Differential Revision: D22212674
fbshipit-source-id: 8191ee26337445ce2ef43adf1a6ded3e3832cc97
Summary:
In the next diffs it will be passed to unbundle processing so that we can use
the Scribe category to log pushed commits.
Reviewed By: krallin
Differential Revision: D22212616
fbshipit-source-id: 17552bda11f102041a043f810125dc381e478611
Summary:
Remove data collection for obsmarker-related things:
* The obsstore size.
* The last 100 lines of `hg debugobsolete`.
* The unfiltered smartlog. The data normally available here is replaced by the
`hg debugmetalog` and `hg debugmutation` output. This is also usually a very
slow command.
Reviewed By: quark-zju
Differential Revision: D22207980
fbshipit-source-id: 4f7c0fe6571ad06ac331ced2540752c1937fb0eb
Summary: That was like 50% of the point of this change, and somehow I forgot to do it.
Reviewed By: farnz
Differential Revision: D22231923
fbshipit-source-id: 4a4daaeaa844acd219680907c0b5a5fdacdf535c
Summary:
Similarly to how we have `PushRedirectorArgs`, we need `CommitSyncerArgs`: a struct which a long-living process can own, and from which it can periodically create a real `CommitSyncer` by consuming a freshly reloaded `CommitSyncConfig`.
It is a little unfortunate that I am introducing yet another struct to `commit_rewriting/cross_repo_sync`, as it's already pretty confusing with `CommitSyncer` and `CommitSyncRepos`, but hopefully `CommitSyncerArgs`'s purpose is simple enough that it can be inferred from the name. Note that this struct does have a few convenience methods, as we need to access things like `target_repo` and various repo ids before we even create a real `CommitSyncer`. This makes its purpose a little less singular, but still fine IMO.
Reviewed By: StanislavGlebik
Differential Revision: D22197123
fbshipit-source-id: e2d993e186075e33acec00200d2aab10fb893ffd
Summary:
This fn is not used anywhere except tests, and its only difference from
`backsync_all_latest` is in the fact that it accepts a limit. So let's rename
`backsync_all_latest` into `backsync_latest` and make it accept a limit arg.
I decided to use a custom enum instead of `Option` so that people don't have to
open the fn definition to understand what `BacksyncLimit::Limit(2)` or
`BacksyncLimit::NoLimit` mean.
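The actual change is in Rust; here is the same call-site readability argument rendered as a small Python sketch (the names mirror the commit, everything else is hypothetical):
```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Limit:
    n: int

class NoLimit:
    pass

BacksyncLimit = Union[Limit, NoLimit]

# backsync_latest(ctx, repo, Limit(2)) or backsync_latest(ctx, repo, NoLimit())
# is self-describing at the call site, unlike a bare limit=2 or limit=None.
```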
Reviewed By: StanislavGlebik
Differential Revision: D22187118
fbshipit-source-id: 6bd97bd6e6f3776e46c6031f775739ca6788ec8c
Summary:
This diff enables the `unbundle` flow to start creating `push_redirector` structs from hot-reloaded `CommitSyncConfig` (by using the `LiveCommitSyncConfig` struct).
Using `LiveCommitSyncConfig` unfortunately means we need to make sure that tests which don't use standard fixtures have both the `.toml` and the `.json` commit sync configs present, which is a little verbose. But it's not too horrible.
Reviewed By: StanislavGlebik
Differential Revision: D21962960
fbshipit-source-id: d355210b5dac50d1b3ad277f99af5bab56c9b62e
Summary:
`LiveCommitSyncConfig` is intended to be a fundamental struct, on which live push-redirection and the commit sync config for the push-redirector, x-repo sync job, backsyncer, and commit and bookmark validators are based.
The struct wraps a few `ConfigStore` handles, which allows it to query latest values every time one of the public methods is called. Callers receive parsed structs/values (`true`/`false` for push redirection config, `CommitSyncConfig` for the rest), which they later need to use to build things like `Mover`, `BookmarkRenamer`, `CommitSyncer`, `CommitRepos` and so on. For now the idea is to rebuild these derived structs every time, but we can later add a memoization layer, if the overhead is going to be large.
Reviewed By: StanislavGlebik
Differential Revision: D22095975
fbshipit-source-id: 58e1f1d8effe921b0dc264fffa785593ef188665
Summary:
It seems the `tempfile` crate sometimes fails to delete temporary files.
Work around it by scanning for and deleting them on Windows.
Add logging so we know when to remove the bandaid.
Reviewed By: xavierd
Differential Revision: D22222339
fbshipit-source-id: c322134a1e3425294d85578f4649ca75a0e18a76
Summary:
When a file is mmap-ed, removing it will always fail, even with all the rename
magic. The only option that works is to ask the OS to remove the file when
there are no other file handles to it. In Python, we can use `O_TEMPORARY` for
that.
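A minimal sketch of that Python pattern; `os.O_TEMPORARY` and `os.O_BINARY` only exist on Windows builds of Python:
```python
import os

# O_TEMPORARY asks the OS to delete the file once the last handle to it is
# closed, which sidesteps "file in use" failures for mmap-ed files.
fd = os.open("some_file.bin",
             os.O_RDWR | os.O_CREAT | os.O_BINARY | os.O_TEMPORARY)
try:
    pass  # use the file
finally:
    os.close(fd)  # the file goes away when the final handle closes
```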
Reviewed By: quark-zju
Differential Revision: D22224572
fbshipit-source-id: bee564a3006c8389f506633da5622aa7a27421ac
Summary:
On Windows, paths are case insensitive (but the filesystem is case preserving),
and thus `open("FILE.TXT")` and `open("file.txt")` refer to the same file. When
that file is not materialized and its parent directory isn't yet enumerated,
PrjFS will call the PRJ_GET_PLACEHOLDER_INFO_CB with the file name passed in to
the `open` call. In this callback, if the passed-in name refers to a valid
file, it needs to call PrjWritePlaceholderInfo to populate the directory entry.
Here is what the documentation for that function states:
"For example, if the PRJ_GET_PLACEHOLDER_INFO_CB callback specifies
dir1\dir1\FILE.TXT in callbackData->FilePathName, and the provider's backing
store contains a file called File.txt in the dir1\dir2 directory, and
PrjFileNameCompare returns 0 when comparing the names FILE.TXT and
File.txt, then the provider specifies dir1\dir2\File.txt as the value of
this parameter."
While the documentation doesn't state how that name is used internally, we can
infer (and test) that the returned case will be used as the canonical
representation of that file, i.e. the one that a directory listing will see.
Since the PathMap code already does a case insensitive search, we just need to
make sure to use what it returns instead of re-using the name used for the search.
The only caveat to all of this is the original comment describing that
`metadata.name` can't be used as it causes crashes. From what I can tell, this
was written in late 2018, and I believe it is no longer relevant: the
`metadata.name` field was simply not populated.
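A toy Python model of the case-insensitive, case-preserving lookup (hypothetical class, not the EdenFS PathMap API):
```python
class CasePreservingMap:
    """Lookups are case-insensitive, but the canonical (stored) name is what
    gets returned -- mirroring what we now hand back to PrjFS."""

    def __init__(self):
        self._entries = {}  # lower-cased name -> (canonical name, value)

    def insert(self, name, value):
        self._entries[name.lower()] = (name, value)

    def lookup(self, name):
        # Return the stored name, not the name used for the search.
        return self._entries.get(name.lower())

m = CasePreservingMap()
m.insert("File.txt", "contents")
assert m.lookup("FILE.TXT") == ("File.txt", "contents")
```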
Reviewed By: wez
Differential Revision: D21799627
fbshipit-source-id: aee877cc2d5f057944fcd39b1d59f0e97de6315c
Summary: Not that useful and does not align with the direction we are headed.
Reviewed By: quark-zju
Differential Revision: D22213796
fbshipit-source-id: ffd86fc1a9207c134448836d0e54e48510a11135
Summary:
Updated `eden top` to:
- obtain per-PID fetch-count data from the `getAccessCounts` Thrift call updated in the previous diff
- display that data in a new `FUSE FETCH` column
Reviewed By: kmancini
Differential Revision: D22101430
fbshipit-source-id: 6584e71ce3a4629c73469607ca0a4c6ffd63e46f
Summary:
This diff does three things:
- moves existing `CommitSyncConfig` validation from `config.rs` into
`convert/commit_sync.rs`, so that any user of `impl Convert for
RawCommitSyncConfig` gets it for free
- adds another `CommitSyncConfig` validation: the large repo must not be one
of the small repos
- adds `RawCommitSyncConfig` validation for something that can be lost when
looking at `CommitSyncConfig` (no duplicate small repos).
Reviewed By: markbt
Differential Revision: D22211897
fbshipit-source-id: a9820cc8baf427da66ce7dfc943e25eb67e1fd6e
Summary:
This diff updates `getAccessCounts` to:
- obtain the per-PID fetch-count map added in the previous diff
- put that data into its `result`
Reviewed By: kmancini
Differential Revision: D22101232
fbshipit-source-id: 1d41715339d418b03c17f6c93a7a497b432973ae
Summary:
Neither `optional` nor `pid_t` was found, and the right includes needed to be
provided. On Windows, the ProcessNameCache isn't compiled (yet), and since it
looks like the process name is optional in the BackingStoreLogger, let's not
provide it for now.
Reviewed By: fanzeyi
Differential Revision: D22215581
fbshipit-source-id: 31a7e7be62cd3d14108dc437d3dfabfb9e62f8d5
Summary:
The sshaskpass extension forks the current process and attempts to do some tty
magic. If there was an exception, though, it could propagate up the stack and
trigger the resumption of the pull logic, resulting in the pull executing
twice.
The fix is to move the `_silentexception(terminate=True)` context to wrap the
entire child process, so we're guaranteed to exit in all cases (see the sketch
below).
I also fixed the str-vs-bytes issue that caused the original exception.
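The shape of the fix, as a hedged Python sketch (the `_silentexception` stand-in below is a simplification of the real helper):
```python
import contextlib
import os

@contextlib.contextmanager
def _silentexception(terminate=False):
    # Simplified stand-in: swallow any exception and, if asked, terminate the
    # process so control never returns to the caller.
    try:
        yield
    except Exception:
        if terminate:
            os._exit(1)

pid = os.fork()
if pid == 0:
    # Child: the guard wraps *everything*, so no exception can propagate back
    # into the pull logic inherited from the parent.
    with _silentexception(terminate=True):
        pass  # the tty magic goes here
    os._exit(0)
```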
Reviewed By: xavierd
Differential Revision: D22211476
fbshipit-source-id: 5f5ca6b33b425e517650f9a83cab605a4c9783de
Summary: The extension was deprecated by the eol extension.
Reviewed By: DurhamG
Differential Revision: D22129826
fbshipit-source-id: 293a57b4039f424154955454e0a7a74dc7d23069
Summary:
The revision is passed down as a 40-byte ASCII string, and therefore it needs
decoding before being usable.
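For illustration (generic values, not the actual call site):
```python
rev = b"1f0c1a9b" + b"0" * 32   # 40 hex characters, but as bytes
node = rev.decode("ascii")      # the str the Python code expects
assert len(node) == 40
```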
Reviewed By: DurhamG
Differential Revision: D22210203
fbshipit-source-id: b84bca5f89cbe4f267de1281c1a9ed55409174d2
Summary:
In Python 3, sys.std{in,out} are opened in text mode, but when actual fds are
passed in, we fdopen them ourselves. Since the code expects these to be bytes,
let's fdopen them in bytes mode.
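A minimal sketch of the idea, with a hypothetical helper name:
```python
import os
import sys

def open_input(fd=None):
    # sys.stdin exposes a binary view via .buffer; when we're handed a raw fd
    # instead, fdopen it in bytes mode ("rb") rather than text mode ("r"),
    # so both paths yield bytes.
    if fd is None:
        return sys.stdin.buffer
    return os.fdopen(fd, "rb")
```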
Reviewed By: DurhamG
Differential Revision: D22196541
fbshipit-source-id: b913267918af1c3bdf819e243a384312a2a27df0
Summary: When running integration tests we should make the paths absolute, but so far we kept them relative. This resulted in breaking the tests.
Reviewed By: krallin
Differential Revision: D22209498
fbshipit-source-id: 54ca3def84abf313db32aecfac503c7a42ed6576
Summary:
So that automation can detect abandoned transactions and run `hg recover`,
define an exit code. None of sysexits.h seemed to fit this case, so
start with 90.
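What automation built on top of this might look like (hypothetical commands; only the exit code 90 comes from this change):
```python
import subprocess

result = subprocess.run(["hg", "pull"])
if result.returncode == 90:
    # An abandoned transaction was detected; recover and retry.
    subprocess.run(["hg", "recover"], check=True)
    subprocess.run(["hg", "pull"], check=True)
```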
Reviewed By: quark-zju
Differential Revision: D22198980
fbshipit-source-id: 5f267a2671c843f350668daaa14de34752244d4b
Summary: 6.5.19 is now available; switch OSS to pick that instead of the old 6.5.17.
Reviewed By: rsunkad
Differential Revision: D22199286
fbshipit-source-id: 231346df8d2f918d2226cfe17b01bde12c18a5a7
Summary:
If we don't read the body for a response, then Hyper cannot return the
connection to the pool. So, let's do it automatically upon dropping. This will
typically happen when we send a request to upstream then don't read the
response.
I seem to remember this used to work fine at some point, but looking at the
code I think it's actually broken now and we don't reuse upstream connections
if we skip waiting for upstream in a batch request. So, let's fix it once and
for all with a more robust abstraction.
Reviewed By: HarveyHunt
Differential Revision: D22206742
fbshipit-source-id: 2da1c008556e1d964c1cc337d58f06f8d691a916
Summary:
This was old Tokio 0.1 code that needed channels for spawns, but in 0.2 that
is built into `tokio::spawn`, so let's use that.
Reviewed By: HarveyHunt
Differential Revision: D22206738
fbshipit-source-id: 8f89ca4f7afc8dd888fe289e8d597148976cc54c
Summary:
This fixes a bit of a tech debt item in the LFS Server. We've had this
`discard_stream` function for a while, which was necessary because if you just
drop the data stream, you get an error on the sending end.
This makes the code more complex than it needs to be, since you always need to
explicitly discard data streams you don't want instead of just dropping them.
This fixes that by letting us support a sender that tolerates the receiver
being closed, and just ignores those errors.
Reviewed By: HarveyHunt
Differential Revision: D22206739
fbshipit-source-id: d209679b20a3724bcd2e082ebd0d2ce10e9ac481
Summary:
We have a lot of integration tests for LFS, but a handful of unit tests don't
hurt for some simpler changes. Let's make it easier to write those.
Reviewed By: HarveyHunt
Differential Revision: D22206741
fbshipit-source-id: abcb73b35c01f28dd54cc543cd0a746327d3787b
Summary:
This diff is probably going to sound weird ... but xavierd and I both think
this is the best approach for where we are right now. Here is why this is
necessary.
Consider the following scenario
- A client creates a LFS object. They upload it to Mononoke LFS, but not
upstream.
- The client shares this (e.g. with Sandcastle), and includes a LFS pointer.
- The client tries to push this commit
When this happens, the client might not actually have the object locally.
Indeed, the only pieces of data the client is guaranteed to have is
locally-authored data.
Even if the client does have the blob, that's going to be in the hgcache, and
uploading from the hgcache is a bit sketchy (because, well, it's a cache, so
it's not like it's normally guaranteed to just hold data there for us to push
it to the server).
The problem boils down to a mismatch of assumptions between client and server:
- The client assumes that if the data wasn't locally authored, then the server
must have it, and will never request this piece of data again.
- The server assumes that if the client offers a blob for upload, it can
request this blob from the client (and the client will send it).
Those assumptions are obviously not compatible, since we can serve
not-locally-authored data from LFS and yet want the client to upload it, either
because it is missing in upstream or locally.
This leaves us with a few options:
- Upload from the hg cache. As noted above, this isn't desirable, because the
data might not be there to begin with! Populating the cache on demand (from
the server) just to push data back to the server would be quite messy.
- Skip the upload entirely, either by having the server not request the upload
if the data is missing, by having the server report that the upload is
optional, by having the client not offer LFS blobs it doesn't have to the
server, or finally by having the client simply disobey the server if it
doesn't have the data the server is asking for.
So, why can we not just skip the upload? The answer is: for the same reason we
upload to upstream to begin with. Consider the following scenario:
- Misconfigured client produces a commit, and uploads it to upstream.
- Misconfigured client shares the commit with Sandcastle, and includes a LFS
pointer.
- Sandcastle wants to push to master, so it goes to check if the blob is
present in LFS. It isn't (Mononoke LFS checks both upstream and internal, and
only finds the blob in upstream, so it requests that the client submit the
blob), but it's also not locally authored, so the upload would be skipped.
- The client tries to push to Mononoke
This push will fail, because it'll reference LFS data that is not present in
Mononoke (it's only in upstream).
As for how we fix this: the key guarantee made by our proxying mechanism is
that if you write to either LFS server, your data is readable in both (the way
we do this is that if you write to Mononoke LFS, we write it to upstream too,
and if you write to upstream, we can read it from Mononoke LFS too).
What does not matter there is where the data came from. So, when the client
uploads, we simply let it submit a zero-length blob, and if so, we take that to
mean that the client doesn't think it authored data (and thinks we have it), so
we try to figure out where the blob is on the server side.
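A hedged sketch of the server-side convention described above (the store objects and handler are hypothetical stand-ins, not the real LFS server code):
```python
internal_store = {}  # stand-in for Mononoke LFS storage
upstream = {}        # stand-in for the upstream LFS server

def handle_upload(oid, body):
    if not body:
        # Zero-length body: the client doesn't think it authored this blob
        # and believes we have it, so resolve it on the server side.
        blob = internal_store.get(oid) or upstream.get(oid)
        if blob is None:
            raise LookupError("blob %s not found on either server" % oid)
        internal_store[oid] = blob  # readable from Mononoke LFS...
        upstream[oid] = blob        # ...and from upstream, per the guarantee
    else:
        internal_store[oid] = body
        upstream[oid] = body        # writes always go to both
```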
Reviewed By: xavierd
Differential Revision: D22192005
fbshipit-source-id: bf67e33e2b7114dfa26d356f373b407f2d00dc70
Summary:
In Python 3, indexing into a byte string returns an int, not a string.
Let's instead use the struct module to extract a byte string that we
can then decode afterwards.
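A small illustration of both the pitfall and the fix:
```python
import struct

data = b"\x0brest"
first = data[0]                           # Python 3: an int (11), not b"\x0b"
(byte,) = struct.unpack_from(">c", data)  # b"\x0b" on both Python 2 and 3
assert byte == data[0:1]                  # slicing also yields bytes
```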
Reviewed By: DurhamG
Differential Revision: D22097226
fbshipit-source-id: e6b306b4d3bcf2ba08422296603b56fcadbb636e