Summary:
The Rust bindings handle the cross-platform differences and avoid issues
with Python / Rust interaction. Use them.
While we're here, extend the API to support cwd and env.
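The bindings' signature is not spelled out here. As a point of reference for what the two new options mean, here is a minimal sketch using Python 3's stdlib `subprocess`, which accepts the same `cwd` and `env` options. `run_in` is a hypothetical helper, not the bindings' API:

```python
import os
import subprocess
import sys


def run_in(args, cwd, extra_env):
    """Run `args` in directory `cwd` with extra environment variables.

    Hypothetical helper. It copies the current environment first so that
    variables the child needs (PATH, SYSTEMROOT on Windows, ...) survive.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        args,
        cwd=cwd,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout
```

For example, `run_in([sys.executable, "-c", "import os; print(os.getcwd())"], cwd=some_dir, extra_env={})` prints `some_dir` from the child's point of view.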
Reviewed By: DurhamG
Differential Revision: D23124171
fbshipit-source-id: fdc13f6eaeb25c05b53d385eb220af33dad984e1
Summary:
Spawning processes turns out to be tricky.
Python 2:
- "fork & exec" in plain Python is potentially dangerous. See D22855986 (c35b8088ef).
Disabling GC might have solved it, but still seems fragile.
- "close_fds=True" works on Windows if there is no redirection.
- Does not work well with `disable_standard_handle_inheritability` from `hgmain`.
We patched it. See `contrib/python2-winbuild/0002-windows-make-subprocess-work-with-non-inheritable-st.patch`.
Python 3:
- "subprocess" uses native code for "fork & exec". It's safer.
- (>= 3.8) "close_fds=True" works on Windows even with redirection.
- "subprocess" exposes options to tweak low-level details on Windows.
Rust:
- No "close_fds=True" support on either Windows or Unix.
- Does not have the `disable_standard_handle_inheritability` issue on Windows.
- Impossible to cleanly support "close_fds=True" on Windows with existing stdlib.
https://github.com/rust-lang/rust/pull/75551 attempts to add that to stdlib.
D23124167 provides a short-term solution that can have corner cases.
Mercurial:
- `win32.spawndetached` uses raw Win32 APIs to spawn processes, bypassing
the `subprocess` Python stdlib.
- Its use of `CreateProcessA` is undesirable. We probably want `CreateProcessW`
(unless `CreateProcessA` speaks utf-8 natively).
We are still on Python 2 on Windows, we'd need to spawn processes correctly
from Rust anyway, and D23124167 more or less fills in the missing `close_fds=True`
feature from Python. So let's expose the Rust APIs.
The binding APIs closely match the Rust API. So when we migrate from Python to
Rust, the translation is more straightforward.
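To make the "close_fds=True" behavior concrete, here is a minimal POSIX-only sketch using Python 3's `subprocess`. It simulates an fd opened without F_CLOEXEC and asks a child process whether that same pipe is still reachable. `fd_visible_in_child` is a hypothetical helper; Windows handle inheritance works differently and is not covered:

```python
import os
import subprocess
import sys

# Child script: report whether fd N refers to the same pipe (inode) as in
# the parent, so an unrelated file that reuses the number is not miscounted.
_CHILD = (
    "import os, sys\n"
    "fd, ino = int(sys.argv[1]), int(sys.argv[2])\n"
    "try:\n"
    "    same = os.fstat(fd).st_ino == ino\n"
    "except OSError:\n"
    "    same = False\n"
    "print('open' if same else 'closed')\n"
)


def fd_visible_in_child(close_fds):
    """Return 'open' if an inheritable fd leaks into a child, else 'closed'."""
    r, w = os.pipe()
    try:
        os.set_inheritable(r, True)  # simulate an fd lacking F_CLOEXEC
        ino = os.fstat(r).st_ino
        out = subprocess.check_output(
            [sys.executable, "-c", _CHILD, str(r), str(ino)],
            close_fds=close_fds,
            universal_newlines=True,
        )
        return out.strip()
    finally:
        os.close(r)
        os.close(w)
```

With `close_fds=True` the child reports `closed`; with `close_fds=False` the inheritable pipe leaks and the child reports `open`.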
Reviewed By: DurhamG
Differential Revision: D23124168
fbshipit-source-id: 94a404f19326e9b4cca7661da07a4b4c55bcc395
Summary:
Upstream Rust took the "set F_CLOEXEC on every opened file" approach and,
to keep spawning lightweight, provides no support for closing fds at spawn time [1].
However, that does not play well in our case:
- On Windows:
  - stdin/stdout/stderr are not created by Rust and are inheritable by
    default (other processes like `cargo` or `dotslash` might leak them too).
  - A few other handles like "Null" and "Afd" are inheritable. It's
    unclear how they get created, though.
  - Fortunately, files opened by Python or C in edenscm (ex. packfiles) seem
    to be non-inheritable and do not require special handling.
- On Linux:
  - Files opened by Python or C likely lack F_CLOEXEC and need special
    handling.
Implement logic to close file handles (or set F_CLOEXEC) explicitly.
[1]: https://github.com/rust-lang/rust/issues/12148
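For the Linux half, a minimal Python sketch of the "set F_CLOEXEC explicitly" idea. The actual implementation is in Rust and is not shown here; `mark_fds_noninheritable` is hypothetical. It relies on the fact that on POSIX `os.set_inheritable(fd, False)` sets FD_CLOEXEC:

```python
import os


def mark_fds_noninheritable(min_fd=3):
    """Set FD_CLOEXEC on every open fd >= min_fd. POSIX only.

    Hypothetical helper. stdin/stdout/stderr (fds 0, 1, 2) are left alone.
    Returns the number of fds that were changed.
    """
    if os.path.isdir("/proc/self/fd"):
        # Linux: enumerate only the fds that are currently open.
        candidates = [int(name) for name in os.listdir("/proc/self/fd")]
    else:
        # Other POSIX systems: probe every fd below the (capped) soft limit.
        import resource

        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft == resource.RLIM_INFINITY or soft > 65536:
            soft = 65536
        candidates = range(soft)
    changed = 0
    for fd in candidates:
        if fd < min_fd:
            continue
        try:
            if os.get_inheritable(fd):
                os.set_inheritable(fd, False)
                changed += 1
        except OSError:
            # Not open, e.g. the directory fd that os.listdir() already closed.
            pass
    return changed
```

Setting the flag instead of closing the fds keeps them usable in the current process; they only disappear in exec'd children.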
Reviewed By: DurhamG
Differential Revision: D23124167
fbshipit-source-id: 32f3a1b9e3ae3a9475609df282151c9d6c4badd4
Summary:
It uses `sys.argv`, which might be rewritten by `debugshell`. Capture
`sys.argv` to make hgcmd more reliable.
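A minimal sketch of the capture, assuming the snapshot is taken once at import time. `_original_argv` and `hgcmd_argv` are hypothetical names:

```python
import sys

# Snapshot taken once, at import time. A later rewrite of sys.argv (by
# reassignment or in-place mutation) does not affect this copy.
_original_argv = list(sys.argv)


def hgcmd_argv():
    """Return a copy of the command line captured at import time."""
    return list(_original_argv)
```

Returning a copy also protects the snapshot from callers that mutate the returned list.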
Reviewed By: DurhamG
Differential Revision: D22993215
fbshipit-source-id: 5fa319e8023b656c6cdf96cb3229ea9f2c9b9b99
Summary: This allows us to run commands after changes were made to the repo.
Reviewed By: DurhamG
Differential Revision: D22993218
fbshipit-source-id: d9943dcda94da42970fb9107f48f4caa14b6a9d4
Summary:
Some code paths (ex. metalog.commit) use `util.timer()` as a way to get
seconds since epoch, and get 0 for tests. Other uses of `util.timer()`
are ad-hoc time measurements for displaying speed / progress. They do not need high
precision or a strong guarantee that the clock does not go backwards. Drop
`time.perf_counter()` to meet the first use-case's expectation.
Reviewed By: singhsrb
Differential Revision: D23431253
fbshipit-source-id: 8bf2d1ed32e284e17285742e1d0fd7178f181fb3
Summary:
With the segments backend, revision numbers will be longer than commit hashes,
which is confusing.
Reviewed By: DurhamG
Differential Revision: D23408971
fbshipit-source-id: e2057fa644fc7b6be4291f879eee3235bb4e687b
Summary:
Pulling from older repos (ex. years ago) could require GBs of commit text data.
Flush commit data if it exceeds a certain size.
This is for revlog compatibility.
In the future, we will probably just make commit text lazy to avoid this kind of issue.
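A minimal sketch of size-bounded flushing. `BufferedCommitWriter`, `flush_fn` and `threshold_bytes` are hypothetical; the real threshold and flush target are not specified here:

```python
class BufferedCommitWriter(object):
    """Buffer (node, text) pairs and flush once they exceed a byte budget."""

    def __init__(self, flush_fn, threshold_bytes):
        self._flush_fn = flush_fn  # receives a list of (node, text) pairs
        self._threshold = threshold_bytes
        self._pending = []
        self._pending_bytes = 0

    def add(self, node, text):
        self._pending.append((node, text))
        self._pending_bytes += len(text)
        if self._pending_bytes > self._threshold:
            self.flush()

    def flush(self):
        if self._pending:
            self._flush_fn(self._pending)
            # Rebind instead of clearing, so flush_fn may keep the old list.
            self._pending = []
            self._pending_bytes = 0
```

Memory stays bounded by roughly `threshold_bytes` plus one commit, no matter how many commits a pull brings in; a final `flush()` writes the remainder.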
Reviewed By: DurhamG
Differential Revision: D23408834
fbshipit-source-id: 273384f5a05be07877bb1c9871c17b53ba436233
Summary: This would be used to avoid excessive memory usage during pull.
Reviewed By: DurhamG
Differential Revision: D23408833
fbshipit-source-id: 8edd95ab8201697074f65cc118d14755a230567d
Summary:
`addcommits` is designed to be more efficient when called with a batch of
commits. So let's buffer the commits to add, then call it only once.
This avoids some N^2 behaviors. For example, the NameDag internally
prepares a "snapshot" of itself, which involves copying the pending Rust vecs
for the segments and the id <-> hash map.
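A toy model of the N^2 behavior, assuming (as described above) that every `addcommits` call copies all pending data into a snapshot. `ToyDag` and both helpers are hypothetical and only count copied elements:

```python
class ToyDag(object):
    """Toy stand-in for a DAG whose addcommits() snapshots pending state."""

    def __init__(self):
        self.pending = []
        self.copied = 0  # total elements copied into snapshots

    def addcommits(self, commits):
        self.pending.extend(commits)
        # Model the per-call "snapshot": copy everything pending so far.
        snapshot = list(self.pending)
        self.copied += len(snapshot)


def add_one_by_one(commits):
    dag = ToyDag()
    for c in commits:
        dag.addcommits([c])
    return dag.copied


def add_batched(commits):
    dag = ToyDag()
    buffered = list(commits)  # buffer first...
    dag.addcommits(buffered)  # ...then call addcommits once
    return dag.copied
```

For 100 commits, adding them one at a time copies 1 + 2 + ... + 100 = 5050 elements, while the single batched call copies 100.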
The change takes `pull` from unusably slow to usable:
Original Python Revlog backend:
```
In [1]: %trace repo.pull(bookmarknames=['master'],quiet=False)
5191 +466 | Apply Changegroup edenscm.mercurial.bundle2 line 516
| - Commits = 125 :
| - Range = a1d1b3ade136:2e3fe78af189 :
5191 +466 | changegroup.cg1unpacker.apply edenscm.mercurial.changegroup line 313
5192 +416 | Progress Bar: commits (progressbar)
5192 +415 | changelog.changelog.addgroup edenscm.mercurial.changelog line 536
5192 +409 | revlog.revlog.addgroup edenscm.mercurial.revlog line 2116
5215 +371 | changelog.changelog._addrevision (125 times) edenscm.mercurial.changelog line 558
```
DoubleWrite (Segments + Revlog) backend, Before:
```
In [2]: %trace repo.pull(bookmarknames=['master'],quiet=False)
2396 +154059 | Apply Changegroup edenscm.mercurial.bundle2 line 516
| - Commits = 323 :
| - Range = cb0b100180ba:5fb57c74f72e :
2396 +154059 | changegroup.cg1unpacker.apply edenscm.mercurial.changegroup line 313
2397 +151433 \ Progress Bar: commits (progressbar)
2397 +151433 | changelog2.changelog.addgroup edenscm.mercurial.changelog2 line 334
```
DoubleWrite (Segments + Revlog) backend, After:
```
In [2]: %trace repo.pull(bookmarknames=['master'],quiet=False)
4629 +512 | Apply Changegroup edenscm.mercurial.bundle2 line 516
| - Commits = 45 :
| - Range = cf23c6972934:1ff0c5f0e7ad :
4629 +512 | changegroup.cg1unpacker.apply edenscm.mercurial.changegroup line 313
4630 +494 | changelog2.changelog.addgroup edenscm.mercurial.changelog2 line 334
```
Reviewed By: DurhamG
Differential Revision: D23390435
fbshipit-source-id: dd97a5008dedd844d4134b87bfef190fa739a80b
Summary:
The users of addrevisioncb are gone.
This also removes the "alwayscache" parameter of "_addrevision".
Reviewed By: DurhamG
Differential Revision: D23390437
fbshipit-source-id: 7edd9dd0b93d4cb9d4f35d088a1aef719b450ec1
Summary: It is about legacy revlog formats that are no longer relevant.
Reviewed By: DurhamG
Differential Revision: D23390436
fbshipit-source-id: 58c2c432804181bcc6517d6c988777b843fc9ba4
Summary:
We have a few safeguards against creating full checkouts. However we have
sparse profiles that are not full, but that include very large directories
which normally should not be included.
This diff adds logic that checks whether a new sparse profile includes any of the "marker"
files, i.e. files from folders that should not be included. The operation
aborts if that's the case; however, there's always a way to work around that.
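A minimal sketch of the marker-file check. It models a sparse profile as a list of glob include patterns matched with `fnmatch` (where `*` also crosses `/`), which is a simplification of real sparse profile semantics; `check_marker_files` and `SparseProfileError` are hypothetical:

```python
import fnmatch


class SparseProfileError(Exception):
    pass


def check_marker_files(include_patterns, marker_files):
    """Abort if a sparse profile would include any known marker file."""
    included = [
        m
        for m in marker_files
        if any(fnmatch.fnmatchcase(m, pat) for pat in include_patterns)
    ]
    if included:
        raise SparseProfileError(
            "profile includes large directories (marker files: %s)"
            % ", ".join(sorted(included))
        )
```

Checking a handful of marker paths is cheap compared with expanding the whole profile against the full file list.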
Reviewed By: DurhamG
Differential Revision: D23414200
fbshipit-source-id: 626f392319eb1be8b35f39cadafb61f3c1dfefe3
Summary:
Scuba logging that tracks undesired file fetches has some blind spots, i.e. a
lot of fetches have a null pid and null cmd line. This diff tries to fix another
part of the problem.
TreeInode::getOrLoadChild() has a TODO: `pass a fetch context down through
getOrLoadChild to track this load`. This diff fixes that TODO and also starts
to pass a context from the EdenDispatcher::lookup method.
Note that it adds quite a lot of new `ObjectFetchContext::getNullContext()`
calls, which might themselves be responsible for blind spots in logging.
I'll try to address this problem in the next diffs.
Reviewed By: kmancini
Differential Revision: D23418218
fbshipit-source-id: 319d7436494d8dce3580289aae9963aa13bfc191
Summary:
Scuba logging that tracks undesired file fetches has some blind spots, i.e. a
lot of fetches have a null pid and null cmd line. This diff fixes at least part
of the problem.
TreePrefetchContext, which is used from TreePrefetchLease, didn't log the client
pid at all (in fact, it logged almost nothing). This diff fixes at least one blind spot; however, it doesn't look like this is the only one.
Reviewed By: kmancini
Differential Revision: D23417451
fbshipit-source-id: 107884e94c6b40de999328ec2ef78fe22174c1ca
Summary: `legacy.py` depends on other files in this directory, so let's add them all to the link tree.
Reviewed By: fanzeyi
Differential Revision: D23356917
fbshipit-source-id: e4bfd82ebbd9d143a5454a43bb47e8dd55b4485f
Summary: These tests have failed locally since we added these checks to eden doctor, so we need to mock them.
Reviewed By: chadaustin
Differential Revision: D23326597
fbshipit-source-id: 87a0e6ab0472e3ae56f89503e928c5a00a16ab04
Summary:
"hg diff" has --sparse option which diffs only files inside a sparse checkout.
The problem is that it doesn't work on eden checkouts because eden repos don't
have a sparsematch() function.
This diff makes it so that if the sparsematch() function doesn't exist, the
--sparse option is simply ignored.
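A minimal sketch of the fallback. Matchers are modeled as path predicates, and `sparsematch()` as a zero-argument method returning one; the real signature may differ. `diff_matcher` is hypothetical:

```python
def diff_matcher(repo, matcher, sparse):
    """Narrow `matcher` to the sparse checkout when --sparse is requested.

    If the repo has no sparsematch() (as with eden checkouts), --sparse is
    ignored and the original matcher is returned unchanged.
    """
    sparsematch = getattr(repo, "sparsematch", None)
    if not sparse or sparsematch is None:
        return matcher
    sparse_m = sparsematch()
    # Keep only paths matched by both the user's matcher and the profile.
    return lambda path: matcher(path) and sparse_m(path)
```

Using `getattr` with a default keeps the check in one place instead of special-casing eden repos.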
The motivation for this change is
https://fb.workplace.com/groups/corehg/?post_id=687768245151742. There are some
diff calls triggered by arc lint that race with "hg update" and might download
loads of data on people's laptops. This diff doesn't fix the race, but:
1) It makes sure we don't download too much data that is not in sparse profiles.
2) arc lint doesn't care about files outside of sparse profiles anyway, so
running with --sparse makes sense.
Reviewed By: DurhamG
Differential Revision: D23396918
fbshipit-source-id: 2a386fdbeab85187e2c2acab69cb86b74124d46f
Summary:
Most of the fixes are pretty trivial: the code was using functions not
present on Windows, so we either work around them or switch to
cross-platform ones.
Of note, it looks like `hg doctor` doesn't properly detect when Mercurial and
EdenFS are out of sync, so those tests are disabled until we figure out why.
Reviewed By: genevievehelsel, fanzeyi
Differential Revision: D23409708
fbshipit-source-id: 3314c197d43364dda13891a6874caab4c29e76ca
Summary:
This is practically just 0 in our production setup during `pull`s. In the
future, when the commit data becomes lazy, it will no longer be possible to read the
files locally. So let's just not scan the commits.
Reviewed By: DurhamG
Differential Revision: D23390438
fbshipit-source-id: 4c54c4aac5fd840205296ab86955ec1b8ab76607
Summary:
Mergedrivers can call dirstate.add directly and are adding paths with
"." and "..". Let's block those paths.
Reviewed By: quark-zju
Differential Revision: D23375469
fbshipit-source-id: 64e9f20169cfd50325ecd8ebcc1dd3be7a5cb202
Summary:
extdiff uses shutil.rmtree, which calls os.rmdir with new Python 3
options. Since we patch os.rmdir, we need to support those options.
Reviewed By: quark-zju
Differential Revision: D23350968
fbshipit-source-id: 081d179dcd67b51ffdeb6b85899adf4e574a8d0f
Summary: Similar to D18528858 so module names do not need to be spelled twice.
Reviewed By: markbt
Differential Revision: D23091380
fbshipit-source-id: a2a261abc9c78c8805cea62b38498ba65398796d
Summary: This crate would fail to build without the "fb" feature because `serde_json` was listed as an optional dependency (but is used in a way that isn't conditional on the `fb` feature). This diff makes the dependency non-optional, and also silences several dead code warnings that are emitted when building without the "fb" feature.
Reviewed By: quark-zju
Differential Revision: D23386786
fbshipit-source-id: b00a8b0b8b0b978c1cfab2838629fcb388a076e9
Summary:
The `debugfsync` command calls fsync on newly modified files in svfs.
Right now it only includes locations that we know have a constant number
of files.
The fsync logic is put in a separate crate to avoid slow compiles.
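The actual logic lives in a Rust crate. As an illustration of the idea only, here is a minimal Python sketch that fsyncs a given list of files and then their parent directories on POSIX, so that newly created entries are durable too. `fsync_paths` is hypothetical, and choosing which svfs files count as "newly modified" is not modeled:

```python
import os


def fsync_paths(paths):
    """fsync each file, then each parent directory (POSIX only).

    Returns the number of files synced.
    """
    # Windows' FlushFileBuffers needs write access; POSIX accepts read-only fds.
    flags = os.O_RDONLY if os.name == "posix" else os.O_RDWR
    synced = 0
    dirs = set()
    for path in paths:
        fd = os.open(path, flags)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        synced += 1
        dirs.add(os.path.dirname(os.path.abspath(path)))
    if os.name == "posix":
        for d in dirs:
            fd = os.open(d, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
    return synced
```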
Reviewed By: DurhamG
Differential Revision: D23124169
fbshipit-source-id: 438296002eed14db599d6ec225183bf824096940
Summary:
A warning means that every tree fetched will be printed in the edenfs log,
which is way too much. Let's decrease this to a debug message.
Reviewed By: genevievehelsel
Differential Revision: D23385778
fbshipit-source-id: d77f1cac3efb945d4b95750822f2f12f48c75ffe
Summary: `len(repo)` can no longer predict the next rev number. Use nodes instead.
Reviewed By: DurhamG
Differential Revision: D23307791
fbshipit-source-id: cc20e53f039eee2a714748352e8e98aab253095a
Summary:
Some functions might be called very frequently. For example,
`phases.phasecache.loadphaserevs` might be called 100k+ times.
That makes the tracing data harder to process.
Limit the count of spans to 1k by default so the data is cheaper to process,
and some highly repetitive cases can now be reasoned about. Note the limit
is only put on static Span Ids. If a span uses dynamic metadata or asks for
different Span Ids each time, it will not be limited.
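A minimal sketch of a per-id cap, assuming spans are identified by a hashable id that a static span reuses on every call. `SpanLimiter` is hypothetical; dynamic spans are modeled as getting a fresh id each time, which is why they are never capped (at the cost of one counter entry each):

```python
from collections import Counter


class SpanLimiter(object):
    """Record at most `limit` spans per span id (default 1k)."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.counts = Counter()

    def should_record(self, span_id):
        # Static spans hit the same id repeatedly and get capped; a span
        # that asks for a new id each time always stays under the limit.
        self.counts[span_id] += 1
        return self.counts[span_id] <= self.limit
```

With the default limit, a function called 100k+ times contributes at most 1,000 spans to the trace.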
In debugshell,
    td = %trace repo.revs('smartlog()')
    len(td.serialize())
dropped from 6MB to 0.87MB.
It's also possible to reason about:
    td = %trace len(repo.revs('ancestors(.)'))
in debugshell (taking 30s and 98KB serialized, vs 21s without tracing), while
previously the result would be too large to show (`%trace` just hangs).
Reviewed By: DurhamG
Differential Revision: D23307793
fbshipit-source-id: 3c1e9885ce7a275c2abd8935a4e4539a4f14ce83