Summary:
The state entry of a dirstate tuple is a single character. In Python 3
it's a unicode string. To parse it, we previously used the 'C' format, which
takes a single-character unicode string and (little did I know) returns an int.
We were storing this in a char, which causes corruption.
Let's switch to reading the string and just grabbing the first byte.
Reviewed By: xavierd
Differential Revision: D20143094
fbshipit-source-id: d9946c0cefdafe0941f4bdac070659fac27f30e3
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
Reviewed By: DurhamG
Differential Revision: D20102889
fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
Summary:
Python json produces unicode strings in the parsed results. This breaks
when passed to parts of the code that now assert that byte strings are required
(like the wire protocol). Let's switch phabricator stuff to use Mercurial json,
which produces bytes in Python 2 and unicode in Python 3.
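To illustrate the mismatch, here is a minimal sketch of what a caller would need if it received `str` values from the standard `json` module but had to hand bytes to a bytes-only layer. The helper name `to_bytes` is hypothetical and is not part of Mercurial's json module.

```python
import json


def to_bytes(obj, encoding="utf-8"):
    """Recursively encode every str in a parsed JSON value to bytes.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, list):
        return [to_bytes(item, encoding) for item in obj]
    if isinstance(obj, dict):
        return {
            to_bytes(key, encoding): to_bytes(value, encoding)
            for key, value in obj.items()
        }
    # Numbers, booleans and None pass through unchanged.
    return obj


# The standard json module always yields str keys and values.
parsed = json.loads('{"phid": "PHID-1", "ids": [1, "two"]}')
converted = to_bytes(parsed)
```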
Reviewed By: ikostia
Differential Revision: D20123140
fbshipit-source-id: d1b11426736a0f43ff7e74acf709ab1fd70d5bfe
Summary:
The NameSet is similar to SpanSet and Mercurial's smartset but speaks
VertexNames instead of Ids. The idea is that NameSet will be part of the NameDag
APIs, and potentially replace Mercurial's smartset layer (just the smartset
container types, not the revset language), in a way that keeps revision numbers
completely hidden behind the scenes.
This diff adds some basic abstraction around iteration-related operations.
Other operations will be added later.
Reviewed By: sfilipco
Differential Revision: D19912109
fbshipit-source-id: 504a26c074282ec51f260535ca63e943124f688e
Summary: EdenFS is planning on throwing an error if a user requests a checkout while a checkout is already in progress. Often, this is already disallowed by a Mercurial repository lock, but there are instances where these calls can still get through. We would like to prevent these calls from queueing, so we will throw an `EdenError` instead. Without this handling, a full stack trace prints, so this just makes it a bit prettier for the user.
Reviewed By: simpkins
Differential Revision: D20106480
fbshipit-source-id: e33df3d0b7aa42867ee752e4c1f3a47b31ade76b
Summary:
The ssh output order issue is a large contributor to test flakiness.
Example test failures are:
```
--- test-unbundlereplay.t
+++ test-unbundlereplay.t.respondfully.err
@@ -154,9 +154,9 @@
remote: [ReplayVerification] Expected: (master_bookmark, c2e526aacb5100b7c1ddb9b711d2e012e6c69cda). Actual: (master_bookmark, 893d83f11bf81ce2b895a93d51638d4049d56ce2)
remote: pushkey-abort: prepushkey hook exited with status 1
remote: transaction abort!
+ replay failed: error:pushkey
+ unbundle replay batch item #0 failed
remote: rollback completed
- replay failed: error:pushkey
- unbundle replay batch item #0 failed
[1]
$ cat $TESTTMP/reports.txt
unbundle replay batch item #0 failed
--- test-commitcloud-backup-all.t
+++ test-commitcloud-backup-all.t.err
@@ -59,9 +59,9 @@
remote: pushing 1 commit:
remote: eccc11f58a56 D3
backing up stack rooted at 42952ab62cec
+ backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 42952ab62cec E1
- backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 4903fdffd9c6 E2
commitcloud: backed up 8 commits
test-fb-hgext-lfspushrebase-verify-blobs.t
--- test-fb-hgext-treemanifest-pushrebase.t
+++ test-fb-hgext-treemanifest-pushrebase.t.err
@@ -127,9 +127,9 @@
$ hg push --to master -B master --config treemanifest.sendtrees=True
pushing to ssh://user@dummy/master
searching for changes
- remote: baz
remote: prepushrebase.cat hook exited with status 1
abort: push failed on remote
+ remote: baz
[255]
- Disable the hook
```
The order is nondeterministic because the stderr reading thread can read the
content before or after ui.write or ui.write_err in the main thread.
This diff introduces an optional feature in dummyssh that buffers all stderr
output and only writes it after the wrapped hg serve process has exited, at
which point the hg client should also have completed its operations and has no
reason to call ui.write or ui.write_err nondeterministically. Then the
dummyssh wrapper writes out the buffered stderr so the output order becomes
well defined.
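The buffering idea can be sketched roughly as follows. This is not the dummyssh code; the function name `run_with_buffered_stderr` and its parameters are assumptions made for illustration.

```python
import subprocess
import sys


def run_with_buffered_stderr(argv, err_stream=None):
    """Run argv, hold back all of its stderr, and emit it after it exits.

    Hypothetical helper: stdin and stdout are inherited by the child, only
    stderr is captured.
    """
    if err_stream is None:
        err_stream = sys.stderr.buffer
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE)
    # read() returns only once the child has closed stderr, which normally
    # happens when the child exits.
    buffered = proc.stderr.read()
    returncode = proc.wait()
    # Nothing was forwarded while the child was running, so the relative
    # order of this output no longer depends on thread scheduling.
    err_stream.write(buffered)
    err_stream.flush()
    return returncode


# Usage (assumed invocation): run_with_buffered_stderr(["hg", "serve", "--stdio"])
```

The trade-off is that stderr is no longer streamed live, which is presumably why the feature is optional.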
Reviewed By: xavierd
Differential Revision: D19872612
fbshipit-source-id: 84710f98a8e6b4a1c283ffecf008585cca12be0a
Summary: This makes the next change easier to see.
Reviewed By: xavierd
Differential Revision: D19872609
fbshipit-source-id: 9263a246258ffd18d8d883da7ced435a91fb5ced
Summary:
Right now, all of our manifest parsing and evaluation is in the repo() class, but this is a design mistake. Over a repo's convert lifetime, a single repo will have many different manifests, based on branch, and location in the commit history. What's worse is that the current design makes it hard to build unit tests and new features like include evaluation.
This commit creates a whole new class called repomanifest, that represents a specific manifest (and its included files). It also has unit tests to test the various operations that the manifest performs, such as path and revision mapping. This commit does not modify the existing converter code outside of the class to use this new implementation.
Reviewed By: tchebb
Differential Revision: D19402995
fbshipit-source-id: b97dadcc595c6332f4495460618317194873a780
Summary:
In the past I saw test breakages where the stderr from the remote ssh process
becomes incomplete. It's hard to reproduce by running the tests directly.
But inserting a sleep in the background stderr thread exposes it trivially:
```
# sshpeer.py:class threadedstderr
     def run(self):
         # type: () -> None
         while not self._stop:
             buf = self._stderr.readline()
+            import time
+            time.sleep(5)
             if len(buf) == 0:
                 break
```
Example test breakage:
```
--- a/test-commitcloud-sync.t
+++ b/test-commitcloud-sync.t.err
@@ -167,8 +167,7 @@ Make a commit in the first client, and sync it
$ hg cloud sync
commitcloud: synchronizing 'server' with 'user/test/default'
backing up stack rooted at fa5d62c46fd7
remote: pushing 1 commit:
- remote: fa5d62c46fd7 commit1
commitcloud: commits synchronized
finished in * (glob)
....
```
Upon investigation, it's caused by 2 factors:
- The connection pool calls pipee.close() before pipeo.close(), to work around
an issue that I suspect was solved by D19794281.
- The new threaded stderr (pipee)'s close() method does not actually close the
pipe immediately. Instead, it limits the text to read to at most one more
line, which causes those incomplete messages.
This diff makes the following changes:
- Remove the `pipee.close` workaround in connectionpool.
- Remove `pipee.close`. Embed it in `pipee.join` to prevent misuses.
- Add detailed comments in sshpeer.py for the subtle behaviors.
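A rough sketch of the "close only as part of join" idea, assuming a reader thread that drains the pipe until EOF. The class below is illustrative and is not the actual `threadedstderr` implementation; the names `ThreadedStderr`, `_pipe` and `lines` are hypothetical.

```python
import threading


class ThreadedStderr(threading.Thread):
    """Drain a pipe on a background thread (illustrative sketch only)."""

    def __init__(self, pipe):
        super().__init__(daemon=True)
        self._pipe = pipe
        self.lines = []

    def run(self):
        # Read until EOF so no trailing output is dropped.
        while True:
            buf = self._pipe.readline()
            if len(buf) == 0:
                break
            self.lines.append(buf)

    def join(self, timeout=None):
        # There is no separate close(): the pipe is closed only after the
        # reader has finished, so callers cannot cut the output short.
        super().join(timeout)
        if not self.is_alive():
            self._pipe.close()
```

With this shape, the writer side closing its end is what ends the read loop, and `join()` is the single way to finish with the reader.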
Reviewed By: xavierd
Differential Revision: D19872610
fbshipit-source-id: 4b61ef8f9db81c6c347ac4a634e41dec544c05d0
Summary:
This makes `peer.close()` actually close the ssh connection if it's an
sshpeer. This affects the `clone` path to actually clean up the ssh connection
so we don't depend on (fragile) `__del__`.
I traced the code back to peerrepository.close in 2011 [1]. At that time it
seems the codebase depended on `__del__`. Nowadays the codebase calls `close()`
properly so I think it's reasonable to make the change.
[1]: https://www.mercurial-scm.org/repo/hg/rev/d747774ca9da.
Reviewed By: ikostia
Differential Revision: D19911393
fbshipit-source-id: ea640d1cd82ffcb786e22f47da8116c7f50a4690
Summary:
The added function can be used by extensions to run extra logic before the
"clone" function closes the repos or peers.
This is needed to make the next diff work. Otherwise extensions like remotenames will try to write to a closed sshpeer and cause errors.
Reviewed By: DurhamG
Differential Revision: D19911390
fbshipit-source-id: ca1364e808cebb632e051fbbdcfe4bf0dca721bc
Summary:
Update the `print_status()` function to take a `clidispatch::io::IO` object as
a parameter, instead of a simple output object. This will allow us to also
print error messages from this function in a future diff.
Reviewed By: quark-zju
Differential Revision: D19958504
fbshipit-source-id: bf482fdc4420e1350363a730c6a539cd760aef25
Summary: Updates the C code to support unicode filenames and states.
Reviewed By: simpkins
Differential Revision: D19786275
fbshipit-source-id: e7aeb029b792818b1b1a9c5d3028640b56522235
Summary: There is no need to open a transaction otherwise.
Reviewed By: DurhamG
Differential Revision: D20109840
fbshipit-source-id: e47adaaeea2d7565f3629701d8de4a67d4b55182
Summary:
Verifying the changelog is quite slow and we've had more users needing
to run hg recover these days. Let's finally get rid of the verify step.
Reviewed By: simpkins
Differential Revision: D20109706
fbshipit-source-id: a512d9e11716514bce986b0e3a26347fe6afd955
Summary: Most of the fixes related to encoding in `patch.py`
Reviewed By: DurhamG
Differential Revision: D19713378
fbshipit-source-id: 66ccbd0fc7826ab2d4c05173c7e9edb96700d106
Summary: As I work, it's getting harder and harder to keep my multiple changes from introducing merge conflicts between different branches. We need to break out the repo_source's implementation into a bunch of different files to make it easier to keep things separate.
Reviewed By: zhonglowu, tchebb
Differential Revision: D20015946
fbshipit-source-id: bf954ac581e5ca9e43c091b6b1b4c539c14471f2
Summary:
Fix the PathRelativizer APIs to accept `Path` and even `str` arguments instead
of just `PathBuf`. The old code required a `PathBuf`, which often forced
callers to make a copy of the path data.
Reviewed By: quark-zju
Differential Revision: D19958505
fbshipit-source-id: 6fa40dd4b75df4e3faf9ad2ae4f0e4e6595669f6
Summary:
bytes 0.5 is a dependency of newer tokio; it's also newer, and thus better.
Staying on 0.4 means that copies between Bytes 0.4 and 0.5 need to be done,
which will be especially bad in the LFS code since 10+MB buffers will have to
be copied...
One main API change is for the configparser. The code used to take Into<Bytes>
for the keys, I switched it to AsRef<[u8]>.
For hg_memcache_client, an extra copy is performed to build a Delta. Since this
code uses an old tokio and is being replaced right now, the effort of
switching to a new tokio and new bytes was not deemed worth it; the copy will
do for now.
Reviewed By: dtolnay
Differential Revision: D20043137
fbshipit-source-id: 395bfc3749a3b1bdfea652262019ac6a086e61e0
Summary:
`treedirstatemap._repacked` is sometimes set in write(), but does not appear
to be used anywhere. Remove it. (I noticed this since Pyre complains about
it if you enable type checking for `write()`)
Reviewed By: xavierd
Differential Revision: D19958219
fbshipit-source-id: a55e237865160191d814ed950f69c3113bec4f64
Summary:
Add type annotations for the propertycache type.
Unfortunately at the moment Pyre still can't properly type check code that
uses this class, as it does not understand the special `__get__()` method.
It looks like support for this is hopefully coming in D19206575.
Reviewed By: xavierd
Differential Revision: D19958223
fbshipit-source-id: 0f8f15fc6935ec3feaef41d3be373a85225276fe
Summary:
Add type annotations for `dirstate.status()` and
`filesystem.pendingchanges()`.
Unfortunately Pyre appears to choke when processing the `dirstate.status()`
function, and currently does not actually report type errors inside this
function. I've let the Pyre team know about this.
(If Pyre did work correctly it would report one issue since it doesn't really
understand the `rootcache` decorator applied to `dirstate._ignore`.)
Reviewed By: xavierd
Differential Revision: D19958226
fbshipit-source-id: a1cd4b9402a0a449481035cee819533c56b9b336
Summary:
This module previously decided how a particular module should be imported if
it had multiple versions (e.g., pure Python or native).
However, as of D18819680 it was changed to always import the native C version.
Let's go ahead and remove it entirely now. Using `policy.importmod` simply
makes it harder for type checkers to figure out the actual module that will be
used.
The only functionality that `policy.importmod()` still provided was verifying
that the module contained a "version" field that looked like what was
expected. In practice these version numbers are not bumped often, so this
doesn't really seem to provide much value in checking that we imported the
correct version that we expected to be shipped with this release.
Reviewed By: xavierd
Differential Revision: D19958227
fbshipit-source-id: 05f1d027d0a41cf99c4aa93cb84a51e830305077
Summary:
Add *.pyi type stub files for most of the native C extensions.
This allows Pyre to type check functions that use these extensions.
These type annotations likely aren't complete, but contain enough information
to allow Pyre to pass cleanly on the existing type-checked locations in the
code using these modules.
Reviewed By: xavierd
Differential Revision: D19958220
fbshipit-source-id: 85dc39a16e595595a174a8e59e419c418d3531be
Summary:
This moves the build rules for the extensions in mercurial/cext into a TARGETS
file in this directory.
This will allow us to start writing `*.pyi` files that contain type
information for these modules, and store them alongside the corresponding `.c`
files. By having the build rules in the top-level `eden/scm` directory we
would have needed to keep the `.pyi` files for these modules directly in the
`eden/scm` directory instead, as the namespace for the `pyi` files is assumed
to be the basemodule plus their path relative to the TARGETS file.
Reviewed By: xavierd
Differential Revision: D19958222
fbshipit-source-id: fdc26ead16663036ffa2562a96eb1649f91cba81
Summary:
The last diff fixed this for fsmonitor. Let's skip these same paths for
non-fsmonitor.
Reviewed By: quark-zju
Differential Revision: D20014808
fbshipit-source-id: 02e3cd9aa29d9c024ba3e8e42a46e21a7c8dfc30
Summary:
Watchman may report invalid utf-8 filenames, even after they've been
deleted. Let's skip them, and print a warning.
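A minimal sketch of the skip-and-warn behavior, assuming filenames arrive as raw bytes and that `warn` is some callable that reports a message. Both the function name `decode_filenames` and the warning text are hypothetical.

```python
def decode_filenames(raw_names, warn):
    """Return the names that are valid UTF-8, warning about the rest.

    Hypothetical helper for illustration only.
    """
    valid = []
    for raw in raw_names:
        try:
            valid.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Skip the entry instead of aborting the whole status run.
            warn("skipping invalid utf-8 filename: %r" % (raw,))
    return valid


warnings = []
names = decode_filenames(
    [b"a.txt", b"bad\xff.txt", "\u00e9.txt".encode("utf-8")],
    warnings.append,
)
```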
Reviewed By: sfilipco
Differential Revision: D20012187
fbshipit-source-id: b13550918a8330ef3eb5c546105d1e054dcb7724
Summary:
Error strings were being converted to unicode if they contained certain
characters. This caused python 2 Mercurial to throw various errors when it tried
to turn them into strings to report errors.
Let's return cpython_ext::Str instead of String.
Reviewed By: sfilipco
Differential Revision: D20012188
fbshipit-source-id: af6fa7d98d68e3c188292e4972cfc1bdb758dbdf
Summary:
Whenever remotefilelog.cacheprocess2 is set, remotefilelog.cachekey is also
set, but the latter is not present when remotefilelog.cacheprocess is. Since
remotefilelog.cacheprocess already includes the cachekey, let's not add it
twice.
This also fixes the issue where hg_memcache_client would die early due to being
passed too many arguments.
Reviewed By: DurhamG
Differential Revision: D20014792
fbshipit-source-id: 8ed6775f70cf967d1c069f8acdb5a782ee819090
Summary:
This error handling can be extremely slow: calling `self.node()` can end up
triggering a linkrev scan of the changelog, which can take over 5 minutes.
If we did want to add this back in the future we would need some sort of API
on `filectx` to try and get the node ID only if it was cheap, and that would
fail fast if this is using remotefilelog and trying to get the node ID will
require scanning the changelog.
Note that KeyError can occur fairly regularly when invoked in long-lived
commands like `hg debugedenimporthelper`. If we are asked about data in a new
commit that was added since this repository was originally opened a KeyError
will be thrown here (in which case `debugedenimporthelper` will call
`repo.invalidate()` and then retry).
Reviewed By: quark-zju
Differential Revision: D20010279
fbshipit-source-id: 0e9b4c163cb9256de57daa91eed70a3736cb1075
Summary: There are two copies of pywatchman in fbcode (!) and some changes didn't make it into the edenscm copy.
Reviewed By: quark-zju
Differential Revision: D19794480
fbshipit-source-id: bcc85e0d3efc225d94b8bfa1e433f6e9cc024643
Summary:
Mercurial filenode hash is computed by including the copy information in the
blob header. Before computing the blob content hash, or returning it to the
upper layers, we need to either strip or reconstruct this header appropriately.
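For reference, stripping the header can be sketched as below, assuming the usual filelog layout where metadata is wrapped between two `\x01\n` markers at the start of the blob. The function name `strip_copy_header` is hypothetical and this is not the code added by this diff.

```python
META_MARKER = b"\x01\n"


def strip_copy_header(blob):
    """Return the blob content without a leading metadata header.

    Hypothetical helper; assumes the header, when present, is delimited by
    a pair of META_MARKER sequences at the very start of the blob.
    """
    if not blob.startswith(META_MARKER):
        # No metadata header, the blob is already plain content.
        return blob
    # Find the closing marker, searching after the opening one.
    end = blob.index(META_MARKER, len(META_MARKER))
    return blob[end + len(META_MARKER):]


with_header = (
    b"\x01\ncopy: old/name\ncopyrev: " + b"0" * 40 + b"\n\x01\nhello\n"
)
content = strip_copy_header(with_header)
```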
Reviewed By: DurhamG
Differential Revision: D19975887
fbshipit-source-id: 7555e7219e50f4d18ec677fdecc216ee705d7af4
Summary: This will make it easier to support more hash schemes in the future.
Reviewed By: DurhamG
Differential Revision: D19975888
fbshipit-source-id: 8b8ce3b20d72199bac3cd20a48475b5ab56bfc52
Summary:
With the Arc embedded into the stores themselves, a second allocation is
needed in order to use them as trait objects. Since in most cases we do
not want the stores themselves to be cloneable, we can move the Arc outside and
thus reduce the number of pointer indirections.
Reviewed By: DurhamG
Differential Revision: D19867568
fbshipit-source-id: 9cd126831fe2b9ee715472ac3299b7a09df95fce
Summary:
The ContentStore now can read LFS blobs from both the shared cache, and the
local store.
Reviewed By: DurhamG
Differential Revision: D19866249
fbshipit-source-id: a6fb3523495e9d3832613b56438f631cfa552b91
Summary:
With the LFS store being added, and the indexedlog being soon used for trees,
this simplification should help in formalizing the hierarchy of files/folders.
It will look like the following:
<root dir>/lfs: for the lfs store
<root dir>/indexedlog*: for the indexedlog
<root dir>/foobar: for a hypothetical foobar store
For manifests, <root dir> will therefore be: <store dir>/manifests. The
unfortunate part is that the current tree data lives under
<store dir>/packs/manifests. As packfiles will be replaced, this small
discrepancy is acceptable.
Reviewed By: DurhamG
Differential Revision: D19866248
fbshipit-source-id: 7ef59ef7df19149b19a529b4f4a45a479cc9d23b
Summary:
This is the first step in having a stronger integration between LFS blobs and
the ContentStore abstraction. The two main differences between the Python-based
LFS implementation and this one are:
- pointers are not stored alongside plain data,
- blobs are split between local and shared blobs.
As of now, no reclamation is performed for shared blobs, and blobs aren't
fetched or uploaded. This will come in future diffs.
Reviewed By: DurhamG
Differential Revision: D19859291
fbshipit-source-id: 45000fc574e6fbd6d3487f4966cad4f49dab731c
Summary:
Some of our upcoming repo merges will make it infeasible for someone to
use a full checkout. Let's add a config that will warn users of this. It has a
few levels, starting with a suppressible hint, then a non-suppressible warning,
then a suppressible exception, then a non-suppressible exception.
Reviewed By: ikostia
Differential Revision: D19974408
fbshipit-source-id: bad35a477ad8626dbc0977465368f5d71007e2d5
Summary:
On Windows, there are *two* 8-bit encodings for each process.
* The ANSI code page is used for all `...A` system calls, and this is what
Mercurial uses internally. It can be overridden using the `--encoding`
command line option.
* The OEM code page is used when outputting to the console. Mercurial has no
concept of this, and instead renders to the console using the ANSI code page,
which results in mojibake like "Θ" instead of "é".
Add the concept of an `outputencoding`. If this differs from `encoding`, we
convert from the local encoding to the output encoding before writing to the
console.
On non-Windows platforms, this defaults to the same encoding as the local encoding,
so this is a no-op unless `--outputencoding` is manually specified.
On Windows, this defaults to the codepage given by `GetOEMCP`, causing output
to be converted to the OEM codepage before being printed.
For ordinary strings, the local encoded version is wrapped by `localstr` if the
encoding does not round-trip cleanly. This means the output encoding works
even if the character is not represented in the local encoding.
Unfortunately, the templater is not localstr-clean, which means strings can get
flattened down to the local encoding and the original code points are lost. In
this case we can only output characters which are in the intersection of the
encoding and the output encoding.
Most US English Windows systems use cp1252 for the ANSI code page and cp437 for
the OEM code page. These both contain many accented characters, so users with
accented characters in their names will now see them correctly rendered.
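The mojibake and the conversion can be reproduced with Python's standard codecs. This is only a sketch of the idea; the function name `to_output_encoding` is hypothetical and the `"replace"` error handling is an assumption, not necessarily what Mercurial does.

```python
def to_output_encoding(data, encoding, outputencoding):
    """Re-encode locally encoded bytes for the console (illustrative only)."""
    if encoding == outputencoding:
        # Same code page on both sides: nothing to do.
        return data
    text = data.decode(encoding)
    # Characters missing from the output code page become "?".
    return text.encode(outputencoding, "replace")


local = "\u00e9".encode("cp1252")      # "é" in the ANSI code page
mojibake = local.decode("cp437")       # what the console shows unconverted
converted = to_output_encoding(local, "cp1252", "cp437")
fixed = converted.decode("cp437")      # what the console shows after conversion
```

Here the same byte means "é" in cp1252 but "Θ" in cp437, which is the mojibake described above; after conversion the console byte decodes to "é" again.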
All of this only applies to Python 2.7. In Python 3, everything is Unicode,
the `--encoding` and `--outputencoding` options do nothing, and it just works.
Reviewed By: quark-zju, ikostia
Differential Revision: D19951381
fbshipit-source-id: d5cb8b5bfe2bc131b2e6c3b892137a48b2139ca9
Summary:
`hg rage` generates the rage in the user's encoding. Since pastes are expected
to be in UTF-8, non-UTF-8 encodings result in garbled pastes.
Similarly, the lines-dec graph renderer uses escape sequences that won't work
on web pages, and the lines graph renderer uses curved lines which don't
render very well either. Force the use of the lines-square graph renderer,
which renders well.
Reviewed By: quark-zju
Differential Revision: D19951382
fbshipit-source-id: d1a5fd2ef195658f9bf10210088031474355f168
Summary:
The Rust graph renderer expects the message to be a unicode string, so ensure
we convert it from the local encoding before passing it to Rust.
Reviewed By: quark-zju
Differential Revision: D19951383
fbshipit-source-id: 644862c63873079364cb9902bd1bb49de8aa1ab9
Summary:
See later in this stack for motivation. This seems to work fine, and it allows
characters that don't fit latin1 when rendering diffs.
Reviewed By: markbt
Differential Revision: D19969743
fbshipit-source-id: 79c4afce5a19822d9b075d23ff4c88aa76ce2f42
Summary:
As of 63c471ad8a4ba0bebd1acf70569bcdcefc3fffbf in upstream Dulwich, Dulwich
turns commands into unicode. Unfortunately, _ssh.py in hggit sees that the
type is no longer str or bytes, thinks it's an array, and puts spaces between
every letter, causing it to break.
Let's allow unicode. This broke because Dulwich was recently upgraded.
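The failure mode is easy to reproduce: joining a string with spaces iterates over its characters. The sketch below is written for Python 3 and is not the hggit code; `build_command` is a hypothetical name.

```python
def build_command(command):
    """Return a single command string (illustrative sketch only).

    Text and bytes are passed through unchanged; only real sequences of
    arguments are joined with spaces.
    """
    if isinstance(command, (str, bytes)):
        return command
    return " ".join(command)


# What happens when a text command is mistaken for a list of arguments:
broken = " ".join("git-upload-pack")
# What the type check gives instead:
kept = build_command("git-upload-pack '/repo'")
joined = build_command(["git-upload-pack", "'/repo'"])
```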
Reviewed By: sfilipco
Differential Revision: D19983215
fbshipit-source-id: 059756905bf4b2c73009001b078c8723ae378246
Summary: This should get rid of the extraneous uninitialized attribute errors related to `setUp` and abstract classes.
Reviewed By: simpkins
Differential Revision: D19964487
fbshipit-source-id: 52d5a6496e372d99d4398473f9ed7672228a76f5
Summary:
This is a revised version of D19887220.
D19887220 has 2 problems:
- It can silently ignore the mt.exe error after all retries have failed.
- There is another place where `mt.exe` runs that is not covered by the retry.
This diff fixes them by wrapping the `set_long_paths_manifest` function
directly so it covers both `mt.exe` call sites, and makes sure that exhausting
all retries is still reported as a failure.
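A generic sketch of a retry wrapper with that property, where the last failure propagates instead of being swallowed. The decorator name `retry` and its parameters are hypothetical and do not reflect the actual build script.

```python
import functools
import time


def retry(attempts=3, delay=0.0):
    """Retry the wrapped function, re-raising the final failure.

    Hypothetical helper for illustration only.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        # Out of retries: let the error surface.
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator
```

Wrapping the whole function rather than a single call inside it means every step the function performs is covered by the same retry loop.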
Reviewed By: sfilipco
Differential Revision: D19977802
fbshipit-source-id: 774d0c42b247a7e111841cd69f71760a5544d685