Summary:
This is an example of how to use the new Bytes type. The performance change
is not clearly visible in benchmarks because copying bytes is not the
bottleneck.
Reviewed By: DurhamG
Differential Revision: D19818720
fbshipit-source-id: a431ae206cfa4fa08b2e162a48b3d7cbcd900f7f
Summary: The APIs are compatible so the switch is straightforward.
Reviewed By: DurhamG
Differential Revision: D19818713
fbshipit-source-id: 504e9149567c90eb661804e0dad20580a401aa76
Summary:
D20042045 changes the meaning of "lag_threshold". Update the value in mutation
store accordingly.
Reviewed By: DurhamG
Differential Revision: D20043116
fbshipit-source-id: 154e6dc2aa88ab0a9a9b21929ae5fa6163dcd403
Summary:
Previously, indexes were only updated at `sync()` time. This diff makes it so
`open()` can also update lagging indexes. This should make index migration
(ex. D19851355) smoother: indexes are built in time and users suffer less from
the absence of indexes.
Reviewed By: DurhamG
Differential Revision: D20042046
fbshipit-source-id: 20412661a0ca4f5f67b671137c47b6373a42981d
Summary: The logic is currently only used by `sync()`. I'd like to reuse it at `open()`.
Reviewed By: DurhamG
Differential Revision: D20042044
fbshipit-source-id: 5c9734ff68bdcf8f8c8710c6a821b18d3afeaca0
Summary:
This is more friendly for indexedlog users - deciding lag_threshold by number
of entries is easier than by bytes.
Initially, I thought checking `bytes` is cheaper and checking `entries` is more
expensive. However, in practice we have to build indexes for the `entries`
anyway, so we already know the number of entries lagging behind.
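A toy model of the entry-based threshold, written in Python for brevity. This is not the indexedlog API: `ToyLog`, `lagging_entries` and `maybe_update_index` are hypothetical names, and the "strictly more than the threshold" comparison is an assumption.

```python
class ToyLog:
    """Toy append-only log with one lazily persisted index (hypothetical)."""

    def __init__(self, lag_threshold):
        # Maximum number of entries the persisted index may lag behind.
        self.lag_threshold = lag_threshold
        self.entries = []
        self.indexed_count = 0  # entries covered by the persisted index

    def append(self, data):
        self.entries.append(data)

    def lagging_entries(self):
        # The count is known for free: unindexed entries have to be
        # scanned anyway to build the in-memory part of the index.
        return len(self.entries) - self.indexed_count

    def maybe_update_index(self):
        """Persist the index only if it lags by more than the threshold."""
        if self.lagging_entries() > self.lag_threshold:
            self.indexed_count = len(self.entries)
            return True
        return False
```

With `lag_threshold=2`, two appended entries leave the index untouched and the third one triggers an update.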
Reviewed By: DurhamG
Differential Revision: D20042045
fbshipit-source-id: 73042e406bd8b262d5ef9875e45a3fd5f29f78cf
Summary:
This can be useful for users of indexedlog who want `Bytes` (to get rid
of the lifetime parameter), for example a storage layer that wants to take
ownership of the returned bytes.
Reviewed By: xavierd
Differential Revision: D19818714
fbshipit-source-id: cb2d4e7deff921915e07454fee15cb94a3d5c00d
Summary: Those utilities are no longer necessary since the new code uses Bytes.
Reviewed By: xavierd
Differential Revision: D19818717
fbshipit-source-id: 0b43af0f1eae1a4288e84d4170db058b27f80334
Summary: This simplifies the code a bit and makes it cheaper to clone the Log.
Reviewed By: xavierd
Differential Revision: D19818716
fbshipit-source-id: bbf07b8b36009d53b63d8066ec422fc3c3796840
Summary: It's no longer used since Index now has inlined its checksum logic.
Reviewed By: ikostia
Differential Revision: D19850744
fbshipit-source-id: eb134e4c1613573a2d238710b44ad8119c80a5ee
Summary:
Change index filename and metadata name. This makes sure the new format and old
format are separate so upgrading or downgrading won't have issues.
Reviewed By: DurhamG
Differential Revision: D19851355
fbshipit-source-id: 25dee018073a90040f5818b32b753a3f589c10e0
Summary:
Enhance the index format: The Root entry can be followed by an optional
Checksum entry, which removes the need for ChecksumTable.
The format is backwards compatible since the old format will simply be
treated as "there is no ChecksumTable", and the ChecksumTable will be built on
the next "flush".
This change is non-trivial. But the tests are pretty strong - the bitflip test
alone covered a lot of issues, and the dump of Index content helps a lot too.
For the index itself (without the ".sum" checksum file), this change is
compatible in both directions:
1. New code reading old file will just think the old file does not have the
checksum entry, similar to new code having checksum disabled.
2. Old code will think the root+checksum slice is the "root" entry. Parsing
the root entry is fine since it does not complain about unknown data at the
end.
However, this change dropped the logic updating ".sum" files. That part is an
issue blocking old clients from reading new data.
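A toy illustration of the two compatibility directions, in Python. The byte layout here is invented for the example and is not the real index format: `ROOT`, `CHECKSUM`, the type bytes and all function names are hypothetical.

```python
import struct
import zlib

# Invented layouts, for illustration only.
ROOT = struct.Struct("<BQ")      # type byte + an offset
CHECKSUM = struct.Struct("<BI")  # type byte + CRC32 of the body
TYPE_ROOT, TYPE_CHECKSUM = 1, 2


def write_tail(offset, body, with_checksum):
    """Serialize a root entry, optionally followed by a checksum entry."""
    tail = ROOT.pack(TYPE_ROOT, offset)
    if with_checksum:
        tail += CHECKSUM.pack(TYPE_CHECKSUM, zlib.crc32(body))
    return tail


def old_read(tail):
    """Old reader: parses the root and ignores unknown trailing bytes."""
    _type, offset = ROOT.unpack_from(tail, 0)
    return offset


def new_read(tail, body):
    """New reader: returns (offset, verified); verified is None if absent."""
    _type, offset = ROOT.unpack_from(tail, 0)
    rest = tail[ROOT.size:]
    if not rest:
        # Old file: behaves like new code with checksums disabled.
        return offset, None
    _type, crc = CHECKSUM.unpack_from(rest, 0)
    return offset, crc == zlib.crc32(body)
```

`old_read` succeeds on a tail written with the checksum entry, and `new_read` reports `None` for a tail written without one.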
Reviewed By: DurhamG
Differential Revision: D19850741
fbshipit-source-id: 551a45cd5422f1fb4c5b08e3b207a2ffe3d93dea
Summary:
To solve the soundness issue of ChecksumTable raised by the last diff,
I plan to move the checksum logic to Index. This has multiple benefits:
- Solve the soundness issue of ChecksumTable.
- Indexedlog no longer writes the ".sum" files. `atomic_write` can be quite
slow (tens of milliseconds) on Windows. So this should help perf - with
many indexes, it can save hundreds of milliseconds on Windows per
indexedlog sync.
This diff adds the definition and serialization of the new Checksum entry.
The index format is not updated yet.
Reviewed By: markbt
Differential Revision: D19850742
fbshipit-source-id: df6e6ed12a12ef0d2a782dc9d6b4dc5dec3f4b46
Summary:
With the last change, mmap cost is reduced, but ChecksumTable is unsound in a
corner case: the buffer to check is shorter than what ChecksumTable covers:
checksum: |----chunk----|----chunk----|----chunk--|
buf:      |-------------------------------|       |
                                          ^       ^
                                  logical len   physical len
The checksum table will be unable to verify the last chunk, since it does not
have enough data in buf.
The issue was exposed by stress testing the multithread sync tests. It's not
always easy to reproduce, though.
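The corner case can be sketched in Python as follows. This is a simplified model, not the ChecksumTable code: the chunk size, the use of CRC32 and the function names are assumptions made for the example.

```python
import zlib

CHUNK_SIZE = 4  # tiny chunk size, for illustration only


def build_checksums(data):
    """Return one CRC32 per chunk; the last chunk may be shorter."""
    return [
        zlib.crc32(data[i:i + CHUNK_SIZE])
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def verifiable_chunks(checksums, covered_len, buf_len):
    """Indexes of chunks that can be re-hashed from a buffer of buf_len bytes.

    A chunk is verifiable only if the buffer holds every byte that its
    checksum was computed over.
    """
    result = []
    for i in range(len(checksums)):
        chunk_end = min((i + 1) * CHUNK_SIZE, covered_len)
        if chunk_end <= buf_len:
            result.append(i)
    return result
```

For 10 bytes of checksummed data (chunks of 4, 4 and 2 bytes) and a 9-byte buffer, only chunks 0 and 1 can be verified. The last chunk's checksum covers a byte the buffer does not have.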
Reviewed By: markbt
Differential Revision: D19850745
fbshipit-source-id: a1a96080163b7b9b56dcd6c1673d5d8d10e18a2b
Summary: This avoids some extra mmap syscalls by ChecksumTable.
Reviewed By: xavierd
Differential Revision: D19818721
fbshipit-source-id: dace55193f2b4b0f35e3868781faa2d2998d3b58
Summary:
This simplifies the code a bit (no special cases about 0-sized mmap buffers)
and makes it cheaper to clone the index buffer (just an Arc::clone, without
another mmap syscall).
Reviewed By: xavierd
Differential Revision: D19818718
fbshipit-source-id: e96d42af74c7f0bb11703c5da31cdfbd5d76c372
Summary:
TreeSpans used to use `&str`, which adds a lifetime to the struct, making it
harder to use from Python. Use a type parameter so TreeSpans<String>
can be used.
Reviewed By: DurhamG
Differential Revision: D19797708
fbshipit-source-id: c66429abfaf16d876151ca6f29da976bed91485d
Summary:
The filtering interface allows call sites to select what they want. It's similar
to a manifest walk with file or directory matchers in source control.
Reviewed By: DurhamG
Differential Revision: D19784467
fbshipit-source-id: 5cf6e4016d6fa1c90f8aeccc50809baccd4af5ab
Summary: The idea is that instants (events) can be a drop-in replacement for `ui.log`.
Reviewed By: DurhamG
Differential Revision: D19782897
fbshipit-source-id: 795bbba23d921e460f723f19ef529b203aea366a
Summary: This function will be reused by the next diff.
Reviewed By: DurhamG
Differential Revision: D19782895
fbshipit-source-id: 1e636eabee9b0dffd287a1e6784a24ab2259f51f
Summary: This allows us to define methods on the treespans, such as filtering APIs.
Reviewed By: DurhamG
Differential Revision: D19782896
fbshipit-source-id: 2e7bd8344c0196e382728c26a8233abf944bbf29
Summary: The Thrift generated code depends only on futures 0.3, not 0.1. Thus it isn't necessary to depend on renamed:futures-preview and we can depend on futures-preview directly, which is exposed to Rust code as `futures::`.
Reviewed By: jsgf
Differential Revision: D20145921
fbshipit-source-id: 5cae94ec6747a374c2bf05f124ab237c798de005
Summary: It was a list. Make it possible to use it as a string.
Reviewed By: xavierd
Differential Revision: D20144811
fbshipit-source-id: b280c0344215a4c23ab9c63d89f47adf34fb06f3
Summary: This should help reduce test flakiness.
Reviewed By: xavierd
Differential Revision: D19872952
fbshipit-source-id: d66f6c404534b3f47903b478e3cdfdda5ed46284
Summary:
The state entry of a dirstate tuple is a single character. In Python 3
it's a unicode string. To parse it, we previously used 'C', which takes a
single-character unicode string and (little did I know) returns an int. We were
storing this in a char, which causes corruption.
Let's switch to reading the string and just grabbing the first byte.
Reviewed By: xavierd
Differential Revision: D20143094
fbshipit-source-id: d9946c0cefdafe0941f4bdac070659fac27f30e3
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
Reviewed By: DurhamG
Differential Revision: D20102889
fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
Summary:
Python json produces unicode strings in the parsed results. This breaks
when passed to parts of the code that now assert that byte strings are required
(like the wire protocol). Let's switch phabricator stuff to use Mercurial json,
which produces bytes in Python 2 and unicode in Python 3.
Reviewed By: ikostia
Differential Revision: D20123140
fbshipit-source-id: d1b11426736a0f43ff7e74acf709ab1fd70d5bfe
Summary:
The NameSet is something similar to SpanSet and Mercurial's smartset but speaks
VertexNames instead of Ids. The idea is, NameSet will be part of NameDag APIs,
and potentially replace Mercurial's smartset layer (just smartset the container
types, not the revset language), in a way that revision numbers are completely
hidden behind the scenes.
This diff adds some basic abstraction around iteration-related operations.
Other operations will be added later.
Reviewed By: sfilipco
Differential Revision: D19912109
fbshipit-source-id: 504a26c074282ec51f260535ca63e943124f688e
Summary: EdenFS is planning to throw an error if a user requests a checkout while a checkout is already in progress. Often, this is already prevented by a Mercurial repository lock, but there are instances where these calls can still get through. We do not want these calls to queue up, so EdenFS will throw an `EdenError` instead. Without this handling, a full stack trace is printed, so this just makes the error a bit prettier for the user.
Reviewed By: simpkins
Differential Revision: D20106480
fbshipit-source-id: e33df3d0b7aa42867ee752e4c1f3a47b31ade76b
Summary:
The ssh output order issue is a large contributor to test flakiness.
Example test failures are:
```
--- test-unbundlereplay.t
+++ test-unbundlereplay.t.respondfully.err
@@ -154,9 +154,9 @@
remote: [ReplayVerification] Expected: (master_bookmark, c2e526aacb5100b7c1ddb9b711d2e012e6c
69cda). Actual: (master_bookmark, 893d83f11bf81ce2b895a93d51638d4049d56ce2)
remote: pushkey-abort: prepushkey hook exited with status 1
remote: transaction abort!
+ replay failed: error:pushkey
+ unbundle replay batch item #0 failed
remote: rollback completed
- replay failed: error:pushkey
- unbundle replay batch item #0 failed
[1]
$ cat $TESTTMP/reports.txt
unbundle replay batch item #0 failed
--- test-commitcloud-backup-all.t
+++ test-commitcloud-backup-all.t.err
@@ -59,9 +59,9 @@
remote: pushing 1 commit:
remote: eccc11f58a56 D3
backing up stack rooted at 42952ab62cec
+ backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 42952ab62cec E1
- backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 4903fdffd9c6 E2
commitcloud: backed up 8 commits
test-fb-hgext-lfspushrebase-verify-blobs.t
--- test-fb-hgext-treemanifest-pushrebase.t
+++ test-fb-hgext-treemanifest-pushrebase.t.err
@@ -127,9 +127,9 @@
$ hg push --to master -B master --config treemanifest.sendtrees=True
pushing to ssh://user@dummy/master
searching for changes
- remote: baz
remote: prepushrebase.cat hook exited with status 1
abort: push failed on remote
+ remote: baz
[255]
- Disable the hook
```
The order is nondeterministic because the stderr reading thread can read the
content before or after ui.write or ui.write_err in the main thread.
This diff introduces an optional feature in dummyssh that buffers all stderr
output and only writes it after the wrapped hg serve process has exited, at
which point the hg client should also have completed its operations and have no
reason to ui.write or ui.write_err anything nondeterministically. Then the
dummyssh wrapper writes out the buffered stderr so the output order becomes
well defined.
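A minimal sketch of the buffering idea, not the actual dummyssh change: `run_with_buffered_stderr` and its parameters are hypothetical names, and stdin/stdout are simply inherited by the child here.

```python
import subprocess
import sys
import threading


def run_with_buffered_stderr(cmd, err_out):
    """Run cmd, hold back its stderr, and write it to err_out after exit."""
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    chunks = []

    def drain():
        # Read until EOF so the child never blocks on a full pipe.
        for line in proc.stderr:
            chunks.append(line)

    reader = threading.Thread(target=drain)
    reader.start()
    returncode = proc.wait()
    reader.join()
    proc.stderr.close()
    # Only now, after the child has exited, emit the buffered output.
    err_out.write(b"".join(chunks))
    return returncode


if __name__ == "__main__":
    child = [sys.executable, "-c", "import sys; sys.stderr.write('remote: hi\\n')"]
    sys.exit(run_with_buffered_stderr(child, sys.stderr.buffer))
```

Nothing from the child's stderr reaches `err_out` until the child has exited, so it cannot interleave with output written in the meantime.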
Reviewed By: xavierd
Differential Revision: D19872612
fbshipit-source-id: 84710f98a8e6b4a1c283ffecf008585cca12be0a
Summary: This makes the next change easier to see.
Reviewed By: xavierd
Differential Revision: D19872609
fbshipit-source-id: 9263a246258ffd18d8d883da7ced435a91fb5ced
Summary:
Right now, all of our manifest parsing and evaluation is in the repo() class, but this is a design mistake. Over a repo's convert lifetime, a single repo will have many different manifests, based on branch, and location in the commit history. What's worse is that the current design makes it hard to build unit tests and new features like include evaluation.
This commit creates a whole new class called repomanifest, that represents a specific manifest (and its included files). It also has unit tests to test the various operations that the manifest performs, such as path and revision mapping. This commit does not modify the existing converter code outside of the class to use this new implementation.
Reviewed By: tchebb
Differential Revision: D19402995
fbshipit-source-id: b97dadcc595c6332f4495460618317194873a780
Summary:
In the past I saw test breakages where the stderr from the remote ssh process
becomes incomplete. It's hard to reproduce by running the tests directly.
But inserting a sleep in the background stderr thread exposes it trivially:
```
# sshpeer.py:class threadedstderr
def run(self):
# type: () -> None
while not self._stop:
buf = self._stderr.readline()
+ import time
+ time.sleep(5)
if len(buf) == 0:
break
```
Example test breakage:
```
--- a/test-commitcloud-sync.t
+++ b/test-commitcloud-sync.t.err
@@ -167,8 +167,7 @@ Make a commit in the first client, and sync it
$ hg cloud sync
commitcloud: synchronizing 'server' with 'user/test/default'
backing up stack rooted at fa5d62c46fd7
remote: pushing 1 commit:
- remote: fa5d62c46fd7 commit1
commitcloud: commits synchronized
finished in * (glob)
....
```
Upon investigation it's caused by 2 factors:
- The connection pool calls pipee.close() before pipeo.close(), to work around
an issue that I suspect was solved by D19794281.
- The new threaded stderr (pipee)'s close() method does not actually close the
pipe immediately. Instead, it limits the text to read to one more line at
most, which causes those incomplete messages.
This diff made the following changes:
- Remove the `pipee.close` workaround in connectionpool.
- Remove `pipee.close`. Embed it in `pipee.join` to prevent misuses.
- Add detailed comments in sshpeer.py for the subtle behaviors.
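A simplified sketch of the "no separate close, only join" shape. This is not the real sshpeer.py code: the class below, its constructor arguments and `demo()` are hypothetical and only model the ordering.

```python
import os
import threading


class ThreadedStderr:
    """Forward lines from a pipe on a background thread (simplified sketch)."""

    def __init__(self, stderr, sink):
        self._stderr = stderr  # binary file object to read from
        self._sink = sink      # list that receives each forwarded line
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Read until EOF. No "stop" flag exists that could cut output short.
        for line in self._stderr:
            self._sink.append(line)

    def join(self, timeout=None):
        # Wait for EOF first, then close. No public close() is exposed, so
        # callers cannot close the pipe before it has been drained.
        self._thread.join(timeout)
        if not self._thread.is_alive():
            self._stderr.close()


def demo():
    read_fd, write_fd = os.pipe()
    sink = []
    reader = ThreadedStderr(os.fdopen(read_fd, "rb"), sink)
    reader.start()
    os.write(write_fd, b"remote: line 1\nremote: line 2\n")
    os.close(write_fd)  # the writing side finishes and closes its end
    reader.join()
    return sink
```

Because the reader only stops at EOF, every line written before the writer closed its end is delivered.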
Reviewed By: xavierd
Differential Revision: D19872610
fbshipit-source-id: 4b61ef8f9db81c6c347ac4a634e41dec544c05d0
Summary:
This makes `peer.close()` actually close the ssh connection if it's an
sshpeer. This affects the `clone` path to actually clean up the ssh connection
so we don't depend on (fragile) `__del__`.
I traced the code back to peerrepository.close in 2011 [1]. At that time it
seems the codebase depended on `__del__`. Nowadays the codebase calls `close()`
properly so I think it's reasonable to make the change.
[1]: https://www.mercurial-scm.org/repo/hg/rev/d747774ca9da.
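To illustrate the difference between an explicit `close()` and relying on `__del__`, here is a generic Python sketch. `FakePeer` and `clone_like_operation` are hypothetical stand-ins, not Mercurial classes.

```python
import contextlib


class FakePeer:
    """Stand-in for a peer that owns a connection (hypothetical)."""

    def __init__(self):
        self.connected = True

    def close(self):
        # Idempotent: safe to call more than once.
        self.connected = False


def clone_like_operation():
    peer = FakePeer()
    # closing() guarantees close() runs when the block exits, even on an
    # exception, instead of waiting for the garbage collector and __del__.
    with contextlib.closing(peer):
        assert peer.connected
    return peer
```

After `clone_like_operation()` returns, the peer is already disconnected, regardless of when the object is collected.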
Reviewed By: ikostia
Differential Revision: D19911393
fbshipit-source-id: ea640d1cd82ffcb786e22f47da8116c7f50a4690
Summary:
The added function can be used by extensions to run extra logic before the
"clone" function closes the repos or peers.
This is needed to make the next diff work. Otherwise extensions like remotenames will try to write to a closed sshpeer and cause errors.
Reviewed By: DurhamG
Differential Revision: D19911390
fbshipit-source-id: ca1364e808cebb632e051fbbdcfe4bf0dca721bc
Summary:
Update the `print_status()` function to take a `clidispatch::io::IO` object as
a parameter, instead of a simple output object. This will allow us to also
print error messages from this function in a future diff.
Reviewed By: quark-zju
Differential Revision: D19958504
fbshipit-source-id: bf482fdc4420e1350363a730c6a539cd760aef25
Summary: Updates the C code to support unicode filenames and states.
Reviewed By: simpkins
Differential Revision: D19786275
fbshipit-source-id: e7aeb029b792818b1b1a9c5d3028640b56522235
Summary: There is no need to open a transaction otherwise.
Reviewed By: DurhamG
Differential Revision: D20109840
fbshipit-source-id: e47adaaeea2d7565f3629701d8de4a67d4b55182
Summary:
Verifying the changelog is quite slow and we've had more users needing
to run hg recover these days. Let's finally get rid of the verify step.
Reviewed By: simpkins
Differential Revision: D20109706
fbshipit-source-id: a512d9e11716514bce986b0e3a26347fe6afd955
Summary: Most of the fixes are related to encoding in `patch.py`.
Reviewed By: DurhamG
Differential Revision: D19713378
fbshipit-source-id: 66ccbd0fc7826ab2d4c05173c7e9edb96700d106
Summary: As I work, it's getting harder and harder to keep my multiple changes from introducing merge conflicts between different branches. We need to break out the repo_source's implementation into a bunch of different files to make it easier to keep things separate.
Reviewed By: zhonglowu, tchebb
Differential Revision: D20015946
fbshipit-source-id: bf954ac581e5ca9e43c091b6b1b4c539c14471f2
Summary:
Fix the PathRelativizer APIs to accept `Path` and even `str` arguments instead
of just `PathBuf`. The old code required a `PathBuf`, which often forced
callers to make a copy of the path data.
Reviewed By: quark-zju
Differential Revision: D19958505
fbshipit-source-id: 6fa40dd4b75df4e3faf9ad2ae4f0e4e6595669f6
Summary:
bytes 0.5 is a dependency of newer tokio; it's also newer, and thus better.
Staying on 0.4 means that copies between Bytes 0.4 and 0.5 need to be made.
This would be especially bad in the LFS code, since 10+MB buffers would have to
be copied...
One main API change is for the configparser. The code used to take Into<Bytes>
for the keys, I switched it to AsRef<[u8]>.
For hg_memcache_client, an extra copy is performed to build a Delta. Since this
code uses an old tokio and is being replaced right now, the effort of
switching to a new tokio and new bytes was not deemed worth it. The copy will
do for now.
Reviewed By: dtolnay
Differential Revision: D20043137
fbshipit-source-id: 395bfc3749a3b1bdfea652262019ac6a086e61e0
Summary:
`treedirstatemap._repacked` is sometimes set in write(), but does not appear
to be used anywhere. Remove it. (I noticed this since Pyre complains about
it if you enable type checking for `write()`)
Reviewed By: xavierd
Differential Revision: D19958219
fbshipit-source-id: a55e237865160191d814ed950f69c3113bec4f64
Summary:
Add type annotations for the propertycache type.
Unfortunately at the moment Pyre still can't properly type check code that
uses this class, as it does not understand the special `__get__()` method.
It looks like support for this is hopefully coming in D19206575.
Reviewed By: xavierd
Differential Revision: D19958223
fbshipit-source-id: 0f8f15fc6935ec3feaef41d3be373a85225276fe