Summary:
All of the callers are already using an Arc, so instead of forcing the remote
store to be cloneable, and thus wrap an inner self with an Arc, let's just pass
self as an Arc.
Reviewed By: DurhamG
Differential Revision: D20715580
fbshipit-source-id: 1bef23ae7da7b314d99cb3436a94d04134f1c0e4
Summary:
When LFS will be enabled on fbsource, the enablement will rolled out server,
with the server serving pointers (or not). In the catastrophic scenario where
Mononoke has to be rolled out, the Mononoke LFS server will be unable to serve
blobs, but some clients may still have LFS pointers locally but not the
corresponding blob. For this, we need to be able to fallback to fetching the
blob via the getpackv2 protocol.
Reviewed By: DurhamG
Differential Revision: D20662667
fbshipit-source-id: 4ac45558f6d205cbd1db33c21c6fb137a81bdbd5
Summary:
One of the main drawback of the current version of repack is that it writes
back the data to a packfile, making it hard to change file format. Currently, 2
file format changes are ongoing: moving away from packfiles entirely, and
moving from having LFS pointers stored in the packfiles, to a separate storage.
While an ad-hoc solution could be designed for this purpose, repack can
fullfill this goal easily by simply writing to the ContentStore, the
configuration of the ContentStore will then decide where this data will
be written into.
The main drawback of this code is the unfortunate added duplication of code.
I'm sure there is a way to avoid it by having new traits, I decided against it
for now from a code readability point of view.
Reviewed By: DurhamG
Differential Revision: D20567118
fbshipit-source-id: d67282dae31db93739e50f8cc64f9ecce92d2d30
Summary: There is no need to pass the path twice, once is enough.
Reviewed By: DurhamG
Differential Revision: D20567117
fbshipit-source-id: 169a088aa7558b4c25f8fbdecfe59bdd0d7575ef
Summary:
While failures in the Rust updater aren't expected, at least one valid case
requires requires retrying the operation in Python: old-style LFS pointers.
When these are stored in packfiles/indexedlog, only the Python code knows how
to deal with them, and thus the operation needs to be retried there.
Reviewed By: DurhamG
Differential Revision: D20603709
fbshipit-source-id: 7d24ba573f0ff540906d909f1b4440fd4d3469a6
Summary:
The ContentStore cannot deserialize LFS pointers stored in packfiles, to avoid
potential damage, let's refuse to update LFS blobs. A proper solution will be
built in a separate diff.
Reviewed By: DurhamG
Differential Revision: D20576575
fbshipit-source-id: 4e4ce6a9432157e2ce69881c0079e943ea3f3acd
Summary:
On Unix, pretend that NTFS doesn't support symlinks. While this isn't
technically true, NTFS on Linux is only used to alleviate performance issues
with `hg update` on Windows. With the pyworker code, I'm expecting these
performance issues to disappear allowing this code to be removed.
Reviewed By: ikostia
Differential Revision: D20527976
fbshipit-source-id: 4194f4b5af065de2e293b41b9d03e9d4ab6ea006
Summary:
Move them into a VFS struct, a future step will move the VFS code into its own
crate.
Reviewed By: DurhamG
Differential Revision: D20527977
fbshipit-source-id: 3250b05840688db72e1c43c72ec6defbc7f20851
Summary:
In a strongly typed langage, using strings should be avoided whenever possible
as they do not provide the safety guarantees that types provide.
I took the liberty of removing all the filesystems that are not relevant for
Mercurial for simplification reasons. If needs arise, we can always add a new
FsType to the enum.
Reviewed By: DurhamG
Differential Revision: D20517138
fbshipit-source-id: 0a38b53c6a87f05f4b2d664038e10c4293de96ae
Summary:
According to the anyhow documentation[0], the behavior of `.to_string()` is to
only stringify the top-level errors, hiding all the context of the error.
Instead, the debug format allows all the context to be displayed, and, if
available the backtrace.
This should significantly help debug Rust errors when context is available,
which we should strive to have everywhere!
[0]: https://docs.rs/anyhow/1.0.27/anyhow/struct.Error.html#display-representations
Reviewed By: sfilipco
Differential Revision: D20575944
fbshipit-source-id: 2968d7fb755edec7f7e5151138e8049ded181c1b
Summary:
There isn't really much to annotate here, but this lets us eliminate a couple
`pyre-fixme` comments about not being able to find these modules during type
checkign.
Reviewed By: singhsrb
Differential Revision: D20550267
fbshipit-source-id: 271f8406890787c0613294a9047365fdebcdeda1
Summary:
This will enables the fast-path for comparing LFS blobs without reading the
entire blob.
Reviewed By: DurhamG
Differential Revision: D20504233
fbshipit-source-id: 446cec57fba77e02cc7070203bd759d341fc01ab
Summary: Let's enable it for tests. We'll slow roll it in production.
Reviewed By: quark-zju
Differential Revision: D19543790
fbshipit-source-id: be7d18dd8ffe51615a27c39ebf4247ec405b4097
Summary:
The mutation store stores entries with a floating-point timestamp. This
pattern was copied from obsmarkers.
However, Mercurial uses integer timestamps in the commit metadata (the
parser supports floats for historical reasons, but only stores integer
timestamps). Mononoke also uses integer timestamps in its `DateTime`
type.
To keep things simple, switch to using integer timestamps for mutation
entries. Existing entries with floating point timestamps are truncated.
Add a new entry format version that encodes the timestamp as an integer.
For now, continue to generate the old version so that old clients can
read entries created by new clients.
Reviewed By: quark-zju
Differential Revision: D20444366
fbshipit-source-id: 4d6d9851aacb314abea19b87c9d0130c47fdf512
Summary:
Tracking the origin of mutation entries did not prove useful, and just creates
an un-necessary overhead. Remove the tracking and repurpose the field as a
version field.
Reviewed By: quark-zju
Differential Revision: D20444365
fbshipit-source-id: 65ff11ee8cfe77d5e67a83d03a510541d58ef69b
Summary:
Clippy had 3 sources of warnings in this crate:
- from_str method not in impl FromStr. We still have 2 of them in path.rs, but
this is documented as not supported by the FromStr trait due to returning a
reference. Maybe we can find a different name?
- Use of mem::transmute while casts are sufficient. I find the cast to be
ugly, but they are simply safer as the compiler can do some type checking on
them.
- Unecessary lifetime parameters
Reviewed By: quark-zju
Differential Revision: D20452257
fbshipit-source-id: 94abd8d8cd76ff7af5e0bbfc97c1e106cdd142b0
Summary:
Purge needs to be able to see what directories the walker traversed, so
it can delete them if they are empty. Instead of having the walker call
match.traversedir (which it seems like a bizarre pattern to use the matcher as a
holder for a non-matching related function), let's have the walker return an
enum and have an option to return directories.
At the python layer we then translate this into match.traversedir calls, but we
can clean that up later.
Reviewed By: quark-zju
Differential Revision: D19543795
fbshipit-source-id: cc51c86c91799d3df2c65d25a7b6cfe810206d0a
Summary: Empty flags can be sent by Mercurial, do not error out in that case.
Reviewed By: quark-zju
Differential Revision: D20450124
fbshipit-source-id: c85af42be2afb95b09057583f6fec3a2a13d478a
Summary:
In some rare cases, quickcheck may give us 2 identical paths with different
Key, Mercurial will never do that, therefore reject this.
Reviewed By: quark-zju
Differential Revision: D20451344
fbshipit-source-id: 800b31f1aeea38322052baedb918c4f45a1ec95d
Summary:
This type can either be a Mercurial type key, or a content hash based key. Both
the prefetch and get_missing now can handle these properly. This is essential
for stores where data can either be fetched in both ways or when the data is
split in 2. For LFS for instance, it is possible to have the LFS pointer (via
getpackv2), but not the actual blob. In which case get_missing will simply
return the content hash version of the StoreKey, to signify what it actually
has missing.
Reviewed By: quark-zju
Differential Revision: D20445631
fbshipit-source-id: 06282f70214966cc96e805e9891f220b438c91a7
Summary:
Similarly to the DataStore trait, this makes it easier to understand that they
deal with a Mercurial type Key.
Reviewed By: quark-zju
Differential Revision: D20445621
fbshipit-source-id: a1143d5f5d6a2c8686d517a6ea3c25b07c0df072
Summary: This makes it clear that these traits are dealing with Mercurial Key.
Reviewed By: quark-zju
Differential Revision: D20445626
fbshipit-source-id: d5acbf442e9407b973e95e40af69b5a61bff0a4d
Summary:
Since configparser enforces utf-8 config files (because pest wants Rust strings),
let's migrate from Bytes to Text to remove extra encoding conversions.
Previously this was blocked by the lack of ref-counted text (since the "source"
of each config location is the entire config file). Now minibytes provides Text
so we can use it.
This unfortunately requires dependent code to be updated. The pyconfigparser
interface is in theory wrong - it shouldn't return utf-8 bytes but
local-encoded bytes. I think it's cleaner to make pyconfigparser unaware of
HGENCODING, so I changed pyconfigparser to use unicode, and add compatibility
layer in uiconfig.py.
This also fixes non-ascii encoding issues on user name (especially on Windows).
The hgrc config file should be in utf-8 and the config parser returns explicit
unicode types, and Python code round-trip them with local encodings.
Reviewed By: markbt
Differential Revision: D20432938
fbshipit-source-id: b1359429b8f1c133ab2d6b2deea6048377dfeca1
Summary:
Expose the "TreeSpans" type from tracing-collector so filtering can be done on
them.
Reviewed By: DurhamG
Differential Revision: D19797707
fbshipit-source-id: 56b63c8fb79865666ce923612dbd5a9cc512dce6
Summary: Expose the Rust fstype implementation so we can remove C and Python code.
Reviewed By: xavierd
Differential Revision: D20313390
fbshipit-source-id: 931b4f6bc8c4b81f30bfea06d055fae3c677eedd
Summary:
Thes test proved to be a bit flaky due to some invalid input from quickeck,
let's filter these cases and ask quickcheck for different input.
Reviewed By: quark-zju
Differential Revision: D20397604
fbshipit-source-id: cc97bef0e286009f1394c7aa2c0429d1e1b1c8c2
Summary:
The clients should use an Rc/Arc if they need the ability to clone it. This
makes it more obvious and reduces the number of pointer indirection.
Reviewed By: quark-zju
Differential Revision: D20376839
fbshipit-source-id: c56e7e8f89ab17727be621894c329e344a7f3adb
Summary:
No need for a closure when the function that will be called takes the same
arguments.
Reviewed By: quark-zju
Differential Revision: D20367309
fbshipit-source-id: 986299c4b691330bf17164abd70f3b49ed5263f8
Summary:
In some situations Windows may be unable to remove a file and Mercurial
cannot do anything about it. Instead of raising an error, just ignore it.
In the Python code, errors are ignored in multiple layers, but ultimately
batchremove ignore all OSError errors and just prints a warning about it. A
future diff will build a side channel for errors/warnings reporting, this will
be used to properly report this.
Reviewed By: quark-zju
Differential Revision: D20284803
fbshipit-source-id: 7df80f9e27e720ce819fb4116ea996fca390ad76
Summary:
For now it only checks for symlinks outside of the repo, but can be extended to
check for more.
Caching is used to prevent syscalls to already checked directories, this cache
is intended to be done per-thread, and thus each directory may be checked a
maximum of N, with N being the number of threads. This is deemed reasonable for
now.
Things that the Python pathauditor does that this one doesn't:
- Case checking for Windows,
- Verifies no writes to .hg,
- Windows shortnames aliases,
Reviewed By: quark-zju
Differential Revision: D20265276
fbshipit-source-id: 8e29bedadcaa6df2515e394894ab9e1dd5a9e9a7
Summary:
Symlinks are treated a bit differently from plain files, what is stored in the
ContentStore is the destination of the symlink, not it's content (well, the
content of a symlink really is it's destination).
For now, only unix platforms support symlinks, in reality this should be a
filesystem property as writing to ntfs-3g should have the same behavior as on
Windows.
For executable, we just need to mark the file as executable after writing to
it.
Reviewed By: quark-zju
Differential Revision: D20250943
fbshipit-source-id: 022dabc750125df32953a151df7da60db69b2cec
Summary:
During `hg update`, Mercurial forks multiple processes to write files on disk
concurrently, this is done as fetching blobs from the content store, and
writing them to disk is CPU bound. Usually, threads would be the preferred way
of speeding up such process, but unfortunately, Python has GIL that severely
limit the available concurrency. So, multiple processes were chosen.
Unfortunately, the multi-process solution also brings a lot of other issues,
more recently, we've had cases where the connections to the server and memcache
had to be dropped after the fork. In some other cases, this caused deadlocks.
And the solution is not effective on Windows.
Now that Mercurial is getting more and more Rust, we could instead go back to
the threads solution by using them in Rust, and have Python just push work to
them, this is exactly what this change does.
Things that are left to be done, but I wanted to get a diff out first:
- no file path audit
- no file backup
- no symlink creation
- probably other things I'm missing
Reviewed By: quark-zju
Differential Revision: D20102888
fbshipit-source-id: d47829fd7818b97710586b9851880f178048e27b
Summary:
Previously env_logger is only initialized if Python is initialized.
This diff makes env_logger initialized for Rust native commands.
Reviewed By: xavierd
Differential Revision: D20286517
fbshipit-source-id: 18fee96c2b41db1da9648d615d1e18809de90a63
Summary:
The `lines` renderer doesn't work if the output encoding doesn't support the
curved line drawing characters. In this case we should fall back to
`lines-square`.
Rename `lines` to `lines-curved`, and change `lines` to pick the best renderer
to use based on what is possible with the current output encoding.
Reviewed By: quark-zju
Differential Revision: D20248022
fbshipit-source-id: dfaf359426528a9cb515fb3e1d366fbfb15162ff
Summary: This makes it friendly to Python 2.
Reviewed By: sfilipco
Differential Revision: D20162233
fbshipit-source-id: 5beb7a0f52159afc454332ff6e37e13087177cc0
Summary:
`parentfunc` is only needed when adding new nodes to the DAG.
Move it to `addheads` methods instead.
Reviewed By: sfilipco
Differential Revision: D20155398
fbshipit-source-id: 0bddd5f46e84c44891928b9f598a38206917aecb
Summary:
This makes it so that DAG calculations in NameDag are all using commit hashes.
The `id2node`, `node2id` APIs are still using integer ids, and hopefully their
usage can eventually be removed.
Reviewed By: sfilipco
Differential Revision: D20020527
fbshipit-source-id: ee32b1ccacabd5174ff1556e426b5ed32d2b8507
Summary:
This exposes the NameSet type to the Python world.
The code is similar to the SpanSet wrapper that exists in pydag.
Reviewed By: sfilipco
Differential Revision: D20020521
fbshipit-source-id: 840e009eadca7154f11ca61561da4c48022088f6
Summary:
Mercurial has a special case that b'\0' * 20 maps to rev -1 and means
"an empty commit". This cannot be cleanly supported by the zstore commit data,
since sha1("") is not '\0' * 20 and zstore does not allow faked SHA1 keys.
Therefore let's add the special case in the bindings layer. It's possible to
do this check in Python, but that'll be slower.
Reviewed By: sfilipco
Differential Revision: D20020520
fbshipit-source-id: 0686832666646f2e201035992e3951b47c32eb5a
Summary: Use the new NameDag as the backing structure and expose its APIs.
Reviewed By: sfilipco
Differential Revision: D20020528
fbshipit-source-id: ccb49e1a5e757bd35a3f71cfb54ceccfb544664e
Summary:
TreeSpans used to use `&str`, which adds a lifetime to the struct, making it
harder to be used in the Python land. Use a type parameter so TreeSpans<String>
can be used.
Reviewed By: DurhamG
Differential Revision: D19797708
fbshipit-source-id: c66429abfaf16d876151ca6f29da976bed91485d
Summary:
The bytes 0.5 is a depencency of newer tokio, it's also newer, and thus better.
Staying on 0.4 means that copies between Bytes 0.4 and 0.5 need to be done,
this will be especially bad in the LFS code since 10+MB buffer will have to be
copied...
One main API change is for the configparser. The code used to take Into<Bytes>
for the keys, I switched it to AsRef<[u8]>.
For hg_memcache_client, an extra copy is performed to build a Delta, since this
code uses an old tokio, and is being replaced right now, the effort of
switching to a new tokio and new bytes was not deemed worth it, the copy will
do for now.
Reviewed By: dtolnay
Differential Revision: D20043137
fbshipit-source-id: 395bfc3749a3b1bdfea652262019ac6a086e61e0
Summary:
This module previously used to handle deciding how a particular module should
be imported if it had multiple versions (e.g., pure Python or native).
However, as of D18819680 it was changed to always import the native C version.
Lets go ahead and remove it entirely now. Using `policy.importmod` simply
makes it harder for type checkers to figure out the actual module that will be
used.
The only functionality that `policy.importmod()` still provided was verifying
that the module contained a "version" field that looked like what was
expected. In practice these version numbers are not bumped often, so this
doesn't really seem to provide much value in checking that we imported the
correct version that we expected to be shipped with this release.
Reviewed By: xavierd
Differential Revision: D19958227
fbshipit-source-id: 05f1d027d0a41cf99c4aa93cb84a51e830305077
Summary:
Error strings were being converted to unicode if they contained certain
characters. This caused python 2 Mercurial to throw various errors when it tried
to turn them into strings to report errors.
Let's return cpython_ext::Str instead of String.
Reviewed By: sfilipco
Differential Revision: D20012188
fbshipit-source-id: af6fa7d98d68e3c188292e4972cfc1bdb758dbdf
Summary:
With the Arc embedded into the store themselves, this forces a second
allocation in order to use them as trait objects. Since in most cases, we do
not want the stores themselves to be cloneable, we can move the Arc outside and
thus reduce the number of pointer indirection.
Reviewed By: DurhamG
Differential Revision: D19867568
fbshipit-source-id: 9cd126831fe2b9ee715472ac3299b7a09df95fce
Summary:
There was a spot where we returned bytes for a filepath. Fix this to
make dirstate tests pass more.
Reviewed By: quark-zju
Differential Revision: D19786274
fbshipit-source-id: 7465cae8bb2e3be7758abc6279ed3f5f59581732