Summary:
The LFS server might be temporarily having issues, so let's retry a bit before
giving up.
Reviewed By: DurhamG
Differential Revision: D20686659
fbshipit-source-id: 90dabd19e45a681d6eae5cd50c72b635d44c0517
Summary:
Since we have all the integer types, let's also allow float types in the
config.
Reviewed By: kulshrax
Differential Revision: D20697007
fbshipit-source-id: 21fa264d24c0f63c233f47c3bcfb2448b4c05c70
Summary:
When repacking for the purpose of file format changes, a single packfile may
contain data that needs to be moved out of it, so we need to repack even in
that case.
Reviewed By: DurhamG
Differential Revision: D20677442
fbshipit-source-id: c621dd2e657f5a4565b37d4b029731415b899117
Summary:
Remotestores can implement get_missing properly by simply querying the
underlying store that they will be writing to. This may prevent double fetching
some blobs in `hg prefetch` that we already have.
Reviewed By: DurhamG
Differential Revision: D20662668
fbshipit-source-id: 22140b5b7200c687e0ec723dd8879dc8fbea6fb9
Summary:
There are cases where the user of the abstraction needs to know if this is a
local store. Exposing that will simplify the caller code.
Reviewed By: DurhamG
Differential Revision: D20662666
fbshipit-source-id: e0bde7eb0dc3484979732a7c4cdf888fedc70e13
Summary:
By regularly flushing the blob store, we avoid keeping too many LFS blobs in
memory, which could cause OOM issues.
The default size is chosen to be 1GB, but is configurable for more control.
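A minimal sketch of the size-triggered flush described above. `BufferedBlobStore`, `add`, and `flush` are hypothetical names, and a counter stands in for the on-disk store; only the 1GB default comes from the summary.

```rust
/// Hypothetical in-memory buffer that flushes once it holds too many bytes.
struct BufferedBlobStore {
    pending: Vec<Vec<u8>>,
    pending_bytes: usize,
    max_pending_bytes: usize,
    /// Stand-in for the on-disk store: counts how many blobs were flushed.
    flushed_blobs: usize,
}

impl BufferedBlobStore {
    /// Default threshold mentioned in the summary (1GB).
    const DEFAULT_MAX_PENDING_BYTES: usize = 1 << 30;

    fn new(max_pending_bytes: usize) -> Self {
        Self {
            pending: Vec::new(),
            pending_bytes: 0,
            max_pending_bytes,
            flushed_blobs: 0,
        }
    }

    /// Buffer a blob and flush as soon as the threshold is reached.
    fn add(&mut self, blob: Vec<u8>) {
        self.pending_bytes += blob.len();
        self.pending.push(blob);
        if self.pending_bytes >= self.max_pending_bytes {
            self.flush();
        }
    }

    /// Pretend to write everything out, then release the memory.
    fn flush(&mut self) {
        self.flushed_blobs += self.pending.len();
        self.pending.clear();
        self.pending_bytes = 0;
    }
}
```

A caller would typically construct it with `BufferedBlobStore::new(BufferedBlobStore::DEFAULT_MAX_PENDING_BYTES)` unless the config overrides the size.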
Reviewed By: DurhamG
Differential Revision: D20646213
fbshipit-source-id: 12c06fd0212ef3974bea10c82026b6e74fb5bf21
Summary:
In the legacy lfs extension, LFS blobs were stored as loosefiles on disk, and
as we saw with loosefiles for remotefilelog, they can incur a significant
overhead to maintain. Due to LFS blobs being large by definition, the number of
loose LFS blobs should be reasonable for repack to walk over all of them to
choose which one to throw away.
A different approach would be to simply store the blobs in an on-disk format
that allows automatic size management, and simple indexing. That format is an
IndexedLog. This of course doesn't come without drawbacks, the main one being
that the IndexedLog API mandates that the full blob is present on insertion,
preventing streaming writes to it. The solution is to simply chunk the blobs
before writing them to it. While proper streaming is not done just yet, the
storage format no longer prevents it from being implemented.
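The chunking idea can be sketched as follows. This is an illustration only: `chunk_blob` and `reassemble` are hypothetical helpers, and the real chunk size and on-disk entry layout are not stated in the summary.

```rust
/// Split a blob into fixed-size chunks, each tagged with its byte offset,
/// so that every chunk can be inserted whole.
/// `chunk_size` must be non-zero (`slice::chunks` panics on 0).
fn chunk_blob(blob: &[u8], chunk_size: usize) -> Vec<(usize, Vec<u8>)> {
    blob.chunks(chunk_size)
        .enumerate()
        .map(|(i, chunk)| (i * chunk_size, chunk.to_vec()))
        .collect()
}

/// Rebuild the blob from its chunks; the offsets make the order recoverable.
fn reassemble(mut chunks: Vec<(usize, Vec<u8>)>) -> Vec<u8> {
    chunks.sort_by_key(|(offset, _)| *offset);
    chunks.into_iter().flat_map(|(_, data)| data).collect()
}
```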
Reviewed By: DurhamG
Differential Revision: D20633783
fbshipit-source-id: 37a88331e747cf22511aa348da2d30edfa481a60
Summary:
RotateLog loads older logs lazily. If an older log is broken, remember that and avoid
loading the broken log again.
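A rough sketch of "load lazily, remember failures". `Slot` and `LazyLogs` are hypothetical names and do not reflect RotateLog's actual fields; the loader is passed in as a closure to keep the example self-contained.

```rust
/// State of one lazily loaded older log.
enum Slot<T> {
    NotLoaded,
    Loaded(T),
    Broken,
}

struct LazyLogs<T> {
    slots: Vec<Slot<T>>,
    /// Counts how often the loader actually ran.
    load_attempts: usize,
}

impl<T> LazyLogs<T> {
    fn new(count: usize) -> Self {
        Self {
            slots: (0..count).map(|_| Slot::NotLoaded).collect(),
            load_attempts: 0,
        }
    }

    /// Load log `index` on first use. A failed load is recorded as `Broken`,
    /// so later calls return `None` without trying to load it again.
    fn get(
        &mut self,
        index: usize,
        load: impl FnOnce(usize) -> Result<T, String>,
    ) -> Option<&T> {
        if matches!(self.slots[index], Slot::NotLoaded) {
            self.load_attempts += 1;
            self.slots[index] = match load(index) {
                Ok(log) => Slot::Loaded(log),
                Err(_) => Slot::Broken,
            };
        }
        match &self.slots[index] {
            Slot::Loaded(log) => Some(log),
            _ => None,
        }
    }
}
```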
Reviewed By: DurhamG
Differential Revision: D20663899
fbshipit-source-id: 7a4b5279cc6387c19329a51048bfe1be2e0bc1f8
Summary:
Due to the Mononoke LFS server only being available on FB's network, the tests
using them cannot run outside of FB, including in the github workflows.
Reviewed By: quark-zju
Differential Revision: D20698062
fbshipit-source-id: f780c35665cf8dc314d1f20a637ed615705fd6cf
Summary:
The IdDag provides graph algorithms using Segments.
The IdMap allows converting from the SegmentedChangelogId domain to the
ChangesetId domain.
The Dag struct wraps IdDag and IdMap in order to provide graph algorithms using
the common application level identifiers for commits (ChangesetId).
The construction of the Dag is currently mocked with something that can only be
used in a test environment (unit tests but also integration tests).
This diff also implements a location_to_name function. This is the most
important new functionality that segmented changelog clients require. It
recovers the hash of a commit for which the client only has a segmented
changelog Id. The current assumption is that clients have identifiers for all
merge commit parents so the path to a known commit always follows a set
of first parents.
The IdMap queries will have to be changed to async in the future, but IdDag
queries we expect to stay sync.
Reviewed By: quark-zju
Differential Revision: D20635577
fbshipit-source-id: 4f9bd8dd4a5bd9b0de55f51086f3434ff507963c
Summary: The interesting observation is that InProcessStore is not public.
Reviewed By: quark-zju
Differential Revision: D20635578
fbshipit-source-id: a0149929c8059ff77f047fd385bf3b26dc738dfd
Summary:
One of the main drawbacks of the current version of repack is that it writes
back the data to a packfile, making it hard to change file format. Currently, 2
file format changes are ongoing: moving away from packfiles entirely, and
moving from having LFS pointers stored in the packfiles, to a separate storage.
While an ad-hoc solution could be designed for this purpose, repack can
fulfill this goal easily by simply writing to the ContentStore. The
configuration of the ContentStore will then decide where this data will
be written.
The main drawback of this code is the unfortunate added duplication of code.
I'm sure there is a way to avoid it by having new traits, but I decided against it
for now from a code readability point of view.
Reviewed By: DurhamG
Differential Revision: D20567118
fbshipit-source-id: d67282dae31db93739e50f8cc64f9ecce92d2d30
Summary:
While the primary (for now) way of addressing an LFS blob is via its sha256,
being able to address them via different hash schemes (sha1 for Eden/Buck,
blake2, etc) will be helpful down the line. Thus, let's store a HashMap of
ContentHash in the pointer store.
Reviewed By: DurhamG
Differential Revision: D20560197
fbshipit-source-id: 8bdc4fc4cd7fc19c7eed6a27d11953c4eedf9195
Summary: No locking is required for this one due to being loose files on disk.
Reviewed By: DurhamG
Differential Revision: D20522890
fbshipit-source-id: 72b7ebc063060a89f54976a1128977a3b7501053
Summary:
Instead of having the magic number 0x2000 all over the place, let's move the
logic to this method.
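As an illustration of replacing a scattered magic number with one method, here is a sketch that assumes 0x2000 is a bit flag stored in an optional `flags` field. The struct, constant, and method names are hypothetical; the summary does not name them.

```rust
/// Hypothetical metadata record carrying optional flags.
struct Metadata {
    flags: Option<u64>,
}

impl Metadata {
    /// The magic number now lives in exactly one place.
    const FLAG: u64 = 0x2000;

    /// True when the flag bit is set; callers no longer test 0x2000 themselves.
    fn has_flag(&self) -> bool {
        self.flags.map_or(false, |flags| (flags & Self::FLAG) != 0)
    }
}
```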
Reviewed By: DurhamG
Differential Revision: D20637749
fbshipit-source-id: bf666f8787e37e6d6c58ad8982a5679b7e3e717b
Summary:
`iter_segments_with_parent` has a few more conditions attached to it than the
name would imply. We are renaming it to give a better sense of its true
behavior.
Reviewed By: quark-zju
Differential Revision: D20547631
fbshipit-source-id: 406f46b9de5efc9e8e6a8c4bc22ab18fa5bc54bb
Summary:
The main question I had while writing the tests was whether we expect a
specific order for Segments for `iter_segments_with_parent`. `InProcessStore`
will return the segments in the order that they were inserted.
Reviewed By: quark-zju
Differential Revision: D20501401
fbshipit-source-id: 48ceb78f3191c7425c1488a3392cf3167f7e7268
Summary:
First 6 methods implemented from the IdDagStore trait for the InProcessStore.
Any suggestions welcome.
Reviewed By: quark-zju
Differential Revision: D20499228
fbshipit-source-id: cb536a3a0136077ada78934d82a25d079a5bc809
Summary:
Replace `rust-crypto` with `hex`, `sha-1`, `sha2`.
- `crypto::sha1::Sha1` with `sha1::Sha1`
- `crypto::sha2::Sha2` with `sha2::Sha2`
- `crypto::digest::Digest` with `sha1::Digest` and `sha2::Digest`
- `.result_str()` with `hex::encode` and `.result()`
Reviewed By: jsgf
Differential Revision: D20588313
fbshipit-source-id: 75c4342e8b6285f0f960f864c21457a1a0808f64
Summary:
In a strongly typed language, using strings should be avoided whenever possible
as they do not provide the safety guarantees that types provide.
I took the liberty of removing all the filesystems that are not relevant for
Mercurial for simplification reasons. If needs arise, we can always add a new
FsType to the enum.
Reviewed By: DurhamG
Differential Revision: D20517138
fbshipit-source-id: 0a38b53c6a87f05f4b2d664038e10c4293de96ae
Summary:
Replace `rust-crypto` with `sha-1`:
- `crypto::digest::Digest` with `sha1::Digest`
- `crypto::Sha1` with `sha1::Sha1`
The interface changes slightly - no need to pass a mutable byte array when
getting the result.
Reviewed By: jsgf
Differential Revision: D20587638
fbshipit-source-id: c6c737f3f8eba94b98c728e198eb4fac12c5c80b
Summary:
Swap out `rust-crypto` for `sha-1`
- `crypto::sha1::Sha1` is replaced by `sha1::Sha1`
- `crypto::digest::Digest` is replaced by `digest::Digest`
Reviewed By: jsgf
Differential Revision: D20587685
fbshipit-source-id: 971fdaa8ce5b3e9e60db219131f6c36dcbc213d9
Summary:
Switched out the `sha` package for the `rust-crypto` package. The
APIs aren't an exact match, so I had to insert a clone in place of
a modification to a mutable reference.
Reviewed By: jsgf
Differential Revision: D20585336
fbshipit-source-id: 22245157aea1115ae6f225b17b0346f0696653f7
Summary:
According to the anyhow documentation[0], the behavior of `.to_string()` is to
only stringify the top-level errors, hiding all the context of the error.
Instead, the debug format allows all the context to be displayed, and, if
available, the backtrace.
This should significantly help debug Rust errors when context is available,
which we should strive to have everywhere!
[0]: https://docs.rs/anyhow/1.0.27/anyhow/struct.Error.html#display-representations
Reviewed By: sfilipco
Differential Revision: D20575944
fbshipit-source-id: 2968d7fb755edec7f7e5151138e8049ded181c1b
Summary: The signatures were used by the linter to warn if the files required regenerating. Since the linter now regenerates the files regardless of the signature, signing the files is no longer needed.
Reviewed By: krallin
Differential Revision: D20467745
fbshipit-source-id: aff2643f80939d5693e7a30abf07484c9060796f
Summary:
This is only intended for Mercurial .t tests and not in any production
environment.
Reviewed By: DurhamG
Differential Revision: D20504236
fbshipit-source-id: 618e17631b73afa650875cb7217ba7c55fb9f737
Summary:
For now, this is only used for LFS, as this is the only store that can
correctly answer both.
This API will be exposed to Python to be able to have cheap filectx comparison,
and other use cases.
Reviewed By: DurhamG
Differential Revision: D20504234
fbshipit-source-id: 0edb912ce479eb469d679b7df39ba80fceef05f2
Summary:
This enables fetching blobs from the LFS server. For now, this is limited to
fetching them, but the protocol also specifies ways to upload. That second part
will matter for commit cloud and when pushing code to the server.
One caveat to this code is that the LFS server is not mocked in tests, and thus
requests are done directly to the server. I chose very small blobs to limit the
disruption to the server. By setting a test-specific user-agent, we should be
able to monitor traffic due to tests and potentially rate limit it.
Reviewed By: DurhamG
Differential Revision: D20445628
fbshipit-source-id: beb3acb3f69dd27b54f8df7ccb95b04192deca30
Summary:
This is the start of migrating blackbox events to tracing events. The
motivation is to have a single data source for log processing (for simplicity)
and the tracing data seems a better fit, since it can represent a tree of
spans, instead of just a flat list. Eventually blackbox might be mostly
a wrapper for tracing data, with some minimal support for logging some indexed
events.
Reviewed By: DurhamG
Differential Revision: D19797710
fbshipit-source-id: 034f17fb5552242b60e759559a202fd26061f1f1
Summary:
Now that Segment has no lifetime, we can create it directly and return the ownership.
Performance of "building segments" does not seem to change:
# before
building segments 750.129 ms
# after
building segments 712.177 ms
Reviewed By: sfilipco
Differential Revision: D20505200
fbshipit-source-id: 2448814751ad1a754b90267e43262da072bf4a16
Summary:
This allows structures like BTreeMap to own and store Segment.
It was not possible until D19818714, which adds minibytes::Bytes interface for
indexedlog.
In theory this hurts performance a little bit. But the perf difference does not
seem visible by `cargo bench --bench dag_ops`:
# before
building segments 714.420 ms
ancestors 54.045 ms
children 490.386 ms
common_ancestors (spans) 2.579 s
descendants (small subset) 406.374 ms
gca_one (2 ids) 161.260 ms
gca_one (spans) 2.731 s
gca_all (2 ids) 287.857 ms
gca_all (spans) 2.799 s
heads 234.130 ms
heads_ancestors 39.383 ms
is_ancestor 113.847 ms
parents 251.604 ms
parent_ids 11.412 ms
range (2 ids) 117.037 ms
range (spans) 241.156 ms
roots 507.328 ms
# after
building segments 750.129 ms
ancestors 53.341 ms
children 515.607 ms
common_ancestors (spans) 2.664 s
descendants (small subset) 411.556 ms
gca_one (2 ids) 164.466 ms
gca_one (spans) 2.701 s
gca_all (2 ids) 290.516 ms
gca_all (spans) 2.801 s
heads 240.548 ms
heads_ancestors 39.625 ms
is_ancestor 115.735 ms
parents 239.353 ms
parent_ids 11.172 ms
range (2 ids) 115.483 ms
range (spans) 235.694 ms
roots 506.861 ms
Reviewed By: sfilipco
Differential Revision: D20505201
fbshipit-source-id: c34d48f0216fc5b20a1d348a75ace89ace7c080b
Summary:
The latter is what is now recommended, and no longer requires a macro to
initialize a lazy value, leading to nicer code.
Reviewed By: DurhamG
Differential Revision: D20491488
fbshipit-source-id: 2e0126c9c61d0885e5deee9dbf112a3cd64376d6
Summary:
Lots of different warnings on this one. Main ones were:
- One bug where .write was used instead of .write_all
- Using .next instead of .nth(0) for iterators
- Using .cloned() instead of .map(|x| x.clone())
- Using conditions as expressions instead of mut variables
- Using .to_vec() on slices instead of .iter().cloned().collect()
- Using .is_empty instead of comparing .len() against 0
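For illustration, the idioms from the list above in a few small standalone functions. These are made-up examples using only the standard library, not code from the crate that was changed.

```rust
use std::io::Write;

/// `write` may write only part of the buffer; `write_all` loops until
/// everything is written or an error occurs.
fn save(out: &mut impl Write, data: &[u8]) -> std::io::Result<()> {
    out.write_all(data)
}

fn first_even(values: &[u32]) -> Option<u32> {
    // `.cloned()` instead of `.map(|x| x.clone())`,
    // and `.next()` instead of `.nth(0)`.
    values.iter().filter(|v| **v % 2 == 0).cloned().next()
}

fn describe(values: &[u32]) -> &'static str {
    // `if` used as an expression instead of assigning to a `mut` variable,
    // and `.is_empty()` instead of `.len() == 0`.
    if values.is_empty() {
        "empty"
    } else {
        "non-empty"
    }
}

fn copy(values: &[u32]) -> Vec<u32> {
    // `.to_vec()` instead of `.iter().cloned().collect()`.
    values.to_vec()
}
```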
Reviewed By: DurhamG
Differential Revision: D20469894
fbshipit-source-id: 3666a44ad05e0fbfa68d490595703c022073af63
Summary:
These were from a wide variety of warnings. The only one I haven't addressed is
that clippy complains that Pin<Box<Vec<u8>>> can be replaced by Pin<Vec<u8>>. I
haven't investigated too much into it, someone more familiar with this code can
probably figure out if this is buggy or not :)
Reviewed By: DurhamG
Differential Revision: D20469647
fbshipit-source-id: d42891d95c1d21b625230234994ab49bbc45b961
Summary:
This belongs to D20149376. However buck test does not include benchmarks so it
was not noticed.
Reviewed By: DurhamG
Differential Revision: D20505097
fbshipit-source-id: 24daeb17b68808f8e69e18452ab2cf26c7aa10a7
Summary:
The mutation store stores entries with a floating-point timestamp. This
pattern was copied from obsmarkers.
However, Mercurial uses integer timestamps in the commit metadata (the
parser supports floats for historical reasons, but only stores integer
timestamps). Mononoke also uses integer timestamps in its `DateTime`
type.
To keep things simple, switch to using integer timestamps for mutation
entries. Existing entries with floating point timestamps are truncated.
Add a new entry format version that encodes the timestamp as an integer.
For now, continue to generate the old version so that old clients can
read entries created by new clients.
Reviewed By: quark-zju
Differential Revision: D20444366
fbshipit-source-id: 4d6d9851aacb314abea19b87c9d0130c47fdf512
Summary:
Tracking the origin of mutation entries did not prove useful, and just creates
an un-necessary overhead. Remove the tracking and repurpose the field as a
version field.
Reviewed By: quark-zju
Differential Revision: D20444365
fbshipit-source-id: 65ff11ee8cfe77d5e67a83d03a510541d58ef69b
Summary: Using ptr.add is shorter and preferred to ptr.offset.
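A small standalone illustration of `ptr.add` (the function below is a made-up example, not code from the diff):

```rust
/// Read the element at `index` through a raw pointer.
fn read_at(values: &[u32], index: usize) -> u32 {
    assert!(index < values.len());
    let base = values.as_ptr();
    // Safety: `index` is in bounds, as checked above.
    // `base.add(index)` is the shorter form of `base.offset(index as isize)`.
    unsafe { *base.add(index) }
}
```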
Reviewed By: quark-zju
Differential Revision: D20452752
fbshipit-source-id: 1dc2fdbc392267d2d690673c10dcc161ecd00dfa
Summary:
These warnings are fairly trivial: clippy recommends using a single-quoted char
instead of a double-quoted str when searching for a single character.
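For example (a made-up helper, not code from the diff), a single-character search written with a `char` pattern:

```rust
/// Split "key=value" at the first '='.
fn split_key_value(line: &str) -> Option<(&str, &str)> {
    // A `char` pattern ('=') rather than a one-character `&str` ("=").
    let index = line.find('=')?;
    Some((&line[..index], &line[index + 1..]))
}
```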
Reviewed By: quark-zju
Differential Revision: D20452408
fbshipit-source-id: b2951e133e57633a8e766536e22969fa9ac0ecee
Summary:
Clippy had 3 sources of warnings in this crate:
- from_str method not in impl FromStr. We still have 2 of them in path.rs, but
this is documented as not supported by the FromStr trait due to returning a
reference. Maybe we can find a different name?
- Use of mem::transmute while casts are sufficient. I find the casts to be
ugly, but they are simply safer as the compiler can do some type checking on
them.
- Unnecessary lifetime parameters
Reviewed By: quark-zju
Differential Revision: D20452257
fbshipit-source-id: 94abd8d8cd76ff7af5e0bbfc97c1e106cdd142b0
Summary:
Clippy complains about 3 things:
- Using raw pointers in a public function that is not declared as unsafe. This
happens for C exported ones, this feels like a warning, so I haven't changed
it.
- Using .map(...).unwrap_or(<default value constructed>). The recommendation
is to use .unwrap_or_default().
- Single match instead of if let. The latter makes the code much shorter.
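The last two items can be illustrated with a made-up example (the map and function names are hypothetical and unrelated to the crate in question):

```rust
use std::collections::HashMap;

fn names_for(map: &HashMap<u32, Vec<String>>, id: u32) -> Vec<String> {
    // `.unwrap_or_default()` instead of `.unwrap_or(Vec::new())`.
    map.get(&id).cloned().unwrap_or_default()
}

fn first_name_len(map: &HashMap<u32, Vec<String>>, id: u32) -> usize {
    // `if let` instead of a `match` with a single meaningful arm.
    if let Some(names) = map.get(&id) {
        return names.first().map_or(0, |name| name.len());
    }
    0
}
```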
Reviewed By: quark-zju
Differential Revision: D20452751
fbshipit-source-id: 8eeff7581c119c651ca41d8117f1f70f15774833
Summary:
Right now the module has one implementation IndexedLogStore. The name could
be more specific in the context of the crate.
The goal will be to add a trait for storage requirements of IdDag and
make IndexedLogStorage one implementation of that trait.
Reviewed By: quark-zju
Differential Revision: D20446042
fbshipit-source-id: 7576e1cc4ad757c1a2c00322936cc884838ff710
Summary:
Purge needs to be able to see what directories the walker traversed, so
it can delete them if they are empty. Instead of having the walker call
match.traversedir (using the matcher as a holder for a function unrelated to
matching seems like a bizarre pattern), let's have the walker return an
enum and have an option to return directories.
At the python layer we then translate this into match.traversedir calls, but we
can clean that up later.
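A minimal sketch of a walker result enum with an opt-in for directories. `WalkEntry`, `walk`, and the in-memory `tree` input are hypothetical; the real walker traverses the filesystem and its type names are not given here.

```rust
use std::path::PathBuf;

/// What the walker hands back for each entry it visits.
#[derive(Debug, PartialEq)]
enum WalkEntry {
    File(PathBuf),
    Directory(PathBuf),
}

/// Walk a pre-listed tree of `(path, is_dir)` pairs. Directories are only
/// reported when `include_directories` is set, which is what a purge-like
/// caller would ask for.
fn walk(tree: &[(&str, bool)], include_directories: bool) -> Vec<WalkEntry> {
    tree.iter()
        .filter_map(|&(path, is_dir)| match (is_dir, include_directories) {
            (false, _) => Some(WalkEntry::File(PathBuf::from(path))),
            (true, true) => Some(WalkEntry::Directory(PathBuf::from(path))),
            (true, false) => None,
        })
        .collect()
}
```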
Reviewed By: quark-zju
Differential Revision: D19543795
fbshipit-source-id: cc51c86c91799d3df2c65d25a7b6cfe810206d0a
Summary:
In preparation for supporting returning directories from the walker (to
support purge), let's rename the result structure to be more generic.
Reviewed By: kulshrax
Differential Revision: D19543791
fbshipit-source-id: 9b71452c879cf397ae92533a4ef4727140ac7369
Summary:
The mercurial tests print errors when they encounter 'fifo' files.
Let's handle that case.
Differential Revision: D19543796
fbshipit-source-id: f87d4b9c3f0ad8b8d8ebe2e6d18e325fc93d0ae9
Summary:
While the sha256 of a blob gives access to its content, it doesn't allow
accessing its metadata. By adding a sha256 index, we can easily get the
metadata of a blob via its content hash.
Reviewed By: quark-zju
Differential Revision: D20445624
fbshipit-source-id: 42c04bd69d3c7380706c6237c5b4f4061c016cca
Summary: This is necessary to properly test LFS stores.
Reviewed By: quark-zju
Differential Revision: D20445625
fbshipit-source-id: 530ddf87249e8d721957806f2d8edef3262f303c
Summary:
The OpenOptions allow for multiple indices to be added, but lookup had no way
to query these multiple indices.
Reviewed By: quark-zju
Differential Revision: D20445627
fbshipit-source-id: 0cb754ba17b452d892b7bcb56d502d5753ef963a
Summary:
This type can either be a Mercurial type key, or a content hash based key. Both
prefetch and get_missing can now handle these properly. This is essential
for stores where data can be fetched in both ways, or where the data is
split in 2. For LFS for instance, it is possible to have the LFS pointer (via
getpackv2), but not the actual blob. In that case, get_missing will simply
return the content hash version of the StoreKey, to signify what is actually
missing.
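The LFS case can be sketched as below. Only `StoreKey` and `get_missing` are names from the summary; the variant names, the string-based key fields, and `SplitStore` are simplifying assumptions made for this example.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Debug, PartialEq)]
enum StoreKey {
    /// Mercurial type key (path and node, simplified to strings here).
    HgId { path: String, node: String },
    /// Content hash based key (simplified to a hex string here).
    Content(String),
}

/// Hypothetical store split in two: pointers map a Mercurial key to a
/// content hash, and blobs are addressed by that content hash.
struct SplitStore {
    pointers: HashMap<(String, String), String>,
    blobs: HashSet<String>,
}

impl SplitStore {
    /// Report what is missing. If the pointer is present but the blob is not,
    /// the content hash version of the key is returned.
    fn get_missing(&self, keys: &[StoreKey]) -> Vec<StoreKey> {
        keys.iter()
            .filter_map(|key| match key {
                StoreKey::HgId { path, node } => {
                    match self.pointers.get(&(path.clone(), node.clone())) {
                        None => Some(key.clone()),
                        Some(hash) if self.blobs.contains(hash) => None,
                        Some(hash) => Some(StoreKey::Content(hash.clone())),
                    }
                }
                StoreKey::Content(hash) => {
                    if self.blobs.contains(hash) {
                        None
                    } else {
                        Some(key.clone())
                    }
                }
            })
            .collect()
    }
}
```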
Reviewed By: quark-zju
Differential Revision: D20445631
fbshipit-source-id: 06282f70214966cc96e805e9891f220b438c91a7
Summary:
Similarly to the DataStore trait, this makes it easier to understand that they
deal with a Mercurial type Key.
Reviewed By: quark-zju
Differential Revision: D20445621
fbshipit-source-id: a1143d5f5d6a2c8686d517a6ea3c25b07c0df072
Summary: This makes it clear that these traits are dealing with Mercurial Key.
Reviewed By: quark-zju
Differential Revision: D20445626
fbshipit-source-id: d5acbf442e9407b973e95e40af69b5a61bff0a4d
Summary:
Since configparser enforces utf-8 config files (because pest wants Rust strings),
let's migrate from Bytes to Text to remove extra encoding conversions.
Previously this was blocked by the lack of ref-counted text (since the "source"
of each config location is the entire config file). Now minibytes provides Text
so we can use it.
This unfortunately requires dependent code to be updated. The pyconfigparser
interface is in theory wrong - it shouldn't return utf-8 bytes but
local-encoded bytes. I think it's cleaner to make pyconfigparser unaware of
HGENCODING, so I changed pyconfigparser to use unicode, and added a compatibility
layer in uiconfig.py.
This also fixes non-ascii encoding issues in user names (especially on Windows).
The hgrc config file should be in utf-8, the config parser returns explicit
unicode types, and Python code round-trips them with local encodings.
Reviewed By: markbt
Differential Revision: D20432938
fbshipit-source-id: b1359429b8f1c133ab2d6b2deea6048377dfeca1
Summary:
This makes it easier to further migrate to `Text` interface.
Dependent crate (`auth`) is updated.
Reviewed By: markbt
Differential Revision: D20432941
fbshipit-source-id: 1dc29d52c9b17ce14676ef0555470c6d36a09c2b
Summary:
Text is a reference-counted shared String.
It's similar to Bytes but works for utf-8 strings.
The motivation is to replace configparser's use of Bytes with Text.
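To illustrate what a reference-counted shared string buys a config parser, here is a sketch built on `Arc<str>`. This is not how minibytes implements `Text`; `SharedText` and its methods are hypothetical.

```rust
use std::ops::Range;
use std::sync::Arc;

/// A cheap-to-clone view into a shared utf-8 buffer.
#[derive(Clone)]
struct SharedText {
    buffer: Arc<str>,
    range: Range<usize>,
}

impl SharedText {
    fn new(s: &str) -> Self {
        Self { buffer: Arc::from(s), range: 0..s.len() }
    }

    /// Take a sub-view without copying the text. Panics if the range is out
    /// of bounds or does not fall on char boundaries.
    fn slice(&self, range: Range<usize>) -> Self {
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        assert!(start <= end && end <= self.range.end);
        assert!(self.buffer.is_char_boundary(start));
        assert!(self.buffer.is_char_boundary(end));
        Self { buffer: Arc::clone(&self.buffer), range: start..end }
    }

    fn as_str(&self) -> &str {
        &self.buffer[self.range.clone()]
    }
}
```

With this shape, every config value can point back into the one buffer holding the whole config file instead of owning a separate copy.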
Reviewed By: markbt
Differential Revision: D20432940
fbshipit-source-id: ef990255d269e60d433c6520819f60ccdcbe488f
Summary: This makes it possible to implement "Text". See the next diff.
Reviewed By: markbt
Differential Revision: D20432943
fbshipit-source-id: 94b3810ab205c260d33f57bd637e4accc3ee871d
Summary:
This makes the API easier to use.
Practically this makes it easier for configparser to migrate to minibytes.
Reviewed By: markbt
Differential Revision: D20432942
fbshipit-source-id: ad08eb118d2216054dc24c86b0b129ae82b9d17c
Summary:
Previously Rust str was serialized into bytes. To be Python 3 friendly, let's
serialize it into `str`.
Reviewed By: markbt
Differential Revision: D19797706
fbshipit-source-id: 388eb044dc7e25cdc438f0c3d6fa5a5740f22e3d
Summary:
The goal of the stack is to support "rendering" diffs for large files in scs
server. Note that rendering is in quotes - we are fine with just showing a
placeholder like "Binary file ... differs". This is still better than the
current behaviour, which just returns an error.
In order to do that I suggest tweaking the xdiff library to accept FileContentType,
which can be either Normal(...), meaning that we have file content available, or
Omitted, which usually means the file is large and we don't even want to fetch it, and we
just want xdiff to generate a placeholder.
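A sketch of how such an input type could drive placeholder generation. Only `FileContentType`, `Normal`, and `Omitted` come from the summary; `render_diff` and its output for the normal case are hypothetical and are not a real diff algorithm.

```rust
/// File content as handed to the diff code.
enum FileContentType<'a> {
    /// The content is available.
    Normal(&'a [u8]),
    /// The content was deliberately not fetched (for example, a large file).
    Omitted,
}

/// Produce a placeholder when either side was omitted; otherwise report
/// whether the two contents differ.
fn render_diff(path: &str, old: FileContentType<'_>, new: FileContentType<'_>) -> String {
    match (old, new) {
        (FileContentType::Normal(a), FileContentType::Normal(b)) => {
            if a == b {
                String::new()
            } else {
                // A real implementation would emit hunks here.
                format!("File {} differs\n", path)
            }
        }
        _ => format!("Binary file {} differs\n", path),
    }
}
```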
Reviewed By: markbt, krallin
Differential Revision: D20389226
fbshipit-source-id: 0b776d4f143e2ac657d664aa9911f6de8ccfea37
Summary:
This will be used in the Python world for legacy reasons. It shouldn't be used
in new Rust code.
To use it, the name `LegacyCodeNeedIdAccess` has to be used so we can do a code
search to find all users of it.
Reviewed By: sfilipco
Differential Revision: D20367834
fbshipit-source-id: 9b93a29f1461ce24bba6f31a2bbb1f327e216c6d
Summary: This will be useful to actually sort commits.
Reviewed By: sfilipco
Differential Revision: D20367835
fbshipit-source-id: 43bc7835277af3a14ef323ce34247e0c03878dc8
Summary:
The old "AllSet" implementation is not very practical - it does not support
iteration. Practically, the "all()" set comes from the DAG. Change the "all"
concept to a hint similar to "is_topo_sorted", and update the fast path
(intersection) accordingly.
Reviewed By: sfilipco
Differential Revision: D20367837
fbshipit-source-id: fdbf370897c93058bfcab0571c1f6fa4b99b0f6b
Summary: The word "snapshot" more accurately describes its purpose.
Reviewed By: sfilipco
Differential Revision: D20367836
fbshipit-source-id: c91a0bd402fa1718b5d805beedc0e062824c53d3
Summary:
Without this:
In [3]: util.getfstype('')
IOError: [Errno 2] No such file or directory (os error 2)
And there is a code path hitting this:
File "edenscm/mercurial/util.py", line 1483, in checknlink
fstype = getfstype(os.path.dirname(testfile))
# testfile = '.'
# os.path.dirname(".") = ""
The old implementation works fine for an empty path:
In [2]: m.util.getfstype('')
Out[2]: 'eden'
So let's make the new Rust implementation consistent.
Reviewed By: xavierd
Differential Revision: D20313387
fbshipit-source-id: 258c424a3e8a796d983e20b0d4656e8e3f413706
Summary: Similar to D13982877. Try to get names like "fuse.ntfs".
Reviewed By: farnz
Differential Revision: D20313392
fbshipit-source-id: 8363d3d92843e6afb53a0003950be083034bd841
Summary:
Only keep type parameters at the top-level function.
This reduces the binary size and speeds up rustc.
Reviewed By: xavierd
Differential Revision: D20313388
fbshipit-source-id: 29d77731ff462fee1f1bb9f234601e3430198ae7
Summary: This makes the code a bit more portable.
Reviewed By: xavierd
Differential Revision: D20313389
fbshipit-source-id: 080538939fa4d2d72e5905f23ad9be987d952748
Summary:
Rename the main method to "fstype". The API has no relation with repo.
So let's rename it.
Reviewed By: xavierd
Differential Revision: D20313386
fbshipit-source-id: 80dd1231ccccfe945150b117b151bce773f0dfeb
Summary:
Since the mocked memcache is shared between the tests, we need to make sure the
keys used by the tests are different, otherwise they are just caching each
other's data.
Reviewed By: ikostia
Differential Revision: D20388783
fbshipit-source-id: 0f2f926e0ffe0e52e55291e46142808ce0921288
Summary:
Some `use`s are not used on Windows. The code was also formatted using the
latest rustfmt.
Reviewed By: xavierd
Differential Revision: D20379704
fbshipit-source-id: ffadcd68e4e0440dcbd2a4e1ad8532b47a9d83e2
Summary: Similarly to the ContentStore, remove the Arc from MetadataStore.
Reviewed By: quark-zju
Differential Revision: D20376838
fbshipit-source-id: 4321600b752c919b6d9fa7bdee6f6cb7ae083b10
Summary:
The clients should use an Rc/Arc if they need the ability to clone it. This
makes it more obvious and reduces the number of pointer indirections.
Reviewed By: quark-zju
Differential Revision: D20376839
fbshipit-source-id: c56e7e8f89ab17727be621894c329e344a7f3adb
Summary:
The dag crate is designed to work with any kind of binary commit hashes (ex. bonsai,
git or hg). The only use of `types` is to convert from binary to hex. Since dag
already has its own `to_hex` logic in `VertexName`, let's use that instead.
Reviewed By: sfilipco
Differential Revision: D20378447
fbshipit-source-id: 00ecb551ea927fdb60dd91e5e645064f23139bcd
Summary:
Recently there has been some Windows-related test flakiness in . All of it is
caused by `file.persist(path)` in `atomic_write_plain` failing with
"Access Denied". Since that can be caused by Windows Anti-Virus scans or other
weird stuff, let's work around it using automatic retries.
Process Explorer does not provide extra information:
indexedlog-d0c6135fd7ed9ece.exe 5868 SetRenameInformationFile C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\.tmpcfDsQQ ACCESS DENIED ReplaceIfExists: True, FileName: C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\meta
A successful rename looks like:
indexedlog-d0c6135fd7ed9ece.exe 5868 SetRenameInformationFile C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\.tmpbXEVw0 SUCCESS ReplaceIfExists: True, FileName: C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\meta
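One way to sketch the retry workaround, assuming "Access Denied" surfaces as `std::io::ErrorKind::PermissionDenied`. `retry_on_access_denied`, the attempt count, and the delay are hypothetical; the actual retry policy in `atomic_write_plain` is not described above.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Run `op`, retrying while it fails with `PermissionDenied`, for at most
/// `max_attempts` attempts in total. Other errors are returned immediately.
fn retry_on_access_denied<T>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match op() {
            Err(e) if e.kind() == io::ErrorKind::PermissionDenied && attempt < max_attempts => {
                attempt += 1;
                // Give the scanner (or whatever holds the file) time to let go.
                thread::sleep(delay);
            }
            result => return result,
        }
    }
}
```

The closure would wrap the rename step, so a transient failure costs a short sleep rather than a failed test.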
Reviewed By: ikostia
Differential Revision: D20379618
fbshipit-source-id: db3e6be3d785875486f7a517df11cbf58bf65ddd
Summary:
Now that the ContentStore can automatically strip the metadata header, no need
for duplicated code in the backingstore.
Reviewed By: fanzeyi
Differential Revision: D20376812
fbshipit-source-id: e863e1cc2fcdc8b9e612a464b305fa25ceb66e13
Summary:
During `hg update`, Mercurial forks multiple processes to write files on disk
concurrently. This is done because fetching blobs from the content store and
writing them to disk is CPU bound. Usually, threads would be the preferred way
of speeding up such a process, but unfortunately, Python has a GIL that severely
limits the available concurrency. So, multiple processes were chosen.
Unfortunately, the multi-process solution also brings a lot of other issues.
More recently, we've had cases where the connections to the server and memcache
had to be dropped after the fork. In some other cases, this caused deadlocks.
And the solution is not effective on Windows.
Now that Mercurial is getting more and more Rust, we could instead go back to
the threads solution by using them in Rust, and have Python just push work to
them. This is exactly what this change does.
Things that are left to be done, but I wanted to get a diff out first:
- no file path audit
- no file backup
- no symlink creation
- probably other things I'm missing
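The "push work to Rust threads" shape can be sketched with a channel and a few workers. Everything here is an assumption made for illustration: `Job`, `write_concurrently`, and the injected `write` callback are hypothetical, and the real change writes to disk and talks to the content store.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work pushed by the caller: a path and the bytes to write.
type Job = (String, Vec<u8>);

/// Spawn `threads` workers that pull jobs from a shared queue and hand them
/// to `write`. Returns the number of jobs whose write succeeded.
fn write_concurrently<F>(jobs: Vec<Job>, threads: usize, write: F) -> usize
where
    F: Fn(&str, &[u8]) -> std::io::Result<()> + Send + Sync + 'static,
{
    let (sender, receiver) = mpsc::channel::<Job>();
    let receiver = Arc::new(Mutex::new(receiver));
    let write = Arc::new(write);

    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            let write = Arc::clone(&write);
            thread::spawn(move || {
                let mut written: usize = 0;
                loop {
                    // The lock is held only while taking the next job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok((path, data)) => {
                            if (*write)(path.as_str(), data.as_slice()).is_ok() {
                                written += 1;
                            }
                        }
                        // The sender was dropped and the queue is drained.
                        Err(_) => break,
                    }
                }
                written
            })
        })
        .collect();

    for job in jobs {
        sender.send(job).unwrap();
    }
    // Dropping the sender lets the workers exit once the queue is empty.
    drop(sender);

    workers.into_iter().map(|worker| worker.join().unwrap()).sum()
}
```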
Reviewed By: quark-zju
Differential Revision: D20102888
fbshipit-source-id: d47829fd7818b97710586b9851880f178048e27b
Summary:
With this new store, blobs will be transparently written to either an LFS
store, or a non-LFS one, depending on their size.
Initially, and as long as getpackv2 is supported, we also need to support
parsing lfs pointer data that the server is sending and write these to the lfs
pointer store. This code is very ad hoc and does manual parsing of the pointer
data, which is definitely not great. Suggestions for a simpler and better
solution are welcome :).
From a migration standpoint, the read-only LFS stores are added to the
ContentStore. This allows blobs written in it to be readable at all times, even
when `remotefilelog.lfs` isn't set. The code will effectively be dormant for a
while until the option is turned on. If we need to disable it, the dormant code
will still be able to read all the blobs written to disk. This forces us to
deploy a release that contains this code to stable first, before setting
`remotefilelog.lfs`.
Reviewed By: quark-zju
Differential Revision: D19986878
fbshipit-source-id: 260f5a542d52e748c0c703bfa7bb8ffac0e7b388
Summary: This makes `RUST_LOG` work for indexedlog tests.
Reviewed By: xavierd
Differential Revision: D20286515
fbshipit-source-id: ff4a1476eb01a9067dabe3622fd598f65fe86a18
Summary:
The tracing / env_logger integration works for hg as a binary. However I'd also
like to use it in library tests. This crate makes it easier to do so.
Reviewed By: xavierd
Differential Revision: D20286507
fbshipit-source-id: f5bf3288ce950591ddfe64b524ad51ce21ee4099
Summary: These have helped me debug some issues.
Reviewed By: xavierd
Differential Revision: D20286513
fbshipit-source-id: 012ddb16c2d0efd8f8697a5ecd4564ea31d65630
Summary: Move the scope of spans so the exit code is shown.
Reviewed By: xavierd
Differential Revision: D20286516
fbshipit-source-id: f39cbf60c86ea19a1bb0a09958748f04ff6a42e8
Summary:
Previously env_logger was only initialized if Python was initialized.
This diff also initializes env_logger for Rust native commands.
Reviewed By: xavierd
Differential Revision: D20286517
fbshipit-source-id: 18fee96c2b41db1da9648d615d1e18809de90a63
Summary:
This means crates like env_logger (which reads $RUST_LOG, and writes to stderr)
can be used for convenient debugging.
Reviewed By: xavierd
Differential Revision: D20286514
fbshipit-source-id: e3b80cc4830ba5cc6dbf7aa1cbb92a4f4f046a54
Summary:
Those metadata include module_path, target, line number, etc, in Rust native
format. They will be used for the upcoming `log` integration.
Reviewed By: xavierd
Differential Revision: D20286510
fbshipit-source-id: 27019b941bef08c0bb3e505bbdae642282dcb141
Summary:
Split lock file acquisition from `IdDag::prepare_filesystem_sync` into its own
function.
Useful when looking ahead to split IdDag from IndexedLog.
Reviewed By: quark-zju
Differential Revision: D20316443
fbshipit-source-id: a0fd43439730376920706bb4349ce497f6624335
Summary:
This removes an inline use of the indexedlog indexes.
This is going to be useful when we try to separate IndexedLog specifics from
IdDag functionality.
Reviewed By: quark-zju
Differential Revision: D20316058
fbshipit-source-id: 942a0a71660bb327376c81fd3ac435d002ecca6e
Summary:
Instead of returning `anyhow::Error` wrapping an `ErrorKind` enum
from each Thrift client method, just return an error type specific
to that method. This will make error handling simpler and less
error-prone by removing the need to downcast the returned error.
This diff also removes the `ErrorKind` enums so that we can be sure
that there are no leftover places trying to downcast to them.
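A minimal sketch of the pattern described above, not the generated Thrift code. The names `GetBlobError`, `get_blob`, and `get_blob_or_empty` are hypothetical and only illustrate how a method-specific error type lets callers match on variants without downcasting:

```rust
use std::fmt;

// Hypothetical per-method error type; the generated Thrift names are not
// shown in the summary above.
#[derive(Debug, PartialEq)]
enum GetBlobError {
    NotFound(String),
    Transport(String),
}

impl fmt::Display for GetBlobError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GetBlobError::NotFound(key) => write!(f, "blob not found: {}", key),
            GetBlobError::Transport(msg) => write!(f, "transport error: {}", msg),
        }
    }
}

impl std::error::Error for GetBlobError {}

// Stand-in for a client method that returns its own error type.
fn get_blob(key: &str) -> Result<Vec<u8>, GetBlobError> {
    match key {
        "" => Err(GetBlobError::Transport("empty key".to_string())),
        "missing" => Err(GetBlobError::NotFound(key.to_string())),
        _ => Ok(key.as_bytes().to_vec()),
    }
}

// The caller matches on the variants directly; no downcast is involved.
fn get_blob_or_empty(key: &str) -> Result<Vec<u8>, GetBlobError> {
    match get_blob(key) {
        Err(GetBlobError::NotFound(_)) => Ok(Vec::new()),
        other => other,
    }
}
```

Because the error type is part of the method signature, a leftover downcast to a removed `ErrorKind` would fail to compile instead of silently never matching.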
(Note: this ignores all push blocking failures!)
Reviewed By: dtolnay
Differential Revision: D20260398
fbshipit-source-id: f0dd96a7b83dd49f6b30948660456539012f82e6
Summary:
The old code does "read, lock, write", which is unsound because after "lock"
the data just read can be outdated and needs a reload.
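To illustrate the ordering problem, here is a small in-memory sketch. The `Store` type is hypothetical: an atomic integer stands in for the on-disk data and a `Mutex<()>` stands in for the file lock. It is not the code touched by this diff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for state on disk plus the lock protecting writes.
struct Store {
    lock: Mutex<()>,
    disk: AtomicU64,
}

impl Store {
    fn new() -> Self {
        Store { lock: Mutex::new(()), disk: AtomicU64::new(0) }
    }

    fn read(&self) -> u64 {
        self.disk.load(Ordering::SeqCst)
    }

    // Unsound "read, lock, write": `snapshot` was read before taking the
    // lock, so another writer may have changed the data in between.
    fn increment_from_snapshot(&self, snapshot: u64) {
        let _guard = self.lock.lock().unwrap();
        self.disk.store(snapshot + 1, Ordering::SeqCst);
    }

    // Sound "lock, read, write": the data is reloaded while the lock is held.
    fn increment(&self) {
        let _guard = self.lock.lock().unwrap();
        let current = self.read();
        self.disk.store(current + 1, Ordering::SeqCst);
    }
}
```

If a snapshot is taken, another writer calls `increment()`, and then `increment_from_snapshot(snapshot)` runs, the first increment is lost. Two calls to `increment()` always add up.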
Reviewed By: xavierd
Differential Revision: D20306137
fbshipit-source-id: a1c29d5078b2d47ee95cf00db8c1fcbe3447cccf
Summary:
I thought the index function could be the bottleneck. However, the Log reading
(xxhash, decoding vlqs) can be much slower for very long entries. Therefore
using bytes as the lag threshold is better. It does leak the Log
implementation details (how it encodes an entry) to some extent, though.
Reverts D20042045 and D20043116 logically. The lagging calculation is using
the new Index::get_original_meta API, which is easier to verify correctness
(In fact, it seems the old code is wrong - it might skip Index flushes if
sync() is called multiple times without flushing).
This should mitigate an issue where a huge entry (generated by `hg trace`) in
blackbox does not get indexed in time and causes performance regressions.
Reviewed By: DurhamG
Differential Revision: D20286508
fbshipit-source-id: 7cd694b58b95537490047fb1834c16b30d102f18
Summary: This will be used to more reliably detect index lags.
Reviewed By: DurhamG
Differential Revision: D20286518
fbshipit-source-id: c553b6587363a55603b75df12580588e3100e35f
Summary:
This ensures indexes are complete even if index format or definition has been
changed.
Reviewed By: DurhamG
Differential Revision: D20286509
fbshipit-source-id: fcc4ebc616a4501e4b6fd2f1a9826f54f40b99b8
Summary:
This avoids loading all blackbox logs when `init()` gets called multiple times
(for example, once in Rust and once in Python).
Reviewed By: DurhamG
Differential Revision: D20286511
fbshipit-source-id: ef985e454782b787feac90a6249651a882b6552e
Summary: This API has the benefit that it does not trigger loading older logs.
Reviewed By: DurhamG
Differential Revision: D20286512
fbshipit-source-id: 426421691ad1130cdbb2305612d76f18c9f8798c
Summary:
With the new crate-public interfaces and Debug implementations it's possible to
write tests for DagSet. So let's do it.
Reviewed By: sfilipco
Differential Revision: D20242561
fbshipit-source-id: 180e04d9535f79471c79c4307f6ab6e8e8815067
Summary:
Don't restrict constructing a c_api datapack store to only Unix, we can
construct it on Windows too by assuming that their path will be valid UTF-8.
Reviewed By: quark-zju
Differential Revision: D20250718
fbshipit-source-id: 07234b6a71b50c803cfe3b962fa727f57037c919
Summary: This returns the ancestors in the reverse order as the parents method.
Reviewed By: sfilipco
Differential Revision: D20265277
fbshipit-source-id: 83277cee3d8e9070fc56d20d4c1877e6782c22f7
Summary: Those will be reused by nameset::DagSet.
Reviewed By: sfilipco
Differential Revision: D20242563
fbshipit-source-id: 944e9a04aeb15439256ecea64355b67e326e5c89
Summary:
This is useful for `assert_eq!(format!("{:?}", set), "...")` tests.
It will be eventually exposed to Python as `__repr__`, similar to Python's
smartsets.
Reviewed By: sfilipco
Differential Revision: D20242562
fbshipit-source-id: 5373bb180db7cafebf273ace7cf2cb80fbfb8038
Summary:
In the Python world all smartsets have some kind of "debug" information. Let's
do something similar in Rust.
Related code is updated so the test is more readable.
Reviewed By: sfilipco
Differential Revision: D20242564
fbshipit-source-id: 7439c93d82d5d037c7167818f4e1125c5a1e513e
Summary:
Previously, `flush()` would skip writing the file if there were only metadata
changes. Fix it by detecting metadata changes.
This can potentially fix an issue that certain blackbox indexes are empty,
lagging and require scanning the whole log again and again. In that case,
the index itself is not changed (the root radix entry is not changed), but
only the metadata tracking how many bytes in Log the index covered
changed.
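The fix can be pictured with the following simplified sketch. The `Index` struct and its fields are hypothetical and much smaller than the real type; the point is only that the "nothing to do" check must consider metadata as well as the root entry.

```rust
// Hypothetical, simplified index state; the real Index is not shown here.
struct Index {
    root_dirty: bool,      // true when the radix tree itself changed
    meta: Vec<u8>,         // e.g. how many Log bytes the index covers
    flushed_meta: Vec<u8>, // metadata as of the last flush that wrote
    writes: usize,         // how many times flush() actually wrote
}

impl Index {
    fn new() -> Self {
        Index { root_dirty: false, meta: Vec::new(), flushed_meta: Vec::new(), writes: 0 }
    }

    fn set_meta(&mut self, meta: &[u8]) {
        self.meta = meta.to_vec();
    }

    // Returns true if something was written.
    fn flush(&mut self) -> bool {
        let meta_changed = self.meta != self.flushed_meta;
        // Checking only `root_dirty` here would reproduce the old behavior
        // and drop metadata-only updates.
        if !self.root_dirty && !meta_changed {
            return false;
        }
        self.writes += 1;
        self.root_dirty = false;
        self.flushed_meta = self.meta.clone();
        true
    }
}
```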
Reviewed By: sfilipco
Differential Revision: D20264627
fbshipit-source-id: 7ee48454a92b5786b847d8b1d738cc38183f7a32
Summary:
Using `if cfg!` instead of `#[cfg]` allows for the compiler to understand
that the arguments aren't unused, and silence the warnings.
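As a small illustration (the function `write_mode` and its argument are made up for this sketch), `cfg!` expands to a `true` or `false` literal, so both branches are type-checked on every platform and the argument counts as used:

```rust
// Hypothetical helper: `use_symlink` only matters on Unix.
fn write_mode(use_symlink: bool) -> &'static str {
    // If the Unix branch were wrapped in `#[cfg(unix)]` instead, it would be
    // removed entirely on Windows and `use_symlink` would be reported as
    // unused there.
    if cfg!(unix) {
        if use_symlink {
            "symlink"
        } else {
            "rename"
        }
    } else {
        "rename"
    }
}
```

The trade-off is that every branch has to compile on every platform, so this only works when the branch does not reference platform-specific items.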
Reviewed By: quark-zju
Differential Revision: D20242280
fbshipit-source-id: 332dfe17b3a80a1096d15c91c9fb6644bd10e0cd
Summary:
Compiling it on Windows produced a bunch of warnings due to
`hgrc_configset_load_path` not being compiled on it. Fixed it so it no longer
depends on Unix specific imports.
Reviewed By: quark-zju
Differential Revision: D20241102
fbshipit-source-id: 3002f961191fbb9bc51aa9ac1154d6d50bd7fe23
Summary:
The `.into_iter()` for this object is being deprecated and won't compile in
the future, fix it now.
Reviewed By: quark-zju
Differential Revision: D20241103
fbshipit-source-id: fdee463ed81cd07a65f3cc4c70a96c88928b3b87
Summary:
While compiling on Windows, this file issues a bunch of warnings. Use `if
cfg!` instead of `#[cfg]` to silence them. The behavior is the same, but
`if cfg!` allows the compiler to recognize that some code is not unused.
Reviewed By: quark-zju
Differential Revision: D20241104
fbshipit-source-id: 2cd7f171c7a2f7220cc73bea9be3359260de19b2
Summary:
The change is in theory not necessary. However it improves the reliability on
OS crashes a bit, and can potentially work around some bugs in filesystems
(as we saw in production where the atomic-written files are empty and the
system didn't crash).
The idea is, the `symlink` syscall does the file creation and "content" writing
together, while there is no way to create a file and write specific content
in one syscall. Note that the C symlink call uses 0-terminated string, and
the Rust stdlib exports it as accepting `Path`. To be safe, we encode binary
or non-utf8 content using `hex`.
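A rough sketch of the idea, under stated assumptions: the `hex:` prefix, the `.tmp` suffix, and all function names below are invented for illustration and are not the on-disk convention used by indexedlog. The sketch stores the content as a symlink target and moves the link into place with a rename.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Assumed convention for this sketch only: content that is not "safe" text is
// stored as "hex:" followed by lowercase hex digits.
fn encode_target(content: &[u8]) -> String {
    match std::str::from_utf8(content) {
        Ok(s) if !s.is_empty() && !s.contains('\0') && !s.starts_with("hex:") => s.to_string(),
        _ => {
            let hex: String = content.iter().map(|b| format!("{:02x}", b)).collect();
            format!("hex:{}", hex)
        }
    }
}

fn decode_target(target: &str) -> Option<Vec<u8>> {
    match target.strip_prefix("hex:") {
        None => Some(target.as_bytes().to_vec()),
        Some(hex) => (0..hex.len())
            .step_by(2)
            .map(|i| {
                let pair = hex.get(i..i + 2)?;
                u8::from_str_radix(pair, 16).ok()
            })
            .collect(),
    }
}

#[cfg(unix)]
fn write_via_symlink(path: &Path, content: &[u8]) -> io::Result<()> {
    let mut tmp = path.as_os_str().to_owned();
    tmp.push(".tmp");
    let tmp = PathBuf::from(tmp);
    let _ = fs::remove_file(&tmp); // ignore "not found"
    // One syscall creates the link and records its "content" (the target).
    std::os::unix::fs::symlink(encode_target(content), &tmp)?;
    // Renaming moves the link itself into place; it does not follow it.
    fs::rename(&tmp, path)
}

#[cfg(unix)]
fn read_via_symlink(path: &Path) -> io::Result<Vec<u8>> {
    let target = fs::read_link(path)?;
    let invalid = || io::Error::new(io::ErrorKind::InvalidData, "unexpected symlink target");
    let text = target.to_str().ok_or_else(invalid)?;
    decode_target(text).ok_or_else(invalid)
}
```

Symlink targets are limited in length by the filesystem, so this approach only suits small payloads such as metadata.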
For downgrade safety, the write path does not use symlink by default unless
format.use-symlink-atomic-write is set to true. This makes downgrade possible:
the read path is rolled out first, then we can turn on and off the write path.
The indexedlog Rust unit tests and test-doctor.t are migrated to use the new
symlink code paths.
Reviewed By: DurhamG
Differential Revision: D20153864
fbshipit-source-id: c31bd4287a8d29575180fbcf7227d2b04c4c1252
Summary:
This makes it possible to implement atomic_write differently (ex. use a
symlink).
Reviewed By: DurhamG
Differential Revision: D20153865
fbshipit-source-id: 07fa78c2f2dac696668f477c75f65cf70950b73f
Summary:
This makes it clear that `log` is a math concept, not an append-only file like
`Log`.
Reviewed By: DurhamG
Differential Revision: D20149376
fbshipit-source-id: 67d2e9584b15f48759ca9b6dfce4279a5b1365a0
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces *all* dependencies on `//common/rust/renamed:futures-preview` with `fbsource//third-party/rust:futures-preview` and their uses in Rust code from `futures_preview::` to `futures::`.
This does not introduce any collisions with `futures::` meaning 0.1 futures because D20168958 previously renamed all of those to `futures_old::` in crates that depend on *both* 0.1 and 0.3 futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:futures-preview, 1))" \
| xargs sed -i 's,\bfutures_preview::,futures::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:futures-preview,fbsource//third-party/rust:futures-preview,'
```
Reviewed By: k21
Differential Revision: D20213432
fbshipit-source-id: 07ee643d350c5817cda1f43684d55084f8ac68a6
Summary:
Also patch aho-corasick to fix the issue.
The issue was introduced by [an optimization path](063ca0d253) added in aho-corasick 0.7 series (used by globset 0.4.3).
aho-corasick 0.6.x (globset 0.4.2) are not affected.
The next aho-corasick release (0.7.9) contains the fix.
See https://github.com/BurntSushi/aho-corasick/issues/53 for more context.
Reported by: yns88
Reviewed By: DurhamG
Differential Revision: D20125697
fbshipit-source-id: 592375b43d7ee494bb3e916a1cb11c18f9ebe425
Summary:
Migrate away from some uses of revision numbers.
Some dead code in discovery.py is removed.
I also fixed some test issues when I run tests locally.
Reviewed By: sfilipco
Differential Revision: D20155399
fbshipit-source-id: bfdcb57f06374f9f27be51b0980652ef50a2c8e0
Summary: This makes it possible to use NameIter in py_class.
Reviewed By: sfilipco
Differential Revision: D20020529
fbshipit-source-id: b9147b7dccb38d18d8361b420507fcbe97e01351
Summary: This will be used by commit hash prefix lookup.
Reviewed By: sfilipco
Differential Revision: D20020523
fbshipit-source-id: f2905ddf63098704b08dad8eb48272c3ffba7e25
Summary: Export common types at the top-level of the crate so it's easier to use.
Reviewed By: sfilipco
Differential Revision: D20020526
fbshipit-source-id: e9a0a8bc3cc91f81d0bc74e7530cd4613fc1dd61
Summary: Those just delegate to IdDag for the actual calculation.
Reviewed By: sfilipco
Differential Revision: D20020522
fbshipit-source-id: 272828c520097c993ab50dac6ecc94dc370c8e8b
Summary: This will be used to produce NameSet.
Reviewed By: sfilipco
Differential Revision: D20020519
fbshipit-source-id: abf6d73f2b985b74560d6b5db2800ff25450f02e
Summary: DagSet's SpanSet has fast paths for set operations. Use them.
Reviewed By: sfilipco
Differential Revision: D19912104
fbshipit-source-id: 24b55aa14d03be2f1be59c923e0b8e79d6bcbe6d
Summary: This is similar to hg's fullreposet. It'll be useful as a dummy "subset".
Reviewed By: sfilipco
Differential Revision: D19912108
fbshipit-source-id: 33a95bcb3cf5931a431a1201d1a1f3c627cec7a1
Summary: SortedSet is a wrapper to other sets that marks it as topologically sorted.
Reviewed By: sfilipco
Differential Revision: D19912111
fbshipit-source-id: 2637e8fd29b97f6db0c5bae3f0decd7ac382eeb1
Summary:
Wraps SpanSet + IdMap so it only exposes commit names without ids.
There is no equivalent smartset in Mercurial.
Reviewed By: sfilipco
Differential Revision: D19912112
fbshipit-source-id: 0d257de11527dfa8836065ac94f652730a97a468
Summary: Similar to Mercurial's smartset.baseset. All names are statically known.
Reviewed By: sfilipco
Differential Revision: D19912105
fbshipit-source-id: e4fcf2d59291adb3ca01b3b90f1ac32c65ad7eaa
Summary:
This is an example of how to use the new Bytes type. The performance change
is not obviously visible in benchmarks since the bottleneck is not at the bytes
copying.
Reviewed By: DurhamG
Differential Revision: D19818720
fbshipit-source-id: a431ae206cfa4fa08b2e162a48b3d7cbcd900f7f
Summary: The APIs are compatible so the switch is straightforward.
Reviewed By: DurhamG
Differential Revision: D19818713
fbshipit-source-id: 504e9149567c90eb661804e0dad20580a401aa76
Summary:
D20042045 changes the meaning of "lag_threshold". Update the value in mutation
store accordingly.
Reviewed By: DurhamG
Differential Revision: D20043116
fbshipit-source-id: 154e6dc2aa88ab0a9a9b21929ae5fa6163dcd403
Summary:
Previously, indexes were only updated at `sync()` time. This diff makes it so
`open()` can also update lagging indexes. This should make index migration
(ex. D19851355) smoother - indexes are built in time and users suffer less from
the absence of indexes.
Reviewed By: DurhamG
Differential Revision: D20042046
fbshipit-source-id: 20412661a0ca4f5f67b671137c47b6373a42981d
Summary: The logic is currently only used by `sync()`. I'd like to reuse it at `open()`.
Reviewed By: DurhamG
Differential Revision: D20042044
fbshipit-source-id: 5c9734ff68bdcf8f8c8710c6a821b18d3afeaca0
Summary:
This is more friendly for indexedlog users - deciding lag_threshold by number
of entries is easier than by bytes.
Initially, I thought checking `bytes` is cheaper and checking `entries` is more
expensive. However, practically we will have to build indexes for `entries`
anyway. So we do know the number of entries lagging behind.
Reviewed By: DurhamG
Differential Revision: D20042045
fbshipit-source-id: 73042e406bd8b262d5ef9875e45a3fd5f29f78cf
Summary:
This can be useful for users of indexedlog when they want `Bytes` (to get rid
of the lifetime parameter).
This might be useful for storage layer that wants to take the ownership of the
returned bytes.
Reviewed By: xavierd
Differential Revision: D19818714
fbshipit-source-id: cb2d4e7deff921915e07454fee15cb94a3d5c00d
Summary: Those utilities are no longer necessary since the new code uses Bytes.
Reviewed By: xavierd
Differential Revision: D19818717
fbshipit-source-id: 0b43af0f1eae1a4288e84d4170db058b27f80334
Summary: This simplifies the code a bit and makes it cheaper to clone the Log.
Reviewed By: xavierd
Differential Revision: D19818716
fbshipit-source-id: bbf07b8b36009d53b63d8066ec422fc3c3796840
Summary: It's no longer used since Index now has inlined its checksum logic.
Reviewed By: ikostia
Differential Revision: D19850744
fbshipit-source-id: eb134e4c1613573a2d238710b44ad8119c80a5ee
Summary:
Change index filename and metadata name. This makes sure the new format and old
format are separate so upgrading or downgrading won't have issues.
Reviewed By: DurhamG
Differential Revision: D19851355
fbshipit-source-id: 25dee018073a90040f5818b32b753a3f589c10e0
Summary:
Enhance the index format: The Root entry can be followed by an optional
Checksum entry which replaces the need of ChecksumTable.
The format is backwards compatible since the old format will be just
treated as "there is no ChecksumTable", and the ChecksumTable will be built on
the next "flush".
This change is non-trivial. But the tests are pretty strong - the bitflip test
alone covered a lot of issues, and the dump of Index content helps a lot too.
For the index itself (without the ".sum" checksum), this change is
bidirectionally compatible:
1. New code reading old file will just think the old file does not have the
checksum entry, similar to new code having checksum disabled.
2. Old code will think the root+checksum slice is the "root" entry. Parsing
the root entry is fine since it does not complain about unknown data at the
end.
However, this change dropped the logic updating ".sum" files. That part is an
issue blocking old clients from reading new data.
Reviewed By: DurhamG
Differential Revision: D19850741
fbshipit-source-id: 551a45cd5422f1fb4c5b08e3b207a2ffe3d93dea
Summary:
To solve the soundness issue of ChecksumTable raised by the last diff,
I plan to move Checksum logic to Index. This has multiple benefits:
- Solve the soundness issue of ChecksumTable.
- Indexedlog no longer writes the ".sum" files. `atomic_write` can be quite
slow (tens of milliseconds) on Windows. So this should help perf - with
many indexes, it can save hundreds of milliseconds on Windows per
indexedlog sync.
This diff adds the definition and serialization of the new Checksum entry.
The index format is not updated yet.
Reviewed By: markbt
Differential Revision: D19850742
fbshipit-source-id: df6e6ed12a12ef0d2a782dc9d6b4dc5dec3f4b46
Summary:
With the last change, mmap cost is reduced, but ChecksumTable is unsound in a
corner case: the buffer to check is shorter than what ChecksumTable covers:
checksum: |----chunk----|----chunk----|----chunk--|
buf:      |-------------------------------|       |
                                          ^       ^
                                  logic len   physical len
The checksum table will be unable to verify the last chunk, since it does not
have enough data in buf.
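The corner case can be reproduced with a toy model. Everything below is an assumption made for illustration (a 4-byte chunk size, a trivial checksum, and the `build_table` / `verify_chunk` names); it is not the ChecksumTable implementation.

```rust
const CHUNK: usize = 4; // hypothetical chunk size, tiny for illustration

// Toy checksum; the real table uses a proper hash.
fn checksum(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

// One checksum per chunk of the physical file content.
fn build_table(physical: &[u8]) -> Vec<u32> {
    physical.chunks(CHUNK).map(checksum).collect()
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Good,
    Corrupt,
    Unverifiable,
}

// `physical_len` is the length the table was built from; `buf` may be shorter
// (the logical length).
fn verify_chunk(table: &[u32], physical_len: usize, buf: &[u8], index: usize) -> Verdict {
    let start = index * CHUNK;
    let end = (start + CHUNK).min(physical_len);
    if index >= table.len() || end > buf.len() {
        // The checksum covers bytes that are not present in `buf`.
        return Verdict::Unverifiable;
    }
    if checksum(&buf[start..end]) == table[index] {
        Verdict::Good
    } else {
        Verdict::Corrupt
    }
}
```

With 11 physical bytes and a 9-byte logical buffer, chunks 0 and 1 verify, but chunk 2 spans bytes 8..11 and cannot be checked from the buffer alone.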
The issue is exposed by stress testing the multithread sync tests. It's not
always easy to reproduce, though.
Reviewed By: markbt
Differential Revision: D19850745
fbshipit-source-id: a1a96080163b7b9b56dcd6c1673d5d8d10e18a2b
Summary: This avoids some extra mmap syscalls by ChecksumTable.
Reviewed By: xavierd
Differential Revision: D19818721
fbshipit-source-id: dace55193f2b4b0f35e3868781faa2d2998d3b58
Summary:
This simplifies the code a bit (no special cases about 0-sized mmap buffers)
and makes it cheaper to clone the index buffer (just an Arc::clone, without
another mmap syscall).
Reviewed By: xavierd
Differential Revision: D19818718
fbshipit-source-id: e96d42af74c7f0bb11703c5da31cdfbd5d76c372
Summary:
TreeSpans used to use `&str`, which adds a lifetime to the struct, making it
harder to be used in the Python land. Use a type parameter so TreeSpans<String>
can be used.
Reviewed By: DurhamG
Differential Revision: D19797708
fbshipit-source-id: c66429abfaf16d876151ca6f29da976bed91485d
Summary:
The filtering interface allows call sites to select what they want. It's similar
to manifest walk with files or directory matchers in source control.
Reviewed By: DurhamG
Differential Revision: D19784467
fbshipit-source-id: 5cf6e4016d6fa1c90f8aeccc50809baccd4af5ab
Summary: The idea is that instants (events) can be a drop-in replacement for `ui.log`.
Reviewed By: DurhamG
Differential Revision: D19782897
fbshipit-source-id: 795bbba23d921e460f723f19ef529b203aea366a
Summary: This function will be reused by the next diff.
Reviewed By: DurhamG
Differential Revision: D19782895
fbshipit-source-id: 1e636eabee9b0dffd287a1e6784a24ab2259f51f
Summary: This allows us to define methods on the treespans, such as filtering APIs.
Reviewed By: DurhamG
Differential Revision: D19782896
fbshipit-source-id: 2e7bd8344c0196e382728c26a8233abf944bbf29
Summary: The Thrift generated code depends only on futures 0.3, not 0.1. Thus it isn't necessary to depend on renamed:futures-preview and we can depend on futures-preview directly, which is exposed to Rust code as `futures::`.
Reviewed By: jsgf
Differential Revision: D20145921
fbshipit-source-id: 5cae94ec6747a374c2bf05f124ab237c798de005
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
Reviewed By: DurhamG
Differential Revision: D20102889
fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
Summary:
The NameSet is something similar to SpanSet and Mercurial's smartset but speaks
VertexNames instead of Ids. The idea is, NameSet will be part of NameDag APIs,
and potentially replace Mercurial's smartset layer (just the smartset container
types, not the revset language), in a way that revision numbers are completely
hidden behind the scenes.
This diff adds some basic abstraction around iteration-related operations.
Other operations will be added later.
Reviewed By: sfilipco
Differential Revision: D19912109
fbshipit-source-id: 504a26c074282ec51f260535ca63e943124f688e
Summary:
Update the `print_status()` function to take a `clidispatch::io::IO` object as
a parameter, instead of a simple output object. This will allow us to also
print error messages from this function in a future diff.
Reviewed By: quark-zju
Differential Revision: D19958504
fbshipit-source-id: bf482fdc4420e1350363a730c6a539cd760aef25
Summary:
Fix the PathRelativizer APIs to accept `Path` and even `str` arguments instead
of just `PathBuf`. The old code required a `PathBuf`, which often forced
callers to make a copy of the path data.
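The signature change can be sketched as follows. `Relativizer` is a hypothetical, much simplified stand-in that only strips a root prefix; the real PathRelativizer does more. What matters is the `impl AsRef<Path>` parameter.

```rust
use std::path::{Path, PathBuf};

// Hypothetical simplification: this type only strips a root prefix.
struct Relativizer {
    root: PathBuf,
}

impl Relativizer {
    fn new(root: impl AsRef<Path>) -> Self {
        Relativizer { root: root.as_ref().to_path_buf() }
    }

    // Accepts &str, String, &Path, PathBuf and so on, so callers holding a
    // borrowed path do not need to allocate a PathBuf first.
    fn relativize(&self, path: impl AsRef<Path>) -> PathBuf {
        let path = path.as_ref();
        path.strip_prefix(&self.root).unwrap_or(path).to_path_buf()
    }
}
```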
Reviewed By: quark-zju
Differential Revision: D19958505
fbshipit-source-id: 6fa40dd4b75df4e3faf9ad2ae4f0e4e6595669f6
Summary:
bytes 0.5 is a dependency of newer tokio; it is also newer, and thus better.
Staying on 0.4 means that copies between Bytes 0.4 and 0.5 need to be done.
This will be especially bad in the LFS code, since 10+MB buffers will have to
be copied.
One main API change is for the configparser. The code used to take Into<Bytes>
for the keys, I switched it to AsRef<[u8]>.
For hg_memcache_client, an extra copy is performed to build a Delta. Since this
code uses an old tokio and is being replaced right now, the effort of
switching to a new tokio and new bytes was not deemed worth it. The copy will
do for now.
Reviewed By: dtolnay
Differential Revision: D20043137
fbshipit-source-id: 395bfc3749a3b1bdfea652262019ac6a086e61e0
Summary:
Mercurial filenode hash is computed by including the copy information in the
blob header. Before computing the blob content hash, or returning it to the
upper layers, we need to either strip or reconstruct this header appropriately.
Reviewed By: DurhamG
Differential Revision: D19975887
fbshipit-source-id: 7555e7219e50f4d18ec677fdecc216ee705d7af4
Summary: This will make it easier to support more hash schemes in the future.
Reviewed By: DurhamG
Differential Revision: D19975888
fbshipit-source-id: 8b8ce3b20d72199bac3cd20a48475b5ab56bfc52
Summary:
With the Arc embedded into the store themselves, this forces a second
allocation in order to use them as trait objects. Since in most cases, we do
not want the stores themselves to be cloneable, we can move the Arc outside and
thus reduce the number of pointer indirections.
Reviewed By: DurhamG
Differential Revision: D19867568
fbshipit-source-id: 9cd126831fe2b9ee715472ac3299b7a09df95fce
Summary:
The ContentStore now can read LFS blobs from both the shared cache, and the
local store.
Reviewed By: DurhamG
Differential Revision: D19866249
fbshipit-source-id: a6fb3523495e9d3832613b56438f631cfa552b91
Summary:
With the LFS store being added, and the indexedlog being soon used for trees,
this simplification should help in formalizing the hierarchy of files/folders.
It will look like the following:
<root dir>/lfs: for the lfs store
<root dir>/indexedlog*: for the indexedlog
<root dir>/foobar: for a hypothetical foobar store
For manifests, <root dir> will therefore be: <store dir>/manifests. The
unfortunate part is that the current tree data lives under
<store dir>/packs/manifests. As packfiles will be replaced, this small
discrepancy is acceptable.
Reviewed By: DurhamG
Differential Revision: D19866248
fbshipit-source-id: 7ef59ef7df19149b19a529b4f4a45a479cc9d23b
Summary:
This is the first step in having a stronger integration between LFS blobs and
the ContentStore abstraction. The 2 main differences between the Python based
LFS implementation and this one are:
- pointers are not stored alongside plain data,
- blobs are split between local and shared blobs
As of now, no reclamation is being performed for shared blobs, blobs aren't
fetched or uploaded. This will come in future diffs.
Reviewed By: DurhamG
Differential Revision: D19859291
fbshipit-source-id: 45000fc574e6fbd6d3487f4966cad4f49dab731c
Summary:
Update the C files under edenscm/mercurial/cext to use absolute includes from
the repository root. Also update a few of the libraries in edenscm/mercurial
that the cext code depends on.
This makes these files easier to build with Buck in fbsource, and reduces the
number of places where we have to use deprecated Buck functionality to help
find these headers. This also allows autodeps to work with the build targets
for these rules.
Reviewed By: xavierd
Differential Revision: D19958221
fbshipit-source-id: e6e471583a795ba5773bae5f16ed582c9c5fd57e
Summary: Generate the Cargo.toml files inside xdiff with autocargo. This will enable Mononoke to depend on this code easily without sacrificing anything on eden/scm side.
Reviewed By: aslpavel
Differential Revision: D19948741
fbshipit-source-id: 905ff3d64b90830e5f075e4c6ed2b3de959e3f00
Summary: This will be used in the LFS store.
Reviewed By: DurhamG
Differential Revision: D19895803
fbshipit-source-id: 4cf447987c10fed0b5c98904f20c841428965d89
Summary:
In some cases, higher level stores may want to store data in either a plain
IndexedLog, or in a RotateLog, for local and shared data. Due to slight
differences between the 2, they can't easily be adapted into a common trait.
Instead let's just wrap both into an enum and implement the main functions that
the higher level stores need.
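In outline, the enum wrapper looks like the sketch below. `PlainLog`, `RotatingLog`, and `Store` are hypothetical stand-ins with made-up fields; only the shape (one enum, one `match` per shared operation) reflects the description above.

```rust
// Hypothetical stand-ins for the two log types; only the shape matters here.
struct PlainLog {
    entries: Vec<Vec<u8>>,
}

struct RotatingLog {
    current: Vec<Vec<u8>>,
    max_entries: usize,
    rotations: usize,
}

enum Store {
    Local(PlainLog),
    Shared(RotatingLog),
}

impl Store {
    fn local() -> Self {
        Store::Local(PlainLog { entries: Vec::new() })
    }

    fn shared(max_entries: usize) -> Self {
        Store::Shared(RotatingLog { current: Vec::new(), max_entries, rotations: 0 })
    }

    // Each shared operation dispatches to whichever log is inside.
    fn append(&mut self, data: &[u8]) {
        match self {
            Store::Local(log) => log.entries.push(data.to_vec()),
            Store::Shared(log) => {
                // Shared data may be discarded: start over when full.
                if log.current.len() >= log.max_entries {
                    log.current.clear();
                    log.rotations += 1;
                }
                log.current.push(data.to_vec());
            }
        }
    }

    fn len(&self) -> usize {
        match self {
            Store::Local(log) => log.entries.len(),
            Store::Shared(log) => log.current.len(),
        }
    }
}
```

Compared with a trait, the enum keeps the differences between the two logs visible in each `match` arm instead of forcing them behind one signature.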
The first use of this will be the LfsStore, future use will include the
IndexedLogDataStore and the IndexedLogHistoryStores.
Reviewed By: DurhamG
Differential Revision: D19859292
fbshipit-source-id: 920572e0cf5f69bda4901a727a6b0dc0f08fc8d0
Summary:
When I run make local it's creating changes in our checked in thrift
types. I guess I need to check these in?
Reviewed By: quark-zju
Differential Revision: D19848706
fbshipit-source-id: 8a2e9a2617734eda41eade1f2645689362b1d75d
Summary:
Up to now, this has been done in chef, and thus for repos that we do not list,
they may share the memcache keys, with potential unintended consequences. Let's
always add the repo name to the key, so we can simplify the code in chef.
One small negative effect of this change is that while it is being rolled out,
the cache hit rate will be impacted. This should resolve itself quickly.
Reviewed By: DurhamG
Differential Revision: D19885775
fbshipit-source-id: 0b59ce9e378b0ab70f696a39d19d27cd89921098
Summary:
Failing means that we fall back to the Python importer. Let's simply warn about
it.
Reviewed By: fanzeyi
Differential Revision: D19897274
fbshipit-source-id: f9c63f5aa76015c28b31f00bba98244f5c86e923
Summary:
This makes it possible to use `Bytes` for mmap buffers.
The changes are because `minibytes::Bytes` does not implement `From<&[u8]>`
with the intention to make slice copy explicit.
Reviewed By: xavierd
Differential Revision: D19818719
fbshipit-source-id: c34ee451bfd2dc7bcbbcebd52a76444b6c236849
Summary:
EdenFS will now be able to fetch blobs directly from memcache. This won't have
any big benefits as no blobs are in memcache right now, but over time, this
will significantly reduce the cost of fetching blobs.
Reviewed By: fanzeyi
Differential Revision: D19861643
fbshipit-source-id: c2e9d317bd30d4656bf0b3f8897794161697761a
Summary:
These tracing points will help us understand the memcache hit rate as well as
the fetching speed.
Reviewed By: quark-zju
Differential Revision: D19836499
fbshipit-source-id: 1936c44efc3e7715069e6a959f5331139d591d5c
Summary:
Every time a cache miss is seen, the data fetched from the server will be sent
directly to memcache for future use. Unfortunately, doing so in a blocking
manner severely impacts the overall fetching speed from the server. Since
memcache is purely an optimization, we can afford to send data to it
asynchronously.
Let's move as much as possible of the code to a background thread to reduce the
overhead of memcache.
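One way to picture this is a channel feeding a worker thread. The sketch below uses only the standard library. `BackgroundWriter` is a hypothetical name, and the worker merely counts items where a real one would talk to memcache.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the memcache writer: it only counts what it saw.
struct BackgroundWriter {
    sender: Sender<(String, Vec<u8>)>,
    worker: JoinHandle<usize>,
}

impl BackgroundWriter {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<(String, Vec<u8>)>();
        let worker = thread::spawn(move || {
            let mut stored = 0;
            // Runs until every Sender has been dropped.
            for (_key, _value) in receiver {
                // A real implementation would issue the slow memcache set here.
                stored += 1;
            }
            stored
        });
        BackgroundWriter { sender, worker }
    }

    // Called on the fetch path: queueing does not wait for memcache.
    fn queue(&self, key: &str, value: &[u8]) {
        // Memcache is only an optimization, so a failed send is ignored.
        let _ = self.sender.send((key.to_string(), value.to_vec()));
    }

    // Drops the sender so the worker drains the queue and exits.
    fn finish(self) -> usize {
        drop(self.sender);
        self.worker.join().unwrap()
    }
}
```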
Reviewed By: DurhamG
Differential Revision: D19836011
fbshipit-source-id: 68e506ef7464d6e99d98457d0d37178f514be1a9
Summary:
Instead of fetching data one-by-one, let's prefetch data concurrently by using
the new get_iter function.
Reviewed By: DurhamG
Differential Revision: D19836009
fbshipit-source-id: 4a50328c0cbbba677c2de3777ebe4c34cb10c1e2
Summary:
Even when memcache would be able to prefetch everything, this would always call
into the underlying remote store with an empty key set. For things like `hg
prefetch` and a large number of keys, the effect of doing that is minimal, but
for EdenFS or `hg log -p`, the roundtrip to the server for every file/revision
would add a significant amount of overhead. Let's simply stop iterating when we
no longer need to fetch anything.
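A minimal sketch of the early exit, with hypothetical names (`CountingStore`, `prefetch`): each store returns the keys it could not provide, and the loop stops as soon as that set is empty.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical store: knows some keys and counts how often it is queried.
struct CountingStore {
    known: HashSet<String>,
    calls: Rc<Cell<usize>>,
}

impl CountingStore {
    // Returns the store and a handle for observing its call count.
    fn new(known: &[&str]) -> (Self, Rc<Cell<usize>>) {
        let calls = Rc::new(Cell::new(0));
        let store = CountingStore {
            known: known.iter().map(|k| k.to_string()).collect(),
            calls: calls.clone(),
        };
        (store, calls)
    }

    // Returns the keys that are still missing after asking this store.
    fn fetch(&self, keys: &[String]) -> Vec<String> {
        self.calls.set(self.calls.get() + 1);
        keys.iter().filter(|k| !self.known.contains(*k)).cloned().collect()
    }
}

// Asks each store in order, stopping once nothing is missing.
fn prefetch(stores: &[CountingStore], keys: Vec<String>) -> Vec<String> {
    let mut missing = keys;
    for store in stores {
        if missing.is_empty() {
            break; // avoids a pointless round trip with an empty key set
        }
        missing = store.fetch(&missing);
    }
    missing
}
```

When the first store (the cache) satisfies every key, the second store (the server) is never called.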
Reviewed By: DurhamG
Differential Revision: D19835797
fbshipit-source-id: 54ad704428c3b20d973cfa87f7171899ec44b3f9
Summary:
See also https://github.com/serde-rs/bytes/.
This will be used in the `dag` crate.
Reviewed By: DurhamG
Differential Revision: D19770858
fbshipit-source-id: 2a870a564e0ceecdc7a4667853b2b2a5ea4ce6e3
Summary:
This crate provides the core features of the commonly known `Bytes` crate:
zero-copy slicing and cloning, while also supporting mmap-backed buffers.
The main motivation is to replace `Mmap` in `indexedlog`. That has multiple
benefits:
- Handles 0-sized mmap more cleanly.
- Handles clones more cleanly.
- Gain the flexibility to zero-copy data without lifetime / reference.
- Gain the flexibility to switch to non-mmap data.
The `bytes::Bytes` crate does not yet support mmap buffers as of its latest
release (0.5.4).
Implementation wise, `minibytes::Bytes` uses `Option<Arc<dyn Trait>>` for the
"trait object". This makes implementing the mmap storage just one line.
`bytes 0.5.4` re-invents the "trait object" manually using unsafe code. It requires
about 50 lines to implement the mmap storage (in D19756122).
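The design can be approximated in safe Rust as below. This is a sketch and not the `minibytes` source: it uses `dyn AsRef<[u8]>` as the owner trait and stores an offset and length rather than a raw pointer, both of which are assumptions made to keep the example short.

```rust
use std::ops::Range;
use std::sync::Arc;

// Assumed owner type: anything that can expose its bytes. An mmap wrapper
// implementing AsRef<[u8]> would fit here as well.
type Owner = Arc<dyn AsRef<[u8]> + Send + Sync>;

#[derive(Clone)]
struct Bytes {
    owner: Option<Owner>, // None represents the empty buffer
    start: usize,
    len: usize,
}

impl Bytes {
    fn empty() -> Self {
        Bytes { owner: None, start: 0, len: 0 }
    }

    fn from_owner<T: AsRef<[u8]> + Send + Sync + 'static>(owner: T) -> Self {
        let len = AsRef::<[u8]>::as_ref(&owner).len();
        let owner: Owner = Arc::new(owner);
        Bytes { owner: Some(owner), start: 0, len }
    }

    fn as_slice(&self) -> &[u8] {
        match &self.owner {
            Some(owner) => {
                let all: &[u8] = (**owner).as_ref();
                &all[self.start..self.start + self.len]
            }
            None => &[],
        }
    }

    // Zero-copy: the new value shares the owner and narrows the range.
    fn slice(&self, range: Range<usize>) -> Bytes {
        assert!(range.start <= range.end && range.end <= self.len);
        Bytes {
            owner: self.owner.clone(),
            start: self.start + range.start,
            len: range.end - range.start,
        }
    }
}
```

Cloning or slicing only bumps the `Arc` reference count, and the empty case needs no backing storage at all.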
Reviewed By: xavierd
Differential Revision: D19770856
fbshipit-source-id: 8cfa7052a18ac2e0cd6348b77d5e2a4acc61195c
Summary: This makes the output more readable even if the "name" of a span is very long.
Reviewed By: DurhamG
Differential Revision: D19780536
fbshipit-source-id: dce0d3777409c32b0752db51341a572addb823ea
Summary:
As initializing the memcache client takes ~0.7s, let's move it to a background
thread as to not impact Mercurial startup time. This diff uses ArcSwap in
order to reduce the overhead of the very common read paths as much as possible.
Using Mutex or RwLock instead would have caused unnecessary contention.
Reviewed By: DurhamG
Differential Revision: D19518693
fbshipit-source-id: 886e9b86813fda6ff005ccce99659890026f643a