740 Commits

Author SHA1 Message Date
Xavier Deguillard
dd69d770c3 edenapi: plumb MutableHistoryStore
Summary: The edenapi is now independent of the storage type for history data.

Reviewed By: kulshrax

Differential Revision: D15284355

fbshipit-source-id: 72a5db42bb0fb19ee03155b13914202581ab5966
2019-05-09 18:33:50 -07:00
Xavier Deguillard
4b79ef008f revisionstore: implement MutableHistoryStore for MutableHistoryPack
Summary:
This allows for a MutableHistoryPack to be used where a MutableHistoryStore
will be required. Once an IndexedLog based history store is implemented we will
be able to switch between the 2 more easily.

Reviewed By: kulshrax

Differential Revision: D15284356

fbshipit-source-id: 91d75ddc6991c26eace67d77679bb8d5806cf8b8
2019-05-09 18:33:49 -07:00
Xavier Deguillard
2ddabd35f9 revisionstore: add MutableHistoryStore
Summary: This will help in abstracting the kind of store that is being written to.

Reviewed By: kulshrax

Differential Revision: D15284358

fbshipit-source-id: ab6a6d23978480ca65587b745ae39ac6ed98cca9
2019-05-09 18:33:49 -07:00
Xavier Deguillard
89b44424d6 edenapi: use a MutableDeltaStore instead of a MutableDataPack
Summary:
The type of store where data is stored is now fully abstracted to the python
bindings. For now, edenapi will write to the pending mutabledatapack, but we
can now switch it easily to any other store implementing MutableDeltaStore,
including an IndexedLogDataStore.

Reviewed By: kulshrax

Differential Revision: D15266191

fbshipit-source-id: 638cf90a567ef170e0302376312c4b82e6d6b6da
2019-05-09 18:33:48 -07:00
Xavier Deguillard
96c66954e5 revisionstore: implement MutableDeltaStore for MutableDataPack and IndexedLogDataStore
Summary:
This will allow the edenapi code to transparently use the IndexedLogDataStore
or a datapack.

Reviewed By: kulshrax

Differential Revision: D15266194

fbshipit-source-id: 6396118a5c8107a8c91e5fc83fe4297d4321d10c
2019-05-09 18:33:48 -07:00
Xavier Deguillard
053e854327 revisionstore: add MutableDeltaStore
Summary:
This will be used to abstract writing to a MutableDataPack or
IndexedLogDataStore (or both).

Reviewed By: kulshrax

Differential Revision: D15266193

fbshipit-source-id: 99f2383555addbafea81a2752e8d6759a1c1c5e7
2019-05-09 18:33:48 -07:00
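The write-side abstraction introduced in 053e854327 can be pictured as a small trait that several store types implement. This is a minimal sketch, not the revisionstore API: the method names `add` and `flush`, the `Key` alias, and both in-memory store types are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical key type; the real crate has its own key and delta types.
type Key = Vec<u8>;

// Assumed shape of a write-side store abstraction.
trait MutableDeltaStore {
    fn add(&mut self, key: Key, data: Vec<u8>);
    // Returns the number of entries written out by this flush.
    fn flush(&mut self) -> usize;
}

// Stand-in for a pack-file-backed store.
#[derive(Default)]
struct PackLikeStore {
    pending: Vec<(Key, Vec<u8>)>,
}

// Stand-in for an indexed-log-backed store.
#[derive(Default)]
struct LogLikeStore {
    entries: HashMap<Key, Vec<u8>>,
    unflushed: usize,
}

impl MutableDeltaStore for PackLikeStore {
    fn add(&mut self, key: Key, data: Vec<u8>) {
        self.pending.push((key, data));
    }

    fn flush(&mut self) -> usize {
        let written = self.pending.len();
        self.pending.clear();
        written
    }
}

impl MutableDeltaStore for LogLikeStore {
    fn add(&mut self, key: Key, data: Vec<u8>) {
        self.entries.insert(key, data);
        self.unflushed += 1;
    }

    fn flush(&mut self) -> usize {
        std::mem::replace(&mut self.unflushed, 0)
    }
}

// A caller written against the trait only; it does not know which store it
// is writing to.
fn write_all(store: &mut dyn MutableDeltaStore, items: Vec<(Key, Vec<u8>)>) -> usize {
    for (key, data) in items {
        store.add(key, data);
    }
    store.flush()
}
```

A caller such as `write_all` can then be handed either store without any change to its code.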
Arun Kulshreshtha
0e0df5861e types: don't re-export structs from api module
Summary: As we add more functionality to the Eden API, we will have a lot more request structs. These structs are only used by the HTTP data fetching code, and should not be used by actual business logic. As such, while these types need to be public (so that both Mononoke and Mercurial can use them), they should not be re-exported at the top level.

Reviewed By: quark-zju

Differential Revision: D15268439

fbshipit-source-id: e7d1405d2ac234892baedbf7dbf3e133d187cb45
2019-05-09 14:46:57 -07:00
Stefan Filip
4e9787b879 manifest: handle missing line feed in tree manifest entries
Summary:
A tree manifest entry must always end with a line feed. It is somewhat
redundant but that's how the serialization is defined. Sometimes that last
line feed is missing in our data. I don't know why.

Reviewed By: quark-zju

Differential Revision: D15110860

fbshipit-source-id: c4ac5075e22a8b8851f6b246d22af8ab68f42a74
2019-05-08 10:07:28 -07:00
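One way to tolerate the missing trailing line feed described in 4e9787b879 is to treat end-of-input as an entry terminator. The sketch below assumes entries of the form name, NUL byte, hash text, line feed; the function name and error type are hypothetical and this is not the manifest crate's parser.

```rust
// Parsed entry: (name, everything after the NUL separator), as byte slices.
type Entry<'a> = (&'a [u8], &'a [u8]);

fn parse_entries(data: &[u8]) -> Result<Vec<Entry<'_>>, String> {
    let mut entries = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        // An entry normally ends at the next line feed. Tolerate a missing
        // final line feed by treating end-of-input as a terminator too.
        let end = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
        let line = &rest[..end];
        let nul = line
            .iter()
            .position(|&b| b == 0)
            .ok_or_else(|| "entry has no NUL separator".to_string())?;
        entries.push((&line[..nul], &line[nul + 1..]));
        // Skip the line feed if there was one.
        rest = if end < rest.len() { &rest[end + 1..] } else { &rest[end..] };
    }
    Ok(entries)
}
```

With this approach, input with and without the final line feed parses to the same entries.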
Stefan Filip
a8bc9fc3a7 manifest: use dynamic dispatch for tree manifest store
Summary:
This is a quality of life improvement for working with the storage layer.
We probably don't gain a whole lot by statically linking the store and it is
useful to have some flexibility in the storage layer.

Differential Revision: D15110859

fbshipit-source-id: 6102acafa21dd1dbaeed0f8fc3147538a8c301d1
2019-05-08 10:07:27 -07:00
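The trade-off behind a8bc9fc3a7 can be illustrated with a struct that holds its store as a trait object. The `TreeStore` trait, the two store types, and `Tree` below are hypothetical names used only for this sketch.

```rust
use std::collections::HashMap;

// Hypothetical read-only store interface.
trait TreeStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

struct MemoryStore(HashMap<String, Vec<u8>>);

impl TreeStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

struct EmptyStore;

impl TreeStore for EmptyStore {
    fn get(&self, _key: &str) -> Option<Vec<u8>> {
        None
    }
}

// Static dispatch would be `struct Tree<S: TreeStore> { store: S }`, which
// puts the store type into every signature that mentions `Tree`. With a
// trait object the store is chosen at run time and `Tree` stays one type.
struct Tree {
    store: Box<dyn TreeStore>,
}

impl Tree {
    fn new(store: Box<dyn TreeStore>) -> Self {
        Tree { store }
    }

    fn contains(&self, key: &str) -> bool {
        self.store.get(key).is_some()
    }
}
```

Because `Tree` is a single concrete type, trees backed by different stores can live in the same collection, at the cost of a virtual call per store access.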
Xavier Deguillard
f94c1f0c69 revisionstore: repack packfiles to get under 50 packfiles.
Summary:
When remotefilelog.fetchpacks is enabled, it's possible that 100 packfiles of
100MB each are present. In this case, every new packfile that
hg_memcache_client writes will force an incremental repack, which will
only reduce the number of packfiles by a small number.

Let's have a simple heuristic that tries to bring the number of packfiles to be
lower than 50.

Reviewed By: DurhamG

Differential Revision: D15203771

fbshipit-source-id: 18c39487d5ac087d4879004993c1c1add087249c
2019-05-06 12:07:31 -07:00
Adam Simpkins
38819c08e8 add a copyright header to lib/dag/src/tests.rs
Summary: This fixes the test-check-fix-code.t test.

Reviewed By: quark-zju

Differential Revision: D15212386

fbshipit-source-id: b9125d691b50ca44e59674da818666b0b2633ee5
2019-05-06 11:59:04 -07:00
Jun Wu
5aff235940 dag: add some ancestor related algorithms
Summary:
Add simple algorithms to select all ancestors of a single node, or calculate a
"random" gca of two nodes.

Mercurial supports more "advanced" operations, like calculating ancestors of
multiple nodes, or calculating all ancestors of more than 2 nodes. We'll
see if those are necessary and maybe build fast or slow paths for them.

Reviewed By: sfilipco

Differential Revision: D15055347

fbshipit-source-id: c8c2bac2797d0389adb58c89b67e3ddfb62eb06f
2019-05-03 13:35:41 -07:00
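The two operations named in 5aff235940 can be sketched on top of a plain parent function. This is an illustration rather than the dag crate's implementation: the function names are hypothetical, and `gca_one` assumes that every parent has a smaller id than its children, so the largest common ancestor id cannot have another common ancestor as a descendant.

```rust
use std::collections::BTreeSet;

type Id = u64;

// All ancestors of `head`, including `head` itself.
fn ancestors(head: Id, get_parents: &dyn Fn(Id) -> Vec<Id>) -> BTreeSet<Id> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![head];
    while let Some(id) = stack.pop() {
        // Only expand ids that have not been visited yet.
        if seen.insert(id) {
            stack.extend(get_parents(id));
        }
    }
    seen
}

// One greatest common ancestor of `a` and `b`, or `None` if they share no
// ancestor. Relies on the assumption that parents have smaller ids.
fn gca_one(a: Id, b: Id, get_parents: &dyn Fn(Id) -> Vec<Id>) -> Option<Id> {
    let left = ancestors(a, get_parents);
    let right = ancestors(b, get_parents);
    left.intersection(&right).max().copied()
}
```

This is the slow path: it materializes both ancestor sets. A segment-based structure is meant to answer the same questions without walking every commit.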
Jun Wu
0be69e90e3 dag: add some tests about segmentation and id assignments
Summary:
We now have enough building blocks to put things together.

The tests are taken from the slides: both examples and some corner cases, plus
two possibly interesting synthetic cases.

There are probably more details to test. But this should give us some level of
confidence.

Reviewed By: sfilipco

Differential Revision: D15055346

fbshipit-source-id: a76b70fec0ec7e88378830f251f997d147416db0
2019-05-03 13:35:41 -07:00
Jun Wu
a04bdd0ab9 dag: add logic to build high level segments
Summary:
High-level segments are built on top of lower level segments.
We simply scan through them, and greedily pick the longest ones.

Reviewed By: sfilipco

Differential Revision: D15055348

fbshipit-source-id: 3b72dc766abd46669b787187b7d1d5f7171c026a
2019-05-03 13:35:41 -07:00
Jun Wu
c0295c5218 dag: add logic to build flat segments
Summary:
To build flat segments, we take a `get_parents(id) -> [id]` function and an end
`id`, then scan through the missing ids and greedily try to turn them into segments.

Reviewed By: sfilipco

Differential Revision: D15055351

fbshipit-source-id: 21a503d4c3894583a314c6dfd4c7b87fafb95d95
2019-05-03 13:35:40 -07:00
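The greedy scan described in c0295c5218 might look roughly like the following. The sketch assumes that a flat segment is a run of consecutive ids in which every id after the first has exactly one parent, its predecessor; the `Segment` layout and the function name are hypothetical and the real crate may define segments differently.

```rust
type Id = u64;

#[derive(Debug, PartialEq)]
struct Segment {
    low: Id,
    high: Id,
    // Parents of `low`. Every other id in the segment has the single
    // parent `id - 1` (an assumption of this sketch).
    parents: Vec<Id>,
}

fn build_flat_segments(
    start: Id,
    end: Id,
    get_parents: &dyn Fn(Id) -> Vec<Id>,
) -> Vec<Segment> {
    let mut segments: Vec<Segment> = Vec::new();
    for id in start..=end {
        let parents = get_parents(id);
        // Greedy step: extend the current segment while history is linear.
        let extend = match segments.last() {
            Some(seg) => seg.high + 1 == id && parents == [id - 1],
            None => false,
        };
        if extend {
            segments.last_mut().unwrap().high = id;
        } else {
            segments.push(Segment { low: id, high: id, parents });
        }
    }
    segments
}
```

Linear stretches of history collapse into one segment each, so the number of segments tracks the number of branch and merge points rather than the number of commits.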
Jun Wu
73fdb98c4c idmap: add logic to translate parent function
Summary:
segment::Dag wants a `get_parents` function that speaks Ids instead of slices,
as segment::Dag works entirely on Ids.

Provide a function to translate `get_parents` on byte slices to Ids.

Reviewed By: sfilipco

Differential Revision: D15055350

fbshipit-source-id: 795367cf809f068c0cad2515af02c93e14960236
2019-05-03 13:35:40 -07:00
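The translation described in 73fdb98c4c amounts to wrapping one closure in another. In this sketch `SimpleIdMap` is a hypothetical in-memory stand-in for the real IdMap, and `translate_get_parents` is an illustrative name; returning `None` for unknown ids or names is a design choice of the sketch.

```rust
use std::collections::HashMap;

type Id = u64;

// Minimal two-way mapping between ids and byte-slice names.
#[derive(Default)]
struct SimpleIdMap {
    id_to_name: Vec<Vec<u8>>,
    name_to_id: HashMap<Vec<u8>, Id>,
}

impl SimpleIdMap {
    // Insert a name if it is new and return its id.
    fn insert(&mut self, name: &[u8]) -> Id {
        if let Some(&id) = self.name_to_id.get(name) {
            return id;
        }
        let id = self.id_to_name.len() as Id;
        self.id_to_name.push(name.to_vec());
        self.name_to_id.insert(name.to_vec(), id);
        id
    }
}

// Wrap a parent function over names into a parent function over ids.
// Returns `None` if the id, or any parent name, is not in the map.
fn translate_get_parents<'a>(
    map: &'a SimpleIdMap,
    get_parents: impl Fn(&[u8]) -> Vec<Vec<u8>> + 'a,
) -> impl Fn(Id) -> Option<Vec<Id>> + 'a {
    move |id| {
        let name = map.id_to_name.get(id as usize)?;
        get_parents(name)
            .iter()
            .map(|parent| map.name_to_id.get(parent).copied())
            .collect()
    }
}
```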
Jun Wu
a45f29ebcd idmap: add logic to assign IDs
Summary:
Assigning IDs affects performance of segments. Therefore

The logic is abstracted in a way that the callsite only needs to provide a
`get_parents(slice) -> [slice]` function, and a slice to begin with. This
is intended to make it reusable for multiple cases:

- drawdag get_parents, for tests
- revlog get_parents
- Mononoke get_parents

Reviewed By: sfilipco

Differential Revision: D15055349

fbshipit-source-id: d6475737eb87f5ab7d7bd8123a8f4ae2b6d108e8
2019-05-03 13:35:40 -07:00
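An id assignment driven only by a `get_parents(slice) -> [slice]` function and a starting slice, as a45f29ebcd describes, can be sketched with an explicit stack. The ordering policy here (parents always receive smaller ids than children, ids are dense and start at 0) is an assumption of the sketch, and `assign_ids` is a hypothetical name.

```rust
use std::collections::HashMap;

type Id = u64;

// Assign ids to `head` and all of its ancestors so that every parent gets a
// smaller id than its children. An explicit stack avoids deep recursion on
// long histories. Assumes `ids` only contains ids handed out by this
// function, so `ids.len()` is always the next free id.
fn assign_ids(
    head: &[u8],
    get_parents: &dyn Fn(&[u8]) -> Vec<Vec<u8>>,
    ids: &mut HashMap<Vec<u8>, Id>,
) {
    let mut stack = vec![head.to_vec()];
    while let Some(name) = stack.last().cloned() {
        if ids.contains_key(&name) {
            stack.pop();
            continue;
        }
        let missing: Vec<Vec<u8>> = get_parents(&name)
            .into_iter()
            .filter(|parent| !ids.contains_key(parent))
            .collect();
        if missing.is_empty() {
            // All parents already have ids, so this name can take the next one.
            let next = ids.len() as Id;
            ids.insert(name, next);
            stack.pop();
        } else {
            // Visit the parents first, then come back to this name.
            stack.extend(missing);
        }
    }
}
```

Since the caller supplies the parent function, the same routine could in principle be driven by a test DAG, a revlog, or a server-side store.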
Jun Wu
c32c89ffa4 drawdag: implement a Rust drawdag library
Summary:
This library parses an ASCII DAG. It is similar to mercurial/drawdag.py, which
was added by me in [1].

There are some (intentional) differences from the Python drawdag:

- Stricter. Confusing DAG characters like `+` or crossing lines are forbidden.
- Do not special handle `o` as a name.
- Do not try to be compatible with `hg log -G` output.
- Do not support special comments (yet).
- Support both left to right and bottom to top directions.

This library tries to be abstract. i.e. it does not have actual logic about
how to make a commit. Its intended users are Mononoke and scmdag, which have
different ways to make commits.

Since this library is intended to be used only for tests, I didn't spend
much effort optimizing its performance.

[1]: https://www.mercurial-scm.org/repo/hg/rev/a31634336471

Reviewed By: kulshrax

Differential Revision: D15039768

fbshipit-source-id: 4c33d44759ecf59aadc3d443a84db07d702dc69b
2019-05-03 13:35:40 -07:00
Jun Wu
771b6f05e5 dag: implement basic operations
Summary: Implement basic lookup by head, lookup by id, and write operations.

Reviewed By: sfilipco

Differential Revision: D15019663

fbshipit-source-id: 3747f7e855a20120762d4e12b098e99b2ed3dfcb
2019-05-03 13:35:40 -07:00
Jun Wu
a84e552e85 dag: introduce segment::Dag
Summary:
The segment::Dag structure stores all levels of segments. The "segment" concept
is introduced by D14937221.

This diff adds empty structures and the serialization format.

Reviewed By: sfilipco

Differential Revision: D15019662

fbshipit-source-id: 8136acd45dc8526391e94c5ae98b609d4f8b392a
2019-05-03 13:35:39 -07:00
Arun Kulshreshtha
60f8a0938f edenapi: improve progress reporting
Summary:
This diff adds a new progress reporting framework to the Eden API crate and uses it to power progress bars for HTTP file downloads in Mercurial.

The new `ProgressManager` type is designed to aggregate progress values across multiple concurrent HTTP transfers. The API is currently designed to integrate well with libcurl's progress callback API, allowing all of the curl handles within a curl multi session to concurrently report their progress.

This progress can then be reported (in aggregate) to a user-provided callback. In most cases, this callback will be a Rust wrapper around a callback provided by the Python code. The `EdenAPI` trait and FFI bindings have been updated accordingly to allow optionally passing in a callback for long-running operations.

Lastly, in `remotefilelog`'s Python code, the callback is specified as a Python closure that simply updates the progress bar.

Reviewed By: quark-zju

Differential Revision: D15179983

fbshipit-source-id: ee677b71beff730f91aafe0364124f7ea0671387
2019-05-02 14:20:55 -07:00
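The aggregation idea from 60f8a0938f can be sketched as a table of per-transfer values that is summed on every update. `ProgressAggregator`, `Progress`, and their methods are hypothetical; this is not the actual `ProgressManager` API and it does not touch libcurl.

```rust
// Hypothetical per-transfer progress value.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Progress {
    downloaded: u64,
    total: u64,
}

// Collects progress from several concurrent transfers and reports the sum
// to a user-provided callback.
struct ProgressAggregator<F: FnMut(Progress)> {
    transfers: Vec<Progress>,
    callback: F,
}

impl<F: FnMut(Progress)> ProgressAggregator<F> {
    fn new(callback: F) -> Self {
        ProgressAggregator { transfers: Vec::new(), callback }
    }

    // Register a transfer and return its slot index.
    fn register(&mut self) -> usize {
        self.transfers.push(Progress::default());
        self.transfers.len() - 1
    }

    // Called from a transfer's own progress callback.
    fn update(&mut self, slot: usize, progress: Progress) {
        self.transfers[slot] = progress;
        let sum = self.transfers.iter().fold(Progress::default(), |acc, p| Progress {
            downloaded: acc.downloaded + p.downloaded,
            total: acc.total + p.total,
        });
        (self.callback)(sum);
    }
}
```

The user-facing callback only ever sees the aggregate, so a single progress bar can cover any number of transfers.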
Arun Kulshreshtha
453c7a95fc edenapi: make hg debughttp print real hostname of api server
Summary: Per title, `hg debughttp` now prints out the hostname that the API server reports rather than the hostname in the URL we used to connect to it. The reason for this is that if the API server is behind a VIP, we get the actual hostname rather than just the VIP URL.

Differential Revision: D15170618

fbshipit-source-id: 9af5480f9987d8ea9c914baf3b62a00ad88d1b32
2019-05-01 19:56:32 -07:00
Xavier Deguillard
a85ba47d9b radixbuf: replace error-chain with failure
Summary:
The former is no longer maintained and throws warnings with recent Rust
versions.

Reviewed By: singhsrb

Differential Revision: D15109706

fbshipit-source-id: 94479cdedf42c4dd99e35fa8e337d2fc73f74eb5
2019-04-29 15:36:02 -07:00
Arun Kulshreshtha
77ac4bc5c2 edenapi: move active transfer check into loop body
Summary: Due to the structure of this loop, we were unnecessarily blocking and polling when all curl transfers were already complete. To fix this, move the loop condition check to the middle of the loop.

Reviewed By: quark-zju

Differential Revision: D15124823

fbshipit-source-id: 92b7eee83cbfd62d590c21893f3235e1ca04fcec
2019-04-29 13:18:39 -07:00
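The loop restructuring in 77ac4bc5c2 can be shown with a stand-in for the transfer session. The `MultiSession` trait and `FakeMulti` type are hypothetical and only model "drive the transfers" and "block until something happens"; they are not the curl crate's API.

```rust
// Hypothetical stand-in for a multi-transfer session.
trait MultiSession {
    // Drive all transfers; returns how many are still active.
    fn perform(&mut self) -> usize;
    // Block until there is activity or a timeout expires.
    fn wait(&mut self);
}

fn drive<M: MultiSession>(multi: &mut M) {
    loop {
        let active = multi.perform();
        // Checking here, in the middle of the loop body, avoids one last
        // blocking wait after every transfer has already finished.
        if active == 0 {
            break;
        }
        multi.wait();
    }
}

// Fake session that completes one transfer per `perform` call and counts
// how often the caller blocked.
struct FakeMulti {
    remaining: usize,
    waits: usize,
}

impl MultiSession for FakeMulti {
    fn perform(&mut self) -> usize {
        self.remaining = self.remaining.saturating_sub(1);
        self.remaining
    }

    fn wait(&mut self) {
        self.waits += 1;
    }
}
```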
Shu-Ting Tseng
21051a1380 hg: fix test-check-fix-code.t
Summary: This is the result from running `python ./contrib/fix-code.py $(hg files .)`.

Reviewed By: HarveyHunt

Differential Revision: D15121815

fbshipit-source-id: 994a44e155806252c57c0a3c9c448101d21c6b57
2019-04-29 08:05:56 -07:00
Jun Wu
e5b14be2c9 pathmatcher: migrate to rust 2018 and tempfile
Summary: Modernize the crate.

Differential Revision: D15093337

fbshipit-source-id: 474af9118111fb056a786f3b4308f58c325bc0bb
2019-04-26 17:45:27 -07:00
Xavier Deguillard
b5b3d84be2 treestate: convert to rust 2018
Summary: Modernize it!

Reviewed By: singhsrb

Differential Revision: D15106304

fbshipit-source-id: 5fe900bb4a2a67591f20b2841122ed264f81dfa8
2019-04-26 16:23:35 -07:00
Xavier Deguillard
66888131f0 treestate: replace rand::ChaChaRng with rand_chacha
Summary: The former triggers warnings when compiling, and recommends using rand_chacha instead.

Reviewed By: singhsrb

Differential Revision: D15106307

fbshipit-source-id: 58ad62cc96cc8878086d79d83bcdc2075b416375
2019-04-26 16:23:35 -07:00
Xavier Deguillard
00506c11d5 treestate: replace error-chain with failure
Summary:
The error-chain crate is un-maintained and triggers warnings when compiling
with new versions of Rust. Let's use the failure crate instead to be consistent
with the other crates.

Reviewed By: singhsrb

Differential Revision: D15106306

fbshipit-source-id: 8edcf9f9aaf4b6e2d5f214b26fed3e72d4f3acd1
2019-04-26 16:23:35 -07:00
Xavier Deguillard
be88594e1d configparse: use .as_span instead of .into_span
Summary:
The latter is deprecated in pest, causing the compiler to warn about it. This
also removes a handful of clone operations.

Reviewed By: quark-zju

Differential Revision: D15091596

fbshipit-source-id: 9bd902d9efb9aef3aba55e11b4472653a895bfcd
2019-04-26 12:37:08 -07:00
Xavier Deguillard
2f58502ba8 revisionstore: use replace instead of direct drop
Summary:
Instead of manually dropping some of the datapack/historypack fields, we can
drop the entire object. This allows implementing the Drop trait more easily.
But, this prevents the code from later using some of the object fields. We can
use replace to move them in a zero-copy fashion.

Reviewed By: DurhamG

Differential Revision: D15076017

fbshipit-source-id: 4831dfcc2005c957862d32eeda02f62796be3afb
2019-04-25 18:53:06 -07:00
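The technique named in 2f58502ba8 relies on `std::mem::replace`. A type that implements `Drop` cannot have fields moved out of it directly, but a field can be swapped for a cheap empty value, which moves the original out without copying its contents. The `Pack` type and its fields below are hypothetical.

```rust
use std::mem;

struct Pack {
    path: String,
    buffer: Vec<u8>,
}

impl Drop for Pack {
    fn drop(&mut self) {
        // Cleanup for the whole object would go here, for example removing
        // temporary files.
    }
}

impl Pack {
    // Consume the pack and hand back its fields. Moving `self.path` out
    // directly is rejected by the compiler because `Pack` implements
    // `Drop`, so each field is swapped with an empty value instead.
    fn into_parts(mut self) -> (String, Vec<u8>) {
        let path = mem::replace(&mut self.path, String::new());
        let buffer = mem::replace(&mut self.buffer, Vec::new());
        // `self` is dropped as a whole when this function returns, and
        // `Drop::drop` runs exactly once.
        (path, buffer)
    }
}
```

Neither `String::new()` nor `Vec::new()` allocates, so the swap itself is cheap.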
Jun Wu
e7a89e8c45 spanset: make Span a dedicated type
Summary:
Use a dedicated Span type so we can enforce reverse ordering and `start <= end`
directly on the Span structure.

The constructor of `SpanSet` becomes more expensive because it recreates the
`Vec`, and sorts it. Practically, hopefully it's fine. Internal logic like union will
not use that constructor.

Some comments and tweaks have been made to make the code easier to read.

There are some performance changes, though:

Before:

  intersection                    5.030 ms
  union                           5.920 ms
  difference                      4.804 ms

After:

  intersection                    6.036 ms
  union                           5.426 ms
  difference                      4.710 ms

`intersection` becomes slower, while `union` and `difference` become a bit faster.
Hopefully the regression is within the acceptable range.

Reviewed By: sfilipco

Differential Revision: D15023651

fbshipit-source-id: ea7845d5d20faf204cfb85c66fc3bd6e25c9fc0c
2019-04-25 17:05:11 -07:00
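The invariants mentioned in e7a89e8c45 can be enforced by construction. This sketch is not the spanset code: the field names `low` and `high`, inclusive bounds, and merging of overlapping or adjacent spans in the constructor are assumptions. It does show why such a constructor costs more than storing the `Vec` as given: it sorts and rebuilds it.

```rust
// Inclusive span with `low <= high` guaranteed by the constructor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    low: u64,
    high: u64,
}

impl Span {
    fn new(low: u64, high: u64) -> Option<Span> {
        if low <= high {
            Some(Span { low, high })
        } else {
            None
        }
    }
}

// Spans kept in reverse order (largest first) and non-overlapping.
#[derive(Debug, PartialEq)]
struct SpanSet {
    spans: Vec<Span>,
}

impl SpanSet {
    fn from_spans(mut spans: Vec<Span>) -> SpanSet {
        // Sort by upper bound, largest first.
        spans.sort_by(|a, b| b.high.cmp(&a.high));
        let mut merged: Vec<Span> = Vec::new();
        for span in spans {
            if let Some(last) = merged.last_mut() {
                // Overlapping or adjacent: extend the previous span downward.
                if span.high.saturating_add(1) >= last.low {
                    last.low = last.low.min(span.low);
                    continue;
                }
            }
            merged.push(span);
        }
        SpanSet { spans: merged }
    }
}
```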
Jun Wu
c6d9d5604b spanset: add a benchmark of set operations
Summary: This would provide some data about changes around SpanSet.

Reviewed By: sfilipco

Differential Revision: D15023652

fbshipit-source-id: 4cff7d1876fe20cd876f26926f31e018b6c88fd9
2019-04-25 17:05:10 -07:00
Jun Wu
a611cbb0db idmap: implement write operations
Summary:
Complete the IdMap interface so it's usable.

There are 2 possible use patterns:
- On-disk IdMap + In-memory additions. Practically, the server provides an
  on-disk map, and the client might assign missing commits on demand. The
  client still needs to update the IdMap during pull.
- Everything is on-disk. There are no in-memory additions. This is more complex
  because the local commits might become part of the server commits in the
  future, and it might require Ids for those commits to be re-assigned.

I haven't decided which way to go exactly. So let's keep the interface flexible
for both.

That said, I do want to reduce the chance of causing filesystem race conditions
for filesystem writes. In this case, both reads and writes should hold a lock.
So a dedicated type is used to encourage the pattern of:

  - get the dedicated type (and hold the filesystem lock)
  - read, write, sync

Write related methods are not moved to the dedicated type, to cover the
in-memory addition use-case.

Reviewed By: sfilipco

Differential Revision: D15008517

fbshipit-source-id: 5d117ed7f2947aed6ed524a3b5199c071908c4ae
2019-04-25 17:05:10 -07:00
Jun Wu
c10a9bc67e idmap: introduce IdMap
Summary:
There will be lots of algorithms or structures that operate on integers as
commit identities. The source of truth for commit identities is the commit
hashes. Add a map to be able to translate between them.

The map is designed to be sparse, so it can be used as a cache if the map
is moved to server-side.

The map does not take `[u8; 20]` as its value type, with the intention to
support other hash functions. For example, Bonsai Blake2 hashes have 32 bytes.

Since the integer id is in a global namespace and can conflict if there
are multiple writers, the interface is designed to make sure an explicit
critical section is needed for write (to filesystem) operations.

Reviewed By: sfilipco

Differential Revision: D15008518

fbshipit-source-id: 9f53aae551c54e1b47b5f837642ea00fca8579c3
2019-04-25 17:05:10 -07:00
Jun Wu
21c4700f0c spanset: implement iterator
Summary: The iterator iterates integers inside the SpanSet.

Reviewed By: sfilipco

Differential Revision: D15004983

fbshipit-source-id: 05b2de0d78f2640b8db2cbad69aa79d71b8d196d
2019-04-25 17:05:10 -07:00
Jun Wu
8a552db6da spanset: implement set operations
Summary: Add intersection, union, and difference operations.

Reviewed By: sfilipco

Differential Revision: D15004986

fbshipit-source-id: 8d1b5ca5ac5ba936afb136f5c5a4ddf1f8862161
2019-04-25 17:05:10 -07:00
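A set operation over span lists, such as the intersection added in 8a552db6da, can be done in one merge-style pass. The sketch assumes each input is a list of inclusive `(low, high)` pairs sorted from largest to smallest with no overlaps; the tuple representation and function name are hypothetical.

```rust
// Inclusive (low, high) pair.
type Span = (u64, u64);

// Intersect two span lists that are each sorted from largest to smallest
// and internally non-overlapping. The result has the same properties.
fn intersection(a: &[Span], b: &[Span]) -> Vec<Span> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let low = a[i].0.max(b[j].0);
        let high = a[i].1.min(b[j].1);
        if low <= high {
            out.push((low, high));
        }
        // The span with the larger lower bound cannot overlap any of the
        // remaining (smaller) spans on the other side, so advance past it.
        if a[i].0 >= b[j].0 {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}
```

The pass is linear in the total number of spans, independent of how many integers the spans cover.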
Jun Wu
0f8c084266 spanset: introduce SpanSet
Summary:
The spanset is a set of integer spans. It will be used by some DAG related
operations. It'll be used as a subset of mercurial/smartset.py.

Note: smartset.py also has a Python `spanset` structure. That is different
from this Rust spanset in these ways:

- The Rust set does not preserve ordering.
- The Rust set can have multiple spans, instead of just one.
- The Rust set is less abstract (for now). Its set operations (union, etc.)
  only work on the same type.

This diff adds some initial functions for it.

Reviewed By: sfilipco

Differential Revision: D15004985

fbshipit-source-id: c2e5e2a80e2e4681c2f443e0d8a83dc97f7be371
2019-04-25 17:05:09 -07:00
Jun Wu
26f1f12a28 dag: add a library
Summary: The scmdag library is going to have things related to the commit graph.

Reviewed By: sfilipco

Differential Revision: D15004984

fbshipit-source-id: f274cceeabae4a57985763216572f7cd055f8e07
2019-04-25 17:05:09 -07:00
Arun Kulshreshtha
f1c1cf95d6 edenapi: release GIL during data fetching
Summary: Release the GIL during data fetching to allow for progress bars to update properly. The data fetching code is pure Rust and does not interact with the Python interpreter at all, so releasing the GIL here is safe.

Differential Revision: D15051852

fbshipit-source-id: 144da953720951f9a30aadfc2b7fc8c8bc6b14aa
2019-04-24 13:33:58 -07:00
Xavier Deguillard
621d5f637c revisionstore: describe the serialization format
Summary: Reading a comment is easier than trying to figure out the on-disk format.

Reviewed By: kulshrax

Differential Revision: D15056859

fbshipit-source-id: 097ed8bcaa51369aba4bcc9ed1cc95ebd6a67a66
2019-04-24 10:58:55 -07:00
Xavier Deguillard
ffba172165 revisionstore: compress/decompress when needed
Summary:
Compressing/Decompressing data can be expensive, so avoid doing it when not
needed. I thought about using a RefCell but decided on just using a mutable
reference as an Entry will always be private to indexedlogdatastore.rs.

Reviewed By: kulshrax

Differential Revision: D15056862

fbshipit-source-id: ac0b811f2df563be86e3ade9abe89476db5d13cc
2019-04-24 10:58:55 -07:00
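The choice described in ffba172165, a mutable reference rather than a `RefCell`, can be sketched as an entry that decompresses on first access and caches the result. The `Entry` layout here is hypothetical, and the run-length codec is a toy stand-in for the real compression library.

```rust
// Toy codec standing in for a real one: input is a sequence of
// (count, byte) pairs.
fn rle_decompress(data: &[u8]) -> Vec<u8> {
    data.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

struct Entry {
    compressed: Vec<u8>,
    // Filled in on first access.
    content: Option<Vec<u8>>,
}

impl Entry {
    fn from_compressed(compressed: Vec<u8>) -> Self {
        Entry { compressed, content: None }
    }

    // Takes `&mut self` so the decompressed bytes can be cached without
    // interior mutability such as `RefCell`.
    fn content(&mut self) -> &[u8] {
        if self.content.is_none() {
            self.content = Some(rle_decompress(&self.compressed));
        }
        self.content.as_deref().unwrap()
    }

    fn is_decompressed(&self) -> bool {
        self.content.is_some()
    }
}
```

An entry that is only copied from one store to another never pays for decompression, since `content` is never called on it.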
Xavier Deguillard
5b2bdfb23d revisionstore: make indexedlogdatastore::Entry fields private
Summary: This will allow decompression to be done on the fly as opposed to always.

Reviewed By: kulshrax

Differential Revision: D15056860

fbshipit-source-id: 60635c431579fc924a61d08b35688222ec4930bb
2019-04-24 10:58:55 -07:00
Xavier Deguillard
dc612855be revisionstore: don't support delta chains in the indexedlog datastore
Summary:
Delta chains are only created during repack, as every download operation
fetches the full content of the file. Even if we wanted to support them,
interrupted chains add undesirable complexity, as they can lead to chain loops if
we're not careful. Let's just not support delta chains for now to avoid this.

Reviewed By: kulshrax

Differential Revision: D15056861

fbshipit-source-id: 4b0474ce134e946952a70f363190faf50850abe0
2019-04-24 10:58:55 -07:00
Xavier Deguillard
f4ce5d23b8 asyncrevisionstore: rename asynpacks to asyncrevisionstore
Summary: Now that IndexedLog stores are also in this crate, its name is no longer relevant.

Reviewed By: kulshrax

Differential Revision: D15056502

fbshipit-source-id: cb00c8322ac4ff7da97c8faaec2959e5f68ca4ca
2019-04-24 10:58:54 -07:00
Nick Terrell
ebdf5f1baf Update to zstd-1.4.x
Summary:
* Update to zstd-1.4.x
* Update to the latest zstd-rs

Reviewed By: Cyan4973

Differential Revision: D15040909

fbshipit-source-id: 938904d95ab8b1108d750d83602ee9c11c2c87b5
2019-04-23 22:41:55 -07:00
Arun Kulshreshtha
6745729cbd edenapi: make file validation configurable
Summary: Add a new config option to toggle file validation.

Differential Revision: D15034687

fbshipit-source-id: 3783ea1dacad9d1e494a5de1388f703db0ed1129
2019-04-22 14:46:29 -07:00
Stefan Filip
a802e610d1 revisionstore: rename Store to LocalStore
Summary:
I want to give Store a more specific name so that it doesn't get
confused with other Store abstractions that we will add in the
future.

Reviewed By: singhsrb

Differential Revision: D15007383

fbshipit-source-id: 499bcda4aecd5389e3bc1eba5206ba72a69c4c3d
2019-04-19 09:51:29 -07:00
Jun Wu
bbafce2167 indexedlog: expose range API on Log
Summary:
`Log::lookup_range` exposes the range query feature provided by `Index`.
The iterator is also made double-ended.

Reviewed By: sfilipco

Differential Revision: D14895477

fbshipit-source-id: 6aef0973e009bf8fc6f3b5e5a8f6c54e57c81360
2019-04-18 13:35:46 -07:00
Jun Wu
8c9dc8cf82 indexedlog: use RangeIter for prefix lookup
Summary:
The RangeIter is actually faster. The main reason is that it avoids recursion.
RangeIter does require double Vec, which seems like extra overhead. Practically
it does not seem to matter much.

The RangeIter code is also better written than PrefixIter. So let's delete
PrefixIter, and switch prefix lookups to use RangeIter.

Before:

  index prefix scan (2B)         89.788 ms
  index prefix scan (1B)         72.337 ms
  index prefix scan (2B, disk)  102.098 ms
  index prefix scan (1B, disk)   90.445 ms

After:

  index prefix scan (2B)         76.335 ms
  index prefix scan (1B)         54.517 ms
  index prefix scan (2B, disk)   91.798 ms
  index prefix scan (1B, disk)   67.143 ms

Reviewed By: sfilipco

Differential Revision: D14895478

fbshipit-source-id: 79a01774fb640c78fc5733db82f86f0f9403c960
2019-04-18 13:35:45 -07:00