Commit Graph

708 Commits

Author SHA1 Message Date
Jun Wu
c6d9d5604b spanset: add a benchmark of set operations
Summary: This would provide some data about changes around SpanSet.

Reviewed By: sfilipco

Differential Revision: D15023652

fbshipit-source-id: 4cff7d1876fe20cd876f26926f31e018b6c88fd9
2019-04-25 17:05:10 -07:00
Jun Wu
a611cbb0db idmap: implement write operations
Summary:
Complete the IdMap interface so it's usable.

There are 2 possible use patterns:
- On-disk IdMap + In-memory additions. Practically, the server provides an
  on-disk map, and the client might assign missing commits on demand. The
  client still needs to update the IdMap during pull.
- Everything is on-disk. There are no in-memory additions. This is more complex
  because the local commits might become part of the server commits in the
  future, and it might require Ids for those commits to be re-assigned.

I haven't decided which way to go exactly. So let's keep the interface flexible
for both.

That said, I do want to reduce the chance of filesystem race conditions
during writes. In this case, both reads and writes should hold a lock.
So a dedicated type is used to encourage the pattern of:

  - get the dedicated type (and hold the filesystem lock)
  - read, write, sync

Write-related methods are not moved to the dedicated type, to cover the
in-memory addition use case.
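
A minimal sketch of the "get the dedicated type, then read, write, sync" pattern in std-only Rust. It is not the real IdMap API: `IdMap`, `SyncableIdMap`, `prepare_filesystem_sync`, `pending_len`, and `sync` are hypothetical names, a process-wide `Mutex<()>` stands in for the filesystem lock, and "on disk" is just a second in-memory `HashMap`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

// Hypothetical stand-in for a filesystem lock. A real implementation would
// lock a file so that other processes are excluded as well.
static FS_LOCK: Mutex<()> = Mutex::new(());

#[derive(Default)]
pub struct IdMap {
    on_disk: HashMap<Vec<u8>, u64>, // pretend-persisted entries
    pending: HashMap<Vec<u8>, u64>, // in-memory additions
}

impl IdMap {
    // Writes stay available here, without the lock, for in-memory additions.
    pub fn insert(&mut self, hash: &[u8], id: u64) {
        self.pending.insert(hash.to_vec(), id);
    }

    pub fn find_id(&self, hash: &[u8]) -> Option<u64> {
        self.pending.get(hash).or_else(|| self.on_disk.get(hash)).copied()
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }

    // Step 1: get the dedicated type, which holds the lock for its lifetime.
    pub fn prepare_filesystem_sync(&mut self) -> SyncableIdMap<'_> {
        // Recover the guard even if a previous holder panicked.
        let lock = FS_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        SyncableIdMap { map: self, _lock: lock }
    }
}

pub struct SyncableIdMap<'a> {
    map: &'a mut IdMap,
    _lock: MutexGuard<'static, ()>, // released when this value is dropped
}

impl SyncableIdMap<'_> {
    // Step 2: read and write while the lock is held.
    pub fn insert(&mut self, hash: &[u8], id: u64) {
        self.map.insert(hash, id);
    }

    pub fn find_id(&self, hash: &[u8]) -> Option<u64> {
        self.map.find_id(hash)
    }

    // Step 3: "sync" by moving pending entries into the on-disk stand-in.
    pub fn sync(&mut self) {
        let pending = std::mem::take(&mut self.map.pending);
        self.map.on_disk.extend(pending);
    }
}
```

Keeping `insert` on `IdMap` itself leaves the in-memory addition path lock-free, while anything that persists has to go through the guard-holding type.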

Reviewed By: sfilipco

Differential Revision: D15008517

fbshipit-source-id: 5d117ed7f2947aed6ed524a3b5199c071908c4ae
2019-04-25 17:05:10 -07:00
Jun Wu
c10a9bc67e idmap: introduce IdMap
Summary:
There will be lots of algorithms or structures that operate on integers as
commit identities. The commit hashes are the source of truth for commit
identities, so add a map to translate between the two.

The map is designed to be sparse, so it can be used as a cache if the map
is moved to server-side.

The map does not use `[u8; 20]` as its value type, so that other hash
functions can be supported. For example, Bonsai Blake2 hashes are 32 bytes.

Since the integer id lives in a global namespace, ids can conflict if there
are multiple writers. The interface is therefore designed to require an
explicit critical section for write (to filesystem) operations.
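
As an illustration of the translation layer described above, here is a toy in-memory version with hypothetical method names (`insert`, `find_id`, `find_hash`). Storing hashes as `Vec<u8>` rather than `[u8; 20]` lets 20-byte and 32-byte hashes coexist, and a missing entry simply returns `None`, which is what lets a sparse map act as a cache.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical toy IdMap. Hashes are Vec<u8> rather than [u8; 20] so that
// 20-byte (SHA-1) and 32-byte (e.g. Blake2) hashes both fit.
#[derive(Default)]
pub struct IdMap {
    id_to_hash: BTreeMap<u64, Vec<u8>>,
    hash_to_id: HashMap<Vec<u8>, u64>,
}

impl IdMap {
    // Refuses to re-map an id or a hash to something else.
    pub fn insert(&mut self, id: u64, hash: &[u8]) -> Result<(), String> {
        if let Some(existing) = self.id_to_hash.get(&id) {
            return if existing.as_slice() == hash {
                Ok(())
            } else {
                Err(format!("id {} is already assigned to another hash", id))
            };
        }
        if self.hash_to_id.contains_key(hash) {
            return Err("hash is already assigned to another id".to_string());
        }
        self.id_to_hash.insert(id, hash.to_vec());
        self.hash_to_id.insert(hash.to_vec(), id);
        Ok(())
    }

    // None means "not known here", e.g. a cache miss in a sparse map.
    pub fn find_hash(&self, id: u64) -> Option<&[u8]> {
        self.id_to_hash.get(&id).map(|h| h.as_slice())
    }

    pub fn find_id(&self, hash: &[u8]) -> Option<u64> {
        self.hash_to_id.get(hash).copied()
    }
}
```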

Reviewed By: sfilipco

Differential Revision: D15008518

fbshipit-source-id: 9f53aae551c54e1b47b5f837642ea00fca8579c3
2019-04-25 17:05:10 -07:00
Jun Wu
21c4700f0c spanset: implement iterator
Summary: The iterator iterates integers inside the SpanSet.
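
A minimal sketch of such an iterator, assuming the set is stored as sorted, non-overlapping inclusive `(start, end)` spans. The struct and method names are hypothetical, and this sketch yields values in ascending order, which the log does not specify for the real type.

```rust
// Hypothetical toy SpanSet: sorted, non-overlapping inclusive (start, end) spans.
pub struct SpanSet {
    spans: Vec<(u64, u64)>,
}

impl SpanSet {
    // Caller is assumed to pass sorted, non-overlapping spans with start <= end.
    pub fn from_sorted_spans(spans: Vec<(u64, u64)>) -> Self {
        SpanSet { spans }
    }

    // Yields every integer in the set, span by span, in ascending order.
    pub fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        self.spans.iter().flat_map(|&(start, end)| start..=end)
    }
}
```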

Reviewed By: sfilipco

Differential Revision: D15004983

fbshipit-source-id: 05b2de0d78f2640b8db2cbad69aa79d71b8d196d
2019-04-25 17:05:10 -07:00
Jun Wu
8a552db6da spanset: implement set operations
Summary: Add intersection, union, and difference operations.
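
The three operations can be sketched over normalized span lists (sorted, non-overlapping, non-adjacent inclusive spans). This is an illustrative std-only version with hypothetical names, not the code from this diff: union re-normalizes the concatenated spans, intersection is a two-pointer sweep, and difference clips each span against the spans being removed.

```rust
// Hypothetical toy SpanSet with normalized inclusive spans (start <= end assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct SpanSet {
    spans: Vec<(u64, u64)>,
}

impl SpanSet {
    // Sorts spans and merges overlapping or adjacent ones.
    pub fn from_spans(mut spans: Vec<(u64, u64)>) -> Self {
        spans.sort();
        let mut merged: Vec<(u64, u64)> = Vec::new();
        for (start, end) in spans {
            match merged.last_mut() {
                // saturating_add avoids overflow when last.1 == u64::MAX.
                Some(last) if start <= last.1.saturating_add(1) => last.1 = last.1.max(end),
                _ => merged.push((start, end)),
            }
        }
        SpanSet { spans: merged }
    }

    pub fn union(&self, other: &SpanSet) -> SpanSet {
        let mut all = self.spans.clone();
        all.extend_from_slice(&other.spans);
        SpanSet::from_spans(all)
    }

    pub fn intersection(&self, other: &SpanSet) -> SpanSet {
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < self.spans.len() && j < other.spans.len() {
            let (a0, a1) = self.spans[i];
            let (b0, b1) = other.spans[j];
            let (lo, hi) = (a0.max(b0), a1.min(b1));
            if lo <= hi {
                out.push((lo, hi));
            }
            // Advance whichever span ends first.
            if a1 < b1 { i += 1 } else { j += 1 }
        }
        SpanSet { spans: out }
    }

    pub fn difference(&self, other: &SpanSet) -> SpanSet {
        let mut out = Vec::new();
        for &(start, end) in &self.spans {
            let mut cur = start; // first value not yet emitted or removed
            let mut done = false;
            for &(b0, b1) in &other.spans {
                if b1 < cur || b0 > end {
                    continue; // no overlap with the remaining part
                }
                if b0 > cur {
                    out.push((cur, b0 - 1));
                }
                if b1 >= end {
                    done = true; // the rest of this span is removed
                    break;
                }
                cur = b1 + 1;
            }
            if !done {
                out.push((cur, end));
            }
        }
        SpanSet { spans: out }
    }
}
```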

Reviewed By: sfilipco

Differential Revision: D15004986

fbshipit-source-id: 8d1b5ca5ac5ba936afb136f5c5a4ddf1f8862161
2019-04-25 17:05:10 -07:00
Jun Wu
0f8c084266 spanset: introduce SpanSet
Summary:
The spanset is a set of integer spans. It will be used by some DAG related
operations. It'll be used as a subset of mercurial/smartset.py.

Note: smartset.py also has a Python `spanset` structure. That is different
from this Rust spanset in these ways:

- The Rust set does not preserve ordering.
- The Rust set can have multiple spans, instead of just one.
- The Rust set is less abstract (for now). Its set operations (union, etc.)
  only work on the same type.

This diff adds some initial functions for it.
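
To make the "multiple spans" point concrete, a toy SpanSet can store disjoint inclusive spans sorted by start and answer membership with a binary search. The names and the ascending storage order are assumptions for illustration only.

```rust
// Hypothetical toy SpanSet: disjoint inclusive spans sorted by start. Unlike
// the single range in smartset.py's spanset, it may hold any number of spans.
pub struct SpanSet {
    spans: Vec<(u64, u64)>,
}

impl SpanSet {
    // Caller must pass spans sorted by start, non-overlapping, with start <= end.
    pub fn from_sorted_spans(spans: Vec<(u64, u64)>) -> Self {
        debug_assert!(spans.iter().all(|&(start, end)| start <= end));
        debug_assert!(spans.windows(2).all(|w| w[0].1 < w[1].0));
        SpanSet { spans }
    }

    // Total number of integers in the set.
    pub fn count(&self) -> u64 {
        self.spans.iter().map(|&(start, end)| end - start + 1).sum()
    }

    // Binary search: the only candidate is the last span with start <= value.
    pub fn contains(&self, value: u64) -> bool {
        let idx = self.spans.partition_point(|&(start, _)| start <= value);
        idx > 0 && value <= self.spans[idx - 1].1
    }
}
```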

Reviewed By: sfilipco

Differential Revision: D15004985

fbshipit-source-id: c2e5e2a80e2e4681c2f443e0d8a83dc97f7be371
2019-04-25 17:05:09 -07:00
Jun Wu
26f1f12a28 dag: add a library
Summary: The scmdag library is going to have things related to the commit graph.

Reviewed By: sfilipco

Differential Revision: D15004984

fbshipit-source-id: f274cceeabae4a57985763216572f7cd055f8e07
2019-04-25 17:05:09 -07:00
Arun Kulshreshtha
f1c1cf95d6 edenapi: release GIL during data fetching
Summary: Release the GIL during data fetching to allow for progress bars to update properly. The data fetching code is pure Rust and does not interact with the Python interpreter at all, so releasing the GIL here is safe.

Differential Revision: D15051852

fbshipit-source-id: 144da953720951f9a30aadfc2b7fc8c8bc6b14aa
2019-04-24 13:33:58 -07:00
Xavier Deguillard
621d5f637c revisionstore: describe the serialization format
Summary: Reading a comment is easier than trying to figure out the on-disk format.

Reviewed By: kulshrax

Differential Revision: D15056859

fbshipit-source-id: 097ed8bcaa51369aba4bcc9ed1cc95ebd6a67a66
2019-04-24 10:58:55 -07:00
Xavier Deguillard
ffba172165 revisionstore: compress/decompress when needed
Summary:
Compressing/decompressing data can be expensive, so avoid doing it when not
needed. I thought about using a RefCell but decided on just using a mutable
reference, as an Entry will always be private to indexedlogdatastore.rs.
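
A sketch of the lazy-decompression idea, assuming a hypothetical `Entry` with a two-state `Content` enum. The codec is a trivial run-length decoder used only as a placeholder for whatever compression the real store uses. The point is the `&mut self` accessor: it replaces the compressed bytes with the decompressed ones the first time they are requested, with no RefCell involved.

```rust
// Hypothetical entry layout: content starts compressed and is decompressed
// at most once, on first access.
enum Content {
    Compressed(Vec<u8>),
    Decompressed(Vec<u8>),
}

pub struct Entry {
    content: Content,
}

// Placeholder codec (NOT the real one): input is (count, byte) pairs.
fn decompress(data: &[u8]) -> Vec<u8> {
    data.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

impl Entry {
    pub fn from_compressed(data: Vec<u8>) -> Self {
        Entry { content: Content::Compressed(data) }
    }

    // `&mut self` instead of a RefCell: Entry stays private to its module,
    // so callers can be given mutable access and the result cached in place.
    pub fn content(&mut self) -> &[u8] {
        if let Content::Compressed(data) = &self.content {
            let decompressed = decompress(data);
            self.content = Content::Decompressed(decompressed);
        }
        match &self.content {
            Content::Decompressed(data) => data.as_slice(),
            Content::Compressed(_) => unreachable!("decompressed above"),
        }
    }

    pub fn is_decompressed(&self) -> bool {
        matches!(self.content, Content::Decompressed(_))
    }
}
```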

Reviewed By: kulshrax

Differential Revision: D15056862

fbshipit-source-id: ac0b811f2df563be86e3ade9abe89476db5d13cc
2019-04-24 10:58:55 -07:00
Xavier Deguillard
5b2bdfb23d revisionstore: make indexedlogdatastore::Entry fields private
Summary: This will allow decompression to be done on the fly as opposed to always.

Reviewed By: kulshrax

Differential Revision: D15056860

fbshipit-source-id: 60635c431579fc924a61d08b35688222ec4930bb
2019-04-24 10:58:55 -07:00
Xavier Deguillard
dc612855be revisionstore: don't support delta chains in the indexedlog datastore
Summary:
Delta chains are only created during repack, as every download operation
fetches the full content of the file. Even if we wanted to support them,
interrupted chains add undesirable complexity, as they can lead to chain loops if
we're not careful. Let's just not support delta chains for now to avoid this.

Reviewed By: kulshrax

Differential Revision: D15056861

fbshipit-source-id: 4b0474ce134e946952a70f363190faf50850abe0
2019-04-24 10:58:55 -07:00
Xavier Deguillard
f4ce5d23b8 asyncrevisionstore: rename asyncpacks to asyncrevisionstore
Summary: Now that IndexedLog stores are also in this crate, the old name no longer fits.

Reviewed By: kulshrax

Differential Revision: D15056502

fbshipit-source-id: cb00c8322ac4ff7da97c8faaec2959e5f68ca4ca
2019-04-24 10:58:54 -07:00
Nick Terrell
ebdf5f1baf Update to zstd-1.4.x
Summary:
* Update to zstd-1.4.x
* Update to the latest zstd-rs

Reviewed By: Cyan4973

Differential Revision: D15040909

fbshipit-source-id: 938904d95ab8b1108d750d83602ee9c11c2c87b5
2019-04-23 22:41:55 -07:00
Arun Kulshreshtha
6745729cbd edenapi: make file validation configurable
Summary: Add a new config option to toggle file validation.

Differential Revision: D15034687

fbshipit-source-id: 3783ea1dacad9d1e494a5de1388f703db0ed1129
2019-04-22 14:46:29 -07:00
Stefan Filip
a802e610d1 revisionstore: rename Store to LocalStore
Summary:
I want to give Store a more specific name so that it doesn't get
confused with other Store abstractions that we will add in the
future.

Reviewed By: singhsrb

Differential Revision: D15007383

fbshipit-source-id: 499bcda4aecd5389e3bc1eba5206ba72a69c4c3d
2019-04-19 09:51:29 -07:00
Jun Wu
bbafce2167 indexedlog: expose range API on Log
Summary:
`Log::lookup_range` exposes the range query feature provided by `Index`.
Along the way, the iterator is made double-ended.

Reviewed By: sfilipco

Differential Revision: D14895477

fbshipit-source-id: 6aef0973e009bf8fc6f3b5e5a8f6c54e57c81360
2019-04-18 13:35:46 -07:00
Jun Wu
8c9dc8cf82 indexedlog: use RangeIter for prefix lookup
Summary:
The RangeIter is actually faster. The main reason is that it avoids recursion.
RangeIter does require two Vecs, which looks like extra overhead. In practice
it does not seem to matter much.

The RangeIter code is also better written than PrefixIter. So let's delete
PrefixIter, and switch prefix lookups to use RangeIter.

Before:

  index prefix scan (2B)         89.788 ms
  index prefix scan (1B)         72.337 ms
  index prefix scan (2B, disk)  102.098 ms
  index prefix scan (1B, disk)   90.445 ms

After:

  index prefix scan (2B)         76.335 ms
  index prefix scan (1B)         54.517 ms
  index prefix scan (2B, disk)   91.798 ms
  index prefix scan (1B, disk)   67.143 ms
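
The observation that a prefix lookup is just a range lookup can be shown with std's `BTreeMap::range`, which stands in here for the on-disk radix-tree Index. The helper names are hypothetical. Keys starting with `prefix` are exactly those in `[prefix, upper)`, where `upper` is the prefix with trailing `0xff` bytes dropped and the last remaining byte incremented (unbounded if no such byte exists). The returned iterator is double-ended, like the one `Log::lookup_range` exposes.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Smallest key greater than every key that starts with `prefix`,
// or None if there is none (empty prefix, or all bytes are 0xff).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut upper = prefix.to_vec();
    while let Some(last) = upper.pop() {
        if last < 0xff {
            upper.push(last + 1);
            return Some(upper);
        }
    }
    None
}

// Hypothetical helper: a prefix scan expressed as a range scan.
pub fn scan_prefix<'a, V>(
    map: &'a BTreeMap<Vec<u8>, V>,
    prefix: &[u8],
) -> impl DoubleEndedIterator<Item = (&'a Vec<u8>, &'a V)> + 'a {
    let start = Bound::Included(prefix.to_vec());
    let end = match prefix_upper_bound(prefix) {
        Some(upper) => Bound::Excluded(upper),
        None => Bound::Unbounded,
    };
    map.range((start, end))
}
```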

Reviewed By: sfilipco

Differential Revision: D14895478

fbshipit-source-id: 79a01774fb640c78fc5733db82f86f0f9403c960
2019-04-18 13:35:45 -07:00
Jun Wu
ef89e13207 indexedlog: add benchmark about scan_prefix
Summary:
This would provide data about scan_prefix performance.

The benchmark code is slightly changed to share the index across test cases.
That reduces test setup cost.

Reviewed By: sfilipco

Differential Revision: D14895481

fbshipit-source-id: e70098bd202e102822a0829c0ae28de8d49fbe85
2019-04-18 13:35:45 -07:00
Jun Wu
7b6ae70f2d indexedlog: implement range scan on Index
Summary:
This API allows range query, similar to `BTreeMap::range`.

It's going to be used by segmented changelog. There are spans (start, end)
stored in the index, and we need to find spans by rev (start <= rev <= end).
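
A sketch of that span lookup, using `BTreeMap::range` as the analogous std API. It assumes spans are disjoint and keyed by their start; `SpanIndex` and `find_span` are hypothetical names. The containing span, if any, is the last one whose start is `<= rev`, so one reverse step over `range(..=rev)` finds it.

```rust
use std::collections::BTreeMap;

// Hypothetical toy span index: span start -> span end (inclusive), disjoint spans.
pub struct SpanIndex {
    spans: BTreeMap<u64, u64>,
}

impl SpanIndex {
    pub fn new() -> Self {
        SpanIndex { spans: BTreeMap::new() }
    }

    pub fn insert(&mut self, start: u64, end: u64) {
        assert!(start <= end, "span must not be empty");
        self.spans.insert(start, end);
    }

    // Finds the span with start <= rev <= end, if any.
    pub fn find_span(&self, rev: u64) -> Option<(u64, u64)> {
        let (&start, &end) = self.spans.range(..=rev).next_back()?;
        if rev <= end { Some((start, end)) } else { None }
    }
}
```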

Initially I was changing PrefixIter incrementally towards the new RangeIter.
That approach produced too many small commits, and I got some useful feedback
early on. Now it seems cleaner to just introduce the desired state of RangeIter
first. We can later migrate prefix lookup to RangeIter if the performance
regression is negligible.

The added code is long, but some of it is modified from existing code:
- `RangeIter::next_internal` is modified from `PrefixIter::next`.
- `Index::get_stack_by_bound` is modified from `Index::scan_prefix_base16`.

The tests helped find some issues in the code. I hope they're not too weak.

Reviewed By: sfilipco

Differential Revision: D14895479

fbshipit-source-id: fb8f1bd35c61187fe5f7764fa485206bbb13c8e0
2019-04-18 13:35:45 -07:00
Arun Kulshreshtha
e80dd37d5f edenapi: use DataEntry
Summary: Update the Eden API client and server to use the new DataEntry type for file content downloads.

Reviewed By: quark-zju

Differential Revision: D14907958

fbshipit-source-id: 8a7b1cbb54bdc119dda11179ff94d3efdb7e85c9
2019-04-16 22:13:41 -07:00
Stefan Filip
dd6dd0f998 types: remove deprecated Key.name() and Key.set_name()
Summary: Remove these functions in favor of using Key.path.

Reviewed By: quark-zju

Differential Revision: D14945331

fbshipit-source-id: 6b6bb70375629edf37b2b04a86545f18e15b33b4
2019-04-16 15:34:31 -07:00
Stefan Filip
4cdbb6b703 types: remove Key::from_name_slice
Summary: No longer used

Differential Revision: D14945333

fbshipit-source-id: 7038501e8a78061ac6e83d89b8e4f16d1c4c95de
2019-04-16 15:34:31 -07:00
Stefan Filip
3b46c887e8 asyncpacks: migrate to Key::new from Key::from_name_slice
Summary: migration

Differential Revision: D14945332

fbshipit-source-id: 0906eb54f205a865d4fc5f3599469a6851b5c6ca
2019-04-16 15:34:30 -07:00
Stefan Filip
4b0e94305f revisionstore: use RepoPath in DataEntry
Summary: migration

Differential Revision: D14945337

fbshipit-source-id: 96247d27bc9e829a1ebb73c5617a399e149ac69b
2019-04-16 15:34:30 -07:00
Xavier Deguillard
67a4b6af52 revisionstore: partial chains can be returned from get_delta_chain
Summary:
In the case where a delta chain is split between several logs, it's possible
that part of it may be removed due to some logs being removed. Instead of
treating this as an error, we can simply return the partial chain; the union
content store will then continue the delta chain on the next store.
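
A toy model of this behavior, with hypothetical types: each store follows base links only while they stay inside that store and returns whatever partial chain it has, and the union function resumes from the last delta's base in the other stores. It assumes chains are acyclic (there is no loop detection) and uses string keys instead of real node hashes.

```rust
use std::collections::HashMap;

// Hypothetical delta: `base == None` means a full text (end of chain).
#[derive(Clone, Debug, PartialEq)]
pub struct Delta {
    pub key: String,
    pub base: Option<String>,
}

pub struct Store {
    deltas: HashMap<String, Delta>,
}

impl Store {
    pub fn new(deltas: Vec<Delta>) -> Self {
        Store { deltas: deltas.into_iter().map(|d| (d.key.clone(), d)).collect() }
    }

    // Follows base links while they stay in this store. A chain that stops
    // early is returned as-is (partial) rather than as an error.
    pub fn get_delta_chain(&self, key: &str) -> Vec<Delta> {
        let mut chain = Vec::new();
        let mut next = Some(key.to_string());
        while let Some(k) = next {
            match self.deltas.get(&k) {
                Some(delta) => {
                    next = delta.base.clone();
                    chain.push(delta.clone());
                }
                None => break,
            }
        }
        chain
    }
}

// Union store: keep asking stores until the chain reaches a full text.
pub fn union_get_delta_chain(stores: &[Store], key: &str) -> Option<Vec<Delta>> {
    let mut chain: Vec<Delta> = Vec::new();
    let mut wanted = key.to_string();
    loop {
        let mut progressed = false;
        for store in stores {
            let part = store.get_delta_chain(&wanted);
            if part.is_empty() {
                continue;
            }
            progressed = true;
            let base = part.last().and_then(|d| d.base.clone());
            chain.extend(part);
            match base {
                None => return Some(chain), // reached a full text: complete
                Some(b) => wanted = b,      // partial chain: look for its base
            }
        }
        if !progressed {
            return None; // no store has the next link: the chain is broken
        }
    }
}
```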

Reviewed By: quark-zju

Differential Revision: D14899943

fbshipit-source-id: 7369ee191dc4b35873344cd13c295c72472e0712
2019-04-16 10:47:09 -07:00
Xavier Deguillard
c358df34a8 revisionstore: a not found key should fail with KeyError
Summary:
The Python code interprets a KeyError as a lookup failure and will retry the
lookup on the next store. Any other Rust error is translated into a
RuntimeError exception, which Python re-raises, stopping the lookup.
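
The same split can be expressed on the Rust side with an error enum, sketched here with hypothetical names (`StoreError`, `Store`, `MapStore`, `BrokenStore`, `union_get`): `NotFound` plays the role of KeyError and falls through to the next store, while any other error aborts the lookup immediately.

```rust
use std::collections::HashMap;

// Hypothetical error type: only NotFound lets the lookup fall through.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound(String), // plays the role of Python's KeyError
    Other(String),    // anything else: surfaces as RuntimeError, stops lookup
}

pub trait Store {
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError>;
}

// Toy store backed by a HashMap.
pub struct MapStore(pub HashMap<String, Vec<u8>>);

impl Store for MapStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError> {
        self.0
            .get(key)
            .cloned()
            .ok_or_else(|| StoreError::NotFound(key.to_string()))
    }
}

// Toy store that always fails with a non-lookup error.
pub struct BrokenStore;

impl Store for BrokenStore {
    fn get(&self, _key: &str) -> Result<Vec<u8>, StoreError> {
        Err(StoreError::Other("corrupt data".to_string()))
    }
}

pub fn union_get(stores: &[&dyn Store], key: &str) -> Result<Vec<u8>, StoreError> {
    for store in stores {
        match store.get(key) {
            Ok(data) => return Ok(data),
            Err(StoreError::NotFound(_)) => continue, // try the next store
            Err(other) => return Err(other),          // stop the whole lookup
        }
    }
    Err(StoreError::NotFound(key.to_string()))
}
```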

Reviewed By: quark-zju

Differential Revision: D14895905

fbshipit-source-id: d22733c0a68ff3f28d502eb2cd4c3a0467ee35d1
2019-04-16 10:47:09 -07:00
Arun Kulshreshtha
68e1c44335 types: add DataEntry type
Summary: Add a new type representing a file content entry on the wire. This type is serializable and includes file content itself as well as the filenode's parents, which collectively allow for filenode hash verification.
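
A sketch of why content plus parents is enough for verification. It assumes Mercurial's filenode hash scheme, SHA-1 over the sorted parents followed by the revision text; since std has no SHA-1, the hash function is passed in by the caller. The `DataEntry` layout here is hypothetical and much simpler than the real type.

```rust
// Hypothetical, simplified wire entry: content plus the filenode's parents.
pub struct DataEntry {
    pub node: [u8; 20],
    pub parents: ([u8; 20], [u8; 20]),
    pub data: Vec<u8>,
}

impl DataEntry {
    // Bytes fed to SHA-1, assuming Mercurial's scheme:
    // min(p1, p2) ++ max(p1, p2) ++ revision text.
    pub fn hash_input(&self) -> Vec<u8> {
        let (p1, p2) = self.parents;
        let (lo, hi) = if p1 <= p2 { (p1, p2) } else { (p2, p1) };
        let mut input = Vec::with_capacity(40 + self.data.len());
        input.extend_from_slice(&lo);
        input.extend_from_slice(&hi);
        input.extend_from_slice(&self.data);
        input
    }

    // std has no SHA-1, so the caller supplies the hash function.
    pub fn verify(&self, sha1: impl Fn(&[u8]) -> [u8; 20]) -> bool {
        let input = self.hash_input();
        sha1(&input[..]) == self.node
    }
}
```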

Differential Revision: D14907957

fbshipit-source-id: ed0f85270c98bd5675da8553ffbfa0549b574b7f
2019-04-15 17:32:44 -07:00
Arun Kulshreshtha
382a42d958 edenapi: switch to CBOR for data serialization
Summary:
Previously, the Eden API endpoints on the API server used JSON for encoding requests/responses for debugging purposes. Given that these APIs are mostly used to transfer large amounts of binary data, we should use a binary encoding scheme in production. CBOR fits the bill since it is essentially binary JSON, allowing for more efficient encoding of large byte strings.

Although this is a breaking API change, given that nothing depends on these endpoints yet, it should be OK to simply change the format.

Differential Revision: D14907978

fbshipit-source-id: e0aea30d7304f4b727e2ad7fe23379457b6c3e26
2019-04-15 12:32:25 -07:00
Arun Kulshreshtha
852626bbdb types: move quickcheck macro imports into tests module
Summary: Previously, this import would trigger dead code warnings because it would be conditionally included with either `cfg(test)` or the `for-tests` feature, whereas the tests module (which was the only user of the import) would only be conditionally included with `cfg(test)`. Fix the warning by moving the import into the tests module itself.

Reviewed By: quark-zju

Differential Revision: D14936954

fbshipit-source-id: ef7a84e8d36645624077283a0fb7798a1746f579
2019-04-15 12:14:25 -07:00
Stefan Filip
d45e21573a revisionstore: use RepoPath in HistoryPackIterator
Summary: migration

Reviewed By: quark-zju

Differential Revision: D14908310

fbshipit-source-id: 76623300c04bd8643796a99f66d9d3144787f072
2019-04-15 10:01:52 -07:00
Stefan Filip
be522e8dc5 revisionstore: remove uses of Key::from_name_slice
Summary: migration

Reviewed By: quark-zju

Differential Revision: D14908315

fbshipit-source-id: 5d7d11982b70d10b49bb7fcd12cc6bf9c98146d6
2019-04-15 10:01:52 -07:00
Stefan Filip
77cdaca742 types: remove uses of Key::from_name_slice
Summary: migrating

Reviewed By: quark-zju

Differential Revision: D14908314

fbshipit-source-id: 92d9092bd879858349ab3b8cb98a484451c0442b
2019-04-15 10:01:52 -07:00
Stefan Filip
ee7703e821 revisionstore: use RepoPath in HistoryEntry
Summary: migrating

Reviewed By: quark-zju

Differential Revision: D14884957

fbshipit-source-id: 865f970627c08a26d1336fa57235f8ebbdb1d4a9
2019-04-15 10:01:51 -07:00
Stefan Filip
dd4a010aac revisionstore: use RepoPath in HistoryPack
Summary: Migrating

Reviewed By: quark-zju

Differential Revision: D14884958

fbshipit-source-id: 34bf2ea726b19f9929652d9836a224baac8b328b
2019-04-15 10:01:51 -07:00
Stefan Filip
a40d9db09d types: update history entry to use RepoPath
Summary: Refactoring

Reviewed By: quark-zju

Differential Revision: D14877540

fbshipit-source-id: c275c335ffe89ebf2fa1229925b1db2015374659
2019-04-15 10:01:51 -07:00
Stefan Filip
7e2b3c256f types: rename Key::new to Key::from_name_slice
Summary:
We should update the builder for Key to take a repo path. We could build
the key directly using the default struct constructor, but representing
the two constructors as functions is clearer.

Reviewed By: quark-zju

Differential Revision: D14877543

fbshipit-source-id: 328906521cdbad535e28df22fea82f21e8b5410a
2019-04-14 19:56:50 -07:00
Stefan Filip
e4fc87ac37 types: deprecate Key::name
Summary:
Marking the uses of byte arrays for repository paths as deprecated
to make it easier to remove uses in the code.

Reviewed By: quark-zju

Differential Revision: D14877541

fbshipit-source-id: 4c06e0f7012a33cc92752530618396c3c529f986
2019-04-14 19:56:50 -07:00
Stefan Filip
4d59694b10 types: change the underlying type for Key::path to RepoPath
Summary:
It is fairly difficult to avoid an intermediate state in which some code paths
can panic. Since we don't really deal with invalid paths, this intermediate
state is not a real concern.

Reviewed By: quark-zju

Differential Revision: D14877553

fbshipit-source-id: 6f60f20af8d8f1e3ff23c5d8ab5353bc8d919ebf
2019-04-14 19:56:49 -07:00
Stefan Filip
b291cdb244 types: optimize PathComponentBuf::arbitrary
Summary:
The overhead of generating all the different strings is noticeable when
we start to generate a lot of values.

Reviewed By: quark-zju

Differential Revision: D14877547

fbshipit-source-id: 8a91241ff3e86b6ac9b68197c449ed2be445f941
2019-04-14 19:56:49 -07:00
Stefan Filip
967cd9c01b revisionstore: use testutil in indexedlogdatastore
Summary: testutil everywhere

Reviewed By: quark-zju

Differential Revision: D14884959

fbshipit-source-id: 7f999179866e4d71f0e89bd00df168e5932818f2
2019-04-14 19:56:49 -07:00
Stefan Filip
885c477d3b revisionstore: update mutabledatapack to use testutil
Summary: testutil everywhere

Reviewed By: quark-zju

Differential Revision: D14877550

fbshipit-source-id: 3aa7a345adaac3444ce73ae6c20326bbcef9e873
2019-04-14 19:56:48 -07:00
Stefan Filip
91245a8749 revisionstore: use testutil in datapack
Summary: testutil everywhere

Reviewed By: quark-zju

Differential Revision: D14877542

fbshipit-source-id: f4bd4bf97206d2a2f5deb4d28d22f9dd7bec5a72
2019-04-14 19:56:48 -07:00
Stefan Filip
792a366a91 types: add RepoPathBuf::from_utf8(Vec<u8>)
Summary: This builder is the mirror of `String::from_utf8` in the RepoPathBuf type.
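
A minimal sketch of the mirroring, assuming a simplified `RepoPathBuf` that wraps a `String` and omits any path validation the real type may perform; the error type is also an assumption. Taking `Vec<u8>` by value means valid input is reused without copying, as `String::from_utf8` does, and invalid input hands the original bytes back through `FromUtf8Error::into_bytes`.

```rust
use std::string::FromUtf8Error;

// Hypothetical, simplified RepoPathBuf: an owned UTF-8 repository path.
#[derive(Debug, PartialEq)]
pub struct RepoPathBuf(String);

impl RepoPathBuf {
    // Mirrors `String::from_utf8`: takes ownership of the bytes and keeps
    // the same allocation when they are valid UTF-8.
    pub fn from_utf8(bytes: Vec<u8>) -> Result<Self, FromUtf8Error> {
        String::from_utf8(bytes).map(RepoPathBuf)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```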

Reviewed By: quark-zju

Differential Revision: D14877551

fbshipit-source-id: de79c36b0f5d638aad12428f7e5ee1bbe19c4bc6
2019-04-14 19:56:48 -07:00
Stefan Filip
1794ce765d types: make Key::name a private member
Summary: This is preparation for changing the backing storage from Vec<u8> to RepoPath.

Reviewed By: quark-zju

Differential Revision: D14877544

fbshipit-source-id: a7f3c805fcc9bc96a4135b2e36e73e2662cec54e
2019-04-14 19:56:48 -07:00
Stefan Filip
bec1a1d3e0 types: add mocks for RepoPathBuf
Summary: This will be used in transitioning key from bytes to repo path.

Reviewed By: quark-zju

Differential Revision: D14877548

fbshipit-source-id: f79863469d58b53557e51ebb033cc6a6b5f43499
2019-04-14 19:56:47 -07:00
Stefan Filip
cca46940f6 types: mark RepoPathBuf serializable
Summary:
There is nothing special about RepoPathBuf that prevents us from marking it Serializable.
It will be used by the API server to send data over the wire.

Reviewed By: quark-zju

Differential Revision: D14877552

fbshipit-source-id: 1a8728e28209213fced06d739698099ab8c462f2
2019-04-14 19:56:47 -07:00
Stefan Filip
78d11002eb types: remove Key::node()
Summary:
This function is difficult to justify in the context of the Rust borrow checker.
The primary concern for this pattern is preventing mutation when the object is
passed around.

We can always add the function back if it needs to do more than just return the
underlying value.

Reviewed By: quark-zju

Differential Revision: D14877545

fbshipit-source-id: acdd796e1bee5445c1bce5ce0ceb41a7334e4966
2019-04-14 19:56:47 -07:00
Stefan Filip
9014310969 types: update RepoPath::from_utf8 to take AsRef
Summary: AsRef as a parameter is more flexible than a direct slice.
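
To illustrate the flexibility, here is a hypothetical validator with the same kind of parameter, a simplified stand-in for `RepoPath::from_utf8` that only checks UTF-8 (its name and return type are assumptions). The `?Sized` bound lets callers pass `&str`, `&[u8]`, `&String`, or `&Vec<u8>` directly, without converting to a slice first.

```rust
use std::str::Utf8Error;

// Hypothetical validator: accepts anything that can be viewed as bytes.
pub fn path_from_utf8<T: AsRef<[u8]> + ?Sized>(input: &T) -> Result<&str, Utf8Error> {
    std::str::from_utf8(input.as_ref())
}
```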

Reviewed By: quark-zju

Differential Revision: D14908313

fbshipit-source-id: 07b317f151403be433eded136122bf652c887a07
2019-04-14 19:56:47 -07:00
Stefan Filip
a420476a20 revisionstore: migrate repacks.rs to use testutil
Summary: testutil everywhere

Reviewed By: quark-zju

Differential Revision: D14877549

fbshipit-source-id: 9df8e76068e68eff2895a6454dff13b21f2894ac
2019-04-14 19:56:46 -07:00