Summary: This would provide some data about changes around SpanSet.
Reviewed By: sfilipco
Differential Revision: D15023652
fbshipit-source-id: 4cff7d1876fe20cd876f26926f31e018b6c88fd9
Summary:
Complete the IdMap interface so it's usable.
There are 2 possible use patterns:
- On-disk IdMap + In-memory additions. Practically, the server provides an
on-disk map, and the client might assign missing commits on demand. The
client still needs to update the IdMap during pull.
- Everything is on-disk. There are no in-memory additions. This is more complex
because the local commits might become part of the server commits in the
future, and it might require Ids for those commits to be re-assigned.
I haven't decided which way to go exactly. So let's keep the interface flexible
for both.
That said, I do want to reduce the chance of causing filesystem race conditions
for filesystem writes. In this case, both reads and writes should hold a lock.
So a dedicated type is used to encourage the pattern of:
- get the dedicated type (and hold the filesystem lock)
- read, write, sync
Write related methods are not moved to the dedicated type, to cover the
in-memory addition use-case.
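
A minimal sketch of the pattern above, under loud assumptions: a `Mutex` stands in for the filesystem lock, a `HashMap` stands in for the on-disk file, and the names `prepare_filesystem_sync`, `SyncableIdMap` and `Disk` are illustrative rather than the actual interface. Writes stay on `IdMap` so in-memory additions work without a lock, while `sync` only exists on the dedicated type.

```rust
use std::collections::HashMap;
use std::ops::{Deref, DerefMut};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for the on-disk state guarded by a filesystem lock.
type Disk = HashMap<u64, Vec<u8>>;

pub struct IdMap {
    entries: HashMap<u64, Vec<u8>>,
    /// Ids added in memory that have not been written out yet.
    dirty: Vec<u64>,
}

impl IdMap {
    pub fn new() -> Self {
        IdMap { entries: HashMap::new(), dirty: Vec::new() }
    }

    /// Write method kept on `IdMap`: usable for in-memory additions.
    pub fn insert(&mut self, id: u64, hash: &[u8]) {
        self.entries.insert(id, hash.to_vec());
        self.dirty.push(id);
    }

    pub fn find_slice_by_id(&self, id: u64) -> Option<&[u8]> {
        self.entries.get(&id).map(|v| v.as_slice())
    }

    /// Take the lock, pick up entries written by others, and return the
    /// dedicated type. The lock is held for as long as that value lives.
    pub fn prepare_filesystem_sync<'a>(&'a mut self, disk: &'a Mutex<Disk>) -> SyncableIdMap<'a> {
        let guard = disk.lock().unwrap();
        for (id, hash) in guard.iter() {
            self.entries.entry(*id).or_insert_with(|| hash.clone());
        }
        SyncableIdMap { map: self, disk: guard }
    }
}

/// Dedicated type: can only be obtained while holding the lock.
pub struct SyncableIdMap<'a> {
    map: &'a mut IdMap,
    disk: MutexGuard<'a, Disk>,
}

impl<'a> Deref for SyncableIdMap<'a> {
    type Target = IdMap;
    fn deref(&self) -> &IdMap {
        &*self.map
    }
}

impl<'a> DerefMut for SyncableIdMap<'a> {
    fn deref_mut(&mut self) -> &mut IdMap {
        &mut *self.map
    }
}

impl<'a> SyncableIdMap<'a> {
    /// Flush pending entries, then release the lock by consuming `self`.
    pub fn sync(mut self) {
        for id in self.map.dirty.drain(..) {
            if let Some(hash) = self.map.entries.get(&id) {
                self.disk.insert(id, hash.clone());
            }
        }
    }
}
```

With this shape, "get the dedicated type, read, write, sync" is the only way to reach `sync`, so a caller cannot write to disk without first taking the lock.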
Reviewed By: sfilipco
Differential Revision: D15008517
fbshipit-source-id: 5d117ed7f2947aed6ed524a3b5199c071908c4ae
Summary:
There will be lots of algorithms or structures that operate on integers as
commit identities. The source of truth for commit identities is the commit
hashes. Add a map to be able to translate between them.
The map is designed to be sparse, so it can be used as a cache if the map
is moved to server-side.
The map does not take `[u8; 20]` as its value type, with the intention to
support other hash functions. For example, Bonsai Blake2 hashes have 32 bytes.
Since the integer ids live in a global namespace and can conflict if there
are multiple writers, the interface is designed to make sure an explicit
critical section is needed for write (to filesystem) operations.
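
As a rough illustration of the map described above (not the real implementation, which is backed by on-disk storage), the sketch below keeps two in-memory maps. The type name `SparseIdMap` and its method names are assumptions. Hashes are stored as `Box<[u8]>` so 20-byte and 32-byte hashes can coexist, and ids need not be contiguous.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory map between integer ids and variable-length hashes.
#[derive(Default)]
pub struct SparseIdMap {
    id_to_hash: BTreeMap<u64, Box<[u8]>>,
    hash_to_id: HashMap<Box<[u8]>, u64>,
}

impl SparseIdMap {
    /// Record a translation in both directions.
    pub fn insert(&mut self, id: u64, hash: &[u8]) {
        self.id_to_hash.insert(id, hash.into());
        self.hash_to_id.insert(hash.into(), id);
    }

    /// Returns `None` for ids that were never assigned (the map is sparse).
    pub fn find_hash_by_id(&self, id: u64) -> Option<&[u8]> {
        self.id_to_hash.get(&id).map(|h| &**h)
    }

    pub fn find_id_by_hash(&self, hash: &[u8]) -> Option<u64> {
        self.hash_to_id.get(hash).copied()
    }
}
```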
Reviewed By: sfilipco
Differential Revision: D15008518
fbshipit-source-id: 9f53aae551c54e1b47b5f837642ea00fca8579c3
Summary:
The spanset is a set of integer spans. It will be used by some DAG related
operations. It'll be used as a subset of mercurial/smartset.py.
Note: smartset.py also has a Python `spanset` structure. That is different
from this Rust spanset in these ways:
- The Rust set does not preserve ordering.
- The Rust set can have multiple spans, instead of just one.
- The Rust set is less abstract (for now). Its set operations (union, etc.)
only work on the same type.
This diff adds some initial functions for it.
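
To make the differences listed above concrete, here is a small sketch of a multi-span set. It is an assumption about the general shape, not the code in this diff: spans are inclusive `(low, high)` pairs, and the constructor normalizes them (sorted and merged), which is why insertion order is not preserved.

```rust
use std::cmp::Ordering;

/// Sketch of a set of integers stored as inclusive spans.
#[derive(Clone, Debug, PartialEq)]
pub struct SpanSet {
    /// Sorted by `low`; spans never overlap or touch each other.
    spans: Vec<(u64, u64)>,
}

impl SpanSet {
    /// Build a normalized set from spans given in any order.
    pub fn from_spans(mut spans: Vec<(u64, u64)>) -> Self {
        spans.sort();
        let mut merged: Vec<(u64, u64)> = Vec::new();
        for (low, high) in spans {
            assert!(low <= high, "span bounds are inclusive and ordered");
            if let Some(last) = merged.last_mut() {
                // Overlapping or adjacent to the previous span: extend it.
                if low <= last.1.saturating_add(1) {
                    last.1 = last.1.max(high);
                    continue;
                }
            }
            merged.push((low, high));
        }
        SpanSet { spans: merged }
    }

    /// Binary search over the sorted spans.
    pub fn contains(&self, value: u64) -> bool {
        self.spans
            .binary_search_by(|&(low, high)| {
                if value < low {
                    Ordering::Greater
                } else if value > high {
                    Ordering::Less
                } else {
                    Ordering::Equal
                }
            })
            .is_ok()
    }

    /// Union only works with another `SpanSet`, as noted above.
    pub fn union(&self, other: &SpanSet) -> SpanSet {
        let mut all = self.spans.clone();
        all.extend_from_slice(&other.spans);
        SpanSet::from_spans(all)
    }
}
```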
Reviewed By: sfilipco
Differential Revision: D15004985
fbshipit-source-id: c2e5e2a80e2e4681c2f443e0d8a83dc97f7be371
Summary: The scmdag library is going to have things related to the commit graph.
Reviewed By: sfilipco
Differential Revision: D15004984
fbshipit-source-id: f274cceeabae4a57985763216572f7cd055f8e07
Summary: Release the GIL during data fetching to allow for progress bars to update properly. The data fetching code is pure Rust and does not interact with the Python interpreter at all, so releasing the GIL here is safe.
Differential Revision: D15051852
fbshipit-source-id: 144da953720951f9a30aadfc2b7fc8c8bc6b14aa
Summary: Reading a comment is easier than trying to figure out the on-disk format.
Reviewed By: kulshrax
Differential Revision: D15056859
fbshipit-source-id: 097ed8bcaa51369aba4bcc9ed1cc95ebd6a67a66
Summary:
Compressing/Decompressing data can be expensive, so avoid doing it when not
needed. I thought about using a RefCell but decided on just using a mutable
reference, as an Entry will always be private to indexedlogdatastore.rs.
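
A sketch of the idea, with assumptions stated up front: the `Entry` fields here are invented for illustration, and `decompress` is a toy run-length decoder standing in for the real codec. Taking `&mut self` lets the entry cache the decompressed bytes without a `RefCell`.

```rust
/// Toy stand-in for a real decompressor: input is (count, byte) pairs.
fn decompress(data: &[u8]) -> Vec<u8> {
    data.chunks_exact(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Hypothetical entry that decompresses its content at most once.
pub struct Entry {
    compressed: Vec<u8>,
    content: Option<Vec<u8>>,
    /// Only here so the sketch can show that the work happens once.
    pub decompress_calls: usize,
}

impl Entry {
    pub fn new(compressed: Vec<u8>) -> Self {
        Entry { compressed, content: None, decompress_calls: 0 }
    }

    /// Decompress on first access and reuse the cached result afterwards.
    pub fn content(&mut self) -> &[u8] {
        if self.content.is_none() {
            self.decompress_calls += 1;
            self.content = Some(decompress(&self.compressed));
        }
        self.content.as_deref().unwrap()
    }
}
```

An entry whose content is never requested pays no decompression cost at all.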
Reviewed By: kulshrax
Differential Revision: D15056862
fbshipit-source-id: ac0b811f2df563be86e3ade9abe89476db5d13cc
Summary: This will allow decompression to be done on the fly as opposed to always.
Reviewed By: kulshrax
Differential Revision: D15056860
fbshipit-source-id: 60635c431579fc924a61d08b35688222ec4930bb
Summary:
Delta chains are only created during repack, as every download operation
fetches the full content of the file. Even if we wanted to support them,
interrupted chains add undesirable complexity as they can lead to chain loops if
we're not careful. Let's just not support delta chains for now to avoid this.
Reviewed By: kulshrax
Differential Revision: D15056861
fbshipit-source-id: 4b0474ce134e946952a70f363190faf50850abe0
Summary: Now that IndexedLog is also in this crate, its name is no longer relevant.
Reviewed By: kulshrax
Differential Revision: D15056502
fbshipit-source-id: cb00c8322ac4ff7da97c8faaec2959e5f68ca4ca
Summary: Add a new config option to toggle file validation.
Differential Revision: D15034687
fbshipit-source-id: 3783ea1dacad9d1e494a5de1388f703db0ed1129
Summary:
I want to give Store a more specific name so that it doesn't get
confused with other Store abstractions that we will add in the
future.
Reviewed By: singhsrb
Differential Revision: D15007383
fbshipit-source-id: 499bcda4aecd5389e3bc1eba5206ba72a69c4c3d
Summary:
`Log::lookup_range` exposes the range query feature provided by `Index`.
The iterator is also made double-ended.
Reviewed By: sfilipco
Differential Revision: D14895477
fbshipit-source-id: 6aef0973e009bf8fc6f3b5e5a8f6c54e57c81360
Summary:
The RangeIter is actually faster. The main reason is that it avoids recursion.
RangeIter does require double Vec, which seems like extra overhead. Practically
it does not seem to matter much.
The RangeIter code is also better written than PrefixIter. So let's delete
PrefixIter, and switch prefix lookups to use RangeIter.
Before:
index prefix scan (2B) 89.788 ms
index prefix scan (1B) 72.337 ms
index prefix scan (2B, disk) 102.098 ms
index prefix scan (1B, disk) 90.445 ms
After:
index prefix scan (2B) 76.335 ms
index prefix scan (1B) 54.517 ms
index prefix scan (2B, disk) 91.798 ms
index prefix scan (1B, disk) 67.143 ms
Reviewed By: sfilipco
Differential Revision: D14895478
fbshipit-source-id: 79a01774fb640c78fc5733db82f86f0f9403c960
Summary:
This would provide data about scan_prefix performance.
The benchmark code is slightly changed to share the index across test cases.
That reduces test setup cost.
Reviewed By: sfilipco
Differential Revision: D14895481
fbshipit-source-id: e70098bd202e102822a0829c0ae28de8d49fbe85
Summary:
This API allows range query, similar to `BTreeMap::range`.
It's going to be used by segmented changelog. There are spans (start, end)
stored in the index, and we need to find spans by rev (start <= rev <= end).
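
For comparison, the same lookup expressed with `BTreeMap::range` from the standard library looks roughly like this. The function name `find_span` and the `start -> end` layout are assumptions for illustration, and spans are assumed not to overlap.

```rust
use std::collections::BTreeMap;

/// Find the span (start, end) with start <= rev <= end, if any.
/// `spans` maps each span's start to its inclusive end.
pub fn find_span(spans: &BTreeMap<u64, u64>, rev: u64) -> Option<(u64, u64)> {
    spans
        // All spans whose start is <= rev ...
        .range(..=rev)
        // ... of which only the last one can contain rev.
        .next_back()
        .filter(|&(_, &end)| rev <= end)
        .map(|(&start, &end)| (start, end))
}
```

The lookup relies on the range iterator being double-ended: `next_back` jumps straight to the candidate span instead of walking forward from the first one.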
Initially I was changing PrefixIter incrementally towards the new RangeIter.
There are too many small commits and I got some useful feedback early. Now
it seems cleaner to just introduce the desired state of RangeIter first.
We can later migrate prefix lookup to RangeIter, if perf regression is
negligible.
The added code is long, but some of it is modified from existing code:
- `RangeIter::next_internal` is modified from `PrefixIter::next`.
- `Index::get_stack_by_bound` is modified from `Index::scan_prefix_base16`.
The tests helped find some issues in the code. I hope they're not too weak.
Reviewed By: sfilipco
Differential Revision: D14895479
fbshipit-source-id: fb8f1bd35c61187fe5f7764fa485206bbb13c8e0
Summary: Update the Eden API client and server to use the new DataEntry type for file content downloads.
Reviewed By: quark-zju
Differential Revision: D14907958
fbshipit-source-id: 8a7b1cbb54bdc119dda11179ff94d3efdb7e85c9
Summary: Removing this function in favor of using Key.path
Reviewed By: quark-zju
Differential Revision: D14945331
fbshipit-source-id: 6b6bb70375629edf37b2b04a86545f18e15b33b4
Summary:
In the case where a delta chain is split between several logs, it's possible
that part of it may be removed due to some logs being removed. Instead of
treating this as an error, we can simply return the partial chain; the union
content store will continue the delta chain on the next store.
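
The behavior can be sketched as follows. Everything here is hypothetical (integer keys, the `Delta` and `Store` types, `union_delta_chain`), and the sketch assumes chains are acyclic. A store returns whatever prefix of the chain it has, and the union continues from the last delta's base on the next store.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Delta {
    pub key: u32,
    /// Key of the delta this one applies on top of, if any.
    pub base: Option<u32>,
    pub data: Vec<u8>,
}

pub struct Store {
    deltas: HashMap<u32, Delta>,
}

impl Store {
    pub fn new(deltas: Vec<Delta>) -> Self {
        Store { deltas: deltas.into_iter().map(|d| (d.key, d)).collect() }
    }

    /// Follow base pointers. `None` if `key` itself is unknown; otherwise
    /// the chain found so far, even if its tail is missing from this store.
    pub fn get_delta_chain(&self, key: u32) -> Option<Vec<Delta>> {
        let mut chain = Vec::new();
        let mut next = Some(key);
        while let Some(k) = next {
            match self.deltas.get(&k) {
                Some(delta) => {
                    next = delta.base;
                    chain.push(delta.clone());
                }
                None => break, // partial chain: not an error
            }
        }
        if chain.is_empty() { None } else { Some(chain) }
    }
}

/// Stitch partial chains from several stores into one full chain.
pub fn union_delta_chain(stores: &[Store], key: u32) -> Option<Vec<Delta>> {
    let mut chain: Vec<Delta> = Vec::new();
    let mut next = Some(key);
    while let Some(k) = next {
        // First store that knows `k`; give up if none does.
        let part = stores.iter().find_map(|s| s.get_delta_chain(k))?;
        next = part.last().and_then(|d| d.base);
        chain.extend(part);
    }
    Some(chain)
}
```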
Reviewed By: quark-zju
Differential Revision: D14899943
fbshipit-source-id: 7369ee191dc4b35873344cd13c295c72472e0712
Summary:
The Python code interprets a KeyError as a lookup failure, and will retry the
lookup on the next store. Any other Rust errors will be translated into a
RuntimeError exception that Python will re-raise and stop the lookup.
Reviewed By: quark-zju
Differential Revision: D14895905
fbshipit-source-id: d22733c0a68ff3f28d502eb2cd4c3a0467ee35d1
Summary: Add a new type representing a file content entry on the wire. This type is serializable and includes file content itself as well as the filenode's parents, which collectively allow for filenode hash verification.
Differential Revision: D14907957
fbshipit-source-id: ed0f85270c98bd5675da8553ffbfa0549b574b7f
Summary:
Previously, the Eden API endpoints on the API server used JSON for encoding requests/responses for debugging purposes. Given that these APIs are mostly used to transfer large amounts of binary data, we should use a binary encoding scheme in production. CBOR fits the bill since it is essentially binary JSON, allowing for more efficient coding of large byte strings.
Although this is a breaking API change, given that nothing depends on these endpoints yet, it should be OK to simply change the format.
Differential Revision: D14907978
fbshipit-source-id: e0aea30d7304f4b727e2ad7fe23379457b6c3e26
Summary: Previously, this import would trigger dead code warnings because it would be conditionally included with either `cfg(test)` or the `for-tests` feature, whereas the tests module (which was the only user of the import) would only be conditionally included with `cfg(test)`. Fix the warning by moving the import into the tests module itself.
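
A generic illustration of the fix (the function and the imported type are made up, not the ones touched by this diff): placing the `use` inside the `cfg(test)` module means the import is compiled exactly when its only user is.

```rust
/// Production code that does not need `HashSet`.
pub fn dedup_count(items: &[u32]) -> usize {
    let mut sorted = items.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    sorted.len()
}

#[cfg(test)]
mod tests {
    // Imported here rather than at the top of the file behind a broader
    // cfg, so it can never be compiled in without being used.
    use std::collections::HashSet;

    use super::dedup_count;

    #[test]
    fn matches_hashset() {
        let items = [1, 2, 2, 3, 3, 3];
        let expected = items.iter().collect::<HashSet<_>>().len();
        assert_eq!(dedup_count(&items), expected);
    }
}
```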
Reviewed By: quark-zju
Differential Revision: D14936954
fbshipit-source-id: ef7a84e8d36645624077283a0fb7798a1746f579
Summary:
We should update the builder for Key to take a repo path. We could build
the key directly using the default struct constructor but representing
the two constructors as functions is clearer.
Reviewed By: quark-zju
Differential Revision: D14877543
fbshipit-source-id: 328906521cdbad535e28df22fea82f21e8b5410a
Summary:
Marking the uses of byte arrays for repository paths as deprecated
to make it easier to remove uses in the code.
Reviewed By: quark-zju
Differential Revision: D14877541
fbshipit-source-id: 4c06e0f7012a33cc92752530618396c3c529f986
Summary:
It is fairly difficult to avoid an intermediary state where we don't have some
panics. Since we don't really deal with invalid paths this intermediary state
is not a real concern.
Reviewed By: quark-zju
Differential Revision: D14877553
fbshipit-source-id: 6f60f20af8d8f1e3ff23c5d8ab5353bc8d919ebf
Summary:
The overhead of generating all the different strings is noticeable when
we start to generate a lot of values.
Reviewed By: quark-zju
Differential Revision: D14877547
fbshipit-source-id: 8a91241ff3e86b6ac9b68197c449ed2be445f941
Summary: This builder is the mirror of `String::from_utf8` in the RepoPathBuf type.
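
Roughly, the shape is as below. This is a simplified sketch: the real RepoPathBuf may apply path-specific validation beyond UTF-8, and the tuple-struct layout, the `FromUtf8Error` error type and `as_str` are assumptions made to keep the example self-contained.

```rust
use std::string::FromUtf8Error;

/// Simplified stand-in for the real type: an owned, UTF-8 repository path.
#[derive(Debug, PartialEq)]
pub struct RepoPathBuf(String);

impl RepoPathBuf {
    /// Takes ownership of the bytes and fails if they are not valid UTF-8,
    /// in the same way `String::from_utf8` does.
    pub fn from_utf8(bytes: Vec<u8>) -> Result<Self, FromUtf8Error> {
        String::from_utf8(bytes).map(RepoPathBuf)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```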
Reviewed By: quark-zju
Differential Revision: D14877551
fbshipit-source-id: de79c36b0f5d638aad12428f7e5ee1bbe19c4bc6
Summary: This is preparation to change the backing storage from Vec<u8> to RepoPath
Reviewed By: quark-zju
Differential Revision: D14877544
fbshipit-source-id: a7f3c805fcc9bc96a4135b2e36e73e2662cec54e
Summary: This will be used in transitioning key from bytes to repo path.
Reviewed By: quark-zju
Differential Revision: D14877548
fbshipit-source-id: f79863469d58b53557e51ebb033cc6a6b5f43499
Summary:
Nothing fancy for RepoPathBuf to prevent us from marking them Serializable.
These will be used by the api server to wire data.
Reviewed By: quark-zju
Differential Revision: D14877552
fbshipit-source-id: 1a8728e28209213fced06d739698099ab8c462f2
Summary:
This function is difficult to justify in the context of the Rust borrow checker.
The primary concern for this pattern is preventing mutation when the object is
passed around.
We can always add the function back if it has to do more than just return the
underlying value.
Reviewed By: quark-zju
Differential Revision: D14877545
fbshipit-source-id: acdd796e1bee5445c1bce5ce0ceb41a7334e4966
Summary: AsRef as a parameter is more flexible than a direct slice.
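
For example (the helper `to_hex` is made up for illustration and is not the function changed here), a parameter bounded by `AsRef<[u8]>` accepts slices, arrays, `Vec<u8>` and `&str` without the caller converting first:

```rust
/// Accepts anything that can be viewed as a byte slice.
pub fn to_hex(data: impl AsRef<[u8]>) -> String {
    data.as_ref().iter().map(|b| format!("{:02x}", b)).collect()
}
```

With a plain `&[u8]` parameter, the `Vec<u8>` and `&str` callers would each need an explicit borrow or `as_bytes()` call.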
Reviewed By: quark-zju
Differential Revision: D14908313
fbshipit-source-id: 07b317f151403be433eded136122bf652c887a07