Summary:
Move the logic to a common method. A side effect is that
IndexedlogIdDag needs to calculate `last_span.high` again, and
InProcessIdDag needs to calculate `last_store_id` again. But that is
probably okay since merges are rare and the InProcessIdDag operations should
be fast even if they are called frequently.
Reviewed By: sfilipco
Differential Revision: D26504915
fbshipit-source-id: e215ffe9fef7948b7b662dba3ed990e374be989a
Summary: Add a few more tests to the generic IdDag tests.
Reviewed By: sfilipco
Differential Revision: D26415514
fbshipit-source-id: 5ef2a1d4e03527b184a07d94de91c64a05427b90
Summary: It is a constant `true` now. Inline the logic and remove the field.
Reviewed By: sfilipco
Differential Revision: D26414838
fbshipit-source-id: ceace835ecfd263ba16c3dead41ce6ba95087e4f
Summary:
Similar to the previous diff. Implement the merge behavior for InProcess IdDag.
The feature is not turned on yet for easier review. It is turned on by the
next diff.
Reviewed By: sfilipco
Differential Revision: D26414839
fbshipit-source-id: 2a485e0dffcd9d8e2ad95380159d11342636f9aa
Summary:
Merge adjacent flat segments to reduce fragmentation.
The merging is done by a special entry that instructs:
- Remove the old segment from index.
- Insert the new segment.
Note: For the parent -> child index we record parent -> child id, not parent ->
segment. So that index does not need to be updated, because the merged segment
has the same "parent -> child id" information.
Note: The merging is a bit expensive (multiple lookups). But most segments are
not mergeable, because their parents are not "span.low - 1". So the slow path
is not hit often.
The feature is not enabled by default in this diff, because it will break IdDag
tests, which assume indexedlog and in-memory IdDags behave the same, and we
haven't updated in-memory IdDag.
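A minimal sketch of the mergeability check described above. The `FlatSegment` struct and `try_merge` function are hypothetical, simplified stand-ins and not the actual types in the dag crate. The sketch assumes a segment is mergeable only when it starts right after the previous segment and its single parent is `span.low - 1`:

```rust
/// Hypothetical, simplified flat segment: a contiguous id range plus the
/// parents of its first id (`low`). Not the real dag crate type.
#[derive(Clone, Debug, PartialEq)]
pub struct FlatSegment {
    pub low: u64,
    pub high: u64,
    pub parents: Vec<u64>,
}

/// Returns the merged segment if `next` directly continues `last`, that is,
/// `next.low == last.high + 1` and the only parent of `next` is `last.high`
/// (which equals `next.low - 1`). Returns `None` otherwise.
pub fn try_merge(last: &FlatSegment, next: &FlatSegment) -> Option<FlatSegment> {
    let adjacent = last.high.checked_add(1) == Some(next.low);
    let linear = next.parents.len() == 1 && next.parents[0] == last.high;
    if adjacent && linear {
        Some(FlatSegment {
            low: last.low,
            high: next.high,
            // The merged segment keeps the parents of the older segment.
            parents: last.parents.clone(),
        })
    } else {
        None
    }
}
```

In this sketch the common case (parent is not `span.low - 1`) is rejected by two cheap comparisons, which matches the note that the slow path is rarely hit.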
Reviewed By: sfilipco
Differential Revision: D26414841
fbshipit-source-id: 6a268bef56b7a56c62bb8ad949f13e503a88b033
Summary:
With previous refactorings the `max_level` is now maintained by the underlying
IdDag stores. Remove the `max_level` on the wrapper IdDag to simplify the
logic.
Reviewed By: sfilipco
Differential Revision: D26414840
fbshipit-source-id: 084a7f6f75fd53103c45519190607607fef4e161
Summary: We're going to remove the fields. Those add noise to the serialized data.
Reviewed By: sfilipco
Differential Revision: D26504916
fbshipit-source-id: 7fd13bdab4511ceea50195595514fb89a4169419
Summary:
Previously the `IdDag` struct had max_level caching. With the previous diff the
cache is gone. Re-implement it on the indexedlog IdDag to maintain
performance. This has visible performance wins:
Before:
building segments 115.949 ms
ancestors 93.072 ms
children (spans) 495.732 ms
children (1 id) 10.384 ms
common_ancestors (spans) 3.567 s
descendants (small subset) 25.829 ms
gca_one (2 ids) 258.997 ms
gca_one (spans) 3.718 s
gca_all (2 ids) 440.764 ms
gca_all (spans) 3.732 s
heads 322.552 ms
heads_ancestors 67.567 ms
is_ancestor 165.046 ms
parents 304.392 ms
parent_ids 11.765 ms
range (2 ids) 21.466 ms
range (spans) 18.663 ms
roots 471.934 ms
After:
benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
building segments 103.177 ms
ancestors 68.485 ms
children (spans) 451.383 ms
children (1 id) 9.817 ms
common_ancestors (spans) 3.096 s
descendants (small subset) 24.845 ms
gca_one (2 ids) 201.458 ms
gca_one (spans) 3.185 s
gca_all (2 ids) 357.899 ms
gca_all (spans) 3.239 s
heads 295.462 ms
heads_ancestors 50.991 ms
is_ancestor 146.798 ms
parents 296.667 ms
parent_ids 12.305 ms
range (2 ids) 7.781 ms
range (spans) 17.630 ms
roots 478.574 ms
Reviewed By: sfilipco
Differential Revision: D26360564
fbshipit-source-id: 51f55a5bb4e69321515e02f45545618320c1bce5
Summary:
Previously, algorithms were defined on `IdDag<Store>`, which requires a type
parameter to use. Move them to be defined directly on `Store` so using the
algorithms no longer requires a type parameter. This will make it easier
to write code that calls algorithms on different IdDagStores.
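A rough sketch of the pattern, under the assumption that algorithms become default methods of a trait that every store implements. The names `Store`, `Algorithm`, `VecStore` and `describe` are hypothetical and much simpler than the real IdDagStore interface:

```rust
/// Hypothetical minimal store interface.
pub trait Store {
    /// Parents of `id`; empty for a root or an unknown id.
    fn parent_ids(&self, id: u64) -> Vec<u64>;
}

/// Algorithms are default methods, so every `Store` gets them for free.
pub trait Algorithm: Store {
    /// Follow first parents from `id` until a root is reached.
    fn first_root(&self, mut id: u64) -> u64 {
        loop {
            match self.parent_ids(id).first().copied() {
                Some(parent) => id = parent,
                None => return id,
            }
        }
    }
}

impl<S: Store> Algorithm for S {}

/// Toy in-memory store: `self.0[id]` lists the parents of `id`.
pub struct VecStore(pub Vec<Vec<u64>>);

impl Store for VecStore {
    fn parent_ids(&self, id: u64) -> Vec<u64> {
        self.0.get(id as usize).cloned().unwrap_or_default()
    }
}

/// Callers can take `&dyn Algorithm` and need no type parameter.
pub fn describe(dag: &dyn Algorithm, id: u64) -> String {
    format!("{} -> root {}", id, dag.first_root(id))
}
```

With this shape, code like `describe` works for any store without naming its concrete type.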
Reviewed By: sfilipco
Differential Revision: D26360561
fbshipit-source-id: 8e0faf741019c4ed4119ad8e754aea9057b31866
Summary: It is not used. The definitions were moved to dag-types.
Reviewed By: sfilipco
Differential Revision: D26360562
fbshipit-source-id: 35e672194918e3f35294d69cad9e6990d8921900
Summary:
This allows us to "fork" the iterator so we can "peek ahead" multiple items,
without affecting the original iterator.
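As an illustration of the idea (not the code in this diff), any iterator that implements `Clone` can be forked to look ahead. The helper `peek_n` below is a hypothetical name:

```rust
/// Returns up to `n` upcoming items without advancing `iter`.
/// Works by cloning ("forking") the iterator and consuming the clone.
pub fn peek_n<I>(iter: &I, n: usize) -> Vec<I::Item>
where
    I: Iterator + Clone,
{
    iter.clone().take(n).collect()
}
```

For example, after calling `next()` once on `1u32..6`, `peek_n(&it, 3)` yields `[2, 3, 4]` and the following `next()` on the original still returns `Some(2)`.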
Reviewed By: sfilipco
Differential Revision: D26360566
fbshipit-source-id: 4cba280e64338b20fb3e1584609be8fda9b3d616
Summary:
Implement Iterator::{size_hint,count,last} for potential fast paths (e.g.
collecting into a Vec).
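A small sketch of what such overrides can look like. `SpanIter` is a hypothetical iterator over a half-open id range, not the iterator touched by this diff:

```rust
/// Hypothetical iterator over the ids in `next..end` (end is exclusive).
pub struct SpanIter {
    next: u64,
    end: u64,
}

impl SpanIter {
    pub fn new(low: u64, end: u64) -> Self {
        SpanIter { next: low, end }
    }
}

impl Iterator for SpanIter {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.next < self.end {
            let value = self.next;
            self.next += 1;
            Some(value)
        } else {
            None
        }
    }

    /// Exact size, so `collect::<Vec<_>>()` can pre-allocate.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.end.saturating_sub(self.next) as usize;
        (remaining, Some(remaining))
    }

    /// O(1) instead of walking every item.
    fn count(self) -> usize {
        self.size_hint().0
    }

    /// O(1): the last item is known from the range end.
    fn last(self) -> Option<u64> {
        if self.next < self.end {
            Some(self.end - 1)
        } else {
            None
        }
    }
}
```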
Reviewed By: sfilipco
Differential Revision: D26360565
fbshipit-source-id: 227d9c5e615c2a0a624ba88d6d4c3f81b10d7795
Summary:
Benchmark code is out of sync. Fix it.
Example run:
benchmarking dag::iddag::IdDag<dag::iddagstore::in_process_store::InProcessStore> serde
serializing inprocess iddag with mincode 0.664 ms
mincode serialized blob has 707565 bytes
deserializing inprocess iddag with mincode 42.477 ms
Reviewed By: sfilipco
Differential Revision: D26360563
fbshipit-source-id: f87e7ad53e6b6dadecaa0976e1c61f0399814104
Summary:
Benchmark code is out of sync. This is an important benchmark.
Example run:
benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
building segments 111.814 ms
ancestors 67.953 ms
children (spans) 484.954 ms
children (1 id) 12.599 ms
common_ancestors (spans) 3.337 s
descendants (small subset) 27.979 ms
gca_one (2 ids) 216.430 ms
gca_one (spans) 3.297 s
gca_all (2 ids) 371.049 ms
gca_all (spans) 3.348 s
heads 303.232 ms
heads_ancestors 52.821 ms
is_ancestor 149.525 ms
parents 294.633 ms
parent_ids 12.173 ms
range (2 ids) 7.612 ms
range (spans) 16.991 ms
roots 459.869 ms
benchmarking dag::iddagstore::in_process_store::InProcessStore
building segments 68.869 ms
ancestors 6.683 ms
children (spans) 175.711 ms
children (1 id) 2.061 ms
common_ancestors (spans) 408.220 ms
descendants (small subset) 6.990 ms
gca_one (2 ids) 16.983 ms
gca_one (spans) 411.237 ms
gca_all (2 ids) 27.921 ms
gca_all (spans) 415.704 ms
heads 110.486 ms
heads_ancestors 5.228 ms
is_ancestor 11.223 ms
parents 108.636 ms
parent_ids 0.746 ms
range (2 ids) 1.539 ms
range (spans) 5.885 ms
roots 172.910 ms
benchmarking NameDag with many heads
range (master::draft) 55.400 ms
range (recent_draft::drafts) 12.439 ms
Reviewed By: sfilipco
Differential Revision: D26360567
fbshipit-source-id: 6d3244e3f4655634c239f84a7304540860a7d34a
Summary:
Get the graph location of a given commit identifier.
The client using segmented changelog will have only a set of identifiers for
the commits in the graph. The client needs a way to translate user input to
data that it has locally. For example, when checking out an older commit by
hash the client will have to retrieve a location to understand the place in the
graph of the commit.
Reviewed By: quark-zju
Differential Revision: D26289623
fbshipit-source-id: 4192d91a4cce707419fb52168c5fdff53ac3a9d0
Summary: Used when the IdMap is lazy to fetch missing locations.
Reviewed By: quark-zju
Differential Revision: D26131617
fbshipit-source-id: cde0232b16afb961d9c9a18899ca78bd644f1b6b
Summary:
I think that it makes sense to standardize on importing:
`dag_types::Id` rather than `dag::Id` now that this is a crate of its own.
I am not sure about re-exporting `minibytes::Bytes` but it makes sense to me.
Reviewed By: quark-zju
Differential Revision: D26131616
fbshipit-source-id: fefd0334cf188f247b1541be16421967e8340546
Summary:
Wire types have their own meaning in Edenapi. I don't see it necessary to
add the wire qualifier to this crate and overload the term.
Reviewed By: quark-zju
Differential Revision: D26129827
fbshipit-source-id: eea66eef2db609611d8ffa215ba63ae4f0b669c8
Summary: Provide a way to slice a set.
Reviewed By: sfilipco
Differential Revision: D26203562
fbshipit-source-id: 97a4349833a7a1c9664189d4737e2ad418369f22
Summary:
The SliceSet provides slicing support (skip n items, and then take m items)
for a general NameSet. The complexities come from caching and fast paths.
Basically, we cache the skipped items as a way to answer "contains" quickly.
We also cache the "taken" items, if the slice is bounded, to answer "iter_rev"
and "contains".
Reviewed By: sfilipco
Differential Revision: D26203559
fbshipit-source-id: 6078b6178aff878e2169e87d446f0b254432aa80
Summary:
In the future we'd like to test "contains" but only use the "contains"
code path if it's better than O(N). Currently there is no way to know
whether "contains" is O(N) or not. Add an explicit API for that.
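One possible shape for such an API, as a sketch only. The trait `SetQuery`, its method `contains_fast`, and the two toy sets are hypothetical names chosen for illustration; the assumption is that the fast variant returns `None` when no better-than-O(N) answer is available:

```rust
use std::collections::HashSet;

/// Hypothetical set interface with an explicit "fast contains" query.
pub trait SetQuery {
    /// Always answers, possibly in O(N).
    fn contains(&self, name: &str) -> bool;

    /// `Some(answer)` only if the answer is cheaper than O(N); `None` otherwise.
    fn contains_fast(&self, _name: &str) -> Option<bool> {
        None
    }
}

pub struct HashedSet(pub HashSet<String>);
pub struct ListSet(pub Vec<String>);

impl SetQuery for HashedSet {
    fn contains(&self, name: &str) -> bool {
        self.0.contains(name)
    }
    fn contains_fast(&self, name: &str) -> Option<bool> {
        Some(self.0.contains(name))
    }
}

impl SetQuery for ListSet {
    // Linear scan, so the default `contains_fast` (None) is kept.
    fn contains(&self, name: &str) -> bool {
        self.0.iter().any(|n| n == name)
    }
}

/// Filter `names` by membership, but only if every check is cheap.
pub fn cheap_filter<'a>(set: &dyn SetQuery, names: &[&'a str]) -> Option<Vec<&'a str>> {
    let mut out = Vec::new();
    for &name in names {
        if set.contains_fast(name)? {
            out.push(name);
        }
    }
    Some(out)
}
```

A caller can then try `cheap_filter` first and fall back to another strategy when it returns `None`.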
Reviewed By: sfilipco
Differential Revision: D26203554
fbshipit-source-id: 5d4c6014694c45b666a0ecd83fce33157cc15779
Summary: Enhance the test so it checks hints set by the `filter` function.
Reviewed By: sfilipco
Differential Revision: D26203553
fbshipit-source-id: b9bf5baa3bf51434835341e95e72073bd8c4256a
Summary:
It is incorrect with >= 2 hints.
For example,
iter: [hints1, hints2, hints1]
fold acc: hints1, None, hints1
The `None` should be permanent if there are two versions that do not have an
order.
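To illustrate the intended behavior, here is a sketch with a hypothetical `Version` type standing in for the version carried by hints. Versions of different lineages are assumed to have no order. Using `try_fold` makes the `None` permanent, because folding stops at the first unordered pair:

```rust
use std::cmp::Ordering;

/// Hypothetical version: only versions of the same lineage are comparable.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Version {
    pub lineage: u32,
    pub generation: u32,
}

impl PartialOrd for Version {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        if self.lineage == other.lineage {
            self.generation.partial_cmp(&other.generation)
        } else {
            None
        }
    }
}

/// Returns the greatest version if all versions are mutually ordered.
/// Returns `None` for an empty input or as soon as two versions have no
/// order; later items cannot "revive" the accumulator.
pub fn common_version(versions: &[Version]) -> Option<Version> {
    let mut iter = versions.iter().copied();
    let first = iter.next()?;
    iter.try_fold(first, |acc, v| match acc.partial_cmp(&v)? {
        Ordering::Less => Some(v),
        _ => Some(acc),
    })
}
```

With the input `[hints1, hints2, hints1]` from the example, this yields `None` rather than `hints1`.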
Reviewed By: sfilipco
Differential Revision: D26203555
fbshipit-source-id: 96ff30ba45d439220519cd1505e3264118ffd9b2
Summary:
Previously, `LazySet` was constructed with default `Hints`. That disables fast
paths. Revise the API so LazySet requires an explicit `Hints` to address the
issue.
Reviewed By: sfilipco
Differential Revision: D26203561
fbshipit-source-id: c92cd1f7eb7b40ffaaf53abcf05e64f3d41b906d
Summary:
Previously, `MetaSet` was constructed with default `Hints`. That disables fast
paths. Revise the API so MetaSet requires an explicit `Hints` to address the
issue.
Reviewed By: sfilipco
Differential Revision: D26203557
fbshipit-source-id: 9e7658af8723b06d0efdcad1ab4671c79e907326
Summary: Those methods will be used for fast paths slicing a NameSet.
Reviewed By: sfilipco
Differential Revision: D26203556
fbshipit-source-id: aef18f60633653e19571e3fdeeb6b258e4dd32c7
Summary:
This just renames types so `IdSet` is the recommended name and `SpanSet`
remains an implementation detail.
Reviewed By: sfilipco
Differential Revision: D26203560
fbshipit-source-id: 7ca0262f3ad6d874363c73445f40f8c5bf3dc40e
Summary: This exposes the segments version of the algorithm.
Reviewed By: sfilipco
Differential Revision: D26182244
fbshipit-source-id: 716e6d5254c9962618040e7549c2804184230a97
Summary: This will be used by NameDag.
Reviewed By: sfilipco
Differential Revision: D26182243
fbshipit-source-id: 9db2ecde98281dc45fcfd0d7cf30d6c7bf2be81d
Summary:
This will be useful to optimize the `_firstancestors` revset, which is used to
calculate a linear branch to draw growth graphs. Without a fast path, the pure
Python `_firstancestors` implementation would have id <-> name translation
overhead that makes the Rust changelog roughly 20x slower than the Python revlog.
Reviewed By: sfilipco
Differential Revision: D26182240
fbshipit-source-id: d44f5ca5dc8c38df74281832931d87868791209e
Summary:
Optimize the `x~n` revset function using Rust.
Note: This changes the behavior a bit, `x~n` no longer returns `null`.
Reviewed By: sfilipco
Differential Revision: D26142683
fbshipit-source-id: d6a45b7e67352d74986274e52002a769bbae772e
Summary:
When `n` is too large, return None. This matches the "parents()" behavior:
return an empty set instead of erroring out.
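A minimal sketch of that behavior on a toy graph. The function name `first_ancestor_nth` and the `parents` table (`parents[id]` lists the parents of `id`) are assumptions for illustration, not the real IdDag API:

```rust
/// Walk `n` first-parent steps from `id` (the `x~n` revset).
/// Returns `None` when the walk runs past a root, instead of an error.
pub fn first_ancestor_nth(parents: &[Vec<usize>], mut id: usize, n: usize) -> Option<usize> {
    for _ in 0..n {
        // `get` fails for unknown ids, `first` fails for roots; both give None.
        id = *parents.get(id)?.first()?;
    }
    Some(id)
}
```

A caller can map `None` to an empty set, which is how "parents()" of a root behaves.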
Reviewed By: sfilipco
Differential Revision: D26142684
fbshipit-source-id: e45fca69e39c2968dc7abc5a4a155e6b7c280836
Summary:
The "merges" algorithm on the IdDag. Basically scan through flat segments
and conditionally pick up their "low"s.
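A sketch of the scan, assuming the condition is "the segment has two or more parents" and that only the first id (`low`) of a flat segment can be a merge. `FlatSegment` and `merges` here are hypothetical, simplified names:

```rust
/// Hypothetical, simplified flat segment. `parents` are the parents of `low`;
/// every other id in the segment has the previous id as its only parent.
pub struct FlatSegment {
    pub low: u64,
    pub high: u64,
    pub parents: Vec<u64>,
}

/// Collect the `low` of each flat segment that has more than one parent.
pub fn merges(segments: &[FlatSegment]) -> Vec<u64> {
    segments
        .iter()
        .filter(|seg| seg.parents.len() >= 2)
        .map(|seg| seg.low)
        .collect()
}
```

The cost is proportional to the number of flat segments rather than the number of commits.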
Reviewed By: sfilipco
Differential Revision: D26142172
fbshipit-source-id: 305fe619a65ed4034423f303bee8d57be0424963
Summary:
The newly added API returns the parent count without the actual parents.
Useful for the "merges" algorithm.
Reviewed By: sfilipco
Differential Revision: D26142176
fbshipit-source-id: 4f301b8de88f2af637f52bf62b24ddb12e65b6a7
Summary:
The function calculates all merges within a graph. It is useful to calculate
the "universally known" set, or to answer the "merge()" revset function.
This diff only adds a default impl. Upcoming diffs will add more efficient
versions on the segments graph.
Reviewed By: sfilipco
Differential Revision: D26142173
fbshipit-source-id: 02de180f6e444bcac63a1cc46dd23faeb8e08e14
Summary: The `filter` API filters a set by a function.
Reviewed By: sfilipco
Differential Revision: D26142177
fbshipit-source-id: f24cbeeaf1c85264706c933c98e364d7937790fe
Summary:
Expose the async LazySet API via NameSet constructor so users won't need to
care about the LazySet type.
Reviewed By: sfilipco
Differential Revision: D26142170
fbshipit-source-id: 178383684981e81e43f2a5610c45a7ebbd354ab4
Summary: This implementation is used for all things that are cached in Mononoke.
Reviewed By: quark-zju
Differential Revision: D26121497
fbshipit-source-id: a0088b539f3c3656921ab9a7a25c6442996aed18
Summary:
The name is being taken by stdlib:
warning: a method with this name may be added to the standard library in the future
   --> eden/scm/lib/dag/src/spanset.rs:228:14
    |
228 |             .binary_search_by(|probe| span.low.cmp(&probe.low))
    |              ^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(unstable_name_collisions)]` on by default
    = warning: once this method is added to the standard library, the ambiguity may cause an error or change in behavior!
    = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
    = help: call with fully qualified syntax `BinarySearchBy::binary_search_by(...)` to keep using the current method
    = help: add `#![feature(vecdeque_binary_search)]` to the crate attributes to enable `VecDeque::<T>::binary_search_by`
Reviewed By: sfilipco
Differential Revision: D26092424
fbshipit-source-id: d2cdf7d73d2f808f038817c9dc9f4c531ff643bd
Summary:
Previously the LazySet only supported a non-async Iterator. Supporting async
makes it more flexible and useful. It will be used in upcoming changes.
Reviewed By: sfilipco
Differential Revision: D25858800
fbshipit-source-id: 8c8e874f05cfab721bc0fa55160a9337ed7c2c27
Summary:
In the future we'd like to allow building the dag crate without the indexedlog
portion. This diff adds support for that.
Reviewed By: DurhamG
Differential Revision: D25769054
fbshipit-source-id: eb5a200841f878836a9f68e65e7d50be7e6b9a79
Summary:
In the future we want to build dag without the indexedlog dep for the Mononoke
use-case. One of the problems is the ToWire trait implemented on dag::Id by
edenapi-types. Within buck, the dag crate will have 2 targets: dag and dag-lite
(no indexedlog). They are incompatible, meaning that edenapi-types depending on
dag-lite will not provide Id::to_wire for crates using dag, or vice-versa.
To solve that, we move the Id and other types to a separate crate that only has
one buck target so edenapi-types, and segmented_changelog from Mononoke can
depend on it without issues. This also makes edenapi-types more lightweight.
Reviewed By: sfilipco
Differential Revision: D25857917
fbshipit-source-id: d3e15a2b6638cc6e15171a1e9bc37362e03df583
Summary: In upcoming changes, we're moving Id to a separate crate. This makes that easier.
Reviewed By: sfilipco
Differential Revision: D25857918
fbshipit-source-id: 6e2163f6fa171d4cd3be4fc0c4c248fd87ba739c
Summary:
`PartialOrd` was suggested by sfilipco. Note `Option<std::cmp::Ordering>` is
similar to `Side` in terms of expressiveness. `PartialOrd` can be written
using shorter symbols (`<=`, etc) so it's easier to understand.
The `compatible` family APIs were replaced by `partial_cmp` APIs.
There are some minor differences:
- Bitwise or used by union set is no longer supported. `Hints::union` was
added as a replacement.
- `Option<T>` implements full order. `Some(T) > None`. This is different
from `compatible_dag` and `compatible_id_map` APIs. Additional `> None`
checks were added for correctness.
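The `Option<T>` difference can be seen in a small sketch. `compatible_cmp` is a hypothetical helper showing one way to treat a missing value as unordered rather than smallest; it is not the code in this diff:

```rust
use std::cmp::Ordering;

/// Compare two optional versions, treating a missing version as having no
/// order with anything. This differs from `Option`'s built-in ordering,
/// where `Some(_) > None` always holds.
pub fn compatible_cmp(a: Option<u32>, b: Option<u32>) -> Option<Ordering> {
    match (a, b) {
        (Some(x), Some(y)) => x.partial_cmp(&y),
        _ => None,
    }
}
```

With the built-in ordering, `Some(0u32) > None` is true, so code that must not treat `None` as a valid smaller value needs an explicit check like the one above.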
Reviewed By: sfilipco
Differential Revision: D25652784
fbshipit-source-id: 51d88948fa556300678050088c06e9dda09cbf98
Summary:
Some code paths take an (expensive) snapshot just to satisfy the `Arc::ptr_eq`
compatibility check. With `VerLink` it's more efficient to use `VerLink`
directly. This is potentially more efficient for `VerLink` too, because the
`Arc` won't be cloned unnecessarily and `VerLink::bump()` is more likely to
use its optimized path.
Reviewed By: sfilipco
Differential Revision: D25608200
fbshipit-source-id: 1b3ecc5d7ec5d495bdda22d66025bb812f3d68a0