Summary:
Similar to the previous change. `VerLink` tracks compatibility more accurately.
- No false positives compared to the current `map_id` approach.
- Fewer false negatives compared to the previous `Arc::ptr_eq` approach.
The `map_id` is kept for debugging purposes.
Reviewed By: sfilipco
Differential Revision: D25607513
fbshipit-source-id: 7d7c7e3d49f707a584142aaaf0a98cfd3a9b5fe8
Summary:
Previously, snapshots needed to be invalidated manually. That is error-prone.
For example, `import_clone_data` forgot to call `invalidate_snapshot`.
With `VerLink`, it's easy to check whether a snapshot is up to date. So let's just
use that and remove the need to invalidate manually.
`invalidate_snapshot` is still useful to drop the `version` in `snapshot` so
`VerLink::bump` might be more efficient. Forgetting to call it no longer affects
correctness.
Reviewed By: sfilipco
Differential Revision: D25607514
fbshipit-source-id: 5efb489cda1d4875bcd274c5a197948f67101dc1
Summary:
`VerLink` tracks compatibility more accurately.
- No false positives compared to the current `dag_id` approach.
- Fewer false negatives compared to the previous `Arc::ptr_eq` approach.
The `dag_id` is kept for debugging purposes.
Note: With the current implementation, `dag.flush()` makes `dag`
incompatible with its previous state. This is somewhat expected, as
`flush` might pick up arbitrary changes from the filesystem and reassign the
non-master group. Those changes can actually be incompatible. This might be
improved in the future to detect reload changes by using some extra information.
Reviewed By: sfilipco
Differential Revision: D25607511
fbshipit-source-id: 3cfc97610504813a3e5bb32ec19a90495551fd3a
Summary:
There are 2 kinds of changes:
- Append-only changes. These are backwards-compatible.
- Non-append-only changes. These are not backwards-compatible.
Previously,
- `Arc::ptr_eq` on snapshot is too fragile. It treats append-only compatible
changes as incompatible.
- Even worse, because of wrapper types (e.g. `Arc::new(Arc::new(dag))` is
different from `dag`), even the same underlying struct can be treated as
incompatible.
- `(map|dag)_id` is too coarse. It treats incompatible non-append-only changes
as compatible.
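The wrapper problem can be reproduced with `std` alone. This is a minimal sketch with a hypothetical `IdMapLike` trait that is also implemented for `Arc<T>` (the real `dag` traits are not shown); `Map` and `compare_identity` are illustrative names. The same `Map` reached through one extra `Arc` layer sits behind a different allocation, so `Arc::ptr_eq` reports the two trait objects as different even though the data is identical.

```rust
use std::sync::Arc;

// Hypothetical trait standing in for an IdMap/Dag trait used as `dyn Trait`.
trait IdMapLike {
    fn len(&self) -> usize;
}

struct Map {
    names: Vec<String>,
}

impl IdMapLike for Map {
    fn len(&self) -> usize {
        self.names.len()
    }
}

// Forwarding impl, so an `Arc<T>` can itself be turned into the trait object.
impl<T: IdMapLike + ?Sized> IdMapLike for Arc<T> {
    fn len(&self) -> usize {
        (**self).len()
    }
}

/// Returns (direct_eq, wrapped_eq): whether `Arc::ptr_eq` considers the trait
/// objects equal when built directly vs. through an extra `Arc` layer.
fn compare_identity() -> (bool, bool) {
    let map = Arc::new(Map { names: vec!["A".to_string()] });
    let direct: Arc<dyn IdMapLike> = map.clone();
    let direct_again: Arc<dyn IdMapLike> = map.clone();
    // Same underlying `Map`, but the outer Arc is a brand-new allocation.
    let wrapped: Arc<dyn IdMapLike> = Arc::new(map.clone());
    assert_eq!(wrapped.len(), direct.len());
    (Arc::ptr_eq(&direct, &direct_again), Arc::ptr_eq(&direct, &wrapped))
}
```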
Add `VerLink` to track those 2 different kinds of changes. It basically keeps a
(cheap) tree so backwards-compatible changes are detected precisely.
`VerLink` will replace IdMap and Dag compatibility checks.
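The tree idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the real `VerLink`: the type `Ver` and the methods `new_branch`, `bump`, `is_descendant_of` and `is_compatible_with` are hypothetical names. Each version points to its parent; an append-only change creates a child via `bump`, and a non-append-only change starts an unrelated root via `new_branch`. Two versions are treated as compatible when one is an ancestor of (or equal to) the other.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical counter giving every unrelated version tree a distinct id.
static NEXT_ROOT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical stand-in for a version link: a cheap, shareable tree node.
#[derive(Clone)]
pub struct Ver {
    inner: Arc<Node>,
}

struct Node {
    parent: Option<Ver>,
    root_id: u64, // versions in different trees are never compatible
    depth: u32,   // distance from the root, bounds the ancestor walk
}

impl Ver {
    /// A fresh version with no compatible history (e.g. after a
    /// non-append-only change such as reassigning the non-master group).
    pub fn new_branch() -> Self {
        let root_id = NEXT_ROOT_ID.fetch_add(1, Ordering::Relaxed);
        Ver { inner: Arc::new(Node { parent: None, root_id, depth: 0 }) }
    }

    /// A child version after an append-only change. It stays compatible
    /// with `self` and all of `self`'s ancestors.
    pub fn bump(&self) -> Self {
        Ver {
            inner: Arc::new(Node {
                parent: Some(self.clone()),
                root_id: self.inner.root_id,
                depth: self.inner.depth + 1,
            }),
        }
    }

    /// True if `self` equals `other` or was derived from it only by `bump`s.
    pub fn is_descendant_of(&self, other: &Ver) -> bool {
        if self.inner.root_id != other.inner.root_id || self.inner.depth < other.inner.depth {
            return false;
        }
        let mut cur = self.clone();
        while cur.inner.depth > other.inner.depth {
            cur = cur.inner.parent.clone().expect("non-root node has a parent");
        }
        Arc::ptr_eq(&cur.inner, &other.inner)
    }

    /// Compatible if one version is an ancestor of (or equal to) the other.
    pub fn is_compatible_with(&self, other: &Ver) -> bool {
        self.is_descendant_of(other) || other.is_descendant_of(self)
    }
}
```

Two versions bumped independently from the same parent are siblings, so this sketch reports them as incompatible, which is the case a single `map_id`/`dag_id` cannot distinguish.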
Reviewed By: sfilipco
Differential Revision: D25607512
fbshipit-source-id: 478f81deee4d2494b56491ec4a851154ab7ae52d
Summary:
This makes it easier to check if set operations are using fast paths or not by
setting `RUST_LOG=dag=debug`.
Reviewed By: sfilipco
Differential Revision: D25598075
fbshipit-source-id: 1503a195268c0989d5166596f2c8a66e15201372
Summary:
See the previous diff for context. The new API will be used to check if two
dags are compatible.
Note: It can cause false positives in compatibility checks, which need a
more complex solution. See D25607513 in this stack.
Reviewed By: sfilipco
Differential Revision: D25598079
fbshipit-source-id: f5fc9c03d73b42fadb931038fe2e078881be955f
Summary:
It turns out `Arc::ptr_eq` is becoming unreliable, which causes fast paths
to be skipped and extreme slowness in some cases (e.g. `public & nodes`
iterating everything in `public`).
This diff adds an API for an IdMap to tell us its identity. That identity is
then used to replace the unreliable `Arc::ptr_eq`.
For an in-memory map, we just assign a unique number (per process) as its
identity on initialization. For an on-disk map, we use the type + path to
represent it.
Note: strictly speaking, this could cause false positives about
"maps are compatible", because two maps initially cloned from each other
can be mutated differently while their map_ids stay the same. That will
be addressed in upcoming diffs introducing a more complex but precise way to
track compatibility.
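A minimal sketch of the two identity schemes. The names `MapId`, `MemMap`, `DiskMap` and the `mem:`/`disk:` prefixes are hypothetical, not the real API. It also reproduces the false positive from the note: a clone keeps its id even after the two copies diverge.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical per-process counter for in-memory map identities.
static NEXT_MEM_MAP_ID: AtomicU64 = AtomicU64::new(0);

trait MapId {
    fn map_id(&self) -> String;
}

#[derive(Clone)]
struct MemMap {
    id: u64,
    names: Vec<String>,
}

impl MemMap {
    fn new() -> Self {
        // Each freshly created map gets a unique number; clones copy it.
        let id = NEXT_MEM_MAP_ID.fetch_add(1, Ordering::Relaxed);
        MemMap { id, names: Vec::new() }
    }
}

impl MapId for MemMap {
    fn map_id(&self) -> String {
        format!("mem:{}", self.id)
    }
}

struct DiskMap {
    path: PathBuf,
}

impl MapId for DiskMap {
    fn map_id(&self) -> String {
        // Type tag + path identifies an on-disk map.
        format!("disk:{}", self.path.display())
    }
}
```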
Reviewed By: sfilipco
Differential Revision: D25598076
fbshipit-source-id: 98c58f367770adaa14edcad20eeeed37420fbbaa
Summary: This makes it more flexible.
Reviewed By: kulshrax
Differential Revision: D24467604
fbshipit-source-id: 63023cf0dde2fb7eac592ac79008e4b7a62340c1
Summary: Make the parent function used by various graph building functions async.
Reviewed By: sfilipco
Differential Revision: D25353612
fbshipit-source-id: 31f173dc82f0cce6022cc2caae78369fdc821c8f
Summary:
It is no longer needed for building segments (replaced by "prepared flat
segments"). Remove it.
Reviewed By: sfilipco
Differential Revision: D25353613
fbshipit-source-id: aede9e33c3217a61b5b14aae5b128d8953bc578e
Summary: Make IdConvert async and migrate all its users.
Reviewed By: sfilipco
Differential Revision: D25350915
fbshipit-source-id: f05c89a43418f1180bf0ffa573ae2cdb87162c76
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25350912
fbshipit-source-id: fbaf638b16a9cf468b7530b19d699b7996ddc4f1
Summary: This will make the async migration easier.
Reviewed By: sfilipco
Differential Revision: D25350913
fbshipit-source-id: f33bdc0023ae0cc49601504b811991ea6813ff9e
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25350914
fbshipit-source-id: 9f2957731f13a28fdfab834de19763b8afcf8ffa
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345239
fbshipit-source-id: 684a0843ae32270aa9b537ef9a2b17a28c027e51
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345232
fbshipit-source-id: b8967ea51a6141a95070006a289dd724522f8e18
Summary:
Update DagAlgorithm and all its users to async. This makes it easier to make
IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345236
fbshipit-source-id: d6cf76723356bd0eb81822843b2e581de1e3290a
Summary:
Make it possible to use async functions in MetaSet functions.
It will be used when DagAlgorithm becomes async.
Reviewed By: sfilipco
Differential Revision: D25345229
fbshipit-source-id: 0469d572b56df21fbdbdfae4178377e572adbcda
Summary: This makes it easier to make DagAlgorithm async.
Reviewed By: sfilipco
Differential Revision: D25345234
fbshipit-source-id: 5ca4bac38f5aac4c6611146a87f423a244f1f5a2
Summary: `impl Trait` does not work with `async_trait`.
Reviewed By: sfilipco
Differential Revision: D25345238
fbshipit-source-id: e7890dbaeb162d44e072ea4428d045004608719b
Summary: This makes it easier to migrate to async.
Reviewed By: sfilipco
Differential Revision: D25345228
fbshipit-source-id: e819f0de5f805377a6977325216ef11b14d68c1d
Summary:
Marking IdConvert as Sync makes it usable as a trait object with async-trait.
See https://docs.rs/async-trait/0.1.41/async_trait/#dyn-traits
`dag` uses `dyn DagAlgorithm` a lot. In the future, when async is used more, the
trait object will be required to be Send or Sync. Just require it on the trait
to make our lives easier.
Marking `IdDagStore` as Send + Sync makes the async migration easier.
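A minimal sketch of why a `Send + Sync` supertrait bound helps with trait objects, using only `std` threads rather than `async-trait`; `IdConvertLike`, `VecMap` and `lookup_in_thread` are hypothetical names. Because the bounds live on the trait, every `dyn IdConvertLike` is `Send + Sync`, so an `Arc<dyn IdConvertLike>` can be moved to another thread without writing `dyn IdConvertLike + Send + Sync` at each use site.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical trait; the supertrait bounds make every `dyn IdConvertLike`
// automatically Send + Sync.
trait IdConvertLike: Send + Sync {
    fn vertex_name(&self, id: u64) -> String;
}

struct VecMap(Vec<String>);

impl IdConvertLike for VecMap {
    fn vertex_name(&self, id: u64) -> String {
        self.0[id as usize].clone()
    }
}

fn lookup_in_thread(map: Arc<dyn IdConvertLike>, id: u64) -> String {
    // Compiles without extra bounds: `thread::spawn` needs the closure to be
    // Send, which holds because `dyn IdConvertLike` is Send + Sync.
    thread::spawn(move || map.vertex_name(id)).join().unwrap()
}
```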
Reviewed By: sfilipco
Differential Revision: D25345231
fbshipit-source-id: 45b96057907cbe2a1d38fd424e7d4c963dd1b245
Summary: Use async function for the PrefixLookup trait.
Reviewed By: sfilipco
Differential Revision: D24840820
fbshipit-source-id: d22cac9f11b06e3127fa956e3f116cf232214125
Summary: This makes the trait objects slightly easier to use.
Reviewed By: sfilipco
Differential Revision: D24840821
fbshipit-source-id: 22fcdf13b62420302b562c309874e08360d02372
Summary: This makes `dyn IdConvert` include `PrefixLookup`.
Reviewed By: sfilipco
Differential Revision: D24840819
fbshipit-source-id: 8d4e25c534f6e4397ec6f643eb3aa116bff12a2c
Summary:
In the future, when async APIs are used, Python bindings will have lifetime
issues. Make it possible to clone the IdMap so the Python bindings can be made
to work.
Reviewed By: sfilipco
Differential Revision: D24840822
fbshipit-source-id: 6aa4e369c877c428ed39d2cbea79e6943836afa8
Summary: This makes NameSet more friendly for async use-cases, interface-wise.
Reviewed By: sfilipco
Differential Revision: D24806695
fbshipit-source-id: 6e640ba2666872a9128d6460e8b53d6a0e595e56
Summary:
Change the main API of NameSet to async. Use the `nonblocking` crate to bridge
the sync and async world for compatibility. Future changes will migrate
Iterator to async Stream.
Reviewed By: sfilipco
Differential Revision: D24806696
fbshipit-source-id: f72571407a5747a4eabe096dada288656c9d426e
Summary:
This method reconstructs a dag from clone data.
At the moment we only have a clone data construction method in Mononoke. It's
the Dag's job to construct and import the clone_data. We'll consolidate that at
a later time.
Reviewed By: quark-zju
Differential Revision: D24954823
fbshipit-source-id: fe92179ec80f71234fc8f1cf7709f5104aabb4fb
Summary:
This function is useful in Mononoke to compute the universal commit idmap
that is required for clone.
Reviewed By: quark-zju
Differential Revision: D24808327
fbshipit-source-id: 0cccd59bd7982dd0bc024d5fc85fb5aa5eafb831
Summary:
`flat_segments` are going to be used to generate CloneData. These segments will
be sent to a client repository and are going to bootstrap the iddag.
Reviewed By: quark-zju
Differential Revision: D24808331
fbshipit-source-id: 00bf9723a43bb159cd98304c2c4c6583988d75aa
Summary: This is the object that will be used to bootstrap a Dag after a clone.
Reviewed By: quark-zju
Differential Revision: D24808328
fbshipit-source-id: 2c7e97c027c84a11e8716f2e288500474990169b
Summary:
The goal is to reuse the functionality provided by AssignHeadOutcome for clone
purposes.
Reviewed By: quark-zju
Differential Revision: D24717924
fbshipit-source-id: e88f21ee0d8210e805e9d6896bc8992009bd7975
Summary:
I initially saw the incremental build as something that would be run in places
that had IdMap and IdDag stored side by side in process. I am now considering
using the incremental build in the tailing process to keep Segmented Changelog
artifacts up to date.
Since we update the IdMap before we update the IdDag, it is likely that we
will have runs that only update the IdMap and fail to update the IdDag. This diff
adds a mechanism for the IdDag to catch up.
Reviewed By: krallin
Differential Revision: D24516440
fbshipit-source-id: 3a99248451d806ae20a0ba96199a34a8a35edaa4
Summary:
I am wondering whether we should customize the serialization format for the
InProcessStore. I want to have a basis for the comparison before I proceed.
Reviewed By: quark-zju
Differential Revision: D24580273
fbshipit-source-id: d3ddfdc029dbdd84f60acace06fddc80b4d005f4
Summary:
This will be used to avoid 1-by-1 fetching for the changelog backend with
commit text stored remotely.
Reviewed By: sfilipco
Differential Revision: D24321293
fbshipit-source-id: 9695c72166cadc0b167e2ce7fde822cdf6b1cea8
Summary:
Turn on rust changelog (changelog2) for all hosts (except hgsql).
Turn on doublewrite backend for hg-dev hosts, triggered by pull.
Tests are mostly working, and I have been using it for weeks.
Reviewed By: singhsrb
Differential Revision: D24259759
fbshipit-source-id: b89a27f98a6d3d1e4ea187bf7b29f875d0e96e2e
Summary: It's no longer useful as the new abstract interface does not need it.
Reviewed By: sfilipco
Differential Revision: D24399516
fbshipit-source-id: 2b6735d2a26706c6a3e6b592d2f3ecfc874c94cb
Summary:
This verifies the abstraction and simplifies the code.
The new code uses non-master segments for add_heads, hence the test
changes.
Reviewed By: sfilipco
Differential Revision: D24399496
fbshipit-source-id: 39067ad88ade79b4f7758bcdaafc03e5f34ced91
Summary: This makes the main namedag.rs cleaner. The next step is to move MemNameDag.
Reviewed By: sfilipco
Differential Revision: D24399495
fbshipit-source-id: c1e79a60edd8597fe7264f04548e5312414241a7
Summary: This is the last non-abstract interface of NameDag.
Reviewed By: sfilipco
Differential Revision: D24399514
fbshipit-source-id: f39bb84a1851a4fe4d1f29e6b0961e6a153c943d
Summary:
There is a need to open AbstractNameDag cleanly from a path.
Abstract that.
Reviewed By: sfilipco
Differential Revision: D24399498
fbshipit-source-id: ca242cd929e8f5580120c01eeaa928f630c21ed7
Summary:
I copied the code since it's hard to implement using the macros.
In the future I plan to merge MemNameDag into AbstractNameDag
and remove the macros.
Reviewed By: sfilipco
Differential Revision: D24399517
fbshipit-source-id: 326e76cd06a6e1ad26b39bcb51ba0ff24106c984
Summary: The `delegate!` is updated to support complex `impl`s.
Reviewed By: sfilipco
Differential Revision: D24399518
fbshipit-source-id: b9ba31174472cce4248e9644611cfc207abc3c1d
Summary: Will be used as bounds for abstraction.
Reviewed By: sfilipco
Differential Revision: D24399497
fbshipit-source-id: 343be12237d4850fbde9ebbe4034469527bd77fc
Summary: The `snapshot` field can be used instead.
Reviewed By: sfilipco
Differential Revision: D24399507
fbshipit-source-id: 67de20d897b8b763f724f3ccbd46618dec7911b9
Summary:
The trait requires an `IdMap` snapshot to be locally ready. That's not easy for
all possible implementations. Drop it to simplify things.
Reviewed By: sfilipco
Differential Revision: D24399501
fbshipit-source-id: 4d85f77c99208cda30b2a543a0bb5b295f49a65c
Summary: There were two `prepare_filesystem_sync` implementations. Unify them into one.
Reviewed By: sfilipco
Differential Revision: D24399513
fbshipit-source-id: 80d009c33b7f23dc2c4225da6fd0fb09589ba061
Summary: A more general-purpose type for Syncable{IdDag,IdMap}.
Reviewed By: sfilipco
Differential Revision: D24399502
fbshipit-source-id: 0599db6dd07fe3d430458f86a33a9144d850fca1
Summary: This makes it more generic.
Reviewed By: sfilipco
Differential Revision: D24399493
fbshipit-source-id: 8a1d0a13dd29989b17fe3ef1497b10b6fa0629d6
Summary: Similar to IdDag change, move impls to separate files.
Reviewed By: sfilipco
Differential Revision: D24399508
fbshipit-source-id: 575b6e7194677b67b6755b0a30ae7d014d498b10
Summary:
The lock, reload, mutate, persist pattern is general. It can be used for IdMap
too.
Reviewed By: sfilipco
Differential Revision: D24399512
fbshipit-source-id: d25e51ba735061ca101101d75aff95deb88b1d36
Summary:
Now `build_segments_persistent` and `build_segments_volatile` are the same.
Just keep one of them.
Reviewed By: sfilipco
Differential Revision: D24399511
fbshipit-source-id: a9f1ac920cdf5b448bd99bf9b6d4ca4160ba0304
Summary:
Previously, we kept the last high-level segment per level in memory, and
dropped it on disk. When crossing the memory / disk boundary, we had to
maintain such properties carefully. That was needed because some DAG
algorithms relied on complete high-level segments.
Now that no DAG algorithms depend on such properties, let's just drop
the logic adding the last segment back to simplify the code.
This removes the need to build segments after open() and sync().
Reviewed By: sfilipco
Differential Revision: D24399515
fbshipit-source-id: 4c640d9aa03c050fcd97f70ee386e32d3a8ee26d
Summary:
This makes the algorithm a bit more robust. Now none of the DAG algorithms
depend on high-level segments being complete and covering all low-level segments.
This also removes constraints. For example, SyncableIdDag can now just
deref() to the normal IdDag for queries without worrying about correctness.
Reviewed By: sfilipco
Differential Revision: D24399503
fbshipit-source-id: e6a91010cff82264cf423e2f24dee1d372822ef6
Summary:
They depend on high-level segments covering low-level segments, which
adds extra complexity. Remove them to simplify logic.
Reviewed By: sfilipco
Differential Revision: D24399509
fbshipit-source-id: 56a8e06c263107d1da4d6754b884ce51e18e30bf
Summary: The panics can happen when the input sets are out of range.
Reviewed By: kulshrax
Differential Revision: D24191789
fbshipit-source-id: efbcbd7f6f69bd262aa979afa4f44acf9681d11e
Summary:
Some sort of serialization for the Dag is useful for saving the IdDag produced
by offline jobs and loading it when a Mononoke server starts.
Reviewed By: quark-zju
Differential Revision: D24096964
fbshipit-source-id: 5fac40f9c10a5815fbf5dc5e2d9855cd7ec88973
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).
This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.
---
*Why now?* **:** The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.
---
Reviewed By: zertosh
Differential Revision: D23568779
fbshipit-source-id: 477200f35b280a4f6471d8e574e37e5f57917baf
Summary:
Based on fbsource data, building level 5 proves not to be useful.
Not building it would save 300ms in the write path.
Reviewed By: sfilipco
Differential Revision: D23494505
fbshipit-source-id: ca795b4900af40dbfdaa463d36f3169413bf6a62
Summary:
Previously the IdMap's "Name -> Id" index simply ignored the "reassign
non-master" request. It turns out stale entries in that index can cause
issues, as demonstrated by the previous diff.
Update IdMap to actually remove non-master group entries from both indexes on
remove_non_master so it cannot have stale entries.
To optimize the index, the format of IdMap is changed from:
[ 8 bytes Id (Big Endian) ] [ Name ]
to:
[ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]
So the index can use a reference to the slice, instead of embedding the bytes, to
reduce index size.
The filesystem directory name for IdMap used by NameDag is bumped to `idmap2`
so it won't read the incompatible old `idmap` data.
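A minimal sketch of encoding and decoding one entry in the new layout. The function names and the group byte values (0 for master, 1 for non-master) are assumptions for illustration. Big-endian ids make byte-wise order match numeric order, and `decode_entry` returns the name as a slice borrowed from the entry, the kind of reference that lets an index point into the data instead of copying the name.

```rust
/// Hypothetical group encoding for this sketch: 0 = master, 1 = non-master.
const GROUP_MASTER: u8 = 0;
const GROUP_NON_MASTER: u8 = 1;

/// Encode an entry as [8 bytes Id (big endian)] [1 byte group] [name].
fn encode_entry(id: u64, group: u8, name: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(9 + name.len());
    buf.extend_from_slice(&id.to_be_bytes());
    buf.push(group);
    buf.extend_from_slice(name);
    buf
}

/// Decode an entry; returns None if it is shorter than the 9-byte header.
/// The name is a slice into `entry`, so no bytes are copied.
fn decode_entry(entry: &[u8]) -> Option<(u64, u8, &[u8])> {
    if entry.len() < 9 {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&entry[0..8]);
    let id = u64::from_be_bytes(id_bytes);
    Some((id, entry[8], &entry[9..]))
}
```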
Reviewed By: sfilipco
Differential Revision: D23494508
fbshipit-source-id: 3cb7782577750ba5bd13515b370f787519ed3894
Summary: Some vertexes can disappear from the graph!
Reviewed By: sfilipco
Differential Revision: D23494506
fbshipit-source-id: ecbf2a4169e5fc82596e89a4bfe4c442a82e9cd2
Summary: The TestDag struct will be used to do some more complicated tests.
Reviewed By: sfilipco
Differential Revision: D23494507
fbshipit-source-id: 11350f9e448725ae49f50a7b6f19efc57ad84448
Summary:
At open time, it's pointless to attempt to create new levels. So let's just
read the existing max_level and not try to build max_level + 1.
This turns out to save 300ms in the profiling results.
Reviewed By: sfilipco
Differential Revision: D23494509
fbshipit-source-id: 4ea326a3cc21792790ea0b87e5bf608a94ae382b
Summary:
Change dag_ops benchmarks to use different IdDagStores. An example run shows:
benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
building segments (old) 856.803 ms
building segments (new) 127.831 ms
ancestors 54.288 ms
children (spans) 619.966 ms
children (1 id) 12.596 ms
common_ancestors (spans) 3.050 s
descendants (small subset) 35.652 ms
gca_one (2 ids) 164.296 ms
gca_one (spans) 3.132 s
gca_all (2 ids) 270.542 ms
gca_all (spans) 2.817 s
heads 247.504 ms
heads_ancestors 40.106 ms
is_ancestor 108.719 ms
parents 243.317 ms
parent_ids 10.752 ms
range (2 ids) 7.370 ms
range (spans) 23.933 ms
roots 620.150 ms
benchmarking dag::iddagstore::in_process_store::InProcessStore
building segments (old) 790.429 ms
building segments (new) 55.007 ms
ancestors 8.618 ms
children (spans) 196.562 ms
children (1 id) 2.488 ms
common_ancestors (spans) 545.344 ms
descendants (small subset) 8.093 ms
gca_one (2 ids) 24.569 ms
gca_one (spans) 529.080 ms
gca_all (2 ids) 38.462 ms
gca_all (spans) 540.486 ms
heads 103.930 ms
heads_ancestors 6.763 ms
is_ancestor 16.208 ms
parents 103.889 ms
parent_ids 0.822 ms
range (2 ids) 1.748 ms
range (spans) 6.157 ms
roots 197.924 ms
benchmarking dag::iddagstore::bytes_store::BytesStore
building segments (old) 724.467 ms
building segments (new) 90.207 ms
ancestors 23.812 ms
children (spans) 348.237 ms
children (1 id) 4.609 ms
common_ancestors (spans) 1.315 s
descendants (small subset) 20.819 ms
gca_one (2 ids) 72.423 ms
gca_one (spans) 1.346 s
gca_all (2 ids) 116.025 ms
gca_all (spans) 1.470 s
heads 155.667 ms
heads_ancestors 19.486 ms
is_ancestor 51.529 ms
parents 157.285 ms
parent_ids 5.427 ms
range (2 ids) 4.448 ms
range (spans) 13.874 ms
roots 365.568 ms
Overall, InProcessStore > BytesStore > IndexedLogStore. The InProcessStore
uses `Vec<BTreeMap<Id, StoreId>>` for the level-head index, which is more
efficient on the "Level" lookup (Vec), and more cache efficient (BTree).
BytesStore outperforms IndexedLogStore because it does not need to verify
checksum on every read access - the checksum was verified at store creation
(IdDag::from_bytes).
Note: The `BytesStore` is optimized for serialization, and hasn't been sent out yet.
Reviewed By: sfilipco
Differential Revision: D23438174
fbshipit-source-id: 6e5f15188e3b935659ccde25fac573e9b963b78f
Summary: This allows them to use the SyncableIdDag APIs.
Reviewed By: sfilipco
Differential Revision: D23438170
fbshipit-source-id: 7ec7288cfb8186b88f85f0212a913cb0dffe7345
Summary: Other IdDagStores can also use the API. This will be used in benchmarks.
Reviewed By: sfilipco
Differential Revision: D23438180
fbshipit-source-id: 565552b66372dcfbb268c397883f627491d6e154
Summary:
Similar to `IdDagStore::sync` -> `GetLock::persist`, `reload` is more related
to filesystem/internal state exchange, and should be protected by a lock. So
let's move the API there, and require a lock.
Reviewed By: sfilipco
Differential Revision: D23438169
fbshipit-source-id: 4228106b7739a1a758677adfddd213ad54aa4b6a
Summary:
`NameDag::reload` is used in `flush` to get a "fresh" NameDag.
In a future diff the `IdDag::reload` API gets changed, so let's
remove NameDag's use of it.
Instead, let's just re-`open` the path to get a fresh NameDag.
It's a bit more expensive but probably okay, and easier to understand.
`get_new_segment_size()` was added as an internal API to preserve tests.
This also solves an issue where `NameDag` could not recover properly if its
`flush` failed: the old `NameDag` state is no longer lost.
After removing `NameDag::reload`, `IdMap::reload` is no longer used publicly
and was made private.
Reviewed By: sfilipco
Differential Revision: D23438179
fbshipit-source-id: 0a32556a2cd786919c233d7efcae1cb9cbc5fb09
Summary:
The word "sync" is bi-directional: flush + reload. That was indexedlog::Log's
behavior. However, in the IdDag context "sync" is confusing - it is actually
only used to write data out, under the protection of a lock. Rename it to `persist`
to clarify that it's memory -> disk. In addition, require a reference to a lock
object as lightweight proof that some lock is held.
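A minimal sketch of the "reference to a lock as proof" idea, assuming a hypothetical `DirLock` backed by a `std::sync::Mutex` (the real code deals with on-disk locks; `LockProof` and `Store` are illustrative names too). `persist` takes `&LockProof<'_>`, which in this sketch is only handed out by `DirLock::lock`, so a caller must hold the lock to call it; the argument itself is never read.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical lock standing in for a filesystem lock.
struct DirLock {
    mutex: Mutex<()>,
}

/// Proof token: holds the guard, so the lock stays held while it lives.
/// In a real module a private field would stop other modules from forging it.
struct LockProof<'a> {
    _guard: MutexGuard<'a, ()>,
}

impl DirLock {
    fn lock(&self) -> LockProof<'_> {
        LockProof { _guard: self.mutex.lock().unwrap() }
    }
}

struct Store {
    pending: Vec<u64>, // in-memory changes not yet written out
    on_disk: Vec<u64>, // simulated persisted data
}

impl Store {
    /// Memory -> disk only. The proof argument is unused at runtime; requiring
    /// it makes "persist without holding the lock" hard to write by accident.
    fn persist(&mut self, _lock: &LockProof<'_>) {
        self.on_disk.append(&mut self.pending);
    }
}
```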
Reviewed By: sfilipco
Differential Revision: D23438175
fbshipit-source-id: 3d9ccd7431691d1c4e2ee74f3c80d95f5e7243b5
Summary:
This removes the need of cloning `IdMap`.
SyncableIdMap is a bit tricky. I added some comments to clarify things.
Reviewed By: sfilipco
Differential Revision: D23438176
fbshipit-source-id: fe66071da07067ed6c53a6437790af1d81b28586
Summary:
Make the test cover IndexedLogIdDagStore. The only change is the parent index
returns children in a different order.
Reviewed By: sfilipco
Differential Revision: D23438173
fbshipit-source-id: bcfabcd329e45bbc5e7e773103fa42307c23c35d
Summary: Make it possible to test other IdDagStores.
Reviewed By: sfilipco
Differential Revision: D23438178
fbshipit-source-id: e5fc1b20833c71dd7569c77c31c76a26a6e357fe
Summary:
Now that SpanSet can easily support `push_front`, we can just use SpanSet
efficiently without SpanSetAsc.
Reviewed By: sfilipco
Differential Revision: D23385246
fbshipit-source-id: b2e0086f014977fa990d5142e6eee844293e7ca5
Summary: To remove SpanSetAsc, its API needs to be implemented on SpanSet.
Reviewed By: sfilipco
Differential Revision: D23385250
fbshipit-source-id: ebd9d537287b5c1cde6e2c52ffb6da57dbd71852
Summary: This will make it possible to `push_front` and remove SpanSetAsc special case.
Reviewed By: sfilipco
Differential Revision: D23385249
fbshipit-source-id: 63ac67e9bce7cb281236399b3fb86eba23bbf8a0
Summary:
This makes it easier to replace Vec<Span> with VecDeque<Span> in SpanSet for
efficient push_front and deprecates SpanSetAsc (which uses Id in a bit hacky
way - they are not real Ids).
Reviewed By: sfilipco
Differential Revision: D23385245
fbshipit-source-id: b612cd816223a301e2705084057bd24865beccf0
Summary:
Previously the `is_valid()` function only checked ordering.
Make it also check "no mergeable adjacent spans" and `span.low<=span.high`.
To provide better debug messages, the function performs assertions
directly instead of returning a bool.
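A minimal sketch of the three checks, assuming spans are inclusive `low..=high` pairs kept in ascending order (the real `SpanSet`'s internal order and types may differ; `Span` and `assert_valid` are hypothetical names). Each check panics with a message naming the offending spans instead of returning `false`.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    low: u64,
    high: u64,
}

/// Asserts the invariants for spans kept in ascending order (an assumption of
/// this sketch): low <= high, strict ordering without overlap, and no two
/// adjacent spans that could be merged.
fn assert_valid(spans: &[Span]) {
    for s in spans {
        assert!(s.low <= s.high, "span {:?} has low > high", s);
    }
    for w in spans.windows(2) {
        let (a, b) = (w[0], w[1]);
        // Checked first, so `a.high + 1` below cannot overflow.
        assert!(a.high < b.low, "spans {:?} and {:?} are out of order or overlap", a, b);
        assert!(a.high + 1 < b.low, "spans {:?} and {:?} are adjacent and should be merged", a, b);
    }
}
```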
Reviewed By: sfilipco
Differential Revision: D23385247
fbshipit-source-id: 84829e9242e47e68dc2a4b2a6775b13331eba959
Summary:
Previously, `SpanSet::from_sorted_spans` allowed adjacent spans like
`[1..=2, 3..=4]`, while `SpanSet::from_spans` would merge them into `[1..=4]`.
Change it so `SpanSet::from_sorted_spans` merges them too. This simplifies
the `contains` logic and could make some Sets more efficient.
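A minimal sketch of the new merging behavior on inclusive `(low, high)` pairs, assuming the input is sorted ascending by `low` (the real `SpanSet` types and ordering may differ; this free function is a hypothetical stand-in). Adjacent spans are merged as well as overlapping ones, so the output never contains two touching spans.

```rust
/// Builds a span list from spans sorted ascending by `low`, merging
/// overlapping *and* adjacent spans, so [1..=2, 3..=4] becomes [1..=4].
fn from_sorted_spans(sorted: impl IntoIterator<Item = (u64, u64)>) -> Vec<(u64, u64)> {
    let mut out: Vec<(u64, u64)> = Vec::new();
    for (low, high) in sorted {
        if let Some(last) = out.last_mut() {
            // `+ 1` makes touching spans (e.g. ..=2 and 3..) mergeable.
            if last.1.saturating_add(1) >= low {
                last.1 = last.1.max(high);
                continue;
            }
        }
        out.push((low, high));
    }
    out
}
```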
Reviewed By: sfilipco
Differential Revision: D23385248
fbshipit-source-id: 85b5ba9533f15034779e93255085a4fa09c6328a
Summary: Set a default limit so the output won't be too long.
Reviewed By: DurhamG
Differential Revision: D23307792
fbshipit-source-id: 7e2ed99e96bbde06436a034e78f899fc2e3e03f8
Summary: Will be used to simplify code.
Reviewed By: sfilipco
Differential Revision: D23269859
fbshipit-source-id: bed0c4dca075ff60900025642af1d84bdd03452d
Summary:
`impl<T> Trait for T` in the current Rust makes it impossible to have
`impl<Q> Trait for Q`. Avoid using it for IdConvert and PrefixLookup.
Reviewed By: sfilipco
Differential Revision: D23269861
fbshipit-source-id: a837f3984ff4e1bd5a3983dd1642b9f064f51a36
Summary:
`impl<T> Trait for T` in the current Rust makes it impossible to have
`impl<Q> Trait for Q`. Avoid using it for DagAlgorithm.
Reviewed By: sfilipco
Differential Revision: D23269860
fbshipit-source-id: 031e75e9bf1f1eec2b9e8f36220ef8b817a143a5
Summary: LowLevelAccess is a subset of NameDagStorage. Use the latter instead.
Reviewed By: sfilipco
Differential Revision: D23269865
fbshipit-source-id: 81ebb1e986d8b02c968a9a237ad9a97d4afd54bf
Summary:
If there are too many heads, the current `descendants` algorithm would visit
all "old" heads. For example, with this graph:
head9999 (N9999)
/
Z (master)
:
: (many heads)
:/
: head2 (N2)
:/
C head1 (N1)
|/
B head0 (N0)
|/
A
`A::head9999` or `Z::head9999` will visit N0, N1, ..., N9999, because
`descendands_up_to` is provided with `max_id = N9999` and Z, as a vertex in the
master group, comes before N0 in non-master. The current algorithm also means
`descendands_up_to` gets linearly slower as the user uses the repo more, which
is quite undesirable.
This diff changes `descendands_up_to` to take an `ancestors` set, which is
`::head9999` in this case, and iterate non-master flat segments in it. So it
will skip N0 to N9998 directly by finding the N9999 flat segment and only use
it. The number of heads will have a smaller impact on performance.
Another slowness is `draft::draft_heads`: if there are too many `draft_heads`,
the internal calculation of `::draft_heads` can be slow. Optimize it by
limiting `draft_heads` to `draft:`. Practically this affects `y::` revset as
`y::` is translated to `y::visible_heads` and `visible_heads` can be large.
`cargo bench --bench dag_ops -- '::-master'` shows significant difference:
Before:
range (master::draft) 18.112 s
range (recent_draft::drafts) 2.594 s
After:
range (master::draft) 72.542 ms
range (recent_draft::drafts) 14.932 ms
In my fbsource checkout there were 20k+ heads. The improvement of
`master::recent_draft` (`x::y`) is pretty visible, and `y::` is also improved:
% lhg debugbenchmarkrevsets -m -x 'p1(min(7e8c86ae % master))' -Y 'draft() & 7e8c86ae' -e 'x::y' -e 'y::' --no-default
# x: 168f5228e570fb6b2ff7f851bd82413102748d84 (p1(min(7e8c86ae % master)))
# y: 7e8c86aec68ebc6e0b8254afcb381315991fd21c (draft() & 7e8c86ae)
# before
| revset \ backend | segments | revlog | revlog-cpy |
|------------------|----------|--------|------------|
| x::y | 17ms | 0.1ms | 0.5ms |
| y:: | 3.3ms | 0.7ms | 1.3ms |
# after
| revset \ backend | segments | revlog | revlog-cpy |
|------------------|----------|--------|------------|
| x::y | 0.2ms | 0.1ms | 0.6ms |
| y:: | 1.0ms | 0.7ms | 1.3ms |
Reviewed By: sfilipco
Differential Revision: D23214387
fbshipit-source-id: 4d11db84cd28f4e04e8b991cbc650c9d5781fd27
Summary:
A graph with lots of non-master heads is not exercised by the benchmarks.
Add it since it happens in practice. This will be used by the next change.
Reviewed By: sfilipco
Differential Revision: D23259879
fbshipit-source-id: 7fe290d14403e42e6d135bde56e2d5c8519ae530
Summary:
Currently the fuzz test only uses the master group. Let it exercise non-master
group too.
Reviewed By: DurhamG
Differential Revision: D23214388
fbshipit-source-id: 7108a1055fbdda2b012f93c5948fb83ef3b9a96f
Summary:
Provide a way to print out all segments with resolved names. This will be used
in a debug command.
Reviewed By: sfilipco
Differential Revision: D23196410
fbshipit-source-id: 1712bfda0271aa548699fe4a6b8603c5ec07af7f
Summary:
Use the parent-child index to answer children query quickly.
`cargo bench --bench dag_ops -- children`:
Before:
children (spans) 606.076 ms
children (1 id) 124.105 ms
After:
children (spans) 602.999 ms
children (1 id) 10.777 ms
Reviewed By: sfilipco
Differential Revision: D23196411
fbshipit-source-id: 37195d5ccaa582d35314e0000352ef477287d38c
Summary: This will be used to optimize "children(single vertex)" query.
Reviewed By: sfilipco
Differential Revision: D23196409
fbshipit-source-id: 050c0859faf83b909e3174bb7c7bd6e7725165c0
Summary:
Update the parent index to store non-master group too. To make
"remove_non_master" work, the index contains a "child group" prefix that
allows efficient range invalidation.
This will allow answering "children(single vertex)" query more efficiently.
This diff does not expose an API to query the index yet.
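A minimal sketch of the key layout, using an in-memory `BTreeSet` in place of the on-disk index. The `(child group, parent, child)` tuple order, the group numbers, and the `ParentIndex` API are assumptions for illustration. Because the child's group is the leading key component, removing every non-master entry is a single range split, while a children query does one range scan per group.

```rust
use std::collections::BTreeSet;

// Hypothetical group numbering for this sketch.
const MASTER: u8 = 0;
const NON_MASTER: u8 = 1;

/// Parent -> child index keyed by (child group, parent, child).
#[derive(Default)]
struct ParentIndex {
    keys: BTreeSet<(u8, u64, u64)>,
}

impl ParentIndex {
    fn insert(&mut self, child_group: u8, parent: u64, child: u64) {
        self.keys.insert((child_group, parent, child));
    }

    /// children(single vertex): one range scan per group.
    fn children(&self, parent: u64) -> Vec<u64> {
        let mut result = Vec::new();
        for &group in &[MASTER, NON_MASTER] {
            let range = (group, parent, 0)..=(group, parent, u64::MAX);
            result.extend(self.keys.range(range).map(|&(_, _, child)| child));
        }
        result
    }

    /// remove_non_master: every key >= (NON_MASTER, 0, 0) is non-master,
    /// so one `split_off` drops the whole contiguous range.
    fn remove_non_master(&mut self) {
        let _dropped = self.keys.split_off(&(NON_MASTER, 0, 0));
    }
}
```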
Reviewed By: sfilipco
Differential Revision: D23196406
fbshipit-source-id: 9137da5ffa8306bdafbcabc06b6f0d23f38dcf57
Summary:
Practically, the input of `children` is often one vertex instead of a large set.
Add a benchmark for it.
It looks like:
children (spans) 606.076 ms
children (1 id) 124.105 ms
Reviewed By: sfilipco
Differential Revision: D23196407
fbshipit-source-id: 0645b59ac846836fd061386384f6386a57661741
Summary: They can be figured out at Hints initialization time. So they don't need to be mutable.
Reviewed By: sfilipco
Differential Revision: D23182518
fbshipit-source-id: 133375fdf27a2546a50b63fb130534acdadc5938
Summary:
Both IdSet and IdLazySet require both Dag and IdMap to construct.
This is step 1 towards making Dag and IdMap immutable in hints.
A typo of "lhs" vs "hints" in the union set was discovered by the change
and fixed.
Reviewed By: sfilipco
Differential Revision: D23182520
fbshipit-source-id: 3d052de4b8681d3672ebc45d953d1e784f64b2a4
Summary:
It will be used in places (ex. tests) where a Dag is required but constructing
a real Dag is troublesome.
Reviewed By: sfilipco
Differential Revision: D23182517
fbshipit-source-id: 736911365778e5071c1e0b9615090a4e960392a0
Summary: This is more consistent with `id_map_snapshot`.
Reviewed By: sfilipco
Differential Revision: D23182519
fbshipit-source-id: 62b7fc8bfdc9d6b3a4639a6518ea084c7f3807dd
Summary:
Similar to descendants, the new range algorithm avoids potentially expensive
checks about whether high-level segments can be used or not. Practically this
is overall an improvement.
`cargo bench --bench dag_ops -- range`:
Before:
range (2 ids) 115.380 ms
range (spans) 243.666 ms
After:
range (2 ids) 123.274 ms
range (spans) 23.101 ms
It is 100x faster with the range x::y benchmark added later on `git.git`.
Reviewed By: sfilipco
Differential Revision: D23106175
fbshipit-source-id: 691e0418ba2b7ad9f52ac15b5cd6088ec28d5f48
Summary:
The old algorithm tries to make use of high-level segments.
However, the code to test whether a high-level segment can be used is
often too expensive for the benefit. In practice, high-level segments cannot
be used most of the time, so the cost is similar to O(flat segments).
This diff adds a simpler algorithm that just iterates through the flat
segments. It's faster in most practical cases.
`cargo bench --bench dag_ops -- descendants` shows improvements too:
Before:
descendants (small subset) 436.515 ms
After:
descendants (small subset) 33.460 ms
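The flat-segment iteration described above can be sketched as follows. This is a simplified illustration, not the `dag` crate's code. `FlatSegment` and `descendants` are hypothetical. Each segment is assumed to be a linear chain `low..=high` whose head `low` has the listed parents, segments are sorted by `low`, and every parent Id is smaller than the `low` that references it.

```rust
use std::collections::BTreeSet;

/// Hypothetical flat segment: Ids `low..=high` form a linear chain
/// (each Id's only parent is Id - 1), except `low`, whose parents are listed.
pub struct FlatSegment {
    pub low: u64,
    pub high: u64,
    pub parents: Vec<u64>,
}

/// `descendants(roots)` in one pass over flat segments sorted by `low`.
pub fn descendants(segments: &[FlatSegment], roots: &BTreeSet<u64>) -> BTreeSet<u64> {
    let mut result = BTreeSet::new();
    for seg in segments {
        // Parents are smaller than `low`, so they were decided by earlier
        // segments. If any is a descendant, the whole chain is.
        let head_reachable = seg.parents.iter().any(|p| result.contains(p));
        let start = if head_reachable {
            Some(seg.low)
        } else {
            // Otherwise the chain is included from its smallest root onward.
            roots.range(seg.low..=seg.high).next().copied()
        };
        if let Some(start) = start {
            result.extend(start..=seg.high);
        }
    }
    result
}
```

Each segment is visited once and no high-level segment checks are needed, which is the trade-off the summary describes.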
Reviewed By: sfilipco
Differential Revision: D23106174
fbshipit-source-id: e6101483d8539b2b1c881be2ccfd0071f122352f
Summary: This will be used by upcoming changes.
Reviewed By: sfilipco
Differential Revision: D23106177
fbshipit-source-id: 9bf183f7464c06b801be64fd938db0babd544756
Summary: This internal struct will be used by upcoming changes.
Reviewed By: sfilipco
Differential Revision: D23106172
fbshipit-source-id: 6d5b9bc1c810984814d0912100acca38a2565a63
Summary:
The fuzz tests need `TestContext::id_dag()`, which was removed by D20471712 (1fb5acf242).
Restore it so fuzz tests can run. This is mainly to check the new `range`
function.
The `range` fuzz test relatively quickly found an issue caused by `>` being
written as `>=`.
Reviewed By: sfilipco
Differential Revision: D23106176
fbshipit-source-id: e9540cc932503a9d54246d24c70bac829fcb13df
Summary:
Read the git commit graph and migrate it to `dag::Dag`.
This allows using Rust dag abstractions on the git
commit graph.
Reviewed By: DurhamG
Differential Revision: D23095471
fbshipit-source-id: 2163701350ce82ce6e97074e56ad5877f3c9c158
Summary:
If there are no new master segments, it's still possible to have new non-master
segments. Fix the loop condition so we don't skip building non-master segments.
Reviewed By: sfilipco
Differential Revision: D23095465
fbshipit-source-id: 46eb9d5b5f2b04241981558646e0bc090652abce
Summary:
I noticed that high-level segments are somehow not built for non-master vertexes.
Add a test to demonstrate the issue.
Reviewed By: DurhamG, sfilipco
Differential Revision: D23095466
fbshipit-source-id: c5a6da14bdfabcf7c432f6c6dfe096c71cc10ee9
Summary: This is useful to investigate internals of dag calculations.
Reviewed By: sfilipco
Differential Revision: D23095473
fbshipit-source-id: 4750c1b4ffad32b1317051d17db9659aaaed59c4
Summary:
Follow-up to the previous change: actually use the flat segments to build
segments. This significantly improves performance. `cargo bench --bench dag_ops`
shows:
building segments (old) 774.109 ms
building segments (new) 143.879 ms
In addition, an `O(N^2)` update to `head_ids` is replaced. This improves performance
when the graph has many heads (ex. the mutation graph).
Reviewed By: sfilipco
Differential Revision: D23036080
fbshipit-source-id: 033565700f253c6f20e30a00adb6b579921d6679
Summary:
While testing the `obsolete()` set, I found that an in-memory segmented DAG takes
10x as long to build as a HashMap DAG.
Part of the inefficiency comes from using a translated "parent_func" that round-trips
through Id and Vertex in the segment building logic. This diff makes
`IdMap::assign_head` return flat segments, so we don't need a translated
"parent_func" to build flat segments.
This diff only adds checks to make sure the parent_func (Id version) matches
the segments. The next diff switches the segment building to not use the
translated parent_func.
Reviewed By: sfilipco
Differential Revision: D23036060
fbshipit-source-id: 99137f4b5be455cdf43218ba23eb3954b6d9e05a
Summary:
This affects the `tonodes` API in the Python world. In practice, this binds
the main commit graph to sets like draft and public.
The `ToSet` requirement on `DagAlgorithm` has to be removed to avoid a stack
overflow in rustc while resolving constraints.
Reviewed By: sfilipco
Differential Revision: D23036077
fbshipit-source-id: 912b924e29611680ab6b2ee4dbcd7ab39824409a
Summary: This will be useful for the `obsolete()` set.
Reviewed By: sfilipco
Differential Revision: D23036072
fbshipit-source-id: 2f944ef31cf19f902622d90545fa02b7dda89221
Summary:
If two sets have different IdMaps, their Ids cannot be compared directly;
doing so would give incorrect results.
Reviewed By: sfilipco
Differential Revision: D23036068
fbshipit-source-id: e800e8273b95c1f8174236e0f30445db7fd44556
Summary: This is similar to the previous change. This allows "binding" IdMaps to sets.
Reviewed By: sfilipco
Differential Revision: D23036058
fbshipit-source-id: ec1b1ec73e949ad4865aecf17bfcc5c1ca723e0d
Summary:
This trades a bit of performance (calculating the snapshot) for correctness (no
pointer reuse issues) and convenience (sets capture dag information with them,
enabling use-cases like converting a NameSet from another dag to the
current dag without requiring extra `dag` objects).
Reviewed By: sfilipco
Differential Revision: D23036067
fbshipit-source-id: 2e691f09ad401ba79dbc635e908d79e54dadca5e
Summary:
If `x` and `y` come from the same graph, `x & y` is more efficient than
`y & x` if `y` is larger. However, if `x` and `y` are from different
graphs, the `FULL` hint can no longer accurately predict which one
is larger. Therefore the swap should be avoided.
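A minimal sketch of the swap rule, assuming a hypothetical `Set` type that records which graph it belongs to (`graph`) and whether it carries the `FULL` hint. The intersection iterates its first operand and probes the second, so the `FULL` set is moved to the probing side only when both operands share a graph.

```rust
/// Hypothetical set type for this sketch only.
pub struct Set {
    /// Which graph the set comes from.
    pub graph: u32,
    /// Whether the set carries the FULL hint (all vertexes of its graph).
    pub full: bool,
    pub items: Vec<&'static str>,
}

/// Picks `(iterate, probe)` operands for `lhs & rhs`. Iterating the smaller
/// side is cheaper, and a FULL set is known to be the larger one only if
/// both operands come from the same graph.
pub fn order_for_intersection<'a>(lhs: &'a Set, rhs: &'a Set) -> (&'a Set, &'a Set) {
    if lhs.full && !rhs.full && lhs.graph == rhs.graph {
        (rhs, lhs)
    } else {
        (lhs, rhs)
    }
}

pub fn intersect(lhs: &Set, rhs: &Set) -> Vec<&'static str> {
    let (iter_side, probe_side) = order_for_intersection(lhs, rhs);
    iter_side
        .items
        .iter()
        .filter(|v| probe_side.items.contains(*v))
        .copied()
        .collect()
}
```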
Reviewed By: sfilipco
Differential Revision: D23036081
fbshipit-source-id: fe3970fc38c853b36689bfd0ee1dec20643ace78
Summary:
Sets like `obsolete()` and `merge()` could have a fast "contains" path:
just check the given commit without calculating the full set. It's also possible
to have a relatively efficient code path to return a StaticSet (for obsolete()),
or an IdStaticSet (for merge(), by checking flat segments). This diff adds a
`MetaSet` that allows defining the two fast paths separately.
This will be used for the `obsolete()` set in upcoming changes.
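The idea of two separately defined fast paths can be sketched like this. The `MetaSet` below is hypothetical and only mirrors the description: `contains_fn` answers membership cheaply, and `evaluate_fn` materializes the whole set once, caching the result in a `OnceCell`.

```rust
use std::cell::OnceCell;
use std::collections::BTreeSet;

/// Hypothetical set defined by a cheap membership test and a separate,
/// possibly expensive, full evaluation.
pub struct MetaSet {
    contains_fn: Box<dyn Fn(u64) -> bool>,
    evaluate_fn: Box<dyn Fn() -> BTreeSet<u64>>,
    evaluated: OnceCell<BTreeSet<u64>>,
}

impl MetaSet {
    pub fn new(
        contains_fn: impl Fn(u64) -> bool + 'static,
        evaluate_fn: impl Fn() -> BTreeSet<u64> + 'static,
    ) -> Self {
        Self {
            contains_fn: Box::new(contains_fn),
            evaluate_fn: Box::new(evaluate_fn),
            evaluated: OnceCell::new(),
        }
    }

    /// Fast path: answers membership without materializing the set.
    pub fn contains(&self, id: u64) -> bool {
        (self.contains_fn)(id)
    }

    /// Slow path: materializes the set on first use and caches it.
    pub fn evaluate(&self) -> &BTreeSet<u64> {
        self.evaluated.get_or_init(|| (self.evaluate_fn)())
    }
}
```

For a `merge()`-like set, `contains_fn` could check one commit's parent count while `evaluate_fn` scans everything; the two never need to share code.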
Reviewed By: sfilipco
Differential Revision: D23036059
fbshipit-source-id: 06e6f90e7e9511626a12cfa729c306ff539256d2
Summary:
Before this change, `flush` with no pending changes but a moved `master` would cause an
error, because the `parents_func` only contains "pending changes", aka. new
vertexes. The `parents_func` does not know `master`, and `master` is needed to
re-assign vertexes from the non-master to the master group.
With the snapshot API, things become easier. We just take a snapshot before
reloading, and use the snapshot to answer parent_names.
Reviewed By: sfilipco
Differential Revision: D22970569
fbshipit-source-id: 99a25857ba98792edff69985c16df118a560ffb0
Summary:
This API allows the underlying Dag to provide a snapshot. The snapshot can then
be used in places that do not want a lifetime (ex. NameSet).
Reviewed By: sfilipco
Differential Revision: D22970579
fbshipit-source-id: ededff82009fd5b4583f871eef084ec907b45d33
Summary:
Make it possible to snapshot a Dag. This is useful when another
struct wants access to the Dag without lifetimes. For example, the LazySet
might want to keep a snapshot of the Dag.
Reviewed By: sfilipco
Differential Revision: D22970568
fbshipit-source-id: 508c38d3ffac2ffcd2e682578c3c5e5787ea3bcf
Summary:
The only intended use of the inverse DAG is to implement the Python dag
interface in `dagutil.py`. The D22519589 (2d4d44cf3d) stack made the Python dag
interface optional. Therefore there is no need to keep the inverse DAG
interface, which is a bit tricky with respect to sorting.
Reviewed By: sfilipco
Differential Revision: D22970581
fbshipit-source-id: 58a126b41d992e75beaf76ece25cb578ee84760b
Summary:
This allows importing from other DAGs. It will be used to import revlog DAG to
the new segmented format.
Reviewed By: sfilipco
Differential Revision: D22970572
fbshipit-source-id: 0a183e7b64831574cc9c60d4639124d02d19cf43
Summary:
This allows dag to use renderdag in tests to verify graph results. Previously
this was hard because dag <-> renderdag would form a circular dependency.
It also makes it possible to implement more efficient and integrated fast paths
for graph rendering.
Reviewed By: sfilipco
Differential Revision: D22970570
fbshipit-source-id: 526497339bd7aa8898d1af4aa9cf6d2a6797aae0
Summary:
This is more complex than previous libraries, mainly because `dag` defines APIs
(traits) used by other code, which might raise error types that `dag` itself
is not interested in. `BackendError::Other(anyhow::Error)` is currently used to
capture errors that do not fit in `dag`'s predefined error types.
Reviewed By: sfilipco
Differential Revision: D22883865
fbshipit-source-id: 3699e14775f335620eec28faa9a05c3cc750e1d1
Summary:
All dependencies of revlogindex have migrated to concrete error types.
Let's migrate revlogindex itself. This allows compile-time type checks
and makes the error returned by revlogindex APIs more predictable.
Reviewed By: sfilipco
Differential Revision: D22857554
fbshipit-source-id: 7d32599508ad682c6e9c827d4599e6ed0769899c
Summary:
I thought it was just `roots & (::heads)`. It is actually more complex than
that.
Reviewed By: sfilipco
Differential Revision: D22657201
fbshipit-source-id: bd0b49fc4cdd2c516384cf70c1c5f79af4da1342
Summary:
No need to exhaust the entire IdLazySet if there are hints.
This is important to make `small & lazy` fast.
Reviewed By: sfilipco
Differential Revision: D22638462
fbshipit-source-id: 63a71986e6e254769c42eb6250c042ea6aa5808b
Summary:
When multiple DAGs (ex. a local DAG and a commit-cloud DAG) are involved,
certain fast paths become unsound. Namely, the fast paths of the FULL hint
should check DAG compatibility. For example:
localrepodag.all() & remotedag.all()
should not simply return `localrepodag.all()` or `remotedag.all()`.
Fix it by checking DAG pointers.
Since a StaticSet might be created without a DAG, add an optimization
to change `all & static` to `static & all`, so a StaticSet without a DAG
doesn't require a full DAG scan when intersected with other sets.
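A sketch of the compatibility check, assuming hypothetical `Dag` and `Set` types where each set optionally holds an `Arc<Dag>`. The `FULL` fast path is taken only when `Arc::ptr_eq` confirms both sets share a DAG, and `all & static` is flipped so the DAG-less static set is the one being iterated.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Hypothetical stand-in for a DAG; only its identity matters here.
pub struct Dag {
    pub all: BTreeSet<u64>,
}

/// Hypothetical set with a FULL hint and an optional DAG binding.
#[derive(Clone)]
pub struct Set {
    pub items: BTreeSet<u64>,
    pub full: bool,
    pub dag: Option<Arc<Dag>>,
}

/// True only if both sets are bound to the very same DAG object.
fn same_dag(a: &Set, b: &Set) -> bool {
    match (&a.dag, &b.dag) {
        (Some(x), Some(y)) => Arc::ptr_eq(x, y),
        _ => false,
    }
}

pub fn intersection(lhs: &Set, rhs: &Set) -> Set {
    // FULL fast paths: `all() & x == x`, valid only within one DAG.
    if lhs.full && same_dag(lhs, rhs) {
        return rhs.clone();
    }
    if rhs.full && same_dag(lhs, rhs) {
        return lhs.clone();
    }
    // `all & static` => `static & all`: iterate the DAG-less static set and
    // probe the FULL set, instead of iterating the whole DAG.
    let (iter_side, probe_side) = if lhs.full && rhs.dag.is_none() {
        (rhs, lhs)
    } else {
        (lhs, rhs)
    };
    let items = iter_side
        .items
        .iter()
        .copied()
        .filter(|v| probe_side.items.contains(v))
        .collect();
    // The result is left unbound to any DAG in this sketch.
    Set { items, full: false, dag: None }
}
```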
Reviewed By: sfilipco
Differential Revision: D22638454
fbshipit-source-id: 72396417e9c1238d5411829da8f16f2c6d4c2f3a
Summary:
Improve `fmt::Debug` so it fits better in the Rust and Python eco-system:
- Support Rust formatter flags. For example `{:#5.3?}`: `5` limits how many items
of a large set are shown, `3` sets the hex commit hash length, and `#` selects the
alternate form.
- Show commit hashes together with integer Ids for IdStaticSet.
- Use HG rev range syntax (`a:b`) to represent ranges for IdStaticSet.
- Limit spans to show for IdStaticSet, similar to StaticSet.
- Show only 8 chars of a long hex commit hash by default.
- Minor renames like `dag` -> `spans`, `difference` -> `diff`.
Python bindings use `fmt::Debug` as `__repr__` and will be affected.
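Reading the flags can be sketched with `std::fmt::Formatter::width` and `Formatter::precision`. `HexSet`, the `<static [...]>` output shape, and the default limit of 3 items are assumptions for illustration; the 8-char default hash length comes from the list above. `Formatter::alternate` reports the `#` flag, but the alternate form itself is not shown here.

```rust
use std::fmt;

/// Hypothetical set of hex commit hashes, used only to show how a
/// `fmt::Debug` impl can read formatter flags.
pub struct HexSet(pub Vec<String>);

impl fmt::Debug for HexSet {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Width (`{:5?}`) is reused as the number of items to show.
        let limit = f.width().unwrap_or(3);
        // Precision (`{:.3?}`) is reused as the hex hash length; default 8.
        let hex_len = f.precision().unwrap_or(8);
        let shown: Vec<&str> = self
            .0
            .iter()
            .take(limit)
            .map(|h| &h[..hex_len.min(h.len())])
            .collect();
        write!(f, "<static [{}", shown.join(", "))?;
        if self.0.len() > limit {
            write!(f, " + {} more", self.0.len() - limit)?;
        }
        write!(f, "]>")
    }
}
```

With three hashes, `format!("{:2.3?}", set)` would show two 3-char prefixes followed by `+ 1 more`.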
Reviewed By: sfilipco
Differential Revision: D22638455
fbshipit-source-id: 957784fec9c99c8fc5600b040d964ce5918e1bb4
Summary:
This makes the intersection set stop iterating early, which is useful for some
lazy sets. For example, the `ancestors(tip) & span` and `descendants(1) & span`
sets below can take seconds to calculate without this
optimization.
```
In [1]: cl.dag.ancestors([cl.tip()]) & cl.tonodes(bindings.dag.spans.unsaferange(len(cl)-10,len(cl)))
Out[1]: <and <lazy-id> <dag [...]>>
In [3]: %time len(cl.dag.ancestors([cl.tip()]) & cl.tonodes(bindings.dag.spans.unsaferange(len(cl)-10,len(cl))))
CPU times: user 364 µs, sys: 0 ns, total: 364 µs
Wall time: 362 µs
In [7]: %time len(cl.dag.descendants([repo[1].node()]) & cl.tonodes(bindings.dag.spans.unsaferange(0,100)))
CPU times: user 0 ns, sys: 574 µs, total: 574 µs
Wall time: 583 µs
```
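A simplified sketch of why stopping early helps. It assumes (hypothetically) that the lazy side yields Ids in strictly descending order, as ancestors of the tip might, and that the other side is a single Id span. Once an Id drops below the span's lower bound, no later Id can match. The ascending case, as with `descendants(1)`, is symmetric using the upper bound.

```rust
use std::ops::RangeInclusive;

/// Intersects a lazily produced, strictly descending stream of Ids with an
/// Id span, stopping once the stream falls below the span. Returns the
/// matches and how many Ids were pulled from the stream.
pub fn intersect_descending(
    lazy: impl Iterator<Item = u64>,
    span: RangeInclusive<u64>,
) -> (Vec<u64>, usize) {
    let mut pulled = 0;
    let mut out = Vec::new();
    for id in lazy {
        pulled += 1;
        if id < *span.start() {
            break; // every later Id is smaller, so nothing else can match
        }
        if span.contains(&id) {
            out.push(id);
        }
    }
    (out, pulled)
}
```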
Reviewed By: sfilipco
Differential Revision: D22638458
fbshipit-source-id: b9064ce2ff1aecc2d7d00025928dfcb3c0d78e0c
Summary:
The hint indicates a set `X` is equivalent to `ancestors(X)`.
This allows us to make `heads` use `heads_ancestors` (which is faster in
segmented changelog) automatically without affecting correctness. It also
makes special queries like `ancestors(all())` super cheap because it'll just
return `all()` as-is.
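A minimal sketch of the `ancestors(all())` shortcut, using a hypothetical `Hints` struct with an `ancestors` flag and a hypothetical `ancestors` function that takes a parent callback. When the flag is set, the input is returned without walking the graph; otherwise the walked result is marked with the flag.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Default)]
pub struct Hints {
    /// When true, the set is known to satisfy `X == ancestors(X)`.
    pub ancestors: bool,
}

pub struct Set {
    pub ids: BTreeSet<u64>,
    pub hints: Hints,
}

pub fn ancestors(set: Set, parents: impl Fn(u64) -> Vec<u64>) -> Set {
    if set.hints.ancestors {
        return set; // e.g. ancestors(all()) is just all()
    }
    let mut result = BTreeSet::new();
    let mut stack: Vec<u64> = set.ids.into_iter().collect();
    while let Some(id) = stack.pop() {
        if result.insert(id) {
            stack.extend(parents(id));
        }
    }
    // The result is ancestor-closed by construction.
    Set { ids: result, hints: Hints { ancestors: true } }
}
```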
Reviewed By: sfilipco
Differential Revision: D22638463
fbshipit-source-id: 44d9bbcbb0d7e2975a0c8322181c88daa1ba4e37
Summary: This was discovered by using it in the Python world.
Reviewed By: sfilipco
Differential Revision: D22323186
fbshipit-source-id: 295811e0950b94ad2ad73ad242228b6a3f9765d0
Summary: Some DAG implementations do not support it.
Reviewed By: sfilipco
Differential Revision: D22249158
fbshipit-source-id: ebcdf164677ee647ef44aa1ee3cfd318bac658b0
Summary:
Different implementations might return different orders. All of them should be
considered correct.
Reviewed By: sfilipco
Differential Revision: D22249159
fbshipit-source-id: 36e4cadf814366f7ee2ed8a778948ff810760550
Summary: This makes it possible to run tests for other DAGs, like the revlog.
Reviewed By: sfilipco
Differential Revision: D22249155
fbshipit-source-id: 205579eeaccd42a21297d965973957168bb8726e
Summary:
The reverse, `to_id_set`, already exists.
It turns out that the Python land wants this in many places.
Reviewed By: sfilipco
Differential Revision: D22240175
fbshipit-source-id: b6a3a3a3869dc0c521a21b1d86394421b816632b
Summary:
This provides a way for implementations to optimize the operation.
For segmented changelog, the default implementation is good enough.
For revlog, `only` can have a fast path that does not iterate through the
entire changelog.
A related API `only_both` is added. For revlog it has multiple use-cases,
including narrow-heads phase calculation and revlog.findcommonmissing used by
discovery.
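For reference, the default semantics can be sketched on a toy `HashMap` parent map: `only(reachable, unreachable)` is `ancestors(reachable) - ancestors(unreachable)`. The function names and the assumption that `only_both` also returns `ancestors(unreachable)` are illustrative, and the revlog fast path is not shown.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy DAG: vertex -> parents. Hypothetical, for illustration only.
pub type ParentMap = HashMap<u32, Vec<u32>>;

pub fn ancestors(dag: &ParentMap, from: &[u32]) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut stack = from.to_vec();
    while let Some(v) = stack.pop() {
        if seen.insert(v) {
            if let Some(ps) = dag.get(&v) {
                stack.extend(ps.iter().copied());
            }
        }
    }
    seen
}

/// Assumed shape: `(ancestors(reachable) - ancestors(unreachable), ancestors(unreachable))`.
pub fn only_both(
    dag: &ParentMap,
    reachable: &[u32],
    unreachable: &[u32],
) -> (BTreeSet<u32>, BTreeSet<u32>) {
    let excluded = ancestors(dag, unreachable);
    let only_set: BTreeSet<u32> = ancestors(dag, reachable)
        .difference(&excluded)
        .copied()
        .collect();
    (only_set, excluded)
}

/// Default-style `only`, derived from `only_both`.
pub fn only(dag: &ParentMap, reachable: &[u32], unreachable: &[u32]) -> BTreeSet<u32> {
    only_both(dag, reachable, unreachable).0
}
```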
Reviewed By: markbt
Differential Revision: D21944132
fbshipit-source-id: d11660dae85ea6158977eb00d1ceaceddf1d8234
Summary:
This makes it easier to remove cycles in other places.
There are probably fancier and more efficient algorithms for this.
For now I just wrote one that is easy to verify correctness.
Reviewed By: markbt
Differential Revision: D22174975
fbshipit-source-id: 8a2dc755e4bc0b066eda5f42a51208c92409f2f9
Summary: The trait converts NameSet to IdSet. It'll be used by the revlog index.
Reviewed By: sfilipco
Differential Revision: D21795869
fbshipit-source-id: 55f7a238158442db9d8bdfe84e64438be504f618
Summary: Add a way to invert the DAG (swap parent / children relations).
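One simple way to invert a DAG is to materialize a child map from the parent map, as sketched below with a hypothetical `invert` over a `BTreeMap`. The actual inverse DAG may instead be a view that swaps the `parents` and `children` queries.

```rust
use std::collections::BTreeMap;

/// Turns every edge `child -> parent` into `parent -> child`.
/// Vertexes without children still get an (empty) entry.
pub fn invert(parents: &BTreeMap<u32, Vec<u32>>) -> BTreeMap<u32, Vec<u32>> {
    let mut children: BTreeMap<u32, Vec<u32>> =
        parents.keys().map(|&v| (v, Vec::new())).collect();
    // BTreeMap iterates keys in ascending order, so each child list
    // ends up sorted ascending.
    for (&child, ps) in parents {
        for &p in ps {
            children.entry(p).or_default().push(child);
        }
    }
    children
}
```

Inverting twice gives back the original map when the parent lists are sorted, which makes a convenient sanity check.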
Reviewed By: sfilipco
Differential Revision: D21795870
fbshipit-source-id: 2d076f4ae491141aa758faa5f5f303c97f7e56dc
Summary:
Similar to LazySet, but the iterator yields Ids. This will be useful for
lazy calculations that are cheaper with Ids.
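A hypothetical `LazyIdSet` sketch: Ids come from a boxed iterator and are cached as they are pulled, so `contains` only advances the iterator as far as needed and repeated queries reuse earlier work. It is single-threaded (`RefCell`) and is not the crate's IdLazySet.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;

/// Hypothetical lazy Id set: Ids are produced on demand and cached.
pub struct LazyIdSet {
    state: RefCell<LazyState>,
}

struct LazyState {
    iter: Box<dyn Iterator<Item = u64>>,
    /// Ids pulled so far, in iteration order.
    pulled: Vec<u64>,
    /// The same Ids, for fast membership checks.
    lookup: BTreeSet<u64>,
    exhausted: bool,
}

impl LazyIdSet {
    pub fn new(iter: impl Iterator<Item = u64> + 'static) -> Self {
        LazyIdSet {
            state: RefCell::new(LazyState {
                iter: Box::new(iter),
                pulled: Vec::new(),
                lookup: BTreeSet::new(),
                exhausted: false,
            }),
        }
    }

    /// Checks cached Ids first, then advances the iterator only until `id`
    /// is found or the iterator runs out.
    pub fn contains(&self, id: u64) -> bool {
        let mut state = self.state.borrow_mut();
        if state.lookup.contains(&id) {
            return true;
        }
        while !state.exhausted {
            match state.iter.next() {
                Some(v) => {
                    state.pulled.push(v);
                    state.lookup.insert(v);
                    if v == id {
                        return true;
                    }
                }
                None => state.exhausted = true,
            }
        }
        false
    }

    /// Number of Ids computed so far.
    pub fn computed_len(&self) -> usize {
        self.state.borrow().pulled.len()
    }
}
```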
Reviewed By: sfilipco
Differential Revision: D21626208
fbshipit-source-id: 9a34fbf18f0039caeb4f6e698294c4d335354093