Summary:
The `add_heads_and_flush` method might add new nodes in the master group,
and it should update `overlay_map_next_id` accordingly. Without it, it
might error out like:
RustError: ProgrammingError: Server returned x~n (x = 9ebc9ebc49f1819767b40f9ceb22c37547a10c37 8459584, n = 1411).
But x exceeds the head in the local master group {}. This is not expected and indicates some logic error on the server side.
Full error: P387088806
Reviewed By: sfilipco
Differential Revision: D27637278
fbshipit-source-id: b45370db0561dec52cd513a12e2fd0110f18e0e5
Summary:
While it is in theory "correct" going through the remote resolution even if the
protocol is "local". The overhead turns out to be something. And the tracing
message "resolve .. remotely" can be quite noisy. Let's just skip remote
resolutions early in IdConvert implementations.
Reviewed By: sfilipco
Differential Revision: D27630094
fbshipit-source-id: 7d87079876f040cf8f764f7362021dddba0d4723
Summary:
Currently the "contains vertex" check can trigger excessive
fetches for add_heads (and add_heads_and_flush used by flush).
Add a test to demonstrate the problem.
Reviewed By: sfilipco
Differential Revision: D27630091
fbshipit-source-id: ce3639c2a13226ba5681b4e8696edd7acbcb57f9
Summary:
Otherwise it can cause a lazy dag to think vertexes as "missing", insert
vertexes unnecessarily, and potentially break key graph properties (a
vertex should only have one Id).
Reviewed By: sfilipco
Differential Revision: D27629315
fbshipit-source-id: 1688d13cb94015bbc675613ecf9225556ff48372
Summary:
Also move related functions.
Similar to D27547584 (af3c3b3fd0), this allows `add_heads_and_flush` use `IdConvert`
on the `NameDag`, instead of the `IdMap` to trigger remote fetches properly.
This diff is easier to view with whitespace changes ignored.
Reviewed By: sfilipco
Differential Revision: D27629314
fbshipit-source-id: 8f79223c5d324aabfc5ab9813a9f65400fc533ec
Summary:
See the previous diff for context. Drop Locked and related APIs (prepare_filesystem_sync).
This makes it easier to use operate on a mut NameDag on flush because it does not need
to use separate types (Locked) for writing which has issues of not having the remote protocol
involved.
Reviewed By: sfilipco
Differential Revision: D27629306
fbshipit-source-id: 301445b242321ad5f424741ea91ebf6c075bff1c
Summary:
See the previous diff for context. Drop SyncableIdMap so we are one step
closer to using mut NameDag directly on add_heads, which knows when and how to
do remote fetching properly.
Reviewed By: sfilipco
Differential Revision: D27629310
fbshipit-source-id: 883606e40bb83907dfa6142ddd2c3030de61698f
Summary:
By using SyncableIdDag and SyncableIdMap, it's harder to use extra features
around them (ex. remote fetching). Drop SyncableIdDag so we are one step
closer to using mut NameDag directly on add_heads, which knows when and how to
do remote fetching properly.
Reviewed By: sfilipco
Differential Revision: D27629307
fbshipit-source-id: 8e9a5a4348a42b9751752b82feb3f3d2d7c4ba45
Summary:
The `Parents` trait is used for input of adding (non-lazy) vertexes to the
graph. The API will be used to provide extra hints to optimize network
fetches.
With the current logic, `assign_head` will ask the server to resolve the heads
first, to check if it is already assigned, then to resolve the parents, etc. to
the roots (in the "to assign" set). Ideally the `assign_head` logic can ask
the server to resolve the roots first, and if that's unassigned, then just mark
all descendants of the roots as unassigned, do not send more requests.
Note: the current pull logic has all the hashes ready (hashes are known).
But whether the hashes have Id assigned are unknown. It is more tricky
taking the "lock" and "reload" into consideration - hashes without Ids might
"become" having ids assigned after we obtain the lock to write data to disk.
Practically, `pull` using the current logic would take a very long time because
it tries to resolve things remotely for every "to assign" commits.
Reviewed By: sfilipco
Differential Revision: D27629317
fbshipit-source-id: e02f54f43ef65228ce6e3a8a8723dd9ae0a07008
Summary: This just simplifies the test code a bit.
Reviewed By: sfilipco
Differential Revision: D27629308
fbshipit-source-id: 04eac5cd045c71123e7fc410af74609dbadb8fb7
Summary: This avoids triggering remote lookups if an unknown name was looked up multiple times.
Reviewed By: sfilipco
Differential Revision: D27629316
fbshipit-source-id: 64c1ce5e872a5ce4f14c650a946ae8396f4cc74c
Summary:
When translating RequestNameToLocation to ResponseIdNamePair. If "heads" are
known, but some "names" aren't. Do not treat it as an error. This will be
used by the client-side to properly handle the "contains" check.
Reviewed By: sfilipco
Differential Revision: D27629309
fbshipit-source-id: 206ec5df956b33f4e816ab8d6a67ce776fd7bd74
Summary: This will make it easier to test client / server dags in upcoming changes.
Reviewed By: sfilipco
Differential Revision: D27629318
fbshipit-source-id: e3137654613aa3208a8f2e4b9f4ddfe73871f2c5
Summary: This will be used in upcoming changes. It just delegates to the Arc inner.
Reviewed By: sfilipco
Differential Revision: D27629313
fbshipit-source-id: ba6cd7cac2b9f5c1a2898c439c53768995a2dc42
Summary: This will be used by upcoming changes.
Reviewed By: sfilipco
Differential Revision: D27629312
fbshipit-source-id: 6c56e73caf4e1a398ac3a8b4294bd9f380af3764
Summary: This will be used by upcoming changes.
Reviewed By: sfilipco
Differential Revision: D27629319
fbshipit-source-id: d19e490268561f3154642e5bb1e415da4c5d03ee
Summary:
Otherwise it might panic (ex. calling into tokio without entering a tokio
runtime). This can happen in, for example, `Debug::fmt(&IdStaticSet, ...)`.
Reviewed By: sfilipco
Differential Revision: D27581487
fbshipit-source-id: feec53e088706adcc6710afcf24fa70598f886cf
Summary:
This will be used by "add_heads" logic to detect what vertexes to insert
and might reduce remote fetches.
Reviewed By: sfilipco
Differential Revision: D27572359
fbshipit-source-id: d0bf027a69d180663a1587dfde613cb76b05072a
Summary: The API returns entries buffered in memory not persisted.
Reviewed By: sfilipco
Differential Revision: D27572360
fbshipit-source-id: 555988f7c891f2d928bfa1e486a0fc1d089b4ad5
Summary: This will be used to select "dirty" (not written to disk) set in the IdDag.
Reviewed By: sfilipco
Differential Revision: D27572361
fbshipit-source-id: 0b4d2e092ece835e3d4b6aa831d32ffffc7087ca
Summary:
Before this change, overlap IdMap was not considered for prefix lookup. That
results in "shortest" template not working and smartlog prints full hashes
for remote/stable etc.
Reviewed By: sfilipco
Differential Revision: D27547582
fbshipit-source-id: 7a56590775eed9d509f2212f8e5009c04aaf4e9d
Summary: It will be reused in NameDag.
Reviewed By: sfilipco
Differential Revision: D27547583
fbshipit-source-id: da85fc7504d20877210e8ed1a97cbec18d06eede
Summary:
Now NameSet iteration can be blocking, SyncNameSetQuery is no longer sound.
Remove SyncNameSetQuery in key logic (namedag and ops) and replace them with
async logic.
Reviewed By: sfilipco
Differential Revision: D27547581
fbshipit-source-id: af69b1a8219e97d10278939407ee79f9b333a77f
Summary: Dag algorithms like `parent_names` need to fetch vertexes via remote automatically.
Reviewed By: sfilipco
Differential Revision: D27547584
fbshipit-source-id: 8106931d6f642c9a4bf0f3c546ba881c2ca73fea
Summary:
The "filter" set's filter function might not be prepared for inputs outside
the parent set. So let's the "contains" function to test against the parent
set first, then test the filter function.
This fixes the "merge()" set's "contains" check using the revlog backend:
In [1]: v=repo.revs('draft() & merge()')
In [2]: v._set
Out[2]: <meta ?>
In [3]: m.node.nullid in v._set
Out[3]: False
Before this diff it would be:
In [3]: m.node.nullid in v._set
CommitLookupError: '0000000000000000000000000000000000000000 cannot be found'
Note: Segmented changelog backend is not affected because it does not use
filter sets.
Reviewed By: xavierd
Differential Revision: D27657502
fbshipit-source-id: 30bb261fea59bdc5644580e98796f52fa93c2705
Summary:
The issue is that `mut i: usize` is no longer shared across multiple `async
move` blocks (introduced by D27308798 (0df4efa969)).
Rewrite the logic to collect `ids` first, then use `vertex_name_batch`
query instead.
Reviewed By: sfilipco
Differential Revision: D27406586
fbshipit-source-id: b41fe3a13114dc34aa5763e6e2bebe0571decc87
Summary:
Merge paths like `x~n` and `x~(n+1)` to `x~n (batch_size = 2)`.
This could be more efficient bandwidth-wise and algorithm-wise.
Reviewed By: sfilipco
Differential Revision: D27406587
fbshipit-source-id: f2a67352ad627945685e33667e8299a2bc652930
Summary: IdSet is more compact. This changes the order a bit.
Reviewed By: sfilipco
Differential Revision: D27339279
fbshipit-source-id: e9b50a47beba081b892eccd7711dbd6ab5c3a886
Summary: This will be used by the next change.
Reviewed By: sfilipco
Differential Revision: D27406591
fbshipit-source-id: fcacc35a9ae8ed96cebb2af804d26d1e5e83ad9e
Summary:
Add a way to flush the overlay map to disk so we can avoid network fetches over
and over.
Reviewed By: sfilipco
Differential Revision: D27406592
fbshipit-source-id: 7086ad665119cc3a0834f533690325c7a2363442
Summary: It will be reused elsewhere.
Reviewed By: sfilipco
Differential Revision: D27406593
fbshipit-source-id: 296cf5f50830bb7285e0cb9c7c15a9b374689819
Summary:
I spent some time thinking about how to flush the "overlay_map" to disk.
It is a bit tricky because the on-disk IdMap might have changed in an
incompatible way. I tried a few ways to verify the on-disk IdMap remains
compatible and didn't find a way that looks good (either slow - calculating
universal_ids, or is not 100% correct in corner cases).
Now I come up with this "just track x~n" idea. It trades memory usage (~2x
overlay_map) for easy-to-verify correctness, and efficient overlay_map
flush.
Reviewed By: sfilipco
Differential Revision: D27406583
fbshipit-source-id: 0b7fb3186a9c15f376c1dc4afe7f0516c25d3dec
Summary: It is not obvious. So let's add more comments.
Reviewed By: sfilipco
Differential Revision: D27406584
fbshipit-source-id: 9ce1215efc1a6d4849180c6693616613c08f2a51
Summary:
A sparse dag does not have full IdMap. Its IdMap only contains "universally known" entries.
Add a basic test about cloning from a sparse clone data and resolve vertex <-> id mapping
on the fly.
Reviewed By: sfilipco
Differential Revision: D27352018
fbshipit-source-id: 4a3f5f50be52e91bf7b2021cdc858bcab9c99e80
Summary:
The `NameDag::flush` API will actually rebuild the graph using a "parent" function.
That is not necessary if we got clone data, and won't work well for a lazy graph
(since the parent function talks about vertex names and some names are missing).
Let's bypass the `flush` function and write data directly in `import_clone_data`.
Reviewed By: sfilipco
Differential Revision: D27352019
fbshipit-source-id: a79569d25d858447b8c5eb86902b8d39ae0429a3
Summary: This will be used in tests.
Reviewed By: sfilipco
Differential Revision: D27343882
fbshipit-source-id: 5a2d94a9f755eed0fc27e5a11093b55c810dc8da
Summary:
Implement logic to export the clone data. This allows us to construct a sparse/lazy
dag via export + import CloneData.
Reviewed By: sfilipco
Differential Revision: D27343885
fbshipit-source-id: 71dc0d31e36876a8b6a8c3d7f3498be3262ce297
Summary:
Clone data can only be imported to an empty Dag and universally known vertexes
should be present in the IdMap.
Reviewed By: sfilipco
Differential Revision: D27343888
fbshipit-source-id: ba150d6afdbe15f0902ec20ff150a70657e24c80
Summary: Collect "missing" items and only use one request to fetch them.
Reviewed By: sfilipco
Differential Revision: D27406588
fbshipit-source-id: a5cd091b39d90c1ad0e7c5d509673c4665232304
Summary:
This will be used by upcoming changes. Sparse/Lazy NameSet will override it
to reduce network round-trips.
Reviewed By: sfilipco
Differential Revision: D27406590
fbshipit-source-id: a44a73b4aec6e14d6e82d55285fe1cfc0fcfd482
Summary: This will be used by upcoming changes.
Reviewed By: sfilipco
Differential Revision: D27343884
fbshipit-source-id: 0938b1fb3d90b35f9d51c468cffca53e3f421bb8
Summary:
Remove some TODOs. This serves as fallback paths where batch prefetch didn't happen.
I'd expect most use-cases will trigger IdStatic set's batch prefetch logic (to be
added).
I haven't decided what to do exactly with "contains". Fetching remotely there seems
to require some kind of negative cache (ex. in mutation records there might be nodes
not in the graph). But it _might_ be okay to say the "contains" is a local-only
operation, too. I leave it as-is and we'll see how the Python world uses "contains"
later.
Reviewed By: sfilipco
Differential Revision: D27339275
fbshipit-source-id: ba70b3c84a391a8e395c73ccd1d7e08f92b0cbd0
Summary:
Put everything together. I used "programming error" extensively to
provide more context if we have to investigate issues in the future.
Reviewed By: sfilipco
Differential Revision: D27339278
fbshipit-source-id: 574a2c048dc1d24dbe690f862fec3e5078cb067a
Summary: Provide more context about what invariants we expect. Not just show "vertex not found".
Reviewed By: sfilipco
Differential Revision: D27339273
fbshipit-source-id: 1c6c92537ff37666ff603783adfd8f9ea770fbaa
Summary:
Makes NameDag own the state (logic) about how to send remote requests.
So NameDag can send requests on its own.
Reviewed By: sfilipco
Differential Revision: D27339282
fbshipit-source-id: 3cb6327dfeaefae45d4e7b88a3535463a84b195b