Commit Graph

392 Commits

Author SHA1 Message Date
Jun Wu
d1dabad879 dag: update overlay_map on flush
Summary:
The `add_heads_and_flush` method might add new nodes in the master group,
and it should update `overlay_map_next_id` accordingly. Without it, it
might error out like:

  RustError: ProgrammingError: Server returned x~n (x = 9ebc9ebc49f1819767b40f9ceb22c37547a10c37 8459584, n = 1411).
  But x exceeds the head in the local master group {}. This is not expected and indicates some logic error on the server side.

Full error: P387088806

Reviewed By: sfilipco

Differential Revision: D27637278

fbshipit-source-id: b45370db0561dec52cd513a12e2fd0110f18e0e5
2021-04-13 17:12:47 -07:00
Jun Wu
1f4578b30d dag: skip remote resolution early
Summary:
While it is in theory "correct" going through the remote resolution even if the
protocol is "local". The overhead turns out to be something. And the tracing
message "resolve .. remotely" can be quite noisy. Let's just skip remote
resolutions early in IdConvert implementations.

Reviewed By: sfilipco

Differential Revision: D27630094

fbshipit-source-id: 7d87079876f040cf8f764f7362021dddba0d4723
2021-04-13 16:40:07 -07:00
Jun Wu
f138b012e9 dag: populate negative cache for add_heads
Summary: This will reduce round-trips calling remote protocols.

Reviewed By: sfilipco

Differential Revision: D27630093

fbshipit-source-id: e4ad26910f9d8faa609cdcaa20c9932cd7eb168b
2021-04-13 16:40:07 -07:00
Jun Wu
1ffc1824c1 dag: add a test about excessive fetches for add_heads
Summary:
Currently the "contains vertex" check can trigger excessive
fetches for add_heads (and add_heads_and_flush used by flush).

Add a test to demonstrate the problem.

Reviewed By: sfilipco

Differential Revision: D27630091

fbshipit-source-id: ce3639c2a13226ba5681b4e8696edd7acbcb57f9
2021-04-13 16:40:07 -07:00
Jun Wu
8d4f17ea82 dag: flush should set remote protocol on the new dag
Summary:
Otherwise it can cause a lazy dag to think vertexes as "missing", insert
vertexes unnecessarily, and potentially break key graph properties (a
vertex should only have one Id).

Reviewed By: sfilipco

Differential Revision: D27629315

fbshipit-source-id: 1688d13cb94015bbc675613ecf9225556ff48372
2021-04-13 16:40:07 -07:00
Jun Wu
a7876c7901 dag: make build take NameDag instead of separated map and dag
Summary:
Also move related functions.

Similar to D27547584 (af3c3b3fd0), this allows `add_heads_and_flush` use `IdConvert`
on the `NameDag`, instead of the `IdMap` to trigger remote fetches properly.

This diff is easier to view with whitespace changes ignored.

Reviewed By: sfilipco

Differential Revision: D27629314

fbshipit-source-id: 8f79223c5d324aabfc5ab9813a9f65400fc533ec
2021-04-13 16:40:07 -07:00
Jun Wu
94e471e932 dag: drop Locked
Summary:
See the previous diff for context. Drop Locked and related APIs (prepare_filesystem_sync).
This makes it easier to use operate on a mut NameDag on flush because it does not need
to use separate types (Locked) for writing which has issues of not having the remote protocol
involved.

Reviewed By: sfilipco

Differential Revision: D27629306

fbshipit-source-id: 301445b242321ad5f424741ea91ebf6c075bff1c
2021-04-13 16:40:06 -07:00
Jun Wu
bb8b11885f dag: drop SyncableIdMap
Summary:
See the previous diff for context. Drop SyncableIdMap so we are one step
closer to using mut NameDag directly on add_heads, which knows when and how to
do remote fetching properly.

Reviewed By: sfilipco

Differential Revision: D27629310

fbshipit-source-id: 883606e40bb83907dfa6142ddd2c3030de61698f
2021-04-13 16:40:06 -07:00
Jun Wu
f3e422560b dag: drop SyncableIdDag
Summary:
By using SyncableIdDag and SyncableIdMap, it's harder to use extra features
around them (ex. remote fetching). Drop SyncableIdDag so we are one step
closer to using mut NameDag directly on add_heads, which knows when and how to
do remote fetching properly.

Reviewed By: sfilipco

Differential Revision: D27629307

fbshipit-source-id: 8e9a5a4348a42b9751752b82feb3f3d2d7c4ba45
2021-04-13 16:40:06 -07:00
Jun Wu
7482265320 dag: add hint_subdag_for_insertion to Parents trait
Summary:
The `Parents` trait is used for input of adding (non-lazy) vertexes to the
graph.  The API will be used to provide extra hints to optimize network
fetches.

With the current logic, `assign_head` will ask the server to resolve the heads
first, to check if it is already assigned, then to resolve the parents, etc. to
the roots (in the "to assign" set).  Ideally the `assign_head` logic can ask
the server to resolve the roots first, and if that's unassigned, then just mark
all descendants of the roots as unassigned, do not send more requests.

Note: the current pull logic has all the hashes ready (hashes are known).
But whether the hashes have Id assigned are unknown. It is more tricky
taking the "lock" and "reload" into consideration - hashes without Ids might
"become" having ids assigned after we obtain the lock to write data to disk.

Practically, `pull` using the current logic would take a very long time because
it tries to resolve things remotely for every "to assign" commits.

Reviewed By: sfilipco

Differential Revision: D27629317

fbshipit-source-id: e02f54f43ef65228ce6e3a8a8723dd9ae0a07008
2021-04-13 16:37:48 -07:00
Jun Wu
d4e3c9df7c dag: add TestDag::draw
Summary: This just simplifies the test code a bit.

Reviewed By: sfilipco

Differential Revision: D27629308

fbshipit-source-id: 04eac5cd045c71123e7fc410af74609dbadb8fb7
2021-04-13 16:37:48 -07:00
Jun Wu
8fa464d20e dag: add in-memory negative cache for remote vertex lookups
Summary: This avoids triggering remote lookups if an unknown name was looked up multiple times.

Reviewed By: sfilipco

Differential Revision: D27629316

fbshipit-source-id: 64c1ce5e872a5ce4f14c650a946ae8396f4cc74c
2021-04-13 16:37:48 -07:00
Jun Wu
545e6eeb25 dag: unknown names when resolving (names, heads) to paths is not an error
Summary:
When translating RequestNameToLocation to ResponseIdNamePair. If "heads" are
known, but some "names" aren't. Do not treat it as an error. This will be
used by the client-side to properly handle the "contains" check.

Reviewed By: sfilipco

Differential Revision: D27629309

fbshipit-source-id: 206ec5df956b33f4e816ab8d6a67ce776fd7bd74
2021-04-13 16:37:48 -07:00
Jun Wu
ba7e1c6952 dag: move client Dag construction to TestDag
Summary: This will make it easier to test client / server dags in upcoming changes.

Reviewed By: sfilipco

Differential Revision: D27629318

fbshipit-source-id: e3137654613aa3208a8f2e4b9f4ddfe73871f2c5
2021-04-13 16:37:48 -07:00
Jun Wu
5c77fa2aed dag: implement RemoteIdConvertProtocol on dag snapshots
Summary: This will be used in upcoming changes. It just delegates to the Arc inner.

Reviewed By: sfilipco

Differential Revision: D27629313

fbshipit-source-id: ba6cd7cac2b9f5c1a2898c439c53768995a2dc42
2021-04-13 16:37:47 -07:00
Jun Wu
f4bc4b88b9 dag: add DagAlgorithm::is_vertex_lazy
Summary: This will be used by upcoming changes.

Reviewed By: sfilipco

Differential Revision: D27629312

fbshipit-source-id: 6c56e73caf4e1a398ac3a8b4294bd9f380af3764
2021-04-13 16:37:47 -07:00
Jun Wu
46fa8482a1 dag: add RemoteIdConvrtProtocol::is_local
Summary: This will be used by upcoming changes.

Reviewed By: sfilipco

Differential Revision: D27629319

fbshipit-source-id: d19e490268561f3154642e5bb1e415da4c5d03ee
2021-04-13 16:37:47 -07:00
Jun Wu
db51ea8228 dag: do not resolve id or names remotely inside a nonblocking context
Summary:
Otherwise it might panic (ex. calling into tokio without entering a tokio
runtime). This can happen in, for example, `Debug::fmt(&IdStaticSet, ...)`.

Reviewed By: sfilipco

Differential Revision: D27581487

fbshipit-source-id: feec53e088706adcc6710afcf24fa70598f886cf
2021-04-13 16:37:46 -07:00
Jun Wu
1038af19fa dag: add "dirty" to DagAlgorithm
Summary:
This will be used by "add_heads" logic to detect what vertexes to insert
and might reduce remote fetches.

Reviewed By: sfilipco

Differential Revision: D27572359

fbshipit-source-id: d0bf027a69d180663a1587dfde613cb76b05072a
2021-04-13 16:37:46 -07:00
Jun Wu
1913d17c4e dag: add "dirty" to IdDagAlgorithm
Summary: The API returns entries buffered in memory not persisted.

Reviewed By: sfilipco

Differential Revision: D27572360

fbshipit-source-id: 555988f7c891f2d928bfa1e486a0fc1d089b4ad5
2021-04-13 16:37:46 -07:00
Jun Wu
ce8bfb19a4 dag: add ways to get the minimal "dirty" Id in IdDag
Summary: This will be used to select "dirty" (not written to disk) set in the IdDag.

Reviewed By: sfilipco

Differential Revision: D27572361

fbshipit-source-id: 0b4d2e092ece835e3d4b6aa831d32ffffc7087ca
2021-04-13 16:37:45 -07:00
Jun Wu
9974efcfed dag: consider overlap IdMap for prefix lookup
Summary:
Before this change, overlap IdMap was not considered for prefix lookup. That
results in "shortest" template not working and smartlog prints full hashes
for remote/stable etc.

Reviewed By: sfilipco

Differential Revision: D27547582

fbshipit-source-id: 7a56590775eed9d509f2212f8e5009c04aaf4e9d
2021-04-13 16:37:45 -07:00
Jun Wu
d450aa7bd4 dag: extract prefix lookup logic from MemIdMap to CoreMemIdMap
Summary: It will be reused in NameDag.

Reviewed By: sfilipco

Differential Revision: D27547583

fbshipit-source-id: da85fc7504d20877210e8ed1a97cbec18d06eede
2021-04-13 16:37:45 -07:00
Jun Wu
2ab2ba667c dag: avoid SyncNameSetQuery in namedag and ops
Summary:
Now NameSet iteration can be blocking, SyncNameSetQuery is no longer sound.
Remove SyncNameSetQuery in key logic (namedag and ops) and replace them with
async logic.

Reviewed By: sfilipco

Differential Revision: D27547581

fbshipit-source-id: af69b1a8219e97d10278939407ee79f9b333a77f
2021-04-13 16:37:45 -07:00
Jun Wu
af3c3b3fd0 dag: trigger remote fetching in various dag algorithms
Summary: Dag algorithms like `parent_names` need to fetch vertexes via remote automatically.

Reviewed By: sfilipco

Differential Revision: D27547584

fbshipit-source-id: 8106931d6f642c9a4bf0f3c546ba881c2ca73fea
2021-04-13 16:37:45 -07:00
Jun Wu
7508e2f81f dag: fix filter set's contains implementation
Summary:
The "filter" set's filter function might not be prepared for inputs outside
the parent set. So let's the "contains" function to test against the parent
set first, then test the filter function.

This fixes the "merge()" set's "contains" check using the revlog backend:

  In [1]: v=repo.revs('draft() & merge()')

  In [2]: v._set
  Out[2]: <meta ?>

  In [3]: m.node.nullid in v._set
  Out[3]: False

Before this diff it would be:

  In [3]: m.node.nullid in v._set
  CommitLookupError: '0000000000000000000000000000000000000000 cannot be found'

Note: Segmented changelog backend is not affected because it does not use
filter sets.

Reviewed By: xavierd

Differential Revision: D27657502

fbshipit-source-id: 30bb261fea59bdc5644580e98796f52fa93c2705
2021-04-08 12:25:51 -07:00
Jun Wu
6cba98986a dag: fix path response with batch_size >= 2
Summary:
The issue is that `mut i: usize` is no longer shared across multiple `async
move` blocks (introduced by D27308798 (0df4efa969)).

Rewrite the logic to collect `ids` first, then use `vertex_name_batch`
query instead.

Reviewed By: sfilipco

Differential Revision: D27406586

fbshipit-source-id: b41fe3a13114dc34aa5763e6e2bebe0571decc87
2021-04-05 12:55:41 -07:00
Jun Wu
9342556e54 dag: merge adjacent x~n queries
Summary:
Merge paths like `x~n` and `x~(n+1)` to `x~n (batch_size = 2)`.
This could be more efficient bandwidth-wise and algorithm-wise.

Reviewed By: sfilipco

Differential Revision: D27406587

fbshipit-source-id: f2a67352ad627945685e33667e8299a2bc652930
2021-04-05 12:55:40 -07:00
Jun Wu
775899c0f2 dag: make protocol use IdSet instead of Vec<Id>
Summary: IdSet is more compact. This changes the order a bit.

Reviewed By: sfilipco

Differential Revision: D27339279

fbshipit-source-id: e9b50a47beba081b892eccd7711dbd6ab5c3a886
2021-04-05 12:55:40 -07:00
Jun Wu
c515d1f54f dag: show AnestorPath batch size in debug output
Summary: This will be used by the next change.

Reviewed By: sfilipco

Differential Revision: D27406591

fbshipit-source-id: fcacc35a9ae8ed96cebb2af804d26d1e5e83ad9e
2021-04-05 12:55:40 -07:00
Jun Wu
95ece1d6fe dag: add a way to flush the overlay map
Summary:
Add a way to flush the overlay map to disk so we can avoid network fetches over
and over.

Reviewed By: sfilipco

Differential Revision: D27406592

fbshipit-source-id: 7086ad665119cc3a0834f533690325c7a2363442
2021-04-05 12:55:40 -07:00
Jun Wu
dd042424f3 dag: move (x~n, name) -> (id, name) calculation to a function
Summary: It will be reused elsewhere.

Reviewed By: sfilipco

Differential Revision: D27406593

fbshipit-source-id: 296cf5f50830bb7285e0cb9c7c15a9b374689819
2021-04-05 12:55:40 -07:00
Jun Wu
5326b18c2b dag: track x~n paths in NameDag
Summary:
I spent some time thinking about how to flush the "overlay_map" to disk.
It is a bit tricky because the on-disk IdMap might have changed in an
incompatible way. I tried a few ways to verify the on-disk IdMap remains
compatible and didn't find a way that looks good (either slow - calculating
universal_ids, or is not 100% correct in corner cases).

Now I come up with this "just track x~n" idea. It trades memory usage (~2x
overlay_map) for easy-to-verify correctness, and efficient overlay_map
flush.

Reviewed By: sfilipco

Differential Revision: D27406583

fbshipit-source-id: 0b7fb3186a9c15f376c1dc4afe7f0516c25d3dec
2021-04-05 12:55:39 -07:00
Jun Wu
e6d231818d dag: add more comments about NameDag locking
Summary: It is not obvious. So let's add more comments.

Reviewed By: sfilipco

Differential Revision: D27406584

fbshipit-source-id: 9ce1215efc1a6d4849180c6693616613c08f2a51
2021-04-05 12:55:39 -07:00
Jun Wu
2b5f78d0ac dag: add a test about sparse dag
Summary:
A sparse dag does not have full IdMap. Its IdMap only contains "universally known" entries.

Add a basic test about cloning from a sparse clone data and resolve vertex <-> id mapping
on the fly.

Reviewed By: sfilipco

Differential Revision: D27352018

fbshipit-source-id: 4a3f5f50be52e91bf7b2021cdc858bcab9c99e80
2021-04-05 12:55:39 -07:00
Jun Wu
d5b5e1ea93 dag: make import_clone_data flush the dag directly
Summary:
The `NameDag::flush` API will actually rebuild the graph using a "parent" function.
That is not necessary if we got clone data, and won't work well for a lazy graph
(since the parent function talks about vertex names and some names are missing).

Let's bypass the `flush` function and write data directly in `import_clone_data`.

Reviewed By: sfilipco

Differential Revision: D27352019

fbshipit-source-id: a79569d25d858447b8c5eb86902b8d39ae0429a3
2021-04-05 12:55:39 -07:00
Jun Wu
cdbc0b9bb1 dag: add ways to use a NameDag as an implementation of the remote protocols
Summary: This will be used in tests.

Reviewed By: sfilipco

Differential Revision: D27343882

fbshipit-source-id: 5a2d94a9f755eed0fc27e5a11093b55c810dc8da
2021-04-05 12:55:39 -07:00
Jun Wu
13c9880eca dag: add logic to export clone data
Summary:
Implement logic to export the clone data. This allows us to construct a sparse/lazy
dag via export + import CloneData.

Reviewed By: sfilipco

Differential Revision: D27343885

fbshipit-source-id: 71dc0d31e36876a8b6a8c3d7f3498be3262ce297
2021-04-05 12:55:39 -07:00
Jun Wu
ec2c1a7928 dag: add verification importing clone data
Summary:
Clone data can only be imported to an empty Dag and universally known vertexes
should be present in the IdMap.

Reviewed By: sfilipco

Differential Revision: D27343888

fbshipit-source-id: ba150d6afdbe15f0902ec20ff150a70657e24c80
2021-04-05 12:55:39 -07:00
Jun Wu
429b0e1e15 dag: make import_clone_data async
Summary: It'll use some async functions.

Reviewed By: sfilipco

Differential Revision: D27406585

fbshipit-source-id: e757796f712a5f95f1227f88e797e43551039f0b
2021-04-05 12:55:38 -07:00
Jun Wu
fed4cdbe50 dag: implement Id prefetch for IdStatic set
Summary: Make IdStatic prefetch Id -> Names on iteration.

Reviewed By: sfilipco

Differential Revision: D27343886

fbshipit-source-id: 7957b574c8c14cfea476b9c42cbf9f11fefa39be
2021-04-05 12:55:38 -07:00
Jun Wu
8364f23186 dag: make IdConvert batch API on NameDag use less network requests
Summary: Collect "missing" items and only use one request to fetch them.

Reviewed By: sfilipco

Differential Revision: D27406588

fbshipit-source-id: a5cd091b39d90c1ad0e7c5d509673c4665232304
2021-04-05 12:55:38 -07:00
Jun Wu
55b0132518 dag: implement ExactSizeIterator on SpanSet
Summary: This allows something like `iter.rev().take(n).rev()`.

Reviewed By: sfilipco

Differential Revision: D27343887

fbshipit-source-id: 06c095eb448272dca6add0e707cdf38f0daee252
2021-04-05 12:55:38 -07:00
Jun Wu
0326d38c75 dag: add batch id <-> name API in IdConvert
Summary:
This will be used by upcoming changes. Sparse/Lazy NameSet will override it
to reduce network round-trips.

Reviewed By: sfilipco

Differential Revision: D27406590

fbshipit-source-id: a44a73b4aec6e14d6e82d55285fe1cfc0fcfd482
2021-04-05 12:51:38 -07:00
Jun Wu
7c8056ae38 dag: add APIs to test if Id or Vertex is present in IdMap locally
Summary: This will be used by upcoming changes.

Reviewed By: sfilipco

Differential Revision: D27343884

fbshipit-source-id: 0938b1fb3d90b35f9d51c468cffca53e3f421bb8
2021-04-05 12:51:38 -07:00
Jun Wu
0ff3c9109d dag: make NameDag::IdConvert resolve vertexes remotely
Summary:
Remove some TODOs. This serves as fallback paths where batch prefetch didn't happen.
I'd expect most use-cases will trigger IdStatic set's batch prefetch logic (to be
added).

I haven't decided what to do exactly with "contains". Fetching remotely there seems
to require some kind of negative cache (ex. in mutation records there might be nodes
not in the graph). But it _might_ be okay to say the "contains" is a local-only
operation, too. I leave it as-is and we'll see how the Python world uses "contains"
later.

Reviewed By: sfilipco

Differential Revision: D27339275

fbshipit-source-id: ba70b3c84a391a8e395c73ccd1d7e08f92b0cbd0
2021-04-05 12:51:37 -07:00
Jun Wu
b7c63d192d dag: add methods to resolve id <-> names remotely on NameDag
Summary:
Put everything together. I used "programming error" extensively to
provide more context if we have to investigate issues in the future.

Reviewed By: sfilipco

Differential Revision: D27339278

fbshipit-source-id: 574a2c048dc1d24dbe690f862fec3e5078cb067a
2021-04-05 12:51:37 -07:00
Jun Wu
8a381893db dag: improve error message in protocol
Summary: Provide more context about what invariants we expect. Not just show "vertex not found".

Reviewed By: sfilipco

Differential Revision: D27339273

fbshipit-source-id: 1c6c92537ff37666ff603783adfd8f9ea770fbaa
2021-04-05 12:51:37 -07:00
Jun Wu
c85c750baa dag: add a remote protocol field to NameDag
Summary:
Makes NameDag own the state (logic) about how to send remote requests.
So NameDag can send requests on its own.

Reviewed By: sfilipco

Differential Revision: D27339282

fbshipit-source-id: 3cb6327dfeaefae45d4e7b88a3535463a84b195b
2021-04-05 12:48:34 -07:00
Jun Wu
79b40c5ce8 dag: define remote protocol
Summary: Define a trait for implementing the remote protocol elsewhere.

Reviewed By: sfilipco

Differential Revision: D27339281

fbshipit-source-id: da5b316d98863507361d3bde4988fd6c9098f48c
2021-04-05 12:48:34 -07:00