In blockdescendants(), we had an assertion when line range of a merge
changeset was not consistent depending on which parent was considered for
computation. For instance, this might occur when file content (in lookup
range) is significantly different between parent branches of the merge as
demonstrated in added tests (where we almost completely rewrite the "baz" file
while also introducing similarities with its content in the other branch we
later merge to).
Now, in such case, we combine line ranges from all parents by storing the
envelope of both line ranges. This is conservative (the line range is
extended, possibly unnecessarily) but at least this should avoid missing
descendants with changes in a range that would fall in that of one parent but
not in another one (the case of "baz: narrow change (2->2+)" changeset in
tests).
This is naive implementation using two-pass scanning. Tracking descendants
isn't an easy problem if both start and stop depths are specified. It's
impractical to remember all possible depths of each node while scanning from
roots to descendants because the number of depths explodes. Instead, we could
cache (min, max) depths as a good approximation and track ancestors back when
needed, but that's likely to have off-by-one bug.
Since this implementation appears not significantly slower, and is quite
straightforward, I think it's good enough for practical use cases. The time
and space complexity is O(n) ish.
revisions:
0) 1-pass scanning with (min, max)-depth cache (worst-case quadratic)
1) 2-pass scanning (this version)
repository:
mozilla-central
# descendants(0) (for reference)
*) 0.430353
# descendants(0, depth=1000)
0) 0.264889
1) 0.398289
# descendants(limit(tip:0, 1, offset=10000), depth=1000)
0) 0.025478
1) 0.029099
# descendants(0, depth=2000, startdepth=1000)
0) painfully slow (due to quadratic backtracking of ancestors)
1) 1.531138
Prepares for adding depth support. I want to process depth=0 in
revdescendants() to make things simpler.
only() also calls dagop.revdescendants(), but it filters out root revisions
explicitly. So this should cause no problem.
# descendants(0) using hg repo
0) 0.052380
1) 0.051226
# only(tip) using hg repo
0) 0.001433
1) 0.001425
This is necessary to implement the set{gen} (set subscript) operator. For
example, set{-n} will be translated to ancestors(set, depth=n, startdepth=n).
https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm
The UI is undecided and I doubt if the startdepth option would be actually
useful, so the option is hidden for now. 'depth' could be extended to take
min:max range, in which case, integer depth should select a single generation.
ancestors(set, depth=:y) # scan up to y-th generation
ancestors(set, depth=x:) # skip until (x-1)-th generation
ancestors(set, depth=x) # select only x-th generation
Any ideas are welcomed.
# reverse(ancestors(tip)) using hg repo
3) 0.075951
4) 0.076175
Surprisingly, this makes revset benchmark slightly faster. I don't know why,
but it appears that wrapping -inputrev by tuple is the key. So I decided to
just enable depth computation by default.
# reverse(ancestors(tip)) using hg repo
1) 0.081051
2) 0.075408
Since we're using a max heap, the current rev should be a duplicate only
if it equals to the previous one. We don't have to maintain the whole seen
set.
# reverse(ancestors(tip)) using hg repo
0) 0.086420
1) 0.081051
- h -> pendingheap: "h" seems too short for variable of long lifetime
- current -> currev: future patches will add current "depth" variable
- parent -> prev or pctx: short lifetime, follows common naming rules
context.py seems not a good place to host these functions.
% wc -l mercurial/context.py mercurial/dagop.py
2306 mercurial/context.py
424 mercurial/dagop.py
2730 total
This module hosts the following functions. They are somewhat similar (e.g.
scanning revisions using heap queue or stack) and seem non-trivial in
algorithmic point of view.
- _revancestors()
- _revdescendants()
- reachableroots()
- _toposort()
I was thinking of adding revset._fileancestors() generator for better follow()
implementation, but it would be called from context.py as well. So I decided
to create new module.
Naming is hard. I couldn't come up with any better module name, so it's called
"dag operation" now. I rejected the following candidates:
- ancestor.py - existing, revlog-level DAG algorithm
- ancestorset.py - doesn't always return a set
- dagalgorithm.py - hard to type
- dagutil.py - existing
- revancestor.py - I want to add fileancestors()
% wc -l mercurial/dagop.py mercurial/revset.py
339 mercurial/dagop.py
2020 mercurial/revset.py
2359 total