context.py doesn't seem to be a good place to host these functions.
% wc -l mercurial/context.py mercurial/dagop.py
2306 mercurial/context.py
424 mercurial/dagop.py
2730 total
This module hosts the following functions. They are somewhat similar (e.g.
they scan revisions using a heap queue or a stack) and seem non-trivial from
an algorithmic point of view.
- _revancestors()
- _revdescendants()
- reachableroots()
- _toposort()
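The heap-queue scan these functions share can be sketched as follows. This is a simplified illustration, not dagop._revancestors() itself: the name revancestors(), the plain dict of (p1, p2) tuples standing in for the changelog, and the absence of options such as followfirst are all assumptions made for the example.

```python
import heapq

NULLREV = -1  # parent slot value meaning "no parent"


def revancestors(parents, revs):
    """Yield ``revs`` and all of their ancestors, highest revision first.

    ``parents`` maps a revision number to its (p1, p2) tuple. A parent
    always has a lower number than its child, so repeatedly popping the
    largest pending revision from a max-heap yields revisions in strictly
    descending order.
    """
    heap = [-r for r in set(revs)]  # heapq is a min-heap: store negated revs
    heapq.heapify(heap)
    seen = set()
    while heap:
        current = -heapq.heappop(heap)
        if current in seen:
            continue  # already reached through another child
        seen.add(current)
        yield current
        for p in parents[current]:
            if p != NULLREV and p not in seen:
                heapq.heappush(heap, -p)


# Example DAG: 0 - 1 - 2 and 1 - 3, with 4 merging 2 and 3.
EXAMPLE_PARENTS = {
    0: (NULLREV, NULLREV),
    1: (0, NULLREV),
    2: (1, NULLREV),
    3: (1, NULLREV),
    4: (2, 3),
}
```

Popping the largest revision first is what keeps the output sorted; a plain stack would reach the same set of revisions, but in depth-first order.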
I was thinking of adding a revset._fileancestors() generator for a better
follow() implementation, but it would be called from context.py as well. So I
decided to create a new module.
Naming is hard. I couldn't come up with any better module name, so it's called
"dag operation" now. I rejected the following candidates:
- ancestor.py - existing, revlog-level DAG algorithm
- ancestorset.py - doesn't always return a set
- dagalgorithm.py - hard to type
- dagutil.py - existing
- revancestor.py - I want to add fileancestors()
% wc -l mercurial/dagop.py mercurial/revset.py
339 mercurial/dagop.py
2020 mercurial/revset.py
2359 total
This replaces 'if y in subset' with '& subset'. first(null) and last(wdir())
are fixed thanks to fullreposet.__and__.
This also revealed that first() and last() don't follow the order of the
input set. 'ls & subset' is valid only if the ordering requirement is 'define'
or 'any'.
No performance regression observed:
revset #0: limit(0:9999, 100, 9000)
0) 0.001164
1) 0.001135
revset #2: 9000 & limit(0:9999, 100, 9000)
0) 0.001224
1) 0.001181
revset #3: last(0:9999, 100)
0) 0.000237
1) 0.000199
last() is implemented using a reversed iterator, so the result should be
reversed again.
I've marked this as BC since it's quite an old bug, seen in 3.0. The first bad
revision is 1ef0875a62f8 "revset: changed last implementation to use lazy
classes."
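A minimal sketch of the fix, using a plain list in place of a lazy smartset. The helper last() below is hypothetical; it only shows why items taken from a reversed iterator have to be reversed once more before they are returned.

```python
import itertools


def last(revs, n=1):
    """Return the last ``n`` members of ``revs``, preserving their order.

    ``revs`` is any reversible sequence (a plain list here, not a smartset).
    Taking items from reversed(revs) is cheap for lazy sets, but it yields
    them back to front, so the picked slice is reversed again.
    """
    picked = list(itertools.islice(reversed(revs), n))
    picked.reverse()  # restore the input set's order
    return picked
```

Without the final reverse(), last([0, 1, 2, 3, 4], 2) would come out as [4, 3] instead of [3, 4].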
Negative offsets to the `~` operator now search for descendants. The search is
aborted when a node has more than one child, as we do not have a definition for
'nth child'. Optionally, we could introduce such a notion and take the nth
child ordered by revision number.
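The descendant search for a negative offset can be sketched like this. The function nthchild() and its children mapping are hypothetical, and raising ValueError is only a stand-in for however the real revset aborts; returning None when history simply ends is likewise an assumption. The point is that each step follows the single child and gives up on a fork.

```python
def nthchild(children, rev, n):
    """Follow the only child of ``rev`` ``n`` times (what ``rev~-n`` asks for).

    ``children`` maps a revision to the list of its child revisions.
    Returns None if history ends before ``n`` steps. Raises ValueError
    (a stand-in error) when a node has more than one child, since
    "the nth child" is undefined there.
    """
    for _ in range(n):
        kids = children.get(rev, [])
        if not kids:
            return None
        if len(kids) > 1:
            raise ValueError("revision %d has more than one child" % rev)
        rev = kids[0]
    return rev


# Example: 1 forks into 2 and 3, which both lead to the merge 4.
EXAMPLE_CHILDREN = {0: [1], 1: [2, 3], 2: [4], 3: [4]}
```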
The current revset language provides a short operator for ancestor lookup but
not for descendants. This gives users a simple revset to move to the previous
changeset, e.g. `hg up '.~1'`, but not to the 'next' changeset. With this
change, users can now use `.~-1` as a shortcut to move to the next changeset.
This fits better into allowing users to specify revisions via revsets and
avoiding the need for special `hg next` and `hg prev` operations.
The alternative to negative offsets is adding a new operator. We do not have
many ASCII operators left that do not require bash escaping (',', '_', and
'/' come to mind). If we decide that we should add a more convenient short
operator, such as '/' (e.g. './1'), we can add it later and allow ancestor
lookup via negative numbers.
The idea is simple. If the given node id prefix is 'ff...f', add 1 to the
number of matches (e.g. ambiguous if partial + maybewdir > 1).
This patch also fixes the id() revset and the shortest() template, since
_partialmatch() can raise a WdirUnsupported exception.
For wdir(), we now raise an exception when wdir() is passed, so catching that
exception is better than checking for wdir() using if-else.
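The matching rule can be sketched as below. partialmatch(), AmbiguousPrefix, and the local WdirUnsupported class are stand-ins written for this example, not Mercurial's own code; the wdir node id is modeled as 40 'f' characters.

```python
WDIR_HEX = "f" * 40  # the working directory's pseudo node id


class AmbiguousPrefix(Exception):
    """Stand-in for Mercurial's ambiguous-lookup error."""


class WdirUnsupported(Exception):
    """Stand-in for Mercurial's WdirUnsupported exception."""


def partialmatch(nodes, prefix):
    """Resolve a hex ``prefix`` against ``nodes`` (40-char hex strings)."""
    matches = [n for n in nodes if n.startswith(prefix)]
    # A prefix made only of 'f's may also name the working directory,
    # so it counts as one extra candidate match.
    maybewdir = WDIR_HEX.startswith(prefix)
    if len(matches) + maybewdir > 1:
        raise AmbiguousPrefix(prefix)
    if matches:
        return matches[0]
    if maybewdir:
        raise WdirUnsupported(prefix)
    return None


EXAMPLE_NODES = ["ff12" + "0" * 36, "ab" + "0" * 38]
```

Here "ff" is ambiguous (one real node plus wdir), "ff1" resolves to the real node, and "fff" can only mean wdir, so the caller has to catch WdirUnsupported.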
We parse the "descend" symbol as a Boolean using getboolean (the prior
extraction by getargsdict already checked that it is a symbol).
In tests, check for error cases and vary Boolean values here and there.
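A simplified stand-in for the Boolean parsing. This getboolean() takes the bare symbol text rather than a parse tree, raises ValueError instead of Mercurial's parse error, and its list of accepted words is modeled on the values Mercurial accepts for boolean config options; treat all three as assumptions.

```python
# Accepted spellings (an assumption modeled on Mercurial's boolean config values).
_BOOLEANS = {
    "1": True, "yes": True, "true": True, "on": True, "always": True,
    "0": False, "no": False, "false": False, "off": False, "never": False,
}


def getboolean(symbol, err):
    """Parse a symbol such as the value of ``descend=True`` into a bool."""
    value = _BOOLEANS.get(symbol.lower())
    if value is None:
        raise ValueError(err)  # stand-in for a revset parse error
    return value
```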
This is useful to follow changes in a block of lines forward in the history
(for instance, when one wants to find out how a function evolved from a point
in history).
We added a 'descend' parameter to followlines(), which defaults to False. If
True, followlines() returns descendants of startrev.
Because context.blockdescendants() does not follow renames, these are not
followed by the revset either, so history will end when a rename occurs (as
can be seen in tests).
The new revsetlang module hosts the parser, the tokenizer, and miscellaneous
functions working on the parsed tree. It does not include functions for
evaluation such as getset() and match().
2288 mercurial/revset.py
684 mercurial/revsetlang.py
2972 total
get*() functions are aliased since they are common in revset.py.
This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py. See the previous patch for motivation.
This patch changes revset code to use phasecache.getrevset so it no longer
accesses the private field _phasecache._phasesets directly.
To measure the performance impact, this patch was tested using the following
queries on my hg-committed repo:
for i in 'public()' 'not public()' 'draft()' 'not draft()'; do
echo $i;
hg perfrevset "$i";
hg perfrevset "$i" --hidden;
done
For the CPython implementation, most operations are unchanged (within
+/- 1%), while "not public()" and "draft()" are noticeably faster on an
unfiltered repo. This may be because the new code avoids a set copy if
filteredrevs is empty.
revset | public()      | not public()  | draft()       | not draft()
hidden | yes   | no    | yes   | no    | yes   | no    | yes   | no
----------------------------------------------------------------------
before | 19006 | 17352 | 239   | 286   | 180   | 228   | 7690  | 5745
after  | 19137 | 17231 | 240   | 207   | 182   | 150   | 7687  | 5658
delta  |       |       |       | -38%  |       | -52%  |       |
(timed in microseconds)
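One way the copy can be avoided is sketched below. This getrevset() is hypothetical and works on a plain dict of cached sets; the text above only says the new code avoids a set copy when filteredrevs is empty, not that it does so exactly like this.

```python
def getrevset(phasesets, phases, filteredrevs=frozenset()):
    """Return the revisions in any of ``phases``, minus ``filteredrevs``.

    ``phasesets`` maps a phase number to a cached set of revisions.
    """
    if len(phases) == 1 and not filteredrevs:
        # Nothing to merge or filter: return the cached set itself and
        # skip the copy (callers must treat the result as read-only).
        return phasesets[phases[0]]
    revs = set().union(*(phasesets[p] for p in phases))
    if filteredrevs:
        revs -= filteredrevs  # revs is already a fresh set; filter in place
    return revs
```

Returning the cached set directly trades safety for speed, so this only works if no caller mutates the result.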
For the pure Python implementation, some operations are faster while "not
draft()" is noticeably slower:
revset | public()      | not public()  | draft()       | not draft()
hidden | yes   | no    | yes   | no    | yes   | no    | yes   | no
----------------------------------------------------------------------
before | 18852 | 17183 | 17758 | 15921 | 17505 | 15973 | 41521 | 39822
after  | 18924 | 17380 | 17558 | 14545 | 16727 | 13593 | 48356 | 43992
delta  |       |       |       | -9%   | -5%   | -15%  | +16%  | +10%
That may be due to the different performance characteristics of generatorset
vs. filteredset. The "not draft()" query could be optimized in this case by
passing both "public" and "secret" to "getrevset", so it won't iterate over
the whole repo twice.
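The suggested optimization can be sketched as follows. The function phaserevs(), the phase numbering, and the assumption that only draft and secret revisions are cached (public being everything else) are illustrative choices, not the phases.py implementation.

```python
PUBLIC, DRAFT, SECRET = 0, 1, 2


def phaserevs(nonpublic, phases, nrevs):
    """Return revisions 0..nrevs-1 whose phase is in ``phases``.

    ``nonpublic`` maps DRAFT and SECRET to cached revision sets; public
    revisions are simply all the others and are not stored.
    """
    wanted = set(phases)
    if PUBLIC not in wanted:
        return set().union(*(nonpublic[p] for p in wanted))
    # Public is requested together with other phases: drop the unwanted
    # non-public revisions during one pass over the repository, instead of
    # scanning once for public() and combining that with the other phases.
    excluded = set().union(
        *(nonpublic[p] for p in (DRAFT, SECRET) if p not in wanted))
    return {r for r in range(nrevs) if r not in excluded}
```

In this model, "not draft()" is phaserevs(cache, [PUBLIC, SECRET], nrevs): a single pass that skips the draft revisions.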
These classes are pretty large and independent from revset computation.
2961 mercurial/revset.py
973 mercurial/smartset.py
3934 total
revset.prettyformatset() is renamed to smartset.prettyformat(). Smartset
classes are aliased since they are quite common in revset.py.
outgoing() and remote() may stall for a long time due to network I/O, which
seems unsafe per the definition "whether a predicate is safe for DoS attack."
But I'm not 100% sure about this. If our concern isn't elapsed time but CPU
resources, these predicates are considered safe. Perhaps that would be up to
the web/application server configuration?
Anyway, outgoing() and remote() wouldn't be useful in hgweb, so I think
it's okay to ban them.
We have 4 revset functions that take integer arguments, and they handle
their arguments in slightly different ways. This patch unifies them:
- getstring() in place of getsymbol(), which is more consistent with the
handling of integer revisions (both 1 and '1' are valid)
- say "expects" instead of "requires" for type errors
We don't need to catch TypeError since getstring() must return a string.
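A sketch of the unified handling, with local stand-ins for the parser's helpers. The tuple form ('symbol', '1') / ('string', '1') for parsed arguments, the ParseError class, and the getinteger() helper name are assumptions made for this example.

```python
class ParseError(Exception):
    """Stand-in for the revset parser's error type."""


def getstring(x, err):
    # Both quoted strings and bare symbols carry their text in x[1].
    if x and x[0] in ("string", "symbol"):
        return x[1]
    raise ParseError(err)


def getinteger(x, err):
    # getstring() always returns a str, so int() can only fail with
    # ValueError here; no TypeError handling is needed.
    try:
        return int(getstring(x, err))
    except ValueError:
        raise ParseError(err) from None
```

Both 1 and '1' parse to the same value, and every failure reports the same "... expects a number" style message passed in as err.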
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.
followlines() is a new function, so we can make this BC change now.
There's no reason to duplicate this so many times, and it's likely an instance
will be missed if support for a new pattern is added and documented. The
stringmatcher is mostly used by revsets, though it is also used for the 'tag'
related templates, and namespace filtering in the journal extension. So maybe
there's a better place to document it. `hg help patterns` seems inappropriate,
because that is all file pattern matching.
While here, indicate how to perform case-insensitive regex searches.
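The pattern syntax being documented can be sketched with a simplified stringmatcher(). The re: and literal: prefixes are the convention revsets use; the (kind, pattern, matcher) return shape and the remaining details are assumptions, not a copy of Mercurial's implementation. A regex becomes case-insensitive with Python's inline (?i) flag.

```python
import re


def stringmatcher(pattern):
    """Return (kind, pattern, matcher) for a revset string pattern.

    "re:" introduces a regular expression (searched anywhere in the
    string); "literal:" forces an exact match, which is also the default.
    """
    if pattern.startswith("re:"):
        regex = re.compile(pattern[3:])
        return "re", pattern[3:], lambda s: regex.search(s) is not None
    if pattern.startswith("literal:"):
        pattern = pattern[len("literal:"):]
    return "literal", pattern, lambda s: s == pattern
```

For example, "re:(?i)fix" matches "FIX crash" while "re:fix" does not, and "literal:re:x" matches the text "re:x" exactly.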
It was probably unintentional for regex, as the meaning of some sequences like
\S and \s is actually inverted by changing the case. For backward compatibility,
however, the matching is forced to be case-insensitive.
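The inversion is easy to demonstrate with Python's re module. The pattern and text below are made up for the example; they show that lower-casing a pattern turns \S (non-whitespace) into \s (whitespace), while re.IGNORECASE leaves the escapes intact.

```python
import re

pattern = r"\S+@Example"
text = "alice@example"

# Lower-casing the pattern to get case-insensitive matching turns the
# \S escape (non-whitespace) into \s (whitespace), changing its meaning.
broken = re.compile(pattern.lower())
# Asking the regex engine to ignore case keeps the escapes as written.
fixed = re.compile(pattern, re.IGNORECASE)
```

The lower-cased pattern now requires whitespace before the "@", so it no longer matches text that the original pattern was meant to find.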
This revset returns the history of a range of lines (fromline, toline) of a
file starting from `rev` or the current working directory.
Added tests in test-annotate.t which already contains a reasonably complex
repository.
The bootstrapping issue was addressed at the parsing phase, and we expect
that fullreposet.__and__() fully complies with the smartset API, in which
'self & other' should return a result set in self's order. See also
ab938e7ae803.
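A toy model of the ordering rule, assuming plain lists instead of smartsets and a full set kept in ascending order. revlist and fullrepo are hypothetical classes: revlist's & keeps the left operand's order, and fullrepo, which holds every revision, only needs to sort the other operand to satisfy the same rule.

```python
class revlist:
    """Toy ordered revision set; ``a & b`` keeps ``a``'s order."""

    def __init__(self, revs):
        self._revs = list(revs)

    def __iter__(self):
        return iter(self._revs)

    def __and__(self, other):
        keep = set(other)
        return revlist(r for r in self._revs if r in keep)


class fullrepo(revlist):
    """Every revision, ascending; ``self & other`` is ``other`` sorted."""

    def __init__(self, nrevs):
        super().__init__(range(nrevs))

    def __and__(self, other):
        # Like the description of fullreposet, assume ``other`` holds only
        # valid revisions, so no membership test is needed, just sorting.
        return revlist(sorted(other))
```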
Let's resurrect the docstring since our help module can detect the EXPERIMENTAL
tag and display it only if -v is specified.
This patch updates the test added by bbdfa2d5aaa2 since wdir() is now
documented.
Unlike p1 = null, p2 = null denotes that the revision has only one parent,
which shouldn't be considered a child of the null revision. This was spotted
while fixing issue4682 and rediscovered as issue5439.
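The rule can be sketched with a hypothetical children() over a dict of (p1, p2) pairs, where -1 stands for the null revision: a real parent matches in either slot, but for the null revision only p1 counts.

```python
NULLREV = -1


def children(parents, rev):
    """Revisions that list ``rev`` as a parent, in ascending order.

    ``parents`` maps a revision to its (p1, p2) pair. p2 == NULLREV only
    means "no second parent", so for the null revision only p1 counts.
    """
    result = []
    for r, (p1, p2) in sorted(parents.items()):
        if p1 == rev or (p2 == rev and rev != NULLREV):
            result.append(r)
    return result


# Root 0, linear child 1, and merge 2 of 0 and 1.
EXAMPLE_PARENTS = {0: (NULLREV, NULLREV), 1: (0, NULLREV), 2: (0, 1)}
```

Without the rev != NULLREV check, revision 1 would wrongly appear among the children of null just because its p2 slot is empty.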