This is useful to follow changes in a block of lines forward in the history
(for instance, when one wants to find out how a function evolved from a point
in history).
We added a 'descend' parameter to followlines(), which defaults to False. If
True, followlines() returns descendants of startrev.
Because context.blockdescendants() does not follow renames, these are not
followed by the revset either, so history will end when a rename occurs (as
can be seen in tests).
New revsetlang module hosts parser, tokenizer, and miscellaneous functions
working on parsed tree. It does not include functions for evaluation such as
getset() and match().
2288 mercurial/revset.py
684 mercurial/revsetlang.py
2972 total
get*() functions are aliased since they are common in revset.py.
This is part of a refactoring that moves some phase query optimization from
revset.py to phases.py. See the previous patch for motivation.
This patch changes revset code to use phasecache.getrevset so it no longer
accesses the private field: _phasecache._phasesets directly.
For performance impact, this patch was tested using the following query, on
my hg-committed repo:
for i in 'public()' 'not public()' 'draft()' 'not draft()'; do
echo $i;
hg perfrevset "$i";
hg perfrevset "$i" --hidden;
done
For the CPython implementation, most operations are unchanged (within
+/- 1%), while "not public()" and "draft()" is noticeably faster on an
unfiltered repo. It may be because the new code avoids a set copy if
filteredrevs is empty.
revset | public() | not public() | draft() | not draft()
hidden | yes | no | yes | no | yes | no | yes | no
------------------------------------------------------------------
before | 19006 | 17352 | 239 | 286 | 180 | 228 | 7690 | 5745
after | 19137 | 17231 | 240 | 207 | 182 | 150 | 7687 | 5658
delta | | -38% | | -52% |
(timed in microseconds)
For the pure Python implementation, some operations are faster while "not
draft()" is noticeably slower:
revset | public() | not public() | draft() | not draft()
hidden | yes | no | yes | no | yes | no | yes | no
------------------------------------------------------------------------
before | 18852 | 17183 | 17758 | 15921 | 17505 | 15973 | 41521 | 39822
after | 18924 | 17380 | 17558 | 14545 | 16727 | 13593 | 48356 | 43992
delta | | -9% | -5% | -15% | +16% | +10%
That may be the different performance characters of generatorset vs.
filteredset. The "not draft()" query could be optimized in this case where
both "public" and "secret" are passed to "getrevsets" so it won't iterate
the whole repo twice.
These classes are pretty large and independent from revset computation.
2961 mercurial/revset.py
973 mercurial/smartset.py
3934 total
revset.prettyformatset() is renamed to smartset.prettyformat(). Smartset
classes are aliased since they are quite common in revset.py.
outgoing() and remote() may stall for long due to network I/O, which seems
unsafe per definition, "whether a predicate is safe for DoS attack." But I'm
not 100% sure about this. If our concern isn't elapsed time but CPU resource,
these predicates are considered safe. Perhaps that would be up to the
web/application server configuration?
Anyway, outgoing() and remote() wouldn't be useful in hgweb, so I think
it's okay to ban them.
We have 4 revset functions that take integer arguments, and they handle
their arguments in slightly different ways. This patch unifies them:
- getstring() in place of getsymbol(), which is more consistent with the
handling of integer revisions (both 1 and '1' are valid)
- say "expects" instead of "requires" for type errors
We don't need to catch TypeError since getstring() must return a string.
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.
followlines() is new function, we can make BC now.
There's no reason to duplicate this so many times, and it's likely an instance
will be missed if support for a new pattern is added and documented. The
stringmatcher is mostly used by revsets, though it is also used for the 'tag'
related templates, and namespace filtering in the journal extension. So maybe
there's a better place to document it. `hg help patterns` seems inappropriate,
because that is all file pattern matching.
While here, indicate how to perform case insensitive regex searches.
It was probably unintentional for regex, as the meaning of some sequences like
\S and \s is actually inverted by changing the case. For backward compatibility
however, the matching is forced to case insensitive.
This revset returns the history of a range of lines (fromline, toline) of a
file starting from `rev` or the current working directory.
Added tests in test-annotate.t which already contains a reasonably complex
repository.
The bootstrapping issue was addressed at the parsing phase and we expect
that fullreposet.__and__() fully complies to the smartset API, in which
'self & other' should return a result set in self's order. See also
ab938e7ae803.
Let's resurrect the docstring since our help module can detect the EXPERIMENTAL
tag and display it only if -v is specified.
This patch updates the test added by bbdfa2d5aaa2 since wdir() is now
documented.
Unlike p1 = null, p2 = null denotes the revision has only one parent, which
shouldn't be considered a child of the null revision. This was spotted while
fixing the issue4682 and rediscovered as issue5439.
There was a "leak", apparently introduced in b37a67b41690. When running:
hg = hglib.open('repo')
while True:
hg.log("max(branch('default'))")
all filteredset instances from branch() would be cached indefinitely by the
@util.cachefunc annotation on the max() implementation.
util.cachefunc seems dangerous as method decorator and is barely used elsewhere
in the code base. Instead, just open code caching by having the min/max
methods replace themselves with a plain lambda returning the result.
Taking only the last revision is inconsistent because ancestors(set) follows
all revisions given, and theoretically follow(startrev=set) == ancestors(set).
I'm planning to add a support for multiple start revisions, but that won't fit
to the 4.0 time frame. So reject multiple revisions now to avoid future BC.
len(revs) might be slow if revs were large, but we don't care since a valid
revs should have only one element.
For now, these sets will be unicode characters in Python 3, which is
probably wrong, but it un-blocks importing the module so we can get
further along. In the future we'll have to come up with a reasonable
encoding strategy for revsets in Python 3.
This patch was originally pair-programmed with Martijn.
Because smartset.reverse() may modify the underlying subset, it should be
called only if the set can define the ordering.
In the following example, 'a' and 'c' is the same object, so 'b.reverse()'
would reverse 'a' unexpectedly.
# '0:2 & reverse(all())'
<filteredset
<spanset- 0:2>, # a
<filteredset # b
<spanset- 0:2>, # c
<spanset+ 0:9>>>
present() is special in that it returns the argument set with no
modification, so the ordering requirement should be forwarded.
We could make present() fix the order like orset(), but that would be silly
because we know the extra filtering cost is unnecessary.
This fixes the order of 'x & (y + z)' where 'y' and 'z' are trivial, and the
other uses of _list()-family functions. The original functions are renamed to
'_ordered(|int|hex)list' to say clearly that they do not follow the subset
ordering.
This fixes the order of 'x & (y + z)' where 'y' and 'z' are not trivial.
The follow-order 'or' operation is slower than the ordered operation if
an input set is large:
#0#1#2#3
0) 0.002968 0.002980 0.002982 0.073042
1) 0.004513 0.004485 0.012029 0.075261
#0: 0:4000 & (0:1099 + 1000:2099 + 2000:3099)
#1: 4000:0 & (0:1099 + 1000:2099 + 2000:3099)
#2: 10000:0 & (0:1099 + 1000:2099 + 2000:3099)
#3: file("path:hg") & (0:1099 + 1000:2099 + 2000:3099)
I've tried another implementation, but which appeared to be slower than
this version.
ss = [getset(repo, fullreposet(repo), x) for x in xs]
return subset.filter(lambda r: any(r in s for s in ss), cache=False)
New flag 'order' is the hint to determine if a function or operation can
enforce its ordering requirement or take the ordering already defined. It
will be used to fix a couple of ordering bugs, such as:
a) 'x & (y | z)' disregards the order of 'x' (issue5100)
b) 'x & y:z' is listed from 'y' to 'z'
c) 'x & y' can be rewritten as 'y & x' if weight(x) > weight(y)
(a) and (b) are bugs of the revset core. Before this, there was no way to
tell if 'orset()' and 'rangeset()' can enforce its ordering. These bugs
could be addressed by overriding __and__() of the initial set to take the
ordering of the other set:
class fullreposet:
def __and__(self, other):
# allow other to enforce its ordering
return other
but it would expose (c), which is a hidden bug of optimize(). So, in either
ways, optimize() have to know the current ordering requirement. Otherwise,
it couldn't rewrite expressions by weights with no output change, nor tell
how a revset function or operation should order the entries.
'order' is tri-state. It starts with 'define', and shifts to 'follow' by
'x & y'. It changes back to 'define' on function call 'f(x)' or function-like
operation 'x (f) y' because 'f' may have its own ordering requirement for 'x'
and 'y'. The state 'any' will allow us to avoid extra cost that would be
necessary to constrain ordering where it isn't important, 'not x'.
This makes the number of 'or' arguments deterministic so we can attach
additional ordering flag to all operator nodes. See the next patch.
We rewrite the tree immediately after chained 'or' operations are flattened
by simplifyinfixops(), so we don't need to care if arguments are stored in
x[1] or x[1:].
This will allow us to evaluate unoptimized tree and compare the result with
optimized one.
The private _analyze() function isn't renamed since I'll add more parameters
to it.
This patch separates the simple tree transformation from the optimization step,
which is called as _analyze() since I'll extend this function to infer ordering
flags. I want to avoid making _optimize() more complicated.
This will also allow us to evaluate unoptimized tree.
This should have caught the bug of 'keyvalue' operator fixed at 910346866463.
The catch-all pattern is useless since optimize() should be aware of all known
operators.
Before, a keyvalue node was processed by the last catch-all condition of
_optimize(). Therefore, topo.firstbranch=expr would bypass tree rewriting
and would crash if an expr wasn't trivial.
Before the error was caught at func() as an unknown identifier, and the
optimizer failed to detect the syntax error. This patch introduces getsymbol()
helper to ensure that a string is not allowed as a function name.
match() is the special case of a single element list being passed
to matchany() with the additional error checking that the revset
spec is defined. Change the implementation to remove the redundant
code and have match() call matchany().
The ordering of 'x & head()' was broken in 329d82866742 (revset:
improve head revset performance, 2014-03-13). Presumably due to other
optimizations since then, undoing that change to fix the order does
not slow down the simple case of "hg log -r 'head()'" mentioned in
that commit. I see a small slowdown from ~0.16s to about ~0.19s with
'not 0 & head()', but I'd say it's worth it for the correct output.
Make it noop as before ddf6bfe09ab2. We could change it to an error, but
allowing empty key makes some sense for scripting that builds a key string
programmatically.
Sort revisions in reverse revision order but grouped by topographical branches.
Visualised as a graph, instead of:
o 4
|
| o 3
| |
| o 2
| |
o | 1
|/
o 0
revisions on a 'main' branch are emitted before 'side' branches:
o 4
|
o 1
|
| o 3
| |
| o 2
|/
o 0
where what constitutes a 'main' branch is configurable, so the sort could also
result in:
o 3
|
o 2
|
| o 4
| |
| o 1
|/
o 0
This sort was already available as an experimental option in the graphmod
module, from which it is now removed.
This sort is best used with hg log -G:
$ hg log -G "sort(all(), topo)"
This fix allows __nonzero__ to respect the direction of iteration of the
whole filteredset. Here's the case when it matters. Imagine that we have a
very large repository and we want to execute a command like:
$ hg log --rev '(tip:0) and user(ikostia)' --limit 1
(we want to get the latest commit by me).
Mercurial will evaluate a filteredset lazy data structure, an
instance of the filteredset class, which will know that it has to iterate
in a descending order (isdescending() will return True if called). This
means that when some code iterates over the instance of this filteredset,
the 'and user(ikostia)' condition will be first checked on the latest
revision, then on the second latest and so on, allowing Mercurial to
print matches as it founds them. However, cmdutil.getgraphlogrevs
contains the following code:
revs = _logrevs(repo, opts)
if not revs:
return revset.baseset(), None, None
The "not revs" expression is evaluated by calling filteredset.__nonzero__,
which in its current implementation will try to iterate the filteredset
in ascending order until it finds a revision that matches the 'and user(..'
condition. If the condition is only true on late revisions, a lot of
useless iterations will be done. These iterations could be avoided if
__nonzero__ followed the order of the filteredset, which in my opinion
is a sensible thing to do here.
The problem gets even worse when instead of 'user(ikostia)' some more
expensive check is performed, like grepping the commit diff.
I tested this fix on a very large repo where tip is my commit and my very
first commit comes fairly late in the revision history. Results of timing
of the above command on that very large repo.
-with my fix:
real 0m1.795s
user 0m1.657s
sys 0m0.135s
-without my fix:
real 1m29.245s
user 1m28.223s
sys 0m0.929s
I understand that this is a very specific kind of problem that presents
itself very rarely, only on very big repositories and with expensive
checks and so on. But I don't see any disadvantages to this kind of fix
either.
This makes it possible to use keyword arguments to specify per-sort options.
For example, a hypothetical 'first' option for the user sort could sort certain
users first with:
sort(all(), user, user.first=mpm@selenic.com)
When we introduce the develwarning, we did not had an official deprecation API
and infrastructure. We can now officially deprecate the old way with a version
deadline.
Unlike range, dagrange has no inverted range (such as '10:0'). So there should
be no practical reason to keep dagrange as a function that forces its own
ordering.
No performance regression is spotted in contrib/base-revsets.txt.
Because a suffix action never takes subsequent tokens, it should have
no binding strength nor closing character. I've tried if this value could
be used to resolve infix/suffix ambiguity of x^:y, but it appears not. So
I decided to resend this patch.
Our invert() function was too clever to not take length into account. I could
fix the problem by appending '\xff' as a terminator (opposite to '\0'), but
it turned out to be slower than simple multi-pass sorting.
New implementation is pretty straightforward, which just calls sort() from the
last key. We can do that since Python sort() is guaranteed to be stable. It
doesn't sound nice to call sort() multiple times, but actually it is faster.
That's probably because we have fewer Python codes in hot loop, and can avoid
heavy string and list manipulation.
revset #0: sort(0:10000, 'branch')
0) 0.412753
1) 0.393254
revset #1: sort(0:10000, '-branch')
0) 0.455377
1) 0.389191 85%
revset #2: sort(0:10000, 'date')
0) 0.408082
1) 0.376332 92%
revset #3: sort(0:10000, '-date')
0) 0.406910
1) 0.380498 93%
revset #4: sort(0:10000, 'desc branch user date rev')
0) 0.542996
1) 0.486397 89%
revset #5: sort(0:10000, '-desc -branch -user -date -rev')
0) 0.965032
1) 0.518426 53%
This provides a customization point for templater. In templater, there are
two ways to call a unary function: func(x) and x|func. They are processed
differently in templater due to historical reasons, but they should be
handled in the same way while expanding aliases. In short, x|func should be
processed as syntactic sugar for func(x).
_funcnode and _getlist() are replaced by _trygetfunc().
They will be commonly used by revset and templater. It isn't easy to understand
how _expand() works, so I'll add comments by a follow-up patch.
The local variable 'alias' is renamed to 'a' to avoid shadowing the global
'alias' class.