File matching is done by applying the matcher to all elements in the 'file'
field of all changesets in the repository. This requires to read/parse all
changesets in the repository and do a lot of matching. However about 1/3 of the
time of the function is used to create 'changectx' object and retrieve their
'file' field.
This is far too much overhead so we are skipping the changectx layer and
directly access the data from the changelog. This provide use significant speed
up:
repository: mozilla central 252524 revisions
command: hg perfrevset '_matchfiles("p:browser")'
Before: 15.899687s
After: 10.011705s
Slowdown is even more significant if you have a lot of namespace that slowdown
lookup.
The time is now spent with this approximate repartition:
Matcher: 20%
regexp matching: 10%
changelog.read: 80%
reading revision: 60%
checking hash: 15%
decompression: 15%
reading chunk: 30%
changelog parsing: 20%
decoding to local: 10%
The next easy win is probably to have more of the changelog stack implemented
using the CPython api.
Function in destutil are much simpler to wrap and more flexible than revset.
This also help consistency as 'destupdate' live here and cannot become a pure
revset anyway.
The revset is not ready for prime time yet. However it is useful to have some
version of it exposed to help candidate users to play with it and provide
feedback on what we should aim at.
We add a small test to make sure the code runs.
It's common for GUI or web frontend to fetch chunk of revisions per batch
size. Previously it was possible only if revisions were sorted by revision
number.
$ hg log -r 'limit({revspec} & :{last_known}, 101)'
So this patch introduces a general way to retrieve chunk of revisions after
skipping offset revisions.
$ hg log -r 'limit({revspec}, 100, {last_count})'
This is a dumb implementation. We can optimize it for baseset and spanset
later.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
We ultimately want this to be accessible through a revset, but there is too
much complexity here for that to work. Especially we'll have to return more
than just the destination to control the behavior (eg: bookmarks to activate,
etc).
To prevent cycle, a new module is created, it will receive other
destination/behavior function in the future.
Previously, calling 'if foo:' on a ordered filtered set would start iterating in
whatever the current direction was and return if a value was available. If the
current direction was ascending, but the set had a fastdesc available, this
meant we did a lot more work than necessary.
If this was applied without my previous max/min fixes, it would improve max()
performance (this was my first attempt at fixing the issue). Since those
previous fixes went in though, this doesn't have a visible benefit in the
benchmarks, but it does seem clearly better than it was before so I think it
should still go in.
min() and max() would first do an existence check. Unfortunately existence
checks can be slow in certain situations (like if the smartset is a list, and
quickly iterable in both ascending and descending directions, then doing an
existence check will start from the bottom, even if you want to check the
max()).
The fix is to not do the check, and just handle the error if it happens. In a
large repo, this speeds up:
hg log -r 'max(parents(. + .^) - (. + .^) & ::master)'
from 3.5s to 0.85s. That revset is contrived and just for testing. In our
real case we used 'bundle()' in place of '. + .^'
Interesting perf numbers for the revset benchmarks:
max(draft() and ::tip) => 0.027s to 0.0005s
max(author(lmoscovicz)) => 2.48s to 0.57s
min doesn't show any perf changes, but changing it as well will prevent a perf
regression in my next patch.
Result from revset benchmark
revset #0: draft() and ::tip
min max
0) 0.001971 0.001991
1) 0.001965 0.000428 21%
revset #1: ::tip and draft()
min max
0) 0.002017 0.001912
1) 0.001896 94% 0.000421 22%
revset #2: author(lmoscovicz)
min max
0) 1.049033 1.358913
1) 1.042508 0.319824 23%
revset #3: author(lmoscovicz) or author(mpm)
min max
0) 1.042512 1.367432
1) 1.019750 0.327750 23%
revset #4: author(mpm) or author(lmoscovicz)
min max
0) 1.050135 0.324924
1) 1.070698 0.319913
revset #5: roots((tip~100::) - (tip~100::tip))
min max
0) 0.000671 0.001018
1) 0.000605 90% 0.000946 92%
revset #6: roots((0::) - (0::tip))
min max
0) 0.149714 0.152369
1) 0.098677 65% 0.100374 65%
revset #7: (20000::) - (20000)
min max
0) 0.051019 0.042747
1) 0.035586 69% 0.016267 38%
This is another step toward having "default" destination more clear and unified.
Not all the logic is there because some bookmark related computation happened
elsewhere. It will be moved later.
The function is private because as for the other ones, cleanup is needed before
we can proceed.
Since ca895be75c36, condition function returns a cached value, so there's
little benefit to cache __contains__.
No measurable difference found in contrib/base-revsets.txt.
When using multiple revsets that get optimized into a list (like
hg log -r r1235 -r r1237 in hgsubversion), the revset list code was assuming the
strings were resolvable via repo[X]. hgsubversion and other extensions override
def stringset() to allow processing different revision identifiers (such as
r1235 or g<githash>), and there for the _list() implementation was circumventing
that resolution.
The fix is to just call stringset(). The default implementaiton does the same
thing that _list was already doing (namely repo[X]).
This has always been broken, but it was recently exposed by ad142c72c6db which
made "--rev X --rev Y" produce a combined revset "X | Y".
Before this patch, follow only supports full, exact filenames.
This patch makes follow argument to be treated like file
pattern same way like log treats their arguments.
It preserves current behaviour of follow() matching paths
relative to the repository root by default.
As the content of a smartset never changes, min and max will never change
either. This will save us time when this function is called multiple times.
This is relevant for issue4782 but does not fix it.
Changeset 79b4c33e868f uses smartset lazy sorting for the C version. We need to
apply the same to the pure version for consistency. This is fixing the tests
with --pure.
Baseset needs a list to operate, but will convert that list back to a set for
membership testing. It seems a bit silly to convert the set into a list to
convert it back afterward.
The main goal of this patch series is to reduce the use of PyXxx() function
that is likely to require ugly error handling and inc/decref. Plus, this is
faster than using PySet_Contains().
revset #0: 0::tip
0) 0.004168
1) 0.003678 88%
This patch ignores out-of-range roots as they are in the pure implementation.
Because reachable sets are calculated from heads, and out-of-range heads raise
IndexError, we can just take out-of-range roots as unreachable. Otherwise,
the test of "hg log -Gr '. + wdir()'" would fail.
"heads" argument is changed to a list. Should we have to rename the C function
as its signature is changed?
This patch is part of a series of patches to speed up the computation of
revset.reachableroots by introducing a C implementation. The main motivation is to
speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of reachableroots is 10-50x faster and smartlog on is
2x-5x faster.
Before this patch, reachableroots was computed in pure Python by default. This
patch makes the C implementation the default and provides a speedup for
reachableroots.
This patch is part of a series of patches to speed up the computation of
revset.revsbetween by introducing a C implementation. The main motivation is to
speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of revsbetween is 10-50x faster and smartlog on is
2x-5x faster.
This patch rename 'revsbetween' to 'reachableroots' and makes the computation of
the full path optional. This will allow graphlog to compute grandparents using
'reachableroots' and remove the need for a dedicated grandparent function.
This patch is part of a series of patches to speed up the computation of
revset.revsbetween by introducing a C implementation. The main motivation is to
speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of revsbetween is 10-50x faster and smartlog on is
2x-5x faster.
Later in this serie, we want to reuse the implementation of revsbetween in the
changelog module, therefore, we make it public.
An empty group expression "()" generates None in AST, so it should be tested
before destructuring a tuple.
"A | ()" is still evaluated to an error because I'm not sure whether "()"
represents an empty set or an empty expression (= a unit value). They are
identical in "or" operation, but they should be evaluated differently in
"and" operation.
expression empty set unit value
---------- --------- ----------
() {} A
A & () {} A
A | () A A
An empty group expression "()" generates None in AST, so the optimizer have
to test it before destructuring a tuple. The error message, "missing argument",
is somewhat obscure, but it should be better than crash.
This will allow us to define both a primary expression, ":", and a prefix
operator, ":y". The ambiguity will be resolved by the next patch.
Prefix actions in elements table are adjusted as follows:
original prefix primary prefix
----------------- -------- -----------------
("group", 1, ")") -> n/a ("group", 1, ")")
("negate", 19) -> n/a ("negate", 19)
("symbol",) -> "symbol" n/a
This is the simplest way to handle wdir() revision in revset. None didn't
work well because revset heavily depends on integer operations such as min(),
max(), sorted(), x:y, etc.
One downside is that we cannot do "wctx.rev() in set" because wctx.rev() is
still None. We could wrap the result set by wdirproxyset that translates None
to wdirrev, but it seems overengineered at this point.
result = getset(repo, subset, tree)
if 'wdir' in funcsused(tree):
result = wdirproxyset(result)
Test cases need the '(all() + wdir()) &' hack because we have yet to fix the
bootstrapping issue of null and wdir.
As already demonstrated, saving attribute lookup gains us some minor but
noticeable performance improvements.
revset #0: parents(all())
before) 0.024169
after ) 0.022756 94%