f5a63a5506d2 regressed performance of baseset.__sub__ by introducing
a lazyset. This patch restores that lost performance by eagerly
evaluating baseset.__sub__ if the other set is a baseset.
revsetbenchmark.py results impacted by this change:
revset #6: roots(0::tip)
0) wall 2.923473 comb 2.920000 user 2.920000 sys 0.000000 (best of 4)
1) wall 0.077614 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
revset #23: roots((0:tip)::)
0) wall 2.875178 comb 2.880000 user 2.880000 sys 0.000000 (best of 4)
1) wall 0.154519 comb 0.150000 user 0.150000 sys 0.000000 (best of 61)
On the author's machine, this slowdown manifested during evaluation of
'roots(%ln::)' in phases.retractboundary after unbundling the Firefox
repository. Using `time hg unbundle firefox.hg` as a benchmark:
Before: 8:00
After: 4:28
Delta: -3:32
For reference, the subset and cs baseset instances impacted by this
change were of lengths 193634 and 193627, respectively.
Explicit test coverage of roots(%ln::), while similar to the existing
roots(0::tip) benchmark, has been added.
This previously died in _revdescendants() taking the min() of the first set to
only(), when it was empty. An empty second set already worked. Likewise,
descendants() already handled an empty set.
We use the python syntax for range comparison: `a < x < c`. This is shorter,
more readable and less error prone. This comparison escaped the cleanup make in
166d6dde9310
We get rid of lambda in a bunch of other place. This is equivalent and much
faster. (no new timing as this is the same change as three other changesets)
Spanset are massively used in revset. First because the initial subset itself is
a repo wide spanset. We speed up the __and__ operation by getting rid of a
gratuitous lambda call. A more long terms solution would be to:
1. speed up operation between spansets,
2. have a special smartset for `all` revisions.
In the mean time, this is a very simple fix that buyback some of the performance
regression.
Below is performance benchmark for trival `and` operation between two spansets.
(Run on an unspecified fairly large repository.)
revset tip:0
2.9.2) wall 0.282543 comb 0.280000 user 0.260000 sys 0.020000 (best of 35)
before) wall 0.819181 comb 0.820000 user 0.820000 sys 0.000000 (best of 12)
after) wall 0.645358 comb 0.650000 user 0.650000 sys 0.000000 (best of 16)
Proof of concept implementation of an `all` smartset brings this to 0.10 but it's
too invasive for stable.
Calling a function is super expensive in python. We inline the trivial range
comparison to get back to more sensible performance on common revset operation.
Benchmark result below:
Revision mapping:
0) bced32a3fd6c 2.9.2 release
1) 2ab64f462d81 current @
2) This revision
revset #0: public()
0) wall 0.010890 comb 0.010000 user 0.010000 sys 0.000000 (best of 201)
1) wall 0.012109 comb 0.010000 user 0.010000 sys 0.000000 (best of 199)
2) wall 0.012211 comb 0.020000 user 0.020000 sys 0.000000 (best of 197)
revset #1: :10000 and public()
0) wall 0.007141 comb 0.010000 user 0.010000 sys 0.000000 (best of 361)
1) wall 0.014139 comb 0.010000 user 0.010000 sys 0.000000 (best of 186)
2) wall 0.008334 comb 0.010000 user 0.010000 sys 0.000000 (best of 308)
revset #2: draft()
0) wall 0.009610 comb 0.010000 user 0.010000 sys 0.000000 (best of 279)
1) wall 0.010942 comb 0.010000 user 0.010000 sys 0.000000 (best of 243)
2) wall 0.011036 comb 0.010000 user 0.010000 sys 0.000000 (best of 239)
revset #3: :10000 and draft()
0) wall 0.006852 comb 0.010000 user 0.010000 sys 0.000000 (best of 383)
1) wall 0.014641 comb 0.010000 user 0.010000 sys 0.000000 (best of 183)
2) wall 0.008314 comb 0.010000 user 0.010000 sys 0.000000 (best of 299)
We can see this changeset gains back the regression for `and` operation on
spanset. We are still a bit slowerfor the `public()` and `draft()`. Predicates
not touched by this changeset.
The argument is `x` but the variable tested for filtering is `rev`. `rev`
happens to be a revset methods, ... never part of the filtered revs. This
method is now using `rev` for everything.
For a Mercurial new-comer, the distinction between `contains(x)`,
`file(x)`, and `filelog(x)` in the "revsets" help page may not be
obvious. This commit tries to make things more obvious (text based on
an explanation from Matt in an FB group thread).
Previously we would iterate over every item in the subset, checking if it was in
the provided args. This often meant iterating over every rev in the repo.
Now we iterate over the args provided, checking if they exist in the subset.
On a large repo this brings setting phase boundaries (which use this revset
roots(X:: - X::Y)) down from 0.8 seconds to 0.4 seconds.
The "roots((tip~100::) - (tip~100::tip))" revset in revsetbenchmarks shows it
going from 0.12s to 0.10s, so we should be able to catch regressions here in the
future.
This actually introduces a regression in 'roots(all())' (0.2s to 0.26s) since
we're now using spansets, which are slightly slower to do containment checks on.
I believe this trade off is worth it, since it makes the revset O(number of
args) instead of O(size of repo).
Previously revset._descendants would iterate over the entire subset (which is
often the entire repo) and test if each rev was in the descendants list. This is
really slow on large repos (3+ seconds).
Now we iterate over the descendants and test if they're in the subset.
This affects advancing and retracting the phase boundary (3.5 seconds down to
0.8 seconds, which is even faster than it was in 2.9). Also affects commands
that move the phase boundary (commit and rebase, presumably).
The new revsetbenchmark indicates an improvement from 0.2 to 0.12 seconds. So
future revset changes should be able to notice regressions.
I removed a bad test. It was recently added and tested '1:: and reverse(all())',
which has an amibiguous output direction. Previously it printed in reverse order,
because we iterated over the subset (the reverse part). Now it prints in normal
order because we iterate over the 1:: . Since the revset itself doesn't imply an
order, I removed the test.
min([]) raise a ValueError, we do the same thing in smartset.min() and
smartset.max() for the sake of consistency.
The min/amax test are greatly improved in the process to prevent this familly
of regression
Back in the time where repo.revs(...) returned a list, calling `len(...)` on the
result was quite common. We reinstall this on _addset.
There is absolutely no easy way to test this from the command line. The commands
using this in the evolve extension will eventually land into core.
If two things were iterating over a generatorset at the same time, they could
miss out on the things the other was generating, resulting in incomplete
results. This fixes it by making it possible for two things to iterate at once,
by always checking the _genlist at the beginning of each iteration.
I was only able to repro it with pending changes from my other commits, but they
aren't ready yet. So I'm unable to add a test for now.
_generatorset.__contains__ and __contains__ from child classes were
calling into __iter__ to look for values. Since all
previously-encountered values from the generator were cached and checked
in __contains__ before this iteration, __contains__ was effectively
performing iteration busy work which could lead to an explosion of
redundant work.
This patch changes __contains__ to be more intelligent. Instead of
looking at all values via __iter__, __contains__ will instead go
straight to "new" values from the underlying generator.
On a clone of the Firefox repository with around 200,000 changesets,
this patch decreases the execution time of the revset '::(200067)::'
from ~100s to ~4s on the author's machine. Rebase operations (which use
the aforementioned revset), speed up accordingly.
Formerly an expression like "2.4-rc::" was tokenized as 2.4|-|rc|::.
This allows dashes in symbols iff the whole symbol-like string can be
looked up. Otherwise, it's tokenized as a series of symbols and
operators.
No attempt is made to accept dashed symbols inside larger symbol-like
string, for instance foo-bar or bar-baz inside foo-bar-baz.
This classes have no particular order so they rely on python min() and max()
implementation. This methods will be implemented in every smartset class in
future patches. For other classes there are lazy implementations that can be
made for this methods.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but now we can return it without wrapping it into an
orderedlazyset or a lazyset.
These were the last methods to add for smartset compatibility.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but now will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset
as a private class but we will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This methods state if the class is sorted in an ascending or descending order
We need this to implement methods based on order on smartset classes in order
to be able to create new objects with a given order.
We cannot just rely on a simple boolean since unordered set are neither
ascending nor descending.
We need this method to duck-type generatorset since this class is not going to
be used outside revset.py and we don't need to duck-type baseset.
This sort method will only do something when the addset is not already sorted
or is not sorted in the way we want it to be.