Previously we would iterate over every item in the subset, checking if it was in
the provided args. This often meant iterating over every rev in the repo.
Now we iterate over the args provided, checking if they exist in the subset.
On a large repo this brings setting phase boundaries (which use this revset
roots(X:: - X::Y)) down from 0.8 seconds to 0.4 seconds.
The "roots((tip~100::) - (tip~100::tip))" revset in revsetbenchmarks shows it
going from 0.12s to 0.10s, so we should be able to catch regressions here in the
future.
This actually introduces a regression in 'roots(all())' (0.2s to 0.26s) since
we're now using spansets, which are slightly slower to do containment checks on.
I believe this trade off is worth it, since it makes the revset O(number of
args) instead of O(size of repo).
Previously revset._descendants would iterate over the entire subset (which is
often the entire repo) and test if each rev was in the descendants list. This is
really slow on large repos (3+ seconds).
Now we iterate over the descendants and test if they're in the subset.
This affects advancing and retracting the phase boundary (3.5 seconds down to
0.8 seconds, which is even faster than it was in 2.9). Also affects commands
that move the phase boundary (commit and rebase, presumably).
The new revsetbenchmark indicates an improvement from 0.2 to 0.12 seconds. So
future revset changes should be able to notice regressions.
I removed a bad test. It was recently added and tested '1:: and reverse(all())',
which has an amibiguous output direction. Previously it printed in reverse order,
because we iterated over the subset (the reverse part). Now it prints in normal
order because we iterate over the 1:: . Since the revset itself doesn't imply an
order, I removed the test.
min([]) raise a ValueError, we do the same thing in smartset.min() and
smartset.max() for the sake of consistency.
The min/amax test are greatly improved in the process to prevent this familly
of regression
Back in the time where repo.revs(...) returned a list, calling `len(...)` on the
result was quite common. We reinstall this on _addset.
There is absolutely no easy way to test this from the command line. The commands
using this in the evolve extension will eventually land into core.
If two things were iterating over a generatorset at the same time, they could
miss out on the things the other was generating, resulting in incomplete
results. This fixes it by making it possible for two things to iterate at once,
by always checking the _genlist at the beginning of each iteration.
I was only able to repro it with pending changes from my other commits, but they
aren't ready yet. So I'm unable to add a test for now.
_generatorset.__contains__ and __contains__ from child classes were
calling into __iter__ to look for values. Since all
previously-encountered values from the generator were cached and checked
in __contains__ before this iteration, __contains__ was effectively
performing iteration busy work which could lead to an explosion of
redundant work.
This patch changes __contains__ to be more intelligent. Instead of
looking at all values via __iter__, __contains__ will instead go
straight to "new" values from the underlying generator.
On a clone of the Firefox repository with around 200,000 changesets,
this patch decreases the execution time of the revset '::(200067)::'
from ~100s to ~4s on the author's machine. Rebase operations (which use
the aforementioned revset), speed up accordingly.
Formerly an expression like "2.4-rc::" was tokenized as 2.4|-|rc|::.
This allows dashes in symbols iff the whole symbol-like string can be
looked up. Otherwise, it's tokenized as a series of symbols and
operators.
No attempt is made to accept dashed symbols inside larger symbol-like
string, for instance foo-bar or bar-baz inside foo-bar-baz.
This classes have no particular order so they rely on python min() and max()
implementation. This methods will be implemented in every smartset class in
future patches. For other classes there are lazy implementations that can be
made for this methods.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but now we can return it without wrapping it into an
orderedlazyset or a lazyset.
These were the last methods to add for smartset compatibility.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but now will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset
as a private class but we will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This methods state if the class is sorted in an ascending or descending order
We need this to implement methods based on order on smartset classes in order
to be able to create new objects with a given order.
We cannot just rely on a simple boolean since unordered set are neither
ascending nor descending.
We need this method to duck-type generatorset since this class is not going to
be used outside revset.py and we don't need to duck-type baseset.
This sort method will only do something when the addset is not already sorted
or is not sorted in the way we want it to be.
If the two collections are in ascending order, yield their values in an
ordered way by iterating both at the same time and picking the values to
yield.
When a spanset was being sorted it didn't take into account it's current
state (ascending or descending) and it reversed itself everytime the reverse
parameter was True.
This is not yet used but it will be as soon as the sort revset is changed to
directly use the structures sort method.
Previously the head() revset would iterate over every item in the subset and
check if it was a head. Since the subset is often the entire repo, this was
slow on large repos. Now we iterate over each item in the head list and check if
it's in the subset, which results in much less work.
hg log -r 'head()' on a large repo:
Before: 0.95s
After: 0.28s
Since this class is only going to be used inside revset.py (it does not duck
type baseset) it needs to duck type only a few more methods for the next
patches.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class are not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
Method needed to propagate sort calls amongst lazy structures.
The generated list (stored in the object) is sorted.
If the generated list did not contain all elements from the generator, we
take care of that before sorting the list.