When the filtered subset has such methods, we can use them. It is implemented
as properties to be able to quickly return None if no corresponding fastasc exists
on the subset.
Instead of having the direction of iteration enforced through the ordering of
`start` and `end` attributes of spanset, we encode the iteration direction in
an explicit attribute and always store start < end. The logic for sort and
reverse has to be updated. The __iter__ is now based on the newly introduced
`fastasc` and `fastdesc` methods.
This will allow other code simplifications in the future.
These two methods are actually silly aliases for `sort()` and
`sort(reverse=True)`. So we get that aliasing at the abstractsmartset level. We
will slowly phase out all the custom implementations and eventually remove any
mentions of it from the code.
Changeset 1440ec8e33c0 switched the order of the operand of the "&" computation
to work around an issue from repo-wide spanset. The need for a workaround has been
alleviated by the introduction of `fullreposet`. So we restore it to normal.
The benchmark shows no significant changes as expected.
We also revert the bogus test change introduced by 1440ec8e33c0. The order is
actually important.
This class documents all methods required by a smartset. This makes it easier
for people to respect the API and ensure we fail loudly when something does
not. It will later also contain common default implementations for multiple
methods, making it easier to have smartset classes with minimal work.
We are about to add a base class for `baseset` with an abstract `__nonzero__`
method. So we need this method to be explicitly defined to avoid issues. The
built-in list object apparently does not have a `__nonzero__` and relies on
`__len__` for this purpose?
Using `x.__contains__(r)` instead of `r in x` does not matter for built-in type
(set) but have a positive impact for all other classes. This will let us drop
some usage of baseset.set() in future patches. This also probably improves some
performance.
Doing manual iteration is expensible. We rely on built in list iteration
whenever possible. The other case has to become a closure we cannot have a both
yield and return in the same function.
This takes advantage of the `fullreposet` smartness with a nice
speedup. It's a similar speedup to `p1()` when a merge is in progress
(the non merge case is already lightning fast anyway.)
This takes advantage of the `fullreposet` smartness and yields a nice
speedup.
revset #0: p1()
0) wall 0.003256 comb 0.010000 user 0.010000 sys 0.000000 (best of 527)
1) wall 0.000066 comb 0.000000 user 0.000000 sys 0.000000 (best of 23224)
This takes advantage of the `fullreposet` smartness and yields a nice
speedup.
revset #0: rev(25)
0) wall 0.005480 comb 0.000000 user 0.000000 sys 0.000000 (best of 305)
1) wall 0.000052 comb 0.000000 user 0.000000 sys 0.000000 (best of 21891)
This takes advantage of the `fullreposet` smartness.
revset #0: origin(tip)
0) wall 0.005353 comb 0.000000 user 0.000000 sys 0.000000 (best of 354)
1) wall 0.003080 comb 0.000000 user 0.000000 sys 0.000000 (best of 446)
This takes advantage of the `fullreposet` smartness.
revset #0: follow(COPYING)
0) wall 0.002446 comb 0.000000 user 0.000000 sys 0.000000 (best of 735)
1) wall 0.000331 comb 0.000000 user 0.000000 sys 0.000000 (best of 5672)
This takes advantage of the `fullreposet` smartness.
revset #0: file(COPYING)
0) wall 3.179066 comb 3.180000 user 3.140000 sys 0.040000 (best of 3)
1) wall 2.723699 comb 2.730000 user 2.690000 sys 0.040000 (best of 4)
This takes advantage of the `fullreposet` smartness.
revset #0: divergent()
0) wall 0.002047 comb 0.000000 user 0.000000 sys 0.000000 (best of 813)
1) wall 0.000052 comb 0.000000 user 0.000000 sys 0.000000 (best of 22757)
This takes advantage of the `fullreposet` smartness.
revset #0: bisect(range)
0) wall 0.014007 comb 0.010000 user 0.010000 sys 0.000000 (best of 115)
1) wall 0.005556 comb 0.010000 user 0.010000 sys 0.000000 (best of 235)
This takes advantage of the `fullreposet` smartness.
revset #0: tip~25
0) wall 0.004800 comb 0.010000 user 0.010000 sys 0.000000 (best of 259)
1) wall 0.002475 comb 0.000000 user 0.000000 sys 0.000000 (best of 717)
This should give us the same benefit as elsewhere. Result is simpler (and
"faster").
Outgoing is dominated by the discovery so no benchmark is provided.
Python lookups are slow, so do all lookup outside of the for loop.
This provide a small but still significant speedup:
revset #0: 0::
0) wall 0.063258 comb 0.060000 user 0.060000 sys 0.000000 (best of 100)
1) wall 0.057776 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
Attribute lookup is slow in python. So this version is going to be a bit
faster. This does not have a visible impact since the rest of the stack is much
slower but this shaves the yak a few extra nanometers.
Moreover the new version is more readable so it worth doing this change for code
quality purpose.
This optimisation was approved by a core python dev.
"And" operation with something that contains the whole repo should be super
cheap. Check method docstring for details.
This provide massive boost to simple revset that use `subset & xxx`
revset #0: p1(20000)
0) wall 0.002447 comb 0.010000 user 0.010000 sys 0.000000 (best of 767)
1) wall 0.000529 comb 0.000000 user 0.000000 sys 0.000000 (best of 3947)
revset #1: p2(10000)
0) wall 0.002464 comb 0.000000 user 0.000000 sys 0.000000 (best of 913)
1) wall 0.000530 comb 0.000000 user 0.000000 sys 0.000000 (best of 4226)
No other regression spotted.
More performance improvements are expected in the future as more
revset predicate are converted to use `subset & xxx`
The relaxed way `fullreposet` handles "&" operation may cause some trouble for
people comparing smartset from different filter levels. I'm not sure such people
exist and we can improve that aspect in later patches.
We rename the `spanset` class to `_spanset`. `spanset` is now a function that
builds either a `fullreposet` or a `_spanset` according to the argument passed.
At some point, we may force people to explicitly use the `fullreposet`
constructor, but the current approach makes it easier to ensure we use the new
class whenever possible and focus on the benefits of this class.
Every revset evaluation starts from `subset = spanset(repo)` and a lot of
revset predicates build a `spansetrepo` for their internal needs.
`spanset` is a generic class that can handle any situation. As a result a lot
of operation between spanset result in an `orderedlazyset`, a safe object but
suboptimal in may situation.
So we introduce a `fullreposet` class where some of the operation will be
overwritten to produce more interesting results.
The old code relied on the subset contents to get rid of invalid values. We would
like to be able to rely more on the computation in parents() so we filter out
the invalid value.
Both paths are doing similar thing in the end. We refactor the function so that
the `ps` set is commonly used at the end.
This will end excluding `nullrev` from this set in a future patch
The old code relied on the subset contents to get rid of invalid values. We would
like to be able to rely more on the computation in p1() and p2() so we filter out
the invalid value
The baseset class is based on a python list. This means that base.__contains__
was absolutely as crappy as list.__contains__. We now rely on __contains__ from
the underlying set.
This will avoid having to explicitly convert the baseset to a set (using
baseset.set()) whenever one want fast membership test.
Apparently there is already code that forgot to do such conversions since we
observe a massive speedup in some test.
revset #25: roots((0::) - (0::tip))
0) wall 2.079454 comb 2.080000 user 2.080000 sys 0.000000 (best of 5)
1) wall 0.132970 comb 0.130000 user 0.130000 sys 0.000000 (best of 65)
No regression is observed in benchmarks.
This change improve the issue4371 back to acceptable situation (but are still
slower than manual substraction)
We can simply use the `self.isascending` value instead of more complex if/else
clause. This get the code simpler.
Benchmarks show no performances harmed in the process.
Before this patches, empty spanset were seen as neither ascending nor
descending. This is mathematically wrong and create some edges case. We put
`isascending` and `isdescending` back on track so we can use them to simplify
some of the spanset code.
Benchmarks show no performances harmed in the process.
The histedit command uses a revset like:
(_intlist('1234\x001235')) and merge()
Previously the optimizer gave a weight of 1.5 to the _intlist side (1 for the
function, 0.5 for the string) which caused it to process the merge() side first.
This caused it to evaluate merge against every commit in the repo, which took
2.5 seconds on a large repo.
I changed the weight of _intlist to 0, since it's a trivial calculation, which
makes it process intlist first, which makes merge apply only to the revs in the
list. Which makes the revset take 0.15 seconds now. Cutting off 2.4 seconds off
our histedit performance.
>From the revset benchmark:
revset #25: (_intlist('20000\x0020001')) and merge()
0) obsolete feature not enabled but 54243 markers found!
! wall 0.036767 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000198 comb 0.000000 user 0.000000 sys 0.000000 (best of 9084)
Strip executes a revset like this:
max(parents(_intlist('1234\x001235')) - _intlist('1234\x001235'))
Previously the parents() revset would do 'subset & parents' which iterates over
each item in the subset and checks if it's in parents. subset is usually the
entire repo (a spanset) so this takes a while.
Reversing the parameters to be 'parents & subset' means the operation becomes
O(number of parents) instead of O(size of repo). It also means the result gets
evaluated immediately (since parents isn't a lazy set), but I think this is a
win in most scenarios.
This shaves 0.3 seconds off strip (amend/histedit/rebase/etc) for large repositories.
revset #0: parents(20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.006256 comb 0.010000 user 0.010000 sys 0.000000 (best of 289)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000391 comb 0.000000 user 0.000000 sys 0.000000 (best of 4323)
Previously descendants() would force the provided subset to become a set. In
the case of revsets like '(%ld::) - (%ld)' (as used by histedit) this would
force the '- (%ld)' set to be evaluated, which produced a set containing every
commit in the repo (except %ld). This takes 0.6s on large repos.
This changes descendants to trust the subset to implement __contains__
efficiently, which improves the above revset to 0.16s. Shaving 0.4 seconds off
of histedit.
revset #27: (20000::) - (20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.023640 comb 0.020000 user 0.020000 sys 0.000000 (best of 100)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.019589 comb 0.020000 user 0.020000 sys 0.000000 (best of 100)
This commit removes the final revset related perf hotspot from histedit.
Combined with the previous two patches, they shave a little over 3 seconds off
histedit on large repos.
f5a63a5506d2 regressed performance of baseset.__sub__ by introducing
a lazyset. This patch restores that lost performance by eagerly
evaluating baseset.__sub__ if the other set is a baseset.
revsetbenchmark.py results impacted by this change:
revset #6: roots(0::tip)
0) wall 2.923473 comb 2.920000 user 2.920000 sys 0.000000 (best of 4)
1) wall 0.077614 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
revset #23: roots((0:tip)::)
0) wall 2.875178 comb 2.880000 user 2.880000 sys 0.000000 (best of 4)
1) wall 0.154519 comb 0.150000 user 0.150000 sys 0.000000 (best of 61)
On the author's machine, this slowdown manifested during evaluation of
'roots(%ln::)' in phases.retractboundary after unbundling the Firefox
repository. Using `time hg unbundle firefox.hg` as a benchmark:
Before: 8:00
After: 4:28
Delta: -3:32
For reference, the subset and cs baseset instances impacted by this
change were of lengths 193634 and 193627, respectively.
Explicit test coverage of roots(%ln::), while similar to the existing
roots(0::tip) benchmark, has been added.
This previously died in _revdescendants() taking the min() of the first set to
only(), when it was empty. An empty second set already worked. Likewise,
descendants() already handled an empty set.
We use the python syntax for range comparison: `a < x < c`. This is shorter,
more readable and less error prone. This comparison escaped the cleanup make in
166d6dde9310
We get rid of lambda in a bunch of other place. This is equivalent and much
faster. (no new timing as this is the same change as three other changesets)
Spanset are massively used in revset. First because the initial subset itself is
a repo wide spanset. We speed up the __and__ operation by getting rid of a
gratuitous lambda call. A more long terms solution would be to:
1. speed up operation between spansets,
2. have a special smartset for `all` revisions.
In the mean time, this is a very simple fix that buyback some of the performance
regression.
Below is performance benchmark for trival `and` operation between two spansets.
(Run on an unspecified fairly large repository.)
revset tip:0
2.9.2) wall 0.282543 comb 0.280000 user 0.260000 sys 0.020000 (best of 35)
before) wall 0.819181 comb 0.820000 user 0.820000 sys 0.000000 (best of 12)
after) wall 0.645358 comb 0.650000 user 0.650000 sys 0.000000 (best of 16)
Proof of concept implementation of an `all` smartset brings this to 0.10 but it's
too invasive for stable.
Calling a function is super expensive in python. We inline the trivial range
comparison to get back to more sensible performance on common revset operation.
Benchmark result below:
Revision mapping:
0) bced32a3fd6c 2.9.2 release
1) 2ab64f462d81 current @
2) This revision
revset #0: public()
0) wall 0.010890 comb 0.010000 user 0.010000 sys 0.000000 (best of 201)
1) wall 0.012109 comb 0.010000 user 0.010000 sys 0.000000 (best of 199)
2) wall 0.012211 comb 0.020000 user 0.020000 sys 0.000000 (best of 197)
revset #1: :10000 and public()
0) wall 0.007141 comb 0.010000 user 0.010000 sys 0.000000 (best of 361)
1) wall 0.014139 comb 0.010000 user 0.010000 sys 0.000000 (best of 186)
2) wall 0.008334 comb 0.010000 user 0.010000 sys 0.000000 (best of 308)
revset #2: draft()
0) wall 0.009610 comb 0.010000 user 0.010000 sys 0.000000 (best of 279)
1) wall 0.010942 comb 0.010000 user 0.010000 sys 0.000000 (best of 243)
2) wall 0.011036 comb 0.010000 user 0.010000 sys 0.000000 (best of 239)
revset #3: :10000 and draft()
0) wall 0.006852 comb 0.010000 user 0.010000 sys 0.000000 (best of 383)
1) wall 0.014641 comb 0.010000 user 0.010000 sys 0.000000 (best of 183)
2) wall 0.008314 comb 0.010000 user 0.010000 sys 0.000000 (best of 299)
We can see this changeset gains back the regression for `and` operation on
spanset. We are still a bit slowerfor the `public()` and `draft()`. Predicates
not touched by this changeset.
The argument is `x` but the variable tested for filtering is `rev`. `rev`
happens to be a revset methods, ... never part of the filtered revs. This
method is now using `rev` for everything.
For a Mercurial new-comer, the distinction between `contains(x)`,
`file(x)`, and `filelog(x)` in the "revsets" help page may not be
obvious. This commit tries to make things more obvious (text based on
an explanation from Matt in an FB group thread).
Previously we would iterate over every item in the subset, checking if it was in
the provided args. This often meant iterating over every rev in the repo.
Now we iterate over the args provided, checking if they exist in the subset.
On a large repo this brings setting phase boundaries (which use this revset
roots(X:: - X::Y)) down from 0.8 seconds to 0.4 seconds.
The "roots((tip~100::) - (tip~100::tip))" revset in revsetbenchmarks shows it
going from 0.12s to 0.10s, so we should be able to catch regressions here in the
future.
This actually introduces a regression in 'roots(all())' (0.2s to 0.26s) since
we're now using spansets, which are slightly slower to do containment checks on.
I believe this trade off is worth it, since it makes the revset O(number of
args) instead of O(size of repo).
Previously revset._descendants would iterate over the entire subset (which is
often the entire repo) and test if each rev was in the descendants list. This is
really slow on large repos (3+ seconds).
Now we iterate over the descendants and test if they're in the subset.
This affects advancing and retracting the phase boundary (3.5 seconds down to
0.8 seconds, which is even faster than it was in 2.9). Also affects commands
that move the phase boundary (commit and rebase, presumably).
The new revsetbenchmark indicates an improvement from 0.2 to 0.12 seconds. So
future revset changes should be able to notice regressions.
I removed a bad test. It was recently added and tested '1:: and reverse(all())',
which has an amibiguous output direction. Previously it printed in reverse order,
because we iterated over the subset (the reverse part). Now it prints in normal
order because we iterate over the 1:: . Since the revset itself doesn't imply an
order, I removed the test.
min([]) raise a ValueError, we do the same thing in smartset.min() and
smartset.max() for the sake of consistency.
The min/amax test are greatly improved in the process to prevent this familly
of regression
Back in the time where repo.revs(...) returned a list, calling `len(...)` on the
result was quite common. We reinstall this on _addset.
There is absolutely no easy way to test this from the command line. The commands
using this in the evolve extension will eventually land into core.
If two things were iterating over a generatorset at the same time, they could
miss out on the things the other was generating, resulting in incomplete
results. This fixes it by making it possible for two things to iterate at once,
by always checking the _genlist at the beginning of each iteration.
I was only able to repro it with pending changes from my other commits, but they
aren't ready yet. So I'm unable to add a test for now.
_generatorset.__contains__ and __contains__ from child classes were
calling into __iter__ to look for values. Since all
previously-encountered values from the generator were cached and checked
in __contains__ before this iteration, __contains__ was effectively
performing iteration busy work which could lead to an explosion of
redundant work.
This patch changes __contains__ to be more intelligent. Instead of
looking at all values via __iter__, __contains__ will instead go
straight to "new" values from the underlying generator.
On a clone of the Firefox repository with around 200,000 changesets,
this patch decreases the execution time of the revset '::(200067)::'
from ~100s to ~4s on the author's machine. Rebase operations (which use
the aforementioned revset), speed up accordingly.
Formerly an expression like "2.4-rc::" was tokenized as 2.4|-|rc|::.
This allows dashes in symbols iff the whole symbol-like string can be
looked up. Otherwise, it's tokenized as a series of symbols and
operators.
No attempt is made to accept dashed symbols inside larger symbol-like
string, for instance foo-bar or bar-baz inside foo-bar-baz.
This classes have no particular order so they rely on python min() and max()
implementation. This methods will be implemented in every smartset class in
future patches. For other classes there are lazy implementations that can be
made for this methods.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but now we can return it without wrapping it into an
orderedlazyset or a lazyset.
These were the last methods to add for smartset compatibility.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but now will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset as a
private class but we will be able to return it without wrapping it into an
orderedlazyset or a lazyset.
This methods are intended to duck-type baseset, so we will still have _addset
as a private class but will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This method is intended to duck-type baseset, so we will still have _addset
as a private class but we will be able return it without wrapping it into an
orderedlazyset or a lazyset.
This methods state if the class is sorted in an ascending or descending order
We need this to implement methods based on order on smartset classes in order
to be able to create new objects with a given order.
We cannot just rely on a simple boolean since unordered set are neither
ascending nor descending.
We need this method to duck-type generatorset since this class is not going to
be used outside revset.py and we don't need to duck-type baseset.
This sort method will only do something when the addset is not already sorted
or is not sorted in the way we want it to be.
If the two collections are in ascending order, yield their values in an
ordered way by iterating both at the same time and picking the values to
yield.
When a spanset was being sorted it didn't take into account it's current
state (ascending or descending) and it reversed itself everytime the reverse
parameter was True.
This is not yet used but it will be as soon as the sort revset is changed to
directly use the structures sort method.
Previously the head() revset would iterate over every item in the subset and
check if it was a head. Since the subset is often the entire repo, this was
slow on large repos. Now we iterate over each item in the head list and check if
it's in the subset, which results in much less work.
hg log -r 'head()' on a large repo:
Before: 0.95s
After: 0.28s
Since this class is only going to be used inside revset.py (it does not duck
type baseset) it needs to duck type only a few more methods for the next
patches.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class is not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
This class are not supposed to be used outside revset.py since it only
wraps content that is used by baseset typed classes.
It only gets created by revset operations or private methods.
Method needed to propagate sort calls amongst lazy structures.
The generated list (stored in the object) is sorted.
If the generated list did not contain all elements from the generator, we
take care of that before sorting the list.
Performance Benchmarking:
$ time hg log -qr "0:: and 0:5"
...
real 0m3.665s
user 0m3.364s
sys 0m0.289s
$ time ./hg log -qr "0:: and 0:5"
...
real 0m0.492s
user 0m0.394s
sys 0m0.097s
This will not improve revsets like "::tip" but will do when that gets
intersected or substracted with another revset.
Performance Benchmarking:
$ time hg log -qr "draft() and ::tip"
...
real 0m3.961s
user 0m3.640s
sys 0m0.313s
$ time ./hg log -qr "draft() and ::tip"
...
real 0m1.080s
user 0m0.987s
sys 0m0.083s
Adds a only() revset that has two forms:
only(<set>) is equivalent to "::<set> - ::(heads() - heads(<set>::))"
only(<include>,<exclude>) is equivalent to "::<include> - ::<exclude>"
On a large repo, this implementation can process/traverse 50,000 revs in 0.7
seconds, versus 4.2 seconds using "::<include> - ::<exclude>".
This is useful for performing histedits on your branch:
hg histedit -r 'first(only(.))'
Or lifting branch foo off of branch bar:
hg rebase -d @ -s 'only(foo, bar)'
Or a variety of other uses.
This method will replace the creation of lazysets inside the revset methods.
Instead, the classes that handle lazy structures will create them based on
their current order.
$ time hg log -qr "first(0:tip or draft())"
...
real 0m1.032s
user 0m0.841s
sys 0m0.179s
$ time ./hg log -qr "first(0:tip or draft())"
...
real 0m0.378s
user 0m0.291s
sys 0m0.085s
Performance Benchmarking:
$ time hg log -qr "first(author(mpm) or branch(default))"
0:3a6a38229d41
real 0m3.875s
user 0m3.818s
sys 0m0.051s
$ time ./hg log -qr "first(author(mpm) or branch(default))"
0:3a6a38229d41
real 0m0.213s
user 0m0.174s
sys 0m0.038s
Instead of using getitem just reverse the revision list and get the first
'lim' elements. With classes like spanset which are easily reversible this
will work faster.
Performance Benchmarking:
$ time hg log -qr "last(all())"
...
real 0m0.569s
user 0m0.447s
sys 0m0.122s
$ time ./hg log -qr "last(all())"
...
real 0m0.215s
user 0m0.150s
sys 0m0.063s
A missing ancestor expression is any expression of the form (::x - ::y) or
equivalent. Such expressions are remarkably common, and so far have involved
multiple walks down the DAG, followed by a set difference operation.
With this patch, such expressions will be transformed into uses of the fast
algorithm at ancestor.missingancestor.
For a repository with over 600,000 revisions, perfrevset for '::tip - ::-10000'
returns:
Before: ! wall 3.999575 comb 4.000000 user 3.910000 sys 0.090000 (best of 3)
After: ! wall 0.132423 comb 0.130000 user 0.130000 sys 0.000000 (best of 75)
Performance Benchmarking:
$ time hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m2.213s
user 0m2.149s
sys 0m0.055s
$ time ./hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m0.177s
user 0m0.137s
sys 0m0.038s
Performance Benchmarking:
$ time hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m2.234s
user 0m2.180s
sys 0m0.044s
$ time ./hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.129s
sys 0m0.042s
This improves the performance of the revsets 'adds' 'modifies' and 'removes'
Performance benchmarking:
$ time hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m2.279s
user 0m2.222s
sys 0m0.053s
$ time ./hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.131s
sys 0m0.041s
$ time hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m2.292s
user 0m2.227s
sys 0m0.061s
$ time ./hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m0.178s
user 0m0.130s
sys 0m0.038s
$ time hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m2.297s
user 0m2.235s
sys 0m0.058s
$ time ./hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m0.975s
user 0m0.797s
sys 0m0.056s
Performance Benchmarking:
$ time hg log -qr "first(public())"
...
real 0m1.184s
user 0m1.051s
sys 0m0.130s
$ time ./hg log -qr "first(public())"
...
real 0m0.548s
user 0m0.427s
sys 0m0.118s
Performance benchmarking:
$ time hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.276s
user 0m0.208s
sys 0m0.047s
$ time ./hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.192s
user 0m0.154s
sys 0m0.027s
Performance benchmarking:
$ time hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m2.214s
user 0m2.163s
sys 0m0.045s
$ time ./hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m0.211s
user 0m0.146s
sys 0m0.035s
Performance benchmarking:
$ time hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m2.210s
user 0m2.158s
sys 0m0.049s
$ time ./hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m0.171s
user 0m0.131s
sys 0m0.035s
Performance Benchmarking:
$ time hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m3.157s
user 0m2.994s
sys 0m0.087s
$ time ./hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m0.509s
user 0m0.289s
sys 0m0.070s
Performance benchmarking:
$ time hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m3.486s
user 0m3.317s
sys 0m0.077s
$ time ./hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m0.551s
user 0m0.295s
sys 0m0.072s
Performance benchmarking:
$ time hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m3.466s
user 0m3.345s
sys 0m0.072s
$ time ./hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m0.365s
user 0m0.199s
sys 0m0.083s
Performance benchmarking:
$ time hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m3.130s
user 0m3.025s
sys 0m0.074s
$ time ./hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m0.300s
user 0m0.198s
sys 0m0.069s
Performance Benchmarking:
$ time hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m3.366s
user 0m3.217s
sys 0m0.095s
$ time ./hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m0.389s
user 0m0.199s
sys 0m0.061s
This class allows us to return values from large revsets as soon as they are
computed instead of having to wait for the entire revset to be calculated.
Added set caching to the baseset class. It lazily builds the set whenever it's
needed and keeps a reference which is returned when the set is requested
instead of being built again.
Before this patch, online help of "adds()", "contains()", "filelog()",
"file()", "modifies()" and "removes()" predicates doesn't explain
about how the pattern without explicit kind like "glob:" is treated,
even though each predicates treat it differently:
- as "relpath:" by "adds()", "modifies()" and "removes()"
- as "glob:" by "file()"
- as special by "contains()" and "filelog()"
- be relative to cwd, and
- match against a file exactly
("relpath:" matches also against a directory)
This may confuse users.
This patch adds explanation about the pattern without explicit kind
to these predicates.
Before this patch, revset predicate "filelog()" uses "match.files()"
to get filename also for the pattern without explicit kind.
But in such case, only canonicalization of relative path is required,
and other initializations of "match" object including regexp
compilation are meaningless.
This patch uses "pathutil.canonpath()" directly for "filelog()"
pattern without explicit kind like "glob:", for efficiency.
This patch also does below as a part of introducing "canonpath()":
- move location of "matchmod.match()" invocation, because "m" is no
more used in "if not matchmod.patkind(pat)" code path
- omit passing "default" argument to "matchmod.match()", because
"pat" should have explicit kind of pattern in this code path
This patch avoids the loop for "match.files()" having always one
element in revset predicate "filelog()" for efficiency: "match" object
"m" is constructed with "[pat]" as "patterns" argument.
Before this patch, default kind of pattern for revset predicate
"contains()" is treated as the exact file path rooted at the root of
the repository. This decreases usability, because:
- all other predicates taking pattern argument (also "filelog()")
treat such pattern as the path rooted at the current working
directory
- "contains()" doesn't describe this difference in its help
- this difference may confuse users
for example, this prevents revset aliases from sharing same
argument between "contains()" and other predicates
This patch makes default kind of pattern for revset predicate
"contains()" be rooted at the current working directory.
This patch uses "pathutil.canonpath()" instead of creating "match"
object for efficiency.
This patch narrows scope of the variable "m" in the function for
revset predicate "contains()", because it is referred only in "else"
code path of "if not matchmod.patkind(pat)" examination.
parse() cannot be called at the same time because a parser object keeps its
states. This is no problem for command-line hg client, but it would cause
strange errors in multi-threaded hgweb.
Creating parser object is not too expensive.
original:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 11.3 usec per loop
thread-safe:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 13.1 usec per loop
Some changesets can be wrongly reported as matched by this predicate
due to searching in a string joined with spaces and not individually.
A test case added, which fails without this fix.
Change ancestor to accept 0 or more arguments. The greatest common ancestor of a
single changeset is that changeset. If passed no arguments, the empty list is
returned.
Before this patch, sub expression may return unexpected result, if it
is joined with another expression by "or":
- "^"/parentspec():
"R or R^1" is not equal to "R^1 or R". the former returns only "R".
- "~"/ancestorspec():
"R or R~1" is not equal to "R~1 or R". the former returns only "R".
- ":"/rangeset():
"10 or (10 or 15):" is not equal to "(10 or 15): or 10". the
former returns only 10 and 15 or grater (11 to 14 are not
included).
In "or"-ed expression "A or B", the "subset" passed to evaluation of
"B" doesn't contain revisions gotten from evaluation of "A", for
efficiency.
In the other hand, "stringset()" fails to look corresponding revision
for specified string/symbol up, if "subset" doesn't contain that
revision.
So, predicates looking revisions up indirectly should evaluate sub
expressions of themselves not with passed "subset" but with "entire
revisions in the repository", to prevent "stringset()" from unexpected
failing to look symbols in them up.
But predicates in above example don't so. For example, in the case of
"R or R^1":
1. "R^1" is evaluated with "subset" containing revisions other than
"R", because "R" is already gotten by the former of "or"-ed
expressions
2. "parentspec()" evaluates "R" of "R^1" with such "subset"
3. "stringset()" fails to look "R" up, because "R" is not contained
in "subset"
4. so, evaluation of "R^1" returns no revision
This patch evaluates sub expressions for predicates above with "entire
revisions in the repository".
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
This changesets add a new `divergent()` revset similar to `unstable()` and
`bumped()` one. Introducting this revset allows actuall test of the divergent
detection.
This replaces unnecessary parentrevs() calls with calculating min(parentset).
Even though the min operation is O(size of parentset), since parentrevs is
relatively expensive, this tradeoff almost always works in our favour. In a
repository with over 400,000 changesets, hg perfrevset "children(X)" takes:
Set X Before After
-1 0.51s 0.06s
-1000: 0.55s 0.08s
-10000: 0.56s 0.10s
-100000: 0.60s 0.25s
-100000:-99000 0.55s 0.19s
0:100000 0.60s 0.61s
all() 0.72s 0.74s
The relative performance is similar for Mercurial's own repository -- several
times faster in most cases, slightly slower for revisions close to 0 and
all().
When commiting to a repo with lots of history (>400000 changesets)
checking the results of revset.py:descendants against the subset takes
some time. Since the subset equals the entire changelog, the check
isn't necessary. Avoiding it in that case saves 0.1 seconds off of
a 1.78 second commit. A 6% gain.
We use the length of the subset to determine if it is the entire repo.
There is precedence for this in revset.py:stringset.
bundle() revset expression returns all changes that are present
in the bundle file (no matter whether they are in the repo or not).
Bundle file should be specified via -R option.
The old name was not very good for two reasons:
- caller does not care about "cache",
- set of revision returned may not be obsolete at all.
The new name was suggested by Kevin Bullock.
For changelog level filtering to take effect it need to be used for any
iteration. Some remaining use of `xrange` in revset code is replace by proper
use of `changelog.revs` or direct iteration over changelog.
We don't need to compute the set of all branchpoints. We can just check the
number of children that element of subset have. The extra work did not seems to
had particular performance impact but the code is simpler this way.
The branchpoint() function returns changesets with more than one child.
Eventually I would like to be able to see only branch points and merge
points in a graphical log to see the topology of the repository.
This changeset introduces caches on the `obsstore` that keeps track of sets of
revisions meaningful for obsolescence related logics. For now they are:
- obsolete: changesets used as precursors (and not public),
- extinct: obsolete changesets with osbolete descendants only,
- unstable: non obsolete changesets with obsolete ancestors.
The cache is accessed using the `getobscache(repo, '<set-name>')` function which
builds the cache on demand. The `clearobscaches(repo)` function takes care of
clearing the caches if any.
Caches are cleared when one of these events happens:
- a new marker is added,
- a new changeset is added,
- some changesets are made public,
- some public changesets are demoted to draft or secret.
Declaration of more sets is made easy because we will have to handle at least
two other "troubles" (latecomer and conflicting).
Caches are now used by revset and changectx. It is usually not much more
expensive to compute the whole set than to check the property of a few elements.
The performance boost is welcome in case we apply obsolescence logic on a lot of
revisions. This makes the feature usable!
"extinct" and "unstable" predicates use "obsolete" implementation
internally, but own predicate name should be used in error messages of
them instead of "obsolete".
This predicate is used to find csets that were created because of a graft,
transplant or rebase --keep. An optional revset can be supplied, in which case
the result will be limited to those copies which specified one of the revs as
the source for the command.
hg log -r destination() # csets copied from anywhere
hg log -r destination(branch(default)) # all csets copied from default
hg log -r origin(x) or destination(origin(x)) # all instances of x
This predicate will follow a cset through different types of copies. Given a
repo with a cset 'S' that is grafted to create G(S), which itself is
transplanted to become T(G(S)):
o-S
/
o-o-G(S)
\
o-T(G(S))
hg log -r destination( S ) # { G(S), T(G(S)) }
hg log -r destination( G(S) ) # { T(G(S)) }
The implementation differences between the three different copy commands (see
the origin() predicate) are not intentionally exposed, however if the
transplant was a graft instead:
hg log -r destination( G(S) ) # {}
because the 'extra' field in G(G(S)) is S, not G(S). The implementation cannot
correct this by following sources before G(S) and then select the csets that
reference those sources because the cset provided to the predicate would also
end up selected. If there were more than two copies, sources of the argument
would also get selected.
Note that the convert extension does not currently update the 'extra' map in its
destination csets, and therefore copies made prior to the convert will be
missing from the resulting set.
Instead of the loop over 'subset', the following almost works, but does not
select a transplant of a transplant. That is, 'destination(S)' will only
select T(S).
dests = set([r for r in subset if _getrevsource(repo, r) in args])
This predicate is used to find the original source of csets created by a graft,
transplant or rebase --keep. If a copied cset is itself copied, only the
source of the original copy is selected.
hg log -r origin() # all src csets, anywhere
hg log -r origin(branch(default)) # all srcs of copies on default
By following through different types of copy commands and only selecting the
original cset, the implementation differences between the copy commands are
hidden. (A graft of a graft preserves the original source in its 'extra' map,
while transplant and rebase use the immediate source specified for the
command).
Given a repo with a cset S that is grafted to create G(S), which itself is
grafted to become G(G(S))
o-S
/
o-o-G(S)
\
o-G(G(S))
hg log -r origin( G(S) ) # { S }
hg log -r origin( G(G(S)) ) # { S }, NOT { G(S) }
Even if the last graft were a transplant
hg log -r origin( T(G(S)) ) # { S }
A rebase without --keep essentially strips the source, so providing the cset
that results to this predicate will yield an empty set.
Note that the convert extension does not currently update the 'extra' map in
its destination csets, and therefore copies made prior to the convert will be
unable to find their source.
`extinct` changesets are obsolete changesets with obsolete descendants only. They
are of no interest anymore and can be:
- exclude from exchange
- hidden to the user in most situation
- safely garbage collected
This changeset just allows mercurial to detect them.
The implementation is a bit naive, as for unstable changesets. We better use a
simple revset query and a cache, but simple version comes first.
An unstable changeset is a changeset *not* obsolete but with some obsolete
ancestors.
The current logic to decide if a changeset is unstable is naive and very
inefficient. A better solution is to compute the set of unstable changeset with
a simple revset and to cache the result. But this require cache invalidation
logic. Simpler version goes first.
The new "diff" field lets you use the matching revset keyword to find revisions
that apply the same change as the selected revisions.
The match must be exact (i.e. same additions, same deletions, same modified
lines and same change context, same file renames and copies).
Two revisions matching their diff must also match their files. Thus, to match
the diff much faster we will always check that the 'files' match first, and only
then check that the 'diff' matches as well.
graft, transplant and rebase all embed a different type of source marker in
extra, and each with a different name. The current implementation of each is
such that there will never be more than one of these markers on a node.
Note that the rebase marker can only be resolved if the source is
still present, which excludes the typical rebase usage (without
--keep) from consideration (unless the resulting bundle in
strip-backup is overlayed). There probably isn't any reason to use
rebase --keep as a substitute for transplant or graft at this point,
but maybe there was at one point and there are even a few rebases in
the hg repo, so it may be of historical interest.
This selects changesets added because of repo conversions. For example
hg log -r "converted()" # all csets created by a convertion
hg log -r "converted(rev)" # the cset converted from rev in the src repo
The converted(rev) form is analogous to remote(id), where the remote repo is
the source of the conversion. This can be useful for cross referencing an old
repository into the current one.
The source revision may be the short changeset hash or the full hash from the
source repository. The local identifier isn't useful. An interesting
ramification of this is if a short revision is specified, it may cause more
than one changeset to be selected. (e.g. converted(6) matches changesets with
a convert_revision field of 6e..e and 67..0)
The convert.hg.saverev option must have been specified when converting the hg
source repository for this to work. The other sources automatically embed the
converted marker.
Caching has no performance effect on the revset aliases which triggered
the recent recursive evaluation bug. I wrote it not to feel bad about
expanding several times the same complicated expression.
If the string provided to the 'tag' predicate starts with 're:', the rest
of the string will be treated as a regular expression and matched against
all tags in the repository.
There is a slight backwards-compatibility problem for people who actually
have tags that start with 're:'. As a workaround, these tags can be matched
using a 'literal:' prefix.
If no tags match the pattern, an error is raised. This matches the behaviour
of the previous exact-match code.
There have been quite a few places where we pop elements off the
front of a list. This can turn O(n) algorithms into something more
like O(n**2). Python has provided a deque type that can do this
efficiently since at least 2.4.
As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
The alias expansion code it changed from:
1- Get replacement tree
2- Substitute arguments in the replacement tree
3- Expand the replacement tree again
into:
1- Get the replacement tree
2- Expand the replacement tree
3- Expand the arguments
4- Substitute the expanded arguments in the replacement tree
and fixes cases like:
[revsetalias]
level1($1, $2) = $1 or $2
level2($1, $2) = level1($2, $1)
$ hg log -r "level2(level1(1, 2), 3)"
where the original version incorrectly aborted on infinite expansion
error, because it was confusing the expanded aliases with their
arguments.
The current revset alias expansion code works like:
1- Get the replacement tree
2- Substitute the variables in the replacement tree
3- Expand the replacement tree
It makes it easy to substitute alias arguments because the placeholders
are always replaced before the updated replacement tree is expanded
again. Unfortunately, to fix other alias expansion issues, we need to
reorder the sequence and delay the argument substitution. To solve this,
a new "virtual" construct called _aliasarg() is introduced and injected
when parsing the aliases definitions. Only _aliasarg() will be
substituted in the argument expansion phase instead of all regular
matching string. We also check user inputs do not contain unexpected
_aliasarg() instances to avoid argument injections.
The original motivation was changectx.phase() had special logic to
correctly lookup in repo._phaserev, including invalidating it when
necessary. And at other places, repo._phaserev was accessed directly.
This led to the discovery that phases state including _phaseroots,
_phaserev and _dirtyphase was manipulated in localrepository.py,
phases.py, repair.py, etc. phasecache helps encapsulating that.
This patch replaces all phase state in localrepo with phasecache and
adjust related code except for advance/retractboundary() in phases.
These still access to phasecache internals directly. This will be
addressed in a followup.
current description of 'matching()' revset predicate can't be
formatted well on "hg help revset" output.
each descriptions for revset predicates (or something like them) are
split-ed into lines, and spaces on left side of them are stripped
before minirst processing. so, bullet list can't be nested.
this patch just flattens description of 'matching()' predicate to be
formatted well.
The fast path was triggered if the argument was not like "type:value", with
type a known pattern type. This is wrong for several reasons:
- path:value is valid for the fast path
- '*' is interpreted as a glob by default and is not valid for fast path
Fast path detection is now done after the pattern is parsed, and the normalized
path is extracted for direct comparison. All this seems a bit complicated, it
is tempting to drop the fast path completely. Also, the hasfile() revset does
something similar (only check .files()), without a fast path. If the fast path
is really that efficient maybe it should be used there too.
Note that:
$ log 'modifies("set:modified()")'
is different from:
$ log 'modifies("*")'
because of the usual merge ctx.files()/status(ctx.p1(), ctx) differences.
Reported by Steffen Eichenberg <steffen.eichenberg@msg-gillardon.de>
match
This patch sorts the fields that are passed to the matching function so that it
always starts by matching those fields that take less time to match.
Not all fields take the same amount of time to match. I've done several
measurements running the following command:
hg --time log -r "matching(1, field)"
on the mercurial repository, and where 'field' was each one of the fields
accepted by match. In order to avoid the print overhead (which could be
different for different fields, given the different number of matches) I used a
modified version of the matching() function which always returns no matches.
These tests showed that different fields take wildly different amounts of time
to match. Particulary the substate field takes up to 25 seconds to match on my
machine, compared to the 0.3 seconds that takes to match the phase field or the
2 seconds (approx) that takes to match most fields. With this patch, matching
both the phase and the substate of a revision takes the same amount of time as
matching the phase.
The field match order introduced by this patch is as follows:
phase, parents, user, date, branch, summary, files, description, substate
An extra nice thing about this patch is that it makes the match time stable.
Rather than getting all the fields that are being matches from every revision
and then comparing them to those of the target revision, compare each field one
by one and stop the match as soon as there is a match failure.
This can greatly reduce the match time when matching multiple fields.
The impact on match time when matching a single field seems negligible
(according to my measurements).
Apparently the "import x as xy" doesn't manage to update xy in the
current scope's dictionary after load, which causes nodemod.nullrev to do a huge amount of demandload magic in the inner loop.
This solves a similar problem than the previous --follow/--rev patch. This time
we need changelog.ancestors()/descendants() filtering on first parent.
Duplicating the code looked better than introducing keyword arguments. Besides,
the ancestors() version was already implemented in follow() revset.
This keyword can be used to find revisions that "match" one or more fields of a
given set of revisions.
A revision matches another if all the selected fields (description, author,
branch, date, files, phase, parents, substate, user, summary and/or metadata)
match the corresponding values of those fields on the source revision.
By default this keyword looks for revisions that whose metadata match
(description, author and date) making it ideal to look for duplicate revisions.
matching takes 2 arguments (the second being optional):
1.- rev: a revset represeting a _single_ revision (e.g. tip, ., p1(.), etc)
2.- [field(s) to match]: an optional string containing the field or fields
(separated by spaces) to match.
Valid fields are most regular context fields and some special fields:
* regular fields:
- description, author, branch, date, files, phase, parents,
substate, user.
Note that author and user are synonyms.
* special fields: summary, metadata.
- summary: matches the first line of the description.
- metatadata: It is equivalent to matching 'description user date'
(i.e. it matches the main metadata fields).
Examples:
1.- Look for revisions with the same metadata (author, description and date)
as the 11th revision:
hg log -r "matching(11)"
2.- Look for revisions with the same description as the 11th revision:
hg log -r "matching(11, description)"
3.- Look for revisions with the same 'summary' (i.e. same first line on their
description) as the 11th revision:
hg log -r "matching(11, summary)"
4.- Look for revisions with the same author as the current revision:
hg log -r "matching(., author)"
You could use 'user' rather than 'author' to get the same result.
5.- Look for revisions with the same description _AND_ author as the tip of the
repository:
hg log -r "matching(tip, 'author description')"
6.- Look for revisions touching the same files as the parent of the tip of the
repository
hg log -r "matching(p1(tip), files)"
7.- Look for revisions whose subrepos are on the same state as the tip of the
repository or its parent
hg log -r "matching(p1(tip):tip, substate)"
8.- Look for revisions whose author and subrepo states both match those of any
of the revisions on the stable branch:
hg log -r "matching(branch(stable), 'author substate')"
When _followfirst() revset was introduced it seemed to be the sole user of such
an argument, so filectx.ancestors() was duplicated and modified instead. It now
appears this argument could be used when computing the set of files to be
considered when --patch or --stat are passed along with --follow FILE.
This subtlety is not documented yet but:
- pats/--include/--exclude filesets are evaluated against the working directory
- --rev filesets are reevaluated against every revisions
log --graph --follow-first FILE cannot be compared with the regular version
because it never worked: --follow-first is not taken in account in
walkchangerevs() fast path and is explicitely bypassed in FILE case in
walkchangerevs() nested iterate() function.
The filtering logic of match objects cannot be reproduced with the existing
revsets as it operates at changeset files level. A changeset touching "a" and
"b" is matched by "-I a -X b" but not by "file(a) and not file(b)".
To solve this, a new internal "_matchfiles(...)" revset is introduced. It works
like "file(x)" but accepts more than one argument and its arguments are
prefixed with "p:", "i:" and "x:" to be used as patterns, include patterns or
exclude patterns respectively.
The _matchfiles revset is kept private for now:
- There are probably smarter ways to pass the arguments in a user-friendly way
- A "rev:" argument is likely appear at some point to emulate log command
behaviour with regard to filesets: they are evaluated for the parent revision
and applied everywhere instead of being reevaluated for each revision.
current documentation for 'remote()' predicate is wrong about
specification of parameters.
there are 3 patterns:
# of
param: id: remote:
- 0 current branch "defult" remote
- 1 specified "defult" remote
- 2 specified specified
The revset aliases expansion worked like:
expr = "some revset"
for alias in aliases:
expr = alias.process(expr)
where "process" was replacing the alias with its *unexpanded* substitution,
recursively. So it only worked when aliases were applied in proper dependency
order.
This patch rewrites the expansion process so all aliases are expanded
recursively at every tree level, after parent alias rewriting and variable
expansion.
Previously we always included '.', which may not touch a file.
Instead, find the file revision present in '.' and add its linkrev.
This matches the results of 'hg log --follow file'.
The large or-expressions we used to build required a substantial
amount of subset filtering in orset() which was inefficient. Instead we build a
single string which we process in one go with a special internal predicate.
Simplifies client logic in multiple places since it encapsulates the
computation of the common and, more importantly, the missing node lists.
This also allows an upcomping patch to communicate precomputed versions of
these lists to clients.
some problematic encoding (e.g.: cp932) uses ASCII alphabet characters
in byte sequence of multi byte characters.
"str.lower()" on such byte sequence may treat distinct characters as
same one, and cause unexpected log matching.
this patch uses "encoding.lower()" instead of "str.lower()" to
normalize strings for compare.
This allows folding external revsets or lists of revsets into a revset
expression. Revsets are pre-parsed for validity so that syntax errors
don't escape.
This patch adds two new revset descriptions:
- 'goods': the list of topologicaly-good csets:
- if good csets are topologically before bad csets, yields '::good'
- else, yields 'good::'
- and conversely for 'bads'
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
The 'ignored' changesets are outside the bisection range, but are
changesets that may have an impact on the outcome of the bisection.
For example, in case there's a merge between the good and bad csets,
but the branch-point is out of the bisection range, and the issue
originates from this branch, the branch will not be visited by bisect
and bisect will find that the culprit cset is the merge.
So, the 'ignored' set is equivalent to:
( ( ::bisect(bad) - ::bisect(good) )
| ( ::bisect(good) - ::bisect(bad) ) )
- bisect(range)
- all ancestors of bad csets that are not ancestors of good csets, or
- all ancestors of good csets that are not ancestors of bad csets
- but that are not in the bisection range.
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
Use repo.set() wherever possible, instead of locally trying to
reproduce complex graph computations.
'pruned' now means 'all csets that will no longer be visited by the
bisection'. The change is done is this very patch instead of its own
dedicated one becasue the code changes all over the place, and the
previous 'pruned' code was totally rewritten by the cleanup, so it
was easier to just change the behavior at the same time.
The previous series went in too fast for this cleanup pass to be
included, so here it is. ;-)
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
The 'untested' set is made of changesets that are in the bisection range
but for which the status is still unknown, and that can later be used to
further decide on the bisection outcome.
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
The 'pruned' set is made of changesets that did participate to
the bisection. They are made of
- all good changesets
- all bad changsets
- all skipped changesets, provided they are in the bisection range
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
The 'range' set is made of all changesets that make the bisection
range, that is
- csets that are ancestors of bad csets and descendants of good csets
or
- csets that are ancestors of good csets and descendants of bad csets
That is, roughly equivalent of:
bisect(good)::bisect(bad) | bisect(bad)::bisect(good)
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
Computing the ranges of csets in the bisection belongs to the hbisect
code. This allows for reusing the status computation from many places,
not only the revset code, but also to later display the bisection status
of a cset...
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
Rename the 'bisected' keyword to simply 'bisect'.
Still accept the old name, but no longer advertise it.
As discussed with Matt on IRC.
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
Given an operator ^ that's either postfix or infix and an operator :
that's either prefix or infix, the parser can't figure out the right
thing to do. So we rewrite the expression to be sensible in the optimizer.
The existing code seemed to have incorrect assumptions about how parameter
lists are represented by the parser.
Now the match and replace functions have been merged and simplified by using
getlist().
Like keyword(), but does not search in filenames and users.
No grepdesc() or descgrep() added, because it might be bad to introduce
grepfoo() versions of too many string searches.
- Handle 'subset' argument
- Stop returning the null rev from p1 and parents, as in the non-dirstate case
- Order parents as in the non-dirstate case (ascending revs)
This patch makes the 'set' argument to revset function parents() optional.
Like p1() and p2(), if no argument is given, returns the parent(s) of the
working directory.
Morally equivalent to 'p1()+p2()', as expected.
This patch makes the 'set' argument to revset functions p1() and p2()
optional. If no argument is given, p1() and p2() return the first or second
parent of the working directory.
If the working directory is not an in-progress merge (no 2nd parent), p2()
returns the empty set. For a checkout of the null changeset, both p1() and
p2() return the empty set.