Performance Benchmarking:
$ time hg log -qr "0:: and 0:5"
...
real 0m3.665s
user 0m3.364s
sys 0m0.289s
$ time ./hg log -qr "0:: and 0:5"
...
real 0m0.492s
user 0m0.394s
sys 0m0.097s
This will not improve revsets like "::tip" but will do when that gets
intersected or substracted with another revset.
Performance Benchmarking:
$ time hg log -qr "draft() and ::tip"
...
real 0m3.961s
user 0m3.640s
sys 0m0.313s
$ time ./hg log -qr "draft() and ::tip"
...
real 0m1.080s
user 0m0.987s
sys 0m0.083s
Adds a only() revset that has two forms:
only(<set>) is equivalent to "::<set> - ::(heads() - heads(<set>::))"
only(<include>,<exclude>) is equivalent to "::<include> - ::<exclude>"
On a large repo, this implementation can process/traverse 50,000 revs in 0.7
seconds, versus 4.2 seconds using "::<include> - ::<exclude>".
This is useful for performing histedits on your branch:
hg histedit -r 'first(only(.))'
Or lifting branch foo off of branch bar:
hg rebase -d @ -s 'only(foo, bar)'
Or a variety of other uses.
This method will replace the creation of lazysets inside the revset methods.
Instead, the classes that handle lazy structures will create them based on
their current order.
$ time hg log -qr "first(0:tip or draft())"
...
real 0m1.032s
user 0m0.841s
sys 0m0.179s
$ time ./hg log -qr "first(0:tip or draft())"
...
real 0m0.378s
user 0m0.291s
sys 0m0.085s
Performance Benchmarking:
$ time hg log -qr "first(author(mpm) or branch(default))"
0:3a6a38229d41
real 0m3.875s
user 0m3.818s
sys 0m0.051s
$ time ./hg log -qr "first(author(mpm) or branch(default))"
0:3a6a38229d41
real 0m0.213s
user 0m0.174s
sys 0m0.038s
Instead of using getitem just reverse the revision list and get the first
'lim' elements. With classes like spanset which are easily reversible this
will work faster.
Performance Benchmarking:
$ time hg log -qr "last(all())"
...
real 0m0.569s
user 0m0.447s
sys 0m0.122s
$ time ./hg log -qr "last(all())"
...
real 0m0.215s
user 0m0.150s
sys 0m0.063s
A missing ancestor expression is any expression of the form (::x - ::y) or
equivalent. Such expressions are remarkably common, and so far have involved
multiple walks down the DAG, followed by a set difference operation.
With this patch, such expressions will be transformed into uses of the fast
algorithm at ancestor.missingancestor.
For a repository with over 600,000 revisions, perfrevset for '::tip - ::-10000'
returns:
Before: ! wall 3.999575 comb 4.000000 user 3.910000 sys 0.090000 (best of 3)
After: ! wall 0.132423 comb 0.130000 user 0.130000 sys 0.000000 (best of 75)
Performance Benchmarking:
$ time hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m2.213s
user 0m2.149s
sys 0m0.055s
$ time ./hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m0.177s
user 0m0.137s
sys 0m0.038s
Performance Benchmarking:
$ time hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m2.234s
user 0m2.180s
sys 0m0.044s
$ time ./hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.129s
sys 0m0.042s
This improves the performance of the revsets 'adds' 'modifies' and 'removes'
Performance benchmarking:
$ time hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m2.279s
user 0m2.222s
sys 0m0.053s
$ time ./hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.131s
sys 0m0.041s
$ time hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m2.292s
user 0m2.227s
sys 0m0.061s
$ time ./hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m0.178s
user 0m0.130s
sys 0m0.038s
$ time hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m2.297s
user 0m2.235s
sys 0m0.058s
$ time ./hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m0.975s
user 0m0.797s
sys 0m0.056s
Performance Benchmarking:
$ time hg log -qr "first(public())"
...
real 0m1.184s
user 0m1.051s
sys 0m0.130s
$ time ./hg log -qr "first(public())"
...
real 0m0.548s
user 0m0.427s
sys 0m0.118s
Performance benchmarking:
$ time hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.276s
user 0m0.208s
sys 0m0.047s
$ time ./hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.192s
user 0m0.154s
sys 0m0.027s
Performance benchmarking:
$ time hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m2.214s
user 0m2.163s
sys 0m0.045s
$ time ./hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m0.211s
user 0m0.146s
sys 0m0.035s
Performance benchmarking:
$ time hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m2.210s
user 0m2.158s
sys 0m0.049s
$ time ./hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m0.171s
user 0m0.131s
sys 0m0.035s
Performance Benchmarking:
$ time hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m3.157s
user 0m2.994s
sys 0m0.087s
$ time ./hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m0.509s
user 0m0.289s
sys 0m0.070s
Performance benchmarking:
$ time hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m3.486s
user 0m3.317s
sys 0m0.077s
$ time ./hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m0.551s
user 0m0.295s
sys 0m0.072s
Performance benchmarking:
$ time hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m3.466s
user 0m3.345s
sys 0m0.072s
$ time ./hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m0.365s
user 0m0.199s
sys 0m0.083s
Performance benchmarking:
$ time hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m3.130s
user 0m3.025s
sys 0m0.074s
$ time ./hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m0.300s
user 0m0.198s
sys 0m0.069s
Performance Benchmarking:
$ time hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m3.366s
user 0m3.217s
sys 0m0.095s
$ time ./hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m0.389s
user 0m0.199s
sys 0m0.061s
This class allows us to return values from large revsets as soon as they are
computed instead of having to wait for the entire revset to be calculated.
Added set caching to the baseset class. It lazily builds the set whenever it's
needed and keeps a reference which is returned when the set is requested
instead of being built again.
Before this patch, online help of "adds()", "contains()", "filelog()",
"file()", "modifies()" and "removes()" predicates doesn't explain
about how the pattern without explicit kind like "glob:" is treated,
even though each predicates treat it differently:
- as "relpath:" by "adds()", "modifies()" and "removes()"
- as "glob:" by "file()"
- as special by "contains()" and "filelog()"
- be relative to cwd, and
- match against a file exactly
("relpath:" matches also against a directory)
This may confuse users.
This patch adds explanation about the pattern without explicit kind
to these predicates.
Before this patch, revset predicate "filelog()" uses "match.files()"
to get filename also for the pattern without explicit kind.
But in such case, only canonicalization of relative path is required,
and other initializations of "match" object including regexp
compilation are meaningless.
This patch uses "pathutil.canonpath()" directly for "filelog()"
pattern without explicit kind like "glob:", for efficiency.
This patch also does below as a part of introducing "canonpath()":
- move location of "matchmod.match()" invocation, because "m" is no
more used in "if not matchmod.patkind(pat)" code path
- omit passing "default" argument to "matchmod.match()", because
"pat" should have explicit kind of pattern in this code path
This patch avoids the loop for "match.files()" having always one
element in revset predicate "filelog()" for efficiency: "match" object
"m" is constructed with "[pat]" as "patterns" argument.
Before this patch, default kind of pattern for revset predicate
"contains()" is treated as the exact file path rooted at the root of
the repository. This decreases usability, because:
- all other predicates taking pattern argument (also "filelog()")
treat such pattern as the path rooted at the current working
directory
- "contains()" doesn't describe this difference in its help
- this difference may confuse users
for example, this prevents revset aliases from sharing same
argument between "contains()" and other predicates
This patch makes default kind of pattern for revset predicate
"contains()" be rooted at the current working directory.
This patch uses "pathutil.canonpath()" instead of creating "match"
object for efficiency.
This patch narrows scope of the variable "m" in the function for
revset predicate "contains()", because it is referred only in "else"
code path of "if not matchmod.patkind(pat)" examination.
parse() cannot be called at the same time because a parser object keeps its
states. This is no problem for command-line hg client, but it would cause
strange errors in multi-threaded hgweb.
Creating parser object is not too expensive.
original:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 11.3 usec per loop
thread-safe:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 13.1 usec per loop
Some changesets can be wrongly reported as matched by this predicate
due to searching in a string joined with spaces and not individually.
A test case added, which fails without this fix.
Change ancestor to accept 0 or more arguments. The greatest common ancestor of a
single changeset is that changeset. If passed no arguments, the empty list is
returned.
Before this patch, sub expression may return unexpected result, if it
is joined with another expression by "or":
- "^"/parentspec():
"R or R^1" is not equal to "R^1 or R". the former returns only "R".
- "~"/ancestorspec():
"R or R~1" is not equal to "R~1 or R". the former returns only "R".
- ":"/rangeset():
"10 or (10 or 15):" is not equal to "(10 or 15): or 10". the
former returns only 10 and 15 or grater (11 to 14 are not
included).
In "or"-ed expression "A or B", the "subset" passed to evaluation of
"B" doesn't contain revisions gotten from evaluation of "A", for
efficiency.
In the other hand, "stringset()" fails to look corresponding revision
for specified string/symbol up, if "subset" doesn't contain that
revision.
So, predicates looking revisions up indirectly should evaluate sub
expressions of themselves not with passed "subset" but with "entire
revisions in the repository", to prevent "stringset()" from unexpected
failing to look symbols in them up.
But predicates in above example don't so. For example, in the case of
"R or R^1":
1. "R^1" is evaluated with "subset" containing revisions other than
"R", because "R" is already gotten by the former of "or"-ed
expressions
2. "parentspec()" evaluates "R" of "R^1" with such "subset"
3. "stringset()" fails to look "R" up, because "R" is not contained
in "subset"
4. so, evaluation of "R^1" returns no revision
This patch evaluates sub expressions for predicates above with "entire
revisions in the repository".
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
This changesets add a new `divergent()` revset similar to `unstable()` and
`bumped()` one. Introducting this revset allows actuall test of the divergent
detection.
This replaces unnecessary parentrevs() calls with calculating min(parentset).
Even though the min operation is O(size of parentset), since parentrevs is
relatively expensive, this tradeoff almost always works in our favour. In a
repository with over 400,000 changesets, hg perfrevset "children(X)" takes:
Set X Before After
-1 0.51s 0.06s
-1000: 0.55s 0.08s
-10000: 0.56s 0.10s
-100000: 0.60s 0.25s
-100000:-99000 0.55s 0.19s
0:100000 0.60s 0.61s
all() 0.72s 0.74s
The relative performance is similar for Mercurial's own repository -- several
times faster in most cases, slightly slower for revisions close to 0 and
all().
When commiting to a repo with lots of history (>400000 changesets)
checking the results of revset.py:descendants against the subset takes
some time. Since the subset equals the entire changelog, the check
isn't necessary. Avoiding it in that case saves 0.1 seconds off of
a 1.78 second commit. A 6% gain.
We use the length of the subset to determine if it is the entire repo.
There is precedence for this in revset.py:stringset.
bundle() revset expression returns all changes that are present
in the bundle file (no matter whether they are in the repo or not).
Bundle file should be specified via -R option.
The old name was not very good for two reasons:
- caller does not care about "cache",
- set of revision returned may not be obsolete at all.
The new name was suggested by Kevin Bullock.
For changelog level filtering to take effect it need to be used for any
iteration. Some remaining use of `xrange` in revset code is replace by proper
use of `changelog.revs` or direct iteration over changelog.
We don't need to compute the set of all branchpoints. We can just check the
number of children that element of subset have. The extra work did not seems to
had particular performance impact but the code is simpler this way.
The branchpoint() function returns changesets with more than one child.
Eventually I would like to be able to see only branch points and merge
points in a graphical log to see the topology of the repository.