Before, a keyvalue node was processed by the last catch-all condition of
_optimize(). Therefore, topo.firstbranch=expr would bypass tree rewriting
and would crash if an expr wasn't trivial.
Before the error was caught at func() as an unknown identifier, and the
optimizer failed to detect the syntax error. This patch introduces getsymbol()
helper to ensure that a string is not allowed as a function name.
The ordering of 'x & head()' was broken in 329d82866742 (revset:
improve head revset performance, 2014-03-13). Presumably due to other
optimizations since then, undoing that change to fix the order does
not slow down the simple case of "hg log -r 'head()'" mentioned in
that commit. I see a small slowdown from ~0.16s to about ~0.19s with
'not 0 & head()', but I'd say it's worth it for the correct output.
This adds mostly broken tests that will be fixed by subsequent patches. We
generally don't do that, but this patch series would be hard to review
without a set of broken tests.
Note that some tests pass thanks to the reordering problem in optimize().
For instance, '2:0 & _intlist(0 1 2)' doesn't fail because it is rewritten
as '_intlist(0 1 2) & 2:0'.
Make it noop as before ddf6bfe09ab2. We could change it to an error, but
allowing empty key makes some sense for scripting that builds a key string
programmatically.
Sort revisions in reverse revision order but grouped by topographical branches.
Visualised as a graph, instead of:
o 4
|
| o 3
| |
| o 2
| |
o | 1
|/
o 0
revisions on a 'main' branch are emitted before 'side' branches:
o 4
|
o 1
|
| o 3
| |
| o 2
|/
o 0
where what constitutes a 'main' branch is configurable, so the sort could also
result in:
o 3
|
o 2
|
| o 4
| |
| o 1
|/
o 0
This sort was already available as an experimental option in the graphmod
module, from which it is now removed.
This sort is best used with hg log -G:
$ hg log -G "sort(all(), topo)"
9981d464ac53 fixed matching() to preserve the order of the input set, but
the test was incorrect. Given "A and B", "A" should be the input set to "B".
But thanks to our optimizer, the test expression was rewritten as
"(2 or 3 or 1) and matching(1 or 2 or 3)", therefore it was working well.
Since I'm going to fix the overall ordering issue, the test needs to be
adjusted to do the right thing.
Unlike range, dagrange has no inverted range (such as '10:0'). So there should
be no practical reason to keep dagrange as a function that forces its own
ordering.
No performance regression is spotted in contrib/base-revsets.txt.
Our invert() function was too clever to not take length into account. I could
fix the problem by appending '\xff' as a terminator (opposite to '\0'), but
it turned out to be slower than simple multi-pass sorting.
New implementation is pretty straightforward, which just calls sort() from the
last key. We can do that since Python sort() is guaranteed to be stable. It
doesn't sound nice to call sort() multiple times, but actually it is faster.
That's probably because we have fewer Python codes in hot loop, and can avoid
heavy string and list manipulation.
revset #0: sort(0:10000, 'branch')
0) 0.412753
1) 0.393254
revset #1: sort(0:10000, '-branch')
0) 0.455377
1) 0.389191 85%
revset #2: sort(0:10000, 'date')
0) 0.408082
1) 0.376332 92%
revset #3: sort(0:10000, '-date')
0) 0.406910
1) 0.380498 93%
revset #4: sort(0:10000, 'desc branch user date rev')
0) 0.542996
1) 0.486397 89%
revset #5: sort(0:10000, '-desc -branch -user -date -rev')
0) 0.965032
1) 0.518426 53%
Cpython and pypy have different way to build and order set, so the result of
list(myset) is different. We work around this by using the sorted version of the
data when displaying a list.
This get pypy closer to pass test-revset.t.
Since _parsealiasdefn() rejects unknown alias arguments, _checkaliasarg() is
unnecessary. New test is added to make sure unknown '$n' symbols are rejected.
In short, this patch moves the hack from tokenizedefn() to _relabelaliasargs(),
which is called after parsing. This change aims to eliminate tight dependency
on the revset tokenizer.
Before this patch, we had to rewrite an alias argument to a pseudo function:
"$1" -> "_aliasarg('$1')"
('symbol', '$1') -> ('function', ('symbol', '_aliasarg'), ('string', '$1'))
This was because the tokenizer must generate tokens that are syntactically
valid. By moving the process to the parsing phase, we can assign a unique tag
to an alias argument.
('symbol', '$1') -> ('_aliasarg', '$1')
Since new _aliasarg node never be generated from a user input, we no longer
have to verify a user input at findaliases(). The test for _aliasarg("$1") is
removed as it is syntactically valid and should pass the parsing phase.
I'm going to refactor the alias processing functions. We need '_aliasarg' tag
to limit the scope of the alias expansion, but it wasn't covered by the test.
This patch adds the test that should fail if '_aliasarg' were 'symbol'.
This is the first half of the second part of the "template alias" series. The
whole series will consist of the following parts:
1. make parsed template tree to be compatible with parser functions
(98f4895a3a97 and d7aaf6f4715f)
2. refactor alias processing to be less dependent on revset module
(1/2 in this series)
3. extract reusable component to parser module
4. clean up it
5. extend it to support template syntax
6. add debugging/testing functions of template aliases
7. add alias expansion routine to templater
A filteredset is heavily used, but it cannot provide a printable information
how given set is filtered because a condition is an arbitrary callable object.
This patch adds an optional "condrepr" object that is used only by repr(). To
minimize the maintaining/runtime overhead of "condrepr", its type is overloaded
as follows:
type example
-------- ---------------------------------
tuple ('<not %r>', other)
str '<branch closed>'
callable lambda: '<branch %r>' % sorted(b)
object other
This patch consists of changes below (these can't be applied
separately).
- replace revset.extpredicate by registrar.revsetpredicate in
extensions
- remove setup() on an instance named as revsetpredicate in
uisetup()/extsetup() of each extensions
registrar.revsetpredicate doesn't have setup() API.
- put new entry for revsetpredicate into extraloaders in dispatch
This causes implicit loading predicate functions at loading
extension.
This loading mechanism requires that an extension has an instance
named as revsetpredicate, and this is reason why
largefiles/__init__.py is also changed in this patch.
Before this patch, test-revset.t tests that all decorated revset
predicates are loaded by explicit setup() at once ("all or nothing").
Now, test-revset.t tests that any revset predicate isn't loaded at
failure of loading extension, because loading itself is executed by
dispatch and it can't be controlled on extension side.
Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can
be expensive, since if Y is a single commit then 'not Y' becomes a huge set and
sometimes the query optimizer doesn't account for it well.
This patch changes revsets to use the built in smartset minus operator, which is
often smarter than 'X and not Y'.
On a large repo this saves 2.2 seconds on rebase and histedit because "X:: - X"
becomes almost instant.
Relevant performance numbers from revsetbenchmark.py
revset #13: roots((tip~100::) - (tip~100::tip))
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.001080 0.001107 0.001102 0.001118 0.001121 0.001114 0.001141 0.001123 0.001099 0.001123 0.001137
1) 0.000708 65% 0.000738 66% 0.000735 66% 0.000739 66% 0.000784 69% 0.000780 70% 0.000807 70% 0.000756 67% 0.000727 66% 0.000759 67% 0.000808 71%
revset #14: roots((0::) - (0::tip))
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.131304 0.079168 0.133129 0.076560 0.048179 0.133349 0.049153 0.077097 0.129689 0.076212 0.048543
1) 0.065066 49% 0.036941 46% 0.066063 49% 0.034755 45% 0.048558 0.071091 53% 0.047679 0.034984 45% 0.064572 49% 0.035680 46% 0.048508
revset #22: (not public() - obsolete())
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.000139 0.000133 0.000133 0.000138 0.000134 0.000155 0.000157 0.000152 0.000157 0.000156 0.000153
1) 0.000108 77% 0.000129 0.000129 0.000134 0.000132 0.000127 81% 0.000151 0.000147 0.000127 80% 0.000152 0.000149
revset #25: (20000::) - (20000)
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.050560 0.045513 0.022593 0.043588 0.021909 0.045517 0.021822 0.044660 0.049740 0.044227 0.021819
1) 0.018614 36% 0.000171 0% 0.019659 87% 0.000168 0% 0.015543 70% 0.021069 46% 0.015623 71% 0.000180 0% 0.018658 37% 0.000186 0% 0.015750 72%
Internal _matchfiles() function can take bunch of arguments, which would
lead to a maximum recursion depth error. This patch avoids the excessive
stack use by flattening 'list' nodes beforehand.
Since getlist() no longer takes a nested 'list' nodes, _parsealiasdecl()
also needs to flatten argument list, "aliasname($1, $2, ...)".
Before this patch, test-revset.t fails on Solaris using ksh as default
"sh", because nested quoting below isn't acceptable for it.
+-----------------------------+ inner quoting
"`python -c "print '|'.join(['0:1'] * 500)"`"
+-------------------------------------------+ outer quoting
This patch does below for portability.
- omit outer quoting
This should be safe, because generated string contains no white
space character.
- use '+' instead of '|' (for safety)
'|' has special meaning for many shell, but '+' isn't (at least,
for ordinary ones).
Previous patch introduced 'revset.predicate' decorator to register
revset predicate function easily.
But it shouldn't be used in extension directly, because it registers
specified function immediately. Registration itself can't be restored,
even if extension loading fails after that.
Therefore, registration should be delayed until 'uisetup()' or so.
This patch uses 'extpredicate' decorator derived from 'delayregistrar'
to register predicate in extension easily.
This patch also tests whether 'registrar.delayregistrar' avoids
function registration if 'setup()' isn't invoked on it, because
'extpredicate' is the first user of it.
It's common for GUI or web frontend to fetch chunk of revisions per batch
size. Previously it was possible only if revisions were sorted by revision
number.
$ hg log -r 'limit({revspec} & :{last_known}, 101)'
So this patch introduces a general way to retrieve chunk of revisions after
skipping offset revisions.
$ hg log -r 'limit({revspec}, 100, {last_count})'
This is a dumb implementation. We can optimize it for baseset and spanset
later.
smartset sorting is lazy (so faster in some case) and better (informs that the
set is sorted allowing some optimisation). So we rely on it directly.
Some test output are updated because we now have more information (ordering).
The odd-range hack was introduced by 9afcfbca1710 for backward compatibility,
but it was too widely applied. I've checked cmdutil.revpair() at 1.6, and
found that ".:", ":0" and ":" are also handled as pairs. So let's enable the
hack only for "x:y", "x:", "y:" and ":".
test-revset.t is updated because "tip^::tip^ or tip^" shouldn't be taken as
an odd range. This patch adds "tip^:tip^" instead.
This patch is written for the default branch because parse() of the stable
branch lacks compatibility hack for "foo+bar" tag. If we want to mitigate the
issue in stable, we can add something like "and '::' in revs[0]".
An empty group expression "()" generates None in AST, so it should be tested
before destructuring a tuple.
"A | ()" is still evaluated to an error because I'm not sure whether "()"
represents an empty set or an empty expression (= a unit value). They are
identical in "or" operation, but they should be evaluated differently in
"and" operation.
expression empty set unit value
---------- --------- ----------
() {} A
A & () {} A
A | () A A
An empty group expression "()" generates None in AST, so the optimizer have
to test it before destructuring a tuple. The error message, "missing argument",
is somewhat obscure, but it should be better than crash.
The next patch will move the exceptional parsing of old-style ranges
to revset.tokenize(). This patch will allow us to see the result tree.
Note that the parsing result of '-a-b-c-' is incorrect at this changeset.
It will be fixed soon.
This is the simplest way to handle wdir() revision in revset. None didn't
work well because revset heavily depends on integer operations such as min(),
max(), sorted(), x:y, etc.
One downside is that we cannot do "wctx.rev() in set" because wctx.rev() is
still None. We could wrap the result set by wdirproxyset that translates None
to wdirrev, but it seems overengineered at this point.
result = getset(repo, subset, tree)
if 'wdir' in funcsused(tree):
result = wdirproxyset(result)
Test cases need the '(all() + wdir()) &' hack because we have yet to fix the
bootstrapping issue of null and wdir.
It will be used as an keyword argument.
Note that our "=" operator is left-associative. In general, the assignment
operator is right-associative, but we don't care because it isn't allowed to
chain "=" operations.
I noticed when I mistyped 'matching', that it suggested '_matchfiles' as well.
Rather than simply exclude names that start with '_', this excludes anything
without a docstring. That way, if it isn't in the help text, it isn't
suggested, such as 'wdir()'.
This reduces the stack depth from O(n) to O(log(n)). Therefore, repeated
-rREV options will never exceed the Python stack limit.
Currently it depends on private revset._combinesets() function. But at some
point, we'll be able to drop the old-style parser, and revrange() can be
completely rewritten without using _combinesets():
trees = [parse(s) for s in revs]
optimize(('or',) + trees) # combine trees and optimize at once
...
Blockers that prevent eliminating old-style parser:
- nullary ":" operator
- revrange(repo, [intrev, ...]), can be mapped to 'rev(%d)' ?
- revrange(repo, [binnode, ...]), should be banned ?
As seen in issue4565 and issue4624, GUI wrappers and automated scripts are
likely to generate a long query that just has numeric revisions joined by 'or'.
One reason why is that they allows users to choose arbitrary revisions from
a list. Because this use case isn't handled well by smartset, let's optimize
it to a plain old list.
Benchmarks:
1) reduce nesting of chained 'or' operations
2) optimize to a list (this patch)
revset #0: 0 + 1 + 2 + ... + 1000
1) wall 0.483341 comb 0.480000 user 0.480000 sys 0.000000 (best of 20)
2) wall 0.025393 comb 0.020000 user 0.020000 sys 0.000000 (best of 107)
revset #1: sort(0 + 1 + 2 + ... + 1000)
1) wall 0.035240 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
2) wall 0.026432 comb 0.030000 user 0.030000 sys 0.000000 (best of 102)
revset #2: first(0 + 1 + 2 + ... + 1000)
1) wall 0.028949 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
2) wall 0.025503 comb 0.030000 user 0.030000 sys 0.000000 (best of 106)
This reduces the stack depth of chained 'or' operations:
- from O(n) to O(1) at the parsing, alias expansion and optimization phases
- from O(n) to O(log(n)) at the evaluation phase
simplifyinfixops() must be applied immediately after the parsing phase.
Otherwise, alias expansion would crash by "maximum recursion depth exceeded"
error.
Test cases use 'x:y|y:z' instead of 'x|y' because I'm planning to optimize
'x|y' in a different way.
Benchmarks:
0) a2acce8dcd95
1) this patch
revset #0: 0 + 1 + 2 + ... + 200
0) wall 0.026347 comb 0.030000 user 0.030000 sys 0.000000 (best of 101)
1) wall 0.023858 comb 0.030000 user 0.030000 sys 0.000000 (best of 112)
revset #1: 0 + 1 + 2 + ... + 1000
0) maximum recursion depth exceeded
1) wall 0.483341 comb 0.480000 user 0.480000 sys 0.000000 (best of 20)
revset #2: sort(0 + 1 + 2 + ... + 200)
0) wall 0.013404 comb 0.010000 user 0.010000 sys 0.000000 (best of 196)
1) wall 0.006814 comb 0.010000 user 0.010000 sys 0.000000 (best of 375)
revset #3: sort(0 + 1 + 2 + ... + 1000)
0) maximum recursion depth exceeded
1) wall 0.035240 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
This warning exists to prevent git users from prematurely polluting
their namespace when trying out Mercurial. But for repos that already
have multiple branches, understanding what branches are is not
optional so we should just shut up.
This fixes the crash caused by "branch(null)" revset. No cache should be
necessary for nullrev because changelog.branchinfo(nullrev) does not involve
IO operation.
Note that the problem of "branch(wdir())" isn't addressed by this patch.
"wdir()" will raise TypeError in many places because of None. This is the
reason why "wdir()" is still experimental.
This patch partially backs out a9a86cbbc5b2 and adds an alternative workaround
to functions that evaluate "null" and "wdir()". Because the new workaround is
incomplete, "first(null)" and "min(null)" don't work as expected. But they were
not usable until 3.4 and "null" isn't commonly used, we can postpone a complete
fix for 3.5.
The issue4682 was caused because "branch(default)" is evaluated to
"<filteredset <fullreposet>>", keeping fullreposet magic. The next patch will
fix crash on "branch(null)", but without this patch, it would make
"null in <branch(default)>" be True, which means "children(branch(default))"
would return all revisions but merge (p2 != null).
I believe the right fix is to stop propagating fullreposet magic on filter(),
but it wouldn't fit to stable release. Also, we should discuss how to handle
"null" and "wdir()" in revset before.
The patch solves two issues:
1. id(unknown_full_hash) aborts, but id(unknown_short_hash) doesn't
2. id(40byte_tag_or_bookmark) returns tagged/bookmarked revision,
but id(non-40byte_tag_or_bookmark) doesn't
After the patch:
1. id(unknown_full_hash) doesn't abort
2. id(40byte_tag_or_bookmark) returns empty set
With "dash" (as "/bin/sh" on Debian GNU/Linux), command execution in
"ENV=val foo bar" style doesn't work as expect in test script files,
if "foo" is user-defined function: it works fine, if "foo" is existing
commands like "hg".
5acecb004820 introduced tests for HGPLAIN and HGPLAINEXCEPT into
test-revset.t, and all of them are in such style.
This patch doesn't:
- add explicit unsetting for HGPLAIN and HGPLAINEXCEPT
they are already introduced by 5acecb004820
- write assignment and exporting in one line
"ENV=val; export ENV" for two or more environment variables in one
line causes failure of test-check-code-hg.t: it is recognized as
"don't export and assign at once" unfortunately.
ui.plain() is supposed to disable config options that change the UI to the
detriment of scripts. As the test demonstrates, revset aliases can actually
override builtin ones, just like command aliases. Therefore I believe this is a
bugfix and appropriate for stable.
I came across a case where an internal command was using a revset that I didn't
immediately pass in and it was difficult to debug what was going wrong with the
revset. This prints out the revset and informs the user that the error is with
a rebset so it should be more obvious what and where the error is.
If self is a smartset and other is a fullreposet, nothing should be necessary.
A small win for trivial query in mozilla-central repo:
revset #0: (0:100000)
0) wall 0.017211 comb 0.020000 user 0.020000 sys 0.000000 (best of 163)
1) wall 0.001324 comb 0.000000 user 0.000000 sys 0.000000 (best of 2160)
The main purpose of wdir() is to annotate working-directory files.
Currently many commands and revsets cannot handle workingctx and may raise
exception. For example, -r ":wdir()" results in TypeError. This problem will
be addressed by future patches.
We could add "wdir" symbol instead, but it would conflict with the existing
tag, bookmark or branch. So I decided not to.
List of commands that will potentially support workingctx revision:
command default remarks
-------- ------- -----------------------------------------------------
annotate p1 useful
archive p1 might be useful
cat p1 might be useful on Windows (no cat)
diff p1:wdir (default)
export p1 might be useful if wctx can have draft commit message
files wdir (default)
grep tip:0 might be useful
identify wdir (default)
locate wdir (default)
log tip:0 might be useful with -p or -G option
parents wdir (default)
status wdir (default)
This patch includes minimal test of "hg status" that should be able to handle
the workingctx revision.
Before this patch, when I have a brain fart and type `hg log -r
'add(foo)'`, hg exits and just says add isn't a function, leading me
to the help page for revset to figure out how to spell the
function. With this patch, it suggests 'adds' as a function I might
have meant.
As per fullreposet.__and__, it can omit the range check of rev. Therefore,
"null" revision is accepted automagically.
It seems this can fix many query results involving null symbol. Originally,
the simplest "(null)" query did fail if there were hidden revisions. Tests
are randomly chosen.
fullreposet mimics the behavior of localrepo, where "null" revision is not
listed but contained.
I'm not sure if "all()" should filter out "null", but "all()" is stated as
'the same as "0:tip"' (except that it doesn't reorder the subset, I think.)
This patch is intended to avoid exposing a fullreposet to graphmod.dagwalker(),
which would result in strange drawing in future version:
|
o changeset: 0:f8035bb17114
| user: test
| date: Thu Jan 01 00:00:00 1970 +0000
| summary: add a
caused by:
parents = sorted(set([p.rev() for p in ctx.parents()
if p.rev() in revs]))
We cannot add "and p.rev() != nullrev" here because revs may actually include
"null" revision.
If a user has a revsetalias defined, it is their explicit wish for
this alias to be parsed as a revset and nothing else. Although the
case of the alias being short enough and only contain the letters a-f
is probably kind of rare, it may still happen.
Before this patch, revset predicate "tag()" and "named('tags')" differ
from each other, because the former doesn't include "tip" but the
latter does.
For equivalence, "named('tags')" shouldn't include the revision
corresponded to "tip". But just removing "tip" from the "tags"
namespace causes breaking backward compatibility, even though "tip"
itself is planned to be eliminated, as mentioned below.
http://selenic.com/pipermail/mercurial-devel/2015-February/066157.html
To mask specific names ("tip" in this case) for "named()" predicate,
this patch introduces "deprecated" into "namespaces", and makes
"named()" predicate examine whether each names are masked by the
namespace, to which they belong.
"named()" will really work correctly after 3.3.1 (see a3c326a7f57a for
detail), and fixing this on STABLE before 3.3.1 can prevent initial
users of "named()" from expecting "named('tags')" to include "tip".
It is reason why this patch is posted for STABLE, even though problem
itself isn't so serious.
This may have to be flagged as "(BC)", if applied on DEFAULT.
Before this patch, revset predicate "named()" uses each nodes gotten
from target namespaces directly.
This causes problems below:
- combination of other predicates doesn't work correctly, because
they assume that revisions are listed up in number
- "hg log" doesn't show any revisions for "named()" result, because:
- "changeset_printer" stores formatted output for each revisions
into dict with revision number (= ctx.rev()) as a key of them
- "changeset_printer.flush(rev)" writes stored output for
the specified revision, but
- "commands.log" invokes it with the node, gotten from "named()"
- "hg debugrevspec" shows nodes (= may be binary) directly
Difference between revset predicate "tag()" and "named('tags')" in
tests is fixed in subsequent patch.
Before this patch, "bookmark()", "named()" and "tag()" predicates
raise "Abort", when the specified pattern doesn't match against
existing ones.
This prevents "present()" predicate from continuing the query, because
it only catches "RepoLookupError".
This patch raises "RepoLookupError" instead of "Abort", to make
"present()" predicate continue the query, even if "bookmark()",
"named()" or "tag()" in the sub-query of it are aborted.
This patch doesn't contain raising "RepoLookupError" for "re:" pattern
in "tag()", because "tag()" treats it differently from others. Actions
of each predicates at failure of pattern matching can be summarized as
below:
predicate "literal:" "re:"
---------- ----------- ------------
bookmark abort abort
named abort abort
tag abort continue (*1)
branch abort continue (*2)
---------- ----------- ------------
"tag()" may have to abort in the (*1) case for similarity, but this
change may break backward compatibility of existing revset queries. It
seems to have to be changed on "default" branch (with "BC" ?).
On the other hand, (*2) seems to be reasonable, even though it breaks
similarity, because "branch()" in this case doesn't check exact
existence of branches, but does pick up revisions of which branch
matches against the pattern.
This patch also adds tests for "branch()" to clarify behavior around
"present()" of similar predicates, even though this patch doesn't
change "branch()".
This can simplify the conversion from numeric revision to string. Without it,
we have to handle -1 specially because repo['-1'] != repo[-1].
The -1 revision is not officially documented, but this change makes sense
assuming that "rev(%d)" exists for scripting or third-party tools.
Before this patch, alias declaration is parsed by string base
operations: matching against "^([^(]+)\(([^)]+)\)$" and splitting by
",".
This overlooks many syntax errors like below (see the previous patch
introducing "_parsealiasdecl" for detail):
- un-closed parenthesis causes being treated as "alias symbol"
- symbol/function name aren't examined whether they are valid or not
- invalid argument list causes unexpected argument names
To parse alias declaration strictly, this patch replaces parsing
implementation by "_parsealiasdecl".
This patch tests only one typical declaration error case, because
error detection itself is already tested in the doctest of
"_parsealiasdecl".
This also removes class property "args" and "error", because these are
certainly initialized in "revsetalias.__init__".
Before this patch, any errors in the declaration of revset alias
aren't detected at all, and there is no information about error source
in the error message.
As a part of preparation for parsing alias declarations and
definitions more strictly, this patch stores full detail into
"revsetalias.error" for error source distinction.
This makes raising "Abort" and warning potential errors just use
"revsetalias.error" without any message composing.
Because spanset.isascending() ignored the ascending flag, the result of
"fullreposet() & x" was always sorted in ascending order.
The test case is carefully chosen to call fullreposet.__and__.
With this patch, we can make it much easier to specify 'only(A,B)' ->
A%B. Similarly, 'only(A)' -> A%.
On Windows, '%' is a semi-reserved symbol in the following way: using non-bash
shells (e.g. cmd.exe but NOT PowerShell, ConEmu, and cmder), %var% is only
expanded when 'var' exists and is surrounded by '%'.
That only leaves batch scripts which could prove to be problematic. I posit
that this isn't a big issue because any developer of batch scripts already
knows that to use '%' one needs to escape it by using a double '%%'.
Alternatives to '%' could be '=' but that might be limiting our future if we
ever decide to use temporary assignments in a revset.
Before this patch, referring alias arguments is parsed by string base
operation "str.replace".
This causes problems below (see the previous patch introducing
"_parsealiasdefn" for detail)
- the shorter name argument breaks referring the longer name
- argument names in the quoted string are broken
This patch replaces parsing alias definition by "_parsealiasdefn" to
parse strictly.
Before this patch, there is no way to concatenate strings at runtime.
For example, to search for the issue ID "1234" in descriptions against
all of "issue 1234", "issue:1234", issue1234" and "bug(1234)"
patterns, the revset below should be written fully from scratch for
each issue ID.
grep(r"\bissue[ :]?1234\b|\bbug\(1234\)")
This patch introduces new infix operator "##" to concatenate
strings/symbols at runtime. Operator symbol "##" comes from the same
one of C pre-processor. This concatenation allows parametrizing a part
of strings in revset queries.
In the case of example above, the definition of the revset alias using
operator "##" below can search issue ID "1234" in complicated patterns
by "issue(1234)" simply:
issue($1) = grep(r"\bissue[ :]?" ## $1 ## r"\b|\bbug\(" ## $1 ## r"\)")
"##" operator does:
- concatenate not only strings but also symbols into the string
Exact distinction between strings and symbols seems not to be
convenience, because it is tiresome for users (and
"revset.getstring" treats both similarly)
For example of revset alias "issue()", "issue(1234)" is easier
than "issue('1234')".
- have higher priority than any other prefix, infix and postfix
operators (like as "##" of C pre-processor)
This patch (re-)assigns the priority 20 to "##", and 21 to "(",
because priority 19 is already assigned to "-" as prefix "negate".
Before this patch, a problematic revset alias aborts execution
immediately, even if it isn't referred in the specified revset.
If old "hg" may be used too (for example, bisecting Mercurial itself),
it is also difficult to write alias definitions using features newly
introduced by newer "hg" into configuration files, because such alias
definitions cause unexpected abortion at parsing revset aliases with
old "hg".
This patch delays showing parse error for the revset alias until it is
actually referred at runtime.
This patch detects referring problematic aliases in "_expandaliases"
by examination of "revsetalias.error", which is initialized with the
error message only when parsing fails.
For usability, this patch also warns about problematic aliases, even
if they aren't referred at runtime. This should help users to know
potential problems in their alias definitions earlier.
Before this patch, _revancestors were giving false result when a revision was
duplicated in the input. Duplicated entry are rare but may happen when using the
`%lx` notation internally.
This series has no visible impact on the performance of the function according
to benchmark.
It makes perfect sense for tokens to parse as existing revset symbols
(revset functions), and doesn't break anything, since parsing symbols
as functions works correctly in the presence of parens. For example,
if "only" is a bookmark, this used to error out,
hg log -r "only(only, @)"
which shouldn't, as the inner "only" is unambiguously not a function.
So we just remove the symbolset function and replace its calling site
with the stringset function.
For the tests, we confirm that "date" and "only" are both parsed as
revision names both inside revset expressions (e.g. an expression
containing ::) and inside old-style revision expressions (e.g. those
containing the name of the revision alone).
The recent optimization of "and" operation relies on the assumption that
the rhs set does not contain invalid revisions. So rev() has to remove
invalid revisions.
This is still faster than using `.filter(lambda r: r == l)`.
revset #0: rev(25)
0) wall 0.026341 comb 0.020000 user 0.020000 sys 0.000000 (best of 113)
1) wall 0.000038 comb 0.000000 user 0.000000 sys 0.000000 (best of 66567)
2) wall 0.000062 comb 0.000000 user 0.000000 sys 0.000000 (best of 43699)
(0: 428fa22fb2d1^, 1: 3.2-rc, 2: this patch)
Lazy revset broke the ordering of the `or` revset. We now stop assuming that
two ascending revset are combine into an ascending one.
Behavior in 3.0:
3:4 or 2:5 == [2, 3, 4, 5]
Behavior in 2.9:
3:4 or 2:5 == [3, 4, 2, 5]
We are adding a test for it.
For unclear reason, the performance `or` revset with expensive filter are
getting even worse than they used to be. This is probably caused by extra
uncached containment check or iteration.
revset #9: author(lmoscovicz) or author(mpm)
before) wall 3.487583 comb 3.490000 user 3.490000 sys 0.000000 (best of 3)
after) wall 4.481486 comb 4.480000 user 4.470000 sys 0.010000 (best of 3)
revset #10: author(mpm) or author(lmoscovicz)
before) wall 3.164839 comb 3.170000 user 3.160000 sys 0.010000 (best of 3)
after) wall 4.574965 comb 4.570000 user 4.570000 sys 0.000000 (best of 3)
Changeset 1440ec8e33c0 switched the order of the operand of the "&" computation
to work around an issue from repo-wide spanset. The need for a workaround has been
alleviated by the introduction of `fullreposet`. So we restore it to normal.
The benchmark shows no significant changes as expected.
We also revert the bogus test change introduced by 1440ec8e33c0. The order is
actually important.
Strip executes a revset like this:
max(parents(_intlist('1234\x001235')) - _intlist('1234\x001235'))
Previously the parents() revset would do 'subset & parents' which iterates over
each item in the subset and checks if it's in parents. subset is usually the
entire repo (a spanset) so this takes a while.
Reversing the parameters to be 'parents & subset' means the operation becomes
O(number of parents) instead of O(size of repo). It also means the result gets
evaluated immediately (since parents isn't a lazy set), but I think this is a
win in most scenarios.
This shaves 0.3 seconds off strip (amend/histedit/rebase/etc) for large repositories.
revset #0: parents(20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.006256 comb 0.010000 user 0.010000 sys 0.000000 (best of 289)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000391 comb 0.000000 user 0.000000 sys 0.000000 (best of 4323)
This previously died in _revdescendants() taking the min() of the first set to
only(), when it was empty. An empty second set already worked. Likewise,
descendants() already handled an empty set.