This can simplify the conversion from numeric revision to string. Without it,
we have to handle -1 specially because repo['-1'] != repo[-1].
The -1 revision is not officially documented, but this change makes sense
assuming that "rev(%d)" exists for scripting or third-party tools.
Before this patch, alias declaration is parsed by string base
operations: matching against "^([^(]+)\(([^)]+)\)$" and splitting by
",".
This overlooks many syntax errors like below (see the previous patch
introducing "_parsealiasdecl" for detail):
- un-closed parenthesis causes being treated as "alias symbol"
- symbol/function name aren't examined whether they are valid or not
- invalid argument list causes unexpected argument names
To parse alias declaration strictly, this patch replaces parsing
implementation by "_parsealiasdecl".
This patch tests only one typical declaration error case, because
error detection itself is already tested in the doctest of
"_parsealiasdecl".
This also removes class property "args" and "error", because these are
certainly initialized in "revsetalias.__init__".
Before this patch, any errors in the declaration of revset alias
aren't detected at all, and there is no information about error source
in the error message.
As a part of preparation for parsing alias declarations and
definitions more strictly, this patch stores full detail into
"revsetalias.error" for error source distinction.
This makes raising "Abort" and warning potential errors just use
"revsetalias.error" without any message composing.
Because spanset.isascending() ignored the ascending flag, the result of
"fullreposet() & x" was always sorted in ascending order.
The test case is carefully chosen to call fullreposet.__and__.
With this patch, we can make it much easier to specify 'only(A,B)' ->
A%B. Similarly, 'only(A)' -> A%.
On Windows, '%' is a semi-reserved symbol in the following way: using non-bash
shells (e.g. cmd.exe but NOT PowerShell, ConEmu, and cmder), %var% is only
expanded when 'var' exists and is surrounded by '%'.
That only leaves batch scripts which could prove to be problematic. I posit
that this isn't a big issue because any developer of batch scripts already
knows that to use '%' one needs to escape it by using a double '%%'.
Alternatives to '%' could be '=' but that might be limiting our future if we
ever decide to use temporary assignments in a revset.
Before this patch, referring alias arguments is parsed by string base
operation "str.replace".
This causes problems below (see the previous patch introducing
"_parsealiasdefn" for detail)
- the shorter name argument breaks referring the longer name
- argument names in the quoted string are broken
This patch replaces parsing alias definition by "_parsealiasdefn" to
parse strictly.
Before this patch, there is no way to concatenate strings at runtime.
For example, to search for the issue ID "1234" in descriptions against
all of "issue 1234", "issue:1234", issue1234" and "bug(1234)"
patterns, the revset below should be written fully from scratch for
each issue ID.
grep(r"\bissue[ :]?1234\b|\bbug\(1234\)")
This patch introduces new infix operator "##" to concatenate
strings/symbols at runtime. Operator symbol "##" comes from the same
one of C pre-processor. This concatenation allows parametrizing a part
of strings in revset queries.
In the case of example above, the definition of the revset alias using
operator "##" below can search issue ID "1234" in complicated patterns
by "issue(1234)" simply:
issue($1) = grep(r"\bissue[ :]?" ## $1 ## r"\b|\bbug\(" ## $1 ## r"\)")
"##" operator does:
- concatenate not only strings but also symbols into the string
Exact distinction between strings and symbols seems not to be
convenience, because it is tiresome for users (and
"revset.getstring" treats both similarly)
For example of revset alias "issue()", "issue(1234)" is easier
than "issue('1234')".
- have higher priority than any other prefix, infix and postfix
operators (like as "##" of C pre-processor)
This patch (re-)assigns the priority 20 to "##", and 21 to "(",
because priority 19 is already assigned to "-" as prefix "negate".
Before this patch, a problematic revset alias aborts execution
immediately, even if it isn't referred in the specified revset.
If old "hg" may be used too (for example, bisecting Mercurial itself),
it is also difficult to write alias definitions using features newly
introduced by newer "hg" into configuration files, because such alias
definitions cause unexpected abortion at parsing revset aliases with
old "hg".
This patch delays showing parse error for the revset alias until it is
actually referred at runtime.
This patch detects referring problematic aliases in "_expandaliases"
by examination of "revsetalias.error", which is initialized with the
error message only when parsing fails.
For usability, this patch also warns about problematic aliases, even
if they aren't referred at runtime. This should help users to know
potential problems in their alias definitions earlier.
Before this patch, _revancestors were giving false result when a revision was
duplicated in the input. Duplicated entry are rare but may happen when using the
`%lx` notation internally.
This series has no visible impact on the performance of the function according
to benchmark.
It makes perfect sense for tokens to parse as existing revset symbols
(revset functions), and doesn't break anything, since parsing symbols
as functions works correctly in the presence of parens. For example,
if "only" is a bookmark, this used to error out,
hg log -r "only(only, @)"
which shouldn't, as the inner "only" is unambiguously not a function.
So we just remove the symbolset function and replace its calling site
with the stringset function.
For the tests, we confirm that "date" and "only" are both parsed as
revision names both inside revset expressions (e.g. an expression
containing ::) and inside old-style revision expressions (e.g. those
containing the name of the revision alone).
The recent optimization of "and" operation relies on the assumption that
the rhs set does not contain invalid revisions. So rev() has to remove
invalid revisions.
This is still faster than using `.filter(lambda r: r == l)`.
revset #0: rev(25)
0) wall 0.026341 comb 0.020000 user 0.020000 sys 0.000000 (best of 113)
1) wall 0.000038 comb 0.000000 user 0.000000 sys 0.000000 (best of 66567)
2) wall 0.000062 comb 0.000000 user 0.000000 sys 0.000000 (best of 43699)
(0: 428fa22fb2d1^, 1: 3.2-rc, 2: this patch)
Lazy revset broke the ordering of the `or` revset. We now stop assuming that
two ascending revset are combine into an ascending one.
Behavior in 3.0:
3:4 or 2:5 == [2, 3, 4, 5]
Behavior in 2.9:
3:4 or 2:5 == [3, 4, 2, 5]
We are adding a test for it.
For unclear reason, the performance `or` revset with expensive filter are
getting even worse than they used to be. This is probably caused by extra
uncached containment check or iteration.
revset #9: author(lmoscovicz) or author(mpm)
before) wall 3.487583 comb 3.490000 user 3.490000 sys 0.000000 (best of 3)
after) wall 4.481486 comb 4.480000 user 4.470000 sys 0.010000 (best of 3)
revset #10: author(mpm) or author(lmoscovicz)
before) wall 3.164839 comb 3.170000 user 3.160000 sys 0.010000 (best of 3)
after) wall 4.574965 comb 4.570000 user 4.570000 sys 0.000000 (best of 3)
Changeset 1440ec8e33c0 switched the order of the operand of the "&" computation
to work around an issue from repo-wide spanset. The need for a workaround has been
alleviated by the introduction of `fullreposet`. So we restore it to normal.
The benchmark shows no significant changes as expected.
We also revert the bogus test change introduced by 1440ec8e33c0. The order is
actually important.
Strip executes a revset like this:
max(parents(_intlist('1234\x001235')) - _intlist('1234\x001235'))
Previously the parents() revset would do 'subset & parents' which iterates over
each item in the subset and checks if it's in parents. subset is usually the
entire repo (a spanset) so this takes a while.
Reversing the parameters to be 'parents & subset' means the operation becomes
O(number of parents) instead of O(size of repo). It also means the result gets
evaluated immediately (since parents isn't a lazy set), but I think this is a
win in most scenarios.
This shaves 0.3 seconds off strip (amend/histedit/rebase/etc) for large repositories.
revset #0: parents(20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.006256 comb 0.010000 user 0.010000 sys 0.000000 (best of 289)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000391 comb 0.000000 user 0.000000 sys 0.000000 (best of 4323)
This previously died in _revdescendants() taking the min() of the first set to
only(), when it was empty. An empty second set already worked. Likewise,
descendants() already handled an empty set.
Previously revset._descendants would iterate over the entire subset (which is
often the entire repo) and test if each rev was in the descendants list. This is
really slow on large repos (3+ seconds).
Now we iterate over the descendants and test if they're in the subset.
This affects advancing and retracting the phase boundary (3.5 seconds down to
0.8 seconds, which is even faster than it was in 2.9). Also affects commands
that move the phase boundary (commit and rebase, presumably).
The new revsetbenchmark indicates an improvement from 0.2 to 0.12 seconds. So
future revset changes should be able to notice regressions.
I removed a bad test. It was recently added and tested '1:: and reverse(all())',
which has an amibiguous output direction. Previously it printed in reverse order,
because we iterated over the subset (the reverse part). Now it prints in normal
order because we iterate over the 1:: . Since the revset itself doesn't imply an
order, I removed the test.
min([]) raise a ValueError, we do the same thing in smartset.min() and
smartset.max() for the sake of consistency.
The min/amax test are greatly improved in the process to prevent this familly
of regression
Since recent revset changes, revrange now return a smartset. This smart set
probably does not support indexing (_addset does not). This led to crash.
Instead when the smartset is ordered we use the `min` and `max` method of
smart set. Otherwise we turn is into a list and use indexing on it.
The tests have been updated to catch such regression.
Adds a only() revset that has two forms:
only(<set>) is equivalent to "::<set> - ::(heads() - heads(<set>::))"
only(<include>,<exclude>) is equivalent to "::<include> - ::<exclude>"
On a large repo, this implementation can process/traverse 50,000 revs in 0.7
seconds, versus 4.2 seconds using "::<include> - ::<exclude>".
This is useful for performing histedits on your branch:
hg histedit -r 'first(only(.))'
Or lifting branch foo off of branch bar:
hg rebase -d @ -s 'only(foo, bar)'
Or a variety of other uses.
A missing ancestor expression is any expression of the form (::x - ::y) or
equivalent. Such expressions are remarkably common, and so far have involved
multiple walks down the DAG, followed by a set difference operation.
With this patch, such expressions will be transformed into uses of the fast
algorithm at ancestor.missingancestor.
For a repository with over 600,000 revisions, perfrevset for '::tip - ::-10000'
returns:
Before: ! wall 3.999575 comb 4.000000 user 3.910000 sys 0.090000 (best of 3)
After: ! wall 0.132423 comb 0.130000 user 0.130000 sys 0.000000 (best of 75)
Before this patch, revset predicate "filelog()" uses "match.files()"
to get filename also for the pattern without explicit kind.
But in such case, only canonicalization of relative path is required,
and other initializations of "match" object including regexp
compilation are meaningless.
This patch uses "pathutil.canonpath()" directly for "filelog()"
pattern without explicit kind like "glob:", for efficiency.
This patch also does below as a part of introducing "canonpath()":
- move location of "matchmod.match()" invocation, because "m" is no
more used in "if not matchmod.patkind(pat)" code path
- omit passing "default" argument to "matchmod.match()", because
"pat" should have explicit kind of pattern in this code path
Before this patch, default kind of pattern for revset predicate
"contains()" is treated as the exact file path rooted at the root of
the repository. This decreases usability, because:
- all other predicates taking pattern argument (also "filelog()")
treat such pattern as the path rooted at the current working
directory
- "contains()" doesn't describe this difference in its help
- this difference may confuse users
for example, this prevents revset aliases from sharing same
argument between "contains()" and other predicates
This patch makes default kind of pattern for revset predicate
"contains()" be rooted at the current working directory.
This patch uses "pathutil.canonpath()" instead of creating "match"
object for efficiency.
Some changesets can be wrongly reported as matched by this predicate
due to searching in a string joined with spaces and not individually.
A test case added, which fails without this fix.
Backout 308a153b9120. The changeset prevented closing non-head changesets but
did not provide any rationale or test case and I don't see what value it adds.
Users might have their reasons to commit something anywhere - and close it
immediately.
And contrary to the comment that is removed: The topo heads set is _not_
included in the branch heads set of the current branch. It do not include
closed topological heads.
The change thus prevented closing commits on top of closing commits. A valid
usecase for that is to merge closed heads to reduce the number of topological
heads.
The only existing test coverage for this is the failing double close in
test-revset.t. It was added in dc0e42c06b4e and seems to not be intentional.
Change ancestor to accept 0 or more arguments. The greatest common ancestor of a
single changeset is that changeset. If passed no arguments, the empty list is
returned.
Before this patch, sub expression may return unexpected result, if it
is joined with another expression by "or":
- "^"/parentspec():
"R or R^1" is not equal to "R^1 or R". the former returns only "R".
- "~"/ancestorspec():
"R or R~1" is not equal to "R~1 or R". the former returns only "R".
- ":"/rangeset():
"10 or (10 or 15):" is not equal to "(10 or 15): or 10". the
former returns only 10 and 15 or grater (11 to 14 are not
included).
In "or"-ed expression "A or B", the "subset" passed to evaluation of
"B" doesn't contain revisions gotten from evaluation of "A", for
efficiency.
In the other hand, "stringset()" fails to look corresponding revision
for specified string/symbol up, if "subset" doesn't contain that
revision.
So, predicates looking revisions up indirectly should evaluate sub
expressions of themselves not with passed "subset" but with "entire
revisions in the repository", to prevent "stringset()" from unexpected
failing to look symbols in them up.
But predicates in above example don't so. For example, in the case of
"R or R^1":
1. "R^1" is evaluated with "subset" containing revisions other than
"R", because "R" is already gotten by the former of "or"-ed
expressions
2. "parentspec()" evaluates "R" of "R^1" with such "subset"
3. "stringset()" fails to look "R" up, because "R" is not contained
in "subset"
4. so, evaluation of "R^1" returns no revision
This patch evaluates sub expressions for predicates above with "entire
revisions in the repository".
63af8381691a (released as Mercurial 2.3) introduced the issue that the
revset program started with 40 hexadecimal letters caused unexpected
result at "hg log" execution.
This issue was already fixed by b579dcd15206 (released as 2.3.1), but
there is no test to examine whether this issue is certainly fixed or
not: no test fails even if b579dcd15206 is backed out.
This patch adds test for this issue.
Added test is also confirmed to fail, when it is tested against:
- Mercurial 2.3, or
- Mercurial 2.3.1 or later with backing b579dcd15206 out
The branchpoint() function returns changesets with more than one child.
Eventually I would like to be able to see only branch points and merge
points in a graphical log to see the topology of the repository.
In MSYS, the test fails like this if the hghave exit at the beginning is
removed:
--- C:\Users\adi\hgrepos\hg-main\tests\test-revset.t
+++ C:\Users\adi\hgrepos\hg-main\tests\test-revset.t.err
@@ -58,7 +58,7 @@
$ hg co 3
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ hg branch /a/b/c/
- marked working directory as branch /a/b/c/
+ marked working directory as branch a:/b/c/
(branches are permanent and global, did you want a bookmark?)
$ hg ci -Aqm"5 bug"
@@ -252,7 +252,7 @@
2 a-b-c-
3 +a+b+c+
4 -a-b-c-
- 5 /a/b/c/
+ 5 a:/b/c/
6 _a_b_c_
7 .a.b.c.
$ log 'children(ancestor(4,5))'
due to the posix path conversion done by MSYS globally, as explained here
http://www.mingw.org/wiki/Posix_path_conversion
The solution is a bit lame, but it is simple and works: don't use strings that
look like '/a/b', in order not to trigger the path magic done by MSYS.
So, if we can agree not to insist on testing branch names starting with '/',
then this relatively simple patch makes the test pass both on Windows with MSYS
and Linux.
If the string provided to the 'tag' predicate starts with 're:', the rest
of the string will be treated as a regular expression and matched against
all tags in the repository.
There is a slight backwards-compatibility problem for people who actually
have tags that start with 're:'. As a workaround, these tags can be matched
using a 'literal:' prefix.
If no tags match the pattern, an error is raised. This matches the behaviour
of the previous exact-match code.
The alias expansion code it changed from:
1- Get replacement tree
2- Substitute arguments in the replacement tree
3- Expand the replacement tree again
into:
1- Get the replacement tree
2- Expand the replacement tree
3- Expand the arguments
4- Substitute the expanded arguments in the replacement tree
and fixes cases like:
[revsetalias]
level1($1, $2) = $1 or $2
level2($1, $2) = level1($2, $1)
$ hg log -r "level2(level1(1, 2), 3)"
where the original version incorrectly aborted on infinite expansion
error, because it was confusing the expanded aliases with their
arguments.
The current revset alias expansion code works like:
1- Get the replacement tree
2- Substitute the variables in the replacement tree
3- Expand the replacement tree
It makes it easy to substitute alias arguments because the placeholders
are always replaced before the updated replacement tree is expanded
again. Unfortunately, to fix other alias expansion issues, we need to
reorder the sequence and delay the argument substitution. To solve this,
a new "virtual" construct called _aliasarg() is introduced and injected
when parsing the aliases definitions. Only _aliasarg() will be
substituted in the argument expansion phase instead of all regular
matching string. We also check user inputs do not contain unexpected
_aliasarg() instances to avoid argument injections.
The fast path was triggered if the argument was not like "type:value", with
type a known pattern type. This is wrong for several reasons:
- path:value is valid for the fast path
- '*' is interpreted as a glob by default and is not valid for fast path
Fast path detection is now done after the pattern is parsed, and the normalized
path is extracted for direct comparison. All this seems a bit complicated, it
is tempting to drop the fast path completely. Also, the hasfile() revset does
something similar (only check .files()), without a fast path. If the fast path
is really that efficient maybe it should be used there too.
Note that:
$ log 'modifies("set:modified()")'
is different from:
$ log 'modifies("*")'
because of the usual merge ctx.files()/status(ctx.p1(), ctx) differences.
Reported by Steffen Eichenberg <steffen.eichenberg@msg-gillardon.de>
This adds a couple of tests for the revset "matching" keyword:
1. Test that the 2nd parameter is optional
2. Test that all the 1st argument can be a revset and that all the supported
fields of the 2nd argument work.
The revset aliases expansion worked like:
expr = "some revset"
for alias in aliases:
expr = alias.process(expr)
where "process" was replacing the alias with its *unexpanded* substitution,
recursively. So it only worked when aliases were applied in proper dependency
order.
This patch rewrites the expansion process so all aliases are expanded
recursively at every tree level, after parent alias rewriting and variable
expansion.
some problematic encoding (e.g.: cp932) uses ASCII alphabet characters
in byte sequence of multi byte characters.
"str.lower()" on such byte sequence may treat distinct characters as
same one, and cause unexpected log matching.
this patch uses "encoding.lower()" instead of "str.lower()" to
normalize strings for compare.
The existing code seemed to have incorrect assumptions about how parameter
lists are represented by the parser.
Now the match and replace functions have been merged and simplified by using
getlist().
Like keyword(), but does not search in filenames and users.
No grepdesc() or descgrep() added, because it might be bad to introduce
grepfoo() versions of too many string searches.
When limit, last, min and max were evaluated they worked on a reduced set in the
wrong way. Now they work on an unrestricted set (the whole repo) and get
limited later on.
^ (Nth parent) and ~ (Nth first ancestor) are infix operators that match
certain ancestors of the set:
set^0
the set
set^1 (also available as set^)
the first parent of every changeset in set
set^2
the second parent of every changeset in set
set~0
the set
set~1
the first ancestor (i.e. the first parent) of every changeset in set
set~2
the second ancestor (i.e. first parent of first parent) of every changeset
in set
set~N
the Nth ancestor (following first parents only) of every changeset in set;
set~N is equivalent to set^1^1..., with ^1 repeated N times.
This is valuable because now revsets like 'bookmarks() or tip' will
always show tip after bookmarks unless tip was itself a bookmark. This
is a somewhat contrived example, but this behavior is useful for
"where am I" type aliases that use log and revsets.
For the boolean operators, the subset optimization works by calculating
the cheaper argument first, and passing the subset to the second
argument to restrict the revision domain. This works well for filtering
predicates.
But parents() don't work like a filter: it may return revisions outside the
specified set. So, combining it with boolean operators may easily yield
incorrect results. For instance, for the following revision graph:
0 -- 1
the expression '0 and parents(1)' should evaluate as follows:
0 and parents(1) ->
0 and 0 ->
0
But since [0] is passed to parents() as a subset, we get instead:
0 and parents(1 and 0) ->
0 and parents([]) ->
0 and [] ->
[]
This also affects children(), p1() and p2(), for the same reasons.
Predicates that call these (like heads()) are also affected.
We work around this issue by ignoring the subset when propagating
the call inside those predicates.
This adds support for r'...' and r"..." as string literals. Strings
with the "r" prefix will not have their escape characters interpreted.
This is especially useful for grep(), where, with regular string
literals, \number is interpreted as an octal escape code, and \b is
interpreted as the backspace character (\x08).