The patch solves two issues:
1. id(unknown_full_hash) aborts, but id(unknown_short_hash) doesn't
2. id(40byte_tag_or_bookmark) returns tagged/bookmarked revision,
but id(non-40byte_tag_or_bookmark) doesn't
After the patch:
1. id(unknown_full_hash) doesn't abort
2. id(40byte_tag_or_bookmark) returns empty set
wdir() implementation is still incomplete and shouldn't be advertised to
users. This patch will be backed out when
- template values such as {rev} and {node} are settled
- major commands and revsets work without crashing
I came across a case where an internal command was using a revset that I didn't
immediately pass in and it was difficult to debug what was going wrong with the
revset. This prints out the revset and informs the user that the error is with
a rebset so it should be more obvious what and where the error is.
This will be useful to execute actions after the tree is parsed and
before the revset returns a match. Finding symbols in the parse tree
will later allow hashes of hidden revisions to work on the command
line without the --hidden flag.
If self is a smartset and other is a fullreposet, nothing should be necessary.
A small win for trivial query in mozilla-central repo:
revset #0: (0:100000)
0) wall 0.017211 comb 0.020000 user 0.020000 sys 0.000000 (best of 163)
1) wall 0.001324 comb 0.000000 user 0.000000 sys 0.000000 (best of 2160)
This returns the csets where matching subrepos have changed with respect to the
containing repo's first parent. The second parent shouldn't matter, because it
is either syncing up to the first parent (i.e. it hasn't changed from the
current branch's POV), or the merge changed it with respect to the first parent
(which already adds it to the set).
There's already a 'subrepo' fileset, but it is prefixed with 'set:', so there
should be no ambiguity (in code anyway). The only test I see for it is to
revert subrepos named by a glob pattern (in test-subrepo.t, line 58). Since it
doesn't return a tracked file, neither 'log "set:subrepo()"' nor
'files "set:subrepo()"' print anything. Therefore, it seems useful to have a
revset that will return something for log (and can be added to a revsetalias to
be chained with 'file' revsets.)
It might be nice to be able to filter for added, modified and removed
separately, but add/remove should be rare. It might also be nice to be able to
do a 'contains' check, in addition to this mutated check. Maybe it is possible
to get those with the existing 'adds', 'contains', 'modifies' and 'removes' by
teaching them to chase explicit paths into subrepos.
I'm not sure if this should be added to the 'modifies adds removes' line in
revset.optimize() (since it is doing an AMR check on .hgsubstate), or if it is
OK to put into 'safesymbols' (things like 'file' are on the list, and that takes
a regex, among other patterns).
The main purpose of wdir() is to annotate working-directory files.
Currently many commands and revsets cannot handle workingctx and may raise
exception. For example, -r ":wdir()" results in TypeError. This problem will
be addressed by future patches.
We could add "wdir" symbol instead, but it would conflict with the existing
tag, bookmark or branch. So I decided not to.
List of commands that will potentially support workingctx revision:
command default remarks
-------- ------- -----------------------------------------------------
annotate p1 useful
archive p1 might be useful
cat p1 might be useful on Windows (no cat)
diff p1:wdir (default)
export p1 might be useful if wctx can have draft commit message
files wdir (default)
grep tip:0 might be useful
identify wdir (default)
locate wdir (default)
log tip:0 might be useful with -p or -G option
parents wdir (default)
status wdir (default)
This patch includes minimal test of "hg status" that should be able to handle
the workingctx revision.
Previously we would instantiate the revbranchcache with a repo object, use it
briefly, then require it be passed in every time we wanted to fetch any
information. This seems unnecessary since it's obviously specific to that repo
(since it was constructed with it).
This patch stores the repo on the revbranchcache object, and removes the repo
parameter from the various functions on that class. This has the other nice
benefit of removing the double-revbranchcache-read that existed before (it was
read once for the branch revset, and once for the repo.revbranchcache).
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
As per fullreposet.__and__, it can omit the range check of rev. Therefore,
"null" revision is accepted automagically.
It seems this can fix many query results involving null symbol. Originally,
the simplest "(null)" query did fail if there were hidden revisions. Tests
are randomly chosen.
fullreposet mimics the behavior of localrepo, where "null" revision is not
listed but contained.
fcccbf073394 says we should avoid function calls in __contains__, so
super(fullreposet, self).__contains__(rev) is not an option.
Actually the super call doubled the benchmark result of trivial query:
revisions:
0) 6aa81b0c4658 (tip when I wrote this patch)
1) rev == node.nullrev or super(fullreposet, self).__contains__(rev)
revset #0: tip:0
0) wall 0.008441 comb 0.010000 user 0.010000 sys 0.000000 (best of 282)
1) wall 0.016152 comb 0.010000 user 0.010000 sys 0.000000 (best of 146)
I'm not sure if "all()" should filter out "null", but "all()" is stated as
'the same as "0:tip"' (except that it doesn't reorder the subset, I think.)
This patch is intended to avoid exposing a fullreposet to graphmod.dagwalker(),
which would result in strange drawing in future version:
|
o changeset: 0:f8035bb17114
| user: test
| date: Thu Jan 01 00:00:00 1970 +0000
| summary: add a
caused by:
parents = sorted(set([p.rev() for p in ctx.parents()
if p.rev() in revs]))
We cannot add "and p.rev() != nullrev" here because revs may actually include
"null" revision.
Transitioning to Mercurial versions with revision branch cache could be slow as
long as all operations were readonly (revset queries) and the cache would be
populated but not written back.
Instead, fall back to using the consistently slow path when readonly and the
cache doesn't exist yet. That avoids the overhead of populating the cache
without writing it back.
If not readonly, it will still populate all missing entries initially. That
avoids repeated writing of the cache file with small updates, and it also makes
sure a fully populated cache available for the readonly operations.
Before this patch, revset predicate "tag()" and "named('tags')" differ
from each other, because the former doesn't include "tip" but the
latter does.
For equivalence, "named('tags')" shouldn't include the revision
corresponded to "tip". But just removing "tip" from the "tags"
namespace causes breaking backward compatibility, even though "tip"
itself is planned to be eliminated, as mentioned below.
http://selenic.com/pipermail/mercurial-devel/2015-February/066157.html
To mask specific names ("tip" in this case) for "named()" predicate,
this patch introduces "deprecated" into "namespaces", and makes
"named()" predicate examine whether each names are masked by the
namespace, to which they belong.
"named()" will really work correctly after 3.3.1 (see a3c326a7f57a for
detail), and fixing this on STABLE before 3.3.1 can prevent initial
users of "named()" from expecting "named('tags')" to include "tip".
It is reason why this patch is posted for STABLE, even though problem
itself isn't so serious.
This may have to be flagged as "(BC)", if applied on DEFAULT.
Before this patch, revset predicate "named()" uses each nodes gotten
from target namespaces directly.
This causes problems below:
- combination of other predicates doesn't work correctly, because
they assume that revisions are listed up in number
- "hg log" doesn't show any revisions for "named()" result, because:
- "changeset_printer" stores formatted output for each revisions
into dict with revision number (= ctx.rev()) as a key of them
- "changeset_printer.flush(rev)" writes stored output for
the specified revision, but
- "commands.log" invokes it with the node, gotten from "named()"
- "hg debugrevspec" shows nodes (= may be binary) directly
Difference between revset predicate "tag()" and "named('tags')" in
tests is fixed in subsequent patch.
Before this patch, "bookmark()", "named()" and "tag()" predicates
raise "Abort", when the specified pattern doesn't match against
existing ones.
This prevents "present()" predicate from continuing the query, because
it only catches "RepoLookupError".
This patch raises "RepoLookupError" instead of "Abort", to make
"present()" predicate continue the query, even if "bookmark()",
"named()" or "tag()" in the sub-query of it are aborted.
This patch doesn't contain raising "RepoLookupError" for "re:" pattern
in "tag()", because "tag()" treats it differently from others. Actions
of each predicates at failure of pattern matching can be summarized as
below:
predicate "literal:" "re:"
---------- ----------- ------------
bookmark abort abort
named abort abort
tag abort continue (*1)
branch abort continue (*2)
---------- ----------- ------------
"tag()" may have to abort in the (*1) case for similarity, but this
change may break backward compatibility of existing revset queries. It
seems to have to be changed on "default" branch (with "BC" ?).
On the other hand, (*2) seems to be reasonable, even though it breaks
similarity, because "branch()" in this case doesn't check exact
existence of branches, but does pick up revisions of which branch
matches against the pattern.
This patch also adds tests for "branch()" to clarify behavior around
"present()" of similar predicates, even though this patch doesn't
change "branch()".
This can simplify the conversion from numeric revision to string. Without it,
we have to handle -1 specially because repo['-1'] != repo[-1].
The -1 revision is not officially documented, but this change makes sense
assuming that "rev(%d)" exists for scripting or third-party tools.
When running "hg log 'set:added()'", we create two matchers: one used
for producing the revset and one used for finding files to match. In
185b6b930e8c (graphlog: evaluate FILE/-I/-X filesets on the working
dir, 2012-02-26), we started passing a revision argument along from
what's currently in cmdutil._makelogrevset() to
revset._matchfiles(). When the revision was an empty string, it
referred to the working copy. This was subtly done with "repo[rev or
None]". Then, in 5ff5c5c9e69f (revset: avoid recalculating filesets,
2014-10-22), that conversion from empty string to None was lost. Note
that repo[''] is equivalent to repo['.'], not repo[None].
The consequence of this, to the user, is that when running "hg log
'set:added()'", the file matcher matches files added in the working
copy, while the revset matcher matches revisions that touch files
added in the parent of the working copy. As a result, only revisions
that touch any files added in the parent of the working copy will be
considered, but they will only be included if they also touch files
added in the working copy.
Fix the bug by converting '' to None again, but make it a little more
explicit this time (plus, we now have tests for it).
This change is intended to avoid exposing the implementation detail to
callers. I'm going to extend fullreposet to support "null" revision, so
these mfunc calls will have to use fullreposet() instead of spanset().
Before this patch, collisions between alias argument names in the
declaration are ignored, and this silently causes unexpected alias
evaluation.
This patch checks for such collisions, and aborts (or shows a warning) when
collisions are detected.
This patch doesn't add a test to "test-revset.t", because a doctest is
enough to test the collisions detection itself.
Before this patch, alias declaration is parsed by string base
operations: matching against "^([^(]+)\(([^)]+)\)$" and splitting by
",".
This overlooks many syntax errors like below (see the previous patch
introducing "_parsealiasdecl" for detail):
- un-closed parenthesis causes being treated as "alias symbol"
- symbol/function name aren't examined whether they are valid or not
- invalid argument list causes unexpected argument names
To parse alias declaration strictly, this patch replaces parsing
implementation by "_parsealiasdecl".
This patch tests only one typical declaration error case, because
error detection itself is already tested in the doctest of
"_parsealiasdecl".
This also removes class property "args" and "error", because these are
certainly initialized in "revsetalias.__init__".
This patch introduces "_parsealiasdecl" to parse alias declarations
strictly. For example, "_parsealiasdecl" can detect problems below,
which current implementation can't.
- un-closed parenthesis causes being treated as "alias symbol"
because all of declarations not in "func(....)" style are
recognized as "alias symbol".
for example, "foo($1, $2" is treated as the alias symbol.
- alias symbol/function names aren't examined whether they are valid
as symbol or not
for example, "foo bar" can be treated as the alias symbol, but of
course such invalid symbol can't be referred in revset.
- just splitting argument list by "," causes overlooking syntax
problems in the declaration
for example, all of invalid declarations below are overlooked:
- foo("bar") => taking one argument named as '"bar"'
- foo("unclosed) => taking one argument named as '"unclosed'
- foo(bar::baz) => taking one argument named as 'bar::baz'
- foo(bar($1)) => taking one argument named as 'bar($1)'
To decrease complication of patch, current implementation for alias
declarations is replaced by "_parsealiasdecl" in the subsequent
patch. This patch just introduces it.
This patch defines "_parsealiasdecl" not as a method of "revsetalias"
class but as a one of "revset" module, because of ease of testing by
doctest.
This patch factors some helper functions for "tree" out, because:
- direct accessing like "if tree[0] == 'func' and len(tree) > 1"
decreases readability
- subsequent patch (and also existing code paths, in the future) can
use them for readability
This patch also factors "_tokenizealias" out, because it can be used
also for parsing alias definitions strictly.
Before this patch, any errors in the declaration of revset alias
aren't detected at all, and there is no information about error source
in the error message.
As a part of preparation for parsing alias declarations and
definitions more strictly, this patch stores full detail into
"revsetalias.error" for error source distinction.
This makes raising "Abort" and warning potential errors just use
"revsetalias.error" without any message composing.
This patch defines the composing function not in "ParseError" class but
in "revset" module, because:
- "_()" shouldn't be used in "ParseError", to avoid adding "from
i18n import _" i18n" to "error" module
- generalizing message composition of"ParseError" for all code paths
other than revset isn't the purpose of this patch
we should also take care of showing "unexpected leading
whitespace" for some code paths, to generalize widely.
Before this patch, "tokenize" doesn't recognize the symbol starting
with "$" as a valid one.
This prevents revset alias declarations and definitions from being
parsed with "tokenize", because "$" may be used as the initial letter
of alias arguments.
BTW, the alias argument name doesn't require leading "$" itself, in
fact. But we have to assume that users may use "$" as the initial
letter of argument names in their aliases, because examples in "hg
help revsets" uses such names for a long time.
To make "tokenize" extensible to parse alias declarations and
definitions, this patch introduces optional arguments "syminitletters"
and "symletters". Giving these sets can change the policy of "valid
symbol" in tokenization easily.
This patch keeps original examination of letter validity for
reviewability, even though there is redundant interchanging between
"chr"/"ord" at initialization of "_syminitletters" and "_symletters".
At most 256 times examination (per initialization) is cheaper enough
than revset evaluation itself.
This patch is a part of preparation for parsing alias declarations and
definitions more strictly.
Because spanset.isascending() ignored the ascending flag, the result of
"fullreposet() & x" was always sorted in ascending order.
The test case is carefully chosen to call fullreposet.__and__.
Branch name filtering in revsets was expensive. For every rev it created a
changectx and called .branch() which retrieved the branch name from the
changelog.
Instead, use the revbranchcache.
The revbranchcache is used read-only. The revset implementation with generators
and callbacks makes it hard to figure out when we are done using/updating the
cache and could write it back. It would also be 'tricky' to lock the repo for
writing from within a revset execution. Finally, the branchmap update will
usually make sure that the cache is updated before any revset can be run.
The revbranchcache is used without any locking but is short-lived and used in a
tight loop where we can assume that the changelog doesn't change ... or where
it not is relevant to us if it does.
perfrevset 'branch(mobile)' on mozilla-central.
Before:
! wall 10.989637 comb 10.970000 user 10.940000 sys 0.030000 (best of 3)
After, no cache:
! wall 7.368656 comb 7.370000 user 7.360000 sys 0.010000 (best of 3)
After, with cache:
! wall 0.528098 comb 0.530000 user 0.530000 sys 0.000000 (best of 18)
The performance improvement even without cache come from being based on
branchinfo on the changelog instead of using ctx.branch().
Some tests are added to verify that the revbranchcache works and keep an eye on
when the cache files actually are updated.
It was introduced at deb42ca4dd93, where spanset.__contains__() did not exist.
Nowadays, we have to pay huge penalty for len(subset).
The following example showed that OR operation could be O(n * m^2)
(n: len(repo), m: number of OR operators, m >= 2) probably because of
filteredset.__len__.
revset #0: 0|1|2|3|4|5|6|7|8|9
0) wall 8.092713 comb 8.090000 user 8.090000 sys 0.000000 (best of 3)
1) wall 0.445354 comb 0.450000 user 0.430000 sys 0.020000 (best of 22)
2) wall 0.000389 comb 0.000000 user 0.000000 sys 0.000000 (best of 7347)
(0: 3.2.4, 1: 3.1.2, 2: this patch)
With this patch, we can make it much easier to specify 'only(A,B)' ->
A%B. Similarly, 'only(A)' -> A%.
On Windows, '%' is a semi-reserved symbol in the following way: using non-bash
shells (e.g. cmd.exe but NOT PowerShell, ConEmu, and cmder), %var% is only
expanded when 'var' exists and is surrounded by '%'.
That only leaves batch scripts which could prove to be problematic. I posit
that this isn't a big issue because any developer of batch scripts already
knows that to use '%' one needs to escape it by using a double '%%'.
Alternatives to '%' could be '=' but that might be limiting our future if we
ever decide to use temporary assignments in a revset.
fullreposet.__contains__() will be rewritten in order to support "null"
revision, and "rev()" won't be possible to rely on it.
This backs out 23ac42e12ce5, but there is no performance regression now.
revisions:
0) bd19f94d30e9 "l not in fullreposet(repo)"
1) this patch "l not in repo.changelog"
revset #0: rev(210000)
0) wall 0.000056 comb 0.000000 user 0.000000 sys 0.000000 (best of 48036)
1) wall 0.000049 comb 0.000000 user 0.000000 sys 0.000000 (best of 54969)
Before this patch, referring alias arguments is parsed by string base
operation "str.replace".
This causes problems below (see the previous patch introducing
"_parsealiasdefn" for detail)
- the shorter name argument breaks referring the longer name
- argument names in the quoted string are broken
This patch replaces parsing alias definition by "_parsealiasdefn" to
parse strictly.
This patch introduces "_parsealiasdefn" to parse alias definitions
strictly. For example, it can avoid problems below, which current
implementation can't.
- the shorter name argument breaks referring the longer name one in
the definition, if the former is completely prefix of the latter
for example, the alias definition "foo($1, $10) = $1 or $10" is
parsed as "_aliasarg('$1') or _aliasarg('$1')0" and causes parse
error, because tail "0" of "_aliasarg('$1')0" is invalid.
- argument names in the quoted string are broken
for example, the definition "foo($1) = $1 or desc('$1')" is parsed
as "_aliasarg('$1') or desc('_aliasarg(\'$1\')')" and causes
unexpected description matching against not '$1' but '_aliasarg(\'$1\')'.
To decrease complication of patch, current implementation for alias
definitions is replaced by "_parsealiasdefn" in the subsequent
patch. This patch just introduces it.
This patch defines "_parsealiasdefn" not as a method of "revsetalias"
class but as a one of "revset" module, because of ease of testing by
doctest.