Our invert() function was too clever to not take length into account. I could
fix the problem by appending '\xff' as a terminator (opposite to '\0'), but
it turned out to be slower than simple multi-pass sorting.
New implementation is pretty straightforward, which just calls sort() from the
last key. We can do that since Python sort() is guaranteed to be stable. It
doesn't sound nice to call sort() multiple times, but actually it is faster.
That's probably because we have fewer Python codes in hot loop, and can avoid
heavy string and list manipulation.
revset #0: sort(0:10000, 'branch')
0) 0.412753
1) 0.393254
revset #1: sort(0:10000, '-branch')
0) 0.455377
1) 0.389191 85%
revset #2: sort(0:10000, 'date')
0) 0.408082
1) 0.376332 92%
revset #3: sort(0:10000, '-date')
0) 0.406910
1) 0.380498 93%
revset #4: sort(0:10000, 'desc branch user date rev')
0) 0.542996
1) 0.486397 89%
revset #5: sort(0:10000, '-desc -branch -user -date -rev')
0) 0.965032
1) 0.518426 53%
This provides a customization point for templater. In templater, there are
two ways to call a unary function: func(x) and x|func. They are processed
differently in templater due to historical reasons, but they should be
handled in the same way while expanding aliases. In short, x|func should be
processed as syntactic sugar for func(x).
_funcnode and _getlist() are replaced by _trygetfunc().
They will be commonly used by revset and templater. It isn't easy to understand
how _expand() works, so I'll add comments by a follow-up patch.
The local variable 'alias' is renamed to 'a' to avoid shadowing the global
'alias' class.
It was odd that the revsetalias did the whole parsing stuff in __init__().
Instead, this patch adds a factory function to the aliasrules class, and
makes the alias (= revsetalias) class a plain-old value object.
We no longer need separate parsers. Only difference between _parsealiasdecl()
and _parsealiasdefn() is whether or not to flatten 'or' tree. Since alias
declaration should have no 'or' operator, there was no practical difference.
The original _parsealiasdefn() function is split into common _builddefn()
and revset-specific _parsealiasdefn(). revset._relabelaliasargs() is removed
as it is no longer used.
The doctests are ported by using the dummy parse().
The original _parsealiasdecl() function is split into common _builddecl()
and revset-specific _parsealiasdecl(). And the original _parsealiasdecl()
call is temporarily replaced by rules._builddecl(), which should be eliminated
later.
The doctests are mostly ported by using the dummy parse(), but the test for
'foo bar' is kept in _parsealiasdecl() as it checks if "pos != len(decl)" is
working. Also, 'foo($1)' test is added to make sure the alias tokenizer can
handle '$1' symbol, which is the only reason why we need _parsealiasdecl().
This class will keep syntax rules that are necessary to parse and expand
aliases. The implementations will be extracted from the revset module. In
order to make the porting easier, this class keeps parsedecl and parsedefn
separately, which will be unified later. Also, getlist and funcnode will
be refactored by future patches for better handling of the template aliases.
The following public functions will be added:
aliasrules.build(decl, defn) -> aliasobj
parse decl and defn into an object that keeps alias name, arguments
and replacement tree.
aliasrules.buildmap(aliasitems) -> aliasdict
helper to build() a dict of alias objects from a list of (decl, defn)
aliasrules.expand(aliasdict, tree) -> tree
expand aliases in tree recursively
Because these functions aren't introduced by this series, there would remain
a few wrapper functions in the revset module. These ugly wrappers should be
eliminated by the next series.
This class is considered an inheritable namespace, which will host only
class/static methods. That's because it won't have no object-scope variables.
I'm not a big fan of using class as a syntax sugar, but I admit it can improve
code readability at some level. So let's give it a try.
It is possible to initialize a baseset directly from a set object. However, in
this case the iteration order was inherited from the set. Set have undefined
iteration order (especially cpython and pypy will have different one) so we
should not rely on it anywhere.
Therefor we declare the baseset "ascending" to enforce a consistent iteration
order. The sorting is done lazily by the baseset class and should have no
performance impact when it does not matter.
This makes test-revset.t pass with pypy.
Cpython and pypy have different way to build and order set, so the result of
list(myset) is different. We work around this by using the sorted version of the
data when displaying a list.
This get pypy closer to pass test-revset.t.
PyPy would sometime call __len__ at points where it things preallocating
the container makes sense. Change the doctests so they're using generator
expressions and not list comprehensions
Since I'm going to extract a common alias parser, I want to eliminate
dependencies to the revset parsing rules. These functions are trivial,
so we can go without them.
If tree is a tuple, it must have at least one element. Also the length of node
tuple is guaranteed by the syntax elements. (e.g. 'func' must have 3 items.)
This change will help inlining these trivial functions in future patches.
Since _parsealiasdefn() rejects unknown alias arguments, _checkaliasarg() is
unnecessary. New test is added to make sure unknown '$n' symbols are rejected.
In short, this patch moves the hack from tokenizedefn() to _relabelaliasargs(),
which is called after parsing. This change aims to eliminate tight dependency
on the revset tokenizer.
Before this patch, we had to rewrite an alias argument to a pseudo function:
"$1" -> "_aliasarg('$1')"
('symbol', '$1') -> ('function', ('symbol', '_aliasarg'), ('string', '$1'))
This was because the tokenizer must generate tokens that are syntactically
valid. By moving the process to the parsing phase, we can assign a unique tag
to an alias argument.
('symbol', '$1') -> ('_aliasarg', '$1')
Since new _aliasarg node never be generated from a user input, we no longer
have to verify a user input at findaliases(). The test for _aliasarg("$1") is
removed as it is syntactically valid and should pass the parsing phase.
Previous patch makes this classes useless by replacing it with
revsetpredicate of registrar.
BTW, extpredicate itself has already been broken by that patch,
because revsetpredicate of registrar doesn't have compatibility with
original predicate (derived from funcregistrar of registrar), in fact.
A filteredset is heavily used, but it cannot provide a printable information
how given set is filtered because a condition is an arbitrary callable object.
This patch adds an optional "condrepr" object that is used only by repr(). To
minimize the maintaining/runtime overhead of "condrepr", its type is overloaded
as follows:
type example
-------- ---------------------------------
tuple ('<not %r>', other)
str '<branch closed>'
callable lambda: '<branch %r>' % sorted(b)
object other
To make all built-in predicates be known to hggettext, loading
built-in predicates by loadpredicate() should be placed before fixing
i18nfunctions but after all of predicate decorating.
revsetpredicate is used to replace revset.predicate and
revset.extpredicate in subsequent patches.
This patch also adds loadpredicate() to revset, because this
combination helps to figure out how the name of safe predicate is put
into safesymbols.
This patch still uses safesymbols set to examine whether the predicate
corresponded to the 'name' is safe from DoS attack or not, because
just setting func._safe property needs changes below for such
examination.
before:
name in revset.safesymbols
after:
getattr(revset.symbols.get(name, None), '_safe', False)
"automatic registration" described in help doc of revsetpredicate
class will be achieved by the subsequent patch, which lists
loadpredicate() up in dispatch.extraloaders.
It's a source of UnboundLocalError to define and use local variables
conditionally. As getstring() always returns a str, "pat" can be initialized
to None.
Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can
be expensive, since if Y is a single commit then 'not Y' becomes a huge set and
sometimes the query optimizer doesn't account for it well.
This patch changes revsets to use the built in smartset minus operator, which is
often smarter than 'X and not Y'.
On a large repo this saves 2.2 seconds on rebase and histedit because "X:: - X"
becomes almost instant.
Relevant performance numbers from revsetbenchmark.py
revset #13: roots((tip~100::) - (tip~100::tip))
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.001080 0.001107 0.001102 0.001118 0.001121 0.001114 0.001141 0.001123 0.001099 0.001123 0.001137
1) 0.000708 65% 0.000738 66% 0.000735 66% 0.000739 66% 0.000784 69% 0.000780 70% 0.000807 70% 0.000756 67% 0.000727 66% 0.000759 67% 0.000808 71%
revset #14: roots((0::) - (0::tip))
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.131304 0.079168 0.133129 0.076560 0.048179 0.133349 0.049153 0.077097 0.129689 0.076212 0.048543
1) 0.065066 49% 0.036941 46% 0.066063 49% 0.034755 45% 0.048558 0.071091 53% 0.047679 0.034984 45% 0.064572 49% 0.035680 46% 0.048508
revset #22: (not public() - obsolete())
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.000139 0.000133 0.000133 0.000138 0.000134 0.000155 0.000157 0.000152 0.000157 0.000156 0.000153
1) 0.000108 77% 0.000129 0.000129 0.000134 0.000132 0.000127 81% 0.000151 0.000147 0.000127 80% 0.000152 0.000149
revset #25: (20000::) - (20000)
plain min max first last reverse rev..rst rev..ast sort sor..rst sor..ast
0) 0.050560 0.045513 0.022593 0.043588 0.021909 0.045517 0.021822 0.044660 0.049740 0.044227 0.021819
1) 0.018614 36% 0.000171 0% 0.019659 87% 0.000168 0% 0.015543 70% 0.021069 46% 0.015623 71% 0.000180 0% 0.018658 37% 0.000186 0% 0.015750 72%
We can now specify from where the merge is performed. The experimental revset
is updated to take revisions as argument, allowing to test the feature.
This will become very useful for pick the 'rebase' default destination. For this
reason, we also exclude all descendants from the rebased set from the candidate
destinations. This descendants exclusion was not necessary for merge as default
destination would not be picked from anything else than a head.
I'm not super excited with the current error messages, but I would prefer to
delay an overall messages rework once 'hg rebase' is done getting a default
destination aligned with 'hg merge'.
Internal _matchfiles() function can take bunch of arguments, which would
lead to a maximum recursion depth error. This patch avoids the excessive
stack use by flattening 'list' nodes beforehand.
Since getlist() no longer takes a nested 'list' nodes, _parsealiasdecl()
also needs to flatten argument list, "aliasname($1, $2, ...)".
On repos with lots of heads, the filelog() code could spend several
minutes decompressing manifests. This change instead tries to
efficiently scan the changelog for candidates and decompress as few
manifests as possible. This is a regression introduced in 3.3 by the
linkrev adjustment code. Prior to that, filelog was nearly instant.
For the repo in the bug report, this improves time of a simple log
command from ~3 minutes to ~.5 seconds, a 360x speedup.
For the main Mercurial repo, a log of commands.py slows down from
1.14s to 1.45s, a 27% slowdown. This is still faster than the file()
revset, which takes 2.1 seconds.
The old _follow revset iterated over every file in the commit and checked if it
matched. For repos with large manifests, this could take 500ms. By switching to
use manifest.matches() we can take advantage of the fastpaths built in to
manifest.py that allows iterating over only the files in the matcher when it's a
simple matcher. This brings the time spent down from 500ms to 0ms during simple
operations like 'hg log -f file.txt'.
Using decorator can localize changes for adding (or removing) a "safe"
revset predicate function in source code.
To avoid accidentaly treating unsuitable predicates as safe, this
patch uses False as default value of "safe" argument. This forces safe
predicates to be decorated with explicit 'safe=True'.
Previous patch introduced 'revset.predicate' decorator to register
revset predicate function easily.
But it shouldn't be used in extension directly, because it registers
specified function immediately. Registration itself can't be restored,
even if extension loading fails after that.
Therefore, registration should be delayed until 'uisetup()' or so.
This patch uses 'extpredicate' decorator derived from 'delayregistrar'
to register predicate in extension easily.
This patch also tests whether 'registrar.delayregistrar' avoids
function registration if 'setup()' isn't invoked on it, because
'extpredicate' is the first user of it.
Using decorator can localize changes for adding (or removing) a revset
predicate function in source code.
It is also useful to pick predicates up for specific purpose. For
example, subsequent patch marks predicates as "safe" by decorator.
This patch defines 'parsefuncdecl()' in 'funcregistrar' class, because
this implementation can be uesd by other decorator class for fileset
predicate and template function.
This patch makes hg log <file|folder> faster by using changelog.readfiles
instead of changelog.read.
On our large repos for hg log <file|folder> -l5 operations that were taking:
- ~8s I see a 25% improvement
- ~15s, I see a 35% improvement
For recently modified folder/file, the difference is negligible as we don't
have to consider many revisions.