It would be nice if we could detect recursion at the parsing phase, but we
can't because a template can refer to a keyword of the same name. For example,
"rev = {rev}" is valid if rev is a keyword, and we don't know if rev is a
keyword or a template while parsing.
This is necessary to obtain a _hybrid object from a dict. If get() yields
a value, it would be stringified.
I see no benefit to make get() lazy, so this patch just changes "yield" to
"return".
In templater, a callable symbol exists for lazy evaluation, which should have
f(**mapping) signature. On the other hand, _hybrid.__call__(), which was
introduced by 4e182fb53989, generates mapping for each element.
This patch renames _hybrid.__call__() to _hybrid.itermaps() so that a _hybrid
object can be a value of a mapping dict.
{namespaces % "{namespace}: {names % "{name }"}\n"}
~~~~~
a _hybrid object
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
It was introduced by 236440938a03, but the important code was removed by
fcf2407610f4. So there was no positive effect other than exhausting memory.
The problem spotted by 236440938a03 is that you can't use a generator keyword
more than once. For example, in hgweb template, "{child} {child}" doesn't work
because the first "{child}" consumes the generator. But as fcf2407610f4 says,
the fix was wrong because it could overwrite a callable keyword that returns
a generator. Also the fix didn't work for a generator of generator such as
"{diff}" keyword. So, the proper fix for that problem would be to not put
a generator in a keyword table. Instead, it should be a factory of a generator.
Note that this should fix the memory issue in hgweb, but my firefox killed by
OOM in place. Be careful to not use a modern web browser to test the issue4868.
This allows the latest class of tag to be found, such as a release candidate or
final build, instead of just the absolute latest.
It doesn't appear that the existing keyword can be given an optional argument.
There is a keyword, function and filter for 'date', so it doesn't seem harmful
to introduce a new function with the same name as an existing keyword. Most
functions are pretty Mercurial agnostic, but there is {revset()} as precedent.
Even though templatekw.getlatesttags() returns a single tuple, one entry of
which is a list, it is simplest to present this as a list of tags instead of a
single item, with each tag having a distance and change count attribute. It is
also closer to how {latesttag} returns a list of tags, and how this function
works when not given a '%' operator.
Because revset() function generates a list of revisions, it seems sensible
to switch the ctx as well where a list expression will be evaluated. I think
"{revset(...) % "..."}" expression wasn't considered well when it was
introduced at 45e0e191755f.
The keyword extension uses "utcdate" for a different function, so we can't
add new "utcdate" filter or function. Instead, this patch extends "localdate"
to a general timezone converter.
This will allow us to define both a primary expression, ":", and a prefix
operator, ":y". The ambiguity will be resolved by the next patch.
Prefix actions in elements table are adjusted as follows:
original prefix primary prefix
----------------- -------- -----------------
("group", 1, ")") -> n/a ("group", 1, ")")
("negate", 19) -> n/a ("negate", 19)
("symbol",) -> "symbol" n/a
"rawstring" was introduced by cd1b50e99ed8, but it's no longer necessary
because 4f14a9644001 and e99f4f59d2e9 changed the way of processing string
literals.
This patch moves string decoding to the parsing phase as it was before:
('rawstring', s) -> ('string', s)
('string', s) -> ('string', s.decode('string-escape'))
Instead of re-parsing quoted strings as templates, the tokenizer can delegate
the parsing of nested template strings to the parser. It has two benefits:
1. syntax errors can be reported with absolute positions
2. nested template can use quotes just like shell: "{"{rev}"}"
It doesn't sound nice that the tokenizer recurses into the parser. We could
instead make the tokenize itself recursive, but it would be much more
complicated because we would have to adjust binding strengths carefully and
put dummy infix operators to concatenate template fragments.
Now "string" token without r"" never appears. It will be removed by the next
patch.
This patch backs out 297d563e92af which should no longer be needed.
The test for '{\"invalid\"}' is removed because the parser is permissive for
\"...\" literal.
As of Mercurial 3.4, there were several syntax rules to process nested
template strings. Unfortunately, they were inconsistent and conflicted
each other.
a. buildmap() rule
- template string is _parsed_ as string, and parsed as template
- <\"> is not allowed in nested template:
{xs % "{f(\"{x}\")}"} -> parse error
- template escaping <\{> is handled consistently:
{xs % "\{x}"} -> escaped
b. _evalifliteral() rule
- template string is _interpreted_ as string, and parsed as template
in crafted environment to avoid double processing of escape sequences
- <\"> is allowed in nested template:
{if(x, "{f(\"{x}\")}")}
- <\{> and escape sequences in string literal in nested template are not
handled well
c. pad() rule
- template string is first interpreted as string, and parsed as template,
which means escape sequences are processed twice
- <\"> is allowed in nested template:
{pad("{xs % \"{x}\"}', 10)}
Because of the issue of template escaping, issue4714, 56e0b66a4c27 (in stable)
unified the rule (b) to (a). Then, 41e044cfb1ef (in default) unified the rule
(c) to (b) = (a). But they disabled the following syntax that was somewhat
considered valid.
{if(rev, "{if(rev, \"{rev}\")}")}
{pad("{files % \"{file}\"}", 10)}
So, this patch introduces \"...\" literal to work around the escaped-quoted
nested template strings. Because this parsing rule exists only for the backward
compatibility, it is designed to copy the behavior of old _evalifliteral() as
possible.
Future patches will introduce a better parsing rule similar to a command
substitution of POSIX shells or a string interpolation of Ruby, where extra
escapes won't be necessary at all.
{pad("{files % "{file}"}", 10)}
~~~~~~~~~~~~~~~~~~
parsed as a template, not as a string
Because <\> character wasn't allowed in a template fragment, this patch won't
introduce more breakages. But the syntax of nested templates are interpreted
differently by people, there might be unknown issues. So if we want, we could
instead remove e926f2ef639a, 72be08a15d8d and 56e0b66a4c27 from the stable
branch as the bug fixed by these patches existed for longer periods.
554d6fcc3c8, "strip single backslash before quotation mark in quoted template",
should be superseded by this patch. I'll remove it later.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
The backslash character should start escape sequences no matter if a string is
prefixed with 'r'. They are just not interpreted as escape sequences in raw
strings. revset.tokenize() handles them correctly, but templater didn't.
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
This can simplify the interface of parse() function. Our tokenizer tends to
have optional arguments other than the message to be parsed.
Before this patch, the "lookup" argument existed only for the revset, and the
templater had to pack [program, start, end] to be passed to its tokenizer.
The problem was spotted at cd1b50e99ed8, that says "this patch invokes it
with "strtoken='rawstring'" in "_evalifliteral()", because "t" is the result
of "arg" evaluation and it should be "string-escape"-ed if "arg" is "string"
expression." This workaround is no longer valid since 72be08a15d8d introduced
strict parsing of '\{'.
Instead, we should interpret bare token as "string" or "rawstring" template.
This is what buildmap() does at parsing phase.
Because double backslashes are processed as a string escape sequence, '\\{'
should start the template syntax. On the other hand, r'' disables any sort
of \-escapes, so r'\{' can go either way, never start the template syntax
or always start it. I simply chose the latter, which means r'\{' is the same
as '\\{'.
This patch brings back pre-2.8.1 behavior.
The result of parsestring() is stored in templater's cache, t.cache, and then
it is parsed as a template string by compiletemplate(). So t.cache should keep
an unparsed string no matter if it is sourced from config value. Otherwise
backslashes would be processed twice.
The test vector is borrowed from 83ff877959a6.
The content of "hg help templating" is largely derived from docstrings
on functions providing functionality. Template functions are the long
holdout.
Prepare for generating them dynamically by defining docstrings for all
template functions.
There are numerous ways these docs could be improved. Right now, the
help output simply shows function names and arguments. So literally
any accurate data is better than what is there now.
I've tried to unify gettemplate() with buildtemplate(), but it didn't go well
because gettemplate() have to bypass mapping dict.
For example, web templates have '{tags%changelogtag}' and 'changelogtag' is
defined in both mapping, the default, and context.cache, sourced from map file.
In general, mapping shadows context variables, but gettemplate() have to pick
it from context.cache.
The previous patch made 'string' is always interpreted as a template. So
this patch removes the special handling of r'rawstring' instead. Now r''
disables template processing at all.
This patch series is intended to unify the interpretation of string literals.
It is breaking change that boldly assumes
a. string literal "..." never contains template-like fragment or it is
intended to be a template
b. we tend to use raw string literal r"..." for regexp pattern in which "{"
should have different meaning
Currently, we don't have a comprehensible rule how string literals are
evaluated in template functions. For example, fill() takes "initialindent"
and "hangindent" as templates, but not for "text", whereas "text" is a
template in pad() function.
date(date, fmt)
diff(includepattern, excludepattern)
fill(text, width, initialident: T, hangindent: T)
get(dict, key)
if(expr, then: T, else: T)
ifcontains(search, thing, then: T, else: T)
ifeq(expr1, expr2, then: T, else: T)
indent(text, indentchars, firstline)
join(list, sep)
label(label: T, expr: T)
pad(text: T, width, fillchar, right)
revset(query, formatargs...])
rstdoc(text, style)
shortest(node, minlength)
startswith(pattern, text)
strip(text, chars)
sub(pattern, replacement, expression: T)
word(number, text, separator)
expr % template: T
T: interpret "string" or r"rawstring" as template
This patch series adjusts the rule as follows:
a. string literal, '' or "", starts template processing (BC)
b. raw string literal, r'' or r"", disables both \-escape and template
processing (BC, done by subsequent patches)
c. fragment not surrounded by {} is non-templated string
"ccc{'aaa'}{r'bbb'}"
------------------ *: template
--- c: string
--- a: template
--- b: rawstring
Because this can eliminate the compilation of template arguments from the
evaluation phase, "hg log -Tdefault" gets faster.
% cd mozilla-central
% LANG=C HGRCPATH=/dev/null hg log -Tdefault -r0:10000 --time > /dev/null
before: real 4.870 secs (user 4.860+0.000 sys 0.010+0.000)
after: real 3.480 secs (user 3.440+0.000 sys 0.030+0.000)
Also, this will allow us to parse nested templates at once for better error
indication.
The next patch will introduce buildtemplate function that should be defined
near runtemplate. But I don't want to insert it between buildmap and runmap.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
A style name should not contain "/", "\", "." and "..". Otherwise, templates
could be loaded from outside of the specified templates directory by invalid
?style= parameter. hgweb should not allow such requests.
This change means subdir/name is also rejected.
Template functions use "yield"s assuming that the result will be combined
into a string, which means both "f -> str" and "f -> generator" should behave
in the same way.
Before this patch, piping generator function resulted in a cryptic error.
We had to insert "|stringify" in this case.
$ hg log --template '{if(author, author)|user}\n'
abort: template filter 'userfilter' is not compatible with keyword
'[(<function runsymbol at 0x7f5af2e8d8c0>, 'author'),
(<function runsymbol at 0x7f5af2e8d8c0>, 'author')]'
7678263f920c is fine for "{revset()}", but "i.values()[0]" does not work if
each item has more than one values such as "{bookmarks}".
This fixes the problem by using list.__contains__ or dict.__contains__
appropriately.
This change is intended to avoid exposing the implementation detail to
callers. I'm going to extend fullreposet to support "null" revision, so
these mfunc calls will have to use fullreposet() instead of spanset().
"pad" function and "rawstring" type were introduced in parallel, 89145c35f76e
in default and cd1b50e99ed8 in stable respectively. Therefore, "pad" function
lacked handling of "rawstring" unintentionally.
Before this patch, we had to quote integer literals to pass to template
functions. It was error-prone, so we should allow "word(0, x)" syntax.
Currently only decimal integers are allowed. It's easy to support 0x, 0b and 0
prefixes, but I don't think they are useful.
This patch assumes that template keywords and names defined in map files do
not start with digits, except for positional variables seen in the schemes
extension.
The next patch will introduce integer literals, but the schemes extension
expects that '{1}', '{2}', ... are interpreted as keywords. This patch allows
us to process '{foo(1)}' as 'func(integer)', whereas '{1}' as 'symbol'.
e926f2ef639a fixed the issue of double escapes, but it made the following
template fail with syntax error because of <\">. Strictly speaking, <\">
appears to be invalid in non-string part, but we are likely to escape <">
if surrounded by quotes, and we are used to write such templates by trial
and error.
[templates]
sl = "{tags % \"{ifeq(tag,'tip','',label('log.tag', ' {tag}'))}\"}"
So, for backward compatibility between 2.8.1 and 3.4, a single backslash
before quotation mark is stripped only in quoted template. We don't care
for <\"> in string literal in quoted template, which never worked as expected
before.
template result
--------- ------------------------
{\"\"} parse error
"{""}" {""} -> <>
"{\"\"}" {""} -> <>
{"\""} {"\""} -> <">
'{"\""}' {"\""} -> <">
"{"\""}" parse error (don't care)
This keyword remapping was introduced in 236440938a03 as part of converting
generator based iterators into list based iterators, mentioning "undesired
behavior in template" when a generator is exhausted, but doesn't say what and
introduces no tests.
The problem with the remapping was that it corrupted the output for keywords
like 'extras', 'file_copies' and 'file_copies_switch' in templates such as:
$ hg log -r 82a4f5557c6b --template "{file_copies % ' File: {file_copy}\n'}"
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
What was happening was that in the first call to runtemplate() inside runmap(),
'lm' mapped the keyword (e.g. file_copies) to the appropriate showxxx() method.
On each subsequent call to runtemplate() in that loop however, the keyword was
mapped to a list of the first item's pieces, e.g.:
'file_copy': ['mercurial/changelog.py', ' (', 'mercurial/hg.py', ')']
Therefore, the dict for the second and any subsequent items were not processed
through the corresponding showxxx() method, and the first item's data was
reused.
The 'extras' keyword regressed in 56b014c52204, and 'file_copies' regressed in
4e182fb53989 for other reasons. The common thread of things fixed by this seems
to be when a list of dicts are passed to the templatekw._hybrid class.
The search was introduced in 501e9ec9a85d. It might have been necessary back
then when using __file__ directly and frozen-ness wasn't considered. Now we
should know exactly where the templates can be found.
templates, help and locale data is normally stored as sub folders in the
directory containing the source of the mercurial module. In a frozen build they
live as sub folders next to 'hg.exe' and 'library.zip'.
These different kind of data were handled in different ways. Unify that by
introducing util.datapath. The value is computed from the environment and is
always used, so we just calculate the value on module load.
"diff" allows to embed changes in the target revision into template
output, even if the command itself doesn't take "--patch" option
Combination of "[committemplate]" configuration and "diff" template
function can achieve the feature like issue231 ("option to have diff
displayed in commit editor buffer")
http://bz.selenic.com/show_bug.cgi?id=231
For example, templating below can be used to add each "diff" output
lines "HG: " prefix::
{splitlines(diff) % 'HG: {line}\n'}
This patch implements "diff" not as "a template keyword" but as "a
template function" to take include/exclude patterns at runtime.
It allows to specify target files of command (by -I/-X command line
options) and "diff" separately.
Before this patch, predicates defined in "[revsetalias]" can't be used
in the query specified to template function "revset()", because:
- "revset()" uses "localrepository.revs()" to get query result, but
- "localrepository.revs()" passes "None" as "ui" to "revset.match()", then
- "revset.match()" can't recognize any alias predicates
To enable alias predicates to be used in "revset()" function, this
patch invokes "revset.match()" directly with "repo.ui".
This patch doesn't make "localrepository.revs()" pass "self.ui" to
"revset.match()", because this may be intentional implementation to
prevent alias predicates from shadowing built-in ones and breaking
functions internally using "localrepository.revs()".
Even if it isn't intentional one, the check for shadowing should be
implemented (maybe on default branch) before fixing it for safety.
This function allows returning only the nth "word" from a string. By default
a string is split as by Python's split() function default, but an optional
third parameter can also override what string the string is split by.
This function returns a string only if it starts with a given string.
It is particularly useful when combined with splitlines and/or used with
conditionals that fail when empty strings are passed in to take action
based on the contents of a line.
Previously the ifcontains revset was checking against the set using a pure
__contains__ check. It turns out the set was actually a list of
formatted strings meant for ui output, which meant the contains check failed if
the formatted string wasn't significantly different from the raw value.
This change makes it check against the raw data, prior to it being formatted.
This should correct an earlier couple of bad merges (5433856b2558 and
596960a4ad0d, now pruned) that accidentally brought in a change that had
been marked obsolete (244ac996a821).
Previously, if a template '{foo()}' was given, the buildfunc would not be able
to match it and hit a code path that would not return so it would error out
later in the templater stating that NoneType was not iterable. This patch makes
sure that a proper error is raised so that the user can be informed.
Tests have been updated.
Changeset 83ff877959a6 (released with 2.8.1) fixed "recursively
evaluate string literals as templates" problem (issue4102) by moving
the location of "string-escape"-ing from "tokenizer()" to
"compiletemplate()".
But some parts in template expressions below are not processed by
"compiletemplate()", and it may cause unexpected result.
- 'expr' of 'if(expr, then, else)'
- 'expr's of 'ifeq(expr, expr, then, else)'
- 'sep' of 'join(list, sep)'
- 'text' and 'style' of 'rstdoc(text, style)'
- 'text' and 'chars' of 'strip(text, chars)'
- 'pat' and 'repl' of 'sub(pat, repl, expr)'
For example, '\n' of "{join(extras, '\n')}" is not "string-escape"-ed
and treated as a literal '\n'. This breaks "Display the contents of
the 'extra' field, one per line" example in "hg help templates".
Just "string-escape"-ing on each parts above may not work correctly,
because inside expression of nested ones already applies
"string-escape" on string literals. For example:
- "{join(files, '\n')}" doesn't return "string-escape"-ed string, but
- "{join(files, if(branch, '\n', '\n'))}" does
To fix this problem, this patch does:
- introduce "rawstring" token and "runrawstring" method to handle
strings not to be "string-escape"-ed correctly, and
- make "runstring" method return "string-escape"-ed string, and
delay "string-escape"-ing until evaluation
This patch invokes "compiletemplate()" with "strtoken=exp[0]" in
"gettemplate()", because "exp[1]" is not yet evaluated. This code path
is tested via mapping ("expr % '{template}'").
In the other hand, this patch invokes it with "strtoken='rawstring'"
in "_evalifliteral()", because "t" is the result of "arg" evaluation
and it should be "string-escape"-ed if "arg" is "string" expression.
This patch doesn't test "string-escape"-ing on 'expr' of 'if(expr,
then, else)', because it doesn't affect the result.
Templating syntax allows nested expression to be specified as parts
below, but they are evaluated as a generator and don't work correctly.
- 'sep' of 'join(list, sep)'
- 'text' and 'chars' of 'strip(text, chars)'
In the former case, 'sep' returns expected string only for the first
separation, and empty one for the second or later, because the
generator has only one element.
In the latter case, templating is aborted by exception, because the
generator doesn't have 'strip()' method (as 'text') and can't be
passed as the argument to 'str.strip()' (as 'chars').
This patch applies "stringify()" on these sub expression to get string
correctly.
Changeset c84f81c3e120 (released with 2.8.1) fixed "recursively
evaluate string literals as templates" problem (issue4103) by
introducing "_evalifliteral()".
But some parts in template expressions below are still processed by
the combination of "compiletemplate()" and "runtemplate()", and may
cause same problem unexpectedly.
- 'init' and 'hang' of 'fill(text, width, init, hang)'
- 'expr' of 'sub(pat, repl, expr)'
- 'label' of 'label(label, expr)'
This patch processes them by "_evalifliteral()" instead of the
combination of "compiletemplate()" and "runtemplate()" to avoid
recursive evaluation of string literals completely.
Originally, the addition of the 'shorten' template function in 83d773060c28
would not consider pure integers for shortening. This patch considers two
simple cases: when the integer starts with zero (which is parsed by Mercurial
as a hash first) and when the integer is larger than the tip (obviously not a
rev).