Using a decorator localizes the changes needed to add (or remove) a
template filter function in the source code.
This patch also removes the leading ":FILTER:" part from the help text of
each filter, because using templatefilter makes it redundant.
This patch names the decorator 'templatefilter' rather than 'filter',
because 'filter' would shadow the Python built-in, even though
'templatefilter' is a little redundant in 'templatefilters.py'.
This patch also adds loadfilter() to templatefilters, because seeing the
two together makes it easier to figure out how they cooperate.
Listing loadfilter() in dispatch.extraloaders causes template filter
functions to be loaded implicitly when a (3rd party) extension is loaded.
This change requires that the "templatefilter" attribute of a (3rd party)
extension be a registrar.templatefilter instance or something compatible.
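The registration flow described above can be sketched roughly as follows. This is a simplified stand-in, not the code in registrar.py or templatefilters.py: the class name "filterregistrar", its "_table" attribute, the one-argument loadfilter() signature and the "shout" filter are all assumptions made for illustration.

```python
class filterregistrar:
    """Hypothetical stand-in for a registrar class such as registrar.templatefilter."""

    def __init__(self):
        self._table = {}  # filter name -> function, private to this registrar

    def __call__(self, name):
        def decorator(func):
            # Record the function under its template name and return it unchanged.
            self._table[name] = func
            return func
        return decorator


# Table the templater would consult when it sees "{x|name}" (assumed name).
filters = {}


def loadfilter(registrarobj):
    """Copy the filters recorded on a registrar object into the global table."""
    for name, func in registrarobj._table.items():
        filters[name] = func


# What an extension module would contain: a module-level "templatefilter"
# attribute and functions decorated with it.
templatefilter = filterregistrar()


@templatefilter("shout")
def shout(text):
    """Upper-case the input (example filter, not a real Mercurial filter)."""
    return text.upper()


# What the extension loader would do after importing the extension.
loadfilter(templatefilter)
print(filters["shout"]("tip"))
```

With this shape, adding or removing a filter only touches the decorated function itself; no separate table has to be edited.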
This is necessary to preserve filename encoding over JSON. Instead, this
patch inserts "|utf8" where non-ASCII local-encoding text can be passed
to "|json".
See also the commit that introduced the "utf8" filter.
This will be applied prior to the "|json" filter. This sounds odd, but it
is necessary to handle local-encoding text as well as raw filename bytes.
Because filenames are bytes in Mercurial and the Unix world, {filename|json} should
preserve the original byte sequence, which implies
{x|json} -> '"' toutf8b(x) '"'
On the other hand, most template strings are in local encoding. Because the
"|json" filter has to be byte-transparent to filenames, we need something to
annotate an input as a local string. That is what "|utf8" will do.
{x|utf8|json} -> '"' toutf8b(fromlocal(x)) '"'
"|utf8" is an explicit call, so it aborts if the input bytes can't be converted to
UTF-8.
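The division of labour between the two filters can be illustrated with a small sketch. It does not reproduce toutf8b() or fromlocal(): the function names "utf8_filter" and "json_filter", the cp1252 stand-in for the local encoding, and the use of ValueError in place of an abort are assumptions for illustration only.

```python
LOCAL_ENCODING = "cp1252"  # assumption: stand-in for the user's local encoding


def utf8_filter(local_bytes):
    """Explicitly convert local-encoding bytes to UTF-8 bytes, failing loudly."""
    try:
        return local_bytes.decode(LOCAL_ENCODING).encode("utf-8")
    except UnicodeDecodeError as exc:
        # The real filter aborts; a plain exception stands in for that here.
        raise ValueError("cannot convert input to UTF-8: %s" % exc)


def json_filter(raw_bytes):
    """Byte-transparent quoting: bytes are escaped but never re-encoded."""
    escaped = raw_bytes.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'


local_text = b"caf\xe9"                          # "café" in the assumed local encoding
as_filename = json_filter(local_text)            # original bytes survive untouched
as_text = json_filter(utf8_filter(local_text))   # annotated as local text first
```

The point of the sketch is only that json_filter() never guesses an encoding, so the caller has to say "this is local text" by applying the conversion first.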
It has been unused, undocumented, and flawed: it expects unicode input and
never works correctly if the input contains a non-ASCII character. We should
use the "json" filter instead.
As a JSON string is known to be unicode, we should try round-trip conversion
for the localstr type. This patch tests for the localstr type explicitly because
encoding.fromlocal() may raise Abort for an undecodable str, which is probably
not what we want. Maybe we can refactor the json filter to make more use of the
encoding module later.
Still, "{desc|json}" can't round-trip because showdescription() modifies a
localstr object.
We were assuming everything under 128 was printable ASCII, but there are a lot
of control characters in that range that can't simply be included in JSON and
other targets. We forcibly encode everything under 32, because those characters
are either control characters or oddly printable (like tab or line endings).
We also add the hypothesis-powered test that caught this.
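A minimal sketch of the "encode everything under 32" rule, assuming a str input and leaving non-ASCII handling out of scope. The function below is an illustration, not the filter's actual code.

```python
import json


def jsonescape(text):
    """Escape quotes, backslashes and every code point below 32."""
    out = []
    for ch in text:
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif ord(ch) < 32:
            # Control characters (including tab and newline) become \uXXXX.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# Round-trip check through a real JSON parser.
encoded = '"' + jsonescape("a\tb\x01c\n") + '"'
decoded = json.loads(encoded)
```

Escaping tab and newline with the generic \uXXXX form rather than \t and \n keeps the rule uniform, at the cost of slightly longer output.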
There needs to be a way to escape symbolic revisions containing forward
slashes, but the urlescape filter doesn't escape slashes at all (in fact, it is
used in places where forward slashes must be preserved).
The filter treats @ as safe only so that bookmarks like @ and @default
look good in URLs.
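The intent can be sketched with the standard library's quote(). The function name is hypothetical, and the exact encoding the filter emits for a slash may differ from the plain %2F shown here.

```python
from urllib.parse import quote


def escape_symbolic_revision(rev):
    """Percent-encode a symbolic revision, slashes included, keeping '@' readable."""
    # safe="@" overrides quote()'s default of safe="/", so "/" is escaped too.
    return quote(rev, safe="@")


escaped_branch = escape_symbolic_revision("feature/login")
escaped_bookmark = escape_symbolic_revision("@default")
```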
A few template keywords can in fact return None, such as {bisect}. In
some contexts, these get stringified into "None" instead of "". This is
leaking Python details into the UI.
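A minimal sketch of the problem and of one way to avoid it, assuming a single stringify() helper sits between keyword values and output. The helper shown here is illustrative, not the templater's code.

```python
def stringify(thing):
    """Turn a keyword value into output text, mapping None to the empty string."""
    if thing is None:
        return ""
    return str(thing)


leaky = "bisect: %s" % None              # Python's default formatting
clean = "bisect: %s" % stringify(None)   # what the UI should show
```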
The json filter was previously iterating over keys in an object in an
undefined order. Let's throw a sorted() in there so output is
consistent.
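As a rough illustration of the sorted() change, assuming a flat object whose keys and values are plain strings that need no escaping (the function name is hypothetical):

```python
def jsonobject(obj):
    """Serialize a flat dict of strings with keys in sorted order."""
    pairs = ['"%s": "%s"' % (key, obj[key]) for key in sorted(obj)]
    return "{" + ", ".join(pairs) + "}"


first = jsonobject({"node": "abc", "branch": "default"})
second = jsonobject({"branch": "default", "node": "abc"})
```

Both calls produce the same text regardless of the order in which the keys were inserted.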
It's somewhat frightening that there are no tests for the json filter.
Subsequent commits will add them, so we pass on the opportunity to add
them here.
The "changeset" template from hgweb is using a lambda in the
"diffsummary" key. In preparation for enabling JSON output from hgweb,
teach the json filter how to call functions.
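One way to picture "calling functions" during serialization is sketched below. It assumes callables take no arguments and defers the actual encoding to the standard json module; neither assumption is taken from the patch itself.

```python
import json


def resolve(obj):
    """Replace callables by their return values, recursing into containers."""
    if callable(obj):
        return resolve(obj())
    if isinstance(obj, dict):
        return {key: resolve(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [resolve(value) for value in obj]
    return obj


def tojson(obj):
    """Serialize after resolving any callables found in the structure."""
    return json.dumps(resolve(obj))


# A lambda under a "diffsummary" key, loosely modelled on the case described above.
rendered = tojson({"diffsummary": lambda: "2 files changed"})
```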
Previously there was no way of telling in a template how many children,
bookmarks or tags a certain changeset has. It was possible to tell whether a
changeset has 0 or more than 0 bookmarks, but not whether it has 1 or 2 of
them, for example.
This filter, simply named count, makes it possible to count the number of items
in a list or the length of a string (or anything that Python's len() can count).
E.g.: {children|count}, {bookmarks|count}, {file_adds|count}.
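In Python terms the filter behaves roughly like the sketch below. The sample values are made up, and the real filter operates on template values rather than plain Python objects.

```python
def count(value):
    """Return the number of items in a list or the length of a string."""
    return len(value)


tags = ["tip", "stable"]
number_of_tags = count(tags)             # counts list items
length_of_node = count("228c06deef46")   # counts characters of a short hash
length_if_stringified = count(" ".join(tags))
```

The last line shows what would be counted if the list were stringified first, which is the mistake the tests described in this patch are meant to rule out.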
The filter is tested on the node hash and the shortened node hash because
both have a defined length.
As for lists of strings, children, tags and file_adds are used, because they
provide some variety and also prove that what's counted is the number of string
items in the list, and not the list stringified (they are lists of non-empty,
multi-character strings).
Additionally, the revset template function is used for testing the filter, since
the combination is very flexible and the two will possibly be used together a lot.
(The previous version of this patch had an incorrect email subject and was
apparently lost - patchwork says the patch has been accepted, but it's not so.
The changes between that and this patch are minimal: now the filter does not
disturb the alphabetical order of function definitions and dict keys.)
This is useful for applying changes to each line, and it's especially powerful
when used in conjunction with conditionals to modify lines based on content.
The purpose of this new filter is to make it possible to partially replace the
functionality of the interhg extension. The idea is to be able to define regular
expression based substitutions in a new "websub" config section. hgweb will then
be able to apply these substitutions wherever the "websub" filter is used in a
template.
This first revision just adds the code necessary to load the websub expressions
and adds the websub filter, but it does not add any calls to the websub filter
itself in any of the templates. That will be done in the following revisions.
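The mechanism can be sketched as a list of compiled patterns applied in order. The config syntax of the "websub" section is not reproduced here: the (pattern, replacement) pairs, the make_websub() helper and the example.org URL are assumptions for illustration.

```python
import re


def make_websub(rules):
    """Build a filter from (pattern, replacement) pairs, applied in order."""
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]

    def websub(text):
        for regexp, replacement in compiled:
            text = regexp.sub(replacement, text)
        return text

    return websub


# Hypothetical rule: turn "issueNNN" into a link to a made-up bug tracker.
websub = make_websub([
    (r"issue(\d+)", r'<a href="https://bugs.example.org/\1">issue\1</a>'),
])
linked = websub("fixes issue42")
```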
Currently, the 'user' filter is using util.shortuser(text) (which clearly
doesn't extract only the user portion of an email address, even though the
help text says it does).
The new 'emailuser' filter uses the new util.emailuser(text) function which,
instead, does exactly that.
The help text on the 'user' filter has been modified accordingly.
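The behaviour described for 'emailuser' (only the user portion of an email address) can be sketched as below. This is an illustration of that description, not necessarily the body of util.emailuser(), and the addresses are made up.

```python
def emailuser(user):
    """Return the part of an email address before the '@'."""
    at = user.find("@")
    if at >= 0:
        user = user[:at]       # drop the domain
    lt = user.find("<")
    if lt >= 0:
        user = user[lt + 1:]   # drop a leading display name and '<'
    return user


from_full = emailuser("John Doe <john.doe@example.com>")
from_bare = emailuser("jdoe@example.com")
```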
Add a doctest with a hopefully comprehensive list of combinations
we can expect in real-life situations.
This does not cover corner cases, for example when a CR or LF is
embedded in the name (allowed by RFC 5322!).
Code in tests/test-doctest.py contributed by:
Martin Geisler <mg@aragost.com>
Thanks!
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
RFC5322 (Internet Message Format) [0] says that the 'display name' of
an internet address [1] (what Mercurial calls 'person') can be quoted
with DQUOTE (ASCII 34: ") if it contains non-atom characters [2].
For example, dot '.' is a non-atom character. Also, DQUOTEs in a
quoted string will be escaped using "\" [2][3].
The current {author|person} template+filter just extracts the part
before an email address as-is. This can look ugly, especially on the
web interface, or when generating output for post-processing...
Moreover, as an example, the Mercurial repository has a bunch of
inconsistent uses of DQUOTES in author names. As per Matt's digging:
$ hg log --template "{author|person}\n" | grep '"' | sort | uniq
"Andrei Vermel
"Aurelien Jacobs
"Daniel Santa Cruz
"Hidetaka Iwai
"Hiroshi Funai"
"Mathieu Clabaut
"Paul Moore
"Peter Arrenbrecht"
"Rafael Villar Burke
"Shun-ichi GOTO"
"Wallace, Eric S"
"Yann E. MORIN"
Josef "Jeff" Sipek
Radoslaw "AstralStorm" Szkodzinski
Fix the 'person' filter to remove leading and trailing DQUOTES,
and unescape remaining DQUOTES.
Given this author: "J. \"random\" DOE" <john@doe.net>
before: {author|person} : "J. \"random\" DOE"
after: {author|person} : J. "random" DOE
For the Mercurial repository, that leaves us with two authors with
DQUOTES, in acceptable positions:
$ hg log --template "{author|person}\n" | grep '"' | sort | uniq
Josef "Jeff" Sipek
Radoslaw "AstralStorm" Szkodzinski
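The fix can be sketched as follows, assuming the author string contains a '<' before the email address. Only the quoted-name case discussed here is covered, and a display name that itself ends in an escaped DQUOTE would lose that final quote with this simple approach.

```python
def person(author):
    """Extract the display name, dropping outer DQUOTEs and unescaping inner ones."""
    name = author.split("<", 1)[0]
    # Strip surrounding blanks and DQUOTEs (balanced or not), then unescape \".
    return name.strip(' "').replace('\\"', '"')


quoted = person('"J. \\"random\\" DOE" <john@doe.net>')
unbalanced = person('"Andrei Vermel <andrei@example.com>')
inner = person('Josef "Jeff" Sipek <jeff@example.com>')
```

The example.com addresses are placeholders; only the names come from the listing above.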
[0] https://tools.ietf.org/html/rfc5322
[1] https://tools.ietf.org/html/rfc5322#section-3.4
[2] https://tools.ietf.org/html/rfc5322#section-3.2.4
[3] https://tools.ietf.org/html/rfc5322#section-3.2.1
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
This new 'bisect' template expands to a cset's bisection status (good,
bad and so on...). There is also a new 'shortbisect' filter that yields
a single char representing the cset's bisection status.
It uses the two recently-added hbisect.label() and .shortlabel() functions.
Example output using the repository in test-bisect2.t, and some made-up
state of the 'end at merge' test (with graphlog, it's so explicit):
$ hg glog --template '{rev}:{node|short} {bisect}\n' \
-r 'bisect(range)|bisect(ignored)'
o 17:228c06deef46: bad
|
o 16:609d82a7ebae: bad (implicit)
|
o 15:857b178a7cf3: bad
|\
| o 13:b0a32c86eb31: good
| |
| o 12:9f259202bbe7: good (implicit)
| |
| o 11:82ca6f06eccd: good
| |
@ | 10:429fcd26f52d: untested
|\ \
| o | 9:3c77083deb4a: skipped
| |/
| o 8:dab8161ac8fc: good
| |
o | 6:a214d5d3811a: ignored
|\ \
| o | 5:385a529b6670: ignored
| | |
o | | 4:5c668c22234f: ignored
| | |
o | | 3:0950834f0a9c: ignored
|/ /
o / 2:051e12f87bf1: ignored
|/
And now the same with the short label:
$ hg log --template '{bisect|shortbisect} {rev}:{node|short}\n'
18:d42e18c7bc9b
B 17:228c06deef46
B 16:609d82a7ebae
B 15:857b178a7cf3
14:faa450606157
G 13:b0a32c86eb31
G 12:9f259202bbe7
G 11:82ca6f06eccd
U 10:429fcd26f52d
S 9:3c77083deb4a
G 8:dab8161ac8fc
7:50c76098bbf2
I 6:a214d5d3811a
I 5:385a529b6670
I 4:5c668c22234f
I 3:0950834f0a9c
I 2:051e12f87bf1
1:4ca5088da217
0:33b1f9bc8bc5
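Judging from the two listings above, the short label behaves roughly like the sketch below: first letter of the status, upper-cased, or a blank when the changeset has no bisection status. This is inferred from the output only and is not the code of hbisect.shortlabel().

```python
def shortbisect(label):
    """Map a bisection status label to a single character."""
    if label:
        return label[0].upper()
    return " "  # changesets outside the bisection get a blank column


statuses = ["bad", "bad (implicit)", "good", "untested", "skipped", "ignored", ""]
short = [shortbisect(status) for status in statuses]
```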
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
This will greatly simplify the docstring integration. I measured the duration
of "hg log --style=changelog" on the Mercurial repository itself and could not
detect any difference.