Previously there was no way of telling how much children or bookmarks or tags a
certain changeset has in a template. It was possible to tell if a changeset has
either 0 or not 0 bookmarks, but not to tell if it has 1 or 2 of them, for
example.
This filter, simply named count, makes it possible to count the number of items
in a list or the length of a string (or, anything that python's len can count).
E.g.: {children|count}, {bookmarks|count}, {file_adds|count}.
Testing the filter on node hash and shortened node hash is chosen because they
both have defined length.
As for lists of strings - children, tags and file_adds are used, because they
provide some variety and also prove that what's counted is the number of string
items in the list, and not the list stringified (they are lists of non-empty,
multi-character strings).
Additionally, revset template function is used for testing the filter, since
the combination is very flexible and will possibly be used together a lot.
(The previous version of this patch had an incorrect email subject and was
apparently lost - patchwork says the patch has been accepted, but it's not so.
The changes between that and this patch are minimal: now the filter does not
disturb the alphabetical order of function definitions and dict keys.)
This is useful for applying changes to each line, and it's especially powerful
when used in conjunction with conditionals to modify lines based on content.
The purpose of this new filter is to make it possible to partially replace the
functionality of the interhg extension. The idea is to be able to define regular
expression based substitutions on a new "websub" config section. hgweb will then
be able to apply these substitutions wherever the "websub" filter is used on a
template.
This first revision just adds the code necessary to load the websub expressions
and adds the websub filter, but it does not add any calls to the websub filter
itself on any of the templates. That will be done on the following revisions.
Currently, the 'user' filter is using util.shortuser(text) (which clearly
doesn't extract only the user portion of an email address, even though the
help text says it does).
The new 'emailuser' filter uses the new util.emailuser(text) function which,
instead, does exactly that.
The help text on the 'user' filter has been modified accordingly.
Add a doctest with an hopefuly-comprehensive list of combinations
we can expect in real-life situations.
This does not cover corner cases, for example when a CR or LF is
embedded in the name (allowed by RFC 5322!).
Code in tests/test-doctest.py contributed by:
Martin Geisler <mg@aragost.com>
Thanks!
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
RFC5322 (Internet Message Format) [0] says that the 'display name' of
an internet address [1] (what Mercurial calls 'person') can be quoted
with DQUOTE (ASCII 34: ") if it contains non-atom characters [2].
For example, dot '.' is a non-atom character. Also, DQUOTEs in a
quoted string will be escaped using "\" [2][3].
The current {author|person} template+filter just extracts the part
before an email address as-is. This can look ugly, especially on the
web interface, or when generating output for post-processing...
Moreover, as an example, the Mercurial repository has a bunch of
incoherent uses of DQUOTES in author names. As per Matt's digging:
$ hg log --template "{author|person}\n" | grep '"' | sort | uniq
"Andrei Vermel
"Aurelien Jacobs
"Daniel Santa Cruz
"Hidetaka Iwai
"Hiroshi Funai"
"Mathieu Clabaut
"Paul Moore
"Peter Arrenbrecht"
"Rafael Villar Burke
"Shun-ichi GOTO"
"Wallace, Eric S"
"Yann E. MORIN"
Josef "Jeff" Sipek
Radoslaw "AstralStorm" Szkodzinski
Fix the 'person' filter to remove leading and trailing DQUOTES,
and unescape remaining DQUOTES.
Given this author: "J. \"random\" DOE" <john@doe.net>
before: {author|person} : "J. \"random\" DOE"
after: {author|person} : J. "random" DOE
For the Mercurial repository, that leaves us with two authors with
DQUOTES, in acceptable positions:
$ hg log --template "{author|person}\n" | grep '"' | sort | uniq
Josef "Jeff" Sipek
Radoslaw "AstralStorm" Szkodzinski
[0] https://tools.ietf.org/html/rfc5322
[1] https://tools.ietf.org/html/rfc5322#section-3.4
[2] https://tools.ietf.org/html/rfc5322#section-3.2.4
[3] https://tools.ietf.org/html/rfc5322#section-3.2.1
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
This new 'bisect' template expands to a cset's bisection status (good,
bad and so on...). There is also a new 'shortbisect' filter that yields
a single char representing the cset's bisection status.
It uses the two recently-added hbisect.label() and .shortlabel() functions.
Example output using the repository in test-bisect2.t, and some made-up
state of the 'end at merge' test (with graphlog, it's so explicit):
$ hg glog --template '{rev}:{node|short} {bisect}\n' \
-r 'bisect(range)|bisect(ignored)'
o 17:228c06deef46: bad
|
o 16:609d82a7ebae: bad (implicit)
|
o 15:857b178a7cf3: bad
|\
| o 13:b0a32c86eb31: good
| |
| o 12:9f259202bbe7: good (implicit)
| |
| o 11:82ca6f06eccd: good
| |
@ | 10:429fcd26f52d: untested
|\ \
| o | 9:3c77083deb4a: skipped
| |/
| o 8:dab8161ac8fc: good
| |
o | 6:a214d5d3811a: ignored
|\ \
| o | 5:385a529b6670: ignored
| | |
o | | 4:5c668c22234f: ignored
| | |
o | | 3:0950834f0a9c: ignored
|/ /
o / 2:051e12f87bf1: ignored
|/
And now the same with the short label:
$ hg log --template '{bisect|shortbisect} {rev}:{node|short}\n'
18:d42e18c7bc9b
B 17:228c06deef46
B 16:609d82a7ebae
B 15:857b178a7cf3
14:faa450606157
G 13:b0a32c86eb31
G 12:9f259202bbe7
G 11:82ca6f06eccd
U 10:429fcd26f52d
S 9:3c77083deb4a
G 8:dab8161ac8fc
7:50c76098bbf2
I 6:a214d5d3811a
I 5:385a529b6670
I 4:5c668c22234f
I 3:0950834f0a9c
I 2:051e12f87bf1
1:4ca5088da217
0:33b1f9bc8bc5
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@anciens.enib.fr>
This will highly simplify the docstring integration. I measured "hg log
--style=changelog" duration on mercurial itself and could not detect any
difference.
It aims to fix javascript error of hgweb's graph view in Japanese 'cp932'
encoding.
'cp932' contains multibyte characters ending with '\x5c' (backslash),
e.g. '\x94\x5c' for Japanese Kanji 'Noh'.
Due to json filter escapes '\' to '\\', multibyte string ending with
'\x5c' is translated to "xxx\", resulting javascript parse error on
a web browser.
This patch changes json() to pass unicode to jsonescape().
Unicode decoding error handler changed to 'replace' by Patrick Mézard.
Mercurial has problem around text wrapping/filling in MBCS encoding
environment, because standard 'textwrap' module of Python can not
treat it correctly. It splits byte sequence for one character into two
lines.
According to unicode specification, "east asian width" classifies
characters into:
W(ide), N(arrow), F(ull-width), H(alf-width), A(mbiguous)
W/N/F/H can be always recognized as 2/1/2/1 bytes in byte sequence,
but 'A' can not. Size of 'A' depends on language in which it is used.
Unicode specification says:
If the context(= language) cannot be established reliably they
should be treated as narrow characters by default
but many of class 'A' characters are full-width, at least, in Japanese
environment.
So, this patch treats class 'A' characters as full-width always for
safety wrapping.
This patch focuses only on MBCS safe-ness, not on writing/printing
rule strict wrapping for each languages
MBCS sensitive textwrap class is originally implemented
by ITO Nobuaki <daydream.trippers@gmail.com>.