util: replace 'ellipsis' implementation by 'encoding.trim'

Before this patch, 'util.ellipsis' tried to avoid splitting text in the
middle of a multi-byte sequence, but its implementation was incorrect.

The internal function '_ellipsis' trims the given unicode string to at
most maxlength 'unicode characters', not to at most maxlength 'columns
in display'.

    def _ellipsis(text, maxlength):
        if len(text) <= maxlength:
            return text, False
        else:
            return "%s..." % (text[:maxlength - 3]), True

In many encodings, the number of unicode characters can differ from the
number of columns in display (for example, an East Asian wide character
occupies two columns).
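
A small illustration of that gap, written as a Python 3 sketch rather than
code from this patch. The helper name 'display_width' is hypothetical, and
it assumes that East Asian wide ('W') and fullwidth ('F') characters occupy
two columns while every other character occupies one; zero-width and
ambiguous-width characters are not handled.

```python
import unicodedata


def display_width(utext):
    """Approximate the number of terminal columns used by a unicode string.

    Hypothetical helper for illustration only.
    """
    # Assumption: 'W' (wide) and 'F' (fullwidth) take two columns,
    # everything else takes one.
    return sum(2 if unicodedata.east_asian_width(c) in ('W', 'F') else 1
               for c in utext)


text = u'\u65e5\u672c\u8a9e'  # three CJK characters
print(len(text), display_width(text))  # prints "3 6"
```

A limit expressed in characters would therefore accept this string at
maxlength 3, although it needs six columns on screen.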

This patch replaces the 'ellipsis' implementation with 'encoding.trim',
which correctly trims a string to at most maxlength columns in display,
even if the string contains multi-byte characters.

'_ellipsis' is removed in this patch, because it is referenced only from
'ellipsis'.
FUJIWARA Katsunori 2014-07-06 02:56:41 +09:00
parent 71717db270
commit 5206a6fd25


@@ -1318,23 +1318,9 @@ def email(author):
         r = None
     return author[author.find('<') + 1:r]
 
-def _ellipsis(text, maxlength):
-    if len(text) <= maxlength:
-        return text, False
-    else:
-        return "%s..." % (text[:maxlength - 3]), True
-
 def ellipsis(text, maxlength=400):
-    """Trim string to at most maxlength (default: 400) characters."""
-    try:
-        # use unicode not to split at intermediate multi-byte sequence
-        utext, truncated = _ellipsis(text.decode(encoding.encoding),
-                                     maxlength)
-        if not truncated:
-            return text
-        return utext.encode(encoding.encoding)
-    except (UnicodeDecodeError, UnicodeEncodeError):
-        return _ellipsis(text, maxlength)[0]
+    """Trim string to at most maxlength (default: 400) columns in display."""
+    return encoding.trim(text, maxlength, ellipsis='...')
 
 def unitcountfn(*unittable):
     '''return a function that renders a readable count of some quantity'''