sapling/tests/test-encoding.out
FUJIWARA Katsunori 9cce255bec replace Python standard textwrap by MBCS sensitive one for i18n text
Mercurial has a problem with text wrapping/filling in MBCS encoding
environments, because the standard 'textwrap' module of Python cannot
handle such text correctly: it can split the byte sequence of a single
character across two lines.
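
A minimal sketch of the failure described above, not the code from this patch. It assumes Shift_JIS as the MBCS encoding and uses a hypothetical helper, `naive_byte_wrap`, as a stand-in for any wrapper that counts bytes and ignores character boundaries.

```python
# Seven Japanese characters; each one is two bytes in Shift_JIS.
text = "日本語テキスト"
raw = text.encode("shift_jis")

def naive_byte_wrap(data, width):
    """Hypothetical stand-in: cut a byte string every `width` bytes."""
    return [data[i:i + width] for i in range(0, len(data), width)]

def splits_a_character(chunks, encoding):
    """Return True if any chunk no longer decodes, i.e. a character was cut."""
    for chunk in chunks:
        try:
            chunk.decode(encoding)
        except UnicodeDecodeError:
            return True
    return False

# An odd width lands in the middle of a two-byte character.
broken_at_5 = splits_a_character(naive_byte_wrap(raw, 5), "shift_jis")
# An even width happens to fall on character boundaries for this input.
broken_at_4 = splits_a_character(naive_byte_wrap(raw, 4), "shift_jis")
```

With this input, the width of 5 cuts the third character in half, while the width of 4 is safe only by coincidence.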

According to the Unicode specification, "East Asian Width" classifies
characters into:

   W(ide), N(arrow), F(ull-width), H(alf-width), A(mbiguous)


W/N/F/H can always be recognized as 2/1/2/1 bytes in a byte sequence,
but 'A' cannot. The size of 'A' depends on the language in which it is used.

The Unicode specification says:

   If the context (= language) cannot be established reliably, they
   should be treated as narrow characters by default

but many class 'A' characters are full-width, at least in Japanese
environments.

So, this patch always treats class 'A' characters as full-width, for
safe wrapping.
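
The rule above can be sketched with Python's `unicodedata.east_asian_width`. This is an illustration under stated assumptions, not the class added by this patch: the helper names `char_columns`, `text_columns` and `cut_at_columns`, and the `ambiguous_wide` flag, are hypothetical.

```python
import unicodedata

def char_columns(ch, ambiguous_wide=True):
    """Width of one character: 2 for W and F (and A if ambiguous_wide), else 1."""
    wide_classes = {"W", "F", "A"} if ambiguous_wide else {"W", "F"}
    return 2 if unicodedata.east_asian_width(ch) in wide_classes else 1

def text_columns(text, ambiguous_wide=True):
    """Total width of a string under the same rule."""
    return sum(char_columns(ch, ambiguous_wide) for ch in text)

def cut_at_columns(text, width, ambiguous_wide=True):
    """Split text into (head, rest) so that head fits within `width`."""
    used = 0
    for i, ch in enumerate(text):
        used += char_columns(ch, ambiguous_wide)
        if used > width:
            return text[:i], text[i:]
    return text, ""

# U+03B1 (Greek small alpha) is class 'A': counted as 2 here, 1 under the
# Unicode default of treating ambiguous characters as narrow.
alpha_wide = char_columns("\u03b1")
alpha_narrow = char_columns("\u03b1", ambiguous_wide=False)
```

Counting 'A' as 2 can only overestimate a line's width, so a line may be wrapped earlier than strictly needed but a character is never cut in half.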

This patch focuses only on MBCS safety, not on wrapping that strictly
follows the writing/printing rules of each language.

The MBCS-sensitive textwrap class was originally implemented
by ITO Nobuaki <daydream.trippers@gmail.com>.
2010-06-06 17:20:10 +09:00


adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
(run 'hg update' to get a working copy)
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% should fail with encoding error
M a
? latin-1
? latin-1-tag
? utf-8
transaction abort!
rollback completed
abort: decoding near ' encoded: é': 'ascii' codec can't decode byte 0xe9 in position 20: ordinal not in range(128)!
% these should work
marked working directory as branch é
% hg log (ascii)
changeset: 5:db5520b4645f
branch: ?
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag ? for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: ?
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: ?
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: ?
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ????? = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': ? = u'\xe9'
% hg log (latin-1)
changeset: 5:db5520b4645f
branch: é
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag é for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: é
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: é
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: é
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ÒÔÕÔØ = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': é = u'\xe9'
% hg log (utf-8)
changeset: 5:db5520b4645f
branch: é
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag é for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: é
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: é
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: é
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ÒÔÕÔØ = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': é = u'\xe9'
% hg tags (ascii)
tip 5:db5520b4645f
? 3:770b9b11621d
% hg tags (latin-1)
tip 5:db5520b4645f
é 3:770b9b11621d
% hg tags (utf-8)
tip 5:db5520b4645f
é 3:770b9b11621d
% hg branches (ascii)
? 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg branches (latin-1)
é 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg branches (utf-8)
é 5:db5520b4645f
default 4:9cff3c980b58 (inactive)
% hg log (utf-8)
changeset: 5:db5520b4645f
branch: é
tag: tip
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin1 branch
changeset: 4:9cff3c980b58
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: Added tag é for changeset 770b9b11621d
changeset: 3:770b9b11621d
tag: é
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: utf-8 e' encoded: é
changeset: 2:0572af48b948
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e' encoded: é
changeset: 1:0e5b7e3f9c4a
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: koi8-r: ртуть = u'\u0440\u0442\u0443\u0442\u044c'
changeset: 0:1e78a93102a3
user: test
date: Mon Jan 12 13:46:40 1970 +0000
summary: latin-1 e': И = u'\xe9'
% hg log (dolphin)
abort: unknown encoding: dolphin, please check your locale settings
abort: decoding near 'é': 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)!
abort: branch name not in UTF-8!
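
The renderings in the expected output above can be reproduced with Python's standard codecs. The following is a sketch for illustration only, not Mercurial's own conversion code; all variable names are hypothetical.

```python
import codecs

raw_e = b"\xe9"  # the single byte that latin-1 uses for "é"

as_latin1 = raw_e.decode("latin-1")  # read as latin-1
as_koi8r = raw_e.decode("koi8-r")    # the same byte read as koi8-r

# Strict ASCII decoding rejects any byte >= 0x80.
try:
    raw_e.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Encoding to ASCII with replacement yields the "?" seen in the ascii logs.
as_ascii = "\u00e9".encode("ascii", "replace")

# koi8-r bytes of the word in changeset 1, misread as latin-1.
mercury_koi8r = "\u0440\u0442\u0443\u0442\u044c".encode("koi8-r")
misread = mercury_koi8r.decode("latin-1")

# An unknown codec name raises LookupError.
try:
    codecs.lookup("dolphin")
    dolphin_known = True
except LookupError:
    dolphin_known = False
```

The same byte 0xe9 gives "é" under latin-1 and "И" under koi8-r, which matches the difference between the latin-1 log and the final log above.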