sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 00:14:35 +03:00

Author	SHA1	Message	Date
Mads Kiilerich	403c97887d	tests: stabilize doctest output. Avoid dependencies on dict iteration order.	2013-01-15 02:59:14 +01:00
Mads Kiilerich	2f4504e446	fix trivial spelling errors	2012-08-15 22:38:42 +02:00
Martin Geisler	5b013e2061	encoding: add fast-path for ASCII uppercase. This copies the performance hack from encoding.lower (e7a5733d533f). The case-folding logic that kicks in on case-insensitive filesystems hits encoding.upper hard: in a repository with 75k files, the timings went from hg perfstatus ! wall 3.156000 comb 3.156250 user 1.625000 sys 1.531250 (best of 3) to hg perfstatus ! wall 2.390000 comb 2.390625 user 1.078125 sys 1.312500 (best of 5). This is a 24% decrease. For comparison, Mercurial 2.0 gives: hg perfstatus ! wall 2.172000 comb 2.171875 user 0.984375 sys 1.187500 (best of 5), so we're only 10% slower than before we added the extra case-folding logic. The same decrease is seen when executing 'hg status' as normal, where we go from: hg status --time time: real 4.322 secs (user 2.219+0.000 sys 2.094+0.000) to hg status --time time: real 3.307 secs (user 1.750+0.000 sys 1.547+0.000).	2012-07-23 15:55:26 -06:00
Martin Geisler	4f96956c09	encoding: use s.decode to trigger UnicodeDecodeError. When calling encode on a str, the string is first decoded using the default encoding and then encoded, so s.encode('ascii') == s.decode().encode('ascii'). We don't care about the encode step here; we're just after the UnicodeDecodeError raised by decode if it finds a non-ASCII character. This way is also marginally faster, since it saves the construction of the extra str object.	2012-07-23 15:55:22 -06:00
Cesar Mena	5d1ea9328c	encoding: protect against non-ascii default encoding. If the default Python encoding was changed from ascii, the attempt to encode as ascii before lower() could throw a UnicodeEncodeError. Catch UnicodeError instead to prevent an unhandled exception.	2012-04-22 21:27:52 -04:00
Matt Mackall	b7245bb05e	encoding: add fast-path for ASCII lowercase	2012-04-10 12:07:18 -05:00
Matt Mackall	369755dc10	encoding: tune fast-path of tolocal a bit	2012-03-22 16:54:46 -05:00
Matt Mackall	feb580aa0d	encoding: introduce utf8-b helpers	2012-02-20 16:42:45 -06:00
Mads Kiilerich	c180ed9101	encoding: use hint markup for "please check your locale settings". This will also make test-encoding.t pass on Windows. The test would hit some other code path that already used hint markup.	2011-12-26 15:01:06 +01:00
FUJIWARA Katsunori	fe972435d4	i18n: use encoding.lower/upper for encoding aware case folding. This patch uses encoding.lower/upper for case folding, because the str methods cannot fold the case of non-ASCII characters correctly. To avoid a cyclic dependency and to encapsulate the normcase logic of each platform, this patch introduces encodinglower/encodingupper in both the posix- and windows-specific files. This patch does not change the implementation of normcase() in posix.py, because we do not know the encoding of filenames on POSIX. Some "normcase()" calls are excluded from the function wrap list in hgext/win32mbcs.py, because they become encoding aware with this patch.	2011-12-16 21:09:41 +09:00
Matt Mackall	9310b276b8	encoding: add getcols to extract substrings based on column width	2011-09-21 13:00:46 -05:00
Matt Mackall	9b84bd37fa	encoding: colwidth input is in the local encoding	2011-09-21 13:00:41 -05:00
FUJIWARA Katsunori	5b5a083f16	i18n: calculate terminal columns by width information of each character. Neither the number of 'bytes' in any encoding nor the number of 'characters' is appropriate for calculating the terminal columns of a given string. This patch modifies MBTextWrapper by: - overriding '_wrap_chunks()' so that it uses 'encoding.colwidth()' instead of the built-in 'len()' for the columns of a string - fixing '_cutdown()' so that it uses 'encoding.colwidth()' instead of a local, similar but incorrect implementation. This patch also modifies 'encoding.py' by: - dividing 'colwidth()' into 2 pieces: one that calculates the columns of a given Unicode string, and another for the rest of the original one; the former is used by MBTextWrapper in 'util.py' - preventing 'colwidth()' from evaluating the HGENCODINGAMBIGUOUS configuration on each invocation: the 'unicodedata.east_asian_width' check is kept intact to reduce startup cost.	2011-08-27 04:56:12 +09:00
Augie Fackler	e16b528122	encoding: use getattr instead of hasattr	2011-07-25 15:19:43 -05:00
Matt Mackall	f865cc3f06	encoding: add an encoding-aware lower function	2011-04-30 10:57:13 -05:00
Matt Mackall	1cf3cf83b1	encoding: avoid localstr when a string can be encoded losslessly (issue2763). localstr's hash method exists to prevent bogus matching on lossy local encodings. For instance, we don't want 'caf?' to match 'café' in an ASCII locale. But when café can be losslessly encoded in the local charset, we can simply use a normal string and avoid the hashing trick. This avoids using localstr's hash method, which would prevent a match between	2011-04-15 23:45:41 -05:00
Martin Geisler	dd0f217423	encoding: fix typo in variable name. The typo had no real effect, except for an unnecessary UTF-8 encoding.	2010-11-29 10:13:55 +01:00
Matt Mackall	c7059d3926	encoding: add localstr class to track UTF-8 version of transcoded strings. This allows UTF-8 strings to losslessly round-trip through Mercurial	2010-11-24 15:38:52 -06:00
Matt Mackall	50b99d1a5a	encoding: default ambiguous character to narrow. The current implementation of colwidth was treating 'A'mbiguous characters as wide, which was incorrect in a non-East Asian context. As per http://unicode.org/reports/tr11/#Recommendations, we should instead default to 'narrow' if we don't know better. As character width is dependent on the particular font used and we have no idea what fonts are in use, this recommendation applies. This introduces HGENCODINGAMBIGUOUS to get the old behavior back.	2010-10-27 15:35:21 -05:00
Martin Geisler	77ce66fb6a	check-code: find trailing whitespace	2010-10-20 10:13:04 +02:00
Brodie Rao	203cf2fbd9	cleanup: remove unused imports	2010-08-27 13:32:38 -04:00
Dan Villiom Podlaski Christiansen	d64d4dc9f0	encoding: improve handling of buggy getpreferredencoding() on Mac OS X. Prior to version 2.7, calling locale.getpreferredencoding() would always return 'mac-roman' on Mac OS X. Previously, this was handled by a call to locale.setlocale(). Unfortunately, Python 2.6.5 and older have a bug where isspace() would incorrectly report True for 0x85 and 0xa0 after such a call. In order to fix this, we replace the previous _encodingfixup mapping with an _encodingfixers mapping. Rather than mapping encodings to their replacement, it maps them to a function returning the replacement. This allows us to provide a simplified implementation of getpreferredencoding() which extracts the expected encoding and restores the locale. This fix is based on a patch originally submitted by Martijn Pieters as well as feedback from Brodie Rao.	2010-08-14 01:30:54 +02:00
FUJIWARA Katsunori	9cce255bec	replace Python standard textwrap with an MBCS-sensitive one for i18n text. Mercurial has problems with text wrapping/filling in MBCS encoding environments, because Python's standard 'textwrap' module cannot treat them correctly: it splits the byte sequence for one character across two lines. According to the Unicode specification, "east asian width" classifies characters into: W(ide), N(arrow), F(ull-width), H(alf-width), A(mbiguous). W/N/F/H can always be recognized as 2/1/2/1 bytes in a byte sequence, but 'A' cannot: the size of 'A' depends on the language in which it is used. The Unicode specification says: "If the context (= language) cannot be established reliably they should be treated as narrow characters by default", but many class 'A' characters are full-width, at least in a Japanese environment. So this patch always treats class 'A' characters as full-width for safe wrapping. This patch focuses only on MBCS safety, not on strict wrapping per the writing/printing rules of each language. The MBCS-sensitive textwrap class was originally implemented by ITO Nobuaki <daydream.trippers@gmail.com>.	2010-06-06 17:20:10 +09:00
Matt Mackall	8d99be19f0	many, many trivial check-code fixups	2010-01-25 00:05:27 -06:00
Matt Mackall	595d66f424	Update license to GPLv2+	2010-01-19 22:20:08 -06:00
Dirkjan Ochtman	02b4677d86	encoding: fix issue with non-standard UTF-8 CTYPE on OS X	2009-10-10 12:00:43 +02:00
Simon Heimberg	09ac1e6c92	separate import lines from mercurial and general python modules	2009-04-28 17:40:46 +02:00
Martin Geisler	8e4bc1e9ad	put license and copyright info into comment blocks	2009-04-26 01:13:08 +02:00
Martin Geisler	750183bdad	updated license to be explicit about GPL version 2	2009-04-26 01:08:54 +02:00
Matt Mackall	642f4d7151	move encoding bits from util to encoding. In addition to cleaning up util, this gets rid of some circular dependencies.	2009-04-03 14:51:48 -05:00

30 Commits
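The ASCII fast path described in b7245bb05e and 5b013e2061 can be sketched as follows. The sketch combines that fast path with the decode trick from 4f96956c09 and the broader UnicodeError catch from 5d1ea9328c. It is a minimal Python 3 adaptation under stated assumptions, not Mercurial's actual code. It takes bytes rather than Python 2 str and ignores localstr. A hypothetical `encoding` parameter stands in for Mercurial's configured local encoding, and the helper name `_fold` is also illustrative.

```python
def _fold(s: bytes, encoding: str, upper: bool) -> bytes:
    """Best-effort, encoding-aware case folding of bytes `s`.

    `encoding` is a hypothetical stand-in for the local encoding.
    """
    try:
        # Fast path: decode('ascii') raises UnicodeDecodeError at the first
        # non-ASCII byte; for pure ASCII input, bytes.lower()/upper() is exact.
        s.decode("ascii")
        return s.upper() if upper else s.lower()
    except UnicodeDecodeError:
        pass
    try:
        u = s.decode(encoding)
        fu = u.upper() if upper else u.lower()
        if fu == u:
            return s  # nothing changed; hand back the original bytes
        return fu.encode(encoding)
    except UnicodeError:
        # Catches both decode and encode failures; fold ASCII bytes only.
        return s.upper() if upper else s.lower()


def lower(s: bytes, encoding: str = "utf-8") -> bytes:
    return _fold(s, encoding, upper=False)


def upper(s: bytes, encoding: str = "utf-8") -> bytes:
    return _fold(s, encoding, upper=True)
```

The fast path pays off because most file names in a repository are ASCII. In that common case, the sketch never builds a Unicode object.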
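Commits 9cce255bec, 50b99d1a5a and 5b5a083f16 describe computing terminal width from Unicode's East Asian Width property. Under that scheme, 'A'mbiguous characters are narrow by default and wide only on request. Below is a minimal Python 3 sketch using the standard `unicodedata.east_asian_width` function. The function name `ucolwidth` is an illustrative assumption, as is the `ambiguous` parameter, a stand-in for the HGENCODINGAMBIGUOUS setting.

```python
import unicodedata


def ucolwidth(u: str, ambiguous: str = "narrow") -> int:
    """Terminal columns needed to display Unicode string `u`."""
    # 'W'ide and 'F'ullwidth characters always occupy two columns.
    # 'A'mbiguous ones do only when explicitly requested, matching the
    # UAX #11 recommendation to default to narrow.
    wide = {"W", "F", "A"} if ambiguous == "wide" else {"W", "F"}
    # Every other code point counts as one column; this simple model does
    # not special-case combining or zero-width characters.
    return sum(2 if unicodedata.east_asian_width(c) in wide else 1 for c in u)
```

For example, "日本語" needs six columns, while "±" (an 'A'mbiguous character) needs one column unless `ambiguous="wide"` is passed.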
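The localstr idea from c7059d3926 and 1cf3cf83b1 can be sketched in Python 3 as below. The idea is to keep the original UTF-8 alongside a lossy local transcoding, and to use the wrapper only when that transcoding is lossy. This is an illustrative adaptation, not Mercurial's code: it subclasses bytes instead of Python 2 str. The signature of `tolocal` is an assumption, and so is the `encoding` parameter. The companion `fromlocal` helper is not named in the commits above and is likewise hypothetical.

```python
class localstr(bytes):
    """Local-encoding bytes that remember the UTF-8 they came from."""

    def __new__(cls, u: bytes, l: bytes):
        s = super().__new__(cls, l)
        s._utf8 = u  # original UTF-8 form, used for lossless round-trips
        return s

    def __hash__(self):
        # Hash the UTF-8 form so a lossy b'caf?' (from 'café') does not land
        # on the same hash as a genuine b'caf?' in dicts and sets.
        return hash(self._utf8)


def tolocal(u: bytes, encoding: str = "ascii") -> bytes:
    """Transcode UTF-8 bytes `u` to the (hypothetical) local encoding."""
    text = u.decode("utf-8")
    l = text.encode(encoding, "replace")
    if l.decode(encoding) == text:
        return l  # lossless: a plain bytes object is enough (issue2763)
    return localstr(u, l)


def fromlocal(s: bytes, encoding: str = "ascii") -> bytes:
    """Convert local bytes back to UTF-8, preferring the remembered form."""
    if isinstance(s, localstr):
        return s._utf8
    return s.decode(encoding).encode("utf-8")
```

Equality is inherited from bytes, so `tolocal("café".encode("utf-8")) == b"caf?"` is still true. Only hash-based lookups keep the two apart, which addresses the "bogus matching" concern in 1cf3cf83b1.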