Commit Graph

3 Commits

Author SHA1 Message Date
Yuya Nishihara
a5ae36fcb1 encoding: add fast path of from/toutf8b() for ASCII strings
See the previous patch for why.

The added test seems not making much sense because ASCII strings should
never contain "\xed" and be valid UTF-8.

  (with mercurial repo)
  $ export HGRCPATH=/dev/null HGPLAIN=
  $ hg log --time --config experimental.stabilization=all -Tjson > /dev/null

  (original)
  time: real 6.830 secs (user 6.740+0.000 sys 0.080+0.000)
  time: real 6.690 secs (user 6.650+0.000 sys 0.040+0.000)
  time: real 6.700 secs (user 6.640+0.000 sys 0.060+0.000)

  (fast jsonescape)
  time: real 5.630 secs (user 5.550+0.000 sys 0.070+0.000)
  time: real 5.700 secs (user 5.650+0.000 sys 0.050+0.000)
  time: real 5.690 secs (user 5.640+0.000 sys 0.050+0.000)

  (this patch)
  time: real 5.190 secs (user 5.120+0.000 sys 0.070+0.000)
  time: real 5.230 secs (user 5.170+0.000 sys 0.050+0.000)
  time: real 5.220 secs (user 5.150+0.000 sys 0.070+0.000)
2017-04-23 13:08:58 +09:00
Yuya Nishihara
42ccee312b encoding: add fast path of from/tolocal() for ASCII strings
This is micro optimization, but seems not bad since to/fromlocal() is called
lots of times and isasciistr() is cheap and simple.

We boldly assume that any non-ASCII characters have at least one 8-bit byte.
This isn't true for some email character sets (e.g. ISO-2022-JP and UTF-7),
but I believe no such encodings are used as a platform default. Shift_JIS,
a major crap, is okay as it should have a leading byte in 0x80-0xff range.

  (with mercurial repo)
  $ export HGRCPATH=/dev/null HGPLAIN=
  $ hg log --time --config experimental.stabilization=all > /dev/null

  (original)
  time: real 7.460 secs (user 7.420+0.000 sys 0.030+0.000)
  time: real 7.670 secs (user 7.590+0.000 sys 0.080+0.000)
  time: real 7.560 secs (user 7.510+0.000 sys 0.040+0.000)

  (this patch)
  time: real 7.340 secs (user 7.260+0.000 sys 0.060+0.000)
  time: real 7.260 secs (user 7.210+0.000 sys 0.030+0.000)
  time: real 7.310 secs (user 7.260+0.000 sys 0.060+0.000)
2017-04-23 13:06:23 +09:00
Yuya Nishihara
a22ffac20b encoding: add function to test if a str consists of ASCII characters
Most strings are ASCII. Let's optimize for it.

Using uint64_t is slightly faster than uint32_t on 64bit system, but there
isn't huge difference.
2017-04-23 12:59:42 +09:00