Nothing uses the legacy PyString* functions anymore, so these
redefinitions are no longer needed.
After this change, the only reference to "PyString" in the repo is in
watchman's C extension. That isn't our code, and porting Mercurial
extensions to Python 3 is not a high priority at the moment. watchman's
C extension will be dealt with later.
I think remapping Python C API types and functions is not a great
approach. I'd prefer this whole #ifdef disappeared. Add a comment
so we don't forget about it.
util.h attempts to wallpaper over C API differences between Python 2
and 3. This is not the correct approach where performance is critical.
But it is good enough for the current state of the Python 3 port.
Previously, the phase computation would grow much slower as the oldest draft
commit in the repository grew older (which is very common in repos with evolve
on) and as the number of commits increased.
By rewriting the computation in C we can speed it up from 700ms to 7ms on
a large repository whose oldest draft commit is a year old.
As far as I can tell, this is wrong. The format of double isn't strictly
specified by the C standard, but the Wikipedia article implies that
platforms implementing the optional Annex F, "IEC 60559 floating-point
arithmetic", will work correctly.
My local C experts believe doing *((double *) &t) is a strict aliasing
violation, and that using a union is also one. Doing memcpy appears to
be the least-undefined behavior possible.
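For illustration, a minimal sketch of the memcpy approach. The helper names double_to_bits and bits_to_double are hypothetical, and the code assumes that double and uint64_t are both 8 bytes wide, as on platforms implementing Annex F.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEC 60559 value on this platform. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Reinterpret the bytes of a double as an integer. Copying through
 * memcpy avoids the strict-aliasing problem of *((uint64_t *)&d), and
 * compilers typically lower it to a single register move. */
static uint64_t double_to_bits(double d)
{
	uint64_t bits;
	memcpy(&bits, &d, sizeof(bits));
	return bits;
}

/* The reverse direction, again via memcpy rather than a pointer cast
 * or a union. */
static double bits_to_double(uint64_t bits)
{
	double d;
	memcpy(&d, &bits, sizeof(d));
	return d;
}
```

With IEC 60559 doubles, double_to_bits(1.0) yields 0x3FF0000000000000.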
Previously, while unpacking the dirstate we'd create 3-4 new CPython objects
for most dirstate values:
- the state is a single character string, which is pooled by CPython
- the mode is a new object if it isn't 0 due to being in the lookup set
- the size is a new object if it is greater than 255
- the mtime is a new object if it isn't -1 due to being in the lookup set
- the tuple to contain them all
In some cases such as regular hg status, we actually look at all the objects.
In other cases like hg add, hg status for a subdirectory, or hg status with the
third-party hgwatchman enabled, we look at almost none of the objects.
This patch eliminates most object creation in these cases by defining a custom
C struct that is exposed to Python with an interface similar to a tuple. Only
when tuple elements are actually requested are the respective objects created.
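A minimal plain-C sketch of the raw fields such a struct can hold, assuming the v1 on-disk dirstate layout: a state byte followed by big-endian 32-bit mode, size, mtime and filename length, then the filename (the 40-byte parents header is assumed to be skipped already). The names dirstate_entry, read_be32 and parse_entry are hypothetical, and the CPython type that would turn these fields into Python ints only on access is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C-side view of one dirstate entry: plain integers,
 * no Python objects until a caller asks for a tuple element. */
struct dirstate_entry {
	char state;        /* e.g. 'n', 'a', 'r', 'm' */
	int32_t mode;
	int32_t size;
	int32_t mtime;
	const char *name;  /* points into the input buffer, not NUL-terminated */
	uint32_t namelen;
};

static uint32_t read_be32(const unsigned char *d)
{
	return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
	       ((uint32_t)d[2] << 8) | (uint32_t)d[3];
}

/* Parse the record at buf[*pos]; on success fill *out, advance *pos
 * past the filename and return 0. Return -1 if the data is truncated. */
static int parse_entry(const unsigned char *buf, size_t len, size_t *pos,
                       struct dirstate_entry *out)
{
	const unsigned char *p;
	uint32_t namelen;

	if (*pos > len || len - *pos < 17)
		return -1; /* truncated fixed-size header */
	p = buf + *pos;
	namelen = read_be32(p + 13);
	if (len - *pos - 17 < namelen)
		return -1; /* truncated filename */
	out->state = (char)p[0];
	/* uint32_t -> int32_t is implementation-defined above INT32_MAX,
	 * but yields two's-complement values such as -1 on common compilers. */
	out->mode = (int32_t)read_be32(p + 1);
	out->size = (int32_t)read_be32(p + 5);
	out->mtime = (int32_t)read_be32(p + 9);
	out->name = (const char *)(p + 17);
	out->namelen = namelen;
	*pos += 17 + (size_t)namelen;
	return 0;
}
```

Keeping the fields as C integers is what makes laziness cheap: boxing mode, size or mtime into a Python int can be deferred until the corresponding tuple index is read.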
The gains, where they're expected, are significant. The following tests are run
against a working copy with over 270,000 files.
parse_dirstate becomes significantly faster:
$ hg perfdirstate
before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35)
after: wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95)
and as a result, several commands benefit:
$ time hg status # with hgwatchman enabled
before: 0.42s user 0.14s system 99% cpu 0.563 total
after: 0.34s user 0.12s system 99% cpu 0.471 total
$ time hg add new-file
before: 0.85s user 0.18s system 99% cpu 1.033 total
after: 0.76s user 0.17s system 99% cpu 0.931 total
There is a slight regression in regular status performance, but this is fixed
in an upcoming patch.
getbe32 and putbe32 need to behave differently on big-endian and little-endian
systems. On big-endian ones, they should be roughly equivalent to the identity
function with a cast, but on little-endian ones they should reverse the order
of the bytes. That is achieved by the original definition, but
__builtin_bswap32 and _byteswap_ulong, as the names suggest, swap bytes around
unconditionally.
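A sketch of a byte-shifting definition in the style of the original. It assumes the signatures getbe32(const char *) and putbe32(uint32_t, char *) and is not copied from util.h. Because it operates on individual byte values rather than on the in-memory representation of a uint32_t, it gives the same result on big-endian and little-endian hosts, which an unconditional bswap does not.

```c
#include <stdint.h>

/* Read 4 bytes as a big-endian unsigned 32-bit integer. */
static inline uint32_t getbe32(const char *c)
{
	const unsigned char *d = (const unsigned char *)c;
	return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
	       ((uint32_t)d[2] << 8) | (uint32_t)d[3];
}

/* Store x into 4 bytes in big-endian order. */
static inline void putbe32(uint32_t x, char *c)
{
	unsigned char *d = (unsigned char *)c;
	d[0] = (unsigned char)(x >> 24);
	d[1] = (unsigned char)(x >> 16);
	d[2] = (unsigned char)(x >> 8);
	d[3] = (unsigned char)x;
}
```

Modern compilers usually recognize this pattern and emit a single load plus bswap on little-endian targets, which is consistent with there being no measurable gain from the intrinsics.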
There was no measurable performance improvement, so there's no point adding
extra complexity with even more ifdefs for endianness.
(This is not yet enabled; it will be turned on in a followup patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.
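A heavily simplified sketch of the two-pass idea. It covers only the per-byte escaping rules of fncache's filename encoding (an uppercase letter becomes an underscore plus the lowercase letter, an underscore is doubled, and control bytes, bytes >= 126 and \:*?"<>| become ~ followed by two lowercase hex digits). It deliberately omits the directory-suffix and per-component rules (such as Windows reserved names and trailing dots or spaces) that make the real state machine more involved. The names encoded_len, encode_size and encode_into are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of output bytes needed for one input byte. */
static size_t encoded_len(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return 2; /* 'A' -> "_a" */
	if (c == '_')
		return 2; /* '_' -> "__" */
	if (c < 32 || c >= 126 || strchr("\\:*?\"<>|", c) != NULL)
		return 3; /* e.g. ':' -> "~3a" */
	return 1;
}

/* Pass 1: compute the encoded length. A result equal to len means
 * nothing needs to be done and no buffer has to be allocated. */
static size_t encode_size(const char *src, size_t len)
{
	size_t i, n = 0;
	for (i = 0; i < len; i++)
		n += encoded_len((unsigned char)src[i]);
	return n;
}

/* Pass 2: write the encoding into dst, which must hold
 * encode_size(src, len) bytes. */
static void encode_into(const char *src, size_t len, char *dst)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, j = 0;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];
		switch (encoded_len(c)) {
		case 1:
			dst[j++] = (char)c;
			break;
		case 2:
			dst[j++] = '_';
			dst[j++] = (c == '_') ? '_' : (char)(c + ('a' - 'A'));
			break;
		default:
			dst[j++] = '~';
			dst[j++] = hex[c >> 4];
			dst[j++] = hex[c & 15];
		}
	}
}
```

For a typical all-lowercase path such as "data/foo.i", the first pass returns the input length and the second pass is skipped entirely; "Foo_Bar:x" needs 14 bytes and encodes to "_foo___bar~3ax".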
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, a speedup of about 15x.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
Eliminates
mercurial/parsers.c(515) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(520) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(521) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
when compiling for Windows x64 target using the Microsoft compiler.
PyInt_FromSsize_t does not exist for Python 2.4 and earlier, so we define a
fallback in util.h to use PyInt_FromLong when compiling for Python 2.4.
If we are in py3k, an IS_PY3K symbol is defined. Apart from that, byte strings
use the API defined in Python 2.6+ (_?PyBytes_.*). For Python < 2.6, the bytes
API is defined accordingly for Mercurial usage (a shameless copy of
bytesobject.h from Python's source). Some macros were backported from 2.6, as
inspired by rPath's pycompat.h.