The performance of both the old and new Python ancestor algorithms
depends on the number of revs they need to traverse. Although the
new algorithm performs far better than the old when revs are
numerically and topologically close, both algorithms become slow
under other circumstances, taking up to 1.8 seconds to give answers
in a Linux kernel repo.
This C implementation of the new algorithm is a fairly straightforward
transliteration. The only corner case of interest is that it raises
an OverflowError if the number of GCA candidates found during the
first pass is greater than 24, to avoid the dual perils of fixnum
overflow and trying to allocate too much memory. (If this exception
is raised, the Python implementation is used instead.)
Performance numbers are good: in a Linux kernel repo, time for "hg
debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943)
is as follows:
Old Python: 0.36 sec
New Python: 0.42 sec
New C: 0.02 sec
For a case where the new algorithm should perform well:
Old Python: 1.84 sec
New Python: 0.07 sec
New C: measures as zero when using --time
(This commit includes a paranoid cross-check to ensure that the
Python and C implementations give identical answers. The above
performance numbers were measured with that check disabled.)
This is over twice as fast as the Python dirs code. Upcoming changes
will nearly double its speed again.
perfdirs results for a working dir with 170,000 files:
Python 638 msec
C 244
Since 2852b9b207e9, raw_length can be reduced on strip, but corresponding cache
entries still have refcount. They are not dereferenced by _index_clearcache(),
and never freed.
To reproduce the problem, run "hg pull" and "hg strip null" several times
in the same process.
(This is not yet enabled; it will be turned on in a followup patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
Not yet used (will be enabled in a later patch).
This patch is a stripped down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>
_partialmatch() does prefix matching against nodes. String passed
to _partialmetch() actualy may be any string, not prefix only.
For example,
"63af8381691a9e5c52ee57c4e965eb306f86826e or 300" is a good
argument for _partialmatch().
When _partialmatch() searches using radix tree, index_partialmatch()
C function shouldn't try to match too long strings.
Some compilers / compiler options (such as gcc 4.7) would emit warnings:
mercurial/parsers.c: In function 'pack_dirstate':
mercurial/parsers.c:306:18: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
mercurial/parsers.c:306:12: warning: 'mode' may be used uninitialized in this function [-Wmaybe-uninitialized]
It is apparently not smart enough to figure out how the 'err' arithmetics makes
sure that it can't happen.
'err' is now replaced with simple checks and goto. That might also help the
optimizer when it is inlining getintat().
This is about 9 times faster than the Python dirstate packing code.
The relatively small speedup is due to the poor locality and memory
access patterns caused by traversing dicts and other boxed Python
values.
Although index_headrevs is much faster than its Python counterpart,
it's still somewhat expensive when history is large. Since headrevs
is called several times when the tag cache is stale or missing (e.g.
after a strip or rebase), there's a win to be gained from caching
the result, which we do here.
The C implementation is more than 100 times faster than the Python
version (which is still available as a fallback).
In a repo with 330,000 revs and a stale .hg/cache/tags file, this
patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.
The radix tree already contains all the information we need to
determine whether a short string is an unambiguous node identifier.
We now make use of this information.
In a kernel tree, this improves the performance of
"hg log -q -r24bf01de75" from 0.27 seconds to 0.06.
Eliminates
mercurial/parsers.c(515) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(520) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(521) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
when compiling for Windows x64 target using the Microsoft compiler.
PyInt_FromSsize_t does not exist for Python 2.4 and earlier, so we define a
fallback in util.h to use PyInt_FromLong when compiling for Python 2.4.
When we encounter a corrupt index, we "fail" the init but our
destructor still gets called. On some systems, this was causing us to
attempt to decref a dangling to self->data.
As detailed on http://docs.python.org/extending/newtypes.html (quote):
"In this case, we can just use the default implementation provided by the API
function PyType_GenericNew(). We’d like to just assign this to the
tp_new slot, but we can’t, for portability sake. On some platforms or
compilers, we can’t statically initialize a structure member with a function
defined in another C module, so, instead, we’ll assign the tp_new slot in the
module initialization function just before calling PyType_Ready()."
Fixes "gcc (GCC) 3.4.5 (mingw-vista special r3)" complaining with:
mercurial/parsers.c:1096: error: initializer element is not constant
mercurial/parsers.c:1096: error: (near initialization for `indexType.tp_new')
This is most easily verified using valgrind on a long-running
process, as the leak has no visible consequences during normal
one-shot command usage.
In one window:
valgrind --leak-check=full --suppressions=valgrind-python.supp \
python ./hg serve
In another:
for ((i=0;i<100;i++)); do
curl -s http://localhost:8000/file/tip/README >/dev/null
done
valgrind should report no leaks.
This greatly speeds up node->rev lookups, with results that are
often user-perceptible: for instance, "hg --time log" of the node
associated with rev 1000 on a linux-2.6 repo improves from 0.3
seconds to 0.03. I have not found any instances of slowdowns.
The new perfnodelookup command in contrib/perf.py demonstrates the
speedup more dramatically, since it performs no I/O. For a single
lookup, the new code is about 40x faster.
These changes also prepare the ground for the possibility of further
improving the performance of prefix-based node lookups.
This change also fixes a nasty memory leak: previously, self->caches
was not being freed.
The new clearcaches method lets us benchmark with finer granularity,
as it lets us separate the cost of loading a revlog index from those
of populating and accessing the cache data structures.
We only parse entries in a revlog index file when they are actually
needed, and cache them when first requested.
This makes a huge difference to performance on large revlogs when
accessing the tip revision or performing a handful of numeric lookups
(very common cases). For instance, "hg --time tip --template {node}"
on a tree with 300,000 revs takes 0.15 before, 0.02 after.
Even for revlog-intensive operations (e.g. running "hg log" to
completion), the lazy approach is about 1% faster than the eager
parse_index2.
Newer versions of GCC have aggressive pointer alias optimizations that
might get fooled by our pointer manipulations. These issues shouldn't
be encountered in practice because distutils compiles extensions with
-fno-strict-alias but the code was not valid according to the standard.
This patch adds support for py3k in parsers.c. This is accomplished by including
a header file responsible for abstracting the API differences between python 2
and python 3.
When flags was DECREF'ed, scope was referencing to the outer variable,
outside of the block.
It was in fact always NULL: the real Python object was never decref'ed.