This change causes an informative ImportError to be raised when importing
the parsers extension module if the minor version of the currently-running
Python interpreter doesn't match that of the Python used when compiling
the extension module.
This change also exposes a parsers.versionerrortext constant in the
C implementation of the module. Its presence can be used to determine
whether this behavior is present in a version of the module. The value
of the constant is the leading text of the ImportError raised and is set
to "Python minor version mismatch".
Here is an example of what the new error looks like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mercurial.parsers
ImportError: Python minor version mismatch: The Mercurial extension
modules were compiled with Python 2.7.6, but Mercurial is currently using
Python with sys.hexversion=33883888: Python 2.5.6
(r256:88840, Nov 18 2012, 05:37:10)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python
The reason for raising an error in this scenario is that Python's C API
is known not to be compatible from minor version to minor version, even
if sys.api_version is the same. See for example this Python bug report
about incompatibilities between 2.5 and 2.6+:
http://bugs.python.org/issue8118
These incompatibilities can cause Mercurial to break in mysterious,
unforeseen ways. For example, when Mercurial compiled with Python 2.7 was
run with 2.5, the following crash occurred when running "hg status":
http://bz.selenic.com/show_bug.cgi?id=4110
After this crash was fixed, running with Python 2.5 no longer crashes, but
the following puzzling behavior still occurs:
$ hg status
...
File ".../mercurial/changelog.py", line 123, in __init__
revlog.revlog.__init__(self, opener, "00changelog.i")
File ".../mercurial/revlog.py", line 251, in __init__
d = self._io.parseindex(i, self._inline)
File ".../mercurial/revlog.py", line 158, in parseindex
index, cache = parsers.parse_index2(data, inline)
TypeError: data is not a string
which can be reproduced more simply with:
import mercurial.parsers as parsers
parsers.parse_index2("", True)
Both the crash and the TypeError occurred because the Python C API's
PyString_Check() returns the wrong value when the C header files from
Python 2.7 are run with Python 2.5. This is an example of an
incompatibility of the sort mentioned in the Python bug report above.
Failing fast with an informative error message results in a better user
experience in cases like the above. The information in the ImportError
also simplifies troubleshooting for those on Mercurial mailing lists, the
bug tracker, etc.
This patch only adds the version check to parsers.c, which is sufficient
to affect command-line commands like "hg status" and "hg summary".
An idea for a future improvement is to move the version-checking C code
to a more central location, and have it run when importing all
Mercurial extension modules and not just parsers.c.
This change causes an informative ImportError to be raised when importing
the extension module parsers if the minor version of the currently-running
Python interpreter doesn't match that of the Python that was used when
compiling the extension module. Here is an example of what the new error
looks like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mercurial.parsers
ImportError: Python minor version mismatch: The Mercurial extension
modules were compiled with Python 2.7.6, but Mercurial is currently using
Python with sys.hexversion=33883888: Python 2.5.6
(r256:88840, Nov 18 2012, 05:37:10)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python
The reason for raising an error in this scenario is that Python's C API
is known not to be compatible from minor version to minor version, even
if sys.api_version is the same. See for example this Python bug report
about incompatibilities between 2.5 and 2.6+:
http://bugs.python.org/issue8118
These incompatibilities can cause Mercurial to break in mysterious,
unforeseen ways. For example, when Mercurial compiled with Python 2.7 was
run with 2.5, the following crash occurred when running "hg status":
http://bz.selenic.com/show_bug.cgi?id=4110
After this crash was fixed, running with Python 2.5 no longer crashes, but
the following puzzling behavior still occurs:
$ hg status
...
File ".../mercurial/changelog.py", line 123, in __init__
revlog.revlog.__init__(self, opener, "00changelog.i")
File ".../mercurial/revlog.py", line 251, in __init__
d = self._io.parseindex(i, self._inline)
File ".../mercurial/revlog.py", line 158, in parseindex
index, cache = parsers.parse_index2(data, inline)
TypeError: data is not a string
which can be reproduced more simply with:
import mercurial.parsers as parsers
parsers.parse_index2("", True)
Both the crash and the TypeError occurred because the Python C API's
PyString_Check returns the wrong value when the C header files from
Python 2.7 are run with Python 2.5. This is an example of an
incompatibility of the sort mentioned in the Python bug report above.
Failing fast with an informative error message will result in a better
user experience in cases like the above. The information in the ImportError
will also simplify troubleshooting for those on Mercurial mailing lists,
the bug tracker, etc.
This patch only adds the version check to parsers.c, which is sufficient
to affect command-line commands like "hg status" and "hg summary".
An idea for a future improvement is to move the version-checking C code
to a more central location, and have it run when importing all
Mercurial extension modules and not just parsers.c.
Passing a non-string to parsers.parse_index2() causes Mercurial to crash
instead of raising a TypeError (found on Mac OS X 10.8.5, Python 2.7.6):
import mercurial.parsers as parsers
parsers.parse_index2(0, 0)
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 parsers.so 0x000000010e071c59 _index_clearcaches + 73 (parsers.c:644)
1 parsers.so 0x000000010e06f2d5 index_dealloc + 21 (parsers.c:1767)
2 parsers.so 0x000000010e074e3b parse_index2 + 347 (parsers.c:1891)
3 org.python.python 0x000000010dda8b17 PyEval_EvalFrameEx + 9911
This happens because when arguments of the wrong type are passed to
parsers.parse_index2(), indexType's initialization function index_init() in
parsers.c leaves the indexObject instance in a state that indexType's
destructor function index_dealloc() cannot handle.
This patch moves enough of the indexObject initialization code inside
index_init() from after the argument validation code to before it.
This way, when bad arguments are passed to index_init(), the destructor
doesn't crash and the existing code to raise a TypeError works. This
patch also adds a test to check that a TypeError is raised.
When try to compile on x64 OS X, I get this warning:
mercurial/parsers.c:931:27: warning: implicit conversion loses integer precision
: 'long' to 'int' [-Wshorten-64-to-32]
? 4 : self->raw_length / 2;
The patch verifies if value of self->raw_length falls bellow INT_MAX; if not,
it raises the ValueError exception.
If value of self->raw_length is greater than 4, it's casted to int type, to
eliminate the warning.
memchr is usually smarter than a simple for loop. With gcc 4.4.6 and glibc 2.12
on x86-64, for a 20 MB, 200,000 file manifest, parse_manifest goes from 0.116
seconds to 0.095 seconds.
Previously we'd place files written in the last second in the lookup set. This
can lead to pathological cases where a file always remains in the lookup set if
it gets modified before the next time status is run.
With this patch, only the mtime of those files is invalidated. This means that
if a file's size or mode changes, we can immediately declare it as modified
without needing to compare file contents.
ninteresting indicates the number of non-zero elements in the interesting
array, not the number of elements in the final list. Since elements in
interesting can stand for more than one gca, limiting the number of results to
ninteresting is an error.
Tests for issue3984 are included.
The invariant this code tries to hold is that ninteresting is the number of
non-zero elements in the interesting array. interesting[nsp] is incremented at
the same time as interesting[sp] is decremented. So if interesting[nsp] was
previously 0, ninteresting shouldn't be decremented.
gcc 4.6.3 on 12.04 Ubuntu machine emits warnings:
mercurial/parsers.c: In function ‘find_deepest’:
mercurial/parsers.c:1288:9: warning: format ‘%ld’ expects argument of type
‘long int’, but argument 3 has type ‘Py_ssize_t’ [-Wformat]
mercurial/parsers.c:1288:9: warning: format ‘%ld’ expects argument of type
‘long int’, but argument 4 has type ‘Py_ssize_t’ [-Wformat]
The performance of both the old and new Python ancestor algorithms
depends on the number of revs they need to traverse. Although the
new algorithm performs far better than the old when revs are
numerically and topologically close, both algorithms become slow
under other circumstances, taking up to 1.8 seconds to give answers
in a Linux kernel repo.
This C implementation of the new algorithm is a fairly straightforward
transliteration. The only corner case of interest is that it raises
an OverflowError if the number of GCA candidates found during the
first pass is greater than 24, to avoid the dual perils of fixnum
overflow and trying to allocate too much memory. (If this exception
is raised, the Python implementation is used instead.)
Performance numbers are good: in a Linux kernel repo, time for "hg
debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943)
is as follows:
Old Python: 0.36 sec
New Python: 0.42 sec
New C: 0.02 sec
For a case where the new algorithm should perform well:
Old Python: 1.84 sec
New Python: 0.07 sec
New C: measures as zero when using --time
(This commit includes a paranoid cross-check to ensure that the
Python and C implementations give identical answers. The above
performance numbers were measured with that check disabled.)
This is over twice as fast as the Python dirs code. Upcoming changes
will nearly double its speed again.
perfdirs results for a working dir with 170,000 files:
Python 638 msec
C 244
Since 2852b9b207e9, raw_length can be reduced on strip, but corresponding cache
entries still have refcount. They are not dereferenced by _index_clearcache(),
and never freed.
To reproduce the problem, run "hg pull" and "hg strip null" several times
in the same process.
(This is not yet enabled; it will be turned on in a followup patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
Not yet used (will be enabled in a later patch).
This patch is a stripped down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>
_partialmatch() does prefix matching against nodes. String passed
to _partialmetch() actualy may be any string, not prefix only.
For example,
"63af8381691a9e5c52ee57c4e965eb306f86826e or 300" is a good
argument for _partialmatch().
When _partialmatch() searches using radix tree, index_partialmatch()
C function shouldn't try to match too long strings.
Some compilers / compiler options (such as gcc 4.7) would emit warnings:
mercurial/parsers.c: In function 'pack_dirstate':
mercurial/parsers.c:306:18: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
mercurial/parsers.c:306:12: warning: 'mode' may be used uninitialized in this function [-Wmaybe-uninitialized]
It is apparently not smart enough to figure out how the 'err' arithmetics makes
sure that it can't happen.
'err' is now replaced with simple checks and goto. That might also help the
optimizer when it is inlining getintat().
This is about 9 times faster than the Python dirstate packing code.
The relatively small speedup is due to the poor locality and memory
access patterns caused by traversing dicts and other boxed Python
values.
Although index_headrevs is much faster than its Python counterpart,
it's still somewhat expensive when history is large. Since headrevs
is called several times when the tag cache is stale or missing (e.g.
after a strip or rebase), there's a win to be gained from caching
the result, which we do here.
The C implementation is more than 100 times faster than the Python
version (which is still available as a fallback).
In a repo with 330,000 revs and a stale .hg/cache/tags file, this
patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.
The radix tree already contains all the information we need to
determine whether a short string is an unambiguous node identifier.
We now make use of this information.
In a kernel tree, this improves the performance of
"hg log -q -r24bf01de75" from 0.27 seconds to 0.06.
Eliminates
mercurial/parsers.c(515) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(520) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
mercurial/parsers.c(521) : warning C4244: 'function' : conversion from
'Py_ssize_t' to 'long', possible loss of data
when compiling for Windows x64 target using the Microsoft compiler.
PyInt_FromSsize_t does not exist for Python 2.4 and earlier, so we define a
fallback in util.h to use PyInt_FromLong when compiling for Python 2.4.