This change causes an informative ImportError to be raised when importing
the parsers extension module if the minor version of the currently-running
Python interpreter doesn't match that of the Python used when compiling
the extension module.
This change also exposes a parsers.versionerrortext constant in the
C implementation of the module. Its presence can be used to determine
whether this behavior is present in a version of the module. The value
of the constant is the leading text of the ImportError raised and is set
to "Python minor version mismatch".
Here is an example of what the new error looks like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mercurial.parsers
ImportError: Python minor version mismatch: The Mercurial extension
modules were compiled with Python 2.7.6, but Mercurial is currently using
Python with sys.hexversion=33883888: Python 2.5.6
(r256:88840, Nov 18 2012, 05:37:10)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python
The reason for raising an error in this scenario is that Python's C API
is known not to be compatible from minor version to minor version, even
if sys.api_version is the same. See for example this Python bug report
about incompatibilities between 2.5 and 2.6+:
http://bugs.python.org/issue8118
These incompatibilities can cause Mercurial to break in mysterious,
unforeseen ways. For example, when Mercurial compiled with Python 2.7 was
run with 2.5, the following crash occurred when running "hg status":
http://bz.selenic.com/show_bug.cgi?id=4110
After this crash was fixed, running with Python 2.5 no longer crashes, but
the following puzzling behavior still occurs:
$ hg status
...
File ".../mercurial/changelog.py", line 123, in __init__
revlog.revlog.__init__(self, opener, "00changelog.i")
File ".../mercurial/revlog.py", line 251, in __init__
d = self._io.parseindex(i, self._inline)
File ".../mercurial/revlog.py", line 158, in parseindex
index, cache = parsers.parse_index2(data, inline)
TypeError: data is not a string
which can be reproduced more simply with:
import mercurial.parsers as parsers
parsers.parse_index2("", True)
Both the crash and the TypeError occurred because the Python C API's
PyString_Check() returns the wrong value when the C header files from
Python 2.7 are run with Python 2.5. This is an example of an
incompatibility of the sort mentioned in the Python bug report above.
Failing fast with an informative error message results in a better user
experience in cases like the above. The information in the ImportError
also simplifies troubleshooting for those on Mercurial mailing lists, the
bug tracker, etc.
This patch only adds the version check to parsers.c, which is sufficient
to affect command-line commands like "hg status" and "hg summary".
An idea for a future improvement is to move the version-checking C code
to a more central location, and have it run when importing all
Mercurial extension modules and not just parsers.c.
This change updates and improves the description of test-parseindex2.py.
In particular, it removes language that can be interpreted to mean that the
test module checks only the C implementation of parsers.parse_index2().
Rather, the module checks parsers.parse_index2(), which can be either the
C or pure Python implementation, depending on which version is being used.
As of 23b69fd11636, the module also does more than just compare the return
value with the original Python implementation.
This change causes an informative ImportError to be raised when importing
the extension module parsers if the minor version of the currently-running
Python interpreter doesn't match that of the Python that was used when
compiling the extension module. Here is an example of what the new error
looks like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mercurial.parsers
ImportError: Python minor version mismatch: The Mercurial extension
modules were compiled with Python 2.7.6, but Mercurial is currently using
Python with sys.hexversion=33883888: Python 2.5.6
(r256:88840, Nov 18 2012, 05:37:10)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python
The reason for raising an error in this scenario is that Python's C API
is known not to be compatible from minor version to minor version, even
if sys.api_version is the same. See for example this Python bug report
about incompatibilities between 2.5 and 2.6+:
http://bugs.python.org/issue8118
These incompatibilities can cause Mercurial to break in mysterious,
unforeseen ways. For example, when Mercurial compiled with Python 2.7 was
run with 2.5, the following crash occurred when running "hg status":
http://bz.selenic.com/show_bug.cgi?id=4110
After this crash was fixed, running with Python 2.5 no longer crashes, but
the following puzzling behavior still occurs:
$ hg status
...
File ".../mercurial/changelog.py", line 123, in __init__
revlog.revlog.__init__(self, opener, "00changelog.i")
File ".../mercurial/revlog.py", line 251, in __init__
d = self._io.parseindex(i, self._inline)
File ".../mercurial/revlog.py", line 158, in parseindex
index, cache = parsers.parse_index2(data, inline)
TypeError: data is not a string
which can be reproduced more simply with:
import mercurial.parsers as parsers
parsers.parse_index2("", True)
Both the crash and the TypeError occurred because the Python C API's
PyString_Check returns the wrong value when the C header files from
Python 2.7 are run with Python 2.5. This is an example of an
incompatibility of the sort mentioned in the Python bug report above.
Failing fast with an informative error message will result in a better
user experience in cases like the above. The information in the ImportError
will also simplify troubleshooting for those on Mercurial mailing lists,
the bug tracker, etc.
This patch only adds the version check to parsers.c, which is sufficient
to affect command-line commands like "hg status" and "hg summary".
An idea for a future improvement is to move the version-checking C code
to a more central location, and have it run when importing all
Mercurial extension modules and not just parsers.c.
Passing a non-string to parsers.parse_index2() causes Mercurial to crash
instead of raising a TypeError (found on Mac OS X 10.8.5, Python 2.7.6):
import mercurial.parsers as parsers
parsers.parse_index2(0, 0)
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 parsers.so 0x000000010e071c59 _index_clearcaches + 73 (parsers.c:644)
1 parsers.so 0x000000010e06f2d5 index_dealloc + 21 (parsers.c:1767)
2 parsers.so 0x000000010e074e3b parse_index2 + 347 (parsers.c:1891)
3 org.python.python 0x000000010dda8b17 PyEval_EvalFrameEx + 9911
This happens because when arguments of the wrong type are passed to
parsers.parse_index2(), indexType's initialization function index_init() in
parsers.c leaves the indexObject instance in a state that indexType's
destructor function index_dealloc() cannot handle.
This patch moves enough of the indexObject initialization code inside
index_init() from after the argument validation code to before it.
This way, when bad arguments are passed to index_init(), the destructor
doesn't crash and the existing code to raise a TypeError works. This
patch also adds a test to check that a TypeError is raised.
This greatly speeds up node->rev lookups, with results that are
often user-perceptible: for instance, "hg --time log" of the node
associated with rev 1000 on a linux-2.6 repo improves from 0.3
seconds to 0.03. I have not found any instances of slowdowns.
The new perfnodelookup command in contrib/perf.py demonstrates the
speedup more dramatically, since it performs no I/O. For a single
lookup, the new code is about 40x faster.
These changes also prepare the ground for the possibility of further
improving the performance of prefix-based node lookups.
We only parse entries in a revlog index file when they are actually
needed, and cache them when first requested.
This makes a huge difference to performance on large revlogs when
accessing the tip revision or performing a handful of numeric lookups
(very common cases). For instance, "hg --time tip --template {node}"
on a tree with 300,000 revs takes 0.15 before, 0.02 after.
Even for revlog-intensive operations (e.g. running "hg log" to
completion), the lazy approach is about 1% faster than the eager
parse_index2.