The old native head revs logic would iterate over every node, starting from 0,
and check if every node was filtered (by testing it against the filteredrevs
python set). On large repos with hundreds of thousands of commits, this could
take 150ms.
This new logic iterates over the nodes in reverse order, and skips the filtered
check if we've seen an unfiltered child of the node. This saves approximately a
bagillion filteredrevs set checks, which shaves the time down from 150ms to
20ms during every branch cache write.
Before this patch, there was only a python version of nonnormalentries.
On mozilla-central we have a 10x win by putting this function in C:
% python -m timeit -s \
'from mercurial import hg, ui, parsers; \
repo = hg.repository(ui.ui(), "mozilla-central"); \
m = repo.dirstate._map' \
'parsers.nonnormalentries(m)'
100 loops, best of 3: 3.15 msec per loop
The python implementation runs in 31ms, a similar test gives:
10 loops, best of 3: 31.7 msec per loop
On our big repos, the win is still of 10x with the python implementation running
in 350ms and the C implementation running in 30ms.
Since Py_XDECREF and free both accept NULL pointers, we can get by
with just two exit paths: one for success, and one for error.
This considerably simplifies reasoning about the possible ways to
exit from this function.
Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
(const char*, int), not (const char*, Py_ssize_t).
https://docs.python.org/2/c-api/arg.html
This error had no problem before 7d13be5f72c2, where datalen wasn't used.
But now fm1readmarkers() fails with "overflow in obsstore" on Python 2.6.9
(amd64) because upper bits of datalen seem to be filled with 1, making it
a negative integer.
This problem seems not visible on our Python 2.7 environment because upper
bits happen to be filled with 0.
Spotted by CC=clang CFLAGS='-Wall -Wextra -Wno-missing-field-initializers
-Wno-unused-parameter -Wshorten-64-to-32':
mercurial/parsers.c:1580:24: warning: comparison of integers of different
signs: 'Py_ssize_t' (aka 'long') and 'unsigned long' [-Wsign-compare]
if (self->raw_length > INT_MAX / sizeof(nodetree)) {
These fields are defined as int. This eliminates the following warning
spotted by CC=clang CFLAGS='-Wall -Wextra -Wno-missing-field-initializers
-Wno-unused-parameter -Wshorten-64-to-32':
mercurial/parsers.c:625:29: warning: comparison of integers of different
signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
if (state == 'n' && mtime == now) {
On recent OS, 'stat.st_mtime' has a double precision floating point
value to represent nano seconds, but it is not wide enough for actual
file timestamp: nowadays, only 52 - 32 = 20 bit width is available for
decimal places in sec.
Therefore, casting it to 'int' may cause unexpected result. See also
changeset 8102a3981272 fixing issue4836 for detail.
For example, changed file A may be treated as "clean" unexpectedly in
steps below. "rounded now" is the value gotten by rounding via
'int(st.st_mtime)' or so.
---------------------+--------------------+------------------------
"now" | | timestamp of A (time_t)
float rounded time_t| action | FS dirstate
------ ------- ------+--------------------+-------- ---------------
N+.nnn N N | | --- ---
| update file A | N
| dirstate.normal(A) | N
N+.999 N+1 N | |
| dirstate.write() | N (*1)
| : |
| change file A | N
| : |
N+1.00 N+1 N+1 | |
| "hg status" (*2) | N N
------ ------- ------+--------------------+-------- ---------------
Timestamp N of A in dirstate isn't dropped at (*1), because "rounded
now" is N+1 at that time, even if 'st_mtime' in 'time_t' is still N.
Then, file A is unexpectedly treated as "clean" at (*2) in this case.
For consistent handling of 'stat.st_mtime', this patch makes
'pack_dirstate()' take 'now' argument not in floating point but in
integer.
This patch makes 'PyArg_ParseTuple()' in 'pack_dirstate()' use format
'i' (= checking type mismatch or overflow), even though it is ensured
that 'now' is in the range of 32bit signed integer by masking with
'_rangemask' (= 0x7fffffff) on caller side.
It should be cheaper enough than packing itself, and useful to
detect that legacy code invokes 'pack_dirstate()' with 'now' in
floating point value.
The issue4888 was caused by 0-length obsolete marker. If msize is zero,
fm1readmarkers() never ends.
This patch adds several bound checks to fm1readmarker(). Therefore, 0-length
and invalid-size marker should be rejected.
This will make it easy to implement bound checking. Currently fm1readmarker()
has no protection for corrupted obsstore and can cause infinite loop or
out-of-bound reads.
Tested with CFLAGS=-Wshorten-64-to-32 CC=clang which is the default of
Mac OS X.
Because a valid revnum shouldn't exceed INT_MAX, we don't need long width for
large tovisit array.
Now we don't need a set of reachable revisions, and the caller wants a sorted
list of revisions, so constructing a set is just a waste of time.
revset #0: 0::tip
2) 0.002536
3) 0.001598 63%
PyList_New() should set an appropriate exception on error, so we don't need
to call PyErr_NoMemory() manually.
This patch lacks error handling of PyList_Append() as it was before for
PySet_Add(). It should be fixed later.
The main goal of this patch series is to reduce the use of PyXxx() function
that is likely to require ugly error handling and inc/decref. Plus, this is
faster than using PySet_Contains().
revset #0: 0::tip
0) 0.004168
1) 0.003678 88%
This patch ignores out-of-range roots as they are in the pure implementation.
Because reachable sets are calculated from heads, and out-of-range heads raise
IndexError, we can just take out-of-range roots as unreachable. Otherwise,
the test of "hg log -Gr '. + wdir()'" would fail.
"heads" argument is changed to a list. Should we have to rename the C function
as its signature is changed?
Previously we were returning NULL from this function without actually
setting up an exception. This fixes that problem, which was detected
with cpychecker.
We're about to check if len < 40 after assigning readlen to len, which
means that if len < 40 we'll still abort, but I'm about to add a
sensible exception to that failure, so let's just discard this useless
check.
Both happy paths through this function leaked the returned list:
1) If the list was of size 0 or 1, it was retained an extra time and then
returned.
2) If the list was passed to find_deepest, it was never released before
exiting this function.
Both paths spotted by cpychecker.
This patch also replaces Py_XDECREF() by Py_DECREF() because we known "val"
and "p" are not NULL.
BTW, we can eliminate some of these allocation and error handling of int objects
if the internal "seen" array has more information. For example,
enum { SEEN = 1, ROOT = 2, REACHABLE = 4 };
/* ... build ROOT mask from roots argument ... */
if (seen[revnum + 1] & ROOT) { /* instead of PySet_Contains(roots, val) */
>From my quick hack, it is 2x faster.
Though PyInt_AS_LONG() can return a value no matter if it isn't an int object,
it could exceed the boundary of the underlying struct. I think C API should be
defensive to such errors.
Before this patch, release_seen_and_tovisit did not return NULL, so the
exception was not raised immediately. As Py_XDECREF() and free() are safe
for NULL, we can simply bail in any case.
This is being masked by the function not properly returning NULL when
it raises an exception, so the client code was just falling back to
the native codepath when it got None back. A future change removes all
reason for this C function to return None, which exposed this problem
during development.
My inspection of the implementation of PySet_New() indicates that it
does *not* reliably set an exception in the cases where it returns
NULL (as far as I can tell it'll never do that!), so let's set that up
ourselves.
This patch is part of a series of patches to speed up the computation of
revset.reachableroots by introducing a C implementation. The main motivation is to
speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of reachableroots is 10-50x faster and smartlog on is
2x-5x faster.
This patch introduces a C implementation for reachableroots following closely the
Python implementation but optimized by using C data structures.
PySet_Add increments the reference of the added object to the set, see:
https://hg.python.org/cpython/file/2.6/Objects/setobject.c#l379
Before this patch we were forgetting to decrement the reference count after
adding objects to the phaseset. This patch fixes the issue and makes the
reference count right so that these objects can be properly garbage collected.