Commit Graph

191 Commits

Author SHA1 Message Date
timeless
28949d3481 cleanup: remove superfluous space after space after equals (C) 2015-12-31 08:17:15 +00:00
Laurent Charignon
056d0af816 dirstate: add a C implementation for nonnormalentries
Before this patch, there was only a python version of nonnormalentries.
On mozilla-central we have a 10x win by putting this function in C:
% python -m timeit -s \
        'from mercurial import hg, ui, parsers; \
        repo = hg.repository(ui.ui(), "mozilla-central"); \
        m = repo.dirstate._map' \
        'parsers.nonnormalentries(m)'

100 loops, best of 3: 3.15 msec per loop

The Python implementation runs in 31ms; a similar test gives:
10 loops, best of 3: 31.7 msec per loop

On our big repos, the win is still 10x, with the Python implementation running
in 350ms and the C implementation running in 30ms.
2015-12-21 16:27:16 -08:00
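For reference, the logic ported to C in 056d0af816 can be sketched in Python roughly as follows. This is only an illustration: it assumes dirstate map entries are (state, mode, size, mtime) tuples and that an entry is "non-normal" when its state is not 'n' or its mtime is -1. The tuple layout and the criterion are assumptions made for this sketch, not a copy of the Mercurial code.

```python
def nonnormalentries(dmap):
    """Return the set of filenames whose dirstate entry is not 'normal'.

    Assumption for this sketch: each value in dmap is a tuple
    (state, mode, size, mtime), and an entry counts as non-normal when
    its state is not 'n' or its mtime is unset (-1).
    """
    return {fname for fname, e in dmap.items()
            if e[0] != 'n' or e[3] == -1}
```

A pure-Python loop like this touches every entry of the map, which is presumably why moving it to C pays off on large repositories.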
Bryan O'Sullivan
509e69c982 parsers: use PyTuple_Pack instead of manual list-filling
Suggested by Yuya.
2015-12-17 13:07:34 -08:00
Bryan O'Sullivan
964c9f9835 parsers: add a missed PyErr_NoMemory 2015-12-14 10:47:27 -08:00
Bryan O'Sullivan
c7f7939b54 parsers: check results of PyInt_FromLong (issue4771) 2015-12-14 10:47:26 -08:00
Bryan O'Sullivan
be02f1c0fd parsers: simplify error logic in compute_phases_map_sets
Since Py_XDECREF and free both accept NULL pointers, we can get by
with just two exit paths: one for success, and one for error.

This considerably simplifies reasoning about the possible ways to
exit from this function.
2015-12-14 10:47:24 -08:00
Bryan O'Sullivan
5a05512f2f parsers: narrow scope of a variable to be less confusing 2015-12-12 20:57:01 -08:00
Yuya Nishihara
bbbc39e0a4 parsers: fix parse_dirstate to check len before unpacking header (issue4979) 2015-12-02 23:04:58 +09:00
Yuya Nishihara
557c963f7b parsers: fix width of datalen variable in fm1readmarkers
Because parsers.c does not define PY_SSIZE_T_CLEAN, "s#" format requires
(const char*, int), not (const char*, Py_ssize_t).

https://docs.python.org/2/c-api/arg.html

This error caused no problem before 7d13be5f72c2, where datalen wasn't used.
But now fm1readmarkers() fails with "overflow in obsstore" on Python 2.6.9
(amd64) because upper bits of datalen seem to be filled with 1, making it
a negative integer.

This problem does not seem to be visible on our Python 2.7 environment because
upper bits happen to be filled with 0.
2015-11-07 17:43:20 +09:00
Yuya Nishihara
18a70ac37c parsers: suppress warning of signed and unsigned comparison at nt_init
Spotted by CC=clang CFLAGS='-Wall -Wextra -Wno-missing-field-initializers
-Wno-unused-parameter -Wshorten-64-to-32':

  mercurial/parsers.c:1580:24: warning: comparison of integers of different
  signs: 'Py_ssize_t' (aka 'long') and 'unsigned long' [-Wsign-compare]
                  if (self->raw_length > INT_MAX / sizeof(nodetree)) {
2015-10-18 09:05:04 +09:00
Yuya Nishihara
0318e586fb parsers: correct type of temporary variables for dirstate tuple fields
These fields are defined as int. This eliminates the following warning
spotted by CC=clang CFLAGS='-Wall -Wextra -Wno-missing-field-initializers
-Wno-unused-parameter -Wshorten-64-to-32':

  mercurial/parsers.c:625:29: warning: comparison of integers of different
  signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
                  if (state == 'n' && mtime == now) {
2015-10-17 23:14:13 +09:00
FUJIWARA Katsunori
77975f1ce1 parsers: make pack_dirstate take now in integer for consistency
On recent OSes, 'stat.st_mtime' is a double precision floating point
value that represents nanoseconds, but it is not wide enough for the actual
file timestamp: nowadays, only 52 - 32 = 20 bits are available for the
fractional part of a second.

Therefore, casting it to 'int' may cause an unexpected result. See also
changeset 8102a3981272 fixing issue4836 for detail.

For example, changed file A may be treated as "clean" unexpectedly in
steps below. "rounded now" is the value obtained by rounding via
'int(st.st_mtime)' or so.

    ---------------------+--------------------+------------------------
    "now"                |                    | timestamp of A (time_t)
    float  rounded time_t| action             | FS       dirstate
    ------ ------- ------+--------------------+-------- ---------------
    N+.nnn   N       N   |                    | ---      ---
                         | update file A      |  N
                         | dirstate.normal(A) |           N
    N+.999   N+1     N   |                    |
                         | dirstate.write()   |           N (*1)
                         |    :               |
                         | change file A      |  N
                         |    :               |
    N+1.00   N+1    N+1  |                    |
                         | "hg status" (*2)   |  N        N
    ------ ------- ------+--------------------+-------- ---------------

Timestamp N of A in dirstate isn't dropped at (*1), because "rounded
now" is N+1 at that time, even if 'st_mtime' in 'time_t' is still N.

Then, file A is unexpectedly treated as "clean" at (*2) in this case.

For consistent handling of 'stat.st_mtime', this patch makes
'pack_dirstate()' take 'now' argument not in floating point but in
integer.

This patch makes 'PyArg_ParseTuple()' in 'pack_dirstate()' use format
'i' (= checking type mismatch or overflow), even though it is ensured
that 'now' is in the range of 32bit signed integer by masking with
'_rangemask' (= 0x7fffffff) on caller side.

It should be cheap enough compared to packing itself, and useful for
detecting legacy code that invokes 'pack_dirstate()' with 'now' as a
floating point value.
2015-10-14 02:40:04 +09:00
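The rounding effect described in 77975f1ce1 can be reproduced with a small Python sketch. The helper names and the sample timestamp are hypothetical; only the 0x7fffffff mask is taken from the message. With a timestamp around 1.4e9 seconds, a double has a resolution of roughly 2.4e-7 seconds, so a fraction of .999999999 is rounded up to the next whole second before 'int()' ever sees it.

```python
RANGEMASK = 0x7fffffff  # same value as the '_rangemask' quoted in the message


def now_from_float(st_mtime):
    """Derive 'now' from a floating point timestamp (the problematic way)."""
    return int(st_mtime)


def now_from_ns(st_mtime_ns):
    """Derive 'now' from an integer nanosecond timestamp (hypothetical helper)."""
    return (st_mtime_ns // 10**9) & RANGEMASK


# Hypothetical timestamp: N seconds plus 999999999 nanoseconds.
N = 1444758004
as_float = N + 0.999999999          # what a float st_mtime would hold
as_ns = N * 10**9 + 999999999       # exact integer representation

rounded_now = now_from_float(as_float)   # already N + 1
exact_now = now_from_ns(as_ns)           # still N, like time_t
```

Here "rounded now" is one second ahead of the time_t value, which is the mismatch shown in the table of that commit message.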
Yuya Nishihara
b1dadc9002 parsers: fix infinite loop or out-of-bound read in fm1readmarkers (issue4888)
Issue4888 was caused by a 0-length obsolete marker. If msize is zero,
fm1readmarkers() never ends.

This patch adds several bounds checks to fm1readmarker(). Therefore, 0-length
and invalid-size markers should be rejected.
2015-10-11 18:30:47 +09:00
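The kind of check added in b1dadc9002 can be sketched in Python as below. The record layout is a simplifying assumption for illustration (each record starts with a 4-byte big-endian total size that includes the size field itself), and 'iter_markers' is a hypothetical name; the real fm1 marker layout has more fields.

```python
import struct

HEADER = 4  # assumed size of the length prefix, in bytes


def iter_markers(data):
    """Yield the payload of each length-prefixed record in data.

    Raises ValueError on truncated or invalid-size records, so a
    zero-length record cannot stall the loop and an oversized one
    cannot read past the end of the buffer.
    """
    off = 0
    end = len(data)
    while off < end:
        if end - off < HEADER:
            raise ValueError("truncated marker header")
        (msize,) = struct.unpack_from(">I", data, off)
        # msize == 0 would never advance 'off'; msize > remaining bytes
        # would read out of bounds.
        if msize < HEADER or msize > end - off:
            raise ValueError("invalid marker size")
        yield data[off + HEADER:off + msize]
        off += msize
```

Without the size check, a record whose size field is zero leaves the offset unchanged, which is the infinite loop the message describes.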
Yuya Nishihara
62c0a27d40 parsers: read sizes of metadata pair of obsolete marker at once
This will make it easy to implement bounds checking. Currently fm1readmarker()
has no protection against a corrupted obsstore and can cause an infinite loop or
out-of-bounds reads.
2015-10-11 18:41:41 +09:00
Yuya Nishihara
d786b95595 parsers: use PyTuple_New and SET_ITEM to construct metadata pair of markers
With these 2 patches, fm1readmarkers() gets slightly faster:

  obsolete._fm1readmarkers() for 78644 entries
  58.0 -> 56.2msec
2015-09-05 16:50:35 +09:00
Yuya Nishihara
7f0688b8e9 parsers: use PyTuple_SET_ITEM() to fill new marker tuples
Because we know these tuples have no member yet, PyTuple_SetItem() isn't
necessary.
2015-09-05 16:41:21 +09:00
Augie Fackler
5d1e91bd29 parsers: fix two cases of unsigned long instead of Py_ssize_t
We had to do this before because Python 2.4 didn't understand the n
format specifier in Py_BuildValue and friends. We no longer have that
problem.
2015-08-26 10:20:07 -04:00
timeless@mozdev.org
52eae47139 spelling: behaviour -> behavior 2015-08-28 10:53:55 -04:00
Yuya Nishihara
d439794357 reachableroots: silence warning of implicit integer narrowing issued by clang
Tested with CFLAGS=-Wshorten-64-to-32 CC=clang which is the default of
Mac OS X.

Because a valid revnum shouldn't exceed INT_MAX, we don't need long width for
large tovisit array.
2015-08-14 12:25:14 +09:00
Yuya Nishihara
ac624a2f60 reachableroots: narrow scope of minidx variable
minidx is never used if includepath is false, so let's define it where it
is used.
2015-08-14 12:22:08 +09:00
Augie Fackler
e4a0ef9ede parsers: avoid int/unsigned conversions
Detected with
make local CFLAGS='-Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter' CC=clang
2015-08-21 14:33:51 -04:00
Yuya Nishihara
4c30aae351 reachableroots: unroll loop that checks if one of parents is reachable
The difference is small, but fewer loops should be better in general:

  revset #0: 0::tip
  0) 0.001609
  1) 0.001510  93%
2015-08-16 09:30:37 +09:00
Yuya Nishihara
ffe4e45775 reachableroots: handle error of PyList_Append() 2015-08-15 19:38:03 +09:00
Yuya Nishihara
ecea4ee80c reachableroots: return list of revisions instead of set
Now we don't need a set of reachable revisions, and the caller wants a sorted
list of revisions, so constructing a set is just a waste of time.

  revset #0: 0::tip
  2) 0.002536
  3) 0.001598  63%

PyList_New() should set an appropriate exception on error, so we don't need
to call PyErr_NoMemory() manually.

This patch lacks error handling of PyList_Append() as it was before for
PySet_Add(). It should be fixed later.
2015-08-14 15:52:19 +09:00
Yuya Nishihara
72474b8722 reachableroots: use internal "revstates" array to test if rev is reachable
This is faster than using PySet_Contains().

  revset #0: 0::tip
  1) 0.003678
  2) 0.002536  68%
2015-08-14 15:49:11 +09:00
Yuya Nishihara
7f0aba37f0 reachableroots: use internal "revstates" array to test if rev is a root
The main goal of this patch series is to reduce the use of PyXxx() functions
that are likely to require ugly error handling and inc/decref. Plus, this is
faster than using PySet_Contains().

  revset #0: 0::tip
  0) 0.004168
  1) 0.003678  88%

This patch ignores out-of-range roots as they are in the pure implementation.
Because reachable sets are calculated from heads, and out-of-range heads raise
IndexError, we can just take out-of-range roots as unreachable. Otherwise,
the test of "hg log -Gr '. + wdir()'" would fail.

The "heads" argument is changed to a list. Should we rename the C function
now that its signature has changed?
2015-08-14 15:43:29 +09:00
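The "revstates" idea from this series can be sketched in Python as follows. This is a simplified illustration, not the C code: it omits the includepath mode, takes the parent table as a plain list of (p1, p2) tuples, and the function signature is an assumption. The flag names follow the SEEN | ROOT | REACHABLE scheme mentioned elsewhere in this log.

```python
SEEN, ROOT, REACHABLE = 1, 2, 4
NULLREV = -1


def reachableroots(parents, minroot, heads, roots):
    """Return the sorted list of roots reachable from heads.

    parents[rev] is a (p1, p2) tuple; revstates is indexed by rev + 1
    so that nullrev (-1) has a slot of its own.
    """
    n = len(parents)
    revstates = bytearray(n + 1)
    for r in roots:
        if 0 <= r < n:               # out-of-range roots are simply ignored
            revstates[r + 1] |= ROOT
    tovisit = []
    for h in heads:
        if not NULLREV <= h < n:     # out-of-range heads are an error
            raise IndexError("head out of range")
        if not revstates[h + 1] & SEEN:
            revstates[h + 1] |= SEEN
            tovisit.append(h)
    reachable = []
    while tovisit:
        rev = tovisit.pop()
        if revstates[rev + 1] & ROOT:    # replaces a set membership test
            revstates[rev + 1] |= REACHABLE
            reachable.append(rev)
            continue
        if rev == NULLREV:
            continue
        for p in parents[rev]:
            if p >= minroot and not revstates[p + 1] & SEEN:
                revstates[p + 1] |= SEEN
                tovisit.append(p)
    return sorted(reachable)
```

Keeping all per-revision state in one flat array means the hot loop needs no object allocation or reference counting, which is the motivation the message gives.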
Augie Fackler
7b20700303 parsers: set exception when there's too little string data to extract parents
Previously we were returning NULL from this function without actually
setting up an exception. This fixes that problem, which was detected
with cpychecker.
2015-08-18 16:40:10 -04:00
Augie Fackler
71f1912b43 parsers: drop spurious check of readlen value
We're about to check if len < 40 after assigning readlen to len, which
means that if len < 40 we'll still abort, but I'm about to add a
sensible exception to that failure, so let's just discard this useless
check.
2015-08-18 16:39:26 -04:00
Augie Fackler
d203bdeed3 parsers: correctly decref normed value after PyDict_SetItem
Previously we were leaving this PyObject* with a refcount that was one
too high. Detected with cpychecker.
2015-08-18 16:43:26 -04:00
Augie Fackler
b4e44876ff parsers: fix two leaks in index_ancestors
Both happy paths through this function leaked the returned list:

1) If the list was of size 0 or 1, it was retained an extra time and then
   returned.

2) If the list was passed to find_deepest, it was never released before
   exiting this function.

Both paths spotted by cpychecker.
2015-08-18 17:15:04 -04:00
Yuya Nishihara
fe0b1b769d reachableroots: extend "revstates" to array of bit flags 2015-08-14 15:30:52 +09:00
Yuya Nishihara
548434fd88 reachableroots: rename "seen" array to "revstates" for future extension
It will be an array of bit flags, SEEN | ROOT | REACHABLE.
2015-08-14 15:23:42 +09:00
Yuya Nishihara
8267d3bb5d reachableroots: give anonymous name to short-lived "numheads" variable
I'll reuse it for the length of the roots list.
2015-08-15 18:29:58 +09:00
Yuya Nishihara
8743607a4d reachableroots: reduce nesting level by jumping to next iteration by continue
This can eliminate lines over 80 columns. No code change except for the
outermost "if" condition.
2015-08-15 18:03:47 +09:00
Yuya Nishihara
dd91337869 reachableroots: fix memleak of integer objects at includepath loop
In the first visit loop, val is decref-ed correctly after PySet_Add().
Let's do the same for the includepath loop.
2015-08-14 12:36:41 +09:00
Yuya Nishihara
814039db26 reachableroots: bail if integer object cannot be allocated
This patch also replaces Py_XDECREF() by Py_DECREF() because we know "val"
and "p" are not NULL.

BTW, we can eliminate some of these allocations and error handling of int objects
if the internal "seen" array has more information. For example,

  enum { SEEN = 1, ROOT = 2, REACHABLE = 4 };
  /* ... build ROOT mask from roots argument ... */
  if (seen[revnum + 1] & ROOT) {  /* instead of PySet_Contains(roots, val) */

From my quick hack, it is 2x faster.
2015-08-14 12:31:56 +09:00
Yuya Nishihara
1566598e14 reachableroots: verify type of each item of heads argument
Though PyInt_AS_LONG() can return a value even if its argument isn't an int object,
it could exceed the boundary of the underlying struct. I think C API should be
defensive to such errors.
2015-08-13 18:59:49 +09:00
Yuya Nishihara
01d4a46e15 reachableroots: verify integer range of heads argument (issue4775)
Now it raises IndexError instead of SEGV for 'wdir()' as it was before.
2015-08-13 18:38:46 +09:00
Yuya Nishihara
d210a7a2bf reachableroots: unify bail cases to raise exception correctly
Before this patch, release_seen_and_tovisit did not return NULL, so the
exception was not raised immediately. As Py_XDECREF() and free() are safe
for NULL, we can simply bail in any case.
2015-08-13 18:29:38 +09:00
Yuya Nishihara
d44f168a4b reachableroots: pass NULL to PySet_New() as it expects a pointer, not an int 2015-08-13 17:58:33 +09:00
Augie Fackler
4d8670352a reachableroots: return NULL if we're throwing an exception
Based on my reading of [0] and surrounding sections, if we want an
exception to be properly raised when something goes wrong in the C
code, we need to make sure we return NULL here. Do so.

[0] https://docs.python.org/2/extending/extending.html#back-to-the-example
2015-08-11 14:53:47 -04:00
Augie Fackler
380097abbb reachableroots: fix transposition of set and list types in PyArg_ParseTuple
This is being masked by the function not properly returning NULL when
it raises an exception, so the client code was just falling back to
the native codepath when it got None back. A future change removes all
reason for this C function to return None, which exposed this problem
during development.
2015-08-11 15:34:10 -04:00
Augie Fackler
eebac428f3 reachableroots: consistently use short-form of PyErr_NoMemory() 2015-08-11 14:50:39 -04:00
Augie Fackler
e09e18c658 reachableroots: if allocating a new set fails, use PyErr_NoMemory()
My inspection of the implementation of PySet_New() indicates that it
does *not* reliably set an exception in the cases where it returns
NULL (as far as I can tell it'll never do that!), so let's set that up
ourselves.
2015-08-11 14:49:40 -04:00
Laurent Charignon
b940f257de reachableroots: add a C implementation
This patch is part of a series of patches to speed up the computation of
revset.reachableroots by introducing a C implementation. The main motivation is to
speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of reachableroots is 10-50x faster and smartlog is
2x-5x faster.

This patch introduces a C implementation for reachableroots following closely the
Python implementation but optimized by using C data structures.
2015-08-06 21:28:45 -07:00
Laurent Charignon
c906fe19bb parsers: fix memory leak in compute_phases_map_sets
PySet_Add increments the reference of the added object to the set, see:
https://hg.python.org/cpython/file/2.6/Objects/setobject.c#l379
Before this patch we were forgetting to decrement the reference count after
adding objects to the phaseset. This patch fixes the issue and makes the
reference count right so that these objects can be properly garbage collected.
2015-08-06 22:54:28 -07:00
Yuya Nishihara
b789a2d6b1 parsers: silence warning of implicit integer conversion issued by clang
"-Wshorten-64-to-32" is enabled by default on Mac OS X. Because "len" should
be represented in 32bit integer, this patch simply casts ssize_t to int.
2015-07-20 23:38:56 +09:00
Yuya Nishihara
83b3c48a90 parsers: fix buffer overflow by invalid parent revision read from revlog
If a revlog file is corrupted, it can have a parent pointing to an invalid revision.
So we should validate it before updating nothead[], phases[], seen[], etc.
Otherwise it would cause a segfault at best.

We could use "rev" instead of "maxrev" as upper bound, but I think the explicit
"maxrev" can clarify that we just want to avoid possible buffer overflow
vulnerability.
2015-07-16 23:36:08 +09:00
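The validation described in 83b3c48a90 amounts to checking every parent against an explicit upper bound before using it as an array index. A minimal Python sketch, with a hypothetical function name and a plain list of (p1, p2) tuples standing in for the revlog index:

```python
def compute_heads(parentrevs):
    """Return revisions that are nobody's parent, validating parents first.

    parentrevs[rev] is a (p1, p2) tuple where -1 means "no parent".
    A parent outside [-1, maxrev] indicates a corrupted index.
    """
    maxrev = len(parentrevs) - 1
    nothead = bytearray(len(parentrevs))
    for ps in parentrevs:
        for p in ps:
            if p < -1 or p > maxrev:
                # In C, writing nothead[p] here would overflow the buffer.
                raise ValueError("parent out of range")
            if p >= 0:
                nothead[p] = 1
    return [rev for rev in range(len(parentrevs)) if not nothead[rev]]
```

Python would raise IndexError on its own for a too-large index, but a negative index would silently wrap around, so the explicit check is still what makes the intent clear.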
Siddharth Agarwal
bba072811f parsers: add an API to create a new presized dict 2015-06-15 22:41:30 -07:00
Siddharth Agarwal
1be06bc8f4 parsers: factor out code to create a presized dict
In upcoming patches we'll expose this as an API.
2015-06-15 22:37:33 -07:00