Commit Graph

14 Commits

Author SHA1 Message Date
Siddharth Agarwal
488f8eee3e pathencode: fix hashmangle short dir limit (issue3958)
The Python version of this (see mercurial/store.py:_hashencode) copies path
components up to a limit of maxshortdirslen bytes. The Python version does not
consider the initial "dh/" to be part of this limit, though, while the C version
currently does. Adding len("dh/") == 3 to the limit for the C version brings it
in line with the Python version.

This was not caught by the randomized testing scheme in test-pathencode.py
because of a couple of flaws with the test. Upcoming patches will fix those
problems.
2013-06-19 22:34:34 -07:00
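
The limit accounting described in this commit can be illustrated with a simplified Python sketch. The function name `shortdirs`, the `count_prefix` flag, and the constant values 8 and 68 are assumptions made for illustration; they are not taken from the commit message. The sketch leaves out other details of the real encoder (such as special handling of trailing characters) and only models how charging "dh/" against the budget changes which directory prefixes fit.

```python
DIRPREFIXLEN = 8       # assumed: bytes kept from each directory name
MAXSHORTDIRSLEN = 68   # assumed: budget for the joined short directories
PREFIX = "dh/"


def shortdirs(dirs, count_prefix=False):
    """Join truncated directory names until the length budget is used up.

    count_prefix=True models the behaviour described for the old C code,
    where the leading "dh/" was charged against the budget.
    count_prefix=False models the behaviour described for the Python code.
    """
    used = len(PREFIX) if count_prefix else 0
    kept = []
    for name in dirs:
        short = name[:DIRPREFIXLEN]
        # every component after the first also costs one "/" separator
        total = used + len(short) + (1 if kept else 0)
        if total > MAXSHORTDIRSLEN:
            break
        kept.append(short)
        used = total
    return PREFIX + "".join(d + "/" for d in kept)
```

With seven 8-byte directories followed by one 5-byte directory, the two accountings disagree: the last directory fits only when "dh/" is not counted, which is the kind of mismatch the 3-byte adjustment removes.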
Matt Mackall
474d52f0c3 pathencode: grow buffers to increase safety margin 2013-05-10 11:23:50 -05:00
Yuya Nishihara
6621ca2cb6 pathencode: eliminate comma at end of enum list to avoid pedantic warning 2013-04-19 01:34:21 +09:00
Matt Mackall
197aa5594e pathencode: don't use alloca() for safety/portability 2013-01-19 17:20:39 -06:00
Bryan O'Sullivan
561ba86d03 pathencode: implement both basic and hashed encoding in C 2012-12-12 13:09:36 -08:00
Bryan O'Sullivan
0914e7afb6 pathencode: implement hashed encoding in C
This will be used by an upcoming patch.
2012-12-12 13:09:36 -08:00
Bryan O'Sullivan
e632a0600d pathencode: implement the "mangling" part of hashed encoding in C
This will be used by an upcoming patch.
2012-12-12 13:09:35 -08:00
Bryan O'Sullivan
480e769e36 pathencode: add a SHA-1 hash function
This will be used by an upcoming patch.

This calls out to the Python hash implementation.

An earlier version of this function implemented SHA-1 directly, but
the amount of extra code didn't seem like a good tradeoff compared
to the small big-picture increase in performance (long paths are
uncommon).
2012-12-12 13:09:34 -08:00
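
In Python terms, the hash that the C code delegates to looks roughly like the following. The helper name `hexsha1` is hypothetical; only `hashlib.sha1` is a real standard-library call.

```python
import hashlib


def hexsha1(path: bytes) -> str:
    """Return the 40-character hexadecimal SHA-1 digest of a path."""
    return hashlib.sha1(path).hexdigest()
```

Delegating to the interpreter's implementation costs a call across the C/Python boundary for each hashed path, which matches the tradeoff the message describes: less C code in exchange for a small slowdown on the uncommon long paths.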
Bryan O'Sullivan
dea2c50032 store: implement lowerencode in C 2012-12-12 13:09:33 -08:00
André Sintzoff
54640fd383 pathencode: change isset name to avoid name collision
On old Mac OS X versions (10.4), arpa/inet.h (included in mercurial/util.h)
includes system/param.h, which defines an isset macro.
2012-09-30 15:31:27 +02:00
Adrian Buehlmann
5de7aaf8a4 pathencode: skip encoding if input is already longer than maxstorepathlen
Calling basicencode may make the path longer, never shorter. If it's already
too long before, then we don't even need to basicencode it.
2012-09-30 23:53:56 +02:00
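
The shortcut can be sketched as follows. The name `hybridencode`, the callback parameters, and the value 120 for `MAXSTOREPATHLEN` are assumptions for illustration; the two encoders are passed in as callables so the sketch stays self-contained.

```python
MAXSTOREPATHLEN = 120  # assumed value for the maximum store path length


def hybridencode(path, basicencode, hashencode):
    """Pick basic or hashed encoding, skipping work where possible."""
    # Basic encoding never shortens a path, so an over-long input
    # can go straight to the hashed encoding.
    if len(path) > MAXSTOREPATHLEN:
        return hashencode(path)
    encoded = basicencode(path)
    # The encoded form may still have grown past the limit.
    if len(encoded) > MAXSTOREPATHLEN:
        return hashencode(path)
    return encoded
```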
Adrian Buehlmann
377fe9eee4 pathencode: simplify basicencode 2012-09-30 23:53:56 +02:00
Bryan O'Sullivan
a150198558 store: implement fncache basic path encoding in C
(This is not yet enabled; it will be turned on in a followup patch.)

The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.

For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.

For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.

Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.

In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14.

Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.

For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x.  Thus absolute performance
will depend strongly on the characteristics of a particular repository.
2012-09-18 15:42:19 -07:00
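
The two-pass idea for short paths can be sketched in Python. This is a simplified illustration, not the C implementation: the escape rules shown (uppercase letters and "_" become a two-byte sequence, a small assumed set of reserved bytes becomes "~xx") cover only part of the real encoding, and the names `_width`, `RESERVED`, and `basicencode` are hypothetical.

```python
RESERVED = frozenset(b'\\:*?"<>|')  # assumed set of bytes needing ~xx escapes


def _width(b: int) -> int:
    """Number of output bytes needed for one input byte."""
    if b == ord("_") or ord("A") <= b <= ord("Z"):
        return 2
    if b < 32 or b > 125 or b in RESERVED:
        return 3
    return 1


def basicencode(path: bytes) -> bytes:
    # Pass 1: measure only. If nothing grows, nothing needs encoding.
    size = sum(_width(b) for b in path)
    if size == len(path):
        return path
    # Pass 2: build the encoded form.
    out = bytearray()
    for b in path:
        w = _width(b)
        if w == 1:
            out.append(b)
        elif w == 2:
            out += b"_" + bytes([b]).lower()  # "_" becomes "__", "A" becomes "_a"
        else:
            out += b"~%02x" % b
    return bytes(out)
```

For a path made only of lowercase letters, digits, "/" and ".", the first pass finds no growth and the input is returned unchanged, which is the common fast case.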
Adrian Buehlmann
fd6785ba1c pathencode: new C module with fast encodedir() function
Not yet used (will be enabled in a later patch).

This patch is a stripped down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>
2012-09-18 11:43:30 +02:00