sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Bryan O'Sullivan	c6b9f1099d	parsers: a C implementation of the new ancestors algorithm The performance of both the old and new Python ancestor algorithms depends on the number of revs they need to traverse. Although the new algorithm performs far better than the old when revs are numerically and topologically close, both algorithms become slow under other circumstances, taking up to 1.8 seconds to give answers in a Linux kernel repo. This C implementation of the new algorithm is a fairly straightforward transliteration. The only corner case of interest is that it raises an OverflowError if the number of GCA candidates found during the first pass is greater than 24, to avoid the dual perils of fixnum overflow and trying to allocate too much memory. (If this exception is raised, the Python implementation is used instead.) Performance numbers are good: in a Linux kernel repo, time for "hg debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943) is as follows: Old Python: 0.36 sec New Python: 0.42 sec New C: 0.02 sec For a case where the new algorithm should perform well: Old Python: 1.84 sec New Python: 0.07 sec New C: measures as zero when using --time (This commit includes a paranoid cross-check to ensure that the Python and C implementations give identical answers. The above performance numbers were measured with that check disabled.)	2013-04-16 10:08:20 -07:00
Bryan O'Sullivan	8f78d582d5	scmutil: rewrite dirs in C, use if available This is over twice as fast as the Python dirs code. Upcoming changes will nearly double its speed again. perfdirs results for a working dir with 170,000 files: Python 638 msec C 244	2013-04-10 15:08:27 -07:00
Siddharth Agarwal	9334236621	dirstate: move pure python dirstate packing to pure/parsers.py	2013-01-17 23:46:08 -08:00
Yuya Nishihara	26e354a203	parsers: fix memleak of revlog cache entries on strip Since 2852b9b207e9, raw_length can be reduced on strip, but corresponding cache entries still have refcount. They are not dereferenced by _index_clearcache(), and never freed. To reproduce the problem, run "hg pull" and "hg strip null" several times in the same process.	2013-01-28 19:05:35 +09:00
Bryan O'Sullivan	dea2c50032	store: implement lowerencode in C	2012-12-12 13:09:33 -08:00
Bryan O'Sullivan	a150198558	store: implement fncache basic path encoding in C (This is not yet enabled; it will be turned on in a followup patch.) The path encoding performed by fncache is complex and (perhaps surprisingly) slow enough to negatively affect the overall performance of Mercurial. For a short path (< 120 bytes), the Python code can be reduced to a fairly tractable state machine that either determines that nothing needs to be done in a single pass, or performs the encoding in a second pass. For longer paths, we avoid the more complicated hashed encoding scheme for now, and fall back to Python. Raw performance: I measured in a repo containing 150,000 files in its tip manifest, with a median path name length of 57 bytes, and 95th percentile of 96 bytes. In this repo, the Python code takes 3.1 seconds to encode all path names, while the hybrid C-and-Python code (called from Python) takes 0.21 seconds, for a speedup of about 14. Across several other large repositories, I've measured the speedup from the C code at between 26x and 40x. For path names above 120 bytes where we must fall back to Python for hashed encoding, the speedup is about 1.7x. Thus absolute performance will depend strongly on the characteristics of a particular repository.	2012-09-18 15:42:19 -07:00
Adrian Buehlmann	fd6785ba1c	pathencode: new C module with fast encodedir() function Not yet used (will be enabled in a later patch). This patch is a stripped down version of patches originally created by Bryan O'Sullivan <bryano@fb.com>	2012-09-18 11:43:30 +02:00
Bryan O'Sullivan	4045197307	parsers: fix an integer size warning issued by clang	2012-08-13 14:04:52 -07:00
sorcerer	b04ae8ca03	revlog: don't try to partialmatch strings whose length > 40 _partialmatch() does prefix matching against nodes. The string passed to _partialmatch() may actually be any string, not only a prefix. For example, "63af8381691a9e5c52ee57c4e965eb306f86826e or 300" is a valid argument for _partialmatch(). When _partialmatch() searches using the radix tree, the index_partialmatch() C function shouldn't try to match strings that are too long.	2012-08-02 19:10:45 +04:00
Mads Kiilerich	6dedbb6378	parsers.c: remove warning: 'size' may be used uninitialized in this function Some compilers / compiler options (such as gcc 4.7) would emit warnings: mercurial/parsers.c: In function 'pack_dirstate': mercurial/parsers.c:306:18: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized] mercurial/parsers.c:306:12: warning: 'mode' may be used uninitialized in this function [-Wmaybe-uninitialized] It is apparently not smart enough to figure out how the 'err' arithmetics makes sure that it can't happen. 'err' is now replaced with simple checks and goto. That might also help the optimizer when it is inlining getintat().	2012-07-06 00:48:45 +02:00
Bryan O'Sullivan	ce2a30609e	parsers: add a C function to pack the dirstate This is about 9 times faster than the Python dirstate packing code. The relatively small speedup is due to the poor locality and memory access patterns caused by traversing dicts and other boxed Python values.	2012-05-30 12:55:33 -07:00
Bryan O'Sullivan	7edc7069a5	parsers: replace magic number 64 with symbolic constant	2012-06-01 15:19:08 -07:00
Bryan O'Sullivan	1e9deb3b01	parsers: cache the result of index_headrevs Although index_headrevs is much faster than its Python counterpart, it's still somewhat expensive when history is large. Since headrevs is called several times when the tag cache is stale or missing (e.g. after a strip or rebase), there's a win to be gained from caching the result, which we do here.	2012-05-19 20:21:48 -07:00
Bryan O'Sullivan	a49ea963d7	revlog: switch to a C version of headrevs The C implementation is more than 100 times faster than the Python version (which is still available as a fallback). In a repo with 330,000 revs and a stale .hg/cache/tags file, this patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.	2012-05-19 19:44:58 -07:00
Bryan O'Sullivan	f9c29929d4	parsers: reduce raw_length when truncating When stripping revs, we now update raw_length to correctly reflect the new end of the index.	2012-05-19 19:44:18 -07:00
Bryan O'Sullivan	226bc14024	parsers: use Py_CLEAR where appropriate	2012-05-13 11:56:50 +02:00
Matt Mackall	05e48d4041	merge with stable	2012-05-13 12:52:24 +02:00
Bryan O'Sullivan	058dfb801d	revlog: speed up prefix matching against nodes The radix tree already contains all the information we need to determine whether a short string is an unambiguous node identifier. We now make use of this information. In a kernel tree, this improves the performance of "hg log -q -r24bf01de75" from 0.27 seconds to 0.06.	2012-05-12 10:55:08 +02:00
Bryan O'Sullivan	f29187cd15	parsers: ensure that nullid is always present in the radix tree	2012-05-12 10:55:08 +02:00
Bryan O'Sullivan	4fe1bcbdb1	parsers: allow hex keys	2012-05-12 10:55:07 +02:00
Matt Mackall	48471fd098	merge with stable	2012-05-12 00:06:11 +02:00
Adrian Buehlmann	f5ca6da4d6	parser: use PyInt_FromSsize_t in index_stats Eliminates mercurial/parsers.c(515) : warning C4244: 'function' : conversion from 'Py_ssize_t' to 'long', possible loss of data mercurial/parsers.c(520) : warning C4244: 'function' : conversion from 'Py_ssize_t' to 'long', possible loss of data mercurial/parsers.c(521) : warning C4244: 'function' : conversion from 'Py_ssize_t' to 'long', possible loss of data when compiling for Windows x64 target using the Microsoft compiler. PyInt_FromSsize_t does not exist for Python 2.4 and earlier, so we define a fallback in util.h to use PyInt_FromLong when compiling for Python 2.4.	2012-05-09 09:58:50 +02:00
Matt Mackall	6d78ec67ed	merge with stable	2012-05-11 14:48:24 +02:00
Bryan O'Sullivan	9daeaf8600	parsers: change the type of nt_level We should generally prefer Py_ssize_t whenever we are talking about lengths.	2012-05-08 14:48:50 -07:00
Bryan O'Sullivan	2d6b967125	parsers: change the type signature of hexdigit An upcoming change will make use of this.	2012-05-08 14:48:48 -07:00
Bryan O'Sullivan	98bb617e51	parsers: allow nt_find to signal an ambiguous match	2012-05-08 14:48:44 -07:00
Bryan O'Sullivan	2b933f4aaf	parsers: factor out radix tree initialization	2012-05-08 14:48:39 -07:00
Bryan O'Sullivan	7434efb91b	parsers: update ntrev when we stop scanning This prevents us from inserting some nodes twice, wasting work.	2012-05-08 14:46:06 -07:00
Bryan O'Sullivan	6f88d16cf2	parsers: use the correct maximum radix tree depth Previously, we would not use more than half of a SHA-1 hash when constructing and searching the tree.	2012-05-08 14:46:04 -07:00
Matt Mackall	d8dfd80d76	parsers: fix refcount bug on corrupt index When we encounter a corrupt index, we "fail" the init but our destructor still gets called. On some systems, this was causing us to attempt to decref a dangling pointer to self->data.	2012-05-07 15:40:50 -05:00
Adrian Buehlmann	4482ec8070	parsers: statically initializing tp_new to PyType_GenericNew is not portable As detailed on http://docs.python.org/extending/newtypes.html (quote): "In this case, we can just use the default implementation provided by the API function PyType_GenericNew(). We’d like to just assign this to the tp_new slot, but we can’t, for portability sake. On some platforms or compilers, we can’t statically initialize a structure member with a function defined in another C module, so, instead, we’ll assign the tp_new slot in the module initialization function just before calling PyType_Ready()." Fixes "gcc (GCC) 3.4.5 (mingw-vista special r3)" complaining with: mercurial/parsers.c:1096: error: initializer element is not constant mercurial/parsers.c:1096: error: (near initialization for `indexType.tp_new')	2012-05-08 11:20:07 +02:00
Bryan O'Sullivan	1eb25dc7ef	parsers: fix refcount leak, simplify init of index (issue3417) This is most easily verified using valgrind on a long-running process, as the leak has no visible consequences during normal one-shot command usage. In one window: valgrind --leak-check=full --suppressions=valgrind-python.supp \ python ./hg serve In another: for ((i=0;i<100;i++)); do curl -s http://localhost:8000/file/tip/README >/dev/null done valgrind should report no leaks.	2012-05-02 14:37:44 -07:00
Matt Mackall	0fa9895915	util.h: replace ntohl/htonl with get/putbe32	2012-04-16 11:26:00 -05:00
Bryan O'Sullivan	dc46676e81	parsers: use base-16 trie for faster node->rev mapping This greatly speeds up node->rev lookups, with results that are often user-perceptible: for instance, "hg --time log" of the node associated with rev 1000 on a linux-2.6 repo improves from 0.3 seconds to 0.03. I have not found any instances of slowdowns. The new perfnodelookup command in contrib/perf.py demonstrates the speedup more dramatically, since it performs no I/O. For a single lookup, the new code is about 40x faster. These changes also prepare the ground for the possibility of further improving the performance of prefix-based node lookups.	2012-04-12 14:05:59 -07:00
Matt Mackall	0ba5fb4cce	util.h: more Python 2.4 fixes	2012-04-10 16:53:29 -05:00
Matt Mackall	fd4256c9b1	util.h: unify some common platform tweaks	2012-04-10 12:07:14 -05:00
Bryan O'Sullivan	774e52cd41	parsers: fix a memleak, and add a clearcaches method to the index This change also fixes a nasty memory leak: previously, self->caches was not being freed. The new clearcaches method lets us benchmark with finer granularity, as it lets us separate the cost of loading a revlog index from those of populating and accessing the cache data structures.	2012-04-06 00:28:36 -07:00
Bryan O'Sullivan	849e7f15fd	parsers: incrementally parse the revlog index in C We only parse entries in a revlog index file when they are actually needed, and cache them when first requested. This makes a huge difference to performance on large revlogs when accessing the tip revision or performing a handful of numeric lookups (very common cases). For instance, "hg --time tip --template {node}" on a tree with 300,000 revs takes 0.15 before, 0.02 after. Even for revlog-intensive operations (e.g. running "hg log" to completion), the lazy approach is about 1% faster than the eager parse_index2.	2012-04-05 13:00:35 -07:00
Bryan O'Sullivan	3fbc354649	parsers: strictly check for 20-byte hashes where they're required	2012-05-12 20:25:33 +02:00
Matt Mackall	cb69aaee4e	parsers: avoid pointer aliasing Newer versions of GCC have aggressive pointer alias optimizations that might get fooled by our pointer manipulations. These issues shouldn't be encountered in practice because distutils compiles extensions with -fno-strict-aliasing, but the code was not valid according to the standard.	2011-08-10 13:40:01 -05:00
Martin Geisler	a76e121863	backout of e4cb9628354c Matt and a majority of crew did not like this approach.	2011-01-27 11:15:08 +01:00
Martin Geisler	d23e1973c2	specify C indentation style using Emacs file local variables	2011-01-26 12:05:01 +01:00
Benoit Boissinot	6c18ebff9d	parsers.c: fix comment	2011-01-15 12:44:28 +01:00
Matt Mackall	846d35e24f	revlog: only build the nodemap on demand	2011-01-11 17:01:04 -06:00
Renato Cunha	e7d8ae78a9	parsers.c: Added support for py3k. This patch adds support for py3k in parsers.c. This is accomplished by including a header file responsible for abstracting the API differences between python 2 and python 3.	2010-06-15 19:49:56 -03:00
Matt Mackall	69f9d533aa	parsers: fix some signed comparison issues (spotted by Steve Borho)	2010-02-13 17:37:44 -06:00
Matt Mackall	8d99be19f0	many, many trivial check-code fixups	2010-01-25 00:05:27 -06:00
Nicolas Dumazet	9e39f64d52	parsers.c: parse_manifest: fixing refcount of flags When flags was DECREF'ed, the name resolved to the outer variable, outside of the block. That variable was in fact always NULL, so the real Python object was never decref'ed.	2009-08-27 14:15:04 +02:00
Thomas Arendsen Hein	8999e196bc	Some additional space/tab cleanups	2008-10-20 15:19:05 +02:00
Dirkjan Ochtman	f5ea74b223	clean up trailing spaces, leading spaces in C	2008-10-20 14:57:04 +02:00
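
Commits dc46676e81 and 058dfb801d above describe a base-16 trie (radix tree) that maps node hashes to revision numbers and can tell whether a short hex prefix is unambiguous. The idea can be illustrated with a minimal Python sketch. This is not the C code from parsers.c: the names NodeTrie and AmbiguousPrefix are hypothetical, and the sketch assumes all inserted node ids are distinct hex strings of equal length.

```python
class AmbiguousPrefix(Exception):
    """Raised when more than one stored node starts with the prefix."""


class NodeTrie:
    """Toy base-16 trie: hex node id -> revision number (illustrative only)."""

    def __init__(self):
        # Each level maps one hex digit to either a nested dict (interior
        # node) or a (hexnode, rev) tuple (leaf).
        self.root = {}

    def insert(self, hexnode, rev):
        level = self.root
        for depth, digit in enumerate(hexnode):
            child = level.get(digit)
            if child is None:
                # Free slot: store the leaf at the shortest unique depth.
                level[digit] = (hexnode, rev)
                return
            if isinstance(child, tuple):
                if child[0] == hexnode:
                    level[digit] = (hexnode, rev)  # same node, update rev
                    return
                # Collision with an existing leaf: push it one level down.
                interior = {child[0][depth + 1]: child}
                level[digit] = interior
                child = interior
            level = child

    def find(self, prefix):
        """Return the rev for a unique prefix, None if there is no match."""
        if not prefix:
            raise ValueError("empty prefix")
        level = self.root
        for digit in prefix:
            child = level.get(digit)
            if child is None:
                return None
            if isinstance(child, tuple):
                # Only one candidate remains; confirm the rest of the prefix.
                return child[1] if child[0].startswith(prefix) else None
            level = child
        # Prefix ended on an interior node, which always holds >= 2 leaves.
        raise AmbiguousPrefix(prefix)
```

In this sketch a lookup touches at most one trie level per hex digit of the prefix, so its cost depends on the prefix length rather than on the number of revisions. An interior node is only ever created when two nodes share a prefix, which is why ending a lookup on one signals ambiguity.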

66 Commits