sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-11 09:17:30 +03:00

Author	SHA1	Message	Date
Siddharth Agarwal	2d669c474b	revlog: switch findmissing* methods to incrementalmissingrevs This will allow us to remove ancestor.missingancestors in an upcoming patch.	2014-11-14 16:52:40 -08:00
Siddharth Agarwal	5692148f49	revlog: add a method to get missing revs incrementally This will turn out to be useful for discovery.	2014-11-16 00:39:48 -08:00
Siddharth Agarwal	7103eb28ea	ancestor.lazyancestors: take parentrevs function rather than changelog Principle of least privilege, and it also brings this in line with missingancestors.	2014-11-14 14:36:25 -08:00
Siddharth Agarwal	8354d9169f	revlog: cache chain info after calculating it for a rev (issue4452) This dumb cache works surprisingly well: on a repository with typical delta chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs and manifests only) went from 60 seconds to 3.	2014-11-13 21:36:38 -08:00
Siddharth Agarwal	1acd4cfca4	revlog: increase I/O bound to 4x the amount of data consumed This doesn't affect normal clones since they'd be bound by the CPU bound below anyway -- it does, however, improve generaldelta clones significantly. This also results in better deltaing for generaldelta clones -- in generaldelta clones, we calculate deltas with respect to the closest base if it has a higher revision number than either parent. If the base is on a significantly different branch, this can result in pointlessly massive deltas. This reduces the number of bases and hence the number of bad deltas. Empirically, for a highly branchy repository, this resulted in an improvement of around 15% to manifest size.	2014-11-11 20:08:19 -08:00
Siddharth Agarwal	fe51051ee5	revlog: bound based on the length of the compressed deltas This is only relevant for generaldelta clones.	2014-11-11 20:01:19 -08:00
Siddharth Agarwal	27976ad2dc	revlog: compute length of compressed deltas along with chain length In upcoming patches to the revlog, we're going to split up the notions of bounding I/O and bounding CPU.	2014-11-11 19:54:36 -08:00
Siddharth Agarwal	6e115e5383	revlog: store fulltext when compressed delta is bigger than it This is a very silly case and not particularly likely to happen in the wild, but it turns out we can hit it in a couple of places. As we tune the storage parameters we're likely to hit more such cases. The affected test cases all have smaller revlogs now.	2014-11-11 21:41:12 -08:00
Siddharth Agarwal	e5d387f47e	revlog: make a predicate clearer with parens	2014-11-11 21:39:56 -08:00
Mateusz Kwapich	3433abb6a8	revlog: add config variable for limiting delta-chain length The current heuristic for deciding between storing delta and full texts is based on ratio of (sizeofdeltas)/(sizeoffulltext). In some cases (for example a manifest for a huge repo) this approach can result in extremely long delta chains (~30,000) which are very slow to read. (In the case of a manifest ~500ms are added to every hg command because of that). This commit introduces "revlog.maxchainlength" configuration variable that will limit delta chain length.	2014-11-06 14:20:05 -08:00
Mateusz Kwapich	1a554418d5	debugrevlog: fix computing chain length in debugrevlog -d The chain length was computed correctly only when generaldelta feature was enabled. Now it's fixed. When generaldelta is disabled the base revision in revlog index is not the revision we have delta against - it's always previous revision. Instead of incorrect chainbaseandlen in command.py we are now using two single-responsibility functions in revlog.py: - chainbase(rev) - chainlen(rev) Only chainlen(rev) was missing so it was written to mimic the way the chain of deltas is actually found during file reconstruction.	2014-11-06 14:08:25 -08:00
Mike Edgar	ba052f742a	revlog: support importing censored file revision tombstones This change allows a revision log to not fail integrity checks when applying a changegroup delta (eg from a bundle) results in a censored file tombstone. The tombstone is inserted as-is, so future integrity verification will observe the tombstone. Deltas based on the tombstone will also remain correct. The new code path is encountered for exactly the cases where _addrevision is importing a tombstone from a changegroup. When committing a file containing the "magic" tombstone text, the "text" parameter will be non-empty and the checkhash call is not executed (and when committing, the node will be computed to match the "magic" tombstone text).	2014-09-03 16:34:29 -04:00
Augie Fackler	b2278203f5	revlog: move references to revlog.hash to inside the revlog class This will make it possible for subclasses to have different hashing schemes when appropriate. I anticipate using this in manifests. Note that there's still one client of mercurial.revlog.hash() outside of revlog: mercurial.context.memctx uses it to construct the file entries in an in-memory manifest. I don't think this will be a problem in the immediate future, so I've left it as-is.	2014-09-24 15:14:44 -04:00
Augie Fackler	44b8666b93	revlog: mark nullhash as module-private No other module should ever need this, so mark it with _ so nobody tries to use it.	2014-09-24 15:10:52 -04:00
Mads Kiilerich	fae32dd0a3	comments: describe ancestor consistently - avoid 'least common ancestor' "best" is as defined by mercurial.ancestor.ancestors: furthest from a root (as measured by longest path).	2014-08-19 01:13:10 +02:00
Mads Kiilerich	b46e11faa6	revlog: introduce isancestor method for efficiently determining node lineage Hide the not so obvious use of commonancestorsheads.	2014-08-19 01:13:10 +02:00
Matt Mackall	9bc396577d	repoview: fix 0L with pack/unpack for 2.4	2014-08-26 13:11:53 +02:00
Matt Mackall	dad18b6446	revlog: fix check-code error	2014-06-14 11:49:02 -05:00
Matt Mackall	6faaeed973	revlog: hold a private reference to self._cache This keeps other threads from modifying self._cache out from under us. With this and the previous fix, 'hg serve' survives 100k hits with siege.	2014-06-13 14:17:14 -05:00
Matt Mackall	6c57e49897	revlog: make _chunkcache access atomic With this and related fixes, 'hg serve' survived 100000 hits with siege.	2014-06-13 14:16:03 -05:00
Mads Kiilerich	cb0184290d	revlog: backout 1c95c1863327 - commonancestors	2014-04-17 20:01:39 +02:00
Mads Kiilerich	1478c563ea	revlog: introduce commonancestorsheads method Very similar to commonancestors but giving all the common ancestors heads.	2014-04-17 20:01:35 +02:00
Matt Mackall	4610a4e481	merge with stable	2014-04-10 12:41:39 -04:00
Matt Mackall	b154d8d029	revlog: deal with chunk ranges over 2G on Windows (issue4215) Python uses a C long (32 bits on Windows 64) rather than an ssize_t in read(), and thus has a 2G size limit. Work around this by falling back to reading one chunk at a time on overflow. This approximately doubles our headroom until we run back into the size limit on single reads.	2014-04-07 14:18:10 -05:00
Mads Kiilerich	de55a8fdda	revlog: introduce commonancestors method for getting all common ancestor heads	2014-02-24 22:42:14 +01:00
Durham Goode	b147e53e3f	revlog: move file writing to a separate function Moves the code that actually writes to a file to a separate function in revlog.py. This allows extensions to intercept and use the data being written to disk. For example, an extension might want to replicate these writes elsewhere. When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change. It goes from 28.2 to 28.3 seconds.	2013-11-26 12:58:27 -08:00
Brodie Rao	43ab01245b	revlog: allow tuning of the chunk cache size (via format.chunkcachesize) Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on Mac OS X with an SSD, before this change: $ hg --config format.chunkcachesize=1024 perfmoonwalk ! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5) (16,154 cache hits, 3,840 misses.) $ hg --config format.chunkcachesize=4096 perfmoonwalk ! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6) (19,003 hits, 991 misses.) $ hg --config format.chunkcachesize=16384 perfmoonwalk ! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6) (19,746 hits, 248 misses.) $ hg --config format.chunkcachesize=32768 perfmoonwalk ! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6) (19,870 hits, 124 misses.) $ hg --config format.chunkcachesize=65536 perfmoonwalk ! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6) (19,932 hits, 62 misses.) $ hg --config format.chunkcachesize=131072 perfmoonwalk ! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6) (19,963 hits, 31 misses.) We may want to change the default size in the future based on testing and user feedback.	2013-11-17 18:04:29 -05:00
Brodie Rao	976b336ba8	revlog: read/cache chunks in fixed windows of 64 KB When reading a revlog chunk, instead of reading up to 64 KB ahead of the request offset and caching that, this change caches a fixed window before and after the requested data that falls on 64 KB boundaries. This increases cache hits when reading revlogs backwards. Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on Mac OS X with an SSD, before this change: $ hg perfmoonwalk ! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5) (Each run has 10,668 cache hits and 9,304 misses.) After this change: $ hg perfmoonwalk ! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6) (19,931 cache hits, 62 misses.) On a busy NFS share, before this change: $ hg perfmoonwalk ! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3) After: $ hg perfmoonwalk ! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)	2013-11-17 18:04:28 -05:00
Durham Goode	d8c96277e4	strip: add faster revlog strip computation The previous revlog strip computation would walk every rev in the revlog, from the bottom to the top. Since we're usually stripping only the top few revs of the revlog, this was needlessly expensive on large repos. The new algorithm walks the exact number of revs that will be stripped, thus making the operation not dependent on the number of revs in the repo. This makes amend on a large repo go from 8.7 seconds to 6 seconds.	2013-11-11 16:42:49 -08:00
Durham Goode	9840379432	revlog: return lazy set from findcommonmissing When computing the commonmissing, it greedily computes the entire set immediately. On a large repo where the majority of history is irrelevant, this causes a significant slow down. Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.	2013-11-11 16:40:02 -08:00
Matt Mackall	2513030d97	merge with stable	2013-09-23 11:37:06 -07:00
Wojciech Lopata	0a0c3321e2	generaldelta: initialize basecache properly Previously basecache was incorrectly initialized before adding the first revision from a changegroup. Basecache value influences when full revisions are stored in revlog (when using generaldelta). As a result it was possible to generate a generaldelta-revlog that could be bigger by arbitrary factor than its non-generaldelta equivalent.	2013-09-20 10:45:51 -07:00
Siddharth Agarwal	64c41699fa	revlog: remove _chunkbase since it is no longer used This was introduced in 2011 for the lwcopy feature but never actually got used. A similar hook can easily be reintroduced if needed in the future.	2013-09-06 23:05:33 -07:00
Siddharth Agarwal	bcd16c3892	revlog: move chunk cache preload from revision to _chunks In case we don't have a cached text already, add the base rev to the list passed to _chunks. In the cached case this also avoids unnecessarily preloading the chunk for the cached rev.	2013-09-06 23:05:11 -07:00
Siddharth Agarwal	61f0e46895	revlog._chunks: inline getchunk We do this in a somewhat hacky way, relying on the fact that our sole caller preloads the cache right before calling us. An upcoming patch will make this more sensible. For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49 seconds to 0.46.	2013-09-06 22:57:51 -07:00
Siddharth Agarwal	b7abdca3c1	revlog.revision: fix cache preload for inline revlogs Previously the length of data preloaded did not account for the interleaved io contents. This meant that we'd sometimes have cache misses in _chunks despite the preloading. Having a correctly filled out cache will become essential in an upcoming patch.	2013-09-07 12:42:46 -07:00
Siddharth Agarwal	d24e970042	revlog: add a fast method for getting a list of chunks This moves _chunkraw into the loop. Doing that improves revlog decompression -- in particular, manifest decompression -- significantly. For a 20 MB manifest which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55 seconds to 0.49 seconds.	2013-09-06 16:31:35 -07:00
Wojciech Lopata	3a79365ccc	revlog: pass node as an argument of addrevision This change will allow revlog subclasses that override 'checkhash' method to use custom strategy of computing nodeids without overriding 'addrevision' method. In particular this change is necessary to implement manifest compression.	2013-08-19 11:25:23 -07:00
Wojciech Lopata	299c718f66	revlog: extract 'checkhash' method Extract method that decides whether nodeid is correct for particular revision text and parent nodes. Having this method extracted will allow revlog subclasses to implement custom way of computing nodes. In particular this change is necessary to implement manifest compression.	2013-08-19 11:06:38 -07:00
Matt Mackall	06155d5c8a	revlog: handle hidden revs in _partialmatch (issue3979) Looking up hidden prefixes could cause a no node exception. Looking up unique non-hidden prefixes could be ambiguous.	2013-07-23 17:28:12 -05:00
Durham Goode	e750755eba	revlog: add exception when linkrev == nullrev When we deployed the latest crew mercurial to our users, a few of them had issues where a filelog would have an entry with a -1 linkrev. This caused operations like rebase and amend to create a bundle containing the entire repository, which took a long time. I don't know what the issue is, but adding this check should prevent repos from getting in this state, and should help us pinpoint the issue next time it happens.	2013-06-17 19:44:00 -07:00
Sune Foldager	6bd4fdfe9d	bundle-ng: move group into the bundler No additional semantic changes made.	2013-05-10 21:03:01 +02:00
Alexander Plavin	48936f264c	revlog: fix a regression with null revision Introduced in the patch which fixes issue3497 Part of that patch was erroneously submitted and it shouldn't be in the code	2013-04-18 16:46:09 +04:00
Alexander Plavin	829cf92d16	log: fix behavior with empty repositories (issue3497) Make output in this special case consistent with general case one.	2013-04-17 00:29:54 +04:00
Bryan O'Sullivan	512383d40e	revlog: don't cross-check ancestor result against Python version	2013-04-16 10:08:20 -07:00
Bryan O'Sullivan	c6b9f1099d	parsers: a C implementation of the new ancestors algorithm The performance of both the old and new Python ancestor algorithms depends on the number of revs they need to traverse. Although the new algorithm performs far better than the old when revs are numerically and topologically close, both algorithms become slow under other circumstances, taking up to 1.8 seconds to give answers in a Linux kernel repo. This C implementation of the new algorithm is a fairly straightforward transliteration. The only corner case of interest is that it raises an OverflowError if the number of GCA candidates found during the first pass is greater than 24, to avoid the dual perils of fixnum overflow and trying to allocate too much memory. (If this exception is raised, the Python implementation is used instead.) Performance numbers are good: in a Linux kernel repo, time for "hg debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943) is as follows: Old Python: 0.36 sec New Python: 0.42 sec New C: 0.02 sec For a case where the new algorithm should perform well: Old Python: 1.84 sec New Python: 0.07 sec New C: measures as zero when using --time (This commit includes a paranoid cross-check to ensure that the Python and C implementations give identical answers. The above performance numbers were measured with that check disabled.)	2013-04-16 10:08:20 -07:00
Bryan O'Sullivan	59b785a485	revlog: choose a consistent ancestor when there's a tie Previously, we chose a rev based on numeric ordering, which could cause "the same merge" in topologically identical but numerically different repos to choose different merge bases. We now choose the lexically least node; this is stable across different revlog orderings.	2013-04-16 10:08:19 -07:00
Bryan O'Sullivan	4a3a46aff6	ancestor: a new algorithm that is faster for nodes near tip Instead of walking all the way to the root of the DAG, we generate a set of candidate GCA revs, then figure out which ones will win the race to the root (usually without needing to traverse all the way to the root). In the common case of nodes that are close to each other in both revision number and topology, this is usually a big win: it makes "hg --time debugancestors" up to 9 times faster than the more general ancestor function when measured on heads of the linux-2.6 hg repo. Victory is not assured, however. The older function can still win by a large margin if one node is much closer to the root than the other, or by a much smaller amount if one is an ancestor of the other. For now, we've also got a small paranoid harness function that calls both ancestor functions on every input and ensures that they give equivalent answers. Even without the checker function, the old ancestor function needs to stay alive for the time being, as its generality is used by context.filectx.merge.	2013-04-16 10:08:18 -07:00
Benoit Boissinot	41300c28e0	revlog: document v0 format	2013-02-09 12:08:02 +01:00
Siddharth Agarwal	4d560304bb	revlog: move ancestor generation out to a new class This refactoring is to prepare for implementing lazy membership.	2012-12-18 10:14:01 -08:00
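
The fixed-window read described in 976b336ba8 (cache a window that falls on 64 KB boundaries rather than reading ahead from the requested offset) can be sketched roughly as below. This is only an illustration of the alignment arithmetic, not the code from that commit: the function name `aligned_window` and its parameters are hypothetical, and the 64 KB default is taken from the commit message.

```python
def aligned_window(offset, length, window=65536):
    """Return (start, size) of the window-aligned span covering a request.

    Hypothetical helper for illustration only; `window` must be a power
    of two so that masking can be used for alignment.
    """
    if window <= 0 or window & (window - 1):
        raise ValueError("window must be a power of two")
    mask = ~(window - 1)
    # Round the start of the request down to a window boundary.
    start = offset & mask
    # Round the end of the request up to a window boundary.
    end = (offset + length + window - 1) & mask
    return start, end - start


# Two reads made in descending offset order land in the same window,
# so the second one could be served from whatever the first one cached.
first = aligned_window(130000, 100)
second = aligned_window(120000, 100)
print(first, second)
```

Because both requests map to the same aligned span, a backwards walk over the file keeps hitting the cached window, which is consistent with the hit/miss change the commit message reports for perfmoonwalk.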


473 Commits