sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 08:18:15 +03:00

Author	SHA1	Message	Date
Martin von Zweigbergk	0788ad5e3c	revlog: remove unused variable 'chainlen'	2015-12-04 17:22:26 -08:00
Pierre-Yves David	18b7a437e6	addrevision: use general delta when the incoming base delta is bad We unify the delta selection process into a simple three-option process: - try to use the incoming delta (if lazydeltabase is on) - try to find a suitable parent to delta against (if gd is on) - try to delta against the tipmost revision The first of these options that yields a valid delta will be used. The test change in 'test-generaldelta.t' shows this behavior, as we use a delta against the parent instead of a full delta when the incoming delta is not suitable. This has some impact on 'test-bundle.t' because a delta somewhere changes. It does not seem to change the test semantics and has been ignored.	2015-12-01 16:15:59 -08:00
Pierre-Yves David	01dc52b10b	addrevision: rework generaldelta computation The old code has multiple explicit tests and code duplications. This makes it hard to improve the code. We rewrite the logic in a more generic way, not changing anything of the computed result. The final goal here is to eventually be able to: - factor out the default fallback case "try against 'prev'" in a single place - allow the 'lazydeltabase' case to use the smarter general delta code path when the incoming base does not provide us with a good delta.	2015-12-01 18:45:16 -08:00
Pierre-Yves David	453d103fad	addrevision: only use the incoming base if it is a good delta (issue4975) Before this change, the 'lazydeltabase' would blindly build a delta using the base provided by the incoming bundle and try to use it. If that base was far down the revlog, the delta would be seen as "no good" and we would fall back to a full text revision. We now check if the delta is good and fall back to computing a delta against the tipmost revision otherwise (as we would do without general delta). Later changesets will improve the logic to compute the fallback delta using the general delta logic.	2015-12-01 16:06:20 -08:00
Pierre-Yves David	41fc9ceddb	addrevision: handle code path not producing delta We would like to be able to exit the delta generation block without a valid delta (for a more flexible control flow). So we make sure we do not expand the "delta" content unless we actually have a delta. We can do it one level lower because 'delta' is initialised at None anyway. Not adding a level to the assignment prevents a line length issue.	2015-12-01 16:22:49 -08:00
Pierre-Yves David	e8bed37496	addrevision: rename 'd' to 'delta' That variable is quite central to the whole function. Giving it a more explicit name helps with code readability.	2015-12-01 15:29:11 -08:00
Gregory Szorc	09f64f3e3e	revlog: improve documentation There are a lot of functions and variables doing similar things. Document the role and functionality of each to make it easier to grok.	2015-11-22 16:23:20 -08:00
Augie Fackler	e212c519f3	revlog: rename bundle to cg to reflect its nature as a cg?unpacker The new convention is that bundles contain changegroups. bundle1 happens to only be a changegroup, but bundle2 is a more featureful container that isn't something you can pass to addgroup().	2015-10-14 11:32:33 -04:00
Gregory Szorc	4b88c2e0ff	revlog: don't flush data file after every added revision The current behavior of revlogs is to flush the data file when writing data to it. Tracing system calls revealed that changegroup processing incurred numerous write(2) calls for values much smaller than the default buffer size (Python defaults to 4096, but it can be adjusted based on detected block size at run time by CPython). The reason we flush revlogs is so readers have all data available. For example, the current code in revlog.py will re-open the revlog file (instead of seeking an existing file handle) to read the text of a revision. This happens when starting a new delta chain when adding several revisions from changegroups, for example. Yes, this is likely sub-optimal (we should probably be sharing file descriptors between readers and writers to avoid the flushing and associated overhead of re-opening files). While flushing revlogs is necessary, it appears all callers are diligent about flushing files before a read is performed (see buildtext() in _addrevision()), making the flush in _writeentry() redundant and unnecessary. So, we remove it. In practice, this means we incur a write(2) a) when the buffer is full (typically 4096 bytes) b) when a new delta chain is created rather than after every added revision. This applies to every revlog, but by volume it mostly impacts filelogs. Removing the redundant flush from _writeentry() significantly reduces the number of write(2) calls during changegroup processing on my Linux machine. When applying a changegroup of the hg repo based on my local repo, the total number of write(2) calls during application of the mercurial/localrepo.py revlogs dropped from 1,320 to 217 with this patch applied. Total I/O related system calls dropped from 1,577 to 474. When unbundling a mozilla-central gzipped bundle (264,403 changesets with 1,492,215 changes to 222,507 files), total write(2) calls dropped from 1,252,881 to 827,106 and total system calls dropped from 3,601,259 to 3,178,636 - a reduction of 425,775! While the system call reduction is significant, it appears to have no impact on wall time on my Linux and Windows machines. Still, fewer syscalls is fewer syscalls. Surely this can't hurt. If nothing else, it makes examining remaining system call usage simpler and opens the door to experimenting with the performance impact of different buffer sizes.	2015-09-26 21:43:13 -07:00
Gregory Szorc	5b6556c143	revlog: use existing file handle when reading during _addrevision _addrevision() may need to read from revlogs as part of computing deltas. Previously, we would flush existing file handles and open a new, short-lived file handle to perform the reading. If we have an existing file handle, it seems logical to reuse it for reading instead of opening a new file handle. This patch makes that the new behavior. After this patch, revlog files are only reopened when adding revisions if the revlog is switched from inline to non-inline. On Linux when unbundling a bundle of the mozilla-central repo, this patch has the following impact on system call counts (call: before → after (delta)): write: 827,639 → 673,390 (-154,249); open: 700,103 → 684,089 (-16,014); read: 74,489 → 74,489 (0); fstat: 493,924 → 461,896 (-32,028); close: 249,131 → 233,117 (-16,014); stat: 242,001 → 242,001 (0); lstat: 18,676 → 18,676 (0); lseek: 20,268 → 20,268 (0); ioctl: 14,652 → 13,173 (-1,479); TOTAL: 3,180,758 → 2,930,679 (-250,079). It's worth noting that many of the open() calls fail due to missing files. That's why there are many more open() calls than close(). Despite the significant system call reduction, this change does not seem to have a significant performance impact on Linux. On Windows 10 (not a VM, on an SSD), this patch appears to reduce unbundle time for mozilla-central from ~960s to ~920s. This isn't as significant as I was hoping. But a decrease it is nonetheless. Windows unbundle performance is still >2x slower than Linux. Despite the lack of significant gains, fewer system calls is fewer system calls. If nothing else, this will narrow the focus of potential areas to optimize in the future.	2015-09-27 16:08:18 -07:00
Gregory Szorc	4f8cd71a6f	revlog: always open revlogs for reading and appending An upcoming patch will teach revlogs to use the existing file handle to read revision data instead of opening a new file handle just for quick reads. For this to work, files must be opened for reading as well. This patch is merely cosmetic: there are no behavior changes.	2015-09-27 15:59:19 -07:00
Gregory Szorc	574108c7d9	revlog: support using an existing file handle when reading revlogs Currently, the low-level revlog reading code always opens a new file handle. In some key scenarios, the revlog is already opened and an existing file handle could be used to read. This patch paves the road to that by teaching various revlog reading functions to accept an optional existing file handle to read from.	2015-09-27 15:48:35 -07:00
Gregory Szorc	106b00c51d	revlog: add docstring for checkinlinesize() The name is deceptive: it does more than just "check." Add a docstring to clarify what's going on.	2015-09-27 15:31:50 -07:00
Gregory Szorc	7769f0b9ff	revlog: optionally cache the full text when adding revisions revlog instances can cache the full text of a single revision. Typically the most recently read revision is cached. When adding a delta group via addgroup() and _addrevision(), the full text isn't always computed: sometimes only the passed in delta is sufficient for adding a new revision to the revlog. When writing the changelog from a delta group, the just-added full text revision is always read immediately after it is written because the changegroup code needs to extract the set of files from the entry. In other words, revision() is always being called and caching the full text of the just-added revision is guaranteed to result in a cache hit, making the cache worthwhile. This patch adds support to _addrevision() for always building and caching the full text. This option is currently only active when processing changelog entries from a changegroup. While the total number of revision() calls is the same, the location matters: buildtext() calls into revision() on the base revision when building the full text of the just-added revision. Since the previous revision's _addrevision() built the full text and the previous revision is likely the base revision, this means that the base revision's full text is likely cached and can be used to compute the current full text from just a delta. No extra I/O required. The end result is the changelog isn't opened and read after adding every revision from a changegroup. On my 2013 MacBook Pro running OS X 10.10.5 from an SSD and Python 2.7, this patch impacted the time taken to apply ~262,000 changesets from a mozilla-central gzip bundle: before: ~43s after: ~32s ~25% reduction in changelog processing times. Not bad.	2015-09-12 16:11:17 -07:00
Gregory Szorc	7f63b5672e	revlog: drop local assignment of cache variable The purpose of this code was to provide thread safety. With the conversion of hgweb to use separate localrepository instances per request/thread, we should no longer have any consumers that need to access revlog instances from multiple threads. Remove the code.	2015-09-12 15:16:47 -07:00
Gregory Szorc	7784d9ad76	revlog: rename generic "i" variable to "indexdata" Increase readability.	2015-09-12 12:47:00 -07:00
Durham Goode	be92ed2487	revlog: add an aggressivemergedelta option This adds an option for delta'ing against both p1 and p2 when applying merge revisions and picking whichever is smallest. Some before and after stats on manifest.d size: internal large repo: before: 1.2 GB after: 930 MB mozilla-central: before: 261 MB after: 92 MB	2015-08-30 14:03:32 -07:00
Durham Goode	98500575fc	revlog: change generaldelta delta parent heuristic The old generaldelta heuristic was "if p1 (or p2) was closer than the last full text, use it, otherwise use prev". This was problematic when a repo contained multiple branches that were very different. If commits to branch A were pushed, and the last full text was branch B, it would generate a fulltext. Then if branch B was pushed, it would generate another fulltext. The problem is that the last fulltext (and delta'ing against `prev` in general) has no correlation with the contents of the incoming revision, and therefore will always have degenerate cases. According to the blame, that algorithm was chosen to minimize the chain length. Since there is already code that protects against that (the delta-vs-fulltext code), and since it has been improved since the original generaldelta algorithm went in (2011), I believe the chain length criteria will still be preserved. The new algorithm always diffs against p1 (or p2 if it's closer), unless the resulting delta will fail the delta-vs-fulltext check, in which case we delta against prev. Some before and after stats on manifest.d size. internal large repo old heuristic - 2.0 GB new heuristic - 1.2 GB mozilla-central old heuristic - 242 MB new heuristic - 261 MB The regression in mozilla-central is due to the new heuristic choosing p2r as the delta when it's closer to the tip. Switching the algorithm to always prefer p1r brings the size back down (242 MB). This is a result of the way in which mozilla does merges and pushes, and the result could easily swing the other direction in other repos (depending on if they merge X into Y or Y into X), but will never be as degenerate as before. A future patch will address the regression by introducing an optional, even more aggressive delta heuristic which will knock the mozilla manifest size down dramatically.	2015-08-30 13:58:11 -07:00
Durham Goode	47e86258b9	revlog: move textlen calculation to be above delta chooser This moves the textlen calculation to be above the delta chooser. Since textlen is needed for calling isgooddelta, we need it above the delta chooser so future patches can call isgooddelta.	2015-08-30 13:34:30 -07:00
Durham Goode	2ff0cbe125	revlog: move delta check to its own function This moves the delta vs fulltext comparison to its own function. This will allow us to reuse the function in future patches for more efficient delta choices. As a side effect, this will also allow extensions to modify our delta criteria.	2015-08-30 13:33:00 -07:00
Pierre-Yves David	c8b7676e04	format: introduce 'format.usegeneraldelta' This option will make repositories be created as general delta by default but will not make Mercurial aggressively recompute deltas for all incoming bundles. Instead, the delta contained in the bundle will be used. This will allow us to start having general delta repositories created everywhere without triggering massive recomputation costs for all new clients cloning from old servers.	2015-11-02 15:59:12 +00:00
Yuya Nishihara	f08cad7728	revlog: remove unused shaoffset constants Call sites were removed at 1299f0c14572, "revlog: remove lazy index".	2015-08-02 12:16:19 +09:00
Yuya Nishihara	28a45149e9	revlog: correct comment about size of v0 index format	2015-08-02 01:14:11 +09:00
Gregory Szorc	a60875d3fb	revlog: add support for a callback whenever revisions are added A subsequent patch will add a feature that performs iterative computation as changesets are added from a changegroup. To facilitate this type of processing in a generic manner, we add a mechanism for calling a function whenever a revision is added via revlog.addgroup(). There are potential performance concerns with this callback, as using it will flush the revlog after every revision is added.	2015-07-18 10:29:37 -07:00
Gregory Szorc	5380dea2a7	global: mass rewrite to use modern exception syntax Python 2.6 introduced the "except type as instance" syntax, replacing the "except type, instance" syntax that came before. Python 3 dropped support for the latter syntax. Since we no longer support Python 2.4 or 2.5, we have no need to continue supporting the "except type, instance". This patch mass rewrites the exception syntax to be Python 2.6+ and Python 3 compatible. This patch was produced by running `2to3 -f except -w -n .`.	2015-06-23 22:20:08 -07:00
Matt Mackall	2f915e33db	revlog: move size limit check to addrevision This lets us add the name of the indexfile to the message.	2015-06-04 14:57:58 -05:00
Jordi Gutiérrez Hermoso	211712bc95	revlog: raise an exception earlier if an entry is too large (issue4675) Before we were relying on _pack to error out when trying to pass an integer that was too large for the "i" format specifier. Now we check this earlier so we can form a better error message. The error message unfortunately must exclude the filename at this level of the call stack. The problem is that this name is not available here, and the error can be triggered by a large manifest or by a large file itself. Although perhaps we could provide the name of a revlog index file (from the revlog object, instead of the revlogio object), this seems like too much leakage of internal data structures. It's not ideal already that an error message even mentions revlogs, but this does seem unavoidable here.	2015-06-02 15:04:39 -04:00
Laurent Charignon	e470f84c03	phases: fix bug where native phase computation wasn't called I forgot to include this change in a previous diff and the native code to compute the phases was never called. The AttributeError was silently caught and the pure implementation was used instead.	2015-05-29 14:24:50 -07:00
Martin von Zweigbergk	4127f67694	util: drop alias for collections.deque Now that util.deque is just an alias for collections.deque, let's just remove it.	2015-05-16 11:28:04 -07:00
Mike Edgar	c2ecfc164f	revlog: make converting from inline to non-inline work after a strip The checkinlinesize function, which converts inline revlogs to non-inline, uses the current transaction's "data" field to determine how to update the transaction after the conversion. This change works around the missing data field, which is not in the transaction after a strip.	2015-03-25 15:58:31 -04:00
Laurent Charignon	6b06793e68	phase: default to C implementation for phase computation	2015-03-20 11:14:27 -07:00
Mike Edgar	93a70657fb	revlog: addgroup checks if incoming deltas add censored revs, sets flag bit A censored revision stored in a revlog should have the censored revlog index flag bit set. This implies we must know if a revision is censored before we add it to the revlog. When adding revisions from exchanged deltas, we would prefer to determine this flag without decoding every single full text. This change introduces a heuristic based on assumptions around the Mercurial delta format and filelog metadata. Since deltas which produce a censored revision must be full-replacement deltas, we can read the delta's first bytes to check the filelog metadata. Since "censored" is the alphabetically first filelog metadata key, censored filelog revisions have a well-known prefix we can look for. For more on the design and background of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-14 15:16:08 -05:00
Mike Edgar	3562a06261	revlog: _addrevision creates full-replace deltas based on censored revisions A delta against a censored revision is either received through exchange and written blindly to a revlog, or it is created by the revlog itself. This change ensures the latter process creates deltas which fully replace all data in a censored base using a single patch operation. Recipients of a delta against a censored base will verify that the delta is in this full-replace format. Other recipients will use the delta as normal. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-21 17:11:37 -05:00
Mike Edgar	0a639adee5	revlog: special case expanding full-replacement deltas received by exchange When a delta received through exchange is added to a revlog, it will very often be expanded to a full text by applying the delta to its base. If that delta is of a particular form, we can avoid decoding the base revision. This avoids an exception if the base revision is censored. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-02-06 01:38:16 +00:00
Mike Edgar	9635f8c5b0	revlog: in addgroup, reject ill-formed deltas based on censored nodes To ensure interoperability when clones disagree about which file nodes are censored, a restriction is made on deltas based on censored nodes. Any such delta must replace the full text of the base in a single patch. If the recipient of a delta considers the base to be censored and the delta is not in the expected form, the recipient must reject it, as it can't know if the source has also censored the base. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-02-06 00:55:29 +00:00
Mike Edgar	c736894d9c	revlog: add "iscensored()" to revlog public API The iscensored method will be used by the exchange layer to reject nonconforming deltas involving censored revisions (and to produce conforming deltas). For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-23 17:01:39 -05:00
Yuya Nishihara	237120f282	revlog: add __contains__ for fast membership test Because revlog implements __iter__, "rev in revlog" works but does silly O(n) lookup unexpectedly. So it seems good to add fast version of __contains__. This allows "rev in repo.changelog" in the next patch.	2015-02-04 21:25:57 +09:00
Mike Edgar	39ea63d3f6	revlog: verify censored flag when hashing added revision fulltext When receiving a delta via exchange, three possible storage outcomes emerge: 1. The delta is added directly to the revlog. ("fast-path") 2. A freshly-computed delta with a different base is stored. 3. The new revision's fulltext is computed and stored outright. Both (2) and (3) require materializing the full text of the new revision by applying the delta to its base. This is typically followed by a hash check. The new flags argument allows callers to _addrevision to signal that they expect that hash check to fail. We can use this opportunity to verify that expectation. If the hash fails, require the flag be set; if the hash passes, require the flag be unset. Rather than simply eliding the hash check, this approach provides some assurance that the censored flag is not applied to valid revisions. Read more at: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-12 14:41:25 -05:00
Mike Edgar	4b3ca8d71c	revlog: add flags argument to _addrevision, update callers use default flags For revlog index flags to be useful to other parts of Mercurial, they need to be settable when writing revisions. The current use case for revlog index flags is the censorship feature: http://mercurial.selenic.com/wiki/CensorPlan While the censor flag could be inferred in _addrevision by interrogating the text/delta being added, that would bury the censorship logic and inappropriately couple it to all revision creation.	2015-01-12 14:30:24 -05:00
Mike Edgar	339a2739c2	revlog: define censored flag for revlogng index This flag bit will be used to cheaply signal censorship presence to upper layers (exchange, verify). It indicates that censorship metadata is present but does not attest to the verifiability of that metadata. For the censorship design, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-12 14:01:52 -05:00
Siddharth Agarwal	2d669c474b	revlog: switch findmissing* methods to incrementalmissingrevs This will allow us to remove ancestor.missingancestors in an upcoming patch.	2014-11-14 16:52:40 -08:00
Siddharth Agarwal	5692148f49	revlog: add a method to get missing revs incrementally This will turn out to be useful for discovery.	2014-11-16 00:39:48 -08:00
Siddharth Agarwal	7103eb28ea	ancestor.lazyancestors: take parentrevs function rather than changelog Principle of least privilege, and it also brings this in line with missingancestors.	2014-11-14 14:36:25 -08:00
Siddharth Agarwal	8354d9169f	revlog: cache chain info after calculating it for a rev (issue4452) This dumb cache works surprisingly well: on a repository with typical delta chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs and manifests only) went from 60 seconds to 3.	2014-11-13 21:36:38 -08:00
Siddharth Agarwal	1acd4cfca4	revlog: increase I/O bound to 4x the amount of data consumed This doesn't affect normal clones since they'd be bound by the CPU bound below anyway -- it does, however, improve generaldelta clones significantly. This also results in better deltaing for generaldelta clones -- in generaldelta clones, we calculate deltas with respect to the closest base if it has a higher revision number than either parent. If the base is on a significantly different branch, this can result in pointlessly massive deltas. This reduces the number of bases and hence the number of bad deltas. Empirically, for a highly branchy repository, this resulted in an improvement of around 15% to manifest size.	2014-11-11 20:08:19 -08:00
Siddharth Agarwal	fe51051ee5	revlog: bound based on the length of the compressed deltas This is only relevant for generaldelta clones.	2014-11-11 20:01:19 -08:00
Siddharth Agarwal	27976ad2dc	revlog: compute length of compressed deltas along with chain length In upcoming patches to the revlog, we're going to split up the notions of bounding I/O and bounding CPU.	2014-11-11 19:54:36 -08:00
Siddharth Agarwal	6e115e5383	revlog: store fulltext when compressed delta is bigger than it This is a very silly case and not particularly likely to happen in the wild, but it turns out we can hit it in a couple of places. As we tune the storage parameters we're likely to hit more such cases. The affected test cases all have smaller revlogs now.	2014-11-11 21:41:12 -08:00
Siddharth Agarwal	e5d387f47e	revlog: make a predicate clearer with parens	2014-11-11 21:39:56 -08:00
Mateusz Kwapich	3433abb6a8	revlog: add config variable for limiting delta-chain length The current heuristic for deciding between storing deltas and full texts is based on the ratio of (sizeofdeltas)/(sizeoffulltext). In some cases (for example a manifest for a huge repo) this approach can result in extremely long delta chains (~30,000) which are very slow to read. (In the case of a manifest ~500ms are added to every hg command because of that). This commit introduces the "revlog.maxchainlength" configuration variable that will limit delta chain length.	2014-11-06 14:20:05 -08:00
Mateusz Kwapich	1a554418d5	debugrevlog: fix computing chain length in debugrevlog -d The chain length was computed correctly only when generaldelta feature was enabled. Now it's fixed. When generaldelta is disabled the base revision in revlog index is not the revision we have delta against - it's always previous revision. Instead of incorrect chainbaseandlen in command.py we are now using two single-responsibility functions in revlog.py: - chainbase(rev) - chainlen(rev) Only chainlen(rev) was missing so it was written to mimic the way the chain of deltas is actually found during file reconstruction.	2014-11-06 14:08:25 -08:00
Mike Edgar	ba052f742a	revlog: support importing censored file revision tombstones This change allows a revision log to not fail integrity checks when applying a changegroup delta (eg from a bundle) results in a censored file tombstone. The tombstone is inserted as-is, so future integrity verification will observe the tombstone. Deltas based on the tombstone will also remain correct. The new code path is encountered for exactly the cases where _addrevision is importing a tombstone from a changegroup. When committing a file containing the "magic" tombstone text, the "text" parameter will be non-empty and the checkhash call is not executed (and when committing, the node will be computed to match the "magic" tombstone text).	2014-09-03 16:34:29 -04:00
Augie Fackler	b2278203f5	revlog: move references to revlog.hash to inside the revlog class This will make it possible for subclasses to have different hashing schemes when appropriate. I anticipate using this in manifests. Note that there's still one client of mercurial.revlog.hash() outside of revlog: mercurial.context.memctx uses it to construct the file entries in an in-memory manifest. I don't think this will be a problem in the immediate future, so I've left it as-is.	2014-09-24 15:14:44 -04:00
Augie Fackler	44b8666b93	revlog: mark nullhash as module-private No other module should ever need this, so mark it with _ so nobody tries to use it.	2014-09-24 15:10:52 -04:00
Mads Kiilerich	fae32dd0a3	comments: describe ancestor consistently - avoid 'least common ancestor' "best" is as defined by mercurial.ancestor.ancestors: furthest from a root (as measured by longest path).	2014-08-19 01:13:10 +02:00
Mads Kiilerich	b46e11faa6	revlog: introduce isancestor method for efficiently determining node lineage Hide the not so obvious use of commonancestorsheads.	2014-08-19 01:13:10 +02:00
Matt Mackall	9bc396577d	repoview: fix 0L with pack/unpack for 2.4	2014-08-26 13:11:53 +02:00
Matt Mackall	dad18b6446	revlog: fix check-code error	2014-06-14 11:49:02 -05:00
Matt Mackall	6faaeed973	revlog: hold a private reference to self._cache This keeps other threads from modifying self._cache out from under us. With this and the previous fix, 'hg serve' survives 100k hits with siege.	2014-06-13 14:17:14 -05:00
Matt Mackall	6c57e49897	revlog: make _chunkcache access atomic With this and related fixes, 'hg serve' survived 100000 hits with siege.	2014-06-13 14:16:03 -05:00
Mads Kiilerich	cb0184290d	revlog: backout 1c95c1863327 - commonancestors	2014-04-17 20:01:39 +02:00
Mads Kiilerich	1478c563ea	revlog: introduce commonancestorsheads method Very similar to commonancestors but giving all the common ancestors heads.	2014-04-17 20:01:35 +02:00
Matt Mackall	4610a4e481	merge with stable	2014-04-10 12:41:39 -04:00
Matt Mackall	b154d8d029	revlog: deal with chunk ranges over 2G on Windows (issue4215) Python uses a C long (32 bits on Windows 64) rather than an ssize_t in read(), and thus has a 2G size limit. Work around this by falling back to reading one chunk at a time on overflow. This approximately doubles our headroom until we run back into the size limit on single reads.	2014-04-07 14:18:10 -05:00
Mads Kiilerich	de55a8fdda	revlog: introduce commonancestors method for getting all common ancestor heads	2014-02-24 22:42:14 +01:00
Durham Goode	b147e53e3f	revlog: move file writing to a separate function Moves the code that actually writes to a file to a separate function in revlog.py. This allows extensions to intercept and use the data being written to disk. For example, an extension might want to replicate these writes elsewhere. When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change. It goes from 28.2 to 28.3 seconds.	2013-11-26 12:58:27 -08:00
Brodie Rao	43ab01245b	revlog: allow tuning of the chunk cache size (via format.chunkcachesize) Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on Mac OS X with an SSD, before this change: $ hg --config format.chunkcachesize=1024 perfmoonwalk ! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5) (16,154 cache hits, 3,840 misses.) $ hg --config format.chunkcachesize=4096 perfmoonwalk ! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6) (19,003 hits, 991 misses.) $ hg --config format.chunkcachesize=16384 perfmoonwalk ! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6) (19,746 hits, 248 misses.) $ hg --config format.chunkcachesize=32768 perfmoonwalk ! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6) (19,870 hits, 124 misses.) $ hg --config format.chunkcachesize=65536 perfmoonwalk ! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6) (19,932 hits, 62 misses.) $ hg --config format.chunkcachesize=131072 perfmoonwalk ! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6) (19,963 hits, 31 misses.) We may want to change the default size in the future based on testing and user feedback.	2013-11-17 18:04:29 -05:00
Brodie Rao	976b336ba8	revlog: read/cache chunks in fixed windows of 64 KB When reading a revlog chunk, instead of reading up to 64 KB ahead of the request offset and caching that, this change caches a fixed window before and after the requested data that falls on 64 KB boundaries. This increases cache hits when reading revlogs backwards. Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on Mac OS X with an SSD, before this change: $ hg perfmoonwalk ! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5) (Each run has 10,668 cache hits and 9,304 misses.) After this change: $ hg perfmoonwalk ! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6) (19,931 cache hits, 62 misses.) On a busy NFS share, before this change: $ hg perfmoonwalk ! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3) After: $ hg perfmoonwalk ! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)	2013-11-17 18:04:28 -05:00
Durham Goode	d8c96277e4	strip: add faster revlog strip computation The previous revlog strip computation would walk every rev in the revlog, from the bottom to the top. Since we're usually stripping only the top few revs of the revlog, this was needlessly expensive on large repos. The new algorithm walks the exact number of revs that will be stripped, thus making the operation not dependent on the number of revs in the repo. This makes amend on a large repo go from 8.7 seconds to 6 seconds.	2013-11-11 16:42:49 -08:00
Durham Goode	9840379432	revlog: return lazy set from findcommonmissing When computing the commonmissing, it greedily computes the entire set immediately. On a large repo where the majority of history is irrelevant, this causes a significant slow down. Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.	2013-11-11 16:40:02 -08:00
Matt Mackall	2513030d97	merge with stable	2013-09-23 11:37:06 -07:00
Wojciech Lopata	0a0c3321e2	generaldelta: initialize basecache properly Previously basecache was incorrectly initialized before adding the first revision from a changegroup. Basecache value influences when full revisions are stored in revlog (when using generaldelta). As a result it was possible to generate a generaldelta-revlog that could be bigger by arbitrary factor than its non-generaldelta equivalent.	2013-09-20 10:45:51 -07:00
Siddharth Agarwal	64c41699fa	revlog: remove _chunkbase since it is no longer used This was introduced in 2011 for the lwcopy feature but never actually got used. A similar hook can easily be reintroduced if needed in the future.	2013-09-06 23:05:33 -07:00
Siddharth Agarwal	bcd16c3892	revlog: move chunk cache preload from revision to _chunks In case we don't have a cached text already, add the base rev to the list passed to _chunks. In the cached case this also avoids unnecessarily preloading the chunk for the cached rev.	2013-09-06 23:05:11 -07:00
Siddharth Agarwal	61f0e46895	revlog._chunks: inline getchunk We do this in a somewhat hacky way, relying on the fact that our sole caller preloads the cache right before calling us. An upcoming patch will make this more sensible. For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49 seconds to 0.46.	2013-09-06 22:57:51 -07:00
Siddharth Agarwal	b7abdca3c1	revlog.revision: fix cache preload for inline revlogs Previously the length of data preloaded did not account for the interleaved io contents. This meant that we'd sometimes have cache misses in _chunks despite the preloading. Having a correctly filled out cache will become essential in an upcoming patch.	2013-09-07 12:42:46 -07:00
Siddharth Agarwal	d24e970042	revlog: add a fast method for getting a list of chunks This moves _chunkraw into the loop. Doing that improves revlog decompression -- in particular, manifest decompression -- significantly. For a 20 MB manifest which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55 seconds to 0.49 seconds.	2013-09-06 16:31:35 -07:00
Wojciech Lopata	3a79365ccc	revlog: pass node as an argument of addrevision This change will allow revlog subclasses that override 'checkhash' method to use custom strategy of computing nodeids without overriding 'addrevision' method. In particular this change is necessary to implement manifest compression.	2013-08-19 11:25:23 -07:00
Wojciech Lopata	299c718f66	revlog: extract 'checkhash' method Extract method that decides whether nodeid is correct for particular revision text and parent nodes. Having this method extracted will allow revlog subclasses to implement custom way of computing nodes. In particular this change is necessary to implement manifest compression.	2013-08-19 11:06:38 -07:00
Matt Mackall	06155d5c8a	revlog: handle hidden revs in _partialmatch (issue3979) Looking up hidden prefixes could cause a no node exception. Looking up unique non-hidden prefixes could be ambiguous.	2013-07-23 17:28:12 -05:00
Durham Goode	e750755eba	revlog: add exception when linkrev == nullrev When we deployed the latest crew mercurial to our users, a few of them had issues where a filelog would have an entry with a -1 linkrev. This caused operations like rebase and amend to create a bundle containing the entire repository, which took a long time. I don't know what the issue is, but adding this check should prevent repos from getting in this state, and should help us pinpoint the issue next time it happens.	2013-06-17 19:44:00 -07:00
Sune Foldager	6bd4fdfe9d	bundle-ng: move group into the bundler No additional semantic changes made.	2013-05-10 21:03:01 +02:00
Alexander Plavin	48936f264c	revlog: fix a regression with null revision Introduced in the patch which fixes issue3497 Part of that patch was erroneously submitted and it shouldn't be in the code	2013-04-18 16:46:09 +04:00
Alexander Plavin	829cf92d16	log: fix behavior with empty repositories (issue3497) Make output in this special case consistent with general case one.	2013-04-17 00:29:54 +04:00
Bryan O'Sullivan	512383d40e	revlog: don't cross-check ancestor result against Python version	2013-04-16 10:08:20 -07:00
Bryan O'Sullivan	c6b9f1099d	parsers: a C implementation of the new ancestors algorithm The performance of both the old and new Python ancestor algorithms depends on the number of revs they need to traverse. Although the new algorithm performs far better than the old when revs are numerically and topologically close, both algorithms become slow under other circumstances, taking up to 1.8 seconds to give answers in a Linux kernel repo. This C implementation of the new algorithm is a fairly straightforward transliteration. The only corner case of interest is that it raises an OverflowError if the number of GCA candidates found during the first pass is greater than 24, to avoid the dual perils of fixnum overflow and trying to allocate too much memory. (If this exception is raised, the Python implementation is used instead.) Performance numbers are good: in a Linux kernel repo, time for "hg debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943) is as follows: Old Python: 0.36 sec New Python: 0.42 sec New C: 0.02 sec For a case where the new algorithm should perform well: Old Python: 1.84 sec New Python: 0.07 sec New C: measures as zero when using --time (This commit includes a paranoid cross-check to ensure that the Python and C implementations give identical answers. The above performance numbers were measured with that check disabled.)	2013-04-16 10:08:20 -07:00
Bryan O'Sullivan	59b785a485	revlog: choose a consistent ancestor when there's a tie Previously, we chose a rev based on numeric ordering, which could cause "the same merge" in topologically identical but numerically different repos to choose different merge bases. We now choose the lexically least node; this is stable across different revlog orderings.	2013-04-16 10:08:19 -07:00
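The tie-break in 59b785a485 amounts to ordering candidates by node bytes rather than by revision number. A minimal sketch, assuming a hypothetical `node` callable that maps a revision number to its binary node id:

```python
def pick_ancestor(candidates, node):
    """Choose among tied ancestor candidates by lexically least node id.

    `candidates` is an iterable of revision numbers and `node` maps a
    revision number to its binary node id (both hypothetical here).
    """
    return min(candidates, key=node)


# Two repos that number the same changesets differently still agree,
# because the node ids, not the revision numbers, decide the winner.
nodes_a = {5: b"\xbb" * 20, 7: b"\xaa" * 20}
nodes_b = {7: b"\xbb" * 20, 5: b"\xaa" * 20}
chosen_a = nodes_a[pick_ancestor(nodes_a, nodes_a.get)]
chosen_b = nodes_b[pick_ancestor(nodes_b, nodes_b.get)]
```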
Bryan O'Sullivan	4a3a46aff6	ancestor: a new algorithm that is faster for nodes near tip Instead of walking all the way to the root of the DAG, we generate a set of candidate GCA revs, then figure out which ones will win the race to the root (usually without needing to traverse all the way to the root). In the common case of nodes that are close to each other in both revision number and topology, this is usually a big win: it makes "hg --time debugancestors" up to 9 times faster than the more general ancestor function when measured on heads of the linux-2.6 hg repo. Victory is not assured, however. The older function can still win by a large margin if one node is much closer to the root than the other, or by a much smaller amount if one is an ancestor of the other. For now, we've also got a small paranoid harness function that calls both ancestor functions on every input and ensures that they give equivalent answers. Even without the checker function, the old ancestor function needs to stay alive for the time being, as its generality is used by context.filectx.merge.	2013-04-16 10:08:18 -07:00
Benoit Boissinot	41300c28e0	revlog: document v0 format	2013-02-09 12:08:02 +01:00
Siddharth Agarwal	4d560304bb	revlog: move ancestor generation out to a new class This refactoring is to prepare for implementing lazy membership.	2012-12-18 10:14:01 -08:00
Siddharth Agarwal	cca6ff3076	revlog: remove incancestors since it is no longer used	2012-12-17 15:08:37 -08:00
Siddharth Agarwal	6d5198a5a3	revlog.ancestors: add support for including revs This is in preparation for an upcoming refactoring. This also fixes a bug in incancestors, where if an element of revs was an ancestor of another it would be generated twice.	2012-12-17 15:13:51 -08:00
Pierre-Yves David	9ac120f569	revlog: allow reverse iteration with revlog.revs We often need to perform rev iteration in reverse order. This changeset makes it possible to do so, in order to avoid costly reverse or reversed() calls later.	2012-11-21 00:42:05 +01:00
Siddharth Agarwal	b05b94a300	revlog: add rev-specific variant of findmissing This will be used by rebase in an upcoming commit.	2012-11-26 10:48:24 -08:00
Siddharth Agarwal	76a23a18f8	revlog: switch findmissing to use ancestor.missingancestors This also speeds up other commands that use findmissing, like incoming and merge --preview. With a large linear repository (>400000 commits) and with one incoming changeset, incoming is sped up from around 4-4.5 seconds to under 3.	2012-11-26 11:02:48 -08:00
Durham Goode	cce0517fb6	commit: increase perf by avoiding unnecessary filteredrevs check When committing to a repo with lots of history (>400000 changesets) the filteredrevs check (added with 373606589de5) in changelog.py takes a bit of time even if the filteredrevs set is empty. Skipping the check in that case shaves 0.36 seconds off a 2.14 second commit. A 17% gain.	2012-11-16 15:39:12 -08:00
Pierre-Yves David	6d6a3d27a5	clfilter: split `revlog.headrevs` C call from python code Make the pure python implementation of headrevs available to derived classes. It is important because filtering logic applied by `revlog` derived class won't have effect on `index`. We want to be able to bypass this C call to implement our own.	2012-09-03 14:19:45 +02:00
Pierre-Yves David	23fb63d637	clfilter: handle non contiguous iteration in `revlog.headrevs` This prepares changelog level filtering. We can't assume that any revision can be heads because filtered revisions need to be excluded. New algorithm: - All revisions now start as "non heads", - every revision we iterate over is made candidate head, - parents of iterated revisions are definitely not head. Filtered revisions are never iterated over and never considered as candidate head.	2012-09-03 14:12:45 +02:00
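The three steps listed in 23fb63d637 can be sketched as below. This is an illustration of the described algorithm, not the code from the commit; `headrevs_sketch` and its `parentrevs` callable are hypothetical names.

```python
def headrevs_sketch(revs, parentrevs, nullrev=-1):
    """Return the heads among `revs`, an iterable that already
    skips filtered revisions.

    `parentrevs(rev)` returns the parent revision numbers of `rev`.
    """
    candidates = set()  # every iterated revision starts as a candidate head
    nonheads = set()    # parents of iterated revisions cannot be heads
    for rev in revs:
        candidates.add(rev)
        for parent in parentrevs(rev):
            if parent != nullrev:
                nonheads.add(parent)
    return sorted(candidates - nonheads)


# A small DAG: 1 and 2 branch from 0, 3 merges them, 4 is a child of 1.
parents = {0: (-1, -1), 1: (0, -1), 2: (0, -1), 3: (1, 2), 4: (1, -1)}
all_heads = headrevs_sketch(range(5), parents.__getitem__)
# Filtering out revision 3 makes revision 2 a head again.
filtered_heads = headrevs_sketch([0, 1, 2, 4], parents.__getitem__)
```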
Pierre-Yves David	6981326b92	clfilter: make the revlog class responsible of all its iteration This prepares changelog level filtering. We need the algorithms used in revlog to work on a subset of revisions. To achieve this, the use of explicit range of revision is banned. `range` and `xrange` calls are replaced by a `revlog.irevs` method. Filtered super class can then overwrite the `irevs` method to filter out revision.	2012-09-20 19:00:59 +02:00
Mads Kiilerich	2f4504e446	fix trivial spelling errors	2012-08-15 22:38:42 +02:00
Matt Mackall	5b06da939f	backout 94ae81a4e338 This may have allowed unbounded I/O sizes with the current chunk retrieval code.	2012-07-12 14:20:34 -05:00
Martin Geisler	c52341ae3f	merge with main	2012-07-12 10:03:50 +02:00
Friedrich Kastner-Masilko	a6245a11d3	revlog: fix for generaldelta distance calculation The decision whether or not to store a full snapshot instead of a delta is done based on the distance value calculated in _addrevision.builddelta(rev). This calculation traditionally used the fact of deltas only using the previous revision as base. Generaldelta mechanism is changing this, yet the calculation still assumes that current-offset minus chainbase-offset equals chain-length. This appears to be wrong. This patch corrects the calculation by means of using the chainlength function if Generaldelta is used.	2012-07-11 12:38:42 +02:00
Bryan O'Sullivan	26f2c363fd	revlog: make compress a method This allows an extension to optionally use a new compression type based on the options applied by the repo to the revlog's opener. (decompress doesn't need the same treatment, as it can be replaced using extensions.wrapfunction, and can figure out which compression algorithm is in use based on the first byte of the compressed payload.)	2012-06-25 13:56:13 -07:00
Joshua Redstone	09130c5cf2	revlog: remove reachable and switch call sites to ancestors This change does a trivial conversion of callsites to ancestors. Followon diffs will switch the callsites over to revs.	2012-06-08 08:39:44 -07:00
Joshua Redstone	70aeee0070	revlog: add incancestors, a version of ancestors that includes revs listed ancestors() returns the ancestors of revs provided. This func is like that except it also includes the revs themselves in the total set of revs generated.	2012-06-08 07:59:37 -07:00
Thomas Arendsen Hein	01adf7776d	merge heads	2012-06-07 15:55:12 +02:00
Brad Hall	f20c06750f	revlog: zlib.error sent to the user (issue3424) Give the user the zlib error message instead of a backtrace when decompression fails.	2012-06-04 14:46:42 -07:00
Joshua Redstone	e38b770424	revlog: add optional stoprev arg to revlog.ancestors() This will be used as a step in removing reachable() in a future diff. Doing it now because bryano is in the process of rewriting ancestors in C. This depends on bryano's patch to replace *revs with revs in the declaration of revlog.ancestors.	2012-06-01 15:44:13 -07:00
Bryan O'Sullivan	141bd09daa	revlog: descendants(*revs) becomes descendants(revs) (API) Once again making the API more rational, as with ancestors.	2012-06-01 12:45:16 -07:00
Bryan O'Sullivan	6ba97b40c1	revlog: ancestors(*revs) becomes ancestors(revs) (API) Accepting a variable number of arguments as the old API did is deeply ugly, particularly as it means the API can't be extended with new arguments. Partly as a result, we have at least three different implementations of the same ancestors algorithm (!?). Most callers were forced to call ancestors(*somelist), adding to both inefficiency and ugliness.	2012-06-01 12:37:18 -07:00
Bryan O'Sullivan	abdf4a8227	util: subclass deque for Python 2.4 backwards compatibility It turns out that Python 2.4's deque type is lacking a remove method. We can't implement remove in terms of find, because it doesn't have find either.	2012-06-01 17:05:31 -07:00
Bryan O'Sullivan	bef5b61512	cleanup: use the deque type where appropriate There have been quite a few places where we pop elements off the front of a list. This can turn O(n) algorithms into something more like O(n**2). Python has provided a deque type that can do this efficiently since at least 2.4. As an example of the difference a deque can make, it improves perfancestors performance on a Linux repo from 0.50 seconds to 0.36.	2012-05-15 10:46:23 -07:00
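As a rough illustration of the pattern bef5b61512 describes, the sketch below walks ancestors breadth-first using `collections.deque`, whose `popleft()` is O(1), whereas `list.pop(0)` shifts every remaining element. The function name and the `parentrevs` callable are hypothetical and not taken from the commit.

```python
from collections import deque


def ancestors_sketch(revs, parentrevs, nullrev=-1):
    """Return the ancestors of `revs` in breadth-first order.

    `parentrevs(rev)` returns the parent revision numbers of `rev`.
    """
    visit = deque(revs)
    seen = set()
    found = []
    while visit:
        rev = visit.popleft()  # constant time, unlike list.pop(0)
        for parent in parentrevs(rev):
            if parent != nullrev and parent not in seen:
                seen.add(parent)
                visit.append(parent)
                found.append(parent)
    return found


parents = {0: (-1, -1), 1: (0, -1), 2: (0, -1), 3: (1, 2)}
result = ancestors_sketch([3], parents.__getitem__)
```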
Bryan O'Sullivan	a49ea963d7	revlog: switch to a C version of headrevs The C implementation is more than 100 times faster than the Python version (which is still available as a fallback). In a repo with 330,000 revs and a stale .hg/cache/tags file, this patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.	2012-05-19 19:44:58 -07:00
Matt Mackall	42c30757a2	revlog: don't handle long for revision matching The underlying C code doesn't support indexing by longs, there are no legitimate reasons to use a long, and longs should generally be converted to ints at a higher level by context's constructor.	2012-05-21 16:36:09 -05:00
Brodie Rao	a7ef0a0cc5	cleanup: "not x in y" -> "x not in y"	2012-05-12 16:00:57 +02:00
Bryan O'Sullivan	058dfb801d	revlog: speed up prefix matching against nodes The radix tree already contains all the information we need to determine whether a short string is an unambiguous node identifier. We now make use of this information. In a kernel tree, this improves the performance of "hg log -q -r24bf01de75" from 0.27 seconds to 0.06.	2012-05-12 10:55:08 +02:00
Matt Mackall	a97dbbe308	revlog: backout df8c4d732869 This regresses performance of 'hg branches', presumably because it's visiting the revlog in the wrong order. This suggests we either need to fix the branch code or add some read-behind to mitigate the effect.	2012-04-27 13:07:29 -05:00
Patrick Mezard	e9454c243f	revlog: fix partial revision() docstring (from f4a6c9197dbd)	2012-04-13 10:14:59 +02:00
Matt Mackall	a6546db90e	revlog: drop some unneeded rev.node calls in revdiff	2012-04-13 22:55:46 -05:00
Bryan O'Sullivan	62554752c6	revlog: avoid an expensive string copy This showed up in a statprof profile of "hg svn rebuildmeta", which is read-intensive on the changelog. This two-line patch improved the performance of that command by 10%.	2012-04-12 20:26:33 -07:00
Matt Mackall	4e0b41f193	revlog: increase readahead size	2012-04-13 21:35:48 -05:00
Bryan O'Sullivan	dc46676e81	parsers: use base-16 trie for faster node->rev mapping This greatly speeds up node->rev lookups, with results that are often user-perceptible: for instance, "hg --time log" of the node associated with rev 1000 on a linux-2.6 repo improves from 0.3 seconds to 0.03. I have not found any instances of slowdowns. The new perfnodelookup command in contrib/perf.py demonstrates the speedup more dramatically, since it performs no I/O. For a single lookup, the new code is about 40x faster. These changes also prepare the ground for the possibility of further improving the performance of prefix-based node lookups.	2012-04-12 14:05:59 -07:00
Matt Mackall	055cba03a8	revlog: allow retrieving contents by revision number	2012-04-08 12:38:02 -05:00
Matt Mackall	30645d82e7	revlog: add hasnode helper method	2012-04-07 15:43:18 -05:00
Pierre-Yves David	15ab7ccd15	revlog: make addgroup return a list of nodes contained in the added source This list will contain every node seen in the source, not only the added ones. This is intended to allow phases to be moved according to what was pushed by the client, not only what was added.	2012-01-13 01:29:03 +01:00
Pierre-Yves David	a51dc67424	revlog: improve docstring for findcommonmissing	2012-01-09 04:15:31 +01:00
Steven Brown	3ebdb5ed19	revlog: clarify strip docstring "readd" -> "re-add" I misread it as "read".	2012-01-10 22:35:25 +08:00
Matt Mackall	864ce9da04	misc: adding missing file close() calls Spotted by Victor Stinner <victor.stinner@haypocalc.com>	2011-11-03 11:24:55 -05:00
Greg Ward	bc1dfb1ac9	atomictempfile: make close() consistent with other file-like objects. The usual contract is that close() makes your writes permanent, so atomictempfile's use of close() to discard writes (and rename() to keep them) is rather unexpected. Thus, change it so close() makes things permanent and add a new discard() method to throw them away. discard() is only used internally, in __del__(), to ensure that writes are discarded when an atomictempfile object goes out of scope. I audited mercurial.*, hgext.*, and ~80 third-party extensions, and found no one using the existing semantics of close() to discard writes, so this should be safe.	2011-08-25 20:21:04 -04:00
Augie Fackler	ea2e868e0f	revlog: use getattr instead of hasattr	2011-07-25 15:43:55 -05:00
Matt Mackall	1b52b02896	check-code: catch misspellings of descendant This word is fairly common in Mercurial, and easy to misspell.	2011-06-07 17:02:54 -05:00
Sune Foldager	7db447dd4c	revlog: bail out earlier in group when we have no chunks	2011-06-03 20:32:54 +02:00
Martin Geisler	af8a35e078	check-code: flag 0/1 used as constant Boolean expression	2011-06-01 12:38:46 +02:00
Matt Mackall	66805ccfed	revlog: stop exporting node.short	2011-05-21 15:01:28 -05:00
Matt Mackall	a6f2ad6f1e	revlog: drop base() again deltaparent does what's needed, and more "portably".	2011-05-18 17:05:30 -05:00
Sune Foldager	9a73f9bed3	revlog: linearize created changegroups in generaldelta revlogs This greatly improves the speed of the bundling process, and often reduces the bundle size considerably. (Although if the repository is already ordered, this has little effect on both time and bundle size.) For non-generaldelta clients, the reduced bundle size translates to a reduced repository size, similar to shrinking the revlogs (which uses the exact same algorithm). For generaldelta clients the difference is minor. When the new bundle format comes, reordering will not be necessary since we can then store the deltaparent relationships directly. The eventual default behavior for clients and servers is presented in the table below, where "new" implies support for GD as well as the new bundle format:

                      old client                    new client
  old server          old bundle, no reorder        old bundle, no reorder
  new server, non-GD  old bundle, no reorder[1]     old bundle, no reorder[2]
  new server, GD      old bundle, reorder[3]        new bundle, no reorder[4]

  [1] reordering is expensive on the server in this case, skip it
  [2] client can choose to do its own redelta here
  [3] reordering is needed because otherwise the pull does a lot of extra work on the server
  [4] reordering isn't needed because client can get deltabase in bundle format

Currently, the default is to reorder on GD-servers, and not otherwise. A new setting, bundle.reorder, has been added to override the default reordering behavior. It can be set to either 'auto' (the default), or any true or false value as a standard boolean setting, to either force the reordering on or off regardless of generaldelta. Some timing data from a relatively branchy test repository follows. All bundling is done with --all --type none options.

  Non-generaldelta, non-shrunk repo:
  -----------------------------------
  Size: 276M
  Without reorder (default): Bundle time: 14.4 seconds, Bundle size: 939M
  With reorder: Bundle time: 1 minute, 29.3 seconds, Bundle size: 381M

  Generaldelta, non-shrunk repo:
  -----------------------------------
  Size: 87M
  Without reorder: Bundle time: 2 minutes, 1.4 seconds, Bundle size: 939M
  With reorder (default): Bundle time: 25.5 seconds, Bundle size: 381M
	2011-05-18 23:26:26 +02:00
Sune Foldager	c222fc4662	changelog: don't use generaldelta	2011-05-16 13:06:48 +02:00
Sune Foldager	d7f01e602b	revlog: get rid of defversion defversion was a property (later option) on the store opener, used to propagate the changelog revlog format to the other revlogs, so they would be created with the same format. This required that the changelog instance was created before any other revlog; an invariant that wasn't directly enforced (or documented) anywhere. We now use the revlogv1 requirement instead, which is transferred to the store opener options. If this option is missing, v0 revlogs are created.	2011-05-16 12:44:34 +02:00
Matt Mackall	608041d55e	revlog: restore the base method	2011-05-15 11:50:15 -05:00
Sune Foldager	2ce60e2564	revlog: improve delta generation heuristics for generaldelta Without this change, pulls (and clones) into a generaldelta repository could generate very inefficient revlogs, the size of which could be at least twice the original size. This was caused by the generated delta chains covering too large distances, causing new chains to be built far too often. This change addresses the problem by forcing a delta against second parent or against the previous revision, when the first parent delta is in danger of creating a long chain.	2011-05-12 15:24:33 +02:00
Sune Foldager	7b30600f6b	revlog: fix bug in chainbase cache The bug didn't cause corruption, and thus wasn't caught in hg verify or in tests. It could lead to delta chains longer than normally allowed, by affecting the code that decides when to add a full revision. This could, in turn, lead to performance regression.	2011-05-12 13:47:17 +02:00
Sune Foldager	762090a2c7	revlog: add docstring to _addrevision	2011-05-11 11:04:44 +02:00
Sune Foldager	1c7dece034	revlog: support writing generaldelta revlogs With generaldelta switched on, deltas are always computed against the first parent when adding revisions. This is done regardless of what revision the incoming bundle, if any, is deltaed against. The exact delta building strategy is subject to change, but this will not affect compatibility. Generaldelta is switched off by default.	2011-05-08 21:32:33 +02:00
Sune Foldager	8bdf02181a	revlog: support reading generaldelta revlogs Generaldelta is a new revlog global flag. When it's turned on, the base field of each revision entry holds the deltaparent instead of the base revision of the current delta chain. This allows for great potential flexibility when generating deltas, as any revision can serve as deltaparent. Previously, the deltaparent for revision r was hardcoded to be r - 1. The base revision of the delta chain can still be accessed as before, since it is now computed in an iterative fashion, following the deltaparents backwards.	2011-05-07 22:40:17 +02:00
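The iterative chain-base lookup that 8bdf02181a mentions can be sketched as follows. This is a simplified model rather than the revlog code itself: `base_field` is a hypothetical list standing in for the base column of the index, and the sketch assumes a chain base is marked by an entry whose base equals its own revision number.

```python
def chainbase_sketch(rev, base_field):
    """Follow deltaparents backwards until reaching the chain base.

    `base_field[rev]` holds the deltaparent of `rev` (generaldelta
    interpretation); an entry equal to its own index is assumed to be
    a full snapshot, i.e. the base of its chain.
    """
    base = base_field[rev]
    while base != rev:
        rev = base
        base = base_field[rev]
    return base


# rev 0 and rev 3 are full snapshots; 4 -> 2 -> 1 -> 0 is one delta chain.
base_field = [0, 0, 1, 3, 2]
```

Walking the chain on every call is what motivates the cache mentioned in 88485e9322, since `_addrevision` would otherwise repeat the same walk many times.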
Sune Foldager	88485e9322	revlog: calculate base revisions iteratively This is in preparation for generaldelta, where the revlog entry base field is reinterpreted as the deltaparent. For that reason we also rename the base function to chainbase. Without generaldelta, performance is unaffected, but generaldelta will suffer from this in _addrevision, since delta chains will be walked repeatedly. A cache has been added to eliminate this problem completely.	2011-05-07 22:40:14 +02:00
Sune Foldager	be6386433b	revlog: remove the last bits of punched/shallow Most of it was removed in fa05c723ac8c, but a few pieces were accidentally left behind.	2011-05-07 22:37:40 +02:00
Martin Geisler	d04646b8d9	revlog: use real Booleans instead of 0/1 in nodesbetween	2011-05-06 12:09:20 +02:00
Sune Foldager	750dcd7b48	revlog: compute correct deltaparent in the deltaparent function It now returns nullrev for chain base revisions, since they are conceptually deltas against nullrev. The revdiff function was updated accordingly.	2011-05-05 18:05:24 +02:00
Sune Foldager	bb96ed66fc	revlog: remove support for punched/shallow The feature was never finished, and there has been restructuring going on since it was added.	2011-05-05 12:46:02 +02:00


613 Commits