sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 08:47:12 +03:00

Author	SHA1	Message	Date
Mark Thomas	073ae56963	revlog: add option to mmap revlog index Following on from Jun Wu's patch last October[1], we have found that using mmap for the revlog index in repos with large revlogs gives a noticeable performance improvement (~110ms on each hg invocation), particularly for commands that don't touch the index very much. This changeset adds this as an option, activated by a new experimental config option so that it can be enabled on a per-repo basis. The configuration option specifies an index size threshold at which Mercurial will switch to using mmap to access the index. If the configuration option is not specified, the default remains to load the full file, which seems to be the best option for smaller repos. Some initial performance numbers, averaged over 5 invocations of `hg log -l 5`, for different cache states:
| Repo: | HG | FB |
|---|---|---|
| Index size: | 2.3MB | much bigger |
| read (warm): | 237ms | 432ms |
| mmap (warm): | 227ms | 321ms |
| | (-3%) | (-26%) |
| read (cold): | 397ms | 696ms |
| mmap (cold): | 410ms | 888ms |
| | (+3%) | (+28%) |
[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/088737.html Test Plan: `hg log --config experimental.mmapindex=true` Differential Revision: https://phab.mercurial-scm.org/D477	2017-09-13 17:26:26 +00:00
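A minimal Python sketch of the threshold behaviour described above. The helper `read_index` and its `mmap_threshold` parameter are hypothetical (this is not Mercurial's code); it only illustrates reading small indexes eagerly and memory-mapping large ones so that pages are loaded on demand.

```python
import mmap
import os


def read_index(path, mmap_threshold=None):
    """Return the index contents, memory-mapping files at or above the threshold.

    Hypothetical helper: mmap_threshold=None keeps the default of reading
    the whole file into memory.
    """
    with open(path, "rb") as fp:
        size = os.fstat(fp.fileno()).st_size
        # Empty files cannot be mapped, so they always take the read() path.
        if mmap_threshold is not None and size >= mmap_threshold and size > 0:
            # Pages are faulted in lazily, so commands that touch little of
            # the index avoid reading all of it up front.
            return mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
        return fp.read()
```

The returned object supports slicing either way, so callers parsing fixed-size entries need not care which path was taken.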
Durham Goode	c00411b064	revlog: add revmap back to revlog.addgroup The recent e85296920485 patch removed the linkmapper argument from addgroup, as part of trying to make addgroup more agnostic from the changegroup format. It turns out that the changegroup can't resolve linkrevs while iterating over the deltas, because applying the deltas might affect the linkrev resolution. For example, when applying a series of changelog entries, the linkmapper just returns len(cl). If we're iterating over the deltas without applying them to the changelog, this results in incorrect linkrevs. This was caught by the hgsql extension, which reads the revisions before applying them. The fix is to return linknodes as part of the delta iterator, and let the consumer choose what to do. Differential Revision: https://phab.mercurial-scm.org/D730	2017-09-20 09:22:22 -07:00
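The ordering problem can be illustrated with a toy model. The names `iter_deltas`, `add_group` and the tuple layout below are hypothetical simplifications, not the real revlog/changegroup API; the point is that the producer hands out linknodes and the consumer resolves each linkrev only after the preceding entries are stored.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical, simplified entry shape:
# (node, p1, p2, linknode, deltabase, delta)
DeltaEntry = Tuple[bytes, bytes, bytes, bytes, bytes, bytes]


def iter_deltas(chunks: Iterable[DeltaEntry]) -> Iterator[DeltaEntry]:
    """Producer: yield entries, including linknodes, without resolving linkrevs."""
    for entry in chunks:
        yield entry


def add_group(deltas: Iterable[DeltaEntry],
              linkmapper: Callable[[bytes], int],
              store: List[Tuple[bytes, int]]) -> List[bytes]:
    """Consumer: map each linknode to a linkrev just before storing it."""
    nodes = []
    for node, _p1, _p2, linknode, _deltabase, _delta in deltas:
        link = linkmapper(linknode)  # sees every entry stored so far
        store.append((node, link))
        nodes.append(node)
    return nodes
```

With a changelog-style linkmapper that returns the current length of the store, each entry gets the next revision number. Had the producer called the linkmapper itself before anything was appended, every entry would have seen an empty store and received linkrev 0.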
Martin von Zweigbergk	cb781eb5fa	templater: extract shortest() logic from template function It can be useful for extensions to be able to produce the shortest unambiguous hash (including the in-tree "show" extension). That logic is currently inside the shortest() template function. Let's move it out of the templater. I've put it on revlog since it's closely related to revlog._partialmatch. We may also want a convenience method on context, but I'll leave that for a later patch. Differential Revision: https://phab.mercurial-scm.org/D724	2017-09-15 00:01:57 -07:00
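The idea of a shortest unambiguous hash can be sketched as a naive linear scan. `shortest_prefix` and its default minimum length of 4 are assumptions for illustration; the real logic sits next to revlog._partialmatch and may apply extra rules not modelled here.

```python
from typing import Iterable


def shortest_prefix(hexnode: str, allnodes: Iterable[str], minlength: int = 4) -> str:
    """Return the shortest prefix of hexnode (at least minlength chars)
    that no other node in allnodes starts with."""
    others = [n for n in allnodes if n != hexnode]
    for length in range(minlength, len(hexnode) + 1):
        prefix = hexnode[:length]
        # A prefix is unambiguous once no other node shares it.
        if not any(n.startswith(prefix) for n in others):
            return prefix
    return hexnode
```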
Durham Goode	c94bd0e75c	changegroup: remove changegroup dependency from revlog.addgroup Previously revlog.addgroup would accept a changegroup and a linkmapper and use them to iterate over the deltas. As part of untangling the revlog-changegroup interdependency, let's move the changegroup delta iteration logic to its own function and pass the simple iterator to the revlog instead. This will make it easier to introduce non-revlog stores in the future, without reinventing any changegroup-specific logic. Differential Revision: https://phab.mercurial-scm.org/D688	2017-09-13 10:43:44 -07:00
Durham Goode	d27ceccf8a	revlog: refactor chain variable Previously the addgroup loop would set chain to be the result of self._addrevision(node,...). Since _addrevision now always returns the passed in node, we can drop that behavior and just always set chain = node in the loop. This will be useful in a future patch where we refactor the cg.deltachunk logic to another function and therefore chain disappears entirely from this function. Differential Revision: https://phab.mercurial-scm.org/D699	2017-09-13 10:43:16 -07:00
Martin von Zweigbergk	54312c2822	revlog: move check for wdir from changelog to revlog Yuya said he preferred this (to keep them in one place, I think). Differential Revision: https://phab.mercurial-scm.org/D569	2017-08-30 09:21:31 -07:00
Augie Fackler	58925444d0	revlog: use pycompat.bytestr() to reliably have a %s-able value	2017-08-22 21:21:43 -04:00
Martin von Zweigbergk	9255a8ef24	revlog: abort on attempt to write null revision My repo got corrupted yesterday by something that ended up writing the null revision to the revlog (nullid hash, not nullrev index, of course). We use many extensions internally (narrowhg, remotefilelog, evolve, internal extensions) and treemanifests are on. The null revision was written to the changelog, the root manifest log, and one subdirectory manifest log. I have no idea exactly why the null revision was written, but it seems cheap enough to check that we should fail instead of corrupting the repo. Differential Revision: https://phab.mercurial-scm.org/D522	2017-08-25 15:50:07 -07:00
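The guard amounts to a cheap comparison before writing. In this sketch `check_writable_node`, `RevlogError` and the message text are illustrative stand-ins, not Mercurial's exact code.

```python
NULLID = b"\0" * 20  # the all-zero null node id


class RevlogError(Exception):
    pass


def check_writable_node(node: bytes, indexfile: str) -> None:
    """Refuse to write the null revision instead of corrupting the revlog."""
    if node == NULLID:
        raise RevlogError("%s: attempt to add null revision" % indexfile)
```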
Alex Gaynor	119e84a2a0	revlog: use struct.Struct instances for slight performance wins Differential Revision: https://phab.mercurial-scm.org/D32	2017-07-10 16:41:13 -04:00
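To illustrate the kind of change involved: a `struct.Struct` compiles its format string once, and its bound `pack`/`unpack_from` methods can then be reused without going back through the format string on each call. The format below is our understanding of the 64-byte version 1 index entry; treat it as an assumption rather than a reference.

```python
import struct

# Assumed v1 index entry: offset (high 48 bits) and flags (low 16 bits)
# packed in one big-endian 64-bit field, six 32-bit ints (compressed
# length, uncompressed length, base rev, link rev, p1 rev, p2 rev),
# a 20-byte node and 12 bytes of padding.
INDEX_FORMAT = ">Qiiiiii20s12x"
index_struct = struct.Struct(INDEX_FORMAT)  # format parsed once, here


def pack_entry(entry):
    # Reuses the precompiled Struct rather than passing the format string.
    return index_struct.pack(*entry)


def unpack_entry(data, offset=0):
    return index_struct.unpack_from(data, offset)
```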
Alex Gaynor	c6f81cfa87	revlog: micro-optimize the computation of hashes Differential Revision: https://phab.mercurial-scm.org/D31	2017-07-10 16:39:28 -04:00
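The commit message does not describe the change itself, so the following is only one plausible micro-optimization, stated as an assumption. Node ids are SHA-1 over the two parents in sorted order followed by the text; because the null id sorts first, the hash state after feeding it can be precomputed and copied for the common single-parent case.

```python
import hashlib

NULLID = b"\0" * 20

# Precomputed state after hashing nullid; nullid always sorts first,
# so for p2 == nullid the input is nullid + p1 + text.
_nullhash = hashlib.sha1(NULLID)


def node_hash(text: bytes, p1: bytes, p2: bytes) -> bytes:
    """sha1(min(p1, p2) + max(p1, p2) + text), with a fast path for p2 == null."""
    if p2 == NULLID:
        s = _nullhash.copy()  # reuse the precomputed state
        s.update(p1)
    else:
        a, b = sorted([p1, p2])
        s = hashlib.sha1(a)
        s.update(b)
    s.update(text)
    return s.digest()
```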
Pierre-Yves David	7c5463c25b	revlog: add an experimental option to mitigate delta issues (issue5480) The general delta heuristic used to select a delta does not scale with the number of branches. The delta base is frequently too far away to be able to reuse a chain according to the "distance" criteria. This leads to the insertion of larger deltas (or even full texts) that themselves push the bases for the next deltas further away, leading to more large deltas and full texts. These full texts and the frequent recomputation throw Mercurial performance into disarray. For example, on a slightly large repository:
280 000 files (2 150 000 versions)
430 000 changesets (10 000 topological heads)
The numbers below compare the repository with and without the distance criteria:
manifest size: with: 21.4 GB, without: 0.3 GB
store size: with: 28.7 GB, without: 7.4 GB
bundle of the last 15 000 revisions: with: 800 seconds, 971 MB; without: 50 seconds, 73 MB
unbundle time (of the last 15K revisions): with: 1150 seconds (~19 minutes), without: 35 seconds
Similar issues have been observed in other repositories. Adding a new option or "feature" on stable is uncommon. However, given that this issue is making Mercurial practically unusable, I'm exceptionally targeting this patch for stable. What is actually needed is a full rework of the delta building and reading logic. However, that will be a longer process and churn not suitable for stable. In the meantime, we introduce a quick and dirty mitigation of this in the 'experimental' config space. The new option introduces a way to set the maximum amount of memory usable to store a diff in memory. This extends the ability for Mercurial to create chains without removing all safeguards regarding memory access. The option should be phased out when core has a more proper solution available. Setting the limit to '0' removes all limits; setting it to '-1' uses the default limit (textsize x 4).	2017-06-23 13:49:34 +02:00
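A hypothetical sketch of how such a limit could gate delta reuse. `max_span` and `is_good_delta` are invented names; only the meaning of `0` and `-1` comes from the commit message, and treating a positive value as a byte cap on the span that must be read to rebuild the text is an assumption.

```python
def max_span(textlen: int, limit: int) -> float:
    """Largest allowed span (in bytes) for a delta chain ending at this text."""
    if limit == 0:
        return float("inf")   # '0' removes all limits
    if limit < 0:
        return textlen * 4    # '-1' (assumed: any negative) uses the default
    return limit              # assumption: positive values are a byte cap


def is_good_delta(dist: int, textlen: int, limit: int = -1) -> bool:
    # 'dist' is taken to be the bytes between the chain base and the new delta.
    return dist <= max_span(textlen, limit)
```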
Gregory Szorc	c45a2cb5ab	revlog: C implementation of delta chain resolution I've seen revlog._deltachain() appear in a number of performance profiles. I suspect there are 2 reasons for this:
1. Delta chain resolution performs many index lookups, thus triggering population of index tuples. Creating possibly tens of thousands of PyObjects will have overhead.
2. Delta chain resolution is a tight loop.
By moving delta chain resolution to C, we can defer instantiation of full index entry tuples and make the loop faster courtesy of not running in Python. We can measure the impact on delta chain resolution via `hg perfrevlogrevision` using the mozilla-central repo with a recent manifest having a delta chain length of 33726:
$ hg perfrevlogrevision -m 364895
! full
! wall 0.367585 comb 0.370000 user 0.340000 sys 0.030000 (best of 27)
! wall 0.357581 comb 0.360000 user 0.350000 sys 0.010000 (best of 28)
! deltachain
! wall 0.010644 comb 0.010000 user 0.010000 sys 0.000000 (best of 270)
! wall 0.000292 comb 0.000000 user 0.000000 sys 0.000000 (best of 8729)
$ hg perfrevlogrevision --cache -m 364895
! deltachain
! wall 0.003904 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.000284 comb 0.000000 user 0.000000 sys 0.000000 (best of 9926)
The first test measures savings from both not instantiating index entries and moving to C. The second test (which doesn't clear the index caches) essentially isolates the benefits of moving from Python to C. It still shows a 13.7x speedup (versus 36.4x). And there are multiple milliseconds of savings within the critical path for resolving revision data. I think that justifies the existence of C code. A more striking example of the benefits of this change can be demonstrated by timing `hg debugdeltachain -m` for the mozilla-central repo:
$ time hg debugdeltachain -m > /dev/null
before: 1057.4s
after: 503.3s
PyPy2.7 5.8.0: 220.0s
It's worth noting that the C code isn't as optimal as it could be. We're still instantiating a new PyObject for every revision. A future optimization would be to reuse the PyObject on the cached index tuple. We could potentially also get wins by using a memory array of raw integers. There is also room for a delta chain cache on revlog instances. Of course, the best optimization is to implement revlog reading outside of Python so Python doesn't need to be concerned about the relatively expensive index entries and operations on them.	2017-06-25 12:41:34 -07:00
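For comparison with the C version, here is a pure-Python sketch of the loop being moved, modelled on the shape of revlog._deltachain(). It assumes index entries are tuples with the delta base at position 3, and that a revision whose base is itself stores a full text.

```python
from typing import List, Optional, Sequence, Tuple


def delta_chain(index: Sequence[Tuple[int, ...]], rev: int,
                generaldelta: bool = True,
                stoprev: Optional[int] = None) -> Tuple[List[int], bool]:
    """Return (chain, stopped) for rev, oldest revision first."""
    chain = []
    iterrev = rev
    base = index[iterrev][3]
    # Walk back until we reach a full text (base == rev) or stoprev.
    while iterrev != base and iterrev != stoprev:
        chain.append(iterrev)
        if generaldelta:
            iterrev = base      # follow the recorded delta parent
        else:
            iterrev -= 1        # legacy layout: delta against the previous rev
        base = index[iterrev][3]
    if iterrev == stoprev:
        stopped = True          # caller already has stoprev's text
    else:
        chain.append(iterrev)   # include the full-text base
        stopped = False
    chain.reverse()
    return chain, stopped
```

Every step touches `index[iterrev]`, which is exactly the per-revision tuple access the commit message identifies as the cost in Python.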
Pulkit Goyal	dc7fe61263	py3: catch binascii.Error raised from binascii.unhexlify Before Python 3, binascii.unhexlify used to raise TypeError; now it raises binascii.Error.	2017-06-20 22:11:46 +05:30
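A small sketch of the portable pattern: catching both exception types keeps the same code working on Python 2 and 3. `parse_hex` is a hypothetical helper.

```python
import binascii


def parse_hex(s: bytes):
    """Return the decoded bytes, or None if s is not valid hex."""
    try:
        return binascii.unhexlify(s)
    except (TypeError, binascii.Error):
        # Python 2 raised TypeError for bad input; Python 3 raises
        # binascii.Error (a ValueError subclass). Catch both.
        return None
```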
Martin von Zweigbergk	daced3540c	revlog: rename list of nodes from "content" to "nodes" It seems like the reason for "content" is that the variable contains the nodes that the changegroup "contains", see c2901cddb53f (revlog: make addgroup returns a list of node contained in the added source, 2012-01-13), but "nodes" seems much clearer.	2017-06-15 13:42:35 -07:00
Martin von Zweigbergk	12dbf027df	revlog: delete obsolete comment The comment seems to refer to code that was deleted in a46638c640d8 (revlog.addgroup(): always use _addrevision() to add new revlog entries, 2010-10-08).	2017-06-15 13:25:41 -07:00
Martin von Zweigbergk	f0c7377259	revlog: delete dead assignment in addgroup()	2017-06-15 13:23:21 -07:00
Gregory Szorc	efbb740737	revlog: skeleton support for version 2 revlogs There are a number of improvements we want to make to revlogs that will require a new version - version 2. It is unclear what the full set of improvements will be or when we'll be done with them. What I do know is that the process will likely take longer than a single release, will require input from various stakeholders to evaluate changes, and will have many contentious debates and bikeshedding. It is unrealistic to develop revlog version 2 up front: there are just too many uncertainties that we won't know until things are implemented and experiments are run. Some changes will also be invasive and prone to bit rot, so sitting on dozens of patches is not practical. This commit introduces skeleton support for version 2 revlogs in a way that is flexible and not bound by backwards compatibility concerns. An experimental repo requirement for denoting revlog v2 has been added. The requirement string has a sub-version component to it. This will allow us to declare multiple requirements in the course of developing revlog v2. Whenever we change the in-development revlog v2 format, we can tweak the string, creating a new requirement and locking out old clients. This will allow us to make as many backwards incompatible changes and experiments to revlog v2 as we want. In other words, we can land code and make meaningful progress towards revlog v2 while still maintaining extreme format flexibility up until the point we freeze the format and remove the experimental labels. To enable the new repo requirement, you must supply an experimental and undocumented config option. But not just any boolean flag will do: you need to explicitly use a value that no sane person should ever type. This is an additional guard against enabling revlog v2 on an installation it shouldn't be enabled on. 
The specific scenario I'm trying to prevent is, say, a user with a 4.4 client with a frozen format enabling the option but then downgrading to 4.3 and accidentally creating repos with an outdated and unsupported repo format. Requiring a "challenge" string should prevent this. Because the format is not yet finalized and I don't want to take any chances, revlog v2's version is currently 0xDEAD. I figure squatting on a value we're likely never to use as an actual revlog version to mean "internal testing only" is acceptable. And "dead" is easily recognized as something meaningful. There is a bunch of cleanup that is needed before work on revlog v2 begins in earnest. I plan on doing that work once this patch is accepted and we're comfortable with the idea of starting down this path.	2017-05-19 20:29:11 -07:00
Yuya Nishihara	685172007c	revlog: add support for partial matching of wdir node id The idea is simple. If the given node id prefix is 'ff...f', add +1 to the number of matches (e.g. ambiguous if partial + maybewdir > 1). This patch also fixes id() revset and shortest() template since _partialmatch() can raise WdirUnsupported exception.	2016-08-19 18:26:04 +09:00
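The counting rule can be sketched as follows. `partial_match`, `AmbiguousPrefixError` and the toy node list are hypothetical; the wdir id is taken to be forty 'f' characters in hex.

```python
WDIR_HEX = "f" * 40  # assumed hex form of the working-directory pseudo node id


class AmbiguousPrefixError(Exception):
    pass


class WdirUnsupported(Exception):
    """The prefix names the working directory, which has no stored node."""


def partial_match(prefix: str, nodes):
    """Resolve a hex prefix, counting the wdir id as one extra candidate."""
    matches = [n for n in nodes if n.startswith(prefix)]
    maybewdir = WDIR_HEX.startswith(prefix)  # True iff prefix is 'ff...f'
    if len(matches) + maybewdir > 1:
        raise AmbiguousPrefixError(prefix)
    if maybewdir:
        raise WdirUnsupported(prefix)
    return matches[0] if matches else None
```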
Yuya Nishihara	b42457ae0a	revlog: map rev(wdirid) to WdirUnsupported exception This will allow us to map repo["ff..."] to workingctx. _partialmatch() will be updated later. I tried "return wdirrev" in place of raising the exception, but raising the exception early seemed better.	2016-08-20 22:37:58 +09:00
Pulkit Goyal	dfd06a1929	revlog: raise error.WdirUnsupported from revlog.node() if wdirrev is passed When we try to run `hg debugrevspec 'branch(wdir())'`, it throws an index error and blows up. Let's raise WdirUnsupported if wdir() is passed so that we can catch it later.	2017-05-23 01:30:36 +05:30
Pulkit Goyal	26a5b62b59	revlog: raise WdirUnsupported when wdirrev is passed revlog.parentrevs() is called while evaluating the ^ operator in revsets. When wdir is passed, it raises IndexError. This patch raises WdirUnsupported if wdir is passed to the function. The error will be caught in future patches.	2017-05-19 19:12:06 +05:30
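A sketch of the pattern, using a hypothetical `ToyRevlog` in place of a real revlog index. The working-directory revision number is assumed to be 0x7FFFFFFF; any other out-of-range revision still raises IndexError.

```python
WDIRREV = 0x7FFFFFFF  # assumed revision number reserved for the working directory


class WdirUnsupported(Exception):
    pass


class ToyRevlog:
    """Hypothetical stand-in whose index is a list of (p1rev, p2rev) pairs."""

    def __init__(self, parents):
        self._parents = parents

    def parentrevs(self, rev):
        try:
            return self._parents[rev]
        except IndexError:
            if rev == WDIRREV:
                # A specific error callers can catch, not a bare IndexError.
                raise WdirUnsupported
            raise
```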
Gregory Szorc	7d51a8278d	revlog: remove some revlogNG terminology RevlogNG is not such a good name when it is no longer the newest revlog version. Since we'll soon have revlog version 2, let's remove some references to it.	2017-05-19 20:14:31 -07:00
Gregory Szorc	5bcef1853c	revlog: tweak wording and logic for flags validation First, the logic around the if..elif..elif was subtly wrong and sub-optimal because all branches would be tested as long as the revlog was valid. This patch changes things so it behaves like a switch statement over the revlog version. While I was here, I also tweaked error strings to make them consistent and to read better.	2017-05-19 20:10:50 -07:00
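A sketch of header validation written as a switch over the version. The constant names follow the renamed flags; the bit values reflect our understanding of the revlog header (version in the low 16 bits, flags in the high 16 bits), `REVLOGV2_FLAGS` is assumed equal to the v1 set, and the error messages are invented.

```python
REVLOGV0 = 0
REVLOGV1 = 1
REVLOGV2 = 0xDEAD  # in-development placeholder version
FLAG_INLINE_DATA = 1 << 16
FLAG_GENERALDELTA = 1 << 17
REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA
REVLOGV2_FLAGS = REVLOGV1_FLAGS  # assumption for this sketch


def validate_header(header: int) -> int:
    """Split a 32-bit index header into version and flags, then validate."""
    version = header & 0xFFFF   # low 16 bits: format version
    flags = header & ~0xFFFF    # high 16 bits: feature flags
    # Exactly one branch runs per version, like a switch statement.
    if version == REVLOGV0:
        if flags:
            raise ValueError("unknown flags %#x for format v0" % (flags >> 16))
    elif version == REVLOGV1:
        if flags & ~REVLOGV1_FLAGS:
            raise ValueError("unknown flags %#x for revlogv1" % (flags >> 16))
    elif version == REVLOGV2:
        if flags & ~REVLOGV2_FLAGS:
            raise ValueError("unknown flags %#x for revlogv2" % (flags >> 16))
    else:
        raise ValueError("unknown version %d" % version)
    return version
```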
Yuya Nishihara	4563e16232	parsers: switch to policy importer # no-check-commit	2016-08-13 12:23:56 +09:00
Gregory Szorc	8af088ee65	revlog: rename constants (API) Feature flag constants don't need "NG" in the name because they will presumably apply to non-"NG" version revlogs. All feature flag constants should also share a similar naming convention to identify them as such. And, "RevlogNG" isn't a great internal name since it isn't obvious it maps to version 1 revlogs. Plus, "NG" (next generation) is only a good name as long as it is the latest version. Since we're talking about version 2, now is as good a time as any to move on from that naming.	2017-05-17 19:52:18 -07:00
Jun Wu	718861a5f7	changelog: make sure datafile is 00changelog.d (API) 0ad0d26ff7 makes it possible for changelog datafile to be "00changelog.i.d", which is wrong. This patch adds an explicit datafile parameter to fix it.	2017-05-17 20:14:27 -07:00
Martin von Zweigbergk	c3406ac3db	cleanup: use set literals We no longer support Python 2.6, so we can now use set literals.	2017-02-10 16:56:29 -08:00
Jun Wu	4656f56bb3	flagprocessor: add a fast path when flags is 0 When flags is 0, _processflags could be a no-op instead of iterating through the flag bits.	2017-05-10 16:17:58 -07:00
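A minimal sketch of the fast path, with a hypothetical `process_flags` and a dict of processors keyed by flag bit. The real code does more than this (for example around unknown flags and processor ordering), which the sketch does not model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical processor signature: text -> (newtext, validatehash)
Processor = Callable[[bytes], Tuple[bytes, bool]]


def process_flags(text: bytes, flags: int,
                  processors: Dict[int, Processor]) -> Tuple[bytes, bool]:
    """Apply the processor for each set flag bit, in ascending bit order."""
    if flags == 0:
        # Fast path: nothing to transform, so skip iterating the flag bits.
        return text, True
    validatehash = True
    for flag in sorted(processors):
        if flags & flag:
            text, vhash = processors[flag](text)
            validatehash = validatehash and vhash
    return text, validatehash
```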
Jun Wu	2c11c92a85	revlog: move part of "addrevision" to "addrawrevision" "addrawrevision" will be the public API to reuse revision rawdata elsewhere. It will be used by a future patch.	2017-05-09 21:27:06 -07:00
Gregory Szorc	5d6e940365	revlog: rename _chunkraw to _getsegmentforrevs() This completes our rename of internal revlog methods to distinguish between low-level raw revlog data "segments" and higher-level, per-revision "chunks." perf.py has been updated to consult both names so it will work against older Mercurial versions.	2017-05-06 12:12:53 -07:00
Gregory Szorc	46413ff643	revlog: rename internal functions containing "chunk" to use "segment" Currently, "chunk" is overloaded in revlog terminology to mean multiple things. One of them refers to a segment of raw data from the revlog. This commit renames various methods only used within revlog.py to have "segment" in their name instead of "chunk." While I was here, I also made the names more descriptive. e.g. "_loadchunk()" becomes "_readsegment()" because it actually does I/O.	2017-05-06 12:02:12 -07:00
Jun Wu	0606028aff	revlog: make "size" diverge from "rawsize" Previously, revlog.size was equal to revlog.rawsize. However, the flag processor framework can make them differ: "size" could mean len(revision(raw=False)), while "rawsize" means len(revision(raw=True)). This patch makes it so. This corrects "hg status" output when a flag processor is involved. The call stack looks like: basectx.status -> workingctx._buildstatus -> workingctx._dirstatestatus -> workingctx._checklookup -> filectx.cmp -> filelog.cmp -> filelog.size -> revlog.size	2017-04-09 12:53:31 -07:00
Jun Wu	e557e14680	revlog: avoid applying delta chain on cache hit Previously, revlog.revision(raw=False) may try to apply the delta chain on _cache hit. That happens if flags are non-empty. This patch makes rawtext reused so delta chain application is avoided. "_cache" and "rev" are moved a bit to avoid unnecessary assignments.	2017-04-02 18:40:13 -07:00
Jun Wu	5f26616d71	revlog: indent block to make review easier	2017-04-02 18:29:24 -07:00
Jun Wu	2ab18ee566	revlog: avoid calculating "flags" twice in revision() This is more consistent with other code in "revision()" - prefer performance to code length.	2017-04-02 18:25:12 -07:00
Jun Wu	20165e0767	revlog: use raw revision for rawsize When writing the revlog-ng index, the third field is len(rawtext). See revlog._addrevision:
    textlen = len(rawtext)
    ....
    e = (offset_type(offset, flags), l, textlen,
         base, link, p1r, p2r, node)
    self.index.insert(-1, e)
Therefore, revlog.index[rev][2] returned by revlog.rawsize should be len(rawtext), where "rawtext" is revlog.revision(raw=True). Unfortunately it's hard to add a test for this code path because "if l >= 0" catches most cases.	2017-04-02 18:57:03 -07:00
Jun Wu	7151069c4a	revlog: add a fast path for revision(raw=False) If the cache is hit and flags are empty, no flag processor runs and "text" equals "rawtext". So we check flags and return rawtext. This resolves the performance issue introduced by a previous patch.	2017-03-30 21:21:15 -07:00
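The interplay of the cache and the fast path can be sketched with a toy class. `CachedRevlog` is hypothetical: a dict of rawtexts stands in for delta chain application, and the cache is keyed by rev rather than node for brevity.

```python
class CachedRevlog:
    """Toy model: _cache holds (rev, rawtext), mirroring the raw-text cache."""

    def __init__(self, rawtexts, flags, processor):
        self._rawtexts = rawtexts    # rev -> rawtext (stand-in for delta chains)
        self._flags = flags          # rev -> flag bits
        self._processor = processor  # (rawtext, flags) -> text
        self._cache = None
        self.chain_reads = 0         # counts simulated delta chain applications

    def revision(self, rev, raw=False):
        if self._cache is not None and self._cache[0] == rev:
            rawtext = self._cache[1]   # cache hit: no delta chain needed
        else:
            self.chain_reads += 1
            rawtext = self._rawtexts[rev]
            self._cache = (rev, rawtext)
        flags = self._flags[rev]
        if raw or flags == 0:
            return rawtext             # fast path: text equals rawtext
        return self._processor(rawtext, flags)
```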
Jun Wu	ae8c9ce375	revlog: make _addrevision only accept rawtext All 3 users of _addrevision use raw:
- addrevision: passing rawtext to _addrevision
- addgroup: passing rawtext and raw=True to _addrevision
- clone: passing rawtext to _addrevision
There is no real user using _addrevision(raw=False). On the other hand, _addrevision is low-level code dealing with raw revlog deltas and rawtexts. It should not transform rawtext to non-raw text. This patch removes the "raw" parameter from "_addrevision", and does some renaming and doc changes to make it clear that "_addrevision" expects rawtext. Archeology shows 886a08012bbe added the "raw" flag to "_addrevision"; follow-ups fe1e206cb389 and 1cfa6239c923 seem to make the flag unnecessary. test-revlog-raw.py no longer complains.	2017-03-30 18:38:03 -07:00
Jun Wu	9a6035a980	revlog: use raw revisions in clone test-revlog-raw.py now shows "clone test passed", but there is more to fix.	2017-03-30 18:24:23 -07:00
Jun Wu	2468c838bd	revlog: use raw revisions in revdiff See the added comment. revdiff is meant to output the raw delta that will be written to revlog. It should use raw. test-revlog-raw.py now shows "addgroupcopy test passed", but there is more to fix.	2017-03-30 18:23:27 -07:00
Jun Wu	558f5cce61	revlog: use raw content when building delta Using external content provided by flagprocessor when building revlog delta is wrong, because deltas are applied to raw contents in revlog. This patch fixes the above issue by adding "raw=True". test-revlog-raw.py now shows "local test passed", but there is more to fix.	2017-03-30 17:58:03 -07:00
Jun Wu	50b232c61f	revlog: fix _cache usage in revision() As documented at revlog.__init__, revlog._cache stores raw text. The current read and write usage of "_cache" in revlog.revision lacks a raw=True check. This patch fixes that by adding a check for raw, and storing rawtext explicitly in _cache. Note: it may slow down the cache hit code path when raw=False and flags=0. That performance issue will be fixed in a later patch. test-revlog-raw now points us to a new problem.	2017-03-30 15:34:08 -07:00
Jun Wu	fec2bbc9e9	revlog: rename some "text"s to "rawtext" This makes code easier to understand. "_addrevision" is left untouched - it will be changed in a later patch.	2017-03-30 14:56:09 -07:00
Jun Wu	b483b1b607	revlog: clarify flagprocessor documentation The words "text", "newtext", "bool" could be confusing. Use explicit "text" or "rawtext" and document more about the "bool".	2017-03-30 07:59:48 -07:00
Jun Wu	7f99b86dbd	revlog: avoid unnecessary node -> rev conversion	2017-03-29 16:23:04 -07:00
Yuya Nishihara	5a92909ce0	py3: fix slicing of byte string in revlog.compress() I tried .startswith('\0'), but data wasn't always a bytes nor a bytearray.	2017-03-26 17:12:06 +09:00
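Why slicing matters on Python 3: indexing a bytes object yields an int, while a one-element slice yields a bytes-like object on both Python versions. The helper below assumes, per our understanding of the revlog chunk format, that uncompressed data starting with a NUL byte needs no 'u' marker; `header_for_uncompressed` is a hypothetical name.

```python
def header_for_uncompressed(data) -> bytes:
    """Return the marker to prepend when storing data uncompressed.

    The slice data[0:1] is a length-1 sequence for bytes, bytearray and
    memoryview alike; data[0] would be an int on Python 3 and never equal
    a bytes object.
    """
    if not data or data[0:1] == b"\0":
        return b""   # empty, or already unambiguous: no marker needed
    return b"u"
```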
Augie Fackler	8471f9f823	revlog: use pycompat.maplist to eagerly evaluate map on Python 3 According to Pulkit, this should fix `hg status --all` on Python 3.	2017-03-21 17:39:49 -04:00
Augie Fackler	e0f1b901d8	revlog: use int instead of long By my reading of PEP 237[0], this is completely safe and has been since Python 2.2. 0: https://www.python.org/dev/peps/pep-0237/	2017-03-19 01:05:28 -04:00
Augie Fackler	bc09440907	revlog: use bytes() instead of str() to get data from memoryview Fixes `files -v` on Python 3.	2017-03-12 15:27:02 -04:00
Augie Fackler	03a50eb15f	revlog: use bytes() to ensure text from _chunks is a reasonable type	2017-03-12 03:32:38 -04:00


606 Commits