sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-07 15:27:13 +03:00

Author	SHA1	Message	Date
Alex Gaynor	c6f81cfa87	revlog: micro-optimize the computation of hashes Differential Revision: https://phab.mercurial-scm.org/D31	2017-07-10 16:39:28 -04:00
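The commit title does not say which computation was tightened. For context, a revlog node id is the SHA-1 of the two parent node ids, smaller one first, followed by the revision text. The sketch below shows that scheme plus one plausible micro-optimization: reusing a precomputed hash state when the second parent is null. The name `node_hash` and the caching trick are assumptions for illustration, not a reconstruction of D31.

```python
import hashlib

# Assumption: the null node id is 20 zero bytes, as in Mercurial.
NULLID = b"\0" * 20

# Hash state after feeding the null parent once; copying it is cheaper than
# rehashing those 20 bytes for every single-parent revision.
_NULLHASH = hashlib.sha1(NULLID)


def node_hash(text: bytes, p1: bytes, p2: bytes) -> bytes:
    """Return SHA-1(min(p1, p2) + max(p1, p2) + text)."""
    if p2 == NULLID:
        # NULLID sorts before (or equal to) any other 20-byte id,
        # so it is always the "smaller" parent here.
        s = _NULLHASH.copy()
        s.update(p1)
    else:
        a, b = sorted((p1, p2))
        s = hashlib.sha1(a)
        s.update(b)
    s.update(text)
    return s.digest()
```

Sorting the parents makes the id independent of parent order, so a merge of A into B and of B into A with identical content hash the same.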
Pierre-Yves David	7c5463c25b	revlog: add an experimental option to mitigate delta issues (issue5480) The general delta heuristic for selecting a delta does not scale with the number of branches. The delta base is frequently too far away to reuse a chain according to the "distance" criterion. This leads to the insertion of larger deltas (or even full texts) that themselves push the bases for the next deltas further away, leading to more large deltas and full texts. These full texts and frequent recomputations throw Mercurial performance into disarray. For example, on a slightly large repository (280 000 files with 2 150 000 versions; 430 000 changesets with 10 000 topological heads), the numbers below compare the repository with and without the distance criterion: manifest size: 21.4 GB with, 0.3 GB without; store size: 28.7 GB with, 7.4 GB without; bundling the last 15 000 revisions: 800 seconds and 971 MB with, 50 seconds and 73 MB without; unbundle time (of the last 15K revisions): 1150 seconds (~19 minutes) with, 35 seconds without. Similar issues have been observed in other repositories. Adding a new option or "feature" on stable is uncommon. However, given that this issue makes Mercurial practically unusable, I'm exceptionally targeting this patch for stable. What is actually needed is a full rework of the delta building and reading logic. However, that will be a longer process, with churn not suitable for stable. In the meantime, we introduce a quick and dirty mitigation in the 'experimental' config space. The new option introduces a way to set the maximum amount of memory usable to store a diff in memory. This extends the ability of Mercurial to create chains without removing all safeguards regarding memory access. The option should be phased out when core has a more proper solution available. Setting the limit to '0' removes all limits; setting it to '-1' uses the default limit (textsize x 4).	2017-06-23 13:49:34 +02:00
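A minimal sketch of the limit semantics described above. The function names `span_limit` and `distance_ok` are hypothetical, and treating a positive value as an explicit byte limit is an assumption; the message only specifies what '0' and '-1' mean.

```python
from typing import Optional


def span_limit(textlen: int, configured: int) -> Optional[int]:
    """Return the maximum allowed delta distance, or None for no limit.

    0  -> no limit at all
    -1 -> the default: four times the size of the text being stored
    N  -> assumed here to be an explicit limit in bytes (hypothetical)
    """
    if configured == 0:
        return None
    if configured == -1:
        return textlen * 4
    return configured


def distance_ok(dist: int, textlen: int, configured: int) -> bool:
    """Accept a candidate delta only if its distance is within the limit."""
    limit = span_limit(textlen, configured)
    return limit is None or dist <= limit
```

Lifting the limit lets chains reuse distant bases, which is what avoids the cascade of full texts measured above.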
Gregory Szorc	c45a2cb5ab	revlog: C implementation of delta chain resolution I've seen revlog._deltachain() appear in a number of performance profiles. I suspect there are 2 reasons for this: 1. Delta chain resolution performs many index lookups, thus triggering population of index tuples. Creating possibly tens of thousands of PyObjects will have overhead. 2. Delta chain resolution is a tight loop. By moving delta chain resolution to C, we can defer instantiation of full index entry tuples and make the loop faster courtesy of not running in Python. We can measure the impact on delta chain resolution via `hg perfrevlogrevision` using the mozilla-central repo with a recent manifest having a delta chain length of 33726: $ hg perfrevlogrevision -m 364895 ! full ! wall 0.367585 comb 0.370000 user 0.340000 sys 0.030000 (best of 27) ! wall 0.357581 comb 0.360000 user 0.350000 sys 0.010000 (best of 28) ! deltachain ! wall 0.010644 comb 0.010000 user 0.010000 sys 0.000000 (best of 270) ! wall 0.000292 comb 0.000000 user 0.000000 sys 0.000000 (best of 8729) $ hg perfrevlogrevision --cache -m 364895 ! deltachain ! wall 0.003904 comb 0.000000 user 0.000000 sys 0.000000 (best of 712) ! wall 0.000284 comb 0.000000 user 0.000000 sys 0.000000 (best of 9926) The first test measures savings from both not instantiating index entries and moving to C. The second test (which doesn't clear the index caches) essentially isolates the benefits of moving from Python to C. It still shows a 13.7x speedup (versus 36.4x). And there are multiple milliseconds of savings within the critical path for resolving revision data. I think that justifies the existence of C code. A more striking example of the benefits of this change can be demonstrated by timing `hg debugdeltachain -m` for the mozilla-central repo: $ time hg debugdeltachain -m > /dev/null before: 1057.4s after: 503.3s PyPy2.7 5.8.0: 220.0s It's worth noting that the C code isn't as optimal as it could be.
We're still instantiating a new PyObject for every revision. A future optimization would be to reuse the PyObject on the cached index tuple. We could potentially also get wins by using a memory array of raw integers. There is also room for a delta chain cache on revlog instances. Of course, the best optimization is to implement revlog reading outside of Python so Python doesn't need to be concerned about the relatively expensive index entries and operations on them.	2017-06-25 12:41:34 -07:00
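The loop being moved to C can be sketched in Python as follows. The index here is a hypothetical list of tuples whose first element is each revision's delta base (a revision whose base is itself is stored as a full text); real index entries carry more fields. Without generaldelta, the sketch assumes each delta applies to the previous revision and the stored base marks the start of the chain.

```python
from typing import List, Sequence, Tuple


def delta_chain(index: Sequence[Tuple[int, ...]], rev: int,
                generaldelta: bool) -> List[int]:
    """Return the revisions needed to rebuild `rev`, full text first.

    index[r][0] is the delta base of revision r (hypothetical layout).
    """
    chain = []
    iterrev = rev
    base = index[iterrev][0]
    # A revision whose base is itself is a full text: the chain stops there.
    while iterrev != base:
        chain.append(iterrev)
        # generaldelta follows the stored base; the older layout walks back
        # one revision at a time, its stored base being the chain start.
        iterrev = base if generaldelta else iterrev - 1
        base = index[iterrev][0]
    chain.append(iterrev)
    chain.reverse()
    return chain
```

Every iteration touches one index entry, which is why materializing full Python tuples per entry dominates the cost.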
Pulkit Goyal	dc7fe61263	py3: catch binascii.Error raised from binascii.unhexlify Before Python 3, binascii.unhexlify raised TypeError; now it raises binascii.Error.	2017-06-20 22:11:46 +05:30
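A portable way to catch both exception types, as a sketch; `parse_hex_node` is a hypothetical helper name. On Python 3, `binascii.Error` is a subclass of `ValueError`.

```python
import binascii
from typing import Optional


def parse_hex_node(hexnode: bytes) -> Optional[bytes]:
    """Decode a hex node id, returning None if it is not valid hex."""
    try:
        return binascii.unhexlify(hexnode)
    except (TypeError, binascii.Error):
        # TypeError is what Python 2 raised; binascii.Error is Python 3's.
        return None
```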
Martin von Zweigbergk	daced3540c	revlog: rename list of nodes from "content" to "nodes" It seems like the reason for "content" is that the variable contains the nodes that the changegroup "contains", see c2901cddb53f (revlog: make addgroup returns a list of node contained in the added source, 2012-01-13), but "nodes" seems much clearer.	2017-06-15 13:42:35 -07:00
Martin von Zweigbergk	12dbf027df	revlog: delete obsolete comment The comment seems to refer to code that was deleted in a46638c640d8 (revlog.addgroup(): always use _addrevision() to add new revlog entries, 2010-10-08).	2017-06-15 13:25:41 -07:00
Martin von Zweigbergk	f0c7377259	revlog: delete dead assignment in addgroup()	2017-06-15 13:23:21 -07:00
Gregory Szorc	efbb740737	revlog: skeleton support for version 2 revlogs There are a number of improvements we want to make to revlogs that will require a new version - version 2. It is unclear what the full set of improvements will be or when we'll be done with them. What I do know is that the process will likely take longer than a single release, will require input from various stakeholders to evaluate changes, and will have many contentious debates and bikeshedding. It is unrealistic to develop revlog version 2 up front: there are just too many uncertainties that we won't know until things are implemented and experiments are run. Some changes will also be invasive and prone to bit rot, so sitting on dozens of patches is not practical. This commit introduces skeleton support for version 2 revlogs in a way that is flexible and not bound by backwards compatibility concerns. An experimental repo requirement for denoting revlog v2 has been added. The requirement string has a sub-version component to it. This will allow us to declare multiple requirements in the course of developing revlog v2. Whenever we change the in-development revlog v2 format, we can tweak the string, creating a new requirement and locking out old clients. This will allow us to make as many backwards incompatible changes and experiments to revlog v2 as we want. In other words, we can land code and make meaningful progress towards revlog v2 while still maintaining extreme format flexibility up until the point we freeze the format and remove the experimental labels. To enable the new repo requirement, you must supply an experimental and undocumented config option. But not just any boolean flag will do: you need to explicitly use a value that no sane person should ever type. This is an additional guard against enabling revlog v2 on an installation it shouldn't be enabled on. 
The specific scenario I'm trying to prevent is, say, a user with a 4.4 client with a frozen format enabling the option but then downgrading to 4.3 and accidentally creating repos with an outdated and unsupported repo format. Requiring a "challenge" string should prevent this. Because the format is not yet finalized and I don't want to take any chances, revlog v2's version is currently 0xDEAD. I figure squatting on a value we're likely never to use as an actual revlog version to mean "internal testing only" is acceptable. And "dead" is easily recognized as something meaningful. There is a bunch of cleanup that is needed before work on revlog v2 begins in earnest. I plan on doing that work once this patch is accepted and we're comfortable with the idea of starting down this path.	2017-05-19 20:29:11 -07:00
Yuya Nishihara	685172007c	revlog: add support for partial matching of wdir node id The idea is simple. If the given node id prefix is 'ff...f', add +1 to the number of matches (e.g. ambiguous if partial + maybewdir > 1). This patch also fixes id() revset and shortest() template since _partialmatch() can raise WdirUnsupported exception.	2016-08-19 18:26:04 +09:00
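A sketch of the counting rule described above, over hex strings. `partial_match`, `AmbiguousPrefix`, and the local `WdirUnsupported` class are hypothetical stand-ins, not Mercurial's API; the working-directory id is assumed to be 40 'f' characters.

```python
from typing import Iterable, Optional

WDIR_HEX = "f" * 40  # assumed working-directory pseudo node id


class AmbiguousPrefix(Exception):
    """Hypothetical: more than one candidate matches the prefix."""


class WdirUnsupported(Exception):
    """Hypothetical stand-in for the exception of the same name."""


def partial_match(prefix: str, nodes: Iterable[str]) -> Optional[str]:
    """Resolve a hex prefix, counting the wdir id as one extra candidate."""
    matches = [n for n in nodes if n.startswith(prefix)]
    # A prefix made only of 'f' characters could also denote wdir.
    maybewdir = bool(prefix) and WDIR_HEX.startswith(prefix)
    if len(matches) + maybewdir > 1:
        raise AmbiguousPrefix(prefix)
    if maybewdir:
        raise WdirUnsupported(prefix)
    return matches[0] if matches else None
```

Callers such as id() or shortest() then have to handle the wdir exception, which is why the commit touches them too.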
Yuya Nishihara	b42457ae0a	revlog: map rev(wdirid) to WdirUnsupported exception This will allow us to map repo["ff..."] to workingctx. _partialmatch() will be updated later. I tried "return wdirrev" in place of raising the exception, but earlier exception seemed better.	2016-08-20 22:37:58 +09:00
Pulkit Goyal	dfd06a1929	revlog: raise error.WdirUnsupported from revlog.node() if wdirrev is passed When we run `hg debugrevspec 'branch(wdir())'`, it throws an index error and blows up. Let's raise WdirUnsupported if wdir() is passed so that we can catch it later.	2017-05-23 01:30:36 +05:30
Pulkit Goyal	26a5b62b59	revlog: raise WdirUnsupported when wdirrev is passed revlog.parentrevs() is called while evaluating the ^ operator in revsets. When wdir is passed, it raises IndexError. This patch raises WdirUnsupported if wdir is passed to the function. The error will be caught in future patches.	2017-05-19 19:12:06 +05:30
Gregory Szorc	7d51a8278d	revlog: remove some revlogNG terminology RevlogNG is not such a good name when it is no longer the newest revlog version. Since we'll soon have revlog version 2, let's remove some references to it.	2017-05-19 20:14:31 -07:00
Gregory Szorc	5bcef1853c	revlog: tweak wording and logic for flags validation First, the logic around the if..elif..elif was subtly wrong and sub-optimal because all branches would be tested as long as the revlog was valid. This patch changes things so it behaves like a switch statement over the revlog version. While I was here, I also tweaked error strings to make them consistent and to read better.	2017-05-19 20:10:50 -07:00
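Dispatching once on the version, like a switch statement, can be sketched as below. The constant names follow the flag-constant naming convention described in this log, and 0xDEAD is the in-development v2 version mentioned above; the bit values of the flags and the set of flags each version accepts are assumptions for illustration.

```python
REVLOGV0 = 0
REVLOGV1 = 1
REVLOGV2 = 0xDEAD              # in-development version, per the log above
FLAG_INLINE_DATA = 1 << 16     # assumed bit values
FLAG_GENERALDELTA = 1 << 17

# Assumption: the feature flags each version accepts.
ALLOWED_FLAGS = {
    REVLOGV0: 0,
    REVLOGV1: FLAG_INLINE_DATA | FLAG_GENERALDELTA,
    REVLOGV2: FLAG_INLINE_DATA | FLAG_GENERALDELTA,
}


class RevlogError(Exception):
    pass


def parse_header(versionfield: int):
    """Split a header into (format, flags) and validate the flags."""
    fmt = versionfield & 0xFFFF       # low 16 bits: format version
    flags = versionfield & ~0xFFFF    # high bits: feature flags
    # Look the version up once instead of testing every branch.
    allowed = ALLOWED_FLAGS.get(fmt)
    if allowed is None:
        raise RevlogError("unknown version (%d)" % fmt)
    unknown = flags & ~allowed
    if unknown:
        raise RevlogError("unknown flags (%#x) in version %d revlog"
                          % (unknown, fmt))
    return fmt, flags
```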
Yuya Nishihara	4563e16232	parsers: switch to policy importer # no-check-commit	2016-08-13 12:23:56 +09:00
Gregory Szorc	8af088ee65	revlog: rename constants (API) Feature flag constants don't need "NG" in the name because they will presumably apply to non-"NG" version revlogs. All feature flag constants should also share a similar naming convention to identify them as such. And, "RevlogNG" isn't a great internal name since it isn't obvious it maps to version 1 revlogs. Plus, "NG" (next generation) is only a good name as long as it is the latest version. Since we're talking about version 2, now is as good a time as any to move on from that naming.	2017-05-17 19:52:18 -07:00
Jun Wu	718861a5f7	changelog: make sure datafile is 00changelog.d (API) 0ad0d26ff7 makes it possible for changelog datafile to be "00changelog.i.d", which is wrong. This patch adds an explicit datafile parameter to fix it.	2017-05-17 20:14:27 -07:00
Martin von Zweigbergk	c3406ac3db	cleanup: use set literals We no longer support Python 2.6, so we can now use set literals.	2017-02-10 16:56:29 -08:00
Jun Wu	4656f56bb3	flagprocessor: add a fast path when flags is 0 When flags is 0, _processflags could be a no-op instead of iterating through the flag bits.	2017-05-10 16:17:58 -07:00
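A sketch of the fast path: when no flag bits are set there is nothing to dispatch, so the text is returned untouched. The registry shape (flag bit mapped to a function returning the new text and a validate-hash boolean) and the name `process_flags` are assumptions; the real framework also fixes a processing order and rejects unknown flags.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: flag bit -> transform(text) -> (newtext, validatehash)
Processor = Callable[[bytes], Tuple[bytes, bool]]


def process_flags(text: bytes, flags: int,
                  processors: Dict[int, Processor]) -> Tuple[bytes, bool]:
    """Apply every registered processor whose bit is set in `flags`."""
    if flags == 0:
        # Fast path: no bits set, so skip iterating over the registry.
        return text, True
    validatehash = True
    for bit in sorted(processors):
        if flags & bit:
            text, vhash = processors[bit](text)
            validatehash = validatehash and vhash
    return text, validatehash
```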
Jun Wu	2c11c92a85	revlog: move part of "addrevision" to "addrawrevision" "addrawrevision" will be the public API to reuse revision rawdata elsewhere. It will be used by a future patch.	2017-05-09 21:27:06 -07:00
Gregory Szorc	5d6e940365	revlog: rename _chunkraw to _getsegmentforrevs() This completes our rename of internal revlog methods to distinguish between low-level raw revlog data "segments" and higher-level, per-revision "chunks." perf.py has been updated to consult both names so it will work against older Mercurial versions.	2017-05-06 12:12:53 -07:00
Gregory Szorc	46413ff643	revlog: rename internal functions containing "chunk" to use "segment" Currently, "chunk" is overloaded in revlog terminology to mean multiple things. One of them refers to a segment of raw data from the revlog. This commit renames various methods only used within revlog.py to have "segment" in their name instead of "chunk." While I was here, I also made the names more descriptive. e.g. "_loadchunk()" becomes "_readsegment()" because it actually does I/O.	2017-05-06 12:02:12 -07:00
Jun Wu	0606028aff	revlog: make "size" diverge from "rawsize" Previously, revlog.size was equal to revlog.rawsize. However, the flag processor framework can make them differ: "size" should mean len(revision(raw=False)), while "rawsize" means len(revision(raw=True)). This patch makes it so. This corrects "hg status" output when a flag processor is involved. The call stack looks like: basectx.status -> workingctx._buildstatus -> workingctx._dirstatestatus -> workingctx._checklookup -> filectx.cmp -> filelog.cmp -> filelog.size -> revlog.size	2017-04-09 12:53:31 -07:00
Jun Wu	e557e14680	revlog: avoid applying delta chain on cache hit Previously, revlog.revision(raw=False) could try to apply the delta chain on a _cache hit. That happens if flags are non-empty. This patch reuses rawtext so delta chain application is avoided. "_cache" and "rev" are moved a bit to avoid unnecessary assignments.	2017-04-02 18:40:13 -07:00
Jun Wu	5f26616d71	revlog: indent block to make review easier	2017-04-02 18:29:24 -07:00
Jun Wu	2ab18ee566	revlog: avoid calculating "flags" twice in revision() This is more consistent with other code in "revision()" - prefer performance to code length.	2017-04-02 18:25:12 -07:00
Jun Wu	20165e0767	revlog: use raw revision for rawsize When writing the revlog-ng index, the third field is len(rawtext). See revlog._addrevision: `textlen = len(rawtext)` .... `e = (offset_type(offset, flags), l, textlen, base, link, p1r, p2r, node)`; `self.index.insert(-1, e)`. Therefore, revlog.index[rev][2] returned by revlog.rawsize should be len(rawtext), where "rawtext" is revlog.revision(raw=True). Unfortunately it's hard to add a test for this code path because "if l >= 0" catches most cases.	2017-04-02 18:57:03 -07:00
Jun Wu	7151069c4a	revlog: add a fast path for revision(raw=False) If the cache hits and flags are empty, no flag processor runs and "text" equals "rawtext". So we check the flags and return rawtext. This resolves the performance issue introduced by a previous patch.	2017-03-30 21:21:15 -07:00
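The cache behaviour from this commit and the related cache-hit change (e557e14680) can be sketched with a toy class. Everything here (`TinyRevlog`, its callables, the `rebuilds` counter) is hypothetical; the point is that the cache holds rawtext, a hit with empty flags or raw=True returns it directly, and a hit with flags reuses rawtext instead of reapplying the delta chain.

```python
from typing import Callable, Optional, Tuple


class TinyRevlog:
    """Hypothetical toy model of revision() caching; not Mercurial's class."""

    def __init__(self, rebuild: Callable[[int], bytes],
                 flags: Callable[[int], int],
                 transform: Callable[[bytes], bytes]) -> None:
        self._rebuild = rebuild        # stands in for applying the delta chain
        self._flags = flags            # per-revision flag bits
        self._transform = transform    # stands in for the flag processors
        self._cache: Optional[Tuple[int, bytes]] = None  # (rev, rawtext)
        self.rebuilds = 0              # how often the delta chain was applied

    def revision(self, rev: int, raw: bool = False) -> bytes:
        flags = self._flags(rev)       # computed once, reused below
        rawtext = None
        if self._cache is not None and self._cache[0] == rev:
            rawtext = self._cache[1]
            if raw or flags == 0:
                # Fast path: no processor would run, so text == rawtext.
                return rawtext
        if rawtext is None:
            self.rebuilds += 1
            rawtext = self._rebuild(rev)
            self._cache = (rev, rawtext)  # the cache always stores rawtext
        if raw or flags == 0:
            return rawtext
        return self._transform(rawtext)
```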
Jun Wu	ae8c9ce375	revlog: make _addrevision only accept rawtext All 3 users of _addrevision use raw: - addrevision: passing rawtext to _addrevision - addgroup: passing rawtext and raw=True to _addrevision - clone: passing rawtext to _addrevision There is no real user using _addrevision(raw=False). On the other hand, _addrevision is low-level code dealing with raw revlog deltas and rawtexts. It should not transform rawtext to non-raw text. This patch removes the "raw" parameter from "_addrevision", and does some rename and doc change to make it clear that "_addrevision" expects rawtext. Archeology shows 886a08012bbe added "raw" flag to "_addrevision", follow-ups fe1e206cb389 and 1cfa6239c923 seem to make the flag unnecessary. test-revlog-raw.py no longer complains.	2017-03-30 18:38:03 -07:00
Jun Wu	9a6035a980	revlog: use raw revisions in clone test-revlog-raw.py now shows "clone test passed", but there is more to fix.	2017-03-30 18:24:23 -07:00
Jun Wu	2468c838bd	revlog: use raw revisions in revdiff See the added comment. revdiff is meant to output the raw delta that will be written to revlog. It should use raw. test-revlog-raw.py now shows "addgroupcopy test passed", but there is more to fix.	2017-03-30 18:23:27 -07:00
Jun Wu	558f5cce61	revlog: use raw content when building delta Using external content provided by flagprocessor when building revlog delta is wrong, because deltas are applied to raw contents in revlog. This patch fixes the above issue by adding "raw=True". test-revlog-raw.py now shows "local test passed", but there is more to fix.	2017-03-30 17:58:03 -07:00
Jun Wu	50b232c61f	revlog: fix _cache usage in revision() As documented in revlog.__init__, revlog._cache stores raw text. The current read and write usage of "_cache" in revlog.revision lacks a raw=True check. This patch fixes that by adding a check for raw and storing rawtext explicitly in _cache. Note: it may slow down the cache-hit code path when raw=False and flags=0. That performance issue will be fixed in a later patch. test-revlog-raw now points us to a new problem.	2017-03-30 15:34:08 -07:00
Jun Wu	fec2bbc9e9	revlog: rename some "text"s to "rawtext" This makes code easier to understand. "_addrevision" is left untouched - it will be changed in a later patch.	2017-03-30 14:56:09 -07:00
Jun Wu	b483b1b607	revlog: clarify flagprocessor documentation The words "text", "newtext", "bool" could be confusing. Use explicit "text" or "rawtext" and document more about the "bool".	2017-03-30 07:59:48 -07:00
Jun Wu	7f99b86dbd	revlog: avoid unnecessary node -> rev conversion	2017-03-29 16:23:04 -07:00
Yuya Nishihara	5a92909ce0	py3: fix slicing of byte string in revlog.compress() I tried .startswith('\0'), but data wasn't always bytes or a bytearray.	2017-03-26 17:12:06 +09:00
Augie Fackler	8471f9f823	revlog: use pycompat.maplist to eagerly evaluate map on Python 3 According to Pulkit, this should fix `hg status --all` on Python 3.	2017-03-21 17:39:49 -04:00
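On Python 3, `map()` returns a one-shot iterator rather than a list, so code that iterates twice or indexes the result silently breaks. A minimal eager helper, assumed here to behave like `pycompat.maplist`:

```python
def maplist(*args):
    """Apply map() eagerly and return a real list on any Python version."""
    return list(map(*args))


lazy = map(str, [1, 2, 3])
first_pass = list(lazy)
second_pass = list(lazy)  # empty on Python 3: the iterator is exhausted

eager = maplist(str, [1, 2, 3])  # a list: safe to reuse and to index
```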
Augie Fackler	e0f1b901d8	revlog: use int instead of long By my reading of PEP 237[0], this is completely safe and has been since Python 2.2. 0: https://www.python.org/dev/peps/pep-0237/	2017-03-19 01:05:28 -04:00
Augie Fackler	bc09440907	revlog: use bytes() instead of str() to get data from memoryview Fixes `files -v` on Python 3.	2017-03-12 15:27:02 -04:00
Augie Fackler	03a50eb15f	revlog: use bytes() to ensure text from _chunks is a reasonable type	2017-03-12 03:32:38 -04:00
Augie Fackler	58dedd9fd0	revlog: extract first byte of revlog with a slice so it's portable	2017-03-12 00:49:49 -05:00
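The Python 3 pitfalls behind this commit and bc09440907 can be shown with the standard library alone:

```python
data = b"xabc"

# Indexing bytes gives an int on Python 3 (a 1-char str on Python 2).
first_by_index = data[0]

# Slicing gives bytes on both, so comparisons against b"x" stay portable.
first_by_slice = data[0:1]

# str() of a memoryview is only a repr such as "<memory at 0x...>" on
# Python 3; bytes() copies out the underlying data instead.
view = memoryview(data)[1:]
as_bytes = bytes(view)
```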
Martin von Zweigbergk	ad5f4ef8a6	revlog: give EXTSTORED flag value to narrowhg Narrowhg has been using "1 << 14" as its revlog flag value for a long time. We (Google) have many repos with that value in production already. When the same value was reserved for EXTSTORED, it made those repos invalid. Upgrading them will be a little painful. We should clearly have reserved the value for narrowhg a long time ago. Since the EXTSTORED flag is not yet in any release, and Facebook also says they have not started using it in production, it should be okay to change it. This patch gives the current value (1 << 14) back to narrowhg and gives a new value (1 << 13) to EXTSTORED.	2017-01-17 11:25:02 -08:00
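Revlog flags live in the low 16 bits of the first index field, which packs the data-file offset as `offset << 16 | flags` (compare the `offset_type(offset, flags)` call quoted in 20165e0767). A sketch of that packing and of the two values discussed here; `NARROWHG_FLAG` and `split_offset_type` are hypothetical names, and the 16-bit range check is a simplification of whatever validation the real code does.

```python
REVIDX_EXTSTORED = 1 << 13  # new value given to EXTSTORED by this commit
NARROWHG_FLAG = 1 << 14     # hypothetical name for narrowhg's long-used bit


def offset_type(offset: int, flags: int) -> int:
    """Pack a data-file offset and per-revision flags into one integer."""
    if flags & ~0xFFFF:
        raise ValueError("flags must fit in the low 16 bits")
    return (offset << 16) | flags


def split_offset_type(field: int) -> tuple:
    """Inverse of offset_type(): return (offset, flags)."""
    return field >> 16, field & 0xFFFF
```

Because every revision's flags share those 16 bits, two users claiming the same bit cannot coexist in one repository, which is the collision this commit resolves.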
Gregory Szorc	765aada92f	localrepo: experimental support for non-zlib revlog compression The final part of integrating the compression manager APIs into revlog storage is the plumbing for repositories to advertise they are using non-zlib storage and for revlogs to instantiate a non-zlib compression engine. The main intent of the compression manager work was to zstd all of the things. Adding zstd to revlogs has proved to be more involved than other places because revlogs are... special. Very small inputs and the use of delta chains (which are themselves a form of compression) are a completely different use case from streaming compression, which bundles and the wire protocol employ. I've conducted numerous experiments with zstd in revlogs and have yet to formalize compression settings and a storage architecture that I'm confident I won't regret later. In other words, I'm not yet ready to commit to a new mechanism for using zstd - or any other compression format - in revlogs. That being said, having some support for zstd (and other compression formats) in revlogs in core is beneficial. It can allow others to conduct experiments. This patch introduces highly experimental support for non-zlib compression formats in revlogs. Introduced is a config option to control which compression engine to use. Also introduced is a namespace of "exp-compression-" requirements to denote support for non-zlib compression in revlogs. I've prefixed the namespace with "exp-" (short for "experimental") because I'm not confident of the requirements "schema" and in no way want to give the illusion of supporting these requirements in the future. I fully intend to drop support for these requirements once we figure out what we're doing with zstd in revlogs. A good portion of the patch is teaching the requirements system about registered compression engines and passing the requested compression engine as an opener option so revlogs can instantiate the proper compression engine for new operations. 
That's a verbose way of saying "we can now use zstd in revlogs!" On an `hg pull` conversion of the mozilla-unified repo with no extra redelta settings (like aggressivemergedeltas), we can see the impact of zstd vs zlib in revlogs: $ hg perfrevlogchunks -c ! chunk ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5) ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6) ! chunk batch ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6) ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6) $ hg perfrevlogchunks -m ! chunk ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4) ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5) ! chunk batch ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4) ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6) $ hg perfrevlog -c -d 1 ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3) ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3) $ hg perfrevlog -m -d 1000 ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3) ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3) Only modest wins for the changelog. But manifest reading is significantly faster. What's going on? One reason might be data volume. zstd decompresses faster. So given more bytes, it will put more distance between it and zlib. Another reason is size. In the current design, zstd revlogs are larger*: debugcreatestreamclonebundle (size in bytes) zlib: 1,638,852,492 zstd: 1,680,601,332 I haven't investigated this fully, but I reckon a significant cause of larger revlogs is that the zstd frame/header has more bytes than zlib's. For very small inputs or data that doesn't compress well, we'll tend to store more uncompressed chunks than with zlib (because the compressed size isn't smaller than original). This will make revlog reading faster because it is doing less decompression. 
Moving on to bundle performance: $ hg bundle -a -t none-v2 (total CPU time) zlib: 102.79s zstd: 97.75s So, marginal CPU decrease for reading all chunks in all revlogs (this is somewhat disappointing). $ hg bundle -a -t <engine>-v2 (total CPU time) zlib: 191.59s zstd: 115.36s This last test effectively measures the difference between zlib->zlib and zstd->zstd for revlogs to bundle. This is a rough approximation of what a server does during `hg clone`. There are some promising results for zstd. But not enough for me to feel comfortable advertising it to users. We'll get there...	2017-01-13 20:16:56 -08:00
Gregory Szorc	94d36bba2d	revlog: use compression engine APIs for decompression Now that compression engines declare their header in revlog chunks and can decompress revlog chunks, we refactor revlog.decompress() to use them. Making full use of the property that revlog compressor objects are reusable, revlog instances now maintain a dict mapping an engine's revlog header to a compressor object. This is not only a performance optimization for engines where compressor object reuse can result in better performance, but it also serves as a cache of header values so we don't need to perform redundant lookups against the compression engine manager. (Yes, I measured and the overhead of a function call versus a dict lookup was observed.) Replacing the previous inline lookup table with a dict lookup was measured to make chunk reading ~2.5% slower on changelogs and ~4.5% slower on manifests. So, the inline lookup table has been mostly preserved so we don't lose performance. This is unfortunate. But many decompression operations complete in microseconds, so Python attribute lookup, dict lookup, and function calls do matter. The impact of this change on mozilla-unified is as follows: $ hg perfrevlogchunks -c ! chunk ! wall 1.953663 comb 1.950000 user 1.920000 sys 0.030000 (best of 6) ! wall 1.946000 comb 1.940000 user 1.910000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.791075 comb 1.800000 user 1.760000 sys 0.040000 (best of 6) ! wall 1.785690 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) $ hg perfrevlogchunks -m ! chunk ! wall 2.587262 comb 2.580000 user 2.550000 sys 0.030000 (best of 4) ! wall 2.616330 comb 2.610000 user 2.560000 sys 0.050000 (best of 4) ! chunk batch ! wall 2.427092 comb 2.420000 user 2.400000 sys 0.020000 (best of 5) ! wall 2.462061 comb 2.460000 user 2.400000 sys 0.060000 (best of 4) Changelog chunk reading is slightly faster but manifest reading is slower. What gives? 
On this repo, 99.85% of changelog entries are zlib compressed (the 'x' header). On the manifest, 67.5% are zlib and 32.4% are '\0'. This patch swapped the test order of 'x' and '\0' so now 'x' is tested first. This makes changelogs faster since they almost always hit the first branch. This makes a significant percentage of manifest '\0' chunks slower because that code path now performs an extra test. Yes, I too can't believe we're able to measure the impact of an if..elif with simple string compares. I reckon this code would benefit from being written in C...	2017-01-13 19:58:00 -08:00
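A sketch of the header dispatch, with 'x' tested first as described. It covers only the three built-in headers; engine lookup for other headers is omitted, and `decompress` here is a standalone function rather than the revlog method.

```python
import zlib


def decompress(data: bytes) -> bytes:
    """Decode one revlog chunk by its first byte, testing 'x' first."""
    if not data:
        return data
    t = data[0:1]
    if t == b"x":
        # zlib.compress output with the default 32 KiB window starts with 0x78.
        return zlib.decompress(data)
    if t == b"\0":
        # Stored uncompressed; the NUL byte is part of the data itself.
        return data
    if t == b"u":
        # Stored uncompressed behind an explicit 'u' marker, stripped here.
        return data[1:]
    raise ValueError("unknown compression type %r" % t)
```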
Gregory Szorc	24c1205d69	revlog: use compression engine API for compression This commit swaps in the just-added revlog compressor API into the revlog class. Instead of implementing zlib compression inline in compress(), we now store a cached-on-first-use revlog compressor on each revlog instance and invoke its "compress()" method. As part of this, revlog.compress() has been refactored a bit to use a cleaner code flow and modern formatting (e.g. avoiding parentheses around returned tuples). On a mozilla-unified repo, here are the "compress" times for a few commands: $ hg perfrevlogchunks -c ! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3) ! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3) $ hg perfrevlogchunks -m ! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3) ! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3) Compression times did seem to slow down just a little. There are 360,210 changelog revisions and 359,342 manifest revisions. For the changelog, mean time to compress a revision increased from ~16.025us to ~16.088us. That's basically a function call or an attribute lookup. I suppose this is the price you pay for abstraction. It's so low that I'm not concerned.	2017-01-02 11:22:52 -08:00
Gregory Szorc	1a6670d670	revlog: move decompress() from module to revlog class (API) Upcoming patches will convert revlogs to use the compression engine APIs to perform all things compression. The yet-to-be-introduced APIs support a persistent "compressor" object so the same object can be reused for multiple compression operations, leading to better performance. In addition, compression engines like zstd may wish to tweak compression engine state based on the revlog (e.g. per-revlog compression dictionaries). A global and shared decompress() function will shortly no longer make much sense. So, we move decompress() to be a method of the revlog class. It joins compress() there. On the mozilla-unified repo, we can measure the impact of this change on reading performance: $ hg perfrevlogchunks -c ! chunk ! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6) ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6) ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) "chunk" appeared to become slower but "chunk batch" got faster. Upon further examination by running both sets multiple times, the numbers appear to converge across all runs. This tells me that there is no perceived performance impact to this refactor.	2017-01-02 13:00:16 -08:00
Gregory Szorc	df8167ed29	revlog: make compressed size comparisons consistent revlog.compress() compares the compressed size to the input size and throws away the compressed data if it is larger than the input. This is the correct thing to do, as storing compressed data that is larger than the input takes up more storage space and makes reading slower. However, the comparison was implemented inconsistently. For the streaming compression mode, we threw away the result if it was greater than or equal to the input size. But for the one-shot compression, we threw away the compression only if it was greater than the input size! This patch changes the comparison for the simple case so it is consistent with the streaming case. As a few tests demonstrate, this adds 1 byte to some revlog entries. This is because of an added 'u' header on the chunk. It seems somewhat wrong to increase the revlog size here. However, IMO the cost of 1 byte in storage is insignificant compared to the performance gains of avoiding decompression. This patch should invite questions around the heuristic for throwing away compressed data. For example, I'd argue we should be more liberal about rejecting compressed data, additionally doing so where the number of bytes saved fails to reach a threshold. But we can have this discussion another time.	2017-01-02 11:50:17 -08:00
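A sketch of the now-consistent rule, using zlib only: compressed output is kept only when it is strictly smaller than the input; otherwise the data is stored raw behind a 'u' marker (the extra byte mentioned above), unless it already starts with a NUL byte. The function shape is an assumption; the real method also deals with pluggable engines and other details omitted here.

```python
import zlib
from typing import Tuple


def compress(data: bytes) -> Tuple[bytes, bytes]:
    """Return (header, payload); concatenated, they form the stored chunk."""
    if not data:
        return b"", data
    compressed = zlib.compress(data)
    # Consistent rule: keep compressed output only if strictly smaller.
    if len(compressed) < len(data):
        return b"", compressed  # zlib output carries its own 'x' byte
    if data[0:1] == b"\0":
        return b"", data        # a leading NUL already means "uncompressed"
    return b"u", data           # 1-byte marker for raw storage
```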
Gregory Szorc	4dbc7459c8	revlog: add clone method Upcoming patches will introduce functionality for in-place repository/store "upgrades." Copying the contents of a revlog feels sufficiently low-level to warrant being in the revlog class. So this commit implements that functionality. Because full delta recomputation can be very expensive (we're talking several hours on the Firefox repository), we support multiple modes of execution with regards to delta (re)use. This will allow repository upgrades to choose the "level" of processing/optimization they wish to perform when converting revlogs. It's not obvious from this commit, but "addrevisioncb" will be used for progress reporting.	2016-12-18 17:02:57 -08:00
Remi Chaintron	66071d6de5	revlog: REVIDX_EXTSTORED flag This flag will be used by the lfs extension to mark the revision data as stored externally.	2017-01-05 17:16:51 +00:00


597 Commits