sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-08 15:57:43 +03:00

Author	SHA1	Message	Date
Jun Wu	7f99b86dbd	revlog: avoid unnecessary node -> rev conversion	2017-03-29 16:23:04 -07:00
Yuya Nishihara	5a92909ce0	py3: fix slicing of byte string in revlog.compress() I tried .startswith('\0'), but data wasn't always a bytes nor a bytearray.	2017-03-26 17:12:06 +09:00
Augie Fackler	8471f9f823	revlog: use pycompat.maplist to eagerly evaluate map on Python 3 According to Pulkit, this should fix `hg status --all` on Python 3.	2017-03-21 17:39:49 -04:00
Augie Fackler	e0f1b901d8	revlog: use int instead of long By my reading of PEP 237[0], this is completely safe and has been since Python 2.2. 0: https://www.python.org/dev/peps/pep-0237/	2017-03-19 01:05:28 -04:00
Augie Fackler	bc09440907	revlog: use bytes() instead of str() to get data from memoryview Fixes `files -v` on Python 3.	2017-03-12 15:27:02 -04:00
Augie Fackler	03a50eb15f	revlog: use bytes() to ensure text from _chunks is a reasonable type	2017-03-12 03:32:38 -04:00
Augie Fackler	58dedd9fd0	revlog: extract first byte of revlog with a slice so it's portable	2017-03-12 00:49:49 -05:00
Martin von Zweigbergk	ad5f4ef8a6	revlog: give EXTSTORED flag value to narrowhg Narrowhg has been using "1 << 14" as its revlog flag value for a long time. We (Google) have many repos with that value in production already. When the same value was reserved for EXTSTORED, it made those repos invalid. Upgrading them will be a little painful. We should clearly have reserved the value for narrowhg a long time ago. Since the EXTSTORED flag is not yet in any release and Facebook also says they have not started using it in production, so it should be okay to change it. This patch gives the current value (1 << 14) back to narrowhg and gives a new value (1 << 13) to EXTSTORED.	2017-01-17 11:25:02 -08:00
Gregory Szorc	765aada92f	localrepo: experimental support for non-zlib revlog compression The final part of integrating the compression manager APIs into revlog storage is the plumbing for repositories to advertise they are using non-zlib storage and for revlogs to instantiate a non-zlib compression engine. The main intent of the compression manager work was to zstd all of the things. Adding zstd to revlogs has proved to be more involved than other places because revlogs are... special. Very small inputs and the use of delta chains (which are themselves a form of compression) are a completely different use case from streaming compression, which bundles and the wire protocol employ. I've conducted numerous experiments with zstd in revlogs and have yet to formalize compression settings and a storage architecture that I'm confident I won't regret later. In other words, I'm not yet ready to commit to a new mechanism for using zstd - or any other compression format - in revlogs. That being said, having some support for zstd (and other compression formats) in revlogs in core is beneficial. It can allow others to conduct experiments. This patch introduces highly experimental support for non-zlib compression formats in revlogs. Introduced is a config option to control which compression engine to use. Also introduced is a namespace of "exp-compression-" requirements to denote support for non-zlib compression in revlogs. I've prefixed the namespace with "exp-" (short for "experimental") because I'm not confident of the requirements "schema" and in no way want to give the illusion of supporting these requirements in the future. I fully intend to drop support for these requirements once we figure out what we're doing with zstd in revlogs. A good portion of the patch is teaching the requirements system about registered compression engines and passing the requested compression engine as an opener option so revlogs can instantiate the proper compression engine for new operations. That's a verbose way of saying "we can now use zstd in revlogs!" On an `hg pull` conversion of the mozilla-unified repo with no extra redelta settings (like aggressivemergedeltas), we can see the impact of zstd vs zlib in revlogs: $ hg perfrevlogchunks -c ! chunk ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5) ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6) ! chunk batch ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6) ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6) $ hg perfrevlogchunks -m ! chunk ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4) ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5) ! chunk batch ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4) ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6) $ hg perfrevlog -c -d 1 ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3) ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3) $ hg perfrevlog -m -d 1000 ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3) ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3) Only modest wins for the changelog. But manifest reading is significantly faster. What's going on? One reason might be data volume. zstd decompresses faster. So given more bytes, it will put more distance between it and zlib. Another reason is size. In the current design, zstd revlogs are larger*: debugcreatestreamclonebundle (size in bytes) zlib: 1,638,852,492 zstd: 1,680,601,332 I haven't investigated this fully, but I reckon a significant cause of larger revlogs is that the zstd frame/header has more bytes than zlib's. For very small inputs or data that doesn't compress well, we'll tend to store more uncompressed chunks than with zlib (because the compressed size isn't smaller than original). This will make revlog reading faster because it is doing less decompression. Moving on to bundle performance: $ hg bundle -a -t none-v2 (total CPU time) zlib: 102.79s zstd: 97.75s So, marginal CPU decrease for reading all chunks in all revlogs (this is somewhat disappointing). $ hg bundle -a -t <engine>-v2 (total CPU time) zlib: 191.59s zstd: 115.36s This last test effectively measures the difference between zlib->zlib and zstd->zstd for revlogs to bundle. This is a rough approximation of what a server does during `hg clone`. There are some promising results for zstd. But not enough for me to feel comfortable advertising it to users. We'll get there...	2017-01-13 20:16:56 -08:00
Gregory Szorc	94d36bba2d	revlog: use compression engine APIs for decompression Now that compression engines declare their header in revlog chunks and can decompress revlog chunks, we refactor revlog.decompress() to use them. Making full use of the property that revlog compressor objects are reusable, revlog instances now maintain a dict mapping an engine's revlog header to a compressor object. This is not only a performance optimization for engines where compressor object reuse can result in better performance, but it also serves as a cache of header values so we don't need to perform redundant lookups against the compression engine manager. (Yes, I measured and the overhead of a function call versus a dict lookup was observed.) Replacing the previous inline lookup table with a dict lookup was measured to make chunk reading ~2.5% slower on changelogs and ~4.5% slower on manifests. So, the inline lookup table has been mostly preserved so we don't lose performance. This is unfortunate. But many decompression operations complete in microseconds, so Python attribute lookup, dict lookup, and function calls do matter. The impact of this change on mozilla-unified is as follows: $ hg perfrevlogchunks -c ! chunk ! wall 1.953663 comb 1.950000 user 1.920000 sys 0.030000 (best of 6) ! wall 1.946000 comb 1.940000 user 1.910000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.791075 comb 1.800000 user 1.760000 sys 0.040000 (best of 6) ! wall 1.785690 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) $ hg perfrevlogchunks -m ! chunk ! wall 2.587262 comb 2.580000 user 2.550000 sys 0.030000 (best of 4) ! wall 2.616330 comb 2.610000 user 2.560000 sys 0.050000 (best of 4) ! chunk batch ! wall 2.427092 comb 2.420000 user 2.400000 sys 0.020000 (best of 5) ! wall 2.462061 comb 2.460000 user 2.400000 sys 0.060000 (best of 4) Changelog chunk reading is slightly faster but manifest reading is slower. What gives? On this repo, 99.85% of changelog entries are zlib compressed (the 'x' header). On the manifest, 67.5% are zlib and 32.4% are '\0'. This patch swapped the test order of 'x' and '\0' so now 'x' is tested first. This makes changelogs faster since they almost always hit the first branch. This makes a significant percentage of manifest '\0' chunks slower because that code path now performs an extra test. Yes, I too can't believe we're able to measure the impact of an if..elif with simple string compares. I reckon this code would benefit from being written in C...	2017-01-13 19:58:00 -08:00
Gregory Szorc	24c1205d69	revlog: use compression engine API for compression This commit swaps in the just-added revlog compressor API into the revlog class. Instead of implementing zlib compression inline in compress(), we now store a cached-on-first-use revlog compressor on each revlog instance and invoke its "compress()" method. As part of this, revlog.compress() has been refactored a bit to use a cleaner code flow and modern formatting (e.g. avoiding parenthesis around returned tuples). On a mozilla-unified repo, here are the "compress" times for a few commands: $ hg perfrevlogchunks -c ! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3) ! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3) $ hg perfrevlogchunks -m ! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3) ! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3) Compression times did seem to slow down just a little. There are 360,210 changelog revisions and 359,342 manifest revisions. For the changelog, mean time to compress a revision increased from ~16.025us to ~16.088us. That's basically a function call or an attribute lookup. I suppose this is the price you pay for abstraction. It's so low that I'm not concerned.	2017-01-02 11:22:52 -08:00
Gregory Szorc	1a6670d670	revlog: move decompress() from module to revlog class (API) Upcoming patches will convert revlogs to use the compression engine APIs to perform all things compression. The yet-to-be-introduced APIs support a persistent "compressor" object so the same object can be reused for multiple compression operations, leading to better performance. In addition, compression engines like zstd may wish to tweak compression engine state based on the revlog (e.g. per-revlog compression dictionaries). A global and shared decompress() function will shortly no longer make much sense. So, we move decompress() to be a method of the revlog class. It joins compress() there. On the mozilla-unified repo, we can measure the impact of this change on reading performance: $ hg perfrevlogchunks -c ! chunk ! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6) ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6) ! chunk batch ! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6 ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6) "chunk" appeared to become slower but "chunk batch" got faster. Upon further examination by running both sets multiple times, the numbers appear to converge across all runs. This tells me that there is no perceived performance impact to this refactor.	2017-01-02 13:00:16 -08:00
Gregory Szorc	df8167ed29	revlog: make compressed size comparisons consistent revlog.compress() compares the compressed size to the input size and throws away the compressed data if it is larger than the input. This is the correct thing to do, as storing compressed data that is larger than the input takes up more storage space and makes reading slower. However, the comparison was implemented inconsistently. For the streaming compression mode, we threw away the result if it was greater than or equal to the input size. But for the one-shot compression, we threw away the compression only if it was greater than the input size! This patch changes the comparison for the simple case so it is consistent with the streaming case. As a few tests demonstrate, this adds 1 byte to some revlog entries. This is because of an added 'u' header on the chunk. It seems somewhat wrong to increase the revlog size here. However, IMO the cost of 1 byte in storage is insignificant compared to the performance gains of avoiding decompression. This patch should invite questions around the heuristic for throwing away compressed data. For example, I'd argue we should be more liberal about rejecting compressed data, additionally doing so where the number of bytes saved fails to reach a threshold. But we can have this discussion another time.	2017-01-02 11:50:17 -08:00
Gregory Szorc	4dbc7459c8	revlog: add clone method Upcoming patches will introduce functionality for in-place repository/store "upgrades." Copying the contents of a revlog feels sufficiently low-level to warrant being in the revlog class. So this commit implements that functionality. Because full delta recomputation can be very expensive (we're talking several hours on the Firefox repository), we support multiple modes of execution with regards to delta (re)use. This will allow repository upgrades to choose the "level" of processing/optimization they wish to perform when converting revlogs. It's not obvious from this commit, but "addrevisioncb" will be used for progress reporting.	2016-12-18 17:02:57 -08:00
Remi Chaintron	66071d6de5	revlog: REVIDX_EXTSTORED flag This flag will be used by the lfs extension to mark the revision data as stored externally.	2017-01-05 17:16:51 +00:00
Remi Chaintron	dfc79cbfc3	revlog: flag processor Add the ability for revlog objects to process revision flags and apply registered transforms on read/write operations. This patch introduces: - the 'revlog._processflags()' method that looks at revision flags and applies flag processors registered on them. Due to the need to handle non-commutative operations, flag transforms are applied in stable order but the order in which the transforms are applied is reversed between read and write operations. - the 'addflagprocessor()' method allowing to register processors on flags. Flag processors are defined as a 3-tuple of (read, write, raw) functions to be applied depending on the operation being performed. - an update on 'revlog.addrevision()' behavior. The current flagprocessor design relies on extensions to wrap around 'addrevision()' to set flags on revision data, and on the flagprocessor to perform the actual transformation of its contents. In the lfs case, this means we need to process flags before we meet the 2GB size check, leading to performing some operations before it happens: - if flags are set on the revision data, we assume some extensions might be modifying the contents using the flag processor next, and we compute the node for the original revision data (still allowing extension to override the node by wrapping around 'addrevision()'). - we then invoke the flag processor to apply registered transforms (in lfs's case, drastically reducing the size of large blobs). - finally, we proceed with the 2GB size check. Note: In the case a cachedelta is passed to 'addrevision()' and we detect the flag processor modified the revision data, we chose to trust the flag processor and drop the cachedelta.	2017-01-10 16:15:21 +00:00
Remi Chaintron	bd07cff7ec	revlog: pass revlog flags to addrevision Adding the ability to passing flags to addrevision instead of simply passing default flags to _addrevision will allow extensions relying on flag transforms to wrap around addrevision() in order to update revlog flags. The first use case of this patch will be the lfs extension marking nodes as stored externally when the contents are larger than the defined threshold. One of the reasons leading to setting flags in addrevision() wrappers in the flag processor design is that it allows to detect files larger than the 2GB limit before the check is performed, which allows lfs to transform the contents into metadata.	2017-01-05 17:16:07 +00:00
Remi Chaintron	6d11b9177b	revlog: add 'raw' argument to revision and _addrevision This patch introduces a new 'raw' argument (defaults to False) to revlog's revision() and _addrevision() methods. When the 'raw' argument is set to True, it indicates the revision data should be handled as raw data by the flagprocessor. Note: Given revlog.addgroup() calls are restricted to changegroup generation, we can always set raw to True when calling revlog._addrevision() from revlog.addgroup().	2017-01-05 17:16:07 +00:00
Remi Chaintron	cc88d4a3c4	revlog: merge hash checking subfunctions This patch factors the behavior of both methods into 'checkhash'.	2016-12-13 14:21:36 +00:00
Cotizo Sima	56dfac3f63	revlog: ensure that flags do not overflow 2 bytes This patch adds a line that ensures we are not setting by mistake a set of flags overlfowing the 2 bytes they are allocated. Given the way the data is packed in the revlog header, overflowing 2 bytes will result in setting a wrong offset.	2016-11-28 04:34:01 -08:00
Augie Fackler	2d9c1e8476	revlog: avoid shadowing several variables using list comprehensions	2016-11-10 16:34:43 -05:00
Gregory Szorc	83ab000007	revlog: optimize _chunkraw when startrev==endrev In many cases, _chunkraw() is called with startrev==endrev. When this is true, we can avoid an extra index lookup and some other minor operations. On the mozilla-unified repo, `hg perfrevlogchunks -c` says this has the following impact: ! read w/ reused fd ! wall 0.371846 comb 0.370000 user 0.350000 sys 0.020000 (best of 27) ! wall 0.337930 comb 0.330000 user 0.300000 sys 0.030000 (best of 30) ! read batch w/ reused fd ! wall 0.014952 comb 0.020000 user 0.000000 sys 0.020000 (best of 197) ! wall 0.014866 comb 0.010000 user 0.000000 sys 0.010000 (best of 196) So, we've gone from ~25x slower than batch to ~22.5x slower. At this point, there's probably not much else we can do except implement an optimized function in the index itself, including in C.	2016-10-23 10:40:33 -07:00
Gregory Szorc	4d79c96e22	revlog: inline start() and end() for perf reasons When I implemented `hg perfrevlogchunks`, one of the things that stood out was N * _chunk() calls was ~38x slower than 1 _chunks() call. Specifically, on the mozilla-unified repo: N_chunk: 0.528997s 1_chunks: 0.013735s This repo has 352,097 changesets. So the average time per changeset comes out to: N_chunk: 1.502us 1_chunks: 0.039us If you extrapolate these numbers to a repository with 1M changesets, that comes out to 1.502s versus 0.039s, which is significant. At these latencies, Python attribute lookups and function calls matter. So, this patch inlines some code to cut down on that overhead. The impact of this patch on N*_chunk() calls is clear: ! wall 0.528997 comb 0.520000 user 0.500000 sys 0.020000 (best of 19) ! wall 0.367723 comb 0.370000 user 0.350000 sys 0.020000 (best of 27) So, we go from ~38x slower to ~27x. A nice improvement. But there's still a long way to go. It's worth noting that functionality like revsets perform changelog lookups one revision at a time. So this code path is worth optimizing.	2016-10-22 15:41:23 -07:00
Gregory Szorc	52757f4357	revlog: reorder index accessors to match data structure order Index entries are ordered tuples. We have accessors in the revlog class to map tuple offsets to names. To help reinforce the order, reorder the methods so they match the order of elements in the tuple. While I'm here, also sneak in some minimal documentation.	2016-10-23 09:34:55 -07:00
Pierre-Yves David	b03bd97b6a	revlog: make 'storedeltachains' a "public" attribute The next changeset will make that attribute read by the changegroup packer. We make it "public" beforehand.	2016-10-14 02:25:08 +02:00
Gregory Szorc	748ec42334	revlog: add instance variable controlling delta chain use This is to support disabling delta chains on the changelog in a subsequent patch.	2016-09-24 12:25:37 -07:00
Gregory Szorc	fd27c38d85	revlog: document high frequency of code execution Recording my notes while working on performance optimization.	2016-08-24 20:18:58 -07:00
Gregory Szorc	670f80af85	revlog: make code in builddelta() slightly easier to read self.compress() is destructured into its components. "l" is renamed to "deltalen."	2016-08-24 20:00:52 -07:00
FUJIWARA Katsunori	2215e1134b	revlog: specify checkambig at writing to avoid file stat ambiguity This allows revlog-style files to be written out with checkambig=True easily. Because avoiding file stat ambiguity is needed only for filecache-ed manifest and changelog, this patch does: - use False for default value of checkambig - focus only on writing changes of index file out This patch also adds optional argument checkambig to _divert/_delay for changelog, to safely accept checkambig specified in revlog layer. But this argument can be fully ignored, because: - changes are written into other than index file, if name != target - changes are never written into index file, otherwise (into pending file by _divert, or into in-memory buffer by _delay) This is a part of ExactCacheValidationPlan. https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan	2016-09-22 21:51:58 +09:00
Gregory Szorc	e339efbb5b	revlog: use an LRU cache for delta chain bases Profiling using statprof revealed a hotspot during changegroup application calculating delta chain bases on generaldelta repos. Essentially, revlog._addrevision() was performing a lot of redundant work tracing the delta chain as part of determining when the chain distance was acceptable. This was most pronounced when adding revisions to manifests, which can have delta chains thousands of revisions long. There was a delta chain base cache on revlogs before, but it only captured a single revision. This was acceptable before generaldelta, when _addrevision would build deltas from the previous revision and thus we'd pretty much guarantee a cache hit when resolving the delta chain base on a subsequent _addrevision call. However, it isn't suitable for generaldelta because parent revisions aren't necessarily the last processed revision. This patch converts the delta chain base cache to an LRU dict cache. The cache can hold multiple entries, so generaldelta repos have a higher chance of getting a cache hit. The impact of this change when processing changegroup additions is significant. On a generaldelta conversion of the "mozilla-unified" repo (which contains heads of the main Firefox repositories in chronological order - this means there are lots of transitions between heads in revlog order), this change has the following impact when performing an `hg unbundle` of an uncompressed bundle of the repo: before: 5:42 CPU time after: 4:34 CPU time Most of this time is saved when applying the changelog and manifest revlogs: before: 2:30 CPU time after: 1:17 CPU time That nearly a 50% reduction in CPU time applying changesets and manifests! Applying a gzipped bundle of the same repo (effectively simulating a `hg clone` over HTTP) showed a similar speedup: before: 5:53 CPU time after: 4:46 CPU time Wall time improvements were basically the same as CPU time. I didn't measure explicitly, but it feels like most of the time is saved when processing manifests. This makes sense, as large manifests tend to have very long delta chains and thus benefit the most from this cache. So, this change effectively makes changegroup application (which is used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and various other commands) significantly faster when delta chains are long (which can happen on repos with large numbers of files and thus large manifests). In theory, this change can result in more memory utilization. However, we're caching a dict of ints. At most we have 200 ints + Python object overhead per revlog. And, the cache is really only populated when performing read-heavy operations, such as adding changegroups or scanning an individual revlog. For memory bloat to be an issue, we'd need to scan/read several revisions from several revlogs all while having active references to several revlogs. I don't think there are many operations that do this, so I don't think memory bloat from the cache will be an issue.	2016-08-22 21:48:50 -07:00
Gregory Szorc	60ecfeec38	revlog: remove unused variables	2016-08-22 20:17:36 -07:00
Augie Fackler	b3e5d375f6	revlog: use `iter(callable, sentinel)` instead of while True This is functionally equivalent, but is a little more concise.	2016-08-05 15:35:02 -04:00
Jun Wu	03c27804a3	revlog: add a fast path for "ambiguous identifier" Before fd1bb7c, if the C index.partialmatch raises RevlogError, the Python code raises "ambiguous identifier" error immediately, which is efficient. fd1bb7c took hidden revisions into consideration and forced the slow path enumerating the changelog to double-check hidden revisions. But it's not necessary if we know the revlog has no hidden revisions. This patch adds back the fast path for unfiltered revlogs.	2016-06-22 21:30:49 +01:00
Augie Fackler	9e4f37dc8d	revlog: use hashlib.sha1 directly instead of through util Also remove module-local _sha alias, which was barely used.	2016-06-10 00:10:34 -04:00
Gregory Szorc	812b01e150	revlog: remove unnecessary cache validation in _chunks Previously, we likely called _chunkraw() multiple times in order to ensure it didn't change out from under us. I'm pretty certain this code had its origins in the days where we attempted to have thread safety of localrepository and thus revlog instances. revlog instances are already not thread safe for writing. And, as of Mercurial 3.6, hgweb uses a separate localrepository instance per request, so there should only be a single thread reading a revlog at a time. We more or less decided that attempting to make classes like revlog thread safe is a lost cause. So, this patch removes thread safety from _chunks. As a result, we make one less call into _chunkraw() when the initial read isn't serviced by the cache. This translates to savings of 4 function calls overall and possibly prevents the creation of an additional buffer view into the cache. I doubt this translates into any real world performance wins because decompression will almost certainly dwarf time spent in _chunks(). But it does make the code simpler, so it is an improvement.	2015-11-22 17:57:35 -08:00
Gregory Szorc	f5985dcd63	revlog: return offset from _chunkraw() A subsequent patch will refactor _chunks() and the calculation of the offset will no longer occur in that function. Prepare by returning the offset from _chunkraw().	2016-01-05 19:51:51 -08:00
timeless	ebb1d48658	cleanup: remove superfluous space after space after equals (python)	2015-12-31 08:16:59 +00:00
Gregory Szorc	054a9bb83f	revlog: avoid string slice when decompressing u* chunks Revlog chunks can be stored uncompressed. If the first byte of the raw data is \0, we store the data as is. Else we prefix it with 'u'. Before, we performed a string slice to strip out the 'u' prefix. With this patch, we use a buffer to avoid an extra memory copy and associated garbage collection overhead. I was unable to verify any performance impact of this patch. For both mozilla-central and the hg repos, the number of manifest revisions with 'u' prefixes is very small - under 1%. So this change likely isn't called enough to have an impact on manifest reading. However, the reasoning behind this change is solid, so it should be safe.	2015-12-20 16:00:27 -08:00
Matt Mackall	a8248afde0	cleanup: back out performance hacks amended into previous commit	2015-12-21 14:52:18 -06:00
timeless	777dbfe303	commands: consistently indent notes 3 spaces most notes have 3 spaces for indentation, these had 2...	2015-12-18 06:33:48 +00:00
Gregory Szorc	46245e3fe8	revlog: refactor delta chain computation into own function This code is already written in multiple locations. While this code needs to be fast and extracting it to its own function adds overhead, code paths reading delta chains typically read, decompress, and do binary patching on revlog data from the delta chain. This other work (especially zlib decompression) almost certainly accounts for a lot more time than the overhead of introducing a Python function call. So I'm not worried about the performance impact of this change.	2015-12-20 18:56:05 -08:00
Gregory Szorc	cf051c1737	revlog: make clearcaches() more effective clearcaches() was added several years ago in 1e47437a1ca7 as part of implementing a perf command. Since revlog instances have many caches and since the spirit of this mostly unused method is to facilitate performance testing, I think it's appropriate for all the revlog's caches to get cleared when it is called.	2015-12-20 17:48:20 -08:00
Mike Edgar	44af48ee4a	changegroup: add flags field to cg3 delta header This lets revlog flags be transmitted over the wire. Right now this is useful for censored nodes and for narrowhg's ellipsis nodes.	2015-12-14 15:55:12 -05:00
Matt Mackall	e061768384	merge with stable	2015-12-18 14:40:11 -06:00
Gregory Szorc	2632a8c77a	revlog: seek to end of file before writing (issue4943) Revlogs were recently refactored to open file handles in "a+" and use a persistent file handle for reading and writing. This drastically reduced the number of file handles being opened. Unfortunately, it appears that some versions of Solaris lose the file offset when performing a write after the handle has been seeked. The simplest workaround is to seek to EOF on files opened in a+ mode before writing to them, which is what this patch does. Ideally, this code would exist in the vfs layer. However, this would require creating a proxy class for file objects in order to provide a custom implementation of write(). This would add overhead. Since revlogs are the only files we open in a+ mode, the one-off workaround in revlog.py should be sufficient. This patch appears to have little to no impact on performance on my Linux machine.	2015-12-17 17:16:02 -08:00
Gregory Szorc	25c5781010	revlog: use absolute_import	2015-12-12 23:22:18 -08:00
Martin von Zweigbergk	7a4ad651b5	revlog: don't consider nullrev when choosing delta base In the most complex case, we try using the incoming delta base, then we try both parents, and then we try the previous revlog entry. If none of these result in a good delta, we natually use the null revision as base. However, we sometimes consider the nullrev before we have exhausted our other options. Specifically, when both parents are null, we use the nullrev as delta base if it produces a good delta (according to _isgooddelta()), and we fail to try the previous revlog entry as delta base. After e60126c6093d (addrevision: use general delta when the incoming base delta is bad, 2015-12-01), it can also happen for non-merge commits when the incoming delta is not good. The Firefox repo (from many months back) shrinks a tiny bit with this patch: from 1.855GB to 1.830GB (1.4%). The hg repo itself shrinks even less: by less than 0.1%. There may be repos that get larger instead. This undoes the unexplained test change in e60126c6093d.	2015-12-04 17:46:56 -08:00
Martin von Zweigbergk	d9cd156585	revlog: make calls to _isgooddelta() consistent We always want to call _isgooddelta() before accepting a delta. We mostly call the function right after building the delta, but not always. Instead, we have an extra call at the end of the big code block. Let's make it consistent, so we call _isgooddelta() right after builddelta() and exactly once per delta. That also lets us rely on "delta is None" to mean we didn't find a good delta.	2015-12-04 17:14:14 -08:00
Martin von Zweigbergk	af588ff953	revlog: clarify which revision is added to 'tested' when using cached delta The tested delta revisions are added to the 'tested' set. These are the same revisions we pass to builddelta(). However, in one case, we add builddelta(rev)[3] to the set intead of adding 'rev' itself. In that particular case, that element is the same as the function's input revision (because self._generaldelta is true), so the effect is the same. Still, let's just add the function's input revision to avoid confusing future readers.	2015-12-04 16:45:06 -08:00
Martin von Zweigbergk	0788ad5e3c	revlog: remove unused variable 'chainlen'	2015-12-04 17:22:26 -08:00
Pierre-Yves David	18b7a437e6	addrevision: use general delta when the incoming base delta is bad We unify the delta selection process to be a simple three options process: - try to use the incoming delta (if lazydeltabase is on) - try to find a suitable parents to delta against (if gd is on) - try to delta against the tipmost revision The first of this option that yield a valid delta will be used. The test change in 'test-generaldelta.t' show this behavior as we use a delta against the parent instead of a full delta when the incoming delta is not suitable. This as some impact on 'test-bundle.t' because a delta somewhere changes. It does not seems to change the test semantic and have been ignored.	2015-12-01 16:15:59 -08:00
Pierre-Yves David	01dc52b10b	addrevision: rework generaldelta computation The old code have multiple explicit tests and code duplications. This makes it hard to improve the code. We rewrite the logic in a more generic way, not changing anything of the computed result. The final goal here is to eventually be able to: - factor out the default fallback case "try against 'prev'" in a single place - allow 'lazydeltabase' case to use the smarter general delta code path when the incoming base does not provide us with a good delta.	2015-12-01 18:45:16 -08:00
Pierre-Yves David	453d103fad	addrevision: only use the incoming base if it is a good delta (issue4975) Before this change, the 'lazydeltabase' would blindly build a delta using the base provided by the incoming bundle and try to use it. If that base was far down the revlog, the delta would be seen as "no good" and we would fall back to a full text revision. We now check if the delta is good and fallback to a computing a delta again the tipmost revision otherwise (as we would do without general delta). Later changesets will improve the logic to compute the fallback delta using the general delta logic.	2015-12-01 16:06:20 -08:00
Pierre-Yves David	41fc9ceddb	addrevision: handle code path not producing delta We would like to be able to exit the delta generation block without a valid delta (for a more flexible control flow). So we make sure we do not expand the "delta" content unless we actually have a delta. We can do it one level lower because 'delta' is initialised at None anyway. Not adding a level to the assignment prevent a line length issue.	2015-12-01 16:22:49 -08:00
Pierre-Yves David	e8bed37496	addrevision: rename 'd' to 'delta' That variable is quite central to the whole function. Giving it a more explicit name help with code readability.	2015-12-01 15:29:11 -08:00
Gregory Szorc	09f64f3e3e	revlog: improve documentation There are a lot of functions and variables doing similar things. Document the role and functionality of each to make it easier to grok.	2015-11-22 16:23:20 -08:00
Augie Fackler	e212c519f3	revlog: rename bundle to cg to reflect its nature as a cg?unpacker The new convention is that bundles contain changegroups. bundle1 happens to only be a changegroup, but bundle2 is a more featureful container that isn't something you can pass to addgroup().	2015-10-14 11:32:33 -04:00
Gregory Szorc	4b88c2e0ff	revlog: don't flush data file after every added revision The current behavior of revlogs is to flush the data file when writing data to it. Tracing system calls revealed that changegroup processing incurred numerous write(2) calls for values much smaller than the default buffer size (Python defaults to 4096, but it can be adjusted based on detected block size at run time by CPython). The reason we flush revlogs is so readers have all data available. For example, the current code in revlog.py will re-open the revlog file (instead of seeking an existing file handle) to read the text of a revision. This happens when starting a new delta chain when adding several revisions from changegroups, for example. Yes, this is likely sub-optimal (we should probably be sharing file descriptors between readers and writers to avoid the flushing and associated overhead of re-opening files). While flushing revlogs is necessary, it appears all callers are diligent about flushing files before a read is performed (see buildtext() in _addrevision()), making the flush in _writeentry() redundant and unncessary. So, we remove it. In practice, this means we incur a write(2) a) when the buffer is full (typically 4096 bytes) b) when a new delta chain is created rather than after every added revision. This applies to every revlog, but by volume it mostly impacts filelogs. Removing the redundant flush from _writeentry() significantly reduces the number of write(2) calls during changegroup processing on my Linux machine. When applying a changegroup of the hg repo based on my local repo, the total number of write(2) calls during application of the mercurial/localrepo.py revlogs dropped from 1,320 to 217 with this patch applied. Total I/O related system calls dropped from 1,577 to 474. When unbundling a mozilla-central gzipped bundle (264,403 changesets with 1,492,215 changes to 222,507 files), total write(2) calls dropped from 1,252,881 to 827,106 and total system calls dropped from 3,601,259 to 3,178,636 - a reduction of 425,775! While the system call reduction is significant, it appears to have no impact on wall time on my Linux and Windows machines. Still, fewer syscalls is fewer syscalls. Surely this can't hurt. If nothing else, it makes examining remaining system call usage simpler and opens the door to experimenting with the performance impact of different buffer sizes.	2015-09-26 21:43:13 -07:00
Gregory Szorc	5b6556c143	revlog: use existing file handle when reading during _addrevision _addrevision() may need to read from revlogs as part of computing deltas. Previously, we would flush existing file handles and open a new, short-lived file handle to perform the reading. If we have an existing file handle, it seems logical to reuse it for reading instead of opening a new file handle. This patch makes that the new behavior. After this patch, revlog files are only reopened when adding revisions if the revlog is switched from inline to non-inline. On Linux when unbundling a bundle of the mozilla-central repo, this patch has the following impact on system call counts: Call Before After Delta write 827,639 673,390 -154,249 open 700,103 684,089 -16,014 read 74,489 74,489 0 fstat 493,924 461,896 -32,028 close 249,131 233,117 -16,014 stat 242,001 242,001 0 lstat 18,676 18,676 0 lseek 20,268 20,268 0 ioctl 14,652 13,173 -1,479 TOTAL 3,180,758 2,930,679 -250,079 It's worth noting that many of the open() calls fail due to missing files. That's why there are many more open() calls than close(). Despite the significant system call reduction, this change does not seem to have a significant performance impact on Linux. On Windows 10 (not a VM, on a SSD), this patch appears to reduce unbundle time for mozilla-central from ~960s to ~920s. This isn't as significant as I was hoping. But a decrease it is nonetheless. Still, Windows unbundle performance is still >2x slower than Linux. Despite the lack of significant gains, fewer system calls is fewer system calls. If nothing else, this will narrow the focus of potential areas to optimize in the future.	2015-09-27 16:08:18 -07:00
Gregory Szorc	4f8cd71a6f	revlog: always open revlogs for reading and appending An upcoming patch will teach revlogs to use the existing file handle to read revision data instead of opening a new file handle just for quick reads. For this to work, files must be opened for reading as well. This patch is merely cosmetic: there are no behavior changes.	2015-09-27 15:59:19 -07:00
Gregory Szorc	574108c7d9	revlog: support using an existing file handle when reading revlogs Currently, the low-level revlog reading code always opens a new file handle. In some key scenarios, the revlog is already opened and an existing file handle could be used to read. This patch paves the road to that by teaching various revlog reading functions to accept an optional existing file handle to read from.	2015-09-27 15:48:35 -07:00
Gregory Szorc	106b00c51d	revlog: add docstring for checkinlinesize() The name is deceptive: it does more than just "check." Add a docstring to clarify what's going on.	2015-09-27 15:31:50 -07:00
Gregory Szorc	7769f0b9ff	revlog: optionally cache the full text when adding revisions revlog instances can cache the full text of a single revision. Typically the most recently read revision is cached. When adding a delta group via addgroup() and _addrevision(), the full text isn't always computed: sometimes only the passed in delta is sufficient for adding a new revision to the revlog. When writing the changelog from a delta group, the just-added full text revision is always read immediately after it is written because the changegroup code needs to extract the set of files from the entry. In other words, revision() is always being called and caching the full text of the just-added revision is guaranteed to result in a cache hit, making the cache worthwhile. This patch adds support to _addrevision() for always building and caching the full text. This option is currently only active when processing changelog entries from a changegroup. While the total number of revision() calls is the same, the location matters: buildtext() calls into revision() on the base revision when building the full text of the just-added revision. Since the previous revision's _addrevision() built the full text and the the previous revision is likely the base revision, this means that the base revision's full text is likely cached and can be used to compute the current full text from just a delta. No extra I/O required. The end result is the changelog isn't opened and read after adding every revision from a changegroup. On my 2013 MacBook Pro running OS X 10.10.5 from an SSD and Python 2.7, this patch impacted the time taken to apply ~262,000 changesets from a mozilla-central gzip bundle: before: ~43s after: ~32s ~25% reduction in changelog processing times. Not bad.	2015-09-12 16:11:17 -07:00
Gregory Szorc	7f63b5672e	revlog: drop local assignment of cache variable The purpose of this code was to provide thread safety. With the conversion of hgweb to use separate localrepository instances per request/thread, we should no longer have any consumers that need to access revlog instances from multiple threads. Remove the code.	2015-09-12 15:16:47 -07:00
Gregory Szorc	7784d9ad76	revlog: rename generic "i" variable to "indexdata" Increase readability.	2015-09-12 12:47:00 -07:00
Durham Goode	be92ed2487	revlog: add an aggressivemergedelta option This adds an option for delta'ing against both p1 and p2 when applying merge revisions and picking whichever is smallest. Some before and after stats on manifest.d size: internal large repo: before: 1.2 GB after: 930 MB mozilla-central: before: 261 MB after: 92 MB	2015-08-30 14:03:32 -07:00
Durham Goode	98500575fc	revlog: change generaldelta delta parent heuristic The old generaldelta heuristic was "if p1 (or p2) was closer than the last full text, use it, otherwise use prev". This was problematic when a repo contained multiple branches that were very different. If commits to branch A were pushed, and the last full text was branch B, it would generate a fulltext. Then if branch B was pushed, it would generate another fulltext. The problem is that the last fulltext (and delta'ing against `prev` in general) has no correlation with the contents of the incoming revision, and therefore will always have degenerate cases. According to the blame, that algorithm was chosen to minimize the chain length. Since there is already code that protects against that (the delta-vs-fulltext code), and since it has been improved since the original generaldelta algorithm went in (2011), I believe the chain length criteria will still be preserved. The new algorithm always diffs against p1 (or p2 if it's closer), unless the resulting delta will fail the delta-vs-fulltext check, in which case we delta against prev. Some before and after stats on manifest.d size. internal large repo old heuristic - 2.0 GB new heuristic - 1.2 GB mozilla-central old heuristic - 242 MB new heuristic - 261 MB The regression in mozilla central is due to the new heuristic choosing p2r as the delta when it's closer to the tip. Switching the algorithm to always prefer p1r brings the size back down (242 MB). This is result of the way in which mozilla does merges and pushes, and the result could easily swing the other direction in other repos (depending on if they merge X into Y or Y into X), but will never be as degenerate as before. I future patch will address the regression by introducing an optional, even more aggressive delta heuristic which will knock the mozilla manifest size down dramatically.	2015-08-30 13:58:11 -07:00
Durham Goode	47e86258b9	revlog: move textlen calculation to be above delta chooser This moves the textlen calculation to be above the delta chooser. Since textlen is needed for calling isgooddelta, we need it above the delta chooser so future patches can call isgooddelta.	2015-08-30 13:34:30 -07:00
Durham Goode	2ff0cbe125	revlog: move delta check to it's own function This moves the delta vs fulltext comparison to its own function. This will allow us to reuse the function in future patches for more efficient delta choices. As a side effect, this will also allow extensions to modify our delta criteria.	2015-08-30 13:33:00 -07:00
Pierre-Yves David	c8b7676e04	format: introduce 'format.usegeneraldelta` This option will make repositories created as general delta by default but will not make Mercurial aggressively recompute deltas for all incoming bundle. Instead, the delta contained in the bundle will be used. This will allow us to start having general delta repositories created everywhere without triggering massive recomputation costs for all new clients cloning from old servers.	2015-11-02 15:59:12 +00:00
Yuya Nishihara	f08cad7728	revlog: remove unused shaoffset constants Call sites were removed at 1299f0c14572, "revlog: remove lazy index".	2015-08-02 12:16:19 +09:00
Yuya Nishihara	28a45149e9	revlog: correct comment about size of v0 index format	2015-08-02 01:14:11 +09:00
Gregory Szorc	a60875d3fb	revlog: add support for a callback whenever revisions are added A subsequent patch will add a feature that performs iterative computation as changesets are added from a changegroup. To facilitate this type of processing in a generic manner, we add a mechanism for calling a function whenever a revision is added via revlog.addgroup(). There are potential performance concerns with this callback, as using it will flush the revlog after every revision is added.	2015-07-18 10:29:37 -07:00
Gregory Szorc	5380dea2a7	global: mass rewrite to use modern exception syntax Python 2.6 introduced the "except type as instance" syntax, replacing the "except type, instance" syntax that came before. Python 3 dropped support for the latter syntax. Since we no longer support Python 2.4 or 2.5, we have no need to continue supporting the "except type, instance". This patch mass rewrites the exception syntax to be Python 2.6+ and Python 3 compatible. This patch was produced by running `2to3 -f except -w -n .`.	2015-06-23 22:20:08 -07:00
Matt Mackall	2f915e33db	revlog: move size limit check to addrevision This lets us add the name of the indexfile to the message.	2015-06-04 14:57:58 -05:00
Jordi Gutiérrez Hermoso	211712bc95	revlog: raise an exception earlier if an entry is too large (issue4675) Before we were relying on _pack to error out when trying to pass an integer that was too large for the "i" format specifier. Now we check this earlier so we can form a better error message. The error message unfortunately must exclude the filename at this level of the call stack. The problem is that this name is not available here, and the error can be triggered by a large manifest or by a large file itself. Although perhaps we could provide the name of a revlog index file (from the revlog object, instead of the revlogio object), this seems like too much leakage of internal data structures. It's not ideal already that an error message even mentions revlogs, but this does seem unavoidable here.	2015-06-02 15:04:39 -04:00
Laurent Charignon	e470f84c03	phases: fix bug where native phase computation wasn't called I forgot to include this change as a previous diff and the native code to compute the phases was never called. The AttributeError was silently caught and the pure implementation was used instead.	2015-05-29 14:24:50 -07:00
Martin von Zweigbergk	4127f67694	util: drop alias for collections.deque Now that util.deque is just an alias for collections.deque, let's just remove it.	2015-05-16 11:28:04 -07:00
Mike Edgar	c2ecfc164f	revlog: make converting from inline to non-line work after a strip The checkinlinesize function, which converts inline revlogs to non-inline, uses the current transaction's "data" field to determine how to update the transaction after the conversion. This change works around the missing data field, which is not in the transaction after a strip.	2015-03-25 15:58:31 -04:00
Laurent Charignon	6b06793e68	phase: default to C implementation for phase computation	2015-03-20 11:14:27 -07:00
Mike Edgar	93a70657fb	revlog: addgroup checks if incoming deltas add censored revs, sets flag bit A censored revision stored in a revlog should have the censored revlog index flag bit set. This implies we must know if a revision is censored before we add it to the revlog. When adding revisions from exchanged deltas, we would prefer to determine this flag without decoding every single full text. This change introduces a heuristic based on assumptions around the Mercurial delta format and filelog metadata. Since deltas which produce a censored revision must be full-replacement deltas, we can read the delta's first bytes to check the filelog metadata. Since "censored" is the alphabetically first filelog metadata key, censored filelog revisions have a well-known prefix we can look for. For more on the design and background of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-14 15:16:08 -05:00
Mike Edgar	3562a06261	revlog: _addrevision creates full-replace deltas based on censored revisions A delta against a censored revision is either received through exchange and written blindly to a revlog, or it is created by the revlog itself. This change ensures the latter process creates deltas which fully replace all data in a censored base using a single patch operation. Recipients of a delta against a censored base will verify that the delta is in this full-replace format. Other recipients will use the delta as normal. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-21 17:11:37 -05:00
Mike Edgar	0a639adee5	revlog: special case expanding full-replacement deltas received by exchange When a delta received through exchange is added to a revlog, it will very often be expanded to a full text by applying the delta to its base. If that delta is of a particular form, we can avoid decoding the base revision. This avoids an exception if the base revision is censored. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-02-06 01:38:16 +00:00
Mike Edgar	9635f8c5b0	revlog: in addgroup, reject ill-formed deltas based on censored nodes To ensure interoperability when clones disagree about which file nodes are censored, a restriction is made on deltas based on censored nodes. Any such delta must replace the full text of the base in a single patch. If the recipient of a delta considers the base to be censored and the delta is not in the expected form, the recipient must reject it, as it can't know if the source has also censored the base. For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-02-06 00:55:29 +00:00
Mike Edgar	c736894d9c	revlog: add "iscensored()" to revlog public API The iscensored method will be used by the exchange layer to reject nonconforming deltas involving censored revisions (and to produce conforming deltas). For background and broader design of the censorship feature, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-23 17:01:39 -05:00
Yuya Nishihara	237120f282	revlog: add __contains__ for fast membership test Because revlog implements __iter__, "rev in revlog" works but does silly O(n) lookup unexpectedly. So it seems good to add fast version of __contains__. This allows "rev in repo.changelog" in the next patch.	2015-02-04 21:25:57 +09:00
Mike Edgar	39ea63d3f6	revlog: verify censored flag when hashing added revision fulltext When receiving a delta via exchange, three possible storage outcomes emerge: 1. The delta is added directly to the revlog. ("fast-path") 2. A freshly-computed delta with a different base is stored. 3. The new revision's fulltext is computed and stored outright. Both (2) and (3) require materializing the full text of the new revision by applying the delta to its base. This is typically followed by a hash check. The new flags argument allows callers to _addrevision to signal that they expect that hash check to fail. We can use this opportunity to verify that expectation. If the hash fails, require the flag be set; if the hash passes, require the flag be unset. Rather than simply eliding the hash check, this approach provides some assurance that the censored flag is not applied to valid revisions. Read more at: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-12 14:41:25 -05:00
Mike Edgar	4b3ca8d71c	revlog: add flags argument to _addrevision, update callers use default flags For revlog index flags to be useful to other parts of Mercurial, they need to be settable when writing revisions. The current use case for revlog index flags is the censorship feature: http://mercurial.selenic.com/wiki/CensorPlan While the censor flag could be inferred in _addrevision by interrogating the text/delta being added, that would bury the censorship logic and inappropriately couple it to all revision creation.	2015-01-12 14:30:24 -05:00
Mike Edgar	339a2739c2	revlog: define censored flag for revlogng index This flag bit will be used to cheaply signal censorship presence to upper layers (exchange, verify). It indicates that censorship metadata is present but does not attest to the verifiability of that metadata. For the censorship design, see: http://mercurial.selenic.com/wiki/CensorPlan	2015-01-12 14:01:52 -05:00
Siddharth Agarwal	2d669c474b	revlog: switch findmissing* methods to incrementalmissingrevs This will allow us to remove ancestor.missingancestors in an upcoming patch.	2014-11-14 16:52:40 -08:00
Siddharth Agarwal	5692148f49	revlog: add a method to get missing revs incrementally This will turn out to be useful for discovery.	2014-11-16 00:39:48 -08:00
Siddharth Agarwal	7103eb28ea	ancestor.lazyancestors: take parentrevs function rather than changelog Principle of least privilege, and it also brings this in line with missingancestors.	2014-11-14 14:36:25 -08:00
Siddharth Agarwal	8354d9169f	revlog: cache chain info after calculating it for a rev (issue4452) This dumb cache works surprisingly well: on a repository with typical delta chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs and manifests only) went from 60 seconds to 3.	2014-11-13 21:36:38 -08:00
Siddharth Agarwal	1acd4cfca4	revlog: increase I/O bound to 4x the amount of data consumed This doesn't affect normal clones since they'd be bound by the CPU bound below anyway -- it does, however, improve generaldelta clones significantly. This also results in better deltaing for generaldelta clones -- in generaldelta clones, we calculate deltas with respect to the closest base if it has a higher revision number than either parent. If the base is on a significantly different branch, this can result in pointlessly massive deltas. This reduces the number of bases and hence the number of bad deltas. Empirically, for a highly branchy repository, this resulted in an improvement of around 15% to manifest size.	2014-11-11 20:08:19 -08:00
Siddharth Agarwal	fe51051ee5	revlog: bound based on the length of the compressed deltas This is only relevant for generaldelta clones.	2014-11-11 20:01:19 -08:00
Siddharth Agarwal	27976ad2dc	revlog: compute length of compressed deltas along with chain length In upcoming patches to the revlog, we're going to split up the notions of bounding I/O and bounding CPU.	2014-11-11 19:54:36 -08:00
Siddharth Agarwal	6e115e5383	revlog: store fulltext when compressed delta is bigger than it This is a very silly case and not particularly likely to happen in the wild, but it turns out we can hit it in a couple of places. As we tune the storage parameters we're likely to hit more such cases. The affected test cases all have smaller revlogs now.	2014-11-11 21:41:12 -08:00
Siddharth Agarwal	e5d387f47e	revlog: make a predicate clearer with parens	2014-11-11 21:39:56 -08:00
Mateusz Kwapich	3433abb6a8	revlog: add config variable for limiting delta-chain length The current heuristic for deciding between storing delta and full texts is based on ratio of (sizeofdeltas)/(sizeoffulltext). In some cases (for example a manifest for ahuge repo) this approach can result in extremely long delta chains (~30,000) which are very slow to read. (In the case of a manifest ~500ms are added to every hg command because of that). This commit introduces "revlog.maxchainlength" configuration variable that will limit delta chain length.	2014-11-06 14:20:05 -08:00
Mateusz Kwapich	1a554418d5	debugrevlog: fix computing chain length in debugrevlog -d The chain length was computed correctly only when generaldelta feature was enabled. Now it's fixed. When generaldelta is disabled the base revision in revlog index is not the revision we have delta against - it's always previous revision. Instead of incorrect chainbaseandlen in command.py we are now using two single-responsibility functions in revlog.py: - chainbase(rev) - chainlen(rev) Only chainlen(rev) was missing so it was written to mimic the way the chain of deltas is actually found during file reconstruction.	2014-11-06 14:08:25 -08:00

1 2 3 4 5 ...

612 Commits