480 Commits

Author SHA1 Message Date
Mike Edgar
0a639adee5 revlog: special case expanding full-replacement deltas received by exchange
When a delta received through exchange is added to a revlog, it will very
often be expanded to a full text by applying the delta to its base. If
that delta is of a particular form, we can avoid decoding the base revision.
This avoids an exception if the base revision is censored.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-02-06 01:38:16 +00:00
Mike Edgar
9635f8c5b0 revlog: in addgroup, reject ill-formed deltas based on censored nodes
To ensure interoperability when clones disagree about which file nodes are
censored, a restriction is made on deltas based on censored nodes. Any such
delta must replace the full text of the base in a single patch.

If the recipient of a delta considers the base to be censored and the delta
is not in the expected form, the recipient must reject it, as it can't know
if the source has also censored the base.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-02-06 00:55:29 +00:00
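Illustration for 9635f8c5b0 above: a minimal sketch of what "replace the full text of the base in a single patch" could look like as a check. It assumes a delta is a sequence of hunks, each led by three big-endian 32-bit integers (start, end, replacement length). The helper names are hypothetical and are not the names used in revlog.py.

```python
import struct

# Assumed hunk header layout: (start, end, length of replacement data),
# each a big-endian signed 32-bit integer.
HUNK_HEADER = struct.Struct(">lll")


def full_replacement_delta(baselen, newtext):
    """Build a delta whose single hunk replaces bytes [0, baselen) of the base."""
    return HUNK_HEADER.pack(0, baselen, len(newtext)) + newtext


def is_full_replacement(delta, baselen):
    """Return True if delta is exactly one hunk covering the whole base."""
    if len(delta) < HUNK_HEADER.size:
        return False
    start, end, length = HUNK_HEADER.unpack_from(delta)
    # One hunk only: the header plus its data must account for every byte.
    single_hunk = len(delta) == HUNK_HEADER.size + length
    return start == 0 and end == baselen and single_hunk
```

A recipient that considers the base censored would accept a delta only when a check of this kind passes, since such a delta can be expanded without decoding the base at all.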
Mike Edgar
c736894d9c revlog: add "iscensored()" to revlog public API
The iscensored method will be used by the exchange layer to reject
nonconforming deltas involving censored revisions (and to produce
conforming deltas).

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-23 17:01:39 -05:00
Yuya Nishihara
237120f282 revlog: add __contains__ for fast membership test
Because revlog implements __iter__, "rev in revlog" works, but it unexpectedly
falls back to an O(n) scan. Adding __contains__ makes the membership test fast.

This allows "rev in repo.changelog" in the next patch.
2015-02-04 21:25:57 +09:00
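Illustration for 237120f282 above: a small sketch of why defining __contains__ matters for a class that already defines __iter__. RevSequence is a hypothetical stand-in for a revlog, assuming revisions are the integers 0..len-1.

```python
class RevSequence(object):
    """Hypothetical stand-in for a revlog holding revisions 0..count-1."""

    def __init__(self, count):
        self._count = count

    def __len__(self):
        return self._count

    def __iter__(self):
        # Without __contains__, "rev in seq" would iterate over this: O(n).
        return iter(range(self._count))

    def __contains__(self, rev):
        # Constant-time range check instead of a linear scan.
        return isinstance(rev, int) and 0 <= rev < self._count
```

With this in place, `3 in RevSequence(5)` is answered by the range check rather than by walking the iterator.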
Mike Edgar
39ea63d3f6 revlog: verify censored flag when hashing added revision fulltext
When receiving a delta via exchange, three possible storage outcomes emerge:

1. The delta is added directly to the revlog. ("fast-path")
2. A freshly-computed delta with a different base is stored.
3. The new revision's fulltext is computed and stored outright.

Both (2) and (3) require materializing the full text of the new revision by
applying the delta to its base. This is typically followed by a hash check.

The new flags argument allows callers to _addrevision to signal that they
expect that hash check to fail. We can use this opportunity to verify that
expectation. If the hash fails, require the flag be set; if the hash passes,
require the flag be unset.

Rather than simply eliding the hash check, this approach provides some
assurance that the censored flag is not applied to valid revisions.

Read more at: http://mercurial.selenic.com/wiki/CensorPlan
2015-01-12 14:41:25 -05:00
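Illustration for 39ea63d3f6 above: a sketch of the "hash fails, require the flag; hash passes, require the flag unset" rule. The flag value, the exception type, and the function names are assumptions for the example. The node hash is assumed to be SHA-1 over the two sorted parent nodes followed by the text.

```python
import hashlib

CENSORED_FLAG = 1 << 15  # illustrative bit value, an assumption


class CensorFlagMismatch(Exception):
    """Raised when the censored flag disagrees with the hash check."""


def nodehash(text, p1, p2):
    # Assumed scheme: sha1(min(p1, p2) + max(p1, p2) + text).
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


def check_censor_flag(text, p1, p2, node, flags):
    """Return True if the hash matched; raise if the flag contradicts it."""
    hashok = nodehash(text, p1, p2) == node
    censored = bool(flags & CENSORED_FLAG)
    if hashok and censored:
        raise CensorFlagMismatch("hash passed but censored flag is set")
    if not hashok and not censored:
        raise CensorFlagMismatch("hash failed but censored flag is unset")
    return hashok
```

The point of checking both directions, rather than skipping the hash when the flag is set, is that a valid revision cannot be silently marked as censored.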
Mike Edgar
4b3ca8d71c revlog: add flags argument to _addrevision, update callers use default flags
For revlog index flags to be useful to other parts of Mercurial, they need to
be settable when writing revisions. The current use case for revlog index
flags is the censorship feature: http://mercurial.selenic.com/wiki/CensorPlan

While the censor flag could be inferred in _addrevision by interrogating the
text/delta being added, that would bury the censorship logic and
inappropriately couple it to all revision creation.
2015-01-12 14:30:24 -05:00
Mike Edgar
339a2739c2 revlog: define censored flag for revlogng index
This flag bit will be used to cheaply signal censorship presence to upper
layers (exchange, verify). It indicates that censorship metadata is present
but does not attest to the verifiability of that metadata.

For the censorship design, see: http://mercurial.selenic.com/wiki/CensorPlan
2015-01-12 14:01:52 -05:00
Siddharth Agarwal
2d669c474b revlog: switch findmissing* methods to incrementalmissingrevs
This will allow us to remove ancestor.missingancestors in an upcoming patch.
2014-11-14 16:52:40 -08:00
Siddharth Agarwal
5692148f49 revlog: add a method to get missing revs incrementally
This will turn out to be useful for discovery.
2014-11-16 00:39:48 -08:00
Siddharth Agarwal
7103eb28ea ancestor.lazyancestors: take parentrevs function rather than changelog
Principle of least privilege, and it also brings this in line with
missingancestors.
2014-11-14 14:36:25 -08:00
Siddharth Agarwal
8354d9169f revlog: cache chain info after calculating it for a rev (issue4452)
This dumb cache works surprisingly well: on a repository with typical delta
chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs
and manifests only) went from 60 seconds to 3.
2014-11-13 21:36:38 -08:00
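Illustration for 8354d9169f above: a sketch of caching chain information per revision so that a linear series of lookups does not re-walk the whole chain each time. The `deltabase` list and the function names are hypothetical; `deltabase[rev] == rev` is assumed to mark a full-text revision.

```python
def make_chainlen(deltabase):
    """Return a chainlen(rev) function that remembers earlier results."""
    cache = {}

    def chainlen(rev):
        steps = 0
        cur = rev
        while deltabase[cur] != cur:
            if cur in cache:
                # Reuse the length already computed from this point down.
                steps += cache[cur]
                break
            steps += 1
            cur = deltabase[cur]
        cache[rev] = steps
        return steps

    return chainlen
```

When revisions are processed in order, each call stops after one step at the previously cached revision, which is the access pattern the commit message describes for unbundling.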
Siddharth Agarwal
1acd4cfca4 revlog: increase I/O bound to 4x the amount of data consumed
This doesn't affect normal clones since they'd be bound by the CPU bound below
anyway -- it does, however, improve generaldelta clones significantly.

This also results in better deltaing for generaldelta clones -- in generaldelta
clones, we calculate deltas with respect to the closest base if it has a higher
revision number than either parent. If the base is on a significantly different
branch, this can result in pointlessly massive deltas. This reduces the number
of bases and hence the number of bad deltas.

Empirically, for a highly branchy repository, this resulted in an improvement
of around 15% to manifest size.
2014-11-11 20:08:19 -08:00
Siddharth Agarwal
fe51051ee5 revlog: bound based on the length of the compressed deltas
This is only relevant for generaldelta clones.
2014-11-11 20:01:19 -08:00
Siddharth Agarwal
27976ad2dc revlog: compute length of compressed deltas along with chain length
In upcoming patches to the revlog, we're going to split up the notions of
bounding I/O and bounding CPU.
2014-11-11 19:54:36 -08:00
Siddharth Agarwal
6e115e5383 revlog: store fulltext when compressed delta is bigger than it
This is a very silly case and not particularly likely to happen in the wild,
but it turns out we can hit it in a couple of places. As we tune the storage
parameters we're likely to hit more such cases.

The affected test cases all have smaller revlogs now.
2014-11-11 21:41:12 -08:00
Siddharth Agarwal
e5d387f47e revlog: make a predicate clearer with parens
2014-11-11 21:39:56 -08:00
Mateusz Kwapich
3433abb6a8 revlog: add config variable for limiting delta-chain length
The current heuristic for deciding between storing deltas and full texts
is based on the ratio (sizeofdeltas)/(sizeoffulltext).

In some cases (for example, a manifest for a huge repo) this approach
can result in extremely long delta chains (~30,000), which are very slow to
read. (In the case of a manifest, ~500ms are added to every hg command because of that.)

This commit introduces "revlog.maxchainlength" configuration variable that will
limit delta chain length.
2014-11-06 14:20:05 -08:00
Mateusz Kwapich
1a554418d5 debugrevlog: fix computing chain length in debugrevlog -d
The chain length was computed correctly only when generaldelta
feature was enabled. Now it's fixed.

When generaldelta is disabled, the base revision in the revlog index is not
the revision the delta is computed against; the delta is always against the
previous revision.

Instead of incorrect chainbaseandlen in command.py we are now using two
single-responsibility functions in revlog.py:
 - chainbase(rev)
 - chainlen(rev)
Only chainlen(rev) was missing so it was written to mimic the way the
chain of deltas is actually found during file reconstruction.
2014-11-06 14:08:25 -08:00
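Illustration for 1a554418d5 above: a sketch of how a chain length computation has to differ between the two modes. The `index_base` list is a hypothetical stand-in for the base field of the revlog index. With generaldelta it is assumed to name the revision the delta is against; without it, it is assumed to name the first revision of the chain while each delta is against the previous revision.

```python
def chainlen(index_base, rev, generaldelta):
    """Count the deltas that must be applied to reconstruct rev."""
    length = 0
    cur = rev
    # A revision whose base is itself is stored as a full text.
    while index_base[cur] != cur:
        length += 1
        if generaldelta:
            cur = index_base[cur]  # follow the recorded delta parent
        else:
            cur -= 1  # the delta is always against the previous revision
    return length
```

Treating the base field as the delta parent in both modes would undercount the chain whenever generaldelta is disabled, which appears to be the bug the commit describes.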
Mike Edgar
ba052f742a revlog: support importing censored file revision tombstones
This change allows a revision log to not fail integrity checks when applying a
changegroup delta (eg from a bundle) results in a censored file tombstone. The
tombstone is inserted as-is, so future integrity verification will observe the
tombstone. Deltas based on the tombstone will also remain correct.

The new code path is encountered for *exactly* the cases where _addrevision is
importing a tombstone from a changegroup. When committing a file containing
the "magic" tombstone text, the "text" parameter will be non-empty and the
checkhash call is not executed (and when committing, the node will be computed
to match the "magic" tombstone text).
2014-09-03 16:34:29 -04:00
Augie Fackler
b2278203f5 revlog: move references to revlog.hash to inside the revlog class
This will make it possible for subclasses to have different hashing
schemes when appropriate. I anticipate using this in manifests.

Note that there's still one client of mercurial.revlog.hash() outside
of revlog: mercurial.context.memctx uses it to construct the file
entries in an in-memory manifest. I don't think this will be a problem
in the immediate future, so I've left it as-is.
2014-09-24 15:14:44 -04:00
Augie Fackler
44b8666b93 revlog: mark nullhash as module-private
No other module should ever need this, so mark it with _ so nobody
tries to use it.
2014-09-24 15:10:52 -04:00
Mads Kiilerich
fae32dd0a3 comments: describe ancestor consistently - avoid 'least common ancestor'
"best" is as defined by mercurial.ancestor.ancestors: furthest from a root (as
measured by longest path).
2014-08-19 01:13:10 +02:00
Mads Kiilerich
b46e11faa6 revlog: introduce isancestor method for efficiently determining node lineage
Hide the not so obvious use of commonancestorsheads.
2014-08-19 01:13:10 +02:00
Matt Mackall
9bc396577d repoview: fix 0L with pack/unpack for 2.4
2014-08-26 13:11:53 +02:00
Matt Mackall
dad18b6446 revlog: fix check-code error
2014-06-14 11:49:02 -05:00
Matt Mackall
6faaeed973 revlog: hold a private reference to self._cache
This keeps other threads from modifying self._cache out from under us.
With this and the previous fix, 'hg serve' survives 100k hits with siege.
2014-06-13 14:17:14 -05:00
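Illustration for 6faaeed973 above: a sketch of the "private reference" pattern. The class and the shape of the cached tuple are assumptions for the example; the idea is to read the shared attribute once into a local name so that a concurrent reassignment cannot change it between the check and the use.

```python
class TextCache(object):
    """Hypothetical one-entry cache holding a (node, rev, text) tuple."""

    def __init__(self):
        self._cache = None

    def store(self, node, rev, text):
        # A single attribute assignment swaps in the whole entry at once.
        self._cache = (node, rev, text)

    def lookup(self, node):
        cached = self._cache  # snapshot; later code never rereads self._cache
        if cached is not None and cached[0] == node:
            return cached[2]
        return None
```

Reading `self._cache[0]` and then `self._cache[2]` separately would leave a window in which another thread could replace the entry in between.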
Matt Mackall
6c57e49897 revlog: make _chunkcache access atomic
With this and related fixes, 'hg serve' survived 100000 hits with
siege.
2014-06-13 14:16:03 -05:00
Mads Kiilerich
cb0184290d revlog: backout 1c95c1863327 - commonancestors
2014-04-17 20:01:39 +02:00
Mads Kiilerich
1478c563ea revlog: introduce commonancestorsheads method
Very similar to commonancestors but giving all the common ancestors heads.
2014-04-17 20:01:35 +02:00
Matt Mackall
4610a4e481 merge with stable
2014-04-10 12:41:39 -04:00
Matt Mackall
b154d8d029 revlog: deal with chunk ranges over 2G on Windows (issue4215)
Python uses a C long (32 bits on Windows 64) rather than an ssize_t in
read(), and thus has a 2G size limit. Work around this by falling back
to reading one chunk at a time on overflow. This approximately doubles
our headroom until we run back into the size limit on single reads.
2014-04-07 14:18:10 -05:00
Mads Kiilerich
de55a8fdda revlog: introduce commonancestors method for getting all common ancestor heads
2014-02-24 22:42:14 +01:00
Durham Goode
b147e53e3f revlog: move file writing to a separate function
Moves the code that actually writes to a file to a separate function in
revlog.py. This allows extensions to intercept and use the data being written to
disk. For example, an extension might want to replicate these writes elsewhere.

When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change.
It goes from 28.2 to 28.3 seconds.
2013-11-26 12:58:27 -08:00
Brodie Rao
43ab01245b revlog: allow tuning of the chunk cache size (via format.chunkcachesize)
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg --config format.chunkcachesize=1024 perfmoonwalk
! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5)

(16,154 cache hits, 3,840 misses.)

$ hg --config format.chunkcachesize=4096 perfmoonwalk
! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6)

(19,003 hits, 991 misses.)

$ hg --config format.chunkcachesize=16384 perfmoonwalk
! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6)

(19,746 hits, 248 misses.)

$ hg --config format.chunkcachesize=32768 perfmoonwalk
! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,870 hits, 124 misses.)

$ hg --config format.chunkcachesize=65536 perfmoonwalk
! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6)

(19,932 hits, 62 misses.)

$ hg --config format.chunkcachesize=131072 perfmoonwalk
! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6)

(19,963 hits, 31 misses.)

We may want to change the default size in the future based on testing and
user feedback.
2013-11-17 18:04:29 -05:00
Brodie Rao
976b336ba8 revlog: read/cache chunks in fixed windows of 64 KB
When reading a revlog chunk, instead of reading up to 64 KB ahead of the
request offset and caching that, this change caches a fixed window before
and after the requested data that falls on 64 KB boundaries. This increases
cache hits when reading revlogs backwards.

Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)

(Each run has 10,668 cache hits and 9,304 misses.)

After this change:

$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,931 cache hits, 62 misses.)

On a busy NFS share, before this change:

$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)

After:

$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)
2013-11-17 18:04:28 -05:00
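Illustration for 976b336ba8 above: a sketch of computing a read window that falls on fixed boundaries around the requested range. The function name and return shape are hypothetical, and the window size is assumed to be a power of two so that masking can be used.

```python
def cache_window(offset, length, windowsize=65536):
    """Return (start, size) of an aligned window covering the request."""
    if windowsize <= 0 or windowsize & (windowsize - 1):
        raise ValueError("windowsize must be a power of two")
    mask = ~(windowsize - 1)
    # Round the start down to the previous boundary.
    start = offset & mask
    # Round the end up to a boundary past the last requested byte.
    end = (offset + length + windowsize) & mask
    return start, end - start
```

Because the window starts at a boundary before the requested offset, a later request for slightly earlier data can land in the same cached window, which is what helps when a revlog is read backwards.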
Durham Goode
d8c96277e4 strip: add faster revlog strip computation
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top.  Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.

The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.

This makes amend on a large repo go from 8.7 seconds to 6 seconds.
2013-11-11 16:42:49 -08:00
Durham Goode
9840379432 revlog: return lazy set from findcommonmissing
When computing the commonmissing, it greedily computes the entire set
immediately. On a large repo where the majority of history is irrelevant, this
causes a significant slow down.

Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.
2013-11-11 16:40:02 -08:00
Matt Mackall
2513030d97 merge with stable
2013-09-23 11:37:06 -07:00
Wojciech Lopata
0a0c3321e2 generaldelta: initialize basecache properly
Previously basecache was incorrectly initialized before adding the first
revision from a changegroup. Basecache value influences when full revisions are
stored in revlog (when using generaldelta). As a result it was possible to
generate a generaldelta-revlog that could be bigger by arbitrary factor than its
non-generaldelta equivalent.
2013-09-20 10:45:51 -07:00
Siddharth Agarwal
64c41699fa revlog: remove _chunkbase since it is no longer used
This was introduced in 2011 for the lwcopy feature but never actually got used.
A similar hook can easily be reintroduced if needed in the future.
2013-09-06 23:05:33 -07:00
Siddharth Agarwal
bcd16c3892 revlog: move chunk cache preload from revision to _chunks
In case we don't have a cached text already, add the base rev to the list
passed to _chunks. In the cached case this also avoids unnecessarily preloading
the chunk for the cached rev.
2013-09-06 23:05:11 -07:00
Siddharth Agarwal
61f0e46895 revlog._chunks: inline getchunk
We do this in a somewhat hacky way, relying on the fact that our sole caller
preloads the cache right before calling us. An upcoming patch will make this
more sensible.

For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49
seconds to 0.46.
2013-09-06 22:57:51 -07:00
Siddharth Agarwal
b7abdca3c1 revlog.revision: fix cache preload for inline revlogs
Previously the length of data preloaded did not account for the interleaved io
contents. This meant that we'd sometimes have cache misses in _chunks despite
the preloading.

Having a correctly filled out cache will become essential in an upcoming patch.
2013-09-07 12:42:46 -07:00
Siddharth Agarwal
d24e970042 revlog: add a fast method for getting a list of chunks
This moves _chunkraw into the loop. Doing that improves revlog decompression --
in particular, manifest decompression -- significantly. For a 20 MB manifest
which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55
seconds to 0.49 seconds.
2013-09-06 16:31:35 -07:00
Wojciech Lopata
3a79365ccc revlog: pass node as an argument of addrevision
This change will allow revlog subclasses that override the 'checkhash' method
to use a custom strategy for computing nodeids without overriding the
'addrevision' method. In particular, this change is necessary to implement
manifest compression.
2013-08-19 11:25:23 -07:00
Wojciech Lopata
299c718f66 revlog: extract 'checkhash' method
Extract the method that decides whether a nodeid is correct for a particular
revision text and parent nodes. Having this method extracted will allow revlog
subclasses to implement a custom way of computing nodes. In particular, this
change is necessary to implement manifest compression.
2013-08-19 11:06:38 -07:00
Matt Mackall
06155d5c8a revlog: handle hidden revs in _partialmatch (issue3979)
Looking up hidden prefixes could cause a "no node" exception.
Looking up unique non-hidden prefixes could be ambiguous.
2013-07-23 17:28:12 -05:00
Durham Goode
e750755eba revlog: add exception when linkrev == nullrev
When we deployed the latest crew mercurial to our users, a few of them
had issues where a filelog would have an entry with a -1 linkrev. This
caused operations like rebase and amend to create a bundle containing the
entire repository, which took a long time.

I don't know what the issue is, but adding this check should prevent repos
from getting in this state, and should help us pinpoint the issue next time
it happens.
2013-06-17 19:44:00 -07:00
Sune Foldager
6bd4fdfe9d bundle-ng: move group into the bundler
No additional semantic changes made.
2013-05-10 21:03:01 +02:00
Alexander Plavin
48936f264c revlog: fix a regression with null revision
Introduced in the patch that fixes issue3497.
Part of that patch was submitted erroneously and shouldn't be in the code.
2013-04-18 16:46:09 +04:00