Commit Graph

459 Commits

Author SHA1 Message Date
Mads Kiilerich
fae32dd0a3 comments: describe ancestor consistently - avoid 'least common ancestor'
"best" is as defined by mercurial.ancestor.ancestors: furthest from a root (as
measured by longest path).
2014-08-19 01:13:10 +02:00
Mads Kiilerich
b46e11faa6 revlog: introduce isancestor method for efficiently determining node lineage
Hide the not so obvious use of commonancestorsheads.
2014-08-19 01:13:10 +02:00
Matt Mackall
9bc396577d repoview: fix 0L with pack/unpack for 2.4 2014-08-26 13:11:53 +02:00
Matt Mackall
dad18b6446 revlog: fix check-code error 2014-06-14 11:49:02 -05:00
Matt Mackall
6faaeed973 revlog: hold a private reference to self._cache
This keeps other threads from modifying self._cache out from under us.
With this and the previous fix, 'hg serve' survives 100k hits with siege.
2014-06-13 14:17:14 -05:00
Matt Mackall
6c57e49897 revlog: make _chunkcache access atomic
With this and related fixes, 'hg serve' survived 100000 hits with
siege.
2014-06-13 14:16:03 -05:00
Mads Kiilerich
cb0184290d revlog: backout 1c95c1863327 - commonancestors 2014-04-17 20:01:39 +02:00
Mads Kiilerich
1478c563ea revlog: introduce commonancestorsheads method
Very similar to commonancestors but giving all the common ancestors heads.
2014-04-17 20:01:35 +02:00
Matt Mackall
4610a4e481 merge with stable 2014-04-10 12:41:39 -04:00
Matt Mackall
b154d8d029 revlog: deal with chunk ranges over 2G on Windows (issue4215)
Python uses a C long (32 bits on Windows 64) rather than an ssize_t in
read(), and thus has a 2G size limit. Work around this by falling back
to reading one chunk at a time on overflow. This approximately doubles
our headroom until we run back into the size limit on single reads.
2014-04-07 14:18:10 -05:00
Mads Kiilerich
de55a8fdda revlog: introduce commonancestors method for getting all common ancestor heads 2014-02-24 22:42:14 +01:00
Durham Goode
b147e53e3f revlog: move file writing to a separate function
Moves the code that actually writes to a file to a separate function in
revlog.py. This allows extensions to intercept and use the data being written to
disk. For example, an extension might want to replicate these writes elsewhere.

When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change.
It goes from 28.2 to 28.3 seconds.
2013-11-26 12:58:27 -08:00
Brodie Rao
43ab01245b revlog: allow tuning of the chunk cache size (via format.chunkcachesize)
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg --config format.chunkcachesize=1024 perfmoonwalk
! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5)

(16,154 cache hits, 3,840 misses.)

$ hg --config format.chunkcachesize=4096 perfmoonwalk
! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6)

(19,003 hits, 991 misses.)

$ hg --config format.chunkcachesize=16384 perfmoonwalk
! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6)

(19,746 hits, 248 misses.)

$ hg --config format.chunkcachesize=32768 perfmoonwalk
! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,870 hits, 124 misses.)

$ hg --config format.chunkcachesize=65536 perfmoonwalk
! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6)

(19,932 hits, 62 misses.)

$ hg --config format.chunkcachesize=131072 perfmoonwalk
! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6)

(19,963 hits, 31 misses.)

We may want to change the default size in the future based on testing and
user feedback.
2013-11-17 18:04:29 -05:00
Brodie Rao
976b336ba8 revlog: read/cache chunks in fixed windows of 64 KB
When reading a revlog chunk, instead of reading up to 64 KB ahead of the
request offset and caching that, this change caches a fixed window before
and after the requested data that falls on 64 KB boundaries. This increases
cache hits when reading revlogs backwards.

Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)

(Each run has 10,668 cache hits and 9,304 misses.)

After this change:

$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,931 cache hits, 62 misses.)

On a busy NFS share, before this change:

$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)

After:

$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)
2013-11-17 18:04:28 -05:00
Durham Goode
d8c96277e4 strip: add faster revlog strip computation
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top.  Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.

The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.

This makes amend on a large repo go from 8.7 seconds to 6 seconds.
2013-11-11 16:42:49 -08:00
Durham Goode
9840379432 revlog: return lazy set from findcommonmissing
When computing the commonmissing, it greedily computes the entire set
immediately. On a large repo where the majority of history is irrelevant, this
causes a significant slow down.

Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.
2013-11-11 16:40:02 -08:00
Matt Mackall
2513030d97 merge with stable 2013-09-23 11:37:06 -07:00
Wojciech Lopata
0a0c3321e2 generaldelta: initialize basecache properly
Previously basecache was incorrectly initialized before adding the first
revision from a changegroup. Basecache value influences when full revisions are
stored in revlog (when using generaldelta). As a result it was possible to
generate a generaldelta-revlog that could be bigger by arbitrary factor than its
non-generaldelta equivalent.
2013-09-20 10:45:51 -07:00
Siddharth Agarwal
64c41699fa revlog: remove _chunkbase since it is no longer used
This was introduced in 2011 for the lwcopy feature but never actually got used.
A similar hook can easily be reintroduced if needed in the future.
2013-09-06 23:05:33 -07:00
Siddharth Agarwal
bcd16c3892 revlog: move chunk cache preload from revision to _chunks
In case we don't have a cached text already, add the base rev to the list
passed to _chunks. In the cached case this also avoids unnecessarily preloading
the chunk for the cached rev.
2013-09-06 23:05:11 -07:00
Siddharth Agarwal
61f0e46895 revlog._chunks: inline getchunk
We do this in a somewhat hacky way, relying on the fact that our sole caller
preloads the cache right before calling us. An upcoming patch will make this
more sensible.

For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49
seconds to 0.46.
2013-09-06 22:57:51 -07:00
Siddharth Agarwal
b7abdca3c1 revlog.revision: fix cache preload for inline revlogs
Previously the length of data preloaded did not account for the interleaved io
contents. This meant that we'd sometimes have cache misses in _chunks despite
the preloading.

Having a correctly filled out cache will become essential in an upcoming patch.
2013-09-07 12:42:46 -07:00
Siddharth Agarwal
d24e970042 revlog: add a fast method for getting a list of chunks
This moves _chunkraw into the loop. Doing that improves revlog decompression --
in particular, manifest decompression -- significantly. For a 20 MB manifest
which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55
seconds to 0.49 seconds.
2013-09-06 16:31:35 -07:00
Wojciech Lopata
3a79365ccc revlog: pass node as an argument of addrevision
This change will allow revlog subclasses that override 'checkhash' method
to use custom strategy of computing nodeids without overriding 'addrevision'
method. In particular this change is necessary to implement manifest
compression.
2013-08-19 11:25:23 -07:00
Wojciech Lopata
299c718f66 revlog: extract 'checkhash' method
Extract method that decides whether nodeid is correct for paricular revision
text and parent nodes. Having this method extracted will allow revlog
subclasses to implement custom way of computing nodes. In particular this
change is necessary to implement manifest compression.
2013-08-19 11:06:38 -07:00
Matt Mackall
06155d5c8a revlog: handle hidden revs in _partialmatch (issue3979)
Looking up hidden prefixes could cause a no node exception
Looking up unique non-hidden prefixes could be ambiguous
2013-07-23 17:28:12 -05:00
Durham Goode
e750755eba revlog: add exception when linkrev == nullrev
When we deployed the latest crew mercurial to our users, a few of them
had issues where a filelog would have an entry with a -1 linkrev. This
caused operations like rebase and amend to create a bundle containing the
entire repository, which took a long time.

I don't know what the issue is, but adding this check should prevent repos
from getting in this state, and should help us pinpoint the issue next time
it happens.
2013-06-17 19:44:00 -07:00
Sune Foldager
6bd4fdfe9d bundle-ng: move group into the bundler
No additional semantic changes made.
2013-05-10 21:03:01 +02:00
Alexander Plavin
48936f264c revlog: fix a regression with null revision
Introduced in the patch which fixes issue3497
Part of that patch was erroneously submitted and it shouldn't be in the code
2013-04-18 16:46:09 +04:00
Alexander Plavin
829cf92d16 log: fix behavior with empty repositories (issue3497)
Make output in this special case consistent with general case one.
2013-04-17 00:29:54 +04:00
Bryan O'Sullivan
512383d40e revlog: don't cross-check ancestor result against Python version 2013-04-16 10:08:20 -07:00
Bryan O'Sullivan
c6b9f1099d parsers: a C implementation of the new ancestors algorithm
The performance of both the old and new Python ancestor algorithms
depends on the number of revs they need to traverse.  Although the
new algorithm performs far better than the old when revs are
numerically and topologically close, both algorithms become slow
under other circumstances, taking up to 1.8 seconds to give answers
in a Linux kernel repo.

This C implementation of the new algorithm is a fairly straightforward
transliteration.  The only corner case of interest is that it raises
an OverflowError if the number of GCA candidates found during the
first pass is greater than 24, to avoid the dual perils of fixnum
overflow and trying to allocate too much memory.  (If this exception
is raised, the Python implementation is used instead.)

Performance numbers are good: in a Linux kernel repo, time for "hg
debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943)
is as follows:

  Old Python: 0.36 sec
  New Python: 0.42 sec
  New C: 0.02 sec

For a case where the new algorithm should perform well:

  Old Python: 1.84 sec
  New Python: 0.07 sec
  New C: measures as zero when using --time

(This commit includes a paranoid cross-check to ensure that the
Python and C implementations give identical answers. The above
performance numbers were measured with that check disabled.)
2013-04-16 10:08:20 -07:00
Bryan O'Sullivan
59b785a485 revlog: choose a consistent ancestor when there's a tie
Previously, we chose a rev based on numeric ordering, which could
cause "the same merge" in topologically identical but numerically
different repos to choose different merge bases.

We now choose the lexically least node; this is stable across
different revlog orderings.
2013-04-16 10:08:19 -07:00
Bryan O'Sullivan
4a3a46aff6 ancestor: a new algorithm that is faster for nodes near tip
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).

In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.

Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.

For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.

Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
2013-04-16 10:08:18 -07:00
Benoit Boissinot
41300c28e0 revlog: document v0 format 2013-02-09 12:08:02 +01:00
Siddharth Agarwal
4d560304bb revlog: move ancestor generation out to a new class
This refactoring is to prepare for implementing lazy membership.
2012-12-18 10:14:01 -08:00
Siddharth Agarwal
cca6ff3076 revlog: remove incancestors since it is no longer used 2012-12-17 15:08:37 -08:00
Siddharth Agarwal
6d5198a5a3 revlog.ancestors: add support for including revs
This is in preparation for an upcoming refactoring. This also fixes a bug in
incancestors, where if an element of revs was an ancestor of another it would
be generated twice.
2012-12-17 15:13:51 -08:00
Pierre-Yves David
9ac120f569 revlog: allow reverse iteration with revlog.revs
We often need to perform rev iteration in reverse order. This
changeset makes it possible to do so, in order to avoid costly reverse
or reversed() calls later.
2012-11-21 00:42:05 +01:00
Siddharth Agarwal
b05b94a300 revlog: add rev-specific variant of findmissing
This will be used by rebase in an upcoming commit.
2012-11-26 10:48:24 -08:00
Siddharth Agarwal
76a23a18f8 revlog: switch findmissing to use ancestor.missingancestors
This also speeds up other commands that use findmissing, like
incoming and merge --preview. With a large linear repository (>400000
commits) and with one incoming changeset, incoming is sped up from
around 4-4.5 seconds to under 3.
2012-11-26 11:02:48 -08:00
Durham Goode
cce0517fb6 commit: increase perf by avoiding unnecessary filteredrevs check
When commiting to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
2012-11-16 15:39:12 -08:00
Pierre-Yves David
6d6a3d27a5 clfilter: split revlog.headrevs C call from python code
Make the pure python implementation of headrevs available to derived classes. It
is important because filtering logic applied by `revlog` derived class won't
have effect on `index`. We want to be able to bypass this C call to implement
our own.
2012-09-03 14:19:45 +02:00
Pierre-Yves David
23fb63d637 clfilter: handle non contiguous iteration in revlov.headrevs
This prepares changelog level filtering.  We can't assume that any revision can
be heads because filtered revisions need to be excluded.

New algorithm:
- All revisions now start as "non heads",
- every revision we iterate over is made candidate head,
- parents of iterated revisions are definitely not head.

Filtered revisions are never iterated over and never considered as candidate
head.
2012-09-03 14:12:45 +02:00
Pierre-Yves David
6981326b92 clfilter: make the revlog class responsible of all its iteration
This prepares changelog level filtering. We need the algorithms used in revlog to
work on a subset of revisions.  To achieve this, the use of explicit range of
revision is banned. `range` and `xrange` calls are replaced by a `revlog.irevs`
method. Filtered super class can then overwrite the `irevs` method to filter out
revision.
2012-09-20 19:00:59 +02:00
Mads Kiilerich
2f4504e446 fix trivial spelling errors 2012-08-15 22:38:42 +02:00
Matt Mackall
5b06da939f backout 94ae81a4e338
This may have allowed unbounded I/O sizes with the current chunk
retrieval code.
2012-07-12 14:20:34 -05:00
Martin Geisler
c52341ae3f merge with main 2012-07-12 10:03:50 +02:00
Friedrich Kastner-Masilko
a6245a11d3 revlog: fix for generaldelta distance calculation
The decision whether or not to store a full snapshot instead of a delta is done
based on the distance value calculated in _addrevision.builddelta(rev).

This calculation traditionally used the fact of deltas only using the previous
revision as base. Generaldelta mechanism is changing this, yet the calculation
still assumes that current-offset minus chainbase-offset equals chain-length.
This appears to be wrong.

This patch corrects the calculation by means of using the chainlength function
if Generaldelta is used.
2012-07-11 12:38:42 +02:00
Bryan O'Sullivan
26f2c363fd revlog: make compress a method
This allows an extension to optionally use a new compression type based
on the options applied by the repo to the revlog's opener.

(decompress doesn't need the same treatment, as it can be replaced using
extensions.wrapfunction, and can figure out which compression algorithm
is in use based on the first byte of the compressed payload.)
2012-06-25 13:56:13 -07:00