Commit Graph

587 Commits

Author SHA1 Message Date
Jordi Gutiérrez Hermoso
211712bc95 revlog: raise an exception earlier if an entry is too large (issue4675)
Before we were relying on _pack to error out when trying to pass an
integer that was too large for the "i" format specifier. Now we check
this earlier so we can form a better error message.

The error message unfortunately must exclude the filename at this
level of the call stack. The problem is that this name is not
available here, and the error can be triggered by a large manifest or
by a large file itself. Although perhaps we could provide the name of
a revlog index file (from the revlog object, instead of the revlogio
object), this seems like too much leakage of internal data structures.
It's not ideal already that an error message even mentions revlogs,
but this does seem unavoidable here.
2015-06-02 15:04:39 -04:00
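The early size check described in 211712bc95 might look roughly like the sketch below. The names `pack_length`, `EntryTooLargeError` and `_MAX_ENTRY_SIZE` are hypothetical and are not taken from the Mercurial source; only the use of the signed 32-bit "i" format specifier comes from the commit message.

```python
import struct

# Largest value a signed 32-bit "i" field can hold.
_MAX_ENTRY_SIZE = 0x7FFFFFFF


class EntryTooLargeError(ValueError):
    """Hypothetical error type used only in this sketch."""


def pack_length(length):
    # Check the bound up front so the caller gets a readable message
    # instead of a low-level struct.error from the packing step.
    if length > _MAX_ENTRY_SIZE:
        raise EntryTooLargeError(
            "revlog entry of %d bytes exceeds the %d byte limit"
            % (length, _MAX_ENTRY_SIZE)
        )
    return struct.pack(">i", length)
```

As the message notes, a check at this level cannot name the affected file, so the error text stays generic.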
Laurent Charignon
e470f84c03 phases: fix bug where native phase computation wasn't called
I forgot to include this change in a previous diff, so the native code to
compute the phases was never called. The AttributeError was silently caught and
the pure implementation was used instead.
2015-05-29 14:24:50 -07:00
Martin von Zweigbergk
4127f67694 util: drop alias for collections.deque
Now that util.deque is just an alias for collections.deque, let's just
remove it.
2015-05-16 11:28:04 -07:00
Mike Edgar
c2ecfc164f revlog: make converting from inline to non-inline work after a strip
The checkinlinesize function, which converts inline revlogs to non-inline,
uses the current transaction's "data" field to determine how to update the
transaction after the conversion.

This change works around the missing data field, which is not in the
transaction after a strip.
2015-03-25 15:58:31 -04:00
Laurent Charignon
6b06793e68 phase: default to C implementation for phase computation 2015-03-20 11:14:27 -07:00
Mike Edgar
93a70657fb revlog: addgroup checks if incoming deltas add censored revs, sets flag bit
A censored revision stored in a revlog should have the censored revlog index
flag bit set. This implies we must know if a revision is censored before we
add it to the revlog. When adding revisions from exchanged deltas, we would
prefer to determine this flag without decoding every single full text.

This change introduces a heuristic based on assumptions around the Mercurial
delta format and filelog metadata. Since deltas which produce a censored
revision must be full-replacement deltas, we can read the delta's first bytes
to check the filelog metadata. Since "censored" is the alphabetically first
filelog metadata key, censored filelog revisions have a well-known prefix we
can look for.

For more on the design and background of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-14 15:16:08 -05:00
Mike Edgar
3562a06261 revlog: _addrevision creates full-replace deltas based on censored revisions
A delta against a censored revision is either received through exchange and
written blindly to a revlog, or it is created by the revlog itself. This
change ensures the latter process creates deltas which fully replace all
data in a censored base using a single patch operation.

Recipients of a delta against a censored base will verify that the delta is in
this full-replace format. Other recipients will use the delta as normal.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-21 17:11:37 -05:00
Mike Edgar
0a639adee5 revlog: special case expanding full-replacement deltas received by exchange
When a delta received through exchange is added to a revlog, it will very
often be expanded to a full text by applying the delta to its base. If
that delta is of a particular form, we can avoid decoding the base revision.
This avoids an exception if the base revision is censored.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-02-06 01:38:16 +00:00
Mike Edgar
9635f8c5b0 revlog: in addgroup, reject ill-formed deltas based on censored nodes
To ensure interoperability when clones disagree about which file nodes are
censored, a restriction is made on deltas based on censored nodes. Any such
delta must replace the full text of the base in a single patch.

If the recipient of a delta considers the base to be censored and the delta
is not in the expected form, the recipient must reject it, as it can't know
if the source has also censored the base.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-02-06 00:55:29 +00:00
Mike Edgar
c736894d9c revlog: add "iscensored()" to revlog public API
The iscensored method will be used by the exchange layer to reject
nonconforming deltas involving censored revisions (and to produce
conforming deltas).

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-23 17:01:39 -05:00
Yuya Nishihara
237120f282 revlog: add __contains__ for fast membership test
Because revlog implements __iter__, "rev in revlog" works but unexpectedly does
a silly O(n) lookup. So it seems good to add a fast version of __contains__.

This allows "rev in repo.changelog" in the next patch.
2015-02-04 21:25:57 +09:00
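The behaviour behind 237120f282 is a general Python rule: when a class defines `__iter__` but not `__contains__`, the `in` operator falls back to iterating. The sketch below uses hypothetical `TinyLog` and `FastLog` classes as stand-ins for a revlog; it is not the actual revlog code.

```python
class TinyLog(object):
    """Hypothetical stand-in for a revlog holding revisions 0..n-1."""

    def __init__(self, count):
        self._count = count
        self.iterations = 0  # counts how many revisions were visited

    def __len__(self):
        return self._count

    def __iter__(self):
        for rev in range(self._count):
            self.iterations += 1
            yield rev


class FastLog(TinyLog):
    def __contains__(self, rev):
        # Constant-time range check; __iter__ is never consulted.
        return 0 <= rev < len(self)
```

With `TinyLog(1000)`, testing `999 in log` walks every revision, while the same test on `FastLog(1000)` visits none.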
Mike Edgar
39ea63d3f6 revlog: verify censored flag when hashing added revision fulltext
When receiving a delta via exchange, three possible storage outcomes emerge:

1. The delta is added directly to the revlog. ("fast-path")
2. A freshly-computed delta with a different base is stored.
3. The new revision's fulltext is computed and stored outright.

Both (2) and (3) require materializing the full text of the new revision by
applying the delta to its base. This is typically followed by a hash check.

The new flags argument allows callers to _addrevision to signal that they
expect that hash check to fail. We can use this opportunity to verify that
expectation. If the hash fails, require the flag be set; if the hash passes,
require the flag be unset.

Rather than simply eliding the hash check, this approach provides some
assurance that the censored flag is not applied to valid revisions.

Read more at: http://mercurial.selenic.com/wiki/CensorPlan
2015-01-12 14:41:25 -05:00
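The two-way check described in 39ea63d3f6 can be sketched as follows. The hashing scheme (SHA-1 over the sorted parents followed by the text) and the flag value are assumptions made for this illustration, and `check_added_revision` is a hypothetical name.

```python
import hashlib

NULLID = b"\0" * 20
FLAG_CENSORED = 1 << 15  # illustrative bit value, an assumption


def node_hash(text, p1, p2):
    # Assumed scheme: hash the sorted parents, then the text.
    lo, hi = sorted([p1, p2])
    return hashlib.sha1(lo + hi + text).digest()


def check_added_revision(text, p1, p2, node, flags):
    """Raise ValueError when the hash result and the flag disagree."""
    hash_ok = node_hash(text, p1, p2) == node
    censored = bool(flags & FLAG_CENSORED)
    if hash_ok and censored:
        # A revision that verifies must not carry the censored flag.
        raise ValueError("censored flag set on a revision that verifies")
    if not hash_ok and not censored:
        # A failing hash is only acceptable for flagged revisions.
        raise ValueError("integrity check failed")
```

Checking both directions is what gives the assurance mentioned above: the flag cannot be used to skip verification of a valid revision.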
Mike Edgar
4b3ca8d71c revlog: add flags argument to _addrevision, update callers to use default flags
For revlog index flags to be useful to other parts of Mercurial, they need to
be settable when writing revisions. The current use case for revlog index
flags is the censorship feature: http://mercurial.selenic.com/wiki/CensorPlan

While the censor flag could be inferred in _addrevision by interrogating the
text/delta being added, that would bury the censorship logic and
inappropriately couple it to all revision creation.
2015-01-12 14:30:24 -05:00
Mike Edgar
339a2739c2 revlog: define censored flag for revlogng index
This flag bit will be used to cheaply signal censorship presence to upper
layers (exchange, verify). It indicates that censorship metadata is present
but does not attest to the verifiability of that metadata.

For the censorship design, see: http://mercurial.selenic.com/wiki/CensorPlan
2015-01-12 14:01:52 -05:00
Siddharth Agarwal
2d669c474b revlog: switch findmissing* methods to incrementalmissingrevs
This will allow us to remove ancestor.missingancestors in an upcoming patch.
2014-11-14 16:52:40 -08:00
Siddharth Agarwal
5692148f49 revlog: add a method to get missing revs incrementally
This will turn out to be useful for discovery.
2014-11-16 00:39:48 -08:00
Siddharth Agarwal
7103eb28ea ancestor.lazyancestors: take parentrevs function rather than changelog
Principle of least privilege, and it also brings this in line with
missingancestors.
2014-11-14 14:36:25 -08:00
Siddharth Agarwal
8354d9169f revlog: cache chain info after calculating it for a rev (issue4452)
This dumb cache works surprisingly well: on a repository with typical delta
chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs
and manifests only) went from 60 seconds to 3.
2014-11-13 21:36:38 -08:00
Siddharth Agarwal
1acd4cfca4 revlog: increase I/O bound to 4x the amount of data consumed
This doesn't affect normal clones since they'd be bound by the CPU bound below
anyway -- it does, however, improve generaldelta clones significantly.

This also results in better deltaing for generaldelta clones -- in generaldelta
clones, we calculate deltas with respect to the closest base if it has a higher
revision number than either parent. If the base is on a significantly different
branch, this can result in pointlessly massive deltas. This reduces the number
of bases and hence the number of bad deltas.

Empirically, for a highly branchy repository, this resulted in an improvement
of around 15% to manifest size.
2014-11-11 20:08:19 -08:00
Siddharth Agarwal
fe51051ee5 revlog: bound based on the length of the compressed deltas
This is only relevant for generaldelta clones.
2014-11-11 20:01:19 -08:00
Siddharth Agarwal
27976ad2dc revlog: compute length of compressed deltas along with chain length
In upcoming patches to the revlog, we're going to split up the notions of
bounding I/O and bounding CPU.
2014-11-11 19:54:36 -08:00
Siddharth Agarwal
6e115e5383 revlog: store fulltext when compressed delta is bigger than it
This is a very silly case and not particularly likely to happen in the wild,
but it turns out we can hit it in a couple of places. As we tune the storage
parameters we're likely to hit more such cases.

The affected test cases all have smaller revlogs now.
2014-11-11 21:41:12 -08:00
Siddharth Agarwal
e5d387f47e revlog: make a predicate clearer with parens 2014-11-11 21:39:56 -08:00
Mateusz Kwapich
3433abb6a8 revlog: add config variable for limiting delta-chain length
The current heuristic for deciding between storing delta and full texts
is based on ratio of (sizeofdeltas)/(sizeoffulltext).

In some cases (for example a manifest for a huge repo) this approach
can result in extremely long delta chains (~30,000) which are very slow to
read. (In the case of a manifest ~500ms are added to every hg command because of that).

This commit introduces "revlog.maxchainlength" configuration variable that will
limit delta chain length.
2014-11-06 14:20:05 -08:00
Mateusz Kwapich
1a554418d5 debugrevlog: fix computing chain length in debugrevlog -d
The chain length was computed correctly only when generaldelta
feature was enabled. Now it's fixed.

When generaldelta is disabled, the base revision in the revlog index is not
the revision the delta is computed against; that is always the previous revision.

Instead of incorrect chainbaseandlen in command.py we are now using two
single-responsibility functions in revlog.py:
 - chainbase(rev)
 - chainlen(rev)
Only chainlen(rev) was missing so it was written to mimic the way the
chain of deltas is actually found during file reconstruction.
2014-11-06 14:08:25 -08:00
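A chain-length walk that honours both storage modes, as 1a554418d5 describes, might be sketched like this. The flat `bases` list is a simplification chosen for the example (the real index holds more fields), and the function signature is an assumption.

```python
def chainlen(bases, rev, generaldelta):
    """Count the deltas applied to rebuild rev.

    bases[r] is the base recorded in the index for revision r; a
    revision whose recorded base is itself is stored as a full text.
    """
    length = 0
    while bases[rev] != rev:
        length += 1
        if generaldelta:
            # The index names the actual delta parent.
            rev = bases[rev]
        else:
            # The delta is always against the previous revision.
            rev -= 1
    return length
```

Without generaldelta, the walk steps back one revision at a time until it reaches the full text, which mirrors how the chain is followed during reconstruction.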
Mike Edgar
ba052f742a revlog: support importing censored file revision tombstones
This change allows a revision log to not fail integrity checks when applying a
changegroup delta (eg from a bundle) results in a censored file tombstone. The
tombstone is inserted as-is, so future integrity verification will observe the
tombstone. Deltas based on the tombstone will also remain correct.

The new code path is encountered for *exactly* the cases where _addrevision is
importing a tombstone from a changegroup. When committing a file containing
the "magic" tombstone text, the "text" parameter will be non-empty and the
checkhash call is not executed (and when committing, the node will be computed
to match the "magic" tombstone text).
2014-09-03 16:34:29 -04:00
Augie Fackler
b2278203f5 revlog: move references to revlog.hash to inside the revlog class
This will make it possible for subclasses to have different hashing
schemes when appropriate. I anticipate using this in manifests.

Note that there's still one client of mercurial.revlog.hash() outside
of revlog: mercurial.context.memctx uses it to construct the file
entries in an in-memory manifest. I don't think this will be a problem
in the immediate future, so I've left it as-is.
2014-09-24 15:14:44 -04:00
Augie Fackler
44b8666b93 revlog: mark nullhash as module-private
No other module should ever need this, so mark it with _ so nobody
tries to use it.
2014-09-24 15:10:52 -04:00
Mads Kiilerich
fae32dd0a3 comments: describe ancestor consistently - avoid 'least common ancestor'
"best" is as defined by mercurial.ancestor.ancestors: furthest from a root (as
measured by longest path).
2014-08-19 01:13:10 +02:00
Mads Kiilerich
b46e11faa6 revlog: introduce isancestor method for efficiently determining node lineage
Hide the not so obvious use of commonancestorsheads.
2014-08-19 01:13:10 +02:00
Matt Mackall
9bc396577d repoview: fix 0L with pack/unpack for 2.4 2014-08-26 13:11:53 +02:00
Matt Mackall
dad18b6446 revlog: fix check-code error 2014-06-14 11:49:02 -05:00
Matt Mackall
6faaeed973 revlog: hold a private reference to self._cache
This keeps other threads from modifying self._cache out from under us.
With this and the previous fix, 'hg serve' survives 100k hits with siege.
2014-06-13 14:17:14 -05:00
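The "private reference" idea in 6faaeed973 amounts to reading the shared attribute once into a local name. The sketch below uses a hypothetical `TextCache` class and assumes the cache is a tuple that is replaced wholesale rather than mutated in place.

```python
class TextCache(object):
    """Hypothetical single-entry cache of (node, rev, text)."""

    def __init__(self):
        self._cache = None

    def store(self, node, rev, text):
        # Rebind the attribute in one step; never mutate the old tuple.
        self._cache = (node, rev, text)

    def lookup(self, node):
        # Read the attribute once. Another thread may rebind
        # self._cache afterwards, but the local name still points at
        # the tuple fetched here, so the checks below stay consistent.
        cached = self._cache
        if cached is not None and cached[0] == node:
            return cached[2]
        return None
```

Reading `self._cache` twice (once for the node comparison, once for the text) is the pattern that can return text belonging to a different node when another thread stores in between.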
Matt Mackall
6c57e49897 revlog: make _chunkcache access atomic
With this and related fixes, 'hg serve' survived 100000 hits with
siege.
2014-06-13 14:16:03 -05:00
Mads Kiilerich
cb0184290d revlog: backout 1c95c1863327 - commonancestors 2014-04-17 20:01:39 +02:00
Mads Kiilerich
1478c563ea revlog: introduce commonancestorsheads method
Very similar to commonancestors but giving all the common ancestors heads.
2014-04-17 20:01:35 +02:00
Matt Mackall
4610a4e481 merge with stable 2014-04-10 12:41:39 -04:00
Matt Mackall
b154d8d029 revlog: deal with chunk ranges over 2G on Windows (issue4215)
Python uses a C long (32 bits on Windows 64) rather than an ssize_t in
read(), and thus has a 2G size limit. Work around this by falling back
to reading one chunk at a time on overflow. This approximately doubles
our headroom until we run back into the size limit on single reads.
2014-04-07 14:18:10 -05:00
Mads Kiilerich
de55a8fdda revlog: introduce commonancestors method for getting all common ancestor heads 2014-02-24 22:42:14 +01:00
Durham Goode
b147e53e3f revlog: move file writing to a separate function
Moves the code that actually writes to a file to a separate function in
revlog.py. This allows extensions to intercept and use the data being written to
disk. For example, an extension might want to replicate these writes elsewhere.

When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change.
It goes from 28.2 to 28.3 seconds.
2013-11-26 12:58:27 -08:00
Brodie Rao
43ab01245b revlog: allow tuning of the chunk cache size (via format.chunkcachesize)
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg --config format.chunkcachesize=1024 perfmoonwalk
! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5)

(16,154 cache hits, 3,840 misses.)

$ hg --config format.chunkcachesize=4096 perfmoonwalk
! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6)

(19,003 hits, 991 misses.)

$ hg --config format.chunkcachesize=16384 perfmoonwalk
! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6)

(19,746 hits, 248 misses.)

$ hg --config format.chunkcachesize=32768 perfmoonwalk
! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,870 hits, 124 misses.)

$ hg --config format.chunkcachesize=65536 perfmoonwalk
! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6)

(19,932 hits, 62 misses.)

$ hg --config format.chunkcachesize=131072 perfmoonwalk
! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6)

(19,963 hits, 31 misses.)

We may want to change the default size in the future based on testing and
user feedback.
2013-11-17 18:04:29 -05:00
Brodie Rao
976b336ba8 revlog: read/cache chunks in fixed windows of 64 KB
When reading a revlog chunk, instead of reading up to 64 KB ahead of the
request offset and caching that, this change caches a fixed window before
and after the requested data that falls on 64 KB boundaries. This increases
cache hits when reading revlogs backwards.

Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)

(Each run has 10,668 cache hits and 9,304 misses.)

After this change:

$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,931 cache hits, 62 misses.)

On a busy NFS share, before this change:

$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)

After:

$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)
2013-11-17 18:04:28 -05:00
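The fixed-window read from 976b336ba8 can be approximated with a little bit masking. This is a sketch under the assumption that the window size is a power of two; `cache_window` is a hypothetical helper, not a function from the revlog module.

```python
def cache_window(offset, length, window=65536):
    """Return (start, size) of the aligned read covering the request.

    window must be a power of two for the bit masking to work.
    """
    mask = ~(window - 1)
    # Round the start down to a window boundary.
    start = offset & mask
    # Round the end up to the first boundary strictly past the request.
    end = (offset + length + window) & mask
    return start, end - start
```

Because the cached region now also extends before the requested offset, a reader walking the file backwards tends to land inside a window that is already cached.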
Durham Goode
d8c96277e4 strip: add faster revlog strip computation
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top.  Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.

The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.

This makes amend on a large repo go from 8.7 seconds to 6 seconds.
2013-11-11 16:42:49 -08:00
Durham Goode
9840379432 revlog: return lazy set from findcommonmissing
When computing the commonmissing, it greedily computes the entire set
immediately. On a large repo where the majority of history is irrelevant, this
causes a significant slow down.

Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.
2013-11-11 16:40:02 -08:00
Matt Mackall
2513030d97 merge with stable 2013-09-23 11:37:06 -07:00
Wojciech Lopata
0a0c3321e2 generaldelta: initialize basecache properly
Previously basecache was incorrectly initialized before adding the first
revision from a changegroup. Basecache value influences when full revisions are
stored in revlog (when using generaldelta). As a result it was possible to
generate a generaldelta-revlog that could be bigger by arbitrary factor than its
non-generaldelta equivalent.
2013-09-20 10:45:51 -07:00
Siddharth Agarwal
64c41699fa revlog: remove _chunkbase since it is no longer used
This was introduced in 2011 for the lwcopy feature but never actually got used.
A similar hook can easily be reintroduced if needed in the future.
2013-09-06 23:05:33 -07:00
Siddharth Agarwal
bcd16c3892 revlog: move chunk cache preload from revision to _chunks
In case we don't have a cached text already, add the base rev to the list
passed to _chunks. In the cached case this also avoids unnecessarily preloading
the chunk for the cached rev.
2013-09-06 23:05:11 -07:00
Siddharth Agarwal
61f0e46895 revlog._chunks: inline getchunk
We do this in a somewhat hacky way, relying on the fact that our sole caller
preloads the cache right before calling us. An upcoming patch will make this
more sensible.

For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49
seconds to 0.46.
2013-09-06 22:57:51 -07:00
Siddharth Agarwal
b7abdca3c1 revlog.revision: fix cache preload for inline revlogs
Previously the length of data preloaded did not account for the interleaved io
contents. This meant that we'd sometimes have cache misses in _chunks despite
the preloading.

Having a correctly filled out cache will become essential in an upcoming patch.
2013-09-07 12:42:46 -07:00
Siddharth Agarwal
d24e970042 revlog: add a fast method for getting a list of chunks
This moves _chunkraw into the loop. Doing that improves revlog decompression --
in particular, manifest decompression -- significantly. For a 20 MB manifest
which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55
seconds to 0.49 seconds.
2013-09-06 16:31:35 -07:00
Wojciech Lopata
3a79365ccc revlog: pass node as an argument of addrevision
This change will allow revlog subclasses that override 'checkhash' method
to use custom strategy of computing nodeids without overriding 'addrevision'
method. In particular this change is necessary to implement manifest
compression.
2013-08-19 11:25:23 -07:00
Wojciech Lopata
299c718f66 revlog: extract 'checkhash' method
Extract method that decides whether nodeid is correct for particular revision
text and parent nodes. Having this method extracted will allow revlog
subclasses to implement custom way of computing nodes. In particular this
change is necessary to implement manifest compression.
2013-08-19 11:06:38 -07:00
Matt Mackall
06155d5c8a revlog: handle hidden revs in _partialmatch (issue3979)
Looking up hidden prefixes could cause a no node exception.
Looking up unique non-hidden prefixes could be ambiguous.
2013-07-23 17:28:12 -05:00
Durham Goode
e750755eba revlog: add exception when linkrev == nullrev
When we deployed the latest crew mercurial to our users, a few of them
had issues where a filelog would have an entry with a -1 linkrev. This
caused operations like rebase and amend to create a bundle containing the
entire repository, which took a long time.

I don't know what the issue is, but adding this check should prevent repos
from getting in this state, and should help us pinpoint the issue next time
it happens.
2013-06-17 19:44:00 -07:00
Sune Foldager
6bd4fdfe9d bundle-ng: move group into the bundler
No additional semantic changes made.
2013-05-10 21:03:01 +02:00
Alexander Plavin
48936f264c revlog: fix a regression with null revision
Introduced in the patch which fixes issue3497
Part of that patch was erroneously submitted and it shouldn't be in the code
2013-04-18 16:46:09 +04:00
Alexander Plavin
829cf92d16 log: fix behavior with empty repositories (issue3497)
Make output in this special case consistent with general case one.
2013-04-17 00:29:54 +04:00
Bryan O'Sullivan
512383d40e revlog: don't cross-check ancestor result against Python version 2013-04-16 10:08:20 -07:00
Bryan O'Sullivan
c6b9f1099d parsers: a C implementation of the new ancestors algorithm
The performance of both the old and new Python ancestor algorithms
depends on the number of revs they need to traverse.  Although the
new algorithm performs far better than the old when revs are
numerically and topologically close, both algorithms become slow
under other circumstances, taking up to 1.8 seconds to give answers
in a Linux kernel repo.

This C implementation of the new algorithm is a fairly straightforward
transliteration.  The only corner case of interest is that it raises
an OverflowError if the number of GCA candidates found during the
first pass is greater than 24, to avoid the dual perils of fixnum
overflow and trying to allocate too much memory.  (If this exception
is raised, the Python implementation is used instead.)

Performance numbers are good: in a Linux kernel repo, time for "hg
debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943)
is as follows:

  Old Python: 0.36 sec
  New Python: 0.42 sec
  New C: 0.02 sec

For a case where the new algorithm should perform well:

  Old Python: 1.84 sec
  New Python: 0.07 sec
  New C: measures as zero when using --time

(This commit includes a paranoid cross-check to ensure that the
Python and C implementations give identical answers. The above
performance numbers were measured with that check disabled.)
2013-04-16 10:08:20 -07:00
Bryan O'Sullivan
59b785a485 revlog: choose a consistent ancestor when there's a tie
Previously, we chose a rev based on numeric ordering, which could
cause "the same merge" in topologically identical but numerically
different repos to choose different merge bases.

We now choose the lexically least node; this is stable across
different revlog orderings.
2013-04-16 10:08:19 -07:00
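Why a node-based tie-break is stable while a revision-number tie-break is not (59b785a485) can be shown with two toy clones. The functions and data below are hypothetical; `pick_by_rev` is just one possible numeric rule, shown for contrast.

```python
def pick_by_node(candidates):
    """candidates maps a local revision number to a binary node id."""
    # Node ids are the same in every clone, so the minimum is stable.
    return min(candidates.values())


def pick_by_rev(candidates):
    # One possible numeric rule, shown only for contrast.
    return candidates[max(candidates)]


# The same two tied ancestors, numbered differently in two clones.
clone_a = {3: b"\xaa" * 20, 5: b"\xbb" * 20}
clone_b = {4: b"\xbb" * 20, 6: b"\xaa" * 20}
```

Here `pick_by_node` returns the same node for both clones, whereas `pick_by_rev` returns different nodes because the local numbering differs.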
Bryan O'Sullivan
4a3a46aff6 ancestor: a new algorithm that is faster for nodes near tip
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).

In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.

Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.

For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.

Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
2013-04-16 10:08:18 -07:00
Benoit Boissinot
41300c28e0 revlog: document v0 format 2013-02-09 12:08:02 +01:00
Siddharth Agarwal
4d560304bb revlog: move ancestor generation out to a new class
This refactoring is to prepare for implementing lazy membership.
2012-12-18 10:14:01 -08:00
Siddharth Agarwal
cca6ff3076 revlog: remove incancestors since it is no longer used 2012-12-17 15:08:37 -08:00
Siddharth Agarwal
6d5198a5a3 revlog.ancestors: add support for including revs
This is in preparation for an upcoming refactoring. This also fixes a bug in
incancestors, where if an element of revs was an ancestor of another it would
be generated twice.
2012-12-17 15:13:51 -08:00
Pierre-Yves David
9ac120f569 revlog: allow reverse iteration with revlog.revs
We often need to perform rev iteration in reverse order. This
changeset makes it possible to do so, in order to avoid costly reverse
or reversed() calls later.
2012-11-21 00:42:05 +01:00
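Reverse iteration as described in 9ac120f569 could be sketched as a range helper that flips its step when the start is above the stop. The standalone function and its `length` parameter are assumptions made for this example; in the commit this is a method on the revlog.

```python
def revs(length, start=0, stop=None):
    """Iterate revisions from start to stop, both inclusive.

    When stop is None, iterate from start to the end of the log.
    """
    step = 1
    if stop is not None:
        if start > stop:
            step = -1
        # Make the stop bound inclusive in either direction.
        stop += step
    else:
        stop = length
    return range(start, stop, step)
```

A caller wanting newest-first order can then write `revs(n, n - 1, 0)` instead of building a list and reversing it.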
Siddharth Agarwal
b05b94a300 revlog: add rev-specific variant of findmissing
This will be used by rebase in an upcoming commit.
2012-11-26 10:48:24 -08:00
Siddharth Agarwal
76a23a18f8 revlog: switch findmissing to use ancestor.missingancestors
This also speeds up other commands that use findmissing, like
incoming and merge --preview. With a large linear repository (>400000
commits) and with one incoming changeset, incoming is sped up from
around 4-4.5 seconds to under 3.
2012-11-26 11:02:48 -08:00
Durham Goode
cce0517fb6 commit: increase perf by avoiding unnecessary filteredrevs check
When committing to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
2012-11-16 15:39:12 -08:00
Pierre-Yves David
6d6a3d27a5 clfilter: split revlog.headrevs C call from python code
Make the pure python implementation of headrevs available to derived classes. It
is important because filtering logic applied by `revlog` derived class won't
have effect on `index`. We want to be able to bypass this C call to implement
our own.
2012-09-03 14:19:45 +02:00
Pierre-Yves David
23fb63d637 clfilter: handle non contiguous iteration in revlog.headrevs
This prepares changelog level filtering.  We can't assume that any revision can
be heads because filtered revisions need to be excluded.

New algorithm:
- All revisions now start as "non heads",
- every revision we iterate over is made candidate head,
- parents of iterated revisions are definitely not head.

Filtered revisions are never iterated over and never considered as candidate
head.
2012-09-03 14:12:45 +02:00
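The three-step head computation listed in 23fb63d637 can be sketched with sets. This illustrates the logic only; the function signature and the use of sets instead of an array are choices made for the example.

```python
def headrevs(revs, parentrevs):
    """Return the heads among revs, an iterable of unfiltered revisions.

    parentrevs(rev) returns the parent revisions of rev, with -1
    standing for "no parent".
    """
    candidates = set()  # everything starts out as "not a head"
    nonheads = set()
    for rev in revs:
        # Every revision we iterate over becomes a candidate head.
        candidates.add(rev)
        # Its parents are definitely not heads.
        for parent in parentrevs(rev):
            if parent != -1:
                nonheads.add(parent)
    return sorted(candidates - nonheads)
```

Filtered revisions never appear in `revs`, so they are never candidates, and the parents they would have disqualified can become heads.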
Pierre-Yves David
6981326b92 clfilter: make the revlog class responsible for all its iteration
This prepares changelog level filtering. We need the algorithms used in revlog to
work on a subset of revisions.  To achieve this, the use of explicit revision
ranges is banned. `range` and `xrange` calls are replaced by a `revlog.irevs`
method. Filtered super class can then override the `irevs` method to filter out
revisions.
2012-09-20 19:00:59 +02:00
Mads Kiilerich
2f4504e446 fix trivial spelling errors 2012-08-15 22:38:42 +02:00
Matt Mackall
5b06da939f backout 94ae81a4e338
This may have allowed unbounded I/O sizes with the current chunk
retrieval code.
2012-07-12 14:20:34 -05:00
Martin Geisler
c52341ae3f merge with main 2012-07-12 10:03:50 +02:00
Friedrich Kastner-Masilko
a6245a11d3 revlog: fix for generaldelta distance calculation
The decision whether or not to store a full snapshot instead of a delta is done
based on the distance value calculated in _addrevision.builddelta(rev).

This calculation traditionally used the fact of deltas only using the previous
revision as base. Generaldelta mechanism is changing this, yet the calculation
still assumes that current-offset minus chainbase-offset equals chain-length.
This appears to be wrong.

This patch corrects the calculation by means of using the chainlength function
if Generaldelta is used.
2012-07-11 12:38:42 +02:00
Bryan O'Sullivan
26f2c363fd revlog: make compress a method
This allows an extension to optionally use a new compression type based
on the options applied by the repo to the revlog's opener.

(decompress doesn't need the same treatment, as it can be replaced using
extensions.wrapfunction, and can figure out which compression algorithm
is in use based on the first byte of the compressed payload.)
2012-06-25 13:56:13 -07:00
Joshua Redstone
09130c5cf2 revlog: remove reachable and switch call sites to ancestors
This change does a trivial conversion of call sites to ancestors.
Follow-on diffs will switch the call sites over to revs.
2012-06-08 08:39:44 -07:00
Joshua Redstone
70aeee0070 revlog: add incancestors, a version of ancestors that includes revs listed
ancestors() returns the ancestors of revs provided. This func is like
that except it also includes the revs themselves in the total set of
revs generated.
2012-06-08 07:59:37 -07:00
Thomas Arendsen Hein
01adf7776d merge heads 2012-06-07 15:55:12 +02:00
Brad Hall
f20c06750f revlog: zlib.error sent to the user (issue3424)
Give the user the zlib error message instead of a backtrace when decompression
fails.
2012-06-04 14:46:42 -07:00
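Turning a raw `zlib.error` into a user-facing error, as f20c06750f describes, might look like the sketch below. `RevlogError` is defined locally as a stand-in and the message wording is an assumption.

```python
import zlib


class RevlogError(Exception):
    """Local stand-in for the error type shown to the user."""


def decompress(data):
    try:
        return zlib.decompress(data)
    except zlib.error as exc:
        # Keep zlib's own explanation, but raise an error type that the
        # command layer can report without printing a backtrace.
        raise RevlogError("revlog decompress error: %s" % exc)
```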
Joshua Redstone
e38b770424 revlog: add optional stoprev arg to revlog.ancestors()
This will be used as a step in removing reachable() in a future diff.
Doing it now because bryano is in the process of rewriting ancestors in
C.  This depends on bryano's patch to replace *revs with revs in the
declaration of revlog.ancestors.
2012-06-01 15:44:13 -07:00
Bryan O'Sullivan
141bd09daa revlog: descendants(*revs) becomes descendants(revs) (API)
Once again making the API more rational, as with ancestors.
2012-06-01 12:45:16 -07:00
Bryan O'Sullivan
6ba97b40c1 revlog: ancestors(*revs) becomes ancestors(revs) (API)
Accepting a variable number of arguments as the old API did is
deeply ugly, particularly as it means the API can't be extended
with new arguments.  Partly as a result, we have at least three
different implementations of the same ancestors algorithm (!?).

Most callers were forced to call ancestors(*somelist), adding to
both inefficiency and ugliness.
2012-06-01 12:37:18 -07:00
Bryan O'Sullivan
abdf4a8227 util: subclass deque for Python 2.4 backwards compatibility
It turns out that Python 2.4's deque type is lacking a remove method.
We can't implement remove in terms of find, because it doesn't have
find either.
2012-06-01 17:05:31 -07:00
Bryan O'Sullivan
bef5b61512 cleanup: use the deque type where appropriate
There have been quite a few places where we pop elements off the
front of a list.  This can turn O(n) algorithms into something more
like O(n**2).  Python has provided a deque type that can do this
efficiently since at least 2.4.

As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
2012-05-15 10:46:23 -07:00
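The pattern behind bef5b61512 is a work queue consumed from the front. The sketch below is a generic breadth-first ancestor walk, not code from the repository; `walk_ancestors` and the `parents` mapping are hypothetical.

```python
from collections import deque


def walk_ancestors(start, parents):
    """Breadth-first walk; parents maps rev -> list of parent revs."""
    seen = set([start])
    visit = deque([start])
    order = []
    while visit:
        # popleft() is O(1); list.pop(0) shifts every remaining item.
        rev = visit.popleft()
        order.append(rev)
        for parent in parents.get(rev, ()):
            if parent not in seen:
                seen.add(parent)
                visit.append(parent)
    return order
```

With a plain list, each `pop(0)` costs time proportional to the queue length, which is how an otherwise linear walk drifts toward quadratic behaviour.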
Bryan O'Sullivan
a49ea963d7 revlog: switch to a C version of headrevs
The C implementation is more than 100 times faster than the Python
version (which is still available as a fallback).

In a repo with 330,000 revs and a stale .hg/cache/tags file, this
patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.
2012-05-19 19:44:58 -07:00
Matt Mackall
42c30757a2 revlog: don't handle long for revision matching
The underlying C code doesn't support indexing by longs, there are no
legitimate reasons to use a long, and longs should generally be
converted to ints at a higher level by context's constructor.
2012-05-21 16:36:09 -05:00
Brodie Rao
a7ef0a0cc5 cleanup: "not x in y" -> "x not in y" 2012-05-12 16:00:57 +02:00
Bryan O'Sullivan
058dfb801d revlog: speed up prefix matching against nodes
The radix tree already contains all the information we need to
determine whether a short string is an unambiguous node identifier.
We now make use of this information.

In a kernel tree, this improves the performance of
"hg log -q -r24bf01de75" from 0.27 seconds to 0.06.
2012-05-12 10:55:08 +02:00
Matt Mackall
a97dbbe308 revlog: backout df8c4d732869
This regresses performance of 'hg branches', presumably because it's
visiting the revlog in the wrong order. This suggests we either need
to fix the branch code or add some read-behind to mitigate the effect.
2012-04-27 13:07:29 -05:00
Patrick Mezard
e9454c243f revlog: fix partial revision() docstring (from f4a6c9197dbd) 2012-04-13 10:14:59 +02:00
Matt Mackall
a6546db90e revlog: drop some unneeded rev.node calls in revdiff 2012-04-13 22:55:46 -05:00
Bryan O'Sullivan
62554752c6 revlog: avoid an expensive string copy
This showed up in a statprof profile of "hg svn rebuildmeta", which
is read-intensive on the changelog.  This two-line patch improved
the performance of that command by 10%.
2012-04-12 20:26:33 -07:00
Matt Mackall
4e0b41f193 revlog: increase readahead size 2012-04-13 21:35:48 -05:00
Bryan O'Sullivan
dc46676e81 parsers: use base-16 trie for faster node->rev mapping
This greatly speeds up node->rev lookups, with results that are
often user-perceptible: for instance, "hg --time log" of the node
associated with rev 1000 on a linux-2.6 repo improves from 0.3
seconds to 0.03.  I have not found any instances of slowdowns.

The new perfnodelookup command in contrib/perf.py demonstrates the
speedup more dramatically, since it performs no I/O.  For a single
lookup, the new code is about 40x faster.

These changes also prepare the ground for the possibility of further
improving the performance of prefix-based node lookups.
2012-04-12 14:05:59 -07:00
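The base-16 trie from dc46676e81 is implemented in C. The following is only a rough Python sketch of the same idea, under the assumption that nodes are equal-length hex strings. `NodeTrie`, `insert`, and `lookup` are hypothetical names. Each level branches on one hex digit, and a leaf is stored at the shallowest depth where it is unambiguous, which is also what makes the prefix lookups of 058dfb801d cheap.

```python
class NodeTrie:
    """Sketch of a radix-16 trie mapping hex node ids to revision numbers."""

    def __init__(self):
        self._root = {}

    def insert(self, hexnode, rev):
        level = self._root
        for i, digit in enumerate(hexnode):
            entry = level.get(digit)
            if entry is None:
                level[digit] = (hexnode, rev)  # store as a leaf
                return
            if isinstance(entry, dict):
                level = entry  # descend into an internal node
                continue
            oldnode, _oldrev = entry
            if oldnode == hexnode:
                level[digit] = (hexnode, rev)  # update existing leaf
                return
            # Two nodes share this prefix: push the old leaf one level down.
            child = {oldnode[i + 1]: entry}
            level[digit] = child
            level = child
        raise ValueError("nodes must all have the same length")

    def lookup(self, prefix):
        """Return rev for a prefix, None if absent, LookupError if ambiguous."""
        level = self._root
        for digit in prefix:
            entry = level.get(digit)
            if entry is None:
                return None
            if isinstance(entry, dict):
                level = entry
                continue
            hexnode, rev = entry
            return rev if hexnode.startswith(prefix) else None
        # Prefix ended on an internal node, so several nodes match it.
        raise LookupError("ambiguous prefix: %r" % prefix)
```

A lookup touches at most one dictionary per hex digit of the prefix, independent of how many nodes are stored.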
Matt Mackall
055cba03a8 revlog: allow retrieving contents by revision number 2012-04-08 12:38:02 -05:00
Matt Mackall
30645d82e7 revlog: add hasnode helper method 2012-04-07 15:43:18 -05:00
Pierre-Yves David
15ab7ccd15 revlog: make addgroup return a list of nodes contained in the added source
This list will contain every node seen in the source, not only the added ones.
This is intended to allow phases to be moved according to what was pushed by the
client, not only what was added.
2012-01-13 01:29:03 +01:00
Pierre-Yves David
a51dc67424 revlog: improve docstring for findcommonmissing 2012-01-09 04:15:31 +01:00
Steven Brown
3ebdb5ed19 revlog: clarify strip docstring "readd" -> "re-add"
I misread it as "read".
2012-01-10 22:35:25 +08:00
Matt Mackall
864ce9da04 misc: adding missing file close() calls
Spotted by Victor Stinner <victor.stinner@haypocalc.com>
2011-11-03 11:24:55 -05:00
Greg Ward
bc1dfb1ac9 atomictempfile: make close() consistent with other file-like objects.
The usual contract is that close() makes your writes permanent, so
atomictempfile's use of close() to *discard* writes (and rename() to
keep them) is rather unexpected. Thus, change it so close() makes
things permanent and add a new discard() method to throw them away.
discard() is only used internally, in __del__(), to ensure that writes
are discarded when an atomictempfile object goes out of scope.

I audited mercurial.*, hgext.*, and ~80 third-party extensions, and
found no one using the existing semantics of close() to discard
writes, so this should be safe.
2011-08-25 20:21:04 -04:00
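The contract introduced by bc1dfb1ac9 can be sketched roughly as below. This is not Mercurial's atomictempfile. `AtomicTempFile` is a hypothetical class built on the standard `tempfile.mkstemp` and `os.replace`, shown only to make the close/discard semantics concrete.

```python
import os
import tempfile


class AtomicTempFile:
    """Write to a temp file; close() publishes it, discard() drops it."""

    def __init__(self, name, mode="w+b"):
        self._name = name
        dirname = os.path.dirname(os.path.abspath(name))
        # Create the temp file next to the target so the rename stays
        # on one filesystem.
        fd, self._tempname = tempfile.mkstemp(prefix=".tmp-", dir=dirname)
        self._fp = os.fdopen(fd, mode)

    def write(self, data):
        self._fp.write(data)

    def close(self):
        """Make the writes permanent by renaming over the target."""
        if not self._fp.closed:
            self._fp.close()
            os.replace(self._tempname, self._name)

    def discard(self):
        """Throw the writes away and remove the temp file."""
        if not self._fp.closed:
            self._fp.close()
            os.unlink(self._tempname)

    def __del__(self):
        # An object that goes out of scope without close() loses its writes.
        if hasattr(self, "_fp"):
            self.discard()
```

With this shape, `close()` behaves like it does on other file-like objects, and forgetting to call it leaves the target file untouched.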
Augie Fackler
ea2e868e0f revlog: use getattr instead of hasattr 2011-07-25 15:43:55 -05:00
Matt Mackall
1b52b02896 check-code: catch misspellings of descendant
This word is fairly common in Mercurial, and easy to misspell.
2011-06-07 17:02:54 -05:00
Sune Foldager
7db447dd4c revlog: bail out earlier in group when we have no chunks 2011-06-03 20:32:54 +02:00
Martin Geisler
af8a35e078 check-code: flag 0/1 used as constant Boolean expression 2011-06-01 12:38:46 +02:00
Matt Mackall
66805ccfed revlog: stop exporting node.short 2011-05-21 15:01:28 -05:00
Matt Mackall
a6f2ad6f1e revlog: drop base() again
deltaparent does what's needed, and more "portably".
2011-05-18 17:05:30 -05:00
Sune Foldager
9a73f9bed3 revlog: linearize created changegroups in generaldelta revlogs
This greatly improves the speed of the bundling process, and often reduces the
bundle size considerably. (Although if the repository is already ordered, this
has little effect on both time and bundle size.)

For non-generaldelta clients, the reduced bundle size translates to a reduced
repository size, similar to shrinking the revlogs (which uses the exact same
algorithm). For generaldelta clients the difference is minor.

When the new bundle format comes, reordering will not be necessary since we
can then store the deltaparent relationships directly. The eventual default
behavior for clients and servers is presented in the table below, where "new"
implies support for GD as well as the new bundle format:

                    old client                    new client
old server          old bundle, no reorder        old bundle, no reorder
new server, non-GD  old bundle, no reorder[1]     old bundle, no reorder[2]
new server, GD      old bundle, reorder[3]        new bundle, no reorder[4]

[1] reordering is expensive on the server in this case, skip it
[2] client can choose to do its own redelta here
[3] reordering is needed because otherwise the pull does a lot of extra
    work on the server
[4] reordering isn't needed because client can get deltabase in bundle
    format

Currently, the default is to reorder on GD-servers, and not otherwise. A new
setting, bundle.reorder, has been added to override the default reordering
behavior. It can be set to either 'auto' (the default), or any true or false
value as a standard boolean setting, to either force the reordering on or off
regardless of generaldelta.


Some timing data from a relatively branchy test repository follows. All
bundling is done with --all --type none options.

Non-generaldelta, non-shrunk repo:
-----------------------------------
Size: 276M

Without reorder (default):
Bundle time: 14.4 seconds
Bundle size: 939M

With reorder:
Bundle time: 1 minute, 29.3 seconds
Bundle size: 381M

Generaldelta, non-shrunk repo:
-----------------------------------
Size: 87M

Without reorder:
Bundle time: 2 minutes, 1.4 seconds
Bundle size: 939M

With reorder (default):
Bundle time: 25.5 seconds
Bundle size: 381M
2011-05-18 23:26:26 +02:00
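How the bundle.reorder setting from 9a73f9bed3 resolves to a decision can be sketched as below. `should_reorder` is a hypothetical helper, and the accepted boolean spellings are an assumption about what "a standard boolean setting" allows. The rule itself is the one from the message: 'auto' follows generaldelta, anything else forces reordering on or off.

```python
# Assumed spellings for boolean settings; treat these sets as illustrative.
_TRUE = {"1", "yes", "true", "on", "always"}
_FALSE = {"0", "no", "false", "off", "never"}


def should_reorder(setting, generaldelta):
    """Decide whether to reorder a changegroup.

    setting is the raw bundle.reorder value (None means unset);
    generaldelta says whether the repository uses generaldelta revlogs.
    """
    value = (setting or "auto").lower()
    if value == "auto":
        # Default: reorder on generaldelta servers, and not otherwise.
        return generaldelta
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError("bundle.reorder is not 'auto' or a boolean: %r" % setting)
```

For the timings quoted above, the default corresponds to `should_reorder(None, False)` for the non-generaldelta repository and `should_reorder(None, True)` for the generaldelta one.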
Sune Foldager
c222fc4662 changelog: don't use generaldelta 2011-05-16 13:06:48 +02:00
Sune Foldager
d7f01e602b revlog: get rid of defversion
defversion was a property (later option) on the store opener, used to propagate
the changelog revlog format to the other revlogs, so they would be created with
the same format.

This required that the changelog instance was created before any other revlog;
an invariant that wasn't directly enforced (or documented) anywhere.

We now use the revlogv1 requirement instead, which is transferred to the store
opener options. If this option is missing, v0 revlogs are created.
2011-05-16 12:44:34 +02:00
Matt Mackall
608041d55e revlog: restore the base method 2011-05-15 11:50:15 -05:00
Sune Foldager
2ce60e2564 revlog: improve delta generation heuristics for generaldelta
Without this change, pulls (and clones) into a generaldelta repository could
generate very inefficient revlogs, the size of which could be at least twice
the original size.

This was caused by the generated delta chains covering too large distances,
causing new chains to be built far too often. This change addresses the
problem by forcing a delta against second parent or against the previous
revision, when the first parent delta is in danger of creating a long chain.
2011-05-12 15:24:33 +02:00
Sune Foldager
7b30600f6b revlog: fix bug in chainbase cache
The bug didn't cause corruption, and thus wasn't caught in hg verify or in
tests. It could lead to delta chains longer than normally allowed, by
affecting the code that decides when to add a full revision. This could,
in turn, lead to performance regression.
2011-05-12 13:47:17 +02:00
Sune Foldager
762090a2c7 revlog: add docstring to _addrevision 2011-05-11 11:04:44 +02:00
Sune Foldager
1c7dece034 revlog: support writing generaldelta revlogs
With generaldelta switched on, deltas are always computed against the first
parent when adding revisions. This is done regardless of what revision the
incoming bundle, if any, is deltaed against.

The exact delta building strategy is subject to change, but this will not
affect compatibility.

Generaldelta is switched off by default.
2011-05-08 21:32:33 +02:00
Sune Foldager
8bdf02181a revlog: support reading generaldelta revlogs
Generaldelta is a new revlog global flag. When it's turned on, the base field
of each revision entry holds the deltaparent instead of the base revision of
the current delta chain.

This allows for great potential flexibility when generating deltas, as any
revision can serve as deltaparent. Previously, the deltaparent for revision r
was hardcoded to be r - 1.

The base revision of the delta chain can still be accessed as before, since it
is now computed in an iterative fashion, following the deltaparents backwards.
2011-05-07 22:40:17 +02:00
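The reinterpretation of the base field in 8bdf02181a can be sketched with a toy index. Here `index` is an assumed plain list in which `index[rev]` holds only the base field, and the function names mirror the concepts in the messages rather than the real revlog methods. A revision whose base field points at itself is a full snapshot, which 750dcd7b48 treats as a delta against nullrev.

```python
NULLREV = -1


def deltaparent(index, rev, generaldelta):
    """Return the revision that rev is stored as a delta against."""
    base = index[rev]
    if base == rev:
        return NULLREV  # full snapshot, conceptually a delta against nullrev
    if generaldelta:
        return base  # base field holds the delta parent directly
    return rev - 1  # classic layout: always a delta against the previous rev


def chainbase(index, rev):
    """Follow base fields backwards until reaching a full snapshot."""
    base = index[rev]
    while base != rev:
        rev = base
        base = index[rev]
    return base
```

The same loop works for classic revlogs, where it stops after a single step because the base field already names the chain base.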
Sune Foldager
88485e9322 revlog: calculate base revisions iteratively
This is in preparation for generaldelta, where the revlog entry base field is
reinterpreted as the deltaparent. For that reason we also rename the base
function to chainbase.

Without generaldelta, performance is unaffected, but generaldelta will suffer
from this in _addrevision, since delta chains will be walked repeatedly.
A cache has been added to eliminate this problem completely.
2011-05-07 22:40:14 +02:00
Sune Foldager
be6386433b revlog: remove the last bits of punched/shallow
Most of it was removed in fa05c723ac8c, but a few pieces were accidentally
left behind.
2011-05-07 22:37:40 +02:00
Martin Geisler
d04646b8d9 revlog: use real Booleans instead of 0/1 in nodesbetween 2011-05-06 12:09:20 +02:00
Sune Foldager
750dcd7b48 revlog: compute correct deltaparent in the deltaparent function
It now returns nullrev for chain base revisions, since they are conceptually
deltas against nullrev. The revdiff function was updated accordingly.
2011-05-05 18:05:24 +02:00
Sune Foldager
bb96ed66fc revlog: remove support for punched/shallow
The feature was never finished, and there has been restructuring going on
since it was added.
2011-05-05 12:46:02 +02:00
Sune Foldager
d959ff1e97 revlog: remove support for parentdelta
We will introduce a more powerful and general delta concept instead,
called generaldelta.
2011-05-05 12:55:12 +02:00
Peter Arrenbrecht
75fa0e5ea9 discovery: add new set-based discovery
Adds a new discovery method based on repeatedly sampling the still
undecided subset of the local node graph to determine the set of nodes
common to both the client and the server.

For small differences between client and server, it uses about the same
or slightly fewer roundtrips than the old tree-based discovery. For
larger differences, it typically reduces the number of roundtrips
drastically (from 150 to 4, for instance).

The old discovery code now lives in treediscovery.py, the new code is
in setdiscovery.py.

Still missing is a hook for extensions to contribute nodes to the
initial sample. For instance, Augie's remotebranches could contribute
the last known state of the server's heads.

Credits for the actual sampler and computing common heads instead of
bases go to Benoit Boissinot.
2011-05-02 19:21:30 +02:00
Benoit Boissinot
b805aced54 unbundler: separate delta and header parsing
Add header parsing for changelog and manifest (currently there are no headers;
this might change for the next-gen bundle).
2011-04-30 19:01:24 +02:00
Benoit Boissinot
e3152ec807 changegroup: new bundler API 2011-04-30 11:03:28 +02:00
Benoit Boissinot
c5f5260aea bundler: make parsechunk return the base revision of the delta 2011-04-30 10:00:41 +02:00
Sune Foldager
9b847e3562 revlog: introduce _chunkbase to allow filelog to override
Used by revlog.revision to retrieve the base-chunk in a delta chain.
2011-04-30 16:33:47 +02:00
Alexander Solovyov
0eb3836642 remove unused imports and variables 2011-04-30 13:59:14 +02:00
Matt Mackall
1fb0b59ceb changegroup: introduce bundler objects
This makes the bundler pluggable at lower levels.
2011-03-31 15:24:06 -05:00
Matt Mackall
c9e7d5507f changegroup: add revlog to the group callback 2011-03-28 11:18:56 -05:00
Matt Mackall
d9e86660be changegroup: move sorting down into group 2011-03-28 11:18:56 -05:00
Matt Mackall
f94b6206a0 changegroup: combine infocollect and lookup callbacks 2011-03-28 11:18:56 -05:00
Matt Mackall
af08071ace changegroup: drop unused fullrev
This is unfinished and unused and complicates expanding the wire protocol.
2011-03-24 17:16:30 -05:00
Matt Mackall
f689cccd2c revlog: change variable name to avoid reuse 2011-03-26 17:12:02 -05:00
Peter Arrenbrecht
6646f48826 wireproto: add getbundle() function
getbundle(common, heads) -> bundle

Returns the changegroup for all ancestors of heads which are not ancestors of common. For both
sets, the heads are included in the set.

Intended to eventually supersede changegroupsubset and changegroup. Uses heads of common region
to exclude unwanted changesets instead of bases of desired region, which is more useful and
easier to implement.

Designed to be extensible with new optional arguments (which will have to be guarded by
corresponding capabilities).
2011-03-23 16:02:11 +01:00
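The set that getbundle(common, heads) describes in 6646f48826 can be sketched over a toy graph. The `parents` mapping and both helper names are hypothetical. The sketch only computes which changesets would be selected and does not build a changegroup.

```python
def ancestors_inclusive(parents, heads):
    """Return heads plus all of their ancestors.

    parents is an assumed mapping: node -> tuple of parent nodes.
    """
    seen = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def bundle_nodes(parents, common, heads):
    """Ancestors of heads (inclusive) that are not ancestors of common."""
    return ancestors_inclusive(parents, heads) - ancestors_inclusive(
        parents, common
    )
```

Because the exclusion is expressed through heads of the common region, the caller never has to name the bases of the wanted region.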
Dan Villiom Podlaski Christiansen
ec590d5cd4 explicitly close files
Add missing calls to close() to many places where files are
opened. Relying on reference counting to catch them soon-ish is not
portable and fails in environments with a proper GC, such as PyPy.
2010-12-24 15:23:01 +01:00
Matt Mackall
f549de8060 revlog: remove stray test in rev() 2011-01-21 16:26:01 -06:00
Matt Mackall
856c224de7 revlog: pass rev to _checkhash 2011-01-18 15:55:48 -06:00
Matt Mackall
275d2d9cb0 revlog: incrementally build node cache with linear searches
This avoids needing to prime the cache for operations like verify
which visit most or all of the index.
2011-01-18 15:55:46 -06:00
Benoit Boissinot
8acffa3308 revlog: explicit test and explicit variable names 2011-01-16 12:25:46 +01:00
Benoit Boissinot
3ada8fe22e revlog: if the nodemap is set, use the fast version of revlog.rev() 2011-01-16 12:24:48 +01:00
Benoit Boissinot
383d62511b revlog/parseindex: construct the nodemap if it is empty 2011-01-15 15:06:53 +01:00
Benoit Boissinot
4072e97b7c revlog: always add the magic nullid/nullrev entry in parseindex 2011-01-15 13:02:19 +01:00
Benoit Boissinot
b75c111431 revlog/parseindex: no need to pass the file around 2011-01-15 15:04:58 +01:00
Matt Mackall
1e3dbac7f5 revlog: do revlog node->rev mapping by scanning
Now that the nodemap is lazily created, we use linear scanning back
from tip for typical node to rev mapping. Given that nodemap creation
is O(n log n) and revisions searched for are usually very close to
tip, this is often a significant performance win for a small number of
searches.

When we do end up building a nodemap for bulk lookups, the scanning
function is replaced with a hash lookup.
2011-01-11 21:52:03 -06:00
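The lookup strategy from 1e3dbac7f5 can be sketched as below. `NodeLookup` and its method names are hypothetical, and `nodes` is an assumed list of node ids ordered by revision number. A lookup scans backwards from tip until a node map has been built, after which it becomes a hash lookup.

```python
class NodeLookup:
    """Sketch of node -> rev mapping: scan from tip, or use a built map."""

    def __init__(self, nodes):
        self._nodes = nodes
        self._nodemap = None  # built lazily, only for bulk lookups

    def rev(self, node):
        if self._nodemap is not None:
            return self._nodemap[node]  # hash lookup once the map exists
        # Revisions searched for are usually close to tip, so scan backwards.
        for rev in range(len(self._nodes) - 1, -1, -1):
            if self._nodes[rev] == node:
                return rev
        raise LookupError("no such node: %r" % (node,))

    def buildnodemap(self):
        """Build the full map once many lookups are expected."""
        self._nodemap = {node: rev for rev, node in enumerate(self._nodes)}
```

For a handful of lookups near tip the scan avoids the up-front cost of building the map over the whole index.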
Matt Mackall
a1c37f5749 revlog: introduce a cache for partial lookups
Partial lookups are always O(n), and often we look up the same
one multiple times.
2011-01-11 17:12:32 -06:00
Matt Mackall
846d35e24f revlog: only build the nodemap on demand 2011-01-11 17:01:04 -06:00