We unify the delta selection process to be a simple three options process:
- try to use the incoming delta (if lazydeltabase is on)
- try to find a suitable parents to delta against (if gd is on)
- try to delta against the tipmost revision
The first of this option that yield a valid delta will be used.
The test change in 'test-generaldelta.t' show this behavior as we use a delta
against the parent instead of a full delta when the incoming delta is not
suitable.
This as some impact on 'test-bundle.t' because a delta somewhere changes. It
does not seems to change the test semantic and have been ignored.
The old code have multiple explicit tests and code duplications. This makes it
hard to improve the code. We rewrite the logic in a more generic way, not
changing anything of the computed result.
The final goal here is to eventually be able to:
- factor out the default fallback case "try against 'prev'" in a single place
- allow 'lazydeltabase' case to use the smarter general delta code path when
the incoming base does not provide us with a good delta.
Before this change, the 'lazydeltabase' would blindly build a delta using the
base provided by the incoming bundle and try to use it. If that base was far
down the revlog, the delta would be seen as "no good" and we would fall back to
a full text revision.
We now check if the delta is good and fallback to a computing a delta again the
tipmost revision otherwise (as we would do without general delta).
Later changesets will improve the logic to compute the fallback delta using the
general delta logic.
We would like to be able to exit the delta generation block without a valid
delta (for a more flexible control flow). So we make sure we do not expand the
"delta" content unless we actually have a delta.
We can do it one level lower because 'delta' is initialised at None anyway. Not
adding a level to the assignment prevent a line length issue.
The new convention is that bundles contain changegroups. bundle1
happens to *only* be a changegroup, but bundle2 is a more featureful
container that isn't something you can pass to addgroup().
The current behavior of revlogs is to flush the data file when writing
data to it. Tracing system calls revealed that changegroup processing
incurred numerous write(2) calls for values much smaller than the
default buffer size (Python defaults to 4096, but it can be adjusted
based on detected block size at run time by CPython).
The reason we flush revlogs is so readers have all data available.
For example, the current code in revlog.py will re-open the revlog
file (instead of seeking an existing file handle) to read the text
of a revision. This happens when starting a new delta chain when
adding several revisions from changegroups, for example. Yes, this
is likely sub-optimal (we should probably be sharing file descriptors
between readers and writers to avoid the flushing and associated
overhead of re-opening files).
While flushing revlogs is necessary, it appears all callers are
diligent about flushing files before a read is performed (see
buildtext() in _addrevision()), making the flush in
_writeentry() redundant and unncessary. So, we remove it. In practice,
this means we incur a write(2) a) when the buffer is full (typically
4096 bytes) b) when a new delta chain is created rather than after
every added revision. This applies to every revlog, but by volume
it mostly impacts filelogs.
Removing the redundant flush from _writeentry() significantly
reduces the number of write(2) calls during changegroup processing on
my Linux machine. When applying a changegroup of the hg repo based on
my local repo, the total number of write(2) calls during application
of the mercurial/localrepo.py revlogs dropped from 1,320 to 217 with
this patch applied. Total I/O related system calls dropped from 1,577
to 474.
When unbundling a mozilla-central gzipped bundle (264,403 changesets
with 1,492,215 changes to 222,507 files), total write(2) calls
dropped from 1,252,881 to 827,106 and total system calls dropped from
3,601,259 to 3,178,636 - a reduction of 425,775!
While the system call reduction is significant, it appears
to have no impact on wall time on my Linux and Windows machines. Still,
fewer syscalls is fewer syscalls. Surely this can't hurt. If nothing
else, it makes examining remaining system call usage simpler and opens
the door to experimenting with the performance impact of different
buffer sizes.
_addrevision() may need to read from revlogs as part of computing
deltas. Previously, we would flush existing file handles and open
a new, short-lived file handle to perform the reading.
If we have an existing file handle, it seems logical to reuse it
for reading instead of opening a new file handle. This patch
makes that the new behavior.
After this patch, revlog files are only reopened when adding
revisions if the revlog is switched from inline to non-inline.
On Linux when unbundling a bundle of the mozilla-central repo, this
patch has the following impact on system call counts:
Call Before After Delta
write 827,639 673,390 -154,249
open 700,103 684,089 -16,014
read 74,489 74,489 0
fstat 493,924 461,896 -32,028
close 249,131 233,117 -16,014
stat 242,001 242,001 0
lstat 18,676 18,676 0
lseek 20,268 20,268 0
ioctl 14,652 13,173 -1,479
TOTAL 3,180,758 2,930,679 -250,079
It's worth noting that many of the open() calls fail due to missing
files. That's why there are many more open() calls than close().
Despite the significant system call reduction, this change does not
seem to have a significant performance impact on Linux.
On Windows 10 (not a VM, on a SSD), this patch appears to reduce
unbundle time for mozilla-central from ~960s to ~920s. This isn't
as significant as I was hoping. But a decrease it is nonetheless.
Still, Windows unbundle performance is still >2x slower than Linux.
Despite the lack of significant gains, fewer system calls is fewer
system calls. If nothing else, this will narrow the focus of potential
areas to optimize in the future.
An upcoming patch will teach revlogs to use the existing file
handle to read revision data instead of opening a new file handle
just for quick reads. For this to work, files must be opened for
reading as well.
This patch is merely cosmetic: there are no behavior changes.
Currently, the low-level revlog reading code always opens a new file
handle. In some key scenarios, the revlog is already opened and an
existing file handle could be used to read. This patch paves the
road to that by teaching various revlog reading functions to accept
an optional existing file handle to read from.
revlog instances can cache the full text of a single revision. Typically
the most recently read revision is cached.
When adding a delta group via addgroup() and _addrevision(), the
full text isn't always computed: sometimes only the passed in delta is
sufficient for adding a new revision to the revlog.
When writing the changelog from a delta group, the just-added full
text revision is always read immediately after it is written because
the changegroup code needs to extract the set of files from the entry.
In other words, revision() is *always* being called and caching the full
text of the just-added revision is guaranteed to result in a cache hit,
making the cache worthwhile.
This patch adds support to _addrevision() for always building and
caching the full text. This option is currently only active when
processing changelog entries from a changegroup.
While the total number of revision() calls is the same, the location
matters: buildtext() calls into revision() on the base revision when
building the full text of the just-added revision. Since the previous
revision's _addrevision() built the full text and the the previous
revision is likely the base revision, this means that the base
revision's full text is likely cached and can be used to compute the
current full text from just a delta. No extra I/O required.
The end result is the changelog isn't opened and read after adding every
revision from a changegroup.
On my 2013 MacBook Pro running OS X 10.10.5 from an SSD and Python 2.7,
this patch impacted the time taken to apply ~262,000 changesets from a
mozilla-central gzip bundle:
before: ~43s
after: ~32s
~25% reduction in changelog processing times. Not bad.
The purpose of this code was to provide thread safety. With the
conversion of hgweb to use separate localrepository instances per
request/thread, we should no longer have any consumers that need to
access revlog instances from multiple threads. Remove the code.
This adds an option for delta'ing against both p1 and p2 when applying merge
revisions and picking whichever is smallest.
Some before and after stats on manifest.d size:
internal large repo:
before: 1.2 GB
after: 930 MB
mozilla-central:
before: 261 MB
after: 92 MB
The old generaldelta heuristic was "if p1 (or p2) was closer than the last full text,
use it, otherwise use prev". This was problematic when a repo contained multiple
branches that were very different. If commits to branch A were pushed, and the
last full text was branch B, it would generate a fulltext. Then if branch B was
pushed, it would generate another fulltext. The problem is that the last
fulltext (and delta'ing against `prev` in general) has no correlation with the
contents of the incoming revision, and therefore will always have degenerate
cases.
According to the blame, that algorithm was chosen to minimize the chain length.
Since there is already code that protects against that (the delta-vs-fulltext
code), and since it has been improved since the original generaldelta algorithm
went in (2011), I believe the chain length criteria will still be preserved.
The new algorithm always diffs against p1 (or p2 if it's closer), unless the
resulting delta will fail the delta-vs-fulltext check, in which case we delta
against prev.
Some before and after stats on manifest.d size.
internal large repo
old heuristic - 2.0 GB
new heuristic - 1.2 GB
mozilla-central
old heuristic - 242 MB
new heuristic - 261 MB
The regression in mozilla central is due to the new heuristic choosing p2r as
the delta when it's closer to the tip. Switching the algorithm to always prefer
p1r brings the size back down (242 MB). This is result of the way in which
mozilla does merges and pushes, and the result could easily swing the other
direction in other repos (depending on if they merge X into Y or Y into X), but
will never be as degenerate as before.
I future patch will address the regression by introducing an optional, even more
aggressive delta heuristic which will knock the mozilla manifest size down
dramatically.
This moves the textlen calculation to be above the delta chooser. Since textlen
is needed for calling isgooddelta, we need it above the delta chooser so future
patches can call isgooddelta.
This moves the delta vs fulltext comparison to its own function. This will allow
us to reuse the function in future patches for more efficient delta choices. As
a side effect, this will also allow extensions to modify our delta criteria.
This option will make repositories created as general delta by default but will
not make Mercurial aggressively recompute deltas for all incoming bundle.
Instead, the delta contained in the bundle will be used. This will allow us to
start having general delta repositories created everywhere without triggering
massive recomputation costs for all new clients cloning from old servers.
A subsequent patch will add a feature that performs iterative
computation as changesets are added from a changegroup. To facilitate
this type of processing in a generic manner, we add a mechanism for
calling a function whenever a revision is added via revlog.addgroup().
There are potential performance concerns with this callback, as using it
will flush the revlog after every revision is added.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
Before we were relying on _pack to error out when trying to pass an
integer that was too large for the "i" format specifier. Now we check
this earlier so we can form a better error message.
The error message unfortunately must exclude the filename at this
level of the call stack. The problem is that this name is not
available here, and the error can be triggered by a large manifest or
by a large file itself. Although perhaps we could provide the name of
a revlog index file (from the revlog object, instead of the revlogio
object), this seems like too much leakage of internal data structures.
It's not ideal already that an error message even mentions revlogs,
but this does seem unavoidable here.
I forgot to include this change as a previous diff and the native code to
compute the phases was never called. The AttributeError was silently caught and
the pure implementation was used instead.
The checkinlinesize function, which converts inline revlogs to non-inline,
uses the current transaction's "data" field to determine how to update the
transaction after the conversion.
This change works around the missing data field, which is not in the
transaction after a strip.
A censored revision stored in a revlog should have the censored revlog index
flag bit set. This implies we must know if a revision is censored before we
add it to the revlog. When adding revisions from exchanged deltas, we would
prefer to determine this flag without decoding every single full text.
This change introduces a heuristic based on assumptions around the Mercurial
delta format and filelog metadata. Since deltas which produce a censored
revision must be full-replacement deltas, we can read the delta's first bytes
to check the filelog metadata. Since "censored" is the alphabetically first
filelog metadata key, censored filelog revisions have a well-known prefix we
can look for.
For more on the design and background of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
A delta against a censored revision is either received through exchange and
written blindly to a revlog, or it is created by the revlog itself. This
change ensures the latter process creates deltas which fully replace all
data in a censored base using a single patch operation.
Recipients of a delta against a censored base will verify that the delta is in
this full-replace format. Other recipients will use the delta as normal.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
When a delta received through exchange is added to a revlog, it will very
often be expanded to a full text by applying the delta to its base. If
that delta is of a particular form, we can avoid decoding the base revision.
This avoids an exception if the base revision is censored.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
To ensure interoperability when clones disagree about which file nodes are
censored, a restriction is made on deltas based on censored nodes. Any such
delta must replace the full text of the base in a single patch.
If the recipient of a delta considers the base to be censored and the delta
is not in the expected form, the recipient must reject it, as it can't know
if the source has also censored the base.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
The iscensored method will be used by the exchange layer to reject
nonconforming deltas involving censored revisions (and to produce
conforming deltas).
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
Because revlog implements __iter__, "rev in revlog" works but does silly O(n)
lookup unexpectedly. So it seems good to add fast version of __contains__.
This allows "rev in repo.changelog" in the next patch.
When receiving a delta via exchange, three possible storage outcomes emerge:
1. The delta is added directly to the revlog. ("fast-path")
2. A freshly-computed delta with a different base is stored.
3. The new revision's fulltext is computed and stored outright.
Both (2) and (3) require materializing the full text of the new revision by
applying the delta to its base. This is typically followed by a hash check.
The new flags argument allows callers to _addrevision to signal that they
expect that hash check to fail. We can use this opportunity to verify that
expectation. If the hash fails, require the flag be set; if the hash passes,
require the flag be unset.
Rather than simply eliding the hash check, this approach provides some
assurance that the censored flag is not applied to valid revisions.
Read more at: http://mercurial.selenic.com/wiki/CensorPlan
For revlog index flags to be useful to other parts of Mercurial, they need to
be settable when writing revisions. The current use case for revlog index
flags is the censorship feature: http://mercurial.selenic.com/wiki/CensorPlan
While the censor flag could be inferred in _addrevision by interrogating the
text/delta being added, that would bury the censorship logic and
inappropriately couple it to all revision creation.
This flag bit will be used to cheaply signal censorship presence to upper
layers (exchange, verify). It indicates that censorship metadata is present
but does not attest to the verifiability of that metadata.
For the censorship design, see: http://mercurial.selenic.com/wiki/CensorPlan
This dumb cache works surprisingly well: on a repository with typical delta
chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs
and manifests only) went from 60 seconds to 3.
This doesn't affect normal clones since they'd be bound by the CPU bound below
anyway -- it does, however, improve generaldelta clones significantly.
This also results in better deltaing for generaldelta clones -- in generaldelta
clones, we calculate deltas with respect to the closest base if it has a higher
revision number than either parent. If the base is on a significantly different
branch, this can result in pointlessly massive deltas. This reduces the number
of bases and hence the number of bad deltas.
Empirically, for a highly branchy repository, this resulted in an improvement
of around 15% to manifest size.
This is a very silly case and not particularly likely to happen in the wild,
but it turns out we can hit it in a couple of places. As we tune the storage
parameters we're likely to hit more such cases.
The affected test cases all have smaller revlogs now.
The current heuristic for deciding between storing delta and full texts
is based on ratio of (sizeofdeltas)/(sizeoffulltext).
In some cases (for example a manifest for ahuge repo) this approach
can result in extremely long delta chains (~30,000) which are very slow to
read. (In the case of a manifest ~500ms are added to every hg command because of that).
This commit introduces "revlog.maxchainlength" configuration variable that will
limit delta chain length.
The chain length was computed correctly only when generaldelta
feature was enabled. Now it's fixed.
When generaldelta is disabled the base revision in revlog index is not
the revision we have delta against - it's always previous revision.
Instead of incorrect chainbaseandlen in command.py we are now using two
single-responsibility functions in revlog.py:
- chainbase(rev)
- chainlen(rev)
Only chainlen(rev) was missing so it was written to mimic the way the
chain of deltas is actually found during file reconstruction.
This change allows a revision log to not fail integrity checks when applying a
changegroup delta (eg from a bundle) results in a censored file tombstone. The
tombstone is inserted as-is, so future integrity verification will observe the
tombstone. Deltas based on the tombstone will also remain correct.
The new code path is encountered for *exactly* the cases where _addrevision is
importing a tombstone from a changegroup. When committing a file containing
the "magic" tombstone text, the "text" parameter will be non-empty and the
checkhash call is not executed (and when committing, the node will be computed
to match the "magic" tombstone text).
This will make it possible for subclasses to have different hashing
schemes when appropriate. I anticipate using this in manifests.
Note that there's still one client of mercurial.revlog.hash() outside
of revlog: mercurial.context.memctx uses it to construct the file
entries in an in-memory manifest. I don't think this will be a problem
in the immediate future, so I've left it as-is.
Python uses a C long (32 bits on Windows 64) rather than an ssize_t in
read(), and thus has a 2G size limit. Work around this by falling back
to reading one chunk at a time on overflow. This approximately doubles
our headroom until we run back into the size limit on single reads.
Moves the code that actually writes to a file to a separate function in
revlog.py. This allows extensions to intercept and use the data being written to
disk. For example, an extension might want to replicate these writes elsewhere.
When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change.
It goes from 28.2 to 28.3 seconds.
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:
$ hg --config format.chunkcachesize=1024 perfmoonwalk
! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5)
(16,154 cache hits, 3,840 misses.)
$ hg --config format.chunkcachesize=4096 perfmoonwalk
! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6)
(19,003 hits, 991 misses.)
$ hg --config format.chunkcachesize=16384 perfmoonwalk
! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6)
(19,746 hits, 248 misses.)
$ hg --config format.chunkcachesize=32768 perfmoonwalk
! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)
(19,870 hits, 124 misses.)
$ hg --config format.chunkcachesize=65536 perfmoonwalk
! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6)
(19,932 hits, 62 misses.)
$ hg --config format.chunkcachesize=131072 perfmoonwalk
! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6)
(19,963 hits, 31 misses.)
We may want to change the default size in the future based on testing and
user feedback.
When reading a revlog chunk, instead of reading up to 64 KB ahead of the
request offset and caching that, this change caches a fixed window before
and after the requested data that falls on 64 KB boundaries. This increases
cache hits when reading revlogs backwards.
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:
$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)
(Each run has 10,668 cache hits and 9,304 misses.)
After this change:
$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)
(19,931 cache hits, 62 misses.)
On a busy NFS share, before this change:
$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)
After:
$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top. Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.
The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.
This makes amend on a large repo go from 8.7 seconds to 6 seconds.
When computing the commonmissing, it greedily computes the entire set
immediately. On a large repo where the majority of history is irrelevant, this
causes a significant slow down.
Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.
Previously basecache was incorrectly initialized before adding the first
revision from a changegroup. Basecache value influences when full revisions are
stored in revlog (when using generaldelta). As a result it was possible to
generate a generaldelta-revlog that could be bigger by arbitrary factor than its
non-generaldelta equivalent.
In case we don't have a cached text already, add the base rev to the list
passed to _chunks. In the cached case this also avoids unnecessarily preloading
the chunk for the cached rev.
We do this in a somewhat hacky way, relying on the fact that our sole caller
preloads the cache right before calling us. An upcoming patch will make this
more sensible.
For a 20 MB manifest with a delta chain of > 40k, perfmanifest goes from 0.49
seconds to 0.46.
Previously the length of data preloaded did not account for the interleaved io
contents. This meant that we'd sometimes have cache misses in _chunks despite
the preloading.
Having a correctly filled out cache will become essential in an upcoming patch.
This moves _chunkraw into the loop. Doing that improves revlog decompression --
in particular, manifest decompression -- significantly. For a 20 MB manifest
which is the result of a > 40k delta chain, hg perfmanifest improves from 0.55
seconds to 0.49 seconds.
This change will allow revlog subclasses that override 'checkhash' method
to use custom strategy of computing nodeids without overriding 'addrevision'
method. In particular this change is necessary to implement manifest
compression.
Extract method that decides whether nodeid is correct for paricular revision
text and parent nodes. Having this method extracted will allow revlog
subclasses to implement custom way of computing nodes. In particular this
change is necessary to implement manifest compression.
When we deployed the latest crew mercurial to our users, a few of them
had issues where a filelog would have an entry with a -1 linkrev. This
caused operations like rebase and amend to create a bundle containing the
entire repository, which took a long time.
I don't know what the issue is, but adding this check should prevent repos
from getting in this state, and should help us pinpoint the issue next time
it happens.
The performance of both the old and new Python ancestor algorithms
depends on the number of revs they need to traverse. Although the
new algorithm performs far better than the old when revs are
numerically and topologically close, both algorithms become slow
under other circumstances, taking up to 1.8 seconds to give answers
in a Linux kernel repo.
This C implementation of the new algorithm is a fairly straightforward
transliteration. The only corner case of interest is that it raises
an OverflowError if the number of GCA candidates found during the
first pass is greater than 24, to avoid the dual perils of fixnum
overflow and trying to allocate too much memory. (If this exception
is raised, the Python implementation is used instead.)
Performance numbers are good: in a Linux kernel repo, time for "hg
debugancestors" on two distant revs (24bf01de7537 and c2a8808f5943)
is as follows:
Old Python: 0.36 sec
New Python: 0.42 sec
New C: 0.02 sec
For a case where the new algorithm should perform well:
Old Python: 1.84 sec
New Python: 0.07 sec
New C: measures as zero when using --time
(This commit includes a paranoid cross-check to ensure that the
Python and C implementations give identical answers. The above
performance numbers were measured with that check disabled.)
Previously, we chose a rev based on numeric ordering, which could
cause "the same merge" in topologically identical but numerically
different repos to choose different merge bases.
We now choose the lexically least node; this is stable across
different revlog orderings.
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).
In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.
Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.
For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.
Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
This is in preparation for an upcoming refactoring. This also fixes a bug in
incancestors, where if an element of revs was an ancestor of another it would
be generated twice.
We often need to perform rev iteration in reverse order. This
changeset makes it possible to do so, in order to avoid costly reverse
or reversed() calls later.
This also speeds up other commands that use findmissing, like
incoming and merge --preview. With a large linear repository (>400000
commits) and with one incoming changeset, incoming is sped up from
around 4-4.5 seconds to under 3.
When commiting to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
Make the pure python implementation of headrevs available to derived classes. It
is important because filtering logic applied by `revlog` derived class won't
have effect on `index`. We want to be able to bypass this C call to implement
our own.
This prepares changelog level filtering. We can't assume that any revision can
be heads because filtered revisions need to be excluded.
New algorithm:
- All revisions now start as "non heads",
- every revision we iterate over is made candidate head,
- parents of iterated revisions are definitely not head.
Filtered revisions are never iterated over and never considered as candidate
head.
This prepares changelog level filtering. We need the algorithms used in revlog to
work on a subset of revisions. To achieve this, the use of explicit range of
revision is banned. `range` and `xrange` calls are replaced by a `revlog.irevs`
method. Filtered super class can then overwrite the `irevs` method to filter out
revision.
The decision whether or not to store a full snapshot instead of a delta is done
based on the distance value calculated in _addrevision.builddelta(rev).
This calculation traditionally used the fact of deltas only using the previous
revision as base. Generaldelta mechanism is changing this, yet the calculation
still assumes that current-offset minus chainbase-offset equals chain-length.
This appears to be wrong.
This patch corrects the calculation by means of using the chainlength function
if Generaldelta is used.
This allows an extension to optionally use a new compression type based
on the options applied by the repo to the revlog's opener.
(decompress doesn't need the same treatment, as it can be replaced using
extensions.wrapfunction, and can figure out which compression algorithm
is in use based on the first byte of the compressed payload.)
ancestors() returns the ancestors of revs provided. This func is like
that except it also includes the revs themselves in the total set of
revs generated.
This will be used as a step in removing reachable() in a future diff.
Doing it now because bryano is in the process of rewriting ancestors in
C. This depends on bryano's patch to replace *revs with revs in the
declaration of revlog.ancestors.
Accepting a variable number of arguments as the old API did is
deeply ugly, particularly as it means the API can't be extended
with new arguments. Partly as a result, we have at least three
different implementations of the same ancestors algorithm (!?).
Most callers were forced to call ancestors(*somelist), adding to
both inefficiency and ugliness.
There have been quite a few places where we pop elements off the
front of a list. This can turn O(n) algorithms into something more
like O(n**2). Python has provided a deque type that can do this
efficiently since at least 2.4.
As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
The C implementation is more than 100 times faster than the Python
version (which is still available as a fallback).
In a repo with 330,000 revs and a stale .hg/cache/tags file, this
patch improves the performance of "hg tip" from 2.2 to 1.6 seconds.
The underlying C code doesn't support indexing by longs, there are no
legitimate reasons to use a long, and longs should generally be
converted to ints at a higher level by context's constructor.
The radix tree already contains all the information we need to
determine whether a short string is an unambiguous node identifier.
We now make use of this information.
In a kernel tree, this improves the performance of
"hg log -q -r24bf01de75" from 0.27 seconds to 0.06.
This regresses performance of 'hg branches', presumably because it's
visiting the revlog in the wrong order. This suggests we either need
to fix the branch code or add some read-behind to mitigate the effect.
This showed up in a statprof profile of "hg svn rebuildmeta", which
is read-intensive on the changelog. This two-line patch improved
the performance of that command by 10%.
This greatly speeds up node->rev lookups, with results that are
often user-perceptible: for instance, "hg --time log" of the node
associated with rev 1000 on a linux-2.6 repo improves from 0.3
seconds to 0.03. I have not found any instances of slowdowns.
The new perfnodelookup command in contrib/perf.py demonstrates the
speedup more dramatically, since it performs no I/O. For a single
lookup, the new code is about 40x faster.
These changes also prepare the ground for the possibility of further
improving the performance of prefix-based node lookups.
This list will contains any node see in the source, not only the added one.
This is intended to allow phase to be move according what was pushed by client
not only what was added.
The usual contract is that close() makes your writes permanent, so
atomictempfile's use of close() to *discard* writes (and rename() to
keep them) is rather unexpected. Thus, change it so close() makes
things permanent and add a new discard() method to throw them away.
discard() is only used internally, in __del__(), to ensure that writes
are discarded when an atomictempfile object goes out of scope.
I audited mercurial.*, hgext.*, and ~80 third-party extensions, and
found no one using the existing semantics of close() to discard
writes, so this should be safe.
This greatly improves the speed of the bundling process, and often reduces the
bundle size considerably. (Although if the repository is already ordered, this
has little effect on both time and bundle size.)
For non-generaldelta clients, the reduced bundle size translates to a reduced
repository size, similar to shrinking the revlogs (which uses the exact same
algorithm). For generaldelta clients the difference is minor.
When the new bundle format comes, reordering will not be necessary since we
can then store the deltaparent relationsships directly. The eventual default
behavior for clients and servers is presented in the table below, where "new"
implies support for GD as well as the new bundle format:
old client new client
old server old bundle, no reorder old bundle, no reorder
new server, non-GD old bundle, no reorder[1] old bundle, no reorder[2]
new server, GD old bundle, reorder[3] new bundle, no reorder[4]
[1] reordering is expensive on the server in this case, skip it
[2] client can choose to do its own redelta here
[3] reordering is needed because otherwise the pull does a lot of extra
work on the server
[4] reordering isn't needed because client can get deltabase in bundle
format
Currently, the default is to reorder on GD-servers, and not otherwise. A new
setting, bundle.reorder, has been added to override the default reordering
behavior. It can be set to either 'auto' (the default), or any true or false
value as a standard boolean setting, to either force the reordering on or off
regardless of generaldelta.
Some timing data from a relatively branch test repository follows. All
bundling is done with --all --type none options.
Non-generaldelta, non-shrunk repo:
-----------------------------------
Size: 276M
Without reorder (default):
Bundle time: 14.4 seconds
Bundle size: 939M
With reorder:
Bundle time: 1 minute, 29.3 seconds
Bundle size: 381M
Generaldelta, non-shrunk repo:
-----------------------------------
Size: 87M
Without reorder:
Bundle time: 2 minutes, 1.4 seconds
Bundle size: 939M
With reorder (default):
Bundle time: 25.5 seconds
Bundle size: 381M
defversion was a property (later option) on the store opener, used to propagate
the changelog revlog format to the other revlogs, so they would be created with
the same format.
This required that the changelog instance was created before any other revlog;
an invariant that wasn't directly enforced (or documented) anywhere.
We now use the revlogv1 requirement instead, which is transfered to the store
opener options. If this option is missing, v0 revlogs are created.
Without this change, pulls (and clones) into a generaldelta repository could
generate very inefficient revlogs, the size of which could be at least twice
the original size.
This was caused by the generated delta chains covering too large distances,
causing new chains to be built far too often. This change addresses the
problem by forcing a delta against second parent or against the previous
revision, when the first parent delta is in danger of creating a long chain.
The bug didn't cause corruption, and thus wasn't caught in hg verify or in
tests. It could lead to delta chains longer than normally allowed, by
affecting the code that decides when to add a full revision. This could,
in turn, lead to performance regression.
With generaldelta switched on, deltas are always computed against the first
parent when adding revisions. This is done regardless of what revision the
incoming bundle, if any, is deltaed against.
The exact delta building strategy is subject to change, but this will not
affect compatibility.
Generaldelta is switched off by default.
Generaldelta is a new revlog global flag. When it's turned on, the base field
of each revision entry holds the deltaparent instead of the base revision of
the current delta chain.
This allows for great potential flexibility when generating deltas, as any
revision can serve as deltaparent. Previously, the deltaparent for revision r
was hardcoded to be r - 1.
The base revision of the delta chain can still be accessed as before, since it
is now computed in an iterative fashion, following the deltaparents backwards.
This is in preparation for generaldelta, where the revlog entry base field is
reinterpreted as the deltaparent. For that reason we also rename the base
function to chainbase.
Without generaldelta, performance is unaffected, but generaldelta will suffer
from this in _addrevision, since delta chains will be walked repeatedly.
A cache has been added to eliminate this problem completely.