defversion was a property (later option) on the store opener, used to propagate
the changelog revlog format to the other revlogs, so they would be created with
the same format.
This required that the changelog instance was created before any other revlog;
an invariant that wasn't directly enforced (or documented) anywhere.
We now use the revlogv1 requirement instead, which is transfered to the store
opener options. If this option is missing, v0 revlogs are created.
Without this change, pulls (and clones) into a generaldelta repository could
generate very inefficient revlogs, the size of which could be at least twice
the original size.
This was caused by the generated delta chains covering too large distances,
causing new chains to be built far too often. This change addresses the
problem by forcing a delta against second parent or against the previous
revision, when the first parent delta is in danger of creating a long chain.
The bug didn't cause corruption, and thus wasn't caught in hg verify or in
tests. It could lead to delta chains longer than normally allowed, by
affecting the code that decides when to add a full revision. This could,
in turn, lead to performance regression.
With generaldelta switched on, deltas are always computed against the first
parent when adding revisions. This is done regardless of what revision the
incoming bundle, if any, is deltaed against.
The exact delta building strategy is subject to change, but this will not
affect compatibility.
Generaldelta is switched off by default.
Generaldelta is a new revlog global flag. When it's turned on, the base field
of each revision entry holds the deltaparent instead of the base revision of
the current delta chain.
This allows for great potential flexibility when generating deltas, as any
revision can serve as deltaparent. Previously, the deltaparent for revision r
was hardcoded to be r - 1.
The base revision of the delta chain can still be accessed as before, since it
is now computed in an iterative fashion, following the deltaparents backwards.
This is in preparation for generaldelta, where the revlog entry base field is
reinterpreted as the deltaparent. For that reason we also rename the base
function to chainbase.
Without generaldelta, performance is unaffected, but generaldelta will suffer
from this in _addrevision, since delta chains will be walked repeatedly.
A cache has been added to eliminate this problem completely.
Adds a new discovery method based on repeatedly sampling the still
undecided subset of the local node graph to determine the set of nodes
common to both the client and the server.
For small differences between client and server, it uses about the same
or slightly fewer roundtrips than the old tree-based discovery. For
larger differences, it typically reduces the number of roundtrips
drastically (from 150 to 4, for instance).
The old discovery code now lives in treediscovery.py, the new code is
in setdiscovery.py.
Still missing is a hook for extensions to contribute nodes to the
initial sample. For instance, Augie's remotebranches could contribute
the last known state of the server's heads.
Credits for the actual sampler and computing common heads instead of
bases go to Benoit Boissinot.
getbundle(common, heads) -> bundle
Returns the changegroup for all ancestors of heads which are not ancestors of common. For both
sets, the heads are included in the set.
Intended to eventually supercede changegroupsubset and changegroup. Uses heads of common region
to exclude unwanted changesets instead of bases of desired region, which is more useful and
easier to implement.
Designed to be extensible with new optional arguments (which will have to be guarded by
corresponding capabilities).
Add missing calls to close() to many places where files are
opened. Relying on reference counting to catch them soon-ish is not
portable and fails in environments with a proper GC, such as PyPy.
Now that the nodemap is lazily created, we use linear scanning back
from tip for typical node to rev mapping. Given that nodemap creation
is O(n log n) and revisions searched for are usually very close to
tip, this is often a significant performance win for a small number of
searches.
When we do end up building a nodemap for bulk lookups, the scanning
function is replaced with a hash lookup.
We were not returning the correct result if nullrev was in revs, as we
are checking parent(currentrev) != nullrev before yielding currentrev
test-convert-hg-startrev was wrong: if we start converting from rev -1 and
onwards, all the descendants of -1 (full repo) should be converted.
When parentdelta is enabled, we choose the delta that has the minimum
distance to its base. Otherwise, base may be sufficiently far away to
require a full version, resulting in greatly reduced compression.
While reading changegroup if a node with missing parents is encountered,
we add a punched entry in the index with null parents for the missing
parent node.
Rename some variables, making the name more obvious (in particular "cache" was
actually two different variable.
Move code around, moving the index preloading before the deltachain computation,
without that index preloading was useless (everything was read in deltachain).
Refactor revlog._addrevision() and put the correct base rev in the
parent-delta case: base(rev) should always be equal to the first full snapshot
that is needed by the delta chain, in both parent-delta and tip-delta case.
Before this fix, the base rev was in most case wrong (and in the case where
p1 == nullid, this triggered the bug from issue2337). This means that
repositories converted to parent-delta earlier are corrupted and needs to be
reconverted.
index flag to identify a revision as punched, i.e. it contains no data.
REVIDX_PUNCHED_FLAG = 2, is used to mark a revision as punched.
REVIDX_KNOWN_FLAGS is the accumulation of all index flags.
This is similar to the __builtin__.cmp behaviour, but still not
straightforward, as the dailylife meaning of a comparison usually is
"find out if they are different".
Previously, it only returned revisions that were in the revlog when it
was originally opened; revisions added since then were invisible.
This broke revlog._partialmatch() and therefore repo.lookup().
(Credit to Benoit Boissinot for simplifying my original test script
and for the actual fix.)
This lets us change the threshold at which a *.d file will be split
out, which should make it much easier to construct test cases that
probe revlogs with a separate data file.
(issue2137)
This lets us change the threshold at which a *.d file will be split
out, which should make it much easier to construct test cases that
probe revlogs with a separate data file.
(issue2137)
If a buffer of an mutable object is passed to revlog.addrevision(), the revlog
will happily store it in its cache. Later when the revlog reuses the cached
entry, if the manifest modified the object in-between, all kind of bugs
appears.
We fix it by:
- passing immutable objects to addrevision() if they are already available
- only storing the text in the cache if it's of str type
Then we can remove the conversion of the cache entry to str() during
retrieval. That was probably just there hiding the bug for the common cases
but not really fixing it.