'repo' is a very confusing name to use for 'self', especially when
it's not a repo. Also drop repo.ui member (a.k.a. self.ui) now that
'self' doesn't shadow outer 'repo' variable.
We already have a _repo property on the packer, and we only access the
changelog and manifest revlog in one place, so it's just as easy to
get them from self._repo.
Since we always pass self._reorder to self.group(), let's drop the
parameter and let group() read from self._reorder itself. There are no
other in-tree callers to group().
The difference between reorder=None (bundle.reorder=auto) and
reorder=False is that the generaldelta revlogs get reordered with the
former. In cg2packer, group() we check if the revlog uses generaldelta
and if reorder=None and then convert that to reorder=False. We are
effectively saying that whether or not generaldelta is used, we want
reorder=None to mean reorder=False for changegroup 2. To make this
clearer, check if reorder=None in the constructor and change it to
False there and drop the overriding of group(). Also document the
reason for turning reordering off.
The config option bundle.reorder can be {on,off,auto}, which gets read
into the 'reorder' variable as {True,False,None}. In two places, we
need to decide how to handle the None/auto case. I personally find it
easier to read those expressions when written to explicitly compare to
None.
changegroup.group() and changegroup.generatefiles() both currently
start progress (with topic "bundling"), but changegroup.generate()
closes the topic. Move the closing to the functions that start the
topic, so it's easier to see where the topic is started and closed.
This completes a move that seems to have been started in f8f5836242c6
(bundle-ng: move progress handling out of the linkrev callback,
2013-05-10).
The 'mf' variable is a manifest revlog, not a manifest, so let's
rename it accordingly. We already call the changelog variable 'cl', so
'ml' seems appropriate.
The parameter has been unused since it was introduced in 40209abd6471
(bundle: refactor changegroup prune to be its own function,
2013-05-30), and Durham says it is not used by his extension either.
This eliminates the following test failure on Windows, as well as a similar one
in evolve's test-wireproto.t. See the previous patch for details on the
problem.
--- e:/Projects/hg/tests/test-init.t
+++ e:/Projects/hg/tests/test-init.t.err
@@ -216,10 +216,10 @@
* test 0:08b9e9f63b32
$ hg clone -e "python \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remote-bookmarks
searching for changes
+ exporting bookmark test
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
- exporting bookmark test
$ hg -R remote-bookmarks bookmarks
test 0:08b9e9f63b32
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
To ensure that exchanged deltas in the presence of censored revisions can
always be applied to the recipient repository, the deltas must replace the
entire base text. To make this restriction reasonably enforceable, the delta
must do so with a single patch operation.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
To ensure interoperability when clones disagree about which file nodes are
censored, a restriction is made on deltas based on censored nodes. Any such
delta must replace the full text of the base in a single patch.
If the recipient of a delta considers the base to be censored and the delta
is not in the expected form, the recipient must reject it, as it can't know
if the source has also censored the base.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
This diff adds support to writebundle to generate a bundle2 wrapper; upcoming
diffs will add an option to write a v2 changegroup part instead of v1 in these
bundles.
The next diff will add support for writing bundle2 files to writebundle, but
the bundle2 generator wants access to a ui object. This changes the signature
and callsites to pass one in.
This is kind of similar to the debugbundle command but gives summarized actual
uncompressed number of bytes when creating the bundle. The numbers are as
usable as the bundle format is efficient. Hopefully bundle2 will make it a
better indicator of actual entropy.
This is useful when accepting pull requests to assess whether the repo size
increase seems reasonable for the diff before pushing stuff upstream, It has
helped me catching large files that should have been committed as largefiles
but was committed as regular files in intermediate changesets.
This output doesn't combine well with debug output so we only enable it when
verbose without debug.
Previously, if reorder was true during the creation of a changegroup bundle,
it was possible that the manifest and filelogs would be reordered such that the
resulting bundle filelog had a linkrev that pointed to a commit that was not
the earliest instance of the filelog revision. For example:
With commits:
0<-1<---3<-4
\ /
--2<---
if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
their order reversed, but the changelog did not, it could produce a filelog with
linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
delete that file data from the repository and commit 2 would be corrupt (as
would any future pulls that tried to build upon that version of the file).
The fix is to make the linkrev fixup smarter. Previously it considered the first
manifest that added a file to be the first commit that added that file, which is
not true. Now, for every file revision we add to the bundle we make sure we
attach it to the earliest applicable linkrev.
Previously, fnodes had a key and empty dict value for every element in
changedfiles. This is somewhat wasteful. Empty dicts in CPython consume
a lot more memory than you would expect - 280 bytes.
On mozilla-central, which has ~190,000 files/fnodes keys, the previous
loop populating fnodes allocated 91,924 KB of memory, most of that for
the empty dicts.
With this patch in place, our peak RSS during mozilla-central clone
drops:
before: 364,356 KB
after: 326,008 KB
delta: -38,348 KB
When combined with the previous patch, total peak RSS decrease is now
190,116 KB.
The contents of fnodes are only accessed once per key. It is wasteful to
cache the value since nobody will use it.
Before this patch, the caching of unused data in fnodes was effectively
causing a memory leak during the file streaming part of bundle creation.
On mozilla-central (which has ~190,000 entries in fnodes), this patch
has a significant impact on RSS at the end of generate():
before: 516,124 KB
after: 364,356 KB
delta: -151,768 KB
The origin of this code can be traced back to 1f567a607f1f and has been
with us since the 2.7 release.
lookupmf() is currently defined earlier than when it is needed. Future
patches further refactoring this code will be easier to read when
lookupmf() is in its new home.
This mirrors the API for 'pending' and 'finalize' callbacks. I do not have
immediate usage planned for it, but I'm sure some callback will be happy to
access transaction related data.
The post-transaction hooks run after the lock release (because hooks may want to
touch the repository), but they must only run if the transaction is successfully
closed.
We use the new 'addpostclose' method on transaction to register a callback
installing this post-lock-release call.
Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we
register it in the transaction during 'delayupdate' and rely on 'tr.close()' to
call it at the right time.
The 'delayupdate' method now takes a transaction object and registers its
'_writepending' method for execution in 'transaction.writepending()'. The hook can then
use 'transaction.writepending()' directly.
At some point this will allow the addition of other file creation
during writepending.
cg2 supports generaldelta in changegroups, to be used in bundle2.
Since generaldelta is handled directly in cg2, reordering is switched
off by default.
The commands getchangegroup, getlocalchangegroup and getsubset now each
have a version ending in -raw. The raw versions return the chunk generator
from the changegroup packer directly, without wrapping it in a chunkbuffer
and unpacker. This avoids extra chunkbuffers in the bundle2 code path.
Also, the raw versions can be extended to support alternative packers
in the future, to be used from bundle2.
We only have "01" right now, but we should get general delta in soon.
Bundle2 is expected to make use of this to advertise and select the right packer
to use on both sides.
We store the source and url of the current data into `transaction.hookargs` this
let us inherit it from upper layers that may have created a much wider
transaction. We have to modify bundle2 at the same time to register the source
and url in the transaction. We have to do it in the same patch otherwise, the
`addchangegroup` call would fill these values and the hook calling will crash
because of the duplicated 'source' and 'url' arguments passed to the hook call.
We want to reused some possible information stored in the transaction
`hookargs` dict that may be stored by something handling the transaction at an
upper level (eg: bundle2) So we move the running of the hooks after transaction
creation. This has no visible effects (but an empty transaction roolback if the
hook fails) because nothing had happened in the transaction yet.
The transaction is now carrying hook-related informations. So we use it to
retrieve the `node` argument. This will also carry around all kinds of other useful
informations (like: "are we in a bundle2 processing")
addchangegroup creates a runhook function that is used to invoke the
changegroup and incoming hooks, but at the time the function is called,
the contents of hookargs associated with the transaction may have been
modified externally. For instance, bundle2 code affects it with
obsolescence markers and bookmarks info.
It also creates problems when a single transaction is used with multiple
changegroups added (as per an upcoming change), whereby the contents
of hookargs are that of after adding a latter changegroup when invoking
the hook for the first changegroup.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
We now pass a transaction option to this phase movement function. The
object is currently not used by the function, but it will be in the
future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
We now pass a transaction option to this phase movement function. The object
is currently not used by the function, but it will be in the future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
The retractboundary function remains to be upgraded.
This argument controls the phase used for the added changesets. This can be
useful to unbundle in "secret" phase as required by shelve.
This change aims at helping high-level code get rid of manual phase
movement. An important milestone for having phases part of the transaction.