The computation of roots was buggy, any ancestor of a bundled merge which was
also a descendant of the parents of a bundled revision were included as part of
the bundle. We fix it and add a test for strip (which revealed the problem).
Check the test for a practical usecase.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
This eliminates the following test failure on Windows, as well as a similar one
in evolve's test-wireproto.t. See the previous patch for details on the
problem.
--- e:/Projects/hg/tests/test-init.t
+++ e:/Projects/hg/tests/test-init.t.err
@@ -216,10 +216,10 @@
* test 0:08b9e9f63b32
$ hg clone -e "python \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remote-bookmarks
searching for changes
+ exporting bookmark test
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
- exporting bookmark test
$ hg -R remote-bookmarks bookmarks
test 0:08b9e9f63b32
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
'repo' is a very confusing name to use for 'self', especially when
it's not a repo. Also drop repo.ui member (a.k.a. self.ui) now that
'self' doesn't shadow outer 'repo' variable.
To ensure that exchanged deltas in the presence of censored revisions can
always be applied to the recipient repository, the deltas must replace the
entire base text. To make this restriction reasonably enforceable, the delta
must do so with a single patch operation.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
To ensure interoperability when clones disagree about which file nodes are
censored, a restriction is made on deltas based on censored nodes. Any such
delta must replace the full text of the base in a single patch.
If the recipient of a delta considers the base to be censored and the delta
is not in the expected form, the recipient must reject it, as it can't know
if the source has also censored the base.
For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
This diff adds support to writebundle to generate a bundle2 wrapper; upcoming
diffs will add an option to write a v2 changegroup part instead of v1 in these
bundles.
The next diff will add support for writing bundle2 files to writebundle, but
the bundle2 generator wants access to a ui object. This changes the signature
and callsites to pass one in.
This is kind of similar to the debugbundle command but gives summarized actual
uncompressed number of bytes when creating the bundle. The numbers are as
usable as the bundle format is efficient. Hopefully bundle2 will make it a
better indicator of actual entropy.
This is useful when accepting pull requests to assess whether the repo size
increase seems reasonable for the diff before pushing stuff upstream, It has
helped me catching large files that should have been committed as largefiles
but was committed as regular files in intermediate changesets.
This output doesn't combine well with debug output so we only enable it when
verbose without debug.
We already have a _repo property on the packer, and we only access the
changelog and manifest revlog in one place, so it's just as easy to
get them from self._repo.
Since we always pass self._reorder to self.group(), let's drop the
parameter and let group() read from self._reorder itself. There are no
other in-tree callers to group().
The difference between reorder=None (bundle.reorder=auto) and
reorder=False is that the generaldelta revlogs get reordered with the
former. In cg2packer, group() we check if the revlog uses generaldelta
and if reorder=None and then convert that to reorder=False. We are
effectively saying that whether or not generaldelta is used, we want
reorder=None to mean reorder=False for changegroup 2. To make this
clearer, check if reorder=None in the constructor and change it to
False there and drop the overriding of group(). Also document the
reason for turning reordering off.
The config option bundle.reorder can be {on,off,auto}, which gets read
into the 'reorder' variable as {True,False,None}. In two places, we
need to decide how to handle the None/auto case. I personally find it
easier to read those expressions when written to explicitly compare to
None.
changegroup.group() and changegroup.generatefiles() both currently
start progress (with topic "bundling"), but changegroup.generate()
closes the topic. Move the closing to the functions that start the
topic, so it's easier to see where the topic is started and closed.
This completes a move that seems to have been started in f8f5836242c6
(bundle-ng: move progress handling out of the linkrev callback,
2013-05-10).
The 'mf' variable is a manifest revlog, not a manifest, so let's
rename it accordingly. We already call the changelog variable 'cl', so
'ml' seems appropriate.
The parameter has been unused since it was introduced in 40209abd6471
(bundle: refactor changegroup prune to be its own function,
2013-05-30), and Durham says it is not used by his extension either.
Previously, if reorder was true during the creation of a changegroup bundle,
it was possible that the manifest and filelogs would be reordered such that the
resulting bundle filelog had a linkrev that pointed to a commit that was not
the earliest instance of the filelog revision. For example:
With commits:
0<-1<---3<-4
\ /
--2<---
if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
their order reversed, but the changelog did not, it could produce a filelog with
linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
delete that file data from the repository and commit 2 would be corrupt (as
would any future pulls that tried to build upon that version of the file).
The fix is to make the linkrev fixup smarter. Previously it considered the first
manifest that added a file to be the first commit that added that file, which is
not true. Now, for every file revision we add to the bundle we make sure we
attach it to the earliest applicable linkrev.
Previously, fnodes had a key and empty dict value for every element in
changedfiles. This is somewhat wasteful. Empty dicts in CPython consume
a lot more memory than you would expect - 280 bytes.
On mozilla-central, which has ~190,000 files/fnodes keys, the previous
loop populating fnodes allocated 91,924 KB of memory, most of that for
the empty dicts.
With this patch in place, our peak RSS during mozilla-central clone
drops:
before: 364,356 KB
after: 326,008 KB
delta: -38,348 KB
When combined with the previous patch, total peak RSS decrease is now
190,116 KB.
The contents of fnodes are only accessed once per key. It is wasteful to
cache the value since nobody will use it.
Before this patch, the caching of unused data in fnodes was effectively
causing a memory leak during the file streaming part of bundle creation.
On mozilla-central (which has ~190,000 entries in fnodes), this patch
has a significant impact on RSS at the end of generate():
before: 516,124 KB
after: 364,356 KB
delta: -151,768 KB
The origin of this code can be traced back to 1f567a607f1f and has been
with us since the 2.7 release.
lookupmf() is currently defined earlier than when it is needed. Future
patches further refactoring this code will be easier to read when
lookupmf() is in its new home.
This mirrors the API for 'pending' and 'finalize' callbacks. I do not have
immediate usage planned for it, but I'm sure some callback will be happy to
access transaction related data.
The post-transaction hooks run after the lock release (because hooks may want to
touch the repository), but they must only run if the transaction is successfully
closed.
We use the new 'addpostclose' method on transaction to register a callback
installing this post-lock-release call.
Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we
register it in the transaction during 'delayupdate' and rely on 'tr.close()' to
call it at the right time.
The 'delayupdate' method now takes a transaction object and registers its
'_writepending' method for execution in 'transaction.writepending()'. The hook can then
use 'transaction.writepending()' directly.
At some point this will allow the addition of other file creation
during writepending.
cg2 supports generaldelta in changegroups, to be used in bundle2.
Since generaldelta is handled directly in cg2, reordering is switched
off by default.
The commands getchangegroup, getlocalchangegroup and getsubset now each
have a version ending in -raw. The raw versions return the chunk generator
from the changegroup packer directly, without wrapping it in a chunkbuffer
and unpacker. This avoids extra chunkbuffers in the bundle2 code path.
Also, the raw versions can be extended to support alternative packers
in the future, to be used from bundle2.
We only have "01" right now, but we should get general delta in soon.
Bundle2 is expected to make use of this to advertise and select the right packer
to use on both sides.
We store the source and url of the current data into `transaction.hookargs` this
let us inherit it from upper layers that may have created a much wider
transaction. We have to modify bundle2 at the same time to register the source
and url in the transaction. We have to do it in the same patch otherwise, the
`addchangegroup` call would fill these values and the hook calling will crash
because of the duplicated 'source' and 'url' arguments passed to the hook call.
We want to reused some possible information stored in the transaction
`hookargs` dict that may be stored by something handling the transaction at an
upper level (eg: bundle2) So we move the running of the hooks after transaction
creation. This has no visible effects (but an empty transaction roolback if the
hook fails) because nothing had happened in the transaction yet.
The transaction is now carrying hook-related informations. So we use it to
retrieve the `node` argument. This will also carry around all kinds of other useful
informations (like: "are we in a bundle2 processing")
addchangegroup creates a runhook function that is used to invoke the
changegroup and incoming hooks, but at the time the function is called,
the contents of hookargs associated with the transaction may have been
modified externally. For instance, bundle2 code affects it with
obsolescence markers and bookmarks info.
It also creates problems when a single transaction is used with multiple
changegroups added (as per an upcoming change), whereby the contents
of hookargs are that of after adding a latter changegroup when invoking
the hook for the first changegroup.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
We now pass a transaction option to this phase movement function. The
object is currently not used by the function, but it will be in the
future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
We now pass a transaction option to this phase movement function. The object
is currently not used by the function, but it will be in the future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
The retractboundary function remains to be upgraded.
This argument controls the phase used for the added changesets. This can be
useful to unbundle in "secret" phase as required by shelve.
This change aims at helping high-level code get rid of manual phase
movement. An important milestone for having phases part of the transaction.
Extensions that add to bundle2 will want to know which commits are outgoing so
they can bundle data that is appropriate to those commits. This moves the logic
for figuring that out to a separate function so extensions can do the same
computation.
We are registering data related to the process into the transaction hook data.
This lets other parties using the same transaction get informed of the
addchangegroup result.
This code used to be in `writebundle` only. We needs to make it more broadly
available for bundle2. The "changegroup" bundle2 part has to retrieve the
binary content of changegroup stream. We moved the chunks retrieving code into
the `unbundle10` object directly and the `writebundle` code is now using that.
This split is useful for bundle2 purpose, we want to be able to easily stream
changegroup content in a part.
To keep thing simples, we kept compression out of the new methods. If it make
more sense in the future, compression may get included in this function too.
Before this patch, filename specified to "changegroup.writebundle()"
should be absolute one.
In some cases, they should be relative to repository root, store and
so on (backup before strip, for example).
This patch adds "vfs" argument to "writebundle()", and makes
"writebundle()" open (and unlink) "filename" via vfs for relative
access, if both filename and vfs are specified.
When a changegroup is added by a push on a publishing server, we ensure they
are added as public. This is used to enforce publishing on server when the
client is not aware of phases. It also prevents race conditions where a reader
could see the changesets as draft before they get turned public by the client.
Finally, this save rounds trip as the client does not need additional request to
turn them public.
However, this logic was only enforced when the changegroup was from a "push"
source. And "push" is used for local pushes only. Wireprotocol push uses "serve"
as source since Mercurial 1.9. We now enforce this logic for both "push" and
"serve" sources.
One could note that this logic was mainly useful during wireprotocol exchanges.
So this code is finally put into good use, 9 versions after its introduction.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API stay unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
Somewhere before 2.7, a change [82beb9b16505] was committed that
entailed a large performance regression when bundling (and therefore
remote cloning) repositories. For each file in the repository, it would
recompute the set of needed changesets even though it is the same for
all files. This computation would dominate bundle runtimes according to
profiler output (by 10x or more).
Moves the file chunk generation part of bundle creation to it's own function.
This allows extensions to customize the filelog part of bundle generation.
Change 1f567a607f1f dropped the 'revset' variable which kept track of
which changesets were being bundled. Instead, it used "not in
commonset" to decide which changesets were outgoing.. which ran into
trouble when a commit was in progress.
A few places in the code use 'if not len(revlog)' to check if the revlog
exists. Replacing this with 'not filerevlog' allows alternative revlog
implementations to override __nonzero__ to handle this case without
implementing __len__.
Moving the prune function to be a non-nested function allows extensions to
control which revisions are allowed in the changegroup. For example, in my
shallow repo extension I want to prevent filelogs from being added to the
bundle.
This also allows an extension to use a filelog implementation that doesn't
have revlog.linkrev implemented.
Create a simple start() method to pass the lookup function until bundler
becomes smarter and gets a repo object.
Since we now create the bundler for the whole lifetime, we need to pass it
down to revlog methods.
Mercurial often failed with struct.error or mpatch.mpatchError if incomplete
data was received from a server.
Now we validate all changegroup reads and aborts with
abort: stream ended unexpectedly (got %d bytes, expected %d)
if less than requested was read.
- handle chunk headers separately rather than prepending them to
(potentially large) chunks
- break large chunks into 1M pieces for compression
- don't prepend file metadata onto (potentially large) file data
Incoming ssh needs to detect the end of the changegroup, otherwise it would
block trying to read from the ssh pipe. This is done by parsing the
changegroup chunks.
bundlerepo.getchunk() already is identical to
localrepo.addchangegroup.getchunk(), which is followed by getgroup which
looks much like what you can re-use in bundlerepository.__init__() and in
write_bundle(). bundlerevlog.__init__.genchunk() looks very similar, too,
as do some while loops in localrepo.py.
Applied patch from Benoit Boissinot to move duplicate/related code
to mercurial/changegroup.py and use this to fix incoming ssh.