Previously, if reorder was true during the creation of a changegroup bundle,
it was possible that the manifest and filelogs would be reordered such that the
resulting bundle filelog had a linkrev that pointed to a commit that was not
the earliest instance of the filelog revision. For example:
With commits:
0<-1<---3<-4
\ /
--2<---
if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
their order reversed, but the changelog did not, it could produce a filelog with
linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
delete that file data from the repository and commit 2 would be corrupt (as
would any future pulls that tried to build upon that version of the file).
The fix is to make the linkrev fixup smarter. Previously it considered the first
manifest that added a file to be the first commit that added that file, which is
not true. Now, for every file revision we add to the bundle we make sure we
attach it to the earliest applicable linkrev.
Previously, fnodes had a key and empty dict value for every element in
changedfiles. This is somewhat wasteful. Empty dicts in CPython consume
a lot more memory than you would expect - 280 bytes.
On mozilla-central, which has ~190,000 files/fnodes keys, the previous
loop populating fnodes allocated 91,924 KB of memory, most of that for
the empty dicts.
With this patch in place, our peak RSS during mozilla-central clone
drops:
before: 364,356 KB
after: 326,008 KB
delta: -38,348 KB
When combined with the previous patch, total peak RSS decrease is now
190,116 KB.
The contents of fnodes are only accessed once per key. It is wasteful to
cache the value since nobody will use it.
Before this patch, the caching of unused data in fnodes was effectively
causing a memory leak during the file streaming part of bundle creation.
On mozilla-central (which has ~190,000 entries in fnodes), this patch
has a significant impact on RSS at the end of generate():
before: 516,124 KB
after: 364,356 KB
delta: -151,768 KB
The origin of this code can be traced back to 1f567a607f1f and has been
with us since the 2.7 release.
lookupmf() is currently defined earlier than when it is needed. Future
patches further refactoring this code will be easier to read when
lookupmf() is in its new home.
This mirrors the API for 'pending' and 'finalize' callbacks. I do not have
immediate usage planned for it, but I'm sure some callback will be happy to
access transaction related data.
The post-transaction hooks run after the lock release (because hooks may want to
touch the repository), but they must only run if the transaction is successfully
closed.
We use the new 'addpostclose' method on transaction to register a callback
installing this post-lock-release call.
Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we
register it in the transaction during 'delayupdate' and rely on 'tr.close()' to
call it at the right time.
The 'delayupdate' method now takes a transaction object and registers its
'_writepending' method for execution in 'transaction.writepending()'. The hook can then
use 'transaction.writepending()' directly.
At some point this will allow the addition of other file creation
during writepending.
cg2 supports generaldelta in changegroups, to be used in bundle2.
Since generaldelta is handled directly in cg2, reordering is switched
off by default.
The commands getchangegroup, getlocalchangegroup and getsubset now each
have a version ending in -raw. The raw versions return the chunk generator
from the changegroup packer directly, without wrapping it in a chunkbuffer
and unpacker. This avoids extra chunkbuffers in the bundle2 code path.
Also, the raw versions can be extended to support alternative packers
in the future, to be used from bundle2.
We only have "01" right now, but we should get general delta in soon.
Bundle2 is expected to make use of this to advertise and select the right packer
to use on both sides.
We store the source and url of the current data into `transaction.hookargs` this
let us inherit it from upper layers that may have created a much wider
transaction. We have to modify bundle2 at the same time to register the source
and url in the transaction. We have to do it in the same patch otherwise, the
`addchangegroup` call would fill these values and the hook calling will crash
because of the duplicated 'source' and 'url' arguments passed to the hook call.
We want to reused some possible information stored in the transaction
`hookargs` dict that may be stored by something handling the transaction at an
upper level (eg: bundle2) So we move the running of the hooks after transaction
creation. This has no visible effects (but an empty transaction roolback if the
hook fails) because nothing had happened in the transaction yet.
The transaction is now carrying hook-related informations. So we use it to
retrieve the `node` argument. This will also carry around all kinds of other useful
informations (like: "are we in a bundle2 processing")
addchangegroup creates a runhook function that is used to invoke the
changegroup and incoming hooks, but at the time the function is called,
the contents of hookargs associated with the transaction may have been
modified externally. For instance, bundle2 code affects it with
obsolescence markers and bookmarks info.
It also creates problems when a single transaction is used with multiple
changegroups added (as per an upcoming change), whereby the contents
of hookargs are that of after adding a latter changegroup when invoking
the hook for the first changegroup.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
We now pass a transaction option to this phase movement function. The
object is currently not used by the function, but it will be in the
future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
We now pass a transaction option to this phase movement function. The object
is currently not used by the function, but it will be in the future.
All call sites have been updated. Most call sites were already enclosed in a
transaction for a long time. The handful of others have been recently
updated in previous commit.
The retractboundary function remains to be upgraded.
This argument controls the phase used for the added changesets. This can be
useful to unbundle in "secret" phase as required by shelve.
This change aims at helping high-level code get rid of manual phase
movement. An important milestone for having phases part of the transaction.
Extensions that add to bundle2 will want to know which commits are outgoing so
they can bundle data that is appropriate to those commits. This moves the logic
for figuring that out to a separate function so extensions can do the same
computation.
We are registering data related to the process into the transaction hook data.
This lets other parties using the same transaction get informed of the
addchangegroup result.
This code used to be in `writebundle` only. We needs to make it more broadly
available for bundle2. The "changegroup" bundle2 part has to retrieve the
binary content of changegroup stream. We moved the chunks retrieving code into
the `unbundle10` object directly and the `writebundle` code is now using that.
This split is useful for bundle2 purpose, we want to be able to easily stream
changegroup content in a part.
To keep thing simples, we kept compression out of the new methods. If it make
more sense in the future, compression may get included in this function too.
Before this patch, filename specified to "changegroup.writebundle()"
should be absolute one.
In some cases, they should be relative to repository root, store and
so on (backup before strip, for example).
This patch adds "vfs" argument to "writebundle()", and makes
"writebundle()" open (and unlink) "filename" via vfs for relative
access, if both filename and vfs are specified.
When a changegroup is added by a push on a publishing server, we ensure they
are added as public. This is used to enforce publishing on server when the
client is not aware of phases. It also prevents race conditions where a reader
could see the changesets as draft before they get turned public by the client.
Finally, this save rounds trip as the client does not need additional request to
turn them public.
However, this logic was only enforced when the changegroup was from a "push"
source. And "push" is used for local pushes only. Wireprotocol push uses "serve"
as source since Mercurial 1.9. We now enforce this logic for both "push" and
"serve" sources.
One could note that this logic was mainly useful during wireprotocol exchanges.
So this code is finally put into good use, 9 versions after its introduction.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API stay unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
Somewhere before 2.7, a change [82beb9b16505] was committed that
entailed a large performance regression when bundling (and therefore
remote cloning) repositories. For each file in the repository, it would
recompute the set of needed changesets even though it is the same for
all files. This computation would dominate bundle runtimes according to
profiler output (by 10x or more).
Moves the file chunk generation part of bundle creation to it's own function.
This allows extensions to customize the filelog part of bundle generation.
Change 1f567a607f1f dropped the 'revset' variable which kept track of
which changesets were being bundled. Instead, it used "not in
commonset" to decide which changesets were outgoing.. which ran into
trouble when a commit was in progress.
A few places in the code use 'if not len(revlog)' to check if the revlog
exists. Replacing this with 'not filerevlog' allows alternative revlog
implementations to override __nonzero__ to handle this case without
implementing __len__.
Moving the prune function to be a non-nested function allows extensions to
control which revisions are allowed in the changegroup. For example, in my
shallow repo extension I want to prevent filelogs from being added to the
bundle.
This also allows an extension to use a filelog implementation that doesn't
have revlog.linkrev implemented.