Somewhere before 2.7, a change [82beb9b16505] was committed that
entailed a large performance regression when bundling (and therefore
remote cloning) repositories. For each file in the repository, it would
recompute the set of needed changesets even though it is the same for
all files. This computation would dominate bundle runtimes according to
profiler output (by 10x or more).
Moves the file chunk generation part of bundle creation to it's own function.
This allows extensions to customize the filelog part of bundle generation.
Change 1f567a607f1f dropped the 'revset' variable which kept track of
which changesets were being bundled. Instead, it used "not in
commonset" to decide which changesets were outgoing.. which ran into
trouble when a commit was in progress.
A few places in the code use 'if not len(revlog)' to check if the revlog
exists. Replacing this with 'not filerevlog' allows alternative revlog
implementations to override __nonzero__ to handle this case without
implementing __len__.
Moving the prune function to be a non-nested function allows extensions to
control which revisions are allowed in the changegroup. For example, in my
shallow repo extension I want to prevent filelogs from being added to the
bundle.
This also allows an extension to use a filelog implementation that doesn't
have revlog.linkrev implemented.
Create a simple start() method to pass the lookup function until bundler
becomes smarter and gets a repo object.
Since we now create the bundler for the whole lifetime, we need to pass it
down to revlog methods.
Mercurial often failed with struct.error or mpatch.mpatchError if incomplete
data was received from a server.
Now we validate all changegroup reads and aborts with
abort: stream ended unexpectedly (got %d bytes, expected %d)
if less than requested was read.
- handle chunk headers separately rather than prepending them to
(potentially large) chunks
- break large chunks into 1M pieces for compression
- don't prepend file metadata onto (potentially large) file data