Compression engines allow options to be passed to them to control
behavior. This patch exposes an argument to bundle2.writebundle()
that passes options to the compression engine when writing compressed
bundles. The argument is honored for both bundle1 and bundle2, the
latter requiring a bit of plumbing to pass the value around.
An upcoming patch will change the "alg" argument passed to this
function from None to "UN" when no compression is wanted.
The existing implementation of bundle2 does not set a "Compression"
parameter if no compression is used. In theory, setting
"Compression=UN" should work. But I haven't audited the code to see if
all client versions supporting bundle2 will accept this.
Rather than take the risk, avoid the BC breakage and treat "UN"
the same as None.
Like the recent change for the compressor side, this too is
relatively straightforward. We now store a compression engine
on the instance instead of a low-level decompressor. Again, this
will allow us to easily transition to different compression engine
APIs when they are implemented.
Now that we have a new API to define compression engines, let's put it
to use!
The new code stores a reference to the compression engine instead of
a low-level compressor object. This will allow us to more easily
transition to different APIs on the compression engine interface
once we implement them.
As part of this, we change the registration in bundletypes to use 'UN'
instead of None. Previously, util.compressors had the no-op compressor
registered under both the 'UN' and None keys. Since we're switching to
a new API, I don't see the point in carrying this dual registration
forward.
This is similar to 72dcaa40df76. Not all calls into the compressor
return compressed data, as the compressor may buffer compressed
output internally. It is cheaper to check for empty chunks than to
send empty chunks through the generator.
When generating a gzip-v2 bundle of the mozilla-unified repo, this
change results in 50,093 empty chunks not being sent through the
generator (out of 1,902,996 total input chunks).
Before this patch, bundle2 application attempted to consume remaining
bundle2 part data when the process is interrupted (SIGINT) or when
sys.exit is called (translated into a SystemExit exception). This
meant that if one of these occurred when applying a say 1 GB
changegroup bundle2 part being downloaded over a network, it may take
Mercurial *several minutes* to terminate after a SIGINT because the
process is waiting on the network to stream megabytes of data. This is
not a great user experience and a regression from bundle1. Furthermore,
many process supervisors tend to only give processes a finite amount of
time to exit after delivering SIGINT: if processes take too long to
self-terminate, a SIGKILL is issued and Mercurial has no opportunity to
clean up. This would mean orphaned locks and transactions. Not good.
This patch changes the bundle2 application behavior to fail faster
when an interrupt or system exit is requested. It does so by not
catching BaseException (which includes KeyboardInterrupt and
SystemExit) and by explicitly checking for these conditions in
yet another handler which would also seek to the end of the current
bundle2 part on failure.
The end result of this patch is that SIGINT is now reacted to
significantly faster: the active transaction is rolled back
immediately without waiting for incoming bundle2 data to be consumed.
This restores the pre-bundle2 behavior and makes Mercurial treat
signals with the urgency they deserve.
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.
A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.
This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.
I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
An upcoming change that introduces a 2nd part parameter to a part
reveals that `hg debugbundle` isn't deterministic because parameters
are stored on n plain, unsorted dict.
While we could change that command to sort before output, I think
the more important underlying issue is that bundle2 reading is taking
an ordered data structure and converting it to an unordered one.
Plugging in util.sortdict() fixes that problem while preserving API
compatibility.
This patch also appears to shine light on the fact that we don't
have tests verifying parts with multiple parameters roundtrip
correctly. That would be a good thing to test (and fuzz)... someday.
Usually, the heads will have the same ordering in handlecheckheads. Insisting
on the same ordering is however an unnecessary constraint that in some custom
cases can cause pushes to fail even though the actual heads didn't change. This
caused production issues for us in combination with the current version of
https://bitbucket.org/Unity-Technologies/hgwebcachingproxy/ .
Change 760f9c8ad835 (changegroup: move chunk extraction into a
getchunks method of unbundle10, 2014-04-10) extracted some code to a
getchunks() method and copied a comment about the changegroup format
to the new method. The copy that remains in the old place, doesn't
make much sense there, so let's remove it.
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.
I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
In b89de5ee5b31 (changegroup: don't support versions 01 and 02 with
treemanifests, 2016-01-19), I stopped supporting use of cg1 and cg2
with treemanifest repos. What I had not considered was that it's
perfectly safe to pull *to* a treemanifest repo using any changegroup
version. As reported in issue5066, I therefore broke pull from old
repos into a treemanifest repo. It was not covered by the test case,
because that pulled from a local repo while enabling treemanifests,
which enabled treemanifests on the source repo as well. After
switching to pulling via HTTP, it breaks.
Fix by splitting up changegroup.supportedversions() into
supportedincomingversions() and supportedoutgoingversions().
By adding a mandatory 'treemanifest' parameter in the bundle2 part, we
make it possible for the recipient to set repo requirements before the
manifest revlog is accessed.
Before bundle2, hook output from hook failures was prefixed with
"remote: ". Up to this point with bundle2, the output was converted to
the message to print in an Abort exception. This had 2 implications:
1) It was unclear whether an error message came from the local repo
or the remote
2) The exit code changed from 1 to 255
This patch changes the handling of error:abort bundle2 parts during push
to prefix the error message with "remote: ". This restores the old
behavior.
We still preserve the behavior of raising an Abort during bundle2
application failure. This is a regression from pre-bundle2 because the
exit code changed.
Because we no longer raise an Abort with the remote's message, we needed
to insert a message for the new Abort. So, I invented a new error
message for that. This is another change from pre-bundle2. However, I
like the new error message because it states unambiguously who aborted
the push failed, which I think is important for users so they can decide
what's next.
We allow specifying the url to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple place
where we forget to pass the url to hooks.
We allow specifying the source to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple places
where we forget to pass the source to hooks.
There is a case where the intent is clear and the transaction is not optional. We
want to be able to alter that transaction in a wide and easy way. We cannot get
a unified '.apply(repo)' method for bundle1 and bundle2 yet because the api are
still a bit too far apart. But this is a good step forward to get the rc out.
We would skip the part if it was fully unknown, so we should also skip it if we
know we won't be able to apply it. This will allow us to produce bundles with
obsolescence markers alongside changegroup while still being able to apply them
on any client.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
A future patch will allow the bundle2 lock be taken lazily. We need to
introduce transaction-gets to each handler that needs the lock.
The tests caught these issues when I added lazy locking.
There is use case for directly forward and bundle2 stream from the peer to a
file (eg: 'hg incoming --bundle'), However ssh peers have no way to know the
'getbundle' is over except for actually interpreting the bundle. So we need to
have the unbundle do the interpreting and forward job.
The function is marked as private to highlight that this is terrible and that we
are sorry.
We want to introduce a simple way to forward the content of a bundle2 stream.
For this purpose, we will need to both yield the parameters block and process it
(to apply any behavior change it might indicate).
This changeset adds support for a 'compression' parameter in bundle2 streams.
When set, it controls the compression algorithm used for the payload part of the
bundle2.
There is currently no usage of this except in tests.
We are about to allow compressing the core of the bundle2. So we extract the
generation of this bits in its own parts to make this compression phases easier
in a later changesets.
As a comment in the code have been asking for, it is now possible to register
piece of code that handle parameters for the stream. I've been wondering is such
function should be class methods or not. I eventually went for externally
decorated methods to stick with the culture of extensibility from an extensions
that apply to bundle2.
The original name explicitly mention "Part", however it is also used outside of
parts related feature. We rename from 'UnsupportedPartError' to
'BundleUnknownFeatureError' to fix this.
GeneratorExit means the other end of the conversation has already
stopped listening, so don't try and yield out error
information. Instead, just let the GeneratorExit propagate normally.
This should resolve esoteric issues observed with servers that have
aggressive timeouts waiting for data to send to clients logging
internal Python errors[0]. This has been observed with both gunicorn's
gevent worker model and with scm-manager's built-in webserver (which
itself is something sitting inside jetty.)
0: Exception RuntimeError: 'generator ignored GeneratorExit' in <generator object getchunks at 0x7fd2f6c586e0> ignored
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.