Compression engines allow options to be passed to them to control
behavior. This patch exposes an argument to bundle2.writebundle()
that passes options to the compression engine when writing compressed
bundles. The argument is honored for both bundle1 and bundle2, the
latter requiring a bit of plumbing to pass the value around.
An upcoming patch will change the "alg" argument passed to this
function from None to "UN" when no compression is wanted.
The existing implementation of bundle2 does not set a "Compression"
parameter if no compression is used. In theory, setting
"Compression=UN" should work. But I haven't audited the code to see if
all client versions supporting bundle2 will accept this.
Rather than take the risk, avoid the BC breakage and treat "UN"
the same as None.
Like the recent change for the compressor side, this too is
relatively straightforward. We now store a compression engine
on the instance instead of a low-level decompressor. Again, this
will allow us to easily transition to different compression engine
APIs when they are implemented.
Now that we have a new API to define compression engines, let's put it
to use!
The new code stores a reference to the compression engine instead of
a low-level compressor object. This will allow us to more easily
transition to different APIs on the compression engine interface
once we implement them.
As part of this, we change the registration in bundletypes to use 'UN'
instead of None. Previously, util.compressors had the no-op compressor
registered under both the 'UN' and None keys. Since we're switching to
a new API, I don't see the point in carrying this dual registration
forward.
This is similar to 72dcaa40df76. Not all calls into the compressor
return compressed data, as the compressor may buffer compressed
output internally. It is cheaper to check for empty chunks than to
send empty chunks through the generator.
When generating a gzip-v2 bundle of the mozilla-unified repo, this
change results in 50,093 empty chunks not being sent through the
generator (out of 1,902,996 total input chunks).
Before this patch, bundle2 application attempted to consume remaining
bundle2 part data when the process is interrupted (SIGINT) or when
sys.exit is called (translated into a SystemExit exception). This
meant that if one of these occurred when applying a say 1 GB
changegroup bundle2 part being downloaded over a network, it may take
Mercurial *several minutes* to terminate after a SIGINT because the
process is waiting on the network to stream megabytes of data. This is
not a great user experience and a regression from bundle1. Furthermore,
many process supervisors tend to only give processes a finite amount of
time to exit after delivering SIGINT: if processes take too long to
self-terminate, a SIGKILL is issued and Mercurial has no opportunity to
clean up. This would mean orphaned locks and transactions. Not good.
This patch changes the bundle2 application behavior to fail faster
when an interrupt or system exit is requested. It does so by not
catching BaseException (which includes KeyboardInterrupt and
SystemExit) and by explicitly checking for these conditions in
yet another handler which would also seek to the end of the current
bundle2 part on failure.
The end result of this patch is that SIGINT is now reacted to
significantly faster: the active transaction is rolled back
immediately without waiting for incoming bundle2 data to be consumed.
This restores the pre-bundle2 behavior and makes Mercurial treat
signals with the urgency they deserve.
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.
A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.
This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.
I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
An upcoming change that introduces a 2nd part parameter to a part
reveals that `hg debugbundle` isn't deterministic because parameters
are stored on n plain, unsorted dict.
While we could change that command to sort before output, I think
the more important underlying issue is that bundle2 reading is taking
an ordered data structure and converting it to an unordered one.
Plugging in util.sortdict() fixes that problem while preserving API
compatibility.
This patch also appears to shine light on the fact that we don't
have tests verifying parts with multiple parameters roundtrip
correctly. That would be a good thing to test (and fuzz)... someday.
Usually, the heads will have the same ordering in handlecheckheads. Insisting
on the same ordering is however an unnecessary constraint that in some custom
cases can cause pushes to fail even though the actual heads didn't change. This
caused production issues for us in combination with the current version of
https://bitbucket.org/Unity-Technologies/hgwebcachingproxy/ .
Change 760f9c8ad835 (changegroup: move chunk extraction into a
getchunks method of unbundle10, 2014-04-10) extracted some code to a
getchunks() method and copied a comment about the changegroup format
to the new method. The copy that remains in the old place, doesn't
make much sense there, so let's remove it.
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.
I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
In b89de5ee5b31 (changegroup: don't support versions 01 and 02 with
treemanifests, 2016-01-19), I stopped supporting use of cg1 and cg2
with treemanifest repos. What I had not considered was that it's
perfectly safe to pull *to* a treemanifest repo using any changegroup
version. As reported in issue5066, I therefore broke pull from old
repos into a treemanifest repo. It was not covered by the test case,
because that pulled from a local repo while enabling treemanifests,
which enabled treemanifests on the source repo as well. After
switching to pulling via HTTP, it breaks.
Fix by splitting up changegroup.supportedversions() into
supportedincomingversions() and supportedoutgoingversions().
By adding a mandatory 'treemanifest' parameter in the bundle2 part, we
make it possible for the recipient to set repo requirements before the
manifest revlog is accessed.
Before bundle2, hook output from hook failures was prefixed with
"remote: ". Up to this point with bundle2, the output was converted to
the message to print in an Abort exception. This had 2 implications:
1) It was unclear whether an error message came from the local repo
or the remote
2) The exit code changed from 1 to 255
This patch changes the handling of error:abort bundle2 parts during push
to prefix the error message with "remote: ". This restores the old
behavior.
We still preserve the behavior of raising an Abort during bundle2
application failure. This is a regression from pre-bundle2 because the
exit code changed.
Because we no longer raise an Abort with the remote's message, we needed
to insert a message for the new Abort. So, I invented a new error
message for that. This is another change from pre-bundle2. However, I
like the new error message because it states unambiguously who aborted
the push failed, which I think is important for users so they can decide
what's next.
We allow specifying the url to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple place
where we forget to pass the url to hooks.
We allow specifying the source to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple places
where we forget to pass the source to hooks.
There is a case where the intent is clear and the transaction is not optional. We
want to be able to alter that transaction in a wide and easy way. We cannot get
a unified '.apply(repo)' method for bundle1 and bundle2 yet because the api are
still a bit too far apart. But this is a good step forward to get the rc out.
We would skip the part if it was fully unknown, so we should also skip it if we
know we won't be able to apply it. This will allow us to produce bundles with
obsolescence markers alongside changegroup while still being able to apply them
on any client.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
A future patch will allow the bundle2 lock be taken lazily. We need to
introduce transaction-gets to each handler that needs the lock.
The tests caught these issues when I added lazy locking.
There is use case for directly forward and bundle2 stream from the peer to a
file (eg: 'hg incoming --bundle'), However ssh peers have no way to know the
'getbundle' is over except for actually interpreting the bundle. So we need to
have the unbundle do the interpreting and forward job.
The function is marked as private to highlight that this is terrible and that we
are sorry.
We want to introduce a simple way to forward the content of a bundle2 stream.
For this purpose, we will need to both yield the parameters block and process it
(to apply any behavior change it might indicate).
This changeset adds support for a 'compression' parameter in bundle2 streams.
When set, it controls the compression algorithm used for the payload part of the
bundle2.
There is currently no usage of this except in tests.
We are about to allow compressing the core of the bundle2. So we extract the
generation of this bits in its own parts to make this compression phases easier
in a later changesets.
As a comment in the code have been asking for, it is now possible to register
piece of code that handle parameters for the stream. I've been wondering is such
function should be class methods or not. I eventually went for externally
decorated methods to stick with the culture of extensibility from an extensions
that apply to bundle2.
The original name explicitly mention "Part", however it is also used outside of
parts related feature. We rename from 'UnsupportedPartError' to
'BundleUnknownFeatureError' to fix this.
GeneratorExit means the other end of the conversation has already
stopped listening, so don't try and yield out error
information. Instead, just let the GeneratorExit propagate normally.
This should resolve esoteric issues observed with servers that have
aggressive timeouts waiting for data to send to clients logging
internal Python errors[0]. This has been observed with both gunicorn's
gevent worker model and with scm-manager's built-in webserver (which
itself is something sitting inside jetty.)
0: Exception RuntimeError: 'generator ignored GeneratorExit' in <generator object getchunks at 0x7fd2f6c586e0> ignored
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
The docstring for the header size field currently says "The total
number of Bytes used by the part headers", but the size is about a
single header, so let's change it to "header".
When unbundling over the wire is aborted, we have a mechanism to convey the
error inside a bundle part. As we add support for more errors, we need to know if
the client will support them. For this purpose, we duck punch the reply
capabilities of the client on the raised extensions.
This is similar to what is done to salvage the server output on error.
The pushkey code is generic and the server side has little context on what the
client is trying to achieve. Generating interesting error messages server side
would be challenging. Instead we introduce a dedicated exception that carries more
data. In particular, it carries the id of the part which failed that will allow
clients to display custom error messages depending on the part intent.
The processing and transfer-over-the-wire of this exception is to be implemented
in coming changesets.
So far, result of a pushkey operation had no consequence on the transaction
(beside the change). We makes it respect the 'mandatory' flag of part so that
failed pushkey call abort the whole transaction. This will allow rejecting
changes (primary target: changesets) regarding phases or bookmark criteria in
the future (when we will push such data in a mandatory part).
We currently raise an abort error because all clients support it. We'll
introduce a more precise error in the next changesets.
.hgtags fnodes cache entries can be expensive to compute, especially
if there are hundreds of even thousands of them. This patch implements
support for receiving a bundle2 part that contains a mapping of
changeset to .hgtags fnodes.
An upcoming patch will teach the server to send this part, allowing
clients to bypass having to redundantly compute these values.
A number of tests changed due to the client advertising the "hgtagsfnodes"
capability.
The old output is very verbose and unsuitable for general debug level. It is
however very useful for debugging bundle2 generation or consumption issues. All
this verbose ouput is hidden under a 'devel.bundle2.debug' flag.
The part generation process was lacking a ui object and could not produce debug
output. It seems valuable to have some debug output on this part too, especially
now that we are planning to be able to hide it in the default --debug output.
The bundling process is very verbose, we would like to be able to hide such
output behind a configuration flag and have it more explicitly referencing
bundle2. The first step is to gather all these messages in a dedicated function.
The bundling process is very verbose, we would like to be able to hide such
output behind a configuration flag and have it more explicitly referencing
bundle2. The first step is to gather all these messages in a dedicated
function.
The same gathering will be later do for debug message issue by unbundling.
The current bundle2 processing was capturing all output. This is nice as it
provide better meta data about what output what, but this was changing two
things:
1) adding a prefix "remote: " to "other" output during local push (issue4613)
2) local and ssh push does not provide real time output anymore (issue4615)
As we are unsure about what form should be used in (1) and how to solve (2) we
disable output capture in this two cases. Output capture can be forced using an
experimental option.
Until this changeset, we were only able to save output if an error happened
during the 'transaction.close()' phase. If the 'processbundle' call raised an
exception, the 'bundleoperation' object was never returned, so the reply bundle
was never accessible and no output could be salvaged. We introduce a quick (but
not very elegant) fix to gain access to any reply created during the processing.
This conclude this output related series. We should hopefully be able client-side to see the
whole server output, in a proper order.
The code is now complex enough that a refactoring of it would make sense on
default.
External hook used to directly write on stdout and stderr. As a result their
output was not captured by the bundle2 processing. This resulted in confusing
out of order output on the client side. We are now capturing hooks output in
this context.
Remote output should be silenced by --quiet. The issue was found while running
`test-largefiles-cache.t` so it will get tested once we switch bundle2 by
default.
The re-handling of output is happening in some 'unbundle' callers. We have to
transmit the output information to this place so we stick it on the exception.
This is the third step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output part from the aborted reply
into the exception bundle.
This method returns a copy of all 'output' parts added to the bundler.
This is the second step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output parts from the aborted reply
into the exception bundle.
The function will be used in a later patch.
This is the first step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output parts from the aborted reply
into the exception bundle.
The function will be used in a later patch.
We want to preserve output even when the unbundling fails (eg: hook output). So
we must make sure that everything we have is flushed into the reply bundle.
(This is related to issue4594)
The obsolescence markers exchange is still experimental. We (developer) need
more information about what is going on. I'm adding an experimental flag to add
display the amount of data exchanged during bundle2 exchanges.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
We now take full advantage of the 'getunbundler' function by using a
'{version -> unbundler-class}' mapping. This map currently contains a single
entry but will make it easy to support more versions from an extension/the
future.
At some point, this map will probably contain bundler-class information too,
in the same fashion the packer map does. However, this is not critically required
right now so it will happen by itself when needed.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
To support multiple bundle2 formats, we will need a function returning
the proper unbundler according to the header. We introduce such aa
function and change the usage in the code base. The function will get
smarter in later changesets.
This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
This makes it easy to create a new bundler class that inherits from
the core one. This matches the way 'changegroup' packers work.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
Bundlerepo uses the compressed() method to determine whether it should write
an uncompressed temporary file. Since we don't support compressed bundle2 files
at the moment, make this method return true.
Replace bare part.read() calls with part.seek(0, 2) since the return value is
being ignored. As this doesn't necessarily require building a string that
contains the rest of the part, the potential exists to reduce the memory
footprint of these operations.
This implements a seek() method for unbundlepart. This allows on-disk bundle2
parts to behave enough like files for bundlerepo to handle them. A future
patch will add support for bundlerepo to read the bundle2 files that are
written when the experimental.strip-bundle2-version config option is used.
This patch adds seek(), tell(), and close() implementations for unpackermixin
which forward to the file descriptor's implementation if possible. A future
patch will use this to make bundle2.unbundlepart seekable, which will in turn
make it usable as a file descriptor for bundlerepo.
The binary format description has always stated that the parttype should be simple,
but it was never really enforced. Recent discussions have convinced me we want to
keep the part type simple and easy to debug. There is enough extensibility in
the rest of the format.