This is one step towards removing a bunch of "if isinstance(gen,
unbundle20)" by treating bundle1 and bundle2 more similarly.
The name may sounds ironic for a method in the bundle2 module, but I
didn't think it was worth it yet to create a new 'bundle' module that
depends on the 'bundle2' module. Besides, we'll inline the method
again later.
The results only need to be combined if they come from a bundle2. More
importantly, we'll change its argument to a bundleoperation soon, and
then it definitely will no longer belong in changegroup.py.
This adds an experimental.bundle-phases config option to include phase
information in bundles. As with the recently added support for
bundling obsmarkers, the support for bundling phases is hidden behind
the config option until we decide to make a bundlespec v3 that
includes phases (and obsmarkers and ...).
We could perhaps use the listkeys format for this, but that's
considered obsolete according to Pierre-Yves. Instead, we introduce a
new "phase-heads" bundle part. The new part contains the phase heads
among the set of bundled revisions. It does not include those in
secret phase; any head in the bundle that is not mentioned in the
phase-heads part is assumed to be secret. As a special case, an empty
phase-heads part thus means that any changesets should be added in
secret phase. (If we ever add a fourth phase, we'll include secret in
the part and we'll add a version number.)
For now, phases are only included by "hg bundle", and not by
e.g. strip and rebase.
When adding support for bundling and unbundling phases, it will be
useful to have the list of added changesets. To do that, we return the
list from changegroup.apply().
We move the feature to a proper configuration and document it. The config goes
in the 'server' section because it feels like something the server owner would
want to decide. We pick and open field because it seems likely that other
checking levels will emerge in the future. (eg: server like the mozilla-try
server will likely wants a "none" value)
The option name contains 'push' since this affects 'push' only. The option value
'check-related' is preferred over one explicitly containing 'allow' or 'deny'
because the client still have a strong decision power here. Here, the server is
just advising the client on the check mode to use.
Client has a mechanism for the server to check that nothing changed server side
since the client prepared a push. That check is wide and any head changed on
the server will lead to an aborted push. We introduce a way for the client to
send a less strict checking. That logic will check that no heads impacted by
the push have been affected. If other unrelated heads (including named branches
heads) have been affected, the push will proceed.
This is very helpful for repositories with high developers traffic on different
heads, a common setup.
That behavior is currently controlled by an experimental option. The config
should live in the "server" section but bike-shedding of the name will happen
in the next changesets. Servers advertise this capability through a new bundle2
capability 'checkeads', using the value 'related'.
The 'test-push-race.t' is updated to check that new capabilities on the
documented cases.
The "hg bundle" command is a good place to test if the inclusion of obsmarkers
within a bundle is working well (part exists, content is correct etc). So we
add a way to have them included.
Ideally, this would be controlled by a change around bundlespec (bundlespec
"v3" + arguments). However, my main goal is to have obsmarkers included in
bundle created by the 'hg strip' command, not the 'hg bundle' so for now I'm
avoiding the detour through bundlespec rework territory.
Better debug output for obsmarkers in 'debugbundle' will be added in later
changesets. The 'test-obsolete-bundle-strip.t' test will also get updated in a
later changeset to keep the current changeset smaller.
We move it next to similar part building functions. We will need it for the
"writenewbundle" logic. This will allow us to easily include obsmarkers in
on-disk bundle, a necessary step before having `hg strip` also operate on
markers.
(Yes, the bundle2 module was already too large, but there any many
interdependencies between its components so it is non-trivial to split, this is
a quest for another adventure.)
Adding markers to the repository might affect the set of obsolete changesets. So we
most remove the "volatile" set who rely in that data. We add two missing
invalidations after merging markers. This was caught by code change in the evolve
extensions tests.
This issues highlight that the current way to do things is a bit fragile,
however we keep things simple for stable.
Many have seen a "stream ended unexpectedly" error. This message is
raised from changegroup.readexactly() when a read(n) operation fails
to return exactly N bytes.
I believe most occurrences of this error in the wild stem from
the code changed in this patch. Before, if bundle2's part applicator
raised an Exception when processing/applying parts, the exception
handler would attempt to iterate the remaining parts. If I/O
during this iteration failed, it would likely raise the
"stream ended unexpectedly" exception.
The problem with this approach is that if we already encountered
an I/O error iterating the bundle2 data during application, then
any further I/O would almost certainly fail. If the original stream
were closed, changegroup.readexactly() would obtain an empty
string, triggering "stream ended unexpectedly" with "got 0." This is
the error message that users would see. What's worse is that the
original I/O related exception would be lost since a new exception
would be raised. This made debugging the actual I/O failure
effectively impossible.
This patch changes the exception handler for bundle2 application to
ignore errors when seeking the underlying stream. When the underlying
error is I/O related, the seek should fail fast and the original
exception will be re-raised. The output changes in
test-http-bad-server.t demonstrate this.
When the underlying error is not I/O related and the stream can be
seeked, the old behavior is preserved.
These methods are unrelated to unpacking. They are used internally by the
'unbundlepart' class only. So me move them there as private methods.
In the same go, we clarify their internal role in the their docstring.
The unpackermixin is a utility used to implement the bundle2 protocol. It should
not be used when writing part handlers. We update the docstring to clarify this.
There are some non-obvious limitations on the parameters of this method.
Add some documentation where people will likely look to understand how
to use this API.
The current function ('writebundle') is focussing on getting an existing
changegroup to disk. It is no easy ways to includes more part in the generated
bundle2. So we introduce a slightly higher level function that is fed the
'outgoing' object (that defines the bundled spec) and the bundlespec parameters
(to control the changegroup generation and inclusion of other parts).
This is creating the third logic dedicated to create a consistent bundle2 (the
other 2 are the push code and the getbundle code). We should probably reconcile
them at some points but they all takes different types of input. So we need to
introduce an intermediate "object" that each different input could be converted
to. Such unified "bundle2 specification" could be fed to some unified code.
We start by having the `hg bundle` related code on its own to helps defines its
specific needs first. Once the common and specific parts of each logic will be
known we can start unification.
Compression engines allow options to be passed to them to control
behavior. This patch exposes an argument to bundle2.writebundle()
that passes options to the compression engine when writing compressed
bundles. The argument is honored for both bundle1 and bundle2, the
latter requiring a bit of plumbing to pass the value around.
An upcoming patch will change the "alg" argument passed to this
function from None to "UN" when no compression is wanted.
The existing implementation of bundle2 does not set a "Compression"
parameter if no compression is used. In theory, setting
"Compression=UN" should work. But I haven't audited the code to see if
all client versions supporting bundle2 will accept this.
Rather than take the risk, avoid the BC breakage and treat "UN"
the same as None.
Like the recent change for the compressor side, this too is
relatively straightforward. We now store a compression engine
on the instance instead of a low-level decompressor. Again, this
will allow us to easily transition to different compression engine
APIs when they are implemented.
Now that we have a new API to define compression engines, let's put it
to use!
The new code stores a reference to the compression engine instead of
a low-level compressor object. This will allow us to more easily
transition to different APIs on the compression engine interface
once we implement them.
As part of this, we change the registration in bundletypes to use 'UN'
instead of None. Previously, util.compressors had the no-op compressor
registered under both the 'UN' and None keys. Since we're switching to
a new API, I don't see the point in carrying this dual registration
forward.
This is similar to 72dcaa40df76. Not all calls into the compressor
return compressed data, as the compressor may buffer compressed
output internally. It is cheaper to check for empty chunks than to
send empty chunks through the generator.
When generating a gzip-v2 bundle of the mozilla-unified repo, this
change results in 50,093 empty chunks not being sent through the
generator (out of 1,902,996 total input chunks).
Before this patch, bundle2 application attempted to consume remaining
bundle2 part data when the process is interrupted (SIGINT) or when
sys.exit is called (translated into a SystemExit exception). This
meant that if one of these occurred when applying a say 1 GB
changegroup bundle2 part being downloaded over a network, it may take
Mercurial *several minutes* to terminate after a SIGINT because the
process is waiting on the network to stream megabytes of data. This is
not a great user experience and a regression from bundle1. Furthermore,
many process supervisors tend to only give processes a finite amount of
time to exit after delivering SIGINT: if processes take too long to
self-terminate, a SIGKILL is issued and Mercurial has no opportunity to
clean up. This would mean orphaned locks and transactions. Not good.
This patch changes the bundle2 application behavior to fail faster
when an interrupt or system exit is requested. It does so by not
catching BaseException (which includes KeyboardInterrupt and
SystemExit) and by explicitly checking for these conditions in
yet another handler which would also seek to the end of the current
bundle2 part on failure.
The end result of this patch is that SIGINT is now reacted to
significantly faster: the active transaction is rolled back
immediately without waiting for incoming bundle2 data to be consumed.
This restores the pre-bundle2 behavior and makes Mercurial treat
signals with the urgency they deserve.
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.
A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.
This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.
I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
An upcoming change that introduces a 2nd part parameter to a part
reveals that `hg debugbundle` isn't deterministic because parameters
are stored on n plain, unsorted dict.
While we could change that command to sort before output, I think
the more important underlying issue is that bundle2 reading is taking
an ordered data structure and converting it to an unordered one.
Plugging in util.sortdict() fixes that problem while preserving API
compatibility.
This patch also appears to shine light on the fact that we don't
have tests verifying parts with multiple parameters roundtrip
correctly. That would be a good thing to test (and fuzz)... someday.
Usually, the heads will have the same ordering in handlecheckheads. Insisting
on the same ordering is however an unnecessary constraint that in some custom
cases can cause pushes to fail even though the actual heads didn't change. This
caused production issues for us in combination with the current version of
https://bitbucket.org/Unity-Technologies/hgwebcachingproxy/ .
Change 760f9c8ad835 (changegroup: move chunk extraction into a
getchunks method of unbundle10, 2014-04-10) extracted some code to a
getchunks() method and copied a comment about the changegroup format
to the new method. The copy that remains in the old place, doesn't
make much sense there, so let's remove it.
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.
I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
In b89de5ee5b31 (changegroup: don't support versions 01 and 02 with
treemanifests, 2016-01-19), I stopped supporting use of cg1 and cg2
with treemanifest repos. What I had not considered was that it's
perfectly safe to pull *to* a treemanifest repo using any changegroup
version. As reported in issue5066, I therefore broke pull from old
repos into a treemanifest repo. It was not covered by the test case,
because that pulled from a local repo while enabling treemanifests,
which enabled treemanifests on the source repo as well. After
switching to pulling via HTTP, it breaks.
Fix by splitting up changegroup.supportedversions() into
supportedincomingversions() and supportedoutgoingversions().
By adding a mandatory 'treemanifest' parameter in the bundle2 part, we
make it possible for the recipient to set repo requirements before the
manifest revlog is accessed.
Before bundle2, hook output from hook failures was prefixed with
"remote: ". Up to this point with bundle2, the output was converted to
the message to print in an Abort exception. This had 2 implications:
1) It was unclear whether an error message came from the local repo
or the remote
2) The exit code changed from 1 to 255
This patch changes the handling of error:abort bundle2 parts during push
to prefix the error message with "remote: ". This restores the old
behavior.
We still preserve the behavior of raising an Abort during bundle2
application failure. This is a regression from pre-bundle2 because the
exit code changed.
Because we no longer raise an Abort with the remote's message, we needed
to insert a message for the new Abort. So, I invented a new error
message for that. This is another change from pre-bundle2. However, I
like the new error message because it states unambiguously who aborted
the push failed, which I think is important for users so they can decide
what's next.
We allow specifying the url to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple place
where we forget to pass the url to hooks.
We allow specifying the source to carry it to hooks. This gets us closer to
'bundle1.apply(...)' and will allow us to remove regressions in multiple places
where we forget to pass the source to hooks.
There is a case where the intent is clear and the transaction is not optional. We
want to be able to alter that transaction in a wide and easy way. We cannot get
a unified '.apply(repo)' method for bundle1 and bundle2 yet because the api are
still a bit too far apart. But this is a good step forward to get the rc out.
We would skip the part if it was fully unknown, so we should also skip it if we
know we won't be able to apply it. This will allow us to produce bundles with
obsolescence markers alongside changegroup while still being able to apply them
on any client.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
A future patch will allow the bundle2 lock be taken lazily. We need to
introduce transaction-gets to each handler that needs the lock.
The tests caught these issues when I added lazy locking.
There is use case for directly forward and bundle2 stream from the peer to a
file (eg: 'hg incoming --bundle'), However ssh peers have no way to know the
'getbundle' is over except for actually interpreting the bundle. So we need to
have the unbundle do the interpreting and forward job.
The function is marked as private to highlight that this is terrible and that we
are sorry.
We want to introduce a simple way to forward the content of a bundle2 stream.
For this purpose, we will need to both yield the parameters block and process it
(to apply any behavior change it might indicate).
This changeset adds support for a 'compression' parameter in bundle2 streams.
When set, it controls the compression algorithm used for the payload part of the
bundle2.
There is currently no usage of this except in tests.
We are about to allow compressing the core of the bundle2. So we extract the
generation of this bits in its own parts to make this compression phases easier
in a later changesets.
As a comment in the code have been asking for, it is now possible to register
piece of code that handle parameters for the stream. I've been wondering is such
function should be class methods or not. I eventually went for externally
decorated methods to stick with the culture of extensibility from an extensions
that apply to bundle2.
The original name explicitly mention "Part", however it is also used outside of
parts related feature. We rename from 'UnsupportedPartError' to
'BundleUnknownFeatureError' to fix this.
GeneratorExit means the other end of the conversation has already
stopped listening, so don't try and yield out error
information. Instead, just let the GeneratorExit propagate normally.
This should resolve esoteric issues observed with servers that have
aggressive timeouts waiting for data to send to clients logging
internal Python errors[0]. This has been observed with both gunicorn's
gevent worker model and with scm-manager's built-in webserver (which
itself is something sitting inside jetty.)
0: Exception RuntimeError: 'generator ignored GeneratorExit' in <generator object getchunks at 0x7fd2f6c586e0> ignored
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
The docstring for the header size field currently says "The total
number of Bytes used by the part headers", but the size is about a
single header, so let's change it to "header".
When unbundling over the wire is aborted, we have a mechanism to convey the
error inside a bundle part. As we add support for more errors, we need to know if
the client will support them. For this purpose, we duck punch the reply
capabilities of the client on the raised extensions.
This is similar to what is done to salvage the server output on error.
The pushkey code is generic and the server side has little context on what the
client is trying to achieve. Generating interesting error messages server side
would be challenging. Instead we introduce a dedicated exception that carries more
data. In particular, it carries the id of the part which failed that will allow
clients to display custom error messages depending on the part intent.
The processing and transfer-over-the-wire of this exception is to be implemented
in coming changesets.
So far, result of a pushkey operation had no consequence on the transaction
(beside the change). We makes it respect the 'mandatory' flag of part so that
failed pushkey call abort the whole transaction. This will allow rejecting
changes (primary target: changesets) regarding phases or bookmark criteria in
the future (when we will push such data in a mandatory part).
We currently raise an abort error because all clients support it. We'll
introduce a more precise error in the next changesets.
.hgtags fnodes cache entries can be expensive to compute, especially
if there are hundreds of even thousands of them. This patch implements
support for receiving a bundle2 part that contains a mapping of
changeset to .hgtags fnodes.
An upcoming patch will teach the server to send this part, allowing
clients to bypass having to redundantly compute these values.
A number of tests changed due to the client advertising the "hgtagsfnodes"
capability.
The old output is very verbose and unsuitable for general debug level. It is
however very useful for debugging bundle2 generation or consumption issues. All
this verbose ouput is hidden under a 'devel.bundle2.debug' flag.
The part generation process was lacking a ui object and could not produce debug
output. It seems valuable to have some debug output on this part too, especially
now that we are planning to be able to hide it in the default --debug output.
The bundling process is very verbose, we would like to be able to hide such
output behind a configuration flag and have it more explicitly referencing
bundle2. The first step is to gather all these messages in a dedicated function.