When unbundling over the wire is aborted, we have a mechanism to convey the
error inside a bundle part. As we add support for more errors, we need to know if
the client will support them. For this purpose, we duck punch the reply
capabilities of the client on the raised exception.
This is similar to what is done to salvage the server output on error.
The pushkey code is generic and the server side has little context on what the
client is trying to achieve. Generating interesting error messages server side
would be challenging. Instead we introduce a dedicated exception that carries
more data. In particular, it carries the id of the part that failed, which will
allow clients to display custom error messages depending on the part intent.
The processing and transfer-over-the-wire of this exception is to be implemented
in coming changesets.
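As a rough illustration of the idea, here is a minimal sketch of an exception
that carries the id of the failed part. The class name, its attributes and the
`describe_failure` helper are hypothetical and only meant to show how a client
could pick a message from the part intent.

```python
class PushkeyFailed(Exception):
    """Hypothetical exception carrying the context of a failed pushkey part."""

    def __init__(self, partid, namespace=None, key=None):
        super().__init__("failed to update value for %r/%r" % (namespace, key))
        self.partid = partid
        self.namespace = namespace
        self.key = key


def describe_failure(exc, intents):
    """Pick a client-side message from the intent recorded for the part id."""
    intent = intents.get(exc.partid, "pushkey")
    return "updating %s failed (part %d)" % (intent, exc.partid)


try:
    raise PushkeyFailed(partid=3, namespace="bookmarks", key="stable")
except PushkeyFailed as exc:
    # The client remembers what each part it sent was meant to achieve.
    message = describe_failure(exc, {3: "bookmark stable"})
```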
So far, the result of a pushkey operation had no consequence on the transaction
(besides the change itself). We make it respect the 'mandatory' flag of the part
so that a failed pushkey call aborts the whole transaction. This will allow
rejecting changes (primary target: changesets) based on phases or bookmark
criteria in the future (when we push such data in a mandatory part).
We currently raise an abort error because all clients support it. We'll
introduce a more precise error in the next changesets.
.hgtags fnodes cache entries can be expensive to compute, especially
if there are hundreds or even thousands of them. This patch implements
support for receiving a bundle2 part that contains a mapping of
changeset to .hgtags fnodes.
An upcoming patch will teach the server to send this part, allowing
clients to bypass having to redundantly compute these values.
A number of tests changed due to the client advertising the "hgtagsfnodes"
capability.
The old output is very verbose and unsuitable for general debug level. It is
however very useful for debugging bundle2 generation or consumption issues. All
this verbose output is now hidden behind a 'devel.bundle2.debug' flag.
The part generation process was lacking a ui object and could not produce debug
output. It seems valuable to have some debug output on this part too, especially
now that we are planning to be able to hide it in the default --debug output.
The bundling process is very verbose, we would like to be able to hide such
output behind a configuration flag and have it more explicitly referencing
bundle2. The first step is to gather all these messages in a dedicated function.
The same gathering will later be done for the debug messages issued by unbundling.
The current bundle2 processing was capturing all output. This is nice as it
provides better metadata about what produced which output, but it was changing
two things:
1) adding a prefix "remote: " to "other" output during local push (issue4613)
2) local and ssh pushes do not provide real-time output anymore (issue4615)
As we are unsure about what form should be used in (1) and how to solve (2) we
disable output capture in these two cases. Output capture can be forced using an
experimental option.
Until this changeset, we were only able to save output if an error happened
during the 'transaction.close()' phase. If the 'processbundle' call raised an
exception, the 'bundleoperation' object was never returned, so the reply bundle
was never accessible and no output could be salvaged. We introduce a quick (but
not very elegant) fix to gain access to any reply created during the processing.
This concludes this output-related series. The client should now hopefully be
able to see the whole server output, in the proper order.
The code is now complex enough that a refactoring of it would make sense on
default.
External hooks used to write directly to stdout and stderr. As a result their
output was not captured by the bundle2 processing. This resulted in confusing
out-of-order output on the client side. We are now capturing hook output in
this context.
Remote output should be silenced by --quiet. The issue was found while running
`test-largefiles-cache.t` so it will get tested once we enable bundle2 by
default.
The re-handling of output is happening in some 'unbundle' callers. We have to
transmit the output information to this place so we stick it on the exception.
This is the third step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output part from the aborted reply
into the exception bundle.
This method returns a copy of all 'output' parts added to the bundler.
This is the second step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output parts from the aborted reply
into the exception bundle.
The function will be used in a later patch.
This is the first step in our quest for preserving the server output on error
(issue4594). We want to be able to copy the output parts from the aborted reply
into the exception bundle.
The function will be used in a later patch.
We want to preserve output even when the unbundling fails (eg: hook output). So
we must make sure that everything we have is flushed into the reply bundle.
(This is related to issue4594)
The obsolescence markers exchange is still experimental. We (developers) need
more information about what is going on. I'm adding an experimental flag to
display the amount of data exchanged during bundle2 exchanges.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop the "b2x:" prefix from all part names,
- rename the capability from "bundle2-exp" to "bundle2",
- rename the hook flag from 'bundle2-exp' to 'bundle2'.
We now take full advantage of the 'getunbundler' function by using a
'{version -> unbundler-class}' mapping. This map currently contains a single
entry but will make it easy to support more versions from an extension or in
the future.
At some point, this map will probably contain bundler-class information too,
in the same fashion the packer map does. However, this is not critically required
right now so it will happen by itself when needed.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
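A minimal sketch of such a '{version -> unbundler-class}' dispatch could look
like the following. The class bodies, the name `formatmap` and the 'unbundle2y'
extension class are placeholders for illustration, not the actual code.

```python
import io


class unbundle20:
    """Stand-in unbundler; a real one would parse the stream behind `fp`."""

    def __init__(self, fp):
        self.fp = fp


# {version -> unbundler-class}; an extension could register more entries.
formatmap = {b"20": unbundle20}


def getunbundler(fp):
    """Read the 4-byte header and return the matching unbundler instance."""
    header = fp.read(4)
    magic, version = header[:2], header[2:4]
    if magic != b"HG":
        raise ValueError("not a Mercurial bundle")
    unbundlerclass = formatmap.get(version)
    if unbundlerclass is None:
        raise ValueError("unknown bundle version %r" % version)
    return unbundlerclass(fp)


class unbundle2y(unbundle20):
    """Hypothetical class an extension might add for the old HG2Y format."""


# What an extension would do to bring back support for HG2Y.
formatmap[b"2Y"] = unbundle2y
unbundler = getunbundler(io.BytesIO(b"HG2Y..."))
```

Keeping the dispatch in a plain dictionary means an extension only has to add
one entry; it never needs to wrap or replace 'getunbundler' itself.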
To support multiple bundle2 formats, we will need a function returning
the proper unbundler according to the header. We introduce such a
function and change the usage in the code base. The function will get
smarter in later changesets.
This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
This makes it easy to create a new bundler class that inherits from
the core one. This matches the way 'changegroup' packers work.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
Bundlerepo uses the compressed() method to determine whether it should write
an uncompressed temporary file. Since we don't support compressed bundle2 files
at the moment, make this method return False.
Replace bare part.read() calls with part.seek(0, 2) since the return value is
being ignored. As this doesn't necessarily require building a string that
contains the rest of the part, the potential exists to reduce the memory
footprint of these operations.
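The difference can be seen on any file-like object. The sketch below uses
`io.BytesIO` as a stand-in for a seekable part (an assumption for illustration;
it is not the part class itself).

```python
import io
import os

part = io.BytesIO(b"x" * 1024)  # stand-in for a seekable bundle2 part
part.read(10)                   # a handler consumed the start of the payload

# Skip the rest of the payload without building a bytes object for it.
# The second argument is the `whence` value; 2 is os.SEEK_END.
part.seek(0, os.SEEK_END)

remaining = part.read()         # nothing is left to read
position = part.tell()
```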
This implements a seek() method for unbundlepart. This allows on-disk bundle2
parts to behave enough like files for bundlerepo to handle them. A future
patch will add support for bundlerepo to read the bundle2 files that are
written when the experimental.strip-bundle2-version config option is used.
This patch adds seek(), tell(), and close() implementations for unpackermixin
which forward to the file descriptor's implementation if possible. A future
patch will use this to make bundle2.unbundlepart seekable, which will in turn
make it usable as a file descriptor for bundlerepo.
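A minimal sketch of this kind of forwarding, assuming the mixin keeps the
wrapped stream in a `_fp` attribute (the attribute name, the error type and the
`pipeonly` test class are assumptions):

```python
import io


class unpackermixin:
    """Forward seek/tell/close to the wrapped file object when it has them."""

    def __init__(self, fp):
        self._fp = fp

    def seek(self, offset, whence=0):
        if hasattr(self._fp, "seek"):
            return self._fp.seek(offset, whence)
        raise NotImplementedError("underlying stream does not support seek")

    def tell(self):
        if hasattr(self._fp, "tell"):
            return self._fp.tell()
        raise NotImplementedError("underlying stream does not support tell")

    def close(self):
        # Closing is a no-op when the stream has nothing to close.
        if hasattr(self._fp, "close"):
            return self._fp.close()


class pipeonly:
    """A stream that can only be read, like a network pipe."""

    def read(self, size=-1):
        return b""


seekable = unpackermixin(io.BytesIO(b"0123456789"))
seekable.seek(4)
position = seekable.tell()
```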
The binary format description has always stated that the parttype should be simple,
but it was never really enforced. Recent discussions have convinced me we want to
keep the part type simple and easy to debug. There is enough extensibility in
the rest of the format.
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
Since the parttype and mandatory bit are separated in bundle2.unbundlepart
(see previous patch), there is no longer a need to remove the mandatory bit
before working with the parttype.
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
This patch separates the two pieces of information when reading the part header
so that it's unnecessary to know how they were combined during transmission.
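As an illustration, the split done at header-reading time could be sketched as
below. It assumes that any uppercase character in the transmitted type marks
the part as mandatory; the helper name is hypothetical.

```python
def splitparttype(rawtype):
    """Return (parttype, mandatory) from the part type as transmitted.

    Assumption: an uppercase character anywhere in the transmitted type
    marks the part as mandatory.
    """
    mandatory = rawtype != rawtype.lower()
    return rawtype.lower(), mandatory


parttype, mandatory = splitparttype("CHANGEGROUP")
```

Handlers can then be looked up by `parttype` alone, and `mandatory` is only
consulted when deciding what to do with an unsupported part.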
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
This diff adds an experimental config option "bundle2.pushback" which provides
a transaction to the reply unbundler during a push operation. This behavior is
opt-in because of potential security issues: the response can contain any part
type that has a handler defined, allowing the server to make arbitrary changes
to the local repository.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
The default transaction getter for processbundle is a private function that
raises an exception; this diff lets calling code pass None as the transaction
getter to explicitly request this default behavior.
The next diff will check a config option to determine whether to provide a
transaction to the reply bundle processor. If one shouldn't be provided, the
code needs a way to specify that the default behavior should be used.
This will let the bundle2 client and server detect what packer they should be using.
This detection part is not done. I expect it to be done with the addition of the
second packer (with generaldelta support).
If an exception is raised during the generation of a bundle2 part payload, it is
now recorded in the bundle. When such an exception occurs, we capture it,
transmit an abort exception through the bundle, cleanly close the current part
payload and raise it again. This allows generating a valid bundle even in case
of an exception, so that the consumer does not wait forever for a dead producer.
This also allows raising the exception during unbundling at the exact point it
happened during bundling, which makes debugging easier.
It is now possible to emit a single part in the middle of a payload production.
This part will be processed with limitations (it only has access to a `ui`
object). The goal is to let the server raise exceptions and emit output while a
part is being processed. The original motivation is to transmit exceptions that
occur while generating a part.
This change was the motivation to bump the bundle2 format from HG2X to HG2Y.
Somehow, the format bump made it into 3.2 without it. So this change goes on
stable. It is low risk as bundle2 is still disabled by default.
Bundle2 opens doors to advanced features that can reduce load on
Mercurial servers and improve the clone experience for users on unstable or
slow networks.
For instance, it could be possible to pre-generate a bundle of a
repository, and give a pointer to it to clients cloning the repository,
followed by another changegroup with the remainder. For significantly
big repositories, this could come as several base bundles with e.g. 10k
changesets, which, combined with checkpoints (not part of this change),
would prevent users with flaky networks from starting over any time
their connection fails.
While the server-side support for those features doesn't exist yet, it
is preferable to have client-side support for this early-on, allowing
experiments on servers only requiring a vanilla client with bundle2
enabled.
We are changing all integers that denote the size of a chunk to read to int32.
There are two main motivations for that.
First, we change everything to the same width (32 bits) to make it possible for
a reasonably agnostic actor to forward a bundle2 without any extra processing.
With this change, this could be achieved by just reading int32s and forwarding
chunks of the size read. A bit of smartness would be needed to detect the end
of the stream, but nothing too complicated.
Second, we need some capacity to transmit special information during the bundle
processing. For example we would like to be able to raise an exception while a
part is being read if this exception happened while this part was generated.
Having signed integers lets us use negative numbers to trigger special events
during the parsing of the bundle.
The format is renamed from HG2X to HG2Y because this breaks binary
compatibility. The HG2X format support is dropped. It was experimental to
allow this kind of thing. All elements not directly related to the binary
format remain flagged "b2x" because they are still compatible.
This is code movement only. It will be useful to have it separated for reuse
purposes. We plan to introduce a new feature to the bundle format that allows
inserting a part in the middle of another part's payload. This will be useful to
transmit an exception raised during a part generation.
The final writing of the empty part was done explicitly. We now use a proper
pack call with a symbolic constant. This opens the way to simple changes in the
bundle2 format.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
Right next to the function that encodes the supported versions in
capabilities we add a function that decodes the versions out of capabilities.
This is going to be useful to know what formats can be used for exchange.
This property can be used to know how many parts have been added to the bundle2.
This will be useful to check if any part has been generated for a push.
After ``listkeys`` we can now include a ``pushkey`` request in a bundle2. The part
uses a very simple scheme, as close as possible to the current wireproto command
for ``pushkey``. We may eventually decide on a more sophisticated part format
before the protocol becomes final.
The process of decoding remote bundle2caps blob into a dictionary is cumbersome.
We move it into a small helper function. This will clarify code that reads
bundle2 capabilities of peers and help using it in new places.
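Such a helper could be sketched as follows. The blob layout used here (one
URL-quoted `key=value,value` entry per line) is an assumption made for the
example, and `decodecaps` is an illustrative name.

```python
from urllib.parse import unquote


def decodecaps(blob):
    """Decode a capabilities blob into a {name: [values]} dictionary."""
    caps = {}
    for line in blob.splitlines():
        if not line:
            continue
        if "=" not in line:
            key, vals = line, ()         # capability without any value
        else:
            key, vals = line.split("=", 1)
            vals = vals.split(",")
        caps[unquote(key)] = [unquote(v) for v in vals]
    return caps


caps = decodecaps("HG20\nchangegroup=01,02\nlistkeys")
```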
Advisory parts are advisory. If a handler exists but does not support the
proper parameters, we can safely ignore it.
Test has been updated to include this case.
Once we picked a handler, we check that all mandatory parameter keys are
properly supported. If not we raise an exception.
We added a test for this case.
The code now fails for any part with unknown mandatory parameters. We will
ignore such errors for advisory parts in a later changeset.
If we are to enforce the mandatory aspect of parameters, we need a way to
discover what a handler supports. The best option we ended up with is a simple
declaration of known parameters at registration time.
We simply plug the list of parameters on the function object because Python lets
us do that and there is no benefit to a more complicated way.
One of the handlers is updated as an example and for testing.
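A minimal sketch of this registration scheme, together with the mandatory and
advisory checks it enables, might look like this. The decorator signature, the
part type, the parameter name and the error type are assumptions made for the
example.

```python
parthandlermapping = {}


def parthandler(parttype, params=()):
    """Register a handler and plug its known parameters on the function."""
    def _decorator(func):
        assert parttype not in parthandlermapping
        parthandlermapping[parttype] = func
        func.params = frozenset(params)
        return func
    return _decorator


@parthandler("test:song", ("randomparam",))
def handlesong(params):
    return "song handled"


def processpart(parttype, mandatory, mandatorykeys, params):
    """Run the handler unless it lacks support for some mandatory key."""
    handler = parthandlermapping[parttype]
    unknown = sorted(set(mandatorykeys) - handler.params)
    if unknown:
        if mandatory:
            raise ValueError("unsupported parameters: %s" % ", ".join(unknown))
        return None  # advisory part: safe to ignore
    return handler(params)


result = processpart("test:song", True, ["randomparam"], {"randomparam": "1"})
```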
We picked a null character to split each parameter during the transfer. This is
fragile if the same character is used in a parameter name. However other
code already behaves in a strange way in that case, so we are not
introducing any regression. A better format may be picked for the final
version of the protocol.
This is a backward compatibility breakage per se. But bundle2 was explicitly
flagged as experimental, and this is on an error path anyway. So the worst
possible outcome of this change is to still have a crash but with a different
message.
We are going to raise exceptions for a wider range of cases: unsupported
mandatory stream and part parameters. We rename the exception with a wider
name.
We expose all keys that MUST be processed in ``part.mandatorykeys``. This makes
it much easier to access the information. Enforcement of the mandatory
parameters is coming in later changesets.
This exposes all parameters the part received into a ``part.params`` dictionary.
This should be much easier to use.
This dictionary itself does not expose the mandatory or advisory aspect of
parameters, but no current users of bundle2 actually enforce any of this logic.
Coming changesets will improve this aspect.
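To illustrate how the two views relate, here is a small sketch in which a part
keeps its mandatory and advisory parameters as (key, value) pairs and derives
`params` and `mandatorykeys` from them. The constructor and the sample
parameter names are assumptions.

```python
class part:
    """Sketch of a received part exposing its parameters."""

    def __init__(self, mandatoryparams=(), advisoryparams=()):
        self.mandatoryparams = tuple(mandatoryparams)
        self.advisoryparams = tuple(advisoryparams)
        # Every parameter, regardless of its mandatory or advisory status.
        self.params = dict(self.mandatoryparams)
        self.params.update(self.advisoryparams)
        # Keys that MUST be understood by whoever processes the part.
        self.mandatorykeys = frozenset(k for k, _ in self.mandatoryparams)


p = part(mandatoryparams=[("namespace", "phases")],
         advisoryparams=[("hint", "fast")])
```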
The handling of parameters will become much more sophisticated in the coming
changesets. So we extract the logic in a function to not pollute the generic
logic.
No rules were specified about parameter key uniqueness. We now document that
keys should be unique. This opens the way to a more friendly (read:
dictionary-like) way to access the values of parameters in the code.
As we will introduce functions to alter already created parts, we need a proper
exception to raise when code tries to alter a part that cannot be altered anymore.
As we are moving toward being able to alter a part after its creation, we need
to make the implication of the part being already part of the bundle2 clear.
We introduce a ``_generated`` attribute on parts. Coming changesets will
make it easier to update a part's contents after its creation. We need a way to track
if the part is still open to modification or if it is currently being generated
and should not be touched anymore.
As a bonus, we can now detect and crash if someone manages to write bogus code
to get a part generated twice.
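The tracking could be sketched as follows, assuming a three-state flag: None
before generation starts, False while it is in progress, True once done. The
exception name and the `data` property are illustrative.

```python
class ReadOnlyPartError(Exception):
    """Raised when code tries to alter a part that is being generated."""


class bundlepart:
    def __init__(self, parttype, data=b""):
        self.type = parttype
        self._data = data
        self._generated = None  # None: untouched, False: running, True: done

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        if self._generated is not None:
            raise ReadOnlyPartError("part is being generated")
        self._data = value

    def getchunks(self):
        if self._generated is not None:
            raise RuntimeError("part can only be consumed once")
        self._generated = False
        yield self._data
        self._generated = True


p = bundlepart("output", b"first")
p.data = b"second"               # still allowed: generation has not started
chunks = list(p.getchunks())
```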
Creating new parts is the most common operation people do when exposed to a
bundler. We create a dedicated method on the bundler object for it. This will
simplify the code and also avoid having to import the ``mercurial.bundle2``
module in multiple places.
One part creator has been updated for testing purposes.
The `bundle20` class contains methods to help define the content and methods
to generate the actual stream. We add small doc headers to help distinguish
between the two.
Same drill again. We catch the PushRaced error and check if it comes from
bundle2 processing; if so we turn it into a bundle2 with a part
transporting error information to be reraised client side.
If the heads on the server differ from the ones the client reported seeing at
bundle time, we raise a PushRaced exception. However, the part raising the
exception was broken.
To fix it, we move the PushRaced class in the error module so it can be
accessible everywhere without an import cycle.
A test is also added to prevent regression.
Same as for Abort error, we catch the error, encode it into a bundle2 reply
(expected by the client) and stream this reply. The client processing of the
error will raise the exception again.
Clients expect a bundle2 reply to their bundle2 submission. So we
catch the Abort error and turn it into a bundle2 containing a part
transporting the exception data. The unbundling of this reply will
raise the error again.
All current core parts are moved to a `b2x` namespace (for "bundle 2
experimental"). This should avoid conflicts between the final stable
format and the one about to be released.
The current implementation of bundle2 is still very experimental and the 3.0
freeze is yesterday. The current bundle2 format has never been field-tested, so
we rename the header to HG2X. This leaves the HG20 header available for real
usage as a stable format in Mercurial 3.1.
We won't guarantee that future mercurial versions will keep supporting this
`HG2X` format.
This attribute conveys the capabilities supported by the destination of the
bundle. It is used to decide which parts to include in the bundle.
This is currently a set but will probably be turned into a dictionary to allow
capabilities with values.
When a reply is built, the bundle processing will capture the output of each
handler and send it to the client in a dedicated part.
As a side effect, this adds a "remote: " prefix to destination output on local
push. This is considered okay for now as:
1. bundle2 is still experimental,
2. Matt said he could be okay to change output for bundle2,
3. This keeps the implementation simple.
This changeset does it for stdout only. stderr will be done in a future changeset.
The bundle2 processing does not create a bundle2 reply by default anymore. It
is only done if the client requests it with a `replycaps` part. This part is
called `replycaps` as it will eventually contain data about which bundle2
capabilities are supported by the client.
We have to add a flag to the test command to control whether a reply is
generated or not.
The `readbundle` function will consume the first 4 bytes to dispatch between
the various unbundlers. We introduce a way to inform `unbundle20` that the header
has been read and can be trusted.
Using `readbundle` in the part handlers creates a circular import hell. We are
now using a simple `HG10UN` stream with no header. Some parameters may
later be introduced on the part to change this.
Producers are updated as well.
This part is intended to hold the same role as the `heads` argument of the
unbundle function. The client fills it with the known heads at bundle time and
the server will abort if its heads changed.
The `unbundle` part gains a `read` method to retrieve payload content.
This method behaves as a python file-like read method.
The bundle-processing code is updated to make sure a part is fully consumed before
another one is extracted.
Test output changes because the debug output is even more interleaved now.
We have a new unbundle class and it is now responsible for extracting its own
data. The top level bundler only extracts the header (to detect an end of stream
marker) then leaves everything else to the `unbundlepart` class. The ultimate
goal is to have `unbundlepart` responsible for lazily extracting its payload.
This is mostly code movement.
The coming `unbundlepart` will need the same kind of methods as `unbundle20`
for unpacking data from the stream. We extract them into a mixin class before
the creation of `unbundlepart`.
We are going to introduce an `unbundlepart` dedicated to reading bundles. So we
need to rename the one used to create bundles. Even if dedicated to creation, this
is still used for unbundling until we get the new class.
When the `part.data` attribute is an iterator, we assume it is an iterator of
chunks and use it.
We use a chunkbuffer to yield chunks of 4096 bytes.
The tests are updated to use this feature.
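The effect can be sketched with a small generator that regroups an iterator of
arbitrary chunks into 4096-byte ones. It stands in for the chunkbuffer helper
and is not its actual implementation; `rechunk` is a hypothetical name.

```python
def rechunk(chunks, size=4096):
    """Yield `size`-byte chunks from an iterator of arbitrary byte chunks."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]
    if buf:
        yield bytes(buf)  # final chunk, shorter than `size`


data = iter([b"a" * 5000, b"b" * 5000])
sizes = [len(c) for c in rechunk(data)]
```

Only one buffer of at most `size` bytes plus the incoming chunk is held at any
time, so the payload never has to be fully loaded in memory.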
We are preparing streaming capability for parts. So the generation of payload
chunks will become more complex. We extract this part into its own function before
any changes.
We now have an official way to return the result of addchangegroup. The tests are
updated to check that the return bundle is properly created. It will be used
when push is bundle2 enabled.
We do not know yet what kind of data future features and extensions will need to
exchange. To handle that, bundle2 allows sending arbitrary content to the
server. As a consequence, we need to be able to reply with arbitrary content to
the client, and we can use bundle2 to transmit that arbitrary data too.
When a client pushes a bundle2 to the server, the server will reply with a
bundle2 itself.
This changeset lays the first stone of this logic and tests it.
For sending a response to a pushed bundle, we need to link reply parts to request
parts. We introduce a part id for this purpose. This is a 32-bit unique
integer stored in the header.
We use the `gettransaction` method approach already used for pull. We
need this because we do not know beforehand if the bundle needs a
transaction to be created. And (1) we do not want to create a
transaction for nothing. (2) Some bundle2 bundles may be read-only and
do not require any lock or transaction to be held.
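The lazy approach can be sketched as below: the transaction is only opened the
first time a handler asks for it. The class name and the `opentransaction`
callable are placeholders for whatever locking and transaction code the caller
uses.

```python
class transactionmanager:
    """Open a transaction on first request and reuse it afterwards."""

    def __init__(self, opentransaction):
        self._opentransaction = opentransaction
        self._tr = None

    def gettransaction(self):
        if self._tr is None:
            self._tr = self._opentransaction()
        return self._tr


opened = []


def opentransaction():
    opened.append("tr")  # stands in for taking locks and opening a transaction
    return object()


manager = transactionmanager(opentransaction)
# A read-only bundle never calls gettransaction(), so nothing is opened.
nothingopened = len(opened) == 0
first = manager.gettransaction()
second = manager.gettransaction()
```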
The current changegroup format is put in a "changegroup" part and processed by
an appropriate handler.
This is not production-ready code, but it lets us start smoke testing.
Part handlers can now add records to the `bundleoperation` object. This can be
used to help other parts or to let the caller of the unbundling process react
to the results.