This implements a seek() method for unbundlepart. This allows on-disk bundle2
parts to behave enough like files for bundlerepo to handle them. A future
patch will add support for bundlerepo to read the bundle2 files that are
written when the experimental.strip-bundle2-version config option is used.
This patch adds seek(), tell(), and close() implementations for unpackermixin
which forward to the file descriptor's implementation if possible. A future
patch will use this to make bundle2.unbundlepart seekable, which will in turn
make it usable as a file descriptor for bundlerepo.
The binary format description has always stated that the parttype should be simple,
but it was never really enforced. Recent discussions have convinced me we want to
keep the part type simple and easy to debug. There is enough extensibility in
the rest of the format.
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
Since the parttype and mandatory bit are separated in bundle2.unbundlepart
(see previous patch), there is no longer a need to remove the mandatory bit
before working with the parttype.
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
This patch separates the two pieces of information when reading the part header
so that it's unnecessary to know how they were combined during transmission.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
This diff adds an experimental config option "bundle2.pushback" which provides
a transaction to the reply unbundler during a push operation. This behavior is
opt-in because of potential security issues: the response can contain any part
type that has a handler defined, allowing the server to make arbitrary changes
to the local repository.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
The default transaction getter for processbundle is a private function that
raises an exception; this diff lets calling code pass None as the transaction
getter to explicitly request this default behavior.
The next diff will check a config option to determine whether to provide a
transaction to the reply bundle processor. If one shouldn't be provided, the
code needs a way to specify that the default behavior should be used.
This will let the bundle2 client and server detect what packer they should be using.
This detection part is not done. I expect it to be done with the addition of the
second packer (with generaldelta support).
If an exception is raised during a bundle2 part payload generation it is now
recorded in the bundle. If such exception occurs, we capture it, transmit an
abort exception through the bundle, cleanly close the current part payload and
raise it again. This allow to generate valid bundle even in case of exception so
that the consumer does not wait forever for a dead producer. This also allow to
raise the exception during unbundling at the exact point it happened during
bundling make debugging easier.
It is now possible to emit a single part in the middle of a payload production.
This part will be processed with limitation (only access to a `ui` object). The
goal is to let the server raise exception and output while a part is being
processed. The source motivation is to transmit exception that occurs while
generating a part.
This change is was the motivation to bump the bundle2 format from HG2X to HG2Y.
Somehow, the format bump made it into 3.2 without it. So this change go on
stable. It is low risk as bundle2 is still disabled by default.
Bundle2 opens doors to advanced features allowing to reduce load on
mercurial servers, and improve clone experience for users on unstable or
slow networks.
For instance, it could be possible to pre-generate a bundle of a
repository, and give a pointer to it to clients cloning the repository,
followed by another changegroup with the remainder. For significantly
big repositories, this could come as several base bundles with e.g. 10k
changesets, which, combined with checkpoints (not part of this change),
would prevent users with flaky networks from starting over any time
their connection fails.
While the server-side support for those features doesn't exist yet, it
is preferable to have client-side support for this early-on, allowing
experiments on servers only requiring a vanilla client with bundle2
enabled.
We are changing all integers that denote the size of a chunk to read to int32.
There are two main motivations for that.
First, we change everything to the same width (32 bits) to make it possible for
a reasonably agnostic actor to forward a bundle2 without any extra processing.
With this change, this could be achieved by just reading int32s and forwarding
chunks of the size read. A bit a smartness would be logic to detect the end of
stream but nothing too complicated.
Second, we need some capacity to transmit special information during the bundle
processing. For example we would like to be able to raise an exception while a
part is being read if this exception happend while this part was generated.
Having signed integer let us use negative numbers to trigger special events
during the parsing of the bundle.
The format is renamed for B2X to B2Y because this breaks binary
compatibility. The B2X format support is dropped. It was experimental to
allow this kind of things. All elements not directly related to the binary
format remain flagged "b2x" because they are still compatible.
This is code movement only. This will be useful to have it separated for reuse
purposes. We plan to introduce a new feature to the bundle format that allow
inserting a part in the middle of another part payload. This will be useful to
transmit a exception raised during a part generation.
The final writing of the empty part was done explicitly. We now using proper
pack call using symbolic constant. This open simple change in the bundle2
format.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
Right next to the function that encodes the supported versions in
capabilities we add a function that decodes the versions out of capabilities.
This is going to be useful to know what formats can be used for exchange.
This property can be used to know how much parts have been added to the bundle2.
This will be useful to check if any part have been generated for a push.
After ``listkeys`` we can now include ``pushkey`` request in a bundle2. The part
uses a very simple scheme closest as possible to the current wireproto command
for ``pushkey``. We may eventually decide for a more sophisticated part format
before the protocol becomes final.
The process of decoding remote bundle2caps blob into a dictionary is cumbersome.
We move it into a small helper function. This will clarify code that reads
bundle2 capabilities of peers and helps using it in new places.
Advisory parts are advisory. If a handler exists but does not support the
proper parameters, we can safely ignore it.
Test has been updated to include this case.
Once we picked a handler, we check that all mandatory parameter keys are
properly supported. If not we raise an exception.
We added a test for this case.
The code now fails for any part with unknown mandatory parameters. We will
ignore such errors for advisory parts in a later changeset.
If we are to enforce the mandatory aspect of parameter, we need a way to
discover what a handler supports. The best option we end up with is this a simple
declaration of known parameters at registration time.
We simply plug the list of parameters on the function object because Python lets
us do that and there is no benefit for a more complicated way.
One of the handlers is updated for example and testing.
We picked a null character to split each parameter during the transfer. This is
fragile if the same character is used in parameter name. However other
codes will already behave in a strange way in that case, so we are not
introducing any regression. A better format may be picked for the final
version of the protocol.
This is a backward compatibility breakage per se. But bundle2 was explicitly
flagged as experimental, and this is one an error path anyway. So the worse
possible outcome from this change is to still have a crash but with a different
message.
We are going to raise exceptions for a wider range of cases: unsupported
mandatory stream and part parameters. We rename the exception with a wider
name.
We expose all keys that MUST be processed in ``part.mandatorykeys``. This makes
it much easier to access the information. Enforcement of the mandatory
parameters is coming in later changesets.
This exposes all parameters the part received into a ``part.params`` dictionary.
This should be much easier to use.
This dictionary itself does not expose the mandatory or advisory aspect of
parameters, but no current users of bundle2 actually enforce any of this logic.
Coming changesets will improve this aspect.
The handling of parameters will become much more sophisticated in the coming
changesets. So we extract the logic in a function to not pollute the generic
logic.
No rules were specified about parameter key uniqueness. We document that keys
should be unique and document it. This opens the way to a more friendly (read
dictionary like) way to access value of parameters in the code.
As we will introduce functions to alter already created parts, we need a proper
exception to raise when code tries to alter a part that cannot be altered anymore.
As we are moving toward being able to alter a part after its creation, we need
to make the implication of the part being already part of the bundle2 clear.
We introduce a ``_generated`` attribute on parts. Coming changesets will
make it easier to update a part's contents after its creation. We need a way to track
if the part is still open to modification or if it is currently being generated
and should not be touched anymore.
As a bonus, we can now detect and crash if someone manages to write bogus code
to get a part generated twice.
Creating new parts is the most common operation people do when exposed to a
bundler. We create a dedicated method on the bundler object for it. This will
simplify the code and also avoid having to import the ``mercurial.bundle2``
module in multiple places.
One part creators have been updated for testing purpose.
The `bundle20` class contains methods to help define the content and methods
to generate the actual stream. We add small doc headers to help distinguish
between the two.
Same drill again. We catch the PushRaced error, check if it cames from
a bundle2 processing, if so we turn it into a bundle2 with a part
transporting error information to be reraised client side.
If the heads on the server differ from the ones reported seen by the client at
bundle time, we raise a PushRaced exception. However, the part raising the
exception was broken.
To fix it, we move the PushRaced class in the error module so it can be
accessible everywhere without an import cycle.
A test is also added to prevent regression.
Same as for Abort error, we catch the error, encode it into a bundle2 reply
(expected by the client) and stream this reply. The client processing of the
error will raise the exception again.
Clients expect a bundle2 reply to their bundle2 submission. So we
catch the Abort error and turn it into a bundle2 containing a part
transporting the exception data. The unbundling of this reply will
raise the error again.
All currently core parts are moved to a `bx2` namespace (for "bundle 2
experimental"). This should avoid conflicts between the final stable
format and the one about to be released.
The current implementation of bundle2 is still very experimental and the 3.0
freeze is yesterday. The current bundle2 format has never been field-tested, so
we rename the header to HG2X. This leaves the HG20 header available for real
usage as a stable format in Mercurial 3.1.
We won't guarantee that future mercurial versions will keep supporting this
`HG2X` format.
This attribute conveys the capabilities supported by the destination of the
bundle. It is used to decide which parts to include in the bundle.
This is currently a set but will probably be turned into a dictionary to allow
capabilities with values.
When a reply is built, the bundle processing will capture the output of each
handler and sends it to the client in a dedicated part.
As a side effect, this add a "remote: " prefix to destination output on local
push. This is considered okay for now as:
1. bundle2 is still experimental,
2. Matt said he could be okay to change output for bundle2,
3. This keeps the implementation simple.
This changeset does it for stdout only. stderr will be done in a future changeset.
The bundle2 processing does not create a bundle2 reply by default anymore. It
is only done if the client requests it with a `replycaps` part. This part is
called `replycaps` as it will eventually contain data about which bundle2
capabilities are supported by the client.
We have to add a flag to the test command to control whether a reply is
generated or not.
The `readbundle` function will consume the 4 first bytes to dispatch between
various unbundler. We introduce a way to inform `unbundle20` that the header
has been read and it can be trusted.
Using `readbundle` in the part handlers creates a circular import hell. We are
now using a simple `HG10UN` stream with no header. Some parameters may
later be introduced on the part to change parameter.
Producers are updated as well.
This part is intended to hold the same role as the `heads` argument of the
unbundle function. The client fill it with the known heads at bundle time and
the server will abort if its heads changed.
The `unbundle` part gains a `read` method to retrieve payload content.
This method behaves as a python file-like read method.
The bundle-processing code is updated to make sure a part is fully consumed before
another one is extracted.
Test output changes because the debug output is even more interleaved now.
We have a new unbundle class and it is now responsible from extracting its own
data. The top level bundler only extracts the header (to detect an end of stream
marker) then leaves everything else to the `unbundlepart` class. The ultimate
goal is to have `unbundlepart` responsible for lazily extracting its payload.
This is mostly code movement.
The coming `unbundlepart` will need the same kind of method than `unbundle20`
for unpacking data from the stream. We extract them into a mixin class before
the creation of `unbundlepart`.
We are going to introduce an `unbundlepart` dedicated to reading bundle. So we
need to rename the one used to create bundle. Even if dedicated to creation, this
is still used for unbundling until we get the new class.
When the `part.data` attribute is an iterator, we assume it is an iterator of
chunks and use it.
We use a chunkbuffer to yield chunks of 4096 bytes.
The tests are updated to use this feature.
We are preparing streaming capability for part. So the generation of payload
chunk will becomes more complex. We extract this part in its own function before
any changes.
We now have an official way to return the result of addchangegroup. The tests are
updated to check that the return bundle is properly created. It will be used
when push is bundle2 enabled.
We do not know yet what kind of data future features and extensions will need to
exchange. To handle that, bundle2 allows to send arbitrary content to the
server. As a consequence, we need to be able to reply arbitrary content to the
client. And, we can use bundle2 to transmit those arbitrary data.
When a client will push a bundle2 to the server, the server will reply with a
bundle2 itself.
This changeset installs the first stone of this logic and test it.
For sending response to a pushed bundle, we need to link reply parts to request
part. We introduce a part id for this purpose. This is a 32 bit unique
integer stored in the header.
We use the `gettransaction` method approach already used for pull. We
need this because we do not know beforehand if the bundle needs a
transaction to be created. And (1) we do not want to create a
transaction for nothing. (2) Some bundle2 bundles may be read-only and
do not require any lock or transaction to be held.
The current changegroup format is put in a "changegroup" part and processed by
an appropriate handlers.
This is not production ready code, but let us start smoke testing.
Part handlers can now add records to the `bundleoperation` object. This can be
used to help other parts or to let the caller of the unbundling process react
to the results.
This object will hold all data and state gathered through the processing of a
bundle. This will allow:
- each handler to be aware of the things unbundled so far
- the caller to retrieve data about the execution
- bear useful object and logic (like repo, transaction)
- bear possible bundle2 reply triggered by the unbundling.
For now the object is very simple but it will grow at the same time as the
bundle2 implementation.
The unbundle can comes from multiple sources. (on disk file, peer, etc) and
(ultimately) of multiple type (bundle10, bundle20). The `processbundle` is no
longer in charge of creating the bundle.
When the bundle processing abort on unknown mandatory parts, we now makes sure
all the bundle content is read. This avoid leaving the communication channel in
an unrecoverable state.
Mandatory part cannot be ignored when unknown. We raise a simple KeyError
exception when this happen.
This is very early version of this logic, see inline comment for future
improvement lead.
We now have a function that interpret part content.
This is a version early version of this function. It'll see major changes in
scope and API in future development. As for previous I'm just focussing on
getting minimal logic setup. Refining will happen with real world usage.
We use the same approach that the other unpack, as function is given the struct
format and his both responsible for reading the right amount of data from the
header and unpack the struct.
This give use flexibility if we decide to change the size of something in the
format before the release.
We add the ability to bundle and unbundle a payload in parts. The payload is the
actual binary data of the part. It is used to convey all the applicative data.
For now we stick to very simple implementation with all the data fit in single
chunk. This open the door to some bundle2 testing usage. This will be improved before
bundle2 get used for real. We need to be able to stream the payload in multiple
part to exchange any changegroup efficiently. This simple version will do for
now.
Bundling and unbundling are done in the same changeset because the test for
parts is less modular. However, the result is not too complex.
The bundle2 now raise value error when seeing invalid parameter names. The first
introduced rules is: no empty parameter.
The test extension is improve to properly abort when ValueError are encountered.
This introduces support for arbitrary characters in stream parameters name and
value. The urlquote format has been chosen because it is:
- simple,
- standard,
- no-op on simple alphanumerical entry.
Parameter can now have a value. We use a `<name>=<value>` form inspired from
capabilities.
There is still no kind of escaping in the name or value, yet.
Stream level parameter have very restricted use case. Clarify why we chosen a
textual format and point that applicative data goes in applicative parts.
This changeset introduce an unbundler class to match the bundle2 bundler. It is
currently able to unbundle an empty bundle2 only and will gain more feature at
the same pace than the bundler.
It also comes with its special extension command in test.
This changeset is the very first of a long series. It create a new bundle2
module and add a simple class that generate and empty bundle2 container.
The module is documented with the current state of the implementation. For
information about the final goal you may want to consult the mercurial wiki
page:
http://mercurial.selenic.com/wiki/BundleFormat2
The documentation of the module will be updated with later patches adding more and
more feature to the format.
This patches also introduce a test case. This test case build and use its own
small extension that use the new bundle2 module. Since the new format is unable
to do anything right now, we could not use real mercurial code to test it.
Moreover, some advanced feature of the bundle2 spec will not be used by core
mercurial at all. So we need to have them tested.