This means that users of request batching don't need to worry
themselves with capability checking. Instead, they can just use
batching, and if the remote server doesn't support batching for some
reason the wirepeer code will transparently un-batch the requests.
This will allow for some slight simplification in a handful of
places. Prior to this change, largefiles would have been silently
broken against a server which did not support batching.
"Real" batching only makes sense for wirepeers, but it greatly
simplifies the clients of peer instances if they can be ignorant to
actual batching capabilities of that peer. By moving the
not-really-batched batching code into peer.peer, all peer instances
now work with the batching API, thus simplifying users.
This leaves a couple of name forwards in wirepeer.py. Originally I had
planned to clean those up, but it kind of unclarifies other bits of
code that want to use batching, so I think it makes sense for the
names to stay exposed by wireproto. Specifically, almost nothing is
currently aware of peer (see largefiles.proto for an example), so
making them be aware of the peer module *and* the wireproto module
seems like some abstraction leakage. I *think* the right long-term fix
would actually be to make wireproto an implementation detail that
clients wouldn't need to know about, but I don't really know what that
would entail at the moment.
As far as I'm aware, no clients of batching in third-party extensions
will need updating, which is nice icing.
This issue appears to be as old as wireproto batching itself: I can
reproduce the failure as far back as 6afda0a50a20 trivially by
rebasing the test changes in this patch, which was back in the 1.9
era. I didn't test before that change, because prior to that the
testfile has a different name and I'm lazy.
Note that the test thought it was checking this case, but it actually
wasn't: it put a literal ; in the arg and response for its greet
command, but the mangle/unmangle step defined in the test meant that
instead of "Fo, =;o" going over the wire, "Gp-!><p" went instead,
which doesn't contain any special characters (those being [.=;]) and
thus not exercising the escaping. The test has been updated to use
pre-unmangled special characters, so the request is now "Fo+<:o",
which mangles to "Gp,=;p". I have confirmed that the test fails
without the adjustment to the escaping rules in wireproto.py.
No existing clients of RPC batching were depending on the old behavior
in any way. The only *actual* users of batchable RPCs in core were:
1) largefiles, wherein it batches up many statlfile calls. It sends
hexlified hashes over the wire and gets a 0, 1, or 2 back as a
response. No risk of special characters.
2) setdiscovery, which was using heads() and known(), both of which
communicate via hexlified nodes. Again, no risk of special characters.
Since the escaping functionality has been completely broken since it
was introduced, we know that it has no users. As such, we can change
the escaping mechanism without having to worry about backwards
compatibility issues.
For the curious, this was detected by chance: it happens that the
lz4-compressed text of a test file for remotefilelog compressed to
something containing a ;, which then caused the failure when I moved
remotefilelog to using batching for file content fetching.
Well-behaved Mercurial clients will respect the httpheader capability by not
sending http headers longer than the given limit in bytes. The limit is
currently hard-coded at 1024 bytes, a safe value for any web server.
Since parsing headers is a notable factor in web server performance, tuning
header size can nontrivially improve performance for request-heavy operations
(eg. obsolete marker negotiation). Exposing the maximum header length limit
as a configuration setting is a simple way to enable such tuning.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
The 'bundlecaps' argument is expected to be a set, but 'listkeys' is
expected to be a list where ordering matters. We introduce a new 'scsv'
argument type for the 'set' version and move 'csv' to the 'list'
version.
'test-ssh.t' is changed because this introduced an instability in the order we
were producing listkeys parts.
This is a useful information to have in general and we already have debug
output related to listkeys. I'm planning to play around with massive amount of
phases roots and bookmarks so having this data in debug will be very useful.
This already got me to spot that one of the Logilab's review repo is exchanging
65KB of phases data during each exchanges.
Streaming clones are fast because they are essentially tar files.
On mozilla-central, a streaming clone only consumes ~55s CPU time
on clients as opposed to ~340s CPU time for a regular clone or gzip
bundle unbundle.
Mozilla is deploying static file "lookaside" support to our Mercurial
server. Static bundles are pre-generated and uploaded to S3. When a
clone is performed, the static file is fetched, applied, and then an
incremental pull is performed. Unfortunately, on an ideal network
connection this still takes as much wall and CPU time as a regular
clone (although it does save significant server resources).
We like the client-side wall time wins of streaming clones. But we want
to leverage S3-based pre-generated files for serving the bulk of clone
data.
This patch moves the code for producing a "stream bundle" into its
own standalone function, away from the wire protocol. This will enable
stream bundle files to be produced outside the context of the wire
protocol.
A bikeshed on whether exchange is the best module for this function
might be warranted. I selected exchange instead of changegroup because
"stream bundles" aren't changegroups (yet).
The 'bundlecaps' argument is built as a set, we need to stabilise the order
before exchanging them. Otherwise, in the test, http logs are unstable when the
'bundlecaps' contains something (eg: using bundle2).
In case of errors, output parts salvaged from the reply bundle are re-injected
into the bundle carrying the exception.
We still need to fix the situation for non-wireprotocol push.
We want to add output information to the error bundle. Before doing this, we
rework the code to have a single bundler creation and return statement. This
will make the update with the output simpler as only one place will have to be
touched.
That way, any new server will be ready to accept bundle2 payload. The decision
for the client to use it is still off by default so this is not turning bundle2
everywhere.
We introduce a new kill switch for this in case stuff goes wrong.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
To support more bundle2 formats, we need a wider detection of bundle2-family
streams. The various places what were explicitly detecting the full magic string
are now matching on the first three characters of it.
To support multiple bundle2 formats, we will need a function returning
the proper unbundler according to the header. We introduce such aa
function and change the usage in the code base. The function will get
smarter in later changesets.
This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
This change touches every module in which repository.sopener was being used, and
changes it for the equivalent repository.svfs.
It should now be possible to remove localrepo.sopener.
We have been running discovery on unfiltered repository for quite some time.
This was aimed at two things:
- save some bandwith by prevent the repushing of common but hidden changesets
- allow phases changes on secret/hidden changeset on bare push.
The cost of this unfiltered discovery combined with evolution is actually really
high. Evolution likely create thousand of hidden heads, and the discovery is
going to try to discovery if each of them are common or not. For example,
pushing from my development mercurial repository implies 17 discovery
round-trip.
The benefit are rare corner cases while the drawback are massive. So we run the
discovery on a filtered repository again.
We add some hack to detect remote heads that are known locally and adds them to
the common set anyway, so the good behavior of most of the corner case should
remains. But this will not work in all cases.
This bring my discovery phase back from 17 round-trips to 1 or 2.
We are changing all integers that denote the size of a chunk to read to int32.
There are two main motivations for that.
First, we change everything to the same width (32 bits) to make it possible for
a reasonably agnostic actor to forward a bundle2 without any extra processing.
With this change, this could be achieved by just reading int32s and forwarding
chunks of the size read. A bit a smartness would be logic to detect the end of
stream but nothing too complicated.
Second, we need some capacity to transmit special information during the bundle
processing. For example we would like to be able to raise an exception while a
part is being read if this exception happend while this part was generated.
Having signed integer let us use negative numbers to trigger special events
during the parsing of the bundle.
The format is renamed for B2X to B2Y because this breaks binary
compatibility. The B2X format support is dropped. It was experimental to
allow this kind of things. All elements not directly related to the binary
format remain flagged "b2x" because they are still compatible.
Functions like getbundle and classes like unbundle10 really manipulate
changegroups and not bundles. A HG10 bundle is the same as a changegroup
plus a small header, but this is no longer the case for a HG2X bundle,
so it's better to separate the names a bit.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
The ``getbundle`` function was initially design to return a changegroup bundle.
However, bundle2 allows transmitting a wide range of data. Some bundle2
requests may not include a changegroup at all.
Before this changeset, the client would request a changegroup for
``heads=[nullid]`` and receive an empty changegroup.
We introduce an official boolean parameter, ``cg``, that can be set
to false to disable changegroup generation on getbundle. A new bundle2
capability is introduced to let the client know.
In case of an unknown parameter passed to ``getbundle``, the server prints a
message and ignores the parameter. This message was misleadingly prefixed with
"abort: ". The use of "abort: " here is clearly wrong as nothing is actually
aborted and the ``getbundle`` call proceeds without the parameter.
The message is now prefixed with "warning: " and rephrased to make it clearer
that the parameter was simply ignored.
A new ``listkeys`` is supported by getbundle. It is a list of namespaces whose
content should be included in the bundle.
An appropriate entry has been added to the wireproto map of getbundle arguments
and a new bundle2 capability is advertised.
There are still no codes that request such parts in core mercurial.
In addition to listing the expected options for ``getbundle``, we also list their
types and handle the encoding/decoding automatically. This should make it easier
for extensions to transmit additional information to getbundle.
For now, getbundle accepts a fixed number of arguments: ``heads``, ``common``
and ``bundlecaps``. We make this list exposed at the module level to let
extensions add content there. This is important for extensions that wish to use
bundle2 for other contents than changegroup.
We picked a null character to split each parameter during the transfer. This is
fragile if the same character is used in parameter name. However other
codes will already behave in a strange way in that case, so we are not
introducing any regression. A better format may be picked for the final
version of the protocol.
This is a backward compatibility breakage per se. But bundle2 was explicitly
flagged as experimental, and this is one an error path anyway. So the worse
possible outcome from this change is to still have a crash but with a different
message.
We are going to raise exceptions for a wider range of cases: unsupported
mandatory stream and part parameters. We rename the exception with a wider
name.
Same drill again. We catch the PushRaced error, check if it cames from
a bundle2 processing, if so we turn it into a bundle2 with a part
transporting error information to be reraised client side.
If the heads on the server differ from the ones reported seen by the client at
bundle time, we raise a PushRaced exception. However, the part raising the
exception was broken.
To fix it, we move the PushRaced class in the error module so it can be
accessible everywhere without an import cycle.
A test is also added to prevent regression.
Same as for Abort error, we catch the error, encode it into a bundle2 reply
(expected by the client) and stream this reply. The client processing of the
error will raise the exception again.
Clients expect a bundle2 reply to their bundle2 submission. So we
catch the Abort error and turn it into a bundle2 containing a part
transporting the exception data. The unbundling of this reply will
raise the error again.
We highlight the fact that this is experimental by moving it to an "experimental"
section, and we match the config name with the server capability name
`bundle2-exp`.
For the same reason, we advertise this bundle2 implementation and format as
experimental. This will leave room for field testing in 3.0 but won't conflict
with a stable implementation in 3.1.
The current implementation of bundle2 is still very experimental and the 3.0
freeze is yesterday. The current bundle2 format has never been field-tested, so
we rename the header to HG2X. This leaves the HG20 header available for real
usage as a stable format in Mercurial 3.1.
We won't guarantee that future mercurial versions will keep supporting this
`HG2X` format.
We can now retrieve them from the server during push. The capabilities are
encoded the same way as in `replycaps` part (with an extra layer of urlquoting
to escape separators).
We use the new method defined in the past changeset to send a bundle2 stream and
receive one in reply. The http version is missing remote output support. This
will be done later using a bundle part.
This method will be used by bundle2 pushes. It calls a command, feeds it with a
stream and receives another stream in reply.
Actual implementation for ssh and http will be done in later changesets.
This changeset makes `wireprotocol` peers advertise bundle2 capability and
comply with bundle2 `getbundle` requests.
Note that advertising bundle2 could make a client try to use it for push. Such
pushes would fail. However, I do not expect any human being to have enabled
bundle2 on their server yet.
Localrepo now supports the unbundle method of pushing changegroups. We
plan to use the unbundle call for bundle2 so it is important that all
peers supports it. The `peer.unbundle` and `peer.addchangegroup` code
path have small difference so cause some test output changes. None of those
changes seems problematic.
The `exchange` module now contains an `unbundle` function that holds the core
unbundle logic. The wire protocol keeps its own unbundle function. It enforces
wireprotocol-specific logic and then calls the extracted function.
This aims at implementing unbundle for localrepo.
We are going to refactor the unbundle function to have it working on
a local repository too. Having this function extracted will ease the process.
In the case of non-matching heads, the function now directly raises an
exception. The top level of the function is catching it.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API stay unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
This is a gratuitous code move aimed at reducing the localrepo bloatness.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
Move move in the same direction we took for command line commands. each wire
protocol function will be decorated with its name and arguments.
Beside beside easier to read, this open the road to easily adding more metadata
(like security level or return type)
We already have multiple call function for multiple return type. The
`_decompress` function is only used for http and seems like a layer violation.
We drop it in favor of a new call type dedicated to "stream that may be useful to
compress".
The wireprotocole command use a small set of class as return value. We document
the meaning of each of them.
AS my knowledge of wire protocol is fairly shallow, the documentation can
probably use improvement. But this is a better than nothing.
It will help people that need to add capabilities (in a more subtle was that
just adding some to the list) in multiple way:
1. This function returns a list, not a string. Making it easier to look at,
extend or alter the content.
2. The original capabilities function will be store in the dictionary of wire
protocol command. So extension that wrap this function also need to update
the dictionary entry.
Both wrapping and update of the dictionary entry are needed because the
`hello` wire protocol use the function itself. This is specifically sneaky for
extension writer as ssh use the `hello` command while http use the
`capabilities` command.
With this new `_capabilities` function there is one and only one obvious
place to wrap when needed.
Moves the file walk out of the stream method so that extensions can override it.
This allows an extension to decide what files should be streamed, and in
particular allows a stream without filelogs.
The message was not very much to the point and did not in any way help an
ordinary user.
'repository changed while preparing/uploading bundle - please try again'
is more correct, gives the user some understanding of what is going on, and
tells how to 'recover' from the situation.
The 'bundle' aspect could be seen as an implementation detail that shouldn't be
mentioned, but I think it helps giving an exact error message.
The message could still leave the user wondering why Mercurial doesn't lock the
repo and how unsafe it thus is. Explaining that is however too much detail.
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible