Commit Graph

33 Commits

Author SHA1 Message Date
Gregory Szorc
8509056f34 sparse: add a requirement when a repository uses sparse (BC)
The presence of a sparse checkout can confuse legacy clients or
clients without sparse enabled for reasons that should be obvious.

This commit introduces a new repository requirement that tracks
whether sparse is enabled. The requirement is added when a sparse
config is activated and removed when the sparse config is reset.

The localrepository constructor has been taught to not open repos
with this requirement unless the sparse feature is enabled. It yields
a more actionable error message than what you would get if the
lockout were handled strictly at the requirements verification phase.
Old clients that aren't sparse aware will see the generic
"repository requires features unknown to this Mercurial" error,
however.

The new requirement has "exp" in its name to reflect the
experimental nature of sparse. There's a chance that the eventual
non-experimental feature won't change significantly and we could
have squatted on the "sparse" requirement without ill effect. If
that happens, we can teach new clients to still recognize the old
name. But I suspect we'll sneak in some BC and we'll want a new
requirement to convey new meaning.

Differential Revision: https://phab.mercurial-scm.org/D110
2017-07-17 11:45:38 -07:00
Gregory Szorc
efbb740737 revlog: skeleton support for version 2 revlogs
There are a number of improvements we want to make to revlogs
that will require a new version - version 2. It is unclear what the
full set of improvements will be or when we'll be done with them.
What I do know is that the process will likely take longer than a
single release, will require input from various stakeholders to
evaluate changes, and will have many contentious debates and
bikeshedding.

It is unrealistic to develop revlog version 2 up front: there
are just too many uncertainties that we won't know until things
are implemented and experiments are run. Some changes will also
be invasive and prone to bit rot, so sitting on dozens of patches
is not practical.

This commit introduces skeleton support for version 2 revlogs in
a way that is flexible and not bound by backwards compatibility
concerns.

An experimental repo requirement for denoting revlog v2 has been
added. The requirement string has a sub-version component to it.
This will allow us to declare multiple requirements in the course
of developing revlog v2. Whenever we change the in-development
revlog v2 format, we can tweak the string, creating a new
requirement and locking out old clients. This will allow us to
make as many backwards incompatible changes and experiments to
revlog v2 as we want. In other words, we can land code and make
meaningful progress towards revlog v2 while still maintaining
extreme format flexibility up until the point we freeze the
format and remove the experimental labels.

To enable the new repo requirement, you must supply an experimental
and undocumented config option. But not just any boolean flag
will do: you need to explicitly use a value that no sane person
should ever type. This is an additional guard against enabling
revlog v2 on an installation it shouldn't be enabled on. The
specific scenario I'm trying to prevent is say a user with a
4.4 client with a frozen format enabling the option but then
downgrading to 4.3 and accidentally creating repos with an
outdated and unsupported repo format. Requiring a "challenge"
string should prevent this.

Because the format is not yet finalized and I don't want to take
any chances, revlog v2's version is currently 0xDEAD. I figure
squatting on a value we're likely never to use as an actual revlog
version to mean "internal testing only" is acceptable. And
"dead" is easily recognized as something meaningful.

There is a bunch of cleanup that is needed before work on revlog
v2 begins in earnest. I plan on doing that work once this patch
is accepted and we're comfortable with the idea of starting down
this path.
2017-05-19 20:29:11 -07:00
Gregory Szorc
7d51a8278d revlog: remove some revlogNG terminology
RevlogNG is not such a good name when it is no longer the
newest revlog version. Since we'll soon have revlog version 2,
let's remove some references to it.
2017-05-19 20:14:31 -07:00
Augie Fackler
ec67fe37fb merge with stable 2017-05-04 00:26:55 -04:00
Matt Harbison
44c7a911a4 help: spelling fixes 2017-05-03 22:07:47 -04:00
Siddharth Agarwal
d28bc68b23 internals: document that "branches" is a legacy wire command
Modern clients use a different discovery mechanism.
2017-05-03 14:07:14 -07:00
Yuya Nishihara
cb8fe16f12 help: fix layout of pre-formatted text 2017-03-09 12:55:48 +09:00
Augie Fackler
b3e554d062 internals: add some brief documentation about censor 2017-01-23 20:17:24 -05:00
Kim Alvefur
977cd43980 help: align description of 'base rev' with reality [issue5488]
The text about revlogs seems to be wrong about -1 being used to indicate
the start of a delta chain. Attempt to correct this.
2017-02-28 15:19:08 +01:00
Kyle Lippincott
2dc3df2798 help: fix internals.changegroups
Add information about tree manifests, copy edit the text and fix up a few
ambiguities.

The document also contains a few additional fixes from Siddharth Agarwal
<sid0@fb.com>, who used it to build a parser for changegroups in Rust.
2017-03-01 18:37:34 -08:00
Dan Villiom Podlaski Christiansen
66c3976319 share: add --relative flag to store a relative path to the source
Storing a relative path the source repository is useful when exporting
repositories over the network or when they're located on external
drives where the mountpoint isn't always fixed.

Currently, Mercurial interprets paths in `.hg/shared` relative to
$PWD. I suspect this is very much unintentional, and you have to
manually edit `.hg/shared` in order to trigger this behaviour.

However, on the off chance that someone might rely on it, I added a
new capability called 'relshared'. In addition, this makes earlier
versions of Mercurial fail with a graceful error.

I should note that I haven't tested this patch on Windows.
2017-02-13 14:05:24 +01:00
Martin von Zweigbergk
ad5f4ef8a6 revlog: give EXTSTORED flag value to narrowhg
Narrowhg has been using "1 << 14" as its revlog flag value for a long
time. We (Google) have many repos with that value in production
already. When the same value was reserved for EXTSTORED, it made those
repos invalid. Upgrading them will be a little painful. We should
clearly have reserved the value for narrowhg a long time ago. Since
the EXTSTORED flag is not yet in any release and Facebook also says
they have not started using it in production, so it should be okay to
change it. This patch gives the current value (1 << 14) back to
narrowhg and gives a new value (1 << 13) to EXTSTORED.
2017-01-17 11:25:02 -08:00
Martin von Zweigbergk
a445384510 help: don't let tools reflow revlog flags list
Before this change, the text about revlog flags was reflowed into a
single paragraph, which made it a bit hard to read. I don't even know
the rules around this, but adding a blank line before each flag seems
to prevent the reflowing.
2017-01-17 11:45:10 -08:00
Martin von Zweigbergk
0ecfe18db3 help: format revlog.txt more closely to result
The rendered text has spaces before each item in the list
2017-01-17 11:29:06 -08:00
Gregory Szorc
73aa240fe1 util: declare wire protocol support of compression engines
This patch implements a new compression engine API allowing
compression engines to declare support for the wire protocol.

Support is declared by returning a compression format string
identifier that will be added to payloads to signal the compression
type of data that follows and default integer priorities of the
engine.

Accessor methods have been added to the compression engine manager
class to facilitate use.

Note that the "none" and "bz2" engines declare wire protocol support
but aren't enabled by default due to their priorities being 0. It
is essentially free from a coding perspective to support these
compression formats, so we do it in case anyone may derive use from
it.
2016-12-24 13:51:12 -07:00
Gregory Szorc
0d089ecdf9 internals: document compression negotiation
As part of adding zstd support to all of the things, we'll need
to teach the wire protocol to support non-zlib compression formats.

This commit documents how we'll implement that.

To understand how we arrived at this proposal, let's look at how
things are done today.

The wire protocol today doesn't have a unified format. Instead,
there is a limited facility for differentiating replies as successful
or not. And, each command essentially defines its own response format.

A significant deficiency in the current protocol is the lack of
payload framing over the SSH transport. In the HTTP transport,
chunked transfer is used and the end of an HTTP response body (and
the end of a Mercurial command response) can be identified by a 0
length chunk. This is how HTTP chunked transfer works. But in the
SSH transport, there is no such framing, at least for certain
responses (notably the response to "getbundle" requests). Clients
can't simply read until end of stream because the socket is
persistent and reused for multiple requests. Clients need to know
when they've encountered the end of a request but there is nothing
simple for them to key off of to detect this. So what happens is
the client must decode the payload (as opposed to being dumb and
forwarding frames/packets). This means the payload itself needs
to support identifying end of stream. In some cases (bundle2), it
also means the payload can encode "error" or "interrupt" events
telling the client to e.g. abort processing. The lack of framing
on the SSH transport and the transfer of its responsibilities to
e.g. bundle2 is a massive layering violation and a wart on the
protocol architecture. It needs to be fixed someday by inventing a
proper framing protocol.

So about compression.

The client transport abstractions have a "_callcompressable()"
API. This API is called to invoke a remote command that will
send a compressible response. The response is essentially a
"streaming" response (no framing data at the Mercurial layer)
that is fed into a decompressor.

On the HTTP transport, the decompressor is zlib and only zlib.
There is currently no mechanism for the client to specify an
alternate compression format. And, clients don't advertise what
compression formats they support or ask the server to send a
specific compression format. Instead, it is assumed that non-error
responses to "compressible" commands are zlib compressed.

On the SSH transport, there is no compression at the Mercurial
protocol layer. Instead, compression must be handled by SSH
itself (e.g. `ssh -C`) or within the payload data (e.g. bundle
compression).

For the HTTP transport, adding new compression formats is pretty
straightforward. Once you know what decompressor to use, you can
stream data into the decompressor until you reach a 0 size HTTP
chunk, at which point you are at end of stream.

So our wire protocol changes for the HTTP transport are pretty
straightforward: the client and server advertise what compression
formats they support and an appropriate compression format is
chosen. We introduce a new HTTP media type to hold compressed
payloads. The header of the payload defines the compression format
being used. Whoever is on the receiving end can sniff the first few
bytes route to an appropriate decompressor.

Support for multiple compression formats is advertised on both
server and client. The server advertises a "compression" capability
saying which compression formats it supports and in what order they
are preferred. Clients advertise their support for multiple
compression formats and media types via the introduced "X-HgProto"
request header.

Strictly speaking, servers don't need to advertise which compression
formats they support. But doing so allows clients to fail fast if
they don't support any of the formats the server does. This is useful
in situations like sending bundles, where the client may have to
perform expensive computation before sending data to the server.

Rather than simply advertise a list of supported compression formats,
we introduce an additional "httpmediatype" server capability
advertising which media types the server supports. This means servers
are explicit about what formats they exchange. IMO, this is superior
to inferring support from other capabilities (like "compression").

By advertising compression support on each request in the "X-HgProto"
header and media type and direction at the server level, we are able
to gradually transition existing commands/responses to the new media
type and possibly compression. Contrast with the old world, where we
only supported a single media type and the use of compression was
built-in to the semantics of the command on both client and server.
In the new world, if "application/mercurial-0.2" is supported,
compression is supported. It's that simple.

It's worth noting that we explicitly don't use "Accept,"
"Accept-Encoding," "Content-Encoding," or "Transfer-Encoding" for
content negotiation and compression. People knowledgeable of the HTTP
specifications will say that we should use these because that's
what they are designed to be used for. They have a point and I
sympathize with the argument. Earlier versions of this commit even
defined supported media types in the "Accept" header. However, my
years of experience rolling out services leveraging HTTP has taught
me to not trust the HTTP layer, especially if you are going outside
the normal spec (such as using a custom "Content-Encoding" value to
represent zstd streams). I've seen load balancers, proxies, and other
network devices do very bad and unexpected things to HTTP messages
(like insisting zlib compressed content is decoded and then re-encoded
at a different compression level or even stripping compression
completely). I've found that the best way to avoid surprises when
writing protocols on top of HTTP is to use HTTP as a dumb transport as
much as possible to minimize the chances that an "intelligent" agent
between endpoints will muck with your data. While the widespread use of
TLS is mitigating many intermediate network agents interfering with
HTTP, there are still problems at the edges, with e.g. the origin HTTP
server needing to convert HTTP to and from WSGI and buggy or
feature-lacking HTTP client implementations. I've found the best way to
avoid these problems is to avoid using headers like "Content-Encoding"
and to bake as much logic as possible into media types and HTTP message
bodies. The protocol changes in this commit do rely on a custom HTTP
request header and the "Content-Type" headers. But we used them before,
so we shouldn't be increasing our exposure to "bad" HTTP agents.

For the SSH transport, we can't easily implement content negotiation
to determine compression formats because the SSH transport has no
content negotiation capabilities today. And without a framing protocol,
we don't know how much data to feed into a decompressor. So in order
to implement compression support on the SSH transport, we'd need to
invent a mechanism to represent content types and an outer framing
protocol to stream data robustly. While I'm fully capable of doing
that, it is a lot of work and not something that should be undertaken
lightly. My opinion is that if we're going to change the SSH transport
protocol, we should take a long hard look at implementing a grand
unified protocol that attempts to address all the deficiencies with
the existing protocol. While I want this to happen, that would be
massive scope bloat standing in the way of zstd support. So, I've
decided to take the easy solution: the SSH transport will not gain
support for multiple compression formats. Keep in mind it doesn't
support *any* compression today. So essentially nothing is changing
on the SSH front.
2016-12-24 13:56:36 -07:00
Remi Chaintron
66071d6de5 revlog: REVIDX_EXTSTORED flag
This flag will be used by the lfs extension to mark the revision data as stored
externally.
2017-01-05 17:16:51 +00:00
Remi Chaintron
0c2f48e6cc documentation: better censor flag documentation 2016-12-22 19:08:38 -05:00
Remi Chaintron
25e30cf1ed censor: flag internal documentation 2016-11-23 17:36:35 +00:00
Gregory Szorc
247c663431 help: clarify contents of revlog index
The previous wording indicated that field at index 3 was the
size of the decompressed chunk, not the size of the full
revision text.
2016-11-22 18:13:02 -08:00
Gregory Szorc
57934ad4c1 help: document wire protocol commands 2016-08-22 19:50:21 -07:00
Gregory Szorc
a167c55bc7 help: document wire protocol "handshake" protocol
There isn't a formal handshake protocol in the wire protocol. But
clients almost certainly need to perform particular actions before they
can communicate with a server optimally. So document what that is
so people understand what's going on at connection establishment time.
2016-08-22 19:49:59 -07:00
Gregory Szorc
378142967e help: document wire protocol capabilities
All capabilities from the history of the project are now documented.
2016-08-22 19:48:31 -07:00
Gregory Szorc
b21abff69d help: document wire protocol transport protocols
The HTTP and SSH transport protocols are documented. This
includes how commands and arguments are serialized as well as
response types.
2016-08-22 19:47:34 -07:00
Gregory Szorc
5c136c2209 help: internals topic for wire protocol
The Mercurial wire protocol is under-documented. This includes a lack
of source docstrings and comments as well as pages on the official
wiki.

This patch adds the beginnings of "internals" documentation on the
wire protocol.

The documentation should have nearly complete coverage on the
lower-level parts of the protocol, such as the different transport
mechanims, how commands and arguments are sent, capabilities, and,
of course, the commands themselves.

As part of writing this documentation, I discovered a number of
deficiencies in the protocol and bugs in the implementation. I've
started sending patches for some of the issues. I hope to send a lot
more.

This patch starts with the scaffolding for a new internals page.
2016-08-22 19:46:39 -07:00
Gregory Szorc
efb3c4ff63 help: don't try to render a section on sub-topics
This patch subtly changes the behavior of the parsing of "X.Y" values
to not set the "section" variable when rendering a known sub-topic.
Previously, "section" would be the same as the sub-topic name. This
required the sub-topic RST to have a section named the same as the
sub-topic name.

When I made this change, the descriptions from help.internalstable
started being rendered in command line output. This didn't look correct
to me, as it didn't match the formatting of main help pages. I
corrected this by moving the top section to help.internalstable and
changing the section levels of all the "internals" topics.

The end result is that "internals" topics now match the rendering of
main topics on both the CLI and HTML. And, "internals" topics no longer
require a main section matching the name of the topic.
2016-08-06 17:04:22 -07:00
Matt Harbison
8d72832828 help: fix the display for hg help internals.revlogs (issue5227)
It previously aborted saying the help section wasn't found.  Credit to Yuya for
figuring out the fix.
2016-05-08 22:28:09 -04:00
Gregory Szorc
db9b5063c1 help: document sharing of revlog header with revision 0
The previous docs were incorrect about there being a discrete header
on revlogs.
2016-03-19 15:17:33 -07:00
Gregory Szorc
891be7d008 help: document requirements
We didn't have unified documentation of the various repository
requirements. This patch changes that.
2016-03-12 18:51:07 -08:00
Gregory Szorc
8e5f315fff internals: document revlog format
It seems like a good idea to document the revlog format.

There is a lot more that could be added to this documentation.
But you have to start somewhere.
2015-12-30 16:21:57 -07:00
Augie Fackler
e4e988fdf5 changegroups: add documentation for cg3 2015-12-18 09:57:35 -05:00
Gregory Szorc
0507fa0c45 help: add documentation for bundle types
Bundle types and the high-level data format of each bundle isn't
documented anywhere. Let's document this as well.

Obviously there are many more details about bundles that could be
written about. But you have to start somewhere.
2015-12-13 11:27:52 -08:00
Gregory Szorc
15c267d26e help: add documentation for changegroup formats
There is no formal location for spec-like technical/internal docs. The
repository makes sense as such a location because spec-like
documentation should be reviewed (ruling out a wiki). mpm has also
stated that he would like this documentation to be part of the
built-in help system. So, we establish an "internals" sub-directory
to hold this class of documentation.

The format of changegroups does not appear to be documented anywhere,
even in source code. It therefore seemed like an appropriate first thing
to document.

This patch adds low-level documentation of versions 1 and 2 of the
changegroup foromat. It currently only describes the raw data format.
There is probably room to write higher-level documentation on strategies
for producing and consuming the data. We'll leave that for another day.

The added file is not yet accessible via `hg help` nor via hgweb.
Support for this will follow in subsequent patches.
2015-10-25 00:19:45 +01:00