doc: remove unhelpful documents

Summary:
Clean up the less helpful documents.
- GeneralDelta: Mostly about migration. We're already migrated.
- RepositoryLayoutRequirements: Mostly about migration. We're already migrated.
- Revlog: Ancient. Not used in production.
- HttpCommandProtocol: Lack of content. Not really used in production.
- SshCommandProtocol: Lack of content.

Reviewed By: phillco

Differential Revision: D10287049

fbshipit-source-id: 54d2b00f146057a8a246c284043adc651db478d8
Jun Wu, 2018-10-24 18:57:38 -07:00 (committed by Facebook Github Bot)
parent 0159adb35b
commit f24f595b7f
6 changed files with 0 additions and 304 deletions


@@ -21,16 +21,11 @@ Contents:
internals/DirState
internals/EncodingStrategy
internals/FileFormats
internals/GeneralDelta
internals/HandlingLargeFiles
internals/HttpCommandProtocol
internals/Manifest
internals/MercurialApi
internals/RepositoryLayoutRequirements
internals/RequiresFile
internals/RevlogNG
internals/Revlog
internals/SshCommandProtocol
internals/UnlinkingFilesOnWindows
internals/WhatGoesWhere
internals/WireProtocol


@@ -1,84 +0,0 @@
GeneralDelta
============
Using the generaldelta compression option.
Introduction
------------
The original Mercurial compression format has a particular weakness in storing and transmitting deltas for branches that are heavily interleaved. In some instances, this can make the size of the manifest data (stored in **00manifest.d**) balloon by 10x or more. The generaldelta option is an effort to mitigate that, while still maintaining Mercurial's O(1)-bounded performance.
The generaldelta feature is available in Mercurial 1.9 and later.
Enabling generaldelta
---------------------
The generaldelta feature is enabled by default in Mercurial 3.7.
For older releases, it can be enabled for new clones with:
::
[format]
generaldelta = true
This will actually enable three features:
* generaldelta storage
* recomputation of deltas on pull (to be stored as "optimised" general delta)
* delta reordering on pulls when this is enabled on the server side
The latter feature will let clients without generaldelta enabled experience some of the disk space and bandwidth benefits.
Converting a repo to generaldelta
---------------------------------
This is as simple as:
::
$ hg clone -U --config format.generaldelta=1 --pull project project-generaldelta
The aforementioned reordering can also marginally improve compression for generaldelta clients, which can be tried with a second pass:
::
$ hg clone -U --config format.generaldelta=1 --pull project-generaldelta project-generaldelta-pass2
Detailed compression statistics for the manifest can be checked with **debugrevlog**:
::
$ hg debugrevlog -m
format : 1
flags  : generaldelta
revisions     : 14932
    merges    :  1763 (11.81%)
    normal    : 13169 (88.19%)
revisions     : 14932
    full      :    61 ( 0.41%)
    deltas    : 14871 (99.59%)
revision size : 3197528
    full      :  744577 (23.29%)
    deltas    : 2452951 (76.71%)
avg chain length  : 172
compression ratio : 229
uncompressed data size (min/max/avg) : 125 / 80917 / 49156
full revision size (min/max/avg)     : 113 / 37284 / 12206
delta size (min/max/avg)             : 0 / 27029 / 164
deltas against prev  : 13770 (92.60%)
    where prev = p1  : 13707 (99.54%)
    where prev = p2  :     8 ( 0.06%)
    other            :    55 ( 0.40%)
deltas against p1    :  1097 ( 7.38%)
deltas against p2    :     4 ( 0.03%)
deltas against other :     0 ( 0.00%)
Of particular interest are the number of full revisions and the average delta size.
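As a rough, unofficial sketch (the regular expressions are assumptions about the output format shown above, not a stable interface), those two figures could be pulled out programmatically:
::
import re
import subprocess

# Assumes `hg` is on PATH and the cwd is inside the repository.
out = subprocess.run(["hg", "debugrevlog", "-m"],
                     capture_output=True, text=True, check=True).stdout

# The first "full : N" line is the number of full revisions.
full = re.search(r"full\s*:\s*(\d+)", out)
# "delta size (min/max/avg) : min / max / avg" gives the average delta size.
avg = re.search(r"delta size \(min/max/avg\)\s*:\s*\d+ / \d+ / (\d+)", out)

if full and avg:
    print("full revisions:", full.group(1))
    print("avg delta size:", avg.group(1))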


@@ -1,74 +0,0 @@
HTTP commands are sent as CGI requests having the following form:
::
GET /hgweb.cgi?cmd=foo&param1=bar HTTP/1.1
Results are returned with the Content-Type ``application/mercurial-0.1``.
The best way to explore the protocol is to run ``hg serve`` in a terminal, then try out the various commands. Errors, such as missing parameters, will be logged in the terminal window, including source references.
Available commands
==================
The available commands, along with their arguments, are listed at the end of ``mercurial/wireproto.py``.
lookup
~~~~~~
Given a changeset reference (the ``key`` parameter), yields the changeset ID.
Returns a status code (1 on success, 0 on failure) and a result (the changeset ID or error message).
Examples:
::
$ curl 'http://selenic.com/hg/?cmd=lookup&key=0'
1 9117c6561b0bd7792fa13b50d28239d51b78e51f
$ curl 'http://selenic.com/hg/?cmd=lookup&key=33d290cc14ae48c8c18d2a2c9dfae99728ee0cff'
0 unknown revision '33d290cc14ae48c8c18d2a2c9dfae99728ee0cff'
$ curl 'http://selenic.com/hg/?cmd=lookup&key=tip'
1 55724f42fa14b6759a47106998feea25a032e45c
heads
~~~~~
Returns a space-separated list of `changeset ID`_ values identifying all the heads in the repository. Takes no parameters.
branches
~~~~~~~~
changegroup
~~~~~~~~~~~
changegroupsubset
~~~~~~~~~~~~~~~~~
between
~~~~~~~
capabilities
~~~~~~~~~~~~
Accepts no parameters. Returns a whitespace-separated list of other commands accepted by this server. For the *unbundle* command, produces the form unbundle=HG10GZ,HG10BZ,HG10UN if all three compression schemes are supported.
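For example, a capabilities query can be made with any HTTP client; this sketch assumes a reachable ``hg serve`` instance at a placeholder URL:
::
import urllib.request

# Placeholder URL; point this at any running `hg serve` instance.
url = "http://localhost:8000/?cmd=capabilities"
with urllib.request.urlopen(url) as resp:
    # Prints the whitespace-separated capability list described above.
    print(resp.read().decode("ascii"))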
unbundle
~~~~~~~~
Usage:
::
POST /hgweb.cgi?cmd=unbundle&heads=HEADS HTTP/1.1
content-type: application/octet-stream
This command allows for the upload of new changes to the repository. The body of the POST request must be a changegroup in bundle format. The returned output is the same as what the *hg unbundle* command would print to standard output if it were run locally.
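As an illustrative sketch only (not Mercurial's client code; the URL, the ``heads`` value, and the bundle file are placeholders), such a POST could be issued like this:
::
import urllib.request

# Placeholders: server URL, heads value, and a pre-built bundle on disk.
url = "http://localhost:8000/?cmd=unbundle&heads=HEADS"
with open("bundle.hg", "rb") as f:
    body = f.read()

req = urllib.request.Request(url, data=body, method="POST")
req.add_header("Content-Type", "application/octet-stream")
with urllib.request.urlopen(req) as resp:
    # Same text `hg unbundle` would print if run locally.
    print(resp.read().decode("utf-8", "replace"))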
stream_out
~~~~~~~~~~


@@ -1,51 +0,0 @@
*In http://selenic.com/pipermail/mercurial/2008-June/019561.html, mpm listed repository layout requirements (quote from that post):*
-------------------------
Here are the repo layout requirements:
a) does not randomize disk layout (ie hashing)
b) avoids case collisions
c) uses only ASCII
d) handles stupidly long filenames
e) avoids filesystem quirks like reserved words and characters
f) mostly human-readable (optional)
g) reversible (optional)
Point (a) is important for performance. Filesystems are likely to store sequentially created files in the same directory near each other on disk. Disks and operating systems are likely to cache such data. Thus, always reading and writing files in a consistent order gives the operating system and hardware its best opportunity to optimize layout. Ideally, the store layout should roughly mirror the working dir layout.
Point (g) is interesting. If we throw out cryptographically-strong hashing because of (a), we either have to expand names to meet (b) and (c) or throw out information and risk collisions. And we don't really want that, so (g) may effectively be implied. It's also worth considering (f): it makes understanding what's going on in the store a hell of a lot easier, especially when something goes wrong. Which again means: no hashing.
Which means that adding (d) is hard, because just about any solution to (b) and (c) will blow up path lengths, especially in the presence of interesting character sets. If our absolute paths are limited to a mere 255 characters and people want to have multi-line filenames, we've got a problem.
So we probably have to find a way to write longer filenames (NTFS's real limit is 32k).
-------------------------
*mpm on why hashing filenames doesn't solve the problem (quote from* http://selenic.com/pipermail/mercurial/2008-June/019574.html *):*
-------------------------
Let's say you've cloned repo A, which gives you some set of files F that get stored in random order. To checkout that set of files F, we can either read them out of the repo in random (pessimal) order, or write them out to the working directory in random (pessimal) order. In both cases, we end up seeking like mad. The performance hit is around an order of magnitude: not something to sneeze at.
(And that's on an FS without decent metadata read-ahead. The difference will likely be more marked on something like cmason's btrfs.)
If we're clever, we can try reading every file out of the repo into memory in hash order, then writing them out in directory order. Which works just fine until you can't fit everything in memory. Then we discover we weren't so clever after all and we're back to seeking like mad, even if we're doing fairly large subsets. If our repo is 1G and we're batching chunks of 100M, we probably still end up seeking over the entire 1G of input space (or output space) to batch up each chunk.
Mercurial originally used a hashing scheme for its store 3 years ago and as soon as I moved my Linux test repo from one partition to another, performance suddenly went to hell because the sort order got pessimized by rsync alphabetizing things. I tried all sorts of variations on sorting, batching, etc., but none of them were even close in performance to the current scheme, namely mirroring the working directory layout and doing all operations in alphabetical order.
Further, simply cloning or cp'ing generally defragments things rather than making them worse.
-------------------------
Notes
~~~~~
According to http://selenic.com/pipermail/mercurial-devel/2008-June/006916.html, a reversible encoding of path names is currently (2008-06-28) required (see also http://selenic.com/pipermail/mercurial-devel/2008-June/006930.html).
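To make requirements (b), (c) and (g) concrete, here is an illustrative encoding sketch; it is *not* Mercurial's actual store encoding, just one way to stay reversible, ASCII-only, and free of case collisions:
::
# Illustrative only, NOT Mercurial's actual scheme: one reversible,
# ASCII-only encoding without case collisions (points b, c and g).

def encode(path: str) -> str:
    out = []
    for b in path.encode("utf-8"):
        c = chr(b)
        if c == "_":
            out.append("__")             # escape the escape character
        elif c == "~":
            out.append("~7e")            # escape the hex-escape character
        elif "A" <= c <= "Z":
            out.append("_" + c.lower())  # "Foo" -> "_foo": no case clash
        elif 32 <= b < 127:
            out.append(c)                # printable ASCII passes through
        else:
            out.append("~%02x" % b)      # other bytes -> hex escape
    return "".join(out)

# encode("Makefile")  == "_makefile"
# encode("_private")  == "__private"
# encode("caf\xe9")   == "caf~c3~a9"
Every escape lengthens the path, which is exactly the tension with requirement (d) that the post describes.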


@@ -1,52 +0,0 @@
Revlog
======
.. note::
A new revlog format was introduced for Mercurial 0.9: see RevlogNG_
A **revlog**, for example ``.hg/store/data/somefile.d``, is the most important data structure and represents all versions of a file in a repository. Each version is stored compressed in its entirety, or stored as a compressed binary delta (difference) relative to the preceding version in the revlog. Whether to store a full version is decided by how much data would be needed to reconstruct the file. This system ensures that Mercurial does not need huge amounts of data to reconstruct any version of a file, no matter how many versions are stored.
The reconstruction requires a single read, if Mercurial knows when and where to read. Each revlog therefore has an **index**, for example ``.hg/store/data/somefile.i``, containing one fixed-size record for each version. The record contains:
* the nodeid of the file version
* the nodeids of its parents
* the length of the revision data
* the offset in the revlog saying where to begin reading
* the base of the delta chain
* the linkrev pointing to the corresponding changeset
Here's an example:
::
$ hg debugindex .hg/store/data/README.i
rev    offset  length   base linkrev nodeid       p1           p2
  0         0    1125      0       0 80b6e76643dc 000000000000 000000000000
  1      1125     268      0       1 d6f755337615 80b6e76643dc 000000000000
  2      1393      49      0      27 96d3ee574f69 d6f755337615 000000000000
  3      1442     349      0      63 8e5de3bb5d58 96d3ee574f69 000000000000
  4      1791      55      0      67 ed9a629889be 8e5de3bb5d58 000000000000
  5      1846     100      0      81 b7ac2f914f9b ed9a629889be 000000000000
  6      1946     405      0     160 1d528b9318aa b7ac2f914f9b 000000000000
  7      2351      39      0     176 2a612f851a95 1d528b9318aa 000000000000
  8      2390       0      0     178 95fdb2f5e08c 2a612f851a95 2a612f851a95
  9      2390     127      0     179 fc5dc12f851b 95fdb2f5e08c 000000000000
 10      2517       0      0     182 24104c3ccac4 fc5dc12f851b fc5dc12f851b
 11      2517     470      0     204 cc286a25cf37 24104c3ccac4 000000000000
 12      2987     346      0     205 ffe871632da6 cc286a25cf37 000000000000
...
With one read of the index to fetch the record and then one read of the revlog, Mercurial can reconstruct any version of a file in time proportional to the file size.
So that adding a new version requires only O(1) seeks, the revlogs and their indices are append-only.
Revlogs are also used for manifests and changesets.
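The reconstruction logic can be sketched with a toy in-memory model (the patch format here is a stand-in; Mercurial's real deltas are a packed binary equivalent of the same idea):
::
# Toy in-memory model of a delta chain; Mercurial's on-disk format is
# binary, but the reconstruction has the same shape.

def apply_delta(text: bytes, patches) -> bytes:
    # Toy delta: (start, end, replacement) patches, applied back to
    # front so earlier offsets stay valid.
    for start, end, repl in sorted(patches, reverse=True):
        text = text[:start] + repl + text[end:]
    return text

# index[rev] = (base, payload); payload is a full text when rev == base,
# otherwise a delta against rev - 1.
index = [
    (0, b"hello world\n"),          # rev 0: full revision (its own base)
    (0, [(0, 5, b"goodbye")]),      # rev 1: delta against rev 0
    (0, [(8, 13, b"revlog")]),      # rev 2: delta against rev 1
]

def reconstruct(rev: int) -> bytes:
    base = index[rev][0]
    text = index[base][1]           # full text stored at the base
    for r in range(base + 1, rev + 1):
        text = apply_delta(text, index[r][1])
    return text

print(reconstruct(2))               # b'goodbye revlog\n'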
.. _RevlogNG: RevlogNG


@@ -1,38 +0,0 @@
Command names are sent over the ssh pipe as plain text, followed by a *single character* line break. This matters on systems that default to a two-character line break, such as CR+LF on Windows: if there is extra whitespace at the end of the command (on Windows, an extra CR), the command will not be recognized.
Arguments are sent as ``[argname] [value length]\n``, followed by the value. Responses are ``[response length]\n``, followed by the response.
Example:
To issue the "lookup" command on the key "tip", the client sends the following:
::
lookup
key 3
tip
And the server might respond:
::
25
1 9b4a87d1a1c9c3577b12990ce5819e2955347083
Version detection
,,,,,,,,,,,,,,,,,
Because older Mercurial versions give empty (zero-length) responses to unknown commands, you must first send the ``hello`` command followed by a command with known output, and then check whether a ``hello`` response arrived before the known output.
Over STDIO
~~~~~~~~~~
You can use the SSH command protocol over stdio with the following command:
::
hg serve --stdio
You can now type or otherwise send your client commands to the server directly through its STDIN stream, and it will respond on its STDOUT stream.
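A hedged sketch of a client speaking the framing described above to ``hg serve --stdio`` (assumes ``hg`` is on PATH and the current directory is a repository; not Mercurial's own client code):
::
import subprocess

proc = subprocess.Popen(["hg", "serve", "--stdio"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def send_line(data: bytes):
    proc.stdin.write(data + b"\n")   # single-character line break only
    proc.stdin.flush()

def read_response() -> bytes:
    length = int(proc.stdout.readline())   # "[response length]\n"
    return proc.stdout.read(length)

send_line(b"hello")                  # for version detection, as above
print(read_response())

send_line(b"lookup")                 # command name
send_line(b"key 3")                  # argument name and value length
proc.stdin.write(b"tip")             # the raw value, no trailing newline
proc.stdin.flush()
print(read_response())               # e.g. b"1 9b4a87d1a1c9..."

proc.stdin.close()
proc.wait()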