Commit Graph

26419 Commits

Author SHA1 Message Date
Gregory Szorc
5d1b4c49ee clonebundles: support for seeding clones from pre-generated bundles
Cloning can be an expensive operation for servers because the server
generates a bundle from existing repository data at request time. For
a large repository like mozilla-central, this consumes 4+ minutes
of CPU time on the server. It also results in significant network
utilization. Multiplied by hundreds or even thousands of clients and
the ensuing load can result in difficulties scaling the Mercurial server.

Despite generation of bundles being deterministic until the next
changeset is added, the generation of bundles to service a clone request
is not cached. Each clone thus performs redundant work. This is
wasteful.

This patch introduces the "clonebundles" extension and related
client-side functionality to help alleviate this deficiency. The
client-side feature is behind an experimental flag and is not enabled by
default.

It works as follows:

1) Server operator generates a bundle and makes it available on a
   server (likely HTTP).
2) Server operator defines the URL of a bundle file in a
   .hg/clonebundles.manifest file.
3) Client `hg clone`ing sees the server is advertising bundle URLs.
4) Client fetches and applies the advertised bundle.
5) Client performs equivalent of `hg pull` to fetch changes made since
   the bundle was created.

Essentially, the server performs the expensive work of generating a
bundle once and all subsequent clones fetch a static file from
somewhere. Scaling static file serving is a much more manageable
problem than scaling a Python application like Mercurial. Assuming your
repository grows less than 1% per day, the end result is 99+% of CPU
and network load from clones is eliminated, allowing Mercurial servers
to scale more easily. Serving static files also means data can be
transferred to clients as fast as they can consume it, rather than as
fast as servers can generate it. This makes clones faster.

Mozilla has implemented functionality similar to this patch on
hg.mozilla.org using a custom extension. We are hosting bundle files in
Amazon S3 and CloudFront (a CDN) and have successfully offloaded
>1 TB/day in data transfer from hg.mozilla.org, freeing up significant
bandwidth and CPU resources. The positive impact has been stellar and
I believe it has proved its value to be included in Mercurial core. I
feel it is important for the client-side support to be enabled in core
by default because it means that clients will get faster, more reliable
clones and will enable server operators to reduce load without
requiring any client-side configuration changes (assuming clients are
up to date, of course).

The scope of this feature is narrowly and specifically tailored to
cloning, despite "serve pulls from pre-generated bundles" being a valid
and useful feature. I would eventually like for Mercurial servers to
support transferring *all* repository data via statically hosted files.
You could imagine a server that siphons all pushed data to bundle files
and instructs clients to apply a stream of bundles to reconstruct all
repository data. This feature, while useful and powerful, is
significantly more work to implement because it requires the server
component have awareness of discovery and a mapping of which changesets
are in which files. Full clone bundles, by contrast, are much simpler.

The wire protocol command is named "clonebundles" instead of something
more generic like "staticbundles" to leave the door open for a new, more
powerful and more generic server-side component with minimal backwards
compatibility implications. The name "bundleclone" is used by Mozilla's
extension and would cause problems since there are subtle differences
in Mozilla's extension.

Mozilla's experience with this idea has taught us that some form of
"content negotiation" is required. Not all clients will support all
bundle formats or even URLs (advanced TLS requirements, etc). To ensure
the highest uptake possible, a server needs to advertise multiple
versions of bundles and clients need to be able to choose the most
appropriate one from that list. The "attributes" in each
server-advertised entry facilitate this filtering and sorting. Their
use will become apparent in subsequent patches.

Initial inspiration and credit for the idea of cloning from static files
belongs to Augie Fackler and his "lookaside clone" extension proof of
concept.
2015-10-09 11:22:01 -07:00
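The "content negotiation" described in 5d1b4c49ee could look roughly like the sketch below. The commit message does not spell out the manifest layout, so the one-entry-per-line format (a URL followed by KEY=VALUE attributes), the attribute names COMPRESSION and REQUIRESNI, the example URLs, and every function name here are assumptions made for illustration only.

```python
# Hypothetical manifest: one entry per line, a URL followed by
# space-separated KEY=VALUE attributes (layout and names are assumed).
MANIFEST = """\
https://cdn.example.com/repo.gzip.hg COMPRESSION=gzip REQUIRESNI=true
https://static.example.com/repo.bz2.hg COMPRESSION=bzip2
"""


def parse_manifest(text):
    """Turn manifest text into a list of {'url', 'attrs'} entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        attrs = dict(f.split("=", 1) for f in fields[1:])
        entries.append({"url": fields[0], "attrs": attrs})
    return entries


def usable(entry, supported_compression, has_sni):
    """Return True if this client can fetch and apply the entry."""
    attrs = entry["attrs"]
    if attrs.get("REQUIRESNI") == "true" and not has_sni:
        return False
    compression = attrs.get("COMPRESSION")
    return compression is None or compression in supported_compression


def pick_bundle(text, supported_compression, has_sni):
    """Return the first advertised URL the client can use, or None."""
    for entry in parse_manifest(text):
        if usable(entry, supported_compression, has_sni):
            return entry["url"]
    return None
```

A client without SNI support would skip the first entry and fall back to the second; a client that finds no usable entry would presumably perform a regular clone.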
Gregory Szorc
6cc7d3daaa sslutil: expose attribute indicating whether SNI is supported
This will be used so clone bundles can advertise whether URLs require
SNI. This will be explained more in a subsequent patch.
2015-09-29 16:17:32 -07:00
Siddharth Agarwal
25aa934095 resolve: perform all premerges before performing any file merges (BC)
Just like the BC to merge before it, this allows for a maximally consistent
state before providing any prompts to the user.
2015-10-11 23:58:07 -07:00
Siddharth Agarwal
4961e30679 test-resolve.t: add some tests for .orig file contents
An upcoming patch will touch some code around this area, and I couldn't find
any tests related to this.
2015-10-11 23:56:44 -07:00
Siddharth Agarwal
b47d5aac3c test-resolve.t: add some output to show order of operations
This basically shows the behavior of resolve with multiple files. An upcoming
behavior change will cause this output to also change.
2015-10-11 23:54:40 -07:00
Siddharth Agarwal
c97c4cf7f6 merge.mergestate: perform all premerges before any merges (BC)
We perform all that we can non-interactively before prompting the user for input
via their merge tool. This allows for a maximally consistent state when the user
is first prompted.

The test output changes indicate the actual behavior change happening.
2015-10-11 21:56:39 -07:00
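The ordering change in c97c4cf7f6 can be pictured with a small sketch. It is not Mercurial's code: the function names, the callables passed in, and the convention that a premerge returns True when the file is fully resolved are all assumptions chosen to show the two-phase loop.

```python
def resolve_all(files, premerge, merge):
    """Run every premerge first, then the full merge for what is left."""
    pending = []
    for f in files:
        # Phase 1: non-interactive work for every file.
        if not premerge(f):
            pending.append(f)
    for f in pending:
        # Phase 2: only now hand unresolved files to the merge tool.
        merge(f)
    return pending


def run_demo():
    """Record the order of operations for three hypothetical files."""
    log = []

    def premerge(f):
        log.append("premerge " + f)
        return f == "clean.txt"  # pretend only this file premerges cleanly

    def merge(f):
        log.append("merge " + f)

    resolve_all(["foo", "clean.txt", "bar"], premerge, merge)
    return log
```

In this sketch, run_demo() records all three premerges before either merge, which mirrors the "merging foo / merging bar" lines appearing ahead of any conflict warnings.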
Siddharth Agarwal
8b2a429453 merge: introduce a preresolve function
The section of code that writes out the version of the file cached in the merge
state should only be run at preresolve time. This is so that if the premerge
keeps around conflict markers, those don't get overwritten before the main
merge.
2015-10-11 20:12:12 -07:00
Siddharth Agarwal
cbb558b9d7 merge.mergestate._resolve: also return completed status
We'll need this for a new 'preresolve' function we're adding.
2015-10-11 18:37:54 -07:00
Siddharth Agarwal
2826ed841f merge.mergestate: add a wrapper around resolve
The resolve function will be broken up into separate pre-resolve and resolve
steps.
2015-10-11 18:29:50 -07:00
Siddharth Agarwal
a6dc53e738 simplemerge: move conflict warning message to filemerge
The current output for a failed merge with conflict markers looks something like:

  merging foo
  warning: conflicts during merge.
  merging foo incomplete! (edit conflicts, then use 'hg resolve --mark')
  merging bar
  warning: conflicts during merge.
  merging bar incomplete! (edit conflicts, then use 'hg resolve --mark')

We're going to change the way merges are done to perform all premerges before
all merges, so that the output above would look like:

  merging foo
  merging bar
  warning: conflicts during merge.
  merging foo incomplete! (edit conflicts, then use 'hg resolve --mark')
  warning: conflicts during merge.
  merging bar incomplete! (edit conflicts, then use 'hg resolve --mark')

The 'warning: conflicts during merge' line has no context, so is pretty
confusing.

This patch will change the future output to:

  merging foo
  merging bar
  warning: conflicts while merging foo! (edit, then use 'hg resolve --mark')
  warning: conflicts while merging bar! (edit, then use 'hg resolve --mark')

The hint on how to resolve the conflicts makes this a bit unwieldy, but solving
that is tricky because we already hint that people run 'hg resolve' to retry
unresolved merges. The 'hg resolve --mark' mostly applies to conflict marker
based resolution.
2015-10-09 13:54:52 -07:00
Siddharth Agarwal
dceb171bec filemerge: clean up some dead code
We now exit early if we do a premerge, so extra checks are no longer necessary.
2015-10-11 15:04:00 -07:00
Augie Fackler
f010f2acf9 run-tests: add b-prefix on two strings to fix python3 support
2015-10-12 14:15:04 -04:00
Siddharth Agarwal
88da24240c filemerge: break overall filemerge into separate premerge and merge steps
This means that in ms.resolve we must call merge after calling premerge. This
doesn't yet mean that all premerges happen before any merges -- however, this
does get us closer to our goal.

The output differences are because we recompute the merge tool. The only
user-visible difference caused by this patch is that if the tool is missing
we'll print the warning twice. Not a huge deal, though.
2015-10-11 20:47:14 -07:00
Siddharth Agarwal
b1a86ac060 filemerge: only copy to backup during premerge step
The premerge might leave the original file in an unclean state. Therefore it's
important to only copy the file in the beginning.
2015-10-11 20:04:40 -07:00
Siddharth Agarwal
014002cfa1 filemerge: only print out "merging f" output at premerge step
We're soon going to call this function twice, once for premerge and once for
merge. This makes sure the "merging" output only gets printed during the
premerge step.
2015-10-11 20:02:53 -07:00
Siddharth Agarwal
76bf2a7269 filemerge: deindent the parts of filemerge outside the try block
It is no longer necessary to indent these parts.
2015-10-08 00:19:20 -07:00
Siddharth Agarwal
dd90f817f9 filemerge: introduce a premerge flag and function
This flag will let us get to our overall goal of performing all premerges
before any merges.
2015-10-11 20:47:04 -07:00
Siddharth Agarwal
82f2aec334 filemerge: also return whether the merge is complete
In future patches, we'll pause merges after the premerge step. After the
premerge step we'll return complete = False.
2015-10-11 12:56:21 -07:00
Siddharth Agarwal
da75e232c9 filemerge: add a wrapper around the filemerge function
We'll introduce a separate premerge function that calls the same code.
2015-10-11 12:31:08 -07:00
Mads Kiilerich
90c21b3c76 context: don't hex encode all unknown 20 char revision specs (issue4890)
af5de4d23fd4 introduced nice hexified display of missing nodes. It did however
also make missing 20 character revision specifications be shown as hex - very
confusing.

Users are often wrong and somehow specify revisions that don't exist. Nodes
will however rarely be missing ... and they will only look like a user provided
revision specification and be all ascii in 1 of 4*10**9.

With this change, missing revisions will only be hexified if they really look
like binary nodes. This change will thus improve the error reporting UI in the
common case and only very rarely make it confusing in the opposite direction of
how it was before.
2015-10-09 01:19:37 +02:00
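A minimal sketch of the idea in 90c21b3c76 follows. The commit message does not state the exact test for "really look like binary nodes", so the rule used here (20 bytes long and containing at least one byte outside 7-bit ASCII) is an assumption, as are the function names.

```python
import binascii


def looks_binary(spec):
    """Assumed heuristic: 20 bytes with at least one non-ASCII byte."""
    return len(spec) == 20 and any(b > 127 for b in spec)


def display_missing(spec):
    """Format a missing revision spec (bytes) for an error message."""
    if looks_binary(spec):
        # Probably a binary node: show it as 40 hex digits.
        return binascii.hexlify(spec).decode("ascii")
    # Probably something the user typed: show it unchanged.
    return spec.decode("ascii", "replace")
```

Under this assumption a 20-character name typed by a user is echoed back as typed, while a raw node is still shown in hex.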
Pierre-Yves David
fb6955396c discovery: put trivial branch first
Having the simple and tiny branch of the conditional first helps readability. The
"else" that appears after a screen of code is harder to relate to a conditional.
2015-10-12 00:45:24 -07:00
Pierre-Yves David
5e41fdb7bc shelve: rename 'publicancestors' to something accurate (issue4737)
That function is actually not returning public ancestors at all. This is
pointed out by the second line of the docstring...

The bundling behavior was made correct in a5141977198d but with confusion
remaining regarding what each function was doing.

This closes issue4737, because it highlights that shelve is actually -not-
bundling too much data (this was actually properly tested).
2015-10-09 15:31:50 -07:00
Nathan Goldbaum
77c0959268 makefile: add wheel build target
2015-10-09 12:30:46 -05:00
Nathan Goldbaum
d769f6e8d5 setup: import setup from setuptools if FORCE_SETUPTOOLS is set
This should allow easier experimentation with using setuptools in mercurial's
build automation, without breaking anything that currently depends on distutils
behavior
2015-10-09 12:25:51 -05:00
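The switch described in d769f6e8d5 might be sketched as below. The helper name is hypothetical, and treating "set" as "present in the environment, whatever its value" is an assumption; the commit message only says that setup is imported from setuptools when FORCE_SETUPTOOLS is set.

```python
import os


def choose_setup_module(environ=None):
    """Return the name of the module that should provide setup()."""
    if environ is None:
        environ = os.environ
    if "FORCE_SETUPTOOLS" in environ:
        # Opt-in path for experimenting with setuptools.
        return "setuptools"
    # Default path keeps the existing distutils behavior.
    return "distutils.core"
```

A setup script could then load the chosen module with importlib.import_module(choose_setup_module()) and take its setup attribute.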
Gijs Kruitbosch
51164c44ad hgweb: remove obsolete -webkit-border-radius property
2015-10-12 14:46:51 +01:00
Anton Shestakov
0c82af3553 monoblue: add a link to the latest file revision
For reference, this was added to paper/coal in 0309017a1c71 and to gitweb in
f8b235fcf40d.
2015-10-12 15:20:04 +08:00
Pierre-Yves David
501c6cbdbc discovery: reference relevant bug in the faulty code
We extend the comment about this code flaw with a reference to the relevant bug.
2015-10-09 15:44:00 -07:00
Pierre-Yves David
38aa2ec3c6 discovery: fix a typo in a comment
The idea here is that the code is imperfect, not that it is impossible to get
something behaving properly.
2015-10-09 15:37:05 -07:00
Pierre-Yves David
11cc1a1216 getsubset: get the unpacker version from the bundler
The current setup requires passing both a packer and, optionally, the version
of the unpacker. This is confusing and error prone as the two values must
match. Instead, we simply grab the version from the packer. This fixes a bug
where requesting a cg2 from 'hg bundle' was reported as changegroup 1.

I should have caught that in the initial changeset but I missed it somehow.
2015-10-09 14:59:37 -07:00
Emanuele Giaquinta
400c470fa8 test-convert-cvs: add sleep so cvs notices changes
This change makes the test pass on gcc112.fsffrance.org.
2015-10-12 15:42:32 +03:00
Emanuele Giaquinta
18869274ee cvsps: fix computation of parent revisions when log caching is on
cvsps computes the parent revisions of log entries by walking the cvs log
sorted by (rcs, revision) and by iteratively maintaining a 'versions'
dictionary which maps a (rcs, branch) pair onto the last revision seen for that
pair. When log caching is on and a log cache exists, cvsps fails to set the
parent revisions of new log entries because it does not iterate over the log
cache in the parents computation. A complication is that a file rcs can change
(move to/from the attic), with respect to its value in the log cache, if the
file is removed/added back. This patch adds an iteration over the log cache to
update the rcs of cached log entries, if changed, and to properly populate the
'versions' dictionary.
2015-10-07 11:33:52 +03:00
Matt Mackall
74b31a11a1 dirstate: batch calls to statfiles (issue4878)
This makes it more interruptible.
2015-10-06 16:26:20 -05:00
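One plausible reading of 74b31a11a1 is that a single long call into a stat routine cannot be interrupted until it returns, whereas several shorter calls give the Python interpreter a chance to handle signals in between. The sketch below shows only the batching pattern; the function name, the batch size, and the statfiles callable passed in are assumptions, not Mercurial's actual interface.

```python
def batched_stat(paths, statfiles, batchsize=1000):
    """Call statfiles() on slices of paths instead of the whole list."""
    results = []
    for start in range(0, len(paths), batchsize):
        batch = paths[start:start + batchsize]
        # Control returns to the interpreter after each batch, so a
        # pending KeyboardInterrupt can be raised here.
        results.extend(statfiles(batch))
    return results
```

The result is the same list the single call would have produced, in the same order.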
Yuya Nishihara
b1dadc9002 parsers: fix infinite loop or out-of-bound read in fm1readmarkers (issue4888)
Issue4888 was caused by a 0-length obsolete marker. If msize is zero,
fm1readmarkers() never ends.

This patch adds several bound checks to fm1readmarker(). Therefore, 0-length
and invalid-size markers should be rejected.
2015-10-11 18:30:47 +09:00
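The failure mode fixed in b1dadc9002 can be illustrated with a simplified reader. This is not the real obsolete-marker format: the sketch assumes each record starts with a 4-byte big-endian size that counts the whole record including the size field, and the function name is hypothetical.

```python
import struct


def read_records(data):
    """Split data into size-prefixed records, rejecting bad sizes."""
    records = []
    offset = 0
    while offset < len(data):
        remaining = len(data) - offset
        if remaining < 4:
            raise ValueError("truncated size field at offset %d" % offset)
        (msize,) = struct.unpack_from(">I", data, offset)
        if msize < 4:
            # A zero size would never advance the offset (infinite loop);
            # anything under 4 cannot even hold the size field.
            raise ValueError("invalid record size %d" % msize)
        if msize > remaining:
            # Reading this record would run past the end of the buffer.
            raise ValueError("record at offset %d overflows data" % offset)
        records.append(data[offset + 4:offset + msize])
        offset += msize
    return records
```

Without the two size checks, a zero-length record would leave offset unchanged forever and an oversized one would read past the buffer, which are the two symptoms named in the commit title.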
Yuya Nishihara
62c0a27d40 parsers: read sizes of metadata pair of obsolete marker at once
This will make it easy to implement bound checking. Currently fm1readmarker()
has no protection against a corrupted obsstore and can cause an infinite loop or
out-of-bound reads.
2015-10-11 18:41:41 +09:00
Siddharth Agarwal
056b42f09f filemerge: clean up temp files in a finally block
This isn't really a big deal because the temp files are created in $TMPDIR, but
it makes some upcoming work simpler.
2015-10-07 21:51:24 -07:00
Pierre-Yves David
4b641b8c22 check-code: detect and ban 'util.Abort'
We have seen the light, please use the new way.
2015-10-08 12:53:09 -07:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Pierre-Yves David
7e69ac5da6 eol: rename 'error' to 'haserror'
The variable 'error' conflicts with the module name that we would like to import
and use in a coming changeset.
2015-10-05 22:49:24 -07:00
Pierre-Yves David
3f7eb46ea2 discovery: rename 'error' to 'errormsg'
The variable 'error' conflicts with the module name that we would like to import
and use in a coming changeset.
2015-10-05 22:29:57 -07:00
Christian Delahousse
7472064a6f histedit: delete histedit statefile on any exception during abort
When a user aborts a histedit, many things could go wrong. At a minimum, after
a histedit abort failure, their repository should be out of that state. We've
found situations where the user could not exit the histedit state without
manually deleting the histedit state file. This patch ensures that if any
exception happens during an abort, the histedit statefile will be deleted so
that users are out of the histedit state and can at least manually get the repo
back to a workable condition.
2015-10-05 16:44:45 -07:00
Christian Delahousse
77b3596e11 histedit: check presence of statefile before deleting it
When the histeditstate class instance has its clear() method called, there is
no check that the state file exists before deleting it. It may not
exist, which would raise an exception. This patch allows clear to be called at
any time.

This will be needed for the following patch.
2015-10-06 15:09:28 -07:00
Christian Delahousse
6f2b3a468a histedit: add inprogress method to state class
If a histedit is in progress, the 'histedit-state' file should exist. This patch
implements a convenience method to check whether a histedit is in progress.

This method will be used in the next patch in the series.
2015-10-05 16:34:17 -07:00
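The three histedit commits above (6f2b3a468a, 77b3596e11, 7472064a6f) fit together roughly as in this sketch. The class and function names are stand-ins, the restore callable is hypothetical, and re-raising the exception after clearing the state file is an assumption; the commit messages only say that the state file is deleted when an abort fails.

```python
import os


class HisteditState:
    """Stand-in for the state class; tracks a single state file."""

    def __init__(self, path):
        self.path = path

    def inprogress(self):
        # A histedit is in progress if the state file exists.
        return os.path.exists(self.path)

    def clear(self):
        # Safe to call at any time: only delete the file if it is there.
        if self.inprogress():
            os.unlink(self.path)


def abort(state, restore):
    """Run restore(); on any failure, still leave the histedit state."""
    try:
        restore()
    except Exception:
        state.clear()
        raise
```

Because clear() checks inprogress() first, the abort path can call it without knowing whether the state file was already removed.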
FUJIWARA Katsunori
1a4ac4179d commands: use dirstateguard instead of begin/end-parentchange for backout
Before this patch, "hg backout" uses 'begin'/'end'-'parentchange()'
of 'dirstate' class to avoid writing incomplete dirstate changes out
at failure.

But this framework doesn't work as expected, if 'dirstate.write()' is
invoked between them. In fact, in-memory dirstate changes may be
written out at 'repo.status()' implied by 'merge.update()', even
before this patch.

To restore dirstate as expected at failure of "hg backout", this patch
uses 'dirstateguard' instead of 'begin'/'end'-'parentchange()'.
2015-10-09 03:53:47 +09:00
FUJIWARA Katsunori
7d9bf405fe commands: make "hg import" use dirstateguard only for --no-commit
Previous patch made dirstate changes in a transaction scope "all or
nothing". Therefore, 'dirstateguard' is meaningless if its scope is
the same as that of the related transaction.

Before this patch, "hg import" uses 'dirstateguard' always, but
transaction is also started if '--no-commit' isn't specified.

To avoid redundancy, this patch makes "hg import" use dirstateguard
only if transaction isn't started (= '--no-commit' is specified).

In this patch, 'if dsguard' can be examined safely, because 'dsguard'
is initialized (with None) before outermost 'try'.
2015-10-09 03:53:47 +09:00
FUJIWARA Katsunori
05986aa7a1 cmdutil: stop tryimportone from using dirstateguard (BC)
There is no user of 'cmdutil.tryimportone()' other than
'commands.import_()', which can restore dirstate at failure of
applying patches by transaction or dirstateguard.

Therefore, it is reasonable to stop 'tryimportone()' from using
redundant 'dirstateguard', even though it changes behavior of
'tryimportone()'.

After this patch, 3rd party extensions should use 'dirstateguard' or
so explicitly, if they want to restore dirstate at failure of
importing a patch.
2015-10-09 03:53:46 +09:00
FUJIWARA Katsunori
f2187903e7 dirstate: remove meaningless dirstateguard
Previous patch made dirstate changes in a transaction scope "all or
nothing". Therefore, 'dirstateguard' is meaningless if its scope is
the same as that of the related transaction.

This patch removes such meaningless 'dirstateguard' usage.
2015-10-09 03:53:46 +09:00
FUJIWARA Katsunori
45bfdb573e localrepo: execute appropriate actions for dirstate at releasing transaction
Before this patch, in-memory dirstate changes are still kept over a
transaction scope boundary regardless of the result of it.

For "all or nothing" policy of the transaction, in-memory dirstate
changes should be:

  - written out when a transaction closes successfully, because
    subsequent 'dirstate.invalidate()' can lose them

  - discarded at failure of a transaction, because outer
    'wlock.release()' or so may write them out

To discard all changes in a transaction completely, this patch also
restores '.hg/dirstate' by '.hg/journal.dirstate' at failure, because
'transaction' itself does nothing for files related to '.hg/journal.*'
in such case (therefore, renaming in this patch is safe enough).

This is a part of preparations for "transactional dirstate". See also
the wiki page below for detail about it.

    https://mercurial.selenic.com/wiki/DirstateTransactionPlan

This patch also removes redundant 'dirstate.invalidate()' just before
aborting a transaction for shelve/unshelve.
2015-10-09 03:53:46 +09:00
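The "all or nothing" policy from 45bfdb573e can be modelled with a toy object. Nothing here touches real files: the three attributes merely stand in for '.hg/dirstate', '.hg/journal.dirstate', and the in-memory state, and the class and method names are assumptions.

```python
class ToyDirstate:
    """In-memory model of dirstate handling at transaction release."""

    def __init__(self, contents):
        self.ondisk = contents    # stands in for .hg/dirstate
        self.journal = contents   # stands in for .hg/journal.dirstate
        self.inmemory = contents  # pending, unwritten changes

    def release(self, success):
        if success:
            # Write pending changes out so a later invalidate()
            # cannot lose them.
            self.ondisk = self.inmemory
        else:
            # Discard pending changes and restore the journal copy.
            self.ondisk = self.journal
            self.inmemory = self.journal
```

After a failed transaction both the on-disk and in-memory views match the journal copy; after a successful one the on-disk view matches the in-memory changes.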
FUJIWARA Katsunori
47524f74ef transaction: add releasefn to notify the end of a transaction scope
'releasefn' is used by a subsequent patch to take appropriate action
according to the result of the transaction at the end of its scope.

To ensure that 'releasefn' is invoked only once, this patch invokes it
after the assignment 'self.journal = None', because that assignment
prevents 'transaction._abort()' from being invoked again via '__del__()'.

    def __del__(self):
        if self.journal:
            self._abort()
2015-10-09 03:53:46 +09:00
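The "invoked only once" reasoning in 47524f74ef can be checked with a stripped-down class. Only the __del__ body comes from the commit message; the class name, the constructor, the close() method, and the (transaction, success) callback signature are assumptions for this sketch.

```python
class ToyTransaction:
    """Minimal model showing why releasefn runs exactly once."""

    def __init__(self, releasefn=None):
        self.journal = "journal"  # truthy while the transaction is open
        self._releasefn = releasefn or (lambda tr, success: None)

    def close(self):
        self.journal = None            # mark the transaction finished...
        self._releasefn(self, True)    # ...then notify the callback

    def _abort(self):
        self.journal = None
        self._releasefn(self, False)

    def __del__(self):
        # Same guard as in the commit message: a finished transaction
        # has no journal, so _abort() is not run a second time.
        if self.journal:
            self._abort()
```

Because self.journal is cleared before the callback runs, a later __del__() finds nothing to abort and the callback is not called again.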
Siddharth Agarwal
4376011dff filemerge: move post-merge checks into a separate function
This makes the overall filemerge function easier to follow, and makes upcoming
work simpler.
2015-10-07 23:35:30 -07:00
Siddharth Agarwal
3fbcf75f0f filemerge._xmerge: drop no longer necessary 'if r:' check
Cleanup from an earlier patch to make premerge be directly called from the main
filemerge function.
2015-10-08 14:18:43 -07:00