Cloning can be an expensive operation for servers because the server
generates a bundle from existing repository data at request time. For
a large repository like mozilla-central, this consumes 4+ minutes
of CPU time on the server. It also results in significant network
utilization. Multiplied by hundreds or even thousands of clients and
the ensuing load can result in difficulties scaling the Mercurial server.
Despite generation of bundles being deterministic until the next
changeset is added, the generation of bundles to service a clone request
is not cached. Each clone thus performs redundant work. This is
wasteful.
This patch introduces the "clonebundles" extension and related
client-side functionality to help alleviate this deficiency. The
client-side feature is behind an experimental flag and is not enabled by
default.
It works as follows:
1) Server operator generates a bundle and makes it available on a
server (likely HTTP).
2) Server operator defines the URL of a bundle file in a
.hg/clonebundles.manifest file.
3) Client `hg clone`ing sees the server is advertising bundle URLs.
4) Client fetches and applies the advertised bundle.
5) Client performs equivalent of `hg pull` to fetch changes made since
the bundle was created.
Essentially, the server performs the expensive work of generating a
bundle once and all subsequent clones fetch a static file from
somewhere. Scaling static file serving is a much more manageable
problem than scaling a Python application like Mercurial. Assuming your
repository grows less than 1% per day, the end result is 99+% of CPU
and network load from clones is eliminated, allowing Mercurial servers
to scale more easily. Serving static files also means data can be
transferred to clients as fast as they can consume it, rather than as
fast as servers can generate it. This makes clones faster.
Mozilla has implemented similar functionality of this patch on
hg.mozilla.org using a custom extension. We are hosting bundle files in
Amazon S3 and CloudFront (a CDN) and have successfully offloaded
>1 TB/day in data transfer from hg.mozilla.org, freeing up significant
bandwidth and CPU resources. The positive impact has been stellar and
I believe it has proved its value to be included in Mercurial core. I
feel it is important for the client-side support to be enabled in core
by default because it means that clients will get faster, more reliable
clones and will enable server operators to reduce load without
requiring any client-side configuration changes (assuming clients are
up to date, of course).
The scope of this feature is narrowly and specifically tailored to
cloning, despite "serve pulls from pre-generated bundles" being a valid
and useful feature. I would eventually like for Mercurial servers to
support transferring *all* repository data via statically hosted files.
You could imagine a server that siphons all pushed data to bundle files
and instructs clients to apply a stream of bundles to reconstruct all
repository data. This feature, while useful and powerful, is
significantly more work to implement because it requires the server
component have awareness of discovery and a mapping of which changesets
are in which files. Full, clone bundles, by contrast, are much simpler.
The wire protocol command is named "clonebundles" instead of something
more generic like "staticbundles" to leave the door open for a new, more
powerful and more generic server-side component with minimal backwards
compatibility implications. The name "bundleclone" is used by Mozilla's
extension and would cause problems since there are subtle differences
in Mozilla's extension.
Mozilla's experience with this idea has taught us that some form of
"content negotiation" is required. Not all clients will support all
bundle formats or even URLs (advanced TLS requirements, etc). To ensure
the highest uptake possible, a server needs to advertise multiple
versions of bundles and clients need to be able to choose the most
appropriate from that list one. The "attributes" in each
server-advertised entry facilitate this filtering and sorting. Their
use will become apparent in subsequent patches.
Initial inspiration and credit for the idea of cloning from static files
belongs to Augie Fackler and his "lookaside clone" extension proof of
concept.
That function is actually not returning public ancestors at all. This is
pointed by the second line of the docstring...
The bundling behavior was made correct in a5141977198d but with confusion
remaining regarding what each function was doing.
This close issue4737, because this highlight that shelve is actually -not-
bundling too much data (this was actually properly tested).
cvsps computes the parent revisions of log entries by walking the cvs log
sorted by (rcs, revision) and by iteratively maintaining a 'versions'
dictionary which maps a (rcs, branch) pair onto the last revision seen for that
pair. When log caching is on and a log cache exists, cvsps fails to set the
parent revisions of new log entries because it does not iterate over the log
cache in the parents computation. A complication is that a file rcs can change
(move to/from the attic), with respect to its value in the log cache, if the
file is removed/added back. This patch adds an iteration over the log cache to
update the rcs of cached log entries, if changed, and to properly populate the
'versions' dictionary.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
When an user aborts a histedit, many things could go wrong. At a minimum, after
a histedit abort failure, their repository should be out of that state. We've
found situations where the user could not exit the histedit state without
manually deleting the histedit state file. This patch ensures that if any
exception happens during an abort, the histedit statefile will be deleted so
that users are out of the histedit state and can at least manually get the repo
back to a workable condition.
When the histeditstate class instance has it's clear() method called, there is
nothing to check to see if the state file exists before deleting it. It may not
exist, which would create an exception. This patch allows clear to be called at
any time.
This will be needed for the following patch.
If a histedit is progress, the 'histedit-state' file should exist. The patch
implements a convenience function to do check if a histedit is in progress.
This method will be use in next patch in the series.
Previous patch made dirstate changes in a transaction scope "all or
nothing". Therefore, 'dirstateguard' is meaningless, if its scope is
as same as one of the related transaction.
This patch removes such meaningless 'dirstateguard' usage.
Before this patch, in-memory dirstate changes are still kept over a
transaction scope boundary regardless of the result of it.
For "all or nothing" policy of the transaction, in-memory dirstate
changes should be:
- written out at successful closing a transaction, because
subsequent 'dirstate.invalidate()' can lose them
- discarded at failure of a transaction, because outer
'wlock.release()' or so may write them out
To discard all changes in a transaction completely, this patch also
restores '.hg/dirstate' by '.hg/journal.dirstate' at failure, because
'transaction' itself does nothing for files related to '.hg/journal.*'
in such case (therefore, renaming in this patch is safe enough).
This is a part of preparations for "transactional dirstate". See also
the wiki page below for detail about it.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
This patch also removes redundant 'dirstate.invalidate()' just before
aborting a transaction for shelve/unshelve.
patchbomb relies on the 'hg bundle' command to generate an attached bundle using
--bundle. However, while 'hg bundle' has a --type option, patchbomb did not.
This is becoming very relevant since we are about to issue bundle2 for
general-delta repository.
This was tracked as issue4863
This config allows to specify a public location where your changeset can be
found. It then include a dedicated patch header show a command to be used to
retrieve the change. See the test for example.
This is flagged as experimental because this feature is not safe until we have
more logic to test that:
- changeset actually exists on destination
- changeset is draft on destination.
As all this is experimental, bike shedding can happily happens before we remove
the experimental flag.
Before this patch, "hg unshelve" uses aborting a current transaction
to discard temporary changes while unshelving.
This assumes that dirstate changes in a transaction scope are kept
even after aborting it. But this assumption will be broken by
"transactional dirstate". See the wiki page below for detail about it.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
This patch explicitly saves shelved dirstate just before aborting
current transaction, and restore dirstate with it after aborting by
utility function '_aborttransaction()' added by previous patch.
Before this patch, "hg shelve" uses aborting a current transaction to
discard temporary changes while shelving.
This assumes that dirstate changes in a transaction scope are kept
even after aborting it. But this assumption will be broken by
"transactional dirstate". See the wiki page below for detail about it.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
This patch explicitly saves shelved dirstate just before aborting
current transaction, and restore dirstate with it after aborting by
utility function '_aborttransaction()' added by previous patch.
This patch replaces 'if tr: tr.abort()' by 'lockmod.release(tr)',
because the former is already done in '_aborttransaction()' (and the
latter has no effect), if current transaction is aborted in it
successfully. Otherwise, the latter is enough to trigger aborting.
"hg shelve" and "hg unshelve" use aborting a current transaction to
discard temporary changes while (un)shelving.
This assumes that dirstate changes in a transaction scope are kept
even after aborting it. But this assumption will be broken by
"transactional dirstate". See the wiki page below for detail about it.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
This patch adds utility function "_aborttransaction()" to abort
current transaction but keep dirstate changes for (un)shelving.
'dirstate.invalidate()' just after aborting a transaction should be
removed soon by subsequent patch, which writes or discards in-memory
dirstate changes at releasing transaction according to the result of
it.
BTW, there are some other ways below, which (seem to, at first glance)
resolve this issue. But this patch chose straightforward way for ease
of review and future refactorring.
- commit transaction at first, and then rollback it
It causes unintentional "dirty read" of running transaction to
other processes at committing it.
- use dirstateguard to save and restore shelved dirstate
After DirstateTransactionPlan, making 'dirstate.write()' write
in-memory changes into actual file requires
'transaction.writepending()' while transaction running.
It causes meaningless writing other in-memory changes out, even
though they are never referred.
In addition to it, it isn't desirable that scope of dirstateguard
and transaction intersects each other.
- get list of files changed from the parent, keep it in memory, and
emulate that changes after aborting transaction
This additional memory consumption may block aborting transaction
in large repository (on small resource environment).
Before this patch, 'bmstore.write()' always write in-memory bookmark
changes into '.hg/bookmarks' regardless of transaction activity.
If 'bmstore.write()' is invoked inside a transaction and it writes
changes into '.hg/bookmarks', then:
- original bookmarks aren't restored at failure of that transaction
This breaks "all or nothing" policy of the transaction.
BTW, "hg rollback" can restore bookmarks successfully even before
this patch, because original bookmarks are saved into
'.hg/journal.bookmarks' at the beginning of the transaction, and
it (actually renamed as '.hg/undo.bookmarks') is used by "hg
rollback".
- uncommitted bookmark changes are visible to other processes
This is a kind of "dirty read"
For example, 'rebase.rebase()' implies 'bmstore.write()', and it may
be executed inside the transaction of "hg unshelve". Then, intentional
aborting at the end of "hg unshelve" transaction doesn't restore
original bookmarks (this is obviously a bug).
This patch uses 'bmstore.recordchange()' instead of actual writing by
'bmstore._writerepo()', if any transaction is active
This patch also removes meaningless restoring bmstore explicitly at
the end of "hg shelve".
This patch doesn't choose fixing each 'bmstore.write()' callers as
like below, because writing similar code here and there is very
redundant.
before:
bmstore.write()
after:
tr = repo.currenttransaction()
if tr:
bmstore.recordchange(tr)
else:
bmstore.write()
Even though 'bmstore.write()' itself may have to be discarded by
putting bookmark operations into transaction scope, this patch chose
fixing it to implement "transactional dirstate" at first.
We will generate different changegroup if general delta is enabled so we gather
this in the lower level function. There wasn't any good reason to have it in
the main code anyway.
This regressed in f07f4c45a8f2, when trying to conditionally disable archiving
of largefiles.
I'm not sure if wrapfunction() is the right way to do this, but it seems to
work. The mysterious issue with lfstatus getting out of sync in the proxy and
the unfiltered view crops up again here. See the referenced cset for more info.
This is a preparation for using 'repo.rollback()' instead of aborting
a current running transaction for "shelve" and "unshelve".
Before this patch, updating files as a part of 'repo.rollback()'
overridden by keyword extension always follows 'restrict' mode of the
command currently executed.
"merge", "unshelve" and so on should be 'restrict'-ed, because keyword
expansion may cause unexpected conflicts at merging while these
commands.
But, if 'repo.rollback()' is invoked while executing 'restrict'-ed
commands, modified files in the working directory are marked as
"CLEAN" unexpectedly by code path below:
# 'lookup' below is True at updating modified files for rollback
kwcmd = self.restrict and lookup # kwexpand/kwshrink
:
if kwcmd:
self.repo.dirstate.normal(f)
On the other hand, "rollback" command isn't 'restrict'-ed, because
rollbacking itself doesn't imply merging.
Therefore, disabling 'restrict' mode while updating files as a part of
'repo.rollback()' regardless of current 'restrict' mode should be
reasonable.
We want to see partial command results as soon as possible. But the buffering
mode of stdout (= pager's stdin) was set to fully-buffered because it isn't
associated with a tty. So, this patch recreates new stdout object to force its
buffering mode.
Because two file objects are associated with the same stdout fd and their
destructors will call close(), one of them must be closed carefully. Python
expects that the stdout fd never be closed even after sys.stdout.close() [1],
but newstdout has no such hack. So this patch calls newstdout.close()
immediately before duplicating the original stdout fd to sys.stdout.
operation sys.stdout newstdout fd
--------------------- ---------- --------- --------
newstdout.close() open closed closed
os.dup2(stdoutfd, ..) open closed open
del sys.stdout closed closed open [1]
[1]: https://hg.python.org/cpython/file/v2.7.10/Python/sysmodule.c#l1391
In some languages that have no caps, "DEPRECATED" and "deprecated" can be
translated to the same byte sequence. So it is too wild to exclude messages
by _("DEPRECATED").
This patch avoids unnecessary conflicts to resolve during rebase for the users
of changeset evolution.
This patch modifies rebase to skip obsolete commits if they are being rebased on
their successors.
It introduces a new rebase state 'revprecursor' for these revisions that are
being skipped and a new message to inform the user of what is happening.
This feature is gated behind the config flag experimental.rebaseskipobsolete
When an obsolete commit is skipped, the output is:
not rebasing 14:9ad579b4a5de "I", already in destination as 17:fc37a630c901 "K"
Mutable default arguments are know to the state of California to cause bugs. The
underlying function already handle "None" as an option value, so we do not need
to do anything else.