Now that we have support for detecting compatible stream clone bundles
in bundle specifications, we can safely add support for applying stream
clone bundles to the clone bundles feature.
For the same reasons that we don't produce stream clone bundles with `hg
bundle`, we don't support consuming stream clone bundles with `hg
unbundle`. We introduce a complementary debug command for applying
stream clone bundles. This command is mostly to facilitate testing,
although it may be used to manually apply stream clone bundles until a
more formal mechanism is (possibly) adopted.
Now that we have support for recognizing the streaming clone bundle
type, add a debug command for creating them.
I decided to create a new debug command instead of adding support to `hg
bundle` because stream clone bundles are not exactly used the same way
as normal bundle files and I don't want to commit to supporting them
through the official `hg bundle` command forever. A debug command,
however, can be changed without as much concern for backwards
compatibility.
As part of this, `hg bundle` will explicitly reject requests to produce
stream bundles.
This command will be required by server operators using stream clone
bundles with the clone bundles feature.
c67339617276 (during the 3.4 code freeze) made all 'update' hooks run after
releasing wlock for visibility of in-memory dirstate changes. But this
breaks paired invocation of 'preupdate' and 'update' hooks.
For example, 'hg backout --merge' for a TARGET revision that isn't a
parent of CURRENT consists of the steps below:
1. update from CURRENT to TARGET
2. commit BACKOUT revision, which backs TARGET out
3. update from BACKOUT to CURRENT
4. merge TARGET into CURRENT
Then, we expect hooks to run in the order below:
- 'preupdate' on CURRENT for (1)
- 'update' on TARGET for (1)
- 'preupdate' on BACKOUT for (3)
- 'update' on CURRENT for (3)
- 'preupdate' on TARGET for (4)
- 'update' on CURRENT/TARGET for (4)
But hooks actually run in the order below:
- 'preupdate' on CURRENT for (1)
- 'preupdate' on BACKOUT for (3)
- 'preupdate' on TARGET for (4)
- 'update' on TARGET for (1), but actually on CURRENT/TARGET
- 'update' on CURRENT for (3), but actually on CURRENT/TARGET
- 'update' on CURRENT for (4), but actually on CURRENT/TARGET
The root cause of the issue addressed by c67339617276 is that an external
'update' hook process can't view in-memory changes (especially of
dirstate), because they aren't written out until the end of the
transaction (or wlock).
Now, hooks can be invoked just after updating, because previous
patches made in-memory changes visible to external processes.
This patch may break backward compatibility from the point of view of
"scheduling hook execution", but should be reasonable because 'update'
hooks had been executed in this order before 3.4.
This patch tests "hg backout" and "hg unshelve", because one of them
activates the transaction before 'update' hook invocation, but the
other doesn't.
This patch centralizes passing HG_PENDING to external hook process
into '_exthook()'. To make in-memory changes visible to external hook
process, this patch does:
- write (or schedule to write) in-memory dirstate changes, and
- set HG_PENDING environment variable, if:
  - a transaction is running, and
  - there are in-memory changes to be visible
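The two conditions above can be sketched as follows. This is only an
illustration, not the code in '_exthook()': the helper name 'hookenv', its
parameters, and the use of the repository root as the HG_PENDING value are
assumptions made for the example.

```python
import os


def hookenv(reporoot, transactionrunning, haspending, baseenv=None):
    """Build the environment for an external hook process.

    Hypothetical helper: HG_PENDING is added only when a transaction is
    running AND there are in-memory changes that were written out as
    pending. Using the repository root as the value is an assumption.
    """
    env = dict(os.environ if baseenv is None else baseenv)
    if transactionrunning and haspending:
        env['HG_PENDING'] = reporoot
    return env
```

With both conditions true the hook process sees HG_PENDING; if either one is
false the environment is passed through unchanged.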
This patch tests some commands with some hooks, because transaction
activity at the invocation of the same hook differs from command to
command ("---": "not tested").
======== ========= ========= ============
command  preupdate precommit pretxncommit
======== ========= ========= ============
unshelve o         ---       ---
backout  x         ---       ---
import   ---       o         o
qrefresh ---       x         o
======== ========= ========= ============
Each hook is examined separately to prevent in-memory changes from
being visible to external processes accidentally as a side effect of
hooks previously invoked.
Before this patch, external editor process for the commit log can't
view some in-memory changes (especially, of dirstate), because they
aren't written out until the end of transaction (or wlock).
This causes unexpected output of Mercurial commands spawned from that
editor process.
To make in-memory changes visible to external editor process, this
patch does:
- write (or schedule to write) in-memory dirstate changes, and
- set HG_PENDING environment variable, if:
  - a transaction is running, and
  - there are in-memory changes to be visible
"hg diff" spawned from external editor process for "hg qrefresh"
shows:
- "changes newly imported into the topmost" before ab68b153ce34(*)
- "all changes recorded in the topmost by refreshing" after this patch
(*) ab68b153ce34 changed steps invoking editor process
Even though backward compatibility may be broken, the latter behavior
looks reasonable, because "hg diff" spawned from the editor process
consistently shows "what changes new revision records" regardless of
invocation context.
In fact, issue4378 itself should be resolved by b46029eb5b29, which
made 'repo.transaction()' write in-memory dirstate changes out
explicitly before starting transaction. It also made "hg qrefresh"
imply 'dirstate.write()' before external editor invocation in call
chain below.
- mq.queue.refresh
  - strip.strip
    - repair.strip
      - localrepository.transaction
        - dirstate.write
  - localrepository.commit
    - invoke external editor
Though, this patch has '(issue4378)' in its own summary line to indicate
that issues like issue4378 should be fixed by this.
BTW, this patch adds '-m' option to a 'hg ci --amend' execution in
'test-commit-amend.t', to avoid invoking external editor process.
In this case, "unsure" states may be changed to "clean" according to
timestamp and so on. These changes should be written into the pending
file if external editor invocation is required.
Writing dirstate changes out then breaks stability of the test, because
it occasionally shows "transaction abort!/rollback completed".
Aborting after editor process invocation in the commands below may
cause similar instability of tests, too (AFAIK, there are no more such
cases at this revision):
- commit --amend
  - without --message/--logfile
- import
  - without --message/--logfile,
  - without --no-commit,
  - without --bypass,
  - one of below, and
    - patch has no description text, or
    - with --edit
  - aborting at the 1st patch, which adds or removes file(s)
    - if it only changes existing files, status is checked only for
      changed files by 'scmutil.matchfiles()', and transition from
      "unsure" to "normal" in dirstate doesn't occur (= dirstate
      isn't changed, and written out)
    - aborting at the 2nd or later patch implies other pending
      changes (e.g. changelog), and always causes showing
      "transaction abort!/rollback completed"
This would have caught the problem fixed by 47bcbe06a48d. There are
other *.wxs files that can be checked, but they appear to be more
complicated. For example, locale.wxs has what appears to be foreach
loop support, as well as variable substitution.
By checking `hg files` to determine tracked files, this is able to avoid false
failures when other junk is present in the filesystem, like *.orig files.
I can't tell if the map-cmdline.status file is not included on purpose, but I
don't see the purpose of excluding it. The missing help files seem reasonable
for Windows.
If a sub-topic/section is requested and the main topic corresponds to
a topic with sub-topics, we now look for and return content for a
sub-topic if found.
With this patch, `hg help internals.X` now works. hgweb does not yet
render sub-topics, however.
We introduce the "internals" help topic, which renders an index of
available sub-topics. The sub-topics themselves are still not
reachable via the help system.
As part of attempting to more aggressively use the existing
lrucachedict, collections.deque operations were frequently
showing up in profiling output, negating benefits of caching.
Searching the internet seems to tell me that the most efficient
way to implement an LRU cache in Python is to have a dict indexing
the cached entries and then to use a doubly linked list to track
freshness of each entry. So, this patch replaces our existing
lrucachedict with a version using such a pattern.
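The pattern described above (a dict indexing the entries plus a doubly
linked list tracking freshness) can be sketched as below. This is a minimal
illustration and not the lrucachedict added by this patch; the class and
attribute names are made up for the example, and a capacity of at least 1
is assumed.

```python
class _node(object):
    """One cache entry in the doubly linked freshness list."""
    __slots__ = ('key', 'value', 'prev', 'next')

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = self.next = None


class lrudict(object):
    """Illustrative LRU cache (hypothetical name); capacity must be >= 1."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._cache = {}  # key -> _node, gives O(1) lookup
        # Sentinel node: head.next is the newest entry, head.prev the oldest.
        self._head = _node(None, None)
        self._head.prev = self._head.next = self._head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _pushfront(self, node):
        node.next = self._head.next
        node.prev = self._head
        self._head.next.prev = node
        self._head.next = node

    def __contains__(self, key):
        return key in self._cache

    def __getitem__(self, key):
        node = self._cache[key]  # raises KeyError on a miss
        # Mark as most recently used: O(1) pointer updates, no scanning.
        self._unlink(node)
        self._pushfront(node)
        return node.value

    def __setitem__(self, key, value):
        node = self._cache.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._pushfront(node)
            return
        if len(self._cache) >= self._capacity:
            # Evict the least recently used entry (tail of the list).
            oldest = self._head.prev
            self._unlink(oldest)
            del self._cache[oldest.key]
        node = _node(key, value)
        self._cache[key] = node
        self._pushfront(node)
```

Both retrieval and insertion only touch a fixed number of pointers, which
is why lookup cost no longer grows with the number of cached entries.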
The recently introduced perflrucachedict command reveals the
following timings for 10,000 operations for the following cache
sizes for the existing cache:
n=4 init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
n=8 init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
n=16 init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
n=32 init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
n=64 init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
n=128 init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
n=256 init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
n=512 init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
n=1024 init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
n=2048 init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
n=4096 init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
n=8192 init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191
As the data shows, the existing cache's retrieval performance
diminishes linearly with cache size. (Keep in mind the microbenchmark
is testing 100% cache hit rate.)
The new cache implementation reveals the following:
n=4 init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
n=8 init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
n=16 init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
n=32 init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
n=64 init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
n=128 init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
n=256 init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
n=512 init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
n=1024 init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
n=2048 init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
n=4096 init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
n=8192 init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826
Let's go through the results.
The new cache takes longer to construct. ~6.6ms vs ~4.1ms. However,
this is measuring 10,000 __init__ calls, so the difference is
~0.2us/instance. We currently only create lrucachedict for manifest
instances, so this regression is not likely relevant.
The new cache is slightly slower for retrievals for cache sizes
< 1024. It's worth noting that the only existing use of lrucachedict
is in manifest.py and the default cache size is 4. This regression
is worrisome. However, for n=4, the delta is ~2.9ms for 10,000 lookups,
or ~0.29us/op. Again, this is a marginal regression and likely not
relevant in the real world. Timing `hg log -p -l 100` for
mozilla-central reveals that cache lookup times are dominated by
decompression and fulltext resolution (even with lz4 manifests).
The new cache is significantly faster for retrievals at larger
capacities. Whereas the old implementation has retrieval performance
linear with cache capacity, the new cache is constant time until much
larger values. And, when it does start to increase significantly, it
is still well over an order of magnitude faster than the current cache
(roughly 50x at n=65536).
The new cache does appear to be slower for sets when capacity is large.
However, performance is similar for smaller capacities. Of course,
caches should generally be optimized for retrieval performance because
if a cache is getting more sets than gets, it doesn't really make
sense to cache. If this regression is worrisome, again, taking the
largest regression at n=65536 of ~6.9ms for 10,000 results in a
regression of ~0.68us/op. This is not significant in the grand scheme
of things.
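The per-operation figures quoted above follow directly from the timing
tables. The snippet below merely redoes that arithmetic with the numbers
copied from the two tables; the helper name is made up for the example.

```python
OPS = 10000  # each timing above covers 10,000 operations


def peropus(oldseconds, newseconds, ops=OPS):
    """Per-operation difference (new - old) in microseconds."""
    return (newseconds - oldseconds) / ops * 1e6


initdelta = peropus(0.004079, 0.006665)  # n=4, init column
getdelta = peropus(0.003632, 0.006541)   # n=4, gets column
setdelta = peropus(0.004268, 0.011151)   # n=65536, sets column

# Retrieval speedup of the new cache at the largest capacity measured.
getspeedup = 4.106390 / 0.082736         # n=65536, gets column
```

The results are roughly 0.26us, 0.29us and 0.69us per operation, and a
retrieval speedup of about 50x at n=65536.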
Overall, the new cache is performant at retrievals at much larger
capacity values which makes it a generally more useful cache backend.
While there are regressions, their absolute value is extremely small.
Since we aren't using lrucachedict aggressively today, these
regressions should not be relevant. The improved scalability of
lrucachedict should enable us to more aggressively utilize
lrucachedict for more granular caching (read: higher capacity caches)
in the near future. The impetus for this patch is to establish a cache
of decompressed revlog revisions, notably manifest revisions. And since
delta chains can grow to >10,000 and cache hit rate can be high, the
improved retrieval performance of lrucachedict should be relevant.
We had them on 'test-check-code-hg.t' to avoid collision with the test checking
'check-code' itself. Now that this one has been renamed, we can safely remove
this suffix for all of them. This gets them in line with 'check-pyflakes.t'.
This test (making sure the 'check-code' script runs as intended) has been
confused with the test making sure that the Mercurial code base complies with our
coding style by multiple generations of contributors.
We are moving it out of the way so that all tests starting with
'test-check' are now doing compliance testing.
While I was here, I removed the try..except around importing cStringIO
because cStringIO should always be importable on modern Python versions.
We already do an unconditional import in other files.
Before, hg help -c was the same as hg help; now it only shows commands.
Before, hg help -e was the same as hg help; now it only shows extensions.
Before, hg help -k crashed; now it shows all topics.
When committing interactively without changes, the user would get a ValueError
exception. This patch adds a dictionary to the return value of filterpatch
when there are no files to change.
The 'peer.known' call (handled at the repository level) was applying its own
manual filtering (looking at phases) instead of relying on the repoview
mechanism. This led to the discovery finding more "common" nodes than
'getbundle' was willing to recognise. From there, bad things happen; issue4982
is a symptom of it. While situations like the one described in issue4982 can
still happen because of race conditions, fixing 'peer.known' is important for
consistency in all cases.
We update the code to use 'repoview' filtering. This leads to small changes in
the tests for exchanging obsolescence markers because the discovery yields
different results.
The test affected in 'test-obsolete-changeset-exchange.t' is a test for
issue4982 getting back to its expected state.
A fix to issue4982 (not fixed in this patch) will reinforce the filtering
during discovery. This will make two of our test repositories appear
unrelated (because all common content is properly hidden). To avoid this, we
introduce an extra base changeset that will not get obsoleted. This affects
various test outputs so we put this addition in its own changeset.
We currently allow updating and merging (with --force) when there are
unresolved merge conflicts, as long as there is only one parent of the
working copy. Even worse, when updating to another revision
(linearly), if one of the unresolved files (including any conflict
markers in the working copy) can now be merged cleanly with the target
revision, the file becomes marked as resolved.
While we could potentially allow updates that affect only files that
are not in the set of unresolved files, that's considerably more work,
and we don't have a use case for it anyway. Instead, let's keep it
simple and refuse any merge or update (without -C) when there are
unresolved conflicts.
Note that test-merge-local.t explicitly checks for conflict markers
that get carried over on update. It's unclear if that was intentional
or not, but it seems bad enough that we should forbid it. The simplest
way of fixing the test case is to leave the conflict markers in place
and just mark the files resolved, so let's just do that for now.
While I was here, I removed conditional code for failure to import json.
This code was necessary to support Python < 2.6, which didn't include
the json module.
Python 3 is inevitable. There have been incremental movements towards
converting the code base to be Python 3 compatible. Unfortunately, we
don't have any tests that look for Python 3 compatibility. This patch
changes that.
We introduce a check-py3-compat.py script whose role is to verify
Python 3 compatibility of the files passed in. We add a test that
calls this script with all .py files from the source checkout.
The script currently only verifies that absolute_import and
print_function are used. These are the low-hanging fruit for Python 3
compatibility. Over time, we can include more checks, including
verifying we're able to load each Python file with Python 3. You
have to start somewhere.
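A check of this kind can be sketched with the 'ast' module. This is not
the check-py3-compat.py script itself: the function name and the warning
texts are assumptions, and the sketch is meant to be run under Python 3,
where print() parses as an ordinary function call.

```python
import ast


def checkcompat(source, filename='<unknown>'):
    """Return warnings about missing __future__ imports (illustrative)."""
    root = ast.parse(source, filename)
    futures = set()
    usesprint = False
    for node in ast.walk(root):
        if isinstance(node, ast.ImportFrom) and node.module == '__future__':
            futures.update(alias.name for alias in node.names)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == 'print'):
            usesprint = True
    warnings = []
    if 'absolute_import' not in futures:
        warnings.append('%s not using absolute_import' % filename)
    # print_function is only demanded when "print" is actually used.
    if usesprint and 'print_function' not in futures:
        warnings.append('%s requires print_function' % filename)
    return warnings
```

A test can then feed every tracked .py file through such a function and
compare the collected warnings against the expected output.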
Accepting this patch means that all new .py files must have
absolute_import and print_function (if "print" is used) to avoid
a new warning about Python 3 incompatibility. We've already
converted several files to use absolute_import and print_function
is in the same boat, so I don't think this is such a radical
proposition.
As discussed on the list, we are adding an official way to keep old APIs around
for a short time in order to help third-party developers catch up. The
deprecated API will issue a developer warning (issued by default during test
runs) to warn extension authors that they need to upgrade their code without
instantaneously breaking tool chains and normal users.
The version is passed as an explicit argument so that developers think about it
and a potential future script can automatically check for it.
This is not built as a decorator because accessing the 'ui' instance will likely
be different each time. The message is also free-form because deprecated APIs
are replaced in a variety of ways. I'm not super happy about the final rendering
of that message, but this is a developer-oriented warning and I would like to
move forward.
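The intended usage can be sketched as below. The 'fakeui' class is a
stand-in written for this example, not Mercurial's ui class, and the exact
wording appended to the message is an assumption. What the sketch shows is
the shape of the call: a free-form message plus an explicit version,
issued from inside the old API where a 'ui' instance is at hand.

```python
class fakeui(object):
    """Stand-in for a 'ui' object (hypothetical, for illustration only)."""

    def __init__(self, develwarn=True):
        self._develwarn = develwarn
        self.messages = []

    def develwarn(self, msg):
        # Developer warnings are only emitted when enabled (e.g. in tests).
        if self._develwarn:
            self.messages.append('devel-warn: ' + msg)

    def deprecwarn(self, msg, version):
        """Warn about a deprecated API.

        'version' is the release after which compatibility is dropped.
        """
        msg += ('\n(compatibility will be dropped after Mercurial-%s,'
                ' update your code.)' % version)
        self.develwarn(msg)


def newapi(x):
    return x * 2


def oldapi(ui, x):
    # The old entry point keeps working, but tells extension authors to move.
    ui.deprecwarn('oldapi() is deprecated, use newapi()', '3.8')
    return newapi(x)
```

Normal users (developer warnings off) see no change in behavior, while
test runs surface the warning.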
Before this patch, import-checker.py didn't know whether a name in an ImportFrom
statement was a module or not. Therefore, it complained that the following
example did a "direct symbol import from mercurial".
  # hgext/foo.py
  from mercurial import hg
This patch reuses the dict of local modules to filter out sub-module names.
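The filtering can be sketched as follows. This is not the code in
import-checker.py; the function name and the shape of 'localmods' (a dict
keyed by dotted module name) are assumptions made for the example.

```python
import ast


def directsymbols(source, localmods):
    """Yield (module, name) for 'from module import name' statements
    where 'module.name' is not itself a known local module."""
    for node in ast.walk(ast.parse(source)):
        # Only absolute 'from X import Y' statements are considered here.
        if not isinstance(node, ast.ImportFrom) or node.level:
            continue
        for alias in node.names:
            fullname = '%s.%s' % (node.module, alias.name)
            if fullname not in localmods:
                yield node.module, alias.name
```

With 'mercurial.hg' present in the dict, "from mercurial import hg" is
recognized as a sub-module import and no longer reported.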
We currently use 'd' to indicate that a manifest entry is a
directory. Let's switch to 't', since that's not a valid hex digit and
therefore easier to spot in the raw manifest data.
This will break any existing repos with tree manifests, but it's still
an experimental feature and there are probably only a few test repos
in existence with 'd' flags.
Power users often want to apply per-path configuration options. For
example, they may want to declare an alternate URL for push operations
or declare a revset of revisions to push when `hg push` is used
(as opposed to attempting to push all revisions by default).
This patch establishes the use of sub-options (config options with
":" in the name) to declare additional behavior for paths.
New sub-options are declared by using the new ``@ui.pathsuboption``
decorator. This decorator serves multiple purposes:
* Declaring which sub-options are registered
* Declaring how a sub-option maps to an attribute on ``path``
  instances (this is needed so `hg paths` can render sub-options
  and values properly)
* Validation and normalization of config options to attribute
values
* Allows extensions to declare new sub-options without monkeypatching
* Allows extensions to overwrite built-in behavior for sub-option
handling
As convenient as the new option registration decorator is, extensions
(and even core functionality) may still need an additional hook point
to perform finalization of path instances. For example, they may wish
to validate that multiple options/attributes aren't conflicting with
each other. This hook point could be added later, if needed.
To prove this new functionality works, we implement the "pushurl"
path sub-option. This option declares the URL that `hg push` should
use by default.
We require that "pushurl" is an actual URL. This requirement might be
controversial and could be dropped if there is opposition. However,
objectors should read the complicated code in ui.path.__init__ and
commands.push for resolving non-URL values before making a judgement.
We also don't allow #fragment in the URLs. I intend to introduce a
":pushrev" (or similar) option to define a revset to control which
revisions are pushed when "-r <rev>" isn't passed into `hg push`.
This is much more powerful than #fragment and I don't think #fragment
is useful enough to continue supporting.
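The registration mechanism and the "pushurl" validation described above
can be sketched as below. This is a standalone illustration, not the code
in ui.py: the registry, the 'parsepaths' helper, the 'pushloc' attribute
name and the simplistic URL test are all assumptions made for the example.

```python
_pathsuboptions = {}  # sub-option name -> (attribute name, handler)


def pathsuboption(option, attr):
    """Decorator registering a handler for the sub-option 'option'.

    The handler validates/normalizes the config value; its result is
    stored on path instances as attribute 'attr'.
    """
    def register(func):
        _pathsuboptions[option] = (attr, func)
        return func
    return register


@pathsuboption('pushurl', 'pushloc')
def pushurloption(path, value):
    # Require an actual URL and refuse #fragment (simplified check).
    if '://' not in value or '#' in value:
        return None
    return value


class path(object):
    def __init__(self, name, url, suboptions=None):
        self.name = name
        self.url = url
        suboptions = suboptions or {}
        for option, (attr, func) in _pathsuboptions.items():
            value = suboptions.get(option)
            setattr(self, attr, None if value is None else func(self, value))


def parsepaths(items):
    """Build path objects from (key, value) pairs of a [paths] section.

    Keys containing ':' are treated as '<name>:<sub-option>'.
    """
    urls, subopts = {}, {}
    for key, value in items:
        if ':' in key:
            name, option = key.split(':', 1)
            subopts.setdefault(name, {})[option] = value
        else:
            urls[key] = value
    return dict((n, path(n, u, subopts.get(n))) for n, u in urls.items())
```

An extension could register another sub-option with the same decorator, or
re-register "pushurl" to replace the built-in handling, without
monkeypatching.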
The [paths] section of the "config" help page has been updated
significantly. `hg paths` has been taught to display path sub-options.
The docs mention that "default-push" is now deprecated. However, there
are several references to it that need to be cleaned up. A large part
of this is converting more consumers to the new paths API. This will
happen naturally as more path sub-options are added and more and more
components need to access them.
We have debug commands for displaying overall revlog statistics
(debugrevlog) and for dumping a revlog index (debugindex). As part
of investigating various aspects of revlog behavior and performance,
I found it important to have an understanding of how revlog
delta chains behave in practice.
This patch implements a "debugdeltachain" command. For each revision
in a revlog, it dumps information about the delta chain. Which delta
chain it is part of, length of the delta chain, distance since base
revision, info about base revision, size of the delta chain, etc. The
generic formatting facility is used, which means we can templatize
output and get machine readable output like JSON.
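The per-revision information can be illustrated on a toy model of a
revlog. The sketch below is not the debugdeltachain implementation: the
input format (a list giving, for each revision, the revision its delta is
stored against, or -1 for a full snapshot) and the field names are
assumptions made for the example.

```python
def deltachains(deltaparents):
    """Compute chain id, chain length and base revision for every rev.

    deltaparents[rev] is the revision 'rev' is stored as a delta
    against, or -1 when 'rev' is stored as a full snapshot.
    """
    chainids = {}  # base revision -> chain number, in order of appearance
    rows = []
    for rev in range(len(deltaparents)):
        chain = [rev]
        # Walk back through delta parents until a full snapshot is found.
        while deltaparents[chain[-1]] != -1:
            chain.append(deltaparents[chain[-1]])
        base = chain[-1]
        chainid = chainids.setdefault(base, len(chainids) + 1)
        rows.append({'rev': rev, 'chainid': chainid,
                     'chainlen': len(chain), 'base': base})
    return rows
```

Because each row is a plain mapping, feeding the rows to a generic
formatter (or json.dumps) gives machine readable output.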
This command has already uncovered some weird history in
mozilla-central I didn't know about. So I think it's valuable.
Previously, `hg histedit` required a revision argument specifying which
revision to use as the base for the current histedit operation. There
was an undocumented and experimental "histedit.defaultrev" option that
supported defining a single revision to be used if no argument is
passed.
Mercurial knows what changesets can be edited. And in most scenarios,
people want to edit the history of everything on the current head that
is rewritable. Making histedit do this by default and not require
an explicit argument or additional configuration is a major usability
win and will enable more people to use histedit.
This patch changes the behavior of the experimental and undocumented
"histedit.defaultrev" config option to select an appropriate base
revision by default. Comprehensive tests exercising the edge cases
in the new, somewhat complicated default revset have been added.
Surprisingly, no tests broke. I guess we were never testing the
behavior with no ANCESTOR argument (it used to fail with
"abort: histedit requires exactly one ancestor revision"). The new
behavior is much more user friendly.
The functionality for choosing the default base revision has been
moved to destutil.py, where it can easily be modified by extensions.
In the most complex case, we try using the incoming delta base, then
we try both parents, and then we try the previous revlog entry. If
none of these result in a good delta, we naturally use the null
revision as base. However, we sometimes consider the nullrev before we
have exhausted our other options. Specifically, when both parents are
null, we use the nullrev as delta base if it produces a good delta
(according to _isgooddelta()), and we fail to try the previous revlog
entry as delta base. After e60126c6093d (addrevision: use general
delta when the incoming base delta is bad, 2015-12-01), it can also
happen for non-merge commits when the incoming delta is not good.
The Firefox repo (from many months back) shrinks a tiny bit with this
patch: from 1.855GB to 1.830GB (1.4%). The hg repo itself shrinks even
less: by less than 0.1%. There may be repos that get larger instead.
This undoes the unexplained test change in e60126c6093d.
bundle2 is the new and preferred wire protocol format. For various
reasons, server operators may wish to force clients to use it.
One reason is performance. If a repository is stored in generaldelta,
the server must recompute deltas in order to produce the bundle1
changegroup. This can be extremely expensive. For mozilla-central,
bundle generation typically takes a few minutes. However, generating
a non-gd bundle from a generaldelta encoded mozilla-central requires
over 30 minutes of CPU! If a large repository like mozilla-central
were encoded in generaldelta and non-gd clients connected, they could
easily flood a server by cloning.
This patch gives server operators config knobs to control whether
bundle1 is allowed for push and pull operations. The default is to
support legacy bundle1 clients, making this patch backwards compatible.
Before this change, asking for a file from history (e.g. 'hg cat -r 42 foo/bar')
could fail because of the current content of the working copy (e.g. current
"foo" being a symlink). As the working copy state has no influence on the
content of the history, we can safely skip these checks.
The working copy context class has a different 'match'
implementation. That implementation still uses the repo.auditor and will
still catch symlink traversal.
I've audited all stuff calling "match" and they all go through a ctx
in a sensible way. The most unclear case was diff which still seemed
okay. You raised my paranoid level today and I double checked through
tests. They behave properly.
The odds of someone using the wrong one (matching with a changectx for
an operation that will eventually touch the file system) are non-zero
because you are never sure of what people will do. But I dunno if we
can fight against that. So I would not commit to "never" for "at this
level" and "in the future" if someone writes especially bad code.
However, as a last defense, the vfs itself is running path auditor in
all cases outside of .hg/. So I think anything passing the 'matcher'
for buggy reason would growl at the vfs layer.
New ui.graphnodetemplate option allows us to colorize a node symbol by phase
or branch,
  [ui]
  graphnodetemplate = {label('graphnode.{phase}', graphnode)}
  [color]
  graphnode.draft = yellow bold
or use a variety of unicode emoji characters, and so on. (You'll need less-481
to display non-BMP unicode characters.)
  [ui]
  graphnodetemplate = {ifeq(obsolete, 'stable', graphnode, '\xf0\x9f\x92\xa9')}
Before this patch, "hg commit" (process A) executes steps below:
1. get current branch heads via 'repo.branchheads()'
   - cache 'repo.changelog'
2. invoke 'repo.commit()'
3. acquire wlock
   - invalidate 'repo.dirstate'
4. access 'repo.dirstate'
   - re-read '.hg/dirstate'
   - check validity of parent revisions with 'repo.changelog'
5. invoke 'repo.commitctx()'
6. acquire store lock (slock)
   - invalidate 'repo.changelog'
7. do committing
8. release slock
9. release wlock
10. check new branch head (via 'cmdutil.commitstatus()')
If acquisition of wlock at (3) above waits for another "hg commit"
(process B) or similar running in parallel to release wlock, process A
ends up creating an orphan revision, because:
- '.hg/dirstate' refers to the revision newly added by
  process B as its parent
- but the already cached 'repo.changelog' doesn't contain that revision
- therefore, validating parents of '.hg/dirstate' at (4) above
  replaces that revision with 'nullid'
Then, process A creates an "orphan" revision, whose parent is the "null"
revision.
In addition, "created new head" may be shown at the end of
process A unintentionally, if the store is updated in parallel, because
both getting branch heads (1) and checking new branch head (10) are
executed outside slock scope.
To avoid this issue, this patch makes "hg commit" acquire wlock and
slock before processing.
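The idea of taking both locks before looking at any state can be
illustrated with a toy model. The class below is not Mercurial code: it
uses in-process thread locks in place of the on-disk wlock and store lock,
and all names are made up for the example.

```python
import threading


class fakerepo(object):
    """Toy repository: 'heads' stands in for changelog/dirstate state."""

    def __init__(self):
        self._wlock = threading.RLock()  # stands in for wlock
        self._slock = threading.RLock()  # stands in for the store lock
        self.heads = ['base']

    def commit(self, name):
        # Take wlock, then the store lock, BEFORE reading any state, so
        # that the parent we read cannot become stale while we wait.
        with self._wlock:
            with self._slock:
                parent = self.heads[-1]
                self.heads.append(name)
                return parent
```

Because every commit reads its parent only after holding both locks,
concurrent commits form a single chain instead of several of them
recording the same (stale) parent.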
This patch resolves the issue between "hg commit" processes, but not
one between "hg commit" and other commands. Subsequent patches resolve
the latter.
Even after this patch, there are still corner case problems below:
- filecache may overlook changes of '.hg/dirstate', and it causes
similar issue (see below for detail)
https://bz.mercurial-scm.org/show_bug.cgi?id=4368#c10
- 3rd party extension may cause similar issue, if it directly uses
'repo.commit()' without acquisition of wlock and slock
This can be fixed by acquisition of slock at the beginning of
'repo.commit()', but that seems more suitable for the "default" branch.
In fact, acquisition of slock itself is already introduced on the
"default" branch by ec227b188932, but the acquisition is not at the
beginning of 'repo.commit()'.
This patch also changes some tests:
- test-fncache.t needs this tricky wrapping, to release (= forced
failure of) wlock certainly
- order of "hg commit" output is changed by widening scope of locks,
because some hooks are fired after releasing wlock
We unify the delta selection process to be a simple three-option process:
- try to use the incoming delta (if lazydeltabase is on)
- try to find a suitable parent to delta against (if gd is on)
- try to delta against the tipmost revision
The first of these options that yields a valid delta will be used.
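The three options can be sketched as an ordered list of candidates. This
is only an illustration, not the revlog code: the function names, the
parameters and the 'isgooddelta' callback are assumptions, and falling
back to the null revision stands for storing a full snapshot.

```python
NULLREV = -1


def candidatebases(incomingbase, p1, p2, tiprev, lazydeltabase, generaldelta):
    """Yield candidate delta bases in the order they are tried."""
    if lazydeltabase and incomingbase is not None:
        yield incomingbase          # 1. the incoming delta's base
    if generaldelta:
        for parent in (p1, p2):     # 2. the (non-null) parents
            if parent != NULLREV:
                yield parent
    yield tiprev                    # 3. the tipmost revision


def choosebase(candidates, isgooddelta):
    """Return the first candidate giving a valid delta, else NULLREV."""
    for base in candidates:
        if isgooddelta(base):
            return base
    return NULLREV
```

For example, with lazydeltabase and generaldelta both on, an unsuitable
incoming base simply makes the selection move on to the parents instead of
jumping straight to a full snapshot.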
The test change in 'test-generaldelta.t' shows this behavior as we use a delta
against the parent instead of a full delta when the incoming delta is not
suitable.
This has some impact on 'test-bundle.t' because a delta somewhere changes. It
does not seem to change the test semantics and has been ignored.
The currently used manifest is too small and cannot sustain a chain length
above "1". This makes testing the 'lazydeltabase' behavior hard. So we add an
extra file in the manifest to help testing in the next changeset.
The semantics of existing tests have been checked and are not changed.
Previously, when extdiff was called on two changesets where
a subrepository had been removed, an unexpected KeyError would
be raised.
Now, the missing subrepository will be ignored. This behavior
mirrors the behavior in diffordiffstat from cmdutil.py line
~1138-1153. The KeyError is caught and the revision is
set to None.
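The handling amounts to the pattern below. It is a sketch rather than the
extdiff code: the helper name is made up, and the substate layout (a
mapping from subrepository path to a tuple whose second item is the
recorded revision) is an assumption for the example.

```python
def subreporev(substate, subpath):
    """Return the revision recorded for 'subpath', or None when the
    subrepository does not exist in this changeset."""
    try:
        return substate[subpath][1]
    except KeyError:
        # The subrepository was removed (or not yet added): ignore it.
        return None
```

A caller comparing two changesets can then treat None as "nothing to diff
on this side" instead of aborting.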
try/catch of LookupError around matchmod.narrowmatcher and
sub.status is removed, as LookupError is not raised anywhere
within those methods or deeper calls.
When strip builds the list of changedfiles to pass into dirstate.rebuild, it adds
files blindly, including those that have been removed. This test ensures that
rebuild can handle this case.
When debugrebuilddirstate --minimal is called, rebuilding the dirstate was done
outside of the appropriate rebuild function. This patch makes
debugrebuilddirstate use dirstate.rebuild.
This was done to allow our extension to become aware of debugrebuilddirstate
--minimal.
Debugging the dirstate helps if you have options to add files for normal lookup
or drop them from the dirstate. This patch adds a convenience command to
test-rebuilddirstate.t to modify the dirstate. It will be used in the next patch
to write proper tests for debugrebuilddirstate --minimal
Before, if you ran hg graft --user ... --date ... --log ... revs
and it failed, it would suggest "hg graft --continue",
but if you did that, your --user / --date / --log options
were lost, because they were not persisted anywhere.
Users may spend a lot of effort writing histedit rules;
getting an abort without being told they can recover their work
is very frustrating.
Avoid that by telling them where to find their work.
It makes far more sense to leave these conflicts unresolved and kick back to
the user than to just assume that the local version should be chosen. There are almost
certainly buggy scripts and applications using Mercurial in the wild that do
merges or rebases non-interactively, and then assume that if the operation
succeeded there's nothing the user needs to pay attention to.
When comparing a file that was removed at the current revision, parents used to
show grandparents instead, due to how fctx was "shifted" from the current
revision to its p1. Let's not do that.
The fix is pretty much copied from webcommands.filediff().
When I moved crecord into core, I didn't include the toggleAmend function (to
switch from commit to amend mode). I did it because it would have made it more
difficult to use record and crecord interchangeably. This patch reintroduces the
amend mode for commit -i as well as two tests to verify the behavior of the
function.
As the author of several 3rd party extensions, I frequently see bug
reports from users attempting to run my extension with an old version
of Mercurial that I no longer support in my extension. Oftentimes, the
extension will import just fine. But as soon as we run extsetup(),
reposetup(), or get into the guts of a wrapped function, we encounter
an exception and abort. Today, Mercurial will print a message about
extensions that don't have a "testedwith" declaring explicit
compatibility with the current version.
The existing mechanism is a good start. But it isn't as robust as I
would like. Specifically, Mercurial assumes compatibility by default.
This means extension authors must perform compatibility checking in
their extsetup() or we wait and see if we encounter an abort at
runtime. And, compatibility checking can involve a lot of code and
lots of error checking. It's a lot of effort for extension authors.
Oftentimes, extension authors know which versions of Mercurial their
extension works on and, more importantly, where it is broken.
This patch introduces a magic "minimumhgversion" attribute in
extensions. When found, the extension loading mechanism will compare
the declared version against the current Mercurial version. If the
extension explicitly states we require a newer Mercurial version, a
warning is printed and the extension isn't loaded beyond importing
the Python module. This causes a graceful failure while alerting
the user of the compatibility issue.
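The version comparison described above can be sketched roughly as follows. This is an illustration only: `parse_version` and `should_load` are hypothetical helpers, not the functions Mercurial uses, and the sketch assumes purely numeric dotted version strings.

```python
import types


def parse_version(text):
    """Turn a dotted version such as '3.6.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def should_load(extension, current_version):
    """Return False when the extension declares a newer minimum version.

    Extensions without the attribute are assumed to be compatible,
    which mirrors the "compatible by default" behavior described above.
    """
    minimum = getattr(extension, "minimumhgversion", None)
    if minimum is None:
        return True
    return parse_version(current_version) >= parse_version(minimum)


# A stand-in for an imported extension module (hypothetical).
ext = types.SimpleNamespace(minimumhgversion="3.6")

for running in ("3.5.2", "3.6", "3.7"):
    if should_load(ext, running):
        print("%s: extension set up normally" % running)
    else:
        print("%s: warning, extension requires Mercurial %s or newer"
              % (running, ext.minimumhgversion))
```

The important property is that the check happens after the module is imported but before any setup hook runs, so a failed check costs a warning rather than an abort.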
I would be receptive to the idea of making the failure more fatal.
However, care would need to be taken to not cripple every hg command.
e.g. the user may use `hg config` to fix the hgrc and if we aborted
trying to run that, the user would effectively be locked out of `hg`!
A potential future improvement to this functionality would be to catch
ImportError for the extension/module and parse the source code for
"minimumhgversion = 'XXX'" and do similar checking. This way we could
give more information about why the extension failed to load.
We have finally laid all the groundwork to make this happen.
The only change/delete conflicts that haven't been moved are .hgsubstate
conflicts. Those are trickier to deal with and well outside the scope of this
series.
We add comprehensive testing not just for the initial selections but also for
re-resolves and all possible dirstate transitions caused by merge tools. That
testing managed to shake out several bugs in the way we were handling dirstate
transitions.
The other test changes are because we now treat change/delete conflicts as
proper merges, and increment the 'merged' counter rather than the 'updated'
counter. I believe this is the right approach here.
For third-party extensions, if they're interacting with filemerge code they
might have to deal with an absentfilectx rather than a regular filectx.
Still to come:
- add a 'leave unresolved' option to merges
- change the default for non-interactive change/delete conflicts to be 'leave
unresolved'
- add debug output to go alongside debug outputs for binary and symlink file
merges
This is so much easier to read than a long string of zeroes, and we're going to
have a lot more of these nodes once change/delete conflicts are part of the
merge state.
We're going to soon compare the output of all the non-orig files before and
after a resolve, and this makes that more convenient. The .orig files are
obviously going to differ between the two.
This is somewhat different from the currently existing 'a' action, for the
following case:
- dirty working copy, with file 'fa' added and 'fm' modified
- hg merge --force with a rev that neither has 'fa' nor 'fm'
- for the change/delete conflicts we pick 'changed' for both 'fa' and 'fm'.
In this case 'branchmerge' is true, but we need to distinguish between 'fa',
which should ultimately be marked added, and 'fm', which should be marked
modified.
Our current strategy is to just not touch the dirstate at all. That works for
now, but won't work once we move change/delete conflicts to the resolve phase.
In that case we may perform repeated re-resolves, some of which might mark the
file removed or remove the file from the dirstate. We'll need to re-add the
file to the dirstate, and we need to be able to figure out whether we mark the
file added or modified. That is what the new 'am' action lets us do.
In upcoming patches we're going to move change/delete conflicts to the resolve
phase -- it will be important to see how regular conflicts interact with
change/delete ones.
The next patch will remove the progress extension completely, so we have
to pick another extension. The schemes extension is picked arbitrarily.
This test was introduced at 57703c45ed60.
If detailed conflict markers are enabled and the closing quote gets truncated,
editors will often screw syntax highlighting up from that point because they'll
see an opening quote and think it's the beginning of a string.
In tests, the hashes change because the commit messages of the shelved bundles
also change.
This is a first (very simple) version of the histedit base action.
It works well in common usecases like rebasing the whole stack and
splitting the stack.
I don't see any obvious edge cases - but probably there is more than one.
That's why I want to keep it behind experimental.histeditng config knob
for now. I think one knob for all new histedit behaviors is better because
we will test all of them together and testers will need to turn it on only
once to get all new nice things.
Before we can add a 'base' action to histedit, we need to change verification
so that action can specify which steps of verification should run for it.
Also it's everything we need for the exec and stop actions implementation.
I thought about baking verification into each histedit action (so each
of them is responsible for verifying its constraints) but it felt wrong
because:
- every action would need to know its context (eg. the list of all other
actions)
- a lot of duplicated work will be added - each action will iterate through
all others
- the steps of the verification would need to be extracted and named anyway
in order to be reused
The verifyrules function grows too big now. I plan to refactor it in one of
the next series.
Windows can't invoke a python script directly, so invoke sh.exe instead.
According to sid0, the output changes are due to the fact that 'f' is no longer
being passed all of the args that it was, but these changes aren't essential to
the test [1].
[1] https://selenic.com/pipermail/mercurial-devel/2015-November/075768.html
Without this, C:\path\to\test is converted into C:pathtotest.
Since $TESTTMP appears in output, seems to work in some places without quotes,
and is also used within a larger quote block (see test-rebase-collapse.t, ~line
160), I'm not sure what a check-code rule would look like (or even if it is
feasible).
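The backslash loss can be reproduced outside the test suite with Python's `shlex` module, which follows POSIX shell word-splitting rules. This is only a sketch of the quoting behavior: the path is the example from above, and using `shlex` as a stand-in for the shell is an assumption of the sketch.

```python
import shlex

path = r"C:\path\to\test"

# Unquoted: each backslash escapes the character after it and is dropped.
unquoted = shlex.split("cd " + path)

# Double-quoted: a backslash before an ordinary character is kept.
quoted = shlex.split('cd "' + path + '"')

print(unquoted[1])
print(quoted[1])
```

The first result has lost every separator, while the second one still matches the original path.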
The variable uniformly uses '\' separators, so the straight equality check with
'/' separating the last component fails. It also doesn't like having the quote
appear in the middle of the string when testing.
Starting with 8102a3981272, the output changed on Windows:
--- e:/Projects/hg/tests/test-context.py.out
+++ e:/Projects/hg/tests/test-context.py.err
@@ -1,4 +1,4 @@
-workingfilectx.date = (1000, 0)
+workingfilectx.date = (1000L, 0)
ASCII : Gr?ezi!
Latin-1 : Grⁿezi!
UTF-8 : Gr├╝ezi!
Since int and long are both 32 bit on Windows, this seems harmless in practice
other than the previous test failure.
The extension was failing to load on Windows because $TESTTMP contains a path
component 'test', prefixed by a path separator '\'. That combination ends up
converted to "...<tab>est...".
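A small sketch of how a backslash followed by 't' can turn into a tab once some layer interprets escape sequences. The path is made up, and the use of Python's `unicode_escape` codec as the interpreting layer is an assumption for illustration; the text above does not say which layer did the conversion in the test.

```python
# A made-up Windows-style path whose components start with 't'.
raw = r"C:\tmp\test"

# Simulate a layer that interprets backslash escapes in the string.
interpreted = raw.encode("ascii").decode("unicode_escape")

print("\t" in raw)          # the raw string has no tab character
print("\t" in interpreted)  # the interpreted string does
print(interpreted.split("\t"))
```

After interpretation the separators are gone and the leading 't' of each component has been consumed, which is why the extension path no longer pointed at a real file.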
The other invocations aren't quoted, and Windows doesn't like the single quotes:
diff --git a/tests/test-ssh.t b/tests/test-ssh.t
--- a/tests/test-ssh.t
+++ b/tests/test-ssh.t
@@ -520,20 +520,8 @@ remote hook failure is attributed to rem
$ echo "pretxnchangegroup.fail = python:$TESTTMP/failhook:hook" >> remote/.hg/hgrc
$ hg -q --config ui.ssh="python '$TESTDIR/dummyssh'" clone ssh://user@dummy/remote hookout
+ abort: no suitable response from remote hg!
+ [255]
$ cd hookout
+ $TESTTMP.sh: line 264: cd: hookout: No such file or directory
$ touch hookfailure
- $ hg -q commit -A -m 'remote hook failure'
....
This works around a bug in older Mercurial versions' handling of the v2 merge
state.
We also add a bunch of tests that make sure that
(1) we correctly abort when the merge state has an unsupported record type
(2) aborting the merge, rebase or histedit continues to work and clears out the
merge state.
The help for the status command calls this file state 'missing', while
filesets call it 'deleted'. To stay consistent, this patch
introduces a missing() predicate, which is in fact an alias for
'deleted'.
This patch avoids unnecessary conflicts to resolve during rebase for the users
of changeset evolution.
This patch modifies rebase to skip obsolete commits with no successor.
It introduces a new rebase state 'revpruned' for these revisions that are
being skipped and a new message to inform the user of what is happening.
This feature is gated behind the config flag experimental.rebaseskipobsolete
When an obsolete commit is skipped, the output is:
note: not rebasing 7:360bbaa7d3ce "O", it has no successor
On my machine, whenever I run all tests with a high -j value, test-convert-git.t
would consistently fail by displaying an estimate. This patch removes that value
from the output.
Before this patch, making a commit on a local repo could move a bookmark and
both operations would not be grouped as one transaction. This patch makes both
operations part of one transaction. This is necessary to switch to the new api
to save bookmarks repo._bookmarks.recordchange if we don't want to change the
current behavior of rollback.
Dirstate change happening after the commit is done is now part of the
transaction mentioned above. This leads to a change in the expected output of
several tests.
The change to test-fncache happens because both locks are now released in the
same finally clause. The lock release is made explicitly buggy in this test.
Previously releasing lock would crash triggering release of wlock that crashes
too. Now lock release crash does not directly result in the release of wlock.
Instead wlock is released at garbage collection time and the error raised at
that time "confuses" python.
We're going to add a separate record type for change/delete conflicts soon. We
need to make sure they get stored with the correct record type so that older
versions of Mercurial correctly abort when they see change/delete records.
We don't appear to print error codes elsewhere. The error codes are
inconsistent between at least Linux and OS X and are more trouble than
they are worth. Humans care about the error string more than the code
anyway.
A glob was also added to pave over differences in error strings between
Linux and OS X.
Very often in my life I'm finding that the only configured merge tool
present on the system is vimdiff[0], and it's currently impossible (as
far as I can tell) short of specifying `ui.merge = `[1] to actually
*disable* a merge tool. This allows vimdiff-haters to put:
[merge-tools]
vimdiff.disable = yes
in their ~/.hgrc and never see vimdiff again. I'm stopping short of
putting this as a commented out entry in the sample new user hgrc
(seen when a user runs `hg config --edit` with no ~/.hgrc) for now,
but I might come back and do that later.
0: vimdiff is at an awkward intersection: it's usually installed by
the vim package which is often installed as a vi substitute, so it's
its mere presence doesn't imply me wanting it, unlike (say) kdiff3.
1: There's a related problem I ran into today where specifying
`ui.merge = :merge` failed because :merge isn't a command, which I
think is a regression. I'll try and figure that out and at least file
a bug.
The revset is not ready for prime time yet. However it is useful to have some
version of it exposed to help candidate users to play with it and provide
feedback on what we should aim at.
We add a small test to make sure the code runs.
Server operators that have enabled clone bundles probably want clients
to use it. This patch introduces a feature that will insert a bundle2
"output" part that advertises the existence of the clone bundles
feature to clients that aren't using it.
The server uses the "cbattempted" argument to "getbundle" to determine
whether a client supports clone bundles and to avoid sending the message
to clients that failed the clone bundle for whatever reason.
If a clone bundle persistently fails to apply, users need a way to
disable it so they have a hope of the clone working. Change the hint for
the abort scenario to advertise the config option to disable clone
bundles.
When a user's repository is in an unfinished unshelve state and they choose to
abort, at a minimum, the repo should be out of that state. We've found
situations where the user could not leave the state unless manually deleting the
state file. This fix ensures that no matter what exception may be raised during
the abort, the shelved state file will be deleted, the user will be out of the
unshelve state and they can get their repository into a workable condition.
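The guarantee described above is essentially a try/finally around the abort logic. A minimal sketch, assuming a hypothetical `abort_unshelve` helper and a temporary file standing in for the shelved state file (neither is the actual shelve code):

```python
import os
import tempfile


def abort_unshelve(state_path, restore):
    """Run the restore step, always removing the state file afterwards."""
    try:
        restore()
    finally:
        # Runs whether or not restore() raised, so the repository can
        # never stay stuck in the unfinished state.
        if os.path.exists(state_path):
            os.unlink(state_path)


def failing_restore():
    raise RuntimeError("simulated failure while restoring")


# Temporary file standing in for the state file.
fd, state_file = tempfile.mkstemp(suffix=".shelvedstate")
os.close(fd)

try:
    abort_unshelve(state_file, failing_restore)
except RuntimeError as exc:
    print("abort reported: %s" % exc)

print("state file still present: %s" % os.path.exists(state_file))
```

The exception still propagates to the caller, so the user sees what went wrong, but the state file is gone either way.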
When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly
colorized files. Upon further investigation, the highlight extension
is first attempting a filename+content based match then falling back to a
purely content-driven detection mode in Pygments. Sounds good in theory.
Unfortunately, Pygments' content-driven detection establishes no minimum
threshold for returning a lexer. Furthermore, the detection code for
a number of languages is very liberal. For example, ActionScript 3 will
return a confidence of 0.3 (out of 1.0) if the first 1k of the file
we pass in matches the regex "\w+\s*:\s*\w"! Python matches on
"import ". It's no coincidence that a number of our extension-less files
were getting highlighted improperly.
This patch adds an option to have the highlighter not fall back to
purely content-based detection when filename+content detection failed.
This can be enabled to render unhighlighted text instead of taking the risk
that unknown file types are highlighted incorrectly. The old behavior is
still the default.
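To see how liberal the quoted heuristics are, the following sketch applies them to an ordinary plain-text line using only the standard `re` module. The patterns are copied from the description above and may not match what a given Pygments release does; `pick_lexer` and its arguments are hypothetical and only illustrate the effect of disabling the content-only fallback.

```python
import re

# Heuristics as quoted above, applied to the first 1k of the content.
AS3_HINT = re.compile(r"\w+\s*:\s*\w")
PYTHON_HINT = re.compile(r"import ")

sample = "Subject: please import the attached notes\n"
head = sample[:1024]

print(bool(AS3_HINT.search(head)))
print(bool(PYTHON_HINT.search(head)))


def pick_lexer(by_filename, by_content, allow_content_fallback):
    """Prefer the filename-based guess; use the content guess only if allowed."""
    if by_filename is not None:
        return by_filename
    if allow_content_fallback:
        return by_content
    return None  # render the file without highlighting


print(pick_lexer(None, "ActionScript 3", allow_content_fallback=True))
print(pick_lexer(None, "ActionScript 3", allow_content_fallback=False))
```

A mail-like header is enough to trigger both heuristics, which is why extension-less files are the ones most likely to be colorized incorrectly.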
After rebasing a set of changes onto a public changeset and having the first one
be skipped, if you try to abort, the operation fails. This fix adds a check that
keeps the target rev out of the dstates list within the abort function. This
list is checked for immutable states before the rest of abort does its thing.
As obsolescence markers can contain unknown nodes and 'allsuccessors' returns
them, we have to protect against that when looking for successors of the rebase
set in the destination.
Tests have been expanded to catch that.
Due to how the line links now reside outside of the source lines, hovering over
line numbers doesn't count as hovering over the appropriate source line. It can
be worked around by using a "+" css selector. However, it's necessary to
reorder the elements and put <a> before <span> (which is actually quite
logical). It works without further css tweaks because <a> is already
absolute-positioned and so the order doesn't matter visually.
In hgweb, some pages have a context of current revision; e.g. changelog and
shortlog show changesets starting from this current revision. However, some
gitweb templates were dropping current revision from some urls _to_ /graph page
and _on_ that page. This patch fixes it.
File/directory case folding collisions cannot be represented on case folding
systems and have to fail.
To detect this and abort early, utilize that for file/directory collisions, a
sorted list of case folded manifest names will have the colliding directory
right after the file.
(This could perhaps be optimized, but this way of doing it also has
directory/directory case folding in mind ... which, however, is not handled yet.)
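The adjacency idea can be sketched as below. Two things are assumptions of this sketch rather than the actual implementation: `str.lower()` stands in for the real case-folding function, and names are sorted by path components so that everything under a folded directory name is guaranteed to follow the file with the same folded name. `find_file_dir_collision` is a hypothetical name.

```python
def find_file_dir_collision(names):
    """Return (file, colliding_path) for the first collision, else None."""
    # Sort by case-folded path components; keep the original name along.
    folded = sorted((name.lower().split("/"), name) for name in names)

    prev_parts, prev_name = None, None
    for parts, name in folded:
        if (
            prev_parts is not None
            and len(parts) > len(prev_parts)
            and parts[: len(prev_parts)] == prev_parts
        ):
            # The previous entry is a file whose folded name is also a
            # directory prefix of this entry.
            return prev_name, name
        prev_parts, prev_name = parts, name
    return None


print(find_file_dir_collision(["README", "readme/notes.txt", "src/x.py"]))
print(find_file_dir_collision(["a", "a.txt", "A/b"]))
print(find_file_dir_collision(["a", "b/c"]))
```

Because the check only looks at neighbours in the sorted list, it needs a single pass after sorting.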
Not all bundles are appropriate for all clients. For example, someone
with a slow Internet connection may want to prefer bz2 bundles over gzip
bundles because they are smaller and don't take as long to transfer.
This is information that a server cannot know on its own. So, we invent
a mechanism for "preferring" server-advertised URLs based on their
attributes.
We could invent a negotiation between client and server where the client
sends its preferences and the sorting/filtering is done server-side.
However, this feels complex. We can avoid complicating the wire protocol
and exposing ourselves to backwards compatibility concerns by performing
the sorting locally.
This patch defines a new config option for expressing preferred
attributes in server-advertised bundles.
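Client-side preference sorting can be sketched as a stable sort keyed on the list of preferred attributes. The entry layout (dicts with `URL` and `COMPRESSION` keys), the example URLs and the `sort_by_preference` helper are assumptions made for illustration; the real comparison logic may treat missing attributes differently.

```python
def sort_by_preference(entries, prefers):
    """Order entries so that matches on earlier preferences come first.

    `prefers` is a list of (attribute, value) pairs, most important first.
    Python's sort is stable, so ties keep the order the server advertised.
    """
    def rank(entry):
        # 0 sorts before 1: an entry matching a preference wins over one
        # that does not, and earlier preferences dominate later ones.
        return tuple(0 if entry.get(key) == value else 1
                     for key, value in prefers)

    return sorted(entries, key=rank)


advertised = [
    {"URL": "https://example.com/full.gzip.hg", "COMPRESSION": "gzip"},
    {"URL": "https://example.com/full.bz2.hg", "COMPRESSION": "bzip2"},
    {"URL": "https://example.com/full.none.hg", "COMPRESSION": "none"},
]

# A client in a fast data center prefers uncompressed bundles.
for entry in sort_by_preference(advertised, [("COMPRESSION", "none")]):
    print(entry["URL"])
```

With an empty preference list the server's order is returned unchanged, so the server still controls the default.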
At Mozilla, we leverage this feature so clients in fast data centers
prefer uncompressed bundles. (We advertise gzip bundles first because
that is a reasonable default.)
I consider this an advanced feature. I'm on the fence as to whether it
should be documented in `hg help config`.
Server Name Indication (SNI) is commonly used in CDNs and other hosted
environments. Unfortunately, Python <2.7.9 does not support SNI and when
these older Python versions attempt to negotiate TLS to an SNI server,
they raise an opaque error like
"_ssl.c:507: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert
handshake failure."
We introduce a manifest attribute to denote the URL requires SNI and
have clients without SNI support filter these entries.
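A rough sketch of the client-side filter. The attribute name `REQUIRESNI`, its `"true"` value, the example URLs and the entry layout are assumptions for illustration, since the text above does not spell them out; `ssl.HAS_SNI` is the standard library flag reporting SNI support.

```python
import ssl


def filter_sni(entries, has_sni=ssl.HAS_SNI):
    """Drop entries that need SNI when the local Python cannot provide it."""
    return [
        entry for entry in entries
        if has_sni or entry.get("REQUIRESNI") != "true"
    ]


advertised = [
    {"URL": "https://cdn.example.com/full.hg", "REQUIRESNI": "true"},
    {"URL": "https://static.example.com/full.hg"},
]

# Simulate an old client without SNI, then a modern one.
print([e["URL"] for e in filter_sni(advertised, has_sni=False)])
print([e["URL"] for e in filter_sni(advertised, has_sni=True)])
```

Filtering up front means an old client never attempts the handshake that would fail with the opaque error quoted above.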
Not all clients are capable of reading every bundle. Currently, content
negotiation to ensure a server sends a client a compatible bundle
format is performed at request time. The response bundle is dynamically
generated at request time, so this works fine.
Clone bundles are statically generated *before* the request. This means
that a modern server could produce bundles that a legacy client isn't
capable of reading. Without some kind of "type hint" in the clone
bundles manifest, a client may attempt to download an incompatible
bundle. Furthermore, a client may not realize a bundle is incompatible
until it has processed part of the bundle (imagine consuming a 1 GB
changegroup bundle2 part only to discover the bundle2 part afterwards is
incompatible). This would waste time and resources. And it isn't very
user friendly.
Clone bundle manifests thus need to advertise the *exact* format of the
hosted bundles so clients may filter out entries that they don't know
how to read. This patch introduces that mechanism.
We introduce the BUNDLESPEC attribute to declare the "bundle
specification" of the entry. Bundle specifications are parsed using
exchange.parsebundlespecification, which uses the same strings as the
"--type" argument to `hg bundle`. The supported bundle specifications
are well defined and backwards compatible.
When a client encounters a BUNDLESPEC that is invalid or unsupported, it
silently ignores the entry.
exchange.readbundle() can return 2 different types. We weren't handling
the bundle2 case. Handle it.
At some point we'll likely want a generic API for applying a bundle from
a file handle. For now, create another one-off until we figure out what
the unified bundle API should look like (addressing this is a can of
worms I don't want to open right now).
The old code was tailored to `hg bundle` usage and not appropriate for
use as a general API, which clone bundles will require. The code has
been rewritten to make it more generally suitable.
We introduce dedicated error types to represent invalid and unsupported
bundle specifications. The reason we need dedicated error types (rather
than error.Abort) is because clone bundles will want to catch these
exceptions as part of filtering entries. We don't want to swallow
error.Abort on principle.
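The value of dedicated error types shows up in the filtering loop: only the two expected failures are caught, and anything else still propagates. The sketch below uses hypothetical exception classes, a deliberately simplified `<compression>-<version>` grammar and made-up sets of known values; it is not the real parser.

```python
class InvalidBundleSpec(Exception):
    """The string does not follow the expected grammar."""


class UnsupportedBundleSpec(Exception):
    """The string is well formed but names something unknown."""


# Made-up sets of values this toy client understands.
KNOWN_COMPRESSIONS = {"none", "gzip", "bzip2"}
KNOWN_VERSIONS = {"v1", "v2"}


def parse_spec(spec):
    compression, sep, version = spec.partition("-")
    if not sep or not compression or not version:
        raise InvalidBundleSpec(spec)
    if compression not in KNOWN_COMPRESSIONS or version not in KNOWN_VERSIONS:
        raise UnsupportedBundleSpec(spec)
    return compression, version


def usable_entries(entries):
    """Keep entries whose spec parses; silently skip the rest."""
    kept = []
    for entry in entries:
        spec = entry.get("BUNDLESPEC")
        if spec is not None:
            try:
                parse_spec(spec)
            except (InvalidBundleSpec, UnsupportedBundleSpec):
                continue  # expected failure: ignore this entry
        kept.append(entry)
    return kept


manifest = [
    {"URL": "a", "BUNDLESPEC": "gzip-v2"},
    {"URL": "b", "BUNDLESPEC": "lzma-v9"},   # unsupported in this sketch
    {"URL": "c", "BUNDLESPEC": "garbage"},   # invalid in this sketch
    {"URL": "d"},                            # no spec advertised
]
print([e["URL"] for e in usable_entries(manifest)])
```

Catching a broad abort-style exception here would also hide unrelated failures, which is the concern raised above.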
It's common for a GUI or web frontend to fetch revisions in chunks of a fixed
batch size. Previously, this was possible only if revisions were sorted by revision
number.
$ hg log -r 'limit({revspec} & :{last_known}, 101)'
So this patch introduces a general way to retrieve a chunk of revisions after
skipping offset revisions.
$ hg log -r 'limit({revspec}, 100, {last_count})'
This is a dumb implementation. We can optimize it for baseset and spanset
later.
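The semantics of the offset argument can be sketched with `itertools.islice`. This is a plain-Python illustration of "skip offset items, then take up to count items"; the `limit` function below is a hypothetical stand-in, not the revset implementation.

```python
from itertools import islice


def limit(revs, count, offset=0):
    """Return up to `count` items after skipping the first `offset` items."""
    return list(islice(revs, offset, offset + count))


# Revisions in an arbitrary (here: descending) order, paged three at a time.
revs = [10, 9, 8, 7, 6, 5, 4]
print(limit(revs, 3))
print(limit(revs, 3, 3))
print(limit(revs, 3, 6))
```

Because the offset counts positions in the iteration order rather than revision numbers, paging works for any ordering of the underlying set.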
This patch delays writing in-memory changes out, if transaction is
running.
'_getfsnow()' is defined as a function, to hook it easily for
ambiguous timestamp tests (see also fakedirstatewritetime.py)
'if tr:' code path in this patch is still disabled at this revision,
because there is no client invoking 'dirstate.write()' with repo
object.
BTW, this patch changes 'dirstate.invalidate()' semantics around
'dirstate.write()' in a transaction scope:
before:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write()       # change for A is written out here
        dirstate.CHANGE('B')
        dirstate.invalidate()  # discards only change for B
after:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write()       # change for A is still kept in memory
        dirstate.CHANGE('B')
        dirstate.invalidate()  # discards changes for A and B
Fortunately, there is no code path expecting the former, at least, in
Mercurial itself, because 'dirstateguard' was introduced to remove
such 'dirstate.invalidate()'.
'localrepository.rollback()' explicitly restores dirstate, only if at
least one of current parents of the working directory is removed at
rollbacking (a.k.a "parent-gone").
After DirstateTransactionPlan, 'dirstate.write()' will cause marking
'.hg/dirstate' as a file to be restored at rollbacking.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
Then, 'transaction.rollback()' restores '.hg/dirstate' regardless of
parents of the working directory at that time, and this causes
unexpected dirstate changes if not "parent-gone" (e.g. "hg update" to
another branch after "hg commit" or so, then "hg rollback").
To avoid such situation, this patch restores dirstate to one before
rollbacking if not "parent-gone".
before:
b1. restore dirstate explicitly, if "parent-gone"
after:
a1. save dirstate before actual rollbacking via dirstateguard
a2. restore dirstate via 'transaction.rollback()'
a3. if "parent-gone"
- discard backup (a1)
- restore dirstate from 'undo.dirstate'
a4. otherwise, restore dirstate from backup (a1)
Even though restoring dirstate at (a3) after (a2) seems redundant,
this patch keeps this existing code path, because:
- it isn't ensured that 'dirstate.write()' was invoked at least once
while transaction running
If not, '.hg/dirstate' isn't restored at (a2).
In addition to it, rude 3rd party extension invoking
'dirstate.write()' without 'repo' while transaction running (see
subsequent patches for detail) may break consistency of a file
backup-ed by transaction.
- this patch mainly focuses on changes for DirstateTransactionPlan
Restoring dirstate at (a3) itself should be much cheaper than
rollbacking itself. Redundancy will be removed in next step.
Newly added test is almost meaningless at this point. It will be used
to detect regression while implementing delayed dirstate write out.
On recent OS, 'stat.st_mtime' has a double precision floating point
value to represent nano seconds, but it is not wide enough for actual
file timestamp: nowadays, only 52 - 32 = 20 bit width is available for
decimal places in sec.
Therefore, casting it to 'int' may cause unexpected result. See also
changeset 8102a3981272 fixing issue4836 for detail.
For example, changed file A may be treated as "clean" unexpectedly in
steps below. "rounded now" is the value gotten by rounding via
'int(st.st_mtime)' or so.
---------------------+--------------------+------------------------
"now"                |                    | timestamp of A (time_t)
float  rounded time_t| action             | FS       dirstate
------ ------- ------+--------------------+-------- ---------------
N+.nnn N       N     |                    | ---      ---
                     | update file A      | N
                     | dirstate.normal(A) |          N
N+.999 N+1     N     |                    |
                     | dirstate.write()   |          N (*1)
                     |     :              |
                     | change file A      | N
                     |     :              |
N+1.00 N+1     N+1   |                    |
                     | "hg status" (*2)   | N        N
------ ------- ------+--------------------+-------- ---------------
Timestamp N of A in dirstate isn't dropped at (*1), because "rounded
now" is N+1 at that time, even if 'st_mtime' in 'time_t' is still N.
Then, file A is unexpectedly treated as "clean" at (*2) in this case.
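The rounding effect can be reproduced with plain arithmetic. In this sketch the timestamp values are made up; the point is that a double cannot hold nanosecond precision at this magnitude, so converting to float and then to int can yield N+1 while the integer seconds value is still N.

```python
seconds = 1400000000   # the time_t part of the timestamp (N), made up
nanos = 999999999      # sub-second part, just below the next second

# What a floating point mtime looks like: too few mantissa bits are left
# for the fraction, so the value rounds up to the next whole second.
as_float = seconds + nanos / 1e9
rounded_now = int(as_float)

# Integer-only handling keeps the true number of whole seconds.
time_t_now = (seconds * 10**9 + nanos) // 10**9

# Masking into the signed 32-bit range, as done on the caller side.
masked_now = time_t_now & 0x7fffffff

print(rounded_now - seconds)
print(time_t_now - seconds)
print(masked_now == time_t_now)
```

The one second of disagreement between the two values is exactly the window in which a changed file can be treated as clean in the table above.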
For consistent handling of 'stat.st_mtime', this patch makes
'pack_dirstate()' take 'now' argument not in floating point but in
integer.
This patch makes 'PyArg_ParseTuple()' in 'pack_dirstate()' use format
'i' (= checking type mismatch or overflow), even though it is ensured
that 'now' is in the range of 32bit signed integer by masking with
'_rangemask' (= 0x7fffffff) on caller side.
It should be much cheaper than packing itself, and useful to
detect that legacy code invokes 'pack_dirstate()' with 'now' in
floating point value.
Before, when merging revisions with missing largefiles, the missing largefiles
would be fetched as a part of the merge. If that failed (for example because
the main repository temporarily was unavailable), the largefile would be left
missing. However, the next commit would abort and (seemed to) fail when
markcommitted tried to mark the standin file as normal and thus had to hash the
largefile that didn't exist. (Actually, the commit would succeed but the
largefile update that follows right after the commit transaction would abort -
quite confusing.)
To fix that, make sure that synclfdirstate only marks files as normal if they
actually exist.
Advertising that the patches are available to be pulled requires that to be true.
So we check revision availability on the remote before sending any email.
Cloning can be an expensive operation for servers because the server
generates a bundle from existing repository data at request time. For
a large repository like mozilla-central, this consumes 4+ minutes
of CPU time on the server. It also results in significant network
utilization. Multiplied by hundreds or even thousands of clients and
the ensuing load can result in difficulties scaling the Mercurial server.
Despite generation of bundles being deterministic until the next
changeset is added, the generation of bundles to service a clone request
is not cached. Each clone thus performs redundant work. This is
wasteful.
This patch introduces the "clonebundles" extension and related
client-side functionality to help alleviate this deficiency. The
client-side feature is behind an experimental flag and is not enabled by
default.
It works as follows:
1) Server operator generates a bundle and makes it available on a
server (likely HTTP).
2) Server operator defines the URL of a bundle file in a
.hg/clonebundles.manifest file.
3) Client `hg clone`ing sees the server is advertising bundle URLs.
4) Client fetches and applies the advertised bundle.
5) Client performs equivalent of `hg pull` to fetch changes made since
the bundle was created.
Essentially, the server performs the expensive work of generating a
bundle once and all subsequent clones fetch a static file from
somewhere. Scaling static file serving is a much more manageable
problem than scaling a Python application like Mercurial. Assuming your
repository grows less than 1% per day, the end result is 99+% of CPU
and network load from clones is eliminated, allowing Mercurial servers
to scale more easily. Serving static files also means data can be
transferred to clients as fast as they can consume it, rather than as
fast as servers can generate it. This makes clones faster.
Mozilla has implemented functionality similar to this patch on
hg.mozilla.org using a custom extension. We are hosting bundle files in
Amazon S3 and CloudFront (a CDN) and have successfully offloaded
>1 TB/day in data transfer from hg.mozilla.org, freeing up significant
bandwidth and CPU resources. The positive impact has been stellar and
I believe it has proved its value to be included in Mercurial core. I
feel it is important for the client-side support to be enabled in core
by default because it means that clients will get faster, more reliable
clones and will enable server operators to reduce load without
requiring any client-side configuration changes (assuming clients are
up to date, of course).
The scope of this feature is narrowly and specifically tailored to
cloning, despite "serve pulls from pre-generated bundles" being a valid
and useful feature. I would eventually like for Mercurial servers to
support transferring *all* repository data via statically hosted files.
You could imagine a server that siphons all pushed data to bundle files
and instructs clients to apply a stream of bundles to reconstruct all
repository data. This feature, while useful and powerful, is
significantly more work to implement because it requires the server
component have awareness of discovery and a mapping of which changesets
are in which files. Full clone bundles, by contrast, are much simpler.
The wire protocol command is named "clonebundles" instead of something
more generic like "staticbundles" to leave the door open for a new, more
powerful and more generic server-side component with minimal backwards
compatibility implications. The name "bundleclone" is used by Mozilla's
extension and would cause problems since there are subtle differences
in Mozilla's extension.
Mozilla's experience with this idea has taught us that some form of
"content negotiation" is required. Not all clients will support all
bundle formats or even URLs (advanced TLS requirements, etc). To ensure
the highest uptake possible, a server needs to advertise multiple
versions of bundles and clients need to be able to choose the most
appropriate one from that list. The "attributes" in each
server-advertised entry facilitate this filtering and sorting. Their
use will become apparent in subsequent patches.
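As a rough picture of what a client has to work with, the sketch below parses manifest text into one dictionary per entry. The line format (a URL followed by whitespace-separated `KEY=VALUE` attributes), the attribute names and the URLs are assumptions made for illustration, and `parse_manifest` is a hypothetical helper.

```python
def parse_manifest(text):
    """Parse manifest text into a list of dicts, one per advertised entry."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = {"URL": fields[0]}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            entry[key] = value
        entries.append(entry)
    return entries


manifest_text = """\
https://cdn.example.com/full.gzip.hg COMPRESSION=gzip
https://cdn.example.com/full.bz2.hg COMPRESSION=bzip2

https://static.example.com/full.hg
"""

for entry in parse_manifest(manifest_text):
    print(entry)
```

Keeping the attributes as an open-ended mapping lets old clients ignore keys they do not understand while newer clients filter and sort on them.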
Initial inspiration and credit for the idea of cloning from static files
belongs to Augie Fackler and his "lookaside clone" extension proof of
concept.
We perform all that we can non-interactively before prompting the user for input
via their merge tool. This allows for a maximally consistent state when the user
is first prompted.
The test output changes indicate the actual behavior change happening.
The current output for a failed merge with conflict markers looks something like:
merging foo
warning: conflicts during merge.
merging foo incomplete! (edit conflicts, then use 'hg resolve --mark')
merging bar
warning: conflicts during merge.
merging bar incomplete! (edit conflicts, then use 'hg resolve --mark')
We're going to change the way merges are done to perform all premerges before
all merges, so that the output above would look like:
merging foo
merging bar
warning: conflicts during merge.
merging foo incomplete! (edit conflicts, then use 'hg resolve --mark')
warning: conflicts during merge.
merging bar incomplete! (edit conflicts, then use 'hg resolve --mark')
The 'warning: conflicts during merge' line has no context, so is pretty
confusing.
This patch will change the future output to:
merging foo
merging bar
warning: conflicts while merging foo! (edit, then use 'hg resolve --mark')
warning: conflicts while merging bar! (edit, then use 'hg resolve --mark')
The hint on how to resolve the conflicts makes this a bit unwieldy, but solving
that is tricky because we already hint that people run 'hg resolve' to retry
unresolved merges. The 'hg resolve --mark' mostly applies to conflict marker
based resolution.
This means that in ms.resolve we must call merge after calling premerge. This
doesn't yet mean that all premerges happen before any merges -- however, this
does get us closer to our goal.
The output differences are because we recompute the merge tool. The only
user-visible difference caused by this patch is that if the tool is missing
we'll print the warning twice. Not a huge deal, though.
af5de4d23fd4 introduced nice hexified display of missing nodes. It did however
also make missing 20 character revision specifications be shown as hex - very
confusing.
Users are often wrong and somehow specify revisions that don't exist. Nodes
will however rarely be missing ... and they will only look like a user provided
revision specification and be all ascii in 1 of 4*10**9.
With this change, missing revisions will only be hexified if they really look
like binary nodes. This change will thus improve the error reporting UI in the
common case and only very rarely make it confusing in the opposite direction of
how it was before.
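A rough sketch of the idea, under the assumption that "looks like a binary node" means "is 20 bytes long and contains at least one byte outside printable ASCII". The exact test used by the patch is not shown here, and `display_missing` is a hypothetical helper name.

```python
import binascii


def display_missing(rev):
    """Return a printable form of a missing revision identifier (bytes).

    Assumption: a value is treated as a binary node only if it is 20 bytes
    long and has at least one byte outside the printable ASCII range.
    """
    looks_binary = len(rev) == 20 and any(b < 0x20 or b > 0x7E for b in rev)
    if looks_binary:
        return binascii.hexlify(rev).decode("ascii")
    # A user-provided specification is shown as typed.
    return rev.decode("ascii", "replace")
```

With this check, a 20 character user-provided string such as `b"12345678901234567890"` is shown unchanged, while `b"\x00" * 20` is shown as 40 hex digits.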
The current setup requires passing both a packer and, optionally, the version
of the unpacker. This is confusing and error prone, as the two values must
not mismatch. Instead, we simply grab the version from the packer. This fixes a bug
where requesting a cg2 from 'hg bundle' was reported as changegroup 1.
I should have caught that in the initial changeset but I missed it somehow.
cvsps computes the parent revisions of log entries by walking the cvs log
sorted by (rcs, revision) and by iteratively maintaining a 'versions'
dictionary which maps a (rcs, branch) pair onto the last revision seen for that
pair. When log caching is on and a log cache exists, cvsps fails to set the
parent revisions of new log entries because it does not iterate over the log
cache in the parents computation. A complication is that a file rcs can change
(move to/from the attic), with respect to its value in the log cache, if the
file is removed/added back. This patch adds an iteration over the log cache to
update the rcs of cached log entries, if changed, and to properly populate the
'versions' dictionary.
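A simplified sketch of the parent computation with the cache taken into account. The entry layout (dicts with 'rcs', 'branch', 'revision' keys) and the `assign_parents` name are assumptions for illustration, and the rcs update for files moved to or from the attic is left out.

```python
def assign_parents(cached_entries, new_entries):
    """Set e['parent'] on each new entry, seeding state from the log cache."""
    versions = {}
    # Walk the cached entries first so that 'versions' already knows the
    # last revision seen for every (rcs, branch) pair.
    for e in sorted(cached_entries, key=lambda e: (e["rcs"], e["revision"])):
        versions[(e["rcs"], e["branch"])] = e["revision"]
    # Then walk the new entries in (rcs, revision) order.
    for e in sorted(new_entries, key=lambda e: (e["rcs"], e["revision"])):
        key = (e["rcs"], e["branch"])
        e["parent"] = versions.get(key)  # None if nothing was seen before
        versions[key] = e["revision"]
    return new_entries


cached = [{"rcs": "foo,v", "branch": None, "revision": (1, 1)}]
new = [
    {"rcs": "foo,v", "branch": None, "revision": (1, 3)},
    {"rcs": "foo,v", "branch": None, "revision": (1, 2)},
]
assign_parents(cached, new)
```

Without the first loop, the entry for revision (1, 2) would get no parent, which is the failure described above.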
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
When a user aborts a histedit, many things could go wrong. At a minimum, after
a failed histedit abort, their repository should be out of the histedit state. We've
found situations where the user could not exit the histedit state without
manually deleting the histedit state file. This patch ensures that if any
exception happens during an abort, the histedit statefile will be deleted so
that users are out of the histedit state and can at least manually get the repo
back to a workable condition.
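The guarantee can be sketched with a try/finally. This is a toy model: `abort_histedit`, `restore`, and the state file path are hypothetical names, not the histedit extension's real code.

```python
import os
import tempfile


def abort_histedit(state_path, restore):
    """Run restore(); always remove the state file, even if it fails."""
    try:
        restore()
    finally:
        if os.path.exists(state_path):
            os.unlink(state_path)


def failing_restore():
    raise RuntimeError("simulated failure while restoring the repo")


# Demonstration with a throwaway file standing in for the state file.
fd, state_path = tempfile.mkstemp(prefix="histedit-state-")
os.close(fd)
try:
    abort_histedit(state_path, failing_restore)
except RuntimeError:
    pass  # the error still propagates, but the state file is gone
state_removed = not os.path.exists(state_path)
```

The exception is not swallowed, so the user still sees what went wrong, but the repository no longer claims to be in the middle of a histedit.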
In the external pushrebase extension, it is valuable to be able to do some work
without taking the lock (like running expensive hooks). This enables
significantly higher commit throughput.
This patch adds an option to lazily acquire the lock. It means that, in this
mode, all bundle2 part handlers that need to write to the repo must first call
op.gettransaction().
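A toy model of the lazy behaviour, assuming a hypothetical `LazyOperation` class and a `begin_transaction` callable that stands in for taking the lock and opening the transaction (this is not the real bundle2 operation object):

```python
class LazyOperation:
    """Defer lock and transaction setup until a handler asks for it."""

    def __init__(self, begin_transaction):
        self._begin = begin_transaction
        self._tr = None

    def gettransaction(self):
        # The expensive setup happens on first use only.
        if self._tr is None:
            self._tr = self._begin()
        return self._tr


started = []


def begin_transaction():
    started.append("lock+transaction")
    return object()


op = LazyOperation(begin_transaction)
before = len(started)  # nothing acquired while only read-only work runs
tr1 = op.gettransaction()
tr2 = op.gettransaction()
```

Handlers that only read never trigger the acquisition, and repeated calls hand back the same transaction.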
patchbomb relies on the 'hg bundle' command to generate an attached bundle using
--bundle. However, while 'hg bundle' has a --type option, patchbomb did not.
This is becoming very relevant since we are about to issue bundle2 for
general-delta repositories.
This was tracked as issue4863.
As we have a way for extensions to add more headers, we need a way for them to
actually process them. We add a basic hook point to do extra work after the
import has been committed.
As we have a way for extensions to add more headers, we need a way for them to
actually process them. We add a basic hook point to alter the changeset
(especially extra) before we commit. There would be more to do for
full-featured hooking, but this currently fits my needs.
This config option allows specifying a public location where your changeset can
be found. It then includes a dedicated patch header showing a command to be used
to retrieve the change. See the test for an example.
This is flagged as experimental because this feature is not safe until we have
more logic to test that:
- changeset actually exists on destination
- changeset is draft on destination.
As all this is experimental, bike shedding can happily happen before we remove
the experimental flag.
Incoming was using bundle1 in all cases. As bundle1 is restricted to
changegroup1 and does not support general delta, this can lead to significant
CPU overhead if the server is using general delta storage. We now properly
request a bundle2 and store it to disk.
If the server includes any output or error in the bundle, they will be stored on
disk and replayed when the bundle is read. As 'hg incoming' is going to read the
bundle right away, we call that 'good' enough and go back to the bigger plan of
having general delta on by default.
This was tracked as issue4864.
When users configure the default foreground or background color to
non-default (black on white) values, several hgweb styles lack
contrast for headers and table row items. This patch fixes that by
ensuring that where either foreground or background colors are
specified, both are specified.
We had some basic, undocumented support for uncompressed bundle2. We now
have an official, extensible syntax to specify both format type and compression
(e.g. bzip2-v2).
In practice, this changeset introduces the 'v1' and 'v2' identifiers to make it
possible to combine format and compression. The default format is still 'v1'.
We'll take care of picking 'v1' or 'v2' with regard to general delta in the next
changesets.
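To illustrate the combined syntax, here is a small parser sketch. The `parse_spec` name, the accepted compression names, and the default compression are assumptions made for this example; only the 'v1'/'v2' identifiers, the 'bzip2-v2' form, and the 'v1' default come from the description above.

```python
COMPRESSIONS = {"none", "gzip", "bzip2"}  # assumed set of names
VERSIONS = {"v1", "v2"}


def parse_spec(spec, default_compression="bzip2", default_version="v1"):
    """Split '<compression>-<version>' into its two parts.

    A bare compression or a bare version is completed with the default
    for the missing half.
    """
    if "-" in spec:
        compression, version = spec.split("-", 1)
    elif spec in VERSIONS:
        compression, version = default_compression, spec
    else:
        compression, version = spec, default_version
    if compression not in COMPRESSIONS:
        raise ValueError("unknown compression: %s" % compression)
    if version not in VERSIONS:
        raise ValueError("unknown format version: %s" % version)
    return compression, version
```

Under these assumptions, `parse_spec("bzip2-v2")` yields `("bzip2", "v2")` and a bare `"gzip"` falls back to the 'v1' format.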
Before this patch, 'bmstore.write()' always writes in-memory bookmark
changes into '.hg/bookmarks' regardless of transaction activity.
If 'bmstore.write()' is invoked inside a transaction and it writes
changes into '.hg/bookmarks', then:
- original bookmarks aren't restored at failure of that transaction
  This breaks the "all or nothing" policy of the transaction.
  BTW, "hg rollback" can restore bookmarks successfully even before
  this patch, because original bookmarks are saved into
  '.hg/journal.bookmarks' at the beginning of the transaction, and
  it (actually renamed as '.hg/undo.bookmarks') is used by "hg
  rollback".
- uncommitted bookmark changes are visible to other processes
  This is a kind of "dirty read".
For example, 'rebase.rebase()' implies 'bmstore.write()', and it may
be executed inside the transaction of "hg unshelve". Then, intentional
aborting at the end of "hg unshelve" transaction doesn't restore
original bookmarks (this is obviously a bug).
This patch uses 'bmstore.recordchange()' instead of actually writing via
'bmstore._writerepo()' if any transaction is active.
This patch also removes the meaningless explicit restoring of bmstore at
the end of "hg shelve".
This patch doesn't fix each 'bmstore.write()' caller as shown
below, because writing similar code here and there is very
redundant.
before:
    bmstore.write()
after:
    tr = repo.currenttransaction()
    if tr:
        bmstore.recordchange(tr)
    else:
        bmstore.write()
Even though 'bmstore.write()' itself may eventually have to be discarded
once bookmark operations are put into transaction scope, this patch fixes
it first in order to implement "transactional dirstate".
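The approach of folding the transaction check into 'write()' itself can be modelled as follows. This is a toy sketch: the `BookmarkStore` and `Transaction` classes are hypothetical, and the transaction is passed in explicitly here rather than looked up from the repository.

```python
class Transaction:
    """Toy transaction: callbacks run on close, are dropped on abort."""

    def __init__(self):
        self.pending = []

    def close(self):
        for callback in self.pending:
            callback()
        self.pending = []

    def abort(self):
        self.pending = []


class BookmarkStore:
    def __init__(self):
        self.marks = {}    # in-memory bookmarks
        self.on_disk = {}  # stands in for '.hg/bookmarks'

    def write(self, tr=None):
        # Inside a transaction, only record the change; write otherwise.
        if tr is not None:
            self.recordchange(tr)
        else:
            self._writerepo()

    def recordchange(self, tr):
        tr.pending.append(self._writerepo)

    def _writerepo(self):
        self.on_disk = dict(self.marks)


store = BookmarkStore()
store.marks["feature"] = "abc123"

tr = Transaction()
store.write(tr)
tr.abort()
after_abort = dict(store.on_disk)  # nothing leaked to "disk"

tr = Transaction()
store.write(tr)
tr.close()
after_close = dict(store.on_disk)
```

Callers keep calling 'write()' unchanged, and an aborted transaction leaves the on-disk bookmarks untouched.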