Before this patch, there is no easy way to detect if there are extensions
failed to load. While this is okay for most situations, chgserver is designed
to preload all extensions specified in config and does need the information.
This patch adds extensions.notloaded() to return names of extensions failed
to load.
Having exception classes more precise than 'Abort' will allow us to properly
catch "nothing to rebase" situations when we will be using 'destmerge' in
rebase.
'hg merge' refuses to pick a default destination if the working copy is not on
a head. This is a very sensible default for 'hg merge' but 'hg rebase' should
work in this situation. So we introduce a way to disable this check. It will
soon be used by rebase.
We can now specify from where the merge is performed. The experimental revset
is updated to take revisions as argument, allowing to test the feature.
This will become very useful for pick the 'rebase' default destination. For this
reason, we also exclude all descendants from the rebased set from the candidate
destinations. This descendants exclusion was not necessary for merge as default
destination would not be picked from anything else than a head.
I'm not super excited with the current error messages, but I would prefer to
delay an overall messages rework once 'hg rebase' is done getting a default
destination aligned with 'hg merge'.
While 'hg merge' will refuse to pick a default destination if the working copy
is not on a head, this will be a common and valid case for rebase. In this
case, we will need to exclude from the candidate destination all descendants
from the rebased set.
We make a step forward in that direction by removing parents of the working
copy from the candidate destinations instead of manually filtering the working
copy parent at the end of the process. This will make the extra step of
filtering descendant much simpler in a future changeset.
The manifest.manifest class has a _treeinmem member than one can
manually set to True to test that the treemanifest class works as a
drop-in replacement for manifestdict (which is mostly a requirement
for treemanifest repos to work). However, it doesn't quite work at the
moment. These tests fail:
test-largefiles-misc.t
test-rebase-newancestor.t
test-subrepo.t
test-subrepo-deep-nested-change.t
test-subrepo-recursion.t
All but test-rebase-newancestor.t fail because they trigger calls to
subdirmatcher.visitdir(), which tries to access a _excluderoots field
that does not exist on the subdirmatcher. Let's fix that by overriding
visitdir() in a similar way to how matchfn is overridden, i.e. by
prepending the directory before calling the superclass method.
Add test coverage for graft --continue without starting.
Suggest committing (or whatever the current activity is), via
wrongtooltocontinue which uses howtocontinue.
checkafterresolved allows Mercurial to suggest what command to
use next. If users try to continue the wrong command, there
wasn't a good way for the command to suggest what to do next.
Split checkmdutil into howtocontinue and checkafterresolved.
Introduce wrongtooltocontinue which handles raising an Abort with
the hint from howtocontinue.
Before this patch, 'cachedlocalrepo' always caches "visible" repoview
object, because 'cachedlocalrepo' uses "visible" repoview returned by
'hg.repository()' without any additional processing.
If the client of 'cachedlocalrepo' wants "served" repoview, some
objects to be cached are discarded unintentionally.
1. 'cachedlocalrepo' newly caches "visible" repoview object
(call it VIEW1)
2. 'cachedlocalrepo' returns VIEW1 to the client of it at 'fetch()'
3. the client gets "served" repoview object by 'filtered("served")'
on VIEW1 (call this "served" repoview VIEW2)
4. accessing to 'repo.changelog' implies:
- instantiation of changelog via 'localrepository.changelog'
- instantiation of "filtered changelog" via 'repoview.changelog'
5. "filtered changelog" above is cached in VIEW2
6. VIEW2 is discarded after processing, because there is no
reference to it
7. 'cachedlocalrepo' returns VIEW1 cached at (1) above to the
client at next 'fetch()'
8. 'filtered("served")' on VIEW1 at the client side creates new
"served" repoview again, because VIEW1 is "visible"
(call this new "served" repoview VIEW3)
9. accessing to 'repo.changelog' implies instantiation of filtered
changelog again, because "filtered changelog" is cached in
VIEW2 at (5), but not in VIEW3 currently used
10. (go to (7) above)
As described above, "served" repoview object and "filtered changelog"
cached in it are discarded always, even if the repository itself
hasn't been changed since last access.
For example, in the case of 'hgweb_mod.hgweb', "newly caching" occurs,
when:
- all cached objects are already assigned to another threads
(in this case, repoview is created in 'cachedlocalrepo.copy()')
- or, stat of '00changelog.i' is changed from last access
(in this case, repoview is created in 'cachedlocalrepo.fetch()')
once changes are pushed via HTTP, this always occurs.
The root cause of this inefficiency is that 'cachedlocalrepo' always
caches "visible" repoview object, even if the client of it wants
another view.
To make 'cachedlocalrepo' cache appropriate repoview object, this
patch adds additional filtering on the repo object returned by
'hg.repository()'. It is assumed that initial repoview object should
be already filtered by expected view.
After this patch:
- 'filtered("served")' on VIEW1 at (3)/(7) above returns VIEW1
itself, because VIEW1 is now "served", and
- VIEW2 and VIEW3 equal VIEW1
- therefore, "filtered changelog" is cached in VIEW1, and reused
intentionally
_filerev depends on the filelog implementation using revlogs and linkrevs.
Alternative implementations, like remotefilelog, do not have rev numbers, so
this call fails. Replacing it with _filenode means it doesn't rely on rev
numbers, and doesn't cost anything extra, since _filerev is using _filenode
under the hood anyway.
The "manifest" label that's used in error messages will instead be the
directory path for subdirectory manifests (not the root manifest), so
let's extract the constant to a variable already to make future
patches simpler.
When a changeset refers to a manifest revision that's not found in the
manifest log, we say "changeset refers to missing revision X", but
when a manifest refers to file revision that's not found in the
filelog, we say "X in manifests not found". The language used for
missing manifest revisions seems clearer, so let's use that for
missing filelog revisions too.
We include the "manifest" prefix on most other errors, so it seems
consistent to add them to the remaining messages too. Also, having the
"manifest" prefix will be more consistent with having the directory
prefix there when we add support for treemanifests. With the
"manifest" at the beginning, let's remove the now-redundant
"manifest" in the message itself.
In 1f64dad11884 (verify: filter messages about missing null manifests
(issue2900), 2011-07-13), we started ignoring nullid in the list of
manifest nodeids to check. Then, in 954b09a82a61 (verify: do not choke
on valid changelog without manifest, 2012-08-21), we stopped adding
nullid to the list to start with. So let's drop the left-over check
now.
Reasons:
* _crosscheckfiles(), as the name suggests, is about checking that
the set of files files mentioned in changesets match the set of
files mentioned in the manifests.
* The "checking" in _crosscheckfiles() looked rather strange, as it
just emitted an error for *every* entry in mflinkrevs. The reason
was that these were the entries remaining after the call to
_verifymanifest(). Moving all the processing of mflinkrevs into
_verifymanifest() makes it much clearer that it's the remaining
entries that are a problem.
Functional change: progress is no longer reported for "crosschecking"
of missing manifest entries. Since the crosschecking phase takes a
tiny fraction of the verification, I don't think this is a
problem. Also, any reports of "changeset refers to unknown manifest"
will now come before "crosschecking files in changesets and
manifests".
We had some real-world cases where syntax errors in Python hooks would crash
the whole process and leave it in an indeterminate state. Handle those better.
We refuse to pick a destination for a bare 'hg merge' if the working copy is not
at head. This is meant to prevent strange merge from user who forget to update.
(Moreover, such merge does not reduce actually the number of heads)
However, we were doing that as the last possible failure type. So user were
recommended to merge with an explicit head (from this bad location) if the
branch had too many heads.
We now make "not on branch heads" class of failure the first things to check
and fail on. The one test that change was actually trying to check for these
failure (and did not). The new test output is correct.
We plan to be able to reuse this function for rebase. The error
message explicitly refers to "merge" in multiple places. So we'll need
to be able to use different messages. The first step of that is to
extract all messages in a dedicated dictionary and use them
indirectly.
As a side effect it clarifies the actual function and opens the way to
various cleanups and fixes in future changesets.
Previously we would lstat the file to see if it was a file or a link before
attempting to process it. If the file happened to exist across a symlink, and if
that symlink was pointing to a network file system, that check could be very
expensive.
The new logic audit's the path to avoid symlinks before performing the lstat on
the file itself.
In our situation, this shaved 10 minutes off of certain hg updates.
300 files * (2 seconds - the network filesystem lookup time)
Previously, when we verified the parts of a path in the auditor, we would
validate the deepest directory first, then it's parent, and so on up to the
root. If there happened to be a symlink in the chain, that meant our first check
would likely traverse that symlink. In some cases that symlink might point to
a network filesystem that is expensive, and therefore this simple check could be
very slow.
The fix is to check the path parts starting at the root and working our way
down.
This has a minor performance difference in that we used to be able to short
circuit from the audit if we reached a directory that had already been checked.
Now we can't, but the cost is N dictionary look ups, where N is the number of
parts in the path, which should be fairly minor.
Currently, whitespace in command line --config options are considered
significant while whitespace in config files are not considered
significant. This diff strips the leading and trailing whitespace from
command line config options.
This matches 'hook failed' warnings.
We're also going to add hints to some of the hook load errors. Without this
change we'd have two pairs of parens for a single error message, which looks
really cluttered.
For example, ":hg:`help config.default-push`" creates an invalid link
to "hgrc.5.html#default-push" in HTML, but ":hg:`help
config.paths.default-push`" creates a valid link to
"hgrc.5.html#paths".
Previously, two changes that were nearly, but not quite, identical would result
in large merge conflict regions that looked very similar, and were thus very
confusing to users, and lead people used to other source control systems to
claim that "mercurial's merge algorithms suck". In the relatively common case
of a new file being introduced in two branches with very slight modifications,
the old behavior would show the entire file as a conflict, and it would be very
difficult for a user to determine what was going on.
In the past, mercurial attempted to solve this with a "very smart" algorithm
that would find all common lines, but this has significant problems as
described in 3d22d20aa950.
Instead, we use a "very dumb" algorithm introduced in the previous patch that
simply matches lines at the periphery of conflict regions. This minimizes most
conflict regions well, though there may still be some degenerate edge cases,
like small modification to the beginning and end of a large file.
This is necessary for hgweb to embed JSON data in HTML. JSON data must be
able to be embedded in non-UTF-8 HTML page so long as the page encoding is
compatible with ASCII.
According to RFC 7159, non-BMP character is represented as UTF-16 surrogate
pair. This function first splits an input string into an array of UTF-16
code points.
https://tools.ietf.org/html/rfc7159.html#section-7
This makes jsonescape() a thread-safe function, which is necessary for hgweb.
The initialization stuff isn't that slow:
$ python -m timeit -n1000 -s 'from mercurial import encoding as x' 'reload(x)'
original: 1000 loops, best of 3: 158 usec per loop
this patch: 1000 loops, best of 3: 214 usec per loop
compared to loading the commands module:
$ python -m timeit -n1000 -s 'from mercurial import commands as x' 'reload(x)'
1000 loops, best of 3: 1.11 msec per loop
This is slightly faster and convenient to implement a paranoid escaping.
$ python -m timeit \
-s 'from mercurial import encoding; data = str(bytearray(xrange(128)))' \
'encoding.jsonescape(data)'
original: 100000 loops, best of 3: 15.1 usec per loop
this patch: 100000 loops, best of 3: 13.7 usec per loop
Bare 'hg update' now brings you to the tipmost descendant (on the same branch).
Leaving the user on the same topological branch. The previous behavior, updating
to the tipmost changeset on the same branch could lead to jump from a
topological branch to another. This was confusing and impractical. As the only
conceivable reason for the old behavior have been address by the recently
introduce message about other heads, we can "safely" change this behavior
All test changes have been reviewed and seen a valid consequences.
Not all external extensions provide docs; if you use such an extension, you
will experience a crash if you use "hg help --keyword <word>", and <word>
happens to match the extension name.
With an ignore pattern containing a '/' and a Windows style path containing '\',
status was properly ignoring the file, but debugignore was stating that it
wasn't ignored.
When I taught debugrebuildfncache about dirlogs in ebe9dacc63ba
(treemanifests: fix streaming clone, 2016-02-04), I added a
last-minute "if 'treemanifest' in repo" guard. That should have been
checking for "... in repo.requirements". Fix that and add tests for
it.
A concern around the user experience of Mercurial is user getting stuck on there
own topological branch forever. For example, someone pulling another topological
branch, missing that message in pull asking them to merge and getting stuck on
there own local branch.
The current way to "address" this concern was for bare 'hg update' to target the
tipmost (also latest pulled) changesets and complain when the update was not
linear. That way, failure to merge newly pulled changesets would result in some
kind of failure.
Yet the failure was quite obscure, not working in all cases (eg: commit right
after pull) and the behavior was very impractical in the common case
(eg: issue4673).
To be able to change that behavior, we need to provide other ways to alert a
user stucks on one of many topological head. We do so with an extra message after
bare update:
1 other heads for branch "default"
Bookmark get its own special version:
1 other divergent bookmarks for "foobar"
There is significant room to improve the message itself, and we should augment
it with hint about how to see theses other heads or handle the situation (see
in-line comment). But having "a" message is already a significant improvement
compared to the existing situation. Once we have it we can iterate on a better
version of it. As having such message is an important step toward changing the
default destination for update and other nicety, I would like to move forward
quickly on getting such message.
This was discussed during London - October 2015 Sprint.
These options were undocumented for 3.7 because of an issue found during the
freeze (see rev e5c26e4e6ec7). This issue has now been fixed, so we can
document these options again.
checkunknown and checkignored are currently respected for updates and regular
merges, but not for certain kinds of rebases. To be precise, they aren't
respected for rebases when:
(1) we're rebasing while currently on the destination commit, and
(2) an untracked or ignored file F is currently in the working copy, and
(3) the same file F is in a source commit, and
(4) F has different contents in the source commit.
This happens because rebases set force to True when calling merge.update.
Setting force to True makes a lot of sense in general, but it turns out the
force option is overloaded: there's a deprecated '--force' option in merge that
allows you to merge in outstanding changes, including changes in untracked
files. We use the 'mergeforce' parameter to tell those two cases apart.
I think the behavior during rebases when checkunknown is 'abort' (the default)
is wrong -- we should abort on or overwrite differing untracked files, not try
to merge them in. However that currently breaks rebases by aborting in the
middle -- we need better handling for that case before we can change the
default.