Since 5a430b57b16e (update: change default destination to tipmost
descendant (issue4673) (BC), 2016-02-02), non-linear updates can no
longer happen if the user doesn't specify a destination, so we can
drop these case from the table in the docstring of merge.update().
The check is done in merge.update() already and the next few patches
will add more checks there. Some of the additional checks will need
information about the merge that will not be available in destutil.
Since commands.postincoming() catches UpdateAbort(), we need to change
merge.update() to raise that more specific exception.
This goes directly again c6bcc960108f (destupdate: move the check
related to the "clean" logic in the function, 2015-10-05), but it will
simplify the next few patches, and we can always move it out again
(preferably move, not copy) after if we still think it's better that
way.
The comment about "obsolete.background" seems to have been about
obsolete.foreground ever since it was introduced in fbdfa64ceced
(update: allow dirty update to foreground (successors), 2013-04-16),
so let's change it.
As far as I can tell, all the callers of merge.update() have been
migrated over to use destutil to find the default destination when the
revision was not specified. So it's time to delete the code for
handling a node value of None. Let's add an assertion that node is not
None, so any extensions relying on it will not silently misbehave.
The merge code works by applying the changes between an ancestor and
the working copy onto the destination. To achieve an update, it sets
the ancestor to be the parent of the working copy. To achieve a clean
update (update -C), it sets the ancestor to be the working copy itself
(so there are no changes to carry over). The logic for this was spread
out a bit. Let's move it all to one place so it's easier to follow the
reason for it. Also add some documentation.
0b5f1f2efc77 introduced handling of a crash in this case. A review comment
suggested that it was not entirely obvious that a 'dm' always would have a 'r'
for the source file.
To mitigate that risk, make the code more conservative and make less
assumptions.
Work around that 'dm' in the data model only can have one operation for the
target file, but still can have multiple and conflicting operations on the
source file where the other operation is a 'rm'. The move would thus fail with
'abort: No such file or directory'.
In this case it is "obvious" that the file should be removed, either before or
after moving it. We thus keep the 'rm' of the source file but drop the 'dm'.
This is not a pretty fix but quite "obviously" safe (famous last words...) as
it only touches a rare code path that used to crash. It is possible that it
would be better to swap the files for 'dm' as suggested on
https://bz.mercurial-scm.org/show_bug.cgi?id=5020#c13 but it is not entirely
obvious that it not just would create conflicts on the other file. That can be
revisited later.
Previously we indicated that the .hgsubstate file was dirty by adding a '+' to
the end of its hash in the wctx manifest. This made is complicated to have new
manifest implementations that rely on the node length being fixed.
In previous patches we added added and modified node placeholders, so let's use
those to indicate dirty here as well. It doesn't look like anything ever
depended on this '+' (aside from it being different to the parent), so nothing
else needed to change here.
Previously the added/modified placeholder hash for manifests generated from the
dirstate was a 21byte long string consisting of the p1 file hash plus a single
character to indicate an add or a modify. Normal hashes are only 20 bytes long.
This makes it complicated to implement more efficient manifest implementations
which rely on the hashes being fixed length.
Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being these 20 bytes (just like we rely on no
hash every being the nullid).
This changes the possible behavior slightly in that the hash for all
added/modified entries in the dirstate manifest will now be the same (so simple
node comparisons would say they are equal), but we should never be doing simple
node comparisons on these nodes even with the old hashes, because they did not
accurately represent the content (i.e. two files based off the same p1 file
node, with different working copy contents would have the same hash (even with
the appended character) in the old scheme too, so we couldn't depend on the
hashes period).
As a followup to the issue4028 series, this fixes a variant of the issue
that can occur when updating with uncommited local changes.
The duplicated .hgsub warning is coming from wc.dirty(). We would previously
skip this call because it's only relevant when we're going to perform copy
tracing, which we didn't do before.
The change to the update summary line is because we now treat the rename as a
proper rename (which counts as a change), rather than an add+delete pair
(which counts as a change and a delete).
During update directories are deleted as soon as they have no entries.
But if current working directory is deleted then it cause problems
in complex commands like 'hg split'. This commit adds a warning
that will help users figure the problem faster.
I always read the name "checkcase(path)" as "do we need to check for
case folding at this path", but it's actually (I think) meant to be
read "check if the file system cares about case at this path". I'm
clearly not the only one confused by this as the dirstate has this
property:
def _checkcase(self):
return not util.checkcase(self._join('.hg'))
Maybe we should even inverse the function and call it fscasefolding()
since that's what all callers care about?
See the comment for a detailed explanation why.
Even though this is a bug, I've sent it to 'default' rather than 'stable'
because it isn't triggered in any code paths in stock Mercurial, just with the
merge driver included. For the same reason I haven't included any tests here --
the merge driver is getting a new test.
Now that we store and display merge labels in user prompts (not just
conflict markets), we should rely on labels to clarify the two sides of a
merge operation (hg merge, hg update, hg rebase etc).
"remote" is not a great name here, as it conflates "remote" as in "remote
server" with "remote" as in "the side of the merge that's further away". In
cases where you're merging the "wrong way" around, remote can even be the
"local" commit that you're merging with something pulled from the remote
server.
Now that we persist the labels, we can consistently use the labels in
prompts for the user without risk of confusion. This changes a huge amount
of command output:
This means that merge prompts like:
no tool found to merge a
keep (l)ocal, take (o)ther, or leave (u)nresolved? u
and
remote changed a which local deleted
use (c)hanged version, leave (d)eleted, or leave (u)nresolved? c
become:
no tool found to merge a
keep (l)ocal [working copy], take (o)ther [destination], or leave (u)nresolved? u
and
remote [source] changed a which local [dest] deleted
use (c)hanged version, leave (d)eleted, or leave (u)nresolved? c
where "working copy" and "destination" were supplied by the command that
requested the merge as labels for conflict markers, and thus should be
human-friendly.
In cbefa73a359814e6784a63f90b78c7afd39bc7d5, I introduced a new bug:
when a symlink points to a folder in commit A and to another folder
in commit B, while updating from A to B, Mercurial will try to use
removedir on this symlink, which will fail. This is a very bad bug,
since it basically renders symlinks to folders unusable in repos.
Added test case fails without a fix and passes with it.
This is a fix to an old problem when Mercurial got confused by an
untracked folder with the same name as one of the files in a commit
hg was trying to update to. It is pretty safe to remove this folder if
it is empty. Backing up an empty folder seems to go against Mercurial's
"don't track dirs" philosophy.
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
These messages have been overlooked by check-code, because they start
with non-alphabet character (' ').
Making these messages translatable seems reasonable, because all other
'ui.note()'-ed messages in calculateupdates() are already
translatable.
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
When we introduce the develwarning, we did not had an official deprecation API
and infrastructure. We can now officially deprecate the old way with a version
deadline.
We permit the caller of merge operations to supply labels for the merge
parts ("local", "other", and optionally "base"). These labels are used in
conflict markers to reduce confusion; however, the labels were not
persistent, so 'hg resolve' would lose the labels.
Store the labels in the mergestate.
I think it's both simpler and clearer to use any() than the current
for loop.
While at it, also drop the call to sorted(), since it doesn't matter
which order we iterate over subrepos.
As 70dbbec70c4c demonstrated with stream clones, closing files on
background threads on Windows can yield a significant speedup
because closing files that have been created/appended to is slow
on Windows/NTFS.
Working directory updates can write thousands of files. Therefore it
is susceptible to excessive slowness on Windows due to slow file
closes.
This patch enables background file closing when performing working
directory file writes. The impact when performing an `hg up tip` on
mozilla-central (136,357 files) from an empty working directory is
significant:
Before: 535s (8:55)
After: 133s (2:13)
Delta: -402s (6:42)
That's a 4x speedup!
By comparison, that same machine can perform the same operation
in ~15s on Linux. So Windows went from ~35x to ~9x slower. Not bad
but there's still work to do.
As a reminder, background file closing is only activated on Windows
because it is only beneficial on that platform. So this patch
shouldn't change non-Windows behavior at all.
It's worth noting that non-Windows systems perform working directory
updates with multiple processes. Unfortunately, worker.py doesn't
yet support Windows. So, there is still plenty of room for making
working directory updates faster on Windows. Even if multiple
processes are used on Windows, I believe background file closing
will still provide a benefit, as individual processes will still
be slowed down by the file close bottleneck (assuming the I/O system
isn't saturated).
Previously we would lstat the file to see if it was a file or a link before
attempting to process it. If the file happened to exist across a symlink, and if
that symlink was pointing to a network file system, that check could be very
expensive.
The new logic audit's the path to avoid symlinks before performing the lstat on
the file itself.
In our situation, this shaved 10 minutes off of certain hg updates.
300 files * (2 seconds - the network filesystem lookup time)
checkunknown and checkignored are currently respected for updates and regular
merges, but not for certain kinds of rebases. To be precise, they aren't
respected for rebases when:
(1) we're rebasing while currently on the destination commit, and
(2) an untracked or ignored file F is currently in the working copy, and
(3) the same file F is in a source commit, and
(4) F has different contents in the source commit.
This happens because rebases set force to True when calling merge.update.
Setting force to True makes a lot of sense in general, but it turns out the
force option is overloaded: there's a deprecated '--force' option in merge that
allows you to merge in outstanding changes, including changes in untracked
files. We use the 'mergeforce' parameter to tell those two cases apart.
I think the behavior during rebases when checkunknown is 'abort' (the default)
is wrong -- we should abort on or overwrite differing untracked files, not try
to merge them in. However that currently breaks rebases by aborting in the
middle -- we need better handling for that case before we can change the
default.
In an upcoming patch we'll have different behavior here for when 'merge
--force' is used as opposed to when other kinds of force operations are
performed, like rebases.
During a merge, each file has a current commitnode+filenode, an other
commitnode+filenode, and an ancestor commitnode+filenode. The ancestor
commitnode is not stored though, and we rely on the ability for the filectx() to
look up the commitnode by using the filenode's linkrev. In alternative backends
(like remotefilelog), linkrevs may have restriction that prevent arbitrary
linkrev look up given a filenode.
This patch accounts for that by storing the ancestor commitnode in
the merge state so that it is available later at resolve time.
This results in some test changes because the ancestor commitnode we're using at
resolve time changes slightly. Before, we used the linkrev commit, which is the
earliest commit that introduced that particular filenode (which may not be the
latest common ancestor of the commits being merged). Now we use the latest
common ancestor of the merged commits as the commitnode. This is fine though,
because that commit contains the same filenode as the linkrev'd commit.
In future commits we will want to store more data related to each file in the
merge state. This patch adds an optional record for storing a dictionary of
extras for each file.
In my patch series ending with rev c3f2eede6938 I switched most change/delete
conflicts to be handled at the resolve layer. .hgsubstate was the one file that
we weren't able to handle, so we kept the old code path around for it.
The old code path added .hgsubstate to one of the other lists as the user
specifies, including possibly the 'g' list.
Now since we did this check after converting the actions from being keyed by
file to being keyed by action type, there was nothing that actually removed
.hgsubstate from the 'cd' or 'dc' lists. This meant that the file would
eventually make its way into the 'mergeactions' list, now freshly augmented
with 'cd' and 'dc' actions.
We call subrepo.submerge for both 'g' actions and merge actions.
This means that if the resolution to an .hgsubstate change/delete conflict was
to add it to the 'g' list, subrepo.submerge would be called twice. It turns out
that this doesn't cause any adverse effects on Linux due to caching, but
apparently breaks on other operating systems including Windows.
The fix here moves this to before we convert the actions over. This ensures
that it .hgsubstate doesn't make its way into multiple lists.
The real fix here is going to be:
(1) move .hgsubstate conflict resolution into the resolve layer, and
(2) use a real data structure for the actions rather than shuffling data around
between lists and dictionaries: we need a hash (or prefix-based) index by
file and a list index by action type.
There's a very tiny behavior change here: collision detection on
case-insensitive systems will happen after this is resolved, not before. I think
this is the right change -- .hgsubstate could theoretically collide with other
files -- but in any case it makes no practical difference.
Thanks to Yuya Nishihara for investigating this.