The old merge code would call manifestmerge and calculate the complete diff
between the source to the destination. In many cases, like rebase, the vast
majority of differences between the source and destination are irrelevant
because they are differences between the destination and the common ancestor
only, and therefore don't affect the merge. Since most actions are 'keep', all
the effort to compute them is wasted.
Instead, let's compute the difference between the source and the common ancestor
and only perform the diff of those files against the merge destination. When
using treemanifest, this lets us avoid loading almost the entire tree when
rebasing from a very old ancestor. This speeds up rebase of an old stack of 27
commits by 20x.
In mozilla-central, without treemanifest, when rebasing a commit from
default~100000 to default, this speeds up the manifestmerge step from 2.6s to
1.2s. However, the additional diff adds an overhead to all manifestmerge calls,
especially for flat manifests. When rebasing a commit from default~1 to default
it appears to add 100ms in mozilla-central. While we could put this optimization
behind a flag, I think the fact that it makes merge O(number of changes being
applied) instead of O(number of changes between X and Y) justifies it.
Now that we store and display merge labels in user prompts (not just
conflict markets), we should rely on labels to clarify the two sides of a
merge operation (hg merge, hg update, hg rebase etc).
"remote" is not a great name here, as it conflates "remote" as in "remote
server" with "remote" as in "the side of the merge that's further away". In
cases where you're merging the "wrong way" around, remote can even be the
"local" commit that you're merging with something pulled from the remote
server.
It makes far more sense to leave these conflicts unresolved and kick back to
the user than to just assume that the local version be chosen. There are almost
certainly buggy scripts and applications using Mercurial in the wild that do
merges or rebases non-interactively, and then assume that if the operation
succeeded there's nothing the user needs to pay attention to.
(This wasn't possible earlier because there was no way to re-resolve
change/delete conflicts -- but now it is.)
We have finally laid all the groundwork to make this happen.
The only change/delete conflicts that haven't been moved are .hgsubstate
conflicts. Those are trickier to deal with and well outside the scope of this
series.
We add comprehensive testing not just for the initial selections but also for
re-resolves and all possible dirstate transitions caused by merge tools. That
testing managed to shake out several bugs in the way we were handling dirstate
transitions.
The other test changes are because we now treat change/delete conflicts as
proper merges, and increment the 'merged' counter rather than the 'updated'
counter. I believe this is the right approach here.
For third-party extensions, if they're interacting with filemerge code they
might have to deal with an absentfilectx rather than a regular filectx.
Still to come:
- add a 'leave unresolved' option to merges
- change the default for non-interactive change/delete conflicts to be 'leave
unresolved'
- add debug output to go alongside debug outputs for binary and symlink file
merges
This is somewhat different from the currently existing 'a' action, for the
following case:
- dirty working copy, with file 'fa' added and 'fm' modified
- hg merge --force with a rev that neither has 'fa' nor 'fm'
- for the change/delete conflicts we pick 'changed' for both 'fa' and 'fm'.
In this case 'branchmerge' is true, but we need to distinguish between 'fa',
which should ultimately be marked added, and 'fm', which should be marked
modified.
Our current strategy is to just not touch the dirstate at all. That works for
now, but won't work once we move change/delete conflicts to the resolve phase.
In that case we may perform repeated re-resolves, some of which might mark the
file removed or remove the file from the dirstate. We'll need to re-add the
file to the dirstate, and we need to be able to figure out whether we mark the
file added or modified. That is what the new 'am' action lets us do.
It's unclear why everything from the first 'updating:' line should be
ignored. The arbitrariness makes it confusing that changing the code
so e.g. the 'getting 8/f' line is printed later makes it disappear
completely from the ouput. The list of 'preserving x for resolve of y'
seems convered by the subsequent for loop in the test case. Perhaps
it's only copies that are of interests, so let's keep only that part.
I actually wanted to reduce the amount of code around the call to
applyupdates(), so I tried moving these warnings a little earlier, and
I think it makes the output make a little more sense (see changes to
test cases).
We perform all that we can non-interactively before prompting the user for input
via their merge tool. This allows for a maximally consistent state when the user
is first prompted.
The test output changes indicate the actual behavior change happening.
This means that in ms.resolve we must call merge after calling premerge. This
doesn't yet mean that all premerges happen before any merges -- however, this
does get us closer to our goal.
The output differences are because we recompute the merge tool. The only
user-visible difference caused by this patch is that if the tool is missing
we'll print the warning twice. Not a huge deal, though.
When the progress extension is not enabled, each call to 'ui.progress' used to
issue a debug message. This results is a very verbose output and often redundant
in tests. Dropping it makes tests less volatile to factor they do not meant to
test.
We had to alter the sed trick in 'test-rename-merge2.t'. Sed is used to drop all
output from a certain point and hidding the progress output remove its anchor.
So we anchor on something else.
The '~' in the bug report is being expanded to a path with Windows style slashes
before being passed to shellquote() via util.shellquote(). But shlex.split()
strips '\' out of the string, leaving an invalid path in dispatch.aliasargs().
This regressed in 72640182118e.
For now, the tests need to be conditionalized for Windows (because those paths
are quoted). In the future, a more complex regex could probably skip the quotes
if all component separators are double '\'. I opted to glob away the quotes in
test-rename-merge2.t and test-up-local-change.t (which only exist on Windows),
because they are in very large blocks of output and there are way too many diffs
to conditionalize with #if directives. Maybe the entire path should be globbed
away like the following paths in each changed line. Or, letting #if directives
sit in the middle of the output as was mentioned a few months back would work
too.
Unfortunately, I couldn't figure out how to test the specific bug. All of the
'hg serve' tests have a #require serve declaration, causing them to be skipped
on Windows. Adding an alias for 'expandtest = outgoing ~/bogusrepo' prints the
repo as '$TESTTMP/bogusrepo', so the test runner must be changing the
environment somehow.
Windows and OpenVMS use double quote for shell quoting, posix uses
single quote. Since the other test lines added in 3ba9d147e92e don't
include the quotes, this was presumably an oversight.
It is easier to reason about certain algorithms in terms of a
file->action mapping than the current action->list-of-files. Bid merge
is already written this way (but with a list of actions per file), and
largefiles' overridecalculateupdates() will also benefit. However,
that requires us to have at most one action per file. That requirement
is currently violated by 'dr' (divergent rename) and 'rd' (rename and
delete) actions, which can exist for the same file as some other
action.
These actions are only used for displaying warnings to the user; they
don't change anything in the working copy or the dirstate. In this
way, they are similar to the 'k' (keep) action. However, they are even
less action-like than 'k' is: 'k' at least describes what to do with
the file ("do nothing"), while 'dr' and 'rd' or only annotations for
files for which there may exist other, "real" actions.
As a first step towards separating these acitons out, stop including
them in the progress output, just like we already exclude the 'k'
action.
Most merge action messages don't describe the action itself, they
describe the reason the action was taken. The only exeption is the 'k'
action, for which the message is just "keep" and instead there is a
code comment folling it that says "remote unchanged". Let's move that
comment into the merge action message.
We can further limit the scope of the 2-way merge case by breaking out
the case where the file was not created from scratch on both sides but
rather renamed in the same way (and is therefore a 3-way merge). This
involves copying some code, but it makes it clearer which case the
"Note:" in the code refers to.
When 'f' is not in 'ma', 'a' will be 'nullid' and all the if/elif
conditions that check whether some one nodeid is equal to 'a' will
fail, and the else-clause will instead apply. We can make that more
explicit by creating a separate 'm' action for the case where 'a' is
'nullid'. While it does mean copying some code, perhaps it makes it a
little clearer which codepaths are possible, and which cases the
"Note:" in the code refers to. It also lets us make the debug action
messages a little more specific.
Preparing for action list split-up, making sure the final change don't have any
test changes.
The patch moves debug statements around without really changing anything.
Arguably, it temporarily makes the code worse. The only justification is that
it makes it easier to review the test changes ... and in the end the big change
will not change test output at all.
The changes to test output are due to changes in the ordering of debug output.
That is mainly because we now do the debug logging for files when we actually
process them. Files are also processed in a slightly different but still
correct order. It is now primarily ordered by action type, secondarily by
filename.
The patch introduces some redundancy. Some of it will be removed again, some of
it will in the end help code readability and efficiency. It is possible that we
later on could introduce a "process this action list and do some logging and
progress reporting and apply this function".
The "preserving X for resolve" debug statements will only have single space
indentation. It will no longer have a leading single space indented "f: msg ->
m" message. Having this message double indented would thus no longer make
sense.
The bid actions will temporarily be sorted using a custom sort key that happens
to match the sort order the simplified code will have in the end.
The ordering of actions matters. Normal file system semantics is that files
have to be removed before a directory with the same name can be created.
Before the first ordering key was to have 'r' and 'f' actions come first,
secondary key was the filename.
Because of future refactorings we want to consistently have all action types
(with a sensible priority) as separate first keys. Grouped by action type, we
sort by filename.
Not processing in strict filename order could give worse performance,
especially on spinning disks. That is however primarily an issue in the cases
where "all" actions are of the same kind and will be grouped together anyway.
Such a 'keep' action will later be the preferred (non)action when there
is multiple ancestors. It is thus very convenient to have it explicitly.
The extra actions will only be emitted in the case where the local file has
changed since the ancestor but the other hasn't. That is the symmetrical
operation to a 'get' action.
This will create more action tuples that not really serve a purpose. The number
of actions will however have the number of changed files as upper bound and it
should thus not increase the memory/cpu use significantly.
"merge.applyupdates()" sorts "actions" in removal first order, and
"workeractions" derived from it should be also sorted.
If each actions in "workeractions" are executed in serial, this
sorting ensures that merging/updating process is collision free,
because updating the file in target context is always executed after
removing the existing file which causes case-folding collision against
the former.
In the other hand, if each actions are executed in parallel, updating
on a worker process may be executed before removing on another worker
process, because "worker.partition()" partitions list of actions
regardless of type of each actions.
This patch divides "workeractions" into removing and updating, and
executes the former first.
This patch still scans "actions"/"workeractions" some times for ease
of patch review, even though large list may cost much in this way.
(total cost should be as same as before)
This also changes some tests, because dividing "workeractions" affects
progress indication.
Show messages at a point where the actions have been sorted, thus preparing for
backout of 14f4258e3526.
This makes manifestmerge more of a silent operation, just like 'copies' is.
Indent 'preserving' messages to make them subordinate to the action logging so
they fit in the new context. (The 'preserving' messages are quite redundant and
could also be removed completely.)
Preparing for backout of 14f4258e3526.
The number of prompts will for all relevant cases be significantly smaller than
the total number of files in the manifests. We can thus afford to sort the
prompts more than we can afford to sort the manifests.
The -> in debug messages is currently overloaded to mean both source to dest
and dest to source. To fix this, we add explicit labels and make the arrow
direction consistent.
Many tests didn't change back from subdirectories at the end of the tests ...
and they don't have to. The missing 'cd ..' could always be added when another
test case is added to the test file.
This change do that tests (99.5%) consistently end up in $TESTDIR where they
started, thus making it simpler to extend them or move them around.
ui.forcemerge is set before calling into merge or resolve commands, then unset
to prevent ui pollution for further operations.
ui.forcemerge takes precedence over HGMERGE, but mimics HGMERGE behavior if the
given --tool is not found by the merge-tools machinery. This makes it possible
to do: hg resolve --tool="python mymerge.py" FILE
With this approach, HGMERGE and ui.merge are not harmed by --tool
See the Hg Book on why we actually want to detect this case:
http://hgbook.red-bean.com/read/mercurial-in-daily-use.html#id364290
Before:
$ hg up deadbeef
warning: detected divergent renames of X to:
...
After:
$ hg up deadbeef
note: possible conflict - X was renamed multiple times to:
...
No functionality change.