This variant gives access to a feature already present in ``internal:merge``:
displaying merge base content.
In the basic merge case (calling ``hg merge``), including more context in the
merge markers is an interesting addition.
But this extra information is the only viable option in case of conflicts from
grafting (or rebase, etc.).
When grafting ``source`` on ``destination``, the parent of ``source`` is
used as the ``base``. When all three changesets add content in the same
location, the marker for ``source`` will contain both ``base`` and ``source``
content. Without the content of base exposed, there is no way for the user
to discriminate content coming from ``base`` and content coming from ``source``.
Practical example (all additions are in the same place):
* ``destination`` adds ``Dest-Content``
* ``base`` adds ``Base-Content``
* ``source`` adds ``Src-Content``
Grafting ``source`` on ``destination`` will produce the following conflict:
<<<<<<< destination
Dest-Content
=======
Base-Content
Src-Content
>>>>>>> source
In that case there is no way to distinguish ``base`` from ``source``. As a result
content from ``base`` is likely to slip into the resolution result.
However, adding the base makes the situation very clear:
<<<<<<< destination
Dest-Content
||||||| base
Base-Content
=======
Base-Content
Src-Content
>>>>>>> source
Once the base is added, the addition from the grafted changeset is made clear.
Users can compare the content from ``base`` and ``source`` to make an informed
decision during merge resolution.
When a third label is provided (to include the base content) it is processed
in the same way as the two others. Nothing changes if only two labels are provided.
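The label handling described above can be sketched as follows. This is only an illustration, not the simplemerge code: the function name ``render_conflict`` and its argument layout are assumptions made for the example. Base content is emitted only when a third label is supplied.

```python
def render_conflict(local, other, labels, base=None):
    """Render one conflict region as a list of lines.

    Hypothetical helper: ``labels`` is (local, other) or (local, other, base).
    The base section is only written when a third label is present.
    """
    out = ['<<<<<<< ' + labels[0]]
    out.extend(local)
    if base is not None and len(labels) > 2:
        out.append('||||||| ' + labels[2])
        out.extend(base)
    out.append('=======')
    out.extend(other)
    out.append('>>>>>>> ' + labels[1])
    return out


if __name__ == '__main__':
    lines = render_conflict(
        ['Dest-Content'],
        ['Base-Content', 'Src-Content'],
        ('destination', 'source', 'base'),
        base=['Base-Content'],
    )
    print('\n'.join(lines))
```

With only two labels the same call produces the classic two-sided marker, so existing callers would see no change.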
Matt Mackall said:
The goal of simplemerge should have always been to be a drop-in
replacement for RCS merge. Please nuke this minimization thing entirely.
This whole thing is now dead.
fade484cb8f6 disabled minimal for `internal:merge` but forgot to also disable
it for premerge. This is now done.
This gives me an occasion to shamelessly include my explanation of why this
minimisation feature must disappear:
[this is why it's pointless to reject patches with misspellings in the
description - mpm]
Detailed explanation
====================
The ``simplemerge`` code used in ``internal:merge`` has a feature called
"minimization". It reprocesses conflicting chunks to find common changes
inside them and excludes such common sections from the marker.
This approach seems a significant win at first glance but produces very
confusing results in some other cases.
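To make the idea concrete, here is a minimal sketch of what such a minimization pass does. It uses Python's ``difflib.SequenceMatcher`` as a stand-in matcher, which is an assumption: simplemerge relies on its own matching code, and the name ``minimize`` is invented for this example.

```python
from difflib import SequenceMatcher


def minimize(local, other):
    """Split one conflicting region into smaller pieces.

    Returns a list of ('same', lines) and ('conflict', local_lines,
    other_lines) tuples. Lines common to both sides are pulled out of
    the conflict, which is what "minimization" means here.
    """
    pieces = []
    matcher = SequenceMatcher(None, local, other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            pieces.append(('same', local[i1:i2]))
        else:
            pieces.append(('conflict', local[i1:i2], other[j1:j2]))
    return pieces


if __name__ == '__main__':
    # Both sides agree on the first three lines: the conflict shrinks.
    print(minimize(['1', '2', '3', '4', '5'], ['1', '2', '3', '6', '8']))
    # A shared blank line is enough to cut one conflict into two.
    print(minimize(['def function1():', '    bla()', '', 'def function2():', '    ble()'],
                   ['def function3():', '    bli()', '', 'def function4():', '    blo()']))
```

The first call keeps ``1``, ``2``, ``3`` outside the markers. The second call shows the downside: a lone blank line counts as "common" and splits two unrelated additions into two separate conflicts.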
Simple example
--------------
A simple example is enough to show the benefit of this feature. In this merge,
both sides change all numbers from letters to digits, but one side is also
changing some values.
$ cat << EOF > base
> Small Mathematical Series.
> One
> Two
> Three
> Four
> Five
> Hop we are done.
> EOF
$ cat << EOF > local
> Small Mathematical Series.
> 1
> 2
> 3
> 4
> 5
> Hop we are done.
> EOF
$ cat << EOF > other
> Small Mathematical Series.
> 1
> 2
> 3
> 6
> 8
> Hop we are done.
> EOF
In the minimalist case, the markers focus on the disagreement between the two
sides.
$ $TESTDIR/../contrib/simplemerge --print local base other
Small Mathematical Series.
1
2
3
<<<<<<< local
4
5
=======
6
8
>>>>>>> other
Hop we are done.
warning: conflicts during merge.
[1]
In the non-minimalist case, the whole chunk is included in the conflict marker,
making it harder to spot actual differences.
$ $TESTDIR/../contrib/simplemerge --print --no-minimal local base other
Small Mathematical Series.
<<<<<<< local
1
2
3
4
5
=======
1
2
3
6
8
>>>>>>> other
Hop we are done.
warning: conflicts during merge.
[1]
Practical Advantages of minimalisation: merge of grafted change
---------------------------------------------------------------
This feature can be very useful when a change has been grafted in another
branch and then some changes have been made to the grafted code.
$ cat << EOF > base
> # empty file
> EOF
$ cat << EOF > local
> def somefunction(one, two):
> some = one
> stuff = two
> are(happening)
> here()
> EOF
$ cat << EOF > other
> def somefunction(one, two):
> some = one
> change = two
> are(happening)
> here()
> EOF
The minimalist case recognises the grafted content as similar and highlights the
actual change.
$ $TESTDIR/../contrib/simplemerge --print local base other
def somefunction(one, two):
some = one
<<<<<<< local
stuff = two
=======
change = two
>>>>>>> other
are(happening)
here()
warning: conflicts during merge.
[1]
Again, the non-minimalist case produces a larger conflict, making it harder to
spot the actual conflict.
$ $TESTDIR/../contrib/simplemerge --print --no-minimal local base other
<<<<<<< local
def somefunction(one, two):
some = one
stuff = two
are(happening)
here()
=======
def somefunction(one, two):
some = one
change = two
are(happening)
here()
>>>>>>> other
warning: conflicts during merge.
[1]
Practical disadvantage: multiple functions on each side
---------------------------------------------------------------
So, if this "minimalist" mode helps so much, why introduce a setting to disable it?
The issue is that this minimisation will grab any common lines for breaking
chunks. This may result in partial context when solving a merge. The
simplest example is a merge where both sides added some (different) functions
separated by blank lines. The "minimalist" approach will recognise the blank
line as "common" and over-slice the chunks, turning a simple conflict case into
multiple pairs of conflicting functions.
$ cat << EOF > base
> # empty file
> EOF
$ cat << EOF > local
> def function1():
> bla()
> bla()
> bla()
>
> def function2():
> ble()
> ble()
> ble()
> EOF
$ cat << EOF > other
> def function3():
> bli()
> bli()
> bli()
>
> def function4():
> blo()
> blo()
> blo()
> EOF
The minimal case presents each function as a separate context.
$ $TESTDIR/../contrib/simplemerge --print local base other
<<<<<<< local
def function1():
bla()
bla()
bla()
=======
def function3():
bli()
bli()
bli()
>>>>>>> other
<<<<<<< local
def function2():
ble()
ble()
ble()
=======
def function4():
blo()
blo()
blo()
>>>>>>> other
warning: conflicts during merge.
[1]
The non-minimalist approach produces a simpler version with more context in
each block. Solving such conflicts is usually as simple as dropping the 3 lines
dedicated to markers.
$ $TESTDIR/../contrib/simplemerge --print --no-minimal local base other
<<<<<<< local
def function1():
bla()
bla()
bla()
def function2():
ble()
ble()
ble()
=======
def function3():
bli()
bli()
bli()
def function4():
blo()
blo()
blo()
>>>>>>> other
warning: conflicts during merge.
[1]
Practical disaster: programming languages have a lot of common lines
====================================================================
If only blank lines between functions were the only frequent content of a code
file! But programming languages tend to repeat themselves much more often. In that
case, the minimalist approach turns a simple conflict into a massive mess.
Consider this example where two unrelated functions are added on each side.
Those functions share common programming constructs by chance.
$ cat << EOF > base
> # empty file
> EOF
$ cat << EOF > local
> def longfunction():
> if bla:
> foo
> else:
> bar
> try:
> ret = some stuff
> except Exception:
> ret = None
> if ret is not None:
> return ret
> return 0
>
> def shortfunction(foo):
> goo()
> ret = foo + 5
> return ret
> EOF
$ cat << EOF > other
> def otherlongfunction():
> for x in xxx:
> if coin:
> break
> tutu
> else:
> bar()
> baz()
> ret = week()
> try:
> groumpf = tutu
> fool()
> except Exception:
> zoo()
> pool()
> if cond:
> return ret
>
> # some big block
> ret ** 6
> koin()
> return ret
> EOF
The minimalist approach will hash the whole conflict into small chunks that
do not match any meaningful semantics and are impossible to resolve.
$ $TESTDIR/../contrib/simplemerge --print local base other
<<<<<<< local
def longfunction():
if bla:
foo
=======
def otherlongfunction():
for x in xxx:
if coin:
break
tutu
>>>>>>> other
else:
<<<<<<< local
bar
=======
bar()
baz()
ret = week()
>>>>>>> other
try:
<<<<<<< local
ret = some stuff
=======
groumpf = tutu
fool()
>>>>>>> other
except Exception:
<<<<<<< local
ret = None
if ret is not None:
=======
zoo()
pool()
if cond:
>>>>>>> other
return ret
<<<<<<< local
return 0
=======
>>>>>>> other
<<<<<<< local
def shortfunction(foo):
goo()
ret = foo + 5
=======
# some big block
ret ** 6
koin()
>>>>>>> other
return ret
warning: conflicts during merge.
[1]
The non-minimalist approach will properly produce a single set of conflict
markers, highlighting that the two chunks are unrelated. Such a conflict from
unrelated content added at the same place is usually solved by dropping the
markers and keeping both contents, something impossible with minimised markers.
$ $TESTDIR/../contrib/simplemerge --print --no-minimal local base other
<<<<<<< local
def longfunction():
if bla:
foo
else:
bar
try:
ret = some stuff
except Exception:
ret = None
if ret is not None:
return ret
return 0
def shortfunction(foo):
goo()
ret = foo + 5
return ret
=======
def otherlongfunction():
for x in xxx:
if coin:
break
tutu
else:
bar()
baz()
ret = week()
try:
groumpf = tutu
fool()
except Exception:
zoo()
pool()
if cond:
return ret
# some big block
ret ** 6
koin()
return ret
>>>>>>> other
warning: conflicts during merge.
[1]
Add a new internal:tagmerge merge tool which implements an automatic merge
algorithm for mercurial's tag files
The tagmerge algorithm is able to resolve most merge conflicts that
currently would trigger a .hgtags merge conflict. The only case that
it does not (and cannot) handle is that in which two tags point to
different revisions on each merge parent _and_ their corresponding tag
histories have the same rank (i.e. the same length). In all other
cases the merge algorithm will choose the revision belonging to the
parent with the highest ranked tag history. The merged tag history is
the combination of both tag histories (special care is taken to try to
combine common tag histories where possible).
The algorithm also handles cases in which tags have been manually
removed from the .hgtags file and other similar corner cases.
In addition to actually merging the tags from two parents, taking into
account the base, the algorithm also tries to minimize the difference
between the merged tag file and the first parent's tag file (i.e. it
tries to make the merged tag order as similar as possible to the
first parent's tag file order).
The algorithm works as follows:
1. read the tags from p1, p2 and the base
    - when reading the p1 tags, also get the line numbers associated to each
      tag node (these will be used to sort the merged tags in a way that
      minimizes the diff to p1). Ignore the line numbers when reading p2 and
      the base
2. recover the "lost tags" (i.e. those that are found in the base but not on p1
   or p2) and add them back to p1 and/or p2
    - at this point the only tags that are on p1 but not on p2 are those new
      tags that were introduced in p1. Same thing for the tags that are on p2
      but not on p1
3. take all tags that are only on p1 or only on p2 (but not on the base)
    - Note that these are the tags that were introduced between base and p1 and
      between base and p2, possibly on separate clones
4. for each tag found both on p1 and p2 perform the following merge algorithm:
    - the tags conflict if their tag "histories" have the same "rank" (i.e.
      length) _AND_ the last (current) tag is _NOT_ the same
    - for non conflicting tags:
        - choose which are the high and the low ranking nodes
            - the high ranking list of nodes is the one that is longer.
              In case of a draw, favor p1
        - the merged node list is made of 3 parts:
            - first the nodes that are common to the beginning of both the
              low and the high ranking nodes
            - second the non common low ranking nodes
            - finally the non common high ranking nodes (with the last one
              being the merged tag node)
            - note that this is equivalent to putting the whole low ranking node
              list first, followed by the non common high ranking nodes
    - note that during the merge we keep the "node line numbers", which will
      be used when writing the merged tags to the tag file
5. write the merged tags taking into account their positions in the first
   parent (i.e. try to keep the relative ordering of the nodes that come
   from p1). This minimizes the diff between the merged and the p1 tag files.
   This is done by using the following algorithm:
    - group the nodes for a given tag that must be written next to each other
        - A: nodes that come from consecutive lines on p1
        - B: nodes that come from p2 (i.e. whose associated line number is None)
          and are next to one of the nodes in A
        - each group is associated with a line number coming from p1
    - generate a "tag block" for each of the groups
        - a tag block is a set of consecutive "node tag" lines belonging to the
          same tag and which will be written next to each other on the merged
          tags file
    - sort the "tag blocks" according to their associated line number
        - put blocks whose nodes come all from p2 first
    - write the tag blocks in the sorted order
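Step 4 above (the per-tag merge) can be sketched in a few lines. This is a simplified illustration and not the code of the tool: the function name ``merge_tag_history`` is made up, histories are plain lists of node strings, and line-number bookkeeping (step 5) is left out entirely.

```python
def merge_tag_history(p1nodes, p2nodes):
    """Merge the node histories of one tag found on both parents.

    Both lists are assumed non-empty, oldest node first. Returns the
    merged list, or None when the tag conflicts (same rank, different
    current node).
    """
    if len(p1nodes) == len(p2nodes) and p1nodes[-1] != p2nodes[-1]:
        return None
    # The longer history is the high ranking one; a draw favors p1.
    if len(p1nodes) >= len(p2nodes):
        high, low = p1nodes, p2nodes
    else:
        high, low = p2nodes, p1nodes
    # Count the nodes shared at the beginning of both histories.
    common = 0
    for lownode, highnode in zip(low, high):
        if lownode != highnode:
            break
        common += 1
    # Whole low ranking list first, then the non common high ranking nodes.
    return low + high[common:]


if __name__ == '__main__':
    print(merge_tag_history(['a', 'b', 'c'], ['a', 'x']))
    print(merge_tag_history(['a', 'b'], ['a', 'c']))
```

In the first call p1 has the longer history, so its last node ``c`` ends up as the merged tag node. The second call returns ``None`` because both histories have the same length but end on different nodes.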
Notes:
- A few tests have been added to test-tag.t. These tests are very specific to
the new internal:tagmerge tool, so perhaps they should be moved to their own
test file.
- The merge algorithm was discussed in a thread on the mercurial mailing list.
In http://markmail.org/message/anqaxldup4tmgyrx a slightly different algorithm
was suggested. In it the p1 and p2 tags would have been interleaved instead of
put one before the other. It would be possible to implement that but my tests
suggest that the merge result would be more confusing and harder to understand.
As extensively detailed by Pierre-Yves[1], simplemerge's minimal
markers can result in hopeless confusion for many common merges. As it
happens, we accidentally inherited this behavior when we borrowed
simplemerge from bzr; it is not the behavior used by RCS's merge(1).
Since merge(1) (and not bzr) is what we aim to emulate when emulating
RCS's merge markers, we simply turn this feature off. This brings us
in line with the behavior of CVS, SVN, and Git as a bonus.
(NB: using conflict markers with Mercurial is discouraged.)
[1] http://markmail.org/message/wj5mh3lc46czlvld
convert glob tessa
Before this patch, 'detailed' is used as the default of '[ui]
mergemarkers'. This embeds non-ASCII characters in tags, branches,
bookmarks, author and/or commit descriptions into merged files in the
encoding specified by '--encoding' global option, 'HGENCODING' or
other locale setting environment variables.
But, if files to be merged use another encoding, this behavior breaks
consistency of encoding in merged files.
For example, ISO-2022-JP or EUC-JP are sometimes used as the file
encoding for Japanese characters, because of historical and/or
environmental reasons, even though UTF-8 or Shift-JIS are ordinarily
used as the terminal encoding.
This can't be resolved automatically, because Mercurial isn't aware of
the encoding of managed files.
This patch uses 'basic' as the default of '[ui] mergemarkers' to avoid
embedding encoding sensitive characters for safety.
This patch puts '[ui] mergemarkers = detailed' into default hgrc file
for tests, to reduce changes for tests in this patch.
Before this patch, filemerge slices the byte sequence directly to trim
conflict markers, but this may cause:
- splitting in the middle of a multi-byte sequence
- incorrect calculation of column width (the length of a byte sequence
differs from its display width in columns in many cases)
This patch uses 'util.ellipsis' to trim custom conflict markers
correctly, even if multi-byte characters are used in them.
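The two problems listed above are easy to reproduce. The sketch below is not the implementation of 'util.ellipsis'; the helpers ``colwidth`` and ``trim`` are hypothetical and only show one way to trim on character boundaries while counting display columns, using the standard ``unicodedata`` module.

```python
import unicodedata


def colwidth(text):
    """Display width in columns: wide and fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(c) in ('W', 'F') else 1
               for c in text)


def trim(text, maxwidth, ellipsis='...'):
    """Trim text to at most maxwidth columns, never splitting a character."""
    if colwidth(text) <= maxwidth:
        return text
    budget = maxwidth - colwidth(ellipsis)
    kept = []
    used = 0
    for char in text:
        width = colwidth(char)
        if used + width > budget:
            break
        kept.append(char)
        used += width
    return ''.join(kept) + ellipsis


if __name__ == '__main__':
    print(trim('abcdef', 5))
    print(trim('日本語テキスト', 8))
    # Naive byte slicing cuts the second character in half:
    try:
        '日本語'.encode('utf-8')[:4].decode('utf-8')
    except UnicodeDecodeError as exc:
        print('byte slicing failed:', exc.reason)
```

Working on decoded text with a column budget avoids both the broken sequence and the width miscalculation.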
Before this patch, with careless configuration (missing '|firstline'
filtering for '{desc}' keyword, for example), '[ui]
mergemarkertemplate' can make conflict markers span multiple lines.
For ordinary users, the advantage of allowing '[ui] mergemarkertemplate'
to generate multiple lines for customization seems smaller than
the advantage of disallowing it for safety.
This patch uses only the first line of the conflict marker generated
from '[ui] mergemarkertemplate' configuration for safety.
We already have a ":" after the user name to denote the start of the
description. The current usage of quotes around the description is
problematic as the truncation to 80 chars is likely to drop the
closing quote. This may confuse syntax coloration in some editors.
Adds a labels function parameter to all the functions between merge.update and
filemerge.filemerge. This will allow commands like rebase to specify custom
marker labels.
Adds a conflict marker formatter that can produce custom conflict marker
descriptions. It can be set via ui.mergemarkertemplate. The old behavior
can still be used by setting ui.mergemarkers=basic.
The default format is similar to:
{node|short} {tag} {branch} {bookmarks} - {author}: "{desc|firstline}"
And renders as:
contextblahblah
<<<<<<< local: c7fdd7ce4652 - durham: "Fix broken stuff in my feature branch"
line from my changes
=======
line from the other changes
>>>>>>> other: a3e55d7f4d38 master - sid0: "This is a commit to master th...
morecontextblahblah
Moves the conflict marker definition up to filemerge, so it gets applied to all
merge strategies, and so in a future patch we can manipulate the markers.
We have seen some failures on Windows that looked like the unlinks of
temporary files were failing. That could perhaps be because the merge tool
somehow still held the files open.
Instead of the bare bone os.unlink, use our util.unlink with special
rename/retry handling on Windows.
A follow-up to 6847621b4da6.
internal:merge should never be picked for merging symlinks ... but in the test
suite we have HGMERGE="internal:merge" which bypasses all the usual merge-tool
cleverness. Without any output it can be hard to figure out what happened and
where the problem is.
Simplemerge is not symlink aware and will never do the right thing on
symlinks. It would read the symlink as a file and abort with 'No such file or
directory' on dangling symlinks.
Instead, internal:merge now simply fails to merge symlinks.
The current 'filemerge.filemerge()' implementation is very complicated.
- it is not easy to add new internal merge tools
(only by patching 'filemerge()', or replacing it completely)
- cleanup of temporary files is unsatisfactory
('internal:dump' does not clean up, in fact)
This patch refactors 'filemerge()' to isolate each internal merge tool
implementation from 'filemerge()', and cleans up the common part in it.
hgrc(5) already implies that this works, so we might as well support it.
Another approach would be to implement this in util.findexe(): that
would benefit other callers of findexe(), e.g. convert and anyone
calling the user's editor. But findexe() is really implemented in
both posix.py and windows.py, so this would make both of those modules
depend on util.py: not good. So keep it narrow and only for merge
tools.
These leaks may occur in environments that don't employ a reference
counting GC, i.e. PyPy.
This implies:
- changing opener(...).read() calls to opener.read(...)
- changing opener(...).write() calls to opener.write(...)
- changing open(...).read(...) to util.readfile(...)
- changing open(...).write(...) to util.writefile(...)
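The reason ``open(...).read()`` leaks on such interpreters is that the file object is only closed when the garbage collector gets to it. A minimal sketch of helpers that close the handle deterministically is shown below. The names mirror 'util.readfile' and 'util.writefile', but the bodies are an assumption for illustration, not a copy of the Mercurial code.

```python
import os
import tempfile


def readfile(path):
    """Read a whole file and close the handle before returning."""
    with open(path, 'rb') as fp:
        return fp.read()


def writefile(path, data):
    """Write data to a file and close the handle before returning."""
    with open(path, 'wb') as fp:
        fp.write(data)


if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        writefile(path, b'merged content\n')
        print(readfile(path))
    finally:
        os.unlink(path)
```

The ``with`` block closes the file on every interpreter, whether or not it uses reference counting.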
This allows us to provide alternate search keys for 64bit operating systems that
may have 32bit merge tools installed. Presumably it may find other uses.
ui.forcemerge is set before calling into merge or resolve commands, then unset
to prevent ui pollution for further operations.
ui.forcemerge takes precedence over HGMERGE, but mimics HGMERGE behavior if the
given --tool is not found by the merge-tools machinery. This makes it possible
to do: hg resolve --tool="python mymerge.py" FILE
With this approach, HGMERGE and ui.merge are not harmed by --tool.
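The set-then-unset pattern and the precedence described above can be sketched as follows. Everything here is hypothetical: a plain dict stands in for the ui configuration, ``forced_tool`` and ``pick_tool`` are invented names, and placing ui.merge after HGMERGE is an assumption made for the example.

```python
from contextlib import contextmanager


@contextmanager
def forced_tool(config, tool):
    """Set 'forcemerge' for the duration of one merge/resolve call."""
    config['forcemerge'] = tool
    try:
        yield config
    finally:
        # Unset afterwards so later operations are not polluted.
        config.pop('forcemerge', None)


def pick_tool(forcemerge=None, hgmerge=None, uimerge=None):
    """Return the first configured tool: forcemerge, then HGMERGE, then ui.merge."""
    for candidate in (forcemerge, hgmerge, uimerge):
        if candidate:
            return candidate
    return None


if __name__ == '__main__':
    config = {}
    with forced_tool(config, 'python mymerge.py') as cfg:
        print(pick_tool(cfg.get('forcemerge'), 'kdiff3', 'meld'))
    print(pick_tool(config.get('forcemerge'), 'kdiff3', 'meld'))
```

Inside the ``with`` block the forced tool wins. Once the block exits the setting is gone and HGMERGE is used again.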
re.match only looks at the beginning of the merged file, and without
re.MULTILINE the file had to end with ">>>>>>> something".
Now conflict markers inside the file are found, too.
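The difference is easy to check in isolation. The pattern below is an illustrative marker regex written for this example, and ``has_conflict_markers`` is a hypothetical helper name; the point is only the behaviour of ``re.match`` versus ``re.search`` with ``re.MULTILINE``.

```python
import re

# Illustrative pattern: a line that is exactly a conflict marker.
MARKER = r'^(<<<<<<< .*|=======|>>>>>>> .*)$'
MARKER_RE = re.compile(MARKER, re.MULTILINE)


def has_conflict_markers(text):
    """True when any line of text is a conflict marker."""
    return MARKER_RE.search(text) is not None


if __name__ == '__main__':
    merged = ('context\n'
              '<<<<<<< local\n'
              'mine\n'
              '=======\n'
              'theirs\n'
              '>>>>>>> other\n'
              'more context\n')
    # re.match is anchored at the start of the string: nothing is found.
    print(re.match(MARKER, merged))
    # With re.MULTILINE, ^ and $ match at every line boundary.
    print(has_conflict_markers(merged))
```

Without ``re.MULTILINE``, ``^`` only matches at the very beginning of the text, so markers in the middle of a file go unnoticed.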
util.interpolate can be used to replace multiple items in a string all at once
(and optionally apply a function to the replacement), without worrying about
recursing:
>>> import util
>>> s = '$foo, $spam'
>>> util.interpolate(r'\$', { 'foo': 'bar', 'spam': 'eggs' }, s)
'bar, eggs'
>>> util.interpolate(r'\$', { 'foo': 'spam', 'spam': 'foo' }, s)
'spam, foo'
>>> util.interpolate(r'\$', { 'foo': 'spam', 'spam': 'foo' }, s, lambda s: s.upper())
'SPAM, FOO'
The patch also changes filemerge.py to use this new function.
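A function with this behaviour can be written around a single ``re.sub`` pass, which is what prevents a replacement from being substituted again. The version below is a sketch under that assumption and may differ from the actual util.interpolate code.

```python
import re


def interpolate(prefix, mapping, s, fn=None):
    """Replace every prefix+key in s by mapping[key], in one pass.

    prefix is a regular expression; fn, if given, is applied to each
    replacement value before it is inserted.
    """
    fn = fn or (lambda value: value)
    keys = '|'.join(re.escape(key) for key in mapping)
    pattern = re.compile('%s(%s)' % (prefix, keys))
    return pattern.sub(lambda m: fn(mapping[m.group(1)]), s)


if __name__ == '__main__':
    s = '$foo, $spam'
    print(interpolate(r'\$', {'foo': 'bar', 'spam': 'eggs'}, s))
    print(interpolate(r'\$', {'foo': 'spam', 'spam': 'foo'}, s))
```

Because each match is replaced exactly once, swapping ``foo`` and ``spam`` does not loop or undo itself.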
tool.check is a list of check options, and can be used in place of
tool.checkchanged and tool.checkconflicts:
Equivalences (each pair of legacy settings is followed by the equivalent
tool.check value):

  tool.checkchanged = yes
  tool.checkconflicts = no
  tool.check = changed

  tool.checkchanged = no
  tool.checkconflicts = yes
  tool.check = conflicts

  tool.checkchanged = yes
  tool.checkconflicts = yes
  tool.check = changed, conflicts
Add _toollist() wrapper for ui.configlist() to implement this consistently.
checkchanged and checkconflicts are still supported, but check is
preferred for implementing new check options.
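The equivalences can be expressed as a small normalisation step. The helper below is hypothetical (it is not ``_toollist()``): it only shows how the two legacy booleans could be folded into the list form so that the rest of the code checks a single value.

```python
def effective_checks(check=(), checkchanged=False, checkconflicts=False):
    """Combine tool.check with the legacy boolean options.

    Returns the sorted list of active check names.
    """
    checks = set(check)
    if checkchanged:
        checks.add('changed')
    if checkconflicts:
        checks.add('conflicts')
    return sorted(checks)


if __name__ == '__main__':
    print(effective_checks(checkchanged=True))
    print(effective_checks(check=['changed', 'conflicts']))
```

New check options would then only need a new name in the list, with no extra boolean setting.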