Some code branches and exceptional circumstances such as empty
mergestate files could result in mergestate._local and
mergestate._other not being defined or reset to None. These variables
are now correctly set to None when they should be.
The current situation is a bit of a layering violation as
merge-specific knowledge is pushed down to lower layers and leaks
merge assumptions into other code paths.
Here, we simply silence the warning with a hack. Both the warning and
the hack will probably go away in the near future when bid merge is
made the default.
Bid merge is a new rarely used feature that the user explicitly enabled - we
should tell/warn when the user actually is using it, just like we tell when we
not are using it.
Give a message like
note: merging 3b08d01b0ab5+ and adfe50279922 using bids from ancestors 0f6b37dbe527 and 40663881a6dd
The basic idea is to do the merge planning with all the available ancestors,
consider the resulting actions as "bids", make an "auction" and
automatically pick the most favourable action for each file.
This implements the basic functionality and will only consider "keep" and
"get" actions. The heuristics for picking the best action can be tweaked later
on.
By default it will only pass ctx.ancestor as the single ancestor to
calculateupdates. The code path for merging with a single ancestor is not
changed.
Such a 'keep' action will later be the preferred (non)action when there
is multiple ancestors. It is thus very convenient to have it explicitly.
The extra actions will only be emitted in the case where the local file has
changed since the ancestor but the other hasn't. That is the symmetrical
operation to a 'get' action.
This will create more action tuples that not really serve a purpose. The number
of actions will however have the number of changed files as upper bound and it
should thus not increase the memory/cpu use significantly.
test-merge-types.t changes a bit in flag merging. It relied on the
implementation detail that 100% identical revlog entries are reused. The revlog
reuse did that fctx.ancestor() saw an ancestor where there really not was one.
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that the format string and "%" aren't placed in the
same line.
This patch replaces "\s" in that regexp pattern with "[ \t\n]" to
detect "% inside _()" problems in such case.
"[\s\n]" can't be used in this purpose, because "\s" is automatically
replaced with "[ \t]" by "_preparepats()" and "\s" in "[]" causes
nested "[]" unexpectedly.
We can use the "other" data from the recorded merge state instead of inferring
what the other could be from working copy parent. This will allow resolve to
fulfil its duty even when the second parent have been dropped.
Most direct benefit is fixing a regression in backout.
This data is mostly redundant with the "other" changeset node (+ other changeset
file path). However, more data never hurt.
The old format do not store it so this require some dancing to add and remove it
on demand.
When we have to fallback to the old version of the file, we infer the
"other" from current working directory parent. The same way it is currently done
in the resolve command. This is know to have shortcoming… but we cannot do
better from the data contained in the old file format. This is actually the
motivation to add this new file format.
We need to record the merge we were merging with. This solve multiple
bug with resolve when dropping the second parent after a merge. This
happen a lot when doing special merge (overriding the ancestor).
Backout, shelve, rebase, etc. can takes advantage of it.
This changeset just add the information in the merge state. We'll use it in the
resolve process in a later changeset.
This new format will allow us to address common bugs while doing special merge
(graft, backout, rebase…) and record user choice during conflict resolution.
The format is open so we can add more record for future usage.
This file still store hexified version of node to help human willing to debug
it by hand. The overhead or oversize are not expected be an issue.
The old format is still used. It will be written to disk along side the newer
format. And at parse time we detect if the data from old version of the
mergestate are different from the one in the new version file. If its the same,
both have most likely be written at the same time and you can trust the extra
data from the new file. If it differs, the old file have been written by an
older version of mercurial that did not knew about the new file. In that case we
use the content of the old file.
The format of the file is unchanged. But we are preparing a new file with a new
format that would be record based. So we change all the read/write logic to
handle a list of record until a very low level. This will allow simple plugging
of the new format in the current code.
Merge could overwrite untracked files and cause data loss.
Instead we now handle the 'local side removed file and has untracked file
instead' case as the 'other side added file that local has untracked' case:
FILE: untracked file exists
abort: untracked files in working directory differ from files in requested revision
It could perhaps make sense to create .orig files when overwriting, either
instead of aborting or when overwriting anyway because of force ... but for now
we stay consistent with similar cases.
The tests shows no real changes because of this ... but there must be some
weird corner cases where using the right ancestor for the merge planning is
better than using the wrong one.
Previously, a bare update would ignore any successor changesets thus
potentially leaving you on an obsolete head. This happens commonly when there
is an old bookmark that hasn't been moved forward which is the motivating
reason for this patch series.
Now, we will check for successor changesets if two conditions hold: 1) we are
doing a bare update 2) *and* we are currently on an obsolete head.
If we are in this situation, then we calculate the branchtip of the successor
set and update to that changeset.
Tests coverage has been added.
Forgets need to be in the beginning of the action list, same as removes. This
lets us avoid clashes in the dirstate where a directory is forgotten and a
file with the same name is added, or vice versa.
hg update . (or equivalents) are effectively no-ops in just about all
circumstances. These sorts of updates can be especially common in a
bookmark-oriented workflow. This saves us a status check and a manifest
decompression, which means that on a repo with over 210,000 files, this brings
hg update . down from 2.5 seconds to 0.15.
There is one change in behavior: a file that was added, not committed, and then
deleted but not removed used to be removed from the dirstate. With this patch
it isn't. This is what causes the change in test-mq-qpush-exact.t. This seems
like it's enough of an edge case to not be worth handling.
The output of test-empty.t changes because those files are not yet created.
Previously, the error message for a dirty non-linear update was the same (and
relatively unhelpful) whether or not a rev was specified. This patch and an
upcoming one will introduce separate, more helpful hints.
To figure out what to do with locally unknown files, Mercurial attempts to read
them if they exist. When an attempt is made to read a file that exists but
traverses a symlink, Mercurial aborts.
With this patch, we first ensure that the file doesn't traverse a symlink
before opening it. This is fine because a file being "remote created" means the
symlink doesn't exist remotely, which means it will be deleted in the apply
phase.
Before this patch, case-folding collision detection uses
"copies.pathcopies()" before "manifestmerge()", and is not aware of
renaming in some cases.
For example, in the case of issue3452, "copies.pathcopies()" can't
detect renaming, if the file is renamed at the revision before common
ancestor of merging. So, "hg merge" is aborted unexpectedly on case
insensitive filesystem.
This patch fully rewrites case-folding collision detection, and
relocate it into "manifestmerge()".
New implementation uses list of actions held in "actions" and
"prompts" to build provisional merged manifest up.
Provisional merged manifest should be correct, if actions required to
build merge result up in working directory are listed up in "actions"
and "prompts" correctly.
This patch checks case-folding collision still before prompting for
merge, to avoid aborting after some interactions with users. So, this
assumes that user would choose not "deleted" but "changed".
This patch also changes existing abort message, because sorting before
collision detection changes order of checked files.
"merge.applyupdates()" sorts "actions" in removal first order, and
"workeractions" derived from it should be also sorted.
If each actions in "workeractions" are executed in serial, this
sorting ensures that merging/updating process is collision free,
because updating the file in target context is always executed after
removing the existing file which causes case-folding collision against
the former.
In the other hand, if each actions are executed in parallel, updating
on a worker process may be executed before removing on another worker
process, because "worker.partition()" partitions list of actions
regardless of type of each actions.
This patch divides "workeractions" into removing and updating, and
executes the former first.
This patch still scans "actions"/"workeractions" some times for ease
of patch review, even though large list may cost much in this way.
(total cost should be as same as before)
This also changes some tests, because dividing "workeractions" affects
progress indication.
Update to "foreground" are no longer seen as cross branch update. "Foreground"
are descendants or successors (or successors of descendants (or descendant of
successors (etc))). This allows to update with uncommited changes that get
automatically merged.
This changeset is a small step forward. We want to allow dirty update to
"background" (precursors) and takes obsolescence in account when finding the
default update destination. But those requires deeper changes and will comes in
later changesets.
This can happen when a file with flags is removed or deleted in the working
directory and also not present in m2. The obvious solution is to add a
__delitem__ override to manifestdict that removes the file from flags if
necessary, but that has a significant performance cost in some cases, e.g.
hg status --rev rev1 --rev rev2 <file>.
This patch improves manifestmerge performance significantly.
In a repository with 170,000 files, the following results were observed on a
clean working directory. Revision '.' adds one file.
hg perfmergecalculate -r .
- before: 0.41 seconds
- after: 0.13 seconds
hg perfmergecalculate -r .^
- before: 0.53 seconds
- after: 0.24 seconds
Comparing against '.' is much faster than comparing against '.^' because with
'.', the wctx and p2 manifest strings have the same identity, so comparisons
are simply pointer equality. With '.^', the strings have different identities
so we need to perform memcmps.
Any operation that uses manifestmerge benefits.
- hg update . goes from 2.04 seconds to 1.75
- hg update .^ goes from 2.52 seconds to 2.25
- hg rebase -r . -d .~6 (involves 4 merges) goes from 11.8 seconds to 10.8
When a series of commits first adds a file and then removes it,
hg rebase --collapse prompts whether to keep the file or delete it. This is
due to it reusing the branch merge code. In a noninteractive terminal it
defaults to keeping the file, which results in a collapsed commit that is
has a file that should be deleted. This bug resulted in developers accidentally
commiting unintentional changes to our repo twice today, so it's fairly
important to get fixed.
This change allows rebase --collapse to tell the merge code to accept the
latest version every time without prompting.
Adds a test as well.
If the manifest of an earlier revision on the same delta chain is read before
that of a later revision, the revlog remembers that we parsed the earlier
revision and continues applying deltas from there onwards. If manifests are
parsed the other way round, we have to start over from the fulltext.
For a fresh clone of mozilla-central, updating from 29dd80c95b7d to its parent
aab96936a177 requires approximately 400 fewer zlib.decompress calls, which
results in a speedup from 1.10 seconds to 1.05.
_forgetremoved can trigger manifest construction, but we only want it to
happen after manifestmerge, so that our attempt to read the manifests in the
right order in an upcoming patch actually works.
This has a big effect on the performance of working dir updates.
Here are the results of update from null to the given rev in several
repos, on a Linux 3.2 system with 32 cores running ext4, with the progress
extension enabled.
repo rev plain parallel speedup
hg 76a842fb67cc 0.9 0.3 3
mozilla-central fe1600b22c77 42.8 7.7 5.5
linux-2.6 9ef4b770e069 31.4 4.9 6.4
Instead of a monotonic count, getupdates yields the number of files it
has updated since it last reported, and its caller sums the numbers when
updating progress.
Once we run these updates in parallel, this will allow worker processes
to report progress less often, reducing overhead.
In an upcoming patch, we will update .hgsubstate in a non-interactive
worker process. Merges of subrepo contents will still need to occur in the
master process (since they may be interactive), so we move that code into
a place where it will always run in what will become the master process.
This replaces the _checkunknown call in calculateupdates with a more
performant version. On a repository with over 150,000 files, this speeds up an
update by 0.6-0.8 seconds, which is up to 25%.
This does not introduce any UI changes. There is existing test coverage for
every case, mostly in test-merge*.t.
Show messages at a point where the actions have been sorted, thus preparing for
backout of 14f4258e3526.
This makes manifestmerge more of a silent operation, just like 'copies' is.
Indent 'preserving' messages to make them subordinate to the action logging so
they fit in the new context. (The 'preserving' messages are quite redundant and
could also be removed completely.)
Preparing for backout of 14f4258e3526.
The number of prompts will for all relevant cases be significantly smaller than
the total number of files in the manifests. We can thus afford to sort the
prompts more than we can afford to sort the manifests.
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
The 'x' flag and the 'l' flag are very different. It is usually not a problem
to change the 'x' flag of a normal file independent of the content, but one
does not simply change the type of a file to 'l' independent of the content.
This removes the fmerge function that merged both 'x' and 'l' independent of
content early in the merge process. This correctly introduces some conflicts
instead of silent incorrect merges. 3-way flag merge will now be done in the
resolve process, right next to file content merge. Conflicts can thus be
resolved with (slightly inconvenient) resolve commands like 'resolve f --tool
internal:other'. It thus brings us closer to be able to re-solve manifest merge
after the merge and avoid prompts during merge.
This also removes the "conflicting flags for a - (n)one, e(x)ec or sym(l)ink?"
prompt that nobody could answer and that made it easy to mix symlink targets
and file contents up. Instead it will give a file merge where a sufficiently
clever merge tool can help resolving the issue.
The early prescan for move/remove and removal of moved files in applyupdates
was introduced with mergestate 5ff549be1f0c and rendered this chunk of code
irrelevant.
The impact of the chunk was reduced in 0309b0586c65 - but it could have been
removed completely.
Currently the "copy" dict contains both explicit copies/moves made by a
context and pending moves that need to happen because the other context moved
the directory the file was in. For explicit copies, the dict stores a
destination to source map, while for pending moves via directory renames, it
stores a source to destination map. The merge code uses this fact in a non-
obvious way to differentiate between these two cases.
We make this explicit by storing these pending moves in a separate dict. The
dict still has a source to destination map, but that is called out in the
docstring.
This pulls the code used to calculate the changes that need to happen
during merge.update() into a separate function. This is not useful on
its own, but is instead preparatory to performing grafts in memory
when there are no potential conflicts.
Before this patch, case-folding collision is checked simply between
manifests of each merged revisions.
So, files may be considered as colliding each other, even though one
of them is already deleted on one of merged branches: in such case,
merge causes deleting it, so case-folding collision doesn't occur.
This patch checks whether both of files colliding each other still
remain after merge or not, and ignores collision if at least one of
them is deleted by merge.
In the case that one of colliding files is deleted on one of merged
branches and changed on another, file is considered to still remain
after merge, even though it may be deleted by merge, if "deleting" of
it is chosen in "manifestmerge()".
This avoids fail to merge by case-folding collisions after choices
from "changing" and "deleting" of files.
This patch adds only tests for "removed remotely" code paths in
"_remains()", because other ones are tested by existing tests in
"test-casecollision-merge.t".
For divergent renames the following message is printed during merge:
note: possible conflict - file was renamed multiple times to:
newfile
file2
When a file is renamed in one branch and deleted in the other, the file still
exists after a merge. With this change a similar message is printed for mv+rm:
note: possible conflict - file was deleted and renamed to:
newfile