I may add an alternative way of getting copy metadata (from changelog,
not filelog) but the chaining with the dirstate copy metadata will be
the same, so it will probably help to have this extracted. Even if
that doesn't happen, the next patch will show that we can simplify
this a bit after this refactoring, so it seems worth it regardless.
Differential Revision: https://phab.mercurial-scm.org/D1697
The function would ignore the matcher if the dirstate copies were
requested. It doesn't matter in practice because all callers used the
returned map only for looking up specific files, and those files
had already been filtered by the matcher (AFAICT). Still, it's a little
confusing, so let's make it clearer by respecting the matcher in this
case too.
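As a minimal sketch of what respecting the matcher means here (this is not
the actual implementation; `filter_dirstate_copies` and the callable `match`
are hypothetical stand-ins), the dirstate copy map could be filtered like this:

```python
def filter_dirstate_copies(copies, match=None):
    """Return the {destination: source} entries whose destination matches.

    `copies` is a plain dict standing in for the dirstate copy map and
    `match` is any callable returning True for files of interest; both
    are assumptions made for illustration.
    """
    if match is None:
        # No matcher given: every copy record is kept.
        return dict(copies)
    return {dst: src for dst, src in copies.items() if match(dst)}


copies = {"docs/new.txt": "docs/old.txt", "src/b.py": "src/a.py"}
only_src = filter_dirstate_copies(copies, lambda f: f.startswith("src/"))
```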
Differential Revision: https://phab.mercurial-scm.org/D1695
The heuristics algorithm finds possible candidates for a move/copy and then checks
whether each one is actually a copy or move. In some cases there can be a lot of
candidates, which can slow down the algorithm.
This patch introduces a config option
`experimental.copytrace.movecandidateslimit` that limits the number of
candidates to check. The limit defaults to 100.
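A rough sketch of how such a limit might be applied, assuming that a file
with too many candidates is skipped entirely with a warning. The helper name,
the warning text, and the exact boundary (strictly more than the limit) are
assumptions for illustration, not the code in this patch:

```python
def candidates_to_check(filename, candidates, limit=100, warn=print):
    """Return the candidates worth checking, or [] if there are too many.

    `limit` mirrors the default of 100 for the config option; `warn` is
    any callable that accepts a message (hypothetical interface).
    """
    if len(candidates) > limit:
        warn("skipping copytracing for '%s': %d candidates exceed the limit"
             % (filename, len(candidates)))
        return []
    return list(candidates)


messages = []
few = candidates_to_check("a.txt", ["dir1/a.txt", "dir2/a.txt"],
                          limit=2, warn=messages.append)
many = candidates_to_check("b.txt", ["x/b.txt", "y/b.txt", "z/b.txt"],
                           limit=2, warn=messages.append)
```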
Thanks to Yuya for suggesting that copytracing be skipped, with a warning, for
a file that exceeds the limit.
Differential Revision: https://phab.mercurial-scm.org/D987
With in-memory merge, copy information needs to be stored in-memory, not in the
dirstate.
To make this transition easy, move the existing dirstate-based approach to
workingfilectx; that way, other implementations can choose to store it
somewhere else.
Differential Revision: https://phab.mercurial-scm.org/D1106
This patch adds documentation for the config option. The config name does not
convey much and hence documentation was required.
Differential Revision: https://phab.mercurial-scm.org/D986
The heuristics option tries the default full copytracing algorithm if both
the source and destination branches consist of non-public changesets only. But
this can be slow when we have a lot of drafts.
This patch adds a new config option experimental.copytrace.sourcecommitlimit
which defaults to 100. This value is the limit on the number of drafts from c1
to base. If there are more changesets than that, even if they are all draft, the
heuristics algorithm will be used.
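The decision described above could be sketched as follows. The function name
and the list-of-phase-names input are hypothetical, and whether a count exactly
equal to the limit still gets full copytracing is an assumption:

```python
def use_full_copytracing(phases_c1_to_base, limit=100):
    """Decide whether full copytracing is affordable.

    `phases_c1_to_base` is a list of phase names ("public", "draft",
    "secret") for the changesets from c1 down to base; this input shape
    is an assumption made for illustration.
    """
    all_nonpublic = all(p != "public" for p in phases_c1_to_base)
    # Boundary handling (<= versus <) is assumed, not taken from the patch.
    return all_nonpublic and len(phases_c1_to_base) <= limit


small_draft_stack = ["draft"] * 5
huge_draft_stack = ["draft"] * 500
mixed_stack = ["draft", "public"]
```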
Differential Revision: https://phab.mercurial-scm.org/D763
This patch adds the functionality to use the full copytracing even if
`experimental.copytrace = heuristics` in cases when drafts are involved.
This is also part of the copytrace extension in fbext.
The patch also adds tests, which are taken from fbext.
.. feature::
The `heuristics` option for `experimental.copytrace` performs full
copytracing if both source and destination branches contain non-public
changesets only.
Differential Revision: https://phab.mercurial-scm.org/D625
The copytrace extension in fb-hgext has a heuristic implementation of copy tracing
which is faster than the current copy tracing. The heuristic limits the search
for copies to just files that are either:
1) Renames in the same directory
2) Moves to another directory with the same name
The default copytrace implementation is very slow as it finds all the new files
that were added from the merge base up to the head commit and, for each file,
checks whether it was a copied or moved version of a different file.
Stash@fb did an analysis of the above heuristics on the fb repo and found that
among 2,443,768 moves/copies there are only 32,234 moves/copies which do not
fall under the above heuristics, which is approx. 0.013 of total copies (about 1.3%).
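The two heuristics above can be sketched as a candidate filter. This is only
an illustration under the assumption that paths use `/` separators; the
function name and the `missing_files` input are hypothetical:

```python
import posixpath


def heuristic_candidates(new_file, missing_files):
    """Return the missing files that could be the source of `new_file`.

    A file qualifies if it shares the directory of `new_file` (a rename
    within the same directory) or shares its basename (a move to another
    directory with the same name).
    """
    new_dir = posixpath.dirname(new_file)
    new_base = posixpath.basename(new_file)
    return [
        f for f in missing_files
        if posixpath.dirname(f) == new_dir
        or posixpath.basename(f) == new_base
    ]


found = heuristic_candidates(
    "dir2/a.txt",
    ["dir1/a.txt", "dir2/b.txt", "dir3/c.txt"],
)
```

A candidate returned here is only a possibility; it still has to be checked
to see whether it really is the source of a copy or move.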
This patch moves the heuristics algorithm under config
`experimental.copytrace=heuristics`.
While moving fbext to core, this patch removes a couple of less useful config
options named `sourcecommitlimit` and `maxmovescandidatestocheck`.
Tests are also added for the heuristics algorithm, which are basically copied
from fbext/tests/test-copytrace.t. The tests follow a pattern of creating a server
repo and then cloning it to a local repo to create public and draft changesets, a
distinction which will be useful in upcoming patches.
After this patch `experimental.copytrace` has the following behaviour:
1) `off`: turns off copytracing
2) `heuristics`: use the heuristic algorithm added in this patch.
3) everything else: use the full copytracing algorithm
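The three behaviours listed above amount to a small dispatch on the config
value. A sketch, where the function name and the returned labels are
hypothetical and stand in for calls to the real algorithms:

```python
def choose_copytrace_algorithm(copytrace_value):
    """Map the `experimental.copytrace` value to an algorithm label."""
    if copytrace_value == "off":
        return "none"          # copytracing is turned off
    if copytrace_value == "heuristics":
        return "heuristics"    # the fast heuristic algorithm
    return "full"              # everything else: full copytracing


labels = [choose_copytrace_algorithm(v) for v in ("off", "heuristics", "on")]
```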
.. feature::
A new fast heuristic algorithm for copytracing which assumes that file
moves are either::
1) Renames in the same directory
2) Moves to other directories with the same name
You can use this algorithm by setting `experimental.copytrace=heuristics`.
Differential Revision: https://phab.mercurial-scm.org/D623
We are going to introduce a new fast heuristic-based copytracing algorithm, so
let's make mergecopies the function which decides which algorithm to go with and
then calls the related function.
While I was here, I added a line in test-copy-move-merge.t saying it's a test
related to the full copytracing algorithm.
Differential Revision: https://phab.mercurial-scm.org/D622
This patch replaces experimental.disablecopytrace with experimental.copytrace.
Since the two names do not mean the same thing, the default value is also changed:
experimental.copytrace now defaults to 'on'. The new value is not a boolean because
we will now have two different algorithms (the current one and the heuristics one
to be imported from fbext), so we need more options than a boolean
provides.
The old config option is not kept; it is completely replaced, as it was under
experimental and we don't guarantee BC for experimental things.
.. bc::
The config option for copytrace `experimental.disablecopytrace` is now
replaced with `experimental.copytrace` which defaults to `on`. If you need to
turn off copytracing, add `[experimental] copytrace = off` to your config.
Differential Revision: https://phab.mercurial-scm.org/D621
This enables the optimization introduced by b8d938230143 for non-rebase cases.
Before, the match couldn't be narrowed if it was e.g. alwaysmatcher.
The logic is copied from fca0d99edf8e.
This documentation is very helpful for any developer to understand what
copytracing is and what the function does. Since this is the main function for
doing copytracing, I have also included bits about copytracing in it.
These additions are picked from a doc by Stash@fb. So thanks to him.
Differential Revision: https://phab.mercurial-scm.org/D409
dict.items() returned a list on Python 2, whereas on Python 3 it returns a
view object, so we required a workaround. Using dict.update() is better than
constructing lists as it should save us some gc churn.
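For illustration, here is the general pattern (with hypothetical names, not
the code touched by this patch). On Python 2 one could write
`dict(a.items() + b.items())`, but on Python 3 adding two view objects raises
a TypeError; copying and updating works on both and avoids building
intermediate lists:

```python
def combine_copies(first, second):
    """Merge two copy maps without concatenating item lists.

    Entries in `second` win when both maps have the same key.
    """
    combined = first.copy()
    combined.update(second)
    return combined


merged = combine_copies({"b": "a"}, {"d": "c", "b": "z"})
```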
Previously `c2` may have had an incorrect linkrev because getsrcfctx set the wrong
_descendantrev. getsrcfctx() sets the descendant rev equal to srcctx.rev() (see
_makegetfctx()), but for `c2` the descendant rev should be dstctx. While we were
lucky that it didn't break copytracing, it made it significantly slower in some
cases. It also broke some external extensions, for example remotefilelog.
Previously the added/modified placeholder hash for manifests generated from the
dirstate was a 21-byte string consisting of the p1 file hash plus a single
character to indicate an add or a modify. Normal hashes are only 20 bytes long.
This makes it complicated to implement more efficient manifest implementations
which rely on the hashes being fixed length.
Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being these 20 bytes (just like we rely on no
hash ever being the nullid).
This changes the possible behavior slightly in that the hash for all
added/modified entries in the dirstate manifest will now be the same (so simple
node comparisons would say they are equal). However, we should never have been
doing simple node comparisons on these nodes, even with the old hashes, because
they did not accurately represent the content: two files based off the same p1
file node, with different working copy contents, would have had the same hash
(even with the appended character) in the old scheme too, so we couldn't depend
on the hashes at all.
Previously the new-node placeholder hash for manifests generated from the
dirstate was a 21-byte string of "!" characters. Normal hashes are only 20
bytes long. This makes it complicated to implement more efficient manifest
implementations which rely on the hashes being fixed length.
Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being 20 "!" bytes in a row (just like we rely
on no hash ever being the nullid).
A future diff will do this for added and modified dirstate markers as well, so
we're putting the new newnodeid in node.py so there's a common place for these
placeholders.
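A small sketch of the fixed-length idea, assuming 20-byte binary hashes. The
constant and helper names below are illustrative and are not claimed to be
the exact definitions in node.py:

```python
HASH_LENGTH = 20

# The nullid is 20 zero bytes; the new-node placeholder is 20 "!" bytes.
NULL_ID = b"\0" * HASH_LENGTH
NEW_NODE_ID = b"!" * HASH_LENGTH


def is_new_node_placeholder(node):
    """Return True if `node` is the placeholder rather than a real hash."""
    if len(node) != HASH_LENGTH:
        raise ValueError("expected a %d-byte node" % HASH_LENGTH)
    return node == NEW_NODE_ID


flags = [is_new_node_placeholder(n) for n in (NEW_NODE_ID, NULL_ID)]
```

Because every entry now has the same length, a manifest implementation can
store nodes in fixed-width slots without special-casing placeholders.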
This is a fix for a regression introduced by the patches for issue4028.
The test changes are due to us doing fewer _checkcopies searches now, which
makes some test outputs revert to the pre-issue4028 behavior. That issue itself
remains fixed; we only skip copy tracing for files where it isn't relevant.
As a nice side effect, this makes copy detection much faster when tracing
backwards through lots of renames.
When working in a rotated DAG (for a graftlike merge), there can be files
that are renamed both between the base and the topological CA, and between
the TCA and the endpoint farther from the base. Such renames span the TCA
(and thus need both passes of _checkcopies to be fully detected), but may
not necessarily be divergent.
Make _checkcopies return "incomplete copies" and "incomplete divergences"
in this case, and let mergecopies recombine them once data from both passes
of _checkcopies is available.
With this patch, all known cases involving renames and grafts pass.
(Developed together with Pierre-Yves David)
As the two _checkcopies passes' ranges are separated by tca, not base,
only one of the two passes will actually encounter the base.
Pass "remotebase" to the other pass to let it know not to expect passing
over the base. This is required for handling a few unusual rename cases.
We first combine incomplete copies on the two sides of the topological CA
into complete copies.
Any leftover incomplete copies are then combined with the incomplete
divergences to reconstruct divergences spanning over the topological CA.
Finally we promote any divergences falsely flagged as incomplete to full
divergences.
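The three steps above can be sketched on a simplified data model. The dict
shapes used here are assumptions chosen for illustration and do not
necessarily match the structures used by mergecopies:

```python
def recombine(incomplete_from, incomplete_to, incomplete_diverge):
    """Recombine partial rename data found on the two sides of the TCA.

    Assumed shapes (hypothetical):
      incomplete_from:    {name at TCA: original name}
      incomplete_to:      {name at TCA: final name}
      incomplete_diverge: {original name: [name at TCA, other new name]}
    """
    copies = {}
    diverge = {}
    leftover = dict(incomplete_to)

    # Step 1: join the two halves of a rename chain into a complete copy.
    for tca_name, original in incomplete_from.items():
        if tca_name in leftover:
            copies[leftover.pop(tca_name)] = original

    for original, (tca_name, other) in incomplete_diverge.items():
        if tca_name in leftover:
            # Step 2: extend the divergence across the TCA.
            diverge[original] = [leftover[tca_name], other]
        else:
            # Step 3: nothing to extend, so promote it to a full divergence.
            diverge[original] = [tca_name, other]
    return copies, diverge


complete = recombine({"b": "a"}, {"b": "c"}, {})
spanning = recombine({}, {"b": "c"}, {"a": ["b", "d"]})
promoted = recombine({}, {}, {"a": ["b", "d"]})
```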
Right now, there is nothing generating incomplete copy/divergence data,
so this code does nothing. Changes to _checkcopies to populate these
dicts are coming later in this series.
During a graftlike merge, _checkcopies runs from ctx to tca, possibly
passing over the merge base. If there is a rename both before and after
the base, then we're actually dealing with divergent renames.
If there is no rename on the other side of tca, then the divergence is
contained entirely in the range of one _checkcopies invocation, and
should be detected "in the loop" without having to rely on the other
_checkcopies pass.
The algorithm of _checkcopies can only walk backwards in the DAG, never
forward. Because of this, the two _checkcopies passes need to run from
their respective endpoints to the TCA to cover the entire subgraph where
the merge is being performed. However, detection of files new in both
endpoints, as well as directory rename detection, need to run with respect
to the merge base, so we need lists of new files both from the TCA's and
the merge base's viewpoint to correctly detect renames in a graft-like
merge scenario.
(Series reworked by Pierre-Yves David)
This introduces a distinction between "merge base" and
"topological common ancestor". During a regular merge, these two are
identical. Graft, however, performs a merge in a rotated DAG, where the
merge base will not be a common ancestor at all in the
original DAG.
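To make the distinction concrete, here is a toy DAG (revision numbers and
helper names are made up for illustration). Grafting revision 4 onto
revision 2 uses the parent of 4 as the merge base, and that parent is not an
ancestor of the destination:

```python
def ancestors(dag, rev):
    """Return the set of ancestors of `rev`, including `rev` itself."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(dag[r])
    return seen


# 0 -- 1 -- 2   (2 is the graft destination)
#  \
#   3 -- 4      (4 is the changeset being grafted)
dag = {0: [], 1: [0], 2: [1], 3: [0], 4: [3]}

common = ancestors(dag, 2) & ancestors(dag, 4)
graft_base = dag[4][0]  # parent of the grafted changeset
base_is_common_ancestor = graft_base in common
```

In a regular merge of 2 and 4 the merge base would be revision 0, which is
also the topological common ancestor.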
To correctly find copies in case of a graft, we need to take both the
merge base and the topological CA into account, and track any renames
between them in reverse. Fortunately we can detect this in advance;
see the comment in the code about "backwards".
This patch only supports finding non-divergent renames contained entirely
between the merge base and the topological CA. Further patches are coming
to support more complex cases.
(Pierre-Yves David was involved in the cleanup of this patch.)
Right now, nothing changes as a result of this, but we want to handle
grafts differently from ordinary merges later.
(Series developed together with Pierre-Yves David)
When grafting a copy backwards through a rename, a copy is wrongly detected,
which causes the graft to be applied inappropriately, in a destructive way.
Make sure that the old file name really exists in the common ancestor,
and bail out if it doesn't.
This fixes the aggravated case of bug 5343, although the basic issue
(failure to duplicate the copy information) still occurs.
This variable was named after the common ancestor. It is actually the merge
base, which might differ from the common ancestor in the graft case. We rename the
variable before a larger refactoring to clarify the situation. A similar rename
was also applied to 'checkcopies' in a prior changeset.
The 'movewithdir' variable had a lot of related logic spread all around 'mergecopies'.
However, it never actually contains anything until the very last loop in
that function. We move the (simplified) variable definition there for clarity