Commit Graph

156 Commits

Author SHA1 Message Date
Martin von Zweigbergk
55f777a236 copies: group wdir-handling in one place
I think this makes it both easier to follow and shorter.

Differential Revision: https://phab.mercurial-scm.org/D1698
2017-12-14 00:25:03 -08:00
Martin von Zweigbergk
c3a801b87d copies: extract method for getting non-wdir forward copies
I may add an alternative way of getting copy metadata (from changelog,
not filelog) but the chaining with the dirstate copy metadata will be
the same, so it will probably help to have this extracted. Even if
that doesn't happen, the next patch will show that we can simplify
this a bit after this refactoring, so it seems worth it regardless.

Differential Revision: https://phab.mercurial-scm.org/D1697
2017-12-14 00:18:38 -08:00
Martin von Zweigbergk
ffa0e63e83 copies: consistently use """ for docstrings
Differential Revision: https://phab.mercurial-scm.org/D1696
2017-12-14 08:27:22 -08:00
Martin von Zweigbergk
3bd0ca942e copies: always respect matcher arg to _forwardcopies()
The function would ignore the matcher if the dirstate copies were
requested. It doesn't matter in practice because all callers used the
returned map only for looking up specific files from and those files
had already been filtered by the matcher (AFACT). Still, it's a little
confusing, so let's make it clearer by respecting the matcher in this
case too.

Differential Revision: https://phab.mercurial-scm.org/D1695
2017-12-11 10:24:38 -08:00
Pulkit Goyal
98c13e4c70 copies: add a config to limit the number of candidates to check in heuristics
The heuristics algorithm find possible candidates for move/copy and then check
whether they are actually a copy or move. In some cases, there can be lot of
candidates possible which can actually slow down the algorithm.

This patch introduces a config option
`experimental.copytrace.movecandidateslimit` using which one can limit the
candidates to check. The limit defaults to 100.

Thanks to Yuya for suggesting to skip copytracing for that file with a
warning.

Differential Revision: https://phab.mercurial-scm.org/D987
2017-10-10 02:25:03 +05:30
Phil Cohen
a04280b2f9 context: add workingfilectx.markcopied
With in-memory merge, copy information needs to be stored in-memory, not in the
dirstate.

To make this transition easy, move the existing dirstate-based approach to
workingfilectx; that way, other implementations can choose to store it
somewhere else.

Differential Revision: https://phab.mercurial-scm.org/D1106
2017-10-15 20:36:29 -07:00
Pulkit Goyal
2251d97e92 copies: add docs for config experimental.copytrace.sourcecommitlimit
This patch adds documentation for the config option. The config name does not
convey much and hence documentation was required.

Differential Revision: https://phab.mercurial-scm.org/D986
2017-10-08 04:39:42 +05:30
Yuya Nishihara
183298cb9f copytrace: use ctx.mutable() instead of adhoc constant of non-public phases 2017-09-22 22:45:02 +09:00
Pulkit Goyal
2c8e44b5f7 py3: explicitly convert dict.keys() and dict.items() into a list
Differential Revision: https://phab.mercurial-scm.org/D853
2017-09-30 15:45:15 +05:30
Pulkit Goyal
ee400fb169 copytrace: add a a new config to limit the number of drafts in heuristics
The heuristics options tries to the default full copytracing algorithm if both
the source and destination branches contains of non-public changesets only. But
this can be slow in cases when we have a lot of drafts.

This patch adds a new config option experimental.copytrace.sourcecommitlimit
which defaults to 100. This value will be the limit of number of drafts from c1
to base. Incase there are more changesets even though they are draft, the
heuristics algorithm will be used.

Differential Revision: https://phab.mercurial-scm.org/D763
2017-09-21 15:58:44 +05:30
Pulkit Goyal
8a6be941c9 copytrace: use the full copytracing method if only drafts are involved
This patch adds the functionality to use the full copytracing even if
`experimental.copytrace = heuristics` in cases when drafts are involved.

This is also a part of copytrace extension in fbext.

This also adds tests which are also taken from fbext.

.. feature::

   The `heuristics` option for `experimental.copytrace` performs full
   copytracing if both source and destination branches contains non-public
   changsets only.

Differential Revision: https://phab.mercurial-scm.org/D625
2017-09-03 20:06:45 +05:30
Pulkit Goyal
a5baff1381 copytrace: move fast heuristic copytracing algorithm to core
copytrace extension in fb-hgext has a heuristic implementation of copy tracing
which is faster than the current copy tracing. The heuristic limits the search
of copies to just files that are either:

1) Renames in the same directory
2) Moved to other directory with same name

The default copytrace implementation is very slow as it finds all the new files
that were added from merge base up to the head commit and for each file it
checks whether it this was copied or moved version of a different file.

Stash@fb did analysis for the above heuristics on the fb repo and found that
among 2,443,768 moves/copies there are only 32,234 moves/copies which does not
fall under the above heuristics which is approx. 0.013 of total copies.

This patch moves the heuristics algorithm under config
`experimental.copytrace=heuristics`.

While moving fbext to core, this patch removes couple of less useful config
options named `sourcecommitlimit` and `maxmovescandidatestocheck`.

Tests are also added for the heuristics algorithm, which are basically copied
from fbext/tests/test-copytrace.t. The tests follow a pattern creating a server
repo and then cloning to a local repo to create public and draft changesets, the
distinction which will be useful in upcoming patches.

After this patch `experimental.copytrace` has the following behaviour:

1) `off`: turns off copytracing
2) `heuristics`: use the heuristic algorithm added in this patch.
3) everything else: use the full copytracing algorithm

.. feature::

   A new fast heuristic algorithm for copytracing which assumes that the files
   moves are either::
   1) Renames in the same directory
   2) Moves in other directories with same names
   You can use this algorithm by setting `experimental.copytrace=heuristics`.

Differential Revision: https://phab.mercurial-scm.org/D623
2017-09-03 03:49:15 +05:30
Pulkit Goyal
b8929368f2 copytrace: move the default copytracing algorithm in a new function
We are going to introduce a new fast heuristic based copytracing algorithm, so
lets make mergecopies the function which decides which algorithm to go with and
then calls the related function.

While I was here, I add a line in test-copy-move-merge.t saying its a test
related to the full copytracing algorithm.

Differential Revision: https://phab.mercurial-scm.org/D622
2017-09-03 02:34:01 +05:30
Pulkit Goyal
59fe14130c copytrace: replace experimental.disablecopytrace config with copytrace (BC)
This patch replaces experimental.disablecopytrace with experimental.copytrace.
Since the words does not means the same, the default value is also changed. Now
experimental.copytrace defaults to 'on'. The new value is not boolean value as
we will be now having two different algorithms (current one and heuristics one
to be imported from fbext) so we need this to be have more options than
booleans.

The old config option is not kept is completely replaced as that was under
experimental and we don't gurantee BC to experimental things.

.. bc::

   The config option for copytrace `experimental.disablecopytrace` is now
   replaced with `experimental.copytrace` which defaults to `on`. If you need to
   turn off copytracing, add `[experimental] copytrace = off` to your config.

Differential Revision: https://phab.mercurial-scm.org/D621
2017-09-03 01:52:19 +05:30
Gábor Stefanik
2431ab3a7d copies: fix misaligned lines 2017-08-22 16:16:39 +02:00
Gábor Stefanik
a4ce7f6a87 copies: fix typo in comment
"will not be limited" was meant to be "will not be visited". I missed this
when writing the original graft-through-rename patch series.
2017-08-22 16:08:31 +02:00
Yuya Nishihara
5b29c5b3bd copies: use intersectmatchers() in non-merge p1 optimization
This enables the optimization introduced by b8d938230143 for non-rebase cases.
Before, the match couldn't be narrowed if it was e.g. alwaysmatcher.

The logic is copied from fca0d99edf8e.
2017-08-19 11:23:33 +09:00
Pulkit Goyal
304e4abf3a copies: add more details to the documentation of mergecopies()
This documentation is very helpful for any developer to understand what
copytracing is and what the function does. Since this is the main function of
doing copytracing, I have also included bits about copytracing in it.

This additions are picked from a doc by Stash@Fb. So thanks to him.

Differential Revision: https://phab.mercurial-scm.org/D409
2017-08-16 00:25:20 +05:30
Pulkit Goyal
b73ca160f7 py3: use dict.update() instead of constructing lists and adding them
dict.items() returned a list on Python 2 and whereas on Python 3 it returns a
view object. So we required a work around. Using dict.update() is better then
constructing lists as it should save us on gc churns.
2017-06-01 01:14:02 +05:30
Stanislau Hlebik
8513dd5f4b copies: introduce getdstfctx
Previously `c2` may had an incorrect linkrev because getsrcfctx set wrong
_descendantrev. getsrcfctx() sets descendant rev equals to srcctx.rev() (see
_makegetfctx()), but for `c2` descendant rev should be dstctx. While we were
lucky it didn't broke copytracing it made it significantly slower in some
cases. Besides it broke some external extensions, for example remotefilelog.
2017-05-29 06:06:13 -07:00
Stanislau Hlebik
52f5b22254 copies: rename getfctx to getsrcfctx
In the next patch we'll use getdstfctx. Let's rename getfctx to getsrcfctx in
this patch.
2017-05-29 05:58:08 -07:00
Stanislau Hlebik
789df611a1 copies: remove msrc and mdst parameters
This function already has lots of parameters. And we can get manifests from
contexts. So let's get msrc and mdst parameters from srcctx and dstctx.
2017-05-29 05:57:25 -07:00
Stanislau Hlebik
d7b30bb1f8 copies: add dstctx parameter
Add parameter with destination context
2017-05-29 05:57:03 -07:00
Stanislau Hlebik
92e5d5f67c copies: rename ctx to srcctx
In the next diff we'll pass new dstctx parameter. Let's rename ctx to srcctx in
this patch.
2017-05-29 05:56:17 -07:00
Stanislau Hlebik
aa25365b25 copies: rename m2 to mdst
Small refactoring to rename m2 to more clearer mdst.
2017-05-29 05:52:15 -07:00
Stanislau Hlebik
219c6a051d copies: rename m1 to msrc
Small refactoring that renames `m1` parameter name to a more clearer name
`msrc`.
2017-05-29 05:52:15 -07:00
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Durham Goode
3058681421 copies: remove use of manifest.matches
Convert the existing use of manifest.matches to use the new api. This is part
of getting rid of manifest.matches, since it is O(manifest).
2017-03-07 09:56:11 -08:00
Gábor Stefanik
b631d16eab graft: support grafting changes to new file in renamed directory (issue5436) 2016-12-05 17:40:01 +01:00
Durham Goode
fb55c2fbf3 dirstate: change added/modified placeholder hash length to 20 bytes
Previously the added/modified placeholder hash for manifests generated from the
dirstate was a 21byte long string consisting of the p1 file hash plus a single
character to indicate an add or a modify. Normal hashes are only 20 bytes long.
This makes it complicated to implement more efficient manifest implementations
which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being these 20 bytes (just like we rely on no
hash every being the nullid).

This changes the possible behavior slightly in that the hash for all
added/modified entries in the dirstate manifest will now be the same (so simple
node comparisons would say they are equal), but we should never be doing simple
node comparisons on these nodes even with the old hashes, because they did not
accurately represent the content (i.e. two files based off the same p1 file
node, with different working copy contents would have the same hash (even with
the appended character) in the old scheme too, so we couldn't depend on the
hashes period).
2016-11-10 02:19:16 -08:00
Durham Goode
03d313b5fd dirstate: change placeholder hash length to 20 bytes
Previously the new-node placeholder hash for manifests generated from the
dirstate was a 21byte long string of "!" characters. Normal hashes are only 20
bytes long.  This makes it complicated to implement more efficient manifest
implementations which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being 20 "!" bytes in a row (just like we rely
on no hash ever being the nullid).

A future diff will do this for added and modified dirstate markers as well, so
we're putting the new newnodeid in node.py so there's a common place for these
placeholders.
2016-11-10 02:17:22 -08:00
Gábor Stefanik
5533b05a12 merge: avoid superfluous filemerges when grafting through renames (issue5407)
This is a fix for a regression introduced by the patches for issue4028.

The test changes are due to us doing fewer _checkcopies searches now, which
makes some test outputs revert to the pre-issue4028 behavior. That issue itself
remains fixed, we only skip copy tracing for files where it isn't relevant.
As a nice side effect, this makes copy detection much faster when tracing
backwards through lots of renames.
2016-10-25 21:01:53 +02:00
Gábor Stefanik
14dc42e666 copies: improve assertions during copy recombination
- Make sure there is nothing to recombine in non-graftlike scenarios
- More pythonic assert syntax
2016-10-18 02:09:08 +02:00
Gábor Stefanik
2f48be6841 copies: make _checkcopies handle copy sequences spanning the TCA (issue4028)
When working in a rotated DAG (for a graftlike merge), there can be files
that are renamed both between the base and the topological CA, and between
the TCA and the endpoint farther from the base. Such renames span the TCA
(and thus need both passes of _checkcopies to be fully detected), but may
not necessarily be divergent.

Make _checkcopies return "incomplete copies" and "incomplete divergences"
in this case, and let mergecopies recombine them once data from both passes
of _checkcopies is available.

With this patch, all known cases involving renames and grafts pass.

(Developed together with Pierre-Yves David)
2016-10-11 04:39:47 +02:00
Gábor Stefanik
c03c8792d5 checkcopies: add logic to handle remotebase
As the two _checkcopies passes' ranges are separated by tca, not base,
only one of the two passes will actually encounter the base.
Pass "remotebase" to the other pass to let it know not to expect passing
over the base. This is required for handling a few unusual rename cases.
2016-10-11 04:25:59 +02:00
Gábor Stefanik
912f58ada1 mergecopies: add logic to process incomplete data
We first combine incomplete copies on the two sides of the topological CA
into complete copies.
Any leftover incomplete copies are then combined with the incomplete
divergences to reconstruct divergences spanning over the topological CA.
Finally we promote any divergences falsely flagged as incomplete to full
divergences.

Right now, there is nothing generating incomplete copy/divergence data,
so this code does nothing. Changes to _checkcopies to populate these
dicts are coming later in this series.
2016-10-04 12:51:54 +02:00
Gábor Stefanik
242c4897e8 checkcopies: handle divergences contained entirely in tca::ctx
During a graftlike merge, _checkcopies runs from ctx to tca, possibly
passing over the merge base. If there is a rename both before and after
the base, then we're actually dealing with divergent renames.
If there is no rename on the other side of tca, then the divergence is
contained entirely in the range of one _checkcopies invocation, and
should be detected "in the loop" without having to rely on the other
_checkcopies pass.
2016-10-12 11:54:03 +02:00
Gábor Stefanik
d967d939d6 mergecopies: invoke _computenonoverlap for both base and tca during merges
The algorithm of _checkcopies can only walk backwards in the DAG, never
forward. Because of this, the two _checkcopies patches need to run from
their respective endpoints to the TCA to cover the entire subgraph where
the merge is being performed. However, detection of files new in both
endpoints, as well as directory rename detection, need to run with respect
to the merge base, so we need lists of new files both from the TCA's and
the merge base's viewpoint to correctly detect renames in a graft-like
merge scenario.

(Series reworked by Pierre-Yves David)
2016-10-13 02:19:43 +02:00
Pierre-Yves David
cce3e9c3ad copies: make it possible to distinguish betwen _computenonoverlap invocations
_computenonoverlap needs to be invoked twice during a graft, and debugging
messages should be distinguishable between the two invocations
2016-10-18 00:00:43 +02:00
Gábor Stefanik
7730b47e09 copies: make _checkcopies handle simple renames in a rotated DAG
This introduces a distinction between "merge base" and
"topological common ancestor". During a regular merge, these two are
identical. Graft, however, performs a merge in a rotated DAG, where the
merge base will not be a common ancestor at all in the
original DAG.

To correctly find copies in case of a graft, we need to take both the
merge base and the topological CA into account, and track any renames
between them in reverse. Fortunately we can detect this in advance,
see comment in the code about "backwards".

This patch only supports finding non-divergent renames contained entirely
between the merge base and the topological CA. Further patches are coming
to support more complex cases.

(Pierre-Yves David was involved in the cleanup of this patch.)
2016-10-13 02:03:54 +02:00
Gábor Stefanik
60bab1ec6c copies: compute a suitable TCA if base turns out to be unsuitable
This will be used later in an update to _checkcopies.

(Pierre-Yves David was involved in the cleanup of this patch.)
2016-10-13 02:03:49 +02:00
Gábor Stefanik
6250f7ff54 copies: detect graft-like merges
Right now, nothing changes as a result of this, but we want to handle
grafts differently from ordinary merges later.

(Series developed together with Pierre-Yves David)
2016-10-13 01:47:33 +02:00
Gábor Stefanik
4adc2f1a6a checkcopies: add a sanity check against false-positive copies
When grafting a copy backwards through a rename, a copy is wrongly detected,
which causes the graft to be applied inappropriately, in a destructive way.
Make sure that the old file name really exists in the common ancestor,
and bail out if it doesn't.

This fixes the aggravated case of bug 5343, although the basic issue
(failure to duplicate the copy information) still occurs.
2016-10-12 21:33:45 +02:00
Pierre-Yves David
604c8243a9 mergecopies: rename 'ca' to 'base'
This variable was named after the common ancestor. It is actually the merge
base that might differ from the common ancestor in the graft case. We rename the
variable before a larger refactoring to clarify the situation. Similar rename
was also applied to 'checkcopies' in a prior changeset.
2016-10-13 01:30:14 +02:00
Pierre-Yves David
4eabc75da9 copies: move variable document from checkcopies to mergecopies
It appears that 'mergecopies' is the function consuming these data so we move
the documentation there.
2016-10-13 01:26:33 +02:00
Pierre-Yves David
9df147eb63 checkcopies: pass data as a dictionary of dictionaries
more are coming
2016-10-11 02:21:42 +02:00
Pierre-Yves David
80ed73689f checkcopies: move 'movewithdir' initialisation right before its usage
The 'movewithdir' had a lot of related logic all around the 'mergecopies'.
However it is actually never containing anything until the very last loop in
that function. We move the (simplified) variable definition there for clarity
2016-10-11 02:15:23 +02:00
Pierre-Yves David
163070ae3d checkcopies: extract the '_related' closure
There is not need for it to be a closure.
2016-10-11 01:29:08 +02:00
Pierre-Yves David
2d670597fe checkcopies: add an inline comment about the '_related' call
This helps understanding the flow of the function.
2016-10-08 23:00:55 +02:00
Pierre-Yves David
71b0c4ef9c checkcopies: minor change to comment
This helped me understand the refactoring so this must be helpful.
2016-10-08 19:03:16 +02:00