110 Commits

Author SHA1 Message Date
Pierre-Yves David
80ed73689f checkcopies: move 'movewithdir' initialisation right before its usage
The 'movewithdir' variable had a lot of related logic scattered all around 'mergecopies'.
However, it never actually contains anything until the very last loop in
that function. We move the (simplified) variable definition there for clarity.
2016-10-11 02:15:23 +02:00
Pierre-Yves David
163070ae3d checkcopies: extract the '_related' closure
There is no need for it to be a closure.
2016-10-11 01:29:08 +02:00
Pierre-Yves David
2d670597fe checkcopies: add an inline comment about the '_related' call
This helps with understanding the flow of the function.
2016-10-08 23:00:55 +02:00
Pierre-Yves David
71b0c4ef9c checkcopies: minor change to comment
This helped me understand the refactoring so this must be helpful.
2016-10-08 19:03:16 +02:00
Pierre-Yves David
1f966aa892 checkcopies: rename 'ca' to 'base'
This variable was named after the common ancestor. It is actually the merge
base that might differ from the common ancestor in the graft case. We rename the
variable before a larger refactoring to clarify the situation.
2016-10-08 18:38:42 +02:00
Gábor Stefanik
8d5c0019c5 copies: don't record divergence for files needing no merge
This is left over from when _checkcopies was factored out from mergecopies.

The 2nd break has "of = None" before it, so it's a functionally equivalent
change. The 1st one, however, causes a divergence to be recorded when
a file has been renamed, but there is nothing to be merged to it.

This is currently harmless, since the extra divergence is simply ignored
later. However, the new _checkcopies introduced in the rest of this series
does more than just record a divergence after completing the main loop,
and it's important that the "post-processing" stage is really skipped
for no-merge-needed renames.
2016-10-03 13:29:59 +02:00
Gábor Stefanik
5a7c7889a5 copies: mark checkcopies as internal with the _ prefix 2016-10-03 13:24:56 +02:00
Gábor Stefanik
6f3ec7c4c3 copies: split u1/u2 to u1u/u2u and u1r/u2r
These will be made different in case of grafts by another patch in this series.
2016-10-03 13:23:19 +02:00
Gábor Stefanik
88c9737beb copies: style fixes and add comment 2016-10-03 13:18:31 +02:00
Gábor Stefanik
846bad50ac copies: limit is an optimization, and doesn't provide guarantees 2016-10-03 16:19:55 +02:00
timeless
a1cb3173a2 py3: convert to next() function
next(..) was introduced in py2.6 and .next() is not available in py3

https://docs.python.org/2/library/functions.html#next
2016-05-16 21:30:53 +00:00
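The portable spelling can be illustrated in plain Python. This is a minimal sketch, not Mercurial code; the helper names first_or_default and pairwise are hypothetical.

```python
def first_or_default(iterable, default=None):
    """Return the first item of iterable, or default if it is empty."""
    it = iter(iterable)
    # The next() builtin works on Python 2.6+ and Python 3; it.next() is
    # Python 2 only. The optional second argument avoids StopIteration.
    return next(it, default)


def pairwise(iterable):
    """Yield consecutive (previous, current) pairs, advancing with next()."""
    it = iter(iterable)
    try:
        prev = next(it)  # portable replacement for it.next()
    except StopIteration:
        return
    for item in it:
        yield prev, item
        prev = item
```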
Durham Goode
6f7f581f5f copies: optimize forward copy detection logic for rebases
Forward copy detection (i.e. detecting what files have been moved/copied in
commit X since ancestor Y) previously required diff'ing the manifests of both X
and Y. This was expensive since it required reading both entire manifests and
doing a set difference (they weren't already in a set because of the
lazymanifest work). This cost almost 1 second on very large repositories, and
happens N times for a rebase of N commits.

This patch optimizes it for the case of rebase. In a rebase, we are comparing a
commit against its immediate parent, and therefore we can know what files
changed by looking at ctx.files(). This lets us drastically decrease the size
of the set comparison, and makes it O(# of changes) instead of O(size of
manifest). This makes it take 1ms instead of 1000ms.
2016-02-05 13:23:24 -08:00
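The shortcut can be sketched with plain dicts standing in for manifests (path to file node) and a list standing in for ctx.files(). These stand-ins and the function names are assumptions for illustration, not Mercurial's API, and the shortcut is only valid when the parent is the commit's immediate parent.

```python
def added_by_full_diff(child, parent):
    """Files in child but not in parent; visits every child manifest entry."""
    return {f for f in child if f not in parent}


def added_by_changed_files(child, parent, changed):
    """Same result when parent is child's immediate parent.

    A file can only be new in child if the commit touched it, so only the
    commit's changed-file list is examined: O(len(changed)) lookups.
    """
    return {f for f in changed if f in child and f not in parent}


# Toy manifests (path -> file node) for a commit that renames b to c.
parent = {"a": "n1", "b": "n2"}
child = {"a": "n1", "c": "n2"}
changed = ["b", "c"]  # hypothetical stand-in for ctx.files()
```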
Matt Mackall
a8fcfbf03d copies: fix detection of divergent directory renames
If we move all the files out of one directory, but into two different
directories, we should not consider it a directory rename. The
detection of this case was broken.
2016-01-13 10:10:05 -06:00
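The rule can be sketched as follows. This is a simplified illustration with hypothetical names, not the code from the commit: it infers directory renames from file renames and permanently invalidates a source directory once its files are seen going to two different destinations.

```python
import posixpath


def detect_dir_moves(renames, surviving_dirs):
    """Infer directory renames from file renames (simplified sketch).

    renames: dict mapping old path -> new path.
    surviving_dirs: directories that still exist on the renaming side.
    """
    invalid = set()
    dirmove = {}
    for src, dst in renames.items():
        dsrc, ddst = posixpath.dirname(src), posixpath.dirname(dst)
        if dsrc in invalid:
            continue  # already known not to be a directory rename
        if dsrc in surviving_dirs:
            invalid.add(dsrc)  # directory still has files: not a move
        elif dsrc in dirmove and dirmove[dsrc] != ddst:
            # files went to two different directories: divergent, drop it
            invalid.add(dsrc)
            del dirmove[dsrc]
        else:
            dirmove[dsrc] = ddst
    return dirmove
```

Keeping the invalid set matters: without it, a third file moving from a/ to b/ after the divergence would wrongly re-create the a -> b entry.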
Mads Kiilerich
09567db49a spelling: trivial spell checking 2015-10-17 00:58:46 +02:00
Matt Mackall
3b6391ff9a copies: group bothnew with other sets 2015-08-19 15:40:13 -05:00
Matt Mackall
a4a77851e3 copies: rename renamedelete to renamedeleteset for clarity 2015-08-19 15:32:27 -05:00
Matt Mackall
505912bcc2 copies: move _makegetfctx calls into checkcopies 2015-08-19 15:26:08 -05:00
Matt Mackall
babc6236a3 copies: factor out setupctx into _makegetfctx
This reduces the scope of mergecopies a bit
2015-08-19 15:17:33 -05:00
Matt Mackall
6be7bd49e6 copies: avoid reference to c1/c2 in makectx 2015-08-21 15:12:58 -05:00
Matt Mackall
846cb052bb copies: move debug statement to appropriate place 2015-08-19 15:11:17 -05:00
Matt Mackall
06dcd73aae copies: rename diverge2 to divergeset for clarity 2015-08-19 14:04:54 -05:00
Matt Mackall
cf71f5908f copies: begin separating mergecopies sides 2015-08-19 13:40:18 -05:00
Matt Mackall
a37c0c906b copies: rename ctx() to getfctx() for clarity 2015-08-19 13:09:54 -05:00
Durham Goode
1bef6fef82 copy: add flag for disabling copy tracing
Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in
large repos (hundreds of thousands of files).  This provides the option of
turning off the majority of copy tracing. It does not turn off _forwardcopies()
since that is used to carry copy information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

The reason to have a flag instead of trying to fix the actual copy tracing
performance is that copy tracing is fundamentally an O(number of files in the
repo) operation.  In order to know if file X in the rebase source was copied
anywhere, we have to walk the filelog for every new file that exists in the
rebase destination (i.e. a file in the destination that is not in the common
ancestor).  Without an index that lets us trace forward (i.e. from file Y in the
common ancestor forward to the rebase destination), it will never be an O(number
of changes in my branch) operation.

In mozilla-central, rebasing a 3 commit stack across 20,000 revs goes from 39s
to 11s.
2015-01-27 11:26:27 -08:00
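Why the cost scales with the repository rather than with the rebased branch can be seen in a simplified sketch: every file that is new in the destination has its copy history walked, however few files the rebased commits touch. Sets stand in for manifests and a flattened copy-source dict stands in for filelog copy metadata; all names here are hypothetical.

```python
def trace_copies_to_base(dest_files, base_files, copy_source):
    """For every file new in the destination, follow its copy history back
    until reaching a file that exists in the common ancestor.

    copy_source: dict path -> path it was copied/renamed from (hypothetical).
    Returns dict new path -> original path in base.
    """
    result = {}
    for f in sorted(dest_files - base_files):  # one history walk per new file
        seen = set()
        src = copy_source.get(f)
        while src is not None and src not in seen:
            if src in base_files:
                result[f] = src
                break
            seen.add(src)
            src = copy_source.get(src)
    return result
```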
Gregory Szorc
9440bd18c5 copies: use absolute_import 2015-08-08 00:41:13 -07:00
Matt Mackall
fc81d2a796 merge with stable 2015-05-26 14:52:47 -05:00
Matt Mackall
bd4df663a9 mergecopies: avoid slowdown from linkrev adjustment (issue4680)
checkcopies was using fctx.rev() which it was expecting would be
equivalent to linkrev() but was triggering the new _adjustlinkrev path.
This was making grafts and merges with large sets of potential copies
very expensive.
2015-05-26 06:45:18 -05:00
Martin von Zweigbergk
68e09d510f copies: document hack for adding '' to set of dirs
The root directory is not normally added to 'dirs' instances (although
I think it should be). In copies.mergecopies, we call dirname() to get
the directory of a path and then check for containment in the 'dirs'
instances ('d1' and 'd2'). In order to easily handle files in the root
directory, '/' is added to d1/d2. This results in the empty string
being added to the sets, since what comes before the slash in '/' is
an empty string. This seems less than obvious, so let's document it.
2015-05-22 14:02:04 -07:00
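The hack can be reproduced with a small model of directory sets. finddirs, dirname and dirset below are simplified, hypothetical re-implementations for illustration, not Mercurial's pathutil code.

```python
def finddirs(path):
    """Yield every parent directory of path, deepest first.
    For 'a/b/c' this yields 'a/b' then 'a'; for '/' it yields ''."""
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


def dirname(path):
    """Directory part of a repository-relative path ('' for root files)."""
    pos = path.rfind("/")
    return "" if pos == -1 else path[:pos]


def dirset(paths):
    """Set of all parent directories of the given paths (root not included)."""
    dirs = set()
    for p in paths:
        dirs.update(finddirs(p))
    return dirs
```

A root-level file has dirname '', which is not in the set until '/' is added, because the only "directory" before the slash in '/' is the empty string.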
Durham Goode
586027b77d copies: switch to using pathutil.dirname
copies had its own dirname implementation. Now that pathutil has a common one,
let's use that instead.
2015-05-22 12:58:27 -07:00
Durham Goode
610230ad03 copies: add matcher parameter to copy logic
This allows passing a matcher down the pathcopies() stack to _forwardcopies().
This will let us add logic in a later patch to avoid tracing copies when not
necessary (like when doing hg diff -r 1 -r 2 foo.txt).
2015-04-16 11:29:30 -07:00
Durham Goode
ce308b547b copies: pass changectx instead of manifest to _computenonoverlap
The _computenonoverlap function takes two manifests to allow extensions to hook
in and read the manifest nodes produced by the function. The remotefilelog
extension actually needs the entire changectx instead (which includes the
manifest) so it can prefetch the subset of files necessary for a sparse checkout
(and the sparse checkout depends on which commit is being accessed, hence the
need for the changectx).

I have tests in the remotefilelog extension that cover this.
2015-04-03 15:18:34 -07:00
Matt Mackall
db55434dfb merge with stable 2015-03-20 17:30:38 -05:00
Pierre-Yves David
41927328f0 mergecopies: reuse ancestry context when traversing file history (issue4537)
Merge copies traverses file history in search of copies and renames.
Since 3.3, we do "linkrev adjustment" to ensure that duplicated filelog entries
do not confuse the traversal. This "linkrev adjustment" involves ancestry
testing and walking in the changeset graph. If we do such a walk in the changeset
graph for each file, we end up with an 'O(<changesets>x<files>)' complexity
that creates massive issues. For example, grafting a changeset in Mozilla's repo
went from 6 seconds to more than 3 minutes.

There is a mechanism to reuse such ancestor computations across all files, but
it has to be set up manually in situations where it makes sense to take such a
shortcut. This changeset sets up this mechanism and brings the graft time back
from 3 minutes to 8 seconds.

To do so, we need more control over the way 'filectx' objects are instantiated
during each 'checkcopies' call that 'mergecopies' makes. We add a new 'setupctx'
function that configures and returns a 'filectx' factory. The function makes sure
the ancestry context is properly created, and the factory makes sure it is
properly installed on the returned 'filectx' objects.
2015-03-20 00:30:35 -07:00
Matt Mackall
ab01fb226c copies: use linkrev for file tracing limit
This lets us lazily evaluate _adjustlinkrev.
2015-02-01 16:25:12 -06:00
Pierre-Yves David
e94f338ab6 _adjustlinkrev: reuse ancestors set during rename detection (issue4514)
The new linkrev adjustment mechanism makes rename detection very slow, because
each file rewalks the ancestor dag. To mitigate the issue in Mercurial 3.3, we
introduce a simplistic way to share the ancestors computation for the linkrev
validation phase.

We can reuse the ancestors in that case because we do not care about
sub-branching in the ancestors graph.

The cached set will be used to check whether the linkrev is valid in the search
context. This covers the vast majority of ancestor usage during the copy search,
since the uncached walk is only needed when the linkrev is invalid, which is
hopefully rare.
2015-01-30 16:02:28 +00:00
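The caching idea can be sketched as: build the ancestor set of the starting revision lazily, once, and reuse it for every file's linkrev check. The DAG representation (revision to list of parents) and all names are hypothetical; Mercurial's actual mechanism differs in detail.

```python
def ancestors(parents, rev):
    """All ancestors of rev (inclusive) in a DAG given as rev -> parent list."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, ()))
    return seen


class AncestorCache:
    """Compute the ancestor set of the search start once and share it
    across every file whose linkrev must be validated."""

    def __init__(self, parents, start):
        self.parents = parents
        self.start = start
        self._set = None
        self.computations = 0  # how many full DAG walks were done

    def contains(self, rev):
        if self._set is None:
            self._set = ancestors(self.parents, self.start)
            self.computations += 1
        return rev in self._set


def linkrev_is_valid(cache, linkrev):
    # A linkrev is usable as-is if it is an ancestor of the search start;
    # otherwise (hopefully rarely) an uncached adjustment walk is needed.
    return cache.contains(linkrev)
```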
Durham Goode
f07fec4ff0 copies: added manifests to computenonoverlap
Commit d1f83f500b47 changed the computenonoverlap API to no longer require the
manifests. We actually need the manifests in the remotefilelog extension so
we can find the file nodes for the various files that change.  Let's add it
back to the function signature with a note explaining why.

This doesn't affect any behavior.
2015-03-10 13:56:05 -07:00
Martin von Zweigbergk
2de043381d copies: only calculate 'addedinm[12]' sets once
Pass addedinm1 and addedinm2, rather than m1, m2 and ma, into
_computenonoverlap() so that the sets are not calculated twice.
2015-02-27 14:26:22 -08:00
Martin von Zweigbergk
d4eabc6ccd copies: calculate 'bothnew' from manifestdict.filesnotin()
In the same spirit as the previous change, let's now calculate the
'bothnew' variable using manifestdict.filesnotin().
2015-02-27 14:03:01 -08:00
Martin von Zweigbergk
93f839cfd2 copies: replace _nonoverlap() by calls to manifestdict.filesnotin()
Now that we have manifestdict.filesnotin(), we can write _nonoverlap()
in terms of that method instead, enabling future speedups when
filesnotin() gets optimized, and perhaps making the code a little
clearer at the same time.
2015-02-27 14:02:30 -08:00
Martin von Zweigbergk
199e845f93 copies: move code into new manifestdict.filesnotin() method
copies._computeforwardmissing() finds files in one context that are not
in the other. Let's move this code into a new method on manifestdict,
so m1.filesnotin(m2) can be optimized for various types of manifests
(we expect more types of manifests soon).
2015-02-27 13:57:37 -08:00
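A toy version of such a method, together with a non-overlap computation expressed in terms of it, might look like the following. ToyManifest and nonoverlap are hypothetical stand-ins, not Mercurial's manifestdict or _computenonoverlap.

```python
class ToyManifest(dict):
    """Toy manifest mapping path -> file node."""

    def filesnotin(self, other):
        """Set of files present in this manifest but absent from other."""
        return {f for f in self if f not in other}


def nonoverlap(m1, m2, ma):
    """Files added on each side since the base, excluding those added on both."""
    addedinm1 = m1.filesnotin(ma)  # each set is computed only once
    addedinm2 = m2.filesnotin(ma)
    u1 = sorted(addedinm1 - addedinm2)
    u2 = sorted(addedinm2 - addedinm1)
    return u1, u2
```

Because callers only go through filesnotin(), a manifest type with a smarter internal layout could override it without changing nonoverlap().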
Durham Goode
7b9ea7ac55 copy: move _forwardcopies file logic to a function
Moves the _forwardcopies missingfiles logic to a separate function so that other
extensions which need to prefetch information about the files being
processed have a hook point.

This saves extensions from having to recompute this information themselves, and
thus saves several seconds off of various commands (like rebase).
2015-01-27 17:24:12 -08:00
Durham Goode
5a2bff7069 copy: move mergecopies file logic to a function
Moves the mergecopies nonoverlap logic to a separate function so that other
extensions which may need to prefetch information about the files being
processed have a hook point.

This saves extensions from having to recompute this information themselves, and
thus saves several seconds off of various commands (like rebase).
2015-01-27 17:23:18 -08:00
Mads Kiilerich
523c87c1fe spelling: fixes from proofreading of spell checker issues 2014-04-17 22:47:38 +02:00
Ryan McElroy
365c7718eb amend: fix amending rename commit with diverged topologies (issue4405)
This addresses the bug described in issue4405: when obsolescence markers are
enabled, amending a commit with a file move can lead to the copy information
being lost.

However, the bug is more general and can be reproduced without obsmarkers as
well, as demonstrated by Pierre-Yves and put into the updated test.
Specifically, graph topology divergences between the filelogs and the changelog
can cause copy information to be lost during amends.
2014-10-16 06:35:06 -07:00
Matt Mackall
f663e5fc01 duplicatecopies: move from cmdutil to copies
This is in preparation for moving its primary caller into merge.py,
which would be a layering violation in the current location.
2014-10-13 14:33:13 -05:00
Mads Kiilerich
c55887b864 copies: guard debug section with ui.debugflag 2014-02-25 20:31:53 +01:00
Mads Kiilerich
210347bb4f copies: remove _checkcopies wrapper - it does no good
mergecopies might be doomed but it is not dead yet ...
2014-02-25 20:31:51 +01:00
Mads Kiilerich
43ddf0086b copies: when both sides made the same copy, report it as a copy
Not used yet ... but shows up in debug output.
2014-02-25 20:29:14 +01:00
Mads Kiilerich
3ee1a27c56 diff: search beyond ancestor when detecting renames
This removes an optimization that was introduced in 5a644704d5eb but was too
aggressive, as indicated by how it changed test-mq-merge.t.

We are walking filelogs to find copy sources, and we thus cannot be sure to hit
the base revision and find the renamed file there; it could also be in the
first ancestor of the base ... in the filelog.

We are walking the filelog and thus cannot easily know when we hit the first
ancestor of the base revision and which filename to look for there. Instead, we
use _findlimit like mergecopies does: the lower bound for how far we have to go
is found from the lowest changelog revision that is an ancestor of only one of
the compared revisions. Any filelog ancestor with a revision number lower than
that revision will be an ancestor of both compared revisions, and there is
thus no reason to go further back than that.
2013-11-16 15:46:29 -05:00
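The lower-bound argument can be sketched under the assumption that revision numbers are topologically ordered (parents always have lower numbers than children). This is a simplified, hypothetical version; Mercurial's own _findlimit handles more cases.

```python
def ancestors(parents, rev):
    """All ancestors of rev (inclusive) in a DAG given as rev -> parent list."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, ()))
    return seen


def findlimit(parents, a, b):
    """Lowest revision that is an ancestor of exactly one of a and b,
    or None if every ancestor is shared."""
    only_one = ancestors(parents, a) ^ ancestors(parents, b)
    return min(only_one) if only_one else None


def walk_until_limit(linkrevs, limit):
    """Walk file revisions newest first; any ancestor below limit is an
    ancestor of both sides, so the walk can stop there."""
    for rev in sorted(linkrevs, reverse=True):
        if rev < limit:
            break
        yield rev
```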
Durham Goode
8921ef9d64 copies: refactor checkcopies() into a top level method
This moves checkcopies() out of mergecopies() and makes it a top level
function in the copies module. This allows extensions to override it. For
example, I'm developing a filelog replacement that doesn't have rev numbers
so all the rev number dependent implementation here needs to be replaced
by the extension.

No logic is changed in this commit.
2013-05-01 10:44:21 -07:00