
128 Commits

Author SHA1 Message Date
Gábor Stefanik
b631d16eab graft: support grafting changes to new file in renamed directory (issue5436) 2016-12-05 17:40:01 +01:00
Durham Goode
fb55c2fbf3 dirstate: change added/modified placeholder hash length to 20 bytes
Previously the added/modified placeholder hash for manifests generated from the
dirstate was a 21-byte-long string consisting of the p1 file hash plus a single
character to indicate an add or a modify. Normal hashes are only 20 bytes long.
This makes it complicated to implement more efficient manifest implementations
which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being these 20 bytes (just like we rely on no
hash ever being the nullid).

This changes the possible behavior slightly in that the hash for all
added/modified entries in the dirstate manifest will now be the same (so simple
node comparisons would say they are equal), but we should never be doing simple
node comparisons on these nodes even with the old hashes, because they did not
accurately represent the content: two files based off the same p1 file node but
with different working-copy contents would have had the same hash (even with the
appended character) in the old scheme too, so we couldn't depend on the hashes
at all.
2016-11-10 02:19:16 -08:00
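The fixed-length argument can be made concrete with a toy encoder that, like the manifest implementations the message has in mind, reserves exactly 20 bytes for every node. This is a hypothetical sketch: the entry layout, the function names and the placeholder value are illustrative assumptions, not Mercurial's actual format or constants.

```python
import struct

NODELEN = 20  # length of a SHA-1 node in bytes

# Hypothetical 20-byte placeholder for added/modified entries. The commit does
# not say which bytes it uses; this value is purely illustrative.
PLACEHOLDER = b"\xee" * NODELEN


def is_fixed_length(node: bytes) -> bool:
    """True if node fits a fixed-width 20-byte slot."""
    return len(node) == NODELEN


def pack_entry(path: bytes, node: bytes) -> bytes:
    """Encode an entry as <4-byte path length><path><20-byte node>."""
    if not is_fixed_length(node):
        raise ValueError("node must be %d bytes, got %d" % (NODELEN, len(node)))
    return struct.pack(">I", len(path)) + path + node


def unpack_entry(data: bytes):
    """Decode an entry; the node is read at a fixed width, no delimiter needed."""
    (plen,) = struct.unpack(">I", data[:4])
    path = data[4:4 + plen]
    node = data[4 + plen:4 + plen + NODELEN]
    return path, node


p1node = bytes(range(NODELEN))  # stand-in for a real p1 file node
old_style = p1node + b"m"       # old scheme: p1 node plus a flag character (21 bytes)
```

Because `unpack_entry` reads the node at a fixed offset and width, a 21-byte value in a stream of concatenated entries would throw every following entry off by one byte; rejecting it up front is what the fixed-length assumption buys.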
Durham Goode
03d313b5fd dirstate: change placeholder hash length to 20 bytes
Previously the new-node placeholder hash for manifests generated from the
dirstate was a 21-byte-long string of "!" characters. Normal hashes are only 20
bytes long. This makes it complicated to implement more efficient manifest
implementations which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being 20 "!" bytes in a row (just like we rely
on no hash ever being the nullid).

A future diff will do this for added and modified dirstate markers as well, so
we're putting the new newnodeid in node.py so there's a common place for these
placeholders.
2016-11-10 02:17:22 -08:00
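A minimal sketch of keeping such sentinels in one shared module, as the message proposes for node.py. `nullid` (twenty zero bytes) and `newnodeid` (twenty `!` bytes) follow the values described above; `is_sentinel` is a hypothetical helper added only for illustration.

```python
NODELEN = 20

nullid = b"\0" * NODELEN      # the null node: 20 zero bytes
newnodeid = b"!" * NODELEN    # placeholder for a node that does not exist yet

# Hypothetical set of reserved values; real code may organise this differently.
_SENTINELS = frozenset([nullid, newnodeid])


def is_sentinel(node: bytes) -> bool:
    """Return True if node is a reserved value rather than a real hash."""
    return node in _SENTINELS
```

Both sentinels are exactly 20 bytes, so fixed-width storage needs no special case for them; the trade-off is the astronomically small chance that a real SHA-1 digest collides with one.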
Gábor Stefanik
5533b05a12 merge: avoid superfluous filemerges when grafting through renames (issue5407)
This is a fix for a regression introduced by the patches for issue4028.

The test changes are due to us doing fewer _checkcopies searches now, which
makes some test outputs revert to the pre-issue4028 behavior. That issue itself
remains fixed; we only skip copy tracing for files where it isn't relevant.
As a nice side effect, this makes copy detection much faster when tracing
backwards through lots of renames.
2016-10-25 21:01:53 +02:00
Gábor Stefanik
14dc42e666 copies: improve assertions during copy recombination
- Make sure there is nothing to recombine in non-graftlike scenarios
- More pythonic assert syntax
2016-10-18 02:09:08 +02:00
Gábor Stefanik
2f48be6841 copies: make _checkcopies handle copy sequences spanning the TCA (issue4028)
When working in a rotated DAG (for a graftlike merge), there can be files
that are renamed both between the base and the topological CA, and between
the TCA and the endpoint farther from the base. Such renames span the TCA
(and thus need both passes of _checkcopies to be fully detected), but may
not necessarily be divergent.

Make _checkcopies return "incomplete copies" and "incomplete divergences"
in this case, and let mergecopies recombine them once data from both passes
of _checkcopies is available.

With this patch, all known cases involving renames and grafts pass.

(Developed together with Pierre-Yves David)
2016-10-11 04:39:47 +02:00
Gábor Stefanik
c03c8792d5 checkcopies: add logic to handle remotebase
As the two _checkcopies passes' ranges are separated by tca, not base,
only one of the two passes will actually encounter the base.
Pass "remotebase" to the other pass to let it know not to expect passing
over the base. This is required for handling a few unusual rename cases.
2016-10-11 04:25:59 +02:00
Gábor Stefanik
912f58ada1 mergecopies: add logic to process incomplete data
We first combine incomplete copies on the two sides of the topological CA
into complete copies.
Any leftover incomplete copies are then combined with the incomplete
divergences to reconstruct divergences spanning over the topological CA.
Finally we promote any divergences falsely flagged as incomplete to full
divergences.

Right now, there is nothing generating incomplete copy/divergence data,
so this code does nothing. Changes to _checkcopies to populate these
dicts are coming later in this series.
2016-10-04 12:51:54 +02:00
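The first recombination step can be pictured as joining two halves of a rename chain where they meet at the topological CA. The sketch below is a simplified model, not Mercurial's code: it assumes one pass reports `{name_at_tca: name_at_base}` and the other `{name_at_tca: name_at_endpoint}`, and `combine_incomplete` is a hypothetical name. Leftovers would feed the divergence reconstruction, which is not modelled here.

```python
def combine_incomplete(head, tail):
    """Join rename halves that meet at the topological CA.

    head: {name_at_tca: name_at_base}      (half found by one pass)
    tail: {name_at_tca: name_at_endpoint}  (half found by the other pass)

    Returns (copies, leftover): copies maps endpoint name -> base name in the
    usual {dst: src} orientation; leftover holds tail entries with no partner.
    """
    copies = {}
    leftover = {}
    for mid, dst in tail.items():
        if mid in head:
            copies[dst] = head[mid]  # a -> b before the TCA, b -> c after it
        else:
            leftover[mid] = dst
    return copies, leftover
```

For example, if `a` was renamed to `b` before the TCA and `b` to `c` after it, `combine_incomplete({"b": "a"}, {"b": "c"})` yields the complete copy `{"c": "a"}`.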
Gábor Stefanik
242c4897e8 checkcopies: handle divergences contained entirely in tca::ctx
During a graftlike merge, _checkcopies runs from ctx to tca, possibly
passing over the merge base. If there is a rename both before and after
the base, then we're actually dealing with divergent renames.
If there is no rename on the other side of tca, then the divergence is
contained entirely in the range of one _checkcopies invocation, and
should be detected "in the loop" without having to rely on the other
_checkcopies pass.
2016-10-12 11:54:03 +02:00
Gábor Stefanik
d967d939d6 mergecopies: invoke _computenonoverlap for both base and tca during merges
The algorithm of _checkcopies can only walk backwards in the DAG, never
forward. Because of this, the two _checkcopies passes need to run from
their respective endpoints to the TCA to cover the entire subgraph where
the merge is being performed. However, detection of files new in both
endpoints, as well as directory rename detection, need to run with respect
to the merge base, so we need lists of new files both from the TCA's and
the merge base's viewpoint to correctly detect renames in a graft-like
merge scenario.

(Series reworked by Pierre-Yves David)
2016-10-13 02:19:43 +02:00
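Why two viewpoints are needed can be seen in a toy model where manifests are plain sets of paths. `compute_nonoverlap` is a hypothetical stand-in for `_computenonoverlap` that only computes the "added on one side only" sets; it ignores debug output and extension hooks.

```python
def compute_nonoverlap(m1, m2, ancestor):
    """Files added on only one side relative to ``ancestor``.

    m1, m2, ancestor: sets of file paths (toy stand-ins for manifests).
    Returns (u1, u2): sorted files new in m1 only, and new in m2 only.
    """
    added1 = m1 - ancestor
    added2 = m2 - ancestor
    return sorted(added1 - added2), sorted(added2 - added1)


m1 = {"a", "b", "c"}
m2 = {"a", "d"}
base = {"a"}
tca = {"a", "b"}

from_base = compute_nonoverlap(m1, m2, base)  # "b" counts as new here...
from_tca = compute_nonoverlap(m1, m2, tca)    # ...but not from the TCA's viewpoint
```

When the base and the TCA differ, as in a graft, the two results differ, so a single list cannot serve both the backward walk to the TCA and the base-relative detection.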
Pierre-Yves David
cce3e9c3ad copies: make it possible to distinguish between _computenonoverlap invocations
_computenonoverlap needs to be invoked twice during a graft, and debugging
messages should be distinguishable between the two invocations.
2016-10-18 00:00:43 +02:00
Gábor Stefanik
7730b47e09 copies: make _checkcopies handle simple renames in a rotated DAG
This introduces a distinction between "merge base" and
"topological common ancestor". During a regular merge, these two are
identical. Graft, however, performs a merge in a rotated DAG, where the
merge base will not be a common ancestor at all in the
original DAG.

To correctly find copies in case of a graft, we need to take both the
merge base and the topological CA into account, and track any renames
between them in reverse. Fortunately we can detect this in advance,
see comment in the code about "backwards".

This patch only supports finding non-divergent renames contained entirely
between the merge base and the topological CA. Further patches are coming
to support more complex cases.

(Pierre-Yves David was involved in the cleanup of this patch.)
2016-10-13 02:03:54 +02:00
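The distinction between merge base and topological CA can be modelled on a toy DAG. In this hypothetical sketch (plain dicts, made-up helper names), an endpoint is "dirty" when the merge base is not among its ancestors; if either endpoint is dirty the merge is treated as graft-like and a separate topological CA is computed, otherwise the base itself serves as the TCA. Picking the highest-numbered common ancestor is a simplification of real ancestor selection.

```python
def ancestors(parents, rev):
    """All ancestors of rev, including rev itself (parents: rev -> [parent revs])."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, []))
    return seen


def is_graftlike(parents, c1, c2, base):
    """Graft-like when the base is not an ancestor of both endpoints."""
    return base not in ancestors(parents, c1) or base not in ancestors(parents, c2)


def topological_ca(parents, c1, c2):
    """Highest-numbered common ancestor (simplified ancestor selection)."""
    return max(ancestors(parents, c1) & ancestors(parents, c2))


def pick_tca(parents, c1, c2, base):
    """Use the base when it is a genuine common ancestor, else compute one."""
    if is_graftlike(parents, c1, c2, base):
        return topological_ca(parents, c1, c2)
    return base


# 0 -> 1 -> 2 (destination); 1 -> 3 -> 4 (revision 4 is grafted onto 2).
parents = {1: [0], 2: [1], 3: [1], 4: [3]}
```

Grafting 4 onto 2 uses 3 as merge base, which is not an ancestor of 2, so the topological CA is 1. Renames between 3 and 1 lie "backwards" from the base's point of view.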
Gábor Stefanik
60bab1ec6c copies: compute a suitable TCA if base turns out to be unsuitable
This will be used later in an update to _checkcopies.

(Pierre-Yves David was involved in the cleanup of this patch.)
2016-10-13 02:03:49 +02:00
Gábor Stefanik
6250f7ff54 copies: detect graft-like merges
Right now, nothing changes as a result of this, but we want to handle
grafts differently from ordinary merges later.

(Series developed together with Pierre-Yves David)
2016-10-13 01:47:33 +02:00
Gábor Stefanik
4adc2f1a6a checkcopies: add a sanity check against false-positive copies
When grafting a copy backwards through a rename, a copy is wrongly detected,
which causes the graft to be applied inappropriately, in a destructive way.
Make sure that the old file name really exists in the common ancestor,
and bail out if it doesn't.

This fixes the aggravated case of bug 5343, although the basic issue
(failure to duplicate the copy information) still occurs.
2016-10-12 21:33:45 +02:00
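The sanity check amounts to a membership test against the common ancestor's manifest. A minimal sketch, assuming copy candidates as a `{dst: src}` dict and the ancestor's manifest as a set of paths; `filter_false_copies` is a hypothetical name, whereas the commit performs the check inside checkcopies and bails out there.

```python
def filter_false_copies(candidates, ancestor_files):
    """Keep only copies whose source path exists in the common ancestor.

    candidates: {dst: src} copy candidates
    ancestor_files: set of paths present in the common ancestor
    """
    return {dst: src for dst, src in candidates.items() if src in ancestor_files}
```

A candidate whose source never existed in the ancestor cannot describe a real copy relative to it, so it is dropped rather than applied.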
Pierre-Yves David
604c8243a9 mergecopies: rename 'ca' to 'base'
This variable was named after the common ancestor. It is actually the merge
base that might differ from the common ancestor in the graft case. We rename the
variable before a larger refactoring to clarify the situation. A similar rename
was also applied to 'checkcopies' in a prior changeset.
2016-10-13 01:30:14 +02:00
Pierre-Yves David
4eabc75da9 copies: move variable documentation from checkcopies to mergecopies
It appears that 'mergecopies' is the function consuming these data, so we move
the documentation there.
2016-10-13 01:26:33 +02:00
Pierre-Yves David
9df147eb63 checkcopies: pass data as a dictionary of dictionaries
More are coming.
2016-10-11 02:21:42 +02:00
Pierre-Yves David
80ed73689f checkcopies: move 'movewithdir' initialisation right before its usage
The 'movewithdir' variable had a lot of related logic all around 'mergecopies'.
However, it never actually contains anything until the very last loop in
that function. We move the (simplified) variable definition there for clarity.
2016-10-11 02:15:23 +02:00
Pierre-Yves David
163070ae3d checkcopies: extract the '_related' closure
There is no need for it to be a closure.
2016-10-11 01:29:08 +02:00
Pierre-Yves David
2d670597fe checkcopies: add an inline comment about the '_related' call
This helps in understanding the flow of the function.
2016-10-08 23:00:55 +02:00
Pierre-Yves David
71b0c4ef9c checkcopies: minor change to comment
This helped me understand the refactoring, so this must be helpful.
2016-10-08 19:03:16 +02:00
Pierre-Yves David
1f966aa892 checkcopies: rename 'ca' to 'base'
This variable was named after the common ancestor. It is actually the merge
base that might differ from the common ancestor in the graft case. We rename the
variable before a larger refactoring to clarify the situation.
2016-10-08 18:38:42 +02:00
Gábor Stefanik
8d5c0019c5 copies: don't record divergence for files needing no merge
This is left over from when _checkcopies was factored out from mergecopies.

The 2nd break has "of = None" before it, so it's a functionally equivalent
change. The 1st one, however, causes a divergence to be recorded when
a file has been renamed, but there is nothing to be merged to it.

This is currently harmless, since the extra divergence is simply ignored
later. However, the new _checkcopies introduced in the rest of this series
does more than just record a divergence after completing the main loop,
and it's important that the "post-processing" stage is really skipped
for no-merge-needed renames.
2016-10-03 13:29:59 +02:00
Gábor Stefanik
5a7c7889a5 copies: mark checkcopies as internal with the _ prefix 2016-10-03 13:24:56 +02:00
Gábor Stefanik
6f3ec7c4c3 copies: split u1/u2 to u1u/u2u and u1r/u2r
These will be made different in case of grafts by another patch in this series.
2016-10-03 13:23:19 +02:00
Gábor Stefanik
88c9737beb copies: style fixes and add comment 2016-10-03 13:18:31 +02:00
Gábor Stefanik
846bad50ac copies: limit is an optimization, and doesn't provide guarantees 2016-10-03 16:19:55 +02:00
timeless
a1cb3173a2 py3: convert to next() function
next(..) was introduced in py2.6 and .next() is not available in py3

https://docs.python.org/2/library/functions.html#next
2016-05-16 21:30:53 +00:00
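The portable spelling is the built-in `next(iterator[, default])`; the Python 2 method `it.next()` became `__next__` in Python 3. A small illustration with a made-up helper name:

```python
def first_or_default(iterable, default=None):
    """Return the first item of iterable, or default if it is empty."""
    it = iter(iterable)
    # Python 2 only: it.next(). The builtin works on 2.6+ and 3.x and
    # accepts a default instead of raising StopIteration.
    return next(it, default)
```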
Durham Goode
6f7f581f5f copies: optimize forward copy detection logic for rebases
Forward copy detection (i.e. detecting what files have been moved/copied in
commit X since ancestor Y) previously required diff'ing the manifests of both X
and Y. This was expensive since it required reading both entire manifests and
doing a set difference (they weren't already in a set because of the
lazymanifest work). This cost almost 1 second on very large repositories, and
happens N times for a rebase of N commits.

This patch optimizes it for the case of rebase. In a rebase, we are comparing a
commit against its immediate parent, and therefore we can know what files
changed by looking at ctx.files(). This lets us drastically decrease the size
of the set comparison, and makes it O(# of changes) instead of O(size of
manifest). This makes it take 1ms instead of 1000ms.
2016-02-05 13:23:24 -08:00
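The optimization rests on one invariant: when the ancestor is the commit's immediate parent, the commit's own list of touched files (`ctx.files()`) already is the manifest difference. The toy model below uses dicts in place of manifests and hypothetical function names; it only shows which files would need their copy metadata inspected, not the copy lookup itself.

```python
def files_to_check_full(m_child, m_ancestor):
    """O(size of manifest): every path whose node differs from the ancestor."""
    return {f for f, node in m_child.items() if m_ancestor.get(f) != node}


def files_to_check_fast(changed_files, m_child):
    """O(number of changes): valid only when the ancestor is the immediate
    parent. Removed files are dropped, since a copy destination must exist."""
    return {f for f in changed_files if f in m_child}


parent = {"a": 1, "b": 2, "c": 3}
child = {"a": 1, "b": 9, "d": 4}  # "b" modified, "c" renamed to "d"
changed = ["b", "c", "d"]         # what the commit itself records as touched
```

Both functions select `{"b", "d"}` here, but only the first has to scan every manifest entry.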
Matt Mackall
a8fcfbf03d copies: fix detection of divergent directory renames
If we move all the files out of one directory, but into two different
directories, we should not consider it a directory rename. The
detection of this case was broken.
2016-01-13 10:10:05 -06:00
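A simplified model of directory-rename inference, assuming copies as a `{dst: src}` dict and a set of source directories that still exist. `directory_moves` and its rules are a hypothetical sketch of the idea, not the patched Mercurial code: a source directory is rejected if it still exists or if its files went to more than one destination directory.

```python
def dirname(path):
    """Directory part of a slash-separated path ('' for top-level files)."""
    return path.rpartition("/")[0]


def directory_moves(copies, dirs_still_present):
    """Infer whole-directory renames {src_dir: dst_dir} from file copies."""
    dirmove = {}
    invalid = set()
    for dst, src in copies.items():
        dsrc, ddst = dirname(src), dirname(dst)
        if dsrc in invalid:
            continue
        if dsrc in dirs_still_present:
            invalid.add(dsrc)  # the directory was not fully moved away
        elif dsrc in dirmove and dirmove[dsrc] != ddst:
            invalid.add(dsrc)  # files went to two different directories
        else:
            dirmove[dsrc] = ddst
    for d in invalid:
        dirmove.pop(d, None)  # drop any move recorded before the conflict
    return dirmove
```

Moving `old/x` to `new1/x` and `old/y` to `new2/y` splits `old` across two directories, so no rename is reported for it.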
Mads Kiilerich
09567db49a spelling: trivial spell checking 2015-10-17 00:58:46 +02:00
Matt Mackall
3b6391ff9a copies: group bothnew with other sets 2015-08-19 15:40:13 -05:00
Matt Mackall
a4a77851e3 copies: rename renamedelete to renamedeleteset for clarity 2015-08-19 15:32:27 -05:00
Matt Mackall
505912bcc2 copies: move _makegetfctx calls into checkcopies 2015-08-19 15:26:08 -05:00
Matt Mackall
babc6236a3 copies: factor out setupctx into _makegetfctx
This reduces the scope of mergecopies a bit.
2015-08-19 15:17:33 -05:00
Matt Mackall
6be7bd49e6 copies: avoid reference to c1/c2 in makectx 2015-08-21 15:12:58 -05:00
Matt Mackall
846cb052bb copies: move debug statement to appropriate place 2015-08-19 15:11:17 -05:00
Matt Mackall
06dcd73aae copies: rename diverge2 to divergeset for clarity 2015-08-19 14:04:54 -05:00
Matt Mackall
cf71f5908f copies: begin separating mergecopies sides 2015-08-19 13:40:18 -05:00
Matt Mackall
a37c0c906b copies: rename ctx() to getfctx() for clarity 2015-08-19 13:09:54 -05:00
Durham Goode
1bef6fef82 copy: add flag for disabling copy tracing
Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in
large repos (hundreds of thousands of files).  This provides the option of
turning off the majority of copy tracing. It does not turn off _forwardcopies()
since that is used to carry copy information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

The reason to have a flag instead of trying to fix the actual copy tracing
performance is that copy tracing is fundamentally an O(number of files in the
repo) operation.  In order to know if file X in the rebase source was copied
anywhere, we have to walk the filelog for every new file that exists in the
rebase destination (i.e. a file in the destination that is not in the common
ancestor).  Without an index that lets us trace forward (i.e. from file Y in the
common ancestor forward to the rebase destination), it will never be an O(number
of changes in my branch) operation.

In mozilla-central, rebasing a 3-commit stack across 20,000 revs goes from 39s
to 11s.
2015-01-27 11:26:27 -08:00
Gregory Szorc
9440bd18c5 copies: use absolute_import 2015-08-08 00:41:13 -07:00
Matt Mackall
fc81d2a796 merge with stable 2015-05-26 14:52:47 -05:00
Matt Mackall
bd4df663a9 mergecopies: avoid slowdown from linkrev adjustment (issue4680)
checkcopies was using fctx.rev() which it was expecting would be
equivalent to linkrev() but was triggering the new _adjustlinkrev path.
This was making grafts and merges with large sets of potential copies
very expensive.
2015-05-26 06:45:18 -05:00
Martin von Zweigbergk
68e09d510f copies: document hack for adding '' to set of dirs
The root directory is not normally added to 'dirs' instances (although
I think it should be). In copies.mergecopies, we call dirname() to get
the directory of a path and then check for containment in the 'dirs'
instances ('d1' and 'd2'). In order to easily handle files in the root
directory, '/' is added to d1/d2. This results in the empty string
being added to the sets, since what comes before the slash in '/' is
an empty string. This seems less than obvious, so let's document it.
2015-05-22 14:02:04 -07:00
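The effect of adding '/' is easiest to see in a small model. The helpers below are stand-ins written for this illustration, not copies of `pathutil.dirname` or the `dirs` class: collecting parent directories never yields the root for ordinary paths, so a top-level file's dirname `''` is only found once '/' has been added.

```python
def dirname(path):
    """Everything before the last '/', or '' for a top-level path."""
    return path.rpartition("/")[0]


def finddirs(path):
    """Yield each ancestor directory of path, deepest first.

    'a/b/c' yields 'a/b' then 'a'; a top-level path yields nothing.
    """
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


def dirset(files, add_root_hack=False):
    dirs = set()
    for f in files:
        dirs.update(finddirs(f))
    if add_root_hack:
        # finddirs('/') yields '' (what precedes the slash), adding the root
        dirs.update(finddirs("/"))
    return dirs


files = ["top.txt", "src/a.py"]
```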
Durham Goode
586027b77d copies: switch to using pathutil.dirname
copies had its own dirname implementation. Now that pathutil has a common one,
let's use that instead.
2015-05-22 12:58:27 -07:00
Durham Goode
610230ad03 copies: add matcher parameter to copy logic
This allows passing a matcher down the pathcopies() stack to _forwardcopies().
This will let us add logic in a later patch to avoid tracing copies when not
necessary (like when doing hg diff -r 1 -r 2 foo.txt).
2015-04-16 11:29:30 -07:00
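What a matcher parameter makes possible can be sketched as skipping the expensive per-file trace for paths the user did not ask about. In this hypothetical model, `match` is any path predicate (standing in for a matcher object, which can be called on a path) and `trace` stands in for the filelog walk; neither signature is taken from Mercurial.

```python
def forward_copies(candidates, trace, match=lambda path: True):
    """Trace copy sources only for candidate files accepted by match.

    candidates: files that might be copy destinations
    trace: callable returning the copy source of a file, or None
    """
    result = {}
    for dst in candidates:
        if not match(dst):
            continue  # skip the potentially expensive trace entirely
        src = trace(dst)
        if src is not None:
            result[dst] = src
    return result


recorded = {"foo.txt": "old.txt", "bar.txt": "baz.txt"}
traced = []


def trace(path):
    traced.append(path)  # remember which files were actually traced
    return recorded.get(path)


# Roughly what a path-restricted command like "hg diff -r 1 -r 2 foo.txt" needs.
result = forward_copies(["foo.txt", "bar.txt"], trace, match=lambda p: p == "foo.txt")
```

Only `foo.txt` is traced; `bar.txt` is never examined.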
Durham Goode
ce308b547b copies: pass changectx instead of manifest to _computenonoverlap
The _computenonoverlap function takes two manifests to allow extensions to hook
in and read the manifest nodes produced by the function. The remotefilelog
extension actually needs the entire changectx instead (which includes the
manifest) so it can prefetch the subset of files necessary for a sparse checkout
(and the sparse checkout depends on which commit is being accessed, hence the
need for the changectx).

I have tests in the remotefilelog extension that cover this.
2015-04-03 15:18:34 -07:00
Matt Mackall
db55434dfb merge with stable 2015-03-20 17:30:38 -05:00