This addresses the bug described in issue4405: when obsolescence markers are
enabled, amending a commit with a file move can lead to the copy information
being lost.
However, the bug is more general and can be reproduced without obsmarkers as
well, as demonstrated by Pierre-Yves and put into the updated test.
Specifically, graph topology divergences between the filelogs and the changelog
can cause copy information to be lost during amends.
This removes an optimization that was introduced in 5a644704d5eb but was too
aggressive, as indicated by how it changed test-mq-merge.t.
We are walking filelogs to find copy sources and we can thus not be sure to hit
the base revision and find the renamed file there - it could also be in the
first ancestor of the base ... in the filelog.
We are walking the filelog and can thus not easily know when we hit the first
ancestor of the base revision and which filename to look for there. Instead, we
use _findlimit like mergecopies does: the lower bound for how far we have to go
is found from the lowest changelog revision that is an ancestor of only one of
the compared revisions. Any filelog ancestor with a revision number lower than
that revision will be the ancestor of both compared revisions, and there is
thus no reason to go further back than that.
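The bound described above can be sketched on a toy DAG. This is only an
illustration of the definition (the lowest revision that is an ancestor of
exactly one of the two compared revisions), not the code in copies.py; the
names ancestors, find_limit and parents are hypothetical.

```python
def ancestors(parents, rev):
    """Return the set containing rev and all of its ancestors."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(parents.get(r, ()))
    return seen


def find_limit(parents, a, b):
    """Lowest revision that is an ancestor of only one of a and b.

    Returns None when both revisions have the same set of ancestors.
    Assumption: revision numbers are topologically ordered, as in a revlog.
    """
    only_one_side = ancestors(parents, a) ^ ancestors(parents, b)
    return min(only_one_side) if only_one_side else None


# Toy history (hypothetical):
#   0 - 1 - 2 - 4
#        \
#         3 - 5
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2], 5: [3]}
limit = find_limit(parents, 4, 5)
print(limit)
```

Revisions below the returned limit are ancestors of both sides, so a filelog
walk has no reason to go past them.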
This moves checkcopies() out of mergecopies() and makes it a top level
function in the copies module. This allows extensions to override it. For
example, I'm developing a filelog replacement that doesn't have rev numbers
so all the rev number dependent implementation here needs to be replaced
by the extension.
No logic is changed in this commit.
This is a performance win for a number of reasons:
- We don't iterate over contexts, which avoids a completely unnecessary sorted
  call + the O(number of files) abstraction cost of doing that.
- We don't check membership in a context, which avoids another
  O(number of files) abstraction cost.
- We iterate over the manifests in C instead of Python.
For a large repo with 170,000 files, this improves perfpathcopies from 0.34
seconds to 0.07. Anything that uses pathcopies, such as rebase or diff --git
between two revisions, benefits.
The inverse of a rename is a rename, but the inverse of a copy is not a copy.
Presenting it as such -- in particular, stuffing it into the same dict as real
copies -- causes bugs because other code starts believing the inverse copies
are real.
The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
status -C command is run where a copy is involved, the inverse copy (which was
hitherto presented as a real copy) is no longer displayed.
Keeping track of inverse copies is useful in some situations -- composability
of diffs, for example, since adding "a" followed by an inverse copy "b" to "a"
is equivalent to a rename "b" to "a". However, representing them would require
a more complex data structure than the same dict in which real copies are also
stored.
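A minimal sketch of the distinction, assuming forward copies are stored as a
destination to source dict. The function name backward_renames and its
arguments are hypothetical and only illustrate the idea: when inverting, an
entry is kept only if the source no longer exists in the newer revision, that
is, if it was a rename rather than a copy.

```python
def backward_renames(forward_copies, newer_files):
    """Invert a destination -> source map, keeping only real renames.

    forward_copies: dict mapping destination to source (older -> newer).
    newer_files: set of file names present in the newer revision.
    """
    inverse = {}
    for dst, src in sorted(forward_copies.items()):
        if src in newer_files:
            # The source still exists, so this was a copy. Its inverse is
            # not a copy and is dropped instead of being reported as one.
            continue
        # The source is gone, so this was a rename; the inverse rename maps
        # the old name back to the new name.
        inverse[src] = dst
    return inverse


# "a" was copied to "b" (and kept); "c" was renamed to "d".
forward = {"b": "a", "d": "c"}
newer = {"a", "b", "d"}
print(backward_renames(forward, newer))
```

Only the rename of "c" to "d" survives the inversion; the copy of "a" to "b"
has no entry in the result.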
The -> in debug messages is currently overloaded to mean both source to dest
and dest to source. To fix this, we add explicit labels and make the arrow
direction consistent.
Currently the "copy" dict contains both explicit copies/moves made by a
context and pending moves that need to happen because the other context moved
the directory the file was in. For explicit copies, the dict stores a
destination to source map, while for pending moves via directory renames, it
stores a source to destination map. The merge code uses this fact in a non-
obvious way to differentiate between these two cases.
We make this explicit by storing these pending moves in a separate dict. The
dict still has a source to destination map, but that is called out in the
docstring.
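As an illustration of the two maps and their opposite directions, here is a
simplified sketch. The helper pending_dir_moves and the sample data are
hypothetical and do not reproduce the real merge code.

```python
def pending_dir_moves(added_files, dirmove, copy):
    """Compute pending moves for files added under a renamed directory.

    added_files: files added by one side.
    dirmove: dict mapping old directory prefix -> new directory prefix,
             as renamed by the other side.
    copy: explicit copies/moves, destination -> source.
    Returns a source -> destination map (note the opposite direction).
    """
    movewithdir = {}
    for f in added_files:
        for olddir, newdir in dirmove.items():
            if f.startswith(olddir):
                dest = newdir + f[len(olddir):]
                if dest not in copy:
                    movewithdir[f] = dest
                break
    return movewithdir


# Explicit move: destination -> source.
copy = {"newdir/a": "olddir/a"}
# The other side renamed olddir/ to newdir/.
dirmove = {"olddir/": "newdir/"}
movewithdir = pending_dir_moves(["olddir/c", "other/x"], dirmove, copy)
print(movewithdir)
```

Keeping the two dicts separate means a caller never has to guess which
direction an entry is stored in.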
For divergent renames the following message is printed during merge:
  note: possible conflict - file was renamed multiple times to:
   newfile
   file2
When a file is renamed in one branch and deleted in the other, the file still
exists after a merge. With this change a similar message is printed for mv+rm:
  note: possible conflict - file was deleted and renamed to:
   newfile
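The two situations can be told apart roughly as follows. This is a simplified
sketch under stated assumptions, not the merge code itself: renames are given
per side as destination to source dicts, deletions as sets of file names, and
the function classify_renames is hypothetical.

```python
def classify_renames(renames1, renames2, deleted1, deleted2):
    """Return (diverge, renamedelete), both mapping source -> destinations.

    renames1/renames2: destination -> source for each side of the merge.
    deleted1/deleted2: files each side removed without renaming them.
    """
    dests = {}
    for renames in (renames1, renames2):
        for dst, src in renames.items():
            dests.setdefault(src, set()).add(dst)
    # Renamed to more than one name: divergent rename.
    diverge = {src: sorted(d) for src, d in dests.items() if len(d) > 1}

    # Renamed on one side and deleted on the other: mv+rm.
    renamedelete = {}
    for renames, deleted_other in ((renames1, deleted2), (renames2, deleted1)):
        for dst, src in sorted(renames.items()):
            if src in deleted_other:
                renamedelete.setdefault(src, []).append(dst)
    return diverge, renamedelete


diverge, renamedelete = classify_renames(
    {"newfile": "file", "kept": "gone"},  # side 1
    {"file2": "file"},                    # side 2
    set(),                                # deleted by side 1
    {"gone"},                             # deleted by side 2
)
print(diverge)
print(renamedelete)
```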
Before the copies refactoring, we declared that if a and b were
present in source and destination, we ignored copies between them. The
refactored code could however report b was a copy of a and vice versa
in a situation where we looked for differences between two identical
changesets that copy a to b.
   y
  /
 x
  \
   y'
The existing copy detection API was designed with merge in mind and
was ill-suited for doing status/diff. The new pathcopies
implementation gives more accurate, easier to use results for
comparing two revisions, and is much simpler to understand.
Test notes:
- test-mv-cp-st.t now finds more renames in the reverse direction
- test-mq-merge.t was always wrong and duplicated a copy in diff that
  was already present in one of the parent revisions
On some large repos, copy detection could spend > 10min using
fctx.ancestor() to determine if file revisions were actually related.
Because ancestor must traverse history to the root to determine the
GCA, it was doing a lot more work than necessary. With this
replacement, the same status -r a:b takes ~3 seconds.
The built-in None object is a singleton and it is therefore safe to
compare memory addresses with "is". It is also faster; how much depends
on the object being compared. For a simple type like str I get:
            | s = "foo" | s = None
  ----------+-----------+----------
  s == None | 0.25 usec | 0.21 usec
  s is None | 0.17 usec | 0.17 usec
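A comparison like the one above can be repeated with the standard timeit
module. The helper time_none_checks is a hypothetical name, and the numbers it
prints depend on the machine and Python version, so they will not match the
table exactly.

```python
import timeit


def time_none_checks(value, number=200000):
    """Time 's == None' and 's is None' for one value of s.

    Returns the total time in seconds for `number` evaluations of each.
    """
    namespace = {"s": value}
    return {
        "s == None": timeit.timeit("s == None", globals=namespace,
                                   number=number),
        "s is None": timeit.timeit("s is None", globals=namespace,
                                   number=number),
    }


for value in ("foo", None):
    for expr, seconds in time_none_checks(value).items():
        print("s = %r: %s took %.4f s" % (value, expr, seconds))
```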