This moves checkcopies() out of mergecopies() and makes it a top level
function in the copies module. This allows extensions to override it. For
example, I'm developing a filelog replacement that doesn't have rev numbers
so all the rev number dependent implementation here needs to be replaced
by the extension.
No logic is changed in this commit.
This is a performance win for a number of reasons:
- We don't iterate over contexts, which avoids a completely unnecessary sorted
call + the O(number of files) abstraction cost of doing that.
- We don't check membership in a context, which avoids another
O(number of files) abstraction cost.
- We iterate over the manifests in C instead of Python.
For a large repo with 170,000 files, this improves perfpathcopies from 0.34
seconds to 0.07. Anything that uses pathcopies, such as rebase or diff --git
between two revisions, benefits.
The inverse of a rename is a rename, but the inverse of a copy is not a copy.
Presenting it as such -- in particular, stuffing it into the same dict as real
copies -- causes bugs because other code starts believing the inverse copies
are real.
The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
status -C command is run where a copy is involved, the inverse copy (which was
hitherto presented as a real copy) is no longer displayed.
Keeping track of inverse copies is useful in some situations -- composability
of diffs, for example, since adding "a" followed by an inverse copy "b" to "a"
is equivalent to a rename "b" to "a". However, representing them would require
a more complex data structure than the same dict in which real copies are also
stored.
The -> in debug messages is currently overloaded to mean both source to dest
and dest to source. To fix this, we add explicit labels and make the arrow
direction consistent.
Currently the "copy" dict contains both explicit copies/moves made by a
context and pending moves that need to happen because the other context moved
the directory the file was in. For explicit copies, the dict stores a
destination to source map, while for pending moves via directory renames, it
stores a source to destination map. The merge code uses this fact in a non-
obvious way to differentiate between these two cases.
We make this explicit by storing these pending moves in a separate dict. The
dict still has a source to destination map, but that is called out in the
docstring.
For divergent renames the following message is printed during merge:
note: possible conflict - file was renamed multiple times to:
newfile
file2
When a file is renamed in one branch and deleted in the other, the file still
exists after a merge. With this change a similar message is printed for mv+rm:
note: possible conflict - file was deleted and renamed to:
newfile
Before the copies refactoring, we declared that if a and b were
present in source and destination, we ignored copies between them. The
refactored code could however report b was a copy of a and vice versa
in a situation where we looked for differences between two identical
changesets that copy a to b.
y
/
x
\
y'
The existing copy detection API was designed with merge in mind and
was ill-suited for doing status/diff. The new pathcopies
implementation gives more accurate, easier to use results for
comparing two revisions, and is much simpler to understand.
Test notes:
- test-mv-cp-st.t results finds more renames in the reverse direction now
- test-mq-merge.t was always wrong and duplicated a copy in diff that
was already present in one of the parent revisions
On some large repos, copy detection could spend > 10min using
fctx.ancestor() to determine if file revisions were actually related.
Because ancestor must traverse history to the root to determine the
GCA, it was doing a lot more work than necessary. With this
replacement, same status -r a:b takes ~3 seconds.
The built-in None object is a singleton and it is therefore safe to
compare memory addresses with is. It is also faster, how much depends
on the object being compared. For a simple type like str I get:
| s = "foo" | s = None
----------+-----------+----------
s == None | 0.25 usec | 0.21 usec
s is None | 0.17 usec | 0.17 usec
This should be faster and more future-proof. Calls where the result is to be
sorted using util.sort() have been left unchanged. Calls to .items() on
configparser objects have been left as-is, too.
When we're using copies() to find changes between the working directory and
its first parent for diff/status/etc., use dirstate.copies() directly.
This avoids doing a full statwalk for simple diffs (issue1090) and
removes a special case from the status command.