In an upcoming patch, we will add index information to all git diffs, not
only binary diffs, so this code needs to be moved to a more appropriate
place.
Also, since this information is used for patch headers, it makes more
sense to be in the patch module, along with other patch-related metadata.
diffline is not part of diff computation, so it makes more sense
to place it with other header generation in patch module.
In upcoming patches we will generalize this approach for
all headers added in the patch, including the git index
header.
diffline was called from trydiff for binary diffs and from unidiff
for text diffs. In this patch we unify those calls into one.
diffline is also a header, not part of diff mechanisms, so it makes
sense to remove that responsibility from the mdiff module. In
upcoming patches we will move diffline to patch module and
keep grouping responsibilities.
b85diff generates a binary diff, so we move this code to mdiff module
along with unidiff for text diffs. All diffing mechanisms will be in the
same place.
In an upcoming patch we will remove the responsibility to print the
index header from b85diff and move it back to patch, since it's
a patch metadata header, not part of the diff generation.
diff ---/+++ should end filenames with a TAB when they contain spaces. Current
code failed to do so when only the +++ file had spaces. This only happened with
git renames from a name without space to one with space.
When diffing the following documents with --ignore-blank-lines (-B):
$ cat > a <<EOF
>
>
>
> b
> x
> d
> EOF
and:
$ cat > b <<EOF
> b
> y
> d
> EOF
the context lines are taken from the first document, even if the lines differ
(with -w or -b) or if the number of lines differ (with -B). In the second case,
we have to adjust the hunk new lines offsets or we end with inconsistent diffs
like (see the @@ offsets):
diff -r 0e66aa54f318 a
--- a/a
+++ b/a
@@ -1,4 +1,3 @@
b
-x
+y
d
Note that having different context lines in a and b means the diff can be
applied but is not invertible.
Reported by Nicholas Riley <com-selenic@sabi.net>
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.
The first implementation made annotate share diff configuration entries. But it
looks like users will user -w/b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.
Prior to this patch "hg diff -U0", i.e., zero lines of context, would
output hunk headers with a start line one greater than what GNU patch
and git output. Guido van Rossum documents the unified diff format[1]
as having a start line value "one lower than one would expect" for
zero length hunks.
Comparing the behaviour of the three systems prior to this patch in
transforming
c1
c3
to
c1
c2
c3
- GNU "diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "git diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "hg diff -U0" reports the hunk as "@@ -2,0 +2,1 @@"
After this patch, "hg diff -U0" reports "@@ -1,0 +2,1 @@".
Since "hg export --config diff.unified=0" outputs zero-context unified
diffs, "hg import" has also been updated to account for start lines
one less than expected for zero length hunk ranges.
[1]: http://www.artima.com/weblogs/viewpost.jsp?thread=164293
Previous code was computing hunks then checking if these hunks could be ignored
when taking whitespace/blank-lines options in accounts. This approach is simple
but fails with hunks containing both whitespace and non-whitespace changes, the
whole hunk is emitted while it can be mostly made of whitespace. The new
version normalize the whitespaces before hunk generation, and test for
blank-lines afterwards.
In worst case, generating diff in upgrade mode can be two times more expensive
than generating it in git mode directly: we may have to regenerate the whole
diff again whenever a git feature is detected. Also, the first diff attempt is
completely buffered instead of being streamed. That said, even without having
profiled it yet, I am convinced we can fast-path the upgrade mode if necessary
were it to be used in regular diff commands, and not only in mq where avoiding
data loss is worth the price.
A plain Python string comparison stops when the first mismatch is
found, whereas the call to md5 would need to compute the hash over the
entire string and only then do the comparison.
We'd mistakenly made the -p option always on, which meant there was no
way to turn it off. It also meant that we were sometimes splitting
multibyte characters in function name, which isn't a good default.
To avoid extra memory usage and performance issues with large files,
generate a trivial delta header for deltas against the null revision
rather than calling the usual delta generator.
We append the delta header to meta rather than prepending it to data
to avoid a large allocate and copy.
Add a trailing TAB to the "--- filename" lines if there's a space
in the file name. This allows patch(1) to work correctly. The
same is done for diff --nodates.
This was originally suggested by Andrei Vermel, but at the time
I thought git was doing something different.
Without -a option to "hg diff", mdiff.unidiff reported that "Binary
file foo has changed" without even trying to compare things. Now it
computes MD5 of old and new files, compares them and makes the conclusion.
rename commands.dodiff to patch.diff.
rename commands.doexport to patch.export.
move some functions from commands to new mercurial.cmdutil module.
turn list of diff options into mdiff.diffopts class.
patch.diff and patch.export now has clean api for call from 3rd party
python code.