83 Commits

Author SHA1 Message Date
Patrick Mezard
1e2e55778b mdiff: fix diff header generation for files with spaces (issue3357)
The ---/+++ header lines of a diff should end filenames with a TAB when they
contain spaces. The previous code failed to do so when only the +++ file name
had spaces. This only happened with git renames from a name without spaces to
one with spaces.
2012-04-05 15:39:07 +02:00
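The header rule described in 1e2e55778b can be sketched as follows. This is a minimal illustration, not the actual mdiff code: the function name `header_line` and its parameters are hypothetical, and the only behaviour taken from the commit message is that a filename containing a space is terminated by a TAB.

```python
def header_line(marker, path, date=None):
    """Build a '--- path' or '+++ path' line for a unified diff.

    Hypothetical helper: 'marker' is '---' or '+++', 'date' is an
    optional timestamp string.
    """
    line = "%s %s" % (marker, path)
    if date is not None:
        # A date field is always separated from the name by a TAB.
        line += "\t" + date
    elif " " in path:
        # No date follows, so a bare TAB marks where the filename ends.
        line += "\t"
    return line + "\n"


old_header = header_line("---", "a/name")
new_header = header_line("+++", "b/name with space")
```

The key point is that each side is checked on its own, so a rename from `name` to `name with space` still gets the TAB on the +++ line.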
Patrick Mezard
a884b59cee mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)
When diffing the following documents with --ignore-blank-lines (-B):

  $ cat > a <<EOF
  >
  >
  >
  > b
  > x
  > d
  > EOF

and:

  $ cat > b <<EOF
  > b
  > y
  > d
  > EOF

the context lines are taken from the first document, even if the lines differ
(with -w or -b) or if the number of lines differs (with -B). In the second case,
we have to adjust the offsets of the hunk's new lines, or we end up with
inconsistent diffs like this one (see the @@ offsets):

  diff -r 0e66aa54f318 a
  --- a/a
  +++ b/a
  @@ -1,4 +1,3 @@

   b
  -x
  +y
   d

Note that having different context lines in a and b means the diff can be
applied but is not invertible.

Reported by Nicholas Riley <com-selenic@sabi.net>
2012-02-06 21:17:50 +01:00
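The inconsistency in the hunk quoted in a884b59cee can be checked mechanically: in a unified diff, the old-side count must equal the number of context and '-' lines, and the new-side count the number of context and '+' lines. The sketch below is only a checker written for illustration (the names `HUNK_RE` and `hunk_is_consistent` are hypothetical); it does not reproduce how mdiff adjusts the offsets.

```python
import re

# Matches "@@ -start[,len] +start[,len] @@"; an omitted length means 1.
HUNK_RE = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_is_consistent(header, body):
    """Return True if the line counts in 'header' match the hunk 'body'."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError("not a hunk header: %r" % header)
    old_len = int(m.group(2)) if m.group(2) is not None else 1
    new_len = int(m.group(4)) if m.group(4) is not None else 1
    # Context lines (' ') belong to both sides.
    old_seen = sum(1 for line in body if line[:1] in (" ", "-"))
    new_seen = sum(1 for line in body if line[:1] in (" ", "+"))
    return (old_seen, new_seen) == (old_len, new_len)


# Body of the hunk quoted in the commit message (first line is a blank
# context line).
body = [" ", " b", "-x", "+y", " d"]
quoted = hunk_is_consistent("@@ -1,4 +1,3 @@", body)
# A made-up header whose counts agree with the body, for comparison only.
matching = hunk_is_consistent("@@ -1,4 +1,4 @@", body)
```

Here `quoted` is False because the body carries four new-side lines while the header announces three.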
Matt Mackall
9f8ee10163 util: don't mess with builtins to emulate buffer() 2011-12-15 15:27:11 -06:00
Patrick Mezard
c4cef76e25 mdiff: replace wscleanup() regexps with C loops
On my system it reduces:

  hg annotate -w mercurial/commands.py

from 36s to less than 8s, to be compared with 6.3s when run without whitespace
options.
2011-11-18 14:23:03 +01:00
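For reference, the kind of single-pass whitespace cleanup that c4cef76e25 moves into C can be sketched in Python. This is an assumption-laden illustration: the function name `clean_whitespace`, the set of characters treated as whitespace, and the exact rules (drop everything for -w, collapse runs and drop trailing whitespace for -b) are my reading of the options, not a copy of the C code.

```python
# Characters treated as ignorable whitespace in this sketch (assumption).
WHITESPACE = " \t\r"


def clean_whitespace(text, strip_all):
    """Normalize whitespace in one pass over 'text'.

    strip_all=True  removes every whitespace character (like -w).
    strip_all=False collapses each run to one space and drops
                    whitespace at the end of a line (like -b).
    """
    out = []
    pending = False  # a whitespace run was seen but not yet emitted
    for ch in text:
        if ch in WHITESPACE:
            pending = True
            continue
        if ch == "\n":
            # Whitespace before a newline is trailing: drop it.
            pending = False
        elif pending:
            if not strip_all:
                out.append(" ")
            pending = False
        out.append(ch)
    return "".join(out)


collapsed = clean_whitespace("a \t b  \n", strip_all=False)
stripped = clean_whitespace("a \t b  \n", strip_all=True)
```

A loop like this touches each character once and needs no backtracking, which is the property the regular expressions lacked.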
Patrick Mezard
252fc23b56 mdiff: split lines in allblocks() only when necessary
These are only required to handle the --ignore-blank-lines case
2011-11-18 14:16:47 +01:00
Patrick Mezard
cc3315778f annotate: support diff whitespace filtering flags (issue3030)
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.

The first implementation made annotate share diff configuration entries. But it
looks like users will use -w/-b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.
2011-11-18 12:04:31 +01:00
Patrick Mezard
4772aafb85 mdiff: make diffblocks() return all blocks, matching and changed
Annotate uses matching blocks, not changed ones.
2011-11-18 12:01:04 +01:00
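The idea behind 4772aafb85, returning both matching and changed blocks so that annotate and diff can each pick what they need, can be sketched with the standard library. `difflib.SequenceMatcher` stands in for bdiff.blocks() here, and the function name `all_blocks` and the '=' / '!' markers are illustrative choices rather than a statement about the real return format.

```python
import difflib


def all_blocks(a, b):
    """Yield ((a1, a2, b1, b2), kind) covering both line lists.

    kind is '=' for a matching block and '!' for a changed one.
    """
    pa = pb = 0  # end of the previous block on each side
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for m in matcher.get_matching_blocks():
        if m.a > pa or m.b > pb:
            # The gap between two matching blocks is a changed block.
            yield (pa, m.a, pb, m.b), "!"
        if m.size:
            yield (m.a, m.a + m.size, m.b, m.b + m.size), "="
        pa, pb = m.a + m.size, m.b + m.size


blocks = list(all_blocks(["a", "b", "c"], ["a", "x", "c"]))
matching_only = [blk for blk, kind in blocks if kind == "="]
```

A diff-style consumer would keep the '!' entries, while an annotate-style consumer would keep `matching_only`.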
Patrick Mezard
ddbd229fe3 mdiff: extract blocks whitespace normalization in diffblocks()
We want to reuse it in annotate for whitespace normalization.
2011-11-18 11:53:38 +01:00
Matt Mackall
75db0d196a merge with stable 2011-11-17 16:53:17 -06:00
Patrick Mezard
16811ddcba diff: --ignore-blank-lines was too enthusiastic
It was ignoring changes from:

ab

to:

a
b
2011-11-13 21:37:14 +01:00
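One way a bug like the one in 16811ddcba can arise is by deleting every newline during normalization instead of collapsing runs of them. The sketch below illustrates that difference; both function names are hypothetical and this is not claimed to be the exact code that was changed.

```python
import re


def drop_all_newlines(text):
    # Too enthusiastic: line boundaries disappear entirely.
    return re.sub("\n+", "", text)


def collapse_blank_lines(text):
    # Runs of newlines become one, so only blank lines are ignored.
    return re.sub("\n+", "\n", text).strip("\n")


before = "ab\n"
after = "a\nb\n"

# With the first normalization the change from "ab" to "a", "b" vanishes.
hidden = drop_all_newlines(before) == drop_all_newlines(after)
# With the second it is still visible.
visible = collapse_blank_lines(before) != collapse_blank_lines(after)
```

The second form still treats `"a\n\nb\n"` and `"a\nb\n"` as equal, which is what ignoring blank lines is meant to do.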
Nicolas Venegas
2582d1e3d9 mdiff/patch: fix bad hunk handling for unified diffs with zero context
Prior to this patch "hg diff -U0", i.e., zero lines of context, would
output hunk headers with a start line one greater than what GNU patch
and git output. Guido van Rossum documents the unified diff format[1]
as having a start line value "one lower than one would expect" for
zero length hunks.

Comparing the behaviour of the three systems prior to this patch in
transforming

  c1
  c3

to

  c1
  c2
  c3

- GNU "diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "git diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "hg diff -U0" reports the hunk as "@@ -2,0 +2,1 @@"

After this patch, "hg diff -U0" reports "@@ -1,0 +2,1 @@".

Since "hg export --config diff.unified=0" outputs zero-context unified
diffs, "hg import" has also been updated to account for start lines
one less than expected for zero length hunk ranges.

[1]: http://www.artima.com/weblogs/viewpost.jsp?thread=164293
2011-11-09 16:55:59 -08:00
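The start-line rule from 2582d1e3d9 can be written down in a few lines. This is a sketch under stated assumptions: `hunk_range` and `hunk_header` are hypothetical names, inputs are 0-based line indexes, and the output always includes the length (as "hg diff" does in the example above) rather than omitting a length of 1 as GNU diff does.

```python
def hunk_range(start, length):
    """Convert a 0-based (start, length) pair to unified diff numbers.

    For a non-empty range the header uses the 1-based first line.
    For an empty range it names the line just before the change,
    which is one lower than the 1-based position.
    """
    if length == 0:
        return start, 0
    return start + 1, length


def hunk_header(astart, alen, bstart, blen):
    old = hunk_range(astart, alen)
    new = hunk_range(bstart, blen)
    return "@@ -%d,%d +%d,%d @@" % (old + new)


# c1, c3  ->  c1, c2, c3: nothing removed at old index 1,
# one line added at new index 1.
header = hunk_header(1, 0, 1, 1)
```

An importer has to apply the inverse rule, adding one back to the start of any zero-length range before locating the hunk.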
Matt Mackall
8f2b7260c4 merge with stable 2011-11-10 11:00:27 -06:00
Mads Kiilerich
dcf70e02fc diff: always use / in paths in diff
Subrepo diffs would sometimes use backslash on windows.
2011-11-07 02:49:00 +01:00
Brodie Rao
7e742515c2 mdiff: speed up showfunc for large diffs
This addresses the following issues with showfunc:

- Silly usage of regular expressions.
- Doing str.rstrip() needlessly in an inner loop.
- Doing catastrophic backtracking when trying to find a function line.

Finding function text is now at worst O(n lines in the old file), and
at best close to O(n hunks).

Given a diff like this[1]:

 src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g        |      4 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java  |      2 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java |  29189 +++++----
 3 files changed, 14741 insertions(+), 14454 deletions(-)

[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw

Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time to complete:

   CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
       32813            0     80.3546     40.6086   mercurial.mdiff:160(yieldhunk)
   +65062746            0     25.7227     25.7227   +<method 'match' of '_sre.SRE_Pattern' objects>
   +65062746            0     14.0221     14.0221   +<method 'rstrip' of 'str' objects>
       +1809            0      0.0009      0.0009   +mercurial.mdiff:148(contextend)
       +1809            0      0.0003      0.0003   +<len>
    65062746            0     25.7227     25.7227   <method 'match' of '_sre.SRE_Pattern' objects>
    65062763            0     14.0221     14.0221   <method 'rstrip' of 'str' objects>
         543            0      0.1631      0.1631   <zlib.decompress>
           3            0      0.0505      0.0505   <mercurial.bdiff.blocks>
       31007            0     80.4564      0.0477   mercurial.mdiff:147(_unidiff)
      +32813            0     80.3546     40.6086   +mercurial.mdiff:160(yieldhunk)
          +3            0      0.0505      0.0505   +<mercurial.bdiff.blocks>
       +3618            0      0.0022      0.0022   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0013      0.0013   +<len>
          +3            0      0.0001      0.0000   +re:188(compile)
           1            0     80.8381      0.0322   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0235      0.0235   +<method 'startswith' of 'str' objects>
      +31014            0     80.7820      0.0071   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +3            0      0.0000      0.0000   +<method 'group' of '_sre.SRE_Match' objects>
           6            0      0.0444      0.0283   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0160      0.0160   +<method 'split' of 'str' objects>
          32            0      0.0246      0.0246   <method 'update' of '_hashlib.HASH' objects>
          11            0      0.0236      0.0236   <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)

With this change, it's almost as fast as not using showfunc at all:

   CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
         543            0      0.1699      0.1699   <zlib.decompress>
           3            0      0.0501      0.0501   <mercurial.bdiff.blocks>
       32813            0      0.0415      0.0348   mercurial.mdiff:161(yieldhunk)
      +70837            0      0.0058      0.0058   +<method 'isalnum' of 'str' objects>
       +1809            0      0.0006      0.0006   +mercurial.mdiff:148(contextend)
       +1809            0      0.0002      0.0002   +<len>
           1            0      0.4879      0.0310   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0230      0.0230   +<method 'startswith' of 'str' objects>
      +31014            0      0.4335      0.0065   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +1            0      0.0004      0.0000   +re:188(compile)
          32            0      0.0293      0.0293   <method 'update' of '_hashlib.HASH' objects>
           6            0      0.0427      0.0279   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0147      0.0147   +<method 'split' of 'str' objects>
       31007            0      0.1169      0.0235   mercurial.mdiff:147(_unidiff)
          +3            0      0.0501      0.0501   +<mercurial.bdiff.blocks>
      +32813            0      0.0415      0.0348   +mercurial.mdiff:161(yieldhunk)
       +3618            0      0.0012      0.0012   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0006      0.0006   +<len>
      107597            0      0.0230      0.0230   <method 'startswith' of 'str' objects>
          16            0      0.0213      0.0213   <mercurial.mpatch.patches>
         194            0      0.0149      0.0149   <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
2011-09-19 15:58:03 -07:00
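The second profile in 7e742515c2 shows `str.isalnum` calls where the first one showed regular expression matches, which suggests a first-character test while scanning backwards from the hunk. A minimal sketch of such a search follows; the function name, the `last_pos` parameter, and the 40-character cut-off are assumptions made for illustration.

```python
def find_function_line(lines, hunk_start, last_pos=0, width=40):
    """Return the nearest line above 'hunk_start' that looks like a
    function header, or None.

    A line qualifies if its first character is alphanumeric. The scan
    stops at 'last_pos', the point a previous search already covered.
    """
    for i in range(hunk_start - 1, last_pos - 1, -1):
        line = lines[i]
        if line[:1].isalnum():
            # rstrip() runs once, on the match only, not in the loop.
            return line.rstrip()[:width]
    return None


source = ["def foo():\n", "    x = 1\n", "\n", "    y = 2\n"]
found = find_function_line(source, 3)
```

If the caller remembers the previous hunk's start and passes it as `last_pos`, no line is scanned twice, which gives the O(n lines) worst case mentioned above.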
Mads Kiilerich
35ef3c1409 mdiff: carriage return (\r) is also ignorable whitespace 2010-10-19 03:55:06 +02:00
Benoit Boissinot
0a73b8e369 mdiff.patch(): add a special case for when the base text is empty
remove the special casing from revlog.addgroup()
2010-08-23 13:28:04 +02:00
Benoit Boissinot
621e9c06bf remove header handling out of mdiff.bunidiff, rename it 2010-03-09 18:31:57 +01:00
Matt Mackall
8d99be19f0 many, many trivial check-code fixups 2010-01-25 00:05:27 -06:00
Matt Mackall
cd3ef170f7 Merge with stable 2010-01-19 22:45:09 -06:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Patrick Mezard
a9ef9386bc mq: preserve --git flag when merging patches
Without this, merging a patch queue without diff.git=1 downgrades all git
patches to regular patches, losing data in the process.
2010-01-01 19:53:05 +01:00
Patrick Mezard
2f3f5f28ea mdiff: fix diff -b/B/w on mixed whitespace hunks (issue127)
The previous code computed hunks, then checked whether these hunks could be
ignored when taking the whitespace/blank-lines options into account. This
approach is simple but fails with hunks containing both whitespace and
non-whitespace changes: the whole hunk is emitted even though it may consist
mostly of whitespace changes. The new version normalizes whitespace before hunk
generation, and tests for blank lines afterwards.
2009-11-11 18:31:42 +01:00
Martin Geisler
5b4e5428df replace "i in range(len(xs))" with "i, x in enumerate(xs)"
The remaining occurrences should be the ones where "xs" is mutated or
where "i" is used for index arithmetic.
2009-05-26 22:59:52 +02:00
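The rewrite described in 5b4e5428df is mechanical. A small before/after pair, using made-up function names, shows that the two forms produce the same result when the list is not mutated and the index is not used for arithmetic:

```python
def pairs_with_range(xs):
    # Old style: index the list on every iteration.
    result = []
    for i in range(len(xs)):
        result.append((i, xs[i]))
    return result


def pairs_with_enumerate(xs):
    # New style: enumerate() yields the index and the item together.
    result = []
    for i, x in enumerate(xs):
        result.append((i, x))
    return result


items = ["a", "b", "c"]
same = pairs_with_range(items) == pairs_with_enumerate(items)
```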
Patrick Mezard
d6ce43b965 patch: support diff data loss detection and upgrade
In the worst case, generating a diff in upgrade mode can be twice as expensive
as generating it in git mode directly: we may have to regenerate the whole
diff whenever a git feature is detected. Also, the first diff attempt is
completely buffered instead of being streamed. That said, even without having
profiled it yet, I am convinced we can fast-path the upgrade mode if necessary,
were it to be used in regular diff commands and not only in mq, where avoiding
data loss is worth the price.
2010-01-01 20:54:05 +01:00
Simon Heimberg
09ac1e6c92 separate import lines from mercurial and general python modules 2009-04-28 17:40:46 +02:00
Martin Geisler
750183bdad updated license to be explicit about GPL version 2 2009-04-26 01:08:54 +02:00
Dirkjan Ochtman
9bf5b2380e diff: fix obscure off-by-one error in diff -p 2008-11-27 17:00:54 +01:00
Thomas Arendsen Hein
712c41183e Remove trailing space 2008-10-22 18:55:07 +02:00
Dirkjan Ochtman
7c9e09c95d patch/diff: use a separate function to write the first line of a file diff 2008-10-22 13:14:52 +02:00
Martin Geisler
d0e419c6a7 mdiff: compare content of binary files directly
A plain Python string comparison stops when the first mismatch is
found, whereas the call to md5 would need to compute the hash over the
entire string and only then do the comparison.
2008-08-09 02:10:22 +02:00
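The trade-off in d0e419c6a7 can be illustrated with two tiny helpers. The names are hypothetical and the code is written for Python 3 bytes objects rather than the Python 2 strings of the time.

```python
import hashlib


def changed_by_hash(old, new):
    # Both inputs are read in full before anything can be compared.
    return hashlib.md5(old).digest() != hashlib.md5(new).digest()


def changed_direct(old, new):
    # The comparison can stop at the first differing byte.
    return old != new


old_data = b"\x00" + b"x" * 1000
new_data = b"\x01" + b"x" * 1000

by_hash = changed_by_hash(old_data, new_data)
direct = changed_direct(old_data, new_data)
```

Both helpers give the same answer; the direct comparison simply has less work to do when the contents diverge early.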
Dirkjan Ochtman
34d6bea8db python 2.6 compatibility: compatibility wrappers for hash functions 2008-04-04 22:36:40 +02:00
Patrick Mezard
8e0cbccd26 Let --unified default to diff.unified (issue 1076) 2008-04-04 22:15:14 +02:00
Matt Mackall
11423d02c7 diff: don't show function name by default
We'd mistakenly made the -p option always on, which meant there was no
way to turn it off. It also meant that we were sometimes splitting
multibyte characters in function names, which isn't a good default.
2008-01-16 11:14:24 -06:00
Dustin Sallings
bebcdac954 Use both the from and to name in mdiff.unidiff.
This fixes a compatibility issue with git diffs.
* * *
2007-11-01 12:17:59 -07:00
Matt Mackall
a92b40c2ed revlog: generate trivial deltas against null revision
To avoid extra memory usage and performance issues with large files,
generate a trivial delta header for deltas against the null revision
rather than calling the usual delta generator.

We append the delta header to meta rather than prepending it to data
to avoid a large allocate and copy.
2007-10-03 17:17:27 -05:00
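A trivial delta of the kind a92b40c2ed describes can be sketched as below. The layout used here, fragments made of three big-endian 32-bit integers (start, end, length) followed by the replacement data, is my understanding of the binary delta format and should be treated as an assumption; the function names are hypothetical, and `apply_delta` is a simplified pure-Python stand-in for the real patch routine.

```python
import struct

HEADER = ">lll"  # start, end, length of replacement data (assumed layout)
HEADER_SIZE = struct.calcsize(HEADER)


def trivial_delta_header(length):
    """Header saying: replace the empty range [0, 0) with 'length' bytes."""
    return struct.pack(HEADER, 0, 0, length)


def apply_delta(base, delta):
    """Apply a sequence of (start, end, length, data) fragments to 'base'."""
    out = []
    pos = 0   # read position in the delta
    last = 0  # end of the last replaced range in the base text
    while pos < len(delta):
        start, end, length = struct.unpack(
            HEADER, delta[pos:pos + HEADER_SIZE])
        pos += HEADER_SIZE
        out.append(base[last:start])
        out.append(delta[pos:pos + length])
        pos += length
        last = end
    out.append(base[last:])
    return b"".join(out)


text = b"hello\n"
# Concatenated here for simplicity; the commit avoids this copy by
# storing the header separately from the data.
delta = trivial_delta_header(len(text)) + text
restored = apply_delta(b"", delta)
```

Because the base is empty, no diff computation is needed at all: the header is fixed apart from the length field.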
Matt Mackall
6a7cb8cbaa diff: correctly handle combinations of whitespace options 2007-07-14 12:44:47 -05:00
Alexis S. L. Carvalho
e85eaa593d git patches: correct handling of filenames with spaces
Add a trailing TAB to the "--- filename" lines if there's a space
in the file name.  This allows patch(1) to work correctly.  The
same is done for diff --nodates.

This was originally suggested by Andrei Vermel, but at the time
I thought git was doing something different.
2007-06-22 19:06:04 -03:00
Alexis S. L. Carvalho
425cc6372f add mdiff.get_matching_blocks 2007-04-16 20:17:39 -03:00
Thomas Arendsen Hein
134efad44c merge with crew-stable 2007-02-20 20:55:23 +01:00
tailgunner@smtp.ru
5474a24e48 Don't lie that "binary file has changed"
Without the -a option to "hg diff", mdiff.unidiff reported that "Binary
file foo has changed" without even trying to compare things. Now it
computes the MD5 of the old and new files and compares them before drawing
that conclusion.
2007-02-17 09:54:56 -02:00
Matt Mackall
f17a4e1934 Replace demandload with new demandimport 2006-12-13 13:27:09 -06:00
Stephen Darnell
a808384cf1 Add -D/--nodates options to hg diff/export that remove dates from diff headers
and replace uses of sed in the tests with --nodates.
2006-09-26 00:05:24 +01:00
Brendan Cully
619c7dab4b Remove dates from git export file lines - they confuse git-apply 2006-08-29 17:08:42 -07:00
Brendan Cully
c18265f47c Add diff --git option 2006-08-14 22:48:03 -07:00
Vadim Gelfer
13d751feaf refactor text diff/patch code.
rename commands.dodiff to patch.diff.
rename commands.doexport to patch.export.
move some functions from commands to new mercurial.cmdutil module.
turn list of diff options into mdiff.diffopts class.

patch.diff and patch.export now have a clean API for calls from third-party
Python code.
2006-08-12 16:13:27 -07:00
Vadim Gelfer
dc377b58c1 update copyrights. 2006-08-12 12:30:02 -07:00
Haakon Riiser
7b06333d1a diff: add -b/-B options 2006-06-29 15:16:25 +02:00
Vadim Gelfer
9a0c813fdc use demandload more. 2006-06-20 23:58:21 -07:00
Vadim Gelfer
1de5bf52df fix speed regression in mdiff caused by line split bugfix. 2006-05-10 13:39:12 -07:00
Vadim Gelfer
c440466a54 fix diffs containing embedded "\r".
add test to make sure fix stays fixed.
2006-05-10 10:31:54 -07:00