116 Commits

Author SHA1 Message Date
Pulkit Goyal
5cabeba9d4 py3: use pycompat.strkwargs() to convert kwargs keys to str 2017-06-27 00:23:32 +05:30
Yuya Nishihara
5fe7742660 mpatch: switch to policy importer 2016-08-13 12:18:58 +09:00
Yuya Nishihara
50b316b748 bdiff: switch to policy importer
# no-check-commit
2016-08-13 12:12:50 +09:00
Yuya Nishihara
d9d64e114f bdiff: proxy through mdiff module
See the previous commit for why.

mdiff seems a good place to host bdiff functions. bdiff.bdiff was already
aliased as textdiff, so we use it.
2017-04-26 22:03:37 +09:00
Yuya Nishihara
ab046506ef base85: proxy through util module
I'm going to replace hgimporter with a simpler import function, so we can
access pure/cext modules by name:

  # util.py
  base85 = policy.importmod('base85')  # select pure.base85 or cext.base85

  # cffi/base85.py
  from ..pure.base85 import *  # may re-export pure.base85 functions

This means we'll have to use policy.importmod() function in place of the
standard import statement, but we wouldn't want to write it every place where
C extension modules are used. So this patch makes util host base85 functions.
2017-04-26 21:56:47 +09:00
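The pure-vs-C selection described above can be sketched in a few lines. This is a minimal illustration, not Mercurial's `policy.importmod()`: the `policy` values ('c', 'py', 'allow') and the `importmod` signature below are assumptions.

```python
import importlib


def importmod(name, policy="allow", package="mercurial"):
    """Import <package>.cext.<name> or <package>.pure.<name> (sketch).

    Hypothetical policy values:
      'c'     -> C extension only
      'py'    -> pure Python only
      'allow' -> prefer the C extension, fall back to pure Python
    """
    purename = "%s.pure.%s" % (package, name)
    if policy == "py":
        return importlib.import_module(purename)
    try:
        return importlib.import_module("%s.cext.%s" % (package, name))
    except ImportError:
        if policy == "c":
            raise
        return importlib.import_module(purename)
```

Callers would then write `base85 = importmod('base85')` once (in util, per the commit) instead of repeating the policy logic at every import site.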
Yuya Nishihara
32bd8b34ed mdiff: move re-exports to top
This style seems more common in our codebase.
2017-05-02 17:05:22 +09:00
Denis Laxalde
927c1336ab mdiff: add a hunkinrange helper function
This factors out hunk filtering logic by line range that is similar in
mdiff.blocksinrange() and hgweb.webutil.diffs().
2017-04-01 12:24:59 +02:00
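A minimal sketch of such an overlap check, assuming a hunk is `(start, length)` and a line range is `(lowerbound, upperbound)`, both treated as half-open intervals. The name mirrors the commit subject; the body is an illustration, not the committed code.

```python
def hunkinrange(hunk, linerange):
    """Return True if hunk (start, length) overlaps linerange (lo, hi).

    The hunk covers [start, start + length) and the range covers
    [lowerbound, upperbound); they overlap when each one begins
    before the other ends.
    """
    start, length = hunk
    lowerbound, upperbound = linerange
    return lowerbound < start + length and start < upperbound
```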
Pulkit Goyal
87883687d7 diff: slice over bytes to make sure conditions work normally
Both of these are part of generating `hg diff` on Python 3.
2017-03-26 20:52:51 +05:30
Pulkit Goyal
eb657f47dd diff: use pycompat.{byteskwargs, strkwargs} to switch opts b/w bytes and str 2017-03-26 20:58:21 +05:30
Denis Laxalde
6759ce9345 mdiff: let unidiff return (diffheader, hunks)
This will be used to make it possible to filter diff hunks based on this range
information.

Now unidiff returns a 'hunks' generator that yields tuples (hunkrange,
hunklines) coming from _unidiff() with 'newline at end of file' processing.
2017-03-03 17:46:40 +01:00
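With headers and hunks returned separately, a caller can filter hunks by their range and drop the headers entirely when nothing survives. A minimal sketch, assuming each hunk's line list starts with its `@@` line and every line ends in a newline; `filteredtext` is a hypothetical helper.

```python
def filteredtext(headers, hunks, keep=lambda hunkrange: True):
    """Rebuild unified-diff text from a (headers, hunks) pair.

    headers: header lines without trailing newlines, e.g. '--- a/f'.
    hunks: iterable of (hunkrange, hunklines); hunkrange is (s1, l1, s2, l2)
    and hunklines (assumed) start with the '@@' line and end in newlines.
    keep: predicate on hunkrange deciding which hunks to emit.
    """
    body = []
    for hunkrange, hunklines in hunks:
        if keep(hunkrange):
            body.extend(hunklines)
    if not body:
        # No hunk survived the filter: emit no headers either.
        return ""
    return "\n".join(headers) + "\n" + "".join(body)
```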
Denis Laxalde
1ec128500e mdiff: extract a checknonewline inner function in unidiff() 2017-03-03 17:46:28 +01:00
Denis Laxalde
fbe932a316 mdiff: distinguish diff headers from hunks in unidiff()
Let unidiff return the list of headers it produces (lines '--- <original>' and
'+++ <new>') apart from diff hunks. In patch.diff(), we combine headers
generated there (not specific to unified format) with those from unidiff().
By returning a list of header lines, we no longer append newlines in the datetag
inner function of unidiff(), so that all header lines are '\n'.join-ed in the
same way.
2017-03-03 13:51:22 +01:00
Denis Laxalde
dab0828dbc mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)
Now _unidiff yields the lines of each hunk packed into a tuple with the "range
information" `(s1, l1, s2, l2)` that is used to build the typical hunk header
'@@ -s1,l1 +s2,l2 @@'.
This will be used to make it possible to filter diff hunks based on this range
information.

The new "range information" is ignored in unidiff() (only caller of _unidiff)
for now.
2017-03-02 17:22:46 +01:00
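A generator with the same output shape, `((s1, l1, s2, l2), hunklines)`, can be sketched as follows. It swaps in the standard library's `difflib.SequenceMatcher` for Mercurial's bdiff, so hunk boundaries may differ from `hg diff`; `unidiffhunks` is a hypothetical name, and the zero-length start adjustment follows the GNU unified-diff convention.

```python
import difflib


def unidiffhunks(a, b, context=3):
    """Yield ((s1, l1, s2, l2), hunklines) for each unified-diff hunk.

    a, b: lists of lines, each ending in a newline.
    difflib stands in for bdiff here (an assumption of this sketch).
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    for group in matcher.get_grouped_opcodes(context):
        i1, i2 = group[0][1], group[-1][2]
        j1, j2 = group[0][3], group[-1][4]
        l1, l2 = i2 - i1, j2 - j1
        # 1-based starts; a zero-length range names the line before it.
        s1 = i1 + 1 if l1 else i1
        s2 = j1 + 1 if l2 else j1
        lines = ["@@ -%d,%d +%d,%d @@\n" % (s1, l1, s2, l2)]
        for tag, x1, x2, y1, y2 in group:
            if tag == "equal":
                lines.extend(" " + line for line in a[x1:x2])
                continue
            # 'replace', 'delete' and 'insert': old lines, then new lines.
            lines.extend("-" + line for line in a[x1:x2])
            lines.extend("+" + line for line in b[y1:y2])
        yield (s1, l1, s2, l2), lines
```

Keeping the range tuple next to the lines lets a consumer filter hunks without re-parsing the `@@` text.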
Denis Laxalde
e158e9848e mdiff: turn the comment above _unidiff into a docstring 2017-01-09 09:34:39 +01:00
Denis Laxalde
9d2de61e1a mdiff: compute newlines-splitted texts within _unidiff
There is no reason to compute the split texts l1, l2 in unidiff() before
calling _unidiff, as they are only used by the latter function.
2016-09-27 20:27:35 +02:00
Sean Farley
bf5e8cb800 patch: add similarity config knob in experimental section
This config knob will control whether or not to show the similarity
calculation in the diff output:

  diff --git a/README.md b/foo.md
  similarity index 88%
  rename from README.md
  rename to foo.md
  --- a/README.md
  +++ b/foo.md
2017-01-09 10:51:44 -08:00
Sean Farley
8cd1b5827c patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of the hash to
output. It will display diffs with the git index header as follows:

  diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
  index 112edf7..d6b52c5 100644

We'll put this in the experimental section for now.
2017-01-09 11:13:47 -08:00
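The index line uses git blob hashes, which are SHA-1 over `blob <size>\0` followed by the content. A minimal sketch of truncating them according to the knob; the helper names and the mapping of 'short' to 12 characters are assumptions for illustration.

```python
import hashlib

# Assumed keyword values; 'short' -> 12 is a guess for illustration.
_KEYWORDS = {"none": 0, "short": 12, "full": 40}


def gitblobhash(data):
    """Hex SHA-1 of a git blob: sha1(b'blob <size>\\0' + data)."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def indexlength(value):
    """Map a config value (keyword or '0'..'40') to a hash prefix length."""
    if value in _KEYWORDS:
        return _KEYWORDS[value]
    length = int(value)
    if not 0 <= length <= 40:
        raise ValueError("index length must be between 0 and 40: %r" % value)
    return length


def indexheader(old, new, mode="100644", value="full"):
    """Return 'index <old>..<new> <mode>', or None when the length is 0."""
    length = indexlength(value)
    if length == 0:
        return None
    return "index %s..%s %s" % (
        gitblobhash(old)[:length], gitblobhash(new)[:length], mode)
```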
Denis Laxalde
dc8e8fcbf9 mdiff: add a "blocksinrange" function to filter diff blocks by line range
The function filters diff blocks, as generated by the mdiff.allblocks function,
keeping those contained in a given line range on the "b-side" of the blocks.
2017-01-03 18:15:58 +01:00
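A minimal sketch of the b-side filter, assuming each block is `((a1, a2, b1, b2), btype)` with `btype` '=' for matching and '!' for changed blocks, and 0-based half-open ranges. It only filters; it does not map the range back to the a-side, and pure deletions (empty b-side) are dropped by this simplified overlap test.

```python
def blocksinrange(blocks, rangeb):
    """Keep blocks whose b-side [b1, b2) overlaps rangeb = (lbb, ubb).

    blocks: iterable of ((a1, a2, b1, b2), btype), btype '=' or '!'
    (assumed layout). Returns the kept blocks in input order.
    """
    lbb, ubb = rangeb
    kept = []
    for (a1, a2, b1, b2), btype in blocks:
        if b1 < ubb and lbb < b2:  # overlap on the b side
            kept.append(((a1, a2, b1, b2), btype))
    return kept
```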
Philippe Pepiot
f0ea56f87f mdiff: remove unused parameter 'refine' from allblocks() 2016-09-27 14:46:34 +02:00
Gregory Szorc
fac475dd90 mdiff: remove use of __slots__
The use of __slots__ was added way back in 2006 in ce444c810fcf.
__slots__ isn't necessary for this class.
2016-06-25 13:52:46 -07:00
Mike Hommey
ddcf0dbb88 mdiff: don't emit a diff header for empty trivial deltas
An empty trivial delta, coded as (0, 0, 0), makes the delta application
do nothing but still takes 12 bytes, while skipping it altogether works
just as well without taking any space at all.
2016-01-11 22:00:07 -05:00
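The 12 bytes are the delta hunk header: three big-endian 32-bit integers (start, end, length of the new data). A minimal sketch of a header helper that skips the no-op case, assuming that `>lll` layout:

```python
import struct


def trivialdiffheader(length):
    """Header for a hunk inserting `length` bytes at offset 0 (sketch).

    struct format '>lll' packs three big-endian 4-byte ints = 12 bytes.
    A zero-length hunk would change nothing, so return b'' instead.
    """
    return struct.pack(">lll", 0, 0, length) if length else b""
```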
Gregory Szorc
ffa0c7e97c mdiff: use absolute_import 2015-12-21 21:26:14 -08:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'; however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Mike Edgar
af0ba8e839 mdiff: add helper for making deltas which replace the full text of a revision
This helper will be used initially for censor-aware delta generation. Deltas
which replace the full contents of the base revision are guaranteed to apply
correctly regardless of whether the delta recipient has censored the base.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-21 16:35:09 -05:00
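A delta whose single hunk spans the whole base, `(0, len(base), len(new))`, produces the new text no matter which bytes the base holds. The sketch below pairs such a header with a simplified pure-Python delta applier; both are illustrations, and `applydelta` is a hypothetical name that assumes sorted, non-overlapping hunks.

```python
import struct


def replacediffheader(oldlen, newlen):
    """Header of one hunk replacing base bytes [0, oldlen) with newlen bytes."""
    return struct.pack(">lll", 0, oldlen, newlen)


def applydelta(base, delta):
    """Apply a delta made of (12-byte header, data) hunks to base (sketch).

    Assumes hunks are sorted by start offset and do not overlap.
    """
    out = []
    pos = 0   # read position in delta
    last = 0  # end of the previous hunk in base
    while pos < len(delta):
        start, end, length = struct.unpack(">lll", delta[pos:pos + 12])
        pos += 12
        out.append(base[last:start])         # unchanged base bytes
        out.append(delta[pos:pos + length])  # replacement bytes
        pos += length
        last = end
    out.append(base[last:])
    return b"".join(out)
```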
Siddharth Agarwal
ce2b0fd888 mdiff.unidiff: add support for noprefix 2014-11-12 23:29:14 -08:00
Siddharth Agarwal
535dc77081 mdiff.diffopts: add a new noprefix option
By popular demand, we introduce an option to disable the 'a/' and 'b/'
prefixes in diff output. This makes copying and pasting filenames from diff
output easier.

This option will be implemented and documented in upcoming patches. To ensure
that existing scripts that parse output don't break, we will ensure that this
prefix is disabled in plain mode. A straight 'hg export | hg import' without
HGPLAIN=1 will still be broken though, but there's little that can be done
about that.
2014-11-12 23:25:32 -08:00
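A minimal sketch of the two behaviors described above: dropping the `a/` and `b/` prefixes when the option is on, and keeping them in plain mode. Checking only `HGPLAIN` is a simplification (Mercurial's plain-mode logic also considers `HGPLAINEXCEPT`), and both helper names are hypothetical.

```python
import os


def fileheaders(oldname, newname, noprefix=False):
    """Return the '---'/'+++' lines, with or without the a/ and b/ prefixes."""
    a, b = ("", "") if noprefix else ("a/", "b/")
    return ["--- %s%s" % (a, oldname), "+++ %s%s" % (b, newname)]


def wantnoprefix(configured, environ=os.environ):
    """Honor the option only outside plain mode (simplified HGPLAIN check)."""
    return bool(configured) and "HGPLAIN" not in environ
```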
Siddharth Agarwal
3deaceac88 mdiff.diffopts: add doc comment for nobinary 2014-11-12 23:19:44 -08:00
Stephen Lee
b831a97d01 diff: add nobinary config to suppress git-style binary diffs 2014-06-21 15:56:49 +10:00
Augie Fackler
9f876f6c89 cleanup: move stdlib imports to their own import statement
There are a few warnings still produced by my import checker, but
those are false positives produced by modules that share a name with
stdlib modules.
2013-11-06 16:48:06 -05:00
Guillermo Pérez
2c1f380e8a diff: move index header generation to patch
In an upcoming patch, we will add index information to all git diffs, not
only binary diffs, so this code needs to be moved to a more appropriate
place.

Also, since this information is used for patch headers, it makes more
sense to be in the patch module, along with other patch-related metadata.
2012-11-15 15:16:41 -08:00
Guillermo Pérez
33e8ab6c12 diff: move diffline to patch module
diffline is not part of diff computation, so it makes more sense
to place it with other header generation in patch module.

In upcoming patches we will generalize this approach for
all headers added in the patch, including the git index
header.
2012-11-15 12:19:03 -08:00
Guillermo Pérez
9482817c96 diff: unify calls to diffline
diffline was called from trydiff for binary diffs and from unidiff
for text diffs. In this patch we unify those calls into one.

diffline is also a header, not part of diff mechanisms, so it makes
sense to remove that responsibility from the mdiff module. In
upcoming patches we will move diffline to patch module and
keep grouping responsibilities.
2012-11-15 12:16:08 -08:00
Guillermo Pérez
d54d996bfb diff: move b85diff to mdiff module
b85diff generates a binary diff, so we move this code to mdiff module
along with unidiff for text diffs. All diffing mechanisms will be in the
same place.

In an upcoming patch we will remove the responsibility to print the
index header from b85diff and move it back to patch, since it's
a patch metadata header, not part of the diff generation.
2012-11-06 14:04:05 -08:00
Patrick Mezard
1e2e55778b mdiff: fix diff header generation for files with spaces (issue3357)
diff ---/+++ should end filenames with a TAB when they contain spaces. Current
code failed to do so when only the +++ file had spaces. This only happened with
git renames from a name without spaces to one with spaces.
2012-04-05 15:39:07 +02:00
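The fix amounts to deciding the trailing TAB separately for the `---` and `+++` lines, each based on its own file name. A minimal sketch with a hypothetical `headerline` helper:

```python
def headerline(marker, name, date=None):
    """Build a '---' or '+++' header line (sketch).

    With a date (non-git style), the name is followed by TAB + date.
    Without one, append a bare TAB only if *this* name contains a space,
    so patch tools can tell where the file name ends.
    """
    if date is not None:
        return "%s %s\t%s" % (marker, name, date)
    if " " in name:
        return "%s %s\t" % (marker, name)
    return "%s %s" % (marker, name)
```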
Patrick Mezard
a884b59cee mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)
When diffing the following documents with --ignore-blank-lines (-B):

  $ cat > a <<EOF
  >
  >
  >
  > b
  > x
  > d
  > EOF

and:

  $ cat > b <<EOF
  > b
  > y
  > d
  > EOF

the context lines are taken from the first document, even if the lines differ
(with -w or -b) or if the number of lines differs (with -B). In the second case,
we have to adjust the hunk offsets on the new side or we end up with inconsistent
diffs like this one (see the @@ offsets):

  diff -r 0e66aa54f318 a
  --- a/a
  +++ b/a
  @@ -1,4 +1,3 @@

   b
  -x
  +y
   d

Note that having different context lines in a and b means the diff can be
applied but is not invertible.

Reported by Nicholas Riley <com-selenic@sabi.net>
2012-02-06 21:17:50 +01:00
Matt Mackall
9f8ee10163 util: don't mess with builtins to emulate buffer() 2011-12-15 15:27:11 -06:00
Patrick Mezard
c4cef76e25 mdiff: replace wscleanup() regexps with C loops
On my system it reduces:

  hg annotate -w mercurial/commands.py

from 36s to less than 8s, to be compared with 6.3s when run without whitespace
options.
2011-11-18 14:23:03 +01:00
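For reference, the normalization those C loops perform can be written with regexps of the kind the commit replaced. This pure-Python sketch is illustrative and deliberately not the fast path; it treats space, tab, and carriage return as ignorable, and `fixws` is an assumed name.

```python
import re


def fixws(text, allws):
    """Normalize whitespace before diffing (pure-Python sketch).

    allws=True  (-w): drop all spaces, tabs and carriage returns.
    allws=False (-b): collapse runs of them to one space, then drop
    a single trailing space before each newline.
    """
    if allws:
        return re.sub(r"[ \t\r]+", "", text)
    text = re.sub(r"[ \t\r]+", " ", text)
    return text.replace(" \n", "\n")
```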
Patrick Mezard
252fc23b56 mdiff: split lines in allblocks() only when necessary
These are only required to handle the --ignore-blank-lines case
2011-11-18 14:16:47 +01:00
Patrick Mezard
cc3315778f annotate: support diff whitespace filtering flags (issue3030)
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.

The first implementation made annotate share diff configuration entries. But it
looks like users will use -w/-b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.
2011-11-18 12:04:31 +01:00
Patrick Mezard
4772aafb85 mdiff: make diffblocks() return all blocks, matching and changed
Annotate uses matching blocks not changed ones.
2011-11-18 12:01:04 +01:00
Patrick Mezard
ddbd229fe3 mdiff: extract blocks whitespace normalization in diffblocks()
We want to reuse it in annotate for whitespace normalization.
2011-11-18 11:53:38 +01:00
Matt Mackall
75db0d196a merge with stable 2011-11-17 16:53:17 -06:00
Patrick Mezard
16811ddcba diff: --ignore-blank-lines was too enthusiastic
It was ignoring changes from:

  ab

to:

  a
  b
2011-11-13 21:37:14 +01:00
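One way to see how blank-line normalization can be too eager: deleting newlines outright makes `ab` and `a\nb` compare equal. The sketch below instead collapses runs of newlines into one, so blank lines disappear but line boundaries survive. It illustrates the distinction and is not necessarily the patched code; `stripblanklines` is a hypothetical name.

```python
import re


def stripblanklines(text):
    """Normalize text for --ignore-blank-lines comparison (sketch).

    Runs of newlines collapse to a single newline and leading/trailing
    newlines are trimmed, so blank lines vanish but line breaks remain.
    """
    return re.sub(r"\n+", "\n", text).strip("\n")
```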
Nicolas Venegas
2582d1e3d9 mdiff/patch: fix bad hunk handling for unified diffs with zero context
Prior to this patch "hg diff -U0", i.e., zero lines of context, would
output hunk headers with a start line one greater than what GNU patch
and git output. Guido van Rossum documents the unified diff format[1]
as having a start line value "one lower than one would expect" for
zero length hunks.

Comparing the behaviour of the three systems prior to this patch in
transforming

  c1
  c3

to

  c1
  c2
  c3

- GNU "diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "git diff -U0" reports the hunk as "@@ -1,0 +2 @@"
- "hg diff -U0" reports the hunk as "@@ -2,0 +2,1 @@"

After this patch, "hg diff -U0" reports "@@ -1,0 +2,1 @@".

Since "hg export --config diff.unified=0" outputs zero-context unified
diffs, "hg import" has also been updated to account for start lines
one less than expected for zero length hunk ranges.

[1]: http://www.artima.com/weblogs/viewpost.jsp?thread=164293
2011-11-09 16:55:59 -08:00
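On the import side, a parser has to undo that convention: for a zero-length range the start already equals the 0-based insertion point, while for a non-empty range it is 1-based. A minimal sketch with a hypothetical `parsehunkheader`; an omitted length defaults to 1, as in GNU output such as `+2`.

```python
import re

_HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parsehunkheader(line):
    """Parse '@@ -s1[,l1] +s2[,l2] @@' into 0-based (start1, l1, start2, l2).

    A zero-length range's start names the line *before* the empty range,
    which is already the 0-based position; otherwise subtract 1.
    """
    m = _HUNK_RE.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    s1, l1, s2, l2 = m.groups()
    l1 = 1 if l1 is None else int(l1)
    l2 = 1 if l2 is None else int(l2)
    s1, s2 = int(s1), int(s2)
    start1 = s1 if l1 == 0 else s1 - 1
    start2 = s2 if l2 == 0 else s2 - 1
    return start1, l1, start2, l2
```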
Matt Mackall
8f2b7260c4 merge with stable 2011-11-10 11:00:27 -06:00
Mads Kiilerich
dcf70e02fc diff: always use / in paths in diff
Subrepo diffs would sometimes use backslashes on Windows.
2011-11-07 02:49:00 +01:00
Brodie Rao
7e742515c2 mdiff: speed up showfunc for large diffs
This addresses the following issues with showfunc:

- Silly usage of regular expressions.
- Doing str.rstrip() needlessly in an inner loop.
- Doing catastrophic backtracking when trying to find a function line.

Finding function text is now at worst O(n lines in the old file), and
at best close to O(n hunks).

Given a diff like this[1]:

 src/main/antlr3/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunker.g        |      4 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerLexer.java  |      2 +-
 src/main/java/uk/ac/cam/ch/wwmm/pregenerated/ChemicalChunkerParser.java |  29189 +++++----
 3 files changed, 14741 insertions(+), 14454 deletions(-)

[1]: https://bitbucket.org/wwmm/chemicaltagger/changeset/d2bfbaecd4fc/raw

Without this change, hg log --stat --config diff.showfunc=1 takes an
absurdly long time to complete:

   CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
       32813            0     80.3546     40.6086   mercurial.mdiff:160(yieldhunk)
   +65062746            0     25.7227     25.7227   +<method 'match' of '_sre.SRE_Pattern' objects>
   +65062746            0     14.0221     14.0221   +<method 'rstrip' of 'str' objects>
       +1809            0      0.0009      0.0009   +mercurial.mdiff:148(contextend)
       +1809            0      0.0003      0.0003   +<len>
    65062746            0     25.7227     25.7227   <method 'match' of '_sre.SRE_Pattern' objects>
    65062763            0     14.0221     14.0221   <method 'rstrip' of 'str' objects>
         543            0      0.1631      0.1631   <zlib.decompress>
           3            0      0.0505      0.0505   <mercurial.bdiff.blocks>
       31007            0     80.4564      0.0477   mercurial.mdiff:147(_unidiff)
      +32813            0     80.3546     40.6086   +mercurial.mdiff:160(yieldhunk)
          +3            0      0.0505      0.0505   +<mercurial.bdiff.blocks>
       +3618            0      0.0022      0.0022   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0013      0.0013   +<len>
          +3            0      0.0001      0.0000   +re:188(compile)
           1            0     80.8381      0.0322   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0235      0.0235   +<method 'startswith' of 'str' objects>
      +31014            0     80.7820      0.0071   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +3            0      0.0000      0.0000   +<method 'group' of '_sre.SRE_Match' objects>
           6            0      0.0444      0.0283   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0160      0.0160   +<method 'split' of 'str' objects>
          32            0      0.0246      0.0246   <method 'update' of '_hashlib.HASH' objects>
          11            0      0.0236      0.0236   <method 'read' of 'file' objects>
Time: real 80.880 secs (user 80.200+0.000 sys 0.380+0.000)

With this change, it's almost as fast as not using showfunc at all:

   CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
         543            0      0.1699      0.1699   <zlib.decompress>
           3            0      0.0501      0.0501   <mercurial.bdiff.blocks>
       32813            0      0.0415      0.0348   mercurial.mdiff:161(yieldhunk)
      +70837            0      0.0058      0.0058   +<method 'isalnum' of 'str' objects>
       +1809            0      0.0006      0.0006   +mercurial.mdiff:148(contextend)
       +1809            0      0.0002      0.0002   +<len>
           1            0      0.4879      0.0310   mercurial.patch:1777(diffstatdata)
     +107499            0      0.0230      0.0230   +<method 'startswith' of 'str' objects>
      +31014            0      0.4335      0.0065   +mercurial.util:1284(iterlines)
          +3            0      0.0000      0.0000   +<method 'search' of '_sre.SRE_Pattern' objects>
          +4            0      0.0000      0.0000   +mercurial.patch:1783(addresult)
          +1            0      0.0004      0.0000   +re:188(compile)
          32            0      0.0293      0.0293   <method 'update' of '_hashlib.HASH' objects>
           6            0      0.0427      0.0279   mercurial.mdiff:12(splitnewlines)
          +6            0      0.0147      0.0147   +<method 'split' of 'str' objects>
       31007            0      0.1169      0.0235   mercurial.mdiff:147(_unidiff)
          +3            0      0.0501      0.0501   +<mercurial.bdiff.blocks>
      +32813            0      0.0415      0.0348   +mercurial.mdiff:161(yieldhunk)
       +3618            0      0.0012      0.0012   +mercurial.mdiff:154(contextstart)
       +5427            0      0.0006      0.0006   +<len>
      107597            0      0.0230      0.0230   <method 'startswith' of 'str' objects>
          16            0      0.0213      0.0213   <mercurial.mpatch.patches>
         194            0      0.0149      0.0149   <method 'split' of 'str' objects>
Time: real 0.530 secs (user 0.450+0.000 sys 0.070+0.000)
2011-09-19 15:58:03 -07:00
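The key idea is to scan backwards only down to where the previous hunk's scan started, so each line of the old file is examined at most once. A minimal sketch, assuming a "function line" is any line starting with an alphanumeric character and a shared `[lastpos, func]` state list; `findfunc` is a hypothetical name.

```python
def findfunc(oldlines, astart, state):
    """Return function-context text for a hunk starting at oldlines[astart].

    state is a two-item list [lastpos, func] shared across hunks in order.
    The scan stops at lastpos, where the previous scan began, so the total
    work over all hunks is linear in the number of old lines.
    """
    lastpos, func = state
    for i in range(astart - 1, lastpos - 1, -1):
        line = oldlines[i]
        if line[:1].isalnum():
            func = " " + line.rstrip()[:40]
            break
    # If nothing was found, the previous hunk's function still applies.
    state[0] = astart
    state[1] = func
    return func
```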
Mads Kiilerich
35ef3c1409 mdiff: carriage return (\r) is also ignorable whitespace 2010-10-19 03:55:06 +02:00
Benoit Boissinot
0a73b8e369 mdiff.patch(): add a special case for when the base text is empty
remove the special casing from revlog.addgroup()
2010-08-23 13:28:04 +02:00
Benoit Boissinot
621e9c06bf remove header handling out of mdiff.bunidiff, rename it 2010-03-09 18:31:57 +01:00