124 Commits

Author SHA1 Message Date
Jun Wu
b7eb2e64e3 mdiff: use xdiff for diff calculation
Summary:
Let's switch to xdiff for its better diff quality and performance!

The test changes demonstrate xdiff's better diff quality.

Reviewed By: ryanmce

Differential Revision: D7135206

fbshipit-source-id: 1775df6fc0f763df074b4f52779835d6ef0f3a4e
2018-04-13 21:51:21 -07:00
Ryan McElroy
c3c6110969 mdiff: remove rewindhunk by yielding a bool first to indicate data
Summary: This backports upstream rev 6a33e81e4c5e.

Reviewed By: quark-zju

Differential Revision: D6951333

fbshipit-source-id: 90b77aa1b758ffe794c82b859bc00bc7122f1cf7
2018-04-13 21:51:10 -07:00
Ryan McElroy
b3ba4d0fde mdiff: explicitly compute places for the newline marker
Summary:
This backports upstream rev a9d07bd8f758. This continues a patch series
that Augie thought was well worth the speed up it provided.

Reviewed By: quark-zju

Differential Revision: D6951332

fbshipit-source-id: a35fce1560a3e2b182faf5871ceb9068a52d697b
2018-04-13 21:51:10 -07:00
Ryan McElroy
edae55b64a patch: avoid repeated binary checks if all files in a patch are text
Summary: This backports upstream rev 079b27b5a869. It saves 3-4% on diffs.

Reviewed By: quark-zju

Differential Revision: D6951334

fbshipit-source-id: 889851e9638e2eeb43549af31e25d75632eccc2b
2018-04-13 21:51:09 -07:00
Jun Wu
f1c575a099 flake8: enable F821 check
Summary:
This check is useful and detects real errors (e.g. fbconduit). Unfortunately,
`arc lint` runs it with both py2 and py3, so many py2 builtins still trigger
warnings.

I didn't find a clean way to disable the py3 check, so this diff tries to fix them.
For `xrange`, the change was done by a script:

```
import sys
import redbaron

headertypes = {'comment', 'endl', 'from_import', 'import', 'string',
               'assignment', 'atomtrailers'}

xrangefix = '''try:
    xrange(0)
except NameError:
    xrange = range

'''

def isxrange(x):
    try:
        return x[0].value == 'xrange'
    except Exception:
        return False

def main(argv):
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        content = open(path).read()
        try:
            red = redbaron.RedBaron(content)
        except Exception:
            print('  warning: failed to parse')
            continue
        hasxrange = red.find('atomtrailersnode', value=isxrange)
        hasxrangefix = 'xrange = range' in content
        if hasxrangefix or not hasxrange:
            print('  no need to change')
            continue

        # find a place to insert the compatibility statement
        changed = False
        for node in red:
            if node.type in headertypes:
                continue
            # node.insert_before is an easier API, but it has bugs changing
            # other "finally" and "except" positions. So do the insert
            # manually.
            # # node.insert_before(xrangefix)
            line = node.absolute_bounding_box.top_left.line - 1
            lines = content.splitlines(1)
            content = ''.join(lines[:line]) + xrangefix + ''.join(lines[line:])
            changed = True
            break

        if changed:
            # "content" is faster than "red.dumps()"
            open(path, 'w').write(content)
            print('  updated')

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For other py2 builtins that do not have a py3 equivalent, some `# noqa`
comments were added as a workaround for now.

Reviewed By: DurhamG

Differential Revision: D6934535

fbshipit-source-id: 546b62830af144bc8b46788d2e0fd00496838939
2018-04-13 21:51:09 -07:00
Matthieu Laneuville
14498f2bf5 patch: add within-line color diff capacity
The `diff' command usually writes deletions in red and insertions in green. This
patch adds within-line colors to highlight which parts of the lines differ.
Lines to compare are chosen based on their similarity ratio, as computed by
difflib's SequenceMatcher, with an arbitrary threshold (0.7) below which two
lines are considered entirely different (and therefore need no inline diff).

The current implementation is kept behind an experimental flag in order to test
the effect on performance. In order to activate it, set inline-color-diff to
true in [experimental].
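
A minimal sketch of the similarity test described above, using only the
standard difflib module. The helper names and the choice of ">=" at the
threshold are assumptions for illustration, not the code from this patch:

```python
import difflib

THRESHOLD = 0.7  # the arbitrary cut-off mentioned in the commit message


def similar_enough(old, new, threshold=THRESHOLD):
    """Return True when two lines are close enough to get an inline diff."""
    return difflib.SequenceMatcher(None, old, new).ratio() >= threshold


def inline_spans(old, new):
    """Yield (tag, old_fragment, new_fragment) for the parts that differ."""
    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            yield tag, old[i1:i2], new[j1:j2]
```

Only the fragments yielded by inline_spans() would receive the extra
highlighting; pairs of lines that fail similar_enough() would be colored as
whole-line deletions and insertions.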
2017-10-26 00:13:38 +09:00
Pulkit Goyal
fbd5da487b py3: use '%d' for integers instead of '%s'
Differential Revision: https://phab.mercurial-scm.org/D973
2017-10-02 04:48:06 +05:30
David Soria Parra
6d9f90fa8d mdiff: add a --ignore-space-at-eol option
Add an option that ignores only whitespace at EOL. The option name is the
same as Git's.

.. feature::

   Added `--ignore-space-at-eol` diff option to ignore whitespace differences
   at line endings.

Differential Revision: https://phab.mercurial-scm.org/D422
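
One way to picture the option is as a normalization pass applied to both texts
before they are compared. The sketch below is an illustration under that
assumption; the function names are hypothetical and this is not the mdiff
implementation:

```python
def strip_space_at_eol(text):
    """Drop spaces, tabs and carriage returns that precede each line end."""
    return b'\n'.join(line.rstrip(b' \t\r') for line in text.split(b'\n'))


def same_ignoring_eol_space(a, b):
    """Compare two byte strings while ignoring whitespace at line ends."""
    return strip_space_at_eol(a) == strip_space_at_eol(b)
```

Leading and interior whitespace is left untouched, so indentation changes
still show up as differences.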
2017-08-29 18:20:50 -07:00
Pulkit Goyal
5cabeba9d4 py3: use pycompat.strkwargs() to convert kwargs keys to str
2017-06-27 00:23:32 +05:30
Yuya Nishihara
5fe7742660 mpatch: switch to policy importer
2016-08-13 12:18:58 +09:00
Yuya Nishihara
50b316b748 bdiff: switch to policy importer
# no-check-commit
2016-08-13 12:12:50 +09:00
Yuya Nishihara
d9d64e114f bdiff: proxy through mdiff module
See the previous commit for why.

mdiff seems a good place to host bdiff functions. bdiff.bdiff was already
aliased as textdiff, so we use it.
2017-04-26 22:03:37 +09:00
Yuya Nishihara
ab046506ef base85: proxy through util module
I'm going to replace hgimporter with a simpler import function, so we can
access pure/cext modules by name:

  # util.py
  base85 = policy.importmod('base85')  # select pure.base85 or cext.base85

  # cffi/base85.py
  from ..pure.base85 import *  # may re-export pure.base85 functions

This means we'll have to use policy.importmod() function in place of the
standard import statement, but we wouldn't want to write it every place where
C extension modules are used. So this patch makes util host base85 functions.
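
The general pattern of picking an implementation at import time can be
sketched with the standard importlib module. This is a simplified stand-in
written for illustration: the real policy.importmod() takes a module name and
consults the configured policy, whereas the hypothetical helper below just
tries an ordered list of candidate module names:

```python
import importlib


def importmod(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of %r could be imported' % (candidates,))
```

A hosting module would call this once and re-export the result, so callers
keep using a normal attribute access instead of repeating the lookup.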
2017-04-26 21:56:47 +09:00
Yuya Nishihara
32bd8b34ed mdiff: move re-exports to top
This style seems more common in our codebase.
2017-05-02 17:05:22 +09:00
Denis Laxalde
927c1336ab mdiff: add a hunkinrange helper function
This factors out hunk filtering logic by line range that is similar in
mdiff.blocksinrange() and hgweb.webutil.diffs().
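
A sketch of what such a helper could look like, assuming a hunk is given as
(start, length) and the line range as a half-open (lowerbound, upperbound)
pair. Both representations are assumptions made for this example:

```python
def hunkinrange(hunk, linerange):
    """Return True if the hunk overlaps the half-open line range."""
    start, length = hunk
    lowerbound, upperbound = linerange
    return lowerbound < start + length and start < upperbound
```

The test is a plain interval-overlap check: the hunk must start before the
range ends and end after the range starts.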
2017-04-01 12:24:59 +02:00
Pulkit Goyal
87883687d7 diff: slice over bytes to make sure conditions work normally
Both of these are part of generating `hg diff` on python 3.
2017-03-26 20:52:51 +05:30
Pulkit Goyal
eb657f47dd diff: use pycompat.{byteskwargs, strkwargs} to switch opts b/w bytes and str
2017-03-26 20:58:21 +05:30
Denis Laxalde
6759ce9345 mdiff: let unidiff return (diffheader, hunks)
This will be used to make it possible to filter diff hunks based on this range
information.

Now unidiff returns a 'hunks' generator that yields (hunkrange, hunklines)
tuples coming from _unidiff() with 'newline at end of file' processing.
2017-03-03 17:46:40 +01:00
Denis Laxalde
1ec128500e mdiff: extract a checknonewline inner function in unidiff()
2017-03-03 17:46:28 +01:00
Denis Laxalde
fbe932a316 mdiff: distinguish diff headers from hunks in unidiff()
Let unidiff return the list of headers it produces (lines '--- <original>' and
'+++ <new>') apart from diff hunks. In patch.diff(), we combine headers
generated there (not specific to unified format) with those from unidiff().
By returning a list of header lines, we do not append new lines in datetag
inner function of unidiff() so that all header lines are '\n'.join-ed in a
similar way.
2017-03-03 13:51:22 +01:00
Denis Laxalde
dab0828dbc mdiff: let _unidiff yield hunks as (<range information>, <hunk lines>)
Now _unidiff yields each hunk lines packed into a tuple with the "range
information" `(s1, l1, s2, l2)` that is used to build the typical hunk header
'@@ -s1,l1 +s2,l2 @@'.
This will be used to make it possible to filter diff hunks based on this range
information.

The new "range information" is ignored in unidiff() (only caller of _unidiff)
for now.
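
To illustrate the shape of the data, here is a small sketch that turns the
range information back into the hunk header quoted above. The function names
are hypothetical, and real headers may carry extra context after the second
'@@' that this example ignores:

```python
def hunk_header(s1, l1, s2, l2):
    """Build the '@@ -s1,l1 +s2,l2 @@' line from the range information."""
    return '@@ -%d,%d +%d,%d @@' % (s1, l1, s2, l2)


def render(hunks):
    """Flatten an iterable of (hunkrange, hunklines) pairs into lines."""
    for hunkrange, hunklines in hunks:
        yield hunk_header(*hunkrange)
        for line in hunklines:
            yield line
```

Keeping the range separate from the lines lets a caller filter hunks by
position without parsing the header text again.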
2017-03-02 17:22:46 +01:00
Denis Laxalde
e158e9848e mdiff: turn the comment above _unidiff into a docstring
2017-01-09 09:34:39 +01:00
Denis Laxalde
9d2de61e1a mdiff: compute newlines-splitted texts within _unidiff
There is no reason to compute the split texts l1, l2 in unidiff() before
calling _unidiff, as they are only used by the latter function.
2016-09-27 20:27:35 +02:00
Sean Farley
bf5e8cb800 patch: add similarity config knob in experimental section
This config knob will control whether or not to show the similarity
calculation in the diff output:

  diff --git a/README.md b/foo.md
  similarity index 88%
  rename from README.md
  rename to foo.md
  --- a/README.md
  +++ b/foo.md
2017-01-09 10:51:44 -08:00
Sean Farley
8cd1b5827c patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of hash to
output. It will display diffs with the git index header as such,

  diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
  index 112edf7..d6b52c5 100644

We'll put this in the experimental section for now.
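
A sketch of how such a value could be interpreted. The helper names are
hypothetical, and the lengths chosen for the keywords (in particular 12 for
'short') and the error on out-of-range integers are assumptions, not behavior
confirmed by this message:

```python
# Assumed mapping from keyword to number of hex digits.
_KEYWORDS = {'none': 0, 'short': 12, 'full': 40}


def index_hash_length(value):
    """Map a config value to the number of hex digits to display."""
    text = str(value).strip().lower()
    if text in _KEYWORDS:
        return _KEYWORDS[text]
    length = int(text)  # raises ValueError for any other string
    if not 0 <= length <= 40:
        raise ValueError('hash length must be between 0 and 40')
    return length


def index_header(oldhash, newhash, mode, length):
    """Format the git-style index line with hashes cut to `length` digits."""
    return 'index %s..%s %s' % (oldhash[:length], newhash[:length], mode)
```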
2017-01-09 11:13:47 -08:00
Denis Laxalde
dc8e8fcbf9 mdiff: add a "blocksinrange" function to filter diff blocks by line range
The function filters diff blocks as generated by the mdiff.allblocks function,
keeping those contained in a given line range on the "b-side" of the blocks.
2017-01-03 18:15:58 +01:00
Philippe Pepiot
f0ea56f87f mdiff: remove unused parameter 'refine' from allblocks()
2016-09-27 14:46:34 +02:00
Gregory Szorc
fac475dd90 mdiff: remove use of __slots__
The use of __slots__ was added way back in 2006 in ce444c810fcf.
__slots__ isn't necessary for this class.
2016-06-25 13:52:46 -07:00
Mike Hommey
ddcf0dbb88 mdiff: don't emit a diff header for empty trivial deltas
An empty trivial delta, coded as (0, 0, 0), makes the delta application
do nothing but still takes 12 bytes, while skipping it altogether works
just as well without taking any space at all.
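
The 12 bytes come from the delta header, sketched here as three big-endian
32-bit integers. The struct layout and the function name are assumptions made
for this example:

```python
import struct


def trivial_delta(text):
    """Return a delta that inserts `text` at offset 0 of an empty base."""
    if not text:
        return b''  # nothing to insert, so emit no header at all
    # header fields: start offset, end offset, length of the new data
    return struct.pack('>lll', 0, 0, len(text)) + text
```

With this layout an empty header would be struct.pack('>lll', 0, 0, 0), which
is 12 bytes that change nothing when applied.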
2016-01-11 22:00:07 -05:00
Gregory Szorc
ffa0c7e97c mdiff: use absolute_import
2015-12-21 21:26:14 -08:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'; however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Mike Edgar
af0ba8e839 mdiff: add helper for making deltas which replace the full text of a revision
This helper will be used initially for censor-aware delta generation. Deltas
which replace the full contents of the base revision are guaranteed to apply
correctly regardless of whether the delta recipient has censored the base.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
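
A sketch of why a full-replacement delta is insensitive to the base contents,
assuming a delta is a 12-byte header (start, end, length as big-endian 32-bit
integers) followed by the new data. The function names and the single-fragment
applier are hypothetical and exist only for this illustration:

```python
import struct


def replace_delta(base_len, new_text):
    """Build a delta that replaces bytes [0, base_len) with new_text."""
    return struct.pack('>lll', 0, base_len, len(new_text)) + new_text


def apply_single(base, delta):
    """Apply a delta that consists of exactly one fragment."""
    start, end, length = struct.unpack('>lll', delta[:12])
    return base[:start] + delta[12:12 + length] + base[end:]
```

Because the replaced span covers the whole base, no byte of the base survives
into the result, so the outcome depends only on the base length.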
2015-01-21 16:35:09 -05:00
Siddharth Agarwal
ce2b0fd888 mdiff.unidiff: add support for noprefix
2014-11-12 23:29:14 -08:00
Siddharth Agarwal
535dc77081 mdiff.diffopts: add a new noprefix option
By popular demand, we introduce an option to disable the 'a/' and 'b/'
prefixes in diff output. This makes copying and pasting filenames from diff
output easier.

This option will be implemented and documented in upcoming patches. To ensure
that existing scripts that parse output don't break, we will make sure that
this option is disabled in plain mode. A straight 'hg export | hg import'
without HGPLAIN=1 will still be broken, but there's little that can be done
about that.
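
The intended effect can be sketched as follows. The function and its `plain`
parameter are hypothetical and only illustrate the rule that plain mode wins
over the new option:

```python
def diff_paths(old, new, noprefix=False, plain=False):
    """Return the two file names as they would appear in the diff header."""
    if noprefix and not plain:
        return old, new
    return 'a/' + old, 'b/' + new
```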
2014-11-12 23:25:32 -08:00
Siddharth Agarwal
3deaceac88 mdiff.diffopts: add doc comment for nobinary
2014-11-12 23:19:44 -08:00
Stephen Lee
b831a97d01 diff: add nobinary config to suppress git-style binary diffs
2014-06-21 15:56:49 +10:00
Augie Fackler
9f876f6c89 cleanup: move stdlib imports to their own import statement
There are a few warnings still produced by my import checker, but
those are false positives produced by modules that share a name with
stdlib modules.
2013-11-06 16:48:06 -05:00
Guillermo Pérez
2c1f380e8a diff: move index header generation to patch
In an upcoming patch, we will add index information to all git diffs, not
only binary diffs, so this code needs to be moved to a more appropriate
place.

Also, since this information is used for patch headers, it makes more
sense to be in the patch module, along with other patch-related metadata.
2012-11-15 15:16:41 -08:00
Guillermo Pérez
33e8ab6c12 diff: move diffline to patch module
diffline is not part of diff computation, so it makes more sense
to place it with other header generation in patch module.

In upcoming patches we will generalize this approach for
all headers added in the patch, including the git index
header.
2012-11-15 12:19:03 -08:00
Guillermo Pérez
9482817c96 diff: unify calls to diffline
diffline was called from trydiff for binary diffs and from unidiff
for text diffs. In this patch we unify those calls into one.

diffline is also a header, not part of diff mechanisms, so it makes
sense to remove that responsibility from the mdiff module. In
upcoming patches we will move diffline to patch module and
keep grouping responsibilities.
2012-11-15 12:16:08 -08:00
Guillermo Pérez
d54d996bfb diff: move b85diff to mdiff module
b85diff generates a binary diff, so we move this code to mdiff module
along with unidiff for text diffs. All diffing mechanisms will be in the
same place.

In an upcoming patch we will remove the responsibility to print the
index header from b85diff and move it back to patch, since it's
a patch metadata header, not part of the diff generation.
2012-11-06 14:04:05 -08:00
Patrick Mezard
1e2e55778b mdiff: fix diff header generation for files with spaces (issue3357)
diff ---/+++ should end filenames with a TAB when they contain spaces. Current
code failed to do so when only the +++ file had spaces. This only happened with
git renames from a name without space to one with space.
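
The rule can be sketched as a per-file decision, so that a space in only one
of the two names is still handled. The function name is hypothetical and the
date tag that a real header may carry is left out:

```python
def header_line(marker, path):
    """Format a '---' or '+++' line, adding a TAB if the path has spaces."""
    tab = '\t' if ' ' in path else ''
    return '%s %s%s' % (marker, path, tab)
```

For a rename from "foo" to "foo bar", only the '+++' line gets the trailing
TAB.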
2012-04-05 15:39:07 +02:00
Patrick Mezard
a884b59cee mdiff: adjust hunk offsets with --ignore-blank-lines (issue3234)
When diffing the following documents with --ignore-blank-lines (-B):

  $ cat > a <<EOF
  >
  >
  >
  > b
  > x
  > d
  > EOF

and:

  $ cat > b <<EOF
  > b
  > y
  > d
  > EOF

the context lines are taken from the first document, even if the lines differ
(with -w or -b) or if the number of lines differ (with -B). In the second case,
we have to adjust the new-line offsets of the hunk or we end up with
inconsistent diffs like (see the @@ offsets):

  diff -r 0e66aa54f318 a
  --- a/a
  +++ b/a
  @@ -1,4 +1,3 @@

   b
  -x
  +y
   d

Note that having different context lines in a and b means the diff can be
applied but is not invertible.

Reported by Nicholas Riley <com-selenic@sabi.net>
2012-02-06 21:17:50 +01:00
Matt Mackall
9f8ee10163 util: don't mess with builtins to emulate buffer()
2011-12-15 15:27:11 -06:00
Patrick Mezard
c4cef76e25 mdiff: replace wscleanup() regexps with C loops
On my system it reduces:

  hg annotate -w mercurial/commands.py

from 36s to less than 8s, to be compared with 6.3s when run without whitespace
options.
2011-11-18 14:23:03 +01:00
Patrick Mezard
252fc23b56 mdiff: split lines in allblocks() only when necessary
These are only required to handle the --ignore-blank-lines case.
2011-11-18 14:16:47 +01:00
Patrick Mezard
cc3315778f annotate: support diff whitespace filtering flags (issue3030)
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.

The first implementation made annotate share diff configuration entries. But it
looks like users will use -w/-b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.
2011-11-18 12:04:31 +01:00
Patrick Mezard
4772aafb85 mdiff: make diffblocks() return all blocks, matching and changed
Annotate uses matching blocks, not changed ones.
2011-11-18 12:01:04 +01:00
Patrick Mezard
ddbd229fe3 mdiff: extract blocks whitespace normalization in diffblocks()
We want to reuse it in annotate for whitespace normalization.
2011-11-18 11:53:38 +01:00
Matt Mackall
75db0d196a merge with stable
2011-11-17 16:53:17 -06:00