The following patch splits up changed lines along tabs (using
re.findall), and gives them a "diff.tab" label. This can be used by
the color extension for colorising tabs, like it does right now with
trailing whitespace.
I also provide corresponding tests.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
Future patches will allow patch.diff to take a basectx so node1 (and node2)
could make hexfunc error out. Instead, we'll call the node function on the
context object directly.
The `hg import` command gains a `--partial` flag. When specified, a commit will
always be created from a patch import. Any hunk that fails to apply will
create .rej file, same as what `hg qimport` would do. This change is mainly
aimed at preserving changeset metadata when applying a patch, something very
important for reviewers.
In case of failure with `--partial`, `hg import` returns 1 and the following
message is displayed:
patch applied partially
(fix the .rej files and run `hg commit --amend`)
When multiple patches are imported, we stop at the first one with failed hunks.
In the future, someone may feel brave enough to tackle a --continue flag to
import.
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that format string consists of multiple string
components concatenated implicitly or explicitly,
This patch does below for that regexp pattern to detect "% inside _()"
problems in such case.
- put "+" into separator part ("[ \t\n]") for explicit concatenation
("...." + "...." style)
- enclose "component and separator" part by "(?:....)+" for
concatenation itself ("...." "...." or "...." + "....")
When creating patches modifying binary files using "git format-patch",
git creates 'literal' and 'delta' hunks. Mercurial currently supports
'literal' hunks only, which makes it impossible to import patches with
'delta' hunks.
This changeset adds support for 'delta' hunks. It is a reimplementation
of patch-delta.c from git :
http://git.kernel.org/cgit/git/git.git/tree/patch-delta.c
This is arguably a workaround, a better fix may be in the repo to
ensure that it won't list a file 'modified' unless there is a file
context for the previous version.
addremove required paths relative to the cwd, which meant a lot of extra code
that transformed paths into relative ones. That code is now gone as well.
This reduces the likelihood of a traceback when trying to email a
patch that happens to have 'diff --git' at the beginning of a line
in the description, as this patch did:
http://markmail.org/message/wxpgowxd7ucxygwe
The check pattern only checked for whitespace between keyword and operator.
Now it also warns:
> x = f(),7
missing whitespace after ,
> x = f()+7
missing whitespace in expression
In an upcoming patch, we will add index information to all git diffs, not
only binary diffs, so this code needs to be moved to a more appropriate
place.
Also, since this information is used for patch headers, it makes more
sense to be in the patch module, along with other patch-related metadata.
addmodehdr is a header helper, same as diffline, so it doesn't
need to be a top-level function and can be nested under trydiff.
In upcoming patches we will generalize this approach for
all headers.
Make diffline more readable, using strings with placeholders
rather than appending to a list from many ifs that makes
difficult to understand the actual output format.
Before, quiet mode produced no diffline header for mercurial as a
side effect of not populating "revs". This was a weird side effect,
and we will always need revs for git index header that will be added
in upcoming patches, so now we just check ui.quiet from diffline
directly.
diffline is not part of diff computation, so it makes more sense
to place it with other header generation in patch module.
In upcoming patches we will generalize this approach for
all headers added in the patch, including the git index
header.
diffline was called from trydiff for binary diffs and from unidiff
for text diffs. In this patch we unify those calls into one.
diffline is also a header, not part of diff mechanisms, so it makes
sense to remove that responsibility from the mdiff module. In
upcoming patches we will move diffline to patch module and
keep grouping responsibilities.
b85diff generates a binary diff, so we move this code to mdiff module
along with unidiff for text diffs. All diffing mechanisms will be in the
same place.
In an upcoming patch we will remove the responsibility to print the
index header from b85diff and move it back to patch, since it's
a patch metadata header, not part of the diff generation.
When applying a patch renaming/copying 'a' to 'b' on a revision where
'a' does not exist, the patching process would abort immediately,
without processing the remaining hunks and without reporting it. This
patch makes the patching no longer abort and possible hunks applied on
the copied/renamed file be written in reject files.
There have been quite a few places where we pop elements off the
front of a list. This can turn O(n) algorithms into something more
like O(n**2). Python has provided a deque type that can do this
efficiently since at least 2.4.
As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
Since f7e538c3b7ba, if a chunk starts with "@@ -0,1", oldstart turns into
a negative value. Because diffhelpers.testhunk() doesn't expect negative bstart,
it bypasses "alen > blen - bstart" condition and segfaults at
"PyList_GET_ITEM(b, i + bstart)".
The only place where an trailing CR could be meaningful is in the "diff --git"
line as part of a filename, and existing code already rule out this
possibility. Extend the CR/LF filtering to the whole binary hunk.
$ hg import --no-commit ../mercurial_1915035238540490516.patch
applying ../mercurial_1915035238540490516.patch
abort: could not extract binary data
Becomes:
abort: could not extract "binary2" binary data
Before, import was terminating with a traceback. Now it says:
$ hg import --no-commit ../bad.patch
applying ../bad.patch
abort: could not decode binary patch: bad base85 character at position 66
Git patches are parsed in two phases: 1) extract metadata, 2) parse actual
deltas and merge them with the previous metadata. We do this to avoid
dependency issues like "modify a; copy a to b", where "b" must be copied from
the unmodified "a".
Issue3384 is caused by flaky code I wrote to synchronize the patch metadata
with the emitted hunk:
if (gitpatches and
(gitpatches[-1][0] == afile or gitpatches[-1][1] == bfile)):
gp = gitpatches.pop()[2]
With a patch like:
diff --git a/a b/c
copy from a
copy to c
--- a/a
+++ b/c
@@ -1,1 +1,2 @@
a
+a
@@ -2,1 +2,2 @@
a
+a
diff --git a/a b/a
--- a/a
+++ b/a
@@ -1,1 +1,2 @@
a
+b
the first hunk of the first block is matched with the metadata for the block
"diff --git a/a b/c", then the second hunk of the first block is matched with
the metadata of the second block "diff --git a/a b/a", because of the "or" in
the code paste above. Turning the "or" into an "and" is not enough as we have
to deal with /dev/null cases for each file.
We I remove this broken piece of code:
# copy/rename + modify should modify target, not source
if gp.op in ('COPY', 'DELETE', 'RENAME', 'ADD') or gp.mode:
afile = bfile
because "afile = bfile" set "afile" to stuff like "b/file" instead of "a/file",
and because this only happens for git patches, which afile/bfile are ignored
anyway by applydiff().
v2:
- Avoid a traceback on git metadata desynchronization
Here is how export and mq write the "Parent" header:
mq: # Parent XXXXX
export: # Parent XXXXX
then import expects exactly 2 spaces while mq tolerates one or more. So "hg
import --exact" truncates mq generated patches header by one character and
fails. This patch aligns import "Parent" header parsing on mq one. I do not
expect spaces in parent references anytime soon.
Reported by Stefan Ring <stefanrin@gmail.com>
The previous code was assuming a default context of 3 lines. When fuzzing, it
would take this value in account to reduce the amount of removed line from
hunks top or bottom. For instance, if a hunk has only 2 lines of bottom
context, fuzzing with fuzz=1 would do nothing and with fuzz=2 it would remove
one of those lines. A hunk with one line of bottom context could not be fuzzed
at all. patch(1) has apparently no such restrictions and takes the fuzz level
at face value.
- test-import.t: fuzz/offset changes at the beginning of file are explained by
the new fuzzing behaviour and match patch(1) ones. Patching locations are
different but those of my patch(1) do not make a lot of sense right now
(patched output are the same)
- test-import-bypass.t: more agressive fuzzing makes a patching supposed to
fail because of context, succeed. Change the diff to avoid this.
- test-mq-merge.t: more agressive fuzzing would allow the merged patch to apply
with fuzz, but fortunately we disallow this behaviour. The new output is
kept.
I have not enough experience with patch(1) fuzzing to know whether aligning our
implementation on it is a good or bad idea. Until now, it has been the
implementation reference. For instance, "qpush" tolerates fuzz (test-mq-merge.t
runs the special case of pushing merge revisions where fuzzing is forbidden).
When applying hunks such as:
@@ -2,1 +2,2 @@
context
+change
fuzzing would empty the "old" block and make patchfile.apply() traceback.
Instead, we apply the new block at specified location without testing.
The "bottom hunk" test was removed as patch(1) has no problem applying hunk
with no context in the middle of a file.