This passes the existing diff matcher instance down to the copy logic so copy
tracing can be more efficient when possible and only trace copies for matching
files.
This only actually affects forwardcopies (i.e. Given A<-B<-C<-D, it works for
'hg diff -r B -r D foo.txt', but not for 'hg diff -r D -r B foo.txt') since
backward copies require walking all histories, and not just the individual
file's.
This reduces 'hg diff -r A -r B foo.txt' time from 15s to 1s when A and B have
80,000 files different.
Previously we'd request all results, then filter by relative root. This is
clearly inefficient, so we now restrict the matcher to the relative root for
certain easy cases.
The particular case here is when the matcher matches all files. In that case we
can simply create a matcher by the relative root.
This is purely an optimization and has no impact on correctness.
For now this implementation is pretty naive -- it filters out files right
before passing them into trydiff. In upcoming patches we'll add some more
smarts.
This has several advantages compared to resolving it relative to the root:
- '--prefix .' works as expected.
- consistent with upcoming 'hg diff' option to produce relative patches
(I made sure to put in the (glob) annotations this time!)
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
Rev cb4d72125aae adding a parameter here. This breaks third-party extensions
like crecord and also makes the issue fairly hard to fix on the extension's
side if it wants to retain compatibility across Mercurial versions -- in old
versions, the positional argument will be passed into the next unknown
argument, which is 'files'.
The patch also undoes a change to the record extension that is no longer
necessary.
This is preparation for upcoming patches that will add support for applying a
patch within a subdirectory.
We normalize the prefix here because this is the main driver -- all code to
apply patches is expected to go through here.
This is preparation for upcoming patches that will add support for applying a
patch within a subdirectory.
The prefix is applied after path components are stripped.
The code that identifies copies/renames, as well as the filenames
before and after, is now isolated and we can extract it to a function
so it can be overridden by extensions (in particular the narrow clone
extension).
There is not much left of the first block "if opts.git or losedatafn"
block now. The next patch will move the call to getfilectx() out of
that block. We will then be using the defined-ness of 'f1' to tell
whether the file existed in ctx1 (and under what name). We will need
this information whether or not opts.git or losedatafn was set, so
just remove that guard. The only operation in the block that is not
cheap is the call to getfilectx(), but that has an extra 'if opts.git'
guard already.
--ignore-space-change proves that only 'if opts.git or losedatafn:'
was removed.
f1 and f2 are currently set always set to some filename, even for
added or deleted files. Let's instead set them to None to indicate
that one side of the diff doesn't exist. This lets us use the filename
variables instead of the content variables and simplify a bit since
the empty string is not a valid filename. More importantly, it paves
the way for further simplifications.
By having all the checks for lossiness in one place, it becomes much
easier to get an overview of the conditions that lead to losedatafn()
being called. It also makes it obvious that it can not be called
multiple times for a single time (something that was rather tricky to
determine before).
It's not obvious, but every path in the 'if opts.git or losedatafn:'
block will have checked whether the file is binary [1]. Let's assign
the result of this check to a variable so we can simplify by checking
'binary and opts.git' in only one place instead of every place we
currently assign to 'binarydiff'.
[1] Except when deleting an empty file, but checking whether an empty
string is binary is very cheap anyway.
It seems natural that each element in the list corresponds to one line
of output. That is currently true, but only because each element in
the list has a trailing newline. Let's drop those newlines and instead
add them when we print the headers.
By moving the condition out of diffline(), the call site becomes
clearer and diffline() no longer closes on any variables.
Note that this changes the value of the header variable from [''] to
[], but there is no difference in how these two are treated by the
following code. The new value seems more natural anyway.
Now that there is only a single call to addmodehdr() left, and there
is other similar code (for new/deleted files) around that call site,
let's inline the function there. That also makes it clearer under what
circumstances the header is actually written (when modes differ).
This is the first step towards simplifying the big loop in
trydiff(). This will make both the header code and the non-header code
clearer, and it prepares for further simplification of the many nested
if-statements in the body of the loop.
Use '1' and '2' as suffix for names just like in the parameters
'ctx[12]':
to,tn -> content1,content2
a,b -> f1, f2
omode,mode -> mode1,mode2
omode,nmode -> mode1,mode2
onode,nnode -> node1,node2
oflag,nflag -> flag1,flag2
oindex,nindex -> index1,index2
When creating a diff with copy/rename enabled, we consider added files
and check if they are either copy sources or targets. However, an
added file should never be a copy source. The test suite seems to
agree with this: all tests pass if we raise an exception when an added
file is a copy source. So, let's simplify the code by dropping the
conditions that are never true.
For those interested in the historical reasons:
Before commit c15c00e7afba (patch: separate reverse copy data
(issue1959), 2010-02-11), 'copy' seems to have been a bidirectional
map. Then that commit split it up into two unidirectional maps and
duplicated the logic to look in both maps. It was still needed at that
point to look in both maps, as the copy detection was poor and could
sometimes be reported in reverse.
A little later came 5a644704d5eb (copies: rewrite copy detection for
non-merge users, 2012-01-04). That commit fixed the copy detection to
be backwards when it should, and made the hacks in trydiff
unnecessary.
We started updating 'modifiedset' in e85e48ffb83e (trydiff: simplify
checking for additions, 2014-12-23) but in the same commit, we removed
the last use of the variable. Clean it up.
When a binary source has been copied or renamed into a non-binary
file, we don't check whether the copy source was binary. There is a
code comment explaining that a git mode will be forced anyway in order
to capture the copy record (i.e. losedatafn() will be called). This is
true, but forcing git mode is not the only effect binary files have:
when git mode was already requested, we use the binary-ness to tell us
whether to use a regular unified diff or a git binary diff. The user
sees this as a "Binary file $file has changed" instead of the binary
The 'dodiff' variable is initialized to True and may later be set to
either False or "binary". When it's set to False, we skip everything
after that point, so we can simplify by instead continue-ing (the
loop). We can then also drop the 'if dodiff', since it will always be
true.
The conditional-ness is not clear from the name and there is only one
caller, so it's clearer to check on the call site. Moving it also
makes addindexmeta() no longer close on the 'opts' variable.
In the body of the loop in trydiff(), there are conditions like:
addedset or (f in modifiedset and to is None)
The second half of that expression is to account for the fact that
merge-in additions appear as additions. By instead fixing up the sets
of modified and added files to compensate for this fact, we can
simplify the body of the loop. It also fixes one case where the
addedset was checked without the additional check (the "have we
already reported a copy above?" case in the code, also see fixed test
case).
The similar condition with 'removedset' in it seems to have served no
purpose even before this change, so it could have been simplified even
before.
Note that there is a comment saying "ctx2 date may be dynamic". The
comment was introduced in 8ed3d2a60500 (patch: use contexts for diff,
2006-12-25), but it seems like it stopped being dynamic in that very
changeset -- before that changeset, the date seems to have been the
file's mtime, but after the changeset, it seems to be the changeset's
timestamp (current time for workingctx). Since no one seems to have
missed the "dynamicness", let's simplify and extract a date2 for
symmetry with date1.
This only has a noticeable effect on diffs touching A LOT of
files. For example, it takes
hg diff -r FIREFOX_AURORA_30_BASE -r FIREFOX_AURORA_35_BASE
from 1m55.465s to 1m32.354s. That diff has over 50k files.
These aren't exactly format-breaking features -- just ones for which patches
applied to a repo will produce incorrect commits, In any case, some commands
like record and annotate only care about this feature.
Not all callers are interested in all diffopts -- for example, commands like
record (which use diff internally) break when diffopts like noprefix are
enabled. This function will allow us to add flags that callers can use to
enable only the features they're interested in.
In an upcoming patch we'll enable support as an option to 'hg diff' as well.
The tests reflect the current state of the world -- as we add support we'll see
changes in the test output.
Upcoming patches will add an option that will almost certainly break diff
output parsers when enabled. Add support for forcing an option to something in
plain mode, as a fallback. Options passed in via the CLI are not affected,
though -- it is assumed that any script passing the option in explicitly knows
what it is doing.
The following patch splits up changed lines along tabs (using
re.findall), and gives them a "diff.tab" label. This can be used by
the color extension for colorising tabs, like it does right now with
trailing whitespace.
I also provide corresponding tests.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
Future patches will allow patch.diff to take a basectx so node1 (and node2)
could make hexfunc error out. Instead, we'll call the node function on the
context object directly.
The `hg import` command gains a `--partial` flag. When specified, a commit will
always be created from a patch import. Any hunk that fails to apply will
create .rej file, same as what `hg qimport` would do. This change is mainly
aimed at preserving changeset metadata when applying a patch, something very
important for reviewers.
In case of failure with `--partial`, `hg import` returns 1 and the following
message is displayed:
patch applied partially
(fix the .rej files and run `hg commit --amend`)
When multiple patches are imported, we stop at the first one with failed hunks.
In the future, someone may feel brave enough to tackle a --continue flag to
import.
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that format string consists of multiple string
components concatenated implicitly or explicitly,
This patch does below for that regexp pattern to detect "% inside _()"
problems in such case.
- put "+" into separator part ("[ \t\n]") for explicit concatenation
("...." + "...." style)
- enclose "component and separator" part by "(?:....)+" for
concatenation itself ("...." "...." or "...." + "....")
When creating patches modifying binary files using "git format-patch",
git creates 'literal' and 'delta' hunks. Mercurial currently supports
'literal' hunks only, which makes it impossible to import patches with
'delta' hunks.
This changeset adds support for 'delta' hunks. It is a reimplementation
of patch-delta.c from git :
http://git.kernel.org/cgit/git/git.git/tree/patch-delta.c
This is arguably a workaround, a better fix may be in the repo to
ensure that it won't list a file 'modified' unless there is a file
context for the previous version.
addremove required paths relative to the cwd, which meant a lot of extra code
that transformed paths into relative ones. That code is now gone as well.
This reduces the likelihood of a traceback when trying to email a
patch that happens to have 'diff --git' at the beginning of a line
in the description, as this patch did:
http://markmail.org/message/wxpgowxd7ucxygwe
The check pattern only checked for whitespace between keyword and operator.
Now it also warns:
> x = f(),7
missing whitespace after ,
> x = f()+7
missing whitespace in expression
In an upcoming patch, we will add index information to all git diffs, not
only binary diffs, so this code needs to be moved to a more appropriate
place.
Also, since this information is used for patch headers, it makes more
sense to be in the patch module, along with other patch-related metadata.
addmodehdr is a header helper, same as diffline, so it doesn't
need to be a top-level function and can be nested under trydiff.
In upcoming patches we will generalize this approach for
all headers.
Make diffline more readable, using strings with placeholders
rather than appending to a list from many ifs that makes
difficult to understand the actual output format.
Before, quiet mode produced no diffline header for mercurial as a
side effect of not populating "revs". This was a weird side effect,
and we will always need revs for git index header that will be added
in upcoming patches, so now we just check ui.quiet from diffline
directly.