This feature is more a way to test patching without a working directory than
something people asked about. Adding a --rev option to specify the parent patch
revision would make it a little more useful.
What this change introduces is patch.repobackend class which let patches be
applied against repository revisions. The caller must supply a filestore object
to receive patched content, which can be turned into a memctx with
patch.makememctx() helper.
- Add patchmeta.copy() and emit copies from iterhunks. Modifying patchmeta
instances in applydiff() makes things simpler.
- Rename selectfile() into makepatchmeta(). It is responsible for creating
patchmeta for regular patches.
- Pass patchmeta objects to patchfile() directly
patchmeta instances were associated with git patches, for regular patches we
had to pass additional variables to tell the patch intent to patchfile().
Instead, we generate patchmeta for regular patches and pass them. This will
also help with patch filtering by matcher objects.
This information is more correctly returned by backends.
The extra updated file removed from test-mq-merge.t output came from changes
from git patches being counted before being really applied in some cases.
Synchronizing on bfile does not work on file removal where bfile is /dev/null.
We match items on afile or bfile instead. The incorrect code makes iterhunks()
to emit patchmeta and hunks separately in some cases. This is currently hidden
by applydiff() being too tolerant when processing patchmeta, and will be fixed
later.
git patches may require copies to be handled out-of-order. For instance, take
the following sequence:
* modify a
* copy a into b
Here, we have to generate b from a before its modification. To do so,
applydiff() was scanning for copy metadata and performing the copies before
processing the other changes in-order. While smart and efficient, this approach
complicates things by handling file copies and file creations at different
places and times. While a new file must not exist before being patched a copied
file already exists before applying the first hunk.
Instead of copying the files at their final destination before patching, we
store them in a temporary file location and retrieve them when patching. The
filestore always stores file content in real files but nothing prevents adding
a cache layer. The filestore class was kept separate from fsbackend for at
least two reasons:
- This class is likely to be reused as a temporary result store for a future
repository patching call (entries just have to be extended to contain copy
sources).
- Delegating this role to backends might be more efficient in a repository
backend case: the source files are already available in the repository itself
and do not need to be copied again. It also means that third-parties backend
would have to implement two other methods. If we ever decide to merge the
filestore feature into backend, a minimalistic approach would be to compose
with filestore directly. Keep in mind this copy overhead only applies for
copy/rename sources, and may even be reduced to copy sources which have to
handled ahead of time.
The patcher has to know if a file is being created or removed to check if the
target already exists, or to actually unlink the file when a hunk emptying it
is applied. This was done by embedding the creation/removal information in the
first (and only) hunk attached to the file.
There are two problems with this approach:
- creation/removal is really a property of the file being patched and not its
hunk.
- for regular patches, file creation cannot be deduced at parsing time: there
are case where the *stripped* file paths must be compared. Modifying hunks
after their creation is clumsy and prevent further refactorings related to
copies handling.
Instead, we delegate this job to selectfile() which has all the relevant
information, and remove the hunk createfile() and rmfile() methods.
Restore the previous diffstat behaviour of scaling by the maximum number of
changes to a single file. Changeset 7bb0e22a7988 modified the diffstat to be
scaled by the total number of changes. This seems to have been unintentional.
gitpatch objects emitted by iterhunks() were referencing file paths unmodified
from the input patch. _applydif() made them usable by modifying the gitpatch
objects in-place with specified path strip level. The same modified objects
were then reused by iterhunks() generator. _applydiff() now copies and update
the paths which completely decouples both routines.
As a side effect, the "git" event now receives only metadata about
copies/renames to perform the necessary copies ahead of time. Other actions are
handled in the "file" event.
Patch changes are emitted by iterhunks() in two separate events: 'file' when
hunks have to be applied and 'git' to describe other modifications like copies
or mode changes. Note that a file which mode is changed and which content is
modified by the same patch will be emitted in both events. It is more
convenient to handle all file modifications in a single event. This patch
"zips" git actions with regular changes so both kinds can be emitted at the
same place.
git afile/bfile are extracted twice, once when reading a 'diff --git', and
again when reading a unified hunk. The problem is not all git blocks have
unified hunks (renames just have metadata) and they were not extracted the same
way. This is what this patch unifies.
gitpatch objects emitted by iterhunks() are modified in place by applydiff().
Processing them earlier improves iterhunks() isolation. applydiff() modifying
them should still be fixed though.
_updatedir() is no longer used by internalpatch()
The change in test-mq-missingfiles.t comes from workingbackend not considering
the missing 'b' file as changed, thus not calling addremove() on it.
This introduces a performance regression for large files, as they will be
copied just to be clobbered afterwards since binary patching does not use
deltas. But it simplifies the code and the previous optimization will be
reintroduced later in a better way.
_applydiff() patcher argument was added to help hgsubversion like extension
monkeypatching the patching process. While it could be removed at this point, I
prefer to leave it until patch.py is completely refactored and there is a valid
and tested alternative.
Most filesystem calls are already isolated in patchfile but this is not enough:
renames are performed before patchfile is available and some chmod calls are
even done outside of the applydiff call. Once all these calls are extracted
into a backend class, we can provide cleaner APIs to write to a working
directory context directly into the repository.
The local err variable would be bound to PatchError thrown and it
keeps this value even after the except block is executed.
The whole thing worked anyway since the rejected variable is set in
the except block and this makes the function return -1 when a
PatchError is thrown.
Why?
- Mercurial internal patcher works correctly for regular patches and git
patches, is much faster at least on Windows and is more extensible.
- In theory, the external patcher can be used to handle exotic patch formats. I
do not know any and have not heard about any such use in years.
- Most patch programs cannot handle git format patches, which makes the API
caller to decide either to ignore ui.patch by calling patch.internalpatch()
directly, or take the risk of random failures with valid inputs.
- One thing a patch program could do Mercurial patcher cannot is applying with
--reverse. Apparently several shelve like extensions try to use that,
including passing the "reverse" option to Mercurial patcher, which has been
removed mid-2009. I never heard anybody complain about that, and would prefer
reimplementing it anyway.
And from the technical perspective:
- The external patcher makes everything harder to maintain and implement. EOL
normalization is not implemented, and I would bet file renames, if supported
by the patcher, are not correctly recorded in the dirstate.
- No tests.
How?
- Remove related documentation
- Clearly mark patch.externalpatch() as private
- Remove the debuginstall check. This deprecation request was actually
triggered by this last point. debuginstall is the only piece of code patching
without a repository. When migrating to an integrated patch() + updatedir()
call, this was really a showstopper, all workarounds were either ugly or
uselessly complicated to implement. If we do not support external patcher
anymore, the debuginstall check is not useful anymore.
- Remove patch.externalpatch() after 1.9 release.
The patch changes the output of "hg diff --stat" when one file whose filename
has spaces has changed, making it get the full filename instead of just the
substring between the last space and the end of the filename.
It also changes the diffstat generated by "hg email -d" when one of the commit
messages starts with "diff". Because of the regex used to parse the filename,
the diffstat generated by "hg email -d" will still be not correct if a commit
message starts with "diff -r ".
Before the patch Mercurial has the following behavior:
$ echo "foobar">"file with spaces"
$ hg add "file with spaces"
$ hg diff --stat
spaces | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
$ hg diff --git --stat
file with spaces | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
After the patch:
$ echo "foobar">"file with spaces"
$ hg add "file with spaces"
$ hg diff --stat
file with spaces | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
$ hg diff --git --stat
file with spaces | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
Before the patch:
$ hg add mercurial/patch.py tests/tests-diffstat.t
$ hg commit -m "diffstat: fix parsing of filenames"
$ hg email -d --test tip
This patch series consists of 1 patches.
diffstat: fix parsing of filenames
[...]
filenames | 0
mercurial/patch.py | 6 ++++--
tests/test-diffstat.t | 17 +++++++++++++++++
3 files changed, 21 insertions(+), 2 deletions(-)
[...]
After the patch:
$ hg email -d --test tip
This patch series consists of 1 patches.
diffstat: fix parsing of filenames
[...]
mercurial/patch.py | 6 ++++--
tests/test-diffstat.t | 17 +++++++++++++++++
3 files changed, 21 insertions(+), 2 deletions(-)
[...]
Do not pass reject file content to patchfile.writelines() to:
- Avoid line endings transformations
- Avoid polluting overriding implementations with unrelated data. They should
override write_rej() to deal or ignore reject files properly.
Bug report, analysis and original patch and test by
Shun-ichi GOTO <shunichi.goto@gmail.com>
Previously no '# ' lines came through the parser.
Now only the first '# ' lines are processed, from '# HG changeset patch' and to
the first line not starting with '# '.
Some encoding and language combinations (e.g.: UTF-8 and Japanese)
cause encoding characters into sequence of bytes more than column
width of them.
So, encoding.colwidth() should be applied instread of len() on i18n
strings.
In addition to it, formatting by '%*s'/'%-*s' also uses "number of
bytes" to calculate space padding size, and should be fixed, too.
If you have a diff line that matches a header line, the patch splitter
currently breaks your patch at this line. For example a line like:
+key: value
This can lead to "malformed patch" exceptions. Now fixed.
If applydiff() raises an exception, the opened patch file is kept alive in the
exception context. If it is a temporary file (for instance supplied by import
command with stdin input), Windows cannot clean it up.
The built-in patch implementation applied the hunks to the wrong lines of the
file if the file in the repo has been modified to skew the patch line numbers
and the file contains repetitive sequences of lines.
Most of the time, one can reverse a diff by swapping the revisions passed with
-r but it happens that if you use the global -R, and diff against the tip of
the current repo, you can't swap the revisions. One use-case for that is
reviewing changes from a bundle before unbundling. One could also pipe the
output of `hg diff` to a command line filter that reverses the diff, but that
would remove the benefit from color diffs. Therefore, having an option in
`hg diff` to reverse a diff is a good thing.
The option flag selection was tricky. GNU patch uses -R/--reverse but -R is
already used as a global option and --reverse would make --rev ambiguous.
Normally, diffs without any text insertions or deletions are reported
as having 0 lines changed by stock diffstat. Compatibility is
preserved with stock diffstat in this case, but when using --git,
binary files are marked with Bin as a means of clarification.
git diff --stat does something similar, though it also includes the
old and new file sizes.
Symlink creations and deletions were handled with a special symlinkhunk object,
working like a binary hunk. However, this model does not support symlink
updates or replacements, so we teach regular hunks how to handle symlinks.
The previous method of scaling had a tendency to include graph lines that
went past the output width when the file with the most changes had a very
large number of changes.
The intent is to fix many issues involving patching when win32ext is enabled.
With win32ext, the working directory and repository files EOLs are not the same
which means that patches made on a non-win32ext host do not apply cleanly
because of EOLs discrepancies. A theorically correct approach would be
transform either the patched file or the patch content with the
encoding/decoding filters used by win32ext. This solution is tricky to
implement and invasive, instead we prefer to address the win32ext case, by
offering a way to ignore input EOLs when patching and rewriting them when
saving the patched result.
In worst case, generating diff in upgrade mode can be two times more expensive
than generating it in git mode directly: we may have to regenerate the whole
diff again whenever a git feature is detected. Also, the first diff attempt is
completely buffered instead of being streamed. That said, even without having
profiled it yet, I am convinced we can fast-path the upgrade mode if necessary
were it to be used in regular diff commands, and not only in mq where avoiding
data loss is worth the price.
With eolmode set to 'lf' or 'crlf' we avoided the hunk duplication and
normalization by reading the input patch in text mode. Dropping this
optimization simplifies code expectations for a small overhead.
The change in test-mq-eol comes from a tolerance to CRLF instead of LF for last
lines without newlines being broken by this revision. This tolerance was only
partially supported and will be added again in a better way.
EOLs in patched files are restored to their original value after
patching. We use the first EOL found in the file, files with
inconsistent EOLs will thus be normalized during this process.
The old code mapped the value of eolmode ('strict', 'crlf' or 'lf') to
eol (None, '\r\n' or '\n') at the entry point in internalpatch. The
value of eol was then used directly as the desired EOL in patchfile.
We now delay the mapping and let patchfile do it instead. This allows
for more complicated behavior where it does not make sense to map
eolmode directly to the target EOLs.
The built-in None object is a singleton and it is therefore safe to
compare memory addresses with is. It is also faster, how much depends
on the object being compared. For a simple type like str I get:
| s = "foo" | s = None
----------+-----------+----------
s == None | 0.25 usec | 0.21 usec
s is None | 0.17 usec | 0.17 usec
The context variable is either True, False or None. Abbreviate it as C
and we get the following truth table where the second column is the
original expression and the third column is the new expression:
C | C or C == None | C is not False
True | True | True
False | False | False
None | True | True
In Python, the backslash in an unrecognized escape sequence is left
behind, which makes '\.' the same as r'\.'. Relying on this feature is
quite brittle, IMHO.
Removed unnecessary string concatenation as well.
It seems like the old behaviour with different handling for commands with and
without path was intended, but I think this behaviour of util.find_exe is
better:
* Always returns existing file
* or None if command not found - no default
* Windows: Returned file thus always ends with extension from PATHEXT
This fixes http://www.selenic.com/mercurial/bts/issue1459. The change might
fix other unintended behaviour too.
Implemented as two functions: diffstat, which yields lines of text,
formatted as a usual diffstat output, and diffstatdata, which is called
inside diffstat to do real performing and yield file names with
appropriate data (numbers of added and removed lines).