Now that the 'vfs' classes have moved into their own module, let's use the new
module directly. We update code iteratively to help with possible bisect needs
in the future.
'scmutil' is growing large (1500+ lines) and 2/5 of it is related to vfs.
We extract the 'vfs' related code into its own module to get both modules back
to a better scale and clearer contents.
We keep all the references available in 'scmutil' for now, as many references
need to be updated.
Add information about tree manifests, copy edit the text and fix up a few
ambiguities.
The document also contains a few additional fixes from Siddharth Agarwal
<sid0@fb.com>, who used it to build a parser for changegroups in Rust.
Before this patch, similarity detection logic (for addremove and
automv) depended entirely on SHA-1 digests. But this causes incorrect
rename detection if:
- removal of file A and addition of file B occur in the same commit, and
- the SHA-1 hash values of files A and B are the same
This may prevent security experts from managing sample files for
the SHAttered issue in a Mercurial repository, for example.
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
https://shattered.it/
A hash collision itself isn't so serious for the core repository
functionality of Mercurial, though, as described by mpm below.
https://www.mercurial-scm.org/wiki/mpm/SHA1
This patch compares actual file contents after the hash comparison
to confirm exact identity.
Even after this patch, SHA-1 is still used, because it is reasonable
enough for quickly detecting the existence of an "(almost) same" file:
- replacing SHA-1 would decrease performance, and
- what to replace it with is not yet settled
Getting the content of a removed file (= rfctx.data()) at each exact
comparison should be cheap enough, even though getting the content of
an added one costs much more.
======= ============== =====================
file    fctx           data() reads from
======= ============== =====================
removed filectx        in-memory revlog data
added   workingfilectx storage
======= ============== =====================
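The "hash first, then compare contents" idea can be sketched as follows. This is a minimal illustration, not Mercurial's actual similarity code: the function name, the dict-based inputs and the `digest` parameter are all hypothetical and exist only to make the example self-contained.

```python
import hashlib


def _sha1(data):
    return hashlib.sha1(data).digest()


def find_exact_renames(removed, added, digest=_sha1):
    """Yield (old, new) pairs whose contents are byte-for-byte identical.

    `removed` and `added` are hypothetical dicts mapping file names to bytes.
    `digest` is injectable only so that a collision can be simulated.
    """
    # Index removed files by digest; several files may share one digest.
    by_digest = {}
    for name, data in removed.items():
        by_digest.setdefault(digest(data), []).append(name)

    for name, data in added.items():
        for old in by_digest.get(digest(data), []):
            # A matching digest is only a hint; confirm with the real bytes.
            if removed[old] == data:
                yield old, name
                break
```

The digest lookup keeps the common case fast, and the byte comparison only runs for candidates that already share a digest, so colliding files with different contents are no longer reported as renames.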
In "aftertrans", we rename "journal.*" to "undo.*". We expect "journal.*"
files to disappear after renaming.
However, if "journal.foo" and "undo.foo" refer to the same file (hard link),
rename may be a no-op, leaving both files on disk, according to the Linux
manpage [1]:
If oldpath and newpath are existing hard links referring to the same
file, then rename() does nothing, and returns a success status.
The POSIX specification [2] is not very clear about what to do.
To be safe, remove "undo.*" before the rename so "journal.*" cannot be left
on disk.
[1]: http://man7.org/linux/man-pages/man2/rename.2.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/
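The "remove the destination before renaming" approach described above can be sketched like this. It is a minimal illustration using plain `os` calls rather than Mercurial's vfs layer, and the function name is hypothetical.

```python
import os


def rename_journal_to_undo(journal, undo):
    """Rename `journal` to `undo`, making sure `journal` is gone afterwards."""
    # Remove the destination first: if both names are hard links to the same
    # file, rename() may be a no-op and leave `journal` behind.
    try:
        os.unlink(undo)
    except FileNotFoundError:
        pass
    os.rename(journal, undo)
```

Once the destination is unlinked, the two names can no longer refer to the same file, so the rename cannot take the no-op path.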
Previously, dirstate.savebackup unconditionally dumps the dirstate map to
disk. It may require loading dirstate first to be able to dump it. Those
operations could be expensive if the dirstate is big, and could be avoided
if we know the dirstate file is up-to-date.
This patch avoids the read and write if the dirstate is clean. In that case,
we just do a plain copy without any serialization.
This should make commands which use transactions but do not touch dirstate
faster. For example, "hg bookmark -r REV NAME".
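The shortcut can be illustrated with a toy model. The class below is hypothetical and much simpler than the real dirstate; it only shows the control flow of copying the on-disk file when nothing has changed in memory.

```python
import shutil


class ToyDirstate:
    """Hypothetical stand-in for a dirstate; not Mercurial's real class."""

    def __init__(self, path):
        self.path = path
        self.dirty = False
        self.entries = None  # loaded lazily from `path`

    def _load(self):
        if self.entries is None:
            with open(self.path, 'rb') as fp:
                self.entries = fp.read().splitlines()

    def add(self, name):
        self._load()
        self.entries.append(name)
        self.dirty = True

    def savebackup(self, backuppath):
        if not self.dirty:
            # On-disk file is up to date: plain copy, no parse or serialize.
            shutil.copyfile(self.path, backuppath)
            return
        # In-memory changes exist, so they have to be serialized.
        with open(backuppath, 'wb') as fp:
            fp.write(b''.join(e + b'\n' for e in self.entries))
```

In the clean case the entries are never loaded at all, which is where the savings for commands that do not touch the dirstate would come from.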
Previously, dirstate.write() would iterate over the entire dirstate to find any
entries that needed to be marked 'lookup' (i.e. if they have the same timestamp
as now). This was O(working copy) and slow in large repos. It was most visible
when rebasing or histediting multiple commits, since it gets executed once per
commit, even if the entire rebase/histedit is wrapped in a transaction.
The fix is to track which files have been edited, and only check those to see
if they need to be marked as 'lookup'. This saves 25% on histedit times in very
large repositories.
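A toy model of the tracking idea is sketched below. The class, its method names and the use of -1 as the "lookup" marker are assumptions made for illustration; the real dirstate code is more involved.

```python
class ToyTrackingDirstate:
    """Hypothetical model mapping file name -> (state, mtime)."""

    def __init__(self):
        self._map = {}
        self._updated = set()  # files touched since the last write

    def normal(self, name, mtime):
        self._map[name] = ('n', mtime)
        self._updated.add(name)

    def write(self, now):
        """Mark ambiguous entries and return how many entries were checked."""
        checked = 0
        # Only files edited since the last write can carry a timestamp equal
        # to `now`, so only those need to be examined.
        for name in self._updated:
            checked += 1
            state, mtime = self._map[name]
            if state == 'n' and mtime == now:
                # Ambiguous timestamp: force a content check next time.
                self._map[name] = ('n', -1)
        self._updated.clear()
        return checked
```

The cost of each write is proportional to the number of files edited since the previous write rather than to the size of the working copy.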
I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.
When reverting to the parent of the working directory, the operation is "discard", so
we want hunks to be presented in the same order as the diff (i.e. "reversed").
So we do not query the experimental.revertalternateinteractivemode option in
this case and always set "reversehunks" to True.
Like 'revs', this predicate does not select files but switches the evaluation
context. This allows matching files according to an arbitrary status call. We
can now express the same queries as 'hg status'.
The API (two 'revsingle' class) has been picked instead of a single 'revs'
revset for multiple reasons:
* it is less confusing to express
* it allows expressing more queries (e.g. backward status, cross-branch status)
Unlike other functions, "revs()" does not select files but switches the
evaluation context. This allows matching files by their properties in a
revision other than the one currently being evaluated.
This changeset is based on work from Yuya Nishihara.
Future patches will add a function to switch the mctx.ctx object so that we can
forcibly evaluate a fileset expression in a specified revision. For example,
a new "revs()" function will be used to match a predicate against another revision:
$ hg revert 'set:revs(42, added())'
fullmatchctx class is similar to revset.fullreposet. It will allow us to
recalculate the subset only if it is not filtered yet.
This can be used to flag patches by branch or topic automatically. Flags
optionally given by --flag option are exported as {flags} template keyword,
so you can add --flag V2.
I want to move _getpatches() to _getpatchmsgs() to make sure each patch text
is tied to the corresponding revision number. This helps with adding templater
support.
IIRC, the pbranch extension doesn't work with recent Mercurial versions,
so the removal of this option shouldn't hurt.
This allows us to build data not written to the console. That would be
doable by ui.pushbuffer()/popbuffer(), but changing the file object seems
cleaner.
This is a step towards fixing extension load warnings on Python
3. Note that I suspect there are still some bugs in this area and that
things like color won't work, but the code at least executes and
prints text to the console correctly now.
This helps with some debugging in Python 3, and shouldn't hurt
anything in Python 2. The unusual construction using getattr is done
so that StringIO/BytesIO instances can be used as well as real files.
With experimental.updatecheck=noconflict, if the update is aborted
because of conflicts, "uncommitted changes" is not quite
accurate. Let's use "conflicting changes" instead. Also fix the hint
to recommend --clean, not --merge, since that's what we do for other
failed updates.
I was clearly very sloppy when I wrote the test case for
experimental.updatecheck=noconflict. The test case that checks that
one can move between commits with a removed file was deleting a file
that was modified between the source and target commits, which
resulted in a "change/delete" conflict. Since that is a conflict, the
update correctly failed. Let's fix the test by removing a file that is
not modified between the two commits.
Some formatter-based commands provide fields that are identical to the ones
defined in templatekw, but we had to specify them manually to support all
changeset-based template keywords.
This patch adds fm.context() that populates all templatekw. These keywords
are available only in template output, so we still need to set important
keywords via fm.data() if they should be available in e.g. JSON output.
Currently fm.context() takes only a 'ctx' argument. It will eventually be
extended to take 'fctx' to support file-based keywords (e.g. {path}) seen
in hgweb.
These templates are used when rendering inner lists of some template keywords,
so it makes sense to define them in templatekw. This allows us to reuse them
to create a templateformatter knowing changectx.