We will soon have matchers that don't have an _always field, so
largefiles needs to stop assuming that they do. _always is only used
by always(), so we safely replace that method instead.
Instead, load the table by commands.py so the debug commands should always
be populated. The table in debugcommands.py is unnamed so extension authors
wouldn't be confused to wrap debugcommands.table in place of commands.table.
The goal is to get rid of the debugcommands -> commands dependency.
Since globalopts is the property of the commands, it's kept in the commands
module.
cmdutil.command wasn't a member of the registrar framework only for a
historical reason. Let's make that happen. This patch keeps cmdutil.command
as an alias for extension compatibility.
lfutil.getstandinmatcher() was setting match._always to False because
it wanted a matcher of no patterns to match no files and match.match()
instead matches everything. However, since 2ef3f2a8de5b (largefiles:
ensure lfutil.getstandinmatcher() only matches standins, 2015-08-12),
it never actually passes an empty list of patterns, so the hack has
become unnecessary.
BTW, C implementation of hexdigest() for SHA-1/256/512 returns hex
hash in lower case, and doctest in Python standard hashlib assumes
that, too. But it isn't explicitly described in API document or so.
Therefore, we can't assume that hexdigest() always returns hex hash in
lower case, for any hash algorithms, on any Python runtimes and
versions.
From point of view of that, it is reasonable for portability that
77f8c025a6ef applies lower() on hex hash in overridefilemerge().
But on the other hand, in largefiles extension, there are still many
code paths comparing between hex hashes or storing hex hash into
standin file, without lower().
Switching to hash algorithm other than SHA-1 may be good chance to
clarify our policy about hexdigest()-ed hash value string.
- assume that hexdigest() always returns hex hash in lower case, or
- apply lower() on hex hash in appropriate layers to ensure
lower-case-ness of it for portability
As the name describes, the 2nd argument 'revorctx' of copytostore()
can accept non-changectx value, for historical reason,
But, since e91ac285f700, copyalltostore(), the only one copytostore()
client in Mercurial source tree, always passes changectx as
'revorctx'.
Therefore, it is reasonable to make copytostore() accept only
changectx as the 2nd argument, now.
AFAIK, 'uploaded' argument of copytostore() (or copytocache(), before
renaming at e2d2a21b7e90) has been never used both on caller and
callee sides, since official release of bundled largefiles extension.
copyalltostore(), only one caller of copytostore(), already knows
standin file name of the target largefile. Therefore, passing it to
copytostore() is more efficient than calculating it in copytostore()
or readstandin().
This will be used to centralize and encapsulate the logic to read hash
from given (filectx of) standin file. readstandin() isn't suitable for
this purpose, because there are some code paths, which want to read
hex hash directly from filectx.
Before this patch, updatestandin() takes "standin" argument, and
applies splitstandin() on it to pick out a path to largefile (aka
"lfile" or so) from standin.
But in fact, all callers already knows "lfile". In addition to it,
many callers knows both "standin" and "lfile".
Therefore, making updatestandin() take only one of "standin" or
"lfile" is inefficient.
repo['.'] is called not as "working context" but as "parent context".
In this code path, hash value of current content of file should be
compared against hash value recorded in "parent context".
Therefore, "wctx" may cause misunderstanding in this case.
Before this patch, this code path contains two loops for m._files: one
for replacement with standin, and another for elimination of None,
which comes from previous replacement ("standin in wctx or
lfdirstate[f] == 'r'" case in tostandin()).
These two loops can be unified into simple one "for" loop.
Updating standin for newly added largefile is needed, only if same
name largefile exists in destination context at linear merging. In
such case, updated standin is used to detect divergence of largefile
at overridefilemerge().
Otherwise, standin doesn't have any responsibility for its content
(usually, it is empty).
This patch also renames argument of hexsha1(), not only for
readability ("data" isn't good name for file-like object), but also
for reviewability (including hexsha1() code helps reviewers to confirm
how these functions are similar).
BTW, copyandhash() has also similar logic, but it can't reuse
hexsha1(), because it writes read-in data into specified fileobj
simultaneously.
There are some code paths, which apply standin() on same value
multilpe times instead of using already standin()-ed value.
"fstandin" is common name for "path to standin file" in lfutil.py, to
avoid shadowing "standin()".
readstandin() takes "node" argument to get changectx by "repo[node]".
There are some readstandin() invocations, which use ctx.node(),
ctx.rev(), or '.' as "node" argument above, even though corresponded
changectx object is already looked up on caller side.
This patch calls readstandin() with already known changectx itself, to
avoid meaningless re-construction of changectx (indirect case via
copytostore() is also included).
BTW, copytostore() uses "rev" argument only for readstandin()
invocation. Therefore, this patch also renames it to "revorctx" to
indicate that it can take not only revision ID or so but also
changectx, for readability.
There are many isstandin() invocations before splitstandin().
The former examines whether specified path starts with ".hglf/". The
latter returns after ".hglf/" of specified path if it starts with that
prefix, or returns None otherwise.
Therefore, value returned by splitstandin() can be used for
replacement of preceding isstandin(), and this replacement can omit
redundant string comparison after isstandin().
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
The decoders were already run by default for the main repo, so this seemed like
an oversight.
The extdiff extension has been using 'archive' since a80ec1ea2694 to support -S,
and a colleague noticed that after diffing, making changes, and closing it, the
line endings were wrong for the diff-tool modified files in the subrepository.
(Files in the parent repo were correct, with the same .hgeol settings.) The
editor (Visual Studio in this case) reloads the file, but doesn't notice the EOL
change. It still adds new lines with the original EOL setting, and the file
ends up inconsistent.
Without this change, the first file `cat`d in the test prints '\r (esc)' EOL,
but the second doesn't on Windows or Linux.
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
os.getenv deals with unicodes on Python 3, so we have pycompat.osgetenv to
deal with bytes. This patch replaces occurrences on os.getenv with
pycompat.osgetenv
Currently, the "streamres" response type is populated with a generator
of chunks with compression possibly already applied. This puts the onus
on commands to perform chunking and compression. Architecturally, I
think this is the wrong place to perform this work. I think commands
should say "here is the data" and the protocol layer should take care
of encoding the final bytes to put on the wire.
Additionally, upcoming commits will improve wire protocol support for
compression. Having a central place for performing compression in the
protocol transport layer will be easier than having to deal with
compression at the commands layer.
This commit refactors the "streamres" response type to accept either
a generator or an object with "read." Additionally, the type now
accepts a flag indicating whether the response is a "version 1
compressible" response. This basically identifies all commands
currently performing compression. I could have used a special type
for this, but a flag works just as well. The argument name
foreshadows the introduction of wire protocol changes, hence the "v1."
The code for chunking and compressing has been moved to the output
generation function for each protocol transport. Some code has been
inlined, resulting in the deletion of now unused methods.
I somehow ended up in a situation where hg crashed on an unlink I introduced in
8fd3fc1ef4c6.
I don't know how it happened and can't reproduce it. It seems like it only can
happen when the file is removed between the time of check in a working
directory context walk that finds a standin file, and the time of use when we
try to remove it because the corresponding largefile doesn't exist.
But better safe than sorry: replace the plain unlink with unlinkpath with
ignoremissing=True. That will also remove remaining empty directories, which
arguably is more correct.