Commit Graph

714 Commits

Author SHA1 Message Date
Matt Harbison
e77c1dddf3 largefiles: set the extension as enabled locally after a share requiring it
This has been done for clone since bd19f94d30e9, so it makes sense here for the
same reasons.
2017-04-11 20:54:50 -04:00
FUJIWARA Katsunori
89f77ed920 largefiles: use readasstandin() to read hex hash directly from filectx
BTW, C implementation of hexdigest() for SHA-1/256/512 returns hex
hash in lower case, and doctest in Python standard hashlib assumes
that, too. But it isn't explicitly described in API document or so.

Therefore, we can't assume that hexdigest() always returns hex hash in
lower case, for any hash algorithms, on any Python runtimes and
versions.

From point of view of that, it is reasonable for portability that
77f8c025a6ef applies lower() on hex hash in overridefilemerge().

But on the other hand, in largefiles extension, there are still many
code paths comparing between hex hashes or storing hex hash into
standin file, without lower().

Switching to hash algorithm other than SHA-1 may be good chance to
clarify our policy about hexdigest()-ed hash value string.

  - assume that hexdigest() always returns hex hash in lower case, or

  - apply lower() on hex hash in appropriate layers to ensure
    lower-case-ness of it for portability
2017-04-01 02:32:49 +09:00
FUJIWARA Katsunori
ff4c75957b largefiles: remove unused readstandin()
Now, there is no client of readstandin().
2017-04-01 02:32:49 +09:00
FUJIWARA Katsunori
156e7d4f73 largefiles: make copytostore() accept only changectx as the 2nd argument (API)
As the name describes, the 2nd argument 'revorctx' of copytostore()
can accept non-changectx value, for historical reason,

But, since e91ac285f700, copyalltostore(), the only one copytostore()
client in Mercurial source tree, always passes changectx as
'revorctx'.

Therefore, it is reasonable to make copytostore() accept only
changectx as the 2nd argument, now.
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
7b2d7893cb largefiles: remove unused keyword argument of copytostore() (API)
AFAIK, 'uploaded' argument of copytostore() (or copytocache(), before
renaming at e2d2a21b7e90) has been never used both on caller and
callee sides, since official release of bundled largefiles extension.
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
15d9fae8a1 largefiles: add copytostore() fstandin argument to replace readstandin() (API)
copyalltostore(), only one caller of copytostore(), already knows
standin file name of the target largefile. Therefore, passing it to
copytostore() is more efficient than calculating it in copytostore()
or readstandin().
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
35dbbb1699 largefiles: replace readstandin() by readasstandin()
These code paths already (or should, for efficiency at repetition)
know the target changectx and path of standin file.
2017-04-01 02:32:47 +09:00
FUJIWARA Katsunori
a88efa2831 largefiles: introduce readasstandin() to read hex hash from given filectx
This will be used to centralize and encapsulate the logic to read hash
from given (filectx of) standin file. readstandin() isn't suitable for
this purpose, because there are some code paths, which want to read
hex hash directly from filectx.
2017-04-01 02:32:31 +09:00
FUJIWARA Katsunori
3d36ed3225 largefiles: add lfile argument to updatestandin() for efficiency (API)
Before this patch, updatestandin() takes "standin" argument, and
applies splitstandin() on it to pick out a path to largefile (aka
"lfile" or so) from standin.

But in fact, all callers already knows "lfile". In addition to it,
many callers knows both "standin" and "lfile".

Therefore, making updatestandin() take only one of "standin" or
"lfile" is inefficient.
2017-03-27 09:44:36 +09:00
FUJIWARA Katsunori
f21209d88f largefiles: use strip() instead of slicing to get rid of EOL of standin
This slicing prevents from replacing SHA-1 by another (= longer hash
value) in the future.
2017-03-27 09:44:36 +09:00
FUJIWARA Katsunori
f10b10ff74 largefiles: rename local variable appropriately
repo['.'] is called not as "working context" but as "parent context".

In this code path, hash value of current content of file should be
compared against hash value recorded in "parent context".

Therefore, "wctx" may cause misunderstanding in this case.
2017-03-27 09:44:36 +09:00
FUJIWARA Katsunori
3e84715300 largefiles: avoid redundant loop to eliminate None from list
Before this patch, this code path contains two loops for m._files: one
for replacement with standin, and another for elimination of None,
which comes from previous replacement ("standin in wctx or
lfdirstate[f] == 'r'" case in tostandin()).

These two loops can be unified into simple one "for" loop.
2017-03-27 09:44:35 +09:00
FUJIWARA Katsunori
bbdc4d8596 largefiles: avoid meaningless changectx looking up
Logically, "repo[ctx.node()]" should be equal to "ctx".

In addition to it, this redundant code path is repeated
"len(match.m_files)" times.
2017-03-27 09:44:35 +09:00
FUJIWARA Katsunori
fd045393ce largefiles: avoid redundant changectx looking up at each repetitions
These code paths look up changectx at each repetitions, even though
the changectx key isn't changed while loop.
2017-03-27 09:44:35 +09:00
FUJIWARA Katsunori
068a31b4e7 largefiles: omit updating newly added standin at linear merging
Updating standin for newly added largefile is needed, only if same
name largefile exists in destination context at linear merging. In
such case, updated standin is used to detect divergence of largefile
at overridefilemerge().

Otherwise, standin doesn't have any responsibility for its content
(usually, it is empty).
2017-03-27 09:44:34 +09:00
FUJIWARA Katsunori
6fcd56dea5 largefiles: reuse hexsha1() to centralize hash calculation logic into it
This patch also renames argument of hexsha1(), not only for
readability ("data" isn't good name for file-like object), but also
for reviewability (including hexsha1() code helps reviewers to confirm
how these functions are similar).

BTW, copyandhash() has also similar logic, but it can't reuse
hexsha1(), because it writes read-in data into specified fileobj
simultaneously.
2017-03-27 09:44:34 +09:00
FUJIWARA Katsunori
7f301613fe largefiles: avoid redundant standin() invocations
There are some code paths, which apply standin() on same value
multilpe times instead of using already standin()-ed value.

"fstandin" is common name for "path to standin file" in lfutil.py, to
avoid shadowing "standin()".
2017-03-24 22:29:22 +09:00
FUJIWARA Katsunori
319a3de075 largefiles: replace hashrepofile by hashfile (API)
There is only one user for the former, and repo.wjoin()-ed value is
alread known by that user.
2017-03-24 22:29:22 +09:00
FUJIWARA Katsunori
b0cce9d114 largefiles: call readstandin() with changectx itself instead of rev or node
readstandin() takes "node" argument to get changectx by "repo[node]".

There are some readstandin() invocations, which use ctx.node(),
ctx.rev(), or '.' as "node" argument above, even though corresponded
changectx object is already looked up on caller side.

This patch calls readstandin() with already known changectx itself, to
avoid meaningless re-construction of changectx (indirect case via
copytostore() is also included).

BTW, copytostore() uses "rev" argument only for readstandin()
invocation. Therefore, this patch also renames it to "revorctx" to
indicate that it can take not only revision ID or so but also
changectx, for readability.
2017-03-24 22:26:34 +09:00
FUJIWARA Katsunori
0813b4a24f largefiles: omit redundant splitstandin() invocations
There are 3 splitstandin() invocations in updatestandin() for same
"standin" value.
2017-03-24 22:24:59 +09:00
FUJIWARA Katsunori
9222f27b6b largefiles: replace splitstandin() by isstandin() to omit str creation
If splitstandin()-ed str itself isn't used, isstandin() should be
used instead of it, to omit meaningless str creation.
2017-03-24 22:24:59 +09:00
FUJIWARA Katsunori
53a96c883c largefiles: omit redundant isstandin() before splitstandin()
There are many isstandin() invocations before splitstandin().

The former examines whether specified path starts with ".hglf/". The
latter returns after ".hglf/" of specified path if it starts with that
prefix, or returns None otherwise.

Therefore, value returned by splitstandin() can be used for
replacement of preceding isstandin(), and this replacement can omit
redundant string comparison after isstandin().
2017-03-24 22:24:58 +09:00
FUJIWARA Katsunori
aaa8db9cef misc: update descriptions about removed file for filectxfn
Since 2eef89bfd70d, filectxfn for memctx should return None for
removed file instead of raising IOError.
2017-03-24 22:13:23 +09:00
Pierre-Yves David
654e9bcf93 largefiles: don't use mutable default argument value
Caught by pylint.
2017-03-14 23:49:10 -07:00
Pierre-Yves David
91ebfa657f largefiles: directly use repo.vfs.join
The 'repo.join' method is about to be deprecated.
2017-03-08 16:52:06 -08:00
Mads Kiilerich
a936a7f3a7 vfs: use repo.wvfs.unlinkpath 2015-01-14 01:15:26 +01:00
Pierre-Yves David
1211038425 vfs: use 'vfs' module directly in 'hgext.largefile'
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
2017-03-02 13:32:27 +01:00
Pierre-Yves David
e5cb48ac36 vfs: replace 'scmutil.opener' usage with 'scmutil.vfs'
The 'vfs' class is the first class citizen for years. We remove all usages of
the older API. This will let us remove the old API eventually.
2017-03-02 03:52:36 +01:00
Matt Harbison
29208450f3 subrepo: run the repo decoders when archiving
The decoders were already run by default for the main repo, so this seemed like
an oversight.

The extdiff extension has been using 'archive' since a80ec1ea2694 to support -S,
and a colleague noticed that after diffing, making changes, and closing it, the
line endings were wrong for the diff-tool modified files in the subrepository.
(Files in the parent repo were correct, with the same .hgeol settings.)  The
editor (Visual Studio in this case) reloads the file, but doesn't notice the EOL
change.  It still adds new lines with the original EOL setting, and the file
ends up inconsistent.

Without this change, the first file `cat`d in the test prints '\r (esc)' EOL,
but the second doesn't on Windows or Linux.
2017-02-25 21:13:59 -05:00
Yuya Nishihara
d63d83be69 revset: import set classes directly from smartset module
Follows up 97d0be4019ac.
2017-02-19 18:16:09 +09:00
Pulkit Goyal
3c7388da12 py3: replace pycompat.getenv with encoding.environ.get
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
2017-01-15 13:17:05 +05:30
Pulkit Goyal
770a0e2938 py3: replace os.getenv with pycompat.osgetenv
os.getenv deals with unicodes on Python 3, so we have pycompat.osgetenv to
deal with bytes. This patch replaces occurrences on os.getenv with
pycompat.osgetenv
2016-12-19 02:54:49 +05:30
Pulkit Goyal
1f6538b90b py3: replace os.name with pycompat.osname (part 2 of 2) 2016-12-19 00:28:12 +05:30
Gregory Szorc
2112fb0fd2 wireproto: perform chunking and compression at protocol layer (API)
Currently, the "streamres" response type is populated with a generator
of chunks with compression possibly already applied. This puts the onus
on commands to perform chunking and compression. Architecturally, I
think this is the wrong place to perform this work. I think commands
should say "here is the data" and the protocol layer should take care
of encoding the final bytes to put on the wire.

Additionally, upcoming commits will improve wire protocol support for
compression. Having a central place for performing compression in the
protocol transport layer will be easier than having to deal with
compression at the commands layer.

This commit refactors the "streamres" response type to accept either
a generator or an object with "read." Additionally, the type now
accepts a flag indicating whether the response is a "version 1
compressible" response. This basically identifies all commands
currently performing compression. I could have used a special type
for this, but a flag works just as well. The argument name
foreshadows the introduction of wire protocol changes, hence the "v1."

The code for chunking and compressing has been moved to the output
generation function for each protocol transport. Some code has been
inlined, resulting in the deletion of now unused methods.
2016-11-20 13:50:45 -08:00
Mads Kiilerich
d18a73f120 largefiles: clarify variable name holding file mode
A follow-up to 9ce3ccc6ef9c.

'st' sounds like the whole stat result while 'mode' is a better name for the
actual file mode.
2016-10-18 16:45:39 +02:00
Mads Kiilerich
4409f61ab2 largefiles: handle that a found standin file doesn't exist when removing it
I somehow ended up in a situation where hg crashed on an unlink I introduced in
8fd3fc1ef4c6.

I don't know how it happened and can't reproduce it. It seems like it only can
happen when the file is removed between the time of check in a working
directory context walk that finds a standin file, and the time of use when we
try to remove it because the corresponding largefile doesn't exist.

But better safe than sorry: replace the plain unlink with unlinkpath with
ignoremissing=True. That will also remove remaining empty directories, which
arguably is more correct.
2016-10-27 20:06:33 +02:00
Kevin Bullock
734a4f1625 merge default into stable for 4.0 code freeze 2016-10-18 14:15:15 -05:00
Mads Kiilerich
8f6d6ccc08 largefiles: fix 'deleted' files sometimes persistently appearing with R status
A code snippet that has been around since largefiles was introduced was wrong:
Standins no longer found in lfdirstate has *not* been removed -
they have probably just been deleted ... or not created.

This wrong reporting did that 'up -C' didn't undo the change and didn't sync
the two dirstates.

Instead of reporting such files as removed, propagate the deletion to the
standin file and report the file as deleted.
2016-10-17 17:12:24 +02:00
Mads Kiilerich
231fdaf8a2 largefiles: more safe handling of interruptions while updating modifications
Largefiles are fragile with the design where dirstate and lfdirstate must be
kept in sync.

To be less fragile, mark all clean largefiles as unsure ("normallookup") before
updating standins. After standins have been updated and we know exactly which
largefile standins actually was changed, mark the unchanged largefiles back to
clean ("normal").

This will make the failure mode more safe. If interrupted, the next command
will continue to perform extra hashing of all largefiles. That will do that all
largefiles that are out of sync with their standin will be marked dirty and
they will show up in status and can be cleaned with update --clean.
2016-10-16 02:29:45 +02:00
Mads Kiilerich
39f2a13215 util: increase filechunkiter size to 128k
util.filechunkiter has been using a chunk size of 64k for more than 10 years,
also in years where Moore's law still was a law. It is probably ok to bump it
now and perhaps get a slight win in some cases.

Also, largefiles have been using 128k for a long time. Specifying that size
multiple times (or forgetting to do it) seems a bit stupid. Decreasing it to
64k also seems unfortunate.

Thus, we will set the default chunksize to 128k and use the default everywhere.
2016-10-14 01:53:15 +02:00
Mads Kiilerich
f926541834 largefiles: always use filechunkiter when iterating files
Before, we would sometimes use the default iterator over large files. That
iterator is line based and would add extra buffering and use odd chunk sizes
which could give some overhead.

copyandhash can't just apply a filechunkiter as it sometimes is passed a
genuine generator when downloading remotely.
2016-10-12 12:22:18 +02:00
Mads Kiilerich
7afa73604d largefiles: use context for file closing
Make the code slightly smaller and safer (and more deeply indented).
2016-10-08 00:59:41 +02:00
Mads Kiilerich
73724460ec largefiles: when setting/clearing x bit on largefiles, don't change other bits
It is only the X bit that it matters to copy from the standin to the largefile
in the working directory. While it generally doesn't do any harm to copy the
whole mode, it is also "wrong" to copy more than the X bit we care about. It
can make a difference if someone should try to handle largefiles differently,
such as marking them read-only.

Thus, do similar to what utils.setflags does and set the X bit where there are
R bits and obey umask.
2016-10-08 00:59:40 +02:00
FUJIWARA Katsunori
7a8d36afcf doc: trim newline at the end of exception message 2016-08-01 06:08:25 +09:00
liscju
347bb0767e largefiles: check file in the repo store before checking remotely (issue5257)
Problem was files to check were gathered in the repository where
the verify was launched but verification was done on the remote
store. It was observed when user committed in cloned repository
and ran verify before pushing - committed files were marked
as non existing.

This commit fixes this by checking in the remote store only files
that are not existing in the repository store where verify was launched.

Solution is similiar to 909b9d8f9ae7
2016-06-23 22:37:17 +02:00
liscju
36f41df99d largefiles: remove additional blank lines
It does not conform to the coding style.
2016-06-27 10:33:33 +02:00
liscju
a37f11d3d7 largefiles: fix misleading comments in lfutil instore and storepath
Problem in both cases is cache in largefiles has assigned
meaning - user cache which is additional place to get/put
files. Those two function works on store - the main place
to store largefiles in the repository - .hg/largefiles and
using "cache" to describe it is misleading.
2016-06-24 09:08:16 +02:00
liscju
7e9f22c475 largefiles: remove additional blank line between methods in localstore
According to the coding style it should be a single blank line
between functions.
2016-06-24 11:51:00 +02:00
liscju
1d5b38f7bf largefiles: make storefactory._openstore public
In storefactory opening store is the main functionality,
so it shouldn't be marked as private with underscore.
2016-06-14 11:21:41 +02:00
Matt Mackall
3eee63391d merge with stable 2016-06-14 14:52:58 -05:00