Commit Graph

153 Commits

Author SHA1 Message Date
FUJIWARA Katsunori
89f77ed920 largefiles: use readasstandin() to read hex hash directly from filectx
BTW, C implementation of hexdigest() for SHA-1/256/512 returns hex
hash in lower case, and doctest in Python standard hashlib assumes
that, too. But it isn't explicitly described in API document or so.

Therefore, we can't assume that hexdigest() always returns hex hash in
lower case, for any hash algorithms, on any Python runtimes and
versions.

From point of view of that, it is reasonable for portability that
77f8c025a6ef applies lower() on hex hash in overridefilemerge().

But on the other hand, in largefiles extension, there are still many
code paths comparing between hex hashes or storing hex hash into
standin file, without lower().

Switching to hash algorithm other than SHA-1 may be good chance to
clarify our policy about hexdigest()-ed hash value string.

  - assume that hexdigest() always returns hex hash in lower case, or

  - apply lower() on hex hash in appropriate layers to ensure
    lower-case-ness of it for portability
2017-04-01 02:32:49 +09:00
FUJIWARA Katsunori
ff4c75957b largefiles: remove unused readstandin()
Now, there is no client of readstandin().
2017-04-01 02:32:49 +09:00
FUJIWARA Katsunori
156e7d4f73 largefiles: make copytostore() accept only changectx as the 2nd argument (API)
As the name describes, the 2nd argument 'revorctx' of copytostore()
can accept non-changectx value, for historical reason,

But, since e91ac285f700, copyalltostore(), the only one copytostore()
client in Mercurial source tree, always passes changectx as
'revorctx'.

Therefore, it is reasonable to make copytostore() accept only
changectx as the 2nd argument, now.
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
7b2d7893cb largefiles: remove unused keyword argument of copytostore() (API)
AFAIK, 'uploaded' argument of copytostore() (or copytocache(), before
renaming at e2d2a21b7e90) has been never used both on caller and
callee sides, since official release of bundled largefiles extension.
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
15d9fae8a1 largefiles: add copytostore() fstandin argument to replace readstandin() (API)
copyalltostore(), only one caller of copytostore(), already knows
standin file name of the target largefile. Therefore, passing it to
copytostore() is more efficient than calculating it in copytostore()
or readstandin().
2017-04-01 02:32:48 +09:00
FUJIWARA Katsunori
35dbbb1699 largefiles: replace readstandin() by readasstandin()
These code paths already (or should, for efficiency at repetition)
know the target changectx and path of standin file.
2017-04-01 02:32:47 +09:00
FUJIWARA Katsunori
a88efa2831 largefiles: introduce readasstandin() to read hex hash from given filectx
This will be used to centralize and encapsulate the logic to read hash
from given (filectx of) standin file. readstandin() isn't suitable for
this purpose, because there are some code paths, which want to read
hex hash directly from filectx.
2017-04-01 02:32:31 +09:00
FUJIWARA Katsunori
3d36ed3225 largefiles: add lfile argument to updatestandin() for efficiency (API)
Before this patch, updatestandin() takes "standin" argument, and
applies splitstandin() on it to pick out a path to largefile (aka
"lfile" or so) from standin.

But in fact, all callers already knows "lfile". In addition to it,
many callers knows both "standin" and "lfile".

Therefore, making updatestandin() take only one of "standin" or
"lfile" is inefficient.
2017-03-27 09:44:36 +09:00
FUJIWARA Katsunori
f10b10ff74 largefiles: rename local variable appropriately
repo['.'] is called not as "working context" but as "parent context".

In this code path, hash value of current content of file should be
compared against hash value recorded in "parent context".

Therefore, "wctx" may cause misunderstanding in this case.
2017-03-27 09:44:36 +09:00
FUJIWARA Katsunori
6fcd56dea5 largefiles: reuse hexsha1() to centralize hash calculation logic into it
This patch also renames argument of hexsha1(), not only for
readability ("data" isn't good name for file-like object), but also
for reviewability (including hexsha1() code helps reviewers to confirm
how these functions are similar).

BTW, copyandhash() has also similar logic, but it can't reuse
hexsha1(), because it writes read-in data into specified fileobj
simultaneously.
2017-03-27 09:44:34 +09:00
FUJIWARA Katsunori
7f301613fe largefiles: avoid redundant standin() invocations
There are some code paths, which apply standin() on same value
multilpe times instead of using already standin()-ed value.

"fstandin" is common name for "path to standin file" in lfutil.py, to
avoid shadowing "standin()".
2017-03-24 22:29:22 +09:00
FUJIWARA Katsunori
319a3de075 largefiles: replace hashrepofile by hashfile (API)
There is only one user for the former, and repo.wjoin()-ed value is
alread known by that user.
2017-03-24 22:29:22 +09:00
FUJIWARA Katsunori
b0cce9d114 largefiles: call readstandin() with changectx itself instead of rev or node
readstandin() takes "node" argument to get changectx by "repo[node]".

There are some readstandin() invocations, which use ctx.node(),
ctx.rev(), or '.' as "node" argument above, even though corresponded
changectx object is already looked up on caller side.

This patch calls readstandin() with already known changectx itself, to
avoid meaningless re-construction of changectx (indirect case via
copytostore() is also included).

BTW, copytostore() uses "rev" argument only for readstandin()
invocation. Therefore, this patch also renames it to "revorctx" to
indicate that it can take not only revision ID or so but also
changectx, for readability.
2017-03-24 22:26:34 +09:00
FUJIWARA Katsunori
0813b4a24f largefiles: omit redundant splitstandin() invocations
There are 3 splitstandin() invocations in updatestandin() for same
"standin" value.
2017-03-24 22:24:59 +09:00
FUJIWARA Katsunori
53a96c883c largefiles: omit redundant isstandin() before splitstandin()
There are many isstandin() invocations before splitstandin().

The former examines whether specified path starts with ".hglf/". The
latter returns after ".hglf/" of specified path if it starts with that
prefix, or returns None otherwise.

Therefore, value returned by splitstandin() can be used for
replacement of preceding isstandin(), and this replacement can omit
redundant string comparison after isstandin().
2017-03-24 22:24:58 +09:00
Pierre-Yves David
91ebfa657f largefiles: directly use repo.vfs.join
The 'repo.join' method is about to be deprecated.
2017-03-08 16:52:06 -08:00
Pierre-Yves David
1211038425 vfs: use 'vfs' module directly in 'hgext.largefile'
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
2017-03-02 13:32:27 +01:00
Pierre-Yves David
e5cb48ac36 vfs: replace 'scmutil.opener' usage with 'scmutil.vfs'
The 'vfs' class is the first class citizen for years. We remove all usages of
the older API. This will let us remove the old API eventually.
2017-03-02 03:52:36 +01:00
Pulkit Goyal
3c7388da12 py3: replace pycompat.getenv with encoding.environ.get
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
2017-01-15 13:17:05 +05:30
Pulkit Goyal
770a0e2938 py3: replace os.getenv with pycompat.osgetenv
os.getenv deals with unicodes on Python 3, so we have pycompat.osgetenv to
deal with bytes. This patch replaces occurrences on os.getenv with
pycompat.osgetenv
2016-12-19 02:54:49 +05:30
Pulkit Goyal
1f6538b90b py3: replace os.name with pycompat.osname (part 2 of 2) 2016-12-19 00:28:12 +05:30
Mads Kiilerich
39f2a13215 util: increase filechunkiter size to 128k
util.filechunkiter has been using a chunk size of 64k for more than 10 years,
also in years where Moore's law still was a law. It is probably ok to bump it
now and perhaps get a slight win in some cases.

Also, largefiles have been using 128k for a long time. Specifying that size
multiple times (or forgetting to do it) seems a bit stupid. Decreasing it to
64k also seems unfortunate.

Thus, we will set the default chunksize to 128k and use the default everywhere.
2016-10-14 01:53:15 +02:00
Mads Kiilerich
f926541834 largefiles: always use filechunkiter when iterating files
Before, we would sometimes use the default iterator over large files. That
iterator is line based and would add extra buffering and use odd chunk sizes
which could give some overhead.

copyandhash can't just apply a filechunkiter as it sometimes is passed a
genuine generator when downloading remotely.
2016-10-12 12:22:18 +02:00
Mads Kiilerich
7afa73604d largefiles: use context for file closing
Make the code slightly smaller and safer (and more deeply indented).
2016-10-08 00:59:41 +02:00
FUJIWARA Katsunori
7a8d36afcf doc: trim newline at the end of exception message 2016-08-01 06:08:25 +09:00
liscju
36f41df99d largefiles: remove additional blank lines
It does not conform to the coding style.
2016-06-27 10:33:33 +02:00
liscju
a37f11d3d7 largefiles: fix misleading comments in lfutil instore and storepath
Problem in both cases is cache in largefiles has assigned
meaning - user cache which is additional place to get/put
files. Those two function works on store - the main place
to store largefiles in the repository - .hg/largefiles and
using "cache" to describe it is misleading.
2016-06-24 09:08:16 +02:00
Matt Mackall
3eee63391d merge with stable 2016-06-14 14:52:58 -05:00
Augie Fackler
ad67b99d20 cleanup: replace uses of util.(md5|sha1|sha256|sha512) with hashlib.\1
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
2016-06-10 00:12:33 -04:00
Henrik Stuart
5de4cefa8f largefiles: fix support for local largefiles while using share extension
Prior to revision 149be6a0072e, largefiles were saved in the local repository,
even if it was using the share extension. After that change, all largefiles are
now stored in the shared repository. However, the backward compatibility for
existing largefiles already placed in the local repository was never tested,
and has been broken since.
2016-06-07 08:32:33 +02:00
liscju
934bafbfb7 largefiles: rename match_ to matchmod import in lfutil 2016-05-20 01:42:04 +02:00
liscju
835eadbfd6 py3: make largefiles/lfutil.py use absolute_import 2016-05-10 15:09:22 +02:00
Mads Kiilerich
66a29f6996 largefiles: don't access repo.changelog directly in getlfilestoupload
Make it possible to pass both nodes and revisions to getlfilestoupload.
2016-04-13 01:45:45 +02:00
Mads Kiilerich
6d22385d0b largefiles: add some docstrings 2016-03-19 08:28:24 -07:00
Mads Kiilerich
cdb1fa386c largefiles: drop partial support for not having a user cache
9f1a3c7b4a28 introduced support for not having a "global" user cache.
In the rare cases where the environment didn't provide the location of the
current home directory, the usercachepath function could return None.
That functionality has since bitrotten and several code paths did not correctly
check for usercachepath returning None:

  $ HOME= XDG_CACHE_HOME= hg up --config extensions.largefiles=
  getting changed largefiles
  abort: unknown largefiles usercache location

Dropping the partial support for it is thus not really a backward compatibility
breaking change.

Thus: consistently fail early if the usercache location is unknown.

It is relevant to be able to control where the largefiles are stored and how
they propagate, but that should probably be done differently. The dysfunctional
code just gets in the way.
2016-03-19 08:27:54 -07:00
Mads Kiilerich
afcb680fd7 largefiles: refactor usercachepath - extract user cache path function
It is convenient to have the user cache location explicitly.
2016-03-19 08:23:55 -07:00
liscju
802a1dd151 largefiles: replace invocation of os.path module by vfs in lfutil.py
Replaces invocations os.path functions to methods in vfs. Unfortunately
(in my view) this makes code less readable, because instead of using
clear variable names with path it needs to replace them with vfs(..).
I need guidance how to make such transition look more readable.

For example in this patch there is example with few places with
wvfs.join(standindir), standindir before this patch was absolute
path, in this it is changed to relative because it is used also
in expression wvfs.join(standindir, pat).
2016-03-14 20:20:22 +01:00
Anton Shestakov
f67064214c largefiles: use revisions as a ui.progress unit
Using plural form is consistent with other progress units, and "1 out of 5
revisions" sounds more correct. Also, tests don't show this, but if you have
'speed' item in progress.format config, it shows e.g. '100 revisions/sec',
which also seems better.
2016-03-11 22:26:06 +08:00
Matt Harbison
6d46368119 largefiles: prevent committing a missing largefile
Previously, if the largefile was deleted at the time of a commit, the standin
was silently not updated and its current state (possibly garbage) was recorded.
The test makes it look like this is somewhat of an edge case, but the same thing
happens when an `hg revert` followed by `rm` changes the standin.

Aside from the second invocation of this in lfutil.updatestandinsbymatch()
(which is what triggers this test case), the three other uses are guarded by
dirstate checks for added or modified, or an existence check in the filesystem.
So aborting in lfutil.updatestandins() should be safe, and will avoid silent
skips in the future if this is used elsewhere.
2016-01-24 00:10:19 -05:00
Matt Harbison
9906cb44b6 largefiles: fix an explicit largefile commit after a remove (issue4969)
The change in 6fce9a02f069 to handle a normal -> largefile switch was too
aggressive in preserving the original matcher names.  If a largefile is
explicitly provided by the user, but only the standin exists in dirstate, then
only the standin can be committed.

There's still maybe an issue when the largefile is deleted outside of Mercurial:

  $ rm large
  $ hg ci -m "oops" large
  large: The system cannot find the file specified
  nothing changed
  [1]
2016-01-23 20:51:17 -05:00
Mads Kiilerich
76651c0e10 largefiles: fix commit of missing largefiles
92117e4f6f8d improved merging of standin files referencing missing largefiles.
It did however not test or fix commits of such merges; it would abort.

To fix that, change copytostore to skip and warn about missing largefiles
with a message similar the one for failing get from remote filestores. (It
would perhaps in both cases be better to emit a more helpful warning like
"warning: standin file for large1 references 58e24f733a which can't be found in
the local store".)

To test this, make sure commit doesn't find the "missing" largefile in the global
usercache. For further testing, verify that update and status works as expected
after this.

This will also effectively backout 159c82dd6523.
2016-01-17 17:23:32 +01:00
Mads Kiilerich
24ee58b9f9 largefiles: check hash of files in the store before copying to working dir
If the store somehow got corrupted, users could end up in weird situations that
were very hard to recover from or lead to propagation of the corruption.

Instead, spend the extra time checking the hash when copying to the working
directory. If it doesn't match, emit a warning, and don't put wrong content in
the working directory.
2015-10-23 21:27:29 +02:00
Mads Kiilerich
602d83e7e6 largefiles: fix explicit commit of normal/largefile switch
Commit of corresponding normal/largefiles pairs would only commit the standin.
That is usually fine, except if either the normal file or the standin is a
remove while the other is an add. In that case it would either give duplicate
colliding entries or lose the file.

Instead, commit both filenames if one of them is a remove.
2015-10-21 00:18:11 +02:00
FUJIWARA Katsunori
54ce7de850 dirstate: show develwarn for write() invocation without transaction
This is used to detect 'dirstate.write()' invocation without the value
gotten by 'repo.currenttransaction()' (mainly focused on 3rd party
extensions).
2015-10-17 01:15:34 +09:00
Mads Kiilerich
2de7a8b7cf largefiles: better handling of merge of largefiles that not are available
Before, when merging revisions with missing largefiles, the missing largefiles
would be fetched as a part of the merge. If that failed (for example because
the main repository temporarily was unavailable), the largefile would be left
missing. However, the next commit would abort and (seemed to) fail when
markcommitted tried to mark the standin file as normal and thus had to hash the
largefile that didn't exist. (Actually, the commit would succeed but the
largefile update that follows right after the commit transaction would abort -
quite confusing.)

To fix that, make sure that synclfdirstate only marks files as normal if they
actually exist.
2015-10-12 19:22:34 +02:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Matt Harbison
dd27c92fee largefiles: ensure lfutil.getstandinmatcher() only matches standins
Previously, simply having the largefiles extension loaded without any largefiles
added would crash when amending with -I.  The problem was with no files in the
matcher, the pattern list of files joined with 'standindir' was empty, and
scmutil.match() would match everything.  In lfutil.composestandinmatcher(), the
match function is used to test if the file is a standin, and after getting a
false positive, proceeds to call lfutil.splitstandin().  This returns None
because it isn't a standin, which blows up when passed to rmatcher.matchfn().

Manually overriding _always in getstandinmatcher() probably isn't necessary
anymore, but we leave well enough alone on stable.  This regressed in
78632d61a993.
2015-08-12 12:26:39 -04:00
Matt Harbison
66999fb4d6 largefiles: use the optional badfn argument when building a matcher
The monkey patching in cat() can't be fixed, because it still delegates to the
original bad().  Overriding commands.cat() should go away in favor overriding
cmdutil.cat() anyway, and that matcher can be wrapped with matchmod.badmatch().
2015-06-05 22:53:15 -04:00
Martin von Zweigbergk
8714aec6c0 largefiles: avoid match.files() in conditions
See 559ee9ecae07 (match: introduce boolean prefix() method,
2014-10-28) for reasons to avoid match.files() in conditions.
2015-05-19 13:08:21 -07:00
Martin von Zweigbergk
94f4135a12 largefiles: pass in whole matcher to getstandinmatcher()
The choice between the "always" case and the other case is done in
getstandinmatcher() and the next patch will change how it's determined
based on the matcher, so let's prepare by passing in the matcher, not
just the matcher's files.
2015-05-26 11:06:43 -07:00