Previously, even if a file was added with --large, 'hg addremove' or 'hg ci -A'
would add all files (including the previously added large files) as normal
files. Only after a commit where a file was added with --large would subsequent
adds or 'ci -A' take into account the minsize or the pattern configuration.
This change more closely follows the help for largefiles, which mentions that
'add --large' is required to enable the configuration, but doesn't mention the
previously required commit.
Also, if 'hg add --large' was performed and then 'hg forget <file>' (both before
a largefile enabling commit), the forget command would error out saying
'.hglf/<file> not tracked'. This is also fixed.
This reports that a repo is largefiles enabled as soon as a file is added with
--large, which enables 'add', 'addremove' and 'ci -A' to honor the config
settings before the first commit. Note that prior to the next commit, if all
largefiles are forgotten, the repository goes back to reporting the repo as not
largefiles enabled.
It makes no sense to handle this by adding a --large option to 'addremove',
because then it would also be needed for 'commit', but only when '-A' is
specified. While this gets around the awkwardness of having to add a largefile,
then commit it, and then addremove the other files when importing an existing
codebase (and preserving that extra commit in permanent history), it does still
require finding and manually adding one of the files as --large. Therefore it
is probably desirable to have a --large option for init as well.
Changeset 9467184ce7e7 broke 'outgoing --large'
...
File "hgext\largefiles\lfutil.py", line 56, in findoutgoing
remote.local(), force=force)
File "mercurial\discovery.py", line 31, in findcommonincoming
if not remote.capable('getbundle'):
AttributeError: 'lfilesrepo' object has no attribute 'capable'
This restores the previous functionality, though I'm not sure if there's a
better way to do this- that changeset introduces a hunk in debugdiscovery that
does this:
if not util.safehasattr(remote, 'branches'):
# enable in-client legacy support
remote = localrepo.locallegacypeer(remote.local())
Is there a legacy support issue here too?
Before, a tempfile was used to create a temp file was created with 600
permissions and the uploaded data was written into it. This file was
then *copied* to .hg/largefiles/<hash>.
We now simply use atomictempfile to write the data to a temp file with
the right permissions and then rename that into place.
Before, the mode was copied from the file in the working copy. This is
inconsistent with how Mercurial normally creates files inside the .hg
folder. This can lead to a situation where you can read the files
under .hg/store but not under .hg/largefiles and so a clone will fail
if it needs to access a largefile.
This fixes a serious performance regression in largefiles introduced in
Mercurial 2.1. Caching new largefiles on pull is necessary, because
otherwise largefiles will be missing (and unable to be downloaded) when
the user tries to merge or rebase a new head with an old one. But this
is an expensive operation and should only be done for heads that are new
from the pull, rather than on all heads in the repository.
some problematic encodings use backslash as part of multi-byte characters.
util.pconvert() can treat strings in such encodings correctly, if
win32mbcs is enabled, but str.replace() can not.
Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any
largefiles that did not have a hash that matched their standin were
updated.
This patch optimizes 'hg update' by keeping track of what standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unecessary
calculation of a list of sha1 hashes for big files.
With this patch, the time 'hg update' takes to complete is a function of
how many largefiles need to be updated and what their size is.
Performance tests on a repository with about 80 largefiles ranging from
a few MB to about 97 MB are shown below. The tests show how long it takes
to run 'hg update' with no changes actually being updated.
Mercurial 2.1 release:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
0 largefiles updated, 0 removed
real 0m10.045s
user 0m9.367s
sys 0m0.674s
With this patch:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.965s
user 0m0.845s
sys 0m0.115s
The same repsoitory, without the largefiles extension enabled:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.799s
user 0m0.684s
sys 0m0.111s
So before the patch, 'hg update' with no changes was approximately 9.25s
slower with largefiles enabled. With this patch, it is approximately 0.165s
slower.
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
When rebasing, we need to trust that the standins are always correct. The
rebase operation updates the standins according to the changeset it is
rebasing. We need to make the largefiles in the working copy match. If we
don't make them match, then they get accidentally reverted, either during
the rebase or during the next commit after the rebase.
This worked previously only becuase we were relying on the behavior that
largefiles with a changed standin, but unchanged contents, never showed up in
the list of modified largefiles. Unfortunately, pre-commit hooks can get
an incorrect status this way, and it also results in extra execution of code.
The solution is to simply trust the standins when we are about to commit a
rebased changeset, and politely ask updatelfiles() to pull the new contents
down. In this case, updatelfiles() will also mark any files it has pulled
down as dirty in the lfdirstate so that pre-commit hooks will get correct
status output.
Before, it was possible to create a
.hg/largefiles/hash
file with truncated content, i.e., content where
SHA-1(content) != hash
This breaks the fundamental invariant in largefiles that the file
content for files in .hg/largefiles hash to the filename.
Operating on a non-existant file can cause both IOError and OSError,
depending on the function used: open raises IOError, os.lstat raises
OSError.
The largefiles code called dirstate.normal, which in turn calls
os.lstat, so OSError is the right exception to catch here.
When (1) findfile links a largefile from the user cache to the store
and (2) the store directory doesn't exist yet, findfile errors out. A
simple call to util.makedirs fixes it.
This was unnecessarily verbose: there is no need to unlink the file
when we open it for write anyway, and there is no need to check if the
file exists after we created it.
This is consistent with the rest of Mercurial's code, mirroring the
try-finally-unlink structure elsewhere. Furthermore, it fixes the case where
largefiles throws an IOError on Windows when the temporary file is opened a
second time by copytocacheabsolute.
This patch creates the temporary file in the repo's largefiles store rather than
/tmp, which might be a different filesystem.
The original intent was that the largefiles would primarily be in the
repository, with the global cache being only that--a cache. The naming
conventions and actual intent have both strayed. In this first patch, the
naming conventions are switched to match the actual intent, as are the
configuration options.
This is mainly about keeping code under the 80-column limit with as
few backslashes as possible. I am deliberately not making any logic or
behaviour changes here and have restrained myself to a few "peephole"
refactorings.
- tweak wording of some error messages
- use consistent capitalization
- always say 'largefile', not 'lfile'
- fix I18N problems
- only raise Abort for errors the user can do something about
- fix some ungrammatical/unclear/incorrect comments/docstrings
- rewrite some really unclear comments/docstrings
- make formatting/style more consistent with the rest of Mercurial
(lowercase without period unless it's really multiple sentences)
- wrap to 75 columns
- always say "largefile(s)", not "lfile(s)" (or "big files")
- one space between sentences, not two