Before this patch, "overrides.getoutgoinglfiles()" (called by
"overrideoutgoing()" and "overridesummary()") and "lfilesrepo.push()"
implemented similar logic to get outgoing largefiles separately.
This patch centralizes that logic in "lfutil.getlfilestoupload()".
"lfutil.getlfilestoupload()" takes an "addfunc" argument because each
caller needs different information (and the callback leaves room for
future enhancement):
- "overrides.getoutgoinglfiles()" needs only filenames
- "lfilesrepo.push()" needs only hashes of largefiles
This goes a step further than 974959d637b7 and backs out the unreleased
--cache-largefiles option. The same can be achieved with --lfrev heads(pulled()) and
we shouldn't introduce unnecessary command line options.
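For reference, the equivalent invocation is:

    $ hg pull --lfrev "heads(pulled())"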
Looking for a (potentially empty) directory was not reliable - both because it
is a reasonable assumption that empty directories can be removed and because it
wasn't created in all cases ... such as when pulling to an existing repository.
This situation could arise after an interrupted update, which would then
fail with:

    abort: No such file or directory: $TESTTMP/f/.hglf/sub2/large6
Updating from a merge without using --clean is not possible anyway, so
this patch takes a step in the right direction: it now gets far enough
to report the right error.
Previously, if one or more largefiles for a repo being converted were not in the
usercache, the convert would abort with a reference to the largefile being
missing (as opposed to the previous patch, where the standin was referenced as
missing). This is because commitctx() tries to copy all largefiles to the
local store, first from the user cache, and if the file isn't found there, from
the working directory. No files will exist in the working directory during a
convert, however. It is not sufficient to force the source repo to be local
before proceeding, because clone and pull do not download largefiles by default.
This is slightly less than ideal because while the conversion will now complete,
it won't be possible to update to revs with missing largefiles unless the user
intervenes manually, because there is no default path pointing back to the
source repo. Ideally these files would be cached during the conversion.
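Schematically, the lookup order in commitctx() is as follows (helper
names are illustrative, not the exact largefiles API):

    if incache(repo.ui, hash):
        # preferred source: the user cache
        copyfromcache(repo, hash, lfile)
    elif os.path.exists(repo.wjoin(lfile)):
        # fallback: the working directory
        copyfromwdir(repo, hash, lfile)
    else:
        # neither exists during 'hg convert', hence the abort
        raise util.Abort('largefile %s missing' % lfile)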
This check could have been done in reposetup.commitctx() instead, but this
ensures the local store directory is created, which is necessary to enable the
standin matcher.
The rm -> 'rm -f' change in the test is to temporarily suppress an error
clearing the cache; as noted, the cache is not repopulated during convert.
When that is fixed, this can be changed back and the verification errors will
disappear too.
When the rev isn't specified, the standin for the working copy gets read. But
convert doesn't update the working copy for each cset it processes, so there is
no standin, and 'hg convert' would abort complaining about the standin being
missing.
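Schematically, the fix is to read the standin from the changeset being
converted instead of from the working copy (readstandin here stands in
for the real helper; the exact signature is not shown):

    # before: reads .hglf/<file> from the working copy, which
    # convert never updates
    expected = readstandin(repo, lfile)

    # after: read the standin recorded in the changeset itself
    expected = readstandin(repo, lfile, ctx.rev())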
Note that if the largefile is not in the user cache, 'hg convert' complains
about the largefile itself missing from the destination repo.
It looks like this method missed the updates in 7a899bd0f9c0 (which changed the
preferred discovery method from findcommonincoming() to findcommonoutgoing()),
and 8b2938386599 (which rolled up the outgoing lists into a single object).
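With those updates applied, the discovery call should look roughly like
this (hg 2.x internal API, subject to change):

    from mercurial import discovery

    outgoing = discovery.findcommonoutgoing(repo, remote, force=force)
    # the rolled-up result object carries the lists of interest
    missing = outgoing.missing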
Previously, even if a file was added with --large, 'hg addremove' or 'hg ci -A'
would add all files (including the previously added large files) as normal
files. Only after a commit where a file was added with --large would subsequent
adds or 'ci -A' take into account the minsize or the pattern configuration.
This change more closely follows the help for largefiles, which mentions that
'add --large' is required to enable the configuration, but doesn't mention the
previously required commit.
Also, if 'hg add --large' was performed and then 'hg forget <file>' (both before
a largefile enabling commit), the forget command would error out saying
'.hglf/<file> not tracked'. This is also fixed.
This reports that a repo is largefiles enabled as soon as a file is added with
--large, which enables 'add', 'addremove' and 'ci -A' to honor the config
settings before the first commit. Note that prior to the next commit, if all
largefiles are forgotten, the repository goes back to reporting the repo as not
largefiles enabled.
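For example, with a minsize or pattern configured in .hgrc, the
following now works in a fresh repository, before any commit:

    $ hg add --large big.bin
    $ hg addremove            # honors largefiles.minsize/patterns
    $ hg ci -A -m 'import existing codebase'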
It makes no sense to handle this by adding a --large option to 'addremove',
because then it would also be needed for 'commit', but only when '-A' is
specified. While this gets around the awkwardness of having to add a largefile,
then commit it, and then addremove the other files when importing an existing
codebase (and preserving that extra commit in permanent history), it does still
require finding and manually adding one of the files as --large. Therefore it
is probably desirable to have a --large option for init as well.
Changeset 9467184ce7e7 broke 'outgoing --large'
...
File "hgext\largefiles\lfutil.py", line 56, in findoutgoing
remote.local(), force=force)
File "mercurial\discovery.py", line 31, in findcommonincoming
if not remote.capable('getbundle'):
AttributeError: 'lfilesrepo' object has no attribute 'capable'
This restores the previous functionality, though I'm not sure if there's
a better way to do it; that changeset introduces a hunk in debugdiscovery
that does this:
    if not util.safehasattr(remote, 'branches'):
        # enable in-client legacy support
        remote = localrepo.locallegacypeer(remote.local())
Is there a legacy support issue here too?
Before, a temp file was created with 600 permissions, the uploaded data
was written into it, and the file was then *copied* to
.hg/largefiles/<hash>.
We now simply use atomictempfile to write the data to a temp file with
the right permissions and then rename that into place.
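A sketch of the new write path (variable names are illustrative):

    from mercurial import util

    # atomictempfile writes to a temp file created with the requested
    # mode and renames it into place on close() - no extra copy
    fd = util.atomictempfile(storepath, createmode=repo.store.createmode)
    for chunk in chunks:
        fd.write(chunk)
    fd.close()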
Before, the mode was copied from the file in the working copy. This is
inconsistent with how Mercurial normally creates files inside the .hg
folder. This can lead to a situation where you can read the files
under .hg/store but not under .hg/largefiles and so a clone will fail
if it needs to access a largefile.
This fixes a serious performance regression in largefiles introduced in
Mercurial 2.1. Caching new largefiles on pull is necessary, because
otherwise largefiles will be missing (and unable to be downloaded) when
the user tries to merge or rebase a new head with an old one. But this
is an expensive operation and should only be done for heads that are new
from the pull, rather than on all heads in the repository.
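Schematically, the pull wrapper now does (simplified):

    oldheads = set(repo.heads())
    result = orig(ui, repo, source, **opts)   # the wrapped pull
    # cache largefiles only for the heads this pull introduced
    for head in set(repo.heads()) - oldheads:
        lfcommands.cachelfiles(ui, repo, head)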
Some problematic encodings use backslash as part of multi-byte
characters. util.pconvert() can treat strings in such encodings
correctly if win32mbcs is enabled, but str.replace() cannot.
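A concrete example of the breakage, using Shift_JIS (where the second
byte of some characters is 0x5c, the same as ASCII backslash):

    s = u'\u8868'.encode('shift_jis')   # '\x95\x5c'
    s.replace('\\', '/')                # -> '\x95/', a corrupted character
    # util.pconvert(), wrapped by win32mbcs, decodes the string first
    # and so leaves the multi-byte character intact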
Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any
largefiles that did not have a hash that matched their standin were
updated.
This patch optimizes 'hg update' by keeping track of what standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unnecessary
calculation of a list of sha1 hashes for big files.
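A simplified sketch of the approach (helper names are illustrative):

    oldstandins = lfutil.getstandinsstate(repo)
    # ... the underlying update rewrites the standins ...
    newstandins = lfutil.getstandinsstate(repo)
    changed = lfutil.getlfilestoupdate(oldstandins, newstandins)
    # only the changed largefiles are hashed and refreshed
    lfcommands.updatelfiles(ui, repo, filelist=changed)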
With this patch, the time 'hg update' takes to complete is a function of
how many largefiles need to be updated and what their size is.
Performance tests on a repository with about 80 largefiles ranging from
a few MB to about 97 MB are shown below. The tests show how long it takes
to run 'hg update' with no changes actually being updated.
Mercurial 2.1 release:

    $ time hg update
    0 files updated, 0 files merged, 0 files removed, 0 files unresolved
    getting changed largefiles
    0 largefiles updated, 0 removed

    real    0m10.045s
    user    0m9.367s
    sys     0m0.674s

With this patch:

    $ time hg update
    0 files updated, 0 files merged, 0 files removed, 0 files unresolved

    real    0m0.965s
    user    0m0.845s
    sys     0m0.115s

The same repository, without the largefiles extension enabled:

    $ time hg update
    0 files updated, 0 files merged, 0 files removed, 0 files unresolved

    real    0m0.799s
    user    0m0.684s
    sys     0m0.111s
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
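The locking pattern for the write case, schematically:

    wlock = repo.wlock()
    try:
        lfdirstate = lfutil.openlfdirstate(ui, repo)
        # ... mutate entries ...
        lfdirstate.write()   # only after the commit completes
    finally:
        wlock.release()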
When rebasing, we need to trust that the standins are always correct. The
rebase operation updates the standins according to the changeset it is
rebasing. We need to make the largefiles in the working copy match. If we
don't make them match, then they get accidentally reverted, either during
the rebase or during the next commit after the rebase.
This worked previously only because we were relying on the behavior that
largefiles with a changed standin, but unchanged contents, never showed up in
the list of modified largefiles. Unfortunately, pre-commit hooks can get
an incorrect status this way, and it also results in extra execution of code.
The solution is to simply trust the standins when we are about to commit a
rebased changeset, and politely ask updatelfiles() to pull the new contents
down. In this case, updatelfiles() will also mark any files it has pulled
down as dirty in the lfdirstate so that pre-commit hooks will get correct
status output.
Before, it was possible to create a
    .hg/largefiles/<hash>
file with truncated content, i.e., content where
    SHA-1(content) != <hash>
This breaks the fundamental invariant in largefiles that the content of
a file in .hg/largefiles hashes to its filename.
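That invariant, as a minimal check (sketch):

    import hashlib

    def checkhash(path, expected):
        h = hashlib.sha1()
        with open(path, 'rb') as fd:
            for chunk in iter(lambda: fd.read(128 * 1024), b''):
                h.update(chunk)
        return h.hexdigest() == expected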