This is essentially a copy of largefile's override of archive() in the
archival class, adapted for overriding hgsubrepo's archive(). That
means decoding isn't taken into consideration, nor is .hg_archival.txt
generated (the same goes for regular subrepos). Unlike subrepos, but
consistent with largefile's handling of the top repo, ui.progress() is
*not* called. This should probably be refactored at some point, but
at least this generates the archives properly for now. Previously,
the standins were ignored and the largefiles were archived only for
the top level repo.
Long term, it would probably be most desirable to figure out how to
tweak archival's archive() if necessary such that largefiles doesn't
need to override it completely just to special case the translating of
standins to the real files. Largefiles will already return a context
with the true largefiles instead of the standins if lfilesrepo's
lfstatus is True- perhaps this can be leveraged?
Summary and commit use dirty() to check the status of a subrepository,
so this overrides dirty() in the subrepo in the same manner as
status() to check the large files instead of their standins.
Previously, if only a large file was changed in a subrepo, summary in
the top level repo would not report the subrepo was dirty and commit
-S would report nothing changed. If any type of file was changed in
the top repo and only a large file in the subrepo, commit -S would not
commit the changes to the subrepo.
Wrapping the status command will only invoke overridestatus() and set
the lfstatus field for the top level repository. Wrapping the status
function is required to set the field on child repositories.
Previously, status -S would report large files in a subrepo as '?'
regardless of their actual states, and was inconsistent with what
status would report from within that subrepo.
This is a fix to largefiles so that 'hg cat' will work correctly when a
largefile is specified.
As per discussion on Issue 3352:
1) The file will be printed regardless if it is binary or large.
2) The file is downloaded if it is not readily available (not found in
the system cache), so that it can be printed. If the download fails,
then we abort.
This fixes a serious performance regression in largefiles introduced in
Mercurial 2.1. Caching new largefiles on pull is necessary, because
otherwise largefiles will be missing (and unable to be downloaded) when
the user tries to merge or rebase a new head with an old one. But this
is an expensive operation and should only be done for heads that are new
from the pull, rather than on all heads in the repository.
Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any
largefiles that did not have a hash that matched their standin were
updated.
This patch optimizes 'hg update' by keeping track of what standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unecessary
calculation of a list of sha1 hashes for big files.
With this patch, the time 'hg update' takes to complete is a function of
how many largefiles need to be updated and what their size is.
Performance tests on a repository with about 80 largefiles ranging from
a few MB to about 97 MB are shown below. The tests show how long it takes
to run 'hg update' with no changes actually being updated.
Mercurial 2.1 release:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
0 largefiles updated, 0 removed
real 0m10.045s
user 0m9.367s
sys 0m0.674s
With this patch:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.965s
user 0m0.845s
sys 0m0.115s
The same repsoitory, without the largefiles extension enabled:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.799s
user 0m0.684s
sys 0m0.111s
So before the patch, 'hg update' with no changes was approximately 9.25s
slower with largefiles enabled. With this patch, it is approximately 0.165s
slower.
This removes use of unknown files for building the synthetic working
directory manifest used by manifestmerge. Instead, we adopt the
strategy used by _checkunknown.
Side-effect: unknown files are no longer moved by remote directory
renames, and now are left alone like ignored files.
Previously, we would do a full working directory walk including
unknown files to perform a merge. In many cases, this was painful
because unknown files greatly outnumbered tracked files and generally
had no useful effect on the merge.
Here we instead wait until we find a file in the destination that's
not tracked locally and detect if it exists and is not ignored. This
is usually cheaper but can be -more- expensive in the case where we're
adding a huge number of files. On the other hand, the cost of statting
the new files should be dwarfed by the cost of eventually writing
them.
In this version, case collisions are detected implicitly by
os.path.exists and wctx[f] lookup.
assumptions:
- 'lfutil.splitstandin(f)' strips 'f' of '.hglf/', if it can
- 'lfile(f)' examines 'lfutil.standin(f) in manifest'
- 'lfutil.standin(f)' adds '.hglf/' to 'f'
and
- 'f' should start with '.hglf/' there, because it is already
examined by 'lfutil.isstandin(f)'
then:
lfile(lfutil.splitstandin(f))
=> lfile(<f without '.hglf/'>)
=> lfutil.standin(<f without '.hglf/'>) in manifest
=> <<f without '.hglf/'> with '.hglf/'> in manifest
=> f in manifest
When the user pulls from a remote repository that is not his default repo, it
is quite likely that he will pull a new head. This means that if he tries to
merge or rebase with the other head, he will run into a problem becuase
largefiles has no way of tracking where the remote repository for this other
head is, so it cannot download the largefiles from this other remote repository.
It will attempt to download them from its default remote repository, which will
not yet contain the largefiles.
This patch solves this problem by caching any new largefiles for all heads
directly into the system cache at the time of the pull, so they are available
later.
This behavior is actually more in line with Mercurial's distributed nature,
because pulling already implies we have a connection to the remote server, but
merging or rebasing does not.
There is a bug in the merge process where, if a new largefile is introduced
in a merge and the user does not have that largefile in his repo's local store
nor in his system cache, the working copy will retain the old largefile. Upon
the commit of the merge, the standin is re-written to contain the hash of the
old largefile, and the lfdirstate retains a "Modified" status for the file.
The end result is that the largefile can show up in the merge commit as
"Modified", but the standin has no diff. This is wrong in two ways:
1) Such a "wedged" history with a nonsense change in a commit should not be
possible
2) It effectively reverts a largefile to an old version when doing a merge
This is caused by the fact that the updatelfiles() command always checks the
current largefile's hash against the hash stored in the current node's standin.
This is correct behavior in every case except for a merge. When merging, we
must assume that the standin in the working copy contains the correct hash,
because the original hg.merge() has already updated it for us.
This patch fixes the issue by patching the repo object to carry a "_ismerging"
attribute, that the updatelfiles() command checks for. When this attribute is
found, it checks against the working copy's standin, rather than the standin
in the current node.
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
Implementing addremove correctly in largefiles is tricky, becuase the original
addremove function does not call into any of the add or remove function we've
already overridden in the extension. So the trick is to implement addremove
without duplicating any code.
This patch implements addremove by pulling out the interesting parts of
override_add() and override_remove() into generic utility functions, and
using those to handle the largefiles in addremove. Then a matcher is
installed that will ignore all largefiles, and the original addremove
function is called to take care of the regular files in addremove.
A small bit of monkey patching is used to make sure that remove_largefiles()
notifies the user when a file is removed by addremove and also makes sure
the removal of largefiles doesn't interfer with the original addremove's
operation of removing the standin.
This patch makes "hg remove" work the same way on largefiles as it does on
regular Mercurial files. If you try to remove an added largefile, the removal
fails and you are instead prompted to use "hg forget" to undo the add.
This comment is invalid. The hg.update() function will abort in the case of
any genuine error, so there is nothing to check. If we have gotten to this
point in execution, nothing critical has gone wrong, and if any standins
have been updated, we must pull new largefiles.
The largefiles extension prevents users from adding a normal file
named 'foo' if there is already a largefile with the same name.
However, there was a loop-hole: when merging, it was possible to bring
in a normal file named 'foo' while also having a '.hglf/foo' file.
This patch fixes this by extending the manifest merge to deal with
these kinds of conflicts. If there is a normal file 'foo' in the
working copy, and the other parent brings in a '.hglf/foo' file, then
the user will be prompted to keep the normal file or the largefile.
Likewise for the symmetric case where a normal file is brought in via
the second parent. The prompt looks like this:
$ hg merge
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
After the merge, either the '.hglf/foo' file or the 'foo' file will
have been deleted. This would cause status to return output like:
$ hg status
M foo
R foo
To fix this, the lfiles_repo.status method is changed so that a
removed normal file isn't shown if there is largefile with the same
name, and vice versa for largefiles.
The code was using the size of a symlink's target, thus wrongly making symlinks
to large files into largefiles themselves. This can be demonstrated by
deleting the symlink and then doing an 'hg up' or 'hg up -C' to restore the
symlink.
overrides.py contains several functions that temporarily override
scmutil.match(), which always takes a changectx object as the first
parameter. But these overrides name that parameter either 'repo' or
'ctxorrepo', which is misleading. So rename them to 'ctx' and remove
the special type-sensitive handling of the one called 'ctxorrepo'.
This is mainly about keeping code under the 80-column limit with as
few backslashes as possible. I am deliberately not making any logic or
behaviour changes here and have restrained myself to a few "peephole"
refactorings.
- fix some ungrammatical/unclear/incorrect comments/docstrings
- rewrite some really unclear comments/docstrings
- make formatting/style more consistent with the rest of Mercurial
(lowercase without period unless it's really multiple sentences)
- wrap to 75 columns
- always say "largefile(s)", not "lfile(s)" (or "big files")
- one space between sentences, not two