This patch makes "hg remove" work the same way on largefiles as it does on
regular Mercurial files. If you try to remove an added largefile, the removal
fails and you are instead prompted to use "hg forget" to undo the add.
The largefiles extension prevents users from adding a normal file
named 'foo' if there is already a largefile with the same name.
However, there was a loophole: when merging, it was possible to bring
in a normal file named 'foo' while also having a '.hglf/foo' file.
This patch fixes this by extending the manifest merge to deal with
these kinds of conflicts. If there is a normal file 'foo' in the
working copy, and the other parent brings in a '.hglf/foo' file, then
the user will be prompted to keep the normal file or the largefile.
Likewise for the symmetric case where a normal file is brought in via
the second parent. The prompt looks like this:
  $ hg merge
  foo has been turned into a largefile
  use (l)argefile or keep as (n)ormal file?
After the merge, either the '.hglf/foo' file or the 'foo' file will
have been deleted. This would cause status to return output like:
  $ hg status
  M foo
  R foo
To fix this, the lfiles_repo.status method is changed so that a
removed normal file isn't shown if there is a largefile with the same
name, and vice versa for largefiles.
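The filtering rule above can be sketched in isolation. This is only an illustration under stated assumptions: the function name filter_removed and the idea of working on plain lists of paths are hypothetical, and the real lfiles_repo.status method works on Mercurial's own status data rather than on this simplified model.

```python
STANDIN_DIR = ".hglf/"  # directory prefix used for largefile standins


def filter_removed(removed, tracked):
    """Drop removed entries whose counterpart is still tracked.

    A removed normal file 'foo' is hidden when '.hglf/foo' is tracked,
    and a removed standin '.hglf/foo' is hidden when 'foo' is tracked.
    (Hypothetical helper, not part of the extension.)
    """
    tracked = set(tracked)
    result = []
    for name in removed:
        if name.startswith(STANDIN_DIR):
            counterpart = name[len(STANDIN_DIR):]
        else:
            counterpart = STANDIN_DIR + name
        if counterpart not in tracked:
            result.append(name)
    return result
```

With this model, a removed 'foo' is hidden when '.hglf/foo' is tracked, while an unrelated removed file is still reported.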
If a largefile is introduced on the branch that is merged into the
working copy, then 'hg status' would abort with an error like:
  $ hg status
  abort: .hglf/foo@33fdd332ec: not found in manifest!
The problem was that the largefiles status code only looked in the
first parent for the largefile. Largefiles are now always reported as
modified if they don't exist in the first parent -- this matches the
behavior of localrepo.status for normal files.
There is a bug in the merge process where, if a new largefile is introduced
in a merge and the user does not have that largefile in his repo's local store
nor in his system cache, the working copy will retain the old largefile. Upon
the commit of the merge, the standin is re-written to contain the hash of the
old largefile, and the lfdirstate retains a "Modified" status for the file.
The end result is that the largefile can show up in the merge commit as
"Modified", but the standin has no diff. This is wrong in two ways:
1) Such a "wedged" history with a nonsense change in a commit should not be
possible
2) It effectively reverts a largefile to an old version when doing a merge
This is caused by the fact that the updatelfiles() function always checks the
current largefile's hash against the hash stored in the current node's standin.
This is correct behavior in every case except for a merge. When merging, we
must assume that the standin in the working copy contains the correct hash,
because the original hg.merge() has already updated it for us.
This patch fixes the issue by patching the repo object to carry a "_ismerging"
attribute that the updatelfiles() function checks for. When this attribute is
found, it checks against the working copy's standin rather than the standin
in the current node.
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
When rebasing, we need to trust that the standins are always correct. The
rebase operation updates the standins according to the changeset it is
rebasing. We need to make the largefiles in the working copy match. If we
don't make them match, then they get accidentally reverted, either during
the rebase or during the next commit after the rebase.
This worked previously only because we were relying on the behavior that
largefiles with a changed standin, but unchanged contents, never showed up in
the list of modified largefiles. Unfortunately, pre-commit hooks can get
an incorrect status this way, and it also results in extra execution of code.
The solution is to simply trust the standins when we are about to commit a
rebased changeset, and politely ask updatelfiles() to pull the new contents
down. In this case, updatelfiles() will also mark any files it has pulled
down as dirty in the lfdirstate so that pre-commit hooks will get correct
status output.
Implementing addremove correctly in largefiles is tricky, because the original
addremove function does not call into any of the add or remove functions we've
already overridden in the extension. So the trick is to implement addremove
without duplicating any code.
This patch implements addremove by pulling out the interesting parts of
override_add() and override_remove() into generic utility functions, and
using those to handle the largefiles in addremove. Then a matcher is
installed that will ignore all largefiles, and the original addremove
function is called to take care of the regular files in addremove.
A small bit of monkey patching is used to make sure that remove_largefiles()
notifies the user when a file is removed by addremove and also makes sure
the removal of largefiles doesn't interfere with the original addremove's
operation of removing the standin.
This comment is invalid. The hg.update() function will abort in the case of
any genuine error, so there is nothing to check. If we have gotten to this
point in execution, nothing critical has gone wrong, and if any standins
have been updated, we must pull new largefiles.
Before, it was possible to create a
  .hg/largefiles/hash
file with truncated content, i.e., content where
  SHA-1(content) != hash
This breaks the fundamental invariant in largefiles that the content of
each file in .hg/largefiles hashes to its filename.
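The invariant can be checked with a short sketch. The helper names and the flat store layout used here are assumptions for illustration, not the extension's actual code:

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=128 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_entries(store_dir):
    """List store entries whose content does not hash to their filename."""
    return sorted(
        name
        for name in os.listdir(store_dir)
        if sha1_of_file(os.path.join(store_dir, name)) != name
    )


def demo():
    """Build a throwaway store with one good and one truncated entry."""
    with tempfile.TemporaryDirectory() as store:
        content = b"largefile content"
        good = hashlib.sha1(content).hexdigest()
        with open(os.path.join(store, good), "wb") as fp:
            fp.write(content)
        # Simulate a truncated write stored under the hash of the full content.
        other = b"another largefile"
        bad = hashlib.sha1(other).hexdigest()
        with open(os.path.join(store, bad), "wb") as fp:
            fp.write(other[:5])
        return find_corrupt_entries(store), bad
```

In the demo, only the truncated entry is reported, because its content no longer hashes to the name it is stored under.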
The current lfconvert implementation uses a combination of "ui.config()"
and "str.split(' ')" to read the largefiles.patterns configuration, but
this cannot handle multiline configuration in hgrc files correctly.
lfconvert should use "ui.configlist()" instead, the same way
override_add does.
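The difference can be seen without Mercurial at all. The sketch below assumes that a multiline hgrc value reaches the caller as one string with embedded newlines, and it uses a simplified stand-in for list parsing; the real ui.configlist() does more (quoting, for example), so treat split_like_a_list as a hypothetical approximation only.

```python
def split_on_space(value):
    """The fragile approach: split the raw config string on single spaces."""
    return value.split(' ')


def split_like_a_list(value):
    """Simplified stand-in for list parsing: split on whitespace and commas."""
    return value.replace(',', ' ').split()


# A value as it might look when patterns are spread over two hgrc lines
# (assumed representation, for illustration).
raw = "*.zip\n*.iso *.bin"

fragile = split_on_space(raw)      # the newline stays inside one element
robust = split_like_a_list(raw)    # three separate patterns
```

Splitting on a single space leaves "*.zip\n*.iso" glued together as one bogus pattern, which is the failure described above.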
Operating on a non-existent file can cause both IOError and OSError,
depending on the function used: open raises IOError, os.lstat raises
OSError.
The largefiles code called dirstate.normal, which in turn calls
os.lstat, so OSError is the right exception to catch here.
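A small sketch of the two failure modes follows. Note one caveat: the distinction matters on Python 2, which this code targeted; since Python 3.3, IOError is an alias of OSError, so both handlers below catch the same exception class. The helper names are made up for the example.

```python
import os
import tempfile


def error_from_open(path):
    """Return the name of the exception raised by open(), if any."""
    try:
        open(path).close()
    except IOError as err:
        return type(err).__name__
    return None


def error_from_lstat(path):
    """Return the name of the exception raised by os.lstat(), if any."""
    try:
        os.lstat(path)
    except OSError as err:
        return type(err).__name__
    return None


def demo():
    """Probe a path that is guaranteed not to exist."""
    with tempfile.TemporaryDirectory() as root:
        missing = os.path.join(root, "missing")
        return error_from_open(missing), error_from_lstat(missing)
```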
"hg status" may incorrectly treat largefiles missing from the cache as
"removed".
Assumptions for the problem case:
- there is no cache for largefile "L"
- first, update the working directory to a revision in which "L" is
not yet added
- then, update the working directory to a revision in which "L" is
already added
Now "hg status" treats "L" as "removed".
The current implementation does not allocate an entry in
".hg/largefiles/dirstate" for a largefile missing from the cache, but
the largefiles extension treats files without a
".hg/largefiles/dirstate" entry as "removed".
"hg revert" cannot recover from this situation, but "rm -rf
.hg/largefiles" can, because it causes the dirstate to be rebuilt.
This patch invokes normallookup() for largefiles missing from the cache
to allocate an entry in ".hg/largefiles/dirstate", so "hg status"
correctly treats them as "missing".
When (1) findfile links a largefile from the user cache to the store
and (2) the store directory doesn't exist yet, findfile errors out. A
simple call to util.makedirs fixes it.
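The shape of the fix, sketched with the standard library only. os.makedirs stands in for util.makedirs here, and the function name and paths are hypothetical:

```python
import os
import tempfile


def link_into_store(cache_path, store_path):
    """Hard-link a cached file into the store, creating the directory first."""
    # Without this call, os.link fails when the store directory is missing.
    os.makedirs(os.path.dirname(store_path), exist_ok=True)
    os.link(cache_path, store_path)


def demo():
    """Link a file from a fake user cache into a store that does not exist yet."""
    with tempfile.TemporaryDirectory() as root:
        cache_path = os.path.join(root, "usercache", "abc123")
        os.makedirs(os.path.dirname(cache_path))
        with open(cache_path, "wb") as fp:
            fp.write(b"data")
        store_path = os.path.join(root, "repo", ".hg", "largefiles", "abc123")
        link_into_store(cache_path, store_path)
        with open(store_path, "rb") as fp:
            return fp.read()
```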
This is consistent with the rest of Mercurial's code, mirroring the
try-finally-unlink structure elsewhere. Furthermore, it fixes the case where
largefiles throws an IOError on Windows when the temporary file is opened a
second time by copytocacheabsolute.
This patch creates the temporary file in the repo's largefiles store rather than
/tmp, which might be a different filesystem.
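A minimal sketch of the pattern, assuming a hypothetical helper write_atomically: the temporary file is created next to its destination so the final rename stays on one filesystem, and a try/finally removes it if anything fails before the rename.

```python
import os
import tempfile


def write_atomically(dest_path, chunks):
    """Write chunks to a temp file beside dest_path, then rename it into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    # Same directory as the destination, so the rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as fp:
            for chunk in chunks:
                fp.write(chunk)
        os.replace(tmp_path, dest_path)
        tmp_path = None  # renamed successfully, nothing left to clean up
    finally:
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)


def demo():
    """Write a file into a fresh store directory and report what is left there."""
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "store", "abc123")
        write_atomically(dest, [b"ab", b"cd"])
        with open(dest, "rb") as fp:
            content = fp.read()
        return content, os.listdir(os.path.dirname(dest))
```

The file is closed before the rename, which also avoids having it open twice at the same time, the situation that caused the IOError on Windows.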
When largefiles is enabled, commands on large repositories which don't
require largefiles could be slowed down substantially. Disable
checking this for every command.
The code was using the size of a symlink's target, thus wrongly making symlinks
to large files into largefiles themselves. This can be demonstrated by
deleting the symlink and then doing an 'hg up' or 'hg up -C' to restore the
symlink.
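The difference between the two sizes is easy to reproduce. This sketch assumes a platform where unprivileged symlink creation works (for example Linux or macOS); the function name is made up for the example.

```python
import os
import tempfile


def sizes_for_symlink(target_size=1024 * 1024):
    """Return (size seen through the link, size of the link itself)."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "big.bin")
        link = os.path.join(root, "link")
        with open(target, "wb") as fp:
            fp.write(b"\0" * target_size)
        os.symlink(target, link)
        followed = os.stat(link).st_size  # follows the link: size of the target
        own = os.lstat(link).st_size      # does not follow: size of the link
        return followed, own
```

A size check based on os.stat sees the large target, while os.lstat reports only the small link, which is the value a largefile size threshold should be compared against.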
The original intent was that the largefiles would primarily be in the
repository, with the global cache being only that--a cache. The naming
conventions and actual intent have both strayed. In this first patch, the
naming conventions are switched to match the actual intent, as are the
configuration options.
overrides.py contains several functions that temporarily override
scmutil.match(), which always takes a changectx object as the first
parameter. But these overrides name that parameter either 'repo' or
'ctxorrepo', which is misleading. So rename them to 'ctx' and remove
the special type-sensitive handling of the one called 'ctxorrepo'.
This fixes a performance issue with 'hg status' when files are specified
on the command-line. Previously, a large amount of largefiles code was
executed, even if files were specified on the command-line and those files
were not largefiles. This patch fixes the problem by first checking whether
only non-largefiles were specified on the command-line and, if so, letting
the normal status function handle the case.
On a brand new machine, the execution time for 'hg status filename' on
a repository with largefiles was:
  real 0m0.636s
  user 0m0.512s
  sys 0m0.120s
versus the following (the same repository, with largefiles disabled):
  real 0m0.215s
  user 0m0.180s
  sys 0m0.032s
After this patch, the performance of 'hg status filename' on the same
repository, with largefiles enabled, is:
  real 0m0.228s
  user 0m0.189s
  sys 0m0.036s
This performance boost is also true when patterns (rather than specific
files) are specified on the command-line.
In the case where patterns are specified in addition to a file list, we
just defer to the normal codepath, so that we don't spend extra time
expanding the patterns only to risk having to expand them again later.
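The shortcut can be sketched as a simple dispatch. All names here (status, is_largefile, normal_status, largefile_status) are hypothetical stand-ins passed in as parameters; the real code works with Mercurial matchers rather than plain lists of paths.

```python
def status(paths, is_largefile, normal_status, largefile_status):
    """Use the cheap normal path when no requested file is a largefile."""
    if paths and not any(is_largefile(p) for p in paths):
        # Only normal files were named: skip the largefiles machinery.
        return normal_status(paths)
    # No files named, or at least one largefile: take the full path.
    return largefile_status(paths)


# Usage with toy callbacks that just record which path was taken.
largefiles = {"big.bin"}
fast = status(["README"], largefiles.__contains__,
              lambda p: ("normal", p), lambda p: ("largefiles", p))
slow = status(["big.bin", "README"], largefiles.__contains__,
              lambda p: ("normal", p), lambda p: ("largefiles", p))
```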
This was unnecessarily verbose: there is no need to unlink the file
when we open it for write anyway, and there is no need to check if the
file exists after we created it.
This is mainly about keeping code under the 80-column limit with as
few backslashes as possible. I am deliberately not making any logic or
behaviour changes here and have restricted myself to a few "peephole"
refactorings.
- tweak wording of some error messages
- use consistent capitalization
- always say 'largefile', not 'lfile'
- fix I18N problems
- only raise Abort for errors the user can do something about
- fix some ungrammatical/unclear/incorrect comments/docstrings
- rewrite some really unclear comments/docstrings
- make formatting/style more consistent with the rest of Mercurial
(lowercase without period unless it's really multiple sentences)
- wrap to 75 columns
- always say "largefile(s)", not "lfile(s)" (or "big files")
- one space between sentences, not two