Problem: 'hg status' with largefiles enabled would walk through all the files
that .hgignore said should be ignored. That made it slow if a lot of files were
.hgignored or the cache was cold.
It seems like there was a reason to this, but other improvements has rendered
this unnecessary.
Solution: .hgignore is now only ignored when that is requested (--ignore).
This is a minimal 'stable' change. There is room for other improvement.
Problem: 'hg status' kept checking largefiles with an unknown state until some
other command wrote the updated dirstate.
Solution: Add missing lfdirstate.write().
If we pass a directory to commit whose only commitable files
are largefiles, the core commit code aborts before finding
the largefiles.
So we do the following:
For directories that only have largefiles as matches,
we explicitly add the largefiles to the matchlist and remove
the directory.
In other cases, we leave the match list unmodified.
For some yet-unkown reason, largefile status does not work on
repoproxy. As status is not affected by filtering, we run it unfiltered.
Na'Tosha Bard's view on this issue:
"but, well, largefiles status is kind of an unholy piece of code"
Most method added (or overwritten) to repo by largefile works on `repo`
instead of `self`. This currently works without trouble because `self` and
`repo` are likely the same. However this is semantically dubious and this may
cause issue for filtering. `self` may be proxy object different from the `repo`
one.
This changeset fix that and use `self` when applicable.
The example in comment #9 of the bug writeup must be run exactly- it was the
commit after the rm and prior to the addremove that screwed things up, because
that commit noticed that the largefile was missing, called drop(), and then the
original commit function did nothing (due to the file in the '!' state). The
addremove command properly put it into the 'R' state, but it remained stuck in
that state (because commit insisted 'nothing changed'). Without the commit
prior to addremove, the problem didn't occur.
Maybe this is an indication that lfdirstate needs to take a few more hints from
the regular dirstate, regardless of what _it_ thinks the state is- similar
inconsistency is probably still possible with this patch if the original commit
succeeds but the lfdirstate write fails.
largefiles status implementation attemps to rewrite the input match objects to
match the "standins" as well as the regular files. When fixing the directories
listed in match.files(), if there was related standin entry, it was kept and
the original path discarded. But directories can appear both as regular and
standin entries.
original implementation queries whether specified pattern is related
or not to largefiles in target context by 'dirstate.__contains__()'.
but this can't recognize 'directory pattern' correctly, so this patch
uses 'dirstate.dirs()' for it.
this patch uses dirstate instead of lfdirstate in 'working' route
(second patch hunk for 'hgext/largefiles/reposetup.py'), because
'dirs()' information may be already built for dirstate but not yet for
lfdirstate at this point. this prevents lfdirstate from building up
and having 'dirs()' information.
original implementation queries whether specified pattern is related
or not to largefiles, to target context.
but changectx/workingctx returns False about relationship with files
marked as removed.
So, 'hg status' with 'file pattern' for removed file shows unexpected
warning message in below process:
1. 'tostandin()' returns non-STANDIN filename for removed file,
because changectx/workingctx returns False about relationship
with it
2. 'match.files()' contains non-STANDIN filename, which is already
removed from working directory
3. 'dirstate.walk()' invoked via 'localrepository.status()' treats
non-STANDIN filename as bad filename, because there is no entry
for it in dirstate: only STANDIN is managed in dirstate
4. 'dirstate.walk()' invokes 'match.bad()', which is defined in
'localrepository.status()' as 'bad()'
5. 'bad()' shows warning message for non-STANDIN, because it is
not related to source context: only STANDIN is related to it
this patch queries to dirstate instead of changectxt/workingctx,
because dirstate returns expected result for removed files.
'match.files()' is used by 'localrepository.status()' only in
'working' case, so this patched code also works correctly in
non-'working' case.
current 'lfiles_repo.status()' implementation examines whether
specified patterns are related to largefiles in working directory (not
to STANDIN) or not by NOT-EMPTY-NESS of below list:
[f for f in match.files() if f in lfdirstate]
but it can not be assumed that all in 'match.files()' are file itself
exactly, because user may only specify part of path to match whole
under subdirectories recursively.
above examination will mis-recognize such pattern as 'not related to
largefiles', and executes normal 'status()' procedure. so, 'hg status'
shows '?'(unknown) status for largefiles in working directory unexpectedly.
this patch examines relation of pattern to largefiles by applying
'match()' on each entries in lfdirstate and checking wheter there is
no matched entry.
it may increase cost of examination, because it causes of full scan of
entries in lfdirstate.
so this patch uses normal for-loop instead of list comprehensions, to
decrease cost when matching is found.
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
When rebasing, we need to trust that the standins are always correct. The
rebase operation updates the standins according to the changeset it is
rebasing. We need to make the largefiles in the working copy match. If we
don't make them match, then they get accidentally reverted, either during
the rebase or during the next commit after the rebase.
This worked previously only becuase we were relying on the behavior that
largefiles with a changed standin, but unchanged contents, never showed up in
the list of modified largefiles. Unfortunately, pre-commit hooks can get
an incorrect status this way, and it also results in extra execution of code.
The solution is to simply trust the standins when we are about to commit a
rebased changeset, and politely ask updatelfiles() to pull the new contents
down. In this case, updatelfiles() will also mark any files it has pulled
down as dirty in the lfdirstate so that pre-commit hooks will get correct
status output.
This fixes a performance issue with 'hg status' when files are specified
on the command-line. Previously, a large amount of largefiles code was
executed, even if files were specified on the command-line and those files
were not largefiles. This patch fixes the problem by first checking if
non-largefiles were specified on the command-line and, just letting the
normal status function handle the case if they were.
On a brand new machine, the execution time for 'hg status filename' on
a repository with largefiles was:
real 0m0.636s
user 0m0.512s
sys 0m0.120s
versus the following (the same repository, with largefiles disabled):
real 0m0.215s
user 0m0.180s
sys 0m0.032s
After this patch, the performance of 'hg status filename' on the same
repository, with largefiles enabled is:
real 0m0.228s
user 0m0.189s
sys 0m0.036s
This performance boost is also true when patterns (rather than specific
files) are specified on the command-line.
In the case where patterns are specified in addition to a file list, we
just defer to the normal codepath in order to not spend extra time
expanding the patterns to just risk having to expand them again later.
The largefiles extension prevents users from adding a normal file
named 'foo' if there is already a largefile with the same name.
However, there was a loop-hole: when merging, it was possible to bring
in a normal file named 'foo' while also having a '.hglf/foo' file.
This patch fixes this by extending the manifest merge to deal with
these kinds of conflicts. If there is a normal file 'foo' in the
working copy, and the other parent brings in a '.hglf/foo' file, then
the user will be prompted to keep the normal file or the largefile.
Likewise for the symmetric case where a normal file is brought in via
the second parent. The prompt looks like this:
$ hg merge
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
After the merge, either the '.hglf/foo' file or the 'foo' file will
have been deleted. This would cause status to return output like:
$ hg status
M foo
R foo
To fix this, the lfiles_repo.status method is changed so that a
removed normal file isn't shown if there is largefile with the same
name, and vice versa for largefiles.
If a largefile is introduced on the branch that is merged into the
working copy, then 'hg status' would abort with an error like:
$ hg status
abort: .hglf/foo@33fdd332ec: not found in manifest!
The problem was that the largefiles status code only looked in the
first parent for the largefile. Largefiles are now always reported as
modified if they don't exist in the first parent -- this matches the
behavior of localrepo.status for normal files.
When largefiles is enabled, commands on large repositories which don't
require largefiles could be slowed down substantially. Disable
checking this for every command.
The original intent was that the largefiles would primarily be in the
repository, with the global cache being only that--a cache. The naming
conventions and actual intent have both strayed. In this first patch, the
naming conventions are switched to match the actual intent, as are the
configuration options.
This is mainly about keeping code under the 80-column limit with as
few backslashes as possible. I am deliberately not making any logic or
behaviour changes here and have restrained myself to a few "peephole"
refactorings.
- tweak wording of some error messages
- use consistent capitalization
- always say 'largefile', not 'lfile'
- fix I18N problems
- only raise Abort for errors the user can do something about
- fix some ungrammatical/unclear/incorrect comments/docstrings
- rewrite some really unclear comments/docstrings
- make formatting/style more consistent with the rest of Mercurial
(lowercase without period unless it's really multiple sentences)
- wrap to 75 columns
- always say "largefile(s)", not "lfile(s)" (or "big files")
- one space between sentences, not two