Previously, the source was silently skipped because the largefile was in the
list of changed files, but the standin was in the copies dictionary. The source
is only displayed if the changed file is a key in the copies dictionary.
It's probably possible to refactor so that the 'if m._cwd' check isn't
necessary, but the False case is the typical case (i.e. run from the root of the
repo), and simpler to read.
An exact path to a largefile from outside the repo was previously ignored.
match.rel('.hglf') will handle figuring out both the correct '../' length to the
standin directory if inside the repo, or path/to/repo from outside, at the cost
of a pconvert() to keep the patterns using '/' on Windows.
When logging '.hglf/foo', the pattern list was being transformed from
['.hglf/foo'] into ['.hglf/foo', '.hglf/.hglf/foo']. Aside from the
pathological case of somebody getting a directory named '.hglf' created inside
the standing directory, the old code shouldn't have had any bad effects.
(amended by mpm to sort patterns for test stability and not upset check-code)
Adding the standin to the patterns list was (possibly) harmless before, but was
wrong, because the pattern list was already updated above that code. Now that
patterns are handled, it was actually harmful. For example, in this test:
$ hg log -G glob:**another*
the adjusted pattern list would have been:
['glob:**another*', '.hglf/.', 'glob:.hglf/**another*']
which causes every largefile in the root to be matched.
I'm not sure why 'glob:a*' picks up the rename of a -> b commit in test-log.t,
but a simple 'a' doesn't. But it doesn't appear to be caused by the largefiles
extension.
All current callers supply some sort of prefix, so the issue was hidden. But if
no parameter was specified, a crash occurred in the write() closure when
concatenating 'prefix' and 'name'.
When there isn't lfdirstate file in cases below, "openlfdirstate()"
call "scmutil.match()" indirectly to build lfdirstate up.
- subrepos disabling largefiles locally
- lfdirstate file is missed accidentally
This causes infinite recursive call of "openlfdirstate()" in
"overriderevert()" (introduced by 25febe9568dd), because
"openlfdirstate()" is invoked from the function overriding
"scmutil.match()" itself.
To avoid infinite recursive call of "openlfdirstate()" in
"overriderevert()" in such cases, this patch passes "create=False"
argument to "openlfdirstate()".
"create=False" forcibly makes "openlfdirstate()" avoid code path to
build lfdirstate up.
Even if largefiles extension is enabled in a repository, "repo"
object, which isn't "largefiles.reposetup()"-ed, is passed to
overridden functions in the cases below unexpectedly, because
extensions are enabled for each repositories strictly.
(1) clone without -U:
(2) pull with -U:
(3) pull with --rebase:
combination of "enabled@src", "disabled@dst" and
"not-required@src" cause this situation.
largefiles requirement
@src @dst @src result
-------- -------- --------------- --------------------
enabled disabled not-required aborted unexpectedly
required requirement error (intentional)
-------- -------- --------------- --------------------
enabled enabled * success
-------- -------- --------------- --------------------
disabled enabled * success (only for "pull")
-------- -------- --------------- --------------------
disabled disabled not-required success
required requirement error (intentional)
-------- -------- --------------- --------------------
(4) update/revert with a subrepo disabling largefiles
In these cases, overridden functions cause accessing to largefiles
specific fields of not "largefiles.reposetup()"-ed "repo" object, and
execution is aborted.
- (1), (2), (4) cause accessing to "_lfstatuswriters" in
"getstatuswriter()" invoked via "updatelfiles()"
- (3) causes accessing to "_lfcommithooks" in "overriderebase()"
For safe accessing to these fields, this patch examines whether passed
"repo" object is "largefiles.reposetup()"-ed or not before accessing
to them.
This patch chooses examining existence of newly introduced
"_largefilesenabled" instead of "_lfcommithooks" and
"_lfstatuswriters" directly, because the former is better name for the
generic "largefiles is enabled in this repo" mark than the latter.
In the future, all other overridden functions should avoid largefiles
specific processing for efficiency, and "_largefilesenabled" is better
also for such purpose.
BTW, "lfstatus" can't be used for such purpose, because some code
paths set it forcibly regardless of existence of it in specified
"repo" object.
The previous code was adding standin files to the matcher's file list when
neither the standin file nor the original existed in the context. Somehow, this
was confusing the logging code into behaving differently from when the extension
wasn't loaded.
It seems that this was an attempt to support naming a directory that only
contains largefiles, as a test fails if the else clause is dropped entirely.
Therefore, only append the "standin" if it is a directory. This was found by
running the test suite with --config extensions.largefiles=.
The first added test used to log an additional cset that wasn't logged normally.
The only relation it had to file 'a' is that 'a' was the source of a move, but
it isn't clear why having '.hglf/a' in the list causes this change:
@@ -47,6 +47,11 @@
Make sure largefiles doesn't interfere with logging a regular file
$ hg log a --config extensions.largefiles=
+ changeset: 3:2ca5ba701980
+ user: test
+ date: Thu Jan 01 00:00:04 1970 +0000
+ summary: d
+
changeset: 0:9161b9aeaf16
user: test
date: Thu Jan 01 00:00:01 1970 +0000
The second added test used to complain about a file not being in the parent
revision:
@@ -1638,10 +1643,8 @@
Ensure that largefiles doesn't intefere with following a normal file
$ hg --config extensions.largefiles= log -f d -T '{desc}' -G
- @ c
- |
- o a
-
+ abort: cannot follow file not in parent revision: ".hglf/d"
+ [255]
$ hg log -f d/a -T '{desc}' -G
@ c
|
Note that there is still something fishy with the largefiles code, because when
using a glob pattern like this:
$ hg log 'glob:sub/*'
the pattern list would contain '.hglf/glob:sub/*'. None of the tests show this
(this test lives in test-largefiles.t at 1349), it was just something that I
noticed when the code was loaded up with print statements.
This effectively reverts 9fc565fa1621, which caused some normal file copies to
not be displayed as copies. Other normal file copies could be displayed- the
exact reason isn't clear. This also adds two tests that were failing prior to
this backout, so that this can be sorted out next cycle.
The difference between copy cases that worked and those that didn't seemed to be
in copies.pathcopies(). When largefiles isn't enabled for the changed test, or
lfstatus is not set in the commands.status() override, 'y.ancestor(x) == x'.
That wasn't true otherwise, which fell through to the _chain() method. In this
case, the copy is removed in the criss cross loop.
'y.ancestor(x)' returns a context.changectx type, while 'x' is a lfilesctx type
in the failing case. I tried adding the ancestor method to the lfilesctx class
to change the type of the ancestor context, however the context when printed as
a string then gains a '+'. This points to it being a context.committablectx,
which clearly isn't correct for an ancestor. Possibly the problem is the
lfilesctx needs to subclass context.committablectx in some cases, but
context.changectx in others, within the same invocation? I'm not sure how to
pull that off, and backing out this change is safer during the freeze.
As to the status changing when a path is specified, I haven't looked into it
yet.
Previously, when a largefile is forgotten and then reverted, a warning was
issued:
$ hg revert -R subrepo subrepo/large.txt
file not managed: subrepo/large.txt (glob)
This was purely cosmetic as the file itself actually was reverted.
The problem was even with all of the matcher patching, the largefile pattern
given on the command line wasn't converted to a standin because the standin was
neither in ctx nor wctx. This causes the named largefile to be added to the
'names' dict in cmdutil.revert() in the repo walk at line 2550. The warning was
printed out when the 'names' dict is iterated, because the file was specified
exactly.
Since core revert recurses into subrepos and largefiles only overrides the
revert method in commands.py, it doesn't work properly when reverting a subrepo.
However, it still will recurse into the subrepo and call the installed matcher
method, so lfdirstate is reopened for the current repo level to prevent any new
problems.
When a directory is named in the commit file list, the previous behavior was to
walk the list, and if no normal files in the directory were also named, add the
corresponding standin for each largefile in that directory. The directory is
then dropped from the list, so that committing a directory with no normal file
changes works. It then added the corresponding standin directory for the first
largefile seen, by prefixing it with '.hglf/'.
The latter is unnecessary since each affected largefile is explicitly referenced
by its standin in the list. It also caused an abort if there were no changed
largefiles in the directory, because none of its standins changed:
abort: .hglf/foo/bar: no match under directory!
This list of files is used to tweak a matcher in lfutil.updatestandinsbymatch(),
which is what is passed to commit().
The status() call that is ultimately done in the commit code with this matcher
seems to have some OS specific differences. It is not necessary to append '.'
for Windows to run the largefiles tests cleanly. But if '.' is not added to the
list, the match function isn't called on Linux, so status() would miss any
normal files that were also in a named directory. The commit then proceeds
without those normal files, or says "nothing changed" if there were no changed
largefiles in the directory. This is not filesystem specific, as VFAT on Linux
had the same behavior as when run on ext4. It is also not an issue with
lfilesrepo.status(), since that only calls the overridden implementation when
paths are passed to commit. I dont have access to an OS X machine ATM to test
there.
Maybe there's a better way to do this. But since the standin directory for the
first largefile was previously being added, and that caused the same walk in
status(), there's no preformance change to this. There is no danger of
erroneously committing files in '.', because the original match function is
called, and if it fails, the lfutil.updatestandinsbymatch() tweaked matcher only
indicates a match if the file is in the list of standins- and '.' never is. The
added tests confirm this.
Standins are read before and after an update/merge, and all the standins that
changes are handed to updatelfiles for getting their corresponding largefiles
updated. updatelfiles would then hash the largefile and see if it already
matched the new expected hash. If so, it would skip the update. If different,
the largefile would be updated.
It would happen very rarely that the largefile happened to match the new hash
(and thus not the old one) and the hashing would thus be pointless ... and
hashing is not cheap.
Instead, when it is known that the standin hash changed (from an update), just
update the standin unconditionally. If the largefile was "unsure" before the
update, it was hashed at that point, so we know there is nothing to preserve.
(Also, the hashing in updatelfiles was not used to preserve changes, but only
to be lazy about updating the largefile, so nothing is lost by not doing this
extra hashing.)
There might be rare situations where we now will update largefiles that didn't
have to be updated, but in all relevant cases (?) this will improve
performance.
Updates on a repo with some big largefiles has been seen to go from 9.19 s to
6.8 s - that is 26% less painful.
The --large, --normal and --lfsize args couldn't be passed to a subrepo before,
and files in the subrepos would be added silently (if -v wasn't specified) as
normal files. As an added bonus, 'hg add --dry-run' no longer prints that
largefiles would also be added as normal files as well.
'hg update' would hash all 'unsure' largefiles before performing the merge. It
would update the standins but not detect the very common case where the
largefile never had been changed by the user but just had been marked with an
invalid dirstate mtime to make sure any changes done by the user in the same
second would be detected. The largefile would remain in that state and would
have to be hashed again next time even though it still not had been changed.
Sad trombone.
Instead, for largefiles listed as 'unsure' or 'modified', after updating the
standin with the actual hash, mark the largefile as normal if it turns out to
not be modified relative to the revision in the parent revision. That will
prevent it from being hashed again next time.
This will be used to exclude largefile candidates from the normal file matcher,
which will allow add and addremove dryruns to not print a file as both a normal
and a large file.
Core addremove prints the file relative to cwd only if patterns are provided to
the command. Core add always prints relative to cwd. Also, both methods print
the subrepo prefix when needed. The 'already a largefile' doesn't have an
analog in core, but follows the same rules for consistency.
Both cmdutil.remove() and scmutil.addremove() require verbose mode or an inexact
match to print the filename. Core addremove also prints the file relative to
cwd only if patterns are provided to the command. And finally, both methods
print the subrepo prefix when needed.
When cloning a repo that requires largefiles, the user had to either enable the
extension on the command line and then manually edit the local hgrc file after
the clone, or just enable it globally for the user. Since it is a feature of
last resort, and materially affects even repos without any largefiles when it is
enabled, we should make it easier to not have it enabled globally.
This simply adds the enabling statement to the local hgrc if the requires file
mandates its use (which only happens after the first largefile is committed).
That means that a user who works with a mix of largefile and normal repos can
always clone with '--config extensions.largefiles=', and the extension is
permanently enabled or not as appropriate.
The change in test-largefiles.t is simply because the order of loading rebase
and largefiles changed. The same change occurs if the order is flipped in the
hgrc file at the top of the test.
The destination is validated by pathutil.canonpath() for illegal components, and
that it is in the repository. The logic for creating the standin directory tree
was calling this before cmdutil.copy(), but without the destination file name
component. The cmdutil.copy() logic also calls pathutil.canonpath(), but with
the file name component. By always calling the core logic first, the error
message is always consistent. Specifically, the old behavior for these tests
was to say '.hg' contains an illegal component, and '..' is not under root.
A user wasn't likely to notice the discrepancy, but this eliminates a needless
difference when running the test suite with --config extensions.largefiles=.
This is consistent with addlargefiles(), and will make it easier to get the
paths that are printed correct when recursing into subrepos or invoking from
outside the repository. It also now restricts the path that the addremove is
performed on if a path is given, as is done with normal files.
The repo.status() call needs to exclude clean files when performing an
addremove, because the addremove override method calling this used to pass the
list of files to delete, which caused the matcher to only consider those files
in building the status list. Now the matcher is restricted only to the extent
that the caller requested- usually directories if at all. There's no reason for
addremove to care about clean files anyway- we don't want them deleted.
The more aggressive synchronization of lfdirstate that was backed out in
4fe80f20ab06 masked the problem where lfdirstate would hold an 'R' for a
largefile that was added and then removed without a commit between. We could
just conditionally call lfdirstate.drop() or lfdirstate.remove() here, but this
also properly updates lfdirstate if the standin doesn't exist for the file
somehow (i.e. call drop instead of remove).
Without this change, the precommit status in the commit command immediately
after the test change lists the removed (and never committed) largefile as 'R'.
It can also lead to situations where the status command reports the same, long
after the commit [1].
[1] http://www.selenic.com/pipermail/mercurial-devel/2015-January/065153.html
d20af8be6a14 introduced a significant performance regression: All largefiles
were marked 'normallookup' by linear (or noop) updates and had to be rehashed
by the next command.
The previous change introduced a different solution to the problem d20af8be6a14
solved and we can thus back it out again.
This is an alternative solution to the problem addressed by d20af8be6a14. This
implementation has the advantage that it doesn't mark clean largefiles as
normallookup. We can thus avoid repeated rehashing of all largefiles when
d20af8be6a14 is backed out.
This implementation use the existing 'lfmr' actions that 4f7f7352d9d0
introduced for handling another part of the same cases.
If an uncommitted and deleted file was forgotten, a warning would be emitted,
even though the operation was successful. See the previous patch for
'remove -A' for the exact circumstances, and details about the cause.
The bug report doesn't mention largefiles, but the given recipe doesn't fail
unless the largefiles extension is loaded. The problem only affected normal
files, whether or not any largefiles are committed, and only files that have
not been committed yet. (Files with an 'a' state are dropped from dirstate,
not marked removed.) Further, if the named normal file never existed, the
warning would be printed out twice.
The problem is that the core implementation of remove() calls repo.status(),
which eventually triggers a dirstate.walk(). When the file isn't seen in the
filesystem during the walk, the exception handling finds the file in
dirstate, so it doesn't complain. However, the largefiles implementation
called status() again with all of the original files (including the normal
ones, just dropped). This time, the exception handler doesn't find the file
in dirstate and does complain. This simply excludes the normal files from
the second repo.status() call, which the largefiles extension has no interest
is processing anyway.
This is a copy/paste (with the necessary tweaks) of the composenormalfilematcher
method currently on default, which does the inverse- this trims the normal files
out of the matcher. It will be used in the next patch.
This is in the way of passing a matcher to removelargefiles(). This method is
called in exactly two places- first in overrides.addremove() (but only if the
pattern list passed to it is not empty), and second in the commands.remove()
override. But since the latter calls commands.remove() first, which also does
this check, it isn't needed here.
The comment about status being buggy with a repo proxy seems to be that status
wasn't being redirected to testing the largefiles themselves- lfstatus was
always False, and so the original status method was being called instead of
doing the largefiles processing. This is because when the various largefile
command overrides call 'repo.lfstatus = True', __setattr__() in repoview is in
turn setting the value on the unfiltered repo, and the value in the filtered
repo remains False.
Explicitly looking at the attribute on the unfiltered repo keeps all views in
sync.
There is no --after for addremove, so the printing for addremove can be hoisted
out of the 'not after' check. The difference between the two remove messages
reflects the existing difference between core remove and core addremove styles
for printing the file.
There are still some pre-existing issues here. Core addremove only prints on
inexact matches or when verbose. But since the largefiles that are being
removed are passed to removelargefiles() as a pattern list, there is never an
inexact match, which would keep the largefiles from being printed at all unless
verbose is specified. Therefore, the output is a little more aggressive than
core. The addremove print style here is also inconsistent with core- it should
use matcher.uipath(f) instead of f. These can be fixed once a matcher is passed
in.
The function only adds the hash content of the file to the set to upload if the
file in the ctx is a standin. It is called by overrides.summaryremotehook(),
which is called in the summary method. The largefiles extension switches
'lfstatus' on in summary, so the standins shouldn't be visible when obtaining a
context there.
The reason this wasn't noticed before is that the 'lfstatus' attribute is only
being set on the unfiltered repo because of how repoview delegates attribute
assignment. Therefore any filtered view will return a context containing
standins, whether or not 'lfstatus' was set in the various overrides methods.
That will be fixed in the next patch. But without this change, the next patch
would have test failures for 'summary --large' stating there are no files to
upload.
When a directory was renamed and a new untracked file was added in the
new directory and the remote directory added a file by the same name
in the old directory, the local untracked file gets overwritten, as
demonstrated by the broken test case in test-rename-dir-merge.
Fix by checking for unknown files for 'dg' actions too. Since
_checkunknownfile() currently expects the same filename in both
contexts, we need to add a new parameter for the remote filename to
it.
This simplifies largefiles' overridecalculateupdates(), which no
longer has to do the conversion it started doing in 478d610ca1b0
(largefiles: rewrite merge code using dictionary with entry per file,
2014-12-09).
To keep this patch small, we'll leave the name 'actionbyfile' in
overrides.py. It will be renamed in the next patch.
The hack for method monkey patching on repoview has been ruled out as
fragile, so we are rolling it back. We'll expand the explanation in the next
changeset.
Changeset 5b64e22ecd8e introduced the examination of exec bit of
largefiles in "hg status --rev REV" case, but it doesn't avoid it on
the platform being unaware of exec-bit (e.g. on NTFS of Windows).
The current state of subrepo methods is to pass a 'ui' object to some methods,
which has the effect of overriding the subrepo configuration since it is the
root repo's 'ui' that is passed along as deep as there are subrepos. Other
subrepo method are *not* passed the root 'ui', and instead delegate to their
repo object's 'ui'. Even in the former case where the root 'ui' is available,
some methods are inconsistent in their use of both the root 'ui' and the local
repo's 'ui'. (Consider hg._incoming() uses the root 'ui' for path expansion
and some status messages, but also calls bundlerepo.getremotechanges(), which
eventually calls discovery.findcommonincoming(), which calls
setdiscovery.findcommonheads(), which calls status() on the local repo 'ui'.)
This inconsistency with respect to the configured output level is probably
always hidden, because --verbose, --debug and --quiet, along with their 'ui.xxx'
equivalents in the global and user level hgrc files are propagated from the
parent repo to the subrepo via 'baseui'. The 'ui.xxx' settings in the parent
repo hgrc file are not propagated, but that seems like an unusual thing to set
on a per repo config file. Any 'ui.xxx' options changed by --config are also
not propagated, because they are set on repo.ui by dispatch.py, not repo.baseui.
The goal here is to cleanup the subrepo methods by dropping the 'ui' parameter,
which in turn prevents mixing subtly different 'ui' instances on a given subrepo
level. Some methods use more than just the output level settings in 'ui' (add
for example ends up calling scmutil.checkportabilityalert() with both the root
and local repo's 'ui' at different points). This series just goes for the low
hanging fruit and switches methods that only use the output level.
If we really care about not letting a subrepo config override the root repo's
output level, we can propagate the verbose, debug and quiet settings to the
subrepo in the same way 'ui.commitsubrepos' is in hgsubrepo.__init__.
Archive only uses the 'ui' object to call its progress() method, and gitsubrepo
calls status().
By moving the cd/dc prompts out of calculateupdates(), we let
largefiles' overridecalculateupdates() so the unresolved values
(i.e. 'cd' or 'dc' rather than 'g', 'r', 'a' and missing). This allows
overridecalculateupdates() to ask the user whether to keep the normal
file or the largefile before the user gets the cd/dc prompt. Whichever
answer the user gives, we make overridecalculateupdates() replace 'cd'
or 'dc' action, saving the user one annoying (and less clear)
question.
The recursive addremove operation occurs completely before the first subrepo is
committed. Only hg subrepos support the addremove operation at the moment- svn
and git subrepos will warn and abort the commit.
Instead of iterating over 'g' action, first find the set of all files
that are largefiles in p1. Then iterate over these files. This
prepares for considering actions other than 'g'.
In overridecalculateupdates(), we currently only deal with conflicts
that result in a 'g' action for either the largefile or a standin. We
will soon want to deal cases with 'cd' and 'dc' actions here. It will
be easier to reason about such cases if we rewrite it using a dict
from filename to action.
A side-effect of this change is that the output can only have one
action per file (which should be a good change). Before this change,
when one of the tests in test-issue3084 received this input (the 'a'
in the input was a result of 'cd' conflict resolved in favor of the
modified file):
'g': [('.hglf/f', ('',), 'remote created')],
'a': [('f', None, 'prompt keep')],
and the user chose to keep the local largefile, it produced this
output:
'g': [('.hglf/f', ('',), 'remote created')],
'r': [('f', None, 'replaced by standin')],
'a': [('f', None, 'prompt keep')],
Although 'a' actions are processed after 'r' actions by
recordupdates(), it still worked because 'a' actions have no effect on
merges (only on updates). After this change, the output is:
'g': [('.hglf/f', ('',), 'remote created')],
'r': [('f', None, 'replaced by standin')],
Similarly, there are several tests in test-largefiles-update that get
inputs like:
'a': [('.hglf/large2', None, 'prompt keep')],
'g': [('large2', ('',), 'remote created')],
and when the user chooses to keep the local largefile, they produce
this output:
'a': [('.hglf/large2', None, 'prompt keep'),
('.hglf/large2', None, 'keep standin')],
'lfmr': [('large2', None, 'forget non-standin largefile')],
In this case, it was not a merge but an update, so the 'a' action does
have an effect. However, since dirstate.add() is idempotent, it still
has no obserable effect.
After this change, the output is:
'a': [('.hglf/large2', None, 'keep standin')],
'lfmr': [('large2', None, 'forget non-standin largefile')],
The items we put in 'newglist' are always the same as what we found in
actions['g'], so let's just put the same item into the list instead of
creating a new one.
The action lists returned from calculateupdates() (in merge.py) are
not required to be sorted. In fact, since they result from iteration
over the unordered manifest, they are unlikely to be sorted. Moreover,
some of the lists are appended to after they are returned from
manifestmerge(). The lists are instead sorted in
applyupdates(). Therefore, let's not sort the lists generated in
largefiles' overridecalculateupdates().
When merging and the remote has turned a normal file into a largefile
and the user chooses to keep the local largefile, we use the 'r'
action for the remote largefile standin. This is wrong, since that
file does not exist in the parent of the working copy. Use 'k', which
does nothing but debug logging, instead.
When merging and the remote has turned a largefile into a normal file
and the user chooses to keep the local largefile, we use the 'r'
action for the remote normal file. This is wrong, since that file does
not exist in the parent of the working copy. Use 'k', which does
nothing but debug logging, instead.
In c69fe5519c86 (largefiles: don't show largefile/normal prompts if
one side is unchanged, 2014-12-01), overridecalculateupdates() started
checking for false modify/delete conflicts in large files and their
standins. Then, in the very next changeset, 99b29d2bd5ed (merge:
before cd/dc prompt, check that changed side really changed,
2014-12-01), calculateupdates() itself started checking for false
modify/delete conflicts in all files. Since "large files and their
standins" is a subset of "all files", we can now drop the checks in
overridecalculateupdates().
In overridecalculateupdates(), 'g' (get) actions may be converted into
other actions. In most of these cases, it does not make sense to keep
the action's message. For example, 'remote created' does not make
sense for an 'r' (remove) action.
The message in the action is used for debugging and should not be the
same as the question presented to the user. Use a different variable
for the user message, so the 'msg' variable already in scope does not
get overwritten.
The fetch extension has been calling cmdutil.bailifchanged() since 70b2d52341c9,
so this is redundant. Add test coverage to prevent regression. It doesn't look
like there is any testing for fetch with largefiles.
Refactoring addremove to support subrepos will need the ability to keep passing
the same matcher and narrowing it, instead of monkey patching scmutil's matcher.
The method has been called from commands.py since 8d9ca2ac2fe8
(update: just merge unknown file collisions, 2012-02-09), so drop the
underscore prefix that suggests that it's private.
Before this patch, while "hg convert", largefiles avoids copying
largefiles in the working directory into the store area by combination
of setting "repo._isconverting" in "mercurialsink{before|after}" and
checking it in "copytostoreabsolute".
This avoiding is needed while "hg convert", because converting doesn't
update largefiles in the working directory.
But this implementation is not efficient, because:
- invocation in "markcommitted" can easily ensure updating
largefiles in the working directory
"markcommitted" is invoked only when new revision is committed via
"commit" of "localrepository" (= with files in the working
directory). On the other hand, "commitctx" may be invoked directly
for in-memory committing.
- committing without updating the working directory (e.g. "import
--bypass") also needs this kind of avoiding
For efficiency of this kind of avoiding, this patch does:
- move "copyalltostore" invocation into "markcommitted"
- remove meaningless procedures below:
- hooking "mercurialsink{before|after}" to (un)set "repo._isconverting"
- checking "repo._isconverting" in "copytostoreabsolute"
This patch invokes "copyalltostore" also in "_commitcontext", because
"_commitcontext" expects that largefiles in the working directory are
copied into store area after "commitctx". In this case, the working
directory is used as a kind of temporary area to write largefiles out,
even though converted revisions are committed via "commitctx" (without
updating normal files).
Putting "lambda *msg, **opts: None" (= avoid printing messages always)
into "_lfstatuswriters" while transplanting makes explicit passing
"printmessage = False" for "updatelfiles()" useless.
This patch also removes setting/unsetting "repo._istransplanting" in
"overridetransplant", because there is no code path referring it.
Before this patch, "hg transplant --continue" may record incorrect
standins, because largefiles extension always avoid updating standins
while transplanting, even though largefiles in the working directory
may be modified manually at the 1st commit of "hg transplant --continue".
But, on the other hand, updating standins should be avoided at
subsequent commits for efficiency reason.
To update standins only at the 1st commit of "hg transplant
--continue", this patch uses "automatedcommithook", which updates
standins by "lfutil.updatestandinsbymatch()" only at the 1st commit of
resuming.
Even after this patch, "repo._istransplanting = True" is still needed
to avoid some status report while updating largefiles in
"lfcommands.updatelfiles()".
This is reason why this patch omits not "repo._istransplanting = True"
in "overriderebase" but examination of "getattr(repo,
"_istransplanting", False)" in "updatestandinsbymatch".
At "hg transplant --merge REV", largefiles newly coming from the 2nd
parent (= REV) are marked as "a"(dded) by "patch.patch()", and have to
be marked as "n"(ormal) after commit.
But until changeset 978713c45992, such largefiles were still marked as
"a" unexpectedly even after commit, because no additional entry is
added to filelog of such largefiles and they aren't listed in
"repo[newnode].files()" in this case: "newnode" is one of newly
committed changeset (= result of "repo.commit()").
"updatelfiles" invocation in "overridetransplant" shadows this problem
by forcibly synchronizing lfdirstate to dirstate.
Now, "updatelfiles" invocation in "overridetransplant" is redundant,
because changeset 978713c45992 made "markcommitted" use "ctx.files()"
to get targets of "synclfdirstate" instead of "repo[newnode].files()".
Putting "lambda *msg, **opts: None" (= avoid printing messages always)
into "_lfstatuswriters" while rebasing makes explicit passing
"printmessage = False" for "updatelfiles()" useless.
This patch also removes setting/unsetting "repo._isrebasing" in
"overriderebase", because there is no code path referring it.
This patch makes "updatelfiles()" get appropriate function to write
largefiles specific status messages via "getstatuswriter()".
This patch introduces None as "print messages if needed", because True
(forcibly writing) and False (forcibly ignoring) are already used for
"printmessage" of "updatelfiles".
Subsequent patch will move "avoid printing messages only while
automated committing" decision from caller of "updatelfiles()" into
"getstatuswriter()".
"lfutil.getstatuswriter" is the utility to get appropriate function to
write largefiles specific status out from "repo._lfstatuswriters".
This patch uses "stack" with an element instead of flag like
"_isXXXXing" or so, because:
- the former works correctly even when customizations are nested, and
- ensuring at least one element can ignore empty check
Before this patch, "hg rebase --continue" may record incorrect
standins, because largefiles extension always avoid updating standins
while rebasing, even though largefiles in the working directory may be
modified manually at the 1st commit of "hg rebase --continue".
But, on the other hand, updating standins should be avoided at
subsequent commits for efficiency reason.
To update standins only at the 1st commit of "hg rebase --continue",
this patch introduces state-full callable object
"automatedcommithook", which updates standins by
"lfutil.updatestandinsbymatch()" only at the 1st commit of resuming.
Even after this patch, "repo._isrebasing = True" is still needed to
avoid some status report while updating largefiles in
"lfcommands.updatelfiles()".
This is reason why this patch omits not "repo._isrebasing = True" in
"overriderebase" but examination of "getattr(repo, "_isrebasing",
False)" in "updatestandinsbymatch".
This changes allows to customize pre-committing procedures according
to conditions.
This patch uses "stack" with an element instead of flag like
"_isXXXXing" or so, because:
- the former works correctly even when customizations are nested, and
- ensuring at least one element can ignore empty check
This patch factors out procedures to update standins for
pre-committing. This is one of preparations to avoid execution of such
procedures according to invocation context.
For example, resuming automated committing (e.g. "hg rebase
--continue") should update standins at the 1st commit, because
largefiles in the working directory may be modified manually. But on
the other hand, it should avoid updating standins at subsequent
committings for efficiency reason.
For simplicity, this patch just moves procedures mechanically only
with replacing below.
- "self" => "repo"
- "lfutil." => (none)
- "orig" invocation => returning "match"
Using "fstandin" instead "standin" as the name of local variable for
the loop below is the only special care, because the latter shadows
the same name function in "lfutil.py".
[before]
for standin in standins:
lfile = lfutil.splitstandin(standin)
if lfdirstate[lfile] != 'r':
lfutil.updatestandin(self, standin)
[after]
for fstandin in standins:
lfile = splitstandin(fstandin)
if lfdirstate[lfile] != 'r':
updatestandin(repo, fstandin)
Before this patch, procedures to update lfdirstate for post-committing
are scattered in "lfilesrepo.commit". In the case of "hg commit" with
patterns for target files ("Case 2"), lfdirstate is updated BEFORE
real committing.
This patch factors out procedures to update lfdirstate for
post-committing into "lfutil.markcommitted", and makes it callable via
"markcommitted" of the context passed to "lfilesrepo.commitctx".
"markcommitted" of the context is called, only when it is committed
successfully.
Passing original "markcommitted" of the context is meaningless in this
patch, but required in subsequent one to prepare something before
invocation of it.
This patch removes "--rebase" specific code path for "hg pull" in
"overridepull", because previous patch makes it meaningless: now,
"rebase.rebase" ("orig" invocation in this patch) can
update/commit largefiles safely without "repo._isrebasing = True".
As a side effect of removing "rebase.rebase" invocation in
"overridepull", this patch removes "nothing to rebase ..." message in
"test-largefiles.t", which is shown only when rebase extension is
enabled AFTER largefiles:
before this patch:
1. "dispatch" invokes "pullrebase" of rebase as "hg pull" at
first, because rebase wraps "hg pull" later
2. "pullrebase" invokes "overridepull" of largefiles as "orig",
even though rebase assumes that "orig" is "pull" of commands
3. "overridepull" executes "pull" and "rebase" directly
3.1 "pull" pulls changesets and creates new head "X"
3.2 "rebase" rebases current working parent "Y" on "X"
4. "overridepull" returns to "pullrebase"
5. "pullrebase" tries to rebase, but there is nothing to be done,
because "Y" is already rebased on "X". then, it shows "nothing
to rebase ..."
after this patch:
1. "dispatch" invokes "pullrebase" of rebase as "hg pull"
2. "pullrebase" invokes "overridepull" of largefiles as "orig"
3. "overridepull" executes "pull" as "orig"
4. "overridepull" returns to "pullrebase"
5. revision "Y" is not yet rebased, so "pullrebase" doesn't shows
"nothing to rebase ..."
As another side effect of removing "rebase.rebase" invocation, this
patch fixes issue3861, which occurs only when rebase extension is
enabled BEFORE largefiles:
before this patch:
1. "dispatch" invokes "overridepull" of largefiles at first,
because largefiles wrap "hg pull" later
2. "overridepull" executes "pull" and "rebase" explicitly
2.1 "pull" pulls changesets and creates new head "X"
2.2 "rebase" rebases current working parent, but fails because
no revision is checked out in issue3861 case
3. "overridepull" returns to "dispatch" with exit code 1 returned
from "rebase" at (2.2)
4. "hg pull" terminates with exit code 1 unexpectedly
after this patch:
1. "dispatch" invokes "overridepull" of largefiles at first
2. "overridepull" invokes "pullrebase" of rebase as "orig"
3. "pullrebase" invokes "pull" as "orig"
4. "pullrebase" invokes "rebase", and it fails
5. "pullrebase" returns to "overridepull" with exit code 0
(because "pullrebase" ignores result of "pull" and "rebase")
6. "overridepull" returns to "dispatch" with exit code 0 returned
from "rebase" at (5)
7. "hg pull" terminates with exit code 0
Before this patch, largefiles extension wraps only "rebase" in the
command table by "extensions.wrapcommand". But there are some
functions using "rebase.rebase" directly.
Without special care for them, largefiles extension can't work
correctly with such functions. In addition to it, "special care" often
becomes complicated and awkward. For example:
- "unshelve" can't get correct result of "rebase.rebase", because of
lack of special care
- special care for "hg pull --rebase" causes issue3861
This patch wraps "rebase.rebase" for functions using it directly.
For simplicity, this patch keeps 'special care for "hg pull --rebase"'.
It is removed in the subsequent patch.
Instead of checking for a partial merge by checking that the matches
has no files and no patterns, check that it's not an
always-matcher. Except for being shorter, it also catches the rare
case of an exact-matcher with no files.
We currently shortcircuit the checking for large file standins if only
patterns of type 'path' are given on the command line. That makes e.g.
"hg st 'glob:foo/**'" unnecessarily slow when the only large files are
in a sibling directory.
Relax the check to be that it is not an always-matcher and that no
large files match the patterns given on the command line.
Note that before this change, only the latter of the following two
would show the status of files in .hglf (since the -I makes
match.anypats() true). After this change, they both display the
status. This behavior doesn't seem correct, but it would be a separate
change to explicitly filter out .hglf even in the shortcircuit case.
hg st .hglf/$file
hg st .hglf/$file -I .
In two very similar segments of code, an existing matcher is modified
by changing its _files attribute through a map and a filter
operation. Neither operation can cause an empty list to become
non-empty, so a matcher that always matches can not stop always
matching. Drop the setting of the attribute, so we don't unnecessarily
prevent the fast paths to be taken where these matchers end up being
used.
Before this patch, "hg status --rev REV" doesn't list largefiles up
with "M" mark, even if exec bit of them is changed, because
"lfilesrepo.status" doesn't examine exec bit in such case.
Before this patch, "hg status --rev REV" listed largefiles removed in
the working directory up with "R" mark, even if they aren't managed in
the REV. Normal files aren't listed up in such case.
When "lfilesrepo.status" is invoked for "hg status --rev REV", it
treats files on conditions below as "removed" (to avoid manifest full
scan in "ctx.status" ?):
- marked as "R" in lfdirstate, or
- files managed in the target revision but unknown in the manifest
of the working context (= not including "R" files)
But the former can include files not managed in the target context.
To ignore removal status of files not managed in the target context,
this patch drops files unknown in the target revision from "removed"
list.
In lfdirstatestatus(), the status tuple gets deconstructed, the lists
get updated, and then an identical status tuple gets created and
returned. Change it so we simply return the original tuple.
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
Instead of iterating over all files in the context and ignoring those
that are not standins, pass a standin-matcher to the context and
iterate over only the files matching.
Apart from making the intent clearer, this implementation will also
benefit from any future optimizations done to the manifest walking
code.
The variable 'lfiles' is first used for a set of the names of all the
large files. It is then overwritten with a tuple like the ones
returned from status(). To reduce confusion, let's create a separate
variable for the second use.
At the end of lfilesrepo.status(), we clear the lists of unknown,
ignored and clean files, depending on the values of 'listunknown'
etc. The lists originate from other calls to status(), and it is only
'clean' that may get updated after the calls. Let's remove the need to
clear any of the lists by explicitly only adding to 'clean' when
'listclean' is true.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
After previous patches, largefiles in the working directory are
ensured to be updated before "repo.commit" invocation for automated
committing below:
- by "overrides.mergeupdate" via "merge.update" for rebase
- by "overrides.scmutilmarktouched" via "patch.patch" for transplant
This patch removes redundant "lfcommands.updatelfiles" invocation in
"Case 0" code path of "lfilesrepo.commit" for automated committing,
and revises detailed comment.
Before this patch, largefiles in the working directory aren't updated
correctly, if transplant is aborted by conflict. This prevents users
from viewing appropriate largefiles while resolving conflicts.
While transplant, largefiles in the working directory are updated only
at successful committing in the special code path of
"lfilesrepo.commit()".
To update largefiles even if transplant is aborted by conflict, this
patch wraps "scmutil.marktouched", which is invoked from "patch.patch"
with "files" list of added/modified/deleted files.
This patch invokes "updatelfiles" with:
- "printmessage=False", to suppress "getting changed largefiles ..."
messages while automated committing by transplant
- "normallookup=True", because "patch.patch" doesn't update dirstate
for modified files
in such case, "normallookup=False" may cause marking modified
largefiles as "clean" unexpectedly
Before this patch, largefiles in the working directory aren't updated
correctly, if rebase is aborted by conflict. This prevents users from
viewing appropriate largefiles while resolving conflicts.
While rebase, largefiles in the working directory are updated only at
successful committing in the special code path of
"lfilesrepo.commit()".
To update largefiles even if rebase is aborted by conflict, this patch
centralizes the logic of updating largefiles in the working directory
into the "mergeupdate" wrapping "merge.update".
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, largefiles in the
working directory should be updated with other (normal) files
simultaneously while "merge.update" execution: maybe by hooking
"applyupdates".
"Action list based updating" introduced by hooking "applyupdates" will
also improve performance of updating, because it automatically
decreases target files to be checked.
Just after this patch, there are some improper things in "Case 0" code
path of "lfilesrepo.commit()":
- "updatelfiles" invocation is redundant for rebase
- detailed comment doesn't meet to rebase behavior
These will be resolved after the subsequent patch for transplant,
because this code path is shared with transplant.
Even though replacing "merge.update" in rebase extension by "hg.merge"
can also avoid this problem, this patch chooses centralizing the logic
into "mergeupdate", because:
- "merge.update" invocation in rebase extension can't be directly
replaced by "hg.merge", because:
- rebase requires some extra arguments, which "hg.merge" doesn't
take (e.g. "ancestor")
- rebase doesn't require statistics information forcibly displayed
in "hg.merge"
- introducing "mergeupdate" can resolve also problem of some other
code paths directly using "merge.update"
largefiles in the working directory aren't updated regardless of
the result of commands below, before this patch:
- backout (for revisions other than the parent revision of the
working directory without "--merge")
- graft
- histedit (for revisions other than the parent of the working
directory
When "partial" is specified, "merge.update" doesn't update dirstate
entries for standins, even though standins themselves are updated.
In this case, "normallookup" should be used to mark largefiles as
"possibly dirty" forcibly, because applying "normal" on lfdirstate
treats them as "clean" unexpectedly.
This is reason why "normallookup=partial" is specified for
"lfcommands.updatelfiles".
This patch doesn't test "hg rebase --continue", because it doesn't
work correctly if largefiles in the working directory are modified
manually while resolving conflicts. This will be fixed in the next
step of refactoring for largefiles.
All changes of tests/*.t files other than test-largefiles-update.t in
this patch come from invoking "updatelfiles" not after but before
statistics output of "hg.update", "hg.clean" and "hg.merge".
Code paths below expect "hg.updaterepo" (or "hg.update" using it) to
execute linear merging:
- "update" in commands
- "postincoming" in commands, used for:
- "hg pull --update"
- "hg unbundle --update"
- "hgsubrepo.get" in subrepo
For linear merging with largefiles, standins should be updated
according to (possibly dirty) largefiles before "merge.update"
invocation to detect conflicts correctly.
Before this patch, only the "update" command can execute linear merging
correctly, because largefiles extension takes care of only it.
This patch moves "updatestandin" invocation from "overrideupdate" ("hg
update" wrapper) to "_hgupdaterepo" ("hg.updaterepo" wrapper) to
execute linear merging in "hg.updaterepo" correctly.
This is also a preparation to centralize the logic of updating
largefiles in the working directory into the function wrapping
"merge.update" in the subsequent patch.
Before this patch, standinds not known to the restored dirstate at
rollback still exist after rollback of the parent of the working
directory, and they become orphans unexpectedly.
This patch unlinks standins not known to the restored dirstate.
This patch saves names of standins matched against not
"repo.dirstate[f] == 'a'" but "repo.dirstate[f] != 'r'" before
rollback, because branch merging marks files newly added to
dirstate as not "a" but "n".
Such standins will also become orphan after rollback, because they are
not known to the restored dirstate.
Before this patch, standins are restored from the NEW parent of the
working directory at "hg rollback", and this causes:
- standins removed in the rollback-ed revision are restored, and
become orphan, because they are already marked as "R" in the
restored dirstate and expected to be unlinked
- standins added in the rollback-ed revision are left as they were
before rollback, because they are not included in the new parent
(this may not be so serious)
This patch replaces the "merge.update" invocation with a specific
implementation to restore standins according to restored dirstate.
This is also the preparation to centralize the logic of updating
largefiles into the function wrapping "merge.update" in the subsequent
patch.
After that patch, "merge.update" will also update largefiles in the
working directory and be redundant for restoring standins only.
Before this patch, "hg rollback" can't restore standins correclty, if:
- old parent of the working directory is rollback-ed, and
- new parent of the working directory is not branch-tip
"overriderollback" uses "merge.update" as a kind of "revert" utility
to restore only standins with "node=None", and this makes
"merge.update" choose "branch-tip" revision as the updating target
unexpectedly.
Then, "merge.update" restores standins from the branch-tip revision
regardless of the parent of the working directory after rollback and
this may cause unexpected behavior.
This patch invokes "merge.update" with "node='.'" to restore standins
from the parent revision of the working directory.
In fact, this "merge.update" invocation will be replaced in the
subsequent patch to fix another problem, but this change is usefull to
inform reason why such complicated case should be tested.
For efficiency, this patch omits restoring standins and updating
lfdirstate, if the parent of the working directory is not rollbacked.
This patch adds the test not to confirm whether restoring is skipped
or not, but to detect unexpected regression in the future: it is
difficult to distinguish between skipping and perfectly restoring.
Before this patch, linear merging of modified largefiles causes
an unexpected result, if (1) largefile collides with same-name normal one
in the target revision and (2) "local" largefile is chosen, even
though branch merging between such revisions works correctly.
Expected result of such linear merging is marking the largefile as
(re-)"added", but the actual result is marking it as "modified".
The standin of modified "local largefile" is not changed by linear
merging, and updating/merging update lfdirstate entries only for
largefiles of which standins are changed.
This patch adds the code path to update lfdirstate only for largefiles
of which standins are not changed.
In this case, "synclfdirstate" should be invoked with True as
"normallookup" argument always to force using "normallookup" on
dirstate for "n" files, because "normal" may mark target files as
"clean" unexpectedly.
To reduce cost of "lfile not in filelist", this patch converts
"filelist" to a "set" object: "filelist" is used only in (1) the newly
added code path and (2) the next line of "filelist = set(filelist)".
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, "lfdirstate" should
be updated with "dirstate" simultaneously during "merge.update"
execution: maybe by hooking "recordupdates" (+ total refactoring
around lfdirstate handling)
Before this patch, linear merging of modified or newly added largefile
causes unexpected result, if (1) largefile collides with same name
normal one in the target revision and (2) "local" largefile is chosen,
even though branch merging between such revisions doesn't.
Expected result of such linear merging is:
(1) (not yet recorded) largefile is kept in the working directory
(2) largefile is marked as (re-)"added"
(3) colliding normal file is marked as "removed"
But actual result is:
(1) largefile in the working directory is unlinked
(2) largefile is marked as "normal" (so treated as "missing")
(3) the dirstate entry for colliding normal file is just dropped
(1) is very serious, because there is no way to restore temporarily
modified largefiles.
(3) prevents the next commit from adding the manifest with correct
"removal of (normal) file" information for newly created changeset.
The root cause of this problem is putting "lfile" into "actions['r']"
in linear-merging case. At liner merging, "actions['r']" causes:
- unlinking "target file" in the working directory, but "lfile" as
"target file" is also largefile itself in this case
- dropping the dirstate entry for target file
"actions['f']" (= "forget") does only the latter, and this is reason
why this patch doesn't choose putting "lfile" into it instead of
"actions['r']".
This patch newly introduces action "lfmr" (LargeFiles: Mark as
Removed) to mark colliding normal file as "removed" without unlinking
it.
This patch uses "hg debugdirstate" instead of "hg status" in test,
because:
- choosing "local largefile" hides "removed" status of "remote
normal file" in "hg status" output, and
- "hg status" for "large2" in this case has another problem fixed in
the subsequent patch
Before this patch, there are two distinct "wlock" scopes below in
"hgmerge":
1. "merge.update" via original "hg.merge" function
2. "updatelfiles" specific "wlock" scope (to synchronize largefile
dirstate)
But these should be executed in the same "wlock" scope for
consistency, because users of "hg.merge" don't get "wlock" explicitly
before invocation of it.
- merge in commands
This patch puts almost all of the original "hgmerge" implementation into
"_hgmerge" to reduce changes.
Before this patch, there are two distinct "wlock" scopes below in
"hgupdaterepo":
1. "merge.update" via original "hg.updaterepo" function
2. "updatelfiles" specific "wlock" scope (to synchronize largefile
dirstate)
In addition to them, "dirstate.walk" is executed between these "wlock"
scopes.
But these should be executed in the same "wlock" scope for
consistency, because many (indirect) users of "hg.updaterepo" don't
get "wlock" explicitly before invocation of it.
"hg.clean" is invoked without "wlock" from:
- mqrepo.restore in mq
- bisect in commands
- update in commands
"hg.update" is invoked without "wlock" from:
- clone in mq
- pullrebase in rebase
- postincoming in commands (used in "hg pull -u", "hg unbundle")
- update in commands
This patch puts almost all original "hgupdaterepo" implementation into
"_hgupdaterepo" to reduce changes.
This makes hg log --follow --patch work, since in cmdutil._makelogrevset we
use the non-follow matcher for hg log --follow --patch with no file arguments.
This has actually been broken since at least Mercurial 2.8 -- hg log --patch
with largefiles only used to work when no largefiles existed. Rev 658ce4a0a0a9
exposed this bug for all cases.
lfstatus should only be True for operations where we want standins to be
printed out. We explicitly do not want that for historical operations like log.
Other historical operations like hg diff -r A -r B don't print out standins
either.
This is required to fix issue4334, but doesn't fix anything by itself. That's
why there aren't any tests accompanying this patch.
Before this patch, after successful "hg rebase" of the revision
removing largefiles, "hg status" may still show ""R" for such
largefiles unexpectedly.
"lfilesrepo.commit" executes the special code path for automated
committing while rebase/transplant, and lfdirstate entries for removed
files aren't updated in this code path, even after successful
committing.
Then, "R" entries still existing in lfdirstate cause unexpected "hg
status" output.
This patch synchronizes lfdirstate with dirstate after automated
committing.
This patch passes False as "normallookup" to "synclfdirstate", because
modified files in "files()" of the recent (= just committed) context
should be "normal"-ed.
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be updated with dirstate simultaneously. Hooking "markcommitted" of
ctx in "localrepository.commitctx" may achieve this.
This problem occurs, only when (1) the parent of the working directory
is rebased and (2) it removes largefiles, because:
- if the parent of the working directory isn't rebased, returning to
the initial revision (= update) after rebase hides this problem
- files added on "other" branch (= rebase target) are treated not as
"added" but as "modified" (= "normal" status and "unset"
timestamp) at merging
This patch tests also the status of added largefile, but it is only
for avoiding regression.
In addition to conditions above, "hg status" must not take existing
files to reproduce this problem, because existing files make
"match._files" not empty in "lfilesrepo.status" code path below:
def sfindirstate(f):
sf = lfutil.standin(f)
dirstate = self.dirstate
return sf in dirstate or sf in dirstate.dirs()
match._files = [f for f in match._files
if sfindirstate(f)]
Not empty "match._files" prevents "status" on lfdirstate from
returning the result containing problematic "R" files.
This is reason why "large1" (removed) and "largeX" (added) are checked
separately in this patch.
Problematic code path in "lfilesrepo.commit" is used also by "hg
transplant", but this problem doesn't occur at "hg transplant",
because invocation of "updatelfiles" after transplant-ing in
"overridetransplant" causes cleaning lfdirstate up.
This patch tests also "hg transplant" as same as "hg rebase", but it
is only for avoiding regression.
Before this patch, newly added (but not yet committed) largefiles
aren't treated as unknown ("?") after "hg rollback".
After "hg rollback", lfdirstate still contains "A" status entries for
such largefiles, even though corresponding entries for standins are
already dropped from dirstate.
Such "orphan" entries in lfdirstate prevent unknown (large)files in
the working directory from being listed up in "unknown" list. The code
path in "if working" route of "lfilesrepo.status" below drops
largefiles tracked in lfdirstate from "unknown" list:
lfiles = set(lfdirstate._map)
# Unknown files
result[4] = set(result[4]).difference(lfiles)
This patch drops orphan entries from lfdristate at "hg rollback".
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be rollback-ed as a part of transaction, as same as dirstate.
Before this patch, removed or forgotten largefiles aren't treated as
removed ("R") after "hg rollback". Removed ones are treated as missing
("!") and forgotten ones are treated as clean ("C") unexpectedly.
"overriderollback" uses "normallookup" to restore status in lfdirstate
for largefiles other than ones not added in rollback-ed revision, but
this isn't correct for removed (or forgotten) largefiles.
This patch uses "lfutil.synclfdirstate" to restore "R" status of
removed (or forgotten) largefiles correctly at "hg rollback".
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be rollback-ed as a part of transaction, as same as dirstate.
Before this patch, there are three distinct "wlock" scopes in
"overriderollback":
1. "localrepository.rollback" via original "rollback" command,
2. "merge.update" for reverting standin files only, and
3. "overriderollback" specific "wlock" scope (to synchronize
largefile dirstate)
But these should be executed in the same "wlock" scope for
consistency.
Before this patch, largefiles gotten from revisions other than the
parent of the working directory at "hg revert" become "clean"
unexpectedly in steps below:
1. "repo.status()" is invoked (for status check before reverting)
1-1 "dirstate" entry for standinfile SF is "normal"-ed
(1-2 "lfdirstate" entry of largefile LF (for SF) is "normal"-ed)
2. "cmdutil.revert()" is invoked
2-1 standinfile SF is updated in the working directory
2-2 "dirstate" entry for SF is NOT updated
3. "lfcommands.updatelfiles()" is invoked (by "overrides.overriderevert()")
3-1 largefile LF (for SF) is updated in the working directory
3-2 "dirstate" returns "n" and valid timestamp for SF (by 1-1 and 2-2)
3-3 "lfdirstate" entry for LF is "normal"-ed
3-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file (by 3-3)
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hs status" treats LF as "clean", even though LF is updated by
"other" revision (by 3-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 3-3 and 3-4) as "clean".
When largefiles are reverted, they should be "normallookup"-ed
forcibly.
This patch uses "normallookup" on "lfdirstate" while reverting, by
passing "True" to newly added argument "normallookup".
Forcible "normallookup"-ing is not so expensive, because list of
target largefiles is explicitly specified in this case.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate" (for ASSUMPTION at 3-4)
Before this patch, largefiles gotten from "other" revision (with
conflict) at "hg merge" become "clean" unexpectedly in steps below:
1. "repo.status()" is invoked (for status check before merging)
1-1 "dirstate" entry for standinfile SF is "normal"-ed
1-2 "lfdirstate" entry of largefile LF (for SF) is "normal"-ed
2. "merge.update()" is invoked
2-1 SF is updated in the working directory
(ASSUMPTION: user choice "other" at conflict)
2-2 "dirstate" entry for SF is "merge"-ed
3. "lfcommands.updatelfiles()" is invoked (by "overrides.hgmerge()")
3-1 largefile LF (for SF) is updated in the working directory
3-2 "dirstate" returns "m" for SF (by 2-2)
3-3 "lfdirstate" entry for LF is left as it is
3-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file (by 1-2)
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hs status" treats LF as "clean", even though LF is updated by
"other" revision (by 3-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 1-2 and 3-4) as "clean".
When state of standinfile in "dirstate" is "m", largefile should be
"normallookup"-ed.
This patch invokes "normallookup" on "lfdirstate" for merged files.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate". (for ASSUMPTION at 3-4)
Before this patch, largefiles gotten from "other" revision (without
conflict) at "hg merge" become "clean" unexpectedly in steps below:
1. "merge.update()" is invoked
1-1 standinfile SF is updated in the working directory
1-2 "dirstate" entry for SF is "normallookup"-ed
2. "lfcommands.updatelfiles()" is invoked (by "overrides.hgmerge()")
2-1 largefile LF (for SF) is updated in the working directory
2-2 "dirstate" returns "n" for SF (by 1-2)
2-3 "lfdirstate" entry for LF is "normal"-ed
2-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hs status" treats LF as "clean", even though LF is updated by
"other" revision (by 2-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 2-3 and 2-4) as "clean".
When timestamp is not set (= negative value) for standinfile in
"dirstate", largefile should be "normallookup"-ed regardless of
rebasing or not, because "n" state in "dirstate" doesn't ensure
"clean"-ness of a standinfile at that time.
This patch uses "normallookup" instead of "normal", if "mtime" of
standin is unset
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, "lfdirstate" should
be updated with "dirstate" simultaneously while "merge.update"
execution: maybe by hooking "recordupdates"
It is also why this patch (temporarily) uses internal field "_map" of
"dirstate" directly.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate". (for ASSUMPTION at 2-4)
This patch newly adds "test-largefiles-update.t", to avoid increasing
cost to run other tests for largefiles by subsequent patches
(especially, "[debug] dirstate.delaywrite" causes so).
Previously, the directory '.hg/largefiles' would always be created if it didn't
exist when the lfdirstate was opened. If there were no standin files, no
dirstate file would be created in the directory. The end result was that
enabling the largefiles extension globally, but not explicitly adding a
largefile would result in the repository eventually sprouting this directory.
Creation of this directory effectively changes readonly operations like summary
and status into operations that require write access. Without write access,
commands that would succeed without the extension loaded would abort with a
surprising error when the extension is loaded, but not actively used:
$ hg sum -R /tmp/thg --config extensions.largefiles=
parent: 16541:00dc703d5aed
repowidget: specify incoming bundle by plain file path to avoid url parsing
branch: default
abort: Permission denied: '/tmp/thg/.hg/largefiles'
This change is simpler than changing the callers of openlfdirstate() to use the
'create' parameter that was introduced in 74522122b97d, and probably how that
should have been implemented in the first place.
Before this patch, "hg summary" and "hg outgoing" show and count up
all largefiles changed/added in outgoing revisions, even though some
of them are already uploaded into remote store.
This patch confirms existence of outgoing largefile entities in remote
store, to show and count up only really outgoing largefile entities at
"hg summary" and "hg outgoing".
Before this patch, "hg outgoing --large" shows which largefiles are
changed or added in outgoing revisions only in the point of the view
of filenames.
For example, according to the list of outgoing largefiles shown in "hg
outgoing" output, users should expect that the former below costs much
more to upload outgoing largefiles than the latter.
- outgoing revisions add a hundred largefiles, but all of them refer
the same data entity
in this case, only one data entity is outgoing, even though "hg
summary" says that a hundred largefiles are outgoing.
- a hundred outgoing revisions change only one largefile with
distinct data
in this case, a hundred data entities are outgoing, even though
"hg summary" says that only one largefile is outgoing.
But the latter costs much more than the former, in fact.
This patch shows also how many data entities are outgoing at "hg
outgoing" by counting number of unique hash values for outgoing
largefiles.
When "--debug" is specified, this patch also shows what entities (in
hash) are outgoing for each largefiles listed up, for debug purpose.
In "ui.debugflag" route, "addfunc()" can append given "lfhash" to the
list "toupload[fn]" always without duplication check, because
de-duplication is already done in "_getoutgoings()".
Before this patch, "hg summary --large" shows how many largefiles are
changed or added in outgoing revisions only in the point of the view
of filenames.
For example, according to the number of outgoing largefiles shown in
"hg summary" output, users should expect that the former below costs
much more to upload outgoing largefiles than the latter.
- outgoing revisions add a hundred largefiles, but all of them refer
the same data entity
in this case, only one data entity is outgoing, even though "hg
summary" says that a hundred largefiles are outgoing.
- a hundred outgoing revisions change only one largefile with
distinct data
in this case, a hundred data entities are outgoing, even though
"hg summary" says that only one largefile is outgoing.
But the latter costs much more than the former, in fact.
This patch shows also how many data entities are outgoing at "hg
summary" by counting number of unique hash values for outgoing
largefiles.
This patch introduces "_getoutgoings" to centralize the logic
(de-duplication, too) into it for convenience of subsequent patches,
even though it is not required in "hg summary" case.
This patch changes the calling signature of memfilectx's __init__ to fall in
line with the other file contexts.
Calling code and tests have been updated accordingly.
This replaces the grand unified action list that had multiple action types as
tuples in one big list. That list was iterated multiple times just to find
actions of a specific type. This data model also made some code more
convoluted than necessary.
Instead we now store actions as a tuple of lists. Using multiple lists gives a
bit of cut'n'pasted code but also enables other optimizations.
This patch uses 'if True:' to preserve indentations and help reviewing. It also
limits the number of conflicts with other pending patches. It can trivially be
cleaned up later.
Adds a labels function parameter to all the functions between merge.update and
filemerge.filemerge. This will allow commands like rebase to specify custom
marker labels.
When invoked from another directory, the matchers m._cwd will be the absolute
path. The code for calculating relative path to .hglf did not consider that and
log would fail with weird errors and paths.
For now, just don't do any largefile magic when invoked from other directories.
Log for largefiles was failing for graph log since it was overriding match
instead of matchandpats.
[Mads Kiilerich modified this patch to address his review comments and ended up
rewriting/removing most of it.]
cat of a standin would silently fail.
The use of standins is mostly an implementation detail, but it is already a bit
leaking. Being able to see the content of standins might be convenient for
debugging.
A .orig of a standin after the update do that a .orig of the actual largefile
is created. The .orig standin was however never removed again and the largefile
.orig was thus overwritten again and again.
The fix: remove the standin .orig when it is used.
Before this patch, "hg outgoing" invokes "findcommonoutgoing()" not
only in "commands.outgoing()" but also in
"overrides.overrideoutgoing()" (via "getoutgoinglfiles()"), when
largefiles is enabled. The latter is redundant.
This patch uses "outgoinghooks" to avoid redundant outgoing check.
Newly introduced function "overrides.outgoinghook()" is registered
into "outgoinghooks" to get the result of outgoing check in
"commands.outgoing()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
outgoing check to avoid redundant outgoing check in
"getoutgoinglfiles()": "sort()" is needed, because
"lfutil.getlfilestoupload()" doesn't sort the result of it.
This patch also omits "if toupload is None" ("No remote repo") case,
because failure of looking remote repository up should raise exception
in "commands.outgoing()" before invocation of "outgoinghooks".
Newly added "hg outgoing --large --graph" tests examine
"outgoinghooks" invocations in "hg outgoing --graph" code path.
Before this patch, "hg summary --remote --large" invokes
"findcommonoutgoing()" not only in "commands.summary()" but also in
"overrides.overridesummary()" (via "getoutgoinglfiles()"). The latter
is redundant.
This patch uses "summaryremotehooks" to avoid redundant outgoing check.
Newly introduced function "overrides.summaryremotehook()" is
registered into "summaryremotehooks" to get the result of outgoing
check in "commands.summary()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
outgoing check to avoid redundant outgoing check in
"getoutgoinglfiles()".
Before this patch, "hg push" invokes "findcommonoutgoing()" not only
in "exchange.push()" but also in "lfilesrepo.push()", when largefiles
is enabled. The latter is redundant.
This patch registers own "prepushoutgoinghook" function into
"prepushoutgoinghooks" of "localrepository" to reuse
"findcommonoutgoing()" result.
"prepushoutgoinghook" omits "changelog.nodesbetween()" invocation,
because "findcommonoutgoing()" invocation in "exchange.push()" takes
"onlyheads" argument and it considers "nodesbetween()".
Before this patch, "overrides.getoutgoinglfiles()" (called by
"overrideoutgoing()" and "overridesummary()") and "lfilesrepo.push()"
implement similar logic to get outgoing largefiles separately.
This patch centralizes the logic to get outgoing largefiles in
"lfutil.getlfilestoupload()".
"lfutil.getlfilestoupload()" takes "addfunc" argument, because each
callers need different information (and it is useful for enhancement
in the future).
- "overrides.getoutgoinglfiles()" needs only filenames
- "lfilesrepo.push()" needs only hashes of largefiles
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that the format string and "%" aren't placed in the
same line.
This patch replaces "\s" in that regexp pattern with "[ \t\n]" to
detect "% inside _()" problems in such case.
"[\s\n]" can't be used in this purpose, because "\s" is automatically
replaced with "[ \t]" by "_preparepats()" and "\s" in "[]" causes
nested "[]" unexpectedly.
Since changeset a8955c4d9ef5, "reposetup()" of each extensions is
invoked only on repositories enabling corresponded extensions.
This causes that largefiles specific interactions between the
repository enabling largefiles locally and remote (wire) peer fail,
because there is no way to know whether largefiles is enabled on the
remote repository behind the wire peer, and largefiles specific
"wireproto functions" are not given to any wire peers.
To avoid this problem, largefiles should be enabled in wider scope
than each repositories (e.g. user-wide "${HOME}/.hgrc").
This patch introduces "wirepeersetupfuncs" to setup wire peer by
extensions already enabled. Functions registered into
"wirepeersetupfuncs" are invoked for all wire peers.
This patch uses plain list instead of "util.hooks" for
"wirepeersetupfuncs", because the former allows to control order of
function invocation by order of extension enabling: it may be useful
for workaround of problems with combination of enabled extensions
This should correct an earlier couple of bad merges (5433856b2558 and
596960a4ad0d, now pruned) that accidentally brought in a change that had
been marked obsolete (244ac996a821).
Some extensions set configuration settings that showed up in 'hg showconfig
--debug' with 'none' as source. That was confusing.
Instead, they will now tell which extension they come from.
This change tries to be consistent and specify a source everywhere - also where
it perhaps is less relevant.
The largefile hashes are mostly an implementation detail, but they are "leaked"
in several places anyway, and showing the hashes is better than not giving the
user any information about the options in the prompt.
The hashes are long, but it is largefile hashes and it would thus be confusing
to shorten them.
Before it tried to explain the exact situation when merging moved largefiles.
That do not happen for normal merges and is not more relevant for largefiles
than for normal files. It is unneeded complexity - remove it.
Since the localrepositoyry.push() method in mercurial/localrepo.py is defined
this way:
def push(self, remote, force=False, revs=None, newbranch=False):
it is better for largefiles to call push() on the super class with proper
kwargs to respect the API.
This will avoid breaking other extensions overriding the push method this way:
def push(self, remote, force=False, **kwargs):
a8386b4c47b1 introduced splitstandin on all action filenames. It would however
crash on 'd' actions where the filename is None.
Fix that and add test coverage for that case.
An update would try to fetch any missing largefiles after having updated normal
files and standins. That could fail or be interrupted and would leave the
working directory in a state where the largefiles not only were missing but
also were scheduled for remove ... and where the old largefile was left in
place.
Instead we now remove old largefiles before starting to download and update
missing largefiles.
Prompts like
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
was not as clear as the usual prompts that use 'remote' or 'local' to explain
what happened on which side ... especially not when used to the normal prompts.
"as" could also indicate that it would be possible to take the content of the
largefile and somehow put it into the normal file. It could make it more clear
that it was a choice between one side or the other.
For consistency we will now phrase it like:
remote turned local normal file f into a largefile
use (l)argefile or keep (n)ormal file?
We used to get like:
$ hg up -r 2
foo has been turned into a normal file
keep as (l)argefile or use (n)ormal file? l
getting changed largefiles
0 largefiles updated, 0 removed
0 files updated, 0 files merged, 2 files removed, 0 files unresolved
$ cat foo
cat: foo: No such file or directory
[1]
- which both asked the wrong question and did the wrong thing.
Instead, skip this conflict resolution when the local conflicting file has been
scheduled for removal and there thus is no conflict.