Files that are already in local store should be checked locally. The problem
with this implementation is how difference in messages between local and remote
checks should look like. For now local errors for file missing and content
corrupted looks like this:
'changeset cset: filename references missing storepath\n'
'changeset cset: filename references corrupted storepath\n'
for remote it looks like:
'changeset cset: filename missing\n'
'changeset cset: filename: contents differ\n'
Contents differ error for remote calls is never raised currently - for now
statlfile implementation lacks checking file content.
I've caught multiple extensions in the wild lying about being
'internal', so it's time to move the goalposts on people. Goalpost
moving will continue until third party extensions stop trying to
defeat the system.
The default of pushing all largefiles referenced in outgoing revisions is safe,
but also expensive and sometimes not what is needed. We thus introduce a
--lfrev option, similar to what pull already has.
By specifying an empty set of revisions (or null), it is possible to get lazy
(and insecure!) pushes of revisions without referenced largefiles, similar to
how pull works.
prepushoutgoinghook was introduced in 8dfcd476a7f7 and largefiles is the only
in-tree use of it. Refactor it to be more useful for other use cases in
largefiles.
This commit is part of bigger effort described in 'Windows UTF-8' plan.
It is not changing all invocations but the ones where change is
obviously correct and doesn't require complicated changes.
9f1a3c7b4a28 introduced support for not having a "global" user cache.
In the rare cases where the environment didn't provide the location of the
current home directory, the usercachepath function could return None.
That functionality has since bitrotten and several code paths did not correctly
check for usercachepath returning None:
$ HOME= XDG_CACHE_HOME= hg up --config extensions.largefiles=
getting changed largefiles
abort: unknown largefiles usercache location
Dropping the partial support for it is thus not really a backward compatibility
breaking change.
Thus: consistently fail early if the usercache location is unknown.
It is relevant to be able to control where the largefiles are stored and how
they propagate, but that should probably be done differently. The dysfunctional
code just gets in the way.
Replaces invocations os.path functions to methods in vfs. Unfortunately
(in my view) this makes code less readable, because instead of using
clear variable names with path it needs to replace them with vfs(..).
I need guidance how to make such transition look more readable.
For example in this patch there is example with few places with
wvfs.join(standindir), standindir before this patch was absolute
path, in this it is changed to relative because it is used also
in expression wvfs.join(standindir, pat).
Using plural form is consistent with other progress units, and "1 out of 5
revisions" sounds more correct. Also, tests don't show this, but if you have
'speed' item in progress.format config, it shows e.g. '100 revisions/sec',
which also seems better.
Methods _put, _get, _stat were used in remotestore class as
abstract expecting that subclass would implement them. This
commit makes this fact explicit.
This patch consists of changes below (these can't be applied
separately).
- replace revset.extpredicate by registrar.revsetpredicate in
extensions
- remove setup() on an instance named as revsetpredicate in
uisetup()/extsetup() of each extensions
registrar.revsetpredicate doesn't have setup() API.
- put new entry for revsetpredicate into extraloaders in dispatch
This causes implicit loading predicate functions at loading
extension.
This loading mechanism requires that an extension has an instance
named as revsetpredicate, and this is reason why
largefiles/__init__.py is also changed in this patch.
Before this patch, test-revset.t tests that all decorated revset
predicates are loaded by explicit setup() at once ("all or nothing").
Now, test-revset.t tests that any revset predicate isn't loaded at
failure of loading extension, because loading itself is executed by
dispatch and it can't be controlled on extension side.
In an upcoming patch we'll have different behavior here for when 'merge
--force' is used as opposed to when other kinds of force operations are
performed, like rebases.
Previously, if the largefile was deleted at the time of a commit, the standin
was silently not updated and its current state (possibly garbage) was recorded.
The test makes it look like this is somewhat of an edge case, but the same thing
happens when an `hg revert` followed by `rm` changes the standin.
Aside from the second invocation of this in lfutil.updatestandinsbymatch()
(which is what triggers this test case), the three other uses are guarded by
dirstate checks for added or modified, or an existence check in the filesystem.
So aborting in lfutil.updatestandins() should be safe, and will avoid silent
skips in the future if this is used elsewhere.
The change in 6fce9a02f069 to handle a normal -> largefile switch was too
aggressive in preserving the original matcher names. If a largefile is
explicitly provided by the user, but only the standin exists in dirstate, then
only the standin can be committed.
There's still maybe an issue when the largefile is deleted outside of Mercurial:
$ rm large
$ hg ci -m "oops" large
large: The system cannot find the file specified
nothing changed
[1]
92117e4f6f8d improved merging of standin files referencing missing largefiles.
It did however not test or fix commits of such merges; it would abort.
To fix that, change copytostore to skip and warn about missing largefiles
with a message similar the one for failing get from remote filestores. (It
would perhaps in both cases be better to emit a more helpful warning like
"warning: standin file for large1 references 58e24f733a which can't be found in
the local store".)
To test this, make sure commit doesn't find the "missing" largefile in the global
usercache. For further testing, verify that update and status works as expected
after this.
This will also effectively backout 159c82dd6523.
Previous patch introduced 'revset.predicate' decorator to register
revset predicate function easily.
But it shouldn't be used in extension directly, because it registers
specified function immediately. Registration itself can't be restored,
even if extension loading fails after that.
Therefore, registration should be delayed until 'uisetup()' or so.
This patch uses 'extpredicate' decorator derived from 'delayregistrar'
to register predicate in extension easily.
This patch also tests whether 'registrar.delayregistrar' avoids
function registration if 'setup()' isn't invoked on it, because
'extpredicate' is the first user of it.
Once we get a matcher down into manifestmerge, we can make narrowhg
work more easily and potentially let manifest.match().diff() do less
work in manifestmerge.
The largefiles merge code (currently) does not handle change/delete conflicts.
So fall back to regular filemerge in that case.
Making this code handle change/delete conflicts is left as an exercise for the
future.
We're going to treat these conflicts similarly to merge conflicts, and this
change to the way we store things in memory makes future code a lot simpler.
'unexpected putlfile response: None' when an http error occurs is not very
helpful.
Instead, leave the handling of urllib2.HTTPError exceptions to other layers.
If the store somehow got corrupted, users could end up in weird situations that
were very hard to recover from or lead to propagation of the corruption.
Instead, spend the extra time checking the hash when copying to the working
directory. If it doesn't match, emit a warning, and don't put wrong content in
the working directory.
Commit of corresponding normal/largefiles pairs would only commit the standin.
That is usually fine, except if either the normal file or the standin is a
remove while the other is an add. In that case it would either give duplicate
colliding entries or lose the file.
Instead, commit both filenames if one of them is a remove.
Before, when merging revisions with missing largefiles, the missing largefiles
would be fetched as a part of the merge. If that failed (for example because
the main repository temporarily was unavailable), the largefile would be left
missing. However, the next commit would abort and (seemed to) fail when
markcommitted tried to mark the standin file as normal and thus had to hash the
largefile that didn't exist. (Actually, the commit would succeed but the
largefile update that follows right after the commit transaction would abort -
quite confusing.)
To fix that, make sure that synclfdirstate only marks files as normal if they
actually exist.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
This regressed in f07f4c45a8f2, when trying to conditionally disable archiving
of largefiles.
I'm not sure if wrapfunction() is the right way to do this, but it seems to
work. The mysterious issue with lfstatus getting out of sync in the proxy and
the unfiltered view crops up again here. See the referenced cset for more info.
Previously, simply having the largefiles extension loaded without any largefiles
added would crash when amending with -I. The problem was with no files in the
matcher, the pattern list of files joined with 'standindir' was empty, and
scmutil.match() would match everything. In lfutil.composestandinmatcher(), the
match function is used to test if the file is a standin, and after getting a
false positive, proceeds to call lfutil.splitstandin(). This returns None
because it isn't a standin, which blows up when passed to rmatcher.matchfn().
Manually overriding _always in getstandinmatcher() probably isn't necessary
anymore, but we leave well enough alone on stable. This regressed in
78632d61a993.
There are currently no users of this, but it is a necessary step before
converting extdiff to use archive. It may be useful to add an argument to
extdiff in the future and allow largefiles to be diffed, but archiving
largefiles can have significant overhead and may not be very diffable, so
archiving them by default seems wrong.
It is a mystery to me why the lfstatus attribute needs to be set on the
unfiltered repo. However if it is set on the filtered repo instead (and the
filtered repo is passed to the original command), the lfstatus attribute is
False in the overrides for archival.archive() and hgsubrepo.archive() when
invoking the archive command. This smells like the buggy status behavior (see
9fc565fa1621, which was reverted in e1574a4a2cad). Neither the status nor
summary commands have this weird behavior in their respective overrides.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
Python 2.6 introduced a new octal syntax: "0oXXX", replacing "0XXX". The
old syntax is not recognized in Python 3 and will result in a parse
error.
Mass rewrite all instances of the old octal syntax to the new syntax.
This patch was generated by `2to3 -f numliterals -w -n .` and the diff
was selectively recorded to exclude changes to "<N>l" syntax conversion,
which will be handled separately.
This is a step toward replacing the extdiff internals with archive, in order to
support 'extdiff -S'. Only Mercurial subrepos are supported for now.
If a file is missing from the filesystem, it is silently skipped. Perhaps it
should warn, but it cannot abort when working with extdiff because deleting a
file is a legitimate diff.
Previously, if there were any hidden changesets, the --lfa argument would cause
the command to abort with a hint about using --hidden when it tripped over a
hidden changeset.
The monkey patching in cat() can't be fixed, because it still delegates to the
original bad(). Overriding commands.cat() should go away in favor overriding
cmdutil.cat() anyway, and that matcher can be wrapped with matchmod.badmatch().
No known issues with the previous code since it restored the original method,
but this is cleaner.
The monkey patching in cat is harmless, because it is created locally, and
doesn't pass it anywhere (subrepo cat isn't supported with largefiles).
The logic in the convert extension is more advanced, supporting extra features
like converting revision IDs in 'extras' (e.g. 'amend_source'), supports
updating hashes in commit messages, and outputs an SHA map file. Rather than
try to duplicate all of that, just use the existing code.
Even though the convert extension supports user supplied options like filemap,
etc, those features aren't available on the lfconvert interface. Therefore, it
is safe to use the filemap mechanism (in memory) to handle the standin -> file
rename. The convert extension handles the destination locking for this path.
There was a comment in test-lfconvert.t about the hash on rev 5 being different
because it was doing a better job than "hg remove" + "hg merge" + "hg commit".
It isn't clear to me what was happening or why, but now the hashes match the
original repo exactly after a roundtrip, which seems like a good idea. If there
really was something beneficial about the previous behavior, perhaps merge can
be changed so that everyone benefits.
Converting to a largefiles repo still uses the original (limited) lfconvert
logic.
The choice between the "always" case and the other case is done in
getstandinmatcher() and the next patch will change how it's determined
based on the matcher, so let's prepare by passing in the matcher, not
just the matcher's files.
Extension authors (notably at companies using hg) have been
cargo-culting the `testedwith = 'internal'` bit from hg's own
extensions, which then defeats our "file bugs over here" logic in
dispatch. Let's be more aggressive about trying to give extension
authors a hint about what testedwith should say.
This is related to df069d92be27, but not required for that to function correctly
because 'prefix' always ends with '/', so os.path.join(prefix, _path) simply
concatenates the strings. Take the opportunity to drop the os.path usage, and
avoid confusion by using the same pattern as the overridden subrepo method.
The fileset-generated.t test previously failed with this:
+ hg: parse error: unknown identifier: .hglf/modified
+ (did you mean 'modified'?)
+ [255]
Filesets will find the standins on their own, without any help. While that's
useful for some things like modified(), clean(), etc., it is wrong for things
like size(). Proper fileset support for largefiles is not trivial, but this was
failing with just the extension enabled on a normal repo.
The immediate crash was when checking for requirements immediately after this,
but lfcommands.downloadlfiles() will also crash if --all-largefiles is
specified. That has been in place since atleast e01343f7da6f (2.3-rc) without
anyone noticing.
I can't tell from the peer classes if there's a way to make the custom largefile
functionality work in this case, but atleast it doesn't crash.
mergeupdate already set the flag to update all. This will thus only change
overriderevert and scmutilmarktouched ... where the flag effectually also were
true. The test coverage thus shows no change.
As the flag always is set, it is removed.
This is mainly a change for keeping the code simple and consistent and correct,
but it should also make it faster in many cases.
Before, a --clean update with largefiles would use the "optimization" that it
didn't read hashes from standin files before and after the update. Instead of
trusting the content of the standin files, it would rehash all the actual
largefiles that lfdirstate reported clean and update the standins that didn't
have the expected content. It could thus in some "impossible" situations
automatically recover from some "largefile got out sync with its standin"
issues (even there apparently still were weird corner cases where it could
fail). This extra checking is similar to what core --clean intentionally do
not do, and it made update --clean unbearable slow.
Usually in core Mercurial, --clean will rely on the dirstate to find the files
it should update. (It is thus intentionally possible (when trying to trick the
system or if there should be bugs) to end up in situations where --clean not
will restore the working directory content correctly.) Checking every file when
we "know" it is ok is however not an option - that would be too slow.
Instead, trust the content of the standin files. Use the same logic for --clean
as for linear updates and trust the dirstate and that our "logic" will keep
them in sync. It is much cheaper to just rehash the largefiles reported dirty
by a status walk and read all standins than to hash largefiles.
Most of the changes are just a change of indentation now when the different
kinds of updates no longer are handled that differently. Standins for added
files are however only written when doing a normal update, while deleted and
removed files only will be updated for --clean updates.
This allows passing a matcher down the pathcopies() stack to _forwardcopies().
This will let us add logic in a later patch to avoid tracing copies when not
necessary (like when doing hg diff -r 1 -r 2 foo.txt).
This makes treemanifest.walk() not visit submanifests that are known not to
have any matching files. It does this by calling match.visitdir() on
submanifests as it walks.
This change also updates largefiles to be able to work with this new behavior
in treemanifests. It overrides match.visitdir(), the function that dictates
how walk() and matches() skip over directories.
The greatest speed improvements are seen with narrower scopes. For example,
this commit speeds up the following command on the Mozilla repo from 1.14s
to 1.02s:
hg files -r . dom/apps/
Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s.
As with similar a similar optimization to treemanifest.matches(), this change
will bring out even bigger performance improvements once treemanifests are
loaded lazily. Once that happens, we won't just skip over looking at
submanifests, but we'll skip even loading them.
The benefit of retargeting the local store to the share source is that all
shares will always have access to the largefiles any one of them commit, even if
the user cache is deleted (which is documented to be OK to do). Further, any
push into the source (and now any shares), will likewise make the largefile(s)
visible to all related repositories.
In order to maintain compatibility with existing repos, where the largefiles
would be cached only in the local share, fallback to searching the local share
if it isn't found at the share source.
The unshare command should probably be taught to copy the source store into the
store for the repo being unshared to complete the loop.
This patch changes the test like this:
@@ -159,6 +159,5 @@
$ hg share -q src share_dst --config extensions.share=
$ hg -R share_dst update -r0
getting changed largefiles
- large: largefile $HASH not available from file:///$TESTTMP\share_dst
- 0 largefiles updated, 0 removed
+ 1 largefiles updated, 0 removed
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
The issue writeup mentions pushing a largefile from a remote repo to the main
local repo, and the largefile is then not available in any shares. Since the
push doesn't cache the largefile in $USERCACHE, the trashed $USERCACHE in this
test is equivalent.
The handful of direct uses of lfutil.storepath() merely need a single path to
read from or write to the largefile, whether or not it exists. Most callers
that care about the file existing call lfutil.findfile(), in order to fallback
from the store to the user cache.
localstore._verify() doesn't call lfutil.findfile(). This prevents redirecting
the store to the share source because the largefiles for existing repos may not
be in the source's store, so verification may fail. It can't be changed to call
findfile(), because findfile() links the file from the usercache to the local
store[1], and because it returns None instead of a path if the file doesn't
exist.
For now, this method is just a cover for lfutil.storepath(), but it will be
filled out in an upcoming patch.
[1] Maybe we shouldn't care? But on a filesystem that doesn't support
hardlinks, then verify will take a lot longer, and start to consume disk
space.
Now, "overrideupdate()" wrapping "hg update" is useless, because
"workingctx.dirty() and raising Abort" in "hg update" was replaced by
"cmdutil.bailifchanged()" in the previous patch, and the latter can
detect changes of largefiles in the working directory.
In "commands.update()", "cmdutil.bailifchanged()" isn't used for
"abort if the working directory is dirty", because it forcibly
examines about merging in progress.
"workingctx.dirty()" used in "commands.update()" can't detect changes
of largefiles in the working directory without "repo.lfstatus = True"
wrapping. This is only reason of "commands.update()" wrapping by
largefiles extension.
On the other hand, "cmdutil.bailifchanged()" already wrapped by
largefiles extension can detect changes of largefiles.
This patch is a preparations for replacing "workingctx.dirty()" and
raising Abort in "commands.update()" by "cmdutil.bailifchanged()". It
can remove redundant "commands.update()" wrapping.
As the failing revert tests in test-fileset-generated.t show,
Revert currently creates one matcher for matching files in the working
copy and another matcher for matching files in the target
revision. The two matchers are created with different contexts, which
means filesets are evaluated differently. Then the union of the sets
of files matching the matchers in the two contexts are reverted. It
doesn't seem to make sense to use two different matchers; only the
context they're applied to should be different.
It seems very likely that the user wants the filesets to be evaluated
against the working directory, which the tests
test-fileset-generated.t also assume, so let's make it so.
I willingly admit that the largefiles code was modified by trial and
error (according to tests).
By overriding the cmdutil method we don't need to override both the
function and the command. Also, we get access to the 'ctx' and
'parents' variables, which will soon prove useful.
Rename the 'ctx' argument to overridematch() to 'mctx' rather than
letting it shadow new 'ctx'.
Commit 359818e9961c (largefiles: don't create chain of __contains__
calls, 2015-03-11) introduced a typo with __class instead of
__class__. Let's fix it.
There are (obviously) no tests covering this code path, and I could
not figure out a way to trigger it, so it remains untested.
Spotted by Drew Gottlieb.
Matt Harbison pointed out that my recent cdf4a99e1657 might cause
__contains__ to continously get replaced by another version that calls
itself, since the manifest instance returned by the super method is
always the same instance due to @propertycache. He also suggested
replacing the class instead, as is done with the context class in the
surrounding code. Do so.
We're soon going to add an alternative manifest class
(treemanifest). Rather than extending both those classes by
largesfiles versions, let's replace the method on the manifest
instances.
Previously, the source was silently skipped because the largefile was in the
list of changed files, but the standin was in the copies dictionary. The source
is only displayed if the changed file is a key in the copies dictionary.
It's probably possible to refactor so that the 'if m._cwd' check isn't
necessary, but the False case is the typical case (i.e. run from the root of the
repo), and simpler to read.
An exact path to a largefile from outside the repo was previously ignored.
match.rel('.hglf') will handle figuring out both the correct '../' length to the
standin directory if inside the repo, or path/to/repo from outside, at the cost
of a pconvert() to keep the patterns using '/' on Windows.
When logging '.hglf/foo', the pattern list was being transformed from
['.hglf/foo'] into ['.hglf/foo', '.hglf/.hglf/foo']. Aside from the
pathological case of somebody getting a directory named '.hglf' created inside
the standing directory, the old code shouldn't have had any bad effects.
(amended by mpm to sort patterns for test stability and not upset check-code)
Adding the standin to the patterns list was (possibly) harmless before, but was
wrong, because the pattern list was already updated above that code. Now that
patterns are handled, it was actually harmful. For example, in this test:
$ hg log -G glob:**another*
the adjusted pattern list would have been:
['glob:**another*', '.hglf/.', 'glob:.hglf/**another*']
which causes every largefile in the root to be matched.
I'm not sure why 'glob:a*' picks up the rename of a -> b commit in test-log.t,
but a simple 'a' doesn't. But it doesn't appear to be caused by the largefiles
extension.
All current callers supply some sort of prefix, so the issue was hidden. But if
no parameter was specified, a crash occurred in the write() closure when
concatenating 'prefix' and 'name'.
When there isn't lfdirstate file in cases below, "openlfdirstate()"
call "scmutil.match()" indirectly to build lfdirstate up.
- subrepos disabling largefiles locally
- lfdirstate file is missed accidentally
This causes infinite recursive call of "openlfdirstate()" in
"overriderevert()" (introduced by 25febe9568dd), because
"openlfdirstate()" is invoked from the function overriding
"scmutil.match()" itself.
To avoid infinite recursive call of "openlfdirstate()" in
"overriderevert()" in such cases, this patch passes "create=False"
argument to "openlfdirstate()".
"create=False" forcibly makes "openlfdirstate()" avoid code path to
build lfdirstate up.
Even if largefiles extension is enabled in a repository, "repo"
object, which isn't "largefiles.reposetup()"-ed, is passed to
overridden functions in the cases below unexpectedly, because
extensions are enabled for each repositories strictly.
(1) clone without -U:
(2) pull with -U:
(3) pull with --rebase:
combination of "enabled@src", "disabled@dst" and
"not-required@src" cause this situation.
largefiles requirement
@src @dst @src result
-------- -------- --------------- --------------------
enabled disabled not-required aborted unexpectedly
required requirement error (intentional)
-------- -------- --------------- --------------------
enabled enabled * success
-------- -------- --------------- --------------------
disabled enabled * success (only for "pull")
-------- -------- --------------- --------------------
disabled disabled not-required success
required requirement error (intentional)
-------- -------- --------------- --------------------
(4) update/revert with a subrepo disabling largefiles
In these cases, overridden functions cause accessing to largefiles
specific fields of not "largefiles.reposetup()"-ed "repo" object, and
execution is aborted.
- (1), (2), (4) cause accessing to "_lfstatuswriters" in
"getstatuswriter()" invoked via "updatelfiles()"
- (3) causes accessing to "_lfcommithooks" in "overriderebase()"
For safe accessing to these fields, this patch examines whether passed
"repo" object is "largefiles.reposetup()"-ed or not before accessing
to them.
This patch chooses examining existence of newly introduced
"_largefilesenabled" instead of "_lfcommithooks" and
"_lfstatuswriters" directly, because the former is better name for the
generic "largefiles is enabled in this repo" mark than the latter.
In the future, all other overridden functions should avoid largefiles
specific processing for efficiency, and "_largefilesenabled" is better
also for such purpose.
BTW, "lfstatus" can't be used for such purpose, because some code
paths set it forcibly regardless of existence of it in specified
"repo" object.
The previous code was adding standin files to the matcher's file list when
neither the standin file nor the original existed in the context. Somehow, this
was confusing the logging code into behaving differently from when the extension
wasn't loaded.
It seems that this was an attempt to support naming a directory that only
contains largefiles, as a test fails if the else clause is dropped entirely.
Therefore, only append the "standin" if it is a directory. This was found by
running the test suite with --config extensions.largefiles=.
The first added test used to log an additional cset that wasn't logged normally.
The only relation it had to file 'a' is that 'a' was the source of a move, but
it isn't clear why having '.hglf/a' in the list causes this change:
@@ -47,6 +47,11 @@
Make sure largefiles doesn't interfere with logging a regular file
$ hg log a --config extensions.largefiles=
+ changeset: 3:2ca5ba701980
+ user: test
+ date: Thu Jan 01 00:00:04 1970 +0000
+ summary: d
+
changeset: 0:9161b9aeaf16
user: test
date: Thu Jan 01 00:00:01 1970 +0000
The second added test used to complain about a file not being in the parent
revision:
@@ -1638,10 +1643,8 @@
Ensure that largefiles doesn't intefere with following a normal file
$ hg --config extensions.largefiles= log -f d -T '{desc}' -G
- @ c
- |
- o a
-
+ abort: cannot follow file not in parent revision: ".hglf/d"
+ [255]
$ hg log -f d/a -T '{desc}' -G
@ c
|
Note that there is still something fishy with the largefiles code, because when
using a glob pattern like this:
$ hg log 'glob:sub/*'
the pattern list would contain '.hglf/glob:sub/*'. None of the tests show this
(this test lives in test-largefiles.t at 1349), it was just something that I
noticed when the code was loaded up with print statements.
This effectively reverts 9fc565fa1621, which caused some normal file copies to
not be displayed as copies. Other normal file copies could be displayed- the
exact reason isn't clear. This also adds two tests that were failing prior to
this backout, so that this can be sorted out next cycle.
The difference between copy cases that worked and those that didn't seemed to be
in copies.pathcopies(). When largefiles isn't enabled for the changed test, or
lfstatus is not set in the commands.status() override, 'y.ancestor(x) == x'.
That wasn't true otherwise, which fell through to the _chain() method. In this
case, the copy is removed in the criss cross loop.
'y.ancestor(x)' returns a context.changectx type, while 'x' is a lfilesctx type
in the failing case. I tried adding the ancestor method to the lfilesctx class
to change the type of the ancestor context, however the context when printed as
a string then gains a '+'. This points to it being a context.committablectx,
which clearly isn't correct for an ancestor. Possibly the problem is the
lfilesctx needs to subclass context.committablectx in some cases, but
context.changectx in others, within the same invocation? I'm not sure how to
pull that off, and backing out this change is safer during the freeze.
As to the status changing when a path is specified, I haven't looked into it
yet.
Previously, when a largefile is forgotten and then reverted, a warning was
issued:
$ hg revert -R subrepo subrepo/large.txt
file not managed: subrepo/large.txt (glob)
This was purely cosmetic as the file itself actually was reverted.
The problem was even with all of the matcher patching, the largefile pattern
given on the command line wasn't converted to a standin because the standin was
neither in ctx nor wctx. This causes the named largefile to be added to the
'names' dict in cmdutil.revert() in the repo walk at line 2550. The warning was
printed out when the 'names' dict is iterated, because the file was specified
exactly.
Since core revert recurses into subrepos and largefiles only overrides the
revert method in commands.py, it doesn't work properly when reverting a subrepo.
However, it still will recurse into the subrepo and call the installed matcher
method, so lfdirstate is reopened for the current repo level to prevent any new
problems.
When a directory is named in the commit file list, the previous behavior was to
walk the list, and if no normal files in the directory were also named, add the
corresponding standin for each largefile in that directory. The directory is
then dropped from the list, so that committing a directory with no normal file
changes works. It then added the corresponding standin directory for the first
largefile seen, by prefixing it with '.hglf/'.
The latter is unnecessary since each affected largefile is explicitly referenced
by its standin in the list. It also caused an abort if there were no changed
largefiles in the directory, because none of its standins changed:
abort: .hglf/foo/bar: no match under directory!
This list of files is used to tweak a matcher in lfutil.updatestandinsbymatch(),
which is what is passed to commit().
The status() call that is ultimately done in the commit code with this matcher
seems to have some OS specific differences. It is not necessary to append '.'
for Windows to run the largefiles tests cleanly. But if '.' is not added to the
list, the match function isn't called on Linux, so status() would miss any
normal files that were also in a named directory. The commit then proceeds
without those normal files, or says "nothing changed" if there were no changed
largefiles in the directory. This is not filesystem specific, as VFAT on Linux
had the same behavior as when run on ext4. It is also not an issue with
lfilesrepo.status(), since that only calls the overridden implementation when
paths are passed to commit. I dont have access to an OS X machine ATM to test
there.
Maybe there's a better way to do this. But since the standin directory for the
first largefile was previously being added, and that caused the same walk in
status(), there's no preformance change to this. There is no danger of
erroneously committing files in '.', because the original match function is
called, and if it fails, the lfutil.updatestandinsbymatch() tweaked matcher only
indicates a match if the file is in the list of standins- and '.' never is. The
added tests confirm this.
Standins are read before and after an update/merge, and all the standins that
changes are handed to updatelfiles for getting their corresponding largefiles
updated. updatelfiles would then hash the largefile and see if it already
matched the new expected hash. If so, it would skip the update. If different,
the largefile would be updated.
It would happen very rarely that the largefile happened to match the new hash
(and thus not the old one) and the hashing would thus be pointless ... and
hashing is not cheap.
Instead, when it is known that the standin hash changed (from an update), just
update the standin unconditionally. If the largefile was "unsure" before the
update, it was hashed at that point, so we know there is nothing to preserve.
(Also, the hashing in updatelfiles was not used to preserve changes, but only
to be lazy about updating the largefile, so nothing is lost by not doing this
extra hashing.)
There might be rare situations where we now will update largefiles that didn't
have to be updated, but in all relevant cases (?) this will improve
performance.
Updates on a repo with some big largefiles has been seen to go from 9.19 s to
6.8 s - that is 26% less painful.
The --large, --normal and --lfsize args couldn't be passed to a subrepo before,
and files in the subrepos would be added silently (if -v wasn't specified) as
normal files. As an added bonus, 'hg add --dry-run' no longer prints that
largefiles would also be added as normal files as well.
'hg update' would hash all 'unsure' largefiles before performing the merge. It
would update the standins but not detect the very common case where the
largefile never had been changed by the user but just had been marked with an
invalid dirstate mtime to make sure any changes done by the user in the same
second would be detected. The largefile would remain in that state and would
have to be hashed again next time even though it still not had been changed.
Sad trombone.
Instead, for largefiles listed as 'unsure' or 'modified', after updating the
standin with the actual hash, mark the largefile as normal if it turns out to
not be modified relative to the revision in the parent revision. That will
prevent it from being hashed again next time.
This will be used to exclude largefile candidates from the normal file matcher,
which will allow add and addremove dryruns to not print a file as both a normal
and a large file.
Core addremove prints the file relative to cwd only if patterns are provided to
the command. Core add always prints relative to cwd. Also, both methods print
the subrepo prefix when needed. The 'already a largefile' doesn't have an
analog in core, but follows the same rules for consistency.
Both cmdutil.remove() and scmutil.addremove() require verbose mode or an inexact
match to print the filename. Core addremove also prints the file relative to
cwd only if patterns are provided to the command. And finally, both methods
print the subrepo prefix when needed.
When cloning a repo that requires largefiles, the user had to either enable the
extension on the command line and then manually edit the local hgrc file after
the clone, or just enable it globally for the user. Since it is a feature of
last resort, and materially affects even repos without any largefiles when it is
enabled, we should make it easier to not have it enabled globally.
This simply adds the enabling statement to the local hgrc if the requires file
mandates its use (which only happens after the first largefile is committed).
That means that a user who works with a mix of largefile and normal repos can
always clone with '--config extensions.largefiles=', and the extension is
permanently enabled or not as appropriate.
The change in test-largefiles.t is simply because the order of loading rebase
and largefiles changed. The same change occurs if the order is flipped in the
hgrc file at the top of the test.
The destination is validated by pathutil.canonpath() for illegal components, and
that it is in the repository. The logic for creating the standin directory tree
was calling this before cmdutil.copy(), but without the destination file name
component. The cmdutil.copy() logic also calls pathutil.canonpath(), but with
the file name component. By always calling the core logic first, the error
message is always consistent. Specifically, the old behavior for these tests
was to say '.hg' contains an illegal component, and '..' is not under root.
A user wasn't likely to notice the discrepancy, but this eliminates a needless
difference when running the test suite with --config extensions.largefiles=.
This is consistent with addlargefiles(), and will make it easier to get the
paths that are printed correct when recursing into subrepos or invoking from
outside the repository. It also now restricts the path that the addremove is
performed on if a path is given, as is done with normal files.
The repo.status() call needs to exclude clean files when performing an
addremove, because the addremove override method calling this used to pass the
list of files to delete, which caused the matcher to only consider those files
in building the status list. Now the matcher is restricted only to the extent
that the caller requested- usually directories if at all. There's no reason for
addremove to care about clean files anyway- we don't want them deleted.
The more aggressive synchronization of lfdirstate that was backed out in
4fe80f20ab06 masked the problem where lfdirstate would hold an 'R' for a
largefile that was added and then removed without a commit between. We could
just conditionally call lfdirstate.drop() or lfdirstate.remove() here, but this
also properly updates lfdirstate if the standin doesn't exist for the file
somehow (i.e. call drop instead of remove).
Without this change, the precommit status in the commit command immediately
after the test change lists the removed (and never committed) largefile as 'R'.
It can also lead to situations where the status command reports the same, long
after the commit [1].
[1] http://www.selenic.com/pipermail/mercurial-devel/2015-January/065153.html