This commit is part of bigger effort described in 'Windows UTF-8' plan.
It is not changing all invocations but the ones where change is
obviously correct and doesn't require complicated changes.
9f1a3c7b4a28 introduced support for not having a "global" user cache.
In the rare cases where the environment didn't provide the location of the
current home directory, the usercachepath function could return None.
That functionality has since bitrotten and several code paths did not correctly
check for usercachepath returning None:
$ HOME= XDG_CACHE_HOME= hg up --config extensions.largefiles=
getting changed largefiles
abort: unknown largefiles usercache location
Dropping the partial support for it is thus not really a backward compatibility
breaking change.
Thus: consistently fail early if the usercache location is unknown.
It is relevant to be able to control where the largefiles are stored and how
they propagate, but that should probably be done differently. The dysfunctional
code just gets in the way.
Replaces invocations os.path functions to methods in vfs. Unfortunately
(in my view) this makes code less readable, because instead of using
clear variable names with path it needs to replace them with vfs(..).
I need guidance how to make such transition look more readable.
For example in this patch there is example with few places with
wvfs.join(standindir), standindir before this patch was absolute
path, in this it is changed to relative because it is used also
in expression wvfs.join(standindir, pat).
Using plural form is consistent with other progress units, and "1 out of 5
revisions" sounds more correct. Also, tests don't show this, but if you have
'speed' item in progress.format config, it shows e.g. '100 revisions/sec',
which also seems better.
Methods _put, _get, _stat were used in remotestore class as
abstract expecting that subclass would implement them. This
commit makes this fact explicit.
This patch consists of changes below (these can't be applied
separately).
- replace revset.extpredicate by registrar.revsetpredicate in
extensions
- remove setup() on an instance named as revsetpredicate in
uisetup()/extsetup() of each extensions
registrar.revsetpredicate doesn't have setup() API.
- put new entry for revsetpredicate into extraloaders in dispatch
This causes implicit loading predicate functions at loading
extension.
This loading mechanism requires that an extension has an instance
named as revsetpredicate, and this is reason why
largefiles/__init__.py is also changed in this patch.
Before this patch, test-revset.t tests that all decorated revset
predicates are loaded by explicit setup() at once ("all or nothing").
Now, test-revset.t tests that any revset predicate isn't loaded at
failure of loading extension, because loading itself is executed by
dispatch and it can't be controlled on extension side.
In an upcoming patch we'll have different behavior here for when 'merge
--force' is used as opposed to when other kinds of force operations are
performed, like rebases.
Previously, if the largefile was deleted at the time of a commit, the standin
was silently not updated and its current state (possibly garbage) was recorded.
The test makes it look like this is somewhat of an edge case, but the same thing
happens when an `hg revert` followed by `rm` changes the standin.
Aside from the second invocation of this in lfutil.updatestandinsbymatch()
(which is what triggers this test case), the three other uses are guarded by
dirstate checks for added or modified, or an existence check in the filesystem.
So aborting in lfutil.updatestandins() should be safe, and will avoid silent
skips in the future if this is used elsewhere.
The change in 6fce9a02f069 to handle a normal -> largefile switch was too
aggressive in preserving the original matcher names. If a largefile is
explicitly provided by the user, but only the standin exists in dirstate, then
only the standin can be committed.
There's still maybe an issue when the largefile is deleted outside of Mercurial:
$ rm large
$ hg ci -m "oops" large
large: The system cannot find the file specified
nothing changed
[1]
92117e4f6f8d improved merging of standin files referencing missing largefiles.
It did however not test or fix commits of such merges; it would abort.
To fix that, change copytostore to skip and warn about missing largefiles
with a message similar the one for failing get from remote filestores. (It
would perhaps in both cases be better to emit a more helpful warning like
"warning: standin file for large1 references 58e24f733a which can't be found in
the local store".)
To test this, make sure commit doesn't find the "missing" largefile in the global
usercache. For further testing, verify that update and status works as expected
after this.
This will also effectively backout 159c82dd6523.
Previous patch introduced 'revset.predicate' decorator to register
revset predicate function easily.
But it shouldn't be used in extension directly, because it registers
specified function immediately. Registration itself can't be restored,
even if extension loading fails after that.
Therefore, registration should be delayed until 'uisetup()' or so.
This patch uses 'extpredicate' decorator derived from 'delayregistrar'
to register predicate in extension easily.
This patch also tests whether 'registrar.delayregistrar' avoids
function registration if 'setup()' isn't invoked on it, because
'extpredicate' is the first user of it.
Once we get a matcher down into manifestmerge, we can make narrowhg
work more easily and potentially let manifest.match().diff() do less
work in manifestmerge.
The largefiles merge code (currently) does not handle change/delete conflicts.
So fall back to regular filemerge in that case.
Making this code handle change/delete conflicts is left as an exercise for the
future.
We're going to treat these conflicts similarly to merge conflicts, and this
change to the way we store things in memory makes future code a lot simpler.
'unexpected putlfile response: None' when an http error occurs is not very
helpful.
Instead, leave the handling of urllib2.HTTPError exceptions to other layers.
If the store somehow got corrupted, users could end up in weird situations that
were very hard to recover from or lead to propagation of the corruption.
Instead, spend the extra time checking the hash when copying to the working
directory. If it doesn't match, emit a warning, and don't put wrong content in
the working directory.
Commit of corresponding normal/largefiles pairs would only commit the standin.
That is usually fine, except if either the normal file or the standin is a
remove while the other is an add. In that case it would either give duplicate
colliding entries or lose the file.
Instead, commit both filenames if one of them is a remove.
Before, when merging revisions with missing largefiles, the missing largefiles
would be fetched as a part of the merge. If that failed (for example because
the main repository temporarily was unavailable), the largefile would be left
missing. However, the next commit would abort and (seemed to) fail when
markcommitted tried to mark the standin file as normal and thus had to hash the
largefile that didn't exist. (Actually, the commit would succeed but the
largefile update that follows right after the commit transaction would abort -
quite confusing.)
To fix that, make sure that synclfdirstate only marks files as normal if they
actually exist.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
This regressed in f07f4c45a8f2, when trying to conditionally disable archiving
of largefiles.
I'm not sure if wrapfunction() is the right way to do this, but it seems to
work. The mysterious issue with lfstatus getting out of sync in the proxy and
the unfiltered view crops up again here. See the referenced cset for more info.
Previously, simply having the largefiles extension loaded without any largefiles
added would crash when amending with -I. The problem was with no files in the
matcher, the pattern list of files joined with 'standindir' was empty, and
scmutil.match() would match everything. In lfutil.composestandinmatcher(), the
match function is used to test if the file is a standin, and after getting a
false positive, proceeds to call lfutil.splitstandin(). This returns None
because it isn't a standin, which blows up when passed to rmatcher.matchfn().
Manually overriding _always in getstandinmatcher() probably isn't necessary
anymore, but we leave well enough alone on stable. This regressed in
78632d61a993.
There are currently no users of this, but it is a necessary step before
converting extdiff to use archive. It may be useful to add an argument to
extdiff in the future and allow largefiles to be diffed, but archiving
largefiles can have significant overhead and may not be very diffable, so
archiving them by default seems wrong.
It is a mystery to me why the lfstatus attribute needs to be set on the
unfiltered repo. However if it is set on the filtered repo instead (and the
filtered repo is passed to the original command), the lfstatus attribute is
False in the overrides for archival.archive() and hgsubrepo.archive() when
invoking the archive command. This smells like the buggy status behavior (see
9fc565fa1621, which was reverted in e1574a4a2cad). Neither the status nor
summary commands have this weird behavior in their respective overrides.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
Python 2.6 introduced a new octal syntax: "0oXXX", replacing "0XXX". The
old syntax is not recognized in Python 3 and will result in a parse
error.
Mass rewrite all instances of the old octal syntax to the new syntax.
This patch was generated by `2to3 -f numliterals -w -n .` and the diff
was selectively recorded to exclude changes to "<N>l" syntax conversion,
which will be handled separately.
This is a step toward replacing the extdiff internals with archive, in order to
support 'extdiff -S'. Only Mercurial subrepos are supported for now.
If a file is missing from the filesystem, it is silently skipped. Perhaps it
should warn, but it cannot abort when working with extdiff because deleting a
file is a legitimate diff.
Previously, if there were any hidden changesets, the --lfa argument would cause
the command to abort with a hint about using --hidden when it tripped over a
hidden changeset.
The monkey patching in cat() can't be fixed, because it still delegates to the
original bad(). Overriding commands.cat() should go away in favor overriding
cmdutil.cat() anyway, and that matcher can be wrapped with matchmod.badmatch().
No known issues with the previous code since it restored the original method,
but this is cleaner.
The monkey patching in cat is harmless, because it is created locally, and
doesn't pass it anywhere (subrepo cat isn't supported with largefiles).
The logic in the convert extension is more advanced, supporting extra features
like converting revision IDs in 'extras' (e.g. 'amend_source'), supports
updating hashes in commit messages, and outputs an SHA map file. Rather than
try to duplicate all of that, just use the existing code.
Even though the convert extension supports user supplied options like filemap,
etc, those features aren't available on the lfconvert interface. Therefore, it
is safe to use the filemap mechanism (in memory) to handle the standin -> file
rename. The convert extension handles the destination locking for this path.
There was a comment in test-lfconvert.t about the hash on rev 5 being different
because it was doing a better job than "hg remove" + "hg merge" + "hg commit".
It isn't clear to me what was happening or why, but now the hashes match the
original repo exactly after a roundtrip, which seems like a good idea. If there
really was something beneficial about the previous behavior, perhaps merge can
be changed so that everyone benefits.
Converting to a largefiles repo still uses the original (limited) lfconvert
logic.
The choice between the "always" case and the other case is done in
getstandinmatcher() and the next patch will change how it's determined
based on the matcher, so let's prepare by passing in the matcher, not
just the matcher's files.
Extension authors (notably at companies using hg) have been
cargo-culting the `testedwith = 'internal'` bit from hg's own
extensions, which then defeats our "file bugs over here" logic in
dispatch. Let's be more aggressive about trying to give extension
authors a hint about what testedwith should say.
This is related to df069d92be27, but not required for that to function correctly
because 'prefix' always ends with '/', so os.path.join(prefix, _path) simply
concatenates the strings. Take the opportunity to drop the os.path usage, and
avoid confusion by using the same pattern as the overridden subrepo method.
The fileset-generated.t test previously failed with this:
+ hg: parse error: unknown identifier: .hglf/modified
+ (did you mean 'modified'?)
+ [255]
Filesets will find the standins on their own, without any help. While that's
useful for some things like modified(), clean(), etc., it is wrong for things
like size(). Proper fileset support for largefiles is not trivial, but this was
failing with just the extension enabled on a normal repo.
The immediate crash was when checking for requirements immediately after this,
but lfcommands.downloadlfiles() will also crash if --all-largefiles is
specified. That has been in place since atleast e01343f7da6f (2.3-rc) without
anyone noticing.
I can't tell from the peer classes if there's a way to make the custom largefile
functionality work in this case, but atleast it doesn't crash.
mergeupdate already set the flag to update all. This will thus only change
overriderevert and scmutilmarktouched ... where the flag effectually also were
true. The test coverage thus shows no change.
As the flag always is set, it is removed.
This is mainly a change for keeping the code simple and consistent and correct,
but it should also make it faster in many cases.
Before, a --clean update with largefiles would use the "optimization" that it
didn't read hashes from standin files before and after the update. Instead of
trusting the content of the standin files, it would rehash all the actual
largefiles that lfdirstate reported clean and update the standins that didn't
have the expected content. It could thus in some "impossible" situations
automatically recover from some "largefile got out sync with its standin"
issues (even there apparently still were weird corner cases where it could
fail). This extra checking is similar to what core --clean intentionally do
not do, and it made update --clean unbearable slow.
Usually in core Mercurial, --clean will rely on the dirstate to find the files
it should update. (It is thus intentionally possible (when trying to trick the
system or if there should be bugs) to end up in situations where --clean not
will restore the working directory content correctly.) Checking every file when
we "know" it is ok is however not an option - that would be too slow.
Instead, trust the content of the standin files. Use the same logic for --clean
as for linear updates and trust the dirstate and that our "logic" will keep
them in sync. It is much cheaper to just rehash the largefiles reported dirty
by a status walk and read all standins than to hash largefiles.
Most of the changes are just a change of indentation now when the different
kinds of updates no longer are handled that differently. Standins for added
files are however only written when doing a normal update, while deleted and
removed files only will be updated for --clean updates.
This allows passing a matcher down the pathcopies() stack to _forwardcopies().
This will let us add logic in a later patch to avoid tracing copies when not
necessary (like when doing hg diff -r 1 -r 2 foo.txt).
This makes treemanifest.walk() not visit submanifests that are known not to
have any matching files. It does this by calling match.visitdir() on
submanifests as it walks.
This change also updates largefiles to be able to work with this new behavior
in treemanifests. It overrides match.visitdir(), the function that dictates
how walk() and matches() skip over directories.
The greatest speed improvements are seen with narrower scopes. For example,
this commit speeds up the following command on the Mozilla repo from 1.14s
to 1.02s:
hg files -r . dom/apps/
Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s.
As with similar a similar optimization to treemanifest.matches(), this change
will bring out even bigger performance improvements once treemanifests are
loaded lazily. Once that happens, we won't just skip over looking at
submanifests, but we'll skip even loading them.
The benefit of retargeting the local store to the share source is that all
shares will always have access to the largefiles any one of them commit, even if
the user cache is deleted (which is documented to be OK to do). Further, any
push into the source (and now any shares), will likewise make the largefile(s)
visible to all related repositories.
In order to maintain compatibility with existing repos, where the largefiles
would be cached only in the local share, fallback to searching the local share
if it isn't found at the share source.
The unshare command should probably be taught to copy the source store into the
store for the repo being unshared to complete the loop.
This patch changes the test like this:
@@ -159,6 +159,5 @@
$ hg share -q src share_dst --config extensions.share=
$ hg -R share_dst update -r0
getting changed largefiles
- large: largefile $HASH not available from file:///$TESTTMP\share_dst
- 0 largefiles updated, 0 removed
+ 1 largefiles updated, 0 removed
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
The issue writeup mentions pushing a largefile from a remote repo to the main
local repo, and the largefile is then not available in any shares. Since the
push doesn't cache the largefile in $USERCACHE, the trashed $USERCACHE in this
test is equivalent.
The handful of direct uses of lfutil.storepath() merely need a single path to
read from or write to the largefile, whether or not it exists. Most callers
that care about the file existing call lfutil.findfile(), in order to fallback
from the store to the user cache.
localstore._verify() doesn't call lfutil.findfile(). This prevents redirecting
the store to the share source because the largefiles for existing repos may not
be in the source's store, so verification may fail. It can't be changed to call
findfile(), because findfile() links the file from the usercache to the local
store[1], and because it returns None instead of a path if the file doesn't
exist.
For now, this method is just a cover for lfutil.storepath(), but it will be
filled out in an upcoming patch.
[1] Maybe we shouldn't care? But on a filesystem that doesn't support
hardlinks, then verify will take a lot longer, and start to consume disk
space.
Now, "overrideupdate()" wrapping "hg update" is useless, because
"workingctx.dirty() and raising Abort" in "hg update" was replaced by
"cmdutil.bailifchanged()" in the previous patch, and the latter can
detect changes of largefiles in the working directory.
In "commands.update()", "cmdutil.bailifchanged()" isn't used for
"abort if the working directory is dirty", because it forcibly
examines about merging in progress.
"workingctx.dirty()" used in "commands.update()" can't detect changes
of largefiles in the working directory without "repo.lfstatus = True"
wrapping. This is only reason of "commands.update()" wrapping by
largefiles extension.
On the other hand, "cmdutil.bailifchanged()" already wrapped by
largefiles extension can detect changes of largefiles.
This patch is a preparations for replacing "workingctx.dirty()" and
raising Abort in "commands.update()" by "cmdutil.bailifchanged()". It
can remove redundant "commands.update()" wrapping.
As the failing revert tests in test-fileset-generated.t show,
Revert currently creates one matcher for matching files in the working
copy and another matcher for matching files in the target
revision. The two matchers are created with different contexts, which
means filesets are evaluated differently. Then the union of the sets
of files matching the matchers in the two contexts are reverted. It
doesn't seem to make sense to use two different matchers; only the
context they're applied to should be different.
It seems very likely that the user wants the filesets to be evaluated
against the working directory, which the tests
test-fileset-generated.t also assume, so let's make it so.
I willingly admit that the largefiles code was modified by trial and
error (according to tests).
By overriding the cmdutil method we don't need to override both the
function and the command. Also, we get access to the 'ctx' and
'parents' variables, which will soon prove useful.
Rename the 'ctx' argument to overridematch() to 'mctx' rather than
letting it shadow new 'ctx'.
Commit 359818e9961c (largefiles: don't create chain of __contains__
calls, 2015-03-11) introduced a typo with __class instead of
__class__. Let's fix it.
There are (obviously) no tests covering this code path, and I could
not figure out a way to trigger it, so it remains untested.
Spotted by Drew Gottlieb.