Replaces invocations of os.path functions with vfs methods. Unfortunately
(in my view) this makes the code less readable, because instead of using
clearly named path variables it has to wrap them in vfs(..) calls.
I need guidance on how to make such a transition more readable.
For example, this patch has a few places that use
wvfs.join(standindir). Before this patch standindir was an absolute
path; it is now relative because it is also used in the expression
wvfs.join(standindir, pat).
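
The relative-versus-absolute point can be illustrated with a small stand-in. This is only a sketch: MiniVfs is a hypothetical class that mimics the one join() behaviour relevant here, not Mercurial's vfs implementation, and the base path and file name are made up for illustration.

```python
import posixpath


class MiniVfs:
    """Hypothetical stand-in for a vfs: joins relative paths onto a base."""

    def __init__(self, base):
        self.base = base

    def join(self, *parts):
        # Relative components are resolved against the vfs base directory.
        return posixpath.join(self.base, *parts)


wvfs = MiniVfs("/repo")
standindir = ".hglf"  # relative, so it can be reused in both calls below

print(wvfs.join(standindir))             # directory holding the standins
print(wvfs.join(standindir, "big.bin"))  # a single standin below it
```

Keeping standindir relative means the same variable works as the sole argument and as the first of several, at the cost of the name no longer saying "this is a full path".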
Using the plural form is consistent with other progress units, and "1 out of 5
revisions" sounds more correct. Also, tests don't show this, but if you have
the 'speed' item in the progress.format config, it shows e.g. '100 revisions/sec',
which also seems better.
Previously, if the largefile was deleted at the time of a commit, the standin
was silently not updated and its current state (possibly garbage) was recorded.
The test makes it look like this is somewhat of an edge case, but the same thing
happens when an `hg revert` followed by `rm` changes the standin.
Aside from the second invocation of this in lfutil.updatestandinsbymatch()
(which is what triggers this test case), the three other uses are guarded by
dirstate checks for added or modified, or an existence check in the filesystem.
So aborting in lfutil.updatestandins() should be safe, and will avoid silent
skips in the future if this is used elsewhere.
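
A minimal sketch of the "abort instead of silently skipping" behaviour, under stated assumptions: Abort, update_standin, the dict used for standins and the error message are all hypothetical names for illustration, not the extension's code.

```python
import os


class Abort(Exception):
    """Hypothetical stand-in for an abort-style error."""


def update_standin(standins, lfile, hash_file, exists=os.path.exists):
    """Record the hash of `lfile` in `standins`, aborting if it is missing."""
    if not exists(lfile):
        # Fail loudly instead of silently keeping a stale standin.
        raise Abort("%s: largefile is missing" % lfile)
    standins[lfile] = hash_file(lfile)
```

Callers that already check the dirstate or the filesystem never hit the Abort; any new caller that forgets the check gets an error rather than a stale standin.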
The change in 6fce9a02f069 to handle a normal -> largefile switch was too
aggressive in preserving the original matcher names. If a largefile is
explicitly provided by the user, but only the standin exists in dirstate, then
only the standin can be committed.
There's still maybe an issue when the largefile is deleted outside of Mercurial:
  $ rm large
  $ hg ci -m "oops" large
  large: The system cannot find the file specified
  nothing changed
  [1]
92117e4f6f8d improved merging of standin files referencing missing largefiles.
It did however not test or fix commits of such merges; it would abort.
To fix that, change copytostore to skip and warn about missing largefiles
with a message similar to the one for a failing get from remote filestores. (It
would perhaps in both cases be better to emit a more helpful warning like
"warning: standin file for large1 references 58e24f733a which can't be found in
the local store".)
To test this, make sure commit doesn't find the "missing" largefile in the global
usercache. For further testing, verify that update and status work as expected
after this.
This will also effectively back out 159c82dd6523.
If the store somehow got corrupted, users could end up in weird situations that
were very hard to recover from or lead to propagation of the corruption.
Instead, spend the extra time checking the hash when copying to the working
directory. If it doesn't match, emit a warning, and don't put wrong content in
the working directory.
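
The check described above could look roughly like the following. It is a sketch, assuming the expected hash is a SHA-1 hex digest; sha1_of, copy_from_store_checked and the warning text are hypothetical and do not reproduce the extension's functions or messages.

```python
import hashlib
import shutil


def sha1_of(path):
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(128 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_from_store_checked(storepath, destpath, expected, warn=print):
    """Copy storepath to destpath only if its content matches `expected`."""
    actual = sha1_of(storepath)
    if actual != expected:
        # Corrupt store entry: warn and leave the destination untouched.
        warn("warning: %s has hash %s, expected %s"
             % (storepath, actual, expected))
        return False
    shutil.copyfile(storepath, destpath)
    return True
```

Hashing before copying reads the file twice, which is the "extra time" trade-off: slower, but corrupt content never reaches the working directory.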
Commit of corresponding normal/largefiles pairs would only commit the standin.
That is usually fine, except if either the normal file or the standin is a
remove while the other is an add. In that case it would either give duplicate
colliding entries or lose the file.
Instead, commit both filenames if one of them is a remove.
Before, when merging revisions with missing largefiles, the missing largefiles
would be fetched as a part of the merge. If that failed (for example because
the main repository temporarily was unavailable), the largefile would be left
missing. However, the next commit would abort and (seemed to) fail when
markcommitted tried to mark the standin file as normal and thus had to hash the
largefile that didn't exist. (Actually, the commit would succeed but the
largefile update that follows right after the commit transaction would abort -
quite confusing.)
To fix that, make sure that synclfdirstate only marks files as normal if they
actually exist.
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
Previously, simply having the largefiles extension loaded without any largefiles
added would crash when amending with -I. The problem was that with no files in
the matcher, the pattern list of files joined with 'standindir' was empty, and
scmutil.match() would match everything. In lfutil.composestandinmatcher(), the
match function is used to test if the file is a standin, and after getting a
false positive, the code proceeds to call lfutil.splitstandin(). This returns
None because the file isn't a standin, which blows up when passed to
rmatcher.matchfn().
Manually overriding _always in getstandinmatcher() probably isn't necessary
anymore, but we leave well enough alone on stable. This regressed in
78632d61a993.
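
The failure mode can be reproduced in miniature. In this sketch make_matcher is a hypothetical factory that, like the behaviour described above, treats an empty pattern list as "match everything"; it is not scmutil.match(). The guard in standin_matcher is one way to handle the empty case, not necessarily the fix this change applies.

```python
def make_matcher(patterns):
    """Hypothetical matcher factory: no patterns means 'match everything'."""
    if not patterns:
        return lambda path: True
    wanted = set(patterns)
    return lambda path: path in wanted


def standin_matcher(files, standindir=".hglf"):
    """Build a matcher for the standins of `files`, guarding the empty case."""
    patterns = [standindir + "/" + f for f in files]
    if not patterns:
        # Without this guard the empty list would fall through to
        # "match everything" and report normal files as standins.
        return lambda path: False
    return make_matcher(patterns)
```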
The monkey patching in cat() can't be fixed, because it still delegates to the
original bad(). Overriding commands.cat() should go away in favor of overriding
cmdutil.cat() anyway, and that matcher can be wrapped with matchmod.badmatch().
The choice between the "always" case and the other case is done in
getstandinmatcher() and the next patch will change how it's determined
based on the matcher, so let's prepare by passing in the matcher, not
just the matcher's files.
The benefit of retargeting the local store to the share source is that all
shares will always have access to the largefiles any one of them commits, even if
the user cache is deleted (which is documented to be OK to do). Further, any
push into the source (and now any shares) will likewise make the largefile(s)
visible to all related repositories.
In order to maintain compatibility with existing repos, where the largefiles
would be cached only in the local share, fall back to searching the local share
if the largefile isn't found at the share source.
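
The lookup order described above (share source first, then the local share) can be sketched as an ordered search. find_largefile and the directory names in the usage note are hypothetical; the real store layout may differ.

```python
import os


def find_largefile(lfhash, store_dirs, exists=os.path.exists):
    """Return the first existing path for `lfhash`, searching in order."""
    for store in store_dirs:
        candidate = os.path.join(store, lfhash)
        if exists(candidate):
            return candidate
    # Not in any store: the caller decides what to do (fetch, warn, ...).
    return None
```

A caller would pass something like [source_store, local_share_store], so existing repos whose largefiles live only in the share keep working.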
The unshare command should probably be taught to copy the source store into the
store for the repo being unshared to complete the loop.
This patch changes the test like this:
  @@ -159,6 +159,5 @@
     $ hg share -q src share_dst --config extensions.share=
     $ hg -R share_dst update -r0
     getting changed largefiles
  -  large: largefile $HASH not available from file:///$TESTTMP\share_dst
  -  0 largefiles updated, 0 removed
  +  1 largefiles updated, 0 removed
     1 files updated, 0 files merged, 0 files removed, 0 files unresolved
The issue writeup mentions pushing a largefile from a remote repo to the main
local repo, and the largefile is then not available in any shares. Since the
push doesn't cache the largefile in $USERCACHE, the trashed $USERCACHE in this
test is equivalent.
The handful of direct uses of lfutil.storepath() merely need a single path to
read from or write to the largefile, whether or not it exists. Most callers
that care about the file existing call lfutil.findfile(), in order to fall back
from the store to the user cache.
localstore._verify() doesn't call lfutil.findfile(). This prevents redirecting
the store to the share source because the largefiles for existing repos may not
be in the source's store, so verification may fail. It can't be changed to call
findfile(), because findfile() links the file from the usercache to the local
store[1], and because it returns None instead of a path if the file doesn't
exist.
For now, this method is just a cover for lfutil.storepath(), but it will be
filled out in an upcoming patch.
[1] Maybe we shouldn't care? But on a filesystem that doesn't support
hardlinks, then verify will take a lot longer, and start to consume disk
space.
Even if the largefiles extension is enabled in a repository, a "repo"
object that has not been through "largefiles.reposetup()" is
unexpectedly passed to overridden functions in the cases below, because
extensions are enabled strictly per repository.
(1) clone without -U:
(2) pull with -U:
(3) pull with --rebase:
The combination of "enabled@src", "disabled@dst" and
"not-required@src" causes this situation.
  largefiles        requirement
  @src     @dst     @src            result
  -------- -------- --------------- --------------------
  enabled  disabled not-required    aborted unexpectedly
                    required        requirement error (intentional)
  -------- -------- --------------- --------------------
  enabled  enabled  *               success
  -------- -------- --------------- --------------------
  disabled enabled  *               success (only for "pull")
  -------- -------- --------------- --------------------
  disabled disabled not-required    success
                    required        requirement error (intentional)
  -------- -------- --------------- --------------------
(4) update/revert with a subrepo disabling largefiles
In these cases, the overridden functions access largefiles-specific
fields of a "repo" object that has not been through
"largefiles.reposetup()", and execution is aborted.
- (1), (2) and (4) access "_lfstatuswriters" in
  "getstatuswriter()" invoked via "updatelfiles()"
- (3) accesses "_lfcommithooks" in "overriderebase()"
To access these fields safely, this patch checks whether the passed
"repo" object has been through "largefiles.reposetup()" before
accessing them.
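
A guard of this kind can be sketched with getattr() and a default. The two repo classes and get_status_writer below are hypothetical; only the attribute names "_largefilesenabled" and "_lfstatuswriters" come from this message.

```python
class PlainRepo:
    """Hypothetical repo object that never went through reposetup()."""


class LargefilesRepo:
    """Hypothetical repo object marked during reposetup()."""

    _largefilesenabled = True

    def __init__(self):
        self._lfstatuswriters = [print]


def get_status_writer(repo):
    """Return the largefiles status writer, or None for a plain repo."""
    if not getattr(repo, "_largefilesenabled", False):
        # Not set up by the extension: the largefiles fields don't exist.
        return None
    return repo._lfstatuswriters[-1]
```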
This patch checks for the existence of the newly introduced
"_largefilesenabled" instead of checking "_lfcommithooks" and
"_lfstatuswriters" directly, because the former is a better name for
a generic "largefiles is enabled in this repo" mark than the latter.
In the future, all other overridden functions should avoid
largefiles-specific processing for efficiency, and "_largefilesenabled"
is also better for that purpose.
BTW, "lfstatus" can't be used for this purpose, because some code
paths set it forcibly regardless of whether it exists in the specified
"repo" object.
The function only adds the hash content of the file to the set to upload if the
file in the ctx is a standin. It is called by overrides.summaryremotehook(),
which is called in the summary method. The largefiles extension switches
'lfstatus' on in summary, so the standins shouldn't be visible when obtaining a
context there.
The reason this wasn't noticed before is that the 'lfstatus' attribute is only
being set on the unfiltered repo because of how repoview delegates attribute
assignment. Therefore any filtered view will return a context containing
standins, whether or not 'lfstatus' was set in the various overrides methods.
That will be fixed in the next patch. But without this change, the next patch
would have test failures for 'summary --large' stating there are no files to
upload.
Before this patch, during "hg convert", largefiles avoids copying
largefiles in the working directory into the store area by a combination
of setting "repo._isconverting" in "mercurialsink{before|after}" and
checking it in "copytostoreabsolute".
This is needed during "hg convert", because converting doesn't
update largefiles in the working directory.
But this implementation is not efficient, because:
- invocation in "markcommitted" can easily ensure that largefiles in
  the working directory are up to date
  "markcommitted" is invoked only when a new revision is committed via
  "commit" of "localrepository" (= with files in the working
  directory). On the other hand, "commitctx" may be invoked directly
  for in-memory committing.
- committing without updating the working directory (e.g. "import
  --bypass") also needs the same kind of avoidance
To make this avoidance efficient, this patch:
- moves the "copyalltostore" invocation into "markcommitted"
- removes the procedures below, which are now meaningless:
  - hooking "mercurialsink{before|after}" to (un)set "repo._isconverting"
  - checking "repo._isconverting" in "copytostoreabsolute"
This patch also invokes "copyalltostore" in "_commitcontext", because
"_commitcontext" expects that largefiles in the working directory are
copied into the store area after "commitctx". In this case, the working
directory is used as a kind of temporary area to write largefiles out,
even though converted revisions are committed via "commitctx" (without
updating normal files).
Before this patch, "hg transplant --continue" may record incorrect
standins, because the largefiles extension always avoids updating standins
while transplanting, even though largefiles in the working directory
may be modified manually at the 1st commit of "hg transplant --continue".
On the other hand, updating standins should be avoided at
subsequent commits for efficiency reasons.
To update standins only at the 1st commit of "hg transplant
--continue", this patch uses "automatedcommithook", which updates
standins by "lfutil.updatestandinsbymatch()" only at the 1st commit of
resuming.
Even after this patch, "repo._istransplanting = True" is still needed
to avoid some status reports while updating largefiles in
"lfcommands.updatelfiles()".
This is the reason why this patch removes not "repo._istransplanting = True"
in "overriderebase" but the check of "getattr(repo,
"_istransplanting", False)" in "updatestandinsbymatch".
At "hg transplant --merge REV", largefiles newly coming from the 2nd
parent (= REV) are marked as "a"(dded) by "patch.patch()", and have to
be marked as "n"(ormal) after commit.
But until changeset 978713c45992, such largefiles were unexpectedly
still marked as "a" even after commit, because no additional entry is
added to the filelog of such largefiles and they aren't listed in
"repo[newnode].files()" in this case: "newnode" is one of the newly
committed changesets (= result of "repo.commit()").
The "updatelfiles" invocation in "overridetransplant" masks this problem
by forcibly synchronizing lfdirstate to dirstate.
Now, the "updatelfiles" invocation in "overridetransplant" is redundant,
because changeset 978713c45992 made "markcommitted" use "ctx.files()"
to get targets of "synclfdirstate" instead of "repo[newnode].files()".
"lfutil.getstatuswriter" is the utility to get appropriate function to
write largefiles-specific status out from "repo._lfstatuswriters".
This patch uses a "stack" that always holds at least one element instead
of a flag like "_isXXXXing" or so, because:
- the former works correctly even when customizations are nested, and
- guaranteeing at least one element makes an emptiness check unnecessary
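
The two points above can be seen in a short sketch. Repo and with_writer are hypothetical helpers for illustration; only the attribute name "_lfstatuswriters" and the idea of reading the top of the stack come from this message.

```python
class Repo:
    """Hypothetical repo holding a stack of status writers."""

    def __init__(self, default_writer):
        # The stack always holds at least the default writer, so
        # readers can take the top without checking for emptiness.
        self._lfstatuswriters = [default_writer]


def get_status_writer(repo):
    """Return the writer currently on top of the stack."""
    return repo._lfstatuswriters[-1]


def with_writer(repo, writer, func):
    """Run func() with `writer` on top of the stack, then restore."""
    repo._lfstatuswriters.append(writer)
    try:
        return func()
    finally:
        repo._lfstatuswriters.pop()
```

With a boolean flag, the inner of two nested customizations would reset the flag on exit and the outer one would lose its setting; with push and pop each level restores exactly what it found.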
Before this patch, "hg rebase --continue" may record incorrect
standins, because the largefiles extension always avoids updating standins
while rebasing, even though largefiles in the working directory may be
modified manually at the 1st commit of "hg rebase --continue".
On the other hand, updating standins should be avoided at
subsequent commits for efficiency reasons.
To update standins only at the 1st commit of "hg rebase --continue",
this patch introduces a stateful callable object,
"automatedcommithook", which updates standins by
"lfutil.updatestandinsbymatch()" only at the 1st commit of resuming.
Even after this patch, "repo._isrebasing = True" is still needed to
avoid some status reports while updating largefiles in
"lfcommands.updatelfiles()".
This is the reason why this patch removes not "repo._isrebasing = True" in
"overriderebase" but the check of "getattr(repo, "_isrebasing",
False)" in "updatestandinsbymatch".
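
A stateful callable of the kind described here can be sketched as follows. This is an illustration under assumptions, not the patch's code: the class name is respelled, and the update function is passed in as a parameter (update_standins) so the example is self-contained.

```python
class AutomatedCommitHook:
    """Hypothetical stateful hook: updates standins on the first call only."""

    def __init__(self, resuming, update_standins):
        self.resuming = resuming
        self.update_standins = update_standins

    def __call__(self, repo, match):
        if self.resuming:
            # Flip the flag first so later commits skip the update.
            self.resuming = False
            return self.update_standins(repo, match)
        return match
```

Because the state lives in the object, one hook instance can be installed for a whole "--continue" run and behaves differently on the first commit than on the rest.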
This patch factors out the procedures that update standins before
committing. This is one of the preparations for skipping such
procedures depending on the invocation context.
For example, resuming automated committing (e.g. "hg rebase
--continue") should update standins at the 1st commit, because
largefiles in the working directory may be modified manually. On
the other hand, it should avoid updating standins at subsequent
commits for efficiency reasons.
For simplicity, this patch just moves the procedures mechanically, with
only the replacements below.
- "self" => "repo"
- "lfutil." => (none)
- "orig" invocation => returning "match"
Using "fstandin" instead of "standin" as the name of the local variable for
the loop below is the only special care, because the latter shadows
the function of the same name in "lfutil.py".
[before]
    for standin in standins:
        lfile = lfutil.splitstandin(standin)
        if lfdirstate[lfile] != 'r':
            lfutil.updatestandin(self, standin)
[after]
    for fstandin in standins:
        lfile = splitstandin(fstandin)
        if lfdirstate[lfile] != 'r':
            updatestandin(repo, fstandin)
Before this patch, the procedures that update lfdirstate after
committing are scattered in "lfilesrepo.commit". In the case of "hg commit" with
patterns for target files ("Case 2"), lfdirstate is updated BEFORE
the real commit.
This patch factors out the procedures that update lfdirstate after
committing into "lfutil.markcommitted", and makes it callable via
"markcommitted" of the context passed to "lfilesrepo.commitctx".
"markcommitted" of the context is called only when it is committed
successfully.
Passing the original "markcommitted" of the context is meaningless in this
patch, but is required in a subsequent one to prepare something before
invoking it.
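
The wrapping described above might be pictured like this. Everything in the sketch is hypothetical (Ctx, lf_markcommitted, wrap_markcommitted and the log entries); it only shows the shape of routing the context's "markcommitted" through a helper that receives the original as its first argument.

```python
class Ctx:
    """Hypothetical commit context with a markcommitted() method."""

    def __init__(self, log):
        self.log = log

    def markcommitted(self, node):
        self.log.append(("orig", node))


def lf_markcommitted(orig, ctx, node):
    """Run the original hook, then the largefiles post-commit work."""
    orig(node)
    ctx.log.append(("lfdirstate updated", node))


def wrap_markcommitted(ctx):
    """Route ctx.markcommitted through lf_markcommitted."""
    orig = ctx.markcommitted
    ctx.markcommitted = lambda node: lf_markcommitted(orig, ctx, node)
```

Since the post-commit work hangs off "markcommitted", it runs only after a successful commit, and having "orig" as a parameter leaves room to do something before calling it.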
In lfdirstatestatus(), the status tuple gets deconstructed, the lists
get updated, and then an identical status tuple gets created and
returned. Change it so we simply return the original tuple.
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
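
How a caller might consume such a pair is sketched below. Status is a hypothetical namedtuple with the seven usual status lists, and resolve_lookup with its is_modified callback is an assumption about how the "unsure" files get resolved, not the actual dirstate code.

```python
from collections import namedtuple

Status = namedtuple(
    "Status", "modified added removed deleted unknown ignored clean")


def resolve_lookup(lookup, status, is_modified):
    """Fold the 'unsure' files into modified or clean; return a new Status."""
    modified = list(status.modified)
    clean = list(status.clean)
    for f in lookup:
        # is_modified() stands in for an actual content comparison.
        (modified if is_modified(f) else clean).append(f)
    return status._replace(modified=modified, clean=clean)


# The pair shape: unsure files first, the ordinary status tuple second.
lookup, status = (["maybe.txt", "same.txt"],
                  Status(["m.txt"], [], [], [], [], [], ["c.txt"]))
resolved = resolve_lookup(lookup, status, lambda f: f == "maybe.txt")
```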
Previously, the directory '.hg/largefiles' would always be created if it didn't
exist when the lfdirstate was opened. If there were no standin files, no
dirstate file would be created in the directory. The end result was that
enabling the largefiles extension globally, but not explicitly adding a
largefile would result in the repository eventually sprouting this directory.
Creation of this directory effectively changes readonly operations like summary
and status into operations that require write access. Without write access,
commands that would succeed without the extension loaded would abort with a
surprising error when the extension is loaded, but not actively used:
  $ hg sum -R /tmp/thg --config extensions.largefiles=
  parent: 16541:00dc703d5aed
   repowidget: specify incoming bundle by plain file path to avoid url parsing
  branch: default
  abort: Permission denied: '/tmp/thg/.hg/largefiles'
This change is simpler than changing the callers of openlfdirstate() to use the
'create' parameter that was introduced in 74522122b97d, and probably how that
should have been implemented in the first place.
Before this patch, "overrides.getoutgoinglfiles()" (called by
"overrideoutgoing()" and "overridesummary()") and "lfilesrepo.push()"
each implement similar logic to get outgoing largefiles.
This patch centralizes the logic to get outgoing largefiles in
"lfutil.getlfilestoupload()".
"lfutil.getlfilestoupload()" takes an "addfunc" argument, because each
caller needs different information (and it is useful for future
enhancement).
- "overrides.getoutgoinglfiles()" needs only filenames
- "lfilesrepo.push()" needs only hashes of largefiles
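
The callback design can be sketched as below. The function name is respelled and its input is simplified to a hypothetical list of {filename: hash} dicts, one per outgoing changeset; the real function walks repository data and its exact signature is not shown in this message.

```python
def get_lfiles_to_upload(changesets, addfunc):
    """Call addfunc(filename, lfhash) for every outgoing largefile.

    `changesets` is a hypothetical iterable of {filename: hash} dicts.
    """
    for files in changesets:
        for fn, lfhash in files.items():
            addfunc(fn, lfhash)


outgoing = [{"a.bin": "h1"}, {"a.bin": "h2", "b.bin": "h3"}]

# A caller that only wants filenames.
names = set()
get_lfiles_to_upload(outgoing, lambda fn, lfhash: names.add(fn))

# A caller that only wants hashes.
hashes = set()
get_lfiles_to_upload(outgoing, lambda fn, lfhash: hashes.add(lfhash))
```

The traversal is written once, and each caller decides what to keep from the (filename, hash) pairs it is handed.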
This goes a step further than 974959d637b7 and backs out the unreleased
--cache-largefiles option. The same can be achieved with --lfrev heads(pulled()) and
we shouldn't introduce unnecessary command line options.
Looking for a (potentially empty) directory was not reliable - both because it
is a reasonable assumption that empty directories can be removed and because it
wasn't created in all cases ... such as when pulling to an existing repository.