This simplifies largefiles' overridecalculateupdates(), which no
longer has to do the conversion it started doing in 478d610ca1b0
(largefiles: rewrite merge code using dictionary with entry per file,
2014-12-09).
To keep this patch small, we'll leave the name 'actionbyfile' in
overrides.py. It will be renamed in the next patch.
The hack for method monkey patching on repoview has been ruled out as
fragile, so we are rolling it back. We'll expand the explanation in the next
changeset.
Changeset 5b64e22ecd8e introduced the examination of the exec bit of
largefiles in the "hg status --rev REV" case, but it doesn't skip that
examination on platforms unaware of the exec bit (e.g. NTFS on Windows).
The current state of subrepo methods is to pass a 'ui' object to some methods,
which has the effect of overriding the subrepo configuration since it is the
root repo's 'ui' that is passed along as deep as there are subrepos. Other
subrepo methods are *not* passed the root 'ui', and instead delegate to their
repo object's 'ui'. Even in the former case where the root 'ui' is available,
some methods are inconsistent in their use of both the root 'ui' and the local
repo's 'ui'. (Consider hg._incoming() uses the root 'ui' for path expansion
and some status messages, but also calls bundlerepo.getremotechanges(), which
eventually calls discovery.findcommonincoming(), which calls
setdiscovery.findcommonheads(), which calls status() on the local repo 'ui'.)
This inconsistency with respect to the configured output level is probably
always hidden, because --verbose, --debug and --quiet, along with their 'ui.xxx'
equivalents in the global and user level hgrc files are propagated from the
parent repo to the subrepo via 'baseui'. The 'ui.xxx' settings in the parent
repo hgrc file are not propagated, but that seems like an unusual thing to set
on a per repo config file. Any 'ui.xxx' options changed by --config are also
not propagated, because they are set on repo.ui by dispatch.py, not repo.baseui.
The goal here is to clean up the subrepo methods by dropping the 'ui' parameter,
which in turn prevents mixing subtly different 'ui' instances on a given subrepo
level. Some methods use more than just the output level settings in 'ui' (add
for example ends up calling scmutil.checkportabilityalert() with both the root
and local repo's 'ui' at different points). This series just goes for the low
hanging fruit and switches methods that only use the output level.
If we really care about not letting a subrepo config override the root repo's
output level, we can propagate the verbose, debug and quiet settings to the
subrepo in the same way 'ui.commitsubrepos' is in hgsubrepo.__init__.
Archive only uses the 'ui' object to call its progress() method, and gitsubrepo
calls status().
By moving the cd/dc prompts out of calculateupdates(), we let
largefiles' overridecalculateupdates() see the unresolved values
(i.e. 'cd' or 'dc' rather than 'g', 'r', 'a' and missing). This allows
overridecalculateupdates() to ask the user whether to keep the normal
file or the largefile before the user gets the cd/dc prompt. Whichever
answer the user gives, we make overridecalculateupdates() replace 'cd'
or 'dc' action, saving the user one annoying (and less clear)
question.
The recursive addremove operation occurs completely before the first subrepo is
committed. Only hg subrepos support the addremove operation at the moment; svn
and git subrepos will warn and abort the commit.
Instead of iterating over 'g' actions, first find the set of all files
that are largefiles in p1. Then iterate over these files. This
prepares for considering actions other than 'g'.
In overridecalculateupdates(), we currently only deal with conflicts
that result in a 'g' action for either the largefile or a standin. We
will soon want to deal with cases with 'cd' and 'dc' actions here. It will
be easier to reason about such cases if we rewrite it using a dict
from filename to action.
A side-effect of this change is that the output can only have one
action per file (which should be a good change). Before this change,
when one of the tests in test-issue3084 received this input (the 'a'
in the input was a result of 'cd' conflict resolved in favor of the
modified file):
'g': [('.hglf/f', ('',), 'remote created')],
'a': [('f', None, 'prompt keep')],
and the user chose to keep the local largefile, it produced this
output:
'g': [('.hglf/f', ('',), 'remote created')],
'r': [('f', None, 'replaced by standin')],
'a': [('f', None, 'prompt keep')],
Although 'a' actions are processed after 'r' actions by
recordupdates(), it still worked because 'a' actions have no effect on
merges (only on updates). After this change, the output is:
'g': [('.hglf/f', ('',), 'remote created')],
'r': [('f', None, 'replaced by standin')],
Similarly, there are several tests in test-largefiles-update that get
inputs like:
'a': [('.hglf/large2', None, 'prompt keep')],
'g': [('large2', ('',), 'remote created')],
and when the user chooses to keep the local largefile, they produce
this output:
'a': [('.hglf/large2', None, 'prompt keep'),
('.hglf/large2', None, 'keep standin')],
'lfmr': [('large2', None, 'forget non-standin largefile')],
In this case, it was not a merge but an update, so the 'a' action does
have an effect. However, since dirstate.add() is idempotent, it still
has no observable effect.
After this change, the output is:
'a': [('.hglf/large2', None, 'keep standin')],
'lfmr': [('large2', None, 'forget non-standin largefile')],
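A minimal sketch of the "dict from filename to action" idea, using the first test input quoted above. The helper names tobyfile() and tolists() are hypothetical and are not taken from overrides.py; the point is only that keying by filename leaves room for exactly one action per file.

```python
# Hypothetical helpers; the tuples mirror the (file, args, msg) entries above.
def tobyfile(actions):
    """Flatten {action: [(file, args, msg), ...]} into {file: (action, args, msg)}."""
    actionbyfile = {}
    for m, entries in actions.items():
        for f, args, msg in entries:
            actionbyfile[f] = (m, args, msg)
    return actionbyfile

def tolists(actionbyfile):
    """Rebuild the per-action lists from the per-file dict."""
    actions = {}
    for f, (m, args, msg) in sorted(actionbyfile.items()):
        actions.setdefault(m, []).append((f, args, msg))
    return actions

actions = {
    'g': [('.hglf/f', ('',), 'remote created')],
    'a': [('f', None, 'prompt keep')],
}
byfile = tobyfile(actions)
# Resolving the conflict overwrites the single entry for 'f',
# so the stale 'a' action cannot survive next to the new 'r' action.
byfile['f'] = ('r', None, 'replaced by standin')
result = tolists(byfile)
```

Because the assignment replaces the entry for 'f', the old 'a' action disappears without any explicit cleanup.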
The items we put in 'newglist' are always the same as what we found in
actions['g'], so let's just put the same item into the list instead of
creating a new one.
The action lists returned from calculateupdates() (in merge.py) are
not required to be sorted. In fact, since they result from iteration
over the unordered manifest, they are unlikely to be sorted. Moreover,
some of the lists are appended to after they are returned from
manifestmerge(). The lists are instead sorted in
applyupdates(). Therefore, let's not sort the lists generated in
largefiles' overridecalculateupdates().
When merging and the remote has turned a normal file into a largefile
and the user chooses to keep the local largefile, we use the 'r'
action for the remote largefile standin. This is wrong, since that
file does not exist in the parent of the working copy. Use 'k', which
does nothing but debug logging, instead.
When merging and the remote has turned a largefile into a normal file
and the user chooses to keep the local largefile, we use the 'r'
action for the remote normal file. This is wrong, since that file does
not exist in the parent of the working copy. Use 'k', which does
nothing but debug logging, instead.
In c69fe5519c86 (largefiles: don't show largefile/normal prompts if
one side is unchanged, 2014-12-01), overridecalculateupdates() started
checking for false modify/delete conflicts in large files and their
standins. Then, in the very next changeset, 99b29d2bd5ed (merge:
before cd/dc prompt, check that changed side really changed,
2014-12-01), calculateupdates() itself started checking for false
modify/delete conflicts in all files. Since "large files and their
standins" is a subset of "all files", we can now drop the checks in
overridecalculateupdates().
In overridecalculateupdates(), 'g' (get) actions may be converted into
other actions. In most of these cases, it does not make sense to keep
the action's message. For example, 'remote created' does not make
sense for an 'r' (remove) action.
The message in the action is used for debugging and should not be the
same as the question presented to the user. Use a different variable
for the user message, so the 'msg' variable already in scope does not
get overwritten.
The fetch extension has been calling cmdutil.bailifchanged() since 70b2d52341c9,
so this is redundant. Add test coverage to prevent regression. It doesn't look
like there is any testing for fetch with largefiles.
Refactoring addremove to support subrepos will need the ability to keep passing
the same matcher and narrowing it, instead of monkey patching scmutil's matcher.
The method has been called from commands.py since 8d9ca2ac2fe8
(update: just merge unknown file collisions, 2012-02-09), so drop the
underscore prefix that suggests that it's private.
Before this patch, during "hg convert", largefiles avoids copying
largefiles in the working directory into the store area by a combination
of setting "repo._isconverting" in "mercurialsink{before|after}" and
checking it in "copytostoreabsolute".
This avoidance is needed during "hg convert", because converting doesn't
update largefiles in the working directory.
But this implementation is not efficient, because:
- invocation in "markcommitted" can easily ensure updating
largefiles in the working directory
"markcommitted" is invoked only when new revision is committed via
"commit" of "localrepository" (= with files in the working
directory). On the other hand, "commitctx" may be invoked directly
for in-memory committing.
- committing without updating the working directory (e.g. "import
--bypass") also needs this kind of avoiding
To make this kind of avoidance efficient, this patch does:
- move "copyalltostore" invocation into "markcommitted"
- remove meaningless procedures below:
- hooking "mercurialsink{before|after}" to (un)set "repo._isconverting"
- checking "repo._isconverting" in "copytostoreabsolute"
This patch invokes "copyalltostore" also in "_commitcontext", because
"_commitcontext" expects that largefiles in the working directory are
copied into store area after "commitctx". In this case, the working
directory is used as a kind of temporary area to write largefiles out,
even though converted revisions are committed via "commitctx" (without
updating normal files).
Putting "lambda *msg, **opts: None" (= avoid printing messages always)
into "_lfstatuswriters" while transplanting makes explicit passing
"printmessage = False" for "updatelfiles()" useless.
This patch also removes setting/unsetting "repo._istransplanting" in
"overridetransplant", because there is no code path referring to it.
Before this patch, "hg transplant --continue" may record incorrect
standins, because largefiles extension always avoids updating standins
while transplanting, even though largefiles in the working directory
may be modified manually at the 1st commit of "hg transplant --continue".
But, on the other hand, updating standins should be avoided at
subsequent commits for efficiency reason.
To update standins only at the 1st commit of "hg transplant
--continue", this patch uses "automatedcommithook", which updates
standins by "lfutil.updatestandinsbymatch()" only at the 1st commit of
resuming.
Even after this patch, "repo._istransplanting = True" is still needed
to avoid some status report while updating largefiles in
"lfcommands.updatelfiles()".
This is the reason why this patch omits not "repo._istransplanting = True"
in "overriderebase" but the examination of "getattr(repo,
"_istransplanting", False)" in "updatestandinsbymatch".
At "hg transplant --merge REV", largefiles newly coming from the 2nd
parent (= REV) are marked as "a"(dded) by "patch.patch()", and have to
be marked as "n"(ormal) after commit.
But until changeset 978713c45992, such largefiles were still marked as
"a" unexpectedly even after commit, because no additional entry is
added to filelog of such largefiles and they aren't listed in
"repo[newnode].files()" in this case: "newnode" is one of newly
committed changeset (= result of "repo.commit()").
"updatelfiles" invocation in "overridetransplant" shadows this problem
by forcibly synchronizing lfdirstate to dirstate.
Now, "updatelfiles" invocation in "overridetransplant" is redundant,
because changeset 978713c45992 made "markcommitted" use "ctx.files()"
to get targets of "synclfdirstate" instead of "repo[newnode].files()".
Putting "lambda *msg, **opts: None" (= avoid printing messages always)
into "_lfstatuswriters" while rebasing makes explicit passing
"printmessage = False" for "updatelfiles()" useless.
This patch also removes setting/unsetting "repo._isrebasing" in
"overriderebase", because there is no code path referring to it.
This patch makes "updatelfiles()" get appropriate function to write
largefiles specific status messages via "getstatuswriter()".
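A rough sketch of how a three-valued "printmessage" could select a writer. The signature here is an assumption made for illustration (the default and context writers are passed in directly), not the actual signature of "getstatuswriter()".

```python
def getstatuswriter(defaultwriter, contextwriter, forcibly=None):
    """Pick a status writer (hypothetical signature).

    forcibly=True  -> always write
    forcibly=False -> always ignore
    forcibly=None  -> let the calling context decide
    """
    if forcibly is None:
        return contextwriter
    if forcibly:
        return defaultwriter
    return lambda *msg, **opts: None

out = []

def write(*msg, **opts):
    out.append(''.join(msg))

def silent(*msg, **opts):
    return None

getstatuswriter(write, silent, True)('forced\n')
getstatuswriter(write, silent, False)('dropped\n')
getstatuswriter(write, silent, None)('context decides\n')
```

In this run the context writer is the silent one, so only the forced message reaches "out".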
This patch introduces None as "print messages if needed", because True
(forcibly writing) and False (forcibly ignoring) are already used for
"printmessage" of "updatelfiles".
A subsequent patch will move "avoid printing messages only while
automated committing" decision from caller of "updatelfiles()" into
"getstatuswriter()".
"lfutil.getstatuswriter" is the utility to get appropriate function to
write largefiles specific status out from "repo._lfstatuswriters".
This patch uses a "stack" that always holds at least one element
instead of a flag like "_isXXXXing" or so, because:
- the former works correctly even when customizations are nested, and
- ensuring at least one element makes an emptiness check unnecessary
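The stack-versus-flag argument above can be sketched as follows. FakeRepo and the statuswriter() context manager are hypothetical stand-ins; only the attribute name "_lfstatuswriters" comes from the text.

```python
import contextlib

class FakeRepo:
    """Stand-in for a repository object; only the writer stack is modeled."""
    def __init__(self, defaultwriter):
        # The stack always holds at least the default writer,
        # so reading the top never needs an emptiness check.
        self._lfstatuswriters = [defaultwriter]

@contextlib.contextmanager
def statuswriter(repo, writer):
    """Push a writer for the duration of a block, then restore the previous one."""
    repo._lfstatuswriters.append(writer)
    try:
        yield
    finally:
        repo._lfstatuswriters.pop()

log = []
repo = FakeRepo(lambda *msg, **opts: log.append(('default', msg)))
quiet = lambda *msg, **opts: None
inner = lambda *msg, **opts: log.append(('inner', msg))

with statuswriter(repo, quiet):
    with statuswriter(repo, inner):
        repo._lfstatuswriters[-1]('a')   # nested customization wins
    repo._lfstatuswriters[-1]('b')       # back to the quiet writer
repo._lfstatuswriters[-1]('c')           # back to the default writer
```

A boolean flag would have been cleared by the inner block, leaving the outer block with the wrong behavior; popping the stack restores whatever was active before.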
Before this patch, "hg rebase --continue" may record incorrect
standins, because largefiles extension always avoids updating standins
while rebasing, even though largefiles in the working directory may be
modified manually at the 1st commit of "hg rebase --continue".
But, on the other hand, updating standins should be avoided at
subsequent commits for efficiency reason.
To update standins only at the 1st commit of "hg rebase --continue",
this patch introduces the stateful callable object
"automatedcommithook", which updates standins by
"lfutil.updatestandinsbymatch()" only at the 1st commit of resuming.
Even after this patch, "repo._isrebasing = True" is still needed to
avoid some status report while updating largefiles in
"lfcommands.updatelfiles()".
This is the reason why this patch omits not "repo._isrebasing = True" in
"overriderebase" but the examination of "getattr(repo, "_isrebasing",
False)" in "updatestandinsbymatch".
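A minimal sketch of a stateful callable in the spirit of "automatedcommithook". Passing the update function in as a constructor argument is an assumption made to keep the example self-contained; the text only says the hook calls "lfutil.updatestandinsbymatch()" at the first commit of resuming.

```python
class automatedcommithook:
    """Update standins only at the first commit after resuming."""

    def __init__(self, resuming, updatestandins):
        self.resuming = resuming
        self.updatestandins = updatestandins  # hypothetical injected callback

    def __call__(self, repo, match):
        if self.resuming:
            self.resuming = False  # subsequent commits skip the update
            return self.updatestandins(repo, match)
        return match

calls = []

def fakeupdate(repo, match):
    calls.append(match)
    return match

hook = automatedcommithook(True, fakeupdate)
first = hook(None, 'match-1')    # 1st commit of "--continue": standins updated
second = hook(None, 'match-2')   # later commits: passed through untouched
```

Keeping the state inside the callable means the caller does not need to track which commit of the resumed operation is running.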
This change allows customizing pre-committing procedures according
to conditions.
This patch uses a "stack" that always holds at least one element
instead of a flag like "_isXXXXing" or so, because:
- the former works correctly even when customizations are nested, and
- ensuring at least one element makes an emptiness check unnecessary
This patch factors out procedures to update standins for
pre-committing. This is one of preparations to avoid execution of such
procedures according to invocation context.
For example, resuming automated committing (e.g. "hg rebase
--continue") should update standins at the 1st commit, because
largefiles in the working directory may be modified manually. But on
the other hand, it should avoid updating standins at subsequent
committings for efficiency reason.
For simplicity, this patch just moves procedures mechanically only
with replacing below.
- "self" => "repo"
- "lfutil." => (none)
- "orig" invocation => returning "match"
Using "fstandin" instead of "standin" as the name of local variable for
the loop below is the only special care, because the latter shadows
the same name function in "lfutil.py".
[before]
    for standin in standins:
        lfile = lfutil.splitstandin(standin)
        if lfdirstate[lfile] != 'r':
            lfutil.updatestandin(self, standin)
[after]
    for fstandin in standins:
        lfile = splitstandin(fstandin)
        if lfdirstate[lfile] != 'r':
            updatestandin(repo, fstandin)
Before this patch, procedures to update lfdirstate for post-committing
are scattered in "lfilesrepo.commit". In the case of "hg commit" with
patterns for target files ("Case 2"), lfdirstate is updated BEFORE
real committing.
This patch factors out procedures to update lfdirstate for
post-committing into "lfutil.markcommitted", and makes it callable via
"markcommitted" of the context passed to "lfilesrepo.commitctx".
"markcommitted" of the context is called, only when it is committed
successfully.
Passing original "markcommitted" of the context is meaningless in this
patch, but required in subsequent one to prepare something before
invocation of it.
This patch removes "--rebase" specific code path for "hg pull" in
"overridepull", because previous patch makes it meaningless: now,
"rebase.rebase" ("orig" invocation in this patch) can
update/commit largefiles safely without "repo._isrebasing = True".
As a side effect of removing "rebase.rebase" invocation in
"overridepull", this patch removes "nothing to rebase ..." message in
"test-largefiles.t", which is shown only when rebase extension is
enabled AFTER largefiles:
before this patch:
1. "dispatch" invokes "pullrebase" of rebase as "hg pull" at
first, because rebase wraps "hg pull" later
2. "pullrebase" invokes "overridepull" of largefiles as "orig",
even though rebase assumes that "orig" is "pull" of commands
3. "overridepull" executes "pull" and "rebase" directly
3.1 "pull" pulls changesets and creates new head "X"
3.2 "rebase" rebases current working parent "Y" on "X"
4. "overridepull" returns to "pullrebase"
5. "pullrebase" tries to rebase, but there is nothing to be done,
because "Y" is already rebased on "X". then, it shows "nothing
to rebase ..."
after this patch:
1. "dispatch" invokes "pullrebase" of rebase as "hg pull"
2. "pullrebase" invokes "overridepull" of largefiles as "orig"
3. "overridepull" executes "pull" as "orig"
4. "overridepull" returns to "pullrebase"
5. revision "Y" is not yet rebased, so "pullrebase" doesn't show
"nothing to rebase ..."
As another side effect of removing "rebase.rebase" invocation, this
patch fixes issue3861, which occurs only when rebase extension is
enabled BEFORE largefiles:
before this patch:
1. "dispatch" invokes "overridepull" of largefiles at first,
because largefiles wraps "hg pull" later
2. "overridepull" executes "pull" and "rebase" explicitly
2.1 "pull" pulls changesets and creates new head "X"
2.2 "rebase" rebases current working parent, but fails because
no revision is checked out in issue3861 case
3. "overridepull" returns to "dispatch" with exit code 1 returned
from "rebase" at (2.2)
4. "hg pull" terminates with exit code 1 unexpectedly
after this patch:
1. "dispatch" invokes "overridepull" of largefiles at first
2. "overridepull" invokes "pullrebase" of rebase as "orig"
3. "pullrebase" invokes "pull" as "orig"
4. "pullrebase" invokes "rebase", and it fails
5. "pullrebase" returns to "overridepull" with exit code 0
(because "pullrebase" ignores result of "pull" and "rebase")
6. "overridepull" returns to "dispatch" with exit code 0 returned
from "rebase" at (5)
7. "hg pull" terminates with exit code 0
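The dependence on extension load order described in the two scenarios above can be reproduced with a toy wrapper chain. The wrapcommand() below is a simplified stand-in written for this sketch, not Mercurial's "extensions.wrapcommand"; the wrapper bodies only record the call order.

```python
def wrapcommand(table, name, wrapper):
    """Toy wrapper: replace table[name] so wrapper(orig, ...) runs first."""
    orig = table[name]

    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    table[name] = wrapped

def callorder(enableorder):
    """Return the order in which functions run for a given enabling order."""
    trace = []

    def pull():
        trace.append('pull')

    def pullrebase(orig):
        trace.append('pullrebase')
        orig()

    def overridepull(orig):
        trace.append('overridepull')
        orig()

    wrappers = {'rebase': pullrebase, 'largefiles': overridepull}
    table = {'pull': pull}
    for ext in enableorder:
        # The extension enabled last wraps last, so it is invoked first.
        wrapcommand(table, 'pull', wrappers[ext])
    table['pull']()
    return trace

rebaselater = callorder(['largefiles', 'rebase'])
largefileslater = callorder(['rebase', 'largefiles'])
```

Since each wrapper only sees "orig", neither extension can assume that "orig" is the plain "pull" of commands.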
Before this patch, largefiles extension wraps only "rebase" in the
command table by "extensions.wrapcommand". But there are some
functions using "rebase.rebase" directly.
Without special care for them, largefiles extension can't work
correctly with such functions. In addition to it, "special care" often
becomes complicated and awkward. For example:
- "unshelve" can't get correct result of "rebase.rebase", because of
lack of special care
- special care for "hg pull --rebase" causes issue3861
This patch wraps "rebase.rebase" for functions using it directly.
For simplicity, this patch keeps 'special care for "hg pull --rebase"'.
It is removed in the subsequent patch.
Instead of checking for a partial merge by checking that the matcher
has no files and no patterns, check that it's not an
always-matcher. Except for being shorter, it also catches the rare
case of an exact-matcher with no files.
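A small sketch of why the two checks differ. ToyMatcher is a hypothetical, heavily simplified model of a matcher that exposes files(), anypats() and always(); it is not the real matcher class.

```python
class ToyMatcher:
    """Hypothetical, simplified stand-in for a matcher object."""

    def __init__(self, files=(), patterns=(), exact=False):
        self._files = list(files)
        self._patterns = list(patterns)
        self._exact = exact

    def files(self):
        return self._files

    def anypats(self):
        return bool(self._patterns)

    def always(self):
        # An exact matcher with no files matches nothing, not everything.
        return not self._exact and not self._files and not self._patterns

def ispartial_old(m):
    """Old check: any files or any patterns."""
    return bool(m.files() or m.anypats())

def ispartial_new(m):
    """New check: anything that is not an always-matcher."""
    return not m.always()

everything = ToyMatcher()
emptyexact = ToyMatcher(exact=True)
somefiles = ToyMatcher(files=['a.txt'])
```

The two checks agree for "everything" and "somefiles"; only the empty exact matcher is classified differently.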
We currently shortcircuit the checking for large file standins if only
patterns of type 'path' are given on the command line. That makes e.g.
"hg st 'glob:foo/**'" unnecessarily slow when the only large files are
in a sibling directory.
Relax the check to be that it is not an always-matcher and that no
large files match the patterns given on the command line.
Note that before this change, only the latter of the following two
would show the status of files in .hglf (since the -I makes
match.anypats() true). After this change, they both display the
status. This behavior doesn't seem correct, but it would be a separate
change to explicitly filter out .hglf even in the shortcircuit case.
hg st .hglf/$file
hg st .hglf/$file -I .
In two very similar segments of code, an existing matcher is modified
by changing its _files attribute through a map and a filter
operation. Neither operation can cause an empty list to become
non-empty, so a matcher that always matches can not stop always
matching. Drop the setting of the attribute, so we don't unnecessarily
prevent the fast paths from being taken where these matchers end up
being used.
Before this patch, "hg status --rev REV" doesn't list largefiles up
with "M" mark, even if exec bit of them is changed, because
"lfilesrepo.status" doesn't examine exec bit in such case.
Before this patch, "hg status --rev REV" listed largefiles removed in
the working directory up with "R" mark, even if they aren't managed in
the REV. Normal files aren't listed up in such case.
When "lfilesrepo.status" is invoked for "hg status --rev REV", it
treats files on conditions below as "removed" (to avoid manifest full
scan in "ctx.status" ?):
- marked as "R" in lfdirstate, or
- files managed in the target revision but unknown in the manifest
of the working context (= not including "R" files)
But the former can include files not managed in the target context.
To ignore removal status of files not managed in the target context,
this patch drops files unknown in the target revision from "removed"
list.
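The filtering step can be sketched as below. The function name and the use of a plain set for the target manifest are assumptions made for illustration.

```python
def dropunmanaged(removed, targetmanifest):
    """Keep only "removed" files that the target revision actually manages."""
    return [f for f in removed if f in targetmanifest]

# 'new-large' was added and then removed in the working directory,
# so the target revision never knew about it.
removed = ['old-large', 'new-large']
targetmanifest = {'old-large', 'normal.txt'}
filtered = dropunmanaged(removed, targetmanifest)
```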
In lfdirstatestatus(), the status tuple gets deconstructed, the lists
get updated, and then an identical status tuple gets created and
returned. Change it so we simply return the original tuple.
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
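A toy illustration of the (lookup, status) shape. The Status namedtuple and toystatus() are hypothetical; the sketch only shows the 'lookup' list travelling separately from a seven-field status value.

```python
from collections import namedtuple

# Hypothetical status type with the fields shared by all status tuples.
Status = namedtuple(
    'Status', 'modified added removed deleted unknown ignored clean')

def toystatus(states):
    """states maps filename -> a Status field name, or 'lookup' if unsure."""
    lookup = []
    buckets = {name: [] for name in Status._fields}
    for f, state in sorted(states.items()):
        if state == 'lookup':
            lookup.append(f)  # needs a content comparison to resolve
        else:
            buckets[state].append(f)
    return lookup, Status(**buckets)

lookup, status = toystatus({
    'a.txt': 'modified',
    'b.txt': 'lookup',
    'c.txt': 'clean',
})
```

Callers that do not care about unsure files can ignore the first element and use the second like any other status tuple.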
Instead of iterating over all files in the context and ignoring those
that are not standins, pass a standin-matcher to the context and
iterate over only the files matching.
Apart from making the intent clearer, this implementation will also
benefit from any future optimizations done to the manifest walking
code.
The variable 'lfiles' is first used for a set of the names of all the
large files. It is then overwritten with a tuple like the ones
returned from status(). To reduce confusion, let's create a separate
variable for the second use.
At the end of lfilesrepo.status(), we clear the lists of unknown,
ignored and clean files, depending on the values of 'listunknown'
etc. The lists originate from other calls to status(), and it is only
'clean' that may get updated after the calls. Let's remove the need to
clear any of the lists by explicitly only adding to 'clean' when
'listclean' is true.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file is not present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
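A minimal sketch of the difference, assuming a hypothetical callback-style API in which a dict of reader functions plays the role of the file source. Returning None marks a removal, while a real IOError now propagates instead of being mistaken for one.

```python
def getfilectx(store, path):
    """Return file data, or None when the file should be recorded as removed."""
    if path not in store:
        return None
    return store[path]()

def commitfiles(store, paths):
    """Split paths into changed contents and removed names."""
    changed, removed = {}, []
    for path in paths:
        data = getfilectx(store, path)
        if data is None:
            removed.append(path)
        else:
            changed[path] = data
    return changed, removed

def brokenread():
    raise IOError('spurious disk error')

store = {'ok': lambda: b'data', 'bad': brokenread}
result = commitfiles(store, ['ok', 'gone'])

try:
    commitfiles(store, ['bad'])
    raised = False
except IOError:
    raised = True  # reported to the caller, not turned into a removal
```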
After previous patches, largefiles in the working directory are
ensured to be updated before "repo.commit" invocation for automated
committing below:
- by "overrides.mergeupdate" via "merge.update" for rebase
- by "overrides.scmutilmarktouched" via "patch.patch" for transplant
This patch removes redundant "lfcommands.updatelfiles" invocation in
"Case 0" code path of "lfilesrepo.commit" for automated committing,
and revises detailed comment.
Before this patch, largefiles in the working directory aren't updated
correctly, if transplant is aborted by conflict. This prevents users
from viewing appropriate largefiles while resolving conflicts.
During transplant, largefiles in the working directory are updated only
at successful committing in the special code path of
"lfilesrepo.commit()".
To update largefiles even if transplant is aborted by conflict, this
patch wraps "scmutil.marktouched", which is invoked from "patch.patch"
with "files" list of added/modified/deleted files.
This patch invokes "updatelfiles" with:
- "printmessage=False", to suppress "getting changed largefiles ..."
messages while automated committing by transplant
- "normallookup=True", because "patch.patch" doesn't update dirstate
for modified files
in such case, "normallookup=False" may cause marking modified
largefiles as "clean" unexpectedly
Before this patch, largefiles in the working directory aren't updated
correctly, if rebase is aborted by conflict. This prevents users from
viewing appropriate largefiles while resolving conflicts.
During rebase, largefiles in the working directory are updated only at
successful committing in the special code path of
"lfilesrepo.commit()".
To update largefiles even if rebase is aborted by conflict, this patch
centralizes the logic of updating largefiles in the working directory
into the "mergeupdate" wrapping "merge.update".
This is a temporary way to fix with fewer changes. For fundamental
resolution of this kind of problems in the future, largefiles in the
working directory should be updated with other (normal) files
simultaneously while "merge.update" execution: maybe by hooking
"applyupdates".
"Action list based updating" introduced by hooking "applyupdates" will
also improve performance of updating, because it automatically
decreases target files to be checked.
Just after this patch, there are some improper things in "Case 0" code
path of "lfilesrepo.commit()":
- "updatelfiles" invocation is redundant for rebase
- detailed comment doesn't match rebase behavior
These will be resolved after the subsequent patch for transplant,
because this code path is shared with transplant.
Even though replacing "merge.update" in rebase extension by "hg.merge"
can also avoid this problem, this patch chooses centralizing the logic
into "mergeupdate", because:
- "merge.update" invocation in rebase extension can't be directly
replaced by "hg.merge", because:
- rebase requires some extra arguments, which "hg.merge" doesn't
take (e.g. "ancestor")
- rebase doesn't require statistics information forcibly displayed
in "hg.merge"
- introducing "mergeupdate" can resolve also problem of some other
code paths directly using "merge.update"
largefiles in the working directory aren't updated regardless of
the result of commands below, before this patch:
- backout (for revisions other than the parent revision of the
working directory without "--merge")
- graft
- histedit (for revisions other than the parent of the working
directory)
When "partial" is specified, "merge.update" doesn't update dirstate
entries for standins, even though standins themselves are updated.
In this case, "normallookup" should be used to mark largefiles as
"possibly dirty" forcibly, because applying "normal" on lfdirstate
treats them as "clean" unexpectedly.
This is reason why "normallookup=partial" is specified for
"lfcommands.updatelfiles".
This patch doesn't test "hg rebase --continue", because it doesn't
work correctly if largefiles in the working directory are modified
manually while resolving conflicts. This will be fixed in the next
step of refactoring for largefiles.
All changes of tests/*.t files other than test-largefiles-update.t in
this patch come from invoking "updatelfiles" not after but before
statistics output of "hg.update", "hg.clean" and "hg.merge".
Code paths below expect "hg.updaterepo" (or "hg.update" using it) to
execute linear merging:
- "update" in commands
- "postincoming" in commands, used for:
- "hg pull --update"
- "hg unbundle --update"
- "hgsubrepo.get" in subrepo
For linear merging with largefiles, standins should be updated
according to (possibly dirty) largefiles before "merge.update"
invocation to detect conflicts correctly.
Before this patch, only the "update" command can execute linear merging
correctly, because largefiles extension takes care of only it.
This patch moves "updatestandin" invocation from "overrideupdate" ("hg
update" wrapper) to "_hgupdaterepo" ("hg.updaterepo" wrapper) to
execute linear merging in "hg.updaterepo" correctly.
This is also a preparation to centralize the logic of updating
largefiles in the working directory into the function wrapping
"merge.update" in the subsequent patch.
Before this patch, standins not known to the restored dirstate at
rollback still exist after rollback of the parent of the working
directory, and they become orphans unexpectedly.
This patch unlinks standins not known to the restored dirstate.
This patch saves names of standins matched against not
"repo.dirstate[f] == 'a'" but "repo.dirstate[f] != 'r'" before
rollback, because branch merging marks files newly added to
dirstate as not "a" but "n".
Such standins will also become orphan after rollback, because they are
not known to the restored dirstate.
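A sketch of the selection rule described above, modeling each dirstate as a plain dict from filename to state code. The helper name is hypothetical and the '.hglf/' prefix test stands in for the real standin check.

```python
def orphanedstandins(before, after):
    """Return standins tracked before rollback but unknown afterwards.

    before/after map filename -> dirstate code ('n', 'a', 'r', ...).
    """
    tracked = [f for f, state in before.items()
               if f.startswith('.hglf/') and state != 'r']
    return sorted(f for f in tracked if f not in after)

before = {
    '.hglf/kept': 'n',
    '.hglf/added': 'a',
    '.hglf/merged': 'n',    # newly added by a branch merge, marked "n"
    '.hglf/removed': 'r',
    'normal.txt': 'n',
}
after = {'.hglf/kept': 'n', 'normal.txt': 'n'}
tounlink = orphanedstandins(before, after)
```

Testing for "== 'a'" instead of "!= 'r'" would have missed '.hglf/merged' in this example.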
Before this patch, standins are restored from the NEW parent of the
working directory at "hg rollback", and this causes:
- standins removed in the rollback-ed revision are restored, and
become orphan, because they are already marked as "R" in the
restored dirstate and expected to be unlinked
- standins added in the rollback-ed revision are left as they were
before rollback, because they are not included in the new parent
(this may not be so serious)
This patch replaces the "merge.update" invocation with a specific
implementation to restore standins according to restored dirstate.
This is also the preparation to centralize the logic of updating
largefiles into the function wrapping "merge.update" in the subsequent
patch.
After that patch, "merge.update" will also update largefiles in the
working directory and be redundant for restoring standins only.
Before this patch, "hg rollback" can't restore standins correctly, if:
- old parent of the working directory is rollback-ed, and
- new parent of the working directory is not branch-tip
"overriderollback" uses "merge.update" as a kind of "revert" utility
to restore only standins with "node=None", and this makes
"merge.update" choose "branch-tip" revision as the updating target
unexpectedly.
Then, "merge.update" restores standins from the branch-tip revision
regardless of the parent of the working directory after rollback and
this may cause unexpected behavior.
This patch invokes "merge.update" with "node='.'" to restore standins
from the parent revision of the working directory.
In fact, this "merge.update" invocation will be replaced in the
subsequent patch to fix another problem, but this change is useful to
explain why such a complicated case should be tested.
For efficiency, this patch omits restoring standins and updating
lfdirstate, if the parent of the working directory is not rolled back.
This patch adds the test not to confirm whether restoring is skipped
or not, but to detect unexpected regression in the future: it is
difficult to distinguish between skipping and perfectly restoring.
Before this patch, linear merging of modified largefiles causes
an unexpected result, if (1) largefile collides with same-name normal one
in the target revision and (2) "local" largefile is chosen, even
though branch merging between such revisions works correctly.
Expected result of such linear merging is marking the largefile as
(re-)"added", but the actual result is marking it as "modified".
The standin of modified "local largefile" is not changed by linear
merging, and updating/merging updates lfdirstate entries only for
largefiles of which standins are changed.
This patch adds the code path to update lfdirstate only for largefiles
of which standins are not changed.
In this case, "synclfdirstate" should be invoked with True as
"normallookup" argument always to force using "normallookup" on
dirstate for "n" files, because "normal" may mark target files as
"clean" unexpectedly.
To reduce cost of "lfile not in filelist", this patch converts
"filelist" to a "set" object: "filelist" is used only in (1) the newly
added code path and (2) the next line of "filelist = set(filelist)".
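A minimal sketch of why the conversion helps, using made-up file names
rather than the actual largefiles code: membership tests against a set
avoid one linear scan of the list per largefile.

```python
# Hypothetical data; only the list-to-set conversion mirrors the text above.
filelist = ["large1", "large2", "sub/large3"]
lfiles = ["large1", "large4", "sub/large3"]

# A set gives constant-time (on average) "lfile not in filelist" checks.
filelist = set(filelist)

# Largefiles that are not in the list of updated files.
unchanged = [lfile for lfile in lfiles if lfile not in filelist]
```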
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, "lfdirstate" should
be updated with "dirstate" simultaneously during "merge.update"
execution: maybe by hooking "recordupdates" (+ total refactoring
around lfdirstate handling)
Before this patch, linear merging of modified or newly added largefile
causes unexpected result, if (1) largefile collides with same name
normal one in the target revision and (2) "local" largefile is chosen,
even though branch merging between such revisions doesn't.
Expected result of such linear merging is:
(1) (not yet recorded) largefile is kept in the working directory
(2) largefile is marked as (re-)"added"
(3) colliding normal file is marked as "removed"
But actual result is:
(1) largefile in the working directory is unlinked
(2) largefile is marked as "normal" (so treated as "missing")
(3) the dirstate entry for colliding normal file is just dropped
(1) is very serious, because there is no way to restore temporarily
modified largefiles.
(3) prevents the next commit from adding the manifest with correct
"removal of (normal) file" information for newly created changeset.
The root cause of this problem is putting "lfile" into "actions['r']"
in linear-merging case. At linear merging, "actions['r']" causes:
- unlinking "target file" in the working directory, but "lfile" as
"target file" is also largefile itself in this case
- dropping the dirstate entry for target file
"actions['f']" (= "forget") does only the latter, and this is reason
why this patch doesn't choose putting "lfile" into it instead of
"actions['r']".
This patch newly introduces action "lfmr" (LargeFiles: Mark as
Removed) to mark colliding normal file as "removed" without unlinking
it.
This patch uses "hg debugdirstate" instead of "hg status" in test,
because:
- choosing "local largefile" hides "removed" status of "remote
normal file" in "hg status" output, and
- "hg status" for "large2" in this case has another problem fixed in
the subsequent patch
Before this patch, there are two distinct "wlock" scopes below in
"hgmerge":
1. "merge.update" via original "hg.merge" function
2. "updatelfiles" specific "wlock" scope (to synchronize largefile
dirstate)
But these should be executed in the same "wlock" scope for
consistency, because users of "hg.merge" don't get "wlock" explicitly
before invocation of it.
- merge in commands
This patch puts almost all of the original "hgmerge" implementation into
"_hgmerge" to reduce changes.
Before this patch, there are two distinct "wlock" scopes below in
"hgupdaterepo":
1. "merge.update" via original "hg.updaterepo" function
2. "updatelfiles" specific "wlock" scope (to synchronize largefile
dirstate)
In addition to them, "dirstate.walk" is executed between these "wlock"
scopes.
But these should be executed in the same "wlock" scope for
consistency, because many (indirect) users of "hg.updaterepo" don't
get "wlock" explicitly before invocation of it.
"hg.clean" is invoked without "wlock" from:
- mqrepo.restore in mq
- bisect in commands
- update in commands
"hg.update" is invoked without "wlock" from:
- clone in mq
- pullrebase in rebase
- postincoming in commands (used in "hg pull -u", "hg unbundle")
- update in commands
This patch puts almost all original "hgupdaterepo" implementation into
"_hgupdaterepo" to reduce changes.
This makes hg log --follow --patch work, since in cmdutil._makelogrevset we
use the non-follow matcher for hg log --follow --patch with no file arguments.
This has actually been broken since at least Mercurial 2.8 -- hg log --patch
with largefiles only used to work when no largefiles existed. Rev 658ce4a0a0a9
exposed this bug for all cases.
lfstatus should only be True for operations where we want standins to be
printed out. We explicitly do not want that for historical operations like log.
Other historical operations like hg diff -r A -r B don't print out standins
either.
This is required to fix issue4334, but doesn't fix anything by itself. That's
why there aren't any tests accompanying this patch.
Before this patch, after successful "hg rebase" of the revision
removing largefiles, "hg status" may still show "R" for such
largefiles unexpectedly.
"lfilesrepo.commit" executes the special code path for automated
committing while rebase/transplant, and lfdirstate entries for removed
files aren't updated in this code path, even after successful
committing.
Then, "R" entries still existing in lfdirstate cause unexpected "hg
status" output.
This patch synchronizes lfdirstate with dirstate after automated
committing.
This patch passes False as "normallookup" to "synclfdirstate", because
modified files in "files()" of the recent (= just committed) context
should be "normal"-ed.
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be updated with dirstate simultaneously. Hooking "markcommitted" of
ctx in "localrepository.commitctx" may achieve this.
This problem occurs, only when (1) the parent of the working directory
is rebased and (2) it removes largefiles, because:
- if the parent of the working directory isn't rebased, returning to
the initial revision (= update) after rebase hides this problem
- files added on "other" branch (= rebase target) are treated not as
"added" but as "modified" (= "normal" status and "unset"
timestamp) at merging
This patch tests also the status of added largefile, but it is only
for avoiding regression.
In addition to conditions above, "hg status" must not take existing
files to reproduce this problem, because existing files make
"match._files" not empty in "lfilesrepo.status" code path below:
  def sfindirstate(f):
      sf = lfutil.standin(f)
      dirstate = self.dirstate
      return sf in dirstate or sf in dirstate.dirs()
  match._files = [f for f in match._files
                  if sfindirstate(f)]
Not empty "match._files" prevents "status" on lfdirstate from
returning the result containing problematic "R" files.
This is the reason why "large1" (removed) and "largeX" (added) are checked
separately in this patch.
Problematic code path in "lfilesrepo.commit" is used also by "hg
transplant", but this problem doesn't occur at "hg transplant",
because invocation of "updatelfiles" after transplant-ing in
"overridetransplant" causes cleaning lfdirstate up.
This patch tests also "hg transplant" as same as "hg rebase", but it
is only for avoiding regression.
Before this patch, newly added (but not yet committed) largefiles
aren't treated as unknown ("?") after "hg rollback".
After "hg rollback", lfdirstate still contains "A" status entries for
such largefiles, even though corresponding entries for standins are
already dropped from dirstate.
Such "orphan" entries in lfdirstate prevent unknown (large)files in
the working directory from being listed up in "unknown" list. The code
path in "if working" route of "lfilesrepo.status" below drops
largefiles tracked in lfdirstate from "unknown" list:
  lfiles = set(lfdirstate._map)
  # Unknown files
  result[4] = set(result[4]).difference(lfiles)
This patch drops orphan entries from lfdirstate at "hg rollback".
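The effect of an orphan entry can be sketched with plain sets. This is
only an illustration: "dropfromunknown" and the file names are
hypothetical stand-ins for the "lfilesrepo.status" code path quoted
above.

```python
def dropfromunknown(unknown, tracked):
    """Remove files tracked in (a model of) lfdirstate from the unknown list."""
    return sorted(set(unknown).difference(tracked))

# After rollback, "large1" exists in the working directory but its standin
# is gone from dirstate; lfdirstate still holds an orphan "A" entry for it.
tracked = {"large1"}
unknown = ["large1", "normal.txt"]

# The orphan entry hides "large1" from the unknown list.
hidden = dropfromunknown(unknown, tracked)

# Dropping the orphan entry makes "large1" show up as unknown again.
tracked.discard("large1")
listed = dropfromunknown(unknown, tracked)
```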
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be rollback-ed as a part of transaction, as same as dirstate.
Before this patch, removed or forgotten largefiles aren't treated as
removed ("R") after "hg rollback". Removed ones are treated as missing
("!") and forgotten ones are treated as clean ("C") unexpectedly.
"overriderollback" uses "normallookup" to restore status in lfdirstate
for largefiles other than ones not added in rollback-ed revision, but
this isn't correct for removed (or forgotten) largefiles.
This patch uses "lfutil.synclfdirstate" to restore "R" status of
removed (or forgotten) largefiles correctly at "hg rollback".
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, lfdirstate should
be rollback-ed as a part of transaction, as same as dirstate.
Before this patch, there are three distinct "wlock" scopes in
"overriderollback":
1. "localrepository.rollback" via original "rollback" command,
2. "merge.update" for reverting standin files only, and
3. "overriderollback" specific "wlock" scope (to synchronize
largefile dirstate)
But these should be executed in the same "wlock" scope for
consistency.
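A toy model of the difference between several separate "wlock" scopes
and one enclosing scope. "fakerepo" and the helper functions are
hypothetical; the only property assumed is that the lock can be
re-acquired while it is already held.

```python
from contextlib import contextmanager

class fakerepo(object):
    """Toy repository with a reentrant working directory lock."""
    def __init__(self):
        self._depth = 0
        self.events = []

    @contextmanager
    def wlock(self):
        # Only the outermost acquisition really takes the lock.
        if self._depth == 0:
            self.events.append("acquire")
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            if self._depth == 0:
                self.events.append("release")

def rollback(repo):
    with repo.wlock():
        repo.events.append("rollback")

def revertstandins(repo):
    with repo.wlock():
        repo.events.append("revert standins")

def synclfdirstate(repo):
    with repo.wlock():
        repo.events.append("sync lfdirstate")

def separatescopes(repo):
    # Three distinct scopes: the lock is dropped between the steps.
    rollback(repo)
    revertstandins(repo)
    synclfdirstate(repo)

def singlescope(repo):
    # One enclosing scope: nothing else can run between the steps.
    with repo.wlock():
        rollback(repo)
        revertstandins(repo)
        synclfdirstate(repo)

before = fakerepo()
separatescopes(before)
after = fakerepo()
singlescope(after)
```

In the first run the lock is released twice in the middle of the
operation; in the second run it is held from the first step to the last.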
Before this patch, largefiles gotten from revisions other than the
parent of the working directory at "hg revert" become "clean"
unexpectedly in steps below:
1. "repo.status()" is invoked (for status check before reverting)
1-1 "dirstate" entry for standinfile SF is "normal"-ed
(1-2 "lfdirstate" entry of largefile LF (for SF) is "normal"-ed)
2. "cmdutil.revert()" is invoked
2-1 standinfile SF is updated in the working directory
2-2 "dirstate" entry for SF is NOT updated
3. "lfcommands.updatelfiles()" is invoked (by "overrides.overriderevert()")
3-1 largefile LF (for SF) is updated in the working directory
3-2 "dirstate" returns "n" and valid timestamp for SF (by 1-1 and 2-2)
3-3 "lfdirstate" entry for LF is "normal"-ed
3-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file (by 3-3)
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hg status" treats LF as "clean", even though LF is updated by
"other" revision (by 3-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 3-3 and 3-4) as "clean".
When largefiles are reverted, they should be "normallookup"-ed
forcibly.
This patch uses "normallookup" on "lfdirstate" while reverting, by
passing "True" to newly added argument "normallookup".
Forcible "normallookup"-ing is not so expensive, because list of
target largefiles is explicitly specified in this case.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate" (for ASSUMPTION at 3-4)
Before this patch, largefiles gotten from "other" revision (with
conflict) at "hg merge" become "clean" unexpectedly in steps below:
1. "repo.status()" is invoked (for status check before merging)
1-1 "dirstate" entry for standinfile SF is "normal"-ed
1-2 "lfdirstate" entry of largefile LF (for SF) is "normal"-ed
2. "merge.update()" is invoked
2-1 SF is updated in the working directory
(ASSUMPTION: user choice "other" at conflict)
2-2 "dirstate" entry for SF is "merge"-ed
3. "lfcommands.updatelfiles()" is invoked (by "overrides.hgmerge()")
3-1 largefile LF (for SF) is updated in the working directory
3-2 "dirstate" returns "m" for SF (by 2-2)
3-3 "lfdirstate" entry for LF is left as it is
3-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file (by 1-2)
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hg status" treats LF as "clean", even though LF is updated by
"other" revision (by 3-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 1-2 and 3-4) as "clean".
When state of standinfile in "dirstate" is "m", largefile should be
"normallookup"-ed.
This patch invokes "normallookup" on "lfdirstate" for merged files.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate". (for ASSUMPTION at 3-4)
Before this patch, largefiles gotten from "other" revision (without
conflict) at "hg merge" become "clean" unexpectedly in steps below:
1. "merge.update()" is invoked
1-1 standinfile SF is updated in the working directory
1-2 "dirstate" entry for SF is "normallookup"-ed
2. "lfcommands.updatelfiles()" is invoked (by "overrides.hgmerge()")
2-1 largefile LF (for SF) is updated in the working directory
2-2 "dirstate" returns "n" for SF (by 1-2)
2-3 "lfdirstate" entry for LF is "normal"-ed
2-4 "lfdirstate" is written into ".hg/largefiles/dirstate", and
timestamp of LF is stored into "lfdirstate" file
(ASSUMPTION: timestamp of LF differs from one of "lfdirstate" file)
Then, "hg status" treats LF as "clean", even though LF is updated by
"other" revision (by 2-1), because "lfilesrepo.status()" always treats
"normal"-ed files (by 2-3 and 2-4) as "clean".
When timestamp is not set (= negative value) for standinfile in
"dirstate", largefile should be "normallookup"-ed regardless of
rebasing or not, because "n" state in "dirstate" doesn't ensure
"clean"-ness of a standinfile at that time.
This patch uses "normallookup" instead of "normal", if "mtime" of
standin is unset.
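The decision described above can be sketched as a small function. This
is a simplified model: "choosemarker" is a hypothetical helper, and the
state and mtime arguments stand for what "dirstate" reports for the
standin.

```python
def choosemarker(state, mtime):
    """Return the name of the lfdirstate method to use for a largefile.

    state: dirstate state of the standin ("n", "a", "r", ...)
    mtime: dirstate timestamp of the standin; negative means "unset"
    """
    if state == "n" and mtime < 0:
        # "n" with an unset timestamp doesn't ensure clean-ness, so the
        # largefile has to be checked again at the next status.
        return "normallookup"
    return "normal"

fresh = choosemarker("n", -1)           # standin just written by merge.update
settled = choosemarker("n", 1400000000) # standin with a recorded timestamp
```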
This is a temporary way to fix with less changes. For fundamental
resolution of this kind of problems in the future, "lfdirstate" should
be updated with "dirstate" simultaneously while "merge.update"
execution: maybe by hooking "recordupdates".
It is also why this patch (temporarily) uses internal field "_map" of
"dirstate" directly.
This patch uses "[debug] dirstate.delaywrite" feature in the test, to
ensure that timestamp of the largefile gotten from "other" revision is
stored into ".hg/largefiles/dirstate". (for ASSUMPTION at 2-4)
This patch newly adds "test-largefiles-update.t", to avoid increasing
cost to run other tests for largefiles by subsequent patches
(especially, "[debug] dirstate.delaywrite" causes so).
Previously, the directory '.hg/largefiles' would always be created if it didn't
exist when the lfdirstate was opened. If there were no standin files, no
dirstate file would be created in the directory. The end result was that
enabling the largefiles extension globally, but not explicitly adding a
largefile would result in the repository eventually sprouting this directory.
Creation of this directory effectively changes readonly operations like summary
and status into operations that require write access. Without write access,
commands that would succeed without the extension loaded would abort with a
surprising error when the extension is loaded, but not actively used:
$ hg sum -R /tmp/thg --config extensions.largefiles=
parent: 16541:00dc703d5aed
repowidget: specify incoming bundle by plain file path to avoid url parsing
branch: default
abort: Permission denied: '/tmp/thg/.hg/largefiles'
This change is simpler than changing the callers of openlfdirstate() to use the
'create' parameter that was introduced in 74522122b97d, and probably how that
should have been implemented in the first place.
Before this patch, "hg summary" and "hg outgoing" show and count up
all largefiles changed/added in outgoing revisions, even though some
of them are already uploaded into remote store.
This patch confirms existence of outgoing largefile entities in remote
store, to show and count up only really outgoing largefile entities at
"hg summary" and "hg outgoing".
Before this patch, "hg outgoing --large" shows which largefiles are
changed or added in outgoing revisions only in the point of the view
of filenames.
For example, according to the list of outgoing largefiles shown in "hg
outgoing" output, users should expect that the former below costs much
more to upload outgoing largefiles than the latter.
- outgoing revisions add a hundred largefiles, but all of them refer
the same data entity
in this case, only one data entity is outgoing, even though "hg
summary" says that a hundred largefiles are outgoing.
- a hundred outgoing revisions change only one largefile with
distinct data
in this case, a hundred data entities are outgoing, even though
"hg summary" says that only one largefile is outgoing.
But the latter costs much more than the former, in fact.
This patch shows also how many data entities are outgoing at "hg
outgoing" by counting number of unique hash values for outgoing
largefiles.
When "--debug" is specified, this patch also shows what entities (in
hash) are outgoing for each largefiles listed up, for debug purpose.
In "ui.debugflag" route, "addfunc()" can append given "lfhash" to the
list "toupload[fn]" always without duplication check, because
de-duplication is already done in "_getoutgoings()".
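A rough sketch of the two counts, under the assumption that outgoing
largefiles are available as (filename, hash) pairs. "getoutgoings" below
is a hypothetical stand-in, not the actual "_getoutgoings()"
implementation.

```python
def getoutgoings(entries):
    """Group outgoing (filename, hash) pairs, dropping duplicates.

    Returns the per-file hash lists and the set of unique hashes
    (= data entities that really have to be uploaded).
    """
    seen = set()
    toupload = {}
    for fn, lfhash in entries:
        if (fn, lfhash) in seen:
            continue
        seen.add((fn, lfhash))
        # No duplication check needed here: "seen" already did it.
        toupload.setdefault(fn, []).append(lfhash)
    hashes = set(h for hs in toupload.values() for h in hs)
    return toupload, hashes

# A hundred largefiles referring to the same data entity.
samedata = [("file%d" % i, "hash0") for i in range(100)]
files1, entities1 = getoutgoings(samedata)

# A hundred revisions changing one largefile with distinct data.
onefile = [("large", "hash%d" % i) for i in range(100)]
files2, entities2 = getoutgoings(onefile)
```

The first case lists a hundred filenames but only one entity to upload;
the second lists one filename but a hundred entities.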
Before this patch, "hg summary --large" shows how many largefiles are
changed or added in outgoing revisions only in the point of the view
of filenames.
For example, according to the number of outgoing largefiles shown in
"hg summary" output, users should expect that the former below costs
much more to upload outgoing largefiles than the latter.
- outgoing revisions add a hundred largefiles, but all of them refer
the same data entity
in this case, only one data entity is outgoing, even though "hg
summary" says that a hundred largefiles are outgoing.
- a hundred outgoing revisions change only one largefile with
distinct data
in this case, a hundred data entities are outgoing, even though
"hg summary" says that only one largefile is outgoing.
But the latter costs much more than the former, in fact.
This patch shows also how many data entities are outgoing at "hg
summary" by counting number of unique hash values for outgoing
largefiles.
This patch introduces "_getoutgoings" to centralize the logic
(de-duplication, too) into it for convenience of subsequent patches,
even though it is not required in "hg summary" case.
This patch changes the calling signature of memfilectx's __init__ to fall in
line with the other file contexts.
Calling code and tests have been updated accordingly.
This replaces the grand unified action list that had multiple action types as
tuples in one big list. That list was iterated multiple times just to find
actions of a specific type. This data model also made some code more
convoluted than necessary.
Instead we now store actions as a tuple of lists. Using multiple lists gives a
bit of cut'n'pasted code but also enables other optimizations.
This patch uses 'if True:' to preserve indentations and help reviewing. It also
limits the number of conflicts with other pending patches. It can trivially be
cleaned up later.
Adds a labels function parameter to all the functions between merge.update and
filemerge.filemerge. This will allow commands like rebase to specify custom
marker labels.
When invoked from another directory, the matchers m._cwd will be the absolute
path. The code for calculating relative path to .hglf did not consider that and
log would fail with weird errors and paths.
For now, just don't do any largefile magic when invoked from other directories.
Log for largefiles was failing for graph log since it was overriding match
instead of matchandpats.
[Mads Kiilerich modified this patch to address his review comments and ended up
rewriting/removing most of it.]
cat of a standin would silently fail.
The use of standins is mostly an implementation detail, but it is already a bit
leaking. Being able to see the content of standins might be convenient for
debugging.
A .orig of a standin after the update causes a .orig of the actual largefile
to be created. The .orig standin was however never removed again and the
largefile .orig was thus overwritten again and again.
The fix: remove the standin .orig when it is used.
Before this patch, "hg outgoing" invokes "findcommonoutgoing()" not
only in "commands.outgoing()" but also in
"overrides.overrideoutgoing()" (via "getoutgoinglfiles()"), when
largefiles is enabled. The latter is redundant.
This patch uses "outgoinghooks" to avoid redundant outgoing check.
Newly introduced function "overrides.outgoinghook()" is registered
into "outgoinghooks" to get the result of outgoing check in
"commands.outgoing()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
outgoing check to avoid redundant outgoing check in
"getoutgoinglfiles()": "sort()" is needed, because
"lfutil.getlfilestoupload()" doesn't sort the result of it.
This patch also omits "if toupload is None" ("No remote repo") case,
because failure of looking remote repository up should raise exception
in "commands.outgoing()" before invocation of "outgoinghooks".
Newly added "hg outgoing --large --graph" tests examine
"outgoinghooks" invocations in "hg outgoing --graph" code path.
Before this patch, "hg summary --remote --large" invokes
"findcommonoutgoing()" not only in "commands.summary()" but also in
"overrides.overridesummary()" (via "getoutgoinglfiles()"). The latter
is redundant.
This patch uses "summaryremotehooks" to avoid redundant outgoing check.
Newly introduced function "overrides.summaryremotehook()" is
registered into "summaryremotehooks" to get the result of outgoing
check in "commands.summary()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
outgoing check to avoid redundant outgoing check in
"getoutgoinglfiles()".
Before this patch, "hg push" invokes "findcommonoutgoing()" not only
in "exchange.push()" but also in "lfilesrepo.push()", when largefiles
is enabled. The latter is redundant.
This patch registers own "prepushoutgoinghook" function into
"prepushoutgoinghooks" of "localrepository" to reuse
"findcommonoutgoing()" result.
"prepushoutgoinghook" omits "changelog.nodesbetween()" invocation,
because "findcommonoutgoing()" invocation in "exchange.push()" takes
"onlyheads" argument and it considers "nodesbetween()".
Before this patch, "overrides.getoutgoinglfiles()" (called by
"overrideoutgoing()" and "overridesummary()") and "lfilesrepo.push()"
implement similar logic to get outgoing largefiles separately.
This patch centralizes the logic to get outgoing largefiles in
"lfutil.getlfilestoupload()".
"lfutil.getlfilestoupload()" takes "addfunc" argument, because each
caller needs different information (and it is useful for enhancement
in the future).
- "overrides.getoutgoinglfiles()" needs only filenames
- "lfilesrepo.push()" needs only hashes of largefiles
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that the format string and "%" aren't placed in the
same line.
This patch replaces "\s" in that regexp pattern with "[ \t\n]" to
detect "% inside _()" problems in such case.
"[\s\n]" can't be used in this purpose, because "\s" is automatically
replaced with "[ \t]" by "_preparepats()" and "\s" in "[]" causes
nested "[]" unexpectedly.
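The difference between the two character classes can be shown with a
simplified pattern. The regexps below are illustrative assumptions, not
the actual pattern from "contrib/check-code.py".

```python
import re

# Simplified "% inside _()" detectors: a quoted format string inside _( )
# followed by whitespace and "%".
sameline_only = re.compile(r'_\([ \t]*(?:"[^"]+"[ \t]+)+%')
multiline_too = re.compile(r'_\([ \t\n]*(?:"[^"]+"[ \t\n]+)+%')

oneline = '_("%d files" % n)'
wrapped = '_("%d files"\n  % n)'

# Both patterns catch the problem when everything is on one line.
hit_oneline_old = sameline_only.search(oneline) is not None
hit_oneline_new = multiline_too.search(oneline) is not None

# Only the pattern allowing "\n" catches the wrapped form.
hit_wrapped_old = sameline_only.search(wrapped) is not None
hit_wrapped_new = multiline_too.search(wrapped) is not None
```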
Since changeset a8955c4d9ef5, "reposetup()" of each extension is
invoked only on repositories enabling the corresponding extension.
As a result, largefiles specific interactions between a repository
enabling largefiles locally and a remote (wire) peer fail,
because there is no way to know whether largefiles is enabled on the
remote repository behind the wire peer, and largefiles specific
"wireproto functions" are not given to any wire peers.
To avoid this problem, largefiles should be enabled in wider scope
than each repository (e.g. user-wide "${HOME}/.hgrc").
This patch introduces "wirepeersetupfuncs" to setup wire peer by
extensions already enabled. Functions registered into
"wirepeersetupfuncs" are invoked for all wire peers.
This patch uses plain list instead of "util.hooks" for
"wirepeersetupfuncs", because the former allows to control order of
function invocation by order of extension enabling: it may be useful
for working around problems with combinations of enabled extensions.
This should correct an earlier couple of bad merges (5433856b2558 and
596960a4ad0d, now pruned) that accidentally brought in a change that had
been marked obsolete (244ac996a821).
Some extensions set configuration settings that showed up in 'hg showconfig
--debug' with 'none' as source. That was confusing.
Instead, they will now tell which extension they come from.
This change tries to be consistent and specify a source everywhere - also where
it perhaps is less relevant.
The largefile hashes are mostly an implementation detail, but they are "leaked"
in several places anyway, and showing the hashes is better than not giving the
user any information about the options in the prompt.
The hashes are long, but it is largefile hashes and it would thus be confusing
to shorten them.
Before, it tried to explain the exact situation when merging moved largefiles.
That does not happen for normal merges and is no more relevant for largefiles
than for normal files. It is unneeded complexity - remove it.
Since the localrepository.push() method in mercurial/localrepo.py is defined
this way:
def push(self, remote, force=False, revs=None, newbranch=False):
it is better for largefiles to call push() on the super class with proper
kwargs to respect the API.
This will avoid breaking other extensions overriding the push method this way:
def push(self, remote, force=False, **kwargs):
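A self-contained sketch of the problem, with toy classes standing in for
the real ones ("baserepo", "otherext" and "lfilesrepo" here are
illustrative, only the two push() signatures come from the text above):

```python
class baserepo(object):
    def push(self, remote, force=False, revs=None, newbranch=False):
        return (remote, force, revs, newbranch)

class otherext(baserepo):
    # Another extension overriding push() with a looser signature.
    def push(self, remote, force=False, **kwargs):
        return super(otherext, self).push(remote, force=force, **kwargs)

class lfilesrepo(otherext):
    def push(self, remote, force=False, revs=None, newbranch=False):
        # Optional arguments are passed by keyword, so the call works
        # whichever signature the next class in the chain uses.
        return super(lfilesrepo, self).push(remote, force=force, revs=revs,
                                            newbranch=newbranch)

repo = lfilesrepo()
result = repo.push("remote", revs=["tip"])

# Passing everything positionally breaks on the looser signature.
try:
    otherext.push(repo, "remote", False, ["tip"], False)
    positional_ok = True
except TypeError:
    positional_ok = False
```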
a8386b4c47b1 introduced splitstandin on all action filenames. It would however
crash on 'd' actions where the filename is None.
Fix that and add test coverage for that case.
An update would try to fetch any missing largefiles after having updated normal
files and standins. That could fail or be interrupted and would leave the
working directory in a state where the largefiles not only were missing but
also were scheduled for remove ... and where the old largefile was left in
place.
Instead we now remove old largefiles before starting to download and update
missing largefiles.
Prompts like
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
were not as clear as the usual prompts that use 'remote' or 'local' to explain
what happened on which side ... especially not when used to the normal prompts.
"as" could also indicate that it would be possible to take the content of the
largefile and somehow put it into the normal file. It could make it more clear
that it was a choice between one side or the other.
For consistency we will now phrase it like:
remote turned local normal file f into a largefile
use (l)argefile or keep (n)ormal file?
We used to get like:
$ hg up -r 2
foo has been turned into a normal file
keep as (l)argefile or use (n)ormal file? l
getting changed largefiles
0 largefiles updated, 0 removed
0 files updated, 0 files merged, 2 files removed, 0 files unresolved
$ cat foo
cat: foo: No such file or directory
[1]
- which both asked the wrong question and did the wrong thing.
Instead, skip this conflict resolution when the local conflicting file has been
scheduled for removal and there thus is no conflict.
Before this patch, each feature setup function for the localrepository
class has to examine by itself whether the corresponding extension is
enabled or not.
This patch invokes only feature setup functions defined in module of
enabled extensions, and it makes implementation of feature setup
functions easier and simpler.
Before, the hack would replace 'heads' with 'lheads' no matter where it occurred
in a batch command string.
Instead we will use a regexp to more carefully only match the 'heads' commands.
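A sketch of the difference. Both the regexp and the shape of the batch
command string (commands separated by ';') are assumptions made for
illustration, not necessarily what the extension uses.

```python
import re

# Match 'heads' only where a command name can start: at the beginning of
# the string or right after a ';' separator.
headsre = re.compile(r'(^|;)heads\b')

def rewrite(cmds):
    return headsre.sub(r'\1lheads', cmds)

first = rewrite("heads ;known nodes=")
later = rewrite("branchmap ;heads ")

# 'heads' appearing as an argument value is left alone by the regexp ...
careful = rewrite("known nodes=heads")
# ... whereas a blind string replacement rewrites it.
blind = "known nodes=heads".replace("heads", "lheads")
```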
Before this patch, if largefiles extension is enabled once in any of
target repositories, commands handling multiple repositories at a time
like below misunderstand that "largefiles" feature is supported also
in all other local repositories:
- clone/pull from or push to localhost
- recursive execution in subrepo tree
This patch registers "featuresetup()" into "featuresetupfuncs" of
"localrepository" to support "largefiles" features only in
repositories enabling largefiles extension, instead of adding
"largefiles" feature to class variable "_basesupported" of
"localrepository".
This patch also adds checking below to the largefiles specific class
derived from "localrepository":
- push to localhost: whether features supported in the local(= dst)
repository satisfies ones required in the remote(= src)
This can prevent useless looking up in the remote repository, when
supported and required features are mismatched: "push()" of
"localrepository" also checks it, but it is executed after looking up
in the remote.
Before this patch, all localrepositories support same features,
because supported features are managed by the class variable
"supported" of "localrepository".
For example, "largefiles" feature provided by largefiles extension is
recognized as supported, by adding the feature name to "supported" of
"localrepository".
So, commands handling multiple repositories at a time like below
misunderstand that such features are supported also in repositories
not enabling the corresponding extensions:
- clone/pull from or push to localhost
- recursive execution in subrepo tree
"reposetup()" can't be used to fix this problem, because it is invoked
after checking whether supported features satisfy ones required in the
target repository.
So, this patch adds the set object named as "featuresetupfuncs" to
"localrepository" to manage hook functions to setup supported features
of each repository.
If any functions are added to "featuresetupfuncs", they are invoked,
and information about supported features is managed in each
repository individually.
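A minimal model of per-repository feature sets. Only the names
"featuresetupfuncs" and "localrepository" are taken from the text; the
constructor argument, the base feature names and the hook signature are
assumptions made for this sketch.

```python
class localrepository(object):
    # Features every repository supports (illustrative values).
    _basesupported = frozenset(["revlogv1", "store"])
    # Hook functions registered by extensions.
    featuresetupfuncs = set()

    def __init__(self, extensions):
        self.extensions = set(extensions)
        # Each instance gets its own copy instead of sharing a class variable.
        self.supported = set(self._basesupported)
        for setupfunc in self.featuresetupfuncs:
            setupfunc(self, self.supported)

def featuresetup(repo, supported):
    # Hypothetical hook: add the feature only where the extension is enabled.
    if "largefiles" in repo.extensions:
        supported.add("largefiles")

localrepository.featuresetupfuncs.add(featuresetup)

withlf = localrepository(["largefiles"])
without = localrepository([])
```

Two repositories in the same Python process now report different
supported features, which is what makes the pull and push checks below
meaningful.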
This patch also adds checking below:
- pull from localhost: whether features supported in the local(= dst)
repository satisfies ones required in the remote(= src)
- push to localhost: whether features supported in the remote(= dst)
repository satisfies ones required in the local(= src)
Managing supported features by the class variable means that there is
no difference of supported features between each instances of
"localrepository" in the same Python process, so such checking is not
needed before this patch.
Even with this patch, if intermediate bundlefile is used as pulling
source, pulling indirectly from the remote repository, which requires
features more than ones supported in the local, can't be prevented,
because bundlefile has no information about "required features" in it.
The refactoring of all the context objects allows us to simply pass a basectx
to the __new__ constructor and have it return the same object without
allocating new memory.
This also removes the need to import the context module.
Before this patch, largefiles extension checks unknown files in the
working directory always case sensitively.
This causes failure in updating from the revision X consisting of
'.hglf/A' (and "A" implicitly) to the revision Y consisting of 'a'
(not ".hglf/A") on case insensitive filesystem, because "A" in the
working directory is treated as colliding against and different from
'a' on the revision Y.
This patch uses "repo.dirstate.normalize()" to check unknown files
with case awareness of the filesystem.
Before this patch, largefiles extension always unlinks largefiles
untracked on the target context in merging/updating after updating
working directory.
For example, it is assumed that the revision X consists of ".hglf/A"
(and "A" implicitly) and revision Y consists of "a" (not ".hglf/A").
In the case of updating from X to Y, largefiles extension tries to
unlink "A" after updating "a" in working directory. This causes
unexpected unlinking of "a" on the case insensitive filesystem.
This patch checks existence of the file in the working context with
case awareness of the filesystem to prevent such unexpected
unlinking.
"lfcommands._updatelfile()" also unlinks target file in the case
"largefile is tracked in the target context, but fails to be fetched".
This patch doesn't apply "repo.dirstate.normalize()" in this case,
because it should be already ensured in the manifest merging that
there is no normal file colliding against any largefiles.
After 08202d1ef738 I see:
$ hg id -q
largefiles: repo method 'commit' appears to have already been wrapped by another extension: largefiles may behave incorrectly
largefiles: repo method 'push' appears to have already been wrapped by another extension: largefiles may behave incorrectly
3bd0c95ec1bf
The warning is bad:
* The message gives no hint what the problem is and how it can be resolved.
The message is useless.
* Largefiles does have its share of problems, but I don't think I ever have seen
a problem where this warning would have helped. The 'may' in the warning
seems like an exaggeration of the risk. Having largefiles enabled in
combination with for instance mq, hggit and hgsubversion causes a warning
(depending on the configuration order) but does not cause problems. Extensions
might of course be incompatible, but they can be that in many other ways.
The check and the message are incorrect.
It would thus be better to remove the check and the warning completely.
Before 08202d1ef738 the check always failed. That change made the check work
more like intended ... but the intention was wrong. This change will thus also
back that change out.
This avoids a lot of expensive roundtrips to remote repositories ... but might
be slightly slower for local operations.
This will also change some aborts on missing files to warnings. That will in
some situations make it possible to continue working on a repository with
missing largefiles.
We did read exactly the right number of bytes from the response body. But
if the response came in chunked encoding then that meant that the HTTP layer
still hadn't read the last 0-sized chunk and expected the app layer to read
more data from the stream. The app layer was however happy and sent another
request which had to be sent on another HTTP connection while the old one was
lingering until some other event closed the connection.
Adding an extra read where we expect to hit the end of file makes the HTTP
connection ready for reuse. This thus plugs a real socket leak.
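A rough illustration of the extra read, assuming a generic file-like
response object. The function name and the choice to raise on trailing
data are assumptions made for this sketch, not the code from the patch.

```python
import io


def read_exactly_then_drain(stream, length, chunk_size=65536):
    """Read `length` bytes, then do one extra read that should hit EOF."""
    chunks = []
    remaining = length
    while remaining > 0:
        data = stream.read(min(chunk_size, remaining))
        if not data:
            raise EOFError("stream ended %d bytes early" % remaining)
        chunks.append(data)
        remaining -= len(data)
    # The extra read: with chunked encoding this gives the HTTP layer a
    # chance to consume the final 0-sized chunk, so that the connection
    # is no longer considered busy.
    trailing = stream.read(1)
    if trailing:
        raise ValueError("unexpected data after the expected end of body")
    return b"".join(chunks)


body = read_exactly_then_drain(io.BytesIO(b"largefile data"), 14)
```

Here io.BytesIO only simulates the response; the point is the final read
after the expected number of bytes has been consumed.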
To distinguish HTTP from SSH we look at self's class, just like it is done in
putlfile.
Avoid the intermediate limitreader and filechunkiter between getlfile and
copyandhash - return the right protocol and put the complexity where it better
can be managed.
This goes a step further than 974959d637b7 and backs out the unreleased
--cache-largefiles option. The same can be achieved with --lfrev heads(pulled()) and
we shouldn't introduce unnecessary command line options.
The revset will be evaluated after the changesets have been pulled, and missing
largefiles from matching revisions will be pulled to the local caches.
This in combination with revsets will make it possible to specify different
strategies for pulling largefiles.
The revset expressions used for this option might be quite complex and will
probably be most useful from scripts or an alias ... but less complicated than
configuring hooks.
We were calling back to the original commands.cat from inside the walk loop
that handled and filtered out largefiles. That did however happen with file
paths relative to repo root and the original cat would fail when it applied its
own walk and match on top of that.
Instead we now duplicate and modify the code from commands.cat and patch it to
handle both normal and largefiles.
A change in test output shows that this also makes the exit code with
largefiles consistent with the normal one in the case where one of several
specified files is missing.
This also fixes the combination of --output and largefiles.
Before this patch, repo wrapping detection in "reposetup()" of
largefiles can detect only limited repo wrapping: replacing target
functions by another one named as "wrap".
So, it can't detect repo wrapping even in recommended style: replacing
"__class__" of repo by derived class.
This patch can detect repo wrapping in both styles below:
- replacing "__class__" of repo by derived class (recommended style):
    class derived(repo.__class__):
        def push(self, *args, **kwargs):
            return super(derived, self).push(*args, **kwargs)
    repo.__class__ = derived
- replacing function of repo by another one (not recommended style):
    orgpush = repo.push
    def push(*args, **kwargs):
        return orgpush(*args, **kwargs)
    repo.push = push
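One way such a check could look, as a Python 3 sketch under stated
assumptions: BaseRepo and is_wrapped() are hypothetical names, and the
comparison against the base class function is an illustration rather than
the exact test used in "reposetup()".

```python
class BaseRepo:
    """Hypothetical stand-in for the unwrapped repository class."""

    def push(self, *args, **kwargs):
        return "pushed"


def is_wrapped(repo, name, baseclass=BaseRepo):
    """Report whether repo.<name> differs from the base implementation."""
    method = getattr(repo, name)
    # A bound method exposes its underlying function as __func__. A plain
    # function assigned on the instance has no __func__, so it yields None
    # here and is reported as wrapped too.
    func = getattr(method, "__func__", None)
    return func is not getattr(baseclass, name)


# Style 1: replacing "__class__" of repo by a derived class.
repo1 = BaseRepo()


class derived(repo1.__class__):
    def push(self, *args, **kwargs):
        return super(derived, self).push(*args, **kwargs)


repo1.__class__ = derived

# Style 2: replacing the function on the repo object.
repo2 = BaseRepo()
orgpush = repo2.push


def push(*args, **kwargs):
    return orgpush(*args, **kwargs)


repo2.push = push
```

Both repo1 and repo2 are reported as wrapped by is_wrapped(), while a
fresh BaseRepo() is not.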
These names were found using Cython; I was completely puzzled until
I searched the rest of the tree. It's icky to mess with another
module's namespace, but ickier yet to do so without a comment :-)
Upcoming patches will speed dirstate.walk up by not filtering based on the
match function when match.always() is True. For that to work, match.always()
needs to be accurate. Previously it wasn't so for largefiles.
lfutil.splitstandin(f) can be None, and we query the dirstate for that without
checking if it is. This will cause problems with the upcoming move to critbit-
based dicts, since they only support strings as keys.
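A simplified sketch of the guard. The splitstandin() below is a reduced
stand-in for lfutil.splitstandin(), and tracked_largefiles() is a
hypothetical helper; a plain dict plays the role of the dirstate.

```python
def splitstandin(filename, prefix=".hglf/"):
    """Return the largefile name for a standin path, or None otherwise."""
    if filename.startswith(prefix):
        return filename[len(prefix):]
    return None


def tracked_largefiles(filenames, dirstate):
    """Collect largefiles known to `dirstate`, skipping non-standins."""
    result = []
    for f in filenames:
        lfile = splitstandin(f)
        if lfile is None:
            # Not a standin: never use None as a key for the lookup.
            continue
        if lfile in dirstate:
            result.append(lfile)
    return result


found = tracked_largefiles([".hglf/big", "normal.txt"], {"big": "n"})
```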
When a series of commits first adds a file and then removes it,
hg rebase --collapse prompts whether to keep the file or delete it. This is
due to it reusing the branch merge code. In a noninteractive terminal it
defaults to keeping the file, which results in a collapsed commit that
has a file that should be deleted. This bug resulted in developers accidentally
committing unintentional changes to our repo twice today, so it's fairly
important to get fixed.
This change allows rebase --collapse to tell the merge code to accept the
latest version every time without prompting.
Adds a test as well.
cachelfiles jumped through hoops to handle merges and modified files ... but it
did apparently no longer have a valid reason to do so. It should just always
make sure that the largefiles referenced from the standins are present - no
matter which actual largefile is stored in the working directory. If there is
no standin then there is nothing to fetch.
The old code usually verified the hash of all largefiles every time this
function was invoked - for example by 'update'.
This change makes a trivial noop update 5-10 seconds faster on our repo (with
the other 50% spent doing another unnecessary hashing of all largefiles).
Situations where a largefile for some reason wasn't available sometimes caused
wrong largefile content and state. It has mostly been seen when interrupting
download of largefiles ... and when introducing programming errors.
Instead we now make sure to delete the old and wrong largefile. A missing file
is a well-known error condition and much more reasonable way to handle the
situation.
Largefiles can easily become missing - for example if one simply isn't available
or the download fails. It might even be convenient to be able to work that way
in some cases.
But committing missing largefiles as if they had been 'hg remove'd is plain wrong.
Looking for a (potentially empty) directory was not reliable - both because it
is a reasonable assumption that empty directories can be removed and because it
wasn't created in all cases ... such as when pulling to an existing repository.
Test output is changed in a case where one revision was pulled, but because of
the off-by-one error it thought that 0 revisions were pulled ... and because of
another bug it thus (tried to) fetch largefiles for all revisions.
After this change it no longer reports failure when it failed while trying to
fetch largefiles it shouldn't fetch. Largefiles that it shouldn't fetch but
managed to fetch anyway will now correctly be missing later on.
This change thus resolves some of the unexplained test output introduced in
8664d9900884.
After discussion, we've agreed that largefiles for newly pulled heads should
not be cached by default. The use case for this is using largefiles repos
with multiple remote servers (and therefore multiple remote largefiles caches),
where users will be pulling from non-default locations on a regular basis. We
think this use case will be significantly less common than the use case where
all largefiles are stored on the same central server, so the default should be
no caching.
The old behavior can be obtained by passing the --cache-largefiles flag to
pull.
largefiles tried to create a peer directly with the specified url. That caused
abort: unsupported URL component: "..."
if a revision was specified in the url.
The branch name does not matter for largefiles' use of remote peers. Largefiles
will be shared among all branches anyway.
When changesets referencing largefiles are pushed then the corresponding
largefiles will be pushed too - unless the target already has them. The client
will use statlfile to make sure it only sends largefiles that the target
doesn't have. The server would however on every statlfile check that the
content of the largefile had the expected hash. What should be cheap thus
became an expensive operation that thrashed the disk and the cache.
Largefile hashes are already checked by putlfile before being stored on the
server. A server should thus be able to keep its largefile store free of
errors - even more than it can keep revlogs free of errors. Verification should
happen when running 'hg verify' locally on the server. Rehashing every
largefile on every remote stat is too expensive.
Clients will also stat lfiles before downloading them. When the server verified
the hash in stat it meant that it had to read the file twice to serve it.
With this change the server will assume its own hashes are ok without checking
them on every statlfile.
Some consequences of this change:
- in case of server side corruption the problem will be detected by the
existing check on the client side - not on server side
- clients that could upload an uncorrupted largefile when pushing will no
longer magically heal the server (and break hardlinks) - a client will now
only upload its uncorrupted files after the corrupted file has been removed
on the server side
- client side verify will no longer report corruption in files it doesn't have
(Issue3123 discussed related problems - and how they have been fixed.)
6fb54510b150 introduced batching of statlfile, but not all codepaths got
converted.
_getfile gave _stat garbage and got garbage back. The garbage didn't match the
expected error codes and was thus interpreted as success. It could thus end up
trying to fetch a largefile that didn't exist.
Instead we now pass _stat valid input and handle both correct and invalid
output correctly.
This makes the code work as intended ... but it would probably be better if it
didn't abort on missing largefiles, just like it happened to do before.
basestore.get uses util.atomictempfile when checking and receiving a new
largefile ... but the close/discard logic was too clever for largefiles.
Largefiles relied on being able to discard the file and thus prevent it from
being written to the store. That was however too brittle. lfutil.copyandhash
closes the infile after writing to it ... with a 'blecch' comment. The discard
was thus a silent noop, and as a result of that corruption would be detected
... and then the corrupted files would be used anyway.
Instead we now use a tmp file and rename or unlink it after validating it.
A better solution should be implemented ... but not now.
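The tmp file approach can be sketched roughly as follows. The function
name store_verified() is hypothetical, and the use of SHA-1 as the
largefile hash is an assumption of this sketch; it is not the code in
basestore.get.

```python
import hashlib
import os
import tempfile


def store_verified(chunks, expected_sha1, dest):
    """Write chunks to a tmp file; keep it only if the hash matches."""
    fd, tmpname = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    hasher = hashlib.sha1()
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                hasher.update(chunk)
                tmp.write(chunk)
    except BaseException:
        # Never leave a partial download behind.
        os.unlink(tmpname)
        raise
    if hasher.hexdigest() != expected_sha1:
        # Corrupt data: unlink, so it can never end up in the store.
        os.unlink(tmpname)
        return False
    # Validated: move the file into place in one step.
    os.replace(tmpname, dest)
    return True
```

The file only appears under its final name after validation, so a failed
check cannot leave corrupted content where later code would pick it up.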
6fb54510b150 introduced batching of statlfile, but not all codepaths got
converted.
'hg verify' with a remotestore could thus crash with
TypeError: 'builtin_function_or_method' object is not iterable
Also, the 'hash' variable was used without assigning to it. Don't use variable
names that collide with Python built-in functions. Instead we use 'expecthash'
as in localstore.
The tests for this issue cover an untested area. The tests happen to also
reveal incorrect attempts at getting non-existing largefiles, bad server side
handling of that, and corruption issues - all to be fixed later.
Override updaterepo() instead of individual methods that may not be called for
each subrepo. Add test.
Based on patch from Matt Harbison.
Changes the order of update-related messages (now largefiles comes before the
global status).
In some cases, caching largefiles may take a long time (if the user has
pulled a lot of new heads). This patch makes it more clear what is happening,
by showing the number of heads we are caching largefiles for.
The slightly obscure --lfa and --lfc only worked as modifiers to --large and
could be combined. The documentation was however not clear what they did.
Instead they now imply --large and the description is updated.
Show messages at a point where the actions have been sorted, thus preparing for
backout of 14f4258e3526.
This makes manifestmerge more of a silent operation, just like 'copies' is.
Indent 'preserving' messages to make them subordinate to the action logging so
they fit in the new context. (The 'preserving' messages are quite redundant and
could also be removed completely.)
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
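A small sketch of what such a dict could look like. The class name
sorteddict is hypothetical, and this is only an illustration of the
alternative, not something this patch introduces.

```python
class sorteddict(dict):
    """A dict whose iteration order is always sorted by key."""

    def __iter__(self):
        return iter(sorted(super().__iter__()))

    def keys(self):
        return list(self)

    def values(self):
        return [self[k] for k in self]

    def items(self):
        return [(k, self[k]) for k in self]


substate = sorteddict()
substate["sub2"] = "rev2"
substate["sub1"] = "rev1"
substate["sub3"] = "rev3"
```

Iterating over substate then visits sub1, sub2, sub3 regardless of
insertion order or hash seed, at the cost of sorting on every iteration.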
A situation with this case could happen after interrupting an update. Update
would fail with:
abort: No such file or directory: $TESTTMP/f/.hglf/sub2/large6
Update from a merge without using clean is not possible anyway, so this patch
takes a step in the right direction so it gets as far as reporting the right
error.
Largefiles update would try to copy f to f.orig if there was a .hglf/f.orig .
That is in many many ways very very wrong, but it also caused an abort if f
didn't exist.
Now it only tries to copy f if it exists.
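The guard amounts to something like the sketch below, where
backup_if_present() is a hypothetical helper name and not the function
touched by this patch.

```python
import os
import shutil


def backup_if_present(path):
    """Copy `path` to `path`.orig, but only if `path` exists."""
    if not os.path.exists(path):
        # Nothing to back up; do not abort.
        return False
    shutil.copyfile(path, path + ".orig")
    return True
```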
Problem: 'hg status' with largefiles enabled would walk through all the files
that .hgignore said should be ignored. That made it slow if a lot of files were
.hgignored or the cache was cold.
It seems like there was a reason for this, but other improvements have rendered
this unnecessary.
Solution: .hgignore is now only ignored when that is requested (--ignore).
This is a minimal 'stable' change. There is room for other improvement.
Largefiles revert does for some reason have two lfdirstates and lfdirstatestatus
invocations in one function. The result from the first lfdirstate check was
however not written back to the lfdirstate, and some files were thus checked
twice.
Problem: 'hg status' kept checking largefiles with an unknown state until some
other command wrote the updated dirstate.
Solution: Add missing lfdirstate.write().
If we pass a directory to commit whose only committable files
are largefiles, the core commit code aborts before finding
the largefiles.
So we do the following:
For directories that only have largefiles as matches,
we explicitly add the largefiles to the matchlist and remove
the directory.
In other cases, we leave the match list unmodified.
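The rule above can be sketched as follows. The function name and the
plain lists of paths are assumptions made for illustration; the real code
works on the matcher and on standins.

```python
def expand_largefile_dirs(patterns, largefiles, normalfiles):
    """Replace directories that only match largefiles by those largefiles."""
    result = []
    for pat in patterns:
        prefix = pat.rstrip("/") + "/"
        lfiles_below = [f for f in largefiles if f.startswith(prefix)]
        has_normal = any(f.startswith(prefix) for f in normalfiles)
        if lfiles_below and not has_normal:
            # Only largefiles below this directory: list them explicitly
            # and drop the directory itself from the match list.
            result.extend(sorted(lfiles_below))
        else:
            # Mixed content, no largefiles, or a plain file: unmodified.
            result.append(pat)
    return result


matchlist = expand_largefile_dirs(
    ["dir", "other"], ["dir/big1", "dir/big2"], ["other/normal"])
```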