Before this patch, "hg summary" and "hg outgoing" show and count all
largefiles changed/added in outgoing revisions, even though some of
them are already uploaded to the remote store.
This patch checks whether outgoing largefile entities exist in the
remote store, so that "hg summary" and "hg outgoing" show and count
only the largefile entities that are really outgoing.
Before this patch, "hg outgoing --large" shows which largefiles are
changed or added in outgoing revisions only from the point of view
of filenames.
For example, judging from the list of outgoing largefiles shown in the
"hg outgoing" output, users would expect the former case below to cost
much more to upload than the latter.
- outgoing revisions add a hundred largefiles, but all of them refer
to the same data entity
in this case, only one data entity is outgoing, even though "hg
summary" says that a hundred largefiles are outgoing.
- a hundred outgoing revisions change only one largefile, each time
with distinct data
in this case, a hundred data entities are outgoing, even though
"hg summary" says that only one largefile is outgoing.
But in fact, the latter costs much more than the former.
This patch also shows how many data entities are outgoing in "hg
outgoing", by counting the number of unique hash values of outgoing
largefiles.
When "--debug" is specified, this patch also shows which entities (by
hash) are outgoing for each listed largefile, for debugging purposes.
In the "ui.debugflag" code path, "addfunc()" can always append the
given "lfhash" to the list "toupload[fn]" without a duplication check,
because de-duplication is already done in "_getoutgoings()".
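A minimal sketch of the counting described above, assuming each outgoing revision has already been reduced to a dict mapping largefile names to content hashes (the real code reads these from standins). The names `count_outgoing` and `outgoing_revs` are hypothetical, and de-duplication is done inline here rather than in a separate "_getoutgoings()" step. It reproduces both example cases: many filenames can share one data entity, and one filename can carry many.

```python
from collections import OrderedDict


def count_outgoing(outgoing_revs):
    """Return (per-file distinct hashes, number of unique data entities).

    outgoing_revs: iterable of dicts {largefile name: content hash},
    one dict per outgoing revision (a hypothetical simplification).
    """
    toupload = OrderedDict()  # filename -> list of distinct hashes
    seen = set()              # every unique hash across all files
    for rev in outgoing_revs:
        for fn, lfhash in rev.items():
            hashes = toupload.setdefault(fn, [])
            if lfhash not in hashes:  # de-duplicate per filename
                hashes.append(lfhash)
            seen.add(lfhash)
    return toupload, len(seen)


# Case 1: a hundred largefiles that all share one data entity.
files, entities = count_outgoing([{"f%d" % i: "samehash" for i in range(100)}])
print(len(files), entities)  # 100 1

# Case 2: one largefile changed a hundred times with distinct data.
files, entities = count_outgoing([{"big": "hash%d" % i} for i in range(100)])
print(len(files), entities)  # 1 100
```

The number of unique hashes, not the number of filenames, is what tracks the actual upload cost.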
The `test-largefiles.t` unified test is significantly longer (about 30%) than
any other test in the Mercurial test suite. As a result, it is always the last
test my test runner waits for at the end of a run.
In practice, this means that `test-largefiles.t` wastes half a minute of my
life every time I run the Mercurial test suite. By now, this probably adds up
to more than a few days.
I've finally decided to split it up into multiple smaller tests to bring it back
to a reasonable length.
This changeset extracts independent test cases into two files: one dedicated to
wire protocol testing, and another one dedicated to all other tests that could be
extracted independently.
No test cases were altered in the making of this changeset.
Various timings are available below. All timings were taken with 90 jobs on a
64-core machine. Similar results are seen on firefly (20 jobs on 12 cores).
General timing of the whole run
--------------------------------
We see a 25% real time improvement with no significant CPU time impact.
Before split:
real 2m1.149s
user 58m4.662s
sys 11m28.563s
After split:
real 1m31.977s
user 57m45.993s
sys 11m33.634s
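The quoted percentage can be checked with a little arithmetic on the time(1) readings above. This is only a worked calculation, not part of the test suite: real time drops by about 24%, while user and sys time stay within 1%.

```python
def seconds(minutes, secs):
    """Convert a time(1) reading such as '2m1.149s' into seconds."""
    return minutes * 60 + secs


before = {"real": seconds(2, 1.149), "user": seconds(58, 4.662),
          "sys": seconds(11, 28.563)}
after = {"real": seconds(1, 31.977), "user": seconds(57, 45.993),
         "sys": seconds(11, 33.634)}

for key in ("real", "user", "sys"):
    change = (after[key] - before[key]) / before[key] * 100
    print("%-4s %+.1f%%" % (key, change))
# real -24.1%
# user -0.5%
# sys  +0.7%
```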
Last test to finish (using run-tests.py --time)
-----------------------------------------------
test-largefiles.t now finishes at the same time as the other slow tests.
Before split:
Time Test
119.280 test-largefiles.t
93.995 test-mq.t
89.897 test-subrepo.t
86.920 test-glog.t
85.508 test-rename-merge2.t
83.594 test-revset.t
79.824 test-keyword.t
78.077 test-mq-header-date.t
After split:
Time Test
90.414 test-mq.t
88.594 test-largefiles.t
85.363 test-subrepo.t
81.059 test-glog.t
78.927 test-rename-merge2.t
78.021 test-revset.t
77.777 test-command-template.t
Timing of the largefiles tests themselves
-----------------------------------------
Running only tests prefixed with "test-largefiles".
No significant change in cumulative time.
Before:
Time Test
58.673 test-largefiles.t
2.931 test-largefiles-cache.t
0.583 test-largefiles-small-disk.t
After:
Time Test
31.754 test-largefiles.t
17.460 test-largefiles-misc.t
8.888 test-largefiles-wireproto.t
2.864 test-largefiles-cache.t
0.580 test-largefiles-small-disk.t
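Summing the per-file timings above backs up the claim: the cumulative time is essentially unchanged, while the longest single file is almost halved. A quick check:

```python
before = {"test-largefiles.t": 58.673, "test-largefiles-cache.t": 2.931,
          "test-largefiles-small-disk.t": 0.583}
after = {"test-largefiles.t": 31.754, "test-largefiles-misc.t": 17.460,
         "test-largefiles-wireproto.t": 8.888,
         "test-largefiles-cache.t": 2.864,
         "test-largefiles-small-disk.t": 0.580}

print("cumulative: %.3f s -> %.3f s" % (sum(before.values()), sum(after.values())))
print("longest:    %.3f s -> %.3f s" % (max(before.values()), max(after.values())))
# cumulative: 62.187 s -> 61.546 s
# longest:    58.673 s -> 31.754 s
```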
When invoked from another directory, the matcher's m._cwd will be an absolute
path. The code for calculating the relative path to .hglf did not take that into
account, and log would fail with weird errors and paths.
For now, just don't do any largefiles magic when invoked from other directories.
Log for largefiles was failing for graph log, since it was overriding match
instead of matchandpats.
[Mads Kiilerich modified this patch to address his review comments and ended up
rewriting/removing most of it.]
[Mads Kiilerich placed this patch before the patch that makes graphlog actually
work correctly for largefiles. As introduced here, it only adds test coverage;
the actual bugfix patch will show the actual change.]
'hg cat' of a standin would silently fail.
The use of standins is mostly an implementation detail, but it is already leaking
a bit. Being able to see the content of standins might be convenient for
debugging.
A .orig of a standin left after an update causes a .orig of the actual largefile
to be created. The standin .orig was however never removed, so the largefile
.orig was overwritten again and again.
The fix: remove the standin .orig once it has been used.
Before this patch, "hg outgoing" invokes "findcommonoutgoing()" not
only in "commands.outgoing()" but also in
"overrides.overrideoutgoing()" (via "getoutgoinglfiles()"), when
largefiles is enabled. The latter is redundant.
This patch uses "outgoinghooks" to avoid the redundant outgoing check.
The newly introduced function "overrides.outgoinghook()" is registered
into "outgoinghooks" to get the result of the outgoing check in
"commands.outgoing()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
the outgoing check to avoid the redundant outgoing check in
"getoutgoinglfiles()": "sort()" is needed, because
"lfutil.getlfilestoupload()" doesn't sort its result.
This patch also omits the "if toupload is None" ("No remote repo")
case, because a failure to look up the remote repository should raise
an exception in "commands.outgoing()" before "outgoinghooks" is
invoked.
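The hook pattern can be sketched with a simplified registry modelled loosely on Mercurial's "util.hooks" (callables added with a source name and invoked in source-name order). Everything else, including `findcommonoutgoing()` and the hook signature used here, is a hypothetical stand-in. The point is only that discovery runs once and the extension reuses its result instead of recomputing it.

```python
class hooks(object):
    """Simplified hook registry: callables tagged with a source name
    and invoked in source-name order."""

    def __init__(self):
        self._hooks = []

    def add(self, source, hook):
        self._hooks.append((source, hook))

    def __call__(self, *args):
        self._hooks.sort(key=lambda x: x[0])
        return [hook(*args) for source, hook in self._hooks]


discovery_calls = []


def findcommonoutgoing(repo, other):
    """Hypothetical stand-in for the expensive outgoing discovery."""
    discovery_calls.append((repo, other))
    return ["rev1", "rev2"]  # revisions missing on `other`


outgoinghooks = hooks()


def largefiles_outgoinghook(out, repo, other, missing):
    # Reuse the discovery result instead of running discovery again.
    out.append("largefiles: checking %d outgoing revisions" % len(missing))


outgoinghooks.add("largefiles", largefiles_outgoinghook)


def outgoing_command(out, repo, other):
    missing = findcommonoutgoing(repo, other)  # discovery runs once
    out.extend(missing)
    outgoinghooks(out, repo, other, missing)   # extensions piggyback on it


out = []
outgoing_command(out, "local", "remote")
print(out)  # ['rev1', 'rev2', 'largefiles: checking 2 outgoing revisions']
```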
Newly added "hg outgoing --large --graph" tests examine
"outgoinghooks" invocations in the "hg outgoing --graph" code path.
Before this patch, "hg summary --remote --large" invokes
"findcommonoutgoing()" not only in "commands.summary()" but also in
"overrides.overridesummary()" (via "getoutgoinglfiles()"). The latter
is redundant.
This patch uses "summaryremotehooks" to avoid the redundant outgoing check.
The newly introduced function "overrides.summaryremotehook()" is
registered into "summaryremotehooks" to get the result of the outgoing
check in "commands.summary()".
It invokes "lfutil.getlfilestoupload()" directly with the result of
the outgoing check to avoid the redundant outgoing check in
"getoutgoinglfiles()".
Before this patch, "hg push" invokes "findcommonoutgoing()" not only
in "exchange.push()" but also in "lfilesrepo.push()", when largefiles
is enabled. The latter is redundant.
This patch registers its own "prepushoutgoinghook" function into
"prepushoutgoinghooks" of "localrepository" to reuse the
"findcommonoutgoing()" result.
"prepushoutgoinghook" omits the "changelog.nodesbetween()" invocation,
because the "findcommonoutgoing()" invocation in "exchange.push()"
takes the "onlyheads" argument and thus already covers what
"nodesbetween()" was used for.
This should correct an earlier couple of bad merges (5433856b2558 and
596960a4ad0d, now pruned) that accidentally brought in a change that had
been marked obsolete (244ac996a821).
Since changeset a8955c4d9ef5, "reposetup()" of each extension is
invoked only on repositories that enable the corresponding extension.
As a result, largefiles-specific interactions between a repository
enabling largefiles locally and a remote (wire) peer fail, because
there is no way to know whether largefiles is enabled on the remote
repository behind the wire peer, and largefiles-specific "wireproto
functions" are not given to any wire peers.
To avoid this problem, largefiles should be enabled in a wider scope
than individual repositories (e.g. user-wide "${HOME}/.hgrc").
This patch introduces "wirepeersetupfuncs" to set up wire peers by
already enabled extensions. Functions registered into
"wirepeersetupfuncs" are invoked for all wire peers.
This patch uses a plain list instead of "util.hooks" for
"wirepeersetupfuncs", because the former allows controlling the order
of function invocation by the order in which extensions are enabled:
this may be useful for working around problems with combinations of
enabled extensions.
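A minimal sketch of the plain-list approach, assuming hypothetical `wirepeer` and `makepeer` stand-ins rather than Mercurial's real peer classes. Each registered function runs against every new wire peer, in registration order:

```python
# Hypothetical module-level list of setup functions for wire peers.
wirepeersetupfuncs = []


class wirepeer(object):
    """Hypothetical stand-in for a wire protocol peer."""

    def __init__(self, url):
        self.url = url
        self.features = []


def makepeer(ui, url):
    peer = wirepeer(url)
    # Plain list: functions run in the order extensions registered them.
    for setupfunc in wirepeersetupfuncs:
        setupfunc(ui, peer)
    return peer


def largefilessetup(ui, peer):
    peer.features.append("largefiles")


def otherextsetup(ui, peer):
    peer.features.append("otherext")


wirepeersetupfuncs.append(largefilessetup)
wirepeersetupfuncs.append(otherextsetup)

peer = makepeer(None, "ssh://example.com/repo")
print(peer.features)  # ['largefiles', 'otherext']
```

A registry that sorts by source name would instead run the functions alphabetically, regardless of the order in which extensions were enabled.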
Unknown requirements will now be reported as:
abort: repository requires features unknown to this Mercurial: largefiles!
(see http://mercurial.selenic.com/wiki/MissingRequirement for more information)
Some features of this phrasing:
* avoid double ':' in abort message
* make it more clear who requires and knows what
* don't quote the requirement names - it is not something the user entered or
needs the exact spelling of ... and they are identifiers that are unambiguous
anyway
* remove the double hint by removing the "(upgrade Mercurial)" comment
* don't mention upgrading Mercurial without mentioning enabling the feature -
instead, just refer to wiki page for both
* don't just talk about "details", talk about "more information"
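A sketch of how the message body and hint could be assembled, assuming a hypothetical helper `missingrequirements_error` (not Mercurial's actual function). It only builds the text; the "abort: " prefix and trailing "!" in the output above presumably come from the generic error reporting.

```python
def missingrequirements_error(requirements, supported):
    """Return (message, hint) for unsupported requirements, or None.

    Hypothetical helper; the wording follows the message quoted above.
    """
    missing = sorted(set(requirements) - set(supported))
    if not missing:
        return None
    # No quotes around the names: they are unambiguous identifiers.
    msg = ("repository requires features unknown to this Mercurial: %s"
           % " ".join(missing))
    # A single hint covering both upgrading and enabling the feature.
    hint = ("see http://mercurial.selenic.com/wiki/MissingRequirement"
            " for more information")
    return msg, hint


print(missingrequirements_error(["revlogv1", "store", "largefiles"],
                                ["revlogv1", "store"])[0])
```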
The largefile hashes are mostly an implementation detail, but they are "leaked"
in several places anyway, and showing the hashes is better than not giving the
user any information about the options in the prompt.
The hashes are long, but they are largefile hashes, so shortening them would be
confusing.
Before, it tried to explain the exact situation when merging moved largefiles.
That does not happen for normal merges and is no more relevant for largefiles
than for normal files. It is unneeded complexity - remove it.
Before, it just said 'nothing to rebase'.
Now, if "base" is an empty set, it says:
abort: empty "base" revision set - can't compute rebase set
If the set of changesets to rebase can't be found from "base", it will fail as
before, but with more explanation of what the problem was.
The name of the "base" option is not obvious - it is more like "samples
identifying the branch to rebase". The error messages for problems with the
specified "base" value will use that term and might thus also not be obvious,
but at least they are consistent with the option name. The name "base" will not
be used if the base was only specified implicitly as the working directory
parent.
a8386b4c47b1 introduced a splitstandin call on all action filenames. It would
however crash on 'd' actions, where the filename is None.
Fix that and add test coverage for that case.
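A simplified sketch of the guard, assuming a cut-down `splitstandin()` and a hypothetical list of (filename, action) pairs; the real action tuples carry more fields. Calling string methods on None is what crashes, so the None check has to come before `splitstandin()` is called.

```python
shortname = ".hglf"


def splitstandin(filename):
    """Return the largefile name for a standin path, else None (simplified)."""
    bits = filename.replace("\\", "/").split("/", 1)
    if len(bits) == 2 and bits[0] == shortname:
        return bits[1]
    return None


def standin_actions(actions):
    """Annotate (filename, action) pairs with the largefile name, if any."""
    result = []
    for fn, action in actions:
        if fn is None:  # e.g. 'd' actions carry no filename
            result.append((fn, action, None))
            continue
        result.append((fn, action, splitstandin(fn)))
    return result


print(standin_actions([(".hglf/big.bin", "g"), ("small.txt", "g"), (None, "d")]))
```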
Before this patch, if the largefiles extension is enabled in any of the
target repositories, commands handling multiple repositories at a time,
like those below, wrongly assume that the "largefiles" feature is also
supported in all other local repositories:
- clone/pull from or push to localhost
- recursive execution in a subrepo tree
This patch registers "featuresetup()" into "featuresetupfuncs" of
"localrepository" to support the "largefiles" feature only in
repositories enabling the largefiles extension, instead of adding
"largefiles" to the class variable "_basesupported" of
"localrepository".
This patch also adds the check below to the largefiles-specific class
derived from "localrepository":
- push to localhost: whether the features supported in the local (= dst)
repository satisfy those required in the remote (= src)
This can prevent a useless lookup in the remote repository when
supported and required features are mismatched: "push()" of
"localrepository" also checks this, but only after the lookup in
the remote.
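The added check boils down to a set difference between the source repository's requirements and the features the destination supports. A minimal sketch, with a hypothetical `checkpushfeatures` helper and a local `Abort` exception standing in for Mercurial's:

```python
class Abort(Exception):
    """Local stand-in for Mercurial's abort exception."""


def checkpushfeatures(src_requirements, dst_supported):
    """Abort before any remote lookup if dst can't handle src's features."""
    missing = set(src_requirements) - set(dst_supported)
    if missing:
        raise Abort("required features are not supported in the "
                    "destination: %s" % ", ".join(sorted(missing)))


# Passes: the destination supports everything the source requires.
checkpushfeatures({"revlogv1", "store"}, {"revlogv1", "store", "largefiles"})
```

Running this cheap check first means a mismatch is reported without paying for the remote lookup.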
After 08202d1ef738 I see:
$ hg id -q
largefiles: repo method 'commit' appears to have already been wrapped by another extension: largefiles may behave incorrectly
largefiles: repo method 'push' appears to have already been wrapped by another extension: largefiles may behave incorrectly
3bd0c95ec1bf
The warning is bad:
* The message gives no hint about what the problem is or how it can be
resolved. The message is useless.
* Largefiles does have its share of problems, but I don't think I have ever
seen a problem where this warning would have helped. The 'may' in the warning
seems like an exaggeration of the risk. Having largefiles enabled in
combination with, for instance, mq, hggit and hgsubversion causes a warning
(depending on the configuration order) but does not cause problems. Extensions
might of course be incompatible, but they can be incompatible in many other
ways. The check and the message are incorrect.
It would thus be better to remove the check and the warning completely.
Before 08202d1ef738 the check always failed. That change made the check work
more like intended ... but the intention was wrong. This change will thus also
back that change out.
This avoids a lot of expensive roundtrips to remote repositories ... but might
be slightly slower for local operations.
This will also change some aborts on missing files to warnings. That will in
some situations make it possible to continue working on a repository with
missing largefiles.
This goes a step further than 974959d637b7 and backs out the unreleased
--cache-largefiles option. The same can be achieved with --lfrev heads(pulled()) and
we shouldn't introduce unnecessary command line options.
The revset will be evaluated after the changesets have been pulled, and missing
largefiles from matching revisions will be pulled to the local caches.
This, in combination with revsets, will make it possible to specify different
strategies for pulling largefiles.
The revset expressions used for this option might be quite complex and will
probably be most useful in scripts or aliases ... but they are still less
complicated than configuring hooks.
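To illustrate what `--lfrev "heads(pulled())"` selects, here is a hypothetical model: revisions are integers, `parents` maps each revision to its parents, and `standins` maps each revision to the largefile hashes it references. This is not Mercurial's revset engine, just the same selection logic applied to plain data.

```python
def heads(revs, parents):
    """Return the revisions in `revs` that have no child inside `revs`."""
    revs = set(revs)
    haschild = set()
    for r in revs:
        for p in parents.get(r, ()):
            if p in revs:
                haschild.add(p)
    return sorted(revs - haschild)


def largefiles_to_fetch(selected, standins, cache):
    """Return hashes referenced by `selected` revisions but not cached."""
    wanted = set()
    for r in selected:
        wanted.update(standins.get(r, {}).values())
    return sorted(wanted - set(cache))


# Revisions 3, 4 and 5 were just pulled; 4 and 5 are both children of 3.
pulled = [3, 4, 5]
parents = {3: [2], 4: [3], 5: [3]}
standins = {3: {"a": "h1"}, 4: {"a": "h2"}, 5: {"a": "h1", "b": "h3"}}
cache = {"h1"}

selected = heads(pulled, parents)                      # [4, 5]
print(largefiles_to_fetch(selected, standins, cache))  # ['h2', 'h3']
```

Revision 3 is skipped because it is not a head of the pulled set, and "h1" is skipped because it is already cached.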
We were calling back to the original commands.cat from inside the walk loop
that handled and filtered out largefiles. That did however happen with file
paths relative to the repo root, and the original cat would fail when it applied
its own walk and match on top of that.
Instead, we now duplicate the code from commands.cat and modify it to handle
both normal files and largefiles.
A change in test output shows that this also makes the exit code with
largefiles consistent with the normal one in the case where one of several
specified files is missing.
This also fixes the combination of --output and largefiles.
Before this patch, repo wrapping detection in "reposetup()" of
largefiles could detect only a limited form of repo wrapping: replacing
target functions with another function named "wrap".
So it can't detect repo wrapping even in the recommended style:
replacing the "__class__" of the repo with a derived class.
This patch can detect repo wrapping in both styles below:
- replacing "__class__" of repo with a derived class (recommended style):
    class derived(repo.__class__):
        def push(self, *args, **kwargs):
            return super(derived, self).push(*args, **kwargs)
    repo.__class__ = derived
- replacing a function of repo with another one (not recommended style):
    orgpush = repo.push
    def push(*args, **kwargs):
        return orgpush(*args, **kwargs)
    repo.push = push
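One way to detect both styles, sketched under assumptions: `baseclass` is the class largefiles expects to wrap, and a method counts as wrapped if it is stored on the instance or defined by a class outside `baseclass`'s MRO. The helper names and the `localrepo` stand-in are hypothetical, and this is not necessarily the check the patch adds.

```python
def _definingclass(cls, name):
    """Return the first class in cls's MRO whose own namespace defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


def wrappedmethods(repo, baseclass, names):
    """Return the names in `names` that look wrapped on `repo`.

    - instance style: the name is stored directly on the repo object
    - class style: the method comes from a class outside baseclass's MRO
      (e.g. a subclass assigned to repo.__class__)
    """
    wrapped = []
    for name in names:
        if name in vars(repo):
            wrapped.append(name)
        elif _definingclass(type(repo), name) not in baseclass.__mro__:
            wrapped.append(name)
    return wrapped


class localrepo(object):
    """Hypothetical stand-in for the class largefiles expects to wrap."""

    def push(self, *args, **kwargs):
        return "pushed"

    def commit(self, *args, **kwargs):
        return "committed"


# Class style (recommended): subclass assigned to __class__.
repo1 = localrepo()
class derived(repo1.__class__):
    def push(self, *args, **kwargs):
        return super(derived, self).push(*args, **kwargs)
repo1.__class__ = derived

# Instance style (not recommended): function stored on the object.
repo2 = localrepo()
orgpush = repo2.push
def push(*args, **kwargs):
    return orgpush(*args, **kwargs)
repo2.push = push

for r in (repo1, repo2, localrepo()):
    print(wrappedmethods(r, localrepo, ["push", "commit"]))
# ['push']
# ['push']
# []
```

Walking the MRO instead of comparing method objects with `is` keeps the check independent of how a given Python version builds bound or unbound method objects.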
Largefiles can easily become missing - for example if they simply aren't
available or the download fails. It might even be convenient to be able to work
that way in some cases.
But committing missing largefiles as if they had been 'hg remove'd is plain wrong.
Test output is changed in a case where one revision was pulled, but because of
the off-by-one error it thought that 0 revisions were pulled ... and because of
another bug it thus (tried to) fetch largefiles for all revisions.
After this change it no longer reports failure when it failed while trying to
fetch largefiles it shouldn't fetch. Largefiles that it shouldn't fetch but
managed to fetch anyway will now correctly be missing later on.
This change thus resolves some of the unexplained test output introduced in
8664d9900884.
After discussion, we've agreed that largefiles for newly pulled heads should
not be cached by default. The use case for this is using largefiles repos
with multiple remote servers (and therefore multiple remote largefiles caches),
where users will be pulling from non-default locations on a regular basis. We
think this use case will be significantly less common than the use case where
all largefiles are stored on the same central server, so the default should be
no caching.
The old behavior can be obtained by passing the --cache-largefiles flag to
pull.
79f69be29aed introduced a crash when cloning a URL without a path, where
util.url().path would be None.
This None will now be handled as '', so clone will abort with 'repository /
not found' as before.