This replaces the grand unified action list that had multiple action types as
tuples in one big list. That list was iterated multiple times just to find
actions of a specific type. This data model also made some code more
convoluted than necessary.
Instead we now store actions as a tuple of lists. Using multiple lists gives a
bit of cut'n'pasted code but also enables other optimizations.
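As a hedged sketch of the data-model change (the real action types and tuple
layouts in merge.py are more involved), the difference is roughly:

    # old: one mixed list, scanned once per action type
    actions = [('a.txt', 'g', ('flags',)), ('b.txt', 'r', ())]
    removes = [a for a in actions if a[1] == 'r']

    # new: one list per action type, filled in a single pass
    gets, removes = [], []
    for f, typ, args in actions:
        (gets if typ == 'g' else removes).append((f, args))
    actions = (gets, removes)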
This patch uses 'if True:' to preserve indentations and help reviewing. It also
limits the number of conflicts with other pending patches. It can trivially be
cleaned up later.
Adds a 'labels' parameter to all the functions between merge.update and
filemerge.filemerge. This will allow commands like rebase to specify custom
marker labels.
When invoked from another directory, the matcher's m._cwd will be the absolute
path. The code calculating the relative path to .hglf did not consider that,
and log would fail with weird errors and paths.
For now, just don't do any largefile magic when invoked from other directories.
Log for largefiles was failing for graph log since it was overriding match
instead of matchandpats.
[Mads Kiilerich modified this patch to address his review comments and ended up
rewriting/removing most of it.]
cat of a standin would silently fail.
The use of standins is mostly an implementation detail, but it already leaks a
bit. Being able to see the content of standins might be convenient for
debugging.
A .orig of a standin after an update caused a .orig of the actual largefile to
be created. The standin .orig was however never removed again, and the
largefile .orig was thus overwritten again and again.
The fix: remove the standin .orig when it is used.
Before this patch, "hg outgoing" invokes "findcommonoutgoing()" not
only in "commands.outgoing()" but also in
"overrides.overrideoutgoing()" (via "getoutgoinglfiles()"), when
largefiles is enabled. The latter is redundant.
This patch uses "outgoinghooks" to avoid redundant outgoing check.
Newly introduced function "overrides.outgoinghook()" is registered
into "outgoinghooks" to get the result of outgoing check in
"commands.outgoing()".
It invokes "lfutil.getlfilestoupload()" directly with the result of the
outgoing check to avoid a redundant outgoing check in
"getoutgoinglfiles()": "sort()" is needed, because
"lfutil.getlfilestoupload()" doesn't sort its result.
This patch also omits the "if toupload is None" ("No remote repo") case,
because a failure to look up the remote repository should raise an exception
in "commands.outgoing()" before "outgoinghooks" is invoked.
Newly added "hg outgoing --large --graph" tests examine
"outgoinghooks" invocations in "hg outgoing --graph" code path.
Before this patch, "hg summary --remote --large" invokes
"findcommonoutgoing()" not only in "commands.summary()" but also in
"overrides.overridesummary()" (via "getoutgoinglfiles()"). The latter
is redundant.
This patch uses "summaryremotehooks" to avoid redundant outgoing check.
Newly introduced function "overrides.summaryremotehook()" is
registered into "summaryremotehooks" to get the result of outgoing
check in "commands.summary()".
It invokes "lfutil.getlfilestoupload()" directly with the result of the
outgoing check to avoid a redundant outgoing check in
"getoutgoinglfiles()".
Before this patch, "hg push" invokes "findcommonoutgoing()" not only
in "exchange.push()" but also in "lfilesrepo.push()", when largefiles
is enabled. The latter is redundant.
This patch registers own "prepushoutgoinghook" function into
"prepushoutgoinghooks" of "localrepository" to reuse
"findcommonoutgoing()" result.
"prepushoutgoinghook" omits "changelog.nodesbetween()" invocation,
because "findcommonoutgoing()" invocation in "exchange.push()" takes
"onlyheads" argument and it considers "nodesbetween()".
Before this patch, "overrides.getoutgoinglfiles()" (called by
"overrideoutgoing()" and "overridesummary()") and "lfilesrepo.push()"
implement similar logic to get outgoing largefiles separately.
This patch centralizes the logic to get outgoing largefiles in
"lfutil.getlfilestoupload()".
"lfutil.getlfilestoupload()" takes "addfunc" argument, because each
callers need different information (and it is useful for enhancement
in the future).
- "overrides.getoutgoinglfiles()" needs only filenames
- "lfilesrepo.push()" needs only hashes of largefiles
Before this patch, "contrib/check-code.py" can't detect these
problems, because the regexp pattern to detect "% inside _()" doesn't
suppose the case that the format string and "%" aren't placed in the
same line.
This patch replaces "\s" in that regexp pattern with "[ \t\n]" to detect
"% inside _()" problems in such a case.
"[\s\n]" can't be used for this purpose, because "\s" is automatically
replaced with "[ \t]" by "_preparepats()", and a "\s" inside "[]" would
unexpectedly produce nested "[]".
Since changeset a8955c4d9ef5, "reposetup()" of each extension is invoked only
on repositories enabling the corresponding extension.
This causes largefiles-specific interactions between a repository enabling
largefiles locally and a remote (wire) peer to fail, because there is no way
to know whether largefiles is enabled on the remote repository behind the
wire peer, and largefiles-specific "wireproto functions" are not given to any
wire peers.
To avoid this problem, largefiles should be enabled in a wider scope than
each repository (e.g. the user-wide "${HOME}/.hgrc").
This patch introduces "wirepeersetupfuncs" to set up wire peers by extensions
that are already enabled. Functions registered in "wirepeersetupfuncs" are
invoked for all wire peers.
This patch uses a plain list instead of "util.hooks" for
"wirepeersetupfuncs", because the former allows controlling the order of
function invocation by the order in which extensions are enabled: this may be
useful to work around problems with particular combinations of enabled
extensions.
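A hedged sketch of the mechanism; the names follow the description above, but
the actual hook location and signatures may differ:

    wirepeersetupfuncs = []

    def largefilespeersetup(ui, peer):
        peer.supportslargefiles = True      # stand-in for real wireproto wrapping

    wirepeersetupfuncs.append(largefilespeersetup)

    def wirepeersetup(ui, peer):
        for f in wirepeersetupfuncs:        # invocation order == enabling order
            f(ui, peer)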
This should correct an earlier couple of bad merges (5433856b2558 and
596960a4ad0d, now pruned) that accidentally brought in a change that had
been marked obsolete (244ac996a821).
Some extensions set configuration settings that showed up in 'hg showconfig
--debug' with 'none' as source. That was confusing.
Instead, they will now tell which extension they come from.
This change tries to be consistent and specify a source everywhere - also where
it perhaps is less relevant.
The largefile hashes are mostly an implementation detail, but they are "leaked"
in several places anyway, and showing the hashes is better than not giving the
user any information about the options in the prompt.
The hashes are long, but they are largefile hashes and it would thus be
confusing to shorten them.
Before, it tried to explain the exact situation when merging moved largefiles.
That does not happen for normal merges and is not more relevant for largefiles
than for normal files. It is unneeded complexity - remove it.
Since the localrepository.push() method in mercurial/localrepo.py is defined
this way:
def push(self, remote, force=False, revs=None, newbranch=False):
it is better for largefiles to call push() on the super class with proper
kwargs to respect the API.
This will avoid breaking other extensions overriding the push method this way:
def push(self, remote, force=False, **kwargs):
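A hedged sketch of what respecting the API looks like; the class name and
wrapping style are illustrative:

    from mercurial.localrepo import localrepository

    class lfilesrepo(localrepository):
        def push(self, remote, force=False, revs=None, newbranch=False):
            # largefiles-specific work would go here
            return super(lfilesrepo, self).push(
                remote, force=force, revs=revs, newbranch=newbranch)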
a8386b4c47b1 introduced splitstandin on all action filenames. It would however
crash on 'd' actions where the filename is None.
Fix that and add test coverage for that case.
An update would try to fetch any missing largefiles after having updated normal
files and standins. That could fail or be interrupted and would leave the
working directory in a state where the largefiles not only were missing but
also were scheduled for remove ... and where the old largefile was left in
place.
Instead we now remove old largefiles before starting to download and update
missing largefiles.
Prompts like
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
was not as clear as the usual prompts that use 'remote' or 'local' to explain
what happened on which side ... especially not when used to the normal prompts.
"as" could also indicate that it would be possible to take the content of the
largefile and somehow put it into the normal file. It could make it more clear
that it was a choice between one side or the other.
For consistency we will now phrase it like:
remote turned local normal file f into a largefile
use (l)argefile or keep (n)ormal file?
We used to get like:
$ hg up -r 2
foo has been turned into a normal file
keep as (l)argefile or use (n)ormal file? l
getting changed largefiles
0 largefiles updated, 0 removed
0 files updated, 0 files merged, 2 files removed, 0 files unresolved
$ cat foo
cat: foo: No such file or directory
[1]
- which both asked the wrong question and did the wrong thing.
Instead, skip this conflict resolution when the local conflicting file has been
scheduled for removal and there thus is no conflict.
Before this patch, each feature setup function for the localrepository class
had to examine by itself whether the corresponding extension was enabled.
This patch invokes only the feature setup functions defined in modules of
enabled extensions, which makes the implementation of feature setup functions
easier and simpler.
Before, the hack would replace 'heads' with 'lheads' no matter where it
occurred in a batch command string.
Instead we now use a regexp to more carefully match only the 'heads' command.
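A hedged sketch of the more careful substitution (the real pattern in
largefiles may differ):

    import re
    batch = 'heads ;known nodes=heads-as-argument'
    print(re.sub(r'(^|;)heads\b', r'\1lheads', batch))
    # -> 'lheads ;known nodes=heads-as-argument'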
Before this patch, if the largefiles extension is enabled in any one of the
target repositories, commands handling multiple repositories at a time, like
the ones below, assume that the "largefiles" feature is also supported in all
the other local repositories:
- clone/pull from or push to localhost
- recursive execution in subrepo tree
This patch registers "featuresetup()" into "featuresetupfuncs" of
"localrepository" to support "largefiles" features only in
repositories enabling largefiles extension, instead of adding
"largefiles" feature to class variable "_basesupported" of
"localrepository".
This patch also adds the check below to the largefiles-specific class derived
from "localrepository":
- push to localhost: whether the features supported in the local (= dst)
repository satisfy the ones required in the remote (= src)
This can prevent useless lookups in the remote repository when the supported
and required features are mismatched: "push()" of "localrepository" also
checks this, but only after the lookup in the remote.
Before this patch, all local repositories support the same features, because
supported features are managed by the class variable "supported" of
"localrepository".
For example, the "largefiles" feature provided by the largefiles extension is
recognized as supported by adding the feature name to "supported" of
"localrepository".
So, commands handling multiple repositories at a time, like the ones below,
assume that such features are also supported in repositories not enabling the
corresponding extensions:
- clone/pull from or push to localhost
- recursive execution in subrepo tree
"reposetup()" can't be used to fix this problem, because it is invoked
after checking whether supported features satisfy ones required in the
target repository.
So, this patch adds the set object named as "featuresetupfuncs" to
"localrepository" to manage hook functions to setup supported features
of each repositories.
If any functions are added to "featuresetupfuncs", they are invoked,
and information about supported features is managed in each
repositories individually.
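A hedged sketch of how an extension would use this hook; the exact attribute
and signature may differ between versions:

    from mercurial import localrepo

    def featuresetup(ui, supported):
        # only repositories loading this extension get the extra feature
        supported |= set(['largefiles'])

    def uisetup(ui):
        localrepo.localrepository.featuresetupfuncs.add(featuresetup)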
This patch also adds the checks below:
- pull from localhost: whether the features supported in the local (= dst)
repository satisfy the ones required in the remote (= src)
- push to localhost: whether the features supported in the remote (= dst)
repository satisfy the ones required in the local (= src)
Managing supported features by a class variable means that there is no
difference in supported features between instances of "localrepository" in
the same Python process, so such checks were not needed before this patch.
Even with this patch, if an intermediate bundle file is used as the pulling
source, pulling indirectly from a remote repository that requires more
features than the local one supports can't be prevented, because a bundle
file carries no information about its "required features".
The refactoring of all the context objects allows us to simply pass a basectx
to the __new__ constructor and have it return the same object without
allocating new memory.
This also removes the need to import the context module.
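A hedged sketch of the idea; the real basectx.__new__ in context.py takes
different arguments:

    class basectx(object):
        def __new__(cls, repo, changeid=''):
            if isinstance(changeid, basectx):
                return changeid             # reuse the existing context object
            return super(basectx, cls).__new__(cls)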
Before this patch, the largefiles extension always checks unknown files in
the working directory case-sensitively.
This causes a failure when updating from revision X consisting of '.hglf/A'
(and "A" implicitly) to revision Y consisting of 'a' (not ".hglf/A") on a
case-insensitive filesystem, because "A" in the working directory is treated
as colliding against, yet different from, 'a' in revision Y.
This patch uses "repo.dirstate.normalize()" to check unknown files
with case awareness of the filesystem.
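A hedged sketch of the case-aware check; the helper name is illustrative:

    def isknown(repo, f):
        # fold the name the same way the filesystem does before asking dirstate
        return repo.dirstate.normalize(f) in repo.dirstate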
Before this patch, the largefiles extension always unlinks largefiles that
are untracked in the target context of a merge/update after updating the
working directory.
For example, assume that revision X consists of ".hglf/A" (and "A"
implicitly) and revision Y consists of "a" (not ".hglf/A").
When updating from X to Y, the largefiles extension tries to unlink "A" after
updating "a" in the working directory. On a case-insensitive filesystem this
unexpectedly unlinks "a".
This patch checks the existence of the file in the working context with case
awareness of the filesystem to prevent such unexpected unlinking.
"lfcommands._updatelfile()" also unlinks target file in the case
"largefile is tracked in the target context, but fails to be fetched".
This patch doesn't apply "repo.dirstate.normalize()" in this case,
because it should be already ensured in the manifest merging that
there is no normal file colliding against any largefiles.
After 08202d1ef738 I see:
$ hg id -q
largefiles: repo method 'commit' appears to have already been wrapped by another extension: largefiles may behave incorrectly
largefiles: repo method 'push' appears to have already been wrapped by another extension: largefiles may behave incorrectly
3bd0c95ec1bf
The warning is bad:
* The message gives no hint what the problem is and how it can be resolved.
The message is useless.
* Largefiles does have its share of problems, but I don't think I have ever
seen a problem where this warning would have helped. The 'may' in the warning
seems like an exaggeration of the risk. Having largefiles enabled in
combination with for instance mq, hggit and hgsubversion causes a warning
(depending on the configuration order) but does not cause problems. Extensions
might of course be incompatible, but they can be that in many other ways.
The check and the message are incorrect.
It would thus be better to remove the check and the warning completely.
Before 08202d1ef738 the check always failed. That change made the check work
more like intended ... but the intention was wrong. This change will thus also
back that change out.
This avoids a lot of expensive roundtrips to remote repositories ... but might
be slightly slower for local operations.
This will also change some aborts on missing files to warnings. That will in
some situations make it possible to continue working on a repository with
missing largefiles.
We did read exactly the right number of bytes from the response body. But
if the response came in chunked encoding then that meant that the HTTP layer
still hadn't read the last 0-sized chunk and expected the app layer to read
more data from the stream. The app layer was however happy and sent another
request which had to be sent on another HTTP connection while the old one was
lingering until some other event closed the connection.
Adding an extra read where we expect to hit the end of file makes the HTTP
connection ready for reuse. This thus plugs a real socket leak.
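A hedged sketch of the idea (names are illustrative, the real code lives in
the largefiles remote store):

    def readbody(stream, length):
        data = stream.read(length)
        if stream.read(1):                  # expected to hit end of file here
            raise ValueError('response body was longer than expected')
        return data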
To distinguish HTTP from SSH we look at self's class, just like it is done in
putlfile.
Avoid the intermediate limitreader and filechunkiter between getlfile and
copyandhash - return the right protocol and put the complexity where it better
can be managed.
This goes a step further than 974959d637b7 and backs out the unreleased
--cache-largefiles option. The same can be achieved with --lfrev heads(pulled()) and
we shouldn't introduce unnecessary command line options.
The revset will be evaluated after the changesets have been pulled, and missing
largefiles from matching revisions will be pulled to the local caches.
This in combination with revsets will make it possible to specify different
strategies for pulling largefiles.
The revset expressions used for this option might be quite complex and will
probably be most useful from scripts or an alias ... but less complicated than
configuring hooks.
We were calling back to the original commands.cat from inside the walk loop
that handled and filtered out largefiles. That did however happen with file
paths relative to repo root and the original cat would fail when it applied its
own walk and match on top of that.
Instead we now duplicate and modify the code from commands.cat and patch it to
handle both normal and largefiles.
A change in test output shows that this also makes the exit code with
largefiles consistent with the normal one in the case where one of several
specified files is missing.
This also fixes the combination of --output and largefiles.
Before this patch, the repo wrapping detection in "reposetup()" of largefiles
can only detect a limited kind of repo wrapping: replacing target functions
with another one named "wrap".
So, it can't detect repo wrapping even in the recommended style: replacing
"__class__" of the repo with a derived class.
This patch can detect repo wrapping in both styles below:
- replacing "__class__" of repo by derived class (recommended style):
    class derived(repo.__class__):
        def push(self, *args, **kwargs):
            return super(derived, self).push(*args, **kwargs)
    repo.__class__ = derived
- replacing function of repo by another one (not recommended style):
    orgpush = repo.push
    def push(*args, **kwargs):
        return orgpush(*args, **kwargs)
    repo.push = push
These names were found using Cython; I was completely puzzled until
I searched the rest of the tree. It's icky to mess with another
module's namespace, but ickier yet to do so without a comment :-)
Upcoming patches will speed dirstate.walk up by not filtering based on the
match function when match.always() is True. For that to work, match.always()
needs to be accurate. Previously it wasn't so for largefiles.
lfutil.splitstandin(f) can be None, and we query the dirstate for that without
checking if it is. This will cause problems with the upcoming move to critbit-
based dicts, since they only support strings as keys.
When a series of commits first adds a file and then removes it,
hg rebase --collapse prompts whether to keep the file or delete it. This is
due to it reusing the branch merge code. In a noninteractive terminal it
defaults to keeping the file, which results in a collapsed commit that has a
file that should be deleted. This bug resulted in developers accidentally
committing unintentional changes to our repo twice today, so it's fairly
important to get fixed.
This change allows rebase --collapse to tell the merge code to accept the
latest version every time without prompting.
Adds a test as well.
cachelfiles jumped through loops to handle merges and modified files ... but it
did apparently no longer have a valid reason to do so. It should just always
make sure that the largefiles referenced from the standins are present - no
matter which actual largefile is stored in the working directory. If there is
no standin then there is nothing to fetch.
The old code usually verified the hash of all largefiles every time this
function was invoked - for example by 'update'.
This change makes a trivial noop update 5-10 seconds faster on our repo (with
the other 50% spent doing another unnecessary hashing of all largefiles).
Situations where a largefile for some reason wasn't available sometimes caused
wrong largefile content and state. It has mostly been seen when interrupting
download of largefiles ... and when introducing programming errors.
Instead we now make sure to delete the old and wrong largefile. A missing file
is a well-known error condition and a much more reasonable way to handle the
situation.
Largefiles can easily become missing - for example if one simply isn't
available or the download fails. It might even be convenient to be able to
work that way in some cases.
But committing missing largefiles as if they had been 'hg remove'd is plain
wrong.
Looking for a (potentially empty) directory was not reliable - both because it
is a reasonable assumption that empty directories can be removed and because it
wasn't created in all cases ... such as when pulling to an existing repository.
Test output is changed in a case where one revision was pulled, but because of
the off-by-one error it thought that 0 revisions were pulled ... and because of
another bug it thus (tried to) fetch largefiles for all revisions.
After this change it no longer reports failure when it failed while trying to
fetch largefiles it shouldn't fetch. Largefiles that it shouldn't fetch but
managed to fetch anyway will now correctly be missing later on.
This change thus resolves some of the unexplained test output introduced in
8664d9900884.
After discussion, we've agreed that largefiles for newly pulled heads should
not be cached by default. The use case for this is using largefiles repos
with multiple remote servers (and therefore multiple remote largefiles caches),
where users will be pulling from non-default locations on a regular basis. We
think this use case will be significantly less common than the use case where
all largefiles are stored on the same central server, so the default should be
no caching.
The old behavior can be obtained by passing the --cache-largefiles flag to
pull.
largefiles tried to create a peer directly with the specified url. That caused
abort: unsupported URL component: "..."
if a revision was specified in the url.
The branch name does not matter for largefiles' use of remote peers.
Largefiles will be shared among all branches anyway.
When changesets referencing largefiles are pushed then the corresponding
largefiles will be pushed too - unless the target already has them. The client
will use statlfile to make sure it only sends largefiles that the target
doesn't have. The server would however on every statlfile check that the
content of the largefile had the expected hash. What should be cheap thus
became an expensive operation that trashed the disk and the cache.
Largefile hashes are already checked by putlfile before being stored on the
server. A server should thus be able to keep its largefile store free of
errors - even more than it can keep revlogs free of errors. Verification should
happen when running 'hg verify' locally on the server. Rehashing every
largefile on every remote stat is too expensive.
Clients will also stat lfiles before downloading them. When the server verified
the hash in stat it meant that it had to read the file twice to serve it.
With this change the server will assume its own hashes are ok without checking
them on every statlfile.
Some consequences of this change:
- in case of server side corruption the problem will be detected by the
existing check on the client side - not on server side
- clients that could upload an uncorrupted largefile when pushing will no
longer magically heal the server (and break hardlinks) - a client will now
only upload its uncorrupted files after the corrupted file has been removed
on the server side
- client side verify will no longer report corruption in files it doesn't have
(Issue3123 discussed related problems - and how they have been fixed.)
6fb54510b150 introduced batching of statlfile, but not all codepaths got
converted.
_getfile gave _stat garbage and got garbage back. The garbage didn't match the
expected error codes and was thus interpreted as success. It could thus end up
trying to fetch a largefile that didn't exist.
Instead we now pass _stat valid input and handle both correct and invalid
output correctly.
This makes the code work as intended ... but it would probably be better if it
didn't abort on missing largefiles, just like it happened to do before.
basestore.get uses util.atomictempfile when checking and receiving a new
largefile ... but the close/discard logic was too clever for largefiles.
Largefiles relied on being able to discard the file and thus prevent it from
being written to the store. That was however too brittle. lfutil.copyandhash
closes the infile after writing to it ... with a 'blecch' comment. The discard
was thus a silent noop, and as a result of that corruption would be detected
... and then the corrupted files would be used anyway.
Instead we now use a tmp file and rename or unlink it after validating it.
A better solution should be implemented ... but not now.
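A hedged sketch of the flow described above; helper and path names are
illustrative, not the real basestore code:

    import hashlib, os

    def storeifvalid(storepath, expecthash, instream):
        tmp = storepath + '.tmp'
        hasher = hashlib.sha1()
        with open(tmp, 'wb') as out:
            for chunk in iter(lambda: instream.read(131072), b''):
                hasher.update(chunk)
                out.write(chunk)
        if hasher.hexdigest() == expecthash:
            os.rename(tmp, storepath)       # validated: move into the store
            return True
        os.unlink(tmp)                      # corrupt: never reaches the store
        return False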
6fb54510b150 introduced batching of statlfile, but not all codepaths got
converted.
'hg verify' with a remotestore could thus crash with
TypeError: 'builtin_function_or_method' object is not iterable
Also, the 'hash' variable was used without assigning to it. Don't use variable
names that collide with Python built-in functions. Instead we use 'expecthash'
as in localstore.
The tests for this issue cover an untested area. The tests also happen to
reveal incorrect attempts at getting non-existing largefiles, bad server-side
handling of that, and corruption issues - all to be fixed later.
Override updaterepo() instead of individual methods that may not be called for
each subrepo. Add test.
Based on patch from Matt Harbison.
Changes the order of update-related messages (now largefiles comes before the
global status).
In some cases, caching largefiles may take a long time (if the user has
pulled a lot of new heads). This patch makes it more clear what is happening,
by showing the number of heads we are caching largefiles for.
The slightly obscure --lfa and --lfc only worked as modifiers to --large and
could be combined. The documentation was however not clear what they did.
Instead they now imply --large and the description is updated.
Show messages at a point where the actions have been sorted, thus preparing for
backout of 14f4258e3526.
This makes manifestmerge more of a silent operation, just like 'copies' is.
Indent 'preserving' messages to make them subordinate to the action logging so
they fit in the new context. (The 'preserving' messages are quite redundant and
could also be removed completely.)
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
A situation with this case could happen after interrupting an update. Update
would fail with:
abort: No such file or directory: $TESTTMP/f/.hglf/sub2/large6
Update from a merge without using clean is not possible anyway, so this patch
takes a step in the right direction so it gets as far as reporting the right
error.
Largefiles update would try to copy f to f.orig if there was a .hglf/f.orig .
That is in many many ways very very wrong, but it also caused an abort if f
didn't exist.
Now it only tries to copy f if it exists.
Problem: 'hg status' with largefiles enabled would walk through all the files
that .hgignore said should be ignored. That made it slow if a lot of files were
.hgignored or the cache was cold.
It seems like there was a reason for this, but other improvements have
rendered it unnecessary.
Solution: .hgignore is now only ignored when that is requested (--ignore).
This is a minimal 'stable' change. There is room for other improvement.
Largefiles revert does for some reason have two lfdirstates and two
lfdirstatestatus invocations in one function. The result of the first
lfdirstate check was however not written back to the lfdirstate, and some
files were thus checked twice.
Problem: 'hg status' kept checking largefiles with an unknown state until some
other command wrote the updated dirstate.
Solution: Add missing lfdirstate.write().
If we pass a directory to commit whose only committable files
are largefiles, the core commit code aborts before finding
the largefiles.
So we do the following:
For directories that only have largefiles as matches,
we explicitly add the largefiles to the matchlist and remove
the directory.
In other cases, we leave the match list unmodified.
Especially the "no default or default-push path set in hgrc" message was often
very misleading and didn't give any hint where it actually was looking.
A long error message is better than several multi-line messages.
Before this patch, largefiles extension uses "largefiles: No remote
repo" message not only for "outgoing" as status report, but also for
"summary" as summarized information.
This sharing prevents message translators from inserting white space between
"largefiles:" and "No remote repo" in the translated message to align the
column position of the summarized information.
This patch changes output of largefiles for summary to distinguish
from one for outgoing.
This patch puts "no remote repo" into parentheses, because this is not
summarized information.
Previously, if one or more largefiles for a repo being converted were not in the
usercache, the convert would abort with a reference to the largefile being
missing (as opposed to the previous patch, where the standin was referenced as
missing). This is because commitctx() tries to copy all largefiles to the
local store, first from the user cache, and if the file isn't found there, from
the working directory. No files will exist in the working directory during a
convert, however. It is not sufficient to force the source repo to be local
before proceeding, because clone and pull do not download largefiles by default.
This is slightly less than ideal because while the conversion will now complete,
it won't be possible to update to revs with missing largefiles unless the user
intervenes manually, because there is no default path pointing back to the
source repo. Ideally these files would be cached during the conversion.
This check could have been done in reposetup.commitctx() instead, but this
ensures the local store directory is created, which is necessary to enable the
standin matcher.
The rm -> 'rm -f' change in the test is to temporarily suppress an error
clearing the cache - as noted, the cache is not repopulated during convert.
When that is fixed, this can be changed back and the verification errors will
disappear too.
When the rev isn't specified, the standin for the working copy gets read. But
convert doesn't update the working copy for each cset it processes, so there is
no standin and the 'hg convert' would abort complaining about the standin being
missing.
Note that if the largefile is not in the user cache, 'hg convert' complains
about the largefile itself missing from the destination repo.
For some yet-unknown reason, largefile status does not work on
repoproxy. As status is not affected by filtering, we run it unfiltered.
Na'Tosha Bard's view on this issue:
"but, well, largefiles status is kind of an unholy piece of code"
This only applies to downloading largefiles, and only when no source for the
pull is explicitly provided. The repository itself was properly being pulled
via 'default' previously.
Using --all-largefiles is not necessary on a bare pull to test this (this
existing test is merely a convenience), but it is required to test pulling on
the rebase path.
Note that the errors generated in the --rebase case are because the repo
specified doesn't have the largefiles in its cache (though they are in the user
cache), so the errors are misleading. Specifying --all-largefiles when cloning
to 'b' fixes this, but instead of errors, it reports caching only 5 largefiles
instead of the 9 that come up missing. Likely this is because the largefile
download procedure tries to download missing files for each rev, and some of the
files have standins in more than one rev that gets pulled.
Before this patch, when there are no files to upload, "hg outgoing --large"
and "hg summary --large" show "no remote repo", even though a valid remote
repository is specified.
This is because "getoutgoinglfiles()" returns None not only if no valid
remote repository is specified, but also if there are no files to upload.
This patch makes "getoutgoinglfiles()" return an empty list when there are no
files to upload, and makes largefiles show a "no files to upload" message in
that case.
This patch doesn't test the "if toupload is None" route in
"overrideoutgoing()", because this route is not executed unless the remote
repository becomes inaccessible just before the largefiles-specific
processing: successful execution of "orig()" means that at least one of
"default", "default-push" or dest is valid, and that "getoutgoinglfiles()"
never returns None in such cases.
At "hg summary --large" invocation, this patch shows message below:
largefiles: (no files to upload)
This follows the message shown by "hg summary" with MQ:
mq: (empty queue)
The standin matcher only works if the .hglf directory exists (and it won't exist
if 'clone -U' is used, unless --all-largefiles is also specified). Since not
even 'update -r null' will get rid of the standin directory, this ensures that
the standin directory always exists if the repo has the 'largefiles'
requirement. This requirement is only set after a largefile is committed, so
these directories will not be created for repos that have the extension enabled
but have not committed a largefile.
With the standin directory in place, 'lfconvert --to-normal' will now be able to
download the required largefiles when converting a repo that was created with
'clone -U', and whose files are not in the usercache.
The downloadlfiles command could probably be put inside the 'largefiles'
requirement conditional too, but given that the user specified --all-largefiles,
there is likely an expectation to print out the number of largefiles downloaded,
even if it is 0.
The largefile may be missing for various reasons, including that a remote
repository was cloned without the --all-largefiles option. Therefore, it seems
reasonable to attempt to download the missing files and failing that, abort and
indicate the affected file and revision so the user can manually fix the
problem.
Most methods added to (or overridden on) repo by largefiles work on `repo`
instead of `self`. This currently works without trouble because `self` and
`repo` are likely the same. However this is semantically dubious and may
cause issues for filtering: `self` may be a proxy object different from
`repo`.
This changeset fixes that and uses `self` when applicable.
It looks like this method missed the updates in 7a899bd0f9c0 (which changed the
preferred discovery method from findcommonincoming() to findcommonoutgoing()),
and 8b2938386599 (which rolls up the outgoing lists into a single object).
Maintain a whitelist of commands to infer the repo for instead. The whitelist
contains those commands that take file(s) in the working dir as arguments.
This allows the wrapped command's validation code to run (which is currently
only to ensure 'noupdate' and 'updaterev' aren't both specified), the
copy/pasted unpacking of hg.clone() args to be removed, and any future changes
to the base command (however unlikely) to be inherited by largefiles.
Unfortunately, the command override can't be swapped entirely for an hg.clone()
override because the extra --all-largefiles arg needs to be injected. It also
isn't enough to call the wrapped clone command and leave the caching code after
it, because the file caching code needs access to the destination repo, which is
only available from hg.clone(). An alternative would be to use the dest path in
the clone command override to re-obtain a reference to the repo.
A slight deviation from the regular hg.clone() function is that the repo is NOT
deleted if the caching fails, but that was also the previous behavior. Maybe it
should for consistency?
A status message is output if hg.clone() determines the default destination
because None was provided. The previous code never passed None to hg.clone().
Previously, tip would be checked out regardless of the -u or -U parameter. I'm
not sure what the 'required for successful walkchangerevs' comment meant, but it
appears to reference code which has since moved to downloadlfiles() in
5c06bddf85b8. Perhaps it was to force caching when the -U parameter is given?
The price of this change is that -U --all-largefiles won't cache anything. That
will be fixed next.
Note that X + Y in the 'X largefiles updated, n removed' and 'Y additional
largefiles cached' lines do not add up to the same values in these tests, but
all of the largefiles have been downloaded. The reason being that several
largefiles have the same content (eb7338044 is pointed to by sub/large2, large3
and sub/large4). In the 'clone -u 1' operation, this largefile is cached to
populate the working directory, even without --all-largefiles. That means the
file isn't downloaded again and cached in the rev where large3 and sub/large4
both point to this file. Downloading that one file in that one rev seems to be
counted twice with 'clone -u 0'. (Maybe it is also being downloaded twice?)
Fix:
* unnecessary use of map and lambda
* comment instead of readable code
* comparing None with 0
* and...or anti-pattern for 'ternary if' with very clever handling of i=None
Previously, even if a file was added with --large, 'hg addremove' or 'hg ci -A'
would add all files (including the previously added large files) as normal
files. Only after a commit where a file was added with --large would subsequent
adds or 'ci -A' take into account the minsize or the pattern configuration.
This change more closely follows the help for largefiles, which mentions that
'add --large' is required to enable the configuration, but doesn't mention the
previously required commit.
Also, if 'hg add --large' was performed and then 'hg forget <file>' (both before
a largefile enabling commit), the forget command would error out saying
'.hglf/<file> not tracked'. This is also fixed.
This reports that a repo is largefiles enabled as soon as a file is added with
--large, which enables 'add', 'addremove' and 'ci -A' to honor the config
settings before the first commit. Note that prior to the next commit, if all
largefiles are forgotten, the repository goes back to reporting the repo as not
largefiles enabled.
It makes no sense to handle this by adding a --large option to 'addremove',
because then it would also be needed for 'commit', but only when '-A' is
specified. While this gets around the awkwardness of having to add a largefile,
then commit it, and then addremove the other files when importing an existing
codebase (and preserving that extra commit in permanent history), it does still
require finding and manually adding one of the files as --large. Therefore it
is probably desirable to have a --large option for init as well.
Previous to this, 'commit -A' would add files that were already committed as
largefiles again as normal files, resulting in files being listed twice by
'status -A'.
It also missed when (only) a largefile was deleted, even though status reported
it as '!'. This also has the side effect of properly reporting the state of the
affected largefiles in the post commit hook after a remove that also affected a
normal file (the largefiles used to be 'R', now are properly absent).
Since scmutil.addremove() is called both by the ui command (after some trivial
argument validation) and during the commit process when -A is specified, it
seems like a more appropriate method to wrap than the addremove command.
Currently, a repo is only enabled to use largefiles after an add that explicitly
identifies some file as large, and a subsequent commit. Therefore, this patch
only changes behavior after such a largefile enabling commit.
Note that in the test, if the final commit had a '-v', 'removing large8' would
be printed twice. Both of these originate in removelargefiles(). The first
print is in verbose mode after traversing remove + forget, the second is because
the '_isaddremove' attr is set and 'after' is not.
Changeset 9467184ce7e7 broke 'outgoing --large'
...
File "hgext\largefiles\lfutil.py", line 56, in findoutgoing
remote.local(), force=force)
File "mercurial\discovery.py", line 31, in findcommonincoming
if not remote.capable('getbundle'):
AttributeError: 'lfilesrepo' object has no attribute 'capable'
This restores the previous functionality, though I'm not sure if there's a
better way to do this- that changeset introduces a hunk in debugdiscovery that
does this:
    if not util.safehasattr(remote, 'branches'):
        # enable in-client legacy support
        remote = localrepo.locallegacypeer(remote.local())
Is there a legacy support issue here too?
Previously, a copy or a move of a largefile only worked if the cwd was the root
of the repository. The first issue was that the destination path passed to
os.mkdirs() chopped the absolute path to the standin after '.hglf/', which
essentially created a path relative to the repository root. Similarly, the
second issue was that the source and dest paths for copyfile() were relative to
the repo root. This converts these three paths to absolute paths.
Some notable issues, regardless of the directory in which the cp/mv is executed:
1) The copy is not being recorded in lfdirstate, but it is in dirstate for the
standins. I'm not sure if this is by design (i.e. minimal info in lfdirstate).
2) status -C doesn't behave as expected. Using the testcase as an example:
# after mv + ci
$ hg status -C -v --rev '.^' # expected to see 'A' and ' ' lines too
R dira\dirb\largefile
$ hg status -C -v --rev '.^' foo/largefile
# no output # expected to see 'A' and ' ' lines only
$ hg status -C -v --rev '.^' foo/
# no output # expected to see 'A', ' ' and 'R' lines
$ hg status -C -v --rev '.^' ./ # expected to see 'A' and ' ' lines too
R dirb\largefile
$ hg status -C -v --rev '.^' ../.hglf/dira/foo/largefile
A ..\.hglf\dira\foo\largefile
..\.hglf\dira\dirb\largefile # no 'R' expected when new file is specified
$ hg status -C -v --rev '.^' ../.hglf # OK
A ..\.hglf\dira\foo\largefile
..\.hglf\dira\dirb\largefile
R ..\.hglf\dira\dirb\largefile
An easy way to force this (and cause a traceback) prior to the fix for 3507 was
$ touch large
$ hg add --large large
$ hg ci -m "add"
$ hg remove large
$ touch large
$ hg addremove --config largefiles.patterns=**large
This patch also detected (and corrected) a previous test where a standin got
added as a largefile (without a traceback).
The problem only occurred if a file was removed with 'hg rm' (as opposed to the
OS utilities), and then addremove was run before a commit. Both normal and
large files were affected.
Ensuring that the file exists prior to an lstat() for size seems like the Right
Thing. But oddly enough, the missing file that was causing lstat() to blow up
was a standin when a largefile was removed, which seems fishy, because a standin
should never be added as a largefile. I was then able to get a standin added as
a largefile (whose name is 'large') with
hg addremove --config largefiles.patterns=**large
which also causes a backtrace. That will be fixed next.
The example in comment #9 of the bug writeup must be run exactly- it was the
commit after the rm and prior to the addremove that screwed things up, because
that commit noticed that the largefile was missing, called drop(), and then the
original commit function did nothing (due to the file in the '!' state). The
addremove command properly put it into the 'R' state, but it remained stuck in
that state (because commit insisted 'nothing changed'). Without the commit
prior to addremove, the problem didn't occur.
Maybe this is an indication that lfdirstate needs to take a few more hints from
the regular dirstate, regardless of what _it_ thinks the state is- similar
inconsistency is probably still possible with this patch if the original commit
succeeds but the lfdirstate write fails.
If a file was missing, the missing list contained a path relative to the repo.
When building the matcher from that list, the file name ended up concatenated to
cwd, causing the command to abort with '<file> not under root'. This rebuilds
the missing list with paths relative to cwd.
This change separates peer implementations from the repository implementation.
localpeer currently is a simple pass-through to localrepository, except for
legacy calls, which have already been removed from localpeer. This ensures that
the local client code only uses the most modern peer API when talking to local
repos.
Peers have a .local() method which returns either None or the underlying
localrepository (or descendant thereof). Repos have a .peer() method to return
a freshly constructed localpeer. The latter is used by hg.peer(), and also to
allow folks to pass either a peer or a repo to some generic helper methods.
We might want to get rid of .peer() eventually.
The only user of locallegacypeer is debugdiscovery, which uses it to pose as a
pre-setdiscovery client. But we decided to leave the old API defined in
locallegacypeer for clarity and maybe for other uses in the future.
It might be nice to actually define the peer API directly in peer.py as stub
methods. One problem there is, however, that localpeer implements
lock/addchangegroup, whereas the true remote peers implement unbundle.
It might be desirable to get rid of this distinction eventually.
This introduces a peer method into all repository classes, which currently
simply returns self. It also changes hg.repository so it now raises an
exception if the supplied paths does not resolve to a localrepo or descendant.
Finally, all call sites are changed to use the peer and local methods as
appropriate, where peer is used whenever the code is dealing with a remote
repository (even if it's on local disk).
This speeds up status on a largefiles repo by synchronizing the largefiles
dirstate to the largefile's mtime upon update, preventing the files from
coming back as "unsure" later, which would require a check of the SHA1 sum.
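A hedged sketch of the synchronization step; the real update code does more
bookkeeping:

    def marklfilesclean(ui, repo, updatedlfiles):
        from hgext.largefiles import lfutil
        lfdirstate = lfutil.openlfdirstate(ui, repo)
        for f in updatedlfiles:
            lfdirstate.normal(f)            # records size/mtime, so no re-hash
        lfdirstate.write()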
This implements a part of issue 3386. It batches the request for the status of
all largefiles in the revisions that are about to be pushed into a single
request, instead of doing N separate requests.
In a real world test case, this change was verified to save 1,116 round-trips to
the server. It only requires a client-side change; it is backwards-compatible
with an older version of the server.
Add a match object to subrepo.archive(). This will allow the -X and -I
options to be honored inside subrepos when archiving. They formerly
only affected the top level repo.
largefiles and lfconvert do dirty hacks with dirstate, so to avoid writing that
as a side effect of the wlock release we clear dirstate first.
To avoid confusing lock validation algorithms in error situations we unlock
_before_ removing the target directory.
Previously, when updating, cachelfiles was called blindly on all largefiles
in the repository at the revision being updated to, despite the fact that
a list of which largefiles need to be updated has already been collected. This
optimization constrains the cachelfiles call to only the largefiles that need
to be updated.
On a repository with about 80 largefiles, updating between two revisions that
only change one largefile goes from approximately 6.7 seconds to 3.3 seconds.
The largefiles status implementation attempts to rewrite the input match
objects to match the "standins" as well as the regular files. When fixing the
directories listed in match.files(), if there was a related standin entry, it
was kept and the original path discarded. But directories can appear both as
regular and standin entries.
Let R be a repo served by an hg daemon on a machine with an empty largefiles
cache. Pushing a largefiles repo to R will result in a no-such-file-or-directory
OSError because putlfile will attempt to create a temporary file in
R/.hg/largefiles, which does not yet exist.
This patch also adds a regression test for this scenario.
This is essentially a copy of largefile's override of archive() in the
archival class, adapted for overriding hgsubrepo's archive(). That
means decoding isn't taken into consideration, nor is .hg_archival.txt
generated (the same goes for regular subrepos). Unlike subrepos, but
consistent with largefile's handling of the top repo, ui.progress() is
*not* called. This should probably be refactored at some point, but
at least this generates the archives properly for now. Previously,
the standins were ignored and the largefiles were archived only for
the top level repo.
Long term, it would probably be most desirable to figure out how to
tweak archival's archive() if necessary such that largefiles doesn't
need to override it completely just to special case the translating of
standins to the real files. Largefiles will already return a context
with the true largefiles instead of the standins if lfilesrepo's
lfstatus is True- perhaps this can be leveraged?
The fix introduced in 3509b9cf8f86 was only partially successful. It is
correct to turn dirstate 'm' merge records into normal/dirty ones, but copy
records are lost in the process. To adjust them as well, we need to look in
the first parent manifest to know which files were added and preserve only
the related records. But the dirstate does not have access to changesets, so
the logic has to be moved to another level, in localrepo.
Summary and commit use dirty() to check the status of a subrepository,
so this overrides dirty() in the subrepo in the same manner as
status() to check the large files instead of their standins.
Previously, if only a large file was changed in a subrepo, summary in
the top level repo would not report the subrepo was dirty and commit
-S would report nothing changed. If any type of file was changed in
the top repo and only a large file in the subrepo, commit -S would not
commit the changes to the subrepo.
Wrapping the status command will only invoke overridestatus() and set
the lfstatus field for the top level repository. Wrapping the status
function is required to set the field on child repositories.
Previously, status -S would report large files in a subrepo as '?'
regardless of their actual states, and was inconsistent with what
status would report from within that subrepo.
This is a fix to largefiles so that 'hg cat' will work correctly when a
largefile is specified.
As per discussion on Issue 3352:
1) The file will be printed regardless if it is binary or large.
2) The file is downloaded if it is not readily available (not found in
the system cache), so that it can be printed. If the download fails,
then we abort.
The original implementation queries whether the specified pattern is related
to largefiles in the target context via 'dirstate.__contains__()', but this
can't recognize a 'directory pattern' correctly, so this patch uses
'dirstate.dirs()' instead.
This patch uses dirstate instead of lfdirstate in the 'working' route (second
patch hunk for 'hgext/largefiles/reposetup.py'), because the 'dirs()'
information may already be built for dirstate but not yet for lfdirstate at
this point. This prevents lfdirstate from being built up just to obtain its
'dirs()' information.
The original implementation queries the target context about whether the
specified pattern is related to largefiles, but changectx/workingctx returns
False about the relationship with files marked as removed.
So, 'hg status' with a 'file pattern' for a removed file shows an unexpected
warning message via the process below:
1. 'tostandin()' returns a non-STANDIN filename for the removed file,
because changectx/workingctx returns False about the relationship with it
2. 'match.files()' contains the non-STANDIN filename, which is already
removed from the working directory
3. 'dirstate.walk()' invoked via 'localrepository.status()' treats the
non-STANDIN filename as a bad filename, because there is no entry for it
in dirstate: only the STANDIN is managed in dirstate
4. 'dirstate.walk()' invokes 'match.bad()', which is defined in
'localrepository.status()' as 'bad()'
5. 'bad()' shows a warning message for the non-STANDIN, because it is not
related to the source context: only the STANDIN is related to it
This patch queries dirstate instead of changectx/workingctx, because dirstate
returns the expected result for removed files.
'match.files()' is used by 'localrepository.status()' only in the 'working'
case, so this patched code also works correctly in the non-'working' case.
Before, tempfile was used to create a temp file with 600 permissions and the
uploaded data was written into it. This file was then *copied* to
.hg/largefiles/<hash>.
We now simply use atomictempfile to write the data to a temp file with
the right permissions and then rename that into place.
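A hedged sketch of the new flow; the path and helper name are illustrative:

    from mercurial import util

    def storelfile(repo, lfhash, data):
        path = repo.join('largefiles/' + lfhash)
        fp = util.atomictempfile(path, createmode=repo.store.createmode)
        fp.write(data)
        fp.close()                          # renames the temp file into place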
This replaces another use of tempfile with atomictempfile. The problem
with tempfile is that it creates files with 600 permissions instead of
respecting repo.store.createmode.
Before, the mode was copied from the file in the working copy. This is
inconsistent with how Mercurial normally creates files inside the .hg
folder. This can lead to a situation where you can read the files
under .hg/store but not under .hg/largefiles and so a clone will fail
if it needs to access a largefile.
The current 'lfiles_repo.status()' implementation examines whether the
specified patterns are related to largefiles in the working directory (not to
STANDINs) by the non-emptiness of the list below:
[f for f in match.files() if f in lfdirstate]
But it cannot be assumed that every entry in 'match.files()' is exactly a
file, because the user may specify only part of a path to match everything
under a subdirectory recursively.
The above check mis-recognizes such a pattern as 'not related to largefiles'
and executes the normal 'status()' procedure, so 'hg status' unexpectedly
shows '?' (unknown) status for largefiles in the working directory.
This patch examines the relation of the pattern to largefiles by applying
'match()' to each entry in lfdirstate and checking whether any entry matches.
This may increase the cost of the examination, because it requires a full
scan of the entries in lfdirstate, so this patch uses a normal for loop
instead of a list comprehension to reduce the cost when a match is found.
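A hedged sketch of the early-exit check (lfdirstate stands in for the real
object):

    def anylfilematches(match, lfdirstate):
        for f in lfdirstate:
            if match(f):
                return True                 # stop scanning on the first match
        return False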
This fixes a serious performance regression in largefiles introduced in
Mercurial 2.1. Caching new largefiles on pull is necessary, because
otherwise largefiles will be missing (and unable to be downloaded) when
the user tries to merge or rebase a new head with an old one. But this
is an expensive operation and should only be done for heads that are new
from the pull, rather than on all heads in the repository.
Some problematic encodings use backslash as part of multi-byte characters.
util.pconvert() can treat strings in such encodings correctly if win32mbcs is
enabled, but str.replace() cannot.
Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any
largefiles that did not have a hash that matched their standin were
updated.
This patch optimizes 'hg update' by keeping track of what standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unnecessary
calculation of a list of sha1 hashes for big files.
With this patch, the time 'hg update' takes to complete is a function of
how many largefiles need to be updated and what their size is.
Performance tests on a repository with about 80 largefiles ranging from
a few MB to about 97 MB are shown below. The tests show how long it takes
to run 'hg update' with no changes actually being updated.
Mercurial 2.1 release:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
0 largefiles updated, 0 removed
real 0m10.045s
user 0m9.367s
sys 0m0.674s
With this patch:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.965s
user 0m0.845s
sys 0m0.115s
The same repository, without the largefiles extension enabled:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.799s
user 0m0.684s
sys 0m0.111s
So before the patch, 'hg update' with no changes was approximately 9.25s
slower with largefiles enabled. With this patch, it is approximately 0.165s
slower.
This removes use of unknown files for building the synthetic working
directory manifest used by manifestmerge. Instead, we adopt the
strategy used by _checkunknown.
Side-effect: unknown files are no longer moved by remote directory
renames, and now are left alone like ignored files.
Previously, we would do a full working directory walk including
unknown files to perform a merge. In many cases, this was painful
because unknown files greatly outnumbered tracked files and generally
had no useful effect on the merge.
Here we instead wait until we find a file in the destination that's
not tracked locally and detect if it exists and is not ignored. This
is usually cheaper but can be -more- expensive in the case where we're
adding a huge number of files. On the other hand, the cost of statting
the new files should be dwarfed by the cost of eventually writing
them.
In this version, case collisions are detected implicitly by
os.path.exists and wctx[f] lookup.
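A simplified sketch of that check (the helper name and the dirstate
ignore-matcher access are assumptions; the real logic lives in
Mercurial's merge code):

import os

# a destination file only counts as an unknown-file collision if
# something exists on disk at that path, it is not ignored, and it is
# not already tracked in the working context
def isunknowncollision(repo, wctx, f):
    return (os.path.lexists(repo.wjoin(f))
            and not repo.dirstate._ignore(f)
            and f not in wctx)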
assumptions:
- 'lfutil.splitstandin(f)' strips 'f' of '.hglf/', if it can
- 'lfile(f)' examines 'lfutil.standin(f) in manifest'
- 'lfutil.standin(f)' adds '.hglf/' to 'f'
and
- 'f' should start with '.hglf/' there, because it is already
examined by 'lfutil.isstandin(f)'
then:
lfile(lfutil.splitstandin(f))
=> lfile(<f without '.hglf/'>)
=> lfutil.standin(<f without '.hglf/'>) in manifest
=> <<f without '.hglf/'> with '.hglf/'> in manifest
=> f in manifest
When the user pulls from a remote repository that is not his default repo, it
is quite likely that he will pull a new head. This means that if he tries to
merge or rebase with the other head, he will run into a problem because
largefiles has no way of tracking where the remote repository for this other
head is, so it cannot download the largefiles from this other remote repository.
It will attempt to download them from its default remote repository, which will
not yet contain the largefiles.
This patch solves this problem by caching any new largefiles for all heads
directly into the system cache at the time of the pull, so they are available
later.
This behavior is actually more in line with Mercurial's distributed nature,
because pulling already implies we have a connection to the remote server, but
merging or rebasing does not.
This patch makes "hg remove" work the same way on largefiles as it does on
regular Mercurial files. If you try to remove an added largefile, the removal
fails and you are instead prompted to use "hg forget" to undo the add.
The largefiles extension prevents users from adding a normal file
named 'foo' if there is already a largefile with the same name.
However, there was a loophole: when merging, it was possible to bring
in a normal file named 'foo' while also having a '.hglf/foo' file.
This patch fixes this by extending the manifest merge to deal with
these kinds of conflicts. If there is a normal file 'foo' in the
working copy, and the other parent brings in a '.hglf/foo' file, then
the user will be prompted to keep the normal file or the largefile.
Likewise for the symmetric case where a normal file is brought in via
the second parent. The prompt looks like this:
$ hg merge
foo has been turned into a largefile
use (l)argefile or keep as (n)ormal file?
After the merge, either the '.hglf/foo' file or the 'foo' file will
have been deleted. This would cause status to return output like:
$ hg status
M foo
R foo
To fix this, the lfiles_repo.status method is changed so that a
removed normal file isn't shown if there is a largefile with the same
name, and vice versa for largefiles.
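In outline (illustrative names only; the real code is in
lfiles_repo.status), the filtering amounts to:

from hgext.largefiles import lfutil

def hideshadowedremovals(wctx, removed, lfremoved):
    # 'removed' holds normal-file names and 'lfremoved' largefile names;
    # hide an entry when a file of the other kind with the same name is
    # still tracked in the working context
    removed = [f for f in removed if lfutil.standin(f) not in wctx]
    lfremoved = [f for f in lfremoved if f not in wctx]
    return removed, lfremoved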
If a largefile is introduced on the branch that is merged into the
working copy, then 'hg status' would abort with an error like:
$ hg status
abort: .hglf/foo@33fdd332ec: not found in manifest!
The problem was that the largefiles status code only looked in the
first parent for the largefile. Largefiles are now always reported as
modified if they don't exist in the first parent -- this matches the
behavior of localrepo.status for normal files.
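A minimal sketch of the new rule (function and variable names are
illustrative):

from hgext.largefiles import lfutil

def reportmerged(ctx1, lfile, modified):
    # a largefile whose standin is absent from the first parent can only
    # have appeared via the merge, so report it as modified instead of
    # looking it up in a manifest that does not contain it
    if lfutil.standin(lfile) not in ctx1:
        modified.append(lfile)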
There is a bug in the merge process where, if a new largefile is introduced
in a merge and the user does not have that largefile in his repo's local store
nor in his system cache, the working copy will retain the old largefile. Upon
the commit of the merge, the standin is re-written to contain the hash of the
old largefile, and the lfdirstate retains a "Modified" status for the file.
The end result is that the largefile can show up in the merge commit as
"Modified", but the standin has no diff. This is wrong in two ways:
1) Such a "wedged" history with a nonsense change in a commit should not be
possible
2) It effectively reverts a largefile to an old version when doing a merge
This is caused by the fact that the updatelfiles() command always checks the
current largefile's hash against the hash stored in the current node's standin.
This is correct behavior in every case except for a merge. When merging, we
must assume that the standin in the working copy contains the correct hash,
because the original hg.merge() has already updated it for us.
This patch fixes the issue by patching the repo object to carry a "_ismerging"
attribute, that the updatelfiles() command checks for. When this attribute is
found, it checks against the working copy's standin, rather than the standin
in the current node.
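A hedged sketch of the check inside updatelfiles() (the attribute name
follows the prose; the function name and other details are assumptions):

from hgext.largefiles import lfutil

def expectedhash(repo, lfile):
    standin = lfutil.standin(lfile)
    if getattr(repo, '_ismerging', False):
        # hg.merge() already updated the working-copy standin; trust it
        return repo[None][standin].data().strip()
    # outside a merge, compare against the standin in the current node
    return repo['.'][standin].data().strip()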
Don't lock/write on operations that should be readonly (status).
Always lock when writing the lfdirstate (rollback).
Don't write lfdirstate until after committing; state isn't actually changed
until the commit is complete.
When rebasing, we need to trust that the standins are always correct. The
rebase operation updates the standins according to the changeset it is
rebasing. We need to make the largefiles in the working copy match. If we
don't make them match, then they get accidentally reverted, either during
the rebase or during the next commit after the rebase.
This worked previously only because we were relying on the behavior that
largefiles with a changed standin, but unchanged contents, never showed up in
the list of modified largefiles. Unfortunately, pre-commit hooks can get
an incorrect status this way, and it also results in extra execution of code.
The solution is to simply trust the standins when we are about to commit a
rebased changeset, and politely ask updatelfiles() to pull the new contents
down. In this case, updatelfiles() will also mark any files it has pulled
down as dirty in the lfdirstate so that pre-commit hooks will get correct
status output.
Implementing addremove correctly in largefiles is tricky, because the original
addremove function does not call into any of the add or remove functions we've
already overridden in the extension. So the trick is to implement addremove
without duplicating any code.
This patch implements addremove by pulling out the interesting parts of
override_add() and override_remove() into generic utility functions, and
using those to handle the largefiles in addremove. Then a matcher is
installed that will ignore all largefiles, and the original addremove
function is called to take care of the regular files in addremove.
A small bit of monkey patching is used to make sure that remove_largefiles()
notifies the user when a file is removed by addremove and also makes sure
the removal of largefiles doesn't interfere with the original addremove's
operation of removing the standin.
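In outline (helper names are illustrative; the actual patch wires this
through the command override), the approach looks like:

def overrideaddremove(orig, ui, repo, *pats, **opts):
    # handle largefiles with the shared add/remove helpers first
    addlargefiles(ui, repo, *pats, **opts)
    removelargefiles(ui, repo, *pats, **opts)
    # then hide largefiles from the matcher and let the original
    # addremove take care of the remaining normal files and standins
    installnormalfilesmatchfn(repo[None].manifest())
    try:
        return orig(ui, repo, *pats, **opts)
    finally:
        restorematchfn()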
This comment is invalid. The hg.update() function will abort in the case of
any genuine error, so there is nothing to check. If we have gotten to this
point in execution, nothing critical has gone wrong, and if any standins
have been updated, we must pull new largefiles.
Before, it was possible to create a
.hg/largefiles/hash
file with truncated content, i.e., content where
SHA-1(content) != hash
This breaks the fundamental invariant in largefiles that the file
content for files in .hg/largefiles hash to the filename.
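Stated as a check (a sketch of the invariant only, not the actual
verification code in the patch):

import hashlib
import os

def checkinvariant(storedir, hash):
    # every file stored under .hg/largefiles must hash to its own name
    with open(os.path.join(storedir, hash), 'rb') as fp:
        return hashlib.sha1(fp.read()).hexdigest() == hash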
The current lfconvert implementation uses a combination of "ui.config()"
and "str.split(' ')" to read the largefiles.patterns configuration, but
this cannot handle multi-line configuration in hgrc files correctly.
lfconvert should use "ui.configlist()" instead, as override_add already
does.
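Roughly, the change swaps the commented line for the configlist call
(a sketch; only the section and option names follow the prose):

def readpatterns(ui):
    # old: splitting on a single space breaks multi-line or oddly
    # spaced values
    #   return ui.config('largefiles', 'patterns', default='').split(' ')
    # new: configlist parses whitespace and multi-line values correctly
    return ui.configlist('largefiles', 'patterns', default=[])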
Operating on a non-existent file can cause both IOError and OSError,
depending on the function used: open raises IOError, os.lstat raises
OSError.
The largefiles code called dirstate.normal, which in turn calls
os.lstat, so OSError is the right exception to catch here.
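Concretely, the catch changes along these lines (a minimal illustration;
the surrounding code in the patch differs):

def marknormal(lfdirstate, lfile):
    try:
        lfdirstate.normal(lfile)   # calls os.lstat on the file
    except OSError:                # not IOError: os.lstat raises OSError
        pass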
"hg status" may treat cache missed largefiles as "removed" incorrectly.
assumptions for problem case:
- there is no cache for largefile "L"
- at first, update working directory to the revision in which "L" is
not yet added,
- then, update working directory to the revision in which "L" is
already added
and now, "hg status" treats "L" as "removed".
current implementation does not allocate entry for cache missed
largefile in ".hg/largefiles/dirstate", but files without
".hg/largefiles/dirstate" entry are treated as "removed" by largefiles
extension.
"hg revert" can not recover from this situation, but "rm -rf
.hg/largefiles", because it causes dirstate rebuilding.
this patch invokes normallookup() for cache missed largefiles to
allocate entry in ".hg/largefiles/dirstate", so "hg status" can treat
it as "missing" correctly.
When (1) findfile links a largefile from the user cache to the store
and (2) the store directory doesn't exist yet, findfile errors out. A
simple call to util.makedirs fixes it.
This is consistent with the rest of Mercurial's code, mirroring the
try-finally-unlink structure elsewhere. Furthermore, it fixes the case where
largefiles throws an IOError on Windows when the temporary file is opened a
second time by copytocacheabsolute.
This patch creates the temporary file in the repo's largefiles store rather than
/tmp, which might be a different filesystem.
When largefiles is enabled, commands that don't require largefiles
could be slowed down substantially on large repositories. Stop
performing this check for every command.
The code was using the size of a symlink's target, thus wrongly making symlinks
to large files into largefiles themselves. This can be demonstrated by
deleting the symlink and then doing an 'hg up' or 'hg up -C' to restore the
symlink.
The original intent was that the largefiles would primarily be in the
repository, with the global cache being only that--a cache. The naming
conventions and actual intent have both strayed. In this first patch, the
naming conventions are switched to match the actual intent, as are the
configuration options.
overrides.py contains several functions that temporarily override
scmutil.match(), which always takes a changectx object as the first
parameter. But these overrides name that parameter either 'repo' or
'ctxorrepo', which is misleading. So rename them to 'ctx' and remove
the special type-sensitive handling of the one called 'ctxorrepo'.