largefiles and lfconvert do dirty hacks with the dirstate, so to avoid
writing it out as a side effect of the wlock release we clear the dirstate
first.
To avoid confusing lock validation algorithms in error situations we unlock
_before_ removing the target directory.
Let R be a repo served by an hg daemon on a machine with an empty largefiles
cache. Pushing a largefiles repo to R will result in a no-such-file-or-directory
OSError because putlfile will attempt to create a temporary file in
R/.hg/largefiles, which does not yet exist.
This patch also adds a regression test for this scenario.
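A minimal sketch of the kind of fix this implies, assuming the temp file is
created under the repo's .hg/largefiles directory (the helper names below
are illustrative, not the extension's real API):

    import os
    import tempfile

    def ensurelfstore(repopath):
        # a freshly served repo with an empty cache has no .hg/largefiles
        # yet, and tempfile.mkstemp() raises ENOENT for a missing directory
        storedir = os.path.join(repopath, '.hg', 'largefiles')
        if not os.path.isdir(storedir):
            os.makedirs(storedir)
        return storedir

    def putlfile(repopath, data):
        fd, tmpname = tempfile.mkstemp(prefix='putlfile-',
                                       dir=ensurelfstore(repopath))
        with os.fdopen(fd, 'wb') as fp:
            fp.write(data)
        return tmpname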
The largefiles status implementation attempts to rewrite the input match
objects to match the "standins" as well as the regular files. When fixing
the directories listed in match.files(), if there was a related standin
entry, it was kept and the original path discarded. But directories can
appear both as regular and standin entries.
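A hedged sketch of the corrected rewriting: a directory pattern must keep
both forms, because the directory can hold regular files and standins at the
same time (the '.hglf/' handling and the isdir() callback are simplified
stand-ins for the extension's helpers):

    def expanddirpatterns(files, isdir):
        # isdir(path) -> True if 'path' is a tracked directory
        result = []
        for f in files:
            standin = '.hglf/' + f
            if isdir(standin):
                result.append(standin)
            # previously the original path was dropped whenever a standin
            # entry existed; keep it so regular files still match
            result.append(f)
        return result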
The fix introduced in 3509b9cf8f86 was only partially successful. It is
correct to turn dirstate 'm' merge records into normal/dirty ones, but copy
records are lost in the process. To adjust them as well, we need to look in
the first parent manifest to know which files were added and preserve only
the related records. But the dirstate does not have access to changesets, so
the logic has to be moved to another level, into localrepo.
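A rough sketch of that logic at the localrepo level, assuming access to the
working copy's first parent context (illustrative only, not the actual
changeset):

    def copiestokeep(dirstate, p1ctx):
        # after turning 'm' records into normal/dirty ones, keep a copy
        # record only when its destination is absent from the first parent
        # manifest, i.e. the file was added and the copy source still matters
        kept = {}
        for dst, src in dirstate.copies().items():
            if dst not in p1ctx:
                kept[dst] = src
        return kept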
Summary and commit use dirty() to check the status of a subrepository,
so this overrides dirty() in the subrepo in the same manner as
status() to check the large files instead of their standins.
Previously, if only a large file was changed in a subrepo, summary in
the top level repo would not report the subrepo was dirty and commit
-S would report nothing changed. If any type of file was changed in
the top repo and only a large file in the subrepo, commit -S would not
commit the changes to the subrepo.
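A minimal sketch of the override, assuming the extension's usual pattern of
toggling an lfstatus flag around the wrapped call (the wrapper name is
illustrative):

    def overridedirty(orig, subrepo, *args, **kwargs):
        # make status() look at largefiles instead of their standins while
        # dirty() runs, then restore the flag
        try:
            subrepo._repo.lfstatus = True
            return orig(subrepo, *args, **kwargs)
        finally:
            subrepo._repo.lfstatus = False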
Wrapping the status command will only invoke overridestatus() and set
the lfstatus field for the top level repository. Wrapping the status
function is required to set the field on child repositories.
Previously, status -S would report large files in a subrepo as '?'
regardless of their actual states, and was inconsistent with what
status would report from within that subrepo.
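In extension terms that means wrapping both entry points, roughly as below
(a sketch; the same wrapper is reused here for brevity):

    from mercurial import commands, extensions

    def overridestatus(orig, ui, repo, *pats, **opts):
        # set the flag the largefiles status code checks, then delegate
        try:
            repo.lfstatus = True
            return orig(ui, repo, *pats, **opts)
        finally:
            repo.lfstatus = False

    def uisetup(ui):
        # the command wrapper only covers the top-level repository; wrapping
        # the function as well lets subrepo recursion pick up the flag too
        extensions.wrapcommand(commands.table, 'status', overridestatus)
        extensions.wrapfunction(commands, 'status', overridestatus)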
This is a fix to largefiles so that 'hg cat' will work correctly when a
largefile is specified.
As per the discussion on Issue 3352:
1) The file will be printed regardless of whether it is binary or large.
2) The file is downloaded if it is not readily available (not found in the
   system cache) so that it can be printed; if the download fails, we abort
   (sketched below).
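A hedged sketch of point 2, with the cache and store abstracted into
duck-typed arguments (they stand in for the extension's real cache/store
objects):

    from mercurial import util

    def catlfile(ui, lfile, hash, usercache, store):
        # usercache: dict-like mapping hash -> largefile contents
        # store: object whose get(hash) downloads the file or returns None
        if hash not in usercache:
            data = store.get(hash)          # not readily available: download
            if data is None:
                raise util.Abort('%s: largefile could not be downloaded'
                                 % lfile)
            usercache[hash] = data
        ui.write(usercache[hash])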
The original implementation queried whether the specified pattern is related
to largefiles in the target context via 'dirstate.__contains__()', but this
cannot recognize a 'directory pattern' correctly, so this patch uses
'dirstate.dirs()' instead.
This patch uses dirstate instead of lfdirstate in the 'working' route (the
second patch hunk for 'hgext/largefiles/reposetup.py'), because the 'dirs()'
information may already be built for dirstate but not yet for lfdirstate at
this point. This avoids forcing lfdirstate to build up its 'dirs()'
information.
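Conceptually the check becomes the following (a sketch over dirstate-like
objects; the real change is in 'hgext/largefiles/reposetup.py'):

    def relatedtolargefiles(pattern, dirstate):
        # a pattern may name a directory rather than a file, so consult
        # dirs() in addition to the plain file lookup
        return pattern in dirstate or pattern in dirstate.dirs()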
The original implementation asked the target context (changectx/workingctx)
whether the specified pattern is related to largefiles, but
changectx/workingctx returns False for files marked as removed.
So 'hg status' with a 'file pattern' for a removed file shows an unexpected
warning message, via the following sequence:
1. 'tostandin()' returns the non-STANDIN filename for the removed file,
   because changectx/workingctx reports no relationship with it
2. 'match.files()' then contains a non-STANDIN filename, which has already
   been removed from the working directory
3. 'dirstate.walk()', invoked via 'localrepository.status()', treats the
   non-STANDIN filename as a bad filename, because there is no entry for it
   in the dirstate: only the STANDIN is managed in the dirstate
4. 'dirstate.walk()' invokes 'match.bad()', which is defined in
   'localrepository.status()' as 'bad()'
5. 'bad()' shows a warning message for the non-STANDIN filename, because it
   is not related to the source context: only the STANDIN is
This patch queries dirstate instead of changectx/workingctx, because
dirstate returns the expected result for removed files.
'match.files()' is used by 'localrepository.status()' only in the 'working'
case, so the patched code also works correctly in the non-'working' case.
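A sketch of the resulting check, with the dirstate lookup replacing the
context lookup (simplified; the standin prefix handling follows the
extension's usual '.hglf/' convention):

    def tostandin(f, dirstate):
        # the dirstate still tracks the standin of a removed largefile, so
        # asking it (rather than changectx/workingctx) gives the right answer
        standin = '.hglf/' + f
        if standin in dirstate:
            return standin
        return f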
Before, tempfile was used to create a temp file with 600 permissions and
the uploaded data was written into it. This file was then *copied* to
.hg/largefiles/<hash>.
We now simply use atomictempfile to write the data to a temp file with
the right permissions and then rename that into place.
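A minimal sketch of that write path, assuming the largefile ends up at
.hg/largefiles/<hash> (path handling simplified):

    import os
    from mercurial import util

    def storelfile(repo, hash, fp):
        dest = os.path.join(repo.path, 'largefiles', hash)
        # atomictempfile writes to a temp file created with the requested
        # mode and renames it into place on close(), so no copy is needed
        dst = util.atomictempfile(dest, createmode=repo.store.createmode)
        for chunk in util.filechunkiter(fp):
            dst.write(chunk)
        dst.close()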
This replaces another use of tempfile with atomictempfile. The problem
with tempfile is that it creates files with 600 permissions instead of
respecting repo.store.createmode.
Before, the mode was copied from the file in the working copy. This is
inconsistent with how Mercurial normally creates files inside the .hg
folder. This can lead to a situation where you can read the files
under .hg/store but not under .hg/largefiles and so a clone will fail
if it needs to access a largefile.
The current 'lfiles_repo.status()' implementation examines whether the
specified patterns are related to largefiles in the working directory (not
to STANDINs) by checking whether the list below is non-empty:
[f for f in match.files() if f in lfdirstate]
But it cannot be assumed that every entry in 'match.files()' is itself a
file, because the user may specify only part of a path to match everything
under a subdirectory recursively.
The examination above mis-recognizes such a pattern as 'not related to
largefiles' and executes the normal 'status()' procedure, so 'hg status'
unexpectedly shows '?' (unknown) status for largefiles in the working
directory.
This patch examines the relation of the pattern to largefiles by applying
'match()' to each entry in lfdirstate and checking whether any entry
matches.
This may increase the cost of the examination, because it can cause a full
scan of the entries in lfdirstate, so this patch uses a plain for-loop
instead of a list comprehension, stopping the scan as soon as a match is
found.
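The change amounts to replacing the list comprehension with an early-exit
loop, roughly:

    def anylfilematches(match, lfdirstate):
        # stop scanning lfdirstate as soon as one largefile matches the
        # pattern, instead of building the whole list first
        for f in lfdirstate:
            if match(f):
                return True
        return False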
This fixes a serious performance regression in largefiles introduced in
Mercurial 2.1. Caching new largefiles on pull is necessary, because
otherwise largefiles will be missing (and unable to be downloaded) when
the user tries to merge or rebase a new head with an old one. But this
is an expensive operation and should only be done for heads that are new
from the pull, rather than on all heads in the repository.
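A hedged sketch of restricting the post-pull caching to heads that are
actually new, assuming a pull wrapper around the core command and the
extension's cachelfiles helper (names illustrative):

    from hgext.largefiles import lfcommands

    def overridepull(orig, ui, repo, source=None, **opts):
        oldheads = set(repo.heads())
        result = orig(ui, repo, source, **opts)
        # only heads created by this pull need their largefiles cached;
        # pre-existing heads were handled when they first appeared
        for head in repo.heads():
            if head not in oldheads:
                lfcommands.cachelfiles(ui, repo, head)
        return result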
Some problematic encodings use a backslash as part of multi-byte characters.
util.pconvert() handles strings in such encodings correctly when win32mbcs
is enabled, but str.replace() cannot.
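The substitution is essentially:

    from mercurial import util

    def tounixpath(path):
        # util.pconvert() is safe for win32mbcs encodings in which a
        # multi-byte character may contain 0x5c; a bare
        # path.replace('\\', '/') is not
        return util.pconvert(path)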
Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any
largefiles that did not have a hash that matched their standin were
updated.
This patch optimizes 'hg update' by keeping track of which standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unnecessary
calculation of sha1 hashes for big files.
With this patch, the time 'hg update' takes to complete is a function of
how many largefiles need to be updated and what their size is.
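A hedged sketch of the bookkeeping: compare standins between the two
revisions and only hand the changed largefiles to the update step (the
'.hglf/' prefix handling mirrors the extension's helpers but is simplified
here):

    def changedlfiles(oldctx, newctx):
        # collect largefiles whose standin content (the recorded hash)
        # differs between the old and new revisions
        changed = []
        for f in newctx.manifest():
            if not f.startswith('.hglf/'):
                continue
            if f not in oldctx or oldctx[f].data() != newctx[f].data():
                changed.append(f[len('.hglf/'):])
        return changed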
Performance tests on a repository with about 80 largefiles ranging from
a few MB to about 97 MB are shown below. The tests show how long it takes
to run 'hg update' with no changes actually being updated.
Mercurial 2.1 release:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
0 largefiles updated, 0 removed
real 0m10.045s
user 0m9.367s
sys 0m0.674s
With this patch:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.965s
user 0m0.845s
sys 0m0.115s
The same repository, without the largefiles extension enabled:
$ time hg update
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 0m0.799s
user 0m0.684s
sys 0m0.111s
So before the patch, 'hg update' with no changes was approximately 9.25s
slower with largefiles enabled. With this patch, it is approximately 0.165s
slower.
This removes use of unknown files for building the synthetic working
directory manifest used by manifestmerge. Instead, we adopt the
strategy used by _checkunknown.
Side-effect: unknown files are no longer moved by remote directory
renames, and now are left alone like ignored files.
Previously, we would do a full working directory walk including
unknown files to perform a merge. In many cases, this was painful
because unknown files greatly outnumbered tracked files and generally
had no useful effect on the merge.
Here we instead wait until we find a file in the destination that's
not tracked locally and detect if it exists and is not ignored. This
is usually cheaper but can be -more- expensive in the case where we're
adding a huge number of files. On the other hand, the cost of statting
the new files should be dwarfed by the cost of eventually writing
them.
In this version, case collisions are detected implicitly by
os.path.exists and wctx[f] lookup.
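A hedged sketch of the per-file check that replaces the full walk (wjoin()
and the dirstate's _ignore matcher are used here illustratively):

    import os

    def wouldclobberunknown(repo, wctx, f):
        # only stat 'f' when the merge actually wants to create it; a file
        # that exists on disk, is untracked and is not ignored would be
        # clobbered, so report it
        return (os.path.exists(repo.wjoin(f))
                and f not in wctx
                and not repo.dirstate._ignore(f))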
assumptions:
- 'lfutil.splitstandin(f)' strips 'f' of '.hglf/', if it can
- 'lfile(f)' examines 'lfutil.standin(f) in manifest'
- 'lfutil.standin(f)' adds '.hglf/' to 'f'
and:
- 'f' starts with '.hglf/' at that point, because 'lfutil.isstandin(f)' has
  already been checked
then:
lfile(lfutil.splitstandin(f))
=> lfile(<f without '.hglf/'>)
=> lfutil.standin(<f without '.hglf/'>) in manifest
=> <<f without '.hglf/'> with '.hglf/'> in manifest
=> f in manifest
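Spelled out with minimal stand-ins for the helpers (illustrative, matching
the assumptions above):

    def standin(f):
        return '.hglf/' + f

    def splitstandin(f):
        # assumes f is a standin, i.e. starts with '.hglf/'
        return f[len('.hglf/'):]

    def lfile(manifest, f):
        return standin(f) in manifest

    # hence, for any standin f:
    #     lfile(manifest, splitstandin(f)) == (f in manifest)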
When the user pulls from a remote repository that is not his default repo, it
is quite likely that he will pull a new head. This means that if he tries to
merge or rebase with the other head, he will run into a problem because
largefiles has no way of tracking where the remote repository for this other
head is, so it cannot download the largefiles from this other remote repository.
It will attempt to download them from its default remote repository, which will
not yet contain the largefiles.
This patch solves this problem by caching any new largefiles for all heads
directly into the system cache at the time of the pull, so they are available
later.
This behavior is actually more in line with Mercurial's distributed nature,
because pulling already implies we have a connection to the remote server, but
merging or rebasing does not.