dirstate.walk will return unknown files that were explicitly requested, even
if listunknown is false. There's no point in dropping these files on the
floor in dirstate.status.
This has no effect on any current callers, because all of them assume the
unknown list is empty and ignore it. Future callers may find it useful,
though.
On Windows, there are two ways symlinks can manifest themselves:
1. As placeholders: text files containing the symlink's target. This is what
usually happens with fresh clones on Windows.
2. With their dereferenced contents. This happens with clones accessed over NFS
or Samba.
In order to handle case 2, 28af3e0c54f0 made dirstate.status ignore all symlink
placeholders on Windows. It doesn't ignore symlinks in the lookup set, though,
since those don't have the link bit set. This is problematic because it
violates the invariant that `hg status` with every file in the normal set
produces the same output as `hg status` with every file in the lookup set.
With this change, symlink placeholders in the normal set are no longer ignored.
We instead rely on code in localrepo.status that uses heuristics to look for
suspect placeholders.
An upcoming patch will test this out by no longer adding files written in the
last second of an update to the lookup set.
If a directory matched a regex in hgignore but the files inside the directory
did not match the regex, they would appear as deleted in hg status. This
change fixes them to appear normally in hg status.
Removing the ignore(nf) conditional here is ok because it just means we might
stat more files than we had before. My testing on a large repo shows this
causes no performance regression since the only additional files being stat'd
are the ones that are missing (i.e. status=!), which are generally rare.
Before this patch, all files in dirstate are used to build "_foldmap"
up on case insensitive filesystem regardless of their statuses.
For example, when dirstate contains both removed file 'a' and added
file 'A', "_foldmap" may be updated finally by removed file 'a'. This
causes unexpected status information for added file 'A' at "hg status"
invocation.
This patch ignores removed files at building "_foldmap" up on case
insensitive filessytem.
This patch doesn't add any test, because this issue is difficult to
reproduce intentionally: it depends on iteration order of "dirstate._map".
Consider a hypothetical extension that implements walk in a more efficient
manner and skips some known-clean files. However, that can only be done under
some situations, such as when clean files are not being asked for and a
match.traversedir callback is not set. The full flag lets walk tell these two
cases apart.
This makes a big difference to performance in some cases.
hg --time locate 'set:symlink()'
mozilla-central (70,000 files):
before: 2.92 sec
after: 2.47
another repo (170,000 files):
before: 7.87 sec
after: 6.86
An upcoming patch will speed dirstate.walk up by not querying the results dict
when it is empty. This ensures it is in some common cases.
This should be safe because subrepos and .hg aren't part of the dirstate.
The bash_completion code uses "hg status" to generate a list of
possible completions for commands that operate on files in the
working directory. In a large working directory, this can result
in a single tab-completion being very slow (several seconds) as a
result of checking the status of every file, even when there is no
need to check status or no possible matches.
The new debugpathcomplete command gains performance in a few simple
ways:
* Allow completion to operate on just a single directory. When used
to complete the right commands, this considerably reduces the
number of completions returned, at no loss in functionality.
* Never check the status of files. For completions that really must
know if a file is modified, it is faster to use status:
hg status -nm 'glob:myprefix**'
Performance:
Here are the commands used by bash_completion to complete, run in
the root of the mozilla-central working dir (~77,000 files) and
another repo (~165,000 files):
All "normal state" files (used by e.g. remove, revert):
mozilla other
status -nmcd 'glob:**' 1.77 4.10 sec
debugpathcomplete -f -n 0.53 1.26
debugpathcomplete -n 0.17 0.41
("-f" means "complete full paths", rather than the current directory)
Tracked files matching "a":
mozilla other
status -nmcd 'glob:a**' 0.26 0.47
debugpathcomplete -f -n a 0.10 0.24
debugpathcomplete -n a 0.10 0.22
We should be able to further improve completion performance once
the critbit work lands. Right now, our performance is limited by
the need to iterate over all keys in the dirstate.
hg strip -k was using dirstate.rebuild() which reset all the dirstate
entries timestamps to 0. This meant that the next time hg status was
run every file was considered to be 'unsure', which caused it to do
expensive read operations on every filelog. On a repo with >150,000
files it took 70 seconds when everything was in memory. From a cold
cache it took several minutes.
The fix is to only reset files that have changed between the working
context and the destination context.
For reference, --keep means the working directory is left alone during
the strip. We have users wanting to use this operation to store their
work-in-progress as a commit on a branch while they go work on another
branch, then come back later and be able to uncommit that work and
continue working. They currently use 'git reset HARD^' to accomplish
this in git.
util.statfiles returns a generator on python 2.7 with c extensions disabled.
This caused util.statfiles(...) [0] to break in tests. Since we're only
stat'ing one file, I just changed it to call lstat directly.
Previously dirstate.walk would return a stat object for files in the dmap
that have a symlink to a directory in their path. Now it will return None
to indicate that they are no longer considered part of the repository. This
currently only affects walks that traverse the entire directory tree (ex:
hg status) and not walks that only list the contents of the dmap (ex: hg diff).
In a situation like this:
mkdir foo && touch foo/a && hg commit -Am "a"
mv foo bar
ln -s bar foo
'hg status' will now show '! foo/a', whereas before it incorrectly considered
'foo/a' to be unchanged.
In addition to making 'hg status' report the correct information, this will
allow callers to dirstate.walk to not have to detect symlinks themselves,
which can be very expensive.
Starting with 049792af94d6, users are no longer able to update a
working copy to a branch named with a "bad" character (such as ':').
Prior to v2.4, it was possible to create branch names using "bad"
characters, so this breaks backwards compatibility.
Mercurial must allow users to update to existing branches with bad
names. However, it should continue to prevent the creation of new
branches with bad names.
A test was added to confirm that 'hg update' works as expected. The
test uses a bundled repo that was created with an earlier version of
Mercurial.
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
With this change:
! wall 1.804222 comb 1.790000 user 1.140000 sys 0.650000 (best of 6)
hg perfstatus on the same directory, without this change:
! wall 1.016609 comb 1.020000 user 0.670000 sys 0.350000 (best of 10)
With this change:
! wall 0.985573 comb 0.980000 user 0.650000 sys 0.330000 (best of 10)
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.869404 comb 1.850000 user 1.170000 sys 0.680000 (best of 6)
With this change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
In a clean working directory containing 170,000 tracked files, this
improves performance of "hg --time diff" from 1.69 seconds to 1.43.
This idea is due to Siddharth Agarwal.
This factors out the checks from tags and bookmarks, and newly applies
the same prohibitions to branches. checknewlabel takes a new parameter,
kind, indicating the kind of label being checked.
Test coverage is added for all three types of labels.
Dates and times that are outside the 31-bit signed range are now
compared modulo 2^31. This should prevent it from behaving badly with
very large files or corrupt dates while still having a high
probability of detecting changes.
state == 'a' implies check
I fail to see what the point of this check parameter is. Near as I can see,
the only _addpath call where it was set to True was in add(), but there, state
is 'a'.
This is a follow-up to 24a646d9943a.
Factor update code common to all callers of _addpath into _addpath.
By centralizing the update code here, it provides one place to put
updates to new data structures - in a future patch. It also removes
a few lines of duplicate code.
This is about 9 times faster than the Python dirstate packing code.
The relatively small speedup is due to the poor locality and memory
access patterns caused by traversing dicts and other boxed Python
values.