In a mozilla repo with tip at bb3ff09f52fe,
hg update tip~1000 && time hg revert -nq -r tip .
displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.
Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).
After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.
[1] A narrower optimization would be to make revert take the fast
path for '.' and '--all'.
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.writelines" requires relative path.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.tryreadlines" requires relative path.
This patch makes "_readstorehashcache()" return "[]" (returned by
"vfs.tryreadlines()"), when cache file doesn't exist, even though
"_readstorehashcache()" returned '' (empty string) in such case before
this patch.
"_readstorehashcache()" is invoked only by the code path below in
"_storeclean()":
for filehash in self._readstorehashcache(path):
if filehash != itercache.next():
clean = False
break
In this case, "[]" and '' don't differ from each other, because both
of them cause avoiding iteration of "for loop".
This patch allows "readlines" and "tryreadlines" to take "mode"
argument, because "subrepo" requires to read files not in "rb"
(binary, default for vfs) but in "r" (text) mode in subsequent patch.
This "vfs" object will be used by subsequent patches to handle cache
store hash files without direct file APIs.
This patch decorates "_cachestorehashvfs" with "@propertycache" to
delay vfs creation, because it is used only for cooperation with other
repositories.
In this patch, "/" is used as the path separator, even though
"self._repo.join" uses platform specific path separator (e.g. "\\" on
Windows). But it is reasonable enough, because "store" and other
management file handling already include such implementation, and they
work well.
"_calcfilehash" can be completely replaced by simple "vfs.tryread"
invocation.
def _calcfilehash(filename):
data = ''
if os.path.exists(filename):
fd = open(filename, 'rb')
data = fd.read()
fd.close()
return util.sha1(data).hexdigest()
Building absolute path "absname" up by "self._repo.join" for files in
"filelist" is avoided, because "vfs.tryread" does so internally.
Existance of specified "path" should be examined by "exists" via wvfs
of the parent repository, because the working directory of the parent
repository may be in UTF-8 mode. Wide API should be used via wvfs in
such case.
In this patch, "/" is used as the path separator, even though "path"
uses platform specific path separator (e.g. "\\" on Windows). But it
is reasonable enough, because "store" and other management file
handling already include such implementation, and they work well.
"util.makedirs" for the (sub-)repository root of "hgsubrepo" is also
executed in the constructor of "localrepository", if "create" is True
and ".hg" of it doesn't exist.
This patch avoids redundant "util.makedirs" invocation in the
constructor of "hgsubrepo".
manifestmerge() has a piece of code that's roughly:
if not force and different:
abort
else:
# if different: old untracked f may be overwritten and lost
...
The comment only talks about what happens when 'different' is true,
and in combination with the if-block above, that must mean that it is
only about what happens when 'force and different'. It seems quite
fine that files are overwritten when 'force' is true, so let's remove
the comment. As it stands, it can easily be interpreted as a TODO
(which is how I interpreted it at first).
simplejson produces slightly different output from the built-in json
module, specifically:
* It uses 0.000 instead of 0.0000
* It likes to put a trailing space after a comma
This change works around both of those variations.
A matcher is required when enabling the subrepo option on archival.archive(),
because that calls match.narrowmatcher(), which accesses fields on the object.
It's therefore probably a bad idea to default the matcher to None on archive(),
but that's a fix for default.
Previously, fnodes had a key and empty dict value for every element in
changedfiles. This is somewhat wasteful. Empty dicts in CPython consume
a lot more memory than you would expect - 280 bytes.
On mozilla-central, which has ~190,000 files/fnodes keys, the previous
loop populating fnodes allocated 91,924 KB of memory, most of that for
the empty dicts.
With this patch in place, our peak RSS during mozilla-central clone
drops:
before: 364,356 KB
after: 326,008 KB
delta: -38,348 KB
When combined with the previous patch, total peak RSS decrease is now
190,116 KB.
The contents of fnodes are only accessed once per key. It is wasteful to
cache the value since nobody will use it.
Before this patch, the caching of unused data in fnodes was effectively
causing a memory leak during the file streaming part of bundle creation.
On mozilla-central (which has ~190,000 entries in fnodes), this patch
has a significant impact on RSS at the end of generate():
before: 516,124 KB
after: 364,356 KB
delta: -151,768 KB
The origin of this code can be traced back to 1f567a607f1f and has been
with us since the 2.7 release.
lookupmf() is currently defined earlier than when it is needed. Future
patches further refactoring this code will be easier to read when
lookupmf() is in its new home.
The mail module only verifies the smtp ssl certificate if 'verifycert' is enabled
(the default). The 'verifycert' can take three possible values:
- 'strict'
- 'loose'
- any "False" value, eg: 'false' or '0'
We tested the validity of the third value, but never converted it to actual
falseness, making 'False' an equivalent for 'loose'.
This changeset fixes it.
2ec3e28dea6b changed 'sample' from a list to a set. The iteration order is thus
undefined and the yesno indices are not stable.
To solve this, repeat the listification and comment from elsewhere in the code.
Note: the randomness in the discovery protocol can make this problem hard to
reproduce.
2ec3e28dea6b made it possible that the initial head check didn't include all
heads. If that is the case, don't use the early exit just because this random
sample happened to be 'all known'.
Note: the randomness in the discovery protocol can make this problem hard to
reproduce.
This keyword remapping was introduced in 236440938a03 as part of converting
generator based iterators into list based iterators, mentioning "undesired
behavior in template" when a generator is exhausted, but doesn't say what and
introduces no tests.
The problem with the remapping was that it corrupted the output for keywords
like 'extras', 'file_copies' and 'file_copies_switch' in templates such as:
$ hg log -r 82a4f5557c6b --template "{file_copies % ' File: {file_copy}\n'}"
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
File: mercurial/changelog.py (mercurial/hg.py)
What was happening was that in the first call to runtemplate() inside runmap(),
'lm' mapped the keyword (e.g. file_copies) to the appropriate showxxx() method.
On each subsequent call to runtemplate() in that loop however, the keyword was
mapped to a list of the first item's pieces, e.g.:
'file_copy': ['mercurial/changelog.py', ' (', 'mercurial/hg.py', ')']
Therefore, the dict for the second and any subsequent items were not processed
through the corresponding showxxx() method, and the first item's data was
reused.
The 'extras' keyword regressed in 56b014c52204, and 'file_copies' regressed in
4e182fb53989 for other reasons. The common thread of things fixed by this seems
to be when a list of dicts are passed to the templatekw._hybrid class.
We are about to make bookmarks and phases available for hooks.
Therefore we need a witness for this new availability. We introduce
the new hooks in a distinct changeset to reduce the noise in the ones
with actual changes.
Such file are generated with a .pending prefix. It is up to the reader to
implement the necessary logic for reading pending files.
We add a test to ensure pending files are properly cleaned-up in both success and
error cases.
This will allow us to generate temporary pending files. Files
generated with a suffix are assumed temporary and will be cleaned up
at the end of the transaction.
When test output is processed, if os.altsep is defined (i.e. on Windows),
TTest.globmatch() will cause a warning later on if a line has a glob that isn't
necessary. Unfortunately, the regex checking in check-code.py doesn't have this
context. Therefore we ended up with cases where the test would get flagged with
a warning only on Windows because a glob was present, because check-code.py
would warn if it wasn't. For example, from test-subrepo.t:
$ hg -R issue1852a push `pwd`/issue1852c
pushing to $TESTTMP/issue1852c (glob)
The glob isn't necessary here because the slash is shown as it was provided.
However, check-code mandates one to handle the case where the default path has
backslashes in it.
Break the cycle by checking against a subset of the check-code rules before
flagging the test with a warning, and ignore the superfluous glob if it matches
a rule. This change fixes warnings in test-largefiles-update.t, test-subrepo.t,
test-tag.t, and test-rename-dir-merge.t on Windows.
I really hate that the rules are copy/pasted here (minus the leading two spaces)
because it would be nice to only update the rules once, in a single place. But
I'm not sure how else to do it. I'm open to suggestions. Splitting some of the
rules out of check-code.py seems wrong, but so does moving check-code.py out of
contrib, given that other checking scripts live there.
There are other glob patterns that could be copied over, but this is enough to
make the current tests run on Windows.
As far as I and the test suite can tell, the checks in manifestmerge()
already report the errors (whether or not --check is given), so we
don't need to call merge.checkunknown(). Since this is the last call
to the method, also remove the method.
Before this patch, a part of "test-push-hook-lock.t" fails unexpectedly on
Windows environment, because semicolon (";") isn't recognized as the command
separator by "cmd.exe". This is fixed the same way as a similar issue in
7c253c23de3b.
test-largefiles-update.t, test-subrepo.t, test-tag.t, and
test-rename-dir-merge.t still warn about no result returned because of
unnecessary globs that test-check-code-hg.t wants, relating to output for
pushing to, pulling from and moving X to Y.
This fixes test-install.t on Windows that broke in 97300cee8fc0 when
shlex.split() was added to the debuginstall command:
@@ -7,8 +7,11 @@
checking installed modules (*mercurial)... (glob)
checking templates (*mercurial?templates)... (glob)
checking commit editor...
+ Can't find editor 'c:\Python27\python.exe -c "(omitted)"' in PATH
+ (specify a commit editor in your configuration file)
checking username...
- no problems detected
+ 1 problems detected, please check your install!
+ [1]
What happens is that shlex.split() on Windows turns this:
c:\Python27\python.exe -c "import sys; sys.exit(0)"
into this:
['c:Python27python.exe', '-c', 'import sys; sys.exit(0)']
While technically a regression, most programs on Windows live in some flavor of
'Program Files', and therefore the environment variable needs to contain quotes
anyway to handle the space. This wasn't handled prior to the shlex() change,
because it tested the whole environment variable to see if it was an executable,
or split on the first space and tested again.
We were definitely being suboptimal here: we were constructing two full sets,
one with the full set of common nodes (i.e. a graph traversal) and one with all
nodes. Then we subtract one set from the other. This whole process is
O(commits) and causes discovery to be significantly slower than it should be.
Instead, keep track of common incrementally and keep undecided as small as
possible.
This makes discovery massively faster on large repos: on one such repo, 'hg
debugdiscovery' over SSH with one commit missing on the client and five on the
server went from 4.5 seconds to 1.5. (An 'hg debugdiscovery' with no commits
missing on the client, i.e. connection startup time, was 1.2 seconds.)