I often want to measure the cost of a function call before/after
an optimization, where using top level "hg --time" timing introduces
enough other noise that I can't tell if my efforts are having an
effect.
This decorator allows a developer to measure a function's cost with
finer granularity.
With the new parallel update code, it is possible for multiple
workers to try to create a hierarchy of directories at the same
time. This is hard to trigger in general, but most likely during
initial checkout.
To deal with these races, we introduce a new ensuredirs function
whose contract is to ensure that a directory hierarchy exists - it
will ignore a failure that implies that the desired directory already
exists.
In certain cases we would like to have a cache of the last N results of a
given computation, where N is small. This will be used in an upcoming patch to
increase the size of the manifest cache from 1 to 3.
Adding support to parsedate in util module to understand the more idiomatic
dates 'today' and 'yesterday'.
Added unified tests and docstring tests for added functionality.
This makes a big difference to performance.
In a clean working directory containing 170,000 files, performance of
"hg --time diff" improves from 2.38 seconds to 1.69.
Some of the localrepo property caches must be computed unfiltered and
stored globally. Some others must see the filtered version and store data
relative to the current filtering.
This changeset introduces two classes `unfilteredpropertycache`
and `filteredpropertycache` for this purpose. A new function
`hasunfilteredcache` is introduced for unambiguous checking for cached
values on unfiltered repos.
A few tweaks are made to the property cache class to allow overriding
the way the computed value is stored on the object.
Some logic relative to _tagcaches is cleaned up in the process.
The old str-based += collector performed very nicely on Linux, but
turns out to be quadratically expensive on Windows, causing
chunkbuffer to dominate in profiles.
This list-based version has been measured to significantly improve
performance with large chunks on Windows, with negligible overall
overhead on Linux (though microbenchmarks show it to be about 50% slower).
This may increase memory overhead where += didn't behave quadratically. If we
want to gather up 1G of data to join, we temporarily have 1G in our
list and 1G in our string.
bffd8f8dfc85 claims this was needed "to avoid cyclic dependency", but there is
no cyclic dependency.
windows.py already imports encoding, posix.py can import it too, so we can
simply use encoding.upper in windows.py and in posix.py.
(this is a partial backout of bffd8f8dfc85)
There are two sets of Python re2 bindings available on the internet;
this code works with both.
Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.
Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.
no .hgignore: 1.76 sec
Python re: 2.79
re2: 1.82
The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.
(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
There have been quite a few places where we pop elements off the
front of a list. This can turn O(n) algorithms into something more
like O(n**2). Python has provided a deque type that can do this
efficiently since at least 2.4.
As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
This patch contains support for Plan 9 from Bell Labs. A README is
provided in contrib/plan9 which describes the port in greater detail.
A new extension is also provided named factotum which permits the
factotum(4) authentication agent to provide credentials for HTTP
repositories. This extension is also applicable to other POSIX
platforms which make use of Plan 9 from User Space (aka plan9ports).
Currently, the 'user' filter is using util.shortuser(text) (which clearly
doesn't extract only the user portion of an email address, even though the
help text says it does).
The new 'emailuser' filter uses the new util.emailuser(text) function which,
instead, does exactly that.
The help text on the 'user' filter has been modified accordingly.
this patch disuses length check against un-normcase()-ed filenames
gotten by "os.listdir()", because there is no assurance that
filesystem stores filenames normalized except in letter case, even
though some case insensitive filesystems (in some environment, for
some language setting) store them in such manner.
'dirstate._normalize()', the only caller of 'util.fspath()', has
already confirmed exsistance of specified file as relative to root.
so, this patch omits path-absoluteness/existance check from
'util.fspath()'.
some hg operation (e.g.: qpush) create new files after first
dirstate.walk()-ing, and it invalidates _fspathcache for fspath().
then, fspath() will fail to look up specified name in _fspathcache.
this causes case preservation breaking, because parts of already
normcase()-ed path are used as result at that time.
in this case, file creation and writing out should be done before
fspath() invocation, so the second invocation of os.listdir() has not
so much impact on runtime performance.
this patch uses encoding.lower/upper for case folding, because ones of
str can not fold case of non ascii characters correctly.
to avoid cyclic dependency and to encapsulate logic of normcase in
each platforms, this patch introduces encodinglower/encodingupper in
both posix/windows specific files.
this patch does not change implementation of normcase() in posix.py,
because we do not know the encoding of filenames on POSIX.
some "normcase()" are excluded from function wrap list in
hgext/win32mbcs.py, because they become encoding aware by this patch.
'dirstate._normalize()', the only caller of 'util.fspath()', has
already normcase()-ed path before invocation of it.
normcase()-ed root can be cached on dirstate side, too.
so, this patch changes 'util.fspath()' API specification to avoid
normcase()-ing in it.
Before:
>>> str(url('file:///c:/tmp/foo/bar'))
'file:c%3C/tmp/foo/bar'
After:
>>> str(url('file:///c:/tmp/foo/bar'))
'file:///c%3C/tmp/foo/bar'
The previous behaviour had no effect on mercurial itself (clone command for
instance) because we fortunately called .localpath() on the parsed URL.
hgsubversion was not so lucky and cloning a local subversion repository on
Windows no longer worked on the default branch (it works on stable because
2b62605189dc defeats the hasdriveletter() test in url class).
I do not know if the %3C is correct or not but svn accepts file:// URLs
containing it. Mads fixed it in 2b62605189dc, so we can always backport should
the need arise.
Python's time module sets timezone and altzone based on UTC offsets of
two dates: first and middle day of the current year. This approach
doesn't work on a year when DST rules change.
For example Russia abandoned winter time this year, so the correct UTC
offset should be +4 now, but time.timezone returns 3 hours difference
because that's what it was on 01.01.2011.
Related python issue: http://bugs.python.org/issue1647654
It was very elegant that httpsendfile implemented __len__ like a string. It was
however also dangerous because that protocol can't handle sizes bigger than 2 GB.
Mercurial tried to work around that, but it turned out to be too easy to
introduce new errors in this area.
With this change __len__ is no longer implemented at all and the code will work
the same way for short and long posts.
neither number of 'bytes' in any encoding nor 'characters' is
appropriate to calculate terminal columns for specified string.
this patch modifies MBTextWrapper for:
- overriding '_wrap_chunks()' to make it use not built-in 'len()'
but 'encoding.colwidth()' for columns of string
- fixing '_cutdown()' to make it use 'encoding.colwidth()' instead
of local, similar but incorrect implementation
this patch also modifies 'encoding.py':
- dividing 'colwith()' into 2 pieces: one for calculation columns of
specified UNICODE string, and another for rest part of original
one. the former is used from MBTextWrapper in 'util.py'.
- preventing 'colwidth()' from evaluating HGENCODINGAMBIGUOUS
configuration per each invocation: 'unicodedata.east_asian_width'
checking is kept intact for reducing startup cost.
This re-introduces the unicode conversion what was lost in e5976ee55f4b 5 years
ago and had the comment:
To avoid corrupting multi-byte characters in line, we must wrap
a Unicode string instead of a bytestring.
urllib2 password manager does not strip credentials from URIs registered with
add_password() and compare them with stripped URIs in find_password(). Remove
credentials from URIs returned by util.url.authinfo(). It sometimes works when
no port was specified as the URI host is registered too.
8264e5172141 made sure that paths that seemed to start with a windows drive
letter would not get an extra leading slash.
localpath should thus not try to handle this case by removing a leading slash,
and this special handling is thus removed.
(The localpath handling of this case was wrong anyway, because paths that look
like they start with a windows drive letter can't have a leading slash.)
A quick verification of this is to run 'hg id file:///c:/foo/bar/'.
The usual contract is that close() makes your writes permanent, so
atomictempfile's use of close() to *discard* writes (and rename() to
keep them) is rather unexpected. Thus, change it so close() makes
things permanent and add a new discard() method to throw them away.
discard() is only used internally, in __del__(), to ensure that writes
are discarded when an atomictempfile object goes out of scope.
I audited mercurial.*, hgext.*, and ~80 third-party extensions, and
found no one using the existing semantics of close() to discard
writes, so this should be safe.
Before, makedirs could call itself recursively with the same path name it was
given, relying on sane file system behavior to terminate the recursion. That
could cause infinite recursion on insane file systems.
Instead we now call mkdir explicitly after having created parent directory
recursively. Exceptions from this mkdir is not swallowed.
This re-introduces the unicode conversion what was lost in e5976ee55f4b 5 years
ago and had the comment:
To avoid corrupting multi-byte characters in line, we must wrap
a Unicode string instead of a bytestring.
reducing it to a NOP on Windows.
This eliminates a pointless stat call on Windows and reduces the risk of
interferring with other processes (e.g. AV-scanners, file change watchers).
See also http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows, item 2d
util is never imported by any other name than util, so this is mostly just a
simple search and replace from util.localpath to util.urllocalpath (assuming
other uses of util.localpath already has been renamed).
str(url) was recently changed to return only file:/. However, the
canonical way to represent absolute local paths is file:/// [1], which
is also expected by at least hgsubversion.
Relative paths are returned as file:the/relative/path.
[1] http://en.wikipedia.org/wiki/File_URI_scheme
This class contains a stat result, and possibly other file info to reliably
determine between two points in time whether a file has changed.
Uniquely identifying a file gives us that reliability because we either
atomic rename or append. So one of two will happen: the file 'id' will change,
or the size of the file will change.
posix implements it simply by calling os.stat() and checking if the result
has st_ino.
For now on Windows we always assume the path is uncacheable. This can be
improved on NTFS due to file IDs: http://msdn.microsoft.com/en-us/library/aa363788(v=vs.85).aspx
So we need to find out if a file path is on an NTFS drive, for that we have:
- GetVolumeInformation, which unfortunately only works with a root path (but is available on XP)
- GetVolumeInformationByHandleW, works on a full file path but requires Vista or higher
The ui class uses util.hasscheme() in a couple of places, causing hg
to import urllib even when it doesn't need to. This copies
urllib.unquote() to avoid that import.
perfstartup time before the URL refactoring (707e4b1e8064):
! wall 0.050692 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
before this change:
! wall 0.064742 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
after this change:
! wall 0.052126 comb 0.000000 user 0.000000 sys 0.000000 (best of 100
The introduction of the new URL parsing code has created a startup
time regression. This is mainly due to the use of url.hasscheme() in
the ui class. It ends up importing many libraries that the url module
requires.
This fix helps marginally, but if we can get rid of the urllib import
in the URL parser all together, startup time will go back to normal.
perfstartup time before the URL refactoring (707e4b1e8064):
! wall 0.050692 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
current startup time (9ad1dce9e7f4):
! wall 0.070685 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
after this change:
! wall 0.064667 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
The problem is that a programmer using atomictempfile directly can
make an innocent everyday mistake -- not enough args to the
constructor -- which escalates badly. You would expect a simple
TypeError crash in that case, but you actually get an infinite
recursion that is surprisingly difficult to kill: it happens between
__del__() and __getattr__(), and Python does not handle infinite
recursion from __del__() well.
The fix is to not implement __getattr__(), but instead assign instance
attributes for the methods we wish to delegate to the builtin file
type: write() and fileno(). I've audited mercurial.* and hgext.* and
found no users of atomictempfile using methods other than write() and
rename(). I audited third-party extensions and found one (snap)
passing an atomictempfile to util.fstat(), so I also threw in
fileno().
The last time I submitted a similar patch, Matt proposed that we make
atomictempfile a subclass of file instead of wrapping it. Rejected on
grounds of unnecessary complexity: for one thing, it would make the
Windows implementation of posixfile quite a bit more complex. It would
have to become a subclass of file rather than a simple function -- but
since it's written in C, this is non-obvious and non-trivial.
Furthermore, there's nothing wrong with wrapping objects and
delegating methods: it's a well-established pattern that works just
fine in many cases. Subclassing is not the answer to all of life's
problems.
path_auditor is used for checking patterns too, but a pattern is not a valid
filename.
This patch fixes fb1792e89e34, which introduced the bug:
$ hg log -l3 glob:**.py
abort: filename contains '*', which is reserved on Windows: mercurial\**.py
Example (on Windows):
$ hg parents
$ hg manifest tip
con.xml
$ hg update
abort: filename contains 'con', which is reserved on Windows: con.xml
Before this patch, update produced (as explained in issue2755):
$ hg update
abort: No usable temporary filename found
I've added the new function checkwinfilename to util.py and not to windows.py,
so that we can later call it when running on posix platforms too, for when we
decide to implement a (configurable) warning message on 'hg add'.
As per this patch, checkwinfilename is currently only used when running
on Windwows.
path_auditor calls checkosfilename, which is a NOP on posix and an alias for
checkwinfilename on Windows.
Python added support for Windows 6.0 (Vista) symbolic links in 3.2 [1], but
even these symbolic links aren't what we can expect from a canonical
symbolic link, since creation requires SeCreateSymbolicLinkPrivilege,
which typically only admins have.
So we can safely assume that we don't have symbolic links on Windows.
[1] http://docs.python.org/py3k/library/os.html#os.symlink
The use of "{datetime}" was unfortunate since I as a user never knew
if I was expected to do
hg log -d '>{2011-04-01}'
or
hg log -d '>2011-04-01'
The word "datetime" is also confusing -- calling it a date it much
simpler.
This replaces util.drop_scheme() with url.localpath(), using url.url for
parsing instead of doing it on its own. The function is moved from
util to url to avoid an import cycle.
hg.localpath() is removed in favor of using url.localpath(). This
provides more consistent behavior between "hg clone" and other
commands.
To preserve backwards compatibility, URLs like bundle://../foo still
refer to ../foo, not /foo.
If a URL contains a scheme, percent-encoded entities are decoded. When
there's no scheme, all characters are left untouched.
Comparison of old and new behaviors:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file://foo/foo /foo foo/foo /foo
file://localhost:80/foo /foo localhost:80/foo /foo
file://localhost:/foo /foo localhost:/foo /foo
file://localhost/foo /foo /foo /foo
file:///foo /foo /foo /foo
file://foo (empty string) foo /
file:/foo /foo /foo /foo
file:foo foo foo foo
file:foo%23bar foo%23bar foo%23bar foo#bar
foo%23bar foo%23bar foo%23bar foo%23bar
/foo /foo /foo /foo
Windows-related paths on Windows:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
Windows-related paths on other platforms:
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
For more information about file:// URL handling, see:
http://www-archive.mozilla.org/quality/networking/testing/filetests.html
Related issues:
- issue1153: File URIs aren't handled correctly in windows
This patch should preserve the fix implemented in
5c92d05b064e. However, it goes a step further and "promotes"
Windows-style drive letters from being interpreted as host names to
being part of the path.
- issue2154: Cannot escape '#' in Mercurial URLs (#1172 in THG)
The fragment is still interpreted as a revision or a branch, even in
paths to bundles. However, when file: is used, percent-encoded
entities are decoded, so file:test%23bundle.hg can refer to
test#bundle.hg ond isk.
The previous test assumed that 'os.name' was "mac" on Mac OS X. This
is not the case; 'mac' was classic Mac OS, whereas Mac OS X has 'os.name'
be 'posix'.
Please note that this change will break Mercurial on hypothetical
non-Mac OS X deployments of Darwin.
Credit to Brodie Rao for thinking of CGSessionCopyCurrentDictionary()
and Kevin Bullock for testing.
Add missing calls to close() to many places where files are
opened. Relying on reference counting to catch them soon-ish is not
portable and fails in environments with a proper GC, such as PyPy.
The pywin32 package is no longer needed.
ctypes is now required for running Mercurial on Windows.
ctypes is included in Python since version 2.5. For Python 2.4, ctypes is
available as an extra installer package for Windows.
Moved spawndetached() from windows.py to win32.py and fixed it, using
ctypes as well. spawndetached was defunct with Python 2.6.6 because Python
removed their undocumented subprocess.CreateProcess. This fixes
'hg serve -d' on Windows.
If pywin32 is not installed, 'os.lstat(pathname).st_nlink' is used for
nlinks(), which is always zero for all files on Windows.
To make sure we break up hardlinks if pywin32 is missing, we force
nlink = 2 if nlinks() returns < 1.
(this completely fixes issue1922)
It tries to convert localstr to unicode before truncating.
Because we cannot assume that the given text is encoded in local encoding,
it falls back to raw string in case of unicode error.
In a date like 10:30, there are two underspecified ends: the specific
end (seconds) and the broad end (day, month, year). When matching
"10:30", we need to allow the specific end to go from 0 to 59 seconds,
while the broad end is assumed to be today's date.
Similar handling applies for a date range like "Mar 1": year is fixed
to today, any time matches.