The ui class uses util.hasscheme() in a couple of places, causing hg
to import urllib even when it doesn't need to. This copies
urllib.unquote() to avoid that import.
perfstartup time before the URL refactoring (707e4b1e8064):
! wall 0.050692 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
before this change:
! wall 0.064742 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
after this change:
! wall 0.052126 comb 0.000000 user 0.000000 sys 0.000000 (best of 100
The introduction of the new URL parsing code has created a startup
time regression. This is mainly due to the use of url.hasscheme() in
the ui class. It ends up importing many libraries that the url module
requires.
This fix helps marginally, but if we can get rid of the urllib import
in the URL parser all together, startup time will go back to normal.
perfstartup time before the URL refactoring (707e4b1e8064):
! wall 0.050692 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
current startup time (9ad1dce9e7f4):
! wall 0.070685 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
after this change:
! wall 0.064667 comb 0.000000 user 0.000000 sys 0.000000 (best of 100)
The problem is that a programmer using atomictempfile directly can
make an innocent everyday mistake -- not enough args to the
constructor -- which escalates badly. You would expect a simple
TypeError crash in that case, but you actually get an infinite
recursion that is surprisingly difficult to kill: it happens between
__del__() and __getattr__(), and Python does not handle infinite
recursion from __del__() well.
The fix is to not implement __getattr__(), but instead assign instance
attributes for the methods we wish to delegate to the builtin file
type: write() and fileno(). I've audited mercurial.* and hgext.* and
found no users of atomictempfile using methods other than write() and
rename(). I audited third-party extensions and found one (snap)
passing an atomictempfile to util.fstat(), so I also threw in
fileno().
The last time I submitted a similar patch, Matt proposed that we make
atomictempfile a subclass of file instead of wrapping it. Rejected on
grounds of unnecessary complexity: for one thing, it would make the
Windows implementation of posixfile quite a bit more complex. It would
have to become a subclass of file rather than a simple function -- but
since it's written in C, this is non-obvious and non-trivial.
Furthermore, there's nothing wrong with wrapping objects and
delegating methods: it's a well-established pattern that works just
fine in many cases. Subclassing is not the answer to all of life's
problems.
path_auditor is used for checking patterns too, but a pattern is not a valid
filename.
This patch fixes fb1792e89e34, which introduced the bug:
$ hg log -l3 glob:**.py
abort: filename contains '*', which is reserved on Windows: mercurial\**.py
Example (on Windows):
$ hg parents
$ hg manifest tip
con.xml
$ hg update
abort: filename contains 'con', which is reserved on Windows: con.xml
Before this patch, update produced (as explained in issue2755):
$ hg update
abort: No usable temporary filename found
I've added the new function checkwinfilename to util.py and not to windows.py,
so that we can later call it when running on posix platforms too, for when we
decide to implement a (configurable) warning message on 'hg add'.
As per this patch, checkwinfilename is currently only used when running
on Windwows.
path_auditor calls checkosfilename, which is a NOP on posix and an alias for
checkwinfilename on Windows.
Python added support for Windows 6.0 (Vista) symbolic links in 3.2 [1], but
even these symbolic links aren't what we can expect from a canonical
symbolic link, since creation requires SeCreateSymbolicLinkPrivilege,
which typically only admins have.
So we can safely assume that we don't have symbolic links on Windows.
[1] http://docs.python.org/py3k/library/os.html#os.symlink
The use of "{datetime}" was unfortunate since I as a user never knew
if I was expected to do
hg log -d '>{2011-04-01}'
or
hg log -d '>2011-04-01'
The word "datetime" is also confusing -- calling it a date it much
simpler.
This replaces util.drop_scheme() with url.localpath(), using url.url for
parsing instead of doing it on its own. The function is moved from
util to url to avoid an import cycle.
hg.localpath() is removed in favor of using url.localpath(). This
provides more consistent behavior between "hg clone" and other
commands.
To preserve backwards compatibility, URLs like bundle://../foo still
refer to ../foo, not /foo.
If a URL contains a scheme, percent-encoded entities are decoded. When
there's no scheme, all characters are left untouched.
Comparison of old and new behaviors:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file://foo/foo /foo foo/foo /foo
file://localhost:80/foo /foo localhost:80/foo /foo
file://localhost:/foo /foo localhost:/foo /foo
file://localhost/foo /foo /foo /foo
file:///foo /foo /foo /foo
file://foo (empty string) foo /
file:/foo /foo /foo /foo
file:foo foo foo foo
file:foo%23bar foo%23bar foo%23bar foo#bar
foo%23bar foo%23bar foo%23bar foo%23bar
/foo /foo /foo /foo
Windows-related paths on Windows:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
Windows-related paths on other platforms:
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
For more information about file:// URL handling, see:
http://www-archive.mozilla.org/quality/networking/testing/filetests.html
Related issues:
- issue1153: File URIs aren't handled correctly in windows
This patch should preserve the fix implemented in
5c92d05b064e. However, it goes a step further and "promotes"
Windows-style drive letters from being interpreted as host names to
being part of the path.
- issue2154: Cannot escape '#' in Mercurial URLs (#1172 in THG)
The fragment is still interpreted as a revision or a branch, even in
paths to bundles. However, when file: is used, percent-encoded
entities are decoded, so file:test%23bundle.hg can refer to
test#bundle.hg ond isk.
The previous test assumed that 'os.name' was "mac" on Mac OS X. This
is not the case; 'mac' was classic Mac OS, whereas Mac OS X has 'os.name'
be 'posix'.
Please note that this change will break Mercurial on hypothetical
non-Mac OS X deployments of Darwin.
Credit to Brodie Rao for thinking of CGSessionCopyCurrentDictionary()
and Kevin Bullock for testing.
Add missing calls to close() to many places where files are
opened. Relying on reference counting to catch them soon-ish is not
portable and fails in environments with a proper GC, such as PyPy.
The pywin32 package is no longer needed.
ctypes is now required for running Mercurial on Windows.
ctypes is included in Python since version 2.5. For Python 2.4, ctypes is
available as an extra installer package for Windows.
Moved spawndetached() from windows.py to win32.py and fixed it, using
ctypes as well. spawndetached was defunct with Python 2.6.6 because Python
removed their undocumented subprocess.CreateProcess. This fixes
'hg serve -d' on Windows.
If pywin32 is not installed, 'os.lstat(pathname).st_nlink' is used for
nlinks(), which is always zero for all files on Windows.
To make sure we break up hardlinks if pywin32 is missing, we force
nlink = 2 if nlinks() returns < 1.
(this completely fixes issue1922)
It tries to convert localstr to unicode before truncating.
Because we cannot assume that the given text is encoded in local encoding,
it falls back to raw string in case of unicode error.
In a date like 10:30, there are two underspecified ends: the specific
end (seconds) and the broad end (day, month, year). When matching
"10:30", we need to allow the specific end to go from 0 to 59 seconds,
while the broad end is assumed to be today's date.
Similar handling applies for a date range like "Mar 1": year is fixed
to today, any time matches.
Preventing file loss repository corruption (e.g. vanished changelog.i) when
Mercurial pushes to repositories on Windows shares served by Samba.
This is a workaround for Samba bug 7863, which is present in current latest
stable Samba 3.5.6 and various prior versions down to 3.0.26a (the oldest one
I tested).
Of course this should be fixed in Samba, but there probably aren't that many
other applications who use hardlinks that extensively and keep files open like
Mercurial, so the pressure to fix this on Samba is probably not that high. And
even if the Samba project should be able to fix their bug within a month or
two, it will take quite some time until users upgrade their Samba installs.
split can be more readable for longer lists like the list in
dirstate.invalidate. As dirstate.invalidate is used in wlock() and therefoe
used heavily, I think it's worth avoiding a split there too.
If Linux is asked to open a filename with a trailing directory separator,
e.g. "foo/", the open fails with EISDIR. On AIX, the open succeeds, opening
file "foo". This causes test-mq-qnew to fail on AIX.
Fix by adding 'ends with directory separator' to the conditions checked
by the path auditor. Change test to expect auditor fail message.
Python's time.gmtime(lt) fails on Windows, producing a traceback with
ValueError: (22, 'Invalid argument')
if lt < -43200.
We get a local time boundary value of -43200 if we take "the epoch"
Thu Jan 01 00:00:00 1970 = time.gmtime(0)
from timezone 'UTC+0' into timezone 'UTC-12'. All other timezones will have
larger local time values for that point in time.
Aborting with a traceback on 'hg log' for revisions with a timestamp value
< -43200 is clearly not acceptable.
Returning "invalid timestamp" or similar as string representation is not an
option either, since that may crash other tools which parse the output of
'hg log'.
Instead, we teach util.datestr() to return the epoch in timezone UTC+0 on
*all platforms*, represented by the string
Thu Jan 01 00:00:00 1970 +0000
if the timestamp's unix time value is negative.
(based on a patch originally proposed by Benjamin Pollack)
If os_link fails on Windows, errno is always errno.EINVAL,
so we can't really say if the testlink could not be created
because (a) the FS doesn't support hardlinks or (b) there
is a leaked .hgtmp file lying around from a previous crashed
run.
So let's err on the safe side, keep the code simple and assume
we can't detect hardlinks in both cases.
The Linux CIFS kernel driver (even in 2.6.36) suffers from a hardlink
count blindness bug (lstat() returning 1 in st_nlink when it is expected
to return >1), which causes repository corruption if Mercurial running
on Linux pushes or commits to a hardlinked repository stored on a Windows
share, if that share is mounted using the CIFS driver.
This patch works around issue1866 and improves the workaround done in
65e082ae3076 to fix issue761, by teaching the opener to lazily execute a
runtime check (new function checknlink) to see if the hardlink count
reported by nlinks() can be trusted.
Since nlinks() is also known to return varying count values (1 or >1)
depending on whether the file is open or not and depending on what client
and server software combination is being used for accessing and serving
the Windows share, we deliberately open the file before calling nlinks() in
order to have a stable precondition. Trying to depend on the precondition
"file closed" would be fragile, as the file could have been opened very
easily somewhere else in the program.
- Don't call atomictempfile or nlinks() if the path is malformed
(no basename). Let posixfile() raise IOError directly.
- atomictempfile already breaks up hardlinks, no need to poke
at the file with nlinks() if atomictemp.
- No need to copy the file contents to break hardlinks for 'w'rite
modes (w, wb, w+, w+b). Unlinking and recreating the file is faster.
226847bf9cab updated copyfile to also copy over atimes and
mtimes. That behavior is specifically to trick editors into thinking
files that hg record has modified haven't changed. We don't really
care about preserving times in the general case.
The canonpath function will default to creating its own path auditor,
but in some cases it will be useful to use a specialized auditor,
e.g., one that wont abort if a path lies within a subrepository.
This adds util.getport(port) which tries to parse port as an int, and
failing that, looks it up using socket.getservbyname(). Thus, the
following will work:
[smtp]
port = submission
[web]
port = http
This does not apply to ports in URLs used in clone, pull, etc.
The following patch allows the use of python2.4 with a standalone
hashlib rather than assuming that python2.5 is in use when hashlib is
imported successfully.
util.interpolate can be used to replace multiple items in a string all at once
(and optionally apply a function to the replacement), without worrying about
recursing:
>>> import util
>>> s = '$foo, $spam'
>>> util.interpolate(r'\$', { 'foo': 'bar', 'spam': 'eggs' }, s)
'bar, eggs'
>>> util.interpolate(r'\$', { 'foo': 'spam', 'spam': 'foo' }, s)
'spam, foo'
>>> util.interpolate(r'\$', { 'foo': 'spam', 'spam': 'foo' }, s, lambda s: s.upper())
'SPAM, FOO'
The patch also changes filemerge.py to use this new function.
The following patch allows the use of python2.4 with a standalone
hashlib rather than assuming that python2.5 is in use when hashlib is
imported successfully.
This significantly refactors the read() loop to use a queue of chunks.
The queue is alternately filled to at least 256k and then emptied by
concatenating onto the output buffer.
For very large read sizes, += uses less memory because it can resize
the target string in place.
There's no buffer type in py3k, util.py has a function, called
fakebuffer, that implements a similar API. This patch implements
fakebuffer as a memoryview wrapper in py3k.
2to3 is unable to translate '__builtin__' calls to 'builtins' when
hasattr is used (as in hasattr(__builtin__, buffer)). Translating the
check to the format
try:
whatever
except NameError
# define whatever
__builtin__.whatever = whatever
is a correct way of checking for the name and has the benefit of being
translated by 2to3. This patch implements the same idea for the
aforementioned example.
Mercurial has problem around text wrapping/filling in MBCS encoding
environment, because standard 'textwrap' module of Python can not
treat it correctly. It splits byte sequence for one character into two
lines.
According to unicode specification, "east asian width" classifies
characters into:
W(ide), N(arrow), F(ull-width), H(alf-width), A(mbiguous)
W/N/F/H can be always recognized as 2/1/2/1 bytes in byte sequence,
but 'A' can not. Size of 'A' depends on language in which it is used.
Unicode specification says:
If the context(= language) cannot be established reliably they
should be treated as narrow characters by default
but many of class 'A' characters are full-width, at least, in Japanese
environment.
So, this patch treats class 'A' characters as full-width always for
safety wrapping.
This patch focuses only on MBCS safe-ness, not on writing/printing
rule strict wrapping for each languages
MBCS sensitive textwrap class is originally implemented
by ITO Nobuaki <daydream.trippers@gmail.com>.
If the os_link() call on the first file in the directory fails [1],
we switch mode to using shutil.copy() for all remaining files.
[1] happens for example on Windows for every file when cloning from a UNC
path without specifying --pull.
sys.stdout.write('-'*80 + '\n')
or
sys.stdout.write('-'*80 + '\r')
do not work on Windows as they do on unix. On a 80 columns Windows console, the
extra CR or LF are interpreted as if belonging to the next line, so the first
command displays 2 lines (only one on unix) and the second one leave the line
visible and move back to the following line. To avoid this, we sacrifice one
column under Windows.
The file-based synchronization introduced by 670de588e29e hangs when the child
process fails before terminating the handshake, which the previous pipe-based
version handled correctly. To fix this, the parent polling loop was fixed to
detect premature terminations of the child process.
Hiding the child process window is not strictly necessary but it avoids opening
an empty shell window when running hg serve as well as a task in the task bar.
The window is hidden after the process is already started causing a single
flicker.
On Windows, Mercurial can be run from the python script of from a frozen
executable. In the first case, we have to call the python interpreter since the
script is not executable. Frozen executable can be called directly.
Fix 3/3 for issue421
The new code aims to implement the RFC correctly for file URIs.
Previously they were handled incorrectly in several ways, which
could cause problem on Windows in particular.
This is necessary when the executable name is not 'hg'. For example,
if your system-wide mercurial is name 'hgs', sys.argv[0] is more
accurate than 'hg'.
subprocess allows the environment and working directory to be specified
directly, so the hacks for making temporary changes while forking is no longer
necessary.
This also fixes failures on solaris where the temporary changes can't be undone
because there is no unsetenv.
The code for wrapping a single line of text with a hanging indent was
duplicated in commands and help -- it's now moved to a new function
called wrap in util.
The function defaults to a line width is 78 chars, and this un-wraps
some command line flag descriptions, hence the test output changes.
Parser only knows about en_US output. Forcing the encoding to UTF-8 might not
be the best thing to do since the caller may receive some of the subversion
output, but at least it should prevent conversion errors from svn client.
The subprocess module is not thread safe. Spawning a thread to read
the output leads to exceptions like this when Mercurial exits:
Exception exceptions.TypeError: TypeError("'NoneType' object is not
callable",) in <bound method Popen.__del__ of <subprocess.Popen
object at 0x9ed0dcc>> ignored
The bug is already reported in the Python bug tracker:
http://bugs.python.org/issue1731717
Some modules (like revlog) would import util.sha1 as _sha1. This
defeats the purpose of having util.sha1 overwrite itself with a faster
version -- revlog would end up always calling the slow version. By
always delegating to util._fastsha1 we avoid this at the cost of an
extra (but unconditional) indirection.
Use a temporary file name as target for a forced rename on Windows. The
target file name is not opened at any time; just renamed into and then
unlinked. Using a temporary instead of a static name is necessary since
otherwise a hg crash can leave the file lying around, blocking future
attempts at renaming.