While the URL parser is very forgiving about what characters are
allowed in each component, it's useful to be strict about the scheme
so we don't accidentally interpret local paths with colons as URLs.
This restricts schemes to containing alphanumeric characters, dashes,
pluses, and dots (as specified in RFC 2396).
This replaces util.drop_scheme() with url.localpath(), using url.url for
parsing instead of doing it on its own. The function is moved from
util to url to avoid an import cycle.
hg.localpath() is removed in favor of using url.localpath(). This
provides more consistent behavior between "hg clone" and other
commands.
To preserve backwards compatibility, URLs like bundle://../foo still
refer to ../foo, not /foo.
If a URL contains a scheme, percent-encoded entities are decoded. When
there's no scheme, all characters are left untouched.
Comparison of old and new behaviors:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file://foo/foo /foo foo/foo /foo
file://localhost:80/foo /foo localhost:80/foo /foo
file://localhost:/foo /foo localhost:/foo /foo
file://localhost/foo /foo /foo /foo
file:///foo /foo /foo /foo
file://foo (empty string) foo /
file:/foo /foo /foo /foo
file:foo foo foo foo
file:foo%23bar foo%23bar foo%23bar foo#bar
foo%23bar foo%23bar foo%23bar foo%23bar
/foo /foo /foo /foo
Windows-related paths on Windows:
URL drop_scheme() hg.localpath() url.localpath()
=== ============= ============== ===============
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
Windows-related paths on other platforms:
file:///C:/foo C:/C:/foo /C:/foo C:/foo
file:///D:/foo C:/D:/foo /D:/foo D:/foo
file://C:/foo C:/foo C:/foo C:/foo
file://D:/foo C:/foo D:/foo D:/foo
file:////foo/bar //foo/bar //foo/bar //foo/bar
//foo/bar //foo/bar //foo/bar //foo/bar
\\foo\bar //foo/bar //foo/bar \\foo\bar
For more information about file:// URL handling, see:
http://www-archive.mozilla.org/quality/networking/testing/filetests.html
Related issues:
- issue1153: File URIs aren't handled correctly in windows
This patch should preserve the fix implemented in
5c92d05b064e. However, it goes a step further and "promotes"
Windows-style drive letters from being interpreted as host names to
being part of the path.
- issue2154: Cannot escape '#' in Mercurial URLs (#1172 in THG)
The fragment is still interpreted as a revision or a branch, even in
paths to bundles. However, when file: is used, percent-encoded
entities are decoded, so file:test%23bundle.hg can refer to
test#bundle.hg ond isk.
Like the previous patch to getauthinfo(), this also makes
username/password parsing more forgiving for SSH URLs.
This also opens up the possibility of allowing non-numeric ports,
since the URL parser has no problem handling them.
Related issues:
- issue851: @ in password in http url
- issue2055: nonnumeric port bug with https protocol
This works around a potential issue in Python 2.4 where cloning a repo
with a URL like http://foo:8080 would cause urllib2 to query on
http://foo:8080?cmd=capabilities instead of
http://foo:8080/?cmd=capabilities.
In the past, this issue has been masked by the fact that
url.getauthinfo() added a trailing slash when it was missing.
This adds a url object that re-implements urlsplit() and
unsplit(). The implementation splits out usernames, passwords, and
ports.
The implementation is based on the behavior specified by RFC
2396[1]. However, it is much more forgiving than the RFC's
specification; it places no specific restrictions on what characters
are allowed in each segment of the URL other than what is necessary to
split the URL into its constituent parts.
[1]: http://www.ietf.org/rfc/rfc2396.txt
Internally, the group name is only used in debug statements, but readauthforuri
can be also used externally to determine which group will be matched for a given
URL.
None of the auth section subkeys include a period, so it makes zero
sense to not split from the end. By using rsplit() users can use
the hostname as group keys.
If --insecure specified, it behaves in the same way as no web.cacerts
configured.
Also shows hint for --insecure option when _verifycert() failed. But currently
the hint isn't displayed on SSLError, because it needs a certain level of
changes.
Known fingerprints of HTTPS servers can now be configured in the
hostfingerprints section. That makes it possible to verify the identify of web
servers without configuring and trusting the CA chain.
Limitations:
* Portnumbers are ignored, just like with ordinary certificates.
* Host name matching is case sensitive.
SSLSocket.getpeercert() returns tuple containing unicode for 'subject'.
Since Mercurial does't support IDN at all, it just returns error for non-ascii
certname.
Mercurial will verify HTTPS server certificates if web.cacerts is configured,
but it will by default silently not verify any certificates.
We now warn the user that when the certificate isn't verified she won't get the
security she might expect from https:
warning: localhost certificate not verified (check web.cacerts config setting)
Self-signed certificates can be accepted silently by configuring web.cacerts to
point to a suitable certificate file.
This allows us to provide deterministic progress information during
transfer of bundle data over HTTP. This is required because we
currently buffer the bundle data to local disk prior to transfer since
wsgiref lacks chunked transfer-coding support.
Pythons SSL module verifies that certificates received for HTTPS are valid
according to the specified cacerts, but it doesn't verify that the certificate
is for the host we connect to.
We now explicitly verify that the commonName in the received certificate
matches the requested hostname and is valid for the time being.
This is a minimal patch where we try to fail to the safe side, but we do still
rely on Python's SSL functionality and do not try to implement the standards
fully and correctly. CRLs and subjectAltName are not handled and proxies
haven't been considered.
This change might break connections to some sites if cacerts is specified and
the certificates (by our definition) isn't correct. The workaround is to
disable cacerts which in most cases isn't much worse than it was before with
cacerts.
When using hg commands over an http repository in a long running process, a
httphandler instance is leaked for each command, because of a loop
handler.parent -> OpenerDirector and OpenerDirector.handlers -> handler which
is not handled by Python's gc. Discussion on #mercurial concluded that removing
the __del__ method solved the problem.
Mads Kiilerich pointed out that 1e4ade283b02 was too eager since the
prefix and password keys may contain $-signs. So this only add the
username to the list of keys that are expanded.
This also updates the documentation to match.
Since py3k doesn't have a "file" builtin and, consequently, doesn't support
inheriting from it, this patch refactors the httpsendfile class to wrap the
objects returned by the builtin "open" function while adding the necessary
methods (__len__ for constructing the Content-Length header and read, write,
close and seek for the file-like interface).
On Python 2.6.6 (and patched 2.6.5 on certain Linux distros),
the change that caused issue2255 was also applied to non-digest
authentication; this change extends the 3405cb92c120 fix
accordingly.
Mads Kiilerich pointed out that 1e4ade283b02 was too eager since the
prefix and password keys may contain $-signs. So this only add the
username to the list of keys that are expanded.
This also updates the documentation to match.
This fixes a regression introduced in d5b57915925b when Mercurial reuses the
auth handler for several requests and the redirect counter never is reset.
Python 2.6.5 will keep resetting the retry count on redirects, for example when
the server returns 401 on failing auth (like google code currently does). We
stop the endless recursion by not resetting the count.
http://bugs.python.org/issue3819 introduced the regression with Python 2.6.5.
http://bugs.python.org/issue8797 discusses a fix which might make it to 2.6.6
and 2.7.0.
Recent Pythons (e.g. 2.6.5 and 3.1) introduce a change that causes
urlparse.urlunparse(urlparse.urlparse('x://')) to return 'x:' instead of 'x://'i and
urlparse.urlunparse(urlparse.urlparse('x:///y')) to return 'x:/y' instead of 'x:///y'.
Fix url.hidepassword() and url.removeauth() to handle these cases.
E1101:514:httpshandler._makeconnection: Instance of 'tuple' has no 'pop' member
E1101:516:httpshandler._makeconnection: Instance of 'tuple' has no 'pop' member
Python 2.6.3 introduced HTTPS proxy tunnelling in a way that interferes with
the way HTTPS proxying is handled in Mercurial. This fix generalises it to work
on Python 2.4 to 2.6.
Versions of Python before 2.6 cannot automatically convert a given
port number to an integer, so we add a conversion to coerce the given
input to an int.
This extends the httpshandler with the means to utilise the auth
section to provide it with a PEM encoded certificate key file and
certificate chain file. This works also with sites that both require
client certificate authentication and basic or digest password
authentication, although the latter situation may require the user to
enter the PEM password multiple times.
urllib2 and httplib does not support using CONNECT proxy requests, but
only regular requests over the proxy. This does not work with HTTPS
requests as they typically require that the client issues a CONNECT to
the proxy to give a direct connection to the remote HTTPS server.
This is solved by duplicating some of the httplib functionality and
tying it together with the keepalive library such that a HTTPS
connection that need to be proxied can be proxied by letting a
connection be established to the proxy server and then subsequently
performing the normal request to the specified server through the
proxy server.
As it stands, the code also purports to support HTTPS proxies, i.e.
proxies that you connect to using SSL. These are extremely rare and
nothing is done to ensure that CONNECT requests can be made to these
as that would require multiple SSL handshakes. This use case is also
not supported by most other contemporary web tools like curl and
Firefox3.