135 Commits

Author SHA1 Message Date
Brodie Rao
3bbcdd41bc url: be stricter about detecting schemes
While the URL parser is very forgiving about what characters are
allowed in each component, it's useful to be strict about the scheme
so we don't accidentally interpret local paths with colons as URLs.

This restricts schemes to containing alphanumeric characters, dashes,
pluses, and dots (as specified in RFC 2396).
2011-03-31 17:37:33 -07:00
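The scheme restriction described in 3bbcdd41bc can be illustrated with a small regular expression. This is a minimal sketch, assuming the character set named in the commit message (alphanumerics, dashes, pluses, dots); the names `_scheme_re` and `has_scheme` are hypothetical and the actual pattern in url.py may differ.

```python
import re

# Assumption: character set taken from the commit message; the real
# pattern used by Mercurial may be different.
_scheme_re = re.compile(r'^[a-zA-Z0-9+.-]+:')


def has_scheme(path):
    """Return True if path starts with something that looks like a scheme."""
    return bool(_scheme_re.match(path))


if __name__ == '__main__':
    for candidate in ('http://example.com/repo', 'my_dir:file', '/tmp/a:b'):
        print(candidate, has_scheme(candidate))
```

A path such as my_dir:file is rejected because the underscore is outside the allowed set. Single-letter prefixes such as C: would still match this pattern, so drive letters need separate handling.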
Matt Mackall
d0ca936e58 url: nuke some newly-introduced underbars in identifiers 2011-03-31 10:43:53 -05:00
Brodie Rao
3a43fa887e url: refactor util.drop_scheme() and hg.localpath() into url.localpath()
This replaces util.drop_scheme() with url.localpath(), using url.url for
parsing instead of doing it on its own. The function is moved from
util to url to avoid an import cycle.

hg.localpath() is removed in favor of using url.localpath(). This
provides more consistent behavior between "hg clone" and other
commands.

To preserve backwards compatibility, URLs like bundle://../foo still
refer to ../foo, not /foo.

If a URL contains a scheme, percent-encoded entities are decoded. When
there's no scheme, all characters are left untouched.

Comparison of old and new behaviors:

URL                      drop_scheme()   hg.localpath()    url.localpath()
===                      =============   ==============    ===============
file://foo/foo           /foo            foo/foo           /foo
file://localhost:80/foo  /foo            localhost:80/foo  /foo
file://localhost:/foo    /foo            localhost:/foo    /foo
file://localhost/foo     /foo            /foo              /foo
file:///foo              /foo            /foo              /foo
file://foo               (empty string)  foo               /
file:/foo                /foo            /foo              /foo
file:foo                 foo             foo               foo
file:foo%23bar           foo%23bar       foo%23bar         foo#bar
foo%23bar                foo%23bar       foo%23bar         foo%23bar
/foo                     /foo            /foo              /foo

Windows-related paths on Windows:

URL                      drop_scheme()   hg.localpath()    url.localpath()
===                      =============   ==============    ===============
file:///C:/foo           C:/C:/foo       /C:/foo           C:/foo
file:///D:/foo           C:/D:/foo       /D:/foo           D:/foo
file://C:/foo            C:/foo          C:/foo            C:/foo
file://D:/foo            C:/foo          D:/foo            D:/foo
file:////foo/bar         //foo/bar       //foo/bar         //foo/bar
//foo/bar                //foo/bar       //foo/bar         //foo/bar
\\foo\bar                //foo/bar       //foo/bar         \\foo\bar

Windows-related paths on other platforms:

URL                      drop_scheme()   hg.localpath()    url.localpath()
===                      =============   ==============    ===============
file:///C:/foo           C:/C:/foo       /C:/foo           C:/foo
file:///D:/foo           C:/D:/foo       /D:/foo           D:/foo
file://C:/foo            C:/foo          C:/foo            C:/foo
file://D:/foo            C:/foo          D:/foo            D:/foo
file:////foo/bar         //foo/bar       //foo/bar         //foo/bar
//foo/bar                //foo/bar       //foo/bar         //foo/bar
\\foo\bar                //foo/bar       //foo/bar         \\foo\bar

For more information about file:// URL handling, see:
http://www-archive.mozilla.org/quality/networking/testing/filetests.html

Related issues:

- issue1153: File URIs aren't handled correctly in windows

  This patch should preserve the fix implemented in
  5c92d05b064e. However, it goes a step further and "promotes"
  Windows-style drive letters from being interpreted as host names to
  being part of the path.

- issue2154: Cannot escape '#' in Mercurial URLs (#1172 in THG)

  The fragment is still interpreted as a revision or a branch, even in
  paths to bundles. However, when file: is used, percent-encoded
  entities are decoded, so file:test%23bundle.hg can refer to
  test#bundle.hg on disk.
2011-03-30 20:03:05 -07:00
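The decoding rule from 3a43fa887e (percent-encoded entities are decoded only when a scheme is present) can be sketched as follows. This is an illustration under stated assumptions, not the url.localpath() implementation: the helper name `decode_if_scheme` is hypothetical, and host names, bundle URLs, and Windows drive letters are deliberately left out.

```python
import re
from urllib.parse import unquote

# Assumption: a scheme is a run of alphanumerics, '+', '.', '-' before ':'.
_scheme_re = re.compile(r'^[a-zA-Z0-9+.-]+:')


def decode_if_scheme(url):
    """Strip a leading scheme and percent-decode the rest.

    Strings without a scheme are returned untouched, so a plain local
    path containing '%23' keeps those characters literally.
    """
    match = _scheme_re.match(url)
    if not match:
        return url
    return unquote(url[match.end():])


if __name__ == '__main__':
    for candidate in ('file:foo%23bar', 'foo%23bar', '/foo'):
        print(candidate, '->', decode_if_scheme(candidate))
```

This covers only the file:foo%23bar, foo%23bar, and /foo style rows of the comparison table; rows involving an authority part (file://...) need the additional host handling the commit describes.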
Brodie Rao
82ebe71a03 url: use url.url in proxyhandler 2011-03-30 20:01:44 -07:00
Brodie Rao
de32ed5320 httprepo/sshrepo: use url.url
Like the previous patch to getauthinfo(), this also makes
username/password parsing more forgiving for SSH URLs.

This also opens up the possibility of allowing non-numeric ports,
since the URL parser has no problem handling them.

Related issues:

- issue851: @ in password in http url
- issue2055: nonnumeric port bug with https protocol
2011-03-30 20:01:35 -07:00
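One way to make username/password parsing forgiving, as de32ed5320 describes, is to split the authority at the last '@' and to keep the port as a string. The sketch below is an assumption about the general approach, not the url.url code; `split_authority` is a hypothetical helper and IPv6 literals are not handled.

```python
def split_authority(authority):
    """Split 'user:passwd@host:port' into (user, passwd, host, port).

    Missing parts are returned as None. The port stays a string, so a
    non-numeric port does not raise an error here.
    """
    user = passwd = port = None
    host = authority
    if '@' in authority:
        # Split at the LAST '@' so the password may itself contain '@'.
        userinfo, _, host = authority.rpartition('@')
        user, sep, passwd = userinfo.partition(':')
        if not sep:
            passwd = None
    if ':' in host:
        host, _, port = host.rpartition(':')
    return user, passwd, host, port


if __name__ == '__main__':
    print(split_authority('bob:p@ss@example.com:8080'))
```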
Brodie Rao
c01046e0d6 url: use url.url in url.open() 2011-03-30 20:01:34 -07:00
Brodie Rao
80fd2713db url: abort on file:// URLs with non-localhost hosts 2011-03-30 20:01:31 -07:00
Brodie Rao
7db4593199 url: special case bundle URL parsing to preserve backwards compatibility
This allows bundle://../foo to continue to refer to the relative path
../foo (bundle URLs do not take host names).
2011-03-30 20:00:24 -07:00
Brodie Rao
40c99bbacb url: add trailing slashes to URLs with hostnames that don't have one
This works around a potential issue in Python 2.4 where cloning a repo
with a URL like http://foo:8080 would cause urllib2 to query on
http://foo:8080?cmd=capabilities instead of
http://foo:8080/?cmd=capabilities.

In the past, this issue has been masked by the fact that
url.getauthinfo() added a trailing slash when it was missing.
2011-03-30 20:00:23 -07:00
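The trailing-slash workaround from 40c99bbacb can be sketched with the standard library. This uses the modern urllib.parse module rather than the Python 2.4 urllib2 code the commit targets, and `ensure_path` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_path(url):
    """Add '/' as the path when a URL has a host name but no path."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        parts = parts._replace(path='/')
    return urlunsplit(parts)


if __name__ == '__main__':
    print(ensure_path('http://foo:8080'))
    print(ensure_path('http://foo:8080?cmd=capabilities'))
```

URLs that already carry a path are returned unchanged.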
Brodie Rao
193402d9df url: move drive letter checking into has_drive_letter() for extensions
This will let the schemes extension override drive letter detection to
allow single letter schemes.
2011-03-30 19:50:56 -07:00
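A drive letter check of the kind 193402d9df factors out might look like the sketch below. The body is an assumption based on the usual definition of a Windows drive prefix (one letter followed by a colon), not a copy of the function in url.py.

```python
def has_drive_letter(path):
    """Return True if path begins with a drive prefix such as 'C:'."""
    # Slicing avoids an IndexError on empty or one-character strings.
    return path[1:2] == ':' and path[0:1].isalpha()


if __name__ == '__main__':
    for candidate in ('C:/foo', 'http://example.com', ''):
        print(repr(candidate), has_drive_letter(candidate))
```

Keeping the check in its own function is what lets an extension substitute a different rule, for example to treat single-letter prefixes as schemes.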
Matt Mackall
712f3e23d9 url: deal with drive letters 2011-03-30 13:34:39 -05:00
Brodie Rao
a4072d0def url: use url.url in hidepassword() and removeauth() 2011-03-25 22:59:09 -07:00
Brodie Rao
186bc90ec4 url: provide url object
This adds a url object that re-implements urlsplit() and
unsplit(). The implementation splits out usernames, passwords, and
ports.

The implementation is based on the behavior specified by RFC
2396[1]. However, it is much more forgiving than the RFC's
specification; it places no specific restrictions on what characters
are allowed in each segment of the URL other than what is necessary to
split the URL into its constituent parts.

[1]: http://www.ietf.org/rfc/rfc2396.txt
2011-03-25 22:58:56 -07:00
timeless
49d2d3233b cacert: improve error report when web.cacert file does not exist 2011-03-06 10:27:07 -06:00
Mads Kiilerich
1edc9de542 url: merge BetterHTTPS with httpsconnection to get some proxy https validation 2011-02-16 04:36:36 +01:00
Mads Kiilerich
c7b145f8d2 url: always create BetterHTTPS connections the same way 2011-02-16 04:28:17 +01:00
Mads Kiilerich
ed367d71cf url: refactor BetterHTTPS.connect 2011-02-16 04:28:17 +01:00
Mads Kiilerich
6c8a377242 url: refactor _gen_sendfile 2011-02-16 04:28:17 +01:00
Mads Kiilerich
caabaf5584 url: remove test for self.ui in BetterHTTPS
We use self.ui unconditionally anyway so we would have noticed if it in some
cases wasn't set.
2011-02-16 04:28:17 +01:00
Steve Borho
712b02dd36 url: return the matched authentication group name from readauthforuri()
Internally, the group name is only used in debug statements, but readauthforuri
can also be used externally to determine which group will be matched for a given
URL.
2011-02-13 12:19:58 -06:00
Steve Borho
dbbd9916fa url: move [auth] parsing out into a utility function
No functionality change, but it makes the [auth] section parsing and
best match detection usable by third-party tools.
2011-02-12 21:59:43 -06:00
Steve Borho
38702faab3 url: use rsplit to split [auth] keys
None of the auth section subkeys include a period, so there is no
reason not to split from the end. By using rsplit(), users can use
hostnames as group keys.
2011-02-12 21:53:27 -06:00
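The difference that rsplit() makes in 38702faab3 is easiest to see on a key whose group name contains periods. A minimal sketch, with `split_auth_key` as a hypothetical helper name and hg.example.com as a made-up group:

```python
def split_auth_key(key):
    """Split an [auth] key into (group, setting) at the LAST period."""
    # A key with no period at all raises ValueError here; real code
    # would need to decide how to report that.
    group, setting = key.rsplit('.', 1)
    return group, setting


if __name__ == '__main__':
    key = 'hg.example.com.prefix'
    print(split_auth_key(key))   # split from the end
    print(key.split('.', 1))     # split from the start, for comparison
```

Splitting from the start would treat "hg" as the group and "example.com.prefix" as the setting, which is why host names could not be used as group keys before.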
Mads Kiilerich
0c2d4a2e7e merge with stable 2011-02-01 01:55:45 +01:00
Yuya Nishihara
da93c3bd0b url: add --insecure option to bypass verification of ssl certificates
If --insecure is specified, it behaves the same way as when web.cacerts
is not configured.

Also shows a hint for the --insecure option when _verifycert() fails.
Currently the hint isn't displayed on SSLError, because that would require
more extensive changes.
2011-01-29 23:23:24 +09:00
Mads Kiilerich
e10e504454 url: 'ssh known host'-like checking of fingerprints of HTTPS certificates
Known fingerprints of HTTPS servers can now be configured in the
hostfingerprints section. That makes it possible to verify the identity of web
servers without configuring and trusting the CA chain.

Limitations:
* Port numbers are ignored, just like with ordinary certificates.
* Host name matching is case sensitive.
2011-01-28 02:57:59 +01:00
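Fingerprint checking as introduced in e10e504454 boils down to hashing the server certificate and comparing the result with a configured value. The sketch below is an illustration only: the use of SHA-1, the colon-separated hex format, and the helper names are assumptions not stated in the commit message. In a live connection the DER bytes could come from ssl.SSLSocket.getpeercert(binary_form=True).

```python
import hashlib


def fingerprint(der_cert):
    """Return a colon-separated hex digest of DER-encoded certificate bytes."""
    # Assumption: SHA-1 is used here purely for illustration.
    digest = hashlib.sha1(der_cert).hexdigest()
    return ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_known(der_cert, configured):
    """Compare a certificate against a configured fingerprint string."""
    # Ignore colons and letter case in the configured value.
    wanted = configured.replace(':', '').lower()
    return fingerprint(der_cert).replace(':', '') == wanted


if __name__ == '__main__':
    fake_cert = b'not a real certificate'
    print(fingerprint(fake_cert))
```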
Yuya Nishihara
593388c52e url: check subjectAltName when verifying ssl certificate
Now it verifies the certificate in the same manner as the py3k implementation:
http://svn.python.org/view/python/branches/py3k/Lib/ssl.py?view=markup#match_hostname
2011-01-09 00:35:36 +09:00
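The order of checks that 593388c52e refers to (subjectAltName first, commonName only as a fallback) can be sketched against a dictionary shaped like the output of ssl.SSLSocket.getpeercert(). This is a simplified illustration: `cert_matches_host` is a hypothetical helper, and it does exact, case-insensitive matching only, with no wildcard support.

```python
def cert_matches_host(cert, hostname):
    """Check hostname against a getpeercert()-style certificate dict."""
    hostname = hostname.lower()
    dnsnames = [value.lower()
                for key, value in cert.get('subjectAltName', ())
                if key == 'DNS']
    if dnsnames:
        # When DNS entries exist, the commonName is not consulted.
        return hostname in dnsnames
    for rdn in cert.get('subject', ()):
        for key, value in rdn:
            if key == 'commonName' and value.lower() == hostname:
                return True
    return False


if __name__ == '__main__':
    cert = {
        'subject': ((('commonName', 'old.example.com'),),),
        'subjectAltName': (('DNS', 'a.example.com'), ('DNS', 'b.example.com')),
    }
    print(cert_matches_host(cert, 'b.example.com'))
```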
Yuya Nishihara
ecde2415b3 url: fix UnicodeDecodeError on certificate verification error
SSLSocket.getpeercert() returns a tuple containing unicode for 'subject'.
Since Mercurial doesn't support IDN at all, it just returns an error for a
non-ASCII certname.
2011-01-08 21:52:25 +09:00
Mads Kiilerich
adb37b3b39 merge with stable 2011-01-28 03:09:22 +01:00
Mads Kiilerich
e45cfbe202 merge with stable 2011-01-11 02:48:58 +01:00
Eduard-Cristian Stefan
c66ec9cf09 url: expand path for web.cacerts 2011-01-02 15:30:12 +02:00
Martin Geisler
dc8a50e193 merge with stable 2011-01-05 15:56:03 +01:00
Matt Mackall
64de41cf08 url: fix trailing whitespace 2010-12-20 15:26:36 -06:00
Mads Kiilerich
97213d6b00 https: warn when server certificate isn't verified
Mercurial will verify HTTPS server certificates if web.cacerts is configured,
but it will by default silently not verify any certificates.

We now warn the user that when the certificate isn't verified she won't get the
security she might expect from https:
  warning: localhost certificate not verified (check web.cacerts config setting)

Self-signed certificates can be accepted silently by configuring web.cacerts to
point to a suitable certificate file.
2010-12-18 21:58:52 +01:00
Matt Mackall
86bd49d7fc url: fix check-code whitespace complaint 2010-12-20 12:12:18 -06:00
Mads Kiilerich
298ff06c2d merge with stable 2010-12-18 22:06:11 +01:00
Mads Kiilerich
0f04e7650d url: fix https client authentication through proxy
There are no tests for this, but the parameter order was obviously wrong.
2010-11-01 01:56:12 +01:00
Augie Fackler
ae84cccfbf httpsendfile: record progress information during read()
This allows us to provide deterministic progress information during
transfer of bundle data over HTTP. This is required because we
currently buffer the bundle data to local disk prior to transfer since
wsgiref lacks chunked transfer-coding support.
2010-12-10 13:31:06 -06:00
timeless
2834b35b09 url: show realm/user when asking for username/password 2010-10-26 14:41:58 +03:00
Martin Geisler
77ce66fb6a check-code: find trailing whitespace 2010-10-20 10:13:04 +02:00
Mads Kiilerich
70b420d9b9 url: validity (notBefore/notAfter) is checked by OpenSSL (issue2407)
Removing the check from our code makes https with cacerts check work with
Python < 2.6.
2010-10-17 04:14:06 +02:00
Martin Geisler
a82d25297a merge with stable 2010-10-01 16:43:03 +02:00
Martin Geisler
08e5ff349e url: mark certificate error string for translation 2010-10-01 16:08:46 +02:00
Mads Kiilerich
916b2a0e20 url: verify correctness of https server certificates (issue2407)
Python's SSL module verifies that certificates received for HTTPS are valid
according to the specified cacerts, but it doesn't verify that the certificate
is for the host we connect to.

We now explicitly verify that the commonName in the received certificate
matches the requested hostname and is valid for the time being.

This is a minimal patch where we try to fail to the safe side, but we do still
rely on Python's SSL functionality and do not try to implement the standards
fully and correctly. CRLs and subjectAltName are not handled and proxies
haven't been considered.

This change might break connections to some sites if cacerts is specified and
the certificate (by our definition) isn't correct. The workaround is to
disable cacerts which in most cases isn't much worse than it was before with
cacerts.
2010-10-01 00:46:59 +02:00
Mads Kiilerich
2fcbe3473c merge with stable 2010-10-01 00:54:03 +02:00
Alexandre Fayolle
55241e7f70 Fix memory leak when using hg commands over http repositories
When using hg commands over an http repository in a long running process, a
httphandler instance is leaked for each command, because of a loop
handler.parent -> OpenerDirector and OpenerDirector.handlers -> handler which
is not handled by Python's gc. Discussion on #mercurial concluded that removing
the __del__ method solved the problem.
2010-09-23 11:41:27 +02:00
Patrick Mezard
bb3259c957 Merge with stable 2010-09-24 00:17:04 +02:00
Martin Geisler
db0d34b21b url: limit expansion to safe auth keys (Issue2328)
Mads Kiilerich pointed out that 1e4ade283b02 was too eager since the
prefix and password keys may contain $-signs. So this only adds the
username to the list of keys that are expanded.

This also updates the documentation to match.
2010-08-13 10:53:10 +02:00
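Restricting variable expansion to safe keys, as db0d34b21b describes, can be sketched with os.path.expandvars(). The set of expanded keys below contains only username because that is the key the commit message names; the real list may include other keys, and `expand_auth` is a hypothetical helper.

```python
import os

# Assumption: only 'username' is shown; the actual list may be longer.
_EXPANDED_KEYS = {'username'}


def expand_auth(settings):
    """Expand environment variables only in keys considered safe."""
    return {
        key: os.path.expandvars(value) if key in _EXPANDED_KEYS else value
        for key, value in settings.items()
    }


if __name__ == '__main__':
    os.environ['DEMO_HG_USER'] = 'alice'
    print(expand_auth({'username': '$DEMO_HG_USER',
                       'password': 's3cr$DEMO_HG_USER'}))
```

The password is passed through untouched, so a literal $ in it is never misread as a variable reference.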
Martin Geisler
74902a4df9 url: expand vars in all [auth] settings (issue2328) 2010-08-13 10:10:11 +02:00
Renato Cunha
706e64ef14 url.py: removed 'file' inheritance in the httpsendfile class
Since py3k doesn't have a "file" builtin and, consequently, doesn't support
inheriting from it, this patch refactors the httpsendfile class to wrap the
objects returned by the builtin "open" function while adding the necessary
methods (__len__ for constructing the Content-Length header and read, write,
close and seek for the file-like interface).
2010-08-14 18:31:22 -03:00
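The wrapping approach described in 706e64ef14 can be sketched as a small class that delegates to the object returned by open(). This is an illustration of the pattern, not the httpsendfile class itself; the class name `SendFile` and its attribute names are hypothetical.

```python
import os


class SendFile:
    """Wrap a file object instead of inheriting from a file type."""

    def __init__(self, path, mode='rb'):
        self._data = open(path, mode)
        # Size is captured once, for use in a Content-Length header.
        self._length = os.fstat(self._data.fileno()).st_size

    def __len__(self):
        return self._length

    def read(self, *args, **kwargs):
        return self._data.read(*args, **kwargs)

    def write(self, data):
        return self._data.write(data)

    def seek(self, *args, **kwargs):
        return self._data.seek(*args, **kwargs)

    def close(self):
        return self._data.close()
```

Because len() works on the wrapper, code that builds the request can obtain the body size without knowing that a real file sits underneath.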
Martin Geisler
e7bd3fc69a Merge with stable 2010-08-14 03:30:35 +02:00
Wagner Bruna
4da44ca3ac http basic auth: reset redirect counter on new requests (issue2255)
On Python 2.6.6 (and patched 2.6.5 on certain Linux distros),
the change that caused issue2255 was also applied to non-digest
authentication; this change extends the 3405cb92c120 fix
accordingly.
2010-08-13 13:32:05 -03:00
Martin Geisler
a3814ecf05 url: limit expansion to safe auth keys (Issue2328)
Mads Kiilerich pointed out that 1e4ade283b02 was too eager since the
prefix and password keys may contain $-signs. So this only adds the
username to the list of keys that are expanded.

This also updates the documentation to match.
2010-08-13 10:53:10 +02:00
Martin Geisler
e669ffa346 url: expand vars in all [auth] settings (issue2328) 2010-08-13 10:10:11 +02:00
Mads Kiilerich
107eb37856 http digest auth: reset redirect counter on new requests (issue2255)
This fixes a regression introduced in d5b57915925b when Mercurial reuses the
auth handler for several requests and the redirect counter never is reset.
2010-06-26 23:00:58 +02:00
Mads Kiilerich
206817ee38 http push: break infinite recursion on failure with Python 2.6.5 (issue2179)
Python 2.6.5 will keep resetting the retry count on redirects, for example when
the server returns 401 on failing auth (like google code currently does). We
stop the endless recursion by not resetting the count.

http://bugs.python.org/issue3819 introduced the regression with Python 2.6.5.

http://bugs.python.org/issue8797 discusses a fix which might make it to 2.6.6
and 2.7.0.
2010-06-16 22:54:58 +02:00
Michael Glassford
f37b4abb21 schemes: fix // breakage with Python 2.6.5 (issue2111)
Recent Pythons (e.g. 2.6.5 and 3.1) introduce a change that causes
urlparse.urlunparse(urlparse.urlparse('x://')) to return 'x:' instead of 'x://' and
urlparse.urlunparse(urlparse.urlparse('x:///y')) to return 'x:/y' instead of 'x:///y'.
Fix url.hidepassword() and url.removeauth() to handle these cases.
2010-04-08 11:00:46 -04:00
Benoit Allard
5c44ed9439 url: expand path in auth filenames 2010-03-26 21:37:18 +01:00
Martin Geisler
b7c229f7fa url: only mark format string for translation 2010-02-24 17:11:37 +01:00
Patrick Mezard
297d72a71e url: avoid traceback when parsing [auth] (issue2056) 2010-02-23 22:31:54 +01:00
Benoit Boissinot
3e6397360c url: *args argument is a tuple, not a list (found by pylint)
E1101:514:httpshandler._makeconnection: Instance of 'tuple' has no 'pop' member
E1101:516:httpshandler._makeconnection: Instance of 'tuple' has no 'pop' member
2010-02-19 02:51:35 +01:00
Benoit Boissinot
a92c59721b url: correctly quote '/' in user and password embedded in urls 2010-02-15 22:39:36 +01:00
Benoit Boissinot
0a475e9eff url: fix python < 2.6 with ssl installed
_GLOBAL_DEFAULT_TIMEOUT isn't related to ssl, but to python < 2.6, move it to
the right hunk.
2010-02-15 18:12:50 +01:00
Benoit Boissinot
b500e0f9b2 url: proxy handling, simplify and correctly deal with IPv6
Thanks to Henrik for testing.
2010-02-11 20:42:20 +01:00
Matt Mackall
fac557c194 ssl: fix compatibility with pre-2.6 Python 2010-02-10 17:42:57 -06:00
Henrik Stuart
2d2f851cb8 url: SSL server certificate verification using web.cacerts file (issue1174) 2010-02-10 20:27:46 +01:00
Benoit Boissinot
7f48909ac5 url: httplib.HTTPSConnection already handles IPv6 and port parsing fine 2010-02-10 20:08:18 +01:00
Matt Mackall
8d99be19f0 many, many trivial check-code fixups 2010-01-25 00:05:27 -06:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Henrik Stuart
f5163bdcf6 url: generalise HTTPS proxy handling to accommodate Python changes
Python 2.6.3 introduced HTTPS proxy tunnelling in a way that interferes with
the way HTTPS proxying is handled in Mercurial. This fix generalises it to work
on Python 2.4 to 2.6.
2009-11-13 06:29:49 +01:00
Augie Fackler
c66ce37035 keepalive: handle broken pipes gracefully during large POSTs 2009-11-02 11:03:22 -05:00
Martin Geisler
9f1896c083 do not attempt to translate ui.debug output 2009-09-19 01:15:38 +02:00
Henrik Stuart
53551a6626 url: add support for custom handlers in extensions 2009-08-11 22:45:38 +02:00
Wagner Bruna
1dc5d9c774 url: fix use of non-int port in https connections via proxy
Complements aaf0c304ea93 (issue1725).
2009-07-14 17:12:12 -03:00
Henrik Stuart
6360c8356c url: fix use of non-int port in https connections (issue1725)
Versions of Python before 2.6 cannot automatically convert a given
port number to an integer, so we add a conversion to coerce the given
input to an int.
2009-07-08 18:35:13 +02:00
Henrik Stuart
eb1e0babd5 url: let host port take precedence when connecting to HTTPS
Fixes use of HTTPS connections on non-standard ports.
2009-06-20 17:09:49 +02:00
Henrik Stuart
972f154635 url: support client certificate files over HTTPS (issue643)
This extends the httpshandler with the means to utilise the auth
section to provide it with a PEM encoded certificate key file and
certificate chain file. This works also with sites that both require
client certificate authentication and basic or digest password
authentication, although the latter situation may require the user to
enter the PEM password multiple times.
2009-06-20 10:58:57 +02:00
Henrik Stuart
a79a5a1a15 url: use CONNECT for HTTPS connections through HTTP proxy (issue967)
urllib2 and httplib do not support using CONNECT proxy requests, but
only regular requests over the proxy. This does not work with HTTPS
requests as they typically require that the client issues a CONNECT to
the proxy to give a direct connection to the remote HTTPS server.

This is solved by duplicating some of the httplib functionality and
tying it together with the keepalive library such that an HTTPS
connection that needs to be proxied can be proxied by letting a
connection be established to the proxy server and then subsequently
performing the normal request to the specified server through the
proxy server.

As it stands, the code also purports to support HTTPS proxies, i.e.
proxies that you connect to using SSL. These are extremely rare and
nothing is done to ensure that CONNECT requests can be made to these
as that would require multiple SSL handshakes. This use case is also
not supported by most other contemporary web tools like curl and
Firefox3.
2009-05-22 08:56:43 +02:00
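The first step of the tunnelling described in a79a5a1a15 is a plain-text CONNECT request sent to the proxy. The sketch below only builds that request; it performs no network I/O. The helper name `connect_request`, the HTTP/1.0 version, and the optional headers are assumptions for illustration.

```python
def connect_request(host, port, headers=None):
    """Build the bytes of a CONNECT request for an HTTP proxy."""
    lines = ['CONNECT %s:%d HTTP/1.0' % (host, port)]
    for name, value in (headers or {}).items():
        lines.append('%s: %s' % (name, value))
    # A blank line terminates the request head.
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('ascii')


if __name__ == '__main__':
    print(connect_request('hg.example.com', 443))
```

Once the proxy answers with a 2xx status, the client can start the SSL handshake with the remote server over the same socket and then issue the normal request.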
Sune Foldager
3b6f32dcc9 url: fix bug in passwordmgr related to auth configuration
Usernames given as part of the URL would be ignored.
This bug was introduced in ff71c1e17a2ae6bd42d85685b412005cc1340c33
2009-05-11 07:55:13 +02:00
Sune Foldager
4a665141b4 allow http authentication information to be specified in the configuration 2009-05-04 20:26:27 +02:00
Martin Geisler
750183bdad updated license to be explicit about GPL version 2 2009-04-26 01:08:54 +02:00
Matt Mackall
c15de6b1b7 ui: make interactive a method 2009-04-26 16:50:44 -05:00
Martin Geisler
1deb417a82 util: use built-in set and frozenset
This drops Python 2.3 compatibility.
2009-04-22 00:55:32 +02:00
Patrick Mezard
1c11cefeaf url: detect scheme with a regexp instead of urlsplit()
The latter says 'c' is a scheme in 'c:\foo\bar'
2008-10-28 23:54:01 +01:00
Patrick Mezard
b5e6153e70 url: fix file:// URL handling 2008-10-28 22:24:41 +01:00
Benoit Boissinot
214af7ec3c factor out the url handling from httprepo
Create url.py to handle all the url handling:
- proxy handling
- workaround various python bugs
- handle username/password embedded in the url
2008-10-27 21:50:01 +01:00