Commit Graph

29327 Commits

Author SHA1 Message Date
Gregory Szorc
60b5c4f506 sslutil: implement wrapserversocket()
wrapsocket() is heavily tailored towards client use. In preparation
for converting the built-in server to use sslutil (as opposed to
the ssl module directly), we add wrapserversocket() for wrapping
a socket to be used on servers.
2016-07-14 20:14:19 -07:00
Gregory Szorc
6052136b07 hgweb: pass ui into preparehttpserver
Upcoming patches will need the built-in HTTPS server to be more
configurable.
2016-07-13 00:14:50 -07:00
Kostia Balytskyi
847780e6fe rebase: remove sortedstate-related confusion
The following rebase implementation details are frustrating:
- storing a list of sorted revision numbers in a field named sortedstate
- having sortedstate be a field of the rebaseruntime class
- using sortedstate[-1] as opposed to a more intuitive max(self.state) to
  compute the latest revision in the state

This commit fixes those imperfections.
2016-07-14 03:12:09 -07:00
Kostia Balytskyi
1c4ad1d1d3 rebase: replace extrafn field with _makeextrafn invocations
As per Yuya's advice, we would like to slightly reduce the amount of state
which is stored in rebaseruntime class. In this case, we don't need to store
extrafn field, as we can produce the necessary value by calling _makeextrafn
and the perf overhead is negligible.
2016-07-14 02:59:27 -07:00
Gregory Szorc
2125e5d7d4 mercurial: implement a source transforming module loader on Python 3
The most painful part of ensuring Python code runs on both Python 2
and 3 is string encoding. Making this difficult is that string
literals in Python 2 are bytes and string literals in Python 3 are
unicode. So, to ensure consistent types are used, you have to
use "from __future__ import unicode_literals" and/or prefix literals
with their type (e.g. b'foo' or u'foo').

Nearly every string in Mercurial is bytes. So, to use the same source
code on both Python 2 and 3 would require prefixing nearly every
string literal with "b" to make it a byte literal. This is ugly and
not something mpm is willing to do at this point in time.

This patch implements a custom module loader on Python 3 that performs
source transformation to convert string literals (unicode in Python 3)
to byte literals. In effect, it changes Python 3's string literals to
behave like Python 2's.

In addition, the module loader recognizes well-known built-in
functions (getattr, setattr, hasattr) and methods (encode and decode)
that barf when bytes are used and prevents these from being rewritten.
This prevents excessive source changes to accommodate this change
(we would have to rewrite every occurrence of these functions passing
string literals otherwise).

The module loader is only used on Python packages belonging to
Mercurial.

The loader works by tokenizing the loaded source and replacing
"string" tokens if necessary. The modified token stream is
untokenized back to source and loaded like normal. This does add some
overhead. However, this all occurs before caching: .pyc files will
cache the transformed version. This means the transformation penalty
is only paid on first load.
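
The tokenize-and-rewrite step described above can be sketched roughly as follows. This is a minimal illustration, not the loader from this patch: the names `prefix_string_tokens` and `transform_source` are hypothetical, the import hook and bytecode caching are left out, and the special handling of getattr/setattr/hasattr and encode/decode arguments is omitted.

```python
import io
import token
import tokenize


def prefix_string_tokens(tokens):
    """Yield tokens, turning unprefixed string literals into bytes literals."""
    for tok in tokens:
        # Only rewrite literals that start directly with a quote; literals
        # that already carry a prefix (b'', u'', r'', ...) pass through.
        if tok.type == token.STRING and tok.string[0] in ('"', "'"):
            yield tok._replace(string='b' + tok.string)
        else:
            yield tok


def transform_source(source):
    """Return `source` (bytes) with plain string literals made bytes."""
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    # untokenize() returns bytes here because the token stream starts
    # with an ENCODING token.
    return tokenize.untokenize(prefix_string_tokens(tokens))
```

Executing the result of `transform_source(b"x = 'foo'\n")` should bind `x` to a bytes object, while a literal written as `u'bar'` stays a unicode string.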

As the extensive inline comments explain, the presence of a custom
source transformer invalidates assumptions made by Python's built-in
bytecode caching mechanism. So, we have to wrap bytecode loading and
writing and add an additional header to bytecode files to facilitate
additional cache validation when the source transformations
change in the future.

There are still a few things this code doesn't handle well, namely
support for zip files as module sources and for extensions. Since
Mercurial doesn't officially support Python 3 yet, I'm inclined to
leave these as to-do items: getting a basic module loading mechanism
in place to unblock further Python 3 porting effort is more important
than comprehensive module importing support.

check-py3-compat.py has been updated to ignore frames. This is
necessary because CPython has built-in code to strip frames from the
built-in importer. When our custom code is present, this doesn't work
and the frames get all messed up. The new code is not perfect. It
works for now. But once you start chasing import failures you find
some edge cases where the files aren't being printed properly. This
only burdens people doing future Python 3 porting work so I'm inclined
to punt on the issue: the most important thing is for the source
transforming module loader to land.

There was a bit of churn in test-check-py3-compat.t because we now
trip up on str/unicode/bytes failures as a result of source
transformation. This is unfortunate but what are you going to do.

It's worth noting that other approaches were investigated.

We considered using a custom file encoding whose decode() would
apply source transformations. This was rejected because it would
require each source file to declare its custom Mercurial encoding.
Furthermore, when changing the source transformation we'd need to
version bump the encoding name otherwise the module caching layer
wouldn't know the .pyc file was invalidated. This would mean mass
updating every file when the source transformation changes. Yuck.

We also considered transforming at the AST layer. However, Python's
ast module is quite gnarly and doing AST transforms is quite
complicated, even for trivial rewrites. There are whole Python packages
that exist to make AST transformations usable. AST transforms would
still require import machinery, so the choice was basically to
perform source-level, token-level, or ast-level transforms.

Token-level rewriting delivers the metadata we need to rewrite
intelligently while being relatively easy to understand. So it won.

General consensus seems to be that this approach is the best available
to avoid bulk rewriting of '' to b''. However, we aren't confident
that this approach will never be a future maintenance burden. This
approach does unblock serious Python 3 porting efforts. So we can
re-evaluate once more work is done to support Python 3.
2016-07-04 11:18:03 -07:00
Yuya Nishihara
5c185fa427 compat: define ssize_t as int on 32bit Windows, silences C4142 warning
It appears Python.h provides ssize_t, which is aliased to int.

https://hg.python.org/cpython/file/v2.7.11/PC/pyconfig.h#l205
2016-07-15 23:54:56 +09:00
Yuya Nishihara
076cc2bdc1 commandserver: drop old unixservice implementation
It's been superseded by unixforkingservice.
2016-05-22 13:45:09 +09:00
Yuya Nishihara
cbf8b420cf chgserver: switch to new forking service
Threading and complex classes are no longer necessary. _autoexitloop() has
been replaced by a polling cycle in the main thread.
2016-05-22 13:36:37 +09:00
Yuya Nishihara
ad9977e54c chgserver: extract stub factory of service object
The class inheritance will be replaced by composition. See the next patch
for details.
2016-05-22 13:13:04 +09:00
Yuya Nishihara
692e073582 chgserver: reorder service classes to make future patches readable
Includes no functional change.
2016-05-22 13:08:30 +09:00
Yuya Nishihara
2857a87cb4 commandserver: add new forking server implemented without using SocketServer
SocketServer.ForkingMixIn of Python 2.x has a couple of issues, such as:

 - race condition that leads to 100% CPU usage (Python 2.6)
   https://bugs.python.org/issue21491
 - can't wait for children belonging to different process groups (Python 2.6)
 - leaves at least one zombie process (Python 2.6, 2.7)
   https://bugs.python.org/issue11109

The first two are critical because we do setpgid(0, 0) in child process to
isolate terminal signals. The last one isn't, but ForkingMixIn seems to be
doing something silly. So there are two choices:

 a) backport and maintain SocketServer until we can drop support for Python 2.x
 b) replace SocketServer with a simpler one and eliminate glue code

I chose (b) because it's a great time for getting rid of utterly complicated
SocketServer stuff, and preparing for a future move towards prefork service.

New unixforkingservice is implemented loosely based on chg 531f8ef64be6. It
is monolithic but much simpler than SocketServer. unixservicehandler provides
customizing points for chg, and it will be shared with future prefork service.

Old unixservice class is still used by chgserver. It will be removed later.

Thanks to Jun Wu for investigating these issues.
2016-05-22 11:43:18 +09:00
Yuya Nishihara
d298dc2539 commandserver: extract function that serves for the current connection
This will be used by new server implementation.
2016-05-22 12:49:22 +09:00
Yuya Nishihara
be6319726b commandserver: manually create file objects from socket
Prepares for moving away from SocketServer. See the subsequent patches for
why.
2016-05-22 12:44:25 +09:00
Maciej Fijalkowski
04dfc79402 bdiff: split bdiff into cpy-aware and cpy-agnostic part
2016-07-13 10:46:26 +02:00
Maciej Fijalkowski
99c86205e7 bdiff: rename functions and structs to be amenable for later exporting
2016-07-13 10:07:17 +02:00
Maciej Fijalkowski
020e7f27ca bdiff: use ssize_t in favor of Py_ssize_t in cpython-unaware locations
This function and struct will be exposed via cffi, so we need to
remove the cpython API dependency they currently have.
2016-07-13 09:36:24 +02:00
Anton Shestakov
2f0cc33284 hgweb: enumerate lines in loop header, not before
Doing this will allow access to the lines in arbitrary order (because the
result of enumerate() is an iterator), and that will help calculating rowspan
for annotate blocks.
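
A small illustration of the distinction, using made-up data: an enumerate object can only be consumed front to back, whereas the underlying list stays indexable when enumerate() is called in the loop header. The helper name `pair_with_next` is hypothetical and is not part of hgweb.

```python
def pair_with_next(lines):
    """Pair each line number with the current and the following line."""
    result = []
    # enumerate() is called in the loop header, so `lines` is still the
    # original list and can be indexed freely inside the loop.
    for lineno, line in enumerate(lines):
        following = lines[lineno + 1] if lineno + 1 < len(lines) else None
        result.append((lineno, line, following))
    return result


numbered = enumerate(['a', 'b'])
try:
    numbered[0]  # an enumerate object does not support indexing
    indexable = True
except TypeError:
    indexable = False
```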
2016-07-14 12:33:44 +08:00
Gregory Szorc
9cc195a913 sslutil: add assertion to prevent accidental CA usage on Windows
Yuya suggested we add this check to ensure we don't accidentally try
to load user-writable paths on Windows if we change the control
flow of this function later.
2016-07-13 19:33:52 -07:00
Kostia Balytskyi
e27abece5f shelve: make unshelve be able to abort in any case
2016-07-13 16:16:18 +01:00
Augie Fackler
c36830f5df osx: explicitly build hg with /usr/bin/python2.7
This should help avoid creating a package that depends on a custom
Python, as happened when I built a package for 3.8.
2016-07-13 10:39:33 -04:00
Augie Fackler
ceba27d835 osx: correct comment about ordering of welcome page
2016-07-13 11:26:44 -04:00
Augie Fackler
7db40cdbbf osx: jettison outdated build instructions
2016-07-13 11:24:31 -04:00
Yuya Nishihara
53e19881da commandserver: extract _cleanup() hook to clarify chg is doing differently
This makes it clear that chg needs its own way to unlink closed socket file.
I made a mistake in draft patches without noting the difference.
2016-05-22 11:21:11 +09:00
Yuya Nishihara
278af8b465 chgserver: drop repo at chgunixservice.__init__()
Since it isn't an expensive operation, we don't have to delay it to init().
2016-05-21 17:06:39 +09:00
Yuya Nishihara
9531323bdb chgserver: extract utility to bind unix domain socket to long path
This is a common problem of using sockaddr_un.
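
One common workaround can be sketched as follows, under stated assumptions: sockaddr_un limits the path length (roughly 100 bytes on many systems), so the caller changes into the socket's directory and binds to the short base name. The function name `bind_long_unix_path` is hypothetical, and this is not necessarily how the extracted utility is written.

```python
import os
import socket


def bind_long_unix_path(sock, path):
    """Bind `sock` to `path` even if `path` is too long for sockaddr_un."""
    dirname, basename = os.path.split(path)
    oldcwd = None
    if dirname:
        oldcwd = os.getcwd()
        os.chdir(dirname)
    try:
        # Only the short base name has to fit into sockaddr_un.
        sock.bind(basename)
    finally:
        # Keep the chdir() scope as narrow as possible.
        if oldcwd is not None:
            os.chdir(oldcwd)
```

Because chdir() affects the whole process, a sketch like this is only safe where no other thread depends on the current working directory.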
2016-05-21 16:52:04 +09:00
Yuya Nishihara
e524ae5976 chgserver: narrow scope of chdir() to socket.bind()
This helps extracting a utility function.
2016-05-21 16:42:59 +09:00
Denis Laxalde
399b84e4b1 annotate: handle empty files earlier
Rather than looping on funcmap and then checking for non-zero `l`,
continue if the result of fctx.annotate is empty.
2016-07-11 15:45:34 +02:00
Denis Laxalde
873ac38beb context: eliminate handling of linenumber being None in annotate
I could not find any use of this parameter value. And it arguably makes
understanding of the function more difficult. This sets the parameter's default
value to False.
2016-07-11 14:44:19 +02:00
Gregory Szorc
a17406014b tests: regenerate x509 test certificates
The old x509 test certificates were using cryptographic settings
that are ancient by today's standards, namely 512 bit RSA keys.
To put things in perspective, browsers have been dropping support
for 1024 bit RSA keys.

I think it is important that tests match the realities of the times.
And 2048 bit RSA keys with SHA-2 hashing are what the world is
moving to.

This patch replaces all the x509 certificates with new versions using
modern best practices. In addition, the docs for generating the
keys have been updated, as the existing docs left out a few steps,
namely how to generate certs that were not active yet or expired.
2016-07-12 22:26:04 -07:00
Denis Laxalde
349c6778aa hgweb: add a link on node id in annotate hover-box
The link points to the annotate view at this revision, just like the one in the
left-column, but it is accessible from anywhere.
2016-07-12 15:09:07 +02:00
Denis Laxalde
0986e60532 hgweb: move author information from left-column to hover-box in annotate view
And display the full author information since there is enough space there.
2016-07-12 15:07:37 +02:00
Denis Laxalde
81b6a5375a hgweb: add links to diff and changeset in hover-box on annotate view
2016-06-14 11:01:30 +02:00
Denis Laxalde
07a35f6357 hgweb: add link to parents of annotated revision in annotate view
The link is embedded into a div with class="annotate-info" that only shows up
upon hover of the annotate column. To avoid duplicate hover-overs (this new
one and the one coming from link's title), drop "title" attribute from a
element and put it in the annotate-info element.
2016-06-28 11:42:42 +02:00
Maciej Fijalkowski
99f034fe89 compat: provide a declaration of ssize_t, for MS windows
2016-07-11 13:53:35 +02:00
Augie Fackler
917aac8fd9 check-code: enforce (glob) on output lines containing 127.0.0.1
2016-07-09 23:04:03 -04:00
Augie Fackler
a746cac8cc tests: add (glob) annotations to output lines with 127.0.0.1
2016-07-09 23:03:45 -04:00
Augie Fackler
2356a5286e run-tests: add support for using 127.0.0.1 as a glob
Some systems don't have a 127/8 address for localhost (I noticed this
on a FreeBSD jail). In order to work around this, use 127.0.0.1 as a
glob pattern. A future commit will update needed output lines and add
a requirement to check-code.py.
2016-07-09 23:01:02 -04:00
Augie Fackler
2bd6c4334d check-code: only treat a # as a line in a t-test if it has a space before it
Prior to this, check-code wouldn't notice things like (glob)
annotations or similar in a test if they were after a # anywhere in
the line. This resolves a defect in a future change, and also exposed
a couple of small spots that needed some attention.
2016-07-12 15:34:17 -04:00
Augie Fackler
a03c51a052 test-export: be more aggressive about quoting ^
An upcoming change to check-code will notice this isn't quoted
enough. Presumably it's been fine by luck all this time.
2016-07-12 15:41:38 -04:00
Augie Fackler
f5ff8e7bc2 test-check-shbang: work around check-code not wanting hardcoded paths
I'm about to fix a bug in check-code that a # anywhere on a line
treated the rest of the line as a comment, even if it was
meaningful. This test is the one place we explicitly *do* want
hardcoded paths referenced, but we can work around that by specifying
bin as a regular expression.
2016-07-12 15:32:24 -04:00
Augie Fackler
6e00945221 tests: relax "Connection refused" match
We already had the match relaxed on Windows, but on Google Compute
Engine VMs I'm seeing "Network is unreachable" instead of "Connection
refused". At this point, just give up and make sure we get an error back.
2016-07-12 11:20:30 -04:00
Yuya Nishihara
bfbe03d5c2 commandserver: backport handling of forking server from chgserver
This is common between chg and vanilla forking server, so move it to
commandserver and unify handle().

It would be debatable whether we really need gc.collect() or not, but that
is beyond the scope of this series. Maybe we can remove gc.collect() once
all resource deallocations are switched to context manager.
2016-05-21 15:23:21 +09:00
Yuya Nishihara
b7be4fa70f commandserver: promote .cleanup() hook from chgserver
This allows us to unify _requesthandler.handle().
2016-05-21 15:18:23 +09:00
Yuya Nishihara
3f1513d0b9 commandserver: extract method to create commandserver instance per request
This is a step toward merging chgserver._requesthandler with commandserver's.
2016-05-21 15:12:19 +09:00
Yuya Nishihara
2e776c55f7 error: make hintable exceptions reject unknown keyword arguments (API)
Previously they would accept any typos of the hint keyword.
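
A minimal sketch of how such rejection can work, assuming a mix-in that pops `hint` and forwards the remaining keywords to the built-in exception constructor. The class names `Hint` and `Abort` and the helper `build` are illustrative and are not taken from the patch itself.

```python
class Hint(object):
    """Mix-in that stores an optional `hint` keyword argument."""

    def __init__(self, *args, **kw):
        self.hint = kw.pop('hint', None)
        # Any keyword left over is passed on; the built-in exception
        # constructor accepts no keyword arguments and raises TypeError.
        super(Hint, self).__init__(*args, **kw)


class Abort(Hint, Exception):
    """Hypothetical hintable error used only for this illustration."""


def build(**kw):
    """Construct an Abort, returning the TypeError if construction fails."""
    try:
        return Abort('something failed', **kw)
    except TypeError as exc:
        return exc
```

With this sketch, `build(hint='try again')` gives an `Abort` carrying the hint, while a misspelled keyword such as `build(hnt='typo')` surfaces as a TypeError instead of being silently ignored.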
2016-07-11 21:40:02 +09:00
Yuya Nishihara
348d73cc98 error: make HintException a mix-in class not derived from BaseException (API)
HintException is unrelated to the hierarchy of errors. It is an implementation
detail whether a class inherits from HintException or not, a sort of "private
inheritance" in C++.

New Hint isn't an exception class, which prevents catching error by its type:

    try:
        dosomething()
    except error.Hint:
        pass

Unfortunately, this passes on PyPy 5.3.1, but not on Python 2, and raises a more
detailed TypeError on Python 3.
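
The Python 3 behavior mentioned above can be reproduced with a small sketch. The class names `HintMixin` and `HintedError` are hypothetical stand-ins, and the snippet assumes CPython 3.

```python
class HintMixin(object):
    """Plain mix-in; deliberately not derived from BaseException."""
    hint = None


class HintedError(HintMixin, Exception):
    """Error class that picks up the mix-in through multiple inheritance."""


def catch_by_mixin():
    """Report what happens when an except clause names the mix-in."""
    try:
        try:
            raise HintedError('boom')
        except HintMixin:  # not an exception class
            return 'caught'
    except TypeError:
        # Python 3 refuses to match against a non-exception class.
        return 'TypeError'
```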
2016-07-09 14:28:30 +09:00
Gregory Szorc
8417644a05 sslutil: move context options flags to _hostsettings
Again, moving configuration determination to a single location.
2016-07-06 22:53:22 -07:00
Gregory Szorc
8d840bafa6 sslutil: move protocol determination to _hostsettings
Most of the logic for configuring TLS is now in this function.
Let's move protocol determination code there as well.
2016-07-06 22:47:24 -07:00
Durham Goode
3d4a2a798a share: don't recreate the source repo each time
Previously, every time you asked for the source repo of a shared working copy it
would recreate the repo object, which required calling reposetup. With certain
extensions enabled, this can be quite expensive, and it can happen many times
(for instance, share attaches a post transaction hook to update bookmarks that
triggers this).

The fix is to just cache the repo object instead of constantly recreating it.
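
The caching idea can be sketched generically as below. The class `SharedRepo`, its `open_source` callable, and the method `source_repo` are hypothetical names chosen for illustration; they do not mirror the share extension's actual code.

```python
class SharedRepo(object):
    """Holds a lazily created, cached reference to the source repo."""

    def __init__(self, open_source):
        # `open_source` stands in for the expensive construction step
        # (creating the repo object and running reposetup).
        self._open_source = open_source
        self._source = None

    def source_repo(self):
        # Build the source repo on first request and reuse it afterwards.
        if self._source is None:
            self._source = self._open_source()
        return self._source
```

Repeated calls to `source_repo()` then pay the construction cost once, which matters when a hook asks for the source repo many times in a single command.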
2016-07-11 13:40:02 -07:00
Maciej Fijalkowski
b3d62e8a18 setup: prepare for future cffi modules by adding placeholder in setup
2016-07-11 10:44:18 +02:00