This patch implements a script that inherits most of its functionality from
hg's setup.py and adds support to calling 2to3 during invocation with python3.
The motivation of having this script around is twofold:
1) It enables py3k crazies to test mercurial in py3k and, hopefully, patch it
more easily, so it can improve the py3k support to eventually run there.
2) Being separated from the main setup.py eliminates the need to make hg's
setup.py even more cluttered, and enables "independent" development until
the port is done.
Some considerations about the structure of this patch:
Mercurial already overrides the behavior of build_py, this patch tweaks it a bit
more to add support to call 2to3 with a custom fixer* location for Mercurial.
There is also a need of having the core C modules built *before* the
translation process starts, otherwise 2to3 will think those are global modules.
* A fixer is a python module that transforms python 2.x code in python 3.x
code.
This patch implement a fixer that replaces all calls to the '%' when bytes
arguments are used to a call to bytesformatter(), a function that knows how to
format byte strings. As one can't be sure if a formatting call is done when
only variables are used in a '%' call, these calls are also translated. The
bytesformatter, in runtime, makes sure to return the "raw" % operation if
that's what was intended.
This patch adds some ugly constructs. The first of them is bytesformatter, a
function that formats strings like when '%' is called. The main motivation for
this function is py3k's strange behavior:
>>> 'foo %s' % b'bar'
"foo b'bar'"
>>> b'foo %s' % b'bar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'
>>> b'foo %s' % 'bar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'
In other words, if we can't format bytes with bytes, and recall that all
mercurial strings will be converted by a fixer, then things will break badly if
we don't take a similar approach.
The other addition with this patch is that the os.environ dictionary is
monkeypatched to have bytes items. Hopefully this won't be needed in the
future, as python 3.2 might get a os.environb dictionary that holds bytes
items.
This patch implements a 2to3 fixer that converts all plain strings in a python
source file to byte strings syntax. Example:
foo = 'Normal string'
would become
foo = b'Normal string'
The motivation behind this fixer can be found in
http://selenic.com/pipermail/mercurial-devel/2010-June/022363.html or, in other
words: the current hg source assumes that _most_ strings are "meant" to be byte
sequences, so it makes sense to make the convertion implemented by this patch.
As mentioned above, not all mercurial modules want to use strings as bytes,
examples include i18n (which uses unicode), and demandimport (in py3k, module
names are normal strings, thus unicode, and there's no need for a convertion).
Therefore, these modules are blacklisted in the fixer. There are also a few
functions that can take only unicode arguments, thus the convertion shouldn't
be done for those.
The old check would only detect any/all at the beginning of a line.
The regexp was probably just modeled after the preceding regexp which
(correctly) finds the 'with' keyword at the beginning of a line.
We now complain about 'any(' and 'all(' anywhere in a line, unless it
is preceded by 'def'. This allows us to define our own compatibility
wrapper in util and use 'util.any(' in the code.
Otherwise, the shrunken index file always has mode 0600 thanks to
mkstemp(). This is annoying on a server, where multiple users may need
to read/write the manifest. chmod()ing the data file is not strictly
necessary, but it's nice for consistency.
Even though the name of the project is Docutils, most packagers use
the package name python-docutils to fit into the naming scheme of
other packages written in Python. The name is used by Fedora, EPEL,
DAG, Mandriva, and a few other distributions.
Another tool had -10 already. Since 'merge' is clearly a worst-case
tool (internal), lowering to -100 ensures there's plenty of room for
slightly better cases.