63 Commits

Author SHA1 Message Date
Pulkit Goyal
73d422914f py3: add a bytes version of urllib.parse.urlencode() to pycompat.py
urllib.parse.urlencode() returns unicodes on Python 3. This commit adds a
method which will take its output and encode it to bytes so that we can use
bytes consistently.
2017-04-07 16:00:44 +05:30
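A minimal sketch of what such a helper might look like on Python 3. The name
urlencode_bytes is hypothetical (the commit does not quote the real name here),
and the sketch assumes the output of urllib.parse.urlencode() is pure ASCII,
which holds because every other character is percent-encoded.

```python
import urllib.parse


def urlencode_bytes(query, doseq=False):
    """Hypothetical helper: urlencode() a query and return bytes.

    urllib.parse.urlencode() accepts bytes keys and values but always
    returns str on Python 3, so the result is encoded back to bytes.
    """
    s = urllib.parse.urlencode(query, doseq=doseq)
    # The encoded query only contains ASCII characters.
    return s.encode('ascii')
```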
Yuya Nishihara
dc941eb2d4 py3: have registrar process docstrings in bytes
Mixing bytes and unicode creates a mess. Do things in bytes as much as possible.

The new sysbytes() helper only takes care of ASCII characters, but avoids
raising a nasty unicode exception. This is the same design principle as sysstr().
2017-04-05 00:34:58 +09:00
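One way such a helper could behave, as a sketch only. It assumes sysbytes()
encodes with UTF-8 and that sysstr() decodes with latin-1 (the latter is an
assumption taken from the sysstr commit further down this log). Under those
assumptions only ASCII text survives a round trip, but ordinary non-ASCII text
does not raise.

```python
def sysbytes(s):
    """Sketch: convert a native str (e.g. a docstring) to bytes.

    Assumption: UTF-8 is used, so ordinary non-ASCII text does not raise,
    although only ASCII is guaranteed to round-trip.
    """
    return s.encode('utf-8')


# ASCII round-trips through the assumed latin-1 decoding used by sysstr().
assert sysbytes('help text').decode('latin-1') == 'help text'
# Non-ASCII input is accepted, but does not round-trip.
assert sysbytes('\u00e9').decode('latin-1') != '\u00e9'
```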
Yuya Nishihara
2f682a6206 pycompat: provide bytes os.linesep 2017-03-29 21:23:28 +09:00
Yuya Nishihara
236d81fcc8 pycompat: introduce identity function as a compat stub
I was sometimes too lazy to use 'str' instead of 'lambda a: a'. Let's add
a named function for that purpose.
2017-03-29 21:13:55 +09:00
Gregory Szorc
56e2825efa py3: stop exporting urlparse from pycompat and util (API)
There are no consumers of this in tree.

Functions formerly available on this object/module can now be accessed
via {pycompat,util}.urlreq.
2017-03-21 22:47:49 -07:00
Gregory Szorc
7046e7f549 pycompat: define urlreq.urlparse and urlreq.unparse aliases
Currently, we export urlparse via util.urlparse then
call util.urlparse.urlparse() and util.urlparse.urlunparse()
in a few places. This is the only url* module exported from
pycompat, making it a one-off. So let's transition to urlreq
to match everything else.

Yes, we double import "urlparse" now on Python 2. This will
be cleaned up in a subsequent patch.

Also, the Python 3 functions trade in str/unicode not bytes.
So we'll likely need to write a custom implementation that
speaks bytes. But moving everyone to an abstracted API
is a good first step.
2017-03-21 22:34:17 -07:00
Gregory Szorc
76701c9657 pycompat: remove urlunquote alias
It is duplicated by urlreq.unquote and is unused. Kill it.

We retain the imports because it is re-exported via util.urlparse,
which is used elsewhere.

Since we no longer access attributes of urlparse at module load time,
this change /should/ result in that module reverting to a lazy module.
2017-03-21 22:28:16 -07:00
Gregory Szorc
a0383567c2 pycompat: alias urlreq.unquote to unquote_to_bytes
Previously, urlreq.unquote aliased to urllib.parse.unquote,
which returned a str/unicode. We like bytes, so switch urlreq.unquote
to dispatch to urllib.parse.unquote_to_bytes.

This required a minor helper function to register an alias under a
different name from which it points. If this turns into a common
pattern, we could likely teach _registeralias to accept tuple
values defining the mapping. Until then, I didn't feel like
adding complexity to _registeralias.
2017-03-21 22:20:11 -07:00
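The aliasing idea can be sketched as below. This is an illustration, not the
code from pycompat.py: the class and method names (AliasStub, register_aliases,
register_alias) are hypothetical stand-ins for the stub and for _registeralias.
Aliases are resolved lazily on first attribute access.

```python
import urllib.parse


class AliasStub:
    """Hypothetical stub that resolves registered aliases on first access."""

    def __init__(self):
        self._aliases = {}

    def register_aliases(self, origin, names):
        # Alias name is the attribute name lowercased, with "_" removed.
        for name in names:
            self._aliases[name.replace('_', '').lower()] = (origin, name)

    def register_alias(self, origin, attr, name):
        # Register `attr` of `origin` under a different public `name`.
        self._aliases[name] = (origin, attr)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. before first resolution.
        try:
            origin, attr = self._aliases[name]
        except KeyError:
            raise AttributeError(name)
        obj = getattr(origin, attr)
        self.__dict__[name] = obj  # cache for later lookups
        return obj


urlreq = AliasStub()
urlreq.register_aliases(urllib.parse, ['quote', 'urlparse'])
# unquote is exposed under its usual name but dispatches to the bytes variant.
urlreq.register_alias(urllib.parse, 'unquote_to_bytes', 'unquote')
```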
Augie Fackler
2c48d16072 pycompat: add maplist alias for old map behavior 2017-03-19 14:12:38 -04:00
Yuya Nishihara
791afb08eb pycompat: add bytestr wrapper which mostly acts as a Python 2 str
This allows us to handle bytes in mostly the same manner as Python 2 str,
so we can get rid of ugly s[i:i + 1] hacks:

  s = bytestr(s)
  while i < len(s):
      c = s[i]
      ...

This is the simpler version of the previous RFC patch which tried to preserve
the bytestr type if possible. New version simply drops the bytestr wrapping
so we aren't likely to pass a bytestr to a function that expects Python 3
bytes.
2017-03-08 22:48:26 +09:00
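A simplified sketch of the idea on Python 3 (not the actual pycompat.bytestr):
indexing and iteration yield length-1 bytes as on Python 2, while slicing
returns plain bytes, which matches the "drops the bytestr wrapping" behaviour
described above.

```python
class bytestr(bytes):
    """Sketch of a bytes subclass that indexes like a Python 2 str."""

    def __getitem__(self, key):
        s = bytes.__getitem__(self, key)
        if not isinstance(s, bytes):
            # Integer index: Python 3 returns an int, wrap it back to bytes.
            s = bytes([s])
        return s

    def __iter__(self):
        # Yield length-1 bytes objects instead of ints.
        return (bytes([c]) for c in bytes.__iter__(self))


s = bytestr(b'abc')
assert s[0] == b'a'                 # not 97
assert list(s) == [b'a', b'b', b'c']
assert type(s[0:2]) is bytes        # slices are plain bytes
```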
Martin von Zweigbergk
7b5dab5409 py3: make py3 compat.iterbytestr simpler and faster
With Python 3.4.3, timeit says 11.9 usec -> 6.44 usec. With Python
3.6.0, timeit says 14.1 usec -> 9.55 usec.
2017-03-15 09:32:18 -07:00
Martin von Zweigbergk
cf1f0d3920 py3: optimize py3 compat.bytechr using Struct.pack
With Python 3.4.3, timeit says 0.437 usec -> 0.0685 usec. With Python
3.6, timeit says 0.157 usec -> 0.0907 usec. So it's faster on both
versions, but the speedup varies a lot.

Thanks to Gregory Szorc for the suggestion.
2017-03-15 09:30:50 -07:00
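The Struct.pack approach can be sketched as follows. The format string '>B'
(a single unsigned byte) is an assumption about what the commit uses; the
timings quoted above are the author's and are not reproduced here.

```python
import struct

# Pre-compiled struct for one unsigned byte; pack(i) returns a length-1 bytes.
bytechr = struct.Struct('>B').pack

assert bytechr(65) == b'A'
# Equivalent to the more obvious, list-allocating spelling:
assert bytechr(255) == bytes([255])
```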
Gregory Szorc
75a74f883f pycompat: custom implementation of urllib.parse.quote()
urllib.parse.quote() accepts either str or bytes and returns str.

There exists a urllib.parse.quote_from_bytes() which only accepts
bytes. We should probably use that to retain strong typing and
avoid surprises.

In addition, since nearly all strings in Mercurial are bytes, we
probably don't want quote() returning unicode.

So, this patch implements a custom quote() that only accepts bytes
and returns bytes. The quoted URL should only contain URL safe
characters which is a strict subset of ASCII. So
`.encode('ascii', 'strict')` should be safe.
2017-03-13 12:16:47 -07:00
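A sketch of a bytes-only quote() along the lines described, built on
urllib.parse.quote_from_bytes(). The default safe='/' mirrors the standard
library default and is an assumption about the real helper's signature.

```python
import urllib.parse


def quote(s, safe='/'):
    """Sketch: quote bytes and return bytes.

    quote_from_bytes() raises TypeError for str input, which keeps the
    typing strict. Its result only contains URL-safe ASCII characters.
    """
    quoted = urllib.parse.quote_from_bytes(s, safe=safe)
    return quoted.encode('ascii', 'strict')
```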
Gregory Szorc
db258a9802 pycompat: alias urllib symbols directly
urllib.request imports a bunch of symbols from other urllib
modules. We should map to the original symbols not the
re-exported ones because this is more correct. Also, it
will prevent an import of urllib.request if only one of
the lower-level symbols/modules is needed.
2017-03-13 12:14:17 -07:00
Yuya Nishihara
a1b53adeff pycompat: add helper to iterate each char in bytes 2017-03-12 17:04:45 -07:00
Yuya Nishihara
7daa87b335 pycompat: move imports of cStringIO/io to where they are used
There's no point to import cStringIO as io since we have to select StringIO
or BytesIO conditionally.
2017-03-12 12:54:11 -07:00
Pulkit Goyal
a0c31269e8 pycompat: default to BytesIO instead of StringIO 2017-03-13 00:55:14 +05:30
Augie Fackler
00b54d28ff merge with stable 2017-03-11 13:53:14 -05:00
Augie Fackler
5811358454 pycompat: verify sys.argv exists before forwarding it (issue5493)
ISAPI_WSGI doesn't set up sys.argv, so we have to look for the
attribute before assuming it exists.
2017-03-07 13:24:24 -05:00
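The guard amounts to checking for the attribute before using it. A sketch,
with a hypothetical function name and a `mod` parameter added only so the
missing-attribute case can be exercised without touching the real sys module:

```python
import os
import sys


def bytes_argv(mod=sys):
    """Sketch: return argv as a list of bytes, or None if argv is absent.

    Some embedding environments (ISAPI_WSGI, per the commit message) do
    not set sys.argv at all, so it must not be assumed to exist.
    """
    argv = getattr(mod, 'argv', None)
    if argv is None:
        return None
    return [os.fsencode(a) for a in argv]
```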
Pulkit Goyal
3c7388da12 py3: replace pycompat.getenv with encoding.environ.get
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
2017-01-15 13:17:05 +05:30
Yuya Nishihara
7c6e9b463e py3: factor out bytechr() function
I also changed xrange(127) to range(127) as the number is relatively small.
2017-03-08 22:30:12 +09:00
Pulkit Goyal
d310fba3dd py3: have a bytes version of shlex.split()
shlex.split() only accepts unicodes on Python 3. After this patch we will be
using pycompat.shlexsplit(). This patch also replaces existing occurrences of
shlex.split with pycompat.shlexsplit.
2016-12-25 03:06:55 +05:30
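A sketch of a bytes wrapper around shlex.split(). Using latin-1 for the
round trip is an assumption; it is chosen here because it maps every byte to
exactly one code point, so arbitrary bytes survive unchanged.

```python
import shlex


def shlexsplit(s):
    """Sketch: split a bytes command line and return a list of bytes."""
    # latin-1 is a lossless byte <-> code point mapping.
    parts = shlex.split(s.decode('latin-1'))
    return [p.encode('latin-1') for p in parts]
```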
Pulkit Goyal
5b1c662d4d py3: have bytes version of sys.executable
sys.executable on Python 3 returns unicodes and we want bytes. So this patch
adds a new pycompat.sysexecutable which returns bytes by encoding with
os.fsencode(), since it is a path variable.
2016-12-20 00:02:24 +05:30
Pulkit Goyal
3b34ae4e1d py3: have bytes version of os.getenv
os.getenv() on Python 3 deals with unicodes. If we want to pass bytes, we have
os.getenvb(), which deals with bytes. This patch adds pycompat.osgetenv,
which deals with bytes on both Python 2 and 3.
2016-12-19 02:35:38 +05:30
Pulkit Goyal
22be95eb3e py3: have a bytes version of sys.platform
sys.platform returns unicodes on Python 3. This patch adds
pycompat.sysplatform, which returns bytes.
2016-12-18 00:52:05 +05:30
Pulkit Goyal
06f595a242 py3: have a bytes version of os.altsep
os.altsep returns unicodes on Python 3. We need a bytes version hence added
pycompat.altsep.
2016-12-18 00:44:21 +05:30
Pulkit Goyal
b18d8e2c04 py3: utility functions to convert keys of kwargs to bytes/unicodes
Keys of keyword arguments need to be str (unicodes) on Python 3. We have a lot
of functions where we pass keyword arguments. Having utility functions to help
convert keys to unicodes before passing them, and to convert them back to bytes
once passed into the function, will be helpful. We now have functions named
pycompat.strkwargs(dic) and pycompat.byteskwargs(dic) to help us.
2016-12-07 21:53:03 +05:30
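The two helpers can be sketched as a pair of dict comprehensions. The latin-1
codec is an assumption made for this sketch; the commit message only names the
functions, not their implementation.

```python
def strkwargs(dic):
    """Sketch: convert bytes keys to str so the dict can be used as **kwargs."""
    return {k.decode('latin-1'): v for k, v in dic.items()}


def byteskwargs(dic):
    """Sketch: convert str keys back to bytes inside the called function."""
    return {k.encode('latin-1'): v for k, v in dic.items()}


def command(**opts):
    # Inside the function, work with bytes keys again.
    opts = byteskwargs(opts)
    return opts[b'rev']


assert command(**strkwargs({b'rev': b'tip'})) == b'tip'
```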
Pulkit Goyal
3f64a7a3eb py3: make a bytes version of getopt.getopt()
getopt.getopt() deals with unicodes internally on Python 3, and if bytes
arguments are passed, it raises TypeError. So we now have pycompat.getoptb(),
which takes bytes arguments, converts them to unicode, calls getopt.getopt(),
and then converts the returned values back to bytes before returning them.
All the instances of getopt.getopt() are replaced with pycompat.getoptb().
2016-12-06 06:36:36 +05:30
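The decode, call, re-encode sequence described above can be sketched like
this. The latin-1 codec is an assumption of the sketch, picked because it
round-trips arbitrary bytes.

```python
import getopt


def getoptb(args, shortlist, namelist):
    """Sketch: getopt.getopt() that takes and returns bytes."""
    # Convert every bytes input to str for the standard library call.
    uargs = [a.decode('latin-1') for a in args]
    ushort = shortlist.decode('latin-1')
    unames = [n.decode('latin-1') for n in namelist]
    opts, rest = getopt.getopt(uargs, ushort, unames)
    # Convert the parsed options and remaining arguments back to bytes.
    opts = [(o.encode('latin-1'), v.encode('latin-1')) for o, v in opts]
    rest = [a.encode('latin-1') for a in rest]
    return opts, rest
```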
Pulkit Goyal
57f271b08e py3: add os.getcwdb() to have bytes path
Following the behaviour of Python 3, os.getcwd() returns unicodes. We need a
bytes version as path variables are bytes on UNIX. Python 3 has os.getcwdb(),
which returns the current working directory in bytes.

Like the rest of the things in pycompat, such as osname and ossep, we need to
rewrite every instance of os.getcwd to pycompat.getcwd to make them work
correctly on Python 3.
2016-11-22 18:46:50 +05:30
Yuya Nishihara
ba083b6361 py3: provide bytes stdin/out/err through util module
Since standard streams are TextIO on Python 3, we can't use sys.stdin/out/err
directly. Fortunately we can get the underlying BytesIO via .buffer as long as
the streams aren't replaced by e.g. StringIO.

stdin/out/err are provided through util so we can wrap them by platform API.
2016-10-20 23:40:24 +09:00
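The .buffer lookup can be sketched with a small helper. The function name is
hypothetical; falling back to the stream itself when there is no .buffer
attribute is an assumption about how a replaced stream might be handled.

```python
import io
import sys


def bytes_stream(stream):
    """Sketch: return the binary stream underneath a text stream.

    If the stream has been replaced by something without .buffer
    (e.g. io.StringIO), the stream itself is returned unchanged.
    """
    return getattr(stream, 'buffer', stream)


# Typical use on Python 3: write bytes to standard output.
stdout = bytes_stream(sys.stdout)

raw = io.BytesIO()
text = io.TextIOWrapper(raw)
assert bytes_stream(text) is raw
```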
Pulkit Goyal
cb0521463d py3: add pycompat.open and replace open() calls
open() requires mode argument as unicodes on Python 3. This patch introduces
pycompat.open() which is inserted to files using transformer and replaces
builtins.open() calls.
2017-03-03 13:04:32 +05:30
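A sketch of such a wrapper. It is named compat_open here (a hypothetical name)
to avoid shadowing the builtin in the example, and decoding the mode with
latin-1 is an assumption of the sketch.

```python
import builtins
import os
import tempfile


def compat_open(name, mode=b'r', buffering=-1):
    """Sketch: open() that also accepts the mode as bytes."""
    if isinstance(mode, bytes):
        mode = mode.decode('latin-1')
    return builtins.open(name, mode, buffering)


def demo():
    # Write and read back a file using bytes mode strings.
    path = os.path.join(tempfile.mkdtemp(), 'example')
    with compat_open(path, b'wb') as f:
        f.write(b'hello')
    with compat_open(path, b'rb') as f:
        return f.read()
```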
Yuya Nishihara
3b52d0164f py3: document why os.fsencode() can be used to get back bytes argv
And a possible Windows issue. I'm sad we have to do such an ugly hack, but
that's the unicode on Python 3.
2016-11-09 22:06:09 +09:00
Pulkit Goyal
41a3214683 py3: have bytes version of sys.argv
sys.argv returns unicodes on Python 3. We need a bytes version for us.
There was also a Python bug/feature request which asked them to implement
one. They rejected it, and one of the comments says that we can use
fsencode() to get a bytes version of sys.argv. Though I am not sure about its
correctness.

Link to the comment: http://bugs.python.org/issue8776#msg217416

After this patch we will have pycompat.sysargv which will return us a bytes
version of sys.argv. If this patch goes in, I would like to make the transformer
rewrite sys.argv with pycompat.argv because there are a lot of occurrences.
2016-11-06 04:36:26 +05:30
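The fsencode() approach relies on os.fsencode() undoing the decoding that
Python applied to the original arguments. A sketch of that round trip, which
is expected to hold on POSIX systems; the commit messages in this log
themselves flag Windows as a possible problem, so treat this as an
illustration rather than a guarantee.

```python
import os
import sys

# Sketch: bytes version of the command-line arguments.
sysargv = [os.fsencode(a) for a in sys.argv]

# The underlying idea: decoding and re-encoding with the filesystem
# encoding gives back the original bytes.
raw = b'caf\xc3\xa9'
roundtripped = os.fsencode(os.fsdecode(raw))
```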
Augie Fackler
c2678fbe9d pycompat: introduce an alias for urllib.unquote
We have to use unquote_to_bytes on Python 3, so we need an abstraction
for this.
2016-10-09 09:02:25 -04:00
Pulkit Goyal
92076ed1a3 py3: have pycompat.ospathsep and pycompat.ossep
We needed bytes version of os.sep and os.pathsep in py3 as they return
unicodes.
2016-11-06 03:44:44 +05:30
Pulkit Goyal
214b36d54b py3: add a bytes version of os.name
os.name returns unicodes on py3. Most of our checks are like
    os.name == 'nt'

Because of the transformer, on the right hand side we have b'nt'. The
condition will never be satisfied even if os.name returns 'nt', as that will be
a unicode.
We either need to encode every occurrence of os.name or have a
new variable, which is much cleaner. Now we have pycompat.osname.
There are around 53 occurrences of os.name in the codebase which need to
be replaced by pycompat.osname to support Python 3.
2016-11-06 03:33:22 +05:30
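The failure mode is easy to reproduce on Python 3, where str and bytes never
compare equal. In this sketch, osname stands in for pycompat.osname, and
encoding with ASCII is an assumption.

```python
import os

# Sketch of a bytes os.name, standing in for pycompat.osname.
osname = os.name.encode('ascii')


def iswindows_naive():
    # After the transformer turns 'nt' into b'nt', this is always False on
    # Python 3, even on Windows, because str never equals bytes.
    return os.name == b'nt'


def iswindows():
    # Comparing bytes with bytes behaves as intended.
    return osname == b'nt'
```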
Pulkit Goyal
3c0a6ae01d py3: add os.fsdecode() as pycompat.fsdecode()
We need to use os.fsdecode() but this was not present in Python 2. So added
the function in pycompat.py
2016-11-06 03:12:40 +05:30
Martijn Pieters
6c2c90ea4c pycompat: only accept a bytestring filepath in Python 2 2016-10-10 23:11:15 +01:00
Martijn Pieters
74d3bea9ae py3: add an os.fsencode backport to ease path handling 2016-10-09 17:44:23 +02:00
Augie Fackler
867b91b167 pycompat: when setting attrs, ensure we use sysstr
The custom module importer was making these bytes, so when we poked
values into self.__dict__ we had bytes instead of unicode on py3 and
it didn't work.
2016-10-08 08:35:43 -04:00
Yuya Nishihara
fd6ad62876 pycompat: extract function that converts attribute or encoding name to str
This will be used to convert encoding.encoding to a str acceptable by
Python 3 functions.

The source encoding is changed to "latin-1" because encoding.encoding can
have arbitrary bytes. Since valid names should consist of ASCII characters,
we don't care about the mapping of non-ASCII characters so long as invalid
names are distinct from valid names.
2016-09-28 22:32:09 +09:00
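A sketch of the conversion described above. Because latin-1 maps every byte
to a code point, arbitrary bytes never raise, and an invalid encoding name
stays invalid after conversion instead of colliding with a valid one.

```python
import codecs


def sysstr(s):
    """Sketch: convert an attribute or encoding name from bytes to str."""
    if isinstance(s, str):
        return s
    return s.decode('latin-1')


# A valid name converts to something the codec machinery accepts.
assert codecs.lookup(sysstr(b'utf-8')).name == 'utf-8'
# Arbitrary bytes convert without raising.
assert sysstr(b'\xff') == '\xff'
```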
Yuya Nishihara
7c03e0d6ba pycompat: provide 'ispy3' constant
We compare version_info at several places, which seems enough to define
a constant.
2016-09-28 20:01:23 +09:00
Yuya Nishihara
69789d265a pycompat: delay loading modules registered to stub
Replacement _pycompatstub designed to be compatible with our demandimporter.
try-except is replaced by version comparison because ImportError will no longer
be raised immediately.
2016-08-14 14:46:24 +09:00
Yuya Nishihara
12eca1889e py3: import builtin wrappers automagically by code transformer
This should be less invasive than mucking builtins.

Since tokenize.untokenize() looks at the start/end positions of tokens, we
calculate them from the NEWLINE token of the future import.
2016-08-16 12:35:15 +09:00
Yuya Nishihara
29c5e8dc21 py3: provide (del|get|has|set)attr wrappers that accept bytes
These functions will be imported automagically by our code transformer.

getattr() and setattr() are widely used in our code. We wouldn't probably
want to rewrite every single call of getattr/setattr. delattr() and hasattr()
aren't that important, but they are functions of the same kind.
2016-08-14 12:51:21 +09:00
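One way to build all four wrappers from a single decorator, as a sketch. The
b-prefixed names are hypothetical and used only to avoid shadowing the
builtins in this example; the latin-1 decoding is likewise an assumption.

```python
import builtins
import functools


def _sysstr(s):
    # Assumed conversion: bytes attribute names are decoded as latin-1.
    return s if isinstance(s, str) else s.decode('latin-1')


def _wrapattrfunc(f):
    """Wrap an attribute builtin so the name may be given as bytes."""
    @functools.wraps(f)
    def wrapper(obj, name, *args):
        return f(obj, _sysstr(name), *args)
    return wrapper


bdelattr = _wrapattrfunc(builtins.delattr)
bgetattr = _wrapattrfunc(builtins.getattr)
bhasattr = _wrapattrfunc(builtins.hasattr)
bsetattr = _wrapattrfunc(builtins.setattr)
```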
Yuya Nishihara
df4c2c74a3 py3: check python version to enable builtins hack
Future patches will add (del|get|has|set)attr wrappers.
2016-08-14 12:44:13 +09:00
Yuya Nishihara
1532480b48 py3: move xrange alias next to import lines
Builtin functions should be available in compatibility code.
2016-08-14 12:41:54 +09:00
Pulkit Goyal
0ce0d571e7 pycompat: avoid using an extra function
We have a single line function which just lowercases the letters and replaces
"_" with "". It's better to avoid that function call. Moreover, we are calling
this function around 33 times.
2016-08-13 04:21:42 +05:30
Pulkit Goyal
1eb9840e42 pycompat: remove multiple occurrences of urlencode
By mistake we had two occurrences of urlencode.
2016-08-13 03:03:01 +05:30
Pulkit Goyal
c87a2b01ec pycompat: make pycompat demandimport friendly
pycompat.py includes a hack to import modules whose names changed in Python 3.
We use try-except to load a module according to the version of Python. But this
method forces us to import the modules in order to raise an ImportError, which
makes it demandimport unfriendly.

This patch changes the try-except blocks to a single if-else block. To stop
test-check-pyflakes.t complaining about unused imports, pycompat.py is excluded
from the test.
2016-07-17 19:48:04 +05:30