Python has its own memory allocation APIs. For allocations
<= 512 bytes, it allocates memory from arenas. This means that
average small allocations don't call the system allocator, which
makes them faster. Also, arena allocations cut down on memory
fragmentation, which can matter for performance in long-running
processes.
Another advantage of using the Python memory allocator is that
allocations are tracked by Python. This is a bigger deal in
Python 3, as modern versions of Python have some decent built-in
tools for examining memory usage, leaks, etc.
This patch converts a trivial malloc() + free() in the bdiff code
to use the Python allocator APIs. Since the object being
operated on is a line, chances are it will use an arena. So,
this could have a net positive impact on performance (although
I didn't measure it).
The patch lingered a bit too long in my local clone and I messed up when I
updated the version number. Since nobody caught it, I'm fixing the version after
the fact.
Changeset 3b9cdb72931f removed the mutable default value, but did not explicitly
tested for None. Such implicit checking can introduce semantic and performance
issue. We move to an explicit check for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
The two function takes the very same arguments. We make this clearer and less
error prone by dispatching on the function only and having a single call point
in the code.
906be86990 recently changed to switch from:
self._rbcrevs[rbcrevidx:rbcrevidx + _rbcrecsize] = rec
to
pack_into(_rbcrecfmt, self._rbcrevs, rbcrevidx, node, branchidx)
This causes an exception if rbcrevidx is -1 (i.e. the nullrev). The old code
handled this because python handles out of bound sets to arrays gracefully. The
new code throws because the self._rbcrevs buffer isn't long enough to write 8
bytes to. Normally it would've been resized by the immediately preceding line,
but because the 0 length buffer is greater than the idx (-1) times the size, no
resize happens.
Setting the branch for the nullrev doesn't make sense anyway, so let's skip it.
This was caught by external tests in the Facebook extensions repo, but I've
added a test here that catches the issue.
string_escape doesn't exist on Python 3, but fortunately the undocumented
codecs.escape_encode() function exists on CPython 2.6, 2.7, 3.5 and PyPy 5.6.
So let's use it for now.
http://stackoverflow.com/a/23151714
Since extras may contain blob, the default template escapes its values:
'extra': '{key}={value|stringescape}'
join() should follow the output style of the default template.
.next attribute does not exist on Python 3. As this function seems to really
care about the overhead of the Python interpreter, I follow the way of micro
optimization.
Currently hg.exe will only try to load python27.dll from hg-python
subdir if PYTHONHOME environment variable is not set. I think that
it is better to check whether 'hg-python' subdir exists and load
python from it in that case, regardless of environment. This allows
for reliable approach of distributing Mercurial with its own Python.
This allows us to handle bytes in mostly the same manner as Python 2 str,
so we can get rid of ugly s[i:i + 1] hacks:
s = bytestr(s)
while i < len(s):
c = s[i]
...
This is the simpler version of the previous RFC patch which tried to preserve
the bytestr type if possible. New version simply drops the bytestr wrapping
so we aren't likely to pass a bytestr to a function that expects Python 3
bytes.
Changeset c832083e5671 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
Changeset 3495cae22a41 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
Changeset 11e325d162fe removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
Changeset 45c7a22dbdc0 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
Changeset ba7f2a1cc2d2 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
With Python 3.4.3, timeit says 0.437 usec -> 0.0685 usec. With Python
3.6, timeit says 0.157 usec -> 0.0907 usec. So it's faster on both
versions, but the speedup varies a lot.
Thanks to Gregory Szorc for the suggestion.
We have used dict.keys() which returns a dict_keys() object instead
of list on Python 3. So this patch replaces that with list comprehension
which works both on Python 2 and 3.
It was specified to be an empty list in 3d8abfdaa08a in 2007.
It was correct at the time. But when the function was
refactored in 7bca0f2718ab (2010), it started expecting a dict.
I guess this code path is untested?
Thanks to Yuya for spotting this.
urllib.parse.quote() accepts either str or bytes and returns str.
There exists a urllib.parse.quote_from_bytes() which only accepts
bytes. We should probably use that to retain strong typing and
avoid surprises.
In addition, since nearly all strings in Mercurial are bytes, we
probably don't want quote() returning unicode.
So, this patch implements a custom quote() that only accepts bytes
and returns bytes. The quoted URL should only contain URL safe
characters which is a strict subset of ASCII. So
`.encode('ascii', 'strict')` should be safe.
urllib.request imports a bunch of symbols from other urllib
modules. We should map to the original symbols not the
re-exported ones because this is more correct. Also, it
will prevent an import of urllib.request if only one of
the lower-level symbols/modules is needed.
By luck, we appear to not pass any long instances into
the JSON formatter. I suspect this will change with all the
Python 3 porting work. Plus I have another series that will
convert some ints to longs that triggers this.
I don't think this is any tight loops and we'd need to worry about
PyObject creation overhead. Also, I'm pretty sure strptime()
will be much slower than PyObject creation (date parsing is
surprisingly slow).