demandimport: replace more references to _demandmod instances

_demandmod instances may be referenced by multiple importing modules.
Before this patch, the _demandmod instance only maintained a reference
to its first consumer when using the "from X import Y" syntax. This is
because we only created a single _demandmod instance (attached to the
parent X module). If multiple modules A and B performed
"from X import Y", we'd produce a single _demandmod instance
"demandmod" with the following references:

  X.Y = <demandmod>
  A.Y = <demandmod>
  B.Y = <demandmod>

The locals from the first consumer (A) would be stored in <demandmod>.
When <demandmod> was loaded, we'd look at the locals for the first
consumer and replace the symbol, if necessary. This resulted in the
following state:

  X.Y = <module>
  A.Y = <module>
  B.Y = <demandmod>

B's reference to Y wasn't updated and was still using the proxy object
because we just didn't record that B had a reference to <demandmod> that
needed updating!

With this patch, we add support for tracking which modules in addition
to the initial importer have a reference to the _demandmod instance and
we replace those references at module load time.
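
As a rough illustration of the tracking described above, here is a
minimal standalone sketch. It is not the patch itself: the class name
LazyProxy, the loader callback and the sketch_* module names are made up
for the example, it targets Python 3, and it compares by identity where
the real code uses ==.

```python
import sys
import types


class LazyProxy(object):
    """Hypothetical stand-in for _demandmod: loads on first attribute access."""

    # Names served from the proxy itself instead of the proxied module.
    _OWN = ("_name", "_loader", "_refs", "_module", "_addref", "_load")

    def __init__(self, name, loader):
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_loader", loader)
        object.__setattr__(self, "_refs", set())
        object.__setattr__(self, "_module", None)

    def _addref(self, modname):
        """Record that module ``modname`` holds a reference to this proxy."""
        self._refs.add(modname)

    def _load(self):
        if self._module is None:
            mod = self._loader()
            # Swap the proxy for the real module in every recorded importer.
            for modname in self._refs:
                ref = sys.modules.get(modname)
                if ref is not None and getattr(ref, self._name, None) is self:
                    setattr(ref, self._name, mod)
            object.__setattr__(self, "_module", mod)

    def __getattribute__(self, attr):
        if attr in LazyProxy._OWN:
            return object.__getattribute__(self, attr)
        self._load()
        return getattr(self._module, attr)


# The "real" module Y, plus two importers A and B that both hold the proxy.
real_y = types.ModuleType("sketch_y")
real_y.value = 42

proxy = LazyProxy("Y", lambda: real_y)
mod_a = types.ModuleType("sketch_a")
mod_b = types.ModuleType("sketch_b")
for m in (mod_a, mod_b):
    sys.modules[m.__name__] = m
    m.Y = proxy
    proxy._addref(m.__name__)

# The first attribute access triggers the load and rewrites both importers.
first_value = mod_a.Y.value
```

After the first attribute access, both sketch_a.Y and sketch_b.Y point at
the real module rather than the proxy, which is the state the patch aims
for.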

In the case of posix.py, this fixes an issue where the "encoding" module
was being proxied, resulting in hundreds of thousands of
__getattribute__ lookups on the _demandmod instance during dirstate
operations on mozilla-central. Avoiding those lookups speeds up
execution by many milliseconds. There are likely several other
operations that benefit from this change as well.

The new mechanism isn't perfect: references in locals (not globals) will
likely linger. So, if there is an import inside a function and a symbol
from that module is used in a hot loop, we could have unwanted overhead
from proxying through _demandmod. Non-global imports are discouraged
anyway, so hopefully this isn't a big deal in practice. We could
potentially deploy a code checker that bans use of attribute lookups of
function-level-imported modules inside loops.
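
To make the hot-loop concern concrete, here is a small sketch of a local
name that stays bound to a proxy. CountingProxy, sketch_encoding and
hot_loop are hypothetical names for illustration only. The proxy just
counts forwarded lookups and does not attempt any replacement.

```python
import types


class CountingProxy(object):
    """Hypothetical proxy that counts attribute lookups it forwards."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_hits", 0)

    def __getattribute__(self, attr):
        if attr in ("_target", "_hits"):
            return object.__getattribute__(self, attr)
        # Every other lookup pays for this extra Python-level call.
        object.__setattr__(self, "_hits", self._hits + 1)
        return getattr(self._target, attr)


real = types.ModuleType("sketch_encoding")
real.upper = str.upper
proxy = CountingProxy(real)


def hot_loop(items):
    # Stands in for a function-level import: the local name keeps pointing
    # at the proxy, so nothing outside the frame can swap it out.
    enc = proxy
    return [enc.upper(s) for s in items]


result = hot_loop(["a", "b", "c"])
hits = proxy._hits
```

Each iteration goes through the proxy's __getattribute__ once, so the
lookup count grows linearly with the loop length.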

This deficiency in theory could be avoided by storing the set of globals
and locals dicts to update in the _demandmod instance. However, I tried
this and it didn't work. One reason is that some globals are _demandmod
instances. We could work around this, but it's a bit more work. There
also might be other module import foo at play. The solution as
implemented is better than what we had and IMO is good enough for the
time being.

It's worth noting that this sub-optimal behavior was made worse by the
introduction of absolute_import and its recommended "from . import X"
syntax for importing modules from the "mercurial" package. If we ever
wrote performance tests, the number of module imports and
__getattribute__ proxy calls through _demandmod instances would be
something I'd have them check.
Gregory Szorc 2015-10-04 11:17:43 -07:00
parent 74682dae31
commit 54a54d074c

@@ -73,14 +73,26 @@ class _demandmod(object):
             head = name
             after = []
         object.__setattr__(self, "_data",
-                           (head, globals, locals, after, level))
+                           (head, globals, locals, after, level, set()))
         object.__setattr__(self, "_module", None)
     def _extend(self, name):
         """add to the list of submodules to load"""
         self._data[3].append(name)
+
+    def _addref(self, name):
+        """Record that the named module ``name`` imports this module.
+
+        References to this proxy class having the name of this module will be
+        replaced at module load time. We assume the symbol inside the importing
+        module is identical to the "head" name of this module. We don't
+        actually know if "as X" syntax is being used to change the symbol name
+        because this information isn't exposed to __import__.
+        """
+        self._data[5].add(name)
+
     def _load(self):
         if not self._module:
-            head, globals, locals, after, level = self._data
+            head, globals, locals, after, level, modrefs = self._data
             mod = _hgextimport(_import, head, globals, locals, None, level)
             # load submodules
             def subload(mod, p):
@@ -95,9 +107,15 @@ class _demandmod(object):
             for x in after:
                 subload(mod, x)
 
-            # are we in the locals dictionary still?
+            # Replace references to this proxy instance with the actual module.
             if locals and locals.get(head) == self:
                 locals[head] = mod
+
+            for modname in modrefs:
+                modref = sys.modules.get(modname, None)
+                if modref and getattr(modref, head, None) == self:
+                    setattr(modref, head, mod)
+
             object.__setattr__(self, "_module", mod)
 
     def __repr__(self):
@@ -107,7 +125,7 @@ class _demandmod(object):
     def __call__(self, *args, **kwargs):
         raise TypeError("%s object is not callable" % repr(self))
     def __getattribute__(self, attr):
-        if attr in ('_data', '_extend', '_load', '_module'):
+        if attr in ('_data', '_extend', '_load', '_module', '_addref'):
             return object.__getattribute__(self, attr)
         self._load()
         return getattr(self._module, attr)
@@ -143,6 +161,9 @@ def _demandimport(name, globals=None, locals=None, fromlist=None, level=level):
     # The modern Mercurial convention is to use absolute_import everywhere,
     # so modern Mercurial code will have level >= 0.
 
+    # The name of the module the import statement is located in.
+    globalname = globals.get('__name__')
+
     def processfromitem(mod, attr, **kwargs):
         """Process an imported symbol in the import statement.
 
@@ -154,6 +175,12 @@ def _demandimport(name, globals=None, locals=None, fromlist=None, level=level):
             symbol = _demandmod(attr, mod.__dict__, locals, **kwargs)
             setattr(mod, attr, symbol)
 
+        # Record the importing module references this symbol so we can
+        # replace the symbol with the actual module instance at load
+        # time.
+        if globalname and isinstance(symbol, _demandmod):
+            symbol._addref(globalname)
+
     if level >= 0:
         # Mercurial's enforced import style does not use
         # "from a import b,c,d" or "from .a import b,c,d" syntax. In