Move the code in context._manifestmatches() into a new
manifest.matches(). It's a natural place for the code to live and it
allows other callers to easily use it. It should also make it easier
to optimize the new method in alternative implementations of the
manifest (same reasoning as with manifest.diff()).
From manifest.diff(), we return a dict from filename to pairs of pairs
of file nodeids and flags (values of the form ((n1,n2),(fl1,fl2))). To
create this dict, we currently generate one dict for files (with
(n1,n2) values) and one for flags (with (fl1,fl2) values) and then
join these dicts. Missing files are represented by None and missing
flags by '', but due to the dict joining, the inner pairs themselves
can also be None. The only caller, merge.manifestmerge(), then unpacks
these values while checking for None values.
By inlining the calls to dicthelpers and simplifying it to only
iterate over files (ignoring flags-only differences), we can simplify
life for our caller.
The manifestdict class already has a method for diff flags between two
manifests (presumably because there is no full access to the private
_flags field). The only caller is merge.manifestmerge(), which also
wants a diff of files between the same manifests. Let's combine the
code for diffing files and flags into a single method on
manifestdict. This puts all the manifest diffing in one place and will
allow for further simplification. It might also be useful for it to be
encapsulated in manifestdict if we later decide to to shard
manifests. The docstring is intentionally unclear about missing
entries for now.
Omit the check of bool(p1) since it's always true in practice: it will
either be nullid or some valid manifest sha, and we know nullid won't
ever be in the cache so we can simplify understanding of this code.
I verified that changed was never false (it was always a 2-tuple) by
adding
@@ -220,6 +225,8 @@ class manifest(revlog.revlog):
def add(self, map, transaction, link, p1=None, p2=None,
changed=None):
+ if not changed:
+ assert False, 'changed was %r' % changed
# if we're using the cache, make sure it is valid and
# parented by the same node we're diffing against
if not (changed and p1 and (p1 in self._mancache)):
and observing that the test suite still passed. Making all the
arguments required should help future readers understand what's going
on here.
During a commit amend there are 4 manifests being handled:
- original commit
- temporary commit
- amended commit
- merge base
This causes a manifest cache miss which hurts perf on large repos. On a large
repo, this fix causes amend to go from 6 seconds to 5.5 seconds.
Previously, the manifest cache would store the last manifest parsed. We could
run into situations with operations like update where we would try parsing the
manifest for a revision r1, then r2, then r1 again. This increases the cache
size to 3 to avoid that bit of performance fragility.
When commiting to a repo with lots of files (>170000),
manifest.py:addlistdelta takes some time because it's editing a large
array many times. Changing it to build a new array instead of editing
the old one saves around 0.04 seconds on a 1.64 second commit. A 2.5%
gain.
The gain here is pretty minor, but it was blatantly at the top of the
profiler report and the fix is straight forward.
I tested it by comparing the arrays produced by the new and old logic
while running all of the tests.
Introduce manifestdict.withflags() to get a set of all files which have any
flags set, since these are likely to be a minority. Otherwise checking .flags()
for every file is a lot of dictionary lookups and is quite slow.
Though both give the same result (a NUL byte), I found that I tend to
read "\000" as "\0" + "00", which is something completely different.
I did not change the occurance of "\000" in archival.py since there
are other octal constants in that file.
Py3k doesn't have a global cmp() function, making this call problematic in the
py3k port. Also, calling cmp() here is not necessary, since we only want to
know if the two values are equal. A check for equality perfect in this case and
this patch does that.
If a buffer of an mutable object is passed to revlog.addrevision(), the revlog
will happily store it in its cache. Later when the revlog reuses the cached
entry, if the manifest modified the object in-between, all kind of bugs
appears.
We fix it by:
- passing immutable objects to addrevision() if they are already available
- only storing the text in the cache if it's of str type
Then we can remove the conversion of the cache entry to str() during
retrieval. That was probably just there hiding the bug for the common cases
but not really fixing it.
No need to copy the dict, dict.__init__() will do that for us.
It was responsible for a non-negligeable waste of time during a qpush of an
-mm queue on the kernel repo.
Include the offending filenames in the error message. Now this error message
is consistent with the same error issued by dirstate.py (although there is
still duplicate code).
- create error.py for exception classes to reduce demandloading
- move revlog exceptions to it
- change users to import error and drop revlog import if possible