e3b881ea6832 (lazymanifest: extract function for iterating to next
line, 2015-03-11) introduced a large memory leak because I stopped
running the 'bail' block in the successful case. Let's rename 'bail'
to 'done', since it has to run not only in the error case.
__setitem__ on the lazymanifest C type wasn't checking to see if a
line had previously been malloced before replacing it, leading to
leaks if files got updated multiple times in the course of a task.
I was able to reproduce the leak with this change to test-manifest.py:
diff --git a/tests/test-manifest.py b/tests/test-manifest.py
--- a/tests/test-manifest.py
+++ b/tests/test-manifest.py
@@ -456,6 +456,16 @@ class basemanifesttests(object):
['a/b/c/bar.txt', 'a/b/c/foo.txt', 'a/b/d/ten.txt'],
m2.keys())
+ def testManifestSetItem(self):
+ m = self.parsemanifest('')
+ for x in range(3):
+ m['file%d' % x] = BIN_HASH_1
+ for x in range(3):
+ m['file%d' % x] = BIN_HASH_2
+ import time
+ time.sleep(4)
+
+
along with the commands:
$ make local
$ PYTHONPATH=. SILENT_BE_NOISY=1 python tests/test-manifest.py testmanifestdict.testManifestSetItem &
$ sleep 4
$ leaks $(jobs -p | tee /dev/stderr | awk '{print $3}')
$ wait
in an interactive shell on OS X. As far as I can tell, it had to be an
interactive shell so that I could get the pid of the test run using
the jobs builtin. Prior to this change, I was leaking several strings,
and after this change leaks reports no leaks.
I thought there was a bug filed for this in bugzilla, but I can't find
it either in bugzilla or by searching my email.
This fixes an MSVC 2008 warning that I don't see with gcc 4.6.3-2:
warning C4047: 'return' :
'bool' differs in levels of indirection from 'line *'
More importantly, the truncation from pointer to 'unsigned char' would have
returned 0 if self->lines pointed to an address divisible by 0xFF, which causes
find_lines() to return MANIFEST_OOM. I was able to cause this to happen in a
trivial program with the gcc compiler.
The _lazymanifest type(s) behave very much like a sorted dict with
filenames as keys and (nodeid, flags) as values. It therefore seems
surprising that its __iter__ generates 3-tuples of (path, nodeid,
flags). Let's make it match dict's behavior of generating the keys
instead, and add a new iterentries method for the 3-tuples. With this
change, the "x" in "if x in lm" and "for x in lm" now have the same
type (a filename string).
Where we convert a 20-byte binary to a 40-byte hex string in
lazymanifest_setitem(), we use sprintf("%02hhx", hash[i]). As Matt
Harbison found out, 'hh' seems to be ignored on some platforms (Visual
Studio?). If char is signed on such platforms, the value gets
sign-extended as it gets promoted into an int when passed into the
variadic sprintf(). The resulting integer value will then be printed
with leading f's (14 of them on 64-bit systems), since the '2' in the
format string indicates only minimum number of characters. This is
both incorrect and runs the risk of writing outside of allocated
memory (as Matt reported).
To fix, let's cast the value to unsigned char before passing it to
sprintf(). Also drop the poorly supported 'hh' formatting that now
becomes unnecessary.
This makes insertions log(n) plus some memmove() overhead, rather than
doing an append followed by an n*log(n) sort. There's probably a lot
of performance to be gained by adding a batch-add method, which could
be smarter about the memmove()s performed.
Includes a couple of extra tests that should help prevent bugs.
Thanks to Martin for some significant pre-mail cleanup of this change.
This lets us iterate manifests in order, but do a _lot_ less work in
the common case when we only care about a few manifest entries.
Many thanks to Mike Edgar for reviewing this in advance of it going
out to the list, which caught many things I missed.
This version of the patch includes C89 fixes from Sean Farley and
many correctness/efficiency cleanups from Martin von
Zweigbergk. Thanks to both!