git_handler: use convert_list to cache git objects

getnewgitcommits() walks the DAG with an explicit stack and re-reads a commit's
object every time that commit comes back to the top of the stack: once for each
parent that still has to be visited, plus a final visit when the commit itself
is emitted. In the standard case with one parent, that doubles the object
reads. This patch makes convert_list a cache for objects, so that a particular
Git object is read just once.
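
The effect on read counts can be reproduced outside hg-git with a small,
self-contained sketch. Everything here is an assumption made for illustration:
the Commit tuple, CountingStore, topo_order() and linear_history() are
hypothetical stand-ins, not hg-git or dulwich APIs; only the shape of the loop
follows the diff in this commit.

```python
from collections import namedtuple

# Hypothetical stand-in for a Git commit object: only the parents matter here.
Commit = namedtuple("Commit", ["parents"])


class CountingStore:
    """Hypothetical object store that counts how many reads it serves."""

    def __init__(self, commits):
        self.commits = commits
        self.reads = 0

    def get_object(self, sha):
        self.reads += 1
        return self.commits[sha]


def topo_order(store, heads, use_cache):
    """Return commits parents-first, using the same stack-based walk as the diff."""
    convert_list = {}  # doubles as the object cache when use_cache is True
    done = set()
    commits = []
    todo = list(heads)
    while todo:
        sha = todo[-1]
        if sha in done:
            todo.pop()
            continue
        if use_cache and sha in convert_list:
            obj = convert_list[sha]
        else:
            obj = store.get_object(sha)
            convert_list[sha] = obj
        for p in obj.parents:
            if p not in done:
                # Visit the parent first; this commit stays on the stack
                # and will be looked at again later.
                todo.append(p)
                break
        else:
            # All parents are done, so the commit itself can be emitted.
            commits.append(sha)
            done.add(sha)
            todo.pop()
    return commits


def linear_history(n):
    """Build a chain c0 <- c1 <- ... <- c(n-1), each commit with one parent."""
    history = {"c0": Commit(parents=[])}
    for i in range(1, n):
        history["c%d" % i] = Commit(parents=["c%d" % (i - 1)])
    return history


uncached = CountingStore(linear_history(1000))
cached = CountingStore(linear_history(1000))
order_a = topo_order(uncached, ["c999"], use_cache=False)
order_b = topo_order(cached, ["c999"], use_cache=True)
print(uncached.reads, cached.reads)  # prints "1999 1000"
```

On this 1,000-commit linear chain, every commit except the root is read twice
without the cache, and the resulting order is the same either way.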

On a mostly linear repository with over 50,000 commits, this brings a no-op hg
pull down from 70 seconds to 38, which is close to half the time, as expected.
Note that even a no-op hg pull currently does a full DAG traversal -- an
upcoming patch will fix this.
Siddharth Agarwal 2014-02-18 20:22:13 -08:00
parent 36052aca77
commit 6f79df86d2

@@ -620,7 +620,11 @@ class GitHandler(object):
                 todo.pop()
                 continue
             assert isinstance(sha, str)
-            obj = self.git.get_object(sha)
+            if sha in convert_list:
+                obj = convert_list[sha]
+            else:
+                obj = self.git.get_object(sha)
+                convert_list[sha] = obj
             assert isinstance(obj, Commit)
             for p in obj.parents:
                 if p not in done:
@@ -630,7 +634,6 @@ class GitHandler(object):
                     break
             else:
                 commits.append(sha)
-                convert_list[sha] = obj
                 done.add(sha)
                 todo.pop()