Any commit in _map_git is already known, so there's no point walking further
down the DAG.
For a repo with over 50,000 commits, this brings a no-op hg pull down from 38
seconds to 2.5 seconds.
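As a rough illustration of the idea (not the actual hg-git code), the sketch
below walks a toy DAG and stops at any commit that is already in a known set.
The names find_new_commits, parents_of and known are hypothetical stand-ins;
known plays the role of _map_git.

```python
def find_new_commits(heads, parents_of, known):
    """Return commits reachable from heads that are not yet known.

    The walk stops at any commit in `known`: its ancestors are assumed
    to be known as well, so there is nothing new below it.
    """
    new = []
    seen = set()
    stack = list(heads)
    while stack:
        sha = stack.pop()
        if sha in seen or sha in known:
            continue  # already handled, or already converted: do not descend
        seen.add(sha)
        new.append(sha)
        stack.extend(parents_of[sha])
    return new


# Toy linear history: a <- b <- c <- d (hypothetical SHAs)
parents_of = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
known = {"a", "b", "c"}

new_commits = find_new_commits(["d"], parents_of, known)
noop = find_new_commits(["c"], parents_of, known)
```

In the no-op case the head itself is known, so the loop exits without
touching any parent.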
getnewgitcommits() does a weird traversal where a particular commit SHA is
visited as many times as the number of parents it has, effectively doubling
object reads in the standard case with one parent. This patch makes the
convert_list a cache for objects, so that a particular Git object is read just
once.
On a mostly linear repository with over 50,000 commits, this brings a no-op hg
pull down from 70 seconds to 38, which is close to half the time, as expected.
Note that even a no-op hg pull currently does a full DAG traversal -- an
upcoming patch will fix this.
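A minimal sketch of the caching idea, assuming a traversal that revisits a
commit after its parents are done. CountingStore and walk_with_cache are
hypothetical names used only for illustration; the real change lives in
getnewgitcommits() and may look quite different.

```python
class CountingStore:
    """Stand-in for a Git object store that counts reads (hypothetical)."""

    def __init__(self, objects):
        self._objects = objects
        self.reads = 0

    def __getitem__(self, sha):
        self.reads += 1
        return self._objects[sha]


def walk_with_cache(store, heads):
    """Return commits in parents-first order, reading each object once."""
    cache = {}  # sha -> list of parent SHAs, filled on first read

    def load(sha):
        if sha not in cache:
            cache[sha] = store[sha]
        return cache[sha]

    order = []
    done = set()
    stack = list(heads)
    while stack:
        sha = stack[-1]
        if sha in done:
            stack.pop()
            continue
        # A commit is looked at again after its parents are finished;
        # the cache turns that second look into a dictionary hit.
        pending = [p for p in load(sha) if p not in done]
        if pending:
            stack.extend(pending)
        else:
            done.add(sha)
            order.append(sha)
            stack.pop()
    return order


# Toy linear history: a <- b <- c (hypothetical SHAs)
store = CountingStore({"a": [], "b": ["a"], "c": ["b"]})
order = walk_with_cache(store, ["c"])
```

Without the cache, b and c would each be read twice in this toy history.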
For a repo with over 50,000 commits, this brings down the computation of
'export' from 1.25 seconds to 0.25 seconds.
To scale this to hundreds of thousands of commits, one solution might be to
maintain the mapping in a DAG data structure mirroring the changelog, over
which findcommonmissing can be used.
Before this patch, in the Git-to-hg conversion, .hgsubstate, once created, was
never deleted, even when no submodules remained. This is broken state, as shown
by the test whose SHA changes. Fix that by looking at the diff instead of only
at which submodules are present.
Since 'gitlinks' now contains *changed* gitlinks, not *all* gitlinks, it no
longer makes sense to gate gitmodules checks on that.
This patch simply demonstrates that the test was broken; an upcoming patch will
introduce more tests.
Bonus: this also makes the import process faster because we no longer need to
walk the entire tree to collect gitlinks.
This will cause the SHAs of repos that have submodules added and then removed
to change.
Currently, to figure out which gitlinks are in a repository we walk through the
entire tree. This patch lets us use get_files_changed to detect which gitlinks
have changed.
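A small sketch of detecting changed gitlinks from a diff rather than a full
tree walk. The tuple shape (path, old_mode, new_mode, new_sha) is an assumption
made for this example and is not necessarily what get_files_changed returns;
0o160000 is the mode Git uses for gitlink tree entries.

```python
GITLINK_MODE = 0o160000  # Git's tree-entry mode for a submodule (gitlink)


def changed_gitlinks(changes):
    """Split changed entries into updated and removed gitlinks.

    `changes` is an iterable of (path, old_mode, new_mode, new_sha) tuples.
    This shape is hypothetical and chosen for illustration only.
    """
    updated = {}
    removed = []
    for path, old_mode, new_mode, new_sha in changes:
        if new_mode == GITLINK_MODE:
            updated[path] = new_sha  # added or bumped submodule
        elif old_mode == GITLINK_MODE:
            removed.append(path)  # was a submodule, now deleted or replaced
    return updated, removed


# Hypothetical diff: one submodule bumped, one removed, one plain file edited
changes = [
    ("vendor/lib", 0o160000, 0o160000, "1111"),
    ("vendor/old", 0o160000, None, None),
    ("README", 0o100644, 0o100644, "2222"),
]
updated, removed = changed_gitlinks(changes)
```

Only entries present in the diff are inspected, so the cost scales with the
size of the change rather than the size of the tree.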
This is an adaptation of the original patch submitted in [1], without the
monkey-patching: a patch has been committed in dulwich [2] which allows clients
to supply a custom urllib2 "opener" for opening the url; here, we provide such
an opener, which provides authentication information obtained from the hg
config.
[1] https://groups.google.com/forum/#!topic/hg-git/9clPr1wdtiw
[2] https://bugs.launchpad.net/dulwich/+bug/909037
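For illustration, an opener carrying basic-auth credentials can be built with
the standard library as sketched below. This uses Python 3's urllib.request
(the successor to urllib2); build_auth_opener and the example URL and
credentials are hypothetical, and how the opener is handed to dulwich is not
shown here.

```python
import urllib.request


def build_auth_opener(url, username, password):
    """Return an opener that answers basic-auth challenges for `url`."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None registers the credentials for any realm at this URL
    password_mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Hypothetical values; in hg-git these would come from the hg config
opener = build_auth_opener("https://example.com/repo.git", "alice", "s3cret")
```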
Consider two octopus merges, one of which is a child of the other. Without this
patch, get_git_parents() called on the second octopus merge checks that each p1
is neither in the middle of an octopus merge nor the end of it. Since the end
of the first octopus merge is a p1 of the second one, this asserts.
Change the sanity check to only make sure that p1 is not in the middle of an
octopus merge.
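The scenario can be sketched on a toy model. The dict layout and the marker
strings "octopus" (intermediate step) and "octopus-done" (final commit) are
assumptions made for this example; git_parents is a simplified stand-in for
get_git_parents(), not its actual code.

```python
def git_parents(commits, name):
    """Collapse a chain of hg merge commits back into Git's parent list."""
    commit = commits[name]
    if commit["marker"] != "octopus-done":
        return list(commit["parents"])
    parents = []
    part = commit
    while part["marker"] in ("octopus", "octopus-done"):
        p1, p2 = part["parents"]
        # Relaxed check: p1 may be the *end* of an earlier octopus merge,
        # but it must never be an intermediate step of one.
        assert commits[p1]["marker"] != "octopus"
        parents.append(p1)
        part = commits[p2]
    parents.append(p2)
    return parents


def plain(*parents):
    return {"marker": None, "parents": parents}


# Two octopus merges; the second has the first as its p1 (hypothetical names)
commits = {name: plain() for name in "abcde"}
commits["m1x"] = {"marker": "octopus", "parents": ("b", "c")}
commits["m1"] = {"marker": "octopus-done", "parents": ("a", "m1x")}
commits["m2x"] = {"marker": "octopus", "parents": ("d", "e")}
commits["m2"] = {"marker": "octopus-done", "parents": ("m1", "m2x")}

second = git_parents(commits, "m2")
first = git_parents(commits, "m1")
```

A check that also rejected "octopus-done" on p1 would fail on m2 here, since
its p1 is m1.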
This was crafted mostly via a bunch of aimless flailing in the
code. I'm pretty well convinced at this point that the incoming
support needs to be rewritten slightly to behave properly in the new
world order (specifically, the overlayrepo class probably should be
subclassing localrepo, or else more directly reimplementing things
instead of trying to forward methods.)
I've been waiting for dulwich upstream to fix this *and* for a test
from domruf that's acceptable. Having gotten neither over a period of
/months/, and having hit the bug myself, I'm moving on and accepting a
patch without tests. This will likely break again, but hopefully
dulwich will be fixed before we break it.