Commit Graph

244 Commits

Author SHA1 Message Date
Siddharth Agarwal
89c409af60 verify: add new command to verify the contents of a Mercurial rev
Since the Git to Mercurial conversion process is incremental, it's at risk of
missing files, or recording files the wrong way, or recording the wrong commit
metadata. Add a command called 'gverify' that can verify the contents of a
particular Mercurial rev against the corresponding Git commit.

Currently, this is limited to checking file names, flags and contents, but this
can be made as robust as desired. Further additions will probably require
refactoring git_handler.py a bit though.

This function is pretty fast: on a Linux machine with a warm cache, verifying a
repository with around 50,000 files takes just 20 seconds. There is scope for
further improvement through parallelization, but conducting tree walks in
parallel is non-trivial with the current worker infrastructure in Mercurial.
2014-02-26 14:19:24 -08:00
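Illustration (not part of the commit): a minimal Python sketch of the kind of check 89c409af60 describes, comparing file names, flags and contents between two views of the same revision. The `compare_files` helper and the `{name: (flags, data)}` layout are assumptions made for this example, not the gverify implementation.

```python
def compare_files(hg_files, git_files):
    """Compare two {name: (flags, data)} mappings and return a list of problems."""
    problems = []
    for name in sorted(set(hg_files) | set(git_files)):
        if name not in git_files:
            problems.append((name, "only in hg"))
        elif name not in hg_files:
            problems.append((name, "only in git"))
        else:
            hg_flags, hg_data = hg_files[name]
            git_flags, git_data = git_files[name]
            if hg_flags != git_flags:
                problems.append((name, "flags differ"))
            if hg_data != git_data:
                problems.append((name, "contents differ"))
    return problems


# Hypothetical sample data: one content mismatch, one flag mismatch,
# and one file missing on each side.
hg_files = {"a.txt": ("", b"one"), "run.sh": ("x", b"echo hi"), "old": ("", b"")}
git_files = {"a.txt": ("", b"two"), "run.sh": ("", b"echo hi"), "new": ("", b"")}
problems = compare_files(hg_files, git_files)
```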
Siddharth Agarwal
56cbe49bb0 git_handler: remove init_if_missing
This function is a no-op and can be removed.
2014-02-25 20:01:42 -08:00
Siddharth Agarwal
6d1bd2e02e git_handler: make self.git a lazily evaluated property
This allows other functions to be able to use the `git` property without
needing to care about initializing it.

An upcoming patch will remove the `init_if_missing` function.
2014-02-25 19:51:02 -08:00
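Illustration (not part of the commit): a minimal sketch of a lazily evaluated property in the spirit of 6d1bd2e02e. The `Handler` class, its `opens` counter and the placeholder dict are hypothetical stand-ins, not the GitHandler code.

```python
class Handler:
    """Hypothetical stand-in for an object that owns an expensive resource."""

    def __init__(self, path):
        self.path = path
        self._git = None  # nothing is opened until first use
        self.opens = 0    # counts how many times the resource is created

    @property
    def git(self):
        # Create the resource on first access and reuse it afterwards,
        # so callers never need an explicit initialization step.
        if self._git is None:
            self.opens += 1
            self._git = {"path": self.path}  # placeholder for a real repo object
        return self._git


handler = Handler("/tmp/repo")
first = handler.git
second = handler.git
```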
Siddharth Agarwal
bbfc3bf8b0 overlayrevlog: handle root commits correctly
Previously, we'd try to access commit.parents[0] and fail. Now, check for
commit.parents being empty and return what Mercurial thinks is a repository
root in that case.
2014-02-25 00:23:12 -08:00
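Illustration (not part of the commit): a minimal sketch of guarding against an empty parent list, as bbfc3bf8b0 describes. The `first_parent` helper and the `NO_PARENT` sentinel are assumptions for this example. The value the actual patch returns for a root is not shown in the message.

```python
NO_PARENT = None  # hypothetical sentinel meaning "this commit is a root"


def first_parent(parents, root=NO_PARENT):
    # A root commit has no parents, so parents[0] would raise IndexError.
    if not parents:
        return root
    return parents[0]
```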
Siddharth Agarwal
a5e956514d overlayrevlog: handle rev = 0 correctly
Previously we'd just test if gitrev was falsy, which it is if the rev returned
is 0, even though it shouldn't be. With this patch, test against None
explicitly.

This unmasks another bug: see next patch for a fix and a test.
2014-02-25 00:20:22 -08:00
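Illustration (not part of the commit): a minimal sketch of the pitfall a5e956514d fixes. Revision 0 is a valid revision but is falsy in Python, so a truthiness test treats it as missing. The `revmap` dict and both helpers are hypothetical.

```python
def is_known_buggy(revmap, node):
    # Truthiness test: wrongly rejects revision 0.
    return bool(revmap.get(node))


def is_known(revmap, node):
    # Explicit None test: only a missing entry counts as unknown.
    return revmap.get(node) is not None


revmap = {"aaa": 0, "bbb": 1}
```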
Siddharth Agarwal
fea85d57c6 git_handler: fix call to self.ui.progress in flush
Since we now directly use progress on self.ui, we shouldn't pass in self.ui as
the first argument. Oops.
2014-02-24 15:29:31 -08:00
Siddharth Agarwal
2cc81f4c1f git_handler: don't compute tags for each tag imported
Previously we'd recompute the repo tags each time we'd consider importing a Git
tag. This is O(n^2) in the number of tags and produced noticeable slowdowns in
repos with large numbers of tags.

To fix this, compute the tags just once. This is correct because the only case
where we'd have issues is if multiple new Git tags with the same name were
introduced, which can't happen because Git tags cannot share names.

For a repository with over 200 tags, this speeds up a no-op hg pull by around
0.5 seconds.
2014-02-24 11:38:00 -08:00
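Illustration (not part of the commit): a minimal sketch of the change 2cc81f4c1f describes, hoisting an expensive computation out of a per-tag loop. The function names and the `compute_repo_tags` callback are assumptions for this example.

```python
def import_tags_slow(git_tags, compute_repo_tags):
    imported = []
    for name in git_tags:
        existing = compute_repo_tags()  # recomputed for every tag considered
        if name not in existing:
            imported.append(name)
    return imported


def import_tags_fast(git_tags, compute_repo_tags):
    existing = compute_repo_tags()  # computed once, before the loop
    return [name for name in git_tags if name not in existing]


calls = {"n": 0}


def compute_repo_tags():
    # Hypothetical stand-in that counts how often the tags are computed.
    calls["n"] += 1
    return {"v1.0", "tip"}
```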
Siddharth Agarwal
27784bf3bc util: drop support for Mercurial < 1.4
2014-02-19 18:49:42 -08:00
Siddharth Agarwal
01fb9068a0 git_handler: replace util.progress with ui.progress
util.progress was a shim for Mercurial < 1.4.
2014-02-19 18:49:28 -08:00
Siddharth Agarwal
01896282f6 overlay: drop support for Mercurial < 1.9
2014-02-19 18:46:56 -08:00
Siddharth Agarwal
a49ac12684 git_handler: remove old and bogus code for deleting entries from tags cache
This code never worked for Mercurial >= 2.0, since it neither had repo._tags
nor repo.tagscache.
2014-02-19 18:45:36 -08:00
Siddharth Agarwal
d7bad71d02 git_handler.save_tags: drop support for Mercurial < 1.9
2014-02-19 16:12:27 -08:00
Siddharth Agarwal
5ef555a629 git_handler.save_map: drop support for Mercurial < 1.9
2014-02-19 16:10:35 -08:00
Siddharth Agarwal
8f2d697a54 hgrepo.tags: drop support for Mercurial < 2.0
A new property called _tagscache was introduced in Mercurial 2.0, so the cache
wasn't actually working.

The contract for tags() also changed at some point -- it stopped returning
nodes that weren't in the repo. This will need to be accounted for if we
start using the tags cache again. However, it isn't very clear whether the
Mercurial tags cache is actually worth doing, since we already have a
separate in-memory cache for Git tags in the handler.
2014-02-19 16:09:23 -08:00
Siddharth Agarwal
41357ce554 hgrepo.push: drop support for Mercurial < 1.6
2014-02-19 15:55:45 -08:00
Siddharth Agarwal
732da34592 gitrepo: drop support for Mercurial < 1.7
2014-02-19 15:54:37 -08:00
Siddharth Agarwal
7e329463b5 getremotechanges: drop support for Mercurial < 1.7
2014-02-19 15:54:04 -08:00
Siddharth Agarwal
a1a2eb9b35 nodetags: drop support for Mercurial < 1.6
2014-02-19 15:53:14 -08:00
Siddharth Agarwal
c11b48a4e7 extsetup: drop support for Mercurial < 1.7
2014-02-19 15:52:14 -08:00
Siddharth Agarwal
6a0d42bac0 version: drop support for Mercurial 1.9.3
Upcoming patches will clean up some code that makes hg-git work with Mercurial
versions < 2.0.
2014-02-19 15:48:27 -08:00
Siddharth Agarwal
6b4e5f67db hg2git: fix subrepo handling to be deterministic
Previously, the correctness of _handle_subrepos was based on the order the
files were processed in. For example, consider the case where a subrepo at
location 'loc' is replaced with a file at 'loc', while another subrepo exists.
This would cause .hgsubstate and .hgsub to be modified and the file added.

If .hgsubstate was seen _before_ 'loc' in the modified/added loop, then
_handle_subrepos would run and remove 'loc' correctly, before 'loc' was added
back later. If, however, .hgsubstate was seen _after_ 'loc', then
_handle_subrepos would run after 'loc' was added and would remove 'loc'.

With this patch, _handle_subrepos merely computes the changes that need to be
applied. The changes are then applied, making sure removed files and subrepos
are processed before added ones.

This was detected by setting a random PYTHONHASHSEED (in this case, 3910358828)
and running the test suite against it. An upcoming patch will randomize the
PYTHONHASHSEED in run-tests.py, as Mercurial does.
2014-02-19 20:52:59 -08:00
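Illustration (not part of the commit): a minimal sketch of the approach 6b4e5f67db describes, where changes are computed first and then applied with removals before additions, so the result does not depend on iteration order. The `apply_changes` helper and the dict-based tree are assumptions for this example.

```python
def apply_changes(tree, removed, added):
    # Removals go first, so a path that is removed as a subrepo and added
    # back as a file ends up present regardless of the order it was seen in.
    for path in sorted(removed):
        tree.pop(path, None)
    for path, value in sorted(added.items()):
        tree[path] = value
    return tree


# Hypothetical tree: the subrepo at 'loc' is replaced by a file at 'loc',
# while another subrepo stays in place.
tree = {"loc": "subrepo", "other": "subrepo"}
result = apply_changes(dict(tree), removed={"loc"}, added={"loc": "file"})
```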
Siddharth Agarwal
689b38dc44 hg2git: move parse_subrepos to top level
durin42 expressed a desire for this function to be at the top level.
2014-02-19 20:18:43 -08:00
Siddharth Agarwal
08f028a3c9 gitnodekw: use githandler from repo
Since a fresh GitHandler is no longer created for every commit, this speeds up
the {gitnode} template massively.

For a repo with over 50,000 commits, the command

hg log -l 10 --template '{gitnode}\n'

speeds up from 2.4 seconds to 0.3.
2014-02-19 15:23:36 -08:00
Siddharth Agarwal
e7c06facc2 revset_gitnode: use githandler from repo
2014-02-19 15:22:54 -08:00
Siddharth Agarwal
1d58a0a197 revset_fromgit: use githandler from repo
2014-02-19 15:22:36 -08:00
Siddharth Agarwal
b1bbd30c48 getremotechanges: use githandler from repo
2014-02-19 15:15:01 -08:00
Siddharth Agarwal
886532ea23 findcommonoutgoing: use githandler from repo
2014-02-19 15:13:43 -08:00
Siddharth Agarwal
068acd034c gclear: use githandler from repo
2014-02-19 15:12:59 -08:00
Siddharth Agarwal
fcd7e472fc gexport: use githandler from repo
2014-02-19 15:12:42 -08:00
Siddharth Agarwal
298b98a518 gimport: use githandler from repo
2014-02-19 15:12:20 -08:00
Siddharth Agarwal
dd8bbcebed gitrepo: drop unused _initializehandler function and handler property
Also drop the GitHandler import. All this now lives on hgrepo.
2014-02-19 15:11:14 -08:00
Siddharth Agarwal
d239f557d1 gitrepo.listkeys: use githandler from localrepo
Previously we'd load the git and hg maps twice on separate git handler objects.
This avoids that.

For a repo with over 50,000 commits, this brings a no-op hg pull down from 2.45
seconds to 2.37.
2014-02-19 15:07:19 -08:00
Siddharth Agarwal
772133c48a hgrepo.tags: use githandler property
Currently we call hgrepo.tags() separately for each tag. (This should be fixed
at some point.) This avoids initializing a separate git handler for each tag.

For a repository with over 150 tags, this speeds up a no-op hg pull by 0.05
seconds.
2014-02-19 14:16:40 -08:00
Siddharth Agarwal
232c6612ae hgrepo._findtags: use githandler property
2014-02-19 14:15:33 -08:00
Siddharth Agarwal
1c6dc044d5 hgrepo.findoutgoing: use githandler property
2014-02-19 14:14:54 -08:00
Siddharth Agarwal
f87f28dc3a hgrepo.push: use githandler property
2014-02-19 14:14:01 -08:00
Siddharth Agarwal
63f40c5059 hgrepo.pull: use githandler property
2014-02-19 14:12:38 -08:00
Siddharth Agarwal
728f8df8de hgrepo: expose git handler as a property
This and upcoming patches have the goal of initializing a GitHandler just once
for a Mercurial repo.
2014-02-19 14:12:03 -08:00
Siddharth Agarwal
7d37b2a516 git_handler: terminate new commit DAG traversal at known commits
Any commit in _map_git is already known, so there's no point walking further
down the DAG.

For a repo with over 50,000 commits, this brings down a no-op hg pull from 38
seconds to 2.5.
2014-02-18 20:30:27 -08:00
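Illustration (not part of the commit): a minimal sketch of a DAG walk that stops at already-known commits, as 7d37b2a516 describes. It assumes that every ancestor of a known commit is also known. The `new_commits` helper and the dict-based parent map are hypothetical.

```python
def new_commits(parents, heads, known):
    """Return commits reachable from heads, without walking past known ones."""
    seen = set()
    todo = list(heads)
    found = []
    while todo:
        sha = todo.pop()
        if sha in known or sha in seen:
            continue  # known commit: do not visit it or its ancestors
        seen.add(sha)
        found.append(sha)
        todo.extend(parents.get(sha, ()))
    return found


# Hypothetical linear history a <- b <- c <- d
parents = {"d": ["c"], "c": ["b"], "b": ["a"], "a": []}
```

With the head itself already known, as in a no-op pull, the walk ends immediately instead of traversing the whole history.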
Siddharth Agarwal
6f79df86d2 git_handler: use convert_list to cache git objects
getnewgitcommits() does a weird traversal where a particular commit SHA is
visited as many times as the number of parents it has, effectively doubling
object reads in the standard case with one parent. This patch makes the
convert_list a cache for objects, so that a particular Git object is read just
once.

On a mostly linear repository with over 50,000 commits, this brings a no-op hg
pull down from 70 seconds to 38, which is close to half the time, as expected.
Note that even a no-op hg pull currently does a full DAG traversal -- an
upcoming patch will fix this.
2014-02-18 20:22:13 -08:00
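Illustration (not part of the commit): a minimal sketch of caching object reads so that each object is fetched once, the idea behind 6f79df86d2. The `CountingStore` class and `cached_reader` helper are assumptions for this example, not the convert_list code.

```python
class CountingStore:
    """Hypothetical object store that counts how many reads it serves."""

    def __init__(self, objects):
        self.objects = objects
        self.reads = 0

    def read(self, sha):
        self.reads += 1
        return self.objects[sha]


def cached_reader(store):
    cache = {}

    def read(sha):
        # Go to the store only the first time a given SHA is requested.
        if sha not in cache:
            cache[sha] = store.read(sha)
        return cache[sha]

    return read


store = CountingStore({"a": "commit a", "b": "commit b"})
read = cached_reader(store)
```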
Siddharth Agarwal
36052aca77 git_handler: note that new commits are returned in topo order
This wasn't obvious to me at first.
2014-02-18 20:13:15 -08:00
Siddharth Agarwal
5e72b26e7b git_handler: fix progress reset call
2014-02-16 01:13:10 -08:00
Siddharth Agarwal
298fec2a4b git_handler: use repo.changelog.node instead of repo.lookup
For a repo with over 50,000 commits, this brings down the computation of
'export' from 1.25 seconds to 0.25 seconds.

To scale this to hundreds of thousands of commits, one solution might be to
maintain the mapping in a DAG data structure mirroring the changelog, over
which findcommonmissing can be used.
2014-02-16 01:11:47 -08:00
Siddharth Agarwal
d7dbce79bd hg2git: call _handle_subrepos when .hgsubstate is removed
Now that _handle_subrepos can handle .hgsubstate being removed, we should use
it for that.

The test changes make sure that the SHAs roundtrip.
2014-02-12 22:55:16 -08:00
Siddharth Agarwal
39d1c15298 hg2git: make _handle_subrepos work in the removed case
A test for this will be included in an upcoming patch.
2014-02-12 21:19:04 -08:00
Siddharth Agarwal
ca74d6d967 hg2git: add 'new' prefix to _handle_subrepos variables
An upcoming patch will introduce similar variables for self._ctx. This helps
disambiguate.
2014-02-12 20:34:09 -08:00
Siddharth Agarwal
3cadf19b94 hg2git: factor out subrepo parsing into a separate function
This code will be used in multiple contexts in an upcoming patch.
2014-02-12 20:28:28 -08:00
Siddharth Agarwal
44c13be822 hg2git: factor out remove path logic into a separate function
This will be used by _handle_subrepos in an upcoming patch.
2014-02-12 19:50:56 -08:00
Siddharth Agarwal
94957f9a66 git_handler: remove collect_gitlinks now that it is unused
2014-02-15 16:21:49 -08:00
Siddharth Agarwal
8d0c4fe9f2 git_handler: fix hgsubstate generation
Before this patch, the Git to Mercurial conversion never deleted .hgsubstate
once it was created, even when no submodules remained. This is a broken state,
as shown by the test for which the SHA changes. Fix that by looking at the diff
instead of just what submodules are present.

Since 'gitlinks' now contains *changed* gitlinks, not *all* gitlinks, it no
longer makes sense to gate gitmodules checks on that.

This patch simply demonstrates that the test was broken; an upcoming patch will
introduce more tests.

Bonus: this also makes the import process faster because we no longer need to
walk the entire tree to collect gitlinks.

This will cause the SHAs of repos that have submodules added and then removed
to change.
2014-02-14 15:44:50 -08:00