Commit Graph

12 Commits

Author SHA1 Message Date
Siddharth Agarwal
6b4e5f67db hg2git: fix subrepo handling to be deterministic
Previously, the correctness of _handle_subrepos was based on the order the
files were processed in. For example, consider the case where a subrepo at
location 'loc' is replaced with a file at 'loc', while another subrepo exists.
This would cause .hgsubstate and .hgsub to be modified and the file added.

If .hgsubstate was seen _before_ 'loc' in the modified/added loop, then
_handle_subrepos would run and remove 'loc' correctly, before 'loc' was added
back later. If, however, .hgsubstate was seen _after_ 'loc', then
_handle_subrepos would run after 'loc' was added and would remove 'loc'.

With this patch, _handle_subrepos merely computes the changes that need to be
applied. The changes are then applied, making sure removed files and subrepos
are processed before added ones.

This was detected by setting a random PYTHONHASHSEED (in this case, 3910358828)
and running the test suite against it. An upcoming patch will randomize the
PYTHONHASHSEED in run-tests.py, just like is done in Mercurial.
2014-02-19 20:52:59 -08:00
Siddharth Agarwal
689b38dc44 hg2git: move parse_subrepos to top level
durin42 expressed a desire for this function to be at the top level.
2014-02-19 20:18:43 -08:00
Siddharth Agarwal
d7dbce79bd hg2git: call _handle_subrepos when .hgsubstate is removed
Now that _handle_subrepos can handle .hgsubstate being removed, we should use
it for that.

The test changes make sure that the SHAs roundtrip.
2014-02-12 22:55:16 -08:00
Siddharth Agarwal
39d1c15298 hg2git: make _handle_subrepos worked in the removed case
A test for this will be included in an upcoming patch.
2014-02-12 21:19:04 -08:00
Siddharth Agarwal
ca74d6d967 hg2git: add 'new' prefix to _handle_subrepos variables
An upcoming patch will introduce similar variables for self._ctx. This helps
disambiguate.
2014-02-12 20:34:09 -08:00
Siddharth Agarwal
3cadf19b94 hg2git: factor out subrepo parsing into a separate function
This code will be used in multiple contexts in an upcoming patch.
2014-02-12 20:28:28 -08:00
Siddharth Agarwal
44c13be822 hg2git: factor out remove path logic into a separate function
This will be used by _handle_subrepos in an upcoming patch.
2014-02-12 19:50:56 -08:00
Siddharth Agarwal
873a402c5e hg2git: call status on newctx, not newctx.rev()
There's no benefit to calling rev().
2014-02-12 18:05:12 -08:00
Siddharth Agarwal
17657a025c hg2git: store ctx instead of rev
Storing a ctx enables values like manifests to be cached on the context.
2014-02-12 17:49:14 -08:00
Siddharth Agarwal
b470bfcf51 hg2git: rename ctx to newctx in update_changeset
An upcoming patch will introduce a new field called _ctx. This helps prevent
confusion.
2014-02-12 17:47:38 -08:00
Gregory Szorc
10dcc5b5c0 Only export modified Git trees
Previously, we emitted every Git tree when updating between Mercurial
changesets. With this patch, we now only emit Git trees that changed. A
side-effect of the implementation is that we now only update in-memory
Git trees objects that changed. Before, we always touched Git trees,
invalidating them in the process and causing Dulwich to recalculate
their SHA-1. Profiling revealed this to be expensive and removing the
extra calculation shows a nice performance win.

Another optimization is to not sort the order that changed paths are
processed in. Previously, we sorted by length, longest to shortest.
Profiling revealed that the sorts took a non-trivial amount of time.
While sorted execution resulted in likely idempotent behavior, it
shouldn't be strictly required.

On the author's machine, conversion of the Mercurial repository itself
decreased from ~493s to ~333s. Even more impressive is conversion of
Firefox's main repository (which is considerably larger). Converting the
first 200 revisions of that repository decreased from ~152s to ~42s.
2013-04-14 11:11:41 -07:00
Gregory Szorc
baa19027ef Export Git objects from incremental Mercurial changes
This replaces the brute force Mercurial to Git export with one that is
incremental. It results in a decent performance win and paves the road
for parallel export via using multiple incremental exporters.
2013-03-19 22:44:01 -07:00