Consider a Mercurial commit with hash 'h1'. Originally, if the only Mercurial
field stored is the branch info (which is stored in the commit message rather
than as an extra field), we'd store the rename source explicitly as a Git extra
field -- let's call the original exported hash 'g1'.
In Git, some operations throw all extra fields away. (One such example is a
rebase.) If such an operation happens, we'll be left with a frankencommit with
the branch info but without the rename source. Let's call this hash 'g2'. For a
setup where Git is the source of truth, let's say that this 'g2' frankencommit
is what gets pushed to the server.
When 'g2' is subsequently imported into Mercurial, we'd look at the fact that
it contains a Mercurial field in the commit message and believe that it was a
legacy commit from the olden days when all info was stored in the commit
message. In that case, in an attempt to preserve the hash, we wouldn't store
any extra rename source info, resulting in 'h1'. Then, when the commit is
re-exported to Git, we'd add the rename source again and produce 'g1' -- and
thus break bidirectionality.
Prevent this situation by not storing the rename source if we're adding branch
info to the commit message. Then for 'h1' we export as 'g2' directly and never
produce 'g1'.
What happens if we not only need to store branch info but also other extra
info, like renames? For 'h1' we'd produce 'g1', then it'd be rewritten on the
Git side to 'g2' throwing away all that extra information. 'g2' being
subsequently imported into Mercurial would produce a new hash, say 'h2'. That's
fine because the commit did get rewritten in Git. We unfortunately wouldn't
perform rename detection thinking that the commit is from Mercurial and had no
renames recorded there, but when the commit is re-exported to Git we'd export
it to 'g2' again. This at least preserves bidirectionality.
Deleting a .gitignore using 'rm' results in 'hg add' or 'hg status' aborting.
For example if the top-level .gitignore is removed:
> abort: No such file or directory: .gitignore
This change avoids that by checking the presence of the .gitignore files.
See comment inline for explanation. Also add tests for this (the bug was masked
with rename detection disabled -- it only appeared with rename detection
enabled).
This adds some test coverage for cases that are not buggy. There is a bug
lurking here, and an upcoming patch will extend these tests to cover those
cases too.
A user recently got confused and managed to track and export a .git
directory, which confuses git and causes it to emit very odd
errors. For example, cloning one such repository (which has a symlink
for .git) produces this output from git:
Cloning into 'git'...
done.
error: Updating '.git' would lose untracked files in it
and another (which has a .git directory checked in) produces this:
Cloning into 'git'...
done.
error: Invalid path '.git/hooks/post-update'
If it ended there, that'd be fine, but this led to a line of
investigation that ended with CVE-2014-9390, so now git will block
checking these revisions out, so we should try to prevent
foot-shooting on our end. Since some servers (notably github) are
blocking trees that contain these entries, default to refusing to
export any path component that looks like it folds to .git. Since some
histories probably contain this already, we offer an escape hatch via
the config option git.blockdotgit that allows users to resume
foot-shooting behavior.
See inline comments for why the additional metadata needs to be stored.
This literally breaks all the hashes because of the additional metadata. The
changing of hashes is unfortunate but necessary to preserve bidirectionality.
While this could be broken up into multiple commits, there was no way to do
that while preserving bidirectionality. Following the principle that every
intermediate commit must result in a correct state, I decided to combine the
commits.
We use Dulwich's rename detector to detect any renames over the specified
similarity threshold.
This isn't fully bidirectional yet -- when the commit is exported to Git
the hashes will no longer be the same. That's why that isn't tested here. In
upcoming patches we'll make sure it's bidirectional and will add the
corresponding tests.
Git 1.8.0 was released in late 2012. Given that we only support versions of
Mercurial released in 2014, it doesn't make sense to support versions of Git
this old.
This is quite similar to syntax Git supports. In the future maybe core
Mercurial could be extended to support this, but I think this independently
makes sense in hg-git.
Git has a well-hidden notion of extra fields. They aren't accessible from the
CLI, but Dulwich can create them. They're a better place to store Mercurial's
extra metadata than the commit message.
An upcoming patch will switch hg-git to saving the metadata in the Git extra
fields. This commit:
(a) prepares for that
(b) extracts other Git extra fields so that commits from Git with such fields
roundtrip correctly
Previously, we'd iterate over the extra elements in arbitrary order. We now
sort the elements and store them in deterministic order.
Without sorting, the included test fails half the time.
Mercurial commit a6d634896102 added support for hg cat across subrepositories.
That caused these tests to break. Fix this by using hg manifest instead.
Previously, whenever a tree that wasn't the root ('') was stored, we'd prepend
a '/' to it. Then, when we'd try retrieving the entry, we'd do so without the
leading '/'. This caused data loss because existing tree entries were dropped
on the floor. Fix that by only adding '/' if we're adding to a non-empty
initial path.
This wasn't detected in tests because most of them deal only with files in the
root and not ones in subdirectories.
The usage of getattr was unsafe. Use hgutil.safehasattr instead.
util.safehasattr has been around since Mercurial 2.0.
This also fixes the formerly disabled test in test-pull.t.
Previously we'd attempt to import every single reachable commit in the Git
object store.
The test adds another branch to the Git repo and doesn't import it until much
later. Previously we'd import it when we ran `hg -R hgrepo pull -r beta`. Now
we won't.
The return value as implemented in git_handler.fetch was pretty bogus. It used
to return the number of values that changed in the 'refs/heads/' namespace,
regardless of whether multiple values in there point to the same Mercurial
commit, or whether particular heads were even imported. Fix all of that by
using the actual heads in the changelog, just like vanilla Mercurial.
The test output changes demonstrate examples where the code was buggy.
Since Mercurial is commit-oriented, the 'no changes found' message really
should rely on what new commits are in the repo, not on new heads. This also
makes an upcoming patch much simpler.
Since everything around this code is completely broken anyway, writing a test
for this that doesn't trigger other bugs is close to impossible. An upcoming
patch will include tests.
The test output change is for an empty clone -- the output is precisely how
vanilla Mercurial treats an empty clone.
Since the Git to Mercurial conversion process is incremental, it's at risk of
missing files, or recording files the wrong way, or recording the wrong commit
metadata. Add a command called 'gverify' that can verify the contents of a
particular Mercurial rev against the corresponding Git commit.
Currently, this is limited to checking file names, flags and contents, but this
can be made as robust as desired. Further additions will probably require
refactoring git_handler.py a bit though.
This function is pretty fast: on a Linux machine with a warm cache, verifying a
repository with around 50,000 files takes just 20 seconds. There is scope for
further improvement through parallelization, but conducting tree walks in
parallel is non-trivial with the current worker infrastructure in Mercurial.
Previously, we'd try to access commit.parents[0] and fail. Now, check for
commit.parents being empty and return what Mercurial thinks is a repository
root in that case.
This is the version in Mercurial rev 8f2c4360fa44, plus a patch to make
--with-hg work for system hg (sent upstream). Importantly, this gets us the
hash seed randomization we need for bugs like the one fixed by the parent
commit to be detected.
Before this patch, in the git to hg conversion, .hgsubstate once created is
never deleted, even if no submodules are any longer present. This is broken
state, as shown by the test for which the SHA changes. Fix that by looking at
the diff instead of just what submodules are present.
Since 'gitlinks' now contains *changed* gitlinks, not *all* gitlinks, it no
longer makes sense to gate gitmodules checks on that.
This patch simply demonstrates that the test was broken; an upcoming patch will
introduce more tests.
Bonus: this also makes the import process faster because we no longer need to
walk the entire tree to collect gitlinks.
This will cause the SHAs of repos that have submodules added and then removed
to change.
Consider two octopus merges, one of which is a child of the other. Without this
patch, get_git_parents() called on the second octopus merge checks that each p1
is neither in the middle of an octopus merge nor the end of it. Since the end
of the first octopus merge is a p1 of the second one, this asserts.
Change the sanity check to only make sure that p1 is not in the middle of an
octopus merge.