When converting a Git repository to Mercurial at Mozilla, I encountered
a scenario where I didn't want `hg convert` to automatically add the
"committer: <committer>" line to commit messages. While I can hack around
this by rewriting the Git commit before it is fed into `hg convert`,
I figured it would be a useful knob to control.
This patch introduces a config option that allows lots of control
over the committer value. I initially implemented this as a single
boolean flag to control whether to save the committer message. But
then there was feedback that it would be useful to save the committer
in extra data. While this patch doesn't implement support for saving
in extra data, it does add a mechanism for extending which actions
to take on the committer field. We should be able to easily add
actions to save in extra data.
Some of the implemented features weren't asked for. But I figured they
could be useful. If nothing else they demonstrate the extensibility
of this mechanism.
The extra glob in test-command-template.t caused it to say no result was
reported. It used to be (within the past year), that both this and the missing
glob cases could be fixed simply by editing any output in the test, and
re-running it in interactive mode. But that no longer works, and I had to diff
*.t against *.t.err. I didn't dig into what changed.
common.commit.__init__ sets saverev=True by default. The side effect
of this is that the hg sink will always set the "convert_revision"
extras key to the commit being converted.
This patch adds a config option to disable this behavior.
While most consumers will want "convert_revision" to be a) written
b) with the exact Git commit that was converted, some have use cases
that prefer otherwise. In my case, I am performing significant
rewrites of a Git repository *before* it is fed into `hg convert`.
I have to do this because `hg convert` does not easily support the kind
of transform I desire, even with extensions. (For the curious, I am
"linearizing" the history of a GitHub repo by removing merge commits
which add little value to the final history. It isn't easy to do this
during `hg convert` because of Mercurial's file copy/rename metadata
requirements.)
In my scenario, my pre-convert transform stores a "convert_revision"
key in the Git commit object containing the original Git commit ID.
I want this original Git commit ID carried forward to Mercurial. By
disabling the setting of this extra during `hg convert` and copying
the value from the Git commit object, I can have the final
"convert_revision" extra key contain the original Git commit ID. An
added test verifies this exact scenario.
This feature could likely be implemented for other VCS sources. But
until someone needs the feature, I'm inclined to hold off implementing.
Git commit objects support storing arbitrary key-value metadata. While
there is no user-facing mechanism in Git to record these values, some
tools do record data here.
Currently, `hg convert` only handles the "author," "committer," and
"parent" keys in Git commit objects. All other keys are ignored. This
means that any custom keys are lost when converting Git repos to
Mercurial.
This patch implements support for copying a whitelist of extra keys
from Git commit objects to the "extras" dict of the destination. As
the added tests demonstate, this allows extra metadata to be preserved
during the conversion process.
This patch stops short of converting all metadata to "extras." We could
potentially implement this via `convert.git.extrakeys=*` or similar.
But copying everything by default is a bit dangerous because if Git
adds new keys to commit objects, we could find ourselves copying
things that shouldn't be copied!
This patch also assumes the source key is the same as the destination
key. We could implement support for prefixing the output key to
distinguish it as coming from Git. But until this feature is needed,
I'm inclined to hold off implementing it.
For reasons I can't explain, Git's copy detection code was identifying
different source files on OS X and (presumably) Solaris versus Linux
(which the test was originally authored against). This was causing
unstable test output.
Changing the test to use a non-ambiguous source file appears to make
the test stable.
The test was introduced recently in 3556dac2da27.
By default, Git applies rename and copy detection to 400 files. The
diff.renamelimit config option and -l argument to diff commands can
override this.
As part of converting some repositories in the wild, I was hitting
the default limit. Unfortunately, the warnings that Git prints in this
scenario are swallowed because the process running functionality in
common.py redirects stderr to /dev/null by default. This seems like
a bug, but a bug for another day.
This commit establishes a config option to send the rename limit
through to `git diff-tree`. The added tests demonstrate a too-low
rename limit doesn't result in copy metadata being recorded.
Fixes CVE-2016-3105 (1/1).
Previously, it was possible for the repository path passed to git-ls-remote
to be misinterpreted as a URL.
Always passing an absolute path to git is a simple way to avoid this.
CVE-2016-3069 (5/5)
Before recent refactoring we were not escaping calls to git at all
which made such injections possible. Let's have a test for that to
avoid this problem in the future. Reported by Blake Burkhart.
On my machine, whenever I run all test with a high -j value, test-convert-git.t
would consistently fail by displaying an estimate. This patch removes that value
from the output.
I do see this on Linux with 1.7.7.6, but not on Windows with 1.9.5.msysgit.0.
Adding a (?) didn't work to conditionally ignore the line; I'm not sure if the
(glob) interferes with that.
When running the tests with 1.7.7.6, I get 'files' and 'insertions' instead of
the singular forms, and there is also an additional '0 deletions(-)' at the end.
Since this doesn't seem important to the test, silence it.
This adds an option to not pull in gitsubmodules during a convert. This is
useful when converting large git repositories where gitsubmodules were allowed
historically, but are no longer wanted.
git version 2.4.3:
--- /home/augie/hg/tests/test-convert-git.t
+++ /home/augie/hg/tests/test-convert-git.t.err
@@ -659,7 +659,7 @@
$ touch a && git add a && git commit -am "commit a"
[master (root-commit) 8ae5f69] commit a
Author: nottest <test@example.org>
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 a
$ cd ..
$ git clone git-repo7 git-repo7-client
git version 1.7.9.5:
--- /home/augie/hg/tests/test-convert-git.t
+++ /home/augie/hg/tests/test-convert-git.t.err
@@ -659,7 +659,7 @@
$ touch a && git add a && git commit -am "commit a"
[master (root-commit) 8ae5f69] commit a
Author: nottest <test@example.org>
- 1 file changed, 0 insertions(+), 0 deletions(-)
+ 0 files changed
create mode 100644 a
$ cd ..
$ git clone git-repo7 git-repo7-client
I don't know when this changed in git and am too lazy to try and
bisect it, so just work around the change.
There was a bug in the git convert code where if you copied a file and modified
the copy source in the same commit, and if the copy dest was alphabetically
earlier than the copy source, the converted version would use the copy dest
contents for both the source and the target.
The root of the bug is that the git diff-tree output is formatted like so:
:<mode> <mode> <oldhash> <newhash> <state> <src> <dest>
:100644 100644 c1ab79a15... 3dfc779ab... C069 oldname newname
:100644 100644 c1ab79a15... 03e2188a6... M oldname
The old code would always take the 'oldname' field as the name of the file being
processed, then it would try to do an extra convert for the newname. This works
for renames because it does a delete for the oldname and a create for the
newname.
For copies though, it ends up associating the copied content (3dfc779ab above)
with the oldname. It only happened when the dest was alphabetically before
because that meant the copy got processed before the modification.
The fix is the treat copy lines as affecting only the newname, and not marking
the oldname as processed.
The conversion from git to hg was reading the remote branch list directly from
the origin server. If the origin's branch had moved forward since the last git
fetch, it would return a git hash which didn't exist locally, and therefore the
branch was not converted.
This changes it to rely on the local repo's refs/remotes list of branches
instead, so it's completely cut off from the server.
Previously all git remotes were created as "remote/foo". This patch adds a
configuration option for deciding what the prefix should be. This is useful if
you want the bookmarks to be "origin/foo" like they are in git, or if you're
integrating with the remotenames extension and don't want the local remote/foo
bookmarks to overlap with the remote foo bookmarks.
The absolute URL was causing this error with 1.9.5 on Windows, which had a
cascading effect:
@@ -466,22 +466,24 @@
> url = $TESTTMP/git-repo5
> EOF
$ git commit -a -m "weird white space submodule"
- [master *] weird white space submodule (glob)
- Author: nottest <test@example.org>
- 1 file changed, 3 insertions(+)
+ fatal: bad config file line 6 in $TESTTMP/git-repo6/.gitmodules
+ [128]
$ cd ..
$ hg convert git-repo6 hg-repo6
initializing destination hg-repo6 repository
scanning source...
For reasons unknown, there is still this delta on Windows:
@@ -490,7 +490,6 @@
$ git commit -q -m "missing .gitmodules"
$ cd ..
$ hg convert git-repo6 hg-repo6 --traceback
- fatal: Path '.gitmodules' does not exist in '*' (glob)
initializing destination hg-repo6 repository
scanning source...
sorting...
The output has apparently changed slightly since this version. Since they are
just commits without any obvious importance to the test, and I can't figure out
how to glob them away, silence them.
Sample diffs were like this:
@@ -468,7 +468,7 @@
$ git commit -a -m "weird white space submodule"
[master *] weird white space submodule (glob)
Author: nottest <test@example.org>
- 1 file changed, 3 insertions(+)
+ 1 files changed, 3 insertions(+), 0 deletions(-)
$ cd ..
$ hg convert git-repo6 hg-repo6
initializing destination hg-repo6 repository
Previously convert would throw an exception if it encountered a git commit with
a .gitmodules file that was malformed (i.e. was missing, but had submodule
files, or was malformed).
Instead of breaking the convert entirely, let's print error messages and move
on.
For merges, we walk the files N-1 times, where N is the number of
parents. This means that for an octopus merge with 3 parents and 2
changed files, we actually fetch 6 files. This corrects the progress
output of the convert command when such commits are encountered.
Since both 'bar' and 'bar-copied' matched 'bar-copied2', Git's copy detection
would sometimes result in 'bar' being the source for 'bar-copied2' and
sometimes 'bar-copied'. Change bar in a separate commit so that that would no
longer be the case.
Git is fairly unique among VCSes in that it doesn't record copies and renames,
instead choosing to detect them on the fly. Since Mercurial expects copies and
renames to be recorded, it can be valuable to preserve this history while
converting a Git repository to Mercurial. This patch adds a new convert option,
called 'convert.git.similarity', which determines how similar files must be to
be treated as renames or copies.
Before this patch, all operations applied on ".gitmodules" at git
source revisions are treated as modification, even if they are
actually removal of it.
If removal of ".gitmodules" is treated as modification unexpectedly,
"hg convert" is aborted by the exception raised in
"retrievegitmodules()" for ".gitmodules" at the git source revision
removing it, because that revision doesn't have any information of
".gitmodules".
This patch detects removal of ".gitmodules" at git source revisions
correctly.
If ".gitmodules" is removed at the git source revision, this patch
records "hex(nullid)" as the contents hash value for ".hgsub" and
".hgsubstate" at the destination revision.
This patch makes "getfile()" raise IOError also for ".hgstatus" and
".hgsubstate" if the contents hash value is "hex(nullid)", and this
tells removal of ".hgstatus" and ".hgsubstate" at the destination
revision to "localrepository.commitctx()" correctly.
For files other than ".hgstatus" and ".hgsubstate", checking the
contents hash value in "getfile()" may be redundant, because
"catfile()" for them also does so.
But this patch chooses writing it only once at the beginning of
"getfile()", to avoid writing same code twice both for ".hgsub" and
".hgsubstate" separately.
Implemented similar error handling that is done for hg in an earlier revision.
These are:
a. add checking for splicemap file format
b. add checking for each revision string formats
This error would show up only intermittently since the
test depended on the order of the directories returned by os.walk.
The damage repository test would delete the first object file it came
across. However, the order of the directory listing is arbitrary (it
seems to depend on the filesystem). This meant that sometimes a commit
object was deleted, sometimes a blob object and sometimes a tree
object.
So, fix by hardcoding which object to delete. Delete a commit object,
a blob object and a tree object in three separate tests.
Previously, convert aborted upon encountering a git submodule. This patch
changes it so that it now succeeds. It modifies convert_git to manually generate
'.hgsub' and '.hgsubstate' files for each git revision, so as to convert git sub
modules to non-mercurial subrepositories.
Git object files are stored read-only in the filesystem. Trying to remove a
read-only file on windows will fail with access denied, so we have to make them
writeable before they can be removed.
Git might have autocrlf=true as a global or default setting, especially on
windows. That is not expected in the tests and can cause
+ warning: LF will be replaced by CRLF in d/b.
+ The file will have its original line endings in your working directory.
Explicitly setting it false will make the test pass in some setups - but still
not out of the box.