Commit Graph

105 Commits

Author SHA1 Message Date
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Gregory Szorc
19ccacb90b convert: remove "replacecommitter" action
As pointed out by Yuya, this action doesn't add much (any?) value.
2017-01-14 10:11:19 -08:00
Gregory Szorc
2fc8eb0c18 convert: config option to control Git committer actions
When converting a Git repository to Mercurial at Mozilla, I encountered
a scenario where I didn't want `hg convert` to automatically add the
"committer: <committer>" line to commit messages. While I can hack around
this by rewriting the Git commit before it is fed into `hg convert`,
I figured it would be a useful knob to control.

This patch introduces a config option that allows lots of control
over the committer value. I initially implemented this as a single
boolean flag to control whether to save the committer message. But
then there was feedback that it would be useful to save the committer
in extra data. While this patch doesn't implement support for saving
in extra data, it does add a mechanism for extending which actions
to take on the committer field. We should be able to easily add
actions to save in extra data.

Some of the implemented features weren't asked for. But I figured they
could be useful. If nothing else they demonstrate the extensibility
of this mechanism.
2017-01-13 23:21:10 -08:00
Gregory Szorc
57ef32586b convert: add config option to control storing original revision
common.commit.__init__ sets saverev=True by default. The side effect
of this is that the hg sink will always set the "convert_revision"
extras key to the commit being converted.

This patch adds a config option to disable this behavior.

While most consumers will want "convert_revision" to be a) written
b) with the exact Git commit that was converted, some have use cases
that prefer otherwise. In my case, I am performing significant
rewrites of a Git repository *before* it is fed into `hg convert`.
I have to do this because `hg convert` does not easily support the kind
of transform I desire, even with extensions. (For the curious, I am
"linearizing" the history of a GitHub repo by removing merge commits
which add little value to the final history. It isn't easy to do this
during `hg convert` because of Mercurial's file copy/rename metadata
requirements.)

In my scenario, my pre-convert transform stores a "convert_revision"
key in the Git commit object containing the original Git commit ID.
I want this original Git commit ID carried forward to Mercurial. By
disabling the setting of this extra during `hg convert` and copying
the value from the Git commit object, I can have the final
"convert_revision" extra key contain the original Git commit ID. An
added test verifies this exact scenario.

This feature could likely be implemented for other VCS sources. But
until someone needs the feature, I'm inclined to hold off implementing.
2016-12-22 23:28:35 -07:00
Gregory Szorc
6c0707a02c convert: add config option to copy extra keys from Git commits
Git commit objects support storing arbitrary key-value metadata. While
there is no user-facing mechanism in Git to record these values, some
tools do record data here.

Currently, `hg convert` only handles the "author," "committer," and
"parent" keys in Git commit objects. All other keys are ignored. This
means that any custom keys are lost when converting Git repos to
Mercurial.

This patch implements support for copying a whitelist of extra keys
from Git commit objects to the "extras" dict of the destination. As
the added tests demonstate, this allows extra metadata to be preserved
during the conversion process.

This patch stops short of converting all metadata to "extras." We could
potentially implement this via `convert.git.extrakeys=*` or similar.
But copying everything by default is a bit dangerous because if Git
adds new keys to commit objects, we could find ourselves copying
things that shouldn't be copied!

This patch also assumes the source key is the same as the destination
key. We could implement support for prefixing the output key to
distinguish it as coming from Git. But until this feature is needed,
I'm inclined to hold off implementing it.
2016-12-22 23:28:11 -07:00
Gregory Szorc
8f73471951 convert: config option for git rename limit
By default, Git applies rename and copy detection to 400 files. The
diff.renamelimit config option and -l argument to diff commands can
override this.

As part of converting some repositories in the wild, I was hitting
the default limit. Unfortunately, the warnings that Git prints in this
scenario are swallowed because the process running functionality in
common.py redirects stderr to /dev/null by default. This seems like
a bug, but a bug for another day.

This commit establishes a config option to send the rename limit
through to `git diff-tree`. The added tests demonstrate a too-low
rename limit doesn't result in copy metadata being recorded.
2016-12-18 12:53:20 -08:00
Yuya Nishihara
a5c934df3c py3: move up symbol imports to enforce import-checker rules
Since (b) is banned, we should do the same for (a) for consistency.

 a) from mercurial import hg
    from mercurial.i18n import _

 b) from . import hg
    from .i18n import _
2016-05-14 14:03:12 +09:00
Blake Burkhart
a5bebc504a convert: pass absolute paths to git (SEC)
Fixes CVE-2016-3105 (1/1).

Previously, it was possible for the repository path passed to git-ls-remote
to be misinterpreted as a URL.

Always passing an absolute path to git is a simple way to avoid this.
2016-04-06 22:57:46 -05:00
Julien Cristau
3e185dc2e6 convert: kill dead code
gitread is unused with the new commandline-based code.
2016-04-04 15:39:13 +02:00
Julien Cristau
dd250f1d5d convert: don't ignore errors from git diff-tree 2016-04-04 15:38:48 +02:00
Martin von Zweigbergk
a993a6ecbd convert: delete unused imports in git.py
As reported by pyflakes
2016-03-29 10:49:33 -07:00
Matt Mackall
4a96519522 merge with stable 2016-03-29 12:29:00 -05:00
Mateusz Kwapich
843a151296 convert: rewrite gitpipe to use common.commandline (SEC)
CVE-2016-3069 (4/5)
2016-03-22 17:05:11 -07:00
Mateusz Kwapich
e02db5f882 convert: dead code removal - old git calling functions (SEC)
CVE-2016-3069 (3/5)
2016-03-22 17:05:11 -07:00
Mateusz Kwapich
98120175fb convert: rewrite calls to Git to use the new shelling mechanism (SEC)
CVE-2016-3069 (2/5)

One test output changed because we were ignoring git return code in numcommits
before.
2016-03-22 17:05:11 -07:00
Mateusz Kwapich
8103d3a484 convert: add new, non-clowny interface for shelling out to git (SEC)
CVE-2016-3069 (1/5)

To avoid shell injection and for the sake of simplicity let's use the
common.commandline for calling git.
2016-03-22 17:05:11 -07:00
timeless
65e468cce9 convert: git use absolute_import 2016-03-02 20:42:13 +00:00
timeless@mozdev.org
1b8a3b4ea1 grammar: use does instead of do where appropriate 2015-10-14 02:06:54 -04:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Durham Goode
d57c6ca233 convert: add convert.git.skipsubmodules option
This adds an option to not pull in gitsubmodules during a convert. This is
useful when converting large git repositories where gitsubmodules were allowed
historically, but are no longer wanted.
2015-08-24 22:16:01 -07:00
Matt Mackall
6c04738a65 merge with stable 2015-08-10 15:30:28 -05:00
Durham Goode
8f62a68933 convert: fix git copy file content conversions
There was a bug in the git convert code where if you copied a file and modified
the copy source in the same commit, and if the copy dest was alphabetically
earlier than the copy source, the converted version would use the copy dest
contents for both the source and the target.

The root of the bug is that the git diff-tree output is formatted like so:

:<mode> <mode> <oldhash>    <newhash>    <state> <src>   <dest>
:100644 100644 c1ab79a15... 3dfc779ab... C069    oldname newname
:100644 100644 c1ab79a15... 03e2188a6... M       oldname

The old code would always take the 'oldname' field as the name of the file being
processed, then it would try to do an extra convert for the newname. This works
for renames because it does a delete for the oldname and a create for the
newname.

For copies though, it ends up associating the copied content (3dfc779ab above)
with the oldname. It only happened when the dest was alphabetically before
because that meant the copy got processed before the modification.

The fix is the treat copy lines as affecting only the newname, and not marking
the oldname as processed.
2015-08-06 17:21:46 -07:00
Durham Goode
4126c0d435 convert: fix git convert using servers branches
The conversion from git to hg was reading the remote branch list directly from
the origin server. If the origin's branch had moved forward since the last git
fetch, it would return a git hash which didn't exist locally, and therefore the
branch was not converted.

This changes it to rely on the local repo's refs/remotes list of branches
instead, so it's completely cut off from the server.
2015-07-29 13:21:03 -07:00
Durham Goode
ecce172bba convert: allow customizing git remote prefix
Previously all git remotes were created as "remote/foo". This patch adds a
configuration option for deciding what the prefix should be. This is useful if
you want the bookmarks to be "origin/foo" like they are in git, or if you're
integrating with the remotenames extension and don't want the local remote/foo
bookmarks to overlap with the remote foo bookmarks.
2015-07-13 21:37:46 -07:00
Durham Goode
54fa7659ef convert: support multiple specifed revs in git source
This allows specifying multiple revs/branches to convert from a git repo.
2015-07-08 10:29:11 -07:00
Durham Goode
0f795691b8 convert: add support for specifying multiple revs
Previously convert could only take one '--rev'. This change allows the user to
specify multiple --rev entries. For instance, this could allow converting
multiple branches (but not all branches) at once from git.

In this first patch, we disable support for this for all sources.  Future
patches will enable it for select sources (like git).
2015-07-08 10:27:43 -07:00
Durham Goode
bc1c03c315 convert: improve support for unusual .gitmodules
Previously convert would throw an exception if it encountered a git commit with
a .gitmodules file that was malformed (i.e. was missing, but had submodule
files, or was malformed).

Instead of breaking the convert entirely, let's print error messages and move
on.
2015-06-29 17:19:58 -07:00
Durham Goode
8785908f11 convert: handle .gitmodules with non-tab whitespaces
The old implementation assumed .gitmodules file lines always began
with tabs. It can be any whitespace, so lets trim the lines
appropriately.
2015-06-29 17:19:18 -07:00
Mads Kiilerich
c8659cbb76 convert: optimize convert of files that are unmodified from p2 in merges
Conversion of a merge starts with p1 and re-adds the files that were changed in
the merge or came unmodified from p2. Files that are unmodified from p1 will
thus not be touched and take no time. Files that are unmodified from p2 would be
retrieved and rehashed. They would end up getting the same hash as in p2 and end
up reusing the filelog entry and look like the p1 case ... but it was slow.

Instead, make getchanges also return 'files that are unmodified from p2' so the
sink can reuse the existing p2 entry instead of calling getfile.

Reuse of filelog entries can make a big difference when files are big and with
long revlong chains so they take time to retrieve and hash, or when using an
expensive custom getfile function (think
http://mercurial.selenic.com/wiki/ConvertExtension#Customization with a code
reformatter).

This in combination with changes to reuse filectx entries in
localrepo._filecommit make 'unchanged from p2' almost as fast as 'unchanged
from p1'.

This is so far only implemented for the combination of hg source and hg sink.

This is a refactoring/optimization. It is covered by existing tests and show no
changes - which is a good thing.
2015-03-19 17:40:19 +01:00
Thomas Arendsen Hein
64014abd9c convert: use git diff-tree -Cn% instead of --find-copies=n% for older git
The option --find-copies was added in a later git version than the one included
in Debian squeeze-lts (1.7.2.5), probably around 1.7.4.
2014-11-06 09:36:39 +01:00
Siddharth Agarwal
f149aa54b6 convert: change default for git rename detection to 50%
This default mirrors the default for 'git diff'. Other commands have slightly
different defaults -- for example, the move/copy detection for 'git blame'
assumes that a hunk is moved if more than 40 alphanumeric characters are the
same, or copied if more than 20 alphanumeric characters are the same. 50% seems
to be the most common default, though.
2014-09-23 14:45:23 -07:00
Siddharth Agarwal
f28296d4ee convert: simplify git.similarity parsing 2014-09-23 14:40:32 -07:00
Siddharth Agarwal
c1d6e28494 convert: add support to find git copies from all files in the working copy
I couldn't think of a better name for this option, so I stole the Git one in
the hope that anyone converting a Git repo knows what it means.
2014-09-12 12:28:30 -07:00
Siddharth Agarwal
476da59ea2 convert: add support to detect git renames and copies
Git is fairly unique among VCSes in that it doesn't record copies and renames,
instead choosing to detect them on the fly. Since Mercurial expects copies and
renames to be recorded, it can be valuable to preserve this history while
converting a Git repository to Mercurial. This patch adds a new convert option,
called 'convert.git.similarity', which determines how similar files must be to
be treated as renames or copies.
2014-09-12 11:23:26 -07:00
Siddharth Agarwal
cff27af0c6 convert: for git, factor out code to add entries to a separate function
We're going to call this for multiple files in one iteration in upcoming
patches.
2014-09-11 23:57:49 -07:00
Siddharth Agarwal
02fa27546c convert: for git's getchanges, always split entry line into components
We always need to know whether the entry is a rename or copy, so split it up
unconditionally.
2014-09-11 23:37:47 -07:00
Siddharth Agarwal
1ab3d98437 convert: for git's getchanges, use explicit index for iteration
Upcoming patches will add support for copies and renames, for which we'll need
to access multiple lines of the difftree output at once.
2014-09-11 23:35:19 -07:00
Augie Fackler
8997a30abc convert: enable deterministic conversion progress bar for git 2014-09-10 10:51:46 -04:00
Mads Kiilerich
0df22182cc convert: introduce --full for converting all files
Convert will normally only process files that were changed in a source
revision, apply the filemap, and record it has a change in the target
repository. (If it ends up not really changing anything, nothing changes.)

That means that _if_ the filemap is changed before continuing an incremental
convert, the change will only kick in when the files it affects are modified in
a source revision and thus processed.

With --full, convert will make a full conversion every time and process
all files in the source repo and remove target repo files that shouldn't be
there. Filemap changes will thus kick in on the first converted revision, no
matter what is changed.

This flag should in most cases not make any difference but will make convert
significantly slower.

Other names has been considered for this feature, such as "resync", "sync",
"checkunmodified", "all" or "allfiles", but I found that they were less obvious
and required more explanation than "full" and were harder to describe
consistently.
2014-08-26 22:03:32 +02:00
Mads Kiilerich
4dd236da3f convert: use None value for missing files instead of overloading IOError
The internal API used IOError to indicate that a file should be marked as
removed.

There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.

Instead, use the value None to indicate that the file not is present.

Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
2014-08-26 22:03:32 +02:00
FUJIWARA Katsunori
413d1be801 convert: fix argument mismatch at formatting the abort message
This patch fixes argument mismatch at formatting the abort message,
introduced by dcf325a55d85: the last '%s' doesn't have corresponded
argument.

This patch uses "unexpected size" in the abort message, to distinguish
the reason of failure from "unexpected type" failure checked in the
prior code path below:

        if info[1] != type:
            raise util.Abort(_('cannot read %r object at %s') % (type, rev))
2014-08-01 02:14:24 +09:00
Matt Mackall
5fc8526562 merge with stable 2014-07-14 18:53:03 -05:00
FUJIWARA Katsunori
8ae5e732ec convert: detect removal of ".gitmodules" at git source revisions correctly
Before this patch, all operations applied on ".gitmodules" at git
source revisions are treated as modification, even if they are
actually removal of it.

If removal of ".gitmodules" is treated as modification unexpectedly,
"hg convert" is aborted by the exception raised in
"retrievegitmodules()" for ".gitmodules" at the git source revision
removing it, because that revision doesn't have any information of
".gitmodules".

This patch detects removal of ".gitmodules" at git source revisions
correctly.

If ".gitmodules" is removed at the git source revision, this patch
records "hex(nullid)" as the contents hash value for ".hgsub" and
".hgsubstate" at the destination revision.

This patch makes "getfile()" raise IOError also for ".hgstatus" and
".hgsubstate" if the contents hash value is "hex(nullid)", and this
tells removal of ".hgstatus" and ".hgsubstate" at the destination
revision to "localrepository.commitctx()" correctly.

For files other than ".hgstatus" and ".hgsubstate", checking the
contents hash value in "getfile()" may be redundant, because
"catfile()" for them also does so.

But this patch chooses writing it only once at the beginning of
"getfile()", to avoid writing same code twice both for ".hgsub" and
".hgsubstate" separately.
2014-07-14 23:33:59 +09:00
David Schleimer
ab695e13b2 convert: drastically speed up git conversions
We would formerly exec git cat-file once for every commit, plus once for
every tree and file we wnated to read.  This switches to using git
cat-file's batch mode, which is much, much, much faster.

Using this new code, converting the git git repo to hg ran in 106
minutes on my machine.  Using the stock mercurial, it required 1239
minutes.  I believe this to be typical of the speedups we will see
form this patch.
2014-05-27 21:12:24 -07:00
Sean Farley
4b88fb7766 convert: add mapname parameter to checkrevformat
Upcoming patches will add new map files so we change the calling sequence of
checkrevformat so that error messages will let the user know which file has the
wrong rev format.
2014-01-21 11:34:55 -06:00
Ben Goswami
d334de37cd splicemap: improve error handling when source is git (issue2084)
Implemented similar error handling that is done for hg in an earlier revision.
These are:
   a. add checking for splicemap file format
   b. add checking for each revision string formats
2013-04-25 16:02:58 -07:00
Augie Fackler
5c8acaf5b4 git convert: some versions of git use fatal: instead of error:
I saw this behavior with git 1.7.12 on my Mac.
2013-02-08 07:09:48 -06:00
Ross Lagerwall
cb8603510f convert/git: catch errors from modern git-ls-remote (issue3428)
Since git v1.7.8.2-327-g926f1dd (the change was first released in git
1.7.10), git does not return non-zero when "git ls-remote --tags ..."
is run and the repository is damaged. This causes the "damaged
repository with missing commit" test in test-convert-git.t to
unexpectedly succeed.

Fix by aborting if git outputs any lines beginning with "error:",
which required adding some subprocess use in convert/git.py.
2013-02-08 08:02:57 -06:00
Bryan O'Sullivan
8f2116535a convert: fix a too-long line nag 2012-11-13 13:09:42 -08:00
YaNan Xu
69a8b90589 convert: add support for converting git submodule (issue3528)
Previously, convert aborted upon encountering a git submodule. This patch
changes it so that it now succeeds. It modifies convert_git to manually generate
'.hgsub' and '.hgsubstate' files for each git revision, so as to convert git sub
modules to non-mercurial subrepositories.
2012-10-29 17:40:13 -07:00