This default mirrors the default for 'git diff'. Other commands have slightly
different defaults -- for example, the move/copy detection for 'git blame'
assumes that a hunk is moved if more than 40 alphanumeric characters are the
same, or copied if more than 20 alphanumeric characters are the same. 50% seems
to be the most common default, though.
Git is fairly unique among VCSes in that it doesn't record copies and renames,
instead choosing to detect them on the fly. Since Mercurial expects copies and
renames to be recorded, it can be valuable to preserve this history while
converting a Git repository to Mercurial. This patch adds a new convert option,
called 'convert.git.similarity', which determines how similar files must be to
be treated as renames or copies.
This produces slightly bad results when branches are in play, but
overall I think it's probably worthwhile. We might be able to do
better with branches somehow, but I haven't given it any thought.
This makes it possible to estimate how long the "scanning source"
phase will take, if the specified source repo type supports a quick
"how many changes" check.
Convert will normally only process files that were changed in a source
revision, apply the filemap, and record it has a change in the target
repository. (If it ends up not really changing anything, nothing changes.)
That means that _if_ the filemap is changed before continuing an incremental
convert, the change will only kick in when the files it affects are modified in
a source revision and thus processed.
With --full, convert will make a full conversion every time and process
all files in the source repo and remove target repo files that shouldn't be
there. Filemap changes will thus kick in on the first converted revision, no
matter what is changed.
This flag should in most cases not make any difference but will make convert
significantly slower.
Other names has been considered for this feature, such as "resync", "sync",
"checkunmodified", "all" or "allfiles", but I found that they were less obvious
and required more explanation than "full" and were harder to describe
consistently.
Since it was introduced in 670e8681d92a, tidy_dirs has been comparing
the result of os.listdir with a string - which never can be true.
Convert apparently works anyway and there is no test coverage of it.
It also seems like it could make a bigger difference on older svn versions but
is less relevant with more recent versions.
Instead of trying to fix the code, we take the low risk option and remove it.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
Perforce has the concept of "+Sn" files where only the last revisions of the
file is stored. In p4d 2012.1 old purged revisions were not included in the
"manifest". With 2012.2 they started being included and convert's getfile
failed to recognize the "purged" flag and saw it as an empty file. That made
test-convert-p4-filetypes.t fail.
There is no point in storing an empty file as placeholder for a purged file so
we restore the old behaviour by checking the flag and letting getfile consider
purged files deleted.
(It is questionable whether it makes sense to convert not-yet-purged +S files
to mercurial ... but that is another question.)
test-convert-cvs.t has been a little flaky for a while now. Add an
extra tiebreaker in cscmp so that all the cases in the test will sort
reliably.
Without this patch, test-convert-cvs.t failed after 346 runs. With
this patch, I stopped trying to get it to fail after 615 runs. While
not conclusive, that makes me pretty optimistic that this is a working
fix.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
This patch fixes argument mismatch at formatting the abort message,
introduced by dcf325a55d85: the last '%s' doesn't have corresponded
argument.
This patch uses "unexpected size" in the abort message, to distinguish
the reason of failure from "unexpected type" failure checked in the
prior code path below:
if info[1] != type:
raise util.Abort(_('cannot read %r object at %s') % (type, rev))
Before this patch, all operations applied on ".gitmodules" at git
source revisions are treated as modification, even if they are
actually removal of it.
If removal of ".gitmodules" is treated as modification unexpectedly,
"hg convert" is aborted by the exception raised in
"retrievegitmodules()" for ".gitmodules" at the git source revision
removing it, because that revision doesn't have any information of
".gitmodules".
This patch detects removal of ".gitmodules" at git source revisions
correctly.
If ".gitmodules" is removed at the git source revision, this patch
records "hex(nullid)" as the contents hash value for ".hgsub" and
".hgsubstate" at the destination revision.
This patch makes "getfile()" raise IOError also for ".hgstatus" and
".hgsubstate" if the contents hash value is "hex(nullid)", and this
tells removal of ".hgstatus" and ".hgsubstate" at the destination
revision to "localrepository.commitctx()" correctly.
For files other than ".hgstatus" and ".hgsubstate", checking the
contents hash value in "getfile()" may be redundant, because
"catfile()" for them also does so.
But this patch chooses writing it only once at the beginning of
"getfile()", to avoid writing same code twice both for ".hgsub" and
".hgsubstate" separately.
This change allows the origin() and destination() revsets to yield the same
results in the new and old repos after a conversion. Previously, nothing would
be listed for queries in the new repo.
Like the SHA1 updates to the commit messages, this is only operational when the
'convert.hg.saverev=True' option is specified. If the old reference cannot be
found, it is left as-is. It seems slightly better to leave stale evidence of
the graft/transplant/rebase than to eliminate it entirely.
This patch changes the calling signature of memfilectx's __init__ to fall in
line with the other file contexts.
Calling code and tests have been updated accordingly.
Rollback or strip could leave a Mercurial repo with a shamap with revisions no
longer in the repository.
To ensure reliable conversions we now check that the commit actually exists and
consider it non-existing if it doesn't exist.
Mercurial has stable revision identifiers and rollback and strip. Revisions
referenced in the shamap are thus not necessarily still present but we can
easily check for it.
Subversion do not have stable identifiers and no rollback or strip(?). We must
thus assume that all revisions referenced from a shamap still must be present.
This method is similar to hascommitforsplicemap but different ...
The name 'hascommit' sounds like something generic ... but it might
also throw exceptions in specific cases and it is thus (apparently)
only useful for splicemap.
We would formerly exec git cat-file once for every commit, plus once for
every tree and file we wnated to read. This switches to using git
cat-file's batch mode, which is much, much, much faster.
Using this new code, converting the git git repo to hg ran in 106
minutes on my machine. Using the stock mercurial, it required 1239
minutes. I believe this to be typical of the speedups we will see
form this patch.
Closemap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Closemap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for closemap but it seems like the same can be
achieved with a simple and much more powerful custom extension:
import hgext.convert.hg
class source(hgext.convert.hg.mercurial_source):
def getcommit(self, rev):
c = super(source, self).getcommit(rev)
if rev in ['''
d643f67092ff123f6a192d52f12e7d123dae229f
3a6a38229d418ba09cb7784c01453a93b4d363f8
facceca31c18f7ef800977055dbcbd7fcb5c5cb2
''']:
c.extra = c.extra.copy()
c.extra['close'] = '1'
return c
hgext.convert.hg.mercurial_source = source
Tagmap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Tagmap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for tagmap but it seems like the same can be achieved
with a (relatively) simple and much more powerful custom extension:
import hgext.convert.hg
def f(tag):
return tag.replace('some', 'other')
class source(hgext.convert.hg.mercurial_source):
def gettags(self):
return dict((f(tag), node)
for tag, node in in super(source, self).gettags().items())
def getfile(self, name, rev):
data, flags = super(source, self).getfile(name, rev)
if name == '.hgtags':
data = ''.join(l[:41] + f(l[41:]) + '\n' for l in data.splitlines())
return data, flags
hgext.convert.hg.mercurial_source = source
The fix for issue2653 broke the ability to map the default branch of a source
repository to a non-default named branch in the destination repository. Leave
the default behaviour as is, but allow the branch name "None" to be used to map
to a non-default named branch in the destination repository.
Subversion issues involving svn log such as 1e493b49245f can be tricky to
debug when it is run in an 'hg debugsvnlog' sub process. Debugging is simpler
when convert only uses one process.
With this change convert will invoke the svn log directly when setting
[convert]
svn.debugsvnlog = False
This is intentionally not documented.
Mercurial tags can be local (tag -l, stored in .hg/localtags) or global (normal
tags, tracked in .hgtags) ... or extensions can add other kind of tags.
Convert would take all tags (except "tip"), not just the ones from .hgtags, and
put them into .hgtags.
Instead, convert only the global tags that come from .hgtags.
Previously, there was no way to rewrite tags on the fly while converting. Now,
we add similar logic to branchmap to provide a way to map old tags to new tags.
Currently, this is not enabled since there is not yet a command-line option.
Previously, when converting from a mercurial repo there would be an extraneous
commit at the end of the convert process that would rewrite tags. Now, we check
if there are any new tags before doing this rewriting.
Previously, the hg sink for puttags would just use one head for getting the old
tags which would sometimes lead to tags disappearing. Now, we iterate over all
heads and merge the results.