The getchanges function of some converter_source classes can return
some false positives. I.e. they sometimes claim that a file "foo"
was changed in some revision, even though its contents are still the
same.
convert_svn is particularly bad, but I think this can also happen with
convert_cvs and, at least in theory, with mercurial_source.
For regular conversions this is not really a problem - as long as
getfile returns the right contents, we'll get a converted revision
with the right contents. But when we use --filemap, this could lead
to superfluous revisions being converted.
Instead of fixing every converter_source, I decided to change
mercurial_sink to work around this problem.
When --filemap is used, we're interested only in revisions that touch
some specific files. If a revision doesn't change any of these files,
then we're not interested in it (at least for revisions with a single
parent; merges are special).
For mercurial_sink, we abuse this property and rollback a commit if
the manifest text hasn't changed. This avoids duplicating the logic
from localrepo.filecommit to detect unchanged files.
There are some corner cases where we may have a copy in a file that
isn't in the added list:
- the result of a hg copy --after --force
- after a merge across a (local) rename
Clearing it before the conversion protects us from whatever data were
there (file copies in particular).
Invalidating it after the conversion avoids writing a possibly
inconsistent dirstate to disk.
During a conversion, the dirstate contents are not consistent - there
are files that may be missing from the dirstate and there may be files
that shouldn't be in the dirstate.
While this is not fixed, don't mark files as added - put them directly
in state 'n'ormal.
Changeset 31be2f4d36a5 added some code to putcommit to avoid creating a
revision that touches no files, but this can break regular conversions
from some repositories:
- conceptually, since we're converting a repo, we should try to make
the new hg repo as similar as possible to the original repo - we
should create a new changeset, even if the original revision didn't
touch any files (maybe the commit message had some important bit);
- even if a "regular" revision that doesn't touch any file may seem
weird (and maybe even broken), it's completely legitimate for a merge
revision to not touch any file, and, if we just skip it, the
converted repo will end up with wrong history and possibly an extra
head.
As an example, say the crew and main hg repos are sync'ed. Somebody
sends an important patch to the mailing list. Matt quickly applies
and pushes it. But at the same time somebody also applies it to crew
and pushes it. Suppose the commit message ended up being a bit
different (say, there was a typo and somebody didn't fix it) or that
the date ended up being different (because of different patch-applying
scripts): the changeset hashes will be different, but the manifests
will be the same.
Since both changesets were pushed to public repos, it's hard to recall
them. If both are merged, the manifest from the resulting merge
revision will have the exact same contents as its parents - i.e. the
merge revision really doesn't touch any file at all.
To keep the file filtering stuff "working", the generic code was changed
to skip empty revisions if we're filtering the repo, fixing a bug in the
process (we want parents[0] instead of tip).
If convert.hg.clonebranches is set, branches will be created as clones of
their parent revisions. All clones will be subdirectories of the
destination path.
Allows mapping usernames to new ones during conversion process.
- Use -A option for first import
- Then at the end of the conversion process and if the destination
repo supports authorfile attribute, author map content is copied
to the file pointed by the authorfile call.
- On incremental conversions w/o any -A option specified, the
destination authorfile, if any, gets read automatically.
EG: This allows mapping unix system usernames used in CVS accounts
to a more typical "Firstname Lastname <address@server.org>" pair.