The getchanges function of some converter_source classes can return
some false positives. I.e. they sometimes claim that a file "foo"
was changed in some revision, even though its contents are still the
same.
convert_svn is particularly bad, but I think this can also happen with
convert_cvs and, at least in theory, with mercurial_source.
For regular conversions this is not really a problem - as long as
getfile returns the right contents, we'll get a converted revision
with the right contents. But when we use --filemap, this could lead
to superfluous revisions being converted.
Instead of fixing every converter_source, I decided to change
mercurial_sink to work around this problem.
When --filemap is used, we're interested only in revisions that touch
some specific files. If a revision doesn't change any of these files,
then we're not interested in it (at least for revisions with a single
parent; merges are special).
For mercurial_sink, we abuse this property and roll back a commit if
the manifest text hasn't changed. This avoids duplicating the logic
from localrepo.filecommit to detect unchanged files.
To handle merges correctly, this revision adds a filemap_source class
that wraps a converter_source and does the work necessary to calculate
the subgraph we're interested in.
The wrapped converter_source must provide a new getchangedfiles method
that, given a revision rev, and an index N, returns the list of files
that are different in rev and its Nth parent.
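As a rough illustration of the getchangedfiles contract described above,
here is a minimal sketch over plain dictionaries. The manifests and
parents arguments are hypothetical stand-ins for whatever the wrapped
source keeps internally, and treating a missing Nth parent as "every
file changed" is an assumption, not something this change specifies.

```python
def getchangedfiles(manifests, parents, rev, i):
    """Return the sorted list of files that differ between rev and its
    i-th parent.

    manifests: dict mapping revision id -> {filename: content id}
    parents:   dict mapping revision id -> list of parent revision ids
    (both structures are hypothetical, for illustration only)
    """
    current = manifests[rev]
    revparents = parents[rev]
    if i >= len(revparents):
        # Assumption: with no such parent, every file counts as changed.
        return sorted(current)
    parent = manifests[revparents[i]]
    # A file is "changed" if it was added, removed, or has a new content id.
    names = set(current) | set(parent)
    return sorted(f for f in names if current.get(f) != parent.get(f))
```

Comparing content ids rather than trusting the source's change list is
what keeps false positives out of the result.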
The implementation depends on the ability to skip some revisions and to
change the parents field of the commit objects that we returned earlier.
To make the conversion restartable, we assume the revisions in the
revmapfile are topologically sorted.
The --filemap support in hg convert doesn't handle merges correctly.
(And after 98d1e8c16343 I managed to break it even for simple cases
where we don't want the first revision.)
If getchanges returns a string, it's assumed to be the id of an
already converted revision. We map the current revision to the same
revision this converted revision was mapped to.
To allow skipping a root revision, getchanges can return the special
string 'hg-convert-skipped-revision' (a.k.a. common.SKIPREV), which
hopefully won't clash with any real id.
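The handling of a string result from getchanges can be sketched roughly
as follows. Only the SKIPREV value comes from this change; the
convert_one helper, the revmap dictionary and the commit callback are
hypothetical names used for illustration.

```python
SKIPREV = 'hg-convert-skipped-revision'

def convert_one(revmap, rev, changes, commit):
    """Record where source revision rev ends up in the destination.

    revmap:  dict mapping source ids -> destination ids (or SKIPREV)
    changes: whatever getchanges returned for rev
    commit:  hypothetical callback that really converts a revision
    """
    if isinstance(changes, str):
        if changes == SKIPREV:
            # A skipped root revision has nothing to map to.
            dest = SKIPREV
        else:
            # changes names an already converted source revision:
            # reuse whatever that revision was mapped to.
            dest = revmap[changes]
        revmap[rev] = dest
        return dest
    dest = commit(rev, changes)
    revmap[rev] = dest
    return dest
```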
The converter_source is responsible for rewriting the parents of the
commit objects to make sure the revision graph makes sense.
In a non-cygwin environment, cvsps fails to create its cache directory and redirects its output to stderr. Just ignore the error and capture stderr as well.
The CVS connection string regexp uses colons to separate the protocol from the path and login. Unfortunately, Windows paths contain colons and were interpreted as rsh connection strings.
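To make the ambiguity concrete, here is a minimal sketch of one way to
tell a Windows drive path from a remote connection string. The regexp
and the looks_remote helper are assumptions made for illustration; they
are not the pattern used by the converter.

```python
import re

# Assumption: a single letter followed by ":" and a slash or backslash
# is a Windows drive prefix, not a "host:path" connection string.
_DRIVE = re.compile(r'^[A-Za-z]:[\\/]')

def looks_remote(cvsroot):
    """Guess whether cvsroot names a remote repository."""
    if _DRIVE.match(cvsroot):
        return False
    # Any other colon separates a method or host from the path.
    return ':' in cvsroot
```

A host whose name is a single letter would be misclassified by this
heuristic, which is the price of accepting drive letters at all.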
- something other than "pat" followed by a number can be used as key
- something other than "/" can be used as delimiter
- "ilmsux" flags (e.g. "i" for case insensitive) can be used
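The flag letters above match the single-letter flag names of Python's
re module, so translating them can be sketched as below. The
compile_with_flags helper is hypothetical and only shows the mapping; it
is not the code from this change.

```python
import re

def compile_with_flags(pattern, flagstr):
    """Compile pattern with flags given as letters from "ilmsux"."""
    flags = 0
    for ch in flagstr:
        if ch not in "ilmsux":
            raise ValueError("unknown regexp flag: %r" % ch)
        # re.I, re.L, re.M, re.S, re.U and re.X all exist in the re
        # module. Note that Python 3 rejects re.L for str patterns.
        flags |= getattr(re, ch.upper())
    return re.compile(pattern, flags)
```

For example, compile_with_flags(r"bug\s+(\d+)", "i") matches "BUG 42" as
well as "bug 42".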
There are some corner cases where we may have a copy in a file that
isn't in the added list:
- the result of a hg copy --after --force
- after a merge across a (local) rename
Clearing it before the conversion protects us from whatever data were
there (file copies in particular).
Invalidating it after the conversion avoids writing a possibly
inconsistent dirstate to disk.
During a conversion, the dirstate contents are not consistent - there
are files that may be missing from the dirstate and there may be files
that shouldn't be in the dirstate.
Until this is fixed, don't mark files as added - put them directly
in state 'n'ormal.
This fixes, for example,
r4151
D /branches
A /project/branches (from /branches:4150)
A /project/tags (from /tags:4150)
A /project/trunk (from /trunk:4150)
D /tags
D /trunk
In addition to the old cmd.foo, opts.foo hgrc entries, allow a simpler
alias = command [opts]... form. For example:
[extdiff]
cdiff = colordiff -uprN
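Splitting such an entry into a program and its options can be sketched
with shlex. This is an illustration under the assumption that the value
follows shell quoting rules; parse_alias is a hypothetical name, not a
function from extdiff.

```python
import shlex

def parse_alias(value):
    """Split an "alias = command [opts]..." value into (command, opts)."""
    parts = shlex.split(value)
    if not parts:
        raise ValueError("empty extdiff alias")
    return parts[0], parts[1:]
```

With the entry above, parse_alias("colordiff -uprN") yields the program
"colordiff" and the option list ["-uprN"].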
If wctx already has two parents, ancestor calculation is wrong.
Normally merge is called before wctx gets the second parent, so
we simulate this in imerge by temporarily popping the second parent
before calling filemerge. Highly dirty.
This patch also handles the ParseError move from cmdutil to dispatch.
After a hg merge, we want to include in the commit all the files that we
got from the second parent, so that we have the correct file-level
history. To make them visible to hg commit, we try to mark them as dirty.
Unfortunately, right now we can't really mark them as dirty[1] - the
best we can do is to mark them as needing a full comparison of their
contents, but they will still be considered clean if they happen to be
identical to the version in the first parent.
This changeset extends the dirstate format in a compatible way, so that
we can mark a file as dirty:
Right now we use a negative file size to indicate we don't have valid
stat data for this entry. In practice, this size is always -1.
This patch uses -2 to indicate that the entry is dirty. Older versions
of hg won't choke on this dirstate, but they may happily mark the file
as clean after a full comparison, destroying all of our hard work.
The patch adds a dirstate.normallookup method with the semantics of the
current normaldirty, and changes normaldirty to forcefully mark the
entry as dirty.
This should fix issue522.
[1] - well, we could put them in state 'm', but that state has a
different meaning.
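The meaning of the two negative sizes can be sketched as follows. This
is a simplified model, assuming that only the size is compared; the real
status code also looks at other stat data, and the classify helper and
its arguments are hypothetical.

```python
LOOKUP = -1  # no valid stat data: needs a full content comparison
DIRTY = -2   # forced dirty, regardless of the file contents

def classify(recorded_size, actual_size, contents_equal):
    """Return 'modified' or 'clean' for a tracked file.

    contents_equal is a callable doing the (expensive) full comparison
    against the version in the first parent.
    """
    if recorded_size == DIRTY:
        # Never compare: the entry must show up in the next commit.
        return 'modified'
    if recorded_size == LOOKUP:
        return 'clean' if contents_equal() else 'modified'
    if recorded_size != actual_size:
        return 'modified'
    return 'clean'
```

An older hg that knows only "negative means lookup" would take the
second branch for -2 as well, which is how it can end up marking the
file clean.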
Changeset 31be2f4d36a5 added some code to putcommit to avoid creating a
revision that touches no files, but this can break regular conversions
from some repositories:
- conceptually, since we're converting a repo, we should try to make
the new hg repo as similar as possible to the original repo - we
should create a new changeset, even if the original revision didn't
touch any files (maybe the commit message had some important bit);
- even if a "regular" revision that doesn't touch any file may seem
weird (and maybe even broken), it's completely legitimate for a merge
revision to not touch any file, and, if we just skip it, the
converted repo will end up with wrong history and possibly an extra
head.
As an example, say the crew and main hg repos are sync'ed. Somebody
sends an important patch to the mailing list. Matt quickly applies
and pushes it. But at the same time somebody also applies it to crew
and pushes it. Suppose the commit message ended up being a bit
different (say, there was a typo and somebody didn't fix it) or that
the date ended up being different (because of different patch-applying
scripts): the changeset hashes will be different, but the manifests
will be the same.
Since both changesets were pushed to public repos, it's hard to recall
them. If both are merged, the manifest from the resulting merge
revision will have the exact same contents as its parents - i.e. the
merge revision really doesn't touch any file at all.
To keep the file filtering stuff "working", the generic code was changed
to skip empty revisions if we're filtering the repo, fixing a bug in the
process (we want parents[0] instead of tip).
- move command dispatching functions from commands and cmdutil to dispatch
- change findcmd to take a table argument
- remove circular import of commands in cmdutil
- privatize helper functions in dispatch
If convert.hg.clonebranches is set, branches will be created as clones of
their parent revisions. All clones will be subdirectories of the
destination path.
This was changed to NoRepo in 821162e04f85, because specifying non-integer
revisions for e.g. the Mercurial backend caused Abort to be raised in the
subversion importer.
Now util.Abort is raised again, but the check is done after verifying that
it really is a subversion repository.
Extdiff was always making a temporary directory and copying files even when not required. This change makes extdiff avoid the copy when diffing a single file that lives in the wc. This lets external diff tools edit the working copy file directly. It also lets other extensions reuse the functions in extdiff and get in-place diffs.
Made the status info only display in verbose mode since most hg commands aren't so chatty. This also makes it cleaner for other extensions to call extdiff.
This way rollbacks happen while the repo is still locked.
Deleting lock before wlock is not strictly necessary, but is
more consistent with the locking order.
I have tried Debian's default emacs and the current CVS version. Default emacs
doesn't have highlighting enabled (and being emacs-illiterate I don't know how
to enable it) and the CVS emacs' Python highlighting has no problems with '
characters here.
The svn.ra.get_log wrapper attaches the hash of changed paths for every
log entry to a global memory pool, so memory consumption increases
rapidly, with no way to free it.
Our workaround is to call this function in a child process, and feed
its results back over a pipe. The memory consumption of the child still
grows huge (hundreds of megabytes), but at least it goes away once the
reading-the-log phase is done.
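The child-process workaround can be sketched in a generic form. This is
not the converter's code: the child script below fabricates three log
entries as a stand-in for the memory-hungry log call, and the pickle
framing with a None terminator is an assumption made for illustration.

```python
import pickle
import subprocess
import sys

# Stand-in child: a real one would call the leaky log function here.
CHILD = r"""
import pickle, sys
out = sys.stdout.buffer
for rev in range(1, 4):
    pickle.dump((rev, {'/trunk/file%d' % rev: 'M'}), out)
pickle.dump(None, out)  # terminator: no more entries
"""

def get_log_in_child():
    """Yield log entries produced by a child process over a pipe."""
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdout=subprocess.PIPE)
    try:
        while True:
            entry = pickle.load(proc.stdout)
            if entry is None:
                break
            yield entry
    finally:
        # Once the child exits, all the memory it accumulated is freed.
        proc.stdout.close()
        proc.wait()
```

The parent only ever holds one entry at a time, so its own memory use
stays flat however large the child grows.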
The SVN converter assumed that the trunk and branches paths were fixed,
and immediately under the base of the SVN URL. Fix the second assumption,
and allow the trunk and branches paths to be reconfigured.