The existing knobs for controlling which revisions to convert were often
insufficient. Revsets is a shiny hammer that provides a better solution.
Revsets has been introduced in --rev handling in a lot of other places while
being more or less backwards compatible. Doing the same here would be a much
more elegant ... but that would unfortunately not work in this case. "--rev 7"
used to mean revision 0 to 7 - it would be an unacceptable change if it
suddenly just meant revision 7.
Instead we introduce a new configuration setting. It will only work for
Mercurial repositories so adding a new commandline option for it would not be a
nice solution.
There is no way to use the fancy deprecation markup for configuration settings
so we just remove the documentation of hg.startrev.
destc is not a string and can thus not be os.path.join'ed. Convert would crash
if we ended up there ... but we wouldn't because both the sinks (hg and
subversion) sinks implement .revmapfile and "never" throws exceptions.
The 'copynode' was looked up in self.keep as if it was a changeset node. It is
however a filelog node, and self.keep would thus fail if it actually looked at
its parameter ... which it only did if a startrev was specified.
Instead we now don't check the copy node - we don't have to. It must have been
copied from one of the parents, and we already check whether one of the parents
have the copy source.
We could perhaps use linkrev to see if the corresponding changeset was
converted ... but that would sometimes be wrong.
The existing test of this was wrong - now it is better, but it seems like it
exposes a 'log' issue.
The cvsps.py:getrepopath suffers from a string parsing bug (it returns
"user@server/path/to/repository" if the CVSROOT is given like this:
":pserver:user@server/path/to/repository" ), which gives returnes the wrong
value becouse cvsps.py fails to strip the prefix from filenames.
With this patch for the same input we get the correct repo path that is:
"/path/to/repository"
Implemented error handling on splicemap file when source is
subversion (This checks are similar to when source is hg or git).
The revision string is expected to be of svn:<uuid><path>@<number>
format.
the test case has been enhanced to check this format.
Implemented similar error handling that is done for hg in an earlier revision.
These are:
a. add checking for splicemap file format
b. add checking for each revision string formats
1. Introduced 2 levels of error handling for splicemap files
a. Check the splicemap file for rules which are same across different
types of source repos. This is done through enhancing parsesplicemap
function
b. Check revision string formats. Each repo may have their own format.
This is done usign checkrevformat function
c. Implemented the above two for hg
If you actively work with branches, sometimes you need to close old branches
which last commited hundreds revisions ago. After close you will see long
lines in graph visually spoiling history. This sort only moves closed
revisions as close as possible to parents and does not increase storage size
as datesort do.
Since git v1.7.8.2-327-g926f1dd (the change was first released in git
1.7.10), git does not return non-zero when "git ls-remote --tags ..."
is run and the repository is damaged. This causes the "damaged
repository with missing commit" test in test-convert-git.t to
unexpectedly succeed.
Fix by aborting if git outputs any lines beginning with "error:",
which required adding some subprocess use in convert/git.py.
Simplify core logic by no longer attempting to work around missing
class attributes. Instead always generate the attributes and ignore
the cache if the attributes are missing
The default for the time zone offset in a converted changeset has
always been 0 (UTC). With this patch, the converted changeset is
modified so that the local offset from UTC is specified as the time
zone offset.
The option is specified as the boolean convert.localtimezone (default
False). Example usage:
hg convert -s cvs --config convert.localtimezone=True example-cvs example-hg
IMPORTANT: the patch only applies to conversions from cvs or svn.
The documentation for the option only appears in those two sections
in the convert help text.
Previously, convert aborted upon encountering a git submodule. This patch
changes it so that it now succeeds. It modifies convert_git to manually generate
'.hgsub' and '.hgsubstate' files for each git revision, so as to convert git sub
modules to non-mercurial subrepositories.
Bookmarks persistence still showed a fair amount of its legacy as a
monkeypatching extension. This encapsulates all bookmarks
serialization and parsing in a single class, and offers a single
location where other bookmarks storage engines can be substituted
in. As a result, many files no longer import the bookmarks module,
which strikes me as an encapsulation win.
This doesn't do anything to the current bookmark state yet, but I'm
hoping put that in the bmstore class as well.
convert doesn't normalise double slashes in paths. Path normalization
is applied when a path is loaded into filemap and when a file lookup
request is issued to filemap.
Avoid mixing popen and subprocess calls, it simplifies the command line
generation and quoting issues with redirections.
In practice, it fixes the subversion sink on Windows and probably helps
with monotone and darcs sources.
Some help topics use "-" for the top level underlining section mark,
but "-" is used also for the top level categorization in generated
documents: "hg.1.html", for example.
So, TOC in such documents contain "sections in each topics", too.
This patch changes underlining section mark in some help topics to
unify section level in generated documents.
After this patching, levels of each section marks are:
level0
""""""
level1
======
level2
------
level3
......
level4
######
And use of section markers in each documents are:
- mercurial/help/*.txt can use level1 or more
(now these use level1 and level2)
- help for core commands can use level2 or more
(now these use no section marker)
- descriptions of extensions can use level2 or more
(now hgext/acl uses level2)
- help for commands defined in extension can use level4 or more
(now "convert" of hgext/convert uses level4)
"Level0" is used as top level categorization only in "doc/hg.1.txt"
and the intermediate file generated by "doc/gendoc.py", so end users
don't see it in "hg help" outoput and so on.
When the source repository had a revision renaming "$new -> $old",
but the filemap a "$old -> $new" rename, the converted revision could
use either $new (deleting the file) or $old (keeping the file) when
getting the file data, depending on the lexicographical order of
those names. So the resulting revision would leave some files
untouched (as expected), but delete others arbitrarely.
When running convert with a filemap, merge parents which are ancestors
of other parents are ignored. This is hardly a problem when parents
belong to the same branch, but the result could be confusing when named
branches are involved. With:
-o-a1-a2-a3... <- A
\ \
b1-b2-b3...-m- <- B
If all b* revisions are discarded, it is useful to preserve 'm' even if
it is empty after filtering to record the branch switch.
This patch makes filemap preserve "ancestor parents" if there is no
"non-ancestor parent" on the same branch than the merge revision.
Remarks:
- I am not completely convinced by the reasons given above and those
detailed by Matt in this thread:
http://selenic.com/pipermail/mercurial-devel/2012-May/040627.html
The properties we try to preserve are not clearly defined. That said,
I know this patch already helped someone on IRC and the tests output
look reasonable.
- This is a new version of the original "convert: filemap must preserve
fast-forward merges" patch. It has exactly the same output for 2
parents merges, the additional complexity is here to handle more than
two parents.
'hg debugsvnlog' failed with a crash when using the uninitialized transport in
get_log_child if the import of the svn libraries had failed.
'convert' should never get as far as launching 'hg debugsvnlog' if the svn
libraries are missing, but by launching a subprocess there is risk that the
environment is mangled so the second import fails.
It is in principle also possible to launch the command manually.
Subversion can handle ':' quoted as '%3A' but urllib.url2pathname can't and
Mercurial thus rejected some valid subversions URLs.
This particular case will now be handled by some preprocessing before handing
it over to urllib.url2pathname.
This is tested by a0c992a723f9 when test-convert-svn-source.t and
test-convert-svn-move.t can be run on Windows.
Calling propset/propdel with subversion 1.6 on FAT gave
abort: svn exited with status 256
and made test-convert-hg-svn.t and test-convert-svn-sink.t fail. 1.7 worked.
This is a rework of 5ba59c098f03 but ignores the executable bit when it isn't
supported instead of using an approximation.
on some non "en" locale environments, "hg convert" is aborted, because
"util.parsedate()" fails.
it fails in "memctx.__init__()" called by "putcommit()" of "convert".
in "hg convert", datetimes gotten from source repository
are usually formatted by "util.datestr()" with default format "%a %b
%d %H:%M:%S %Y %1%2".
but on some environments, "%a" and "%b" may cause locale sensitive
string, and such string may cause parse error in "util.parsedate()".
this path uses "%Y-%m-%d %H:%M:%S %1%2" as intermediate representation
format for datetimes, because it consists only of locale insensitive
elements.
datetimes in above format are only used for passing them from
conversion logic to memctx object, so it doesn't have to be formatted
by locale sensitive one.
this patch just avoids locale sensitivity problem of "datestr()" and
"parsedate()" combintion.
"svn add file" now fails if "file" is already tracked. To filter them we have
to mirror the svn manifest in the sink.
Tested with svn 1.6.12 and 1.7.4.
Subversion conversion works by picking trunk and branches heads, computing a
revision graph from them and converting the selected commits. By design we fail
to convert empty revisions so we have to be careful when discovering the
revision graph. In this particular issue, the source svn repository was a
partial mirror made by svnsync. The funny part is svnsync preserves all
revisions including empty ones. Also, we trusted ra.stat(path,
stop).created_rev to give us the latest revision with changes in path history
up to stop. This assumption broke at least when path is '', that is the
repository root, which always returned 'stop' revision despited being empty.
The workaround is to first trust ra.stat() but if the returned revision appear
empty, search the whole path history from stop to r1 until some changes are
found.
Do not blindly filter out non ending ^{} tags. The new logic
is:
- if both "tag" and "tag^{}" exist, "tag^{}" is what we want
- if only "tag" exists, "tag" is fine
This improves the error message when convert encounters a git
submodule. Now, instead of a git-cat-file error, we'll directly report
the lack of support for git submodules.
Splicemap lines are documented in hg help convert like:
key parent1, parent2
but parsed like:
key, parents = line.strip().rsplit(' ', 1)
parents = parents.replace(',', ' ').split()
The rsplit() call was introduced to handle spaces in keys for the generic
mapfile format. Spaces can appear in svn identifiers since they contain path
components. This logic makes less sense with splicemap since svn identifiers
can also appear on the right side, even if it is a bit less likely. Given the
parsing is theorically broken, I would rather follow what is documented already
and is correct in the main case where all identifiers are hg hashes. Also,
using svn identifiers in a splicemap sounds difficult as they are not easily
accessible.
When sorting revisions before converting them, we have to edit the revision
graph using splicemap entries. Otherwise, a spliced revision may be converted
before its synthetic parents. Invalid splicemap revisions are now detected
before starting the conversion.
Parsing the splicemap as a mapfile was a pain because map does not let us
override its parsing code and splicemap entries are not key/values. Besides we
had no need for mapfiles extra features. Just parse the splicemap and return a
dictionary.
some problematic encodings use backslash as part of multi-byte characters.
util.pconvert() can treat strings in such encodings correctly, if
win32mbcs is enabled, but str.replace() can not.
Instead of opening the target bzr checkout as a single branch, we try to open
it as a repository. This has the following effects:
- All branches are now converted
- bzr branch names are preserved. Previously, the selected branch was always
converted as 'default'. Branches without a name or 'trunk' are mapped to
'default branch.
- Lightweight checkouts are no longer supported. Maybe they can be, I did not
try to fix that at all.
Implementation notes:
- This was a quick fix, I have no knowledge of bzr API besides browsing 2.0.3
sources.
- The fix was only tested on OSX against bzr 2.4.2.
- Tags discovery does not handle collisions. I have no idea how tags work in
bzr so maybe such collisions are not possible.
Before this patch, metadata and file names were interpreted like:
- unicode objects were converted to UTF-8
- non unicode objects were left unchanged
Looking at the code and bzr being known for transcoding filenames, we expect
everything to be returned as unicode objects, and we want to encode them in
UTF-8, like the subversion source does. To do that, we just remove the custom
implementation of .recode().
The GPLv3 FAQ suggests to upgrade by
[...] replace all your existing v2 license notices (usually at the
top of each file) with the new recommended text available on the GNU
licenses howto. It's more future-proof because it no longer includes
the FSF's postal mailing address.
This removes the postal address, but leaves the version number at 2+.
- catch all exceptions
- pickle a stringified version of the exception
- use a normal abort
Hopefully this will result in less mysterious convert exceptions
A convert run with a branchmap made with
echo default namedbranch > branchmap
on Windows fails silently and surprisingly; it actually
adds a space after 'namedbranch', so it ends up mapping
"default namedbranch" to "".
This also affects splicemaps, since the same parser is used
for both.
As of svn 1.7, many svn calls expect "canonical" paths. In theory, we should
call svn.core.*canonicalize() on all paths before passing them to the API.
Instead, we assume the base url is canonical and copy the behaviour of svn URL
encoding function so we can extend it safely with new components.
With renames like:
a -> b
a/c -> a/c
We were ignoring or duplicating the second one instead of leaving files
unchanged or moving them to their proper destination only.
To avoid this, we process the files in reverse lexicographic order, from most
to least specific change, and ignore files already processed.
v2:
- Add a test
- Change "reverse=1" into "reverse=True"
I have not tried to produce the bug but here is idea: b2b0622d9e96 stopped
passing the modified files list to commit. This makes commit more fragile since
we better not touch unrelated files by mistake. But putcommit() still applies
file changes before exiting upon ignored revisions. So in theory, we could
apply changes from a skipped branch then commit them as part of another
revision.
This patch makes the sink apply the changes after possibly skipping the
revision. The real fix would be to use svn commit --targets option to pass the
file names in an argument file. Unfortunately, it seems to be bugged in svn
1.7.1:
http://svn.haxx.se/dev/archive-2011-11/0211.shtml
When using the convert extension from a Mercurial rep. to subset it with
filemap, the bookmarks are not copied. I fixed this by calling the
base.get_bookmarks() from the filemap getbookmarks() instead of returning an
empty dictionary. It should work also for other converters that implement
getbookmarks() (like git). I don't see any drawbacks except that the bookmarks
are always copied (not necessarily wanted all the times).
These leaks may occur in environments that don't employ a reference
counting GC, i.e. PyPy.
This implies:
- changing opener(...).read() calls to opener.read(...)
- changing opener(...).write() calls to opener.write(...)
- changing open(...).read(...) to util.readfile(...)
- changing open(...).write(...) to util.writefile(...)
The previous behaviour was almost as if convert.hg.ignoreerrors was always set
for revisions without parents, except that errors were silently ignored. Revlog
errors are handled as a side effect of getcopies(), but getcopies() was only
called when convert.hg.ignoreerrors was set.
Now we always call self.getcopies for root revisions, not only when
convert.hg.ignoreerrors is set, just like we do on all other revisions.
The extra call might be a bit expensive, but the proper fix for that would be
to catch these errors in another way.
When converting directory additions/replacement with project directory set to
root, _iterfiles() sometimes returned paths starting with a slash making
following svn calls to fail.
I could not reproduce the issue with hand-crafted repositories.
Report and first analysis by Clinton Chau <clinton@clearcanvas.ca>
"mtn automate stdio" will break output larger than 32kB into several packets.
This ensures that we are processing all the output on the main stream and not
only the last packet.
Monotone treats branch closing ("suspending") in a similar manner to how we do
in mercurial - a cert is added to a revision that marks the branch to be hidden.
If a subsequent commit is made, the branch is effectively reopened.
Currently the convert extension spawns a new mtn process for each
operation. For a large repository, this ends up being hundreds of
thousands of processes. The following enables usage of monotone's
"automate stdio" functionality - documented at:
http://www.monotone.ca/docs/Automation.html#index-mtn-automate-stdio-188
The effect is that (after determining that a new enough mtn executable
is available) a single long-running mtn process is used for all the
operations, using stdin/stdout to send commands and read output.
This has a pretty significant effect on the performance of some parts
of the conversion process.
Converting from subversion specifying config.svn.trunk results
in storing trunk under branch named as config.svn.trunk, where `default'
brunch is expected. Submission contains patch and test.
Subversion python bindings check was not present in svn_sink source
class which made it fail while using svn as destination repository.
Added a more maintainble svn bindings check for svn_source and svn_sink
classes.
Use field list instead of option list in convert help, because the
option list format used, with defaults and type of argument is not
supported by docutils.
The p4 command-line client sometimes fails upon doing "p4 describe"
when trying to produce a patch. (I'm guessing it's a bug in p4.)
However, "hg convert" doesn't even make use of the patch, and it can
be elided by adding "-s" to the p4 command line here.
Changes the characters used as section separators, so different ones
are used for module docstring and command docstring.
This is done because the section from the docstring will be at
different levels in the restructured text output, therefore
different symbols have to be used.
This is a followup to dd4fb29994d3, which only fixed the conversion of
patches with UTF-8 metadata.
This patch allows a changelog to have any bytes with values
0x7F-0xFF. It parses the XML changelog as Latin-1 and uses
converter_source.recode() to decode the data as UTF-8/Latin-1.
Caveats:
- Since the convert extension doesn't provide any way to specify the
source encoding, users are still limited to UTF-8 and Latin-1.
- etree will still complain if the changelog has bytes with values
0x00-0x19. XML only allows printable characters.
Given a commit author or message with non-ASCII characters in a darcs
repo, convert would raise a UnicodeEncodeError when adding changesets
to the hg changelog.
This happened because etree returns back unicode objects for any text
it can't encode into ASCII. convert was passing these objects to
changelog.add(), which would then attempt encoding.fromlocal() on
them.
This patch ensures converter_source.recode() is called on each piece
of commit data returned by etree.
(Also note that darcs is currently encoding agnostic and will print
out whatever is in a patch's metadata byte-for-byte, even in the XML
changelog.)
Also document that
- empty lines are skipped and comment are supported in author map
- whitespace is not allowed in branch map entries since we split on it
when parsing the file
This aligns the authormap option with the other three mapping options.
The old --authors option is still supported and 'hg help convert -v'
will still show it.
The convert extension requires puttags(self, tags) to return a sequence
for a multi-variable assignment. If puttags implicitly returns None,
the code will break when trying to un-pack None for assignment.
Clarify that:
- Specified paths are matched by comparing name of file or directory.
- Line order (thus) doesn't matter.
- Rename doesn't imply include.
this helps users to know what kind of option is:
- no value is required(flag option)
- value is required
- value is required, and multiple occurrences are allowed
each kinds are shown as below:
-f --force force push
-e --ssh CMD specify ssh command to use
-b --branch BRANCH [+] a specific branch you would like to push
if one or more 3rd type options are shown, explanation for '[+]' mark
is also shown as footnote.
When converting non-local repositories, scanning changed paths before
retrieving data can be almost as slow as retrieving the data itself, thanks to
HTTP calls overhead.
We do not care about directories when looking for recursively added or removed
items, and the redundant _checkpath() call is expensive with remote
repositories.
This recode call was removed in 597805361f86, because it looked the
encode(decode()) construct was a no-op. In fact, the first decode() call was
wrong, and entries still have to be encoded before being passed to the sink.
For some reason, if a copy source is deleted in the same revision it is
referenced, it is filtered out. This is silly, because this happens all the
time with move operations. Fortunately, the filtering code is buggy and ends
being a no-op 99% of the time, since it does not delete the right key. Just
remove all this nonsense.
If the CVS repo somehow has a symbolic name that references a revision
consisting of a single number (e.g. BAD_TAG: 1), convert will fail when
attempting to find the branches, preventing the initial import from
working.
This patch skips those symbolic names--without warning.
Fix as proposed by Frank Kingswood.
Avoids
UnboundLocalError: local variable 'mode' referenced before assignment
when cvs fails.
This alsa partially fixes issue1592.
For a CVS repository checked out with "cvs co .", the prefix used to strip of
what we get from CVS was previously erroneously set to "repopath/.".
We now prevent the dot to be added.
Test folded in test-convert-cvs and simplified by Patrick Mézard
<pmezard@gmail.com>.
The bzrlib try to import the ElementPath module but had a fallback in
case the import fails. Lazy import of this module leads to later
failure.
The bzrlib is used by the convert extension.
Otherwise when processing a changeset that in fact changes no files
(perhaps due to bug in import from CVS) can get something like:
unexpected svn output:
abort: unable to cope with svn output
Bug report and patch draft by Jesse Glick <jesse.glick@sun.com>