This is a followup to dd4fb29994d3, which only fixed the conversion of
patches with UTF-8 metadata.
This patch allows a changelog to have any bytes with values
0x7F-0xFF. It parses the XML changelog as Latin-1 and uses
converter_source.recode() to decode the data as UTF-8/Latin-1.
Caveats:
- Since the convert extension doesn't provide any way to specify the
source encoding, users are still limited to UTF-8 and Latin-1.
- etree will still complain if the changelog has bytes with values
0x00-0x19. XML only allows printable characters.
Given a commit author or message with non-ASCII characters in a darcs
repo, convert would raise a UnicodeEncodeError when adding changesets
to the hg changelog.
This happened because etree returns back unicode objects for any text
it can't encode into ASCII. convert was passing these objects to
changelog.add(), which would then attempt encoding.fromlocal() on
them.
This patch ensures converter_source.recode() is called on each piece
of commit data returned by etree.
(Also note that darcs is currently encoding agnostic and will print
out whatever is in a patch's metadata byte-for-byte, even in the XML
changelog.)
Also document that
- empty lines are skipped and comment are supported in author map
- whitespace is not allowed in branch map entries since we split on it
when parsing the file
This aligns the authormap option with the other three mapping options.
The old --authors option is still supported and 'hg help convert -v'
will still show it.
The convert extension requires puttags(self, tags) to return a sequence
for a multi-variable assignment. If puttags implicitly returns None,
the code will break when trying to un-pack None for assignment.
Clarify that:
- Specified paths are matched by comparing name of file or directory.
- Line order (thus) doesn't matter.
- Rename doesn't imply include.
this helps users to know what kind of option is:
- no value is required(flag option)
- value is required
- value is required, and multiple occurrences are allowed
each kinds are shown as below:
-f --force force push
-e --ssh CMD specify ssh command to use
-b --branch BRANCH [+] a specific branch you would like to push
if one or more 3rd type options are shown, explanation for '[+]' mark
is also shown as footnote.
When converting non-local repositories, scanning changed paths before
retrieving data can be almost as slow as retrieving the data itself, thanks to
HTTP calls overhead.