Old svn allowed users to include invalid utf8 in their commits. Since
there are real repos with said invalid utf8, we need to be able to
import them, even if svn won't.
It turns out that SVN has bizarre path canonicalization rules that
are sort of close to what urllib.quote does, but different in
peculiar ways, and 1.7 suddenly cares deeply about canonicality.
For instance, space (' ') maps to %20, but '~' stays unchanged
instead of turning into %7e.
Along with its new policy of frequent beatings administered to users
of its bindings, SVN 1.7 introduces a function that idempotently
canonicalizes URIs, which I found sort of by accident, because
that's how you learn about SVN API changes.
Older versions of SVN are less anal, so urllib.quote continues to
work fine for them.
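The two examples above (space and '~') can be illustrated with a small
sketch. It uses Python 3's urllib.parse.quote rather than the Python 2
urllib.quote the text refers to, and both helper names are hypothetical.
It only reproduces the two cases mentioned; SVN's real canonicalization
differs in other ways that are not modelled here.

```python
from urllib.parse import quote


def py2_style_quote(path):
    """Approximate Python 2's urllib.quote, which escaped '~'."""
    # Newer Python 3 releases leave '~' alone, so escape it explicitly.
    return quote(path, safe="/").replace("~", "%7E")


def svn_style_quote(path):
    """Escape the space but keep '~' literal, as described above."""
    return quote(path, safe="/~")


if __name__ == "__main__":
    print(py2_style_quote("/my repo/~user"))
    print(svn_style_quote("/my repo/~user"))
```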
This case contains a couple of unlikely (but not impossible) failure
cases that the code previously did not handle. The verifier is updated
to address these, and the output made a bit more consistent.
Previously, a file whose path began with the name of the repository
subdirectory would have that prefix stripped, leaving a leftover file
with a wrong name. A subsequent pull of a revision modifying the file
would add it under its correct name, but leave the leftover file.
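One plausible reading of this bug, sketched under assumptions: the
prefix check compared raw strings instead of whole path components. The
function names and the example paths are hypothetical and are not taken
from the hgsubversion source.

```python
def strip_subdir_naive(path, subdir):
    """Buggy variant: any path that starts with the subdir text matches."""
    if path.startswith(subdir):
        return path[len(subdir):].lstrip("/")
    return None


def strip_subdir(path, subdir):
    """Strip subdir only when it is a whole leading path component."""
    if path == subdir:
        return ""
    if path.startswith(subdir + "/"):
        return path[len(subdir) + 1:]
    # The path lives outside the subdirectory, so it is not ours to strip.
    return None
```

With subdir 'project', the naive variant turns 'project-notes.txt' into
'-notes.txt', while the stricter one leaves the file alone.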
This prevents re-pulling the same revision over and over, which was a
problem when the most recent revision was a tagging revision that
wouldn't exist properly in the revmap. This should also allow users to
not re-pull huge volumes of commits that have no effect on the hg
repository.
The value of the default commit message is now configurable by setting
'hgsubversion.defaultmessage'. In addition, the log output is made
consistent with the result of the conversion.
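As a rough illustration, the setting name splits into an [hgsubversion]
section and a 'defaultmessage' key. The sketch below reads such a
fragment with Python's configparser, which is only a stand-in for
Mercurial's own config handling; the message text, the fallback value
and the helper name are made up for the example.

```python
import configparser

# Hypothetical hgrc fragment; the message text is an invented example.
HGRC = """
[hgsubversion]
defaultmessage = (no commit message given)
"""


def default_message(hgrc_text, fallback="(hypothetical fallback)"):
    """Return hgsubversion.defaultmessage, or the fallback when unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    return parser.get("hgsubversion", "defaultmessage", fallback=fallback)
```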
Previously, property changes to links caused 'link ' to be prepended
to the link destination. Removing a line that prepended it in
Revision::set() appears to fix it. In these cases, the "file marked as
link, but contains data" warning might be triggered. This should be
safe, so it's lowered to a note and the language made less conclusive.
In order to test this, extra revisions are added to the
'symlinks.svndump' fixture. As one of the new revisions adds a link
that points to 'link to this', a check that asserted that link
destinations must not start with 'link ' was removed. This change is
safe, as the test later on asserts exact equality with the contents of
the 'links' dictionary.
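Subversion stores a symlink as a special file whose contents are
'link ' followed by the destination. A minimal sketch of why a
destination that itself starts with 'link ' is legitimate, assuming a
hypothetical helper that strips the prefix exactly once:

```python
LINK_PREFIX = "link "


def link_target(data):
    """Return the link destination, or None if data is not link content."""
    if not data.startswith(LINK_PREFIX):
        return None
    # Strip the prefix once only; the destination may begin with it too.
    return data[len(LINK_PREFIX):]
```

For a link pointing at 'link to this', the stored content is
'link link to this', and the helper yields 'link to this' again.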
The way hgsubversion handles URLs that may or may not be quoted is
somewhat fragile. As part of fixing issue 132 in 06d89c2063a2, the
path component of URLs was always quoted. The code has also attempted
to encode the URL since the initial check-in.
The fix from 06d89c2063a2 was incomplete; reverting it allows us to
clone a URL with a '~' in it.[1] Encoding the URL as UTF-8 seldom
works as expected, as the default string encoding is ASCII, causing
Python to be unable to decode any URL containing an 8-bit
character.
The core problem here is that we don't know whether the URL specified
by the user is quoted or not. Rather than trying to deal with this
ourselves, we pass the problem on to Subversion. Then, we obtain the
URL from the RA instance, where it is always quoted. (It's worth
noting that the editor interface, on the other hand, always deals with
unquoted paths...)
Thus, the following invariants should apply to SubversionRepo
attributes:
- svn_url and root will always be quoted.
- subdir will always be unquoted.
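
The invariants above can be sketched as follows, assuming both URLs
come back from Subversion already quoted. The function name, the
example URL and the leading slash on subdir are assumptions made for
illustration, not the actual SubversionRepo code.

```python
from urllib.parse import unquote


def split_repo_url(svn_url, root):
    """Return (svn_url, root, subdir) with subdir in unquoted form."""
    if svn_url != root and not svn_url.startswith(root + "/"):
        raise ValueError("svn_url is not below root")
    # svn_url and root stay quoted; only the derived subdir is unquoted.
    subdir = unquote(svn_url[len(root):])
    return svn_url, root, subdir
```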
Tests are added that verify that it won't affect the conversion
whether a URL is specified in quoted or unquoted form. Furthermore, a
test fixture for this is added *twice*, so that we can thoroughly test
both quoted and unquoted URLs. I'm not adding a test dedicated to
tildes in URLs; it doesn't seem necessary.
[1] Such as <https://svn.kenai.com/svn/winsw~subversion>.
getcopies() assumed that copies were happening within the current branch.
This is wrong when a branch replaces another, and used to generate wrong copy
records when copy sources existed in parent revision but were coming from an
unrelated revision.
Known failures:
- comprehensive/test_verify on replace_branch_with_branch: replaced files'
  content is incorrect
- comprehensive/test_stupid_pull on replace_branch_with_branch: very stupid
  mode does not handle replacements correctly.
This was broken because file edits were skipped if they were in tags, but
committags in svnmeta didn't check to see if any files were changed during
initial tag creation.
Author maps for the Python repo got truncated because of the author map stupidly
writing upon itself. This patch implements a better and faster approach, where
entries will only be written to the saved author map if they're not coming from that
file. They're also now streamed into the file directly, instead of having to re-open
the file on every entry, and formatting is preserved.
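A minimal sketch of that approach, assuming a 'svnuser = Mercurial Name'
line format with '#' comments. The function name and its arguments are
hypothetical and do not mirror the real hgsubversion author map class.

```python
import os


def load_author_map(mapping, source_path, saved_path):
    """Load entries from source_path into mapping.

    New entries are appended to saved_path, unless source_path is the
    saved file itself, in which case nothing is written back.
    """
    writing = None
    if os.path.abspath(source_path) != os.path.abspath(saved_path):
        # Opened once in append mode, so the saved map is never truncated.
        writing = open(saved_path, "a")
    try:
        with open(source_path) as source:
            for line in source:
                entry = line.split("#", 1)[0]
                if "=" not in entry:
                    continue
                src, dst = (part.strip() for part in entry.split("=", 1))
                if writing is not None and src not in mapping:
                    # Copy the line verbatim to preserve its formatting.
                    writing.write(line if line.endswith("\n") else line + "\n")
                mapping[src] = dst
    finally:
        if writing is not None:
            writing.close()
    return mapping
```

Loading the saved file itself only reads it, which is the case that
used to truncate the map.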
Peg revisions are now parsed separately. If a revision is supplied but not a
peg revision, we use the former as the peg revision, as Subversion seems to do.
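The rule can be sketched as follows. Both function names are
hypothetical, and the sketch only understands numeric peg revisions in
the 'URL@REV' form, whereas Subversion itself also accepts keywords
such as HEAD.

```python
def split_peg(url):
    """Split a trailing '@<number>' peg revision off the URL, if present."""
    head, sep, tail = url.rpartition("@")
    if sep and tail.isdigit():
        return head, int(tail)
    # No numeric suffix: an '@' here belongs to the URL (e.g. user@host).
    return url, None


def resolve_revisions(url, rev=None):
    """Return (url, rev, peg), defaulting the peg revision to rev."""
    url, peg = split_peg(url)
    if peg is None and rev is not None:
        peg = rev
    return url, rev, peg
```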
This fix solves the following case: let /dumb/layout/project be an existing
project. To normalize the trunk/branches/tags layout, people may do:
  $ mkdir /project
  $ mv /dumb/layout/project /project/project
  # Oops, should have been trunk!
  $ mv /project/project /project/trunk
trunk creation was ignored because:
- update_branch_map() sees it come from a non-branch copy source and ignores it
(case #3).
- since it is not in self.branches, add_directory() ignores the non-existing path.
Then trunk is left uninitialized.
To solve this, we allow update_branch_map() to detect branches copied from
non-canonical locations.