The next patch will wrap the conversion code, in order to write out a
requirement for 'lfs' when appropriate. Wrapping convcmd.convertsink() in an
afterloaded callback works fine when the convert extension is enabled by the
user. The problem here is that lfconvert uses the convert extension, whether or
not it was formally enabled by the user.
My first attempt was to have lfs install an afterloaded callback that would wrap
the convert sink if convert was loaded, or wrap lfconvert if it wasn't. Then
the lfconvert override could install an afterloaded callback to try wrapping the
convert sink again, before calling the original lfconvert. But that breaks down
if largefiles can't load the convert extension on the fly. [1] Further, some
tests were failing with an error indicating that the size of the afterloaded
list changed while iterating it.
Yuya mentioned that maybe some bits of convert could be moved into core, but I'm
not sure where to draw that line. The convertsink() method depends on the list
of sinks, which in turn depends on the sink classes.
[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-November/108038.html
This seems like basic info to have, and will be used shortly when deciding
whether or not to wrap the class for lfs conversions.
The other option is to just add a function to each class. But this seems better
in that the strings aren't duplicated, and the constructor for most of these
will run even if the VCS isn't installed, so it's easier to catch errors.
Since (b) is banned, we should do the same for (a) for consistency.
a) from mercurial import hg
from mercurial.i18n import _
b) from . import hg
from .i18n import _
Before, when converting revisions without also including their already
converted parents in convert.hg.revs, the parents would no longer be parents.
That seems unfortunate and we dare to assume that nobody ever wants that.
Instead, preserve parents that are outside the current convert range but
already have been converted.
The parents returned in getcommit() are unconditionally converted, so we
introduce a separate optparents with optional parents.
Instead of reporting
spliced in ['82544090e14fe18091e04f1fb0f0d7991cbe6e7e'] as parents of 369fd983d9e13330e9f12d9fce820deae84ea223
report
spliced in 82544090e14fe18091e04f1fb0f0d7991cbe6e7e as parents of 369fd983d9e13330e9f12d9fce820deae84ea223
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
This adds a base implementation of a function that tests if a given file from a
target repo came from the source repo. This will be used later to detect which
files did not come from the source repo during a merge, so we can merge those
files correctly instead of dropping them.
A fix for issue2653 with f5abbf51a76e introduced a discrepancy how default
branch should be denoted when converting with branchmap from different SCM.
E.g. for Git and Mercurial you need to use 'default' whilst for Perforce and
SVN you had to use 'None'. This changeset unifies 'default' for such purposes
whilst falling back to 'None' when no 'default' mapping specified.
Previously convert could only take one '--rev'. This change allows the user to
specify multiple --rev entries. For instance, this could allow converting
multiple branches (but not all branches) at once from git.
In this first patch, we disable support for this for all sources. Future
patches will enable it for select sources (like git).
In some cases we do not want to convert tags from the source repo to be tags in
the target repo (for instance, in a large repository, hgtags cause scaling
issues so we want to avoid them). This adds a config option to disable
converting tags.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
Conversion of a merge starts with p1 and re-adds the files that were changed in
the merge or came unmodified from p2. Files that are unmodified from p1 will
thus not be touched and take no time. Files that are unmodified from p2 would be
retrieved and rehashed. They would end up getting the same hash as in p2 and end
up reusing the filelog entry and look like the p1 case ... but it was slow.
Instead, make getchanges also return 'files that are unmodified from p2' so the
sink can reuse the existing p2 entry instead of calling getfile.
Reuse of filelog entries can make a big difference when files are big and with
long revlong chains so they take time to retrieve and hash, or when using an
expensive custom getfile function (think
http://mercurial.selenic.com/wiki/ConvertExtension#Customization with a code
reformatter).
This in combination with changes to reuse filectx entries in
localrepo._filecommit make 'unchanged from p2' almost as fast as 'unchanged
from p1'.
This is so far only implemented for the combination of hg source and hg sink.
This is a refactoring/optimization. It is covered by existing tests and show no
changes - which is a good thing.
For merges, we walk the files N-1 times, where N is the number of
parents. This means that for an octopus merge with 3 parents and 2
changed files, we actually fetch 6 files. This corrects the progress
output of the convert command when such commits are encountered.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
This makes it possible to estimate how long the "scanning source"
phase will take, if the specified source repo type supports a quick
"how many changes" check.
Convert will normally only process files that were changed in a source
revision, apply the filemap, and record it has a change in the target
repository. (If it ends up not really changing anything, nothing changes.)
That means that _if_ the filemap is changed before continuing an incremental
convert, the change will only kick in when the files it affects are modified in
a source revision and thus processed.
With --full, convert will make a full conversion every time and process
all files in the source repo and remove target repo files that shouldn't be
there. Filemap changes will thus kick in on the first converted revision, no
matter what is changed.
This flag should in most cases not make any difference but will make convert
significantly slower.
Other names has been considered for this feature, such as "resync", "sync",
"checkunmodified", "all" or "allfiles", but I found that they were less obvious
and required more explanation than "full" and were harder to describe
consistently.
Rollback or strip could leave a Mercurial repo with a shamap with revisions no
longer in the repository.
To ensure reliable conversions we now check that the commit actually exists and
consider it non-existing if it doesn't exist.
The name 'hascommit' sounds like something generic ... but it might
also throw exceptions in specific cases and it is thus (apparently)
only useful for splicemap.
Closemap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Closemap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for closemap but it seems like the same can be
achieved with a simple and much more powerful custom extension:
import hgext.convert.hg
class source(hgext.convert.hg.mercurial_source):
def getcommit(self, rev):
c = super(source, self).getcommit(rev)
if rev in ['''
d643f67092ff123f6a192d52f12e7d123dae229f
3a6a38229d418ba09cb7784c01453a93b4d363f8
facceca31c18f7ef800977055dbcbd7fcb5c5cb2
''']:
c.extra = c.extra.copy()
c.extra['close'] = '1'
return c
hgext.convert.hg.mercurial_source = source
Tagmap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Tagmap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for tagmap but it seems like the same can be achieved
with a (relatively) simple and much more powerful custom extension:
import hgext.convert.hg
def f(tag):
return tag.replace('some', 'other')
class source(hgext.convert.hg.mercurial_source):
def gettags(self):
return dict((f(tag), node)
for tag, node in in super(source, self).gettags().items())
def getfile(self, name, rev):
data, flags = super(source, self).getfile(name, rev)
if name == '.hgtags':
data = ''.join(l[:41] + f(l[41:]) + '\n' for l in data.splitlines())
return data, flags
hgext.convert.hg.mercurial_source = source
Previously, there was no way to rewrite tags on the fly while converting. Now,
we add similar logic to branchmap to provide a way to map old tags to new tags.
Currently, this is not enabled since there is not yet a command-line option.
The fix for issue2653 broke the ability to map the default branch of a source
repository to a non-default named branch in the destination repository. Leave
the default behaviour as is, but allow the branch name "None" to be used to map
to a non-default named branch in the destination repository.
destc is not a string and can thus not be os.path.join'ed. Convert would crash
if we ended up there ... but we wouldn't because both the sinks (hg and
subversion) sinks implement .revmapfile and "never" throws exceptions.
1. Introduced 2 levels of error handling for splicemap files
a. Check the splicemap file for rules which are same across different
types of source repos. This is done through enhancing parsesplicemap
function
b. Check revision string formats. Each repo may have their own format.
This is done usign checkrevformat function
c. Implemented the above two for hg
If you actively work with branches, sometimes you need to close old branches
which last commited hundreds revisions ago. After close you will see long
lines in graph visually spoiling history. This sort only moves closed
revisions as close as possible to parents and does not increase storage size
as datesort do.
When sorting revisions before converting them, we have to edit the revision
graph using splicemap entries. Otherwise, a spliced revision may be converted
before its synthetic parents. Invalid splicemap revisions are now detected
before starting the conversion.
Parsing the splicemap as a mapfile was a pain because map does not let us
override its parsing code and splicemap entries are not key/values. Besides we
had no need for mapfiles extra features. Just parse the splicemap and return a
dictionary.
This aligns the authormap option with the other three mapping options.
The old --authors option is still supported and 'hg help convert -v'
will still show it.