Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
When converting a Git repository to Mercurial at Mozilla, I encountered
a scenario where I didn't want `hg convert` to automatically add the
"committer: <committer>" line to commit messages. While I can hack around
this by rewriting the Git commit before it is fed into `hg convert`,
I figured it would be a useful knob to control.
This patch introduces a config option that allows lots of control
over the committer value. I initially implemented this as a single
boolean flag to control whether to save the committer message. But
then there was feedback that it would be useful to save the committer
in extra data. While this patch doesn't implement support for saving
in extra data, it does add a mechanism for extending which actions
to take on the committer field. We should be able to easily add
actions to save in extra data.
Some of the implemented features weren't asked for. But I figured they
could be useful. If nothing else they demonstrate the extensibility
of this mechanism.
common.commit.__init__ sets saverev=True by default. The side effect
of this is that the hg sink will always set the "convert_revision"
extras key to the commit being converted.
This patch adds a config option to disable this behavior.
While most consumers will want "convert_revision" to be a) written
b) with the exact Git commit that was converted, some have use cases
that prefer otherwise. In my case, I am performing significant
rewrites of a Git repository *before* it is fed into `hg convert`.
I have to do this because `hg convert` does not easily support the kind
of transform I desire, even with extensions. (For the curious, I am
"linearizing" the history of a GitHub repo by removing merge commits
which add little value to the final history. It isn't easy to do this
during `hg convert` because of Mercurial's file copy/rename metadata
requirements.)
In my scenario, my pre-convert transform stores a "convert_revision"
key in the Git commit object containing the original Git commit ID.
I want this original Git commit ID carried forward to Mercurial. By
disabling the setting of this extra during `hg convert` and copying
the value from the Git commit object, I can have the final
"convert_revision" extra key contain the original Git commit ID. An
added test verifies this exact scenario.
This feature could likely be implemented for other VCS sources. But
until someone needs the feature, I'm inclined to hold off implementing.
Git commit objects support storing arbitrary key-value metadata. While
there is no user-facing mechanism in Git to record these values, some
tools do record data here.
Currently, `hg convert` only handles the "author," "committer," and
"parent" keys in Git commit objects. All other keys are ignored. This
means that any custom keys are lost when converting Git repos to
Mercurial.
This patch implements support for copying a whitelist of extra keys
from Git commit objects to the "extras" dict of the destination. As
the added tests demonstate, this allows extra metadata to be preserved
during the conversion process.
This patch stops short of converting all metadata to "extras." We could
potentially implement this via `convert.git.extrakeys=*` or similar.
But copying everything by default is a bit dangerous because if Git
adds new keys to commit objects, we could find ourselves copying
things that shouldn't be copied!
This patch also assumes the source key is the same as the destination
key. We could implement support for prefixing the output key to
distinguish it as coming from Git. But until this feature is needed,
I'm inclined to hold off implementing it.
By default, Git applies rename and copy detection to 400 files. The
diff.renamelimit config option and -l argument to diff commands can
override this.
As part of converting some repositories in the wild, I was hitting
the default limit. Unfortunately, the warnings that Git prints in this
scenario are swallowed because the process running functionality in
common.py redirects stderr to /dev/null by default. This seems like
a bug, but a bug for another day.
This commit establishes a config option to send the rename limit
through to `git diff-tree`. The added tests demonstrate a too-low
rename limit doesn't result in copy metadata being recorded.
We are using read-only attributes that parse the perforce data on
demand. We are reading the data only once whenever an attribute is
requested and use it throughout the import process. This is equivalent
to the previous behavior, but we are avoiding reading from perforce when
we initialize the object, but instead run it during the actual import
process, when the first attribute is requested (usually getheads(), see
`convertcmd.convert`).
This is the only place where strutil is used. I don't think it's worth to
keep the strutil module, so inline it.
Also, strutil.rfindall() appears to have off-by-one error. 'end = c - 1' is
wrong because 'end' is exclusive.
Source revision data that exists in the revmap are ignored when pulling
data from Perforce as we consider them already imported. In case where
the `convertcmd.convert` algorithm requests a commit object for such
a revision we are creating it. This is usually the case for parent of
the first imported revision.
Split fetching the `describe` form from Perforce and the commit object creation
into two functions. This allows us to reuse the commit construction for
revisions passed from a revmap.
Don't set a head revision in cases where we have a revmap but no
changesets to import, as convertcmd.convert() treats them as heads of
to-imported revisions.
Implement `common.setrevmap` which is used to pass in a file with existing
revision mappings. This functionality is used by `convertcmd.convert` if it
exists and allows implementors such as the p4 converter to make use of an
existing mapping.
We are using the revmap to abort scanning and the repository for more information
if we already have the revision. This means we are allowing incremental imports
in cases where a revmap is provided.
We are using convert_revisions in other importers. In order to unify this
we are also using convert_revision for Perforce in addition to the original
'p4'.
We are iterating over p4changes. Make the continue condition more clear
and easier to add new conditions in future patches, by removing the list
comprehension and move the condition into the existing for-loop.
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.
One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
I've caught multiple extensions in the wild lying about being
'internal', so it's time to move the goalposts on people. Goalpost
moving will continue until third party extensions stop trying to
defeat the system.
The svn_config_get_config config call was being called at the module level, but
had the potential to throw permission denied errors if ~/.subversion/servers was
not readable. This could happen in certain test environments where the user
permissions were very particular.
This prevented the remotenames extension from loading, since it imports
convert's hg module, which imports convert's subversion module, which calls
this. The config is only ever used from this one constructor, so let's just move
it in to there.
The inventory property was deprecated in favor of root_inventory in bzr
2.5.0. Current version is 2.7.0.
I noticed this when testing locally on Python 2.6.9, which has warnings
turned on by default. The failure that occurs without this patch can be
seen on Python 2.7 by running with warnings enabled:
$ PYTHONWARNINGS=::DeprecationWarning make 'test-convert-bzr*'
The cPickle is renamed to _pickle in python3 and this C extension is available
in pickle which was not included in earlier versions. So imports are conditionalized
to import cPickle in py2 and pickle in py3. Moreover the use of pickle in py2 is
switched to cPickle as the C extension is faster. The hack is added in util.py and
the modules import util.pickle
Since (b) is banned, we should do the same for (a) for consistency.
a) from mercurial import hg
from mercurial.i18n import _
b) from . import hg
from .i18n import _
Fixes CVE-2016-3105 (1/1).
Previously, it was possible for the repository path passed to git-ls-remote
to be misinterpreted as a URL.
Always passing an absolute path to git is a simple way to avoid this.
Before, when converting revisions without also including their already
converted parents in convert.hg.revs, the parents would no longer be parents.
That seems unfortunate and we dare to assume that nobody ever wants that.
Instead, preserve parents that are outside the current convert range but
already have been converted.
The parents returned in getcommit() are unconditionally converted, so we
introduce a separate optparents with optional parents.
Before this patch, import-checker reports error below for importing
subversion python binding libraries.
stdlib import "svn.*" follows local import: mercurial
Before this patch, import-checker reports "relative import of stdlib
module" error for importing Pool and SubversionException from svn.core
in subversion.py.
To fix this relative import of stdlib module, this patch adds prefix
'svn.core.' to Pool and SubversionException in source.
These 'svn.core.' relative accessing shouldn't cause performance
impact, because there are much more code paths accessing to
'svn.core.' relative properties.
BTW, in transport.py, this error is avoided by assignment below.
SubversionException = svn.core.SubversionException
But this can't be used in subversion.py case, because:
- such assignment in indented code block causes "don't use camelcase
in identifiers" error of check-code.py
- but it should be placed in indented block, because svn is None at
failure of importing subversion python binding libraries (=
examination of 'svn' is needed)
mercurial_source.getchanges() seems to care about files whose nodeid
has changed even if their contents has not (i.e. it has been
reverted/backed out). The method uses ctx1.status(ctx2) to find
differencing files. However, that method is currently broken and
reports reverted changes as modified. In order to fix that method, we
first need to rewrite getchanges() using manifest.diff(), which does
report reverted files as modified (because it's about differences in
the manifest, so about nodeids).
The next commit will rewrite the way we find changes between two
manifests. By making the cache not care about the difference between
added and modified files, we don't require the rewritten code to care
about that difference either. Also extract the call to ctx.status() to
simplify the next commit.
Once we get a matcher down into manifestmerge, we can make narrowhg
work more easily and potentially let manifest.match().diff() do less
work in manifestmerge.
Before this patch, convert was using repo._bookmarks.write, a deprecated API
for saving bookmarks.
This patch changes the use of repo._bookmarks.write to
repo._bookmarks.recordchange.
Instead of reporting
spliced in ['82544090e14fe18091e04f1fb0f0d7991cbe6e7e'] as parents of 369fd983d9e13330e9f12d9fce820deae84ea223
report
spliced in 82544090e14fe18091e04f1fb0f0d7991cbe6e7e as parents of 369fd983d9e13330e9f12d9fce820deae84ea223
cvsps computes the parent revisions of log entries by walking the cvs log
sorted by (rcs, revision) and by iteratively maintaining a 'versions'
dictionary which maps a (rcs, branch) pair onto the last revision seen for that
pair. When log caching is on and a log cache exists, cvsps fails to set the
parent revisions of new log entries because it does not iterate over the log
cache in the parents computation. A complication is that a file rcs can change
(move to/from the attic), with respect to its value in the log cache, if the
file is removed/added back. This patch adds an iteration over the log cache to
update the rcs of cached log entries, if changed, and to properly populate the
'versions' dictionary.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
In some languages that have no caps, "DEPRECATED" and "deprecated" can be
translated to the same byte sequence. So it is too wild to exclude messages
by _("DEPRECATED").
Multiple --rev args on convert is a new feature, and was initially disabled for
all sources. It has since been enabled on git sources, and this patch enables it
on mercurial sources.
Recently we fixed converting merges to correctly sync changes from p2. We missed
the case of deletes though (so p2 deleted a file that p1 had not yet deleted,
and the file does not belong to the source).
The fix is to detect when p2 doesn't have the file, so we just sync it as a
delete to p1 in the merge.
Updated the test, and verified it failed before the fix.
This adds an option to not pull in gitsubmodules during a convert. This is
useful when converting large git repositories where gitsubmodules were allowed
historically, but are no longer wanted.
When converting a merge commit using a filemap convert (i.e. when moving
contents from the root of the repo into subdir1/), convert would silently drop
the entire contents of the target repo's p2. This was because when it built the
target commit, it did so by taking the target p1 and adding only the files that
changed in the source repo's merge commit.
This breaks in the case where the target repo has files that are unrelated to
the source repo (like in the case where you use convert to import a repo as a
subdirectory of another).
The fix is to use Mercurial's merge logic to detect which files in p2 we should
carry over to the merge. It follows three rules:
1) if the file belongs to the source, don't try to merge it. Rely on the list of
files provided to putcommit to be correct.
2) if the file requires merging or user input (change vs deleted), throw an
exception. We don't have enough info to do this.
3) if p2 has the newest, non-merge-requiring version of the file, take it
I've also added a test to cover this issue.
This is an implementation of the new targetfilebelongstosource() function for
the filemapper. It simply checks if the given file name is prefixed by any of
the rename destinations.
It is not a perfect implementation since it doesn't account for the filemap
specifying includes or excludes, but that makes the problem much harder, and
this implementation should suffice for most cases.
This adds a base implementation of a function that tests if a given file from a
target repo came from the source repo. This will be used later to detect which
files did not come from the source repo during a merge, so we can merge those
files correctly instead of dropping them.
There was a bug in the git convert code where if you copied a file and modified
the copy source in the same commit, and if the copy dest was alphabetically
earlier than the copy source, the converted version would use the copy dest
contents for both the source and the target.
The root of the bug is that the git diff-tree output is formatted like so:
:<mode> <mode> <oldhash> <newhash> <state> <src> <dest>
:100644 100644 c1ab79a15... 3dfc779ab... C069 oldname newname
:100644 100644 c1ab79a15... 03e2188a6... M oldname
The old code would always take the 'oldname' field as the name of the file being
processed, then it would try to do an extra convert for the newname. This works
for renames because it does a delete for the oldname and a create for the
newname.
For copies though, it ends up associating the copied content (3dfc779ab above)
with the oldname. It only happened when the dest was alphabetically before
because that meant the copy got processed before the modification.
The fix is the treat copy lines as affecting only the newname, and not marking
the oldname as processed.
On Windows Perforce command line client uses default system locale to encode
output. Using 'latin_1' causes locale-specific characters to be replaced with
question marks. With this patch we will use default locale by default whilst
allowing to specify it explicity with 'convert.p4.encoding' config option.
This is a potentially breaking change for any scripts relying on output treated
as in 'latin_1' encoding.
Also because hgext.convert.convcmd overwrites detected default system locale
with UTF-8 we had to introduce an import cycle in hgext.convert.p4 to retrieve
originally detected encoding from hgext.convert.convcmd.
As it turned out, even when getting relatively small files, concatenating
string data every time when new chunk is received is very inefficient.
Maintaining a string list of data chunks and concatenating everything in one go
at the end seems much more efficient - in my testing it made getting 40 MB file
7 times faster, whilst converting of a particularly big changelist with some big
files went down from 20 hours to 3 hours.
The conversion from git to hg was reading the remote branch list directly from
the origin server. If the origin's branch had moved forward since the last git
fetch, it would return a git hash which didn't exist locally, and therefore the
branch was not converted.
This changes it to rely on the local repo's refs/remotes list of branches
instead, so it's completely cut off from the server.
A fix for issue2653 with f5abbf51a76e introduced a discrepancy how default
branch should be denoted when converting with branchmap from different SCM.
E.g. for Git and Mercurial you need to use 'default' whilst for Perforce and
SVN you had to use 'None'. This changeset unifies 'default' for such purposes
whilst falling back to 'None' when no 'default' mapping specified.