Summary: We want to have just one entry point to Mercurial, namely the Rust binary. Getting rid of the `hg` Python script means that we finally can do things that only exist in Rust and not in Python.
Reviewed By: simpkins
Differential Revision: D13186374
fbshipit-source-id: f3c8cfe4beb7bf764172a8af04fd25202eca9af2
Summary:
Subrepo is another unloved features that we don't want to support.
Aggressively remove it everywhere, instead of just turning off configs.
I didn't spend much time to split this commit so it's smaller and more friendly
to review. But it seems tests are passing.
Reviewed By: sfilipco
Differential Revision: D14220099
fbshipit-source-id: adc512a047d99cd4bafd0362e3e9b24e71defe13
Summary:
The CVS branch support will break with the upcoming branchcache removal.
CVS repos are rare. Let's just remove the code.
Differential Revision: D14179863
fbshipit-source-id: 890ce34958d2efe4f2ef02b1d72cf92f6269378c
Summary:
Monotone repos are rare these days. Drop support for it.
This also solves an issue that test-convert-mtn.t uses named branches.
Differential Revision: D14059836
fbshipit-source-id: 1e9d4fe6fdc295393ff67c5e068b230b9ed0c0af
Summary:
Replace the default help for Mercurial with a curated list of interesting
commands, categorized by their use case.
Reviewed By: phillco
Differential Revision: D10356916
fbshipit-source-id: 65e578a4bfde7b0ad04e7107f4e77d8ea882d78a
Summary:
With `buck build`, the single hg binary won't be guarnateed to have
access to i18n messages because directories like `mercurial/locale`
do not exist on filesystem. It could also mess up with `PYTHONPATH`
somehow because the python binary wrapper sometimes ignores
`PYTHONPATH`.
So let's add a hghave feature for it. And gate troublesome tests
with `#if normal-layout`.
Reviewed By: DurhamG, phillco
Differential Revision: D6879876
fbshipit-source-id: 3d63605b55c8f7096093b89be824add2ec491f81
Summary:
The helper could be used in individual tests to enable chg if chg exists.
This allows us to have more precise control on what tests to use chg instead
of using a global flag in run-tests.py.
This makes certain tests containing many hg commands much faster. For example,
`test-revset.t` took 99 seconds before:
% ./run-tests.py test-revset.t --time
.
# Ran 1 tests, 0 skipped, 0 failed.
# Producing time report
start end cuser csys real Test
0.000 99.990 86.410 12.000 99.990 test-revset.t
And 10 seconds after:
% ./run-tests.py test-revset.t --time
.
# Ran 1 tests, 0 skipped, 0 failed.
# Producing time report
start end cuser csys real Test
0.000 10.080 0.380 0.130 10.080 test-revset.t
Also enable it for some other tests. Note the whitelist is not complete. We
probably want to whitelist more tests in the future.
The feature could be opted out by deleting `contrib/chg/chg`.
Reviewed By: phillco
Differential Revision: D6767036
fbshipit-source-id: 8220cf408aa198d5d8e2ca5127ca60e2070d3444
Summary: Also change the internal API so it no longer accepts the "heads" argument.
Reviewed By: ryanmce
Differential Revision: D6745865
fbshipit-source-id: 368742be49b192f7630421003552d0a10eb0b76d
# skip-blame because this was mechanically rewritten the following script. I
ran it on both *.t and *.py, but none of the *.py changes were proper. All *.t
ones appear to be, and they run without addition failures on both Windows and
Linux.
import argparse
import os
import re
ap = argparse.ArgumentParser()
ap.add_argument('path', nargs='+')
opts = ap.parse_args()
globre = re.compile(r'^(.*) \(glob\)(.*)$')
for p in opts.path:
tmp = p + '.tmp'
with open(p, 'rb') as src, open(tmp, 'wb') as dst:
for line in src:
m = globre.match(line)
if not m or '$LOCALIP' in line or '*' in line:
dst.write(line)
continue
if '?' in line[:-3] or ('?' in line[:-3] and line[-3:] != '(?)'):
dst.write(line)
continue
dst.write(m.group(1) + m.group(2) + '\n')
os.unlink(p)
os.rename(tmp, p)
Converting from CVS to Mercurial assumes that CVS log messages in "cvs
rlog" output are encoded in UTF-8 (or basic Latin-1). But cvs itself
is usually unaware of encoding of log messages, in practice.
Therefore, if there are commits, of which log message is encoded in
other than UTF-8, log message of corresponded revisions in the
converted repository will be broken.
To avoid such broken log messages, this patch transcodes CVS log
messages by encoding specified via "convert.cvsps.logencoding"
configuration.
This patch accepts multiple encoding for convenience, because
"multiple encoding mixed in a repository" easily occurs. For example,
UTF-8 (recent POSIX), cp932 (Windows), and EUC-JP (legacy POSIX) are
well known encoding for Japanese.
When converting a Git repository to Mercurial at Mozilla, I encountered
a scenario where I didn't want `hg convert` to automatically add the
"committer: <committer>" line to commit messages. While I can hack around
this by rewriting the Git commit before it is fed into `hg convert`,
I figured it would be a useful knob to control.
This patch introduces a config option that allows lots of control
over the committer value. I initially implemented this as a single
boolean flag to control whether to save the committer message. But
then there was feedback that it would be useful to save the committer
in extra data. While this patch doesn't implement support for saving
in extra data, it does add a mechanism for extending which actions
to take on the committer field. We should be able to easily add
actions to save in extra data.
Some of the implemented features weren't asked for. But I figured they
could be useful. If nothing else they demonstrate the extensibility
of this mechanism.
common.commit.__init__ sets saverev=True by default. The side effect
of this is that the hg sink will always set the "convert_revision"
extras key to the commit being converted.
This patch adds a config option to disable this behavior.
While most consumers will want "convert_revision" to be a) written
b) with the exact Git commit that was converted, some have use cases
that prefer otherwise. In my case, I am performing significant
rewrites of a Git repository *before* it is fed into `hg convert`.
I have to do this because `hg convert` does not easily support the kind
of transform I desire, even with extensions. (For the curious, I am
"linearizing" the history of a GitHub repo by removing merge commits
which add little value to the final history. It isn't easy to do this
during `hg convert` because of Mercurial's file copy/rename metadata
requirements.)
In my scenario, my pre-convert transform stores a "convert_revision"
key in the Git commit object containing the original Git commit ID.
I want this original Git commit ID carried forward to Mercurial. By
disabling the setting of this extra during `hg convert` and copying
the value from the Git commit object, I can have the final
"convert_revision" extra key contain the original Git commit ID. An
added test verifies this exact scenario.
This feature could likely be implemented for other VCS sources. But
until someone needs the feature, I'm inclined to hold off implementing.
Git commit objects support storing arbitrary key-value metadata. While
there is no user-facing mechanism in Git to record these values, some
tools do record data here.
Currently, `hg convert` only handles the "author," "committer," and
"parent" keys in Git commit objects. All other keys are ignored. This
means that any custom keys are lost when converting Git repos to
Mercurial.
This patch implements support for copying a whitelist of extra keys
from Git commit objects to the "extras" dict of the destination. As
the added tests demonstate, this allows extra metadata to be preserved
during the conversion process.
This patch stops short of converting all metadata to "extras." We could
potentially implement this via `convert.git.extrakeys=*` or similar.
But copying everything by default is a bit dangerous because if Git
adds new keys to commit objects, we could find ourselves copying
things that shouldn't be copied!
This patch also assumes the source key is the same as the destination
key. We could implement support for prefixing the output key to
distinguish it as coming from Git. But until this feature is needed,
I'm inclined to hold off implementing it.
By default, Git applies rename and copy detection to 400 files. The
diff.renamelimit config option and -l argument to diff commands can
override this.
As part of converting some repositories in the wild, I was hitting
the default limit. Unfortunately, the warnings that Git prints in this
scenario are swallowed because the process running functionality in
common.py redirects stderr to /dev/null by default. This seems like
a bug, but a bug for another day.
This commit establishes a config option to send the rename limit
through to `git diff-tree`. The added tests demonstrate a too-low
rename limit doesn't result in copy metadata being recorded.
Fixes CVE-2016-3105 (1/1).
Previously, it was possible for the repository path passed to git-ls-remote
to be misinterpreted as a URL.
Always passing an absolute path to git is a simple way to avoid this.
This adds an option to not pull in gitsubmodules during a convert. This is
useful when converting large git repositories where gitsubmodules were allowed
historically, but are no longer wanted.
On Windows Perforce command line client uses default system locale to encode
output. Using 'latin_1' causes locale-specific characters to be replaced with
question marks. With this patch we will use default locale by default whilst
allowing to specify it explicity with 'convert.p4.encoding' config option.
This is a potentially breaking change for any scripts relying on output treated
as in 'latin_1' encoding.
Also because hgext.convert.convcmd overwrites detected default system locale
with UTF-8 we had to introduce an import cycle in hgext.convert.p4 to retrieve
originally detected encoding from hgext.convert.convcmd.
Previously all git remotes were created as "remote/foo". This patch adds a
configuration option for deciding what the prefix should be. This is useful if
you want the bookmarks to be "origin/foo" like they are in git, or if you're
integrating with the remotenames extension and don't want the local remote/foo
bookmarks to overlap with the remote foo bookmarks.
This creates the convert.hg.sourcename config option which will embed a user
defined name into each commit created by the convert. This is useful when using
the convert extension to merge several repositories together and we want to
record where each commit came from.
Previously convert could only take one '--rev'. This change allows the user to
specify multiple --rev entries. For instance, this could allow converting
multiple branches (but not all branches) at once from git.
In this first patch, we disable support for this for all sources. Future
patches will enable it for select sources (like git).
In some cases we do not want to convert tags from the source repo to be tags in
the target repo (for instance, in a large repository, hgtags cause scaling
issues so we want to avoid them). This adds a config option to disable
converting tags.
This was implied in issue3486, which specifically asked for subrepo support in
lfconvert. Now that lfconvert uses the convert extension internally when going
to normal files, the issue is half fixed. But now even non largefile repos
benefit when other transformations are needed.
Supporting a full subrepo tree conversion from a single command doesn't seem
reasonable, given the number of options that can be provided, and the
transformations that would need to occur when entering a subrepo (consider
'filemap' paths). Instead, this allows the user to incrementally convert each
hg subrepo from bottom up like so:
# so convert knows the dest type when it sees a non empty dest dir
$ hg init converted
$ hg convert orig/sub1 converted/sub1
$ hg convert orig/sub2 converted/sub2
$ hg convert orig converted
This allows different options to be applied to different subrepos more readily.
It assumes the shamap is in the default location in each converted subrepo for
simplicity. It also allows for a subrepo to be cloned into place, in case _it_
doesn't need a conversion. I was able to convert away from using
largefiles/bfiles in several subrepos with this mechanism.
This default mirrors the default for 'git diff'. Other commands have slightly
different defaults -- for example, the move/copy detection for 'git blame'
assumes that a hunk is moved if more than 40 alphanumeric characters are the
same, or copied if more than 20 alphanumeric characters are the same. 50% seems
to be the most common default, though.
Git is fairly unique among VCSes in that it doesn't record copies and renames,
instead choosing to detect them on the fly. Since Mercurial expects copies and
renames to be recorded, it can be valuable to preserve this history while
converting a Git repository to Mercurial. This patch adds a new convert option,
called 'convert.git.similarity', which determines how similar files must be to
be treated as renames or copies.
Convert will normally only process files that were changed in a source
revision, apply the filemap, and record it has a change in the target
repository. (If it ends up not really changing anything, nothing changes.)
That means that _if_ the filemap is changed before continuing an incremental
convert, the change will only kick in when the files it affects are modified in
a source revision and thus processed.
With --full, convert will make a full conversion every time and process
all files in the source repo and remove target repo files that shouldn't be
there. Filemap changes will thus kick in on the first converted revision, no
matter what is changed.
This flag should in most cases not make any difference but will make convert
significantly slower.
Other names has been considered for this feature, such as "resync", "sync",
"checkunmodified", "all" or "allfiles", but I found that they were less obvious
and required more explanation than "full" and were harder to describe
consistently.
We used to have two slightly different message which people wouldn't read...
and then complain that they couldn't find the global options or examples.
So we unify them into one message that's upfront that STUFF IS
INTENTIONALLY HIDDEN and that looks more like our normal hint style.
Closemap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Closemap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for closemap but it seems like the same can be
achieved with a simple and much more powerful custom extension:
import hgext.convert.hg
class source(hgext.convert.hg.mercurial_source):
def getcommit(self, rev):
c = super(source, self).getcommit(rev)
if rev in ['''
d643f67092ff123f6a192d52f12e7d123dae229f
3a6a38229d418ba09cb7784c01453a93b4d363f8
facceca31c18f7ef800977055dbcbd7fcb5c5cb2
''']:
c.extra = c.extra.copy()
c.extra['close'] = '1'
return c
hgext.convert.hg.mercurial_source = source
Tagmap solves a very specific use case. It would be better to have a more
generic solution than to have to maintain this forever.
Tagmap has not been released yet and removing it now will not break any
backward compatibility contract.
There is no test coverage for tagmap but it seems like the same can be achieved
with a (relatively) simple and much more powerful custom extension:
import hgext.convert.hg
def f(tag):
return tag.replace('some', 'other')
class source(hgext.convert.hg.mercurial_source):
def gettags(self):
return dict((f(tag), node)
for tag, node in in super(source, self).gettags().items())
def getfile(self, name, rev):
data, flags = super(source, self).getfile(name, rev)
if name == '.hgtags':
data = ''.join(l[:41] + f(l[41:]) + '\n' for l in data.splitlines())
return data, flags
hgext.convert.hg.mercurial_source = source
This adds a new root hghave to test against. Almost all of these are a
subset of unix-permissions, but that is also used for checking exec
bit handling.
The existing knobs for controlling which revisions to convert were often
insufficient. Revsets is a shiny hammer that provides a better solution.
Revsets has been introduced in --rev handling in a lot of other places while
being more or less backwards compatible. Doing the same here would be a much
more elegant ... but that would unfortunately not work in this case. "--rev 7"
used to mean revision 0 to 7 - it would be an unacceptable change if it
suddenly just meant revision 7.
Instead we introduce a new configuration setting. It will only work for
Mercurial repositories so adding a new commandline option for it would not be a
nice solution.
There is no way to use the fancy deprecation markup for configuration settings
so we just remove the documentation of hg.startrev.
If you actively work with branches, sometimes you need to close old branches
which last commited hundreds revisions ago. After close you will see long
lines in graph visually spoiling history. This sort only moves closed
revisions as close as possible to parents and does not increase storage size
as datesort do.
Mercurial would sometimes exit with:
abort: No such file or directory
where str of the actual OSError exception was the more helpful:
[Errno 2] No such file or directory: ''
The exception will now always show the filename and quote it:
abort: No such file or directory: ''