p4fastimport: introducing a fast Perforce-to-Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extension from Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the Perforce
store on a Perforce server directly.
The core idea of p4fastimport is to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches are an exception). To
generate a filelog we read each file for an imported revision. A
file in Perforce is locally stored either in RCS format, as compressed
gzip, or as a flat file (binaries). If we do not find a version locally on
disk we fall back to downloading it from Perforce.
We generate manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
generate the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce we know
to which changelist it belongs, as its stored revisions contain the
changelist. E.g. 1.1422 is the file changed in changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse-look-up its offset in the sorted list of
changelists, which corresponds to its place in the changelog later,
and is therefore its correct linkrev.
Parallel imports: In order to run parallel imports we MUST hold only one
lock at a time, even if we import multiple filelogs at the same time.
However, filelogs share a single `fncache`, which would be corrupted if we
generated filelogs in parallel. To avoid this, repositories must be
generated with *fncache* disabled! This restricts `p4fastimport` with
workers to running only on case-sensitive file systems.
Test Plan:
The included tests, as well as multiple imports from a small testing
Perforce client. Afterwards, successfully ran `hg verify` and
`make tests`.
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
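The linkrev reverse lookup described above can be sketched as follows. This is a minimal illustration with hypothetical changelist numbers, not the extension's actual implementation: given the sorted list of imported changelists, a file revision's changelist number is looked up by binary search, and its offset in that list is its future position in the changelog.

```python
import bisect

# Sorted list of (original) Perforce changelist numbers being imported.
# The list's order matches the order changesets are written to the
# changelog, so an index into this list is the future linkrev.
changelists = [1400, 1410, 1422, 1431]

def linkrev(clnum):
    # Reverse-look-up the changelist number in the sorted list; the
    # offset found is the revision's position in the changelog.
    idx = bisect.bisect_left(changelists, clnum)
    assert changelists[idx] == clnum, 'changelist not in import set'
    return idx

# File revision 1.1422 belongs to changelist 1422, the third imported
# changeset, hence linkrev 2.
```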
# (c) 2017-present Facebook Inc.
"""p4fastimport - A fast importer from Perforce to Mercurial
|
|
|
|
|
|
|
|
Config example:
|
|
|
|
|
|
|
|
[p4fastimport]
|
|
|
|
# whether use worker or not
|
|
|
|
useworker = false
|
|
|
|
# trace copies?
|
|
|
|
copytrace = false
|
2017-04-25 17:29:39 +03:00
|
|
|
# if LFS is enabled, write only the metadata to disk, do not write the
|
|
|
|
# blob itself to the local cache.
|
|
|
|
lfspointeronly = false
|
2017-04-25 17:29:39 +03:00
|
|
|
# path to sqlite output file for lfs metadata
|
|
|
|
lfsmetadata = PATH
|
2017-04-25 17:29:39 +03:00
|
|
|
# path to sqlite output file for metadata
|
|
|
|
metadata = PATH
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
"""

from __future__ import absolute_import

import collections
import json
import sqlite3

from . import (
    p4,
    importer,
    filetransaction as ftrmod,
)
from .util import runworker, lastcl, decodefileflags

from mercurial.i18n import _
from mercurial.node import short, hex
from mercurial import (
    error,
    extensions,
    registrar,
    revlog,
    scmutil,
    verify,
)


def reposetup(ui, repo):
    def nothing(orig, *args, **kwargs):
        pass

    def yoloverify(orig, *args, **kwargs):
        # We have to set this directly, as the repo reads the lfs.bypass
        # config during its setup.
        repo.svfs.options['lfsbypass'] = True
        return orig(*args, **kwargs)

    def handlelfs(loaded):
        if loaded:
            lfs = extensions.find('lfs')
            extensions.wrapfunction(lfs.blobstore.local, 'write', nothing)
            extensions.wrapfunction(lfs.blobstore.local, 'read', nothing)

    extensions.wrapfunction(verify.verifier, 'verify', yoloverify)
    extensions.afterloaded('lfs', handlelfs)


def writebookmark(tr, repo, revisions, name):
    if len(revisions) > 0:
        marks = repo._bookmarks
        __, hexnode = revisions[-1]
        marks[name] = repo[hexnode].node()
        marks.recordchange(tr)


def writerevmetadata(revisions, outfile):
    """Write the mapping from changelist numbers to commit nodes into
    sqlite.
    """
    with sqlite3.connect(outfile, isolation_level=None) as conn:
        cur = conn.cursor()
        cur.execute("BEGIN TRANSACTION")
        cur.execute("""
        CREATE TABLE IF NOT EXISTS revision_mapping (
            "id" INTEGER PRIMARY KEY AUTOINCREMENT,
            "cl" INTEGER NOT NULL,
            "node" BLOB
        )""")
        cur.executemany(
            "INSERT INTO revision_mapping(cl, node) VALUES (?,?)",
            revisions)
        cur.execute("COMMIT")


def writelfsmetadata(largefiles, revisions, outfile):
    """Write the LFS mappings from OID to a depot path and its changelist
    number into sqlite. This way the LFS server can import the correct
    file from Perforce and map it to the correct OID.
    """
    with sqlite3.connect(outfile, isolation_level=None) as conn:
        cur = conn.cursor()
        cur.execute("BEGIN TRANSACTION")
        cur.execute("""
        CREATE TABLE IF NOT EXISTS p4_lfs_map(
            "id" INTEGER PRIMARY KEY AUTOINCREMENT,
            "cl" INTEGER NOT NULL,
            "node" BLOB,
            "oid" TEXT,
            "path" BLOB
        )""")
        inserts = []
        revdict = dict(revisions)
        for cl, path, oid in largefiles:
            inserts.append((cl, path, oid, revdict[cl]))

        cur.executemany(
            "INSERT INTO p4_lfs_map(cl, path, oid, node) VALUES (?,?,?,?)",
            inserts)
        cur.execute("COMMIT")
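As an illustration of the `p4_lfs_map` schema, here is a hypothetical round trip against an in-memory database (the depot path and OID values are made up): a row is inserted the way `writelfsmetadata` does, then the OID an LFS server would need is looked up by depot path.

```python
import sqlite3

# Hypothetical round trip through the p4_lfs_map schema: insert a
# (cl, path, oid, node) row, then look up the OID for a depot path.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""
CREATE TABLE IF NOT EXISTS p4_lfs_map(
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "cl" INTEGER NOT NULL,
    "node" BLOB,
    "oid" TEXT,
    "path" BLOB
)""")
cur.executemany(
    "INSERT INTO p4_lfs_map(cl, path, oid, node) VALUES (?,?,?,?)",
    [(1422, b'//depot/big.bin', 'sha256:aabb', b'\x01' * 20)])

# An LFS server can now resolve the blob OID for a depot path.
cur.execute("SELECT oid FROM p4_lfs_map WHERE path = ?",
            (b'//depot/big.bin',))
oid = cur.fetchone()[0]
```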


def create(tr, ui, repo, importset, filelogs):
    for filelog in filelogs:
        # If the Perforce server is case insensitive, a filelog can map to
        # multiple filenames. For example, A.txt and a.txt would show up
        # in the same filelog. It would be more appropriate to update the
        # file list after receiving the initial file list, but this would
        # not be parallel.
        fi = importer.FileImporter(ui, repo, importset, filelog)
        fileflags, largefiles, oldtiprev, newtiprev = fi.create(tr)
        yield 1, json.dumps({
            'newtiprev': newtiprev,
            'oldtiprev': oldtiprev,
            'fileflags': fileflags,
            'largefiles': largefiles,
            'depotname': filelog.depotfile,
            'localname': fi.relpath,
        })


cmdtable = {}
command = registrar.command(cmdtable)
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
@command(
|
|
|
|
'p4fastimport',
|
2017-04-25 17:29:39 +03:00
|
|
|
[('P', 'path', '.', _('path to the local depot store'), _('PATH')),
|
2017-05-02 06:43:11 +03:00
|
|
|
('B', 'bookmark', '', _('bookmark to set'), _('NAME')),
|
|
|
|
('', 'limit', '',
|
|
|
|
_('number of changelists to import at a time'), _('N'))],
|
|
|
|
_('[-P PATH] [-B NAME] [--limit N] [CLIENT]'),
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests, as well as multiple imports from a small testing
Perforce client. Afterwards, successfully run `hg verify`.
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
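The linkrev reverse lookup described in the summary can be sketched as follows. This is a hypothetical illustration, not the extension's actual code: it assumes revision `1.N` of a Perforce file records the original changelist `N` (hence the `-O` switch), and that a file revision's linkrev is that changelist's index in the sorted list of imported changelists. The names `changelists` and `linkrev_for` are invented for this sketch.

```python
import bisect

# Sorted "original" changelist numbers (pre-renumbering, as reported
# with -O) that are being imported; index i becomes changelog rev i.
changelists = [1401, 1422, 1430]

def linkrev_for(cl):
    # bisect gives the insertion point; an exact match is that CL's
    # offset into the sorted list, i.e. its future linkrev.
    idx = bisect.bisect_left(changelists, cl)
    if idx == len(changelists) or changelists[idx] != cl:
        raise LookupError('changelist %d not imported' % cl)
    return idx

print(linkrev_for(1422))  # → 1
```

In the real extension the list is offset by any already-imported revisions during an incremental import; the sketch only shows the sorted-list lookup itself.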
    inferrepo=True)
def p4fastimport(ui, repo, client, **opts):
    if 'fncache' in repo.requirements:
        raise error.Abort(_('fncache must be disabled'))

    if opts.get('bookmark'):
        scmutil.checknewlabel(repo, opts['bookmark'], 'bookmark')

    startcl = None
    if len(repo) > 0 and startcl is None:
        latestctx = list(repo.set("last(extra(p4changelist))"))
        if latestctx:
            startcl = lastcl(latestctx[0])
            ui.note(_('incremental import from changelist: %d, node: %s\n') %
                    (startcl, short(latestctx[0].node())))
    # A client defines checkout behavior for a user. It contains a list of
    # views. A view defines a set of files and directories to check out from
    # a Perforce server and their mappings to local disk, e.g.:
    #   //depot/foo/... //client/x/...
    # would map the files that are stored on the server under foo/* locally
    # under x/*.

    # 0. Fail if the specified client does not exist.
    if not p4.exists_client(client):
        raise error.Abort(_('p4 client %s does not exist.') % client)
    # 1. Return all the changelists touching files in a given client view.
    ui.note(_('loading changelist numbers.\n'))
    changelists = sorted(p4.parse_changes(client, startcl=startcl))
    ui.note(_('%d changelists to import.\n') % len(changelists))

    limit = len(changelists)
    if opts.get('limit'):
        limit = int(opts.get('limit'))
    run_import(ui, repo, client, changelists[0:limit], **opts)

def run_import(ui, repo, client, changelists, **opts):
    if len(changelists) == 0:
        return

    basepath = opts.get('path')
    startcl, endcl = changelists[0].cl, changelists[-1].cl
    # 2. Get a list of files that we will have to import from the depot,
    # with their full paths in the depot.
    ui.note(_('loading list of files.\n'))
    filelist = set()
    for fileinfo in p4.parse_filelist(client, startcl=startcl, endcl=endcl):
        if fileinfo['action'] in p4.ACTION_ARCHIVE:
            pass
        elif fileinfo['action'] in p4.SUPPORTED_ACTIONS:
            filelist.add(fileinfo['depotFile'])
        else:
            ui.warn(_('unknown action %s: %s\n') % (fileinfo['action'],
                                                    fileinfo['depotFile']))
    ui.note(_('%d files to import.\n') % len(filelist))

    importset = importer.ImportSet(repo, client, changelists,
                                   filelist, basepath)
    p4filelogs = []
    for i, f in enumerate(importset.filelogs()):
        ui.debug('reading filelog %s\n' % f.depotfile)
        ui.progress(_('reading filelog'), i, unit=_('filelogs'),
                    total=len(filelist))
        p4filelogs.append(f)
    ui.progress(_('reading filelog'), None)
    # runlist is used to topologically order files which were branched
    # (Perforce uses per-file branching, not per-repo branching). If we do
    # copytracing, a file A' which was branched off A will be considered a
    # copy of A. Therefore we need to import A before A'. In this case A'
    # will have a dependency counter one higher than A's, and therefore be
    # imported after A. If copy tracing is disabled this is not needed and
    # we can import files in arbitrary order.
    runlist = collections.OrderedDict()
    if ui.configbool('p4fastimport', 'copytrace', False):
        raise error.Abort(_('copytracing is broken'))
    else:
        runlist[0] = p4filelogs

    ui.note(_('importing repository.\n'))
    with repo.wlock(), repo.lock():
        for a, b in importset.caseconflicts:
            ui.warn(_('case conflict: %s and %s\n') % (a, b))

        # 3. Import files.
        count = 0
        fileinfo = {}
        largefiles = []
        ftr = ftrmod.filetransaction(ui.warn, repo.svfs)
        try:
            for filelogs in map(sorted, runlist.values()):
                wargs = (ftr, ui, repo, importset)
                for i, serialized in runworker(ui, create, wargs, filelogs):
                    data = json.loads(serialized)
                    ui.progress(_('importing filelogs'), count,
                                item=data['depotname'], unit='file',
                                total=len(p4filelogs))
                    # JSON decodes strings to unicode and converts int keys
                    # to strings, so we have to convert back.
                    # TODO: Find a better way to handle this.
                    fileinfo[data['depotname']] = {
                        'localname': data['localname'].encode('utf-8'),
                        'flags': decodefileflags(data['fileflags']),
                        'baserev': data['oldtiprev'],
                    }
                    largefiles.extend(data['largefiles'])
                    count += i
            ui.progress(_('importing filelogs'), None)
            ftr.close()
            tr = repo.transaction('import')
            try:
                # 4. Generate manifest and changelog based on the filelogs
                # we imported.
                clog = importer.ChangeManifestImporter(ui, repo, importset)
                revisions = []
                for cl, hgnode in clog.creategen(tr, fileinfo):
                    revisions.append((cl, hex(hgnode)))

                if opts.get('bookmark'):
                    ui.note(_('writing bookmark\n'))
                    writebookmark(tr, repo, revisions, opts['bookmark'])

                if ui.config('p4fastimport', 'lfsmetadata', None) is not None:
                    ui.note(_('writing lfs metadata to sqlite\n'))
                    writelfsmetadata(largefiles, revisions,
                                     ui.config('p4fastimport', 'lfsmetadata',
                                               None))

                if ui.config('p4fastimport', 'metadata', None) is not None:
                    ui.note(_('writing metadata to sqlite\n'))
                    writerevmetadata(revisions,
                                     ui.config('p4fastimport', 'metadata',
                                               None))

                tr.close()
                ui.note(_('%d revision(s), %d file(s) imported.\n') % (
                    len(changelists), count))
            finally:
                tr.release()
        finally:
            ftr.release()
@command('debugscanlfs',
         [('C', 'client', '', _('Perforce client to reverse lookup')),
          ('r', 'rev', '.', _('display LFS files in REV')),
          ('A', 'all', None, _('display LFS files in all revisions'))])
def debugscanlfs(ui, repo, **opts):
    lfs = extensions.find('lfs')

    def display(repo, filename, flog, rev):
        filenode = flog.node(rev)
        rawtext = flog.revision(filenode, raw=True)
        ptr = lfs.pointer.deserialize(rawtext)
        linkrev = flog.linkrev(rev)
        cl = int(repo[linkrev].extra()['p4changelist'])
        return _('%d %s %s %d %s\n') % (
            flog.linkrev(rev), hex(filenode), ptr.oid(), cl, filename)

    def batchfnmap(repo, client, infos):
        for filename, flog, rev in infos:
            whereinfo = p4.parse_where(client, filename)
            yield 1, display(repo, whereinfo['depotFile'], flog, rev)

    client = opts.get('client', None)
    todisplay = []
    if opts.get('all'):
        prefix, suffix = "data/", ".i"
        plen, slen = len(prefix), len(suffix)
        for fn, b, size in repo.store.datafiles():
            if size == 0 or fn[-slen:] != suffix or fn[:plen] != prefix:
                continue
            fn = fn[plen:-slen]
            flog = repo.file(fn)
            for rev in range(0, len(flog)):
                flags = flog.flags(rev)
                if bool(flags & revlog.REVIDX_EXTSTORED):
                    if client:
                        todisplay.append((fn, flog, rev))
                    else:
                        ui.write(display(repo, fn, flog, rev))
    else:
        revisions = repo.set(opts.get('rev', '.'))
        for ctx in revisions:
            for fn in ctx.manifest():
                fctx = ctx[fn]
                flog = fctx.filelog()
                flags = flog.flags(fctx.filerev())
                if bool(flags & revlog.REVIDX_EXTSTORED):
                    if client:
                        todisplay.append((fn, flog, fctx.filerev()))
                    else:
                        ui.write(display(repo, fn, flog, fctx.filerev()))

    if todisplay:
        args = (repo, client)
        for i, s in runworker(ui, batchfnmap, args, todisplay):
            ui.write(s)