p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
# (c) 2017-present Facebook Inc.
|
|
|
|
from __future__ import absolute_import
|
|
|
|
|
|
|
|
import collections
|
|
|
|
import gzip
|
|
|
|
import os
|
|
|
|
import re
|
|
|
|
|
|
|
|
from mercurial.i18n import _
|
2017-04-20 15:47:18 +03:00
|
|
|
from mercurial.node import nullid, short
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
from mercurial import (
|
|
|
|
error,
|
2017-04-25 17:29:39 +03:00
|
|
|
extensions,
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
util,
|
|
|
|
)
|
|
|
|
|
|
|
|
from . import p4
|
2017-04-25 17:29:39 +03:00
|
|
|
from .util import caseconflict, localpath
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
2017-06-19 21:54:10 +03:00
|
|
|
KEYWORD_REGEX = "\$(Id|Header|DateTime|" + \
|
|
|
|
"Date|Change|File|" + \
|
|
|
|
"Revision|Author).*?\$"
|
|
|
|
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
class ImportSet(object):
|
2017-04-25 17:29:39 +03:00
|
|
|
def __init__(self, repo, client, changelists, filelist, storagepath):
|
2017-04-20 09:33:06 +03:00
|
|
|
self.repo = repo
|
2017-04-25 17:29:39 +03:00
|
|
|
self.client = client
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
self.changelists = sorted(changelists)
|
|
|
|
self.filelist = filelist
|
|
|
|
self.storagepath = storagepath
|
|
|
|
|
|
|
|
def linkrev(self, cl):
|
|
|
|
return self._linkrevmap[cl]
|
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def _linkrevmap(self):
|
2017-04-20 09:33:06 +03:00
|
|
|
start = len(self.repo)
|
|
|
|
return {c.cl: idx + start for idx, c in enumerate(self.changelists)}
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def caseconflicts(self):
|
|
|
|
return caseconflict(self.filelist)
|
|
|
|
|
|
|
|
def filelogs(self):
|
2017-05-02 06:43:11 +03:00
|
|
|
for filelog in p4.parse_filelogs(self.repo.ui, self.client,
|
|
|
|
self.changelists, self.filelist):
|
2017-05-02 06:43:11 +03:00
|
|
|
yield filelog
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
class ChangeManifestImporter(object):
|
|
|
|
def __init__(self, ui, repo, importset):
|
|
|
|
self._ui = ui
|
|
|
|
self._repo = repo
|
|
|
|
self._importset = importset
|
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def usermap(self):
|
|
|
|
m = {}
|
|
|
|
for user in p4.parse_usermap():
|
|
|
|
m[user['User']] = '%s <%s>' % (user['FullName'], user['Email'])
|
|
|
|
return m
|
|
|
|
|
2017-04-20 09:33:06 +03:00
|
|
|
def create(self, *args, **kwargs):
|
|
|
|
return list(self.creategen(*args, **kwargs))
|
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
def creategen(self, tr, fileinfo):
|
2017-04-20 09:33:06 +03:00
|
|
|
mrevlog = self._repo.manifestlog._revlog
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
clog = self._repo.changelog
|
2017-04-20 09:33:06 +03:00
|
|
|
cp1 = self._repo['tip'].node()
|
|
|
|
cp2 = nullid
|
2017-05-02 06:43:11 +03:00
|
|
|
p1 = self._repo[cp1]
|
|
|
|
mp1 = p1.manifestnode()
|
|
|
|
mp2 = nullid
|
|
|
|
mf = p1.manifest().copy()
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
for i, change in enumerate(self._importset.changelists):
|
2017-04-20 09:33:06 +03:00
|
|
|
# invalidate caches so that the lookup works
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
self._ui.progress(_('importing change'), pos=i, item=change,
|
|
|
|
unit='changes', total=len(self._importset.changelists))
|
|
|
|
|
|
|
|
added, modified, removed = change.files
|
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
changed = set()
|
|
|
|
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
# generate manifest mappings of filenames to filenodes
|
2017-04-25 17:29:39 +03:00
|
|
|
for depotname in removed:
|
|
|
|
if depotname not in self._importset.filelist:
|
|
|
|
continue
|
|
|
|
localname = fileinfo[depotname]['localname']
|
|
|
|
if localname in mf:
|
|
|
|
changed.add(localname)
|
|
|
|
del mf[localname]
|
|
|
|
|
|
|
|
for depotname in added + modified:
|
|
|
|
if depotname not in self._importset.filelist:
|
|
|
|
continue
|
|
|
|
info = fileinfo[depotname]
|
|
|
|
localname, baserev = info['localname'], info['baserev']
|
|
|
|
hgfilelog = self._repo.file(localname)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
try:
|
2017-04-25 17:29:39 +03:00
|
|
|
mf[localname] = hgfilelog.node(baserev)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
except (error.LookupError, IndexError):
|
2017-04-25 17:29:39 +03:00
|
|
|
raise error.Abort("can't find rev %d for %s cl %d" % (
|
|
|
|
baserev, localname, change.cl))
|
|
|
|
changed.add(localname)
|
|
|
|
if change.cl in info['flags']:
|
|
|
|
mf.setflag(localname, info['flags'][change.cl])
|
|
|
|
fileinfo[depotname]['baserev'] += 1
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
2017-04-20 09:33:06 +03:00
|
|
|
linkrev = self._importset.linkrev(change.cl)
|
2017-05-02 06:43:11 +03:00
|
|
|
oldmp1 = mp1
|
2017-04-20 09:33:06 +03:00
|
|
|
mp1 = mrevlog.addrevision(mf.text(mrevlog._usemanifestv2), tr,
|
2017-05-02 06:43:11 +03:00
|
|
|
linkrev, mp1, mp2)
|
2017-04-20 09:33:06 +03:00
|
|
|
self._ui.debug('changelist %d: writing manifest. '
|
|
|
|
'node: %s p1: %s p2: %s linkrev: %d\n' % (
|
2017-05-02 06:43:11 +03:00
|
|
|
change.cl, short(mp1), short(oldmp1), short(mp2), linkrev))
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
desc = change.parsed['desc']
|
|
|
|
if desc == '':
|
|
|
|
desc = '** empty changelist description **'
|
|
|
|
desc = desc.decode('ascii', 'ignore')
|
|
|
|
|
|
|
|
shortdesc = desc.splitlines()[0]
|
|
|
|
username = change.parsed['user']
|
2017-05-03 01:43:39 +03:00
|
|
|
username = self.usermap.get(username, username)
|
2017-04-20 09:33:06 +03:00
|
|
|
self._ui.debug('changelist %d: writing changelog: %s\n' % (
|
|
|
|
change.cl, shortdesc))
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
cp1 = clog.add(
|
|
|
|
mp1,
|
2017-04-25 17:29:39 +03:00
|
|
|
changed,
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
desc,
|
|
|
|
tr,
|
|
|
|
cp1,
|
|
|
|
cp2,
|
|
|
|
user=username,
|
|
|
|
date=(change.parsed['time'], 0),
|
|
|
|
extra={'p4changelist': change.cl})
|
2017-04-20 09:33:06 +03:00
|
|
|
yield change.cl, cp1
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
self._ui.progress(_('importing change'), pos=None)
|
|
|
|
|
|
|
|
class RCSImporter(collections.Mapping):
|
|
|
|
def __init__(self, path):
|
|
|
|
self._path = path
|
|
|
|
|
|
|
|
@property
|
|
|
|
def rcspath(self):
|
|
|
|
return '%s,v' % self._path
|
|
|
|
|
|
|
|
def __getitem__(self, rev):
|
|
|
|
if rev in self.revisions:
|
|
|
|
return self.content(rev)
|
|
|
|
return IndexError
|
|
|
|
|
|
|
|
def __len__(self):
|
|
|
|
return len(self.revisions)
|
|
|
|
|
|
|
|
def __iter__(self):
|
|
|
|
for r in self.revisions:
|
|
|
|
yield r, self[r]
|
|
|
|
|
|
|
|
def content(self, rev):
|
|
|
|
text = None
|
|
|
|
if os.path.isfile(self.rcspath):
|
|
|
|
cmd = 'co -q -p1.%d %s' % (rev, util.shellquote(self.rcspath))
|
|
|
|
with util.popen(cmd, mode='rb') as fp:
|
|
|
|
text = fp.read()
|
|
|
|
return text
|
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def revisions(self):
|
|
|
|
revs = set()
|
|
|
|
if os.path.isfile(self.rcspath):
|
2017-05-11 23:45:25 +03:00
|
|
|
stdout = util.popen('rlog %s 2>%s'
|
|
|
|
% (util.shellquote(self.rcspath), os.devnull),
|
|
|
|
mode='rb')
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
for l in stdout.readlines():
|
|
|
|
m = re.match('revision 1.(\d+)', l)
|
|
|
|
if m:
|
|
|
|
revs.add(int(m.group(1)))
|
|
|
|
return revs
|
|
|
|
|
|
|
|
T_FLAT, T_GZIP = 1, 2
|
|
|
|
|
|
|
|
class FlatfileImporter(collections.Mapping):
|
|
|
|
def __init__(self, path):
|
|
|
|
self._path = path
|
|
|
|
|
|
|
|
@property
|
|
|
|
def dirpath(self):
|
|
|
|
return '%s,d' % self._path
|
|
|
|
|
|
|
|
def __len__(self):
|
|
|
|
return len(self.revisions)
|
|
|
|
|
|
|
|
def __iter__(self):
|
|
|
|
for r in self.revisions:
|
|
|
|
yield r, self[r]
|
|
|
|
|
|
|
|
def filepath(self, rev):
|
|
|
|
flat = '%s/1.%d' % (self.dirpath, rev)
|
|
|
|
gzip = '%s/1.%d.gz' % (self.dirpath, rev)
|
|
|
|
if os.path.exists(flat):
|
|
|
|
return flat, T_FLAT
|
|
|
|
if os.path.exists(gzip):
|
|
|
|
return gzip, T_GZIP
|
|
|
|
return None, None
|
|
|
|
|
|
|
|
def __getitem__(self, rev):
|
|
|
|
text = self.content(rev)
|
|
|
|
if text is None:
|
|
|
|
raise IndexError
|
|
|
|
return text
|
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def revisions(self):
|
|
|
|
revs = set()
|
|
|
|
if os.path.isdir(self.dirpath):
|
|
|
|
for name in os.listdir(self.dirpath):
|
|
|
|
revs.add(int(name.split('.')[1]))
|
|
|
|
return revs
|
|
|
|
|
|
|
|
def content(self, rev):
|
|
|
|
path, type = self.filepath(rev)
|
|
|
|
text = None
|
|
|
|
if type == T_GZIP:
|
|
|
|
with gzip.open(path, 'rb') as fp:
|
|
|
|
text = fp.read()
|
|
|
|
if type == T_FLAT:
|
|
|
|
with open(path, 'rb') as fp:
|
|
|
|
text = fp.read()
|
|
|
|
return text
|
|
|
|
|
|
|
|
class P4FileImporter(collections.Mapping):
|
|
|
|
"""Read a file from Perforce in case we cannot find it locally, in
|
|
|
|
particular when there was branch or a rename"""
|
2017-04-20 09:33:06 +03:00
|
|
|
def __init__(self, p4filelog):
|
|
|
|
self._p4filelog = p4filelog # type: p4.P4Filelog
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
def __len__(self):
|
|
|
|
return len(self.revisions)
|
|
|
|
|
|
|
|
def __iter__(self):
|
|
|
|
for r in self.revisions:
|
|
|
|
yield r, self[r]
|
|
|
|
|
|
|
|
def __getitem__(self, rev):
|
|
|
|
text = self.content(rev)
|
|
|
|
if text is None:
|
|
|
|
raise IndexError
|
|
|
|
return text
|
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def revisions(self):
|
2017-04-20 09:33:06 +03:00
|
|
|
return self._p4filelog.revisions
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
def content(self, clnum):
|
2017-04-20 09:33:06 +03:00
|
|
|
return p4.get_file(self._p4filelog.depotfile, clnum=clnum)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
class FileImporter(object):
|
2017-04-20 09:33:06 +03:00
|
|
|
def __init__(self, ui, repo, importset, p4filelog):
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
self._ui = ui
|
|
|
|
self._repo = repo
|
2017-04-20 09:33:06 +03:00
|
|
|
self._importset = importset
|
2017-04-20 09:33:06 +03:00
|
|
|
self._p4filelog = p4filelog # type: p4.P4Filelog
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
2017-05-16 22:36:23 +03:00
|
|
|
@util.propertycache
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
def relpath(self):
|
2017-04-25 17:29:39 +03:00
|
|
|
client = self._importset.client
|
|
|
|
where = p4.parse_where(client, self.depotfile)
|
|
|
|
return where['clientFile'].replace('//%s/' % client, '')
|
2017-04-20 09:33:06 +03:00
|
|
|
|
|
|
|
@property
|
|
|
|
def depotfile(self):
|
|
|
|
return self._p4filelog.depotfile
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
@util.propertycache
|
|
|
|
def storepath(self):
|
2017-04-25 17:29:39 +03:00
|
|
|
path = os.path.join(self._importset.storagepath,
|
|
|
|
localpath(self.depotfile))
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
if p4.config('caseHandling') == 'insensitive':
|
|
|
|
return path.lower()
|
|
|
|
return path
|
|
|
|
|
2017-04-20 09:33:06 +03:00
|
|
|
def hgfilelog(self):
|
|
|
|
return self._repo.file(self.relpath)
|
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
def findlfs(self):
|
|
|
|
try:
|
|
|
|
return extensions.find('lfs')
|
|
|
|
except KeyError:
|
|
|
|
pass
|
|
|
|
return None
|
|
|
|
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
def create(self, tr, copy_tracer=None):
|
|
|
|
assert tr is not None
|
2017-04-20 09:33:06 +03:00
|
|
|
p4fi = P4FileImporter(self._p4filelog)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
rcs = RCSImporter(self.storepath)
|
|
|
|
flat = FlatfileImporter(self.storepath)
|
|
|
|
local_revs = rcs.revisions | flat.revisions
|
|
|
|
|
2017-05-02 06:43:11 +03:00
|
|
|
revs = set()
|
2017-04-20 09:33:06 +03:00
|
|
|
for c in self._importset.changelists:
|
2017-04-20 09:33:06 +03:00
|
|
|
if c.cl in p4fi.revisions and not self._p4filelog.isdeleted(c.cl):
|
2017-05-02 06:43:11 +03:00
|
|
|
revs.add(c)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
fileflags = collections.defaultdict(dict)
|
|
|
|
lastlinkrev = 0
|
2017-04-20 09:33:06 +03:00
|
|
|
|
|
|
|
hgfilelog = self.hgfilelog()
|
|
|
|
origlen = len(hgfilelog)
|
2017-04-25 17:29:39 +03:00
|
|
|
largefiles = []
|
2017-04-25 17:29:39 +03:00
|
|
|
lfsext = self.findlfs()
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
for c in sorted(revs):
|
2017-04-20 09:33:06 +03:00
|
|
|
linkrev = self._importset.linkrev(c.cl)
|
2017-04-20 09:33:06 +03:00
|
|
|
fparent1, fparent2 = nullid, nullid
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
# invariant: our linkrevs do not criss-cross. They are monotonically
|
|
|
|
# increasing. See https://www.mercurial-scm.org/wiki/CrossedLinkrevs
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
assert linkrev >= lastlinkrev
|
|
|
|
lastlinkrev = linkrev
|
|
|
|
|
2017-04-20 09:33:06 +03:00
|
|
|
if len(hgfilelog) > 0:
|
|
|
|
fparent1 = hgfilelog.tip()
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
|
|
|
# select the content
|
|
|
|
text = None
|
|
|
|
if c.origcl in local_revs:
|
|
|
|
if c.origcl in rcs.revisions:
|
|
|
|
text, src = rcs.content(c.origcl), 'rcs'
|
|
|
|
elif c.origcl in flat.revisions:
|
|
|
|
text, src = flat.content(c.origcl), 'gzip'
|
|
|
|
elif c.cl in p4fi.revisions:
|
|
|
|
text, src = p4fi.content(c.cl), 'p4'
|
|
|
|
if text is None:
|
|
|
|
raise error.Abort('error generating file content %d %s' % (
|
|
|
|
c.cl, self.relpath))
|
|
|
|
|
|
|
|
meta = {}
|
2017-04-20 09:33:06 +03:00
|
|
|
if self._p4filelog.isexec(c.cl):
|
2017-04-25 17:29:39 +03:00
|
|
|
fileflags[c.cl] = 'x'
|
2017-04-20 09:33:06 +03:00
|
|
|
if self._p4filelog.issymlink(c.cl):
|
2017-04-25 17:29:39 +03:00
|
|
|
fileflags[c.cl] = 'l'
|
2017-04-20 09:33:06 +03:00
|
|
|
if self._p4filelog.iskeyworded(c.cl):
|
2017-06-19 21:54:10 +03:00
|
|
|
text = re.sub(KEYWORD_REGEX, r'$\1$', text)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
node = hgfilelog.add(text, meta, tr, linkrev, fparent1, fparent2)
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
self._ui.debug(
|
|
|
|
'writing filelog: %s, p1 %s, linkrev %d, %d bytes, src: %s, '
|
2017-04-25 17:29:39 +03:00
|
|
|
'path: %s\n' % (short(node), short(fparent1), linkrev,
|
p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extensions for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the the Perforce
store on a Perforce server directly.
The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches is an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally either stored in RCS, as a compressed GZIP
or as an flat file (binaries). If we do not find a version locally on
disk we fallback to downloading it from Perforce.
We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
We then generate the changelog.
Linkrev generation is a bit tricky. For every file in Perforce know
to which changelist it belongs, as it's stored revisions contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to it's place in the changelog later,
and therefore it's correct linkrev.
Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple file logs at the same time. However
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case sensitive file systems.
Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
|
|
|
len(text), src, self.relpath))
|
2017-04-20 09:33:06 +03:00
|
|
|
|
2017-04-25 17:29:39 +03:00
|
|
|
if lfsext and lfsext.wrapper._islfs(hgfilelog, node):
|
|
|
|
lfspointer = lfsext.pointer.deserialize(
|
|
|
|
hgfilelog.revision(node, raw=True))
|
2017-05-12 03:57:42 +03:00
|
|
|
oid = lfspointer.oid()
|
2017-05-12 03:48:21 +03:00
|
|
|
largefiles.append((c.cl, self.depotfile, oid))
|
|
|
|
self._ui.debug('largefile: %s, oid: %s\n' % (self.relpath, oid))
|
2017-04-25 17:29:39 +03:00
|
|
|
|
2017-04-20 09:33:06 +03:00
|
|
|
newlen = len(hgfilelog)
|
2017-04-25 17:29:39 +03:00
|
|
|
return fileflags, largefiles, origlen, newlen
|