sapling/hgext3rd/p4fastimport/importer.py

359 lines
11 KiB
Python
Raw Normal View History

p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
# (c) 2017-present Facebook Inc.
from __future__ import absolute_import
import collections
import gzip
import os
import re
from mercurial.i18n import _
from mercurial.node import nullid, short
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
from mercurial import (
error,
extensions,
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
util,
)
from . import p4
from .util import caseconflict, localpath
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
KEYWORD_REGEX = "\$(Id|Header|DateTime|" + \
"Date|Change|File|" + \
"Revision|Author).*?\$"
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
class ImportSet(object):
def __init__(self, repo, client, changelists, filelist, storagepath):
self.repo = repo
self.client = client
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
self.changelists = sorted(changelists)
self.filelist = filelist
self.storagepath = storagepath
def linkrev(self, cl):
return self._linkrevmap[cl]
@util.propertycache
def _linkrevmap(self):
start = len(self.repo)
return {c.cl: idx + start for idx, c in enumerate(self.changelists)}
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
@util.propertycache
def caseconflicts(self):
return caseconflict(self.filelist)
def filelogs(self):
for filelog in p4.parse_filelogs(self.repo.ui, self.client,
self.changelists, self.filelist):
yield filelog
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
class ChangeManifestImporter(object):
def __init__(self, ui, repo, importset):
self._ui = ui
self._repo = repo
self._importset = importset
@util.propertycache
def usermap(self):
m = {}
for user in p4.parse_usermap():
m[user['User']] = '%s <%s>' % (user['FullName'], user['Email'])
return m
def create(self, *args, **kwargs):
return list(self.creategen(*args, **kwargs))
def creategen(self, tr, fileinfo):
mrevlog = self._repo.manifestlog._revlog
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
clog = self._repo.changelog
cp1 = self._repo['tip'].node()
cp2 = nullid
p1 = self._repo[cp1]
mp1 = p1.manifestnode()
mp2 = nullid
mf = p1.manifest().copy()
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
for i, change in enumerate(self._importset.changelists):
# invalidate caches so that the lookup works
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
self._ui.progress(_('importing change'), pos=i, item=change,
unit='changes', total=len(self._importset.changelists))
added, modified, removed = change.files
changed = set()
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
# generate manifest mappings of filenames to filenodes
for depotname in removed:
if depotname not in self._importset.filelist:
continue
localname = fileinfo[depotname]['localname']
if localname in mf:
changed.add(localname)
del mf[localname]
for depotname in added + modified:
if depotname not in self._importset.filelist:
continue
info = fileinfo[depotname]
localname, baserev = info['localname'], info['baserev']
hgfilelog = self._repo.file(localname)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
try:
mf[localname] = hgfilelog.node(baserev)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
except (error.LookupError, IndexError):
raise error.Abort("can't find rev %d for %s cl %d" % (
baserev, localname, change.cl))
changed.add(localname)
if change.cl in info['flags']:
mf.setflag(localname, info['flags'][change.cl])
fileinfo[depotname]['baserev'] += 1
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
linkrev = self._importset.linkrev(change.cl)
oldmp1 = mp1
mp1 = mrevlog.addrevision(mf.text(mrevlog._usemanifestv2), tr,
linkrev, mp1, mp2)
self._ui.debug('changelist %d: writing manifest. '
'node: %s p1: %s p2: %s linkrev: %d\n' % (
change.cl, short(mp1), short(oldmp1), short(mp2), linkrev))
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
desc = change.parsed['desc']
if desc == '':
desc = '** empty changelist description **'
desc = desc.decode('ascii', 'ignore')
shortdesc = desc.splitlines()[0]
username = change.parsed['user']
username = self.usermap.get(username, username)
self._ui.debug('changelist %d: writing changelog: %s\n' % (
change.cl, shortdesc))
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
cp1 = clog.add(
mp1,
changed,
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
desc,
tr,
cp1,
cp2,
user=username,
date=(change.parsed['time'], 0),
extra={'p4changelist': change.cl})
yield change.cl, cp1
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
self._ui.progress(_('importing change'), pos=None)
class RCSImporter(collections.Mapping):
def __init__(self, path):
self._path = path
@property
def rcspath(self):
return '%s,v' % self._path
def __getitem__(self, rev):
if rev in self.revisions:
return self.content(rev)
return IndexError
def __len__(self):
return len(self.revisions)
def __iter__(self):
for r in self.revisions:
yield r, self[r]
def content(self, rev):
text = None
if os.path.isfile(self.rcspath):
cmd = 'co -q -p1.%d %s' % (rev, util.shellquote(self.rcspath))
with util.popen(cmd, mode='rb') as fp:
text = fp.read()
return text
@util.propertycache
def revisions(self):
revs = set()
if os.path.isfile(self.rcspath):
stdout = util.popen('rlog %s 2>%s'
% (util.shellquote(self.rcspath), os.devnull),
mode='rb')
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
for l in stdout.readlines():
m = re.match('revision 1.(\d+)', l)
if m:
revs.add(int(m.group(1)))
return revs
T_FLAT, T_GZIP = 1, 2
class FlatfileImporter(collections.Mapping):
def __init__(self, path):
self._path = path
@property
def dirpath(self):
return '%s,d' % self._path
def __len__(self):
return len(self.revisions)
def __iter__(self):
for r in self.revisions:
yield r, self[r]
def filepath(self, rev):
flat = '%s/1.%d' % (self.dirpath, rev)
gzip = '%s/1.%d.gz' % (self.dirpath, rev)
if os.path.exists(flat):
return flat, T_FLAT
if os.path.exists(gzip):
return gzip, T_GZIP
return None, None
def __getitem__(self, rev):
text = self.content(rev)
if text is None:
raise IndexError
return text
@util.propertycache
def revisions(self):
revs = set()
if os.path.isdir(self.dirpath):
for name in os.listdir(self.dirpath):
revs.add(int(name.split('.')[1]))
return revs
def content(self, rev):
path, type = self.filepath(rev)
text = None
if type == T_GZIP:
with gzip.open(path, 'rb') as fp:
text = fp.read()
if type == T_FLAT:
with open(path, 'rb') as fp:
text = fp.read()
return text
class P4FileImporter(collections.Mapping):
"""Read a file from Perforce in case we cannot find it locally, in
particular when there was branch or a rename"""
def __init__(self, p4filelog):
self._p4filelog = p4filelog # type: p4.P4Filelog
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
def __len__(self):
return len(self.revisions)
def __iter__(self):
for r in self.revisions:
yield r, self[r]
def __getitem__(self, rev):
text = self.content(rev)
if text is None:
raise IndexError
return text
@util.propertycache
def revisions(self):
return self._p4filelog.revisions
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
def content(self, clnum):
return p4.get_file(self._p4filelog.depotfile, clnum=clnum)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
class FileImporter(object):
def __init__(self, ui, repo, importset, p4filelog):
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
self._ui = ui
self._repo = repo
self._importset = importset
self._p4filelog = p4filelog # type: p4.P4Filelog
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
@util.propertycache
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
def relpath(self):
client = self._importset.client
where = p4.parse_where(client, self.depotfile)
return where['clientFile'].replace('//%s/' % client, '')
@property
def depotfile(self):
return self._p4filelog.depotfile
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
@util.propertycache
def storepath(self):
path = os.path.join(self._importset.storagepath,
localpath(self.depotfile))
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
if p4.config('caseHandling') == 'insensitive':
return path.lower()
return path
def hgfilelog(self):
return self._repo.file(self.relpath)
def findlfs(self):
try:
return extensions.find('lfs')
except KeyError:
pass
return None
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
def create(self, tr, copy_tracer=None):
assert tr is not None
p4fi = P4FileImporter(self._p4filelog)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
rcs = RCSImporter(self.storepath)
flat = FlatfileImporter(self.storepath)
local_revs = rcs.revisions | flat.revisions
revs = set()
for c in self._importset.changelists:
if c.cl in p4fi.revisions and not self._p4filelog.isdeleted(c.cl):
revs.add(c)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
fileflags = collections.defaultdict(dict)
lastlinkrev = 0
hgfilelog = self.hgfilelog()
origlen = len(hgfilelog)
largefiles = []
lfsext = self.findlfs()
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
for c in sorted(revs):
linkrev = self._importset.linkrev(c.cl)
fparent1, fparent2 = nullid, nullid
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
# invariant: our linkrevs do not criss-cross. They are monotonically
# increasing. See https://www.mercurial-scm.org/wiki/CrossedLinkrevs
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
assert linkrev >= lastlinkrev
lastlinkrev = linkrev
if len(hgfilelog) > 0:
fparent1 = hgfilelog.tip()
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
# select the content
text = None
if c.origcl in local_revs:
if c.origcl in rcs.revisions:
text, src = rcs.content(c.origcl), 'rcs'
elif c.origcl in flat.revisions:
text, src = flat.content(c.origcl), 'gzip'
elif c.cl in p4fi.revisions:
text, src = p4fi.content(c.cl), 'p4'
if text is None:
raise error.Abort('error generating file content %d %s' % (
c.cl, self.relpath))
meta = {}
if self._p4filelog.isexec(c.cl):
fileflags[c.cl] = 'x'
if self._p4filelog.issymlink(c.cl):
fileflags[c.cl] = 'l'
if self._p4filelog.iskeyworded(c.cl):
text = re.sub(KEYWORD_REGEX, r'$\1$', text)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
node = hgfilelog.add(text, meta, tr, linkrev, fparent1, fparent2)
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
self._ui.debug(
'writing filelog: %s, p1 %s, linkrev %d, %d bytes, src: %s, '
'path: %s\n' % (short(node), short(fparent1), linkrev,
p4fastimport : introducing fast Perforce to Mercurial convert extension Summary: `p4fastimport` is a fast convert extensions for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the the Perforce store on a Perforce server directly. The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches is an exception). To generate a filelog we are reading each file for an imported revision. A file in Perforce is locally either stored in RCS, as a compressed GZIP or as an flat file (binaries). If we do not find a version locally on disk we fallback to downloading it from Perforce. We are generating manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We are generating the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog. Linkrev generation is a bit tricky. For every file in Perforce know to which changelist it belongs, as it's stored revisions contains the changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse lookup the offset in the sorted list of changelists, which corresponds to it's place in the changelog later, and therefore it's correct linkrev. Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple file logs at the same time. However filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case sensitive file systems. Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify` make tests Reviewers: #idi, quark, durham Reviewed By: durham Subscribers: mjpieters Differential Revision: https://phabricator.intern.facebook.com/D4776651 Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
len(text), src, self.relpath))
if lfsext and lfsext.wrapper._islfs(hgfilelog, node):
lfspointer = lfsext.pointer.deserialize(
hgfilelog.revision(node, raw=True))
oid = lfspointer.oid()
largefiles.append((c.cl, self.depotfile, oid))
self._ui.debug('largefile: %s, oid: %s\n' % (self.relpath, oid))
newlen = len(hgfilelog)
return fileflags, largefiles, origlen, newlen