sapling/hgext3rd/p4fastimport/p4.py

339 lines
9.7 KiB
Python

p4fastimport: introducing fast Perforce to Mercurial convert extension

Summary: `p4fastimport` is a fast convert extension for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the Perforce store on a Perforce server directly.

The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches are an exception). To generate a filelog we read each file for an imported revision. A file in Perforce is stored locally either in RCS, as a compressed GZIP, or as a flat file (binaries). If we do not find a version locally on disk, we fall back to downloading it from Perforce.

We generate manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We generate the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog.

Linkrev generation is a bit tricky. For every file in Perforce we know to which changelist it belongs, as its stored revision contains the changelist. E.g. 1.1422 is the file changed in changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse look up the offset in the sorted list of changelists, which corresponds to its place in the changelog later, and therefore its correct linkrev.

Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple filelogs at the same time. However, filelogs use a singular `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with *fncache* disabled! This restricts `p4fastimport` with workers to run only on case-sensitive file systems.

Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify`.
make tests

Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
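The linkrev lookup described in the commit message can be illustrated with a small sketch. It is not the extension's actual code: the helper names `cl_from_revision` and `linkrev_for` and the sample changelist numbers are hypothetical, and it assumes the changelog starts empty, so that a changelist's index in the sorted list equals its linkrev.

```python
import bisect


def cl_from_revision(rev):
    """Extract the changelist number from a revision such as '1.1422'."""
    return int(rev.rsplit(".", 1)[1])


def linkrev_for(origcl, sorted_cls):
    """Return the position of origcl in the sorted list of changelists.

    Assumption: the changelog is written in this order starting from an
    empty repository, so the position is also the linkrev.
    """
    idx = bisect.bisect_left(sorted_cls, origcl)
    if idx == len(sorted_cls) or sorted_cls[idx] != origcl:
        raise KeyError("changelist %d is not part of the import" % origcl)
    return idx


# Hypothetical, already sorted list of original changelist numbers.
imported = [1001, 1422, 1507, 2210]
linkrev = linkrev_for(cl_from_revision("1.1422"), imported)
print(linkrev)
```

A binary search keeps each lookup logarithmic in the number of imported changelists, which matters when every file revision needs one.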
# (c) 2017-present Facebook Inc.
from __future__ import absolute_import
import collections
import contextlib
import json
import marshal
import re
import time
from .util import runworker
from mercurial import (
    util,
)


class P4Exception(Exception):
    pass


def loaditer(f):
    "Yield the dictionary objects generated by p4"
    try:
        while True:
            d = marshal.load(f)
            if not d:
                break
            yield d
    except EOFError:
        pass
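`p4 -G` writes its output as a stream of marshalled Python dictionaries, which is what `loaditer` consumes. The sketch below replays that behaviour on an in-memory stream; the record contents are invented for illustration and `loaditer` is repeated only so the example runs on its own.

```python
import io
import marshal


def loaditer(f):
    """Yield dictionaries from a stream of marshalled objects until EOF."""
    try:
        while True:
            d = marshal.load(f)
            if not d:
                break
            yield d
    except EOFError:
        pass


# Two made-up records standing in for real `p4 -G` output.
records = [
    {b"code": b"stat", b"change": b"1422"},
    {b"code": b"stat", b"change": b"1507"},
]
stream = io.BytesIO(b"".join(marshal.dumps(r) for r in records))

decoded = list(loaditer(stream))
print(len(decoded))
```

Note that an empty dictionary in the stream also ends the iteration, because of the `if not d` check.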
def revrange(start=None, end=None):
    """Returns a revrange to filter a Perforce path. If start and end are None
    we return an empty string as lookups without a revrange filter are much
    faster in Perforce"""
    revrange = ""
    if end is not None or start is not None:
        start = '0' if start is None else str(start)
        end = '#head' if end is None else str(end)
        revrange = "@%s,%s" % (start, end)
    return revrange
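A few calls make the four cases of `revrange` concrete. The function is copied here so the example runs standalone; the changelist numbers are arbitrary.

```python
def revrange(start=None, end=None):
    """Standalone copy of the helper, for illustration only."""
    revrange = ""
    if end is not None or start is not None:
        start = '0' if start is None else str(start)
        end = '#head' if end is None else str(end)
        revrange = "@%s,%s" % (start, end)
    return revrange


no_filter = revrange()           # no bounds: no filter at all
from_start = revrange(10)        # open-ended upper bound
up_to_end = revrange(end=20)     # open-ended lower bound
bounded = revrange(10, 20)       # both bounds given

for value in (no_filter, from_start, up_to_end, bounded):
    print(repr(value))
```

The resulting string is appended directly to a path such as `//client/...` when the commands are built.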
def parse_info():
    cmd = 'p4 -ztag -G info'
    stdout = util.popen(cmd, mode='rb')
    return marshal.load(stdout)


_config = None


def config(key):
    global _config
    if _config is None:
        _config = parse_info()
    return _config[key]
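@contextlib.contextmanager
def retries(num=3, sleeps=0.3):
    # NOTE: a @contextlib.contextmanager generator may yield only once, so
    # the body of the `with` block is never re-executed. If the body raises
    # on any attempt but the last, the loop reaches `yield` a second time and
    # contextlib raises RuntimeError instead of retrying.
    for _try in range(1, num + 1):
        try:
            yield
            return
        except Exception:
            if _try == num:
                raise
            time.sleep(sleeps)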
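Because a `with` block cannot re-run its body, retrying needs the operation passed in as a callable. The following is a minimal sketch of that alternative, not the module's API: `call_with_retries` and `flaky` are hypothetical names, and `flaky` merely stands in for a `p4` invocation that fails transiently.

```python
import time


def call_with_retries(fn, num=3, sleeps=0.3):
    """Call fn() up to num times, sleeping between failed attempts."""
    for attempt in range(1, num + 1):
        try:
            return fn()
        except Exception:
            # Re-raise the last failure; otherwise wait and try again.
            if attempt == num:
                raise
            time.sleep(sleeps)


attempts = []


def flaky():
    """Fail twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("transient failure")
    return "ok"


result = call_with_retries(flaky, num=3, sleeps=0)
print(result, len(attempts))
```

The same loop could be wrapped in a decorator; the key point is that the retried work lives inside a function the helper can call again.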
def parse_changes(client, startcl=None, endcl=None):
    "Read changes affecting the path"
    cmd = 'p4 --client %s -ztag -G changes -s submitted //%s/...%s' % (
        util.shellquote(client),
        util.shellquote(client),
        revrange(startcl, endcl))

    stdout = util.popen(cmd, mode='rb')
    for d in loaditer(stdout):
        c = d.get("change", None)
        oc = d.get("oldChange", None)
        if oc:
            yield P4Changelist(int(oc), int(c))
        elif c:
            yield P4Changelist(int(c), int(c))
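The handling of `oldChange` can be tried without a Perforce server. In this sketch the `Changelist` namedtuple is a hypothetical stand-in for `P4Changelist` (whose definition is not shown here), its field names `origcl` and `cl` are assumptions, and the records are invented.

```python
import collections

# Hypothetical stand-in for P4Changelist: (original CL, current CL).
Changelist = collections.namedtuple("Changelist", ["origcl", "cl"])


def changelists(records):
    """Mirror the loop in parse_changes on already decoded records."""
    for d in records:
        c = d.get("change")
        oc = d.get("oldChange")
        if oc:
            # Renumbered changelist: keep the original number first.
            yield Changelist(int(oc), int(c))
        elif c:
            yield Changelist(int(c), int(c))


records = [
    {"change": "1500", "oldChange": "1422"},  # renumbered on submit
    {"change": "1507"},                       # never renumbered
    {"code": "info"},                         # no change field: skipped
]
result = list(changelists(records))
print(result)
```

Records without a `change` field are silently dropped, which matches the behaviour of the generator above.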
def parse_filelist(client, startcl=None, endcl=None):
    if startcl is None:
        startcl = 0

    cmd = 'p4 --client %s -G files -a //%s/...%s' % (
        util.shellquote(client),
        util.shellquote(client),
        revrange(startcl, endcl))

    stdout = util.popen(cmd, mode='rb')
    for d in loaditer(stdout):
        c = d.get('depotFile', None)
        if c:
            yield d
def parse_where(client, depotname):
    # TODO: investigate if we replace this with exactly one call to
    # where //clientname/...
    cmd = 'p4 --client %s -G where %s' % (
        util.shellquote(client),
        util.shellquote(depotname))
    # Bind stdout up front so the except clause can reference it even when
    # util.popen itself fails.
    stdout = None
    try:
        with retries(num=3, sleeps=0.3):
            stdout = util.popen(cmd, mode='rb')
            return marshal.load(stdout)
    except Exception:
        raise P4Exception(stdout)
def get_file(path, rev=None, clnum=None):
    """Returns a file from Perforce"""
    r = '#head'
    if rev:
        r = '#%d' % rev
    if clnum:
        r = '@%d' % clnum

    cmd = 'p4 print -q %s%s' % (util.shellquote(path), r)
    with retries(num=5, sleeps=0.3):
        stdout = util.popen(cmd, mode='rb')
        content = stdout.read()
    return content
def parse_cl(clnum):
    """Returns a description of a change given by the clnum. CLnum can be an
    original CL before renumbering"""
    cmd = 'p4 -ztag -G describe -O %d' % clnum
    # Bind stdout up front so the except clause can reference it even when
    # util.popen itself fails.
    stdout = None
    try:
        with retries(num=3, sleeps=0.3):
            stdout = util.popen(cmd, mode='rb')
            return marshal.load(stdout)
    except Exception:
        raise P4Exception(stdout)
def parse_usermap():
cmd = 'p4 -G users'
stdout = util.popen(cmd, mode='rb')
try:
for d in loaditer(stdout):
if d.get('User'):
yield d
except Exception:
raise P4Exception(stdout)
def parse_client(client):
cmd = 'p4 -G client -o %s' % util.shellquote(client)
try:
with retries(num=3, sleeps=0.3):
stdout = util.popen(cmd, mode='rb')
clientspec = marshal.load(stdout)
    except Exception:
        raise P4Exception(stdout)

    views = {}
    for client in clientspec:
        if client.startswith("View"):
            sview, cview = clientspec[client].split()
            # XXX: use a regex for this
            cview = cview.lstrip('/')  # remove leading // from the local path
            cview = cview[cview.find("/") + 1:]  # remove the clientname part
            views[sview] = cview
    return views
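
# The XXX note in parse_client suggests a regex for stripping the client name
# from the local side of a view. A minimal standalone sketch of that idea
# follows. It is not part of this module: `parse_views` and `_CLIENT_PATH` are
# hypothetical names, the spec is assumed to be a plain dict with str keys and
# values, and paths are assumed to contain no spaces (as in the original).

```python
import re

# Hypothetical pattern: "//<client name>/<path>", capturing only <path>.
_CLIENT_PATH = re.compile(r'^//[^/]+/(.*)$')


def parse_views(clientspec):
    """Map depot paths to client-relative paths for every View entry."""
    views = {}
    for key, value in clientspec.items():
        if not key.startswith('View'):
            continue
        # Each view line is "<depot path> <client path>".
        depot_path, client_path = value.split()
        match = _CLIENT_PATH.match(client_path)
        if match is None:
            raise ValueError('unexpected client path: %r' % client_path)
        views[depot_path] = match.group(1)
    return views


if __name__ == '__main__':
    spec = {
        'Client': 'myclient',
        'View0': '//depot/main/... //myclient/main/...',
    }
    print(parse_views(spec))
```

# Unlike the lstrip/find approach, the regex fails loudly when the client
# path does not start with "//<name>/" instead of silently mangling it.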

def parse_fstat(clnum, client, filter=None):
    cmd = 'p4 --client %s -G fstat -e %d -T ' \
          '"depotFile,headAction,headType,headRev" "//%s/..."' % (
              util.shellquote(client),
              clnum,
              util.shellquote(client))
    stdout = util.popen(cmd, mode='rb')
    try:
        result = []
        for d in loaditer(stdout):
            if d.get('depotFile') and (filter is None or filter(d)):
                if d['headAction'] in ACTION_ARCHIVE:
                    continue
                result.append({
                    'depotFile': d['depotFile'],
                    'action': d['headAction'],
                    'type': d['headType'],
                })
        return result
    except Exception:
        raise P4Exception(stdout)

def parse_filelog(filelist, client, changelists):
    for cl in changelists:
        fstats = parse_fstat(cl.cl, client,
                             lambda f: f['depotFile'] in filelist)
        for fstat in fstats:
            yield cl.cl, json.dumps(fstat)

def parse_filelogs(ui, client, changelists, filelist):
    # we can probably optimize this by using fstat only in the case-insensitive
    # case and only for conflicts.
    filelogs = collections.defaultdict(dict)
    worker = runworker(ui, parse_filelog, (filelist, client), changelists)
    for cl, jfstat in worker:
        fstat = json.loads(jfstat)
        depotfile = fstat['depotFile'].encode('ascii')
        filelogs[depotfile][cl] = {
            'action': fstat['action'].encode('ascii'),
            'type': fstat['type'].encode('ascii'),
        }
    for p4filename, filelog in filelogs.iteritems():
        yield P4Filelog(p4filename, filelog)

class P4Filelog(object):
    def __init__(self, depotfile, data):
        self._data = data
        self._depotfile = depotfile

    # @property
    # def branchcl(self):
    #     return self._parsed[1]
    #
    # @property
    # def branchsource(self):
    #     if self.branchcl:
    #         return self.parsed[self.branchcl]['from']
    #     return None
    #
    # @property
    # def branchrev(self):
    #     if self.branchcl:
    #         return self.parsed[self.branchcl]['rev']
    #     return None

    def __cmp__(self, other):
        return (self.depotfile > other.depotfile) - (self.depotfile <
                                                     other.depotfile)

    @property
    def depotfile(self):
        return self._depotfile

    @property
    def revisions(self):
        return sorted(self._data.keys())

    def isdeleted(self, clnum):
        return self._data[clnum]['action'] in ['move/delete', 'delete']

    def isexec(self, clnum):
        t = self._data[clnum]['type']
        return 'xtext' == t or '+x' in t

    def issymlink(self, clnum):
        t = self._data[clnum]['type']
        return 'symlink' in t

    def iskeyworded(self, clnum):
        t = self._data[clnum]['type']
        # raw strings keep the backslash escape unambiguous
        return (re.compile(r'kx?text').match(t) or
                re.compile(r'\+kx?').search(t)) is not None
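
# The type checks in P4Filelog can be exercised in isolation. The sketch
# below mirrors the three predicates as free functions over a Perforce file
# type string; the function and constant names are illustrative only and do
# not exist in this module.

```python
import re

# Old-style type names (ktext, kxtext) are matched at the start of the string.
KEYWORD_BASE = re.compile(r'kx?text')
# Modifier form (+k, +kx, +ko, ...) may appear anywhere after the base type.
KEYWORD_MOD = re.compile(r'\+kx?')


def isexec(p4type):
    """Same check as P4Filelog.isexec: exact 'xtext' or a '+x' modifier."""
    return p4type == 'xtext' or '+x' in p4type


def issymlink(p4type):
    """Same check as P4Filelog.issymlink."""
    return 'symlink' in p4type


def iskeyworded(p4type):
    """Same check as P4Filelog.iskeyworded."""
    return bool(KEYWORD_BASE.match(p4type) or KEYWORD_MOD.search(p4type))


if __name__ == '__main__':
    for t in ('text', 'xtext', 'text+x', 'ktext', 'text+ko', 'symlink'):
        print(t, isexec(t), issymlink(t), iskeyworded(t))
```

# As written, a type such as 'kxtext' satisfies iskeyworded but not isexec,
# because isexec only recognizes the exact name 'xtext' or a '+x' modifier.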

ACTION_EDIT = ['edit', 'integrate']
ACTION_ADD = ['add', 'branch', 'move/add']
ACTION_DELETE = ['delete', 'move/delete']
ACTION_ARCHIVE = ['archive']
SUPPORTED_ACTIONS = ACTION_EDIT + ACTION_ADD + ACTION_DELETE

class P4Changelist(object):
    def __init__(self, origclnum, clnum):
        self._clnum = clnum
        self._origclnum = origclnum

    def __repr__(self):
        return '<P4Changelist %d>' % self._clnum

    @property
    def cl(self):
        return self._clnum

    @property
    def origcl(self):
        return self._origclnum

    def __cmp__(self, other):
        return (self.cl > other.cl) - (self.cl < other.cl)

    def __hash__(self):
        """Ensure we are matching changelist numbers in sets and hashtables,
        which the importer uses to ensure uniqueness of an imported changeset"""
        return hash((self.origcl, self.cl))

    @util.propertycache
    def parsed(self):
        return self.load()

    def load(self):
        """Parse Perforce's awkward format"""
        files = {}
        info = parse_cl(self._clnum)
        i = 0
        while True:
            fidx = 'depotFile%d' % i
            aidx = 'action%d' % i
            ridx = 'rev%d' % i
            # XXX: Handle oldChange vs change
            if fidx not in info:
                break
            files[info[fidx]] = {
                'rev': int(info[ridx]),
                'action': info[aidx],
            }
            i += 1
        return {
            'files': files,
            'desc': info['desc'],
            'user': info['user'],
            'time': int(info['time']),
        }

    def rev(self, fname):
        return self.parsed['files'][fname]['rev']

    @property
    def files(self):
        """Returns added, modified and removed files for a changelist.

        The current mapping is:

        Mercurial | Perforce
        ---------------------
        add       | add, branch, move/add
        modified  | edit, integrate
        removed   | delete, move/delete, archive
"""
        a, m, r = [], [], []
        for fname, info in self.parsed['files'].iteritems():
            if info['action'] in ACTION_EDIT:
                m.append(fname)
            elif info['action'] in ACTION_ADD:
                a.append(fname)
            elif info['action'] in ACTION_DELETE + ACTION_ARCHIVE:
                r.append(fname)
            else:
                assert False, 'unsupported action: %s' % info['action']
        return a, m, r
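
# P4Changelist.load and P4Changelist.files can be tried without a Perforce
# server. The sketch below reproduces both steps over a plain dict shaped like
# the record that load() reads (depotFile0, action0, rev0, ...). The sample
# record and the names `collect_files` and `classify` are made up for
# illustration; an unknown action raises ValueError here instead of asserting.

```python
ACTION_EDIT = ['edit', 'integrate']
ACTION_ADD = ['add', 'branch', 'move/add']
ACTION_DELETE = ['delete', 'move/delete']
ACTION_ARCHIVE = ['archive']


def collect_files(info):
    """Gather the indexed depotFileN/actionN/revN keys into one dict."""
    files = {}
    i = 0
    while 'depotFile%d' % i in info:
        files[info['depotFile%d' % i]] = {
            'rev': int(info['rev%d' % i]),
            'action': info['action%d' % i],
        }
        i += 1
    return files


def classify(files):
    """Split files into (added, modified, removed) lists by action."""
    added, modified, removed = [], [], []
    # Sorted only to make the output deterministic in this sketch.
    for fname, entry in sorted(files.items()):
        action = entry['action']
        if action in ACTION_EDIT:
            modified.append(fname)
        elif action in ACTION_ADD:
            added.append(fname)
        elif action in ACTION_DELETE + ACTION_ARCHIVE:
            removed.append(fname)
        else:
            raise ValueError('unsupported action: %s' % action)
    return added, modified, removed


if __name__ == '__main__':
    sample = {
        'depotFile0': '//depot/a.txt', 'action0': 'add', 'rev0': '1',
        'depotFile1': '//depot/b.txt', 'action1': 'edit', 'rev1': '3',
        'depotFile2': '//depot/c.txt', 'action2': 'delete', 'rev2': '2',
    }
    print(classify(collect_files(sample)))
```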