Commit Graph

16 Commits

Author SHA1 Message Date
David Soria Parra
c428bfdaa8 p4fastimport: reuse the manifestnode we generated in previous iteration
Summary:
We are generating multiple manifests in memory. When we try to read the
previous manifest and do a changelog lookup, we might trigger a bug in
Mercurial. Mercurial tries to use the 00changelog.d file instead of the .i file
once it has reached a certain size. As this is all happening in one big
transaction, Mercurial is supposed to read this from memory, but this is
currently broken. We can circumvent this bug by reusing the manifestnode that
we generated in the previous iteration. This is more efficient anyway.

Test Plan:
1. test import of ovrsource
2. rt test-p4* test-check*

Reviewers: durham, quark

Reviewed By: quark

Subscribers: quark, #idi, mjpieters, #sourcecontrol

Differential Revision: https://phabricator.intern.facebook.com/D4980601

Signature: t1:4980601:1493689852:889a661c6fe606119247c8261dc567e7d361dacb
2017-05-01 20:43:11 -07:00
David Soria Parra
305a284a00 p4fastimport: generate filelogs using fstat concurrently
Summary:
Generating case-correct filelogs using fstat leads to O(changelists)
calls to Perforce (and an overall complexity of O(changelists * number of
files)), which is slow. We want to run this using workers.
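
A minimal sketch of the idea, not the actual implementation: it fans one
fstat call per changelist out to a pool. The names `fstat_all` and `fstat` are
hypothetical, and a thread pool from the standard library stands in for
Mercurial's own worker mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def fstat_all(changelists, fstat, max_workers=4):
    """Run one fstat call per changelist concurrently.

    `fstat` is a hypothetical callable taking a changelist number and
    returning the files it touched. Results are keyed by changelist.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each result
        # with the changelist it belongs to.
        results = pool.map(fstat, changelists)
        return dict(zip(changelists, results))
```

The number of Perforce calls stays O(changelists); only the wall-clock time
improves because the calls overlap.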

Test Plan: rt test-p4* test-check*

Reviewers: #sourcecontrol, quark

Reviewed By: quark

Subscribers: quark, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4963767

Signature: t1:4963767:1493349047:3eaddf6a3bb2ee06decaac48980c69b8645ebbed
2017-05-01 20:43:11 -07:00
David Soria Parra
fc6dc3be52 p4fastimport: adding debug message to filelog loading
Summary: Add debug output while loading filelogs when --debug is passed.

Test Plan: rt test-check* test-p4*

Reviewers: #sourcecontrol, #idi, ikostia

Reviewed By: ikostia

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4963656

Signature: t1:4963656:1493315731:f2b28dd06e4611d74e039d8b7805f72eb1bc16e4
2017-05-01 20:43:11 -07:00
David Soria Parra
757087ede8 p4fastimport: clarify criss-cross comment
Summary: Clarify the criss-cross comment and our monotonic property.

Test Plan: None

Reviewers: #sourcecontrol, #idi, ikostia

Reviewed By: ikostia

Subscribers: ikostia, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4921201

Signature: t1:4921201:1492696996:c2c8a179926649fac5009388d3c3054a627a9e10
2017-04-25 15:29:39 +01:00
David Soria Parra
78dcff9f00 p4fastimport: support client view
Summary:
Perforce clients support views. A view maps a server-side path to a client-side
path, e.g. the view '//depot/A/B/... //myclient/foo/...' maps every file in
//depot/A/B/ on the server side to a local checkout foo/ inside the root for
checkouts defined for the client 'myclient'.
We are using `p4 where` to support these mappings. We do the mapping inside the
FileImporter at the moment as this runs nicely in parallel. It's a bit hacky
but gets the job done. We use this mostly to omit the common prefix
//depot/... and to remove branch indicators such as Main.

So in our case a view looks like

  //depot/Software/OculusSDK/PC/Main... //client/Software/OculusSDK/PC/...

resulting in a file

  //depot/Software/OculusSDK/PC/Main/test.txt

being imported as

  Software/OculusSDK/PC/test.txt
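
To illustrate what such a mapping does, here is a rough sketch that applies a
single view line in pure Python. It is not how the importer works (that relies
on `p4 where`); the function name `map_depot_path` is hypothetical and only a
trailing '...' wildcard is handled.

```python
def map_depot_path(depot_path, view):
    """Map a depot path to a client-relative path using one view line.

    Simplified illustration: only a trailing '...' on both sides of the
    view is supported. Returns None if the path is outside the view.
    """
    depot_pattern, client_pattern = view.split()
    if not (depot_pattern.endswith("...") and client_pattern.endswith("...")):
        raise ValueError("only trailing '...' wildcards are supported")
    depot_prefix = depot_pattern[:-3]
    client_prefix = client_pattern[:-3]
    if not depot_path.startswith(depot_prefix):
        return None
    client_path = client_prefix + depot_path[len(depot_prefix):]
    # Drop the leading '//<clientname>/' to get the path below the
    # client root.
    return client_path.split("/", 3)[3]


view = "//depot/A/B/... //myclient/foo/..."
print(map_depot_path("//depot/A/B/x/y.txt", view))
```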

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, ikostia

Reviewed By: ikostia

Subscribers: ikostia, durham, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4913483

Signature: t1:4913483:1492702356:b97b691343b8a1d52940445934730b31d411db4c
2017-04-25 15:29:39 +01:00
David Soria Parra
2ab5a0baf8 p4fastimport: support for writing LFS metadata to sqlite
Summary:
We do not write the blobs to the local cache anymore. We want our LFS
server to import them from Perforce directly or serve them from Perforce
directly. In order to do so, we need the correct mapping from oid to Perforce
file + CL. This is generally useful meta-information that other LFS
implementations can use. We simply write the data to sqlite because it's simple
and built in.
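
A sketch of what such a mapping could look like with Python's built-in
sqlite3 module. The table name `p4_lfs_map`, its columns, and the helper
functions are assumptions for illustration, not the schema used by the
extension.

```python
import sqlite3


def open_metadata_db(path=":memory:"):
    """Open (or create) the metadata database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS p4_lfs_map ("
        " oid TEXT PRIMARY KEY,"
        " p4_path TEXT NOT NULL,"
        " changelist INTEGER NOT NULL)"
    )
    return conn


def record_blob(conn, oid, p4_path, changelist):
    """Store the Perforce file and CL that an LFS oid came from."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO p4_lfs_map VALUES (?, ?, ?)",
            (oid, p4_path, changelist),
        )


def lookup_blob(conn, oid):
    """Return (p4_path, changelist) for an oid, or None if unknown."""
    return conn.execute(
        "SELECT p4_path, changelist FROM p4_lfs_map WHERE oid = ?", (oid,)
    ).fetchone()
```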

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, quark

Reviewed By: quark

Subscribers: quark, durham, wlis, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4913469

Signature: t1:4913469:1492796253:1e3b389c7cb0ba3acf9504410d267a1cf9651118
2017-04-25 15:29:39 +01:00
David Soria Parra
7d78fa39bf p4fastimport: initial support for writing lfs metadata
Summary:
Add a special mode to the importer that patches the LFS extension to
not write blobs to local disk. In our case we already have the files in
Perforce and do not have to write them to disk again. This currently breaks
verify and therefore we are patching verify.

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, durham, quark

Reviewed By: quark

Subscribers: quark, durham, wlis, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4913455

Signature: t1:4913455:1492979419:204c1075376fe975ddea880b22e6984684e7ff25
2017-04-25 15:29:39 +01:00
Kostia Balytskyi
1aed2e20f9 p4importer: remove unused import
Summary: Unused import breaks tests. Can be added later when it is needed.

Test Plan: - rt

Reviewers: davidsp, #sourcecontrol

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4921372
2017-04-20 05:47:18 -07:00
David Soria Parra
44a0aabf25 p4fastimport: incremental imports
Summary:
Implement incremental imports.

1. Find the last imported Perforce changelist.
2. Set startctx and use it in all importers.
3. Import filelogs from their current position (we "should" add an additional check here, but we don't).
4. Import manifests and changelists.

Manifests are a bit tricky because we must obtain the original filelog revision
*BEFORE* we imported them, but manifest imports come "after". We could read the
most recent entry from manifests, but that won't cover the case in which files
are added. So instead we take the changelist that we are currently importing
and look for the rev with the correct linkrev in filelogs. That's a bit ugly,
but it works. We could instead return the original offset from the worker and
pass it into the manifest importer, but I feel that is not much better and
potentially more error-prone.
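
The linkrev lookup can be pictured roughly as follows. This is only a sketch
over a plain list, assuming a filelog is represented by the linkrev of each of
its revisions in non-decreasing order; `rev_for_linkrev` is a hypothetical
helper, not a Mercurial API.

```python
def rev_for_linkrev(linkrevs, target):
    """Find the filelog revision whose linkrev equals `target`.

    `linkrevs[i]` is the linkrev of filelog revision i. The list is
    assumed to be non-decreasing, so we scan from the newest revision
    and stop as soon as we are past the target.
    """
    for rev in range(len(linkrevs) - 1, -1, -1):
        if linkrevs[rev] == target:
            return rev
        if linkrevs[rev] < target:
            break
    return None
```

Scanning from the end is a design choice for incremental imports, where the
revision we are looking for was appended most recently.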

Test Plan: cd tests && rt test-p4*

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: durham, mjpieters, #idi

Differential Revision: https://phabricator.intern.facebook.com/D4890110

Signature: t1:4890110:1492662991:0e141e62734e1224ac8e1c11f4e8794452455b18
2017-04-19 23:33:06 -07:00
David Soria Parra
11061394c4 p4fastimport: remove copytracing code
Summary: remove the unused copytracing code until we use it again.

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, wlis

Reviewed By: wlis

Subscribers: wlis, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4913446

Signature: t1:4913446:1492624958:f6c083f3c64352f0a8e1172a1d6c6338ee86cd24
2017-04-19 23:33:06 -07:00
David Soria Parra
285ac17734 p4fastimport: access filelog by property
Summary: Just for simplicity, access the filelog by property.

Test Plan: cd tests && rt test-p4*

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: durham, mjpieters, #idi

Differential Revision: https://phabricator.intern.facebook.com/D4890095

Signature: t1:4890095:1492535471:f4ee849c0769f5f7689391382ca7d09b9404a418
2017-04-19 23:33:06 -07:00
David Soria Parra
a557c1e9b2 p4fastimport: rename filelog to hgfilelog and p4filelog respectively
Summary: this should add clarity.

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, wlis

Reviewed By: wlis

Subscribers: wlis, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4913444

Signature: t1:4913444:1492624927:76e7bb49424d46a3267cdad45d1222e1d3af5fd6
2017-04-19 23:33:06 -07:00
David Soria Parra
eebd897113 p4fastimport: _i -> _importset
Summary: Rename _i to _importset for clarity

Test Plan: rt test-p4*

Reviewers: #sourcecontrol, #idi, steaphan

Reviewed By: steaphan

Subscribers: steaphan, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4892613

Signature: t1:4892613:1492197969:4a48e47899aa986c37dd5bab9148dc7f214335c6
2017-04-19 23:33:06 -07:00
David Soria Parra
ad8ee588ed p4fastimport: more detailed debug output
Summary:
More detailed debug output for manifests. For the changelog we don't
use it at the moment, as changelog imports use the Perforce date, which makes
changelog hashes unstable. Will fix this in the future.

Test Plan: cd tests && rt test-p4*

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: durham, mjpieters, #idi

Differential Revision: https://phabricator.intern.facebook.com/D4890093

Signature: t1:4890093:1492474213:2b410da985fdd5d0786ef9dde05ecbb96f157e14
2017-04-19 23:33:06 -07:00
David Soria Parra
ba15cdea75 p4fastimport: use repo lookup to get p1/p2
Summary:
Use repo lookup to get p1 and p2 and use their manifestnodes instead.
We will use this for incremental imports in order to correctly obtain the
manifestnode.

Test Plan: cd tests && rt test-p4*

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: durham, mjpieters, #idi

Differential Revision: https://phabricator.intern.facebook.com/D4890084

Signature: t1:4890084:1492474086:91658b5c57b4e58af88ee14673ffa3516bc2d88a
2017-04-19 23:33:06 -07:00
David Soria Parra
ef08c10f5b p4fastimport : introducing fast Perforce to Mercurial convert extension
Summary:
`p4fastimport` is a fast convert extension for Perforce to Mercurial. It
is designed to generate filelogs in parallel from Perforce. It tries to
minimize the use of Perforce commands and reads from the Perforce
store on a Perforce server directly.

The core of p4fastimport is the idea to generate a Mercurial filelog
directly from the underlying Perforce data, as a Perforce file in most
cases matches a filelog directly (per-file branches are an exception). To
generate a filelog we are reading each file for an imported revision. A
file in Perforce is locally stored either in RCS, as a GZIP-compressed
file, or as a flat file (binaries). If we do not find a version locally on
disk we fall back to downloading it from Perforce.

We are generating manifests after all filelogs are imported. A manifest
is constructed by adding and removing files from an initial state. We
are generating the correct offset from a manifest into the filelog by
keeping track of how often a file was touched.
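
The touch counting can be sketched as below. This is a simplified
illustration that ignores deletions and incremental starts; the function
`filelog_offsets` and its input shape are assumptions, not the extension's
actual code.

```python
from collections import defaultdict


def filelog_offsets(changes):
    """Compute, per changelist, the filelog revision of each touched file.

    `changes` is a list (one entry per changelist, in import order) of
    lists of file paths touched by that changelist. The n-th touch of a
    file corresponds to revision n-1 in its filelog.
    """
    touched = defaultdict(int)
    result = []
    for files in changes:
        revs = {}
        for path in files:
            revs[path] = touched[path]
            touched[path] += 1
        result.append(revs)
    return result
```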

We then generate the changelog.

Linkrev generation is a bit tricky. For every file in Perforce we know
to which changelist it belongs, as its stored revision contains the
changelist. E.g. 1.1422 is the file changed in the changelist 1422 (this
refers to the "original" changelist, before a potential renumbering,
which is why we use the -O switch). We use the CL number obtained
from the revision to reverse lookup the offset in the sorted list of
changelists, which corresponds to its place in the changelog later,
and therefore its correct linkrev.
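
The reverse lookup can be sketched with a binary search. This assumes an
import into an empty repository, so that the offset in the sorted changelist
list is the linkrev itself; `linkrev_for` is a hypothetical name.

```python
import bisect


def linkrev_for(changelists, cl):
    """Return the offset of `cl` in the ascending list `changelists`.

    The offset is the position the changelist will take in the
    changelog, i.e. its linkrev. Raises KeyError for an unknown CL.
    """
    i = bisect.bisect_left(changelists, cl)
    if i == len(changelists) or changelists[i] != cl:
        raise KeyError(cl)
    return i


# Revision 1.1422 belongs to changelist 1422.
print(linkrev_for([1001, 1422, 1500], 1422))
```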

Parallel imports: In order to run parallel imports we MUST keep one lock
at a time, even if we import multiple filelogs at the same time. However,
filelogs use a singular `fncache`, which will be corrupted if we generate
filelogs in parallel. To avoid this, repositories must be generated with
*fncache* disabled! This restricts `p4fastimport` with workers to run
only on case-sensitive file systems.

Test Plan:
The included tests as well as multiple imports from a small testing
Perforce client. Afterwards successfully run `hg verify`

  make tests

Reviewers: #idi, quark, durham

Reviewed By: durham

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4776651

Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 11:11:09 -07:00