sapling/tests/test-fb-hgext-p4fastimport-import-deletes.t

  $ setconfig extensions.treemanifest=!
#require p4

  $ . $TESTDIR/p4setup.sh

populate the depot
  $ mkdir Main
  $ mkdir Main/b
  $ echo a > Main/a
  $ echo c > Main/b/c
  $ echo d > Main/d
  $ p4 add Main/a Main/b/c Main/d
  //depot/Main/a#1 - opened for add
  //depot/Main/b/c#1 - opened for add
  //depot/Main/d#1 - opened for add
  $ p4 submit -d initial
  Submitting change 1.
  Locking 3 files ...
  add //depot/Main/a#1
  add //depot/Main/b/c#1
  add //depot/Main/d#1
  Change 1 submitted.

  $ p4 delete Main/a
  //depot/Main/a#1 - opened for delete
  $ p4 submit -d second
  Submitting change 2.
  Locking 1 files ...
  delete //depot/Main/a#2
  Change 2 submitted.

  $ echo a > Main/a
  $ p4 add Main/a
  //depot/Main/a#2 - opened for add
  $ p4 submit -d third
  Submitting change 3.
  Locking 1 files ...
  add //depot/Main/a#3
  Change 3 submitted.

Simple import

  $ cd $hgwd
  $ hg init --config 'format.usefncache=False'
  $ hg p4seqimport -P $P4ROOT hg-p4-import

Verify

  $ hg verify
  checking changesets
  checking manifests
  crosschecking files in changesets and manifests
  checking files
  3 files, 3 changesets, 3 total revisions

  $ hg update tip
  3 files updated, 0 files merged, 0 files removed, 0 files unresolved

Check hg debug data
  $ hg debugdata -m 0
  Main/a\x00b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3 (esc)
  Main/b/c\x00149da44f2a4e14f488b7bd4157945a9837408c00 (esc)
  Main/d\x00a9092a3d84a37b9993b5c73576f6de29b7ea50f6 (esc)
  $ hg debugdata -m 1
  Main/b/c\x00149da44f2a4e14f488b7bd4157945a9837408c00 (esc)
  Main/d\x00a9092a3d84a37b9993b5c73576f6de29b7ea50f6 (esc)
  $ hg debugdata -m 2
  Main/a\x00b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3 (esc)
  Main/b/c\x00149da44f2a4e14f488b7bd4157945a9837408c00 (esc)
  Main/d\x00a9092a3d84a37b9993b5c73576f6de29b7ea50f6 (esc)
  $ hg debugindex Main/a
     rev    offset  length  delta linkrev nodeid       p1           p2
       0         0       3     -1       0 b789fdd96dc2 000000000000 000000000000

End Test

  stopping the p4 server
treemanifest: enable treemanifest by default in tests

Summary: Now that all our repos are treemanifest, let's enable the extension by default in tests. Once we're certain no one needs it in production we'll also make it the default in core Mercurial.

This diff includes a minor fix in treemanifest to be aware of always-enabled extensions. It won't matter until we actually add treemanifest to the list of default enabled extensions, but I caught this while testing things.

Reviewed By: ikostia
Differential Revision: D15030253
fbshipit-source-id: d8361f915928b6ad90665e6ed330c1df5c8d8d86
2019-05-28 13:12:27 +03:00			`$ setconfig extensions.treemanifest=!`

p4fastimport: require p4 during tests

Summary: Require perforce during all tests.

Test Plan: run tests with and without Perforce in PATH. Tests correctly run with P4D and P4 in path and were correctly skipped without.

Reviewers: #sourcecontrol, #idi, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4899085
Signature: t1:4899085:1492475858:3bd1443c707e56461835d278a9c6bf3e034b5f4a
2017-04-18 04:08:07 +03:00			`#require p4`

p4fastimport: extract common logic from tests

Summary: This diff adds a shared `p4setup.sh` that de-duplicates common logic among tests. It also uses an absolute path to make sure the extension being tested is the version being developed. The LFS test also carries a temporary workaround while waiting for an upstream change.

Test Plan: Run existing tests
Reviewers: #mercurial, davidsp
Reviewed By: davidsp
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D5049279
Signature: t1:5049279:1494547832:28222fd2034115faca73860d6dd2f19206701aaa
2017-05-12 03:13:31 +03:00			`$ . $TESTDIR/p4setup.sh`

p4fastimport: introducing fast Perforce to Mercurial convert extension

Summary: `p4fastimport` is a fast convert extension for Perforce to Mercurial. It is designed to generate filelogs in parallel from Perforce. It tries to minimize the use of Perforce commands and reads from the Perforce store on a Perforce server directly.

The core of p4fastimport is the idea to generate a Mercurial filelog directly from the underlying Perforce data, as a Perforce file in most cases matches a filelog directly (per-file branches are an exception). To generate a filelog we read each file for an imported revision. A file in Perforce is stored locally either in RCS, as compressed GZIP, or as a flat file (binaries). If we do not find a version locally on disk, we fall back to downloading it from Perforce.

We generate manifests after all filelogs are imported. A manifest is constructed by adding and removing files from an initial state. We generate the correct offset from a manifest into the filelog by keeping track of how often a file was touched. We then generate the changelog.

Linkrev generation is a bit tricky. For every file in Perforce we know to which changelist it belongs, as its stored revision contains the changelist. E.g. 1.1422 is the file changed in changelist 1422 (this refers to the "original" changelist, before a potential renumbering, which is why we use the -O switch). We use the CL number obtained from the revision to reverse look up the offset in the sorted list of changelists, which corresponds to its place in the changelog later, and therefore its correct linkrev.

Parallel imports: In order to run parallel imports we MUST keep one lock at a time, even if we import multiple filelogs at the same time. However, filelogs use a single `fncache`, which will be corrupted if we generate filelogs in parallel. To avoid this, repositories must be generated with fncache disabled! This restricts `p4fastimport` with workers to run only on case-sensitive file systems.

Test Plan: The included tests as well as multiple imports from a small testing Perforce client. Afterwards successfully run `hg verify`
make tests
Reviewers: #idi, quark, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4776651
Signature: t1:4776651:1492015012:0161c4f45eab4d3b64597d012188c5f2007e8f7d
2017-04-13 21:11:09 +03:00
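
The linkrev lookup and the manifest replay described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the function names (`changelist_from_revision`, `linkrev_for`, `replay_manifests`) and the sample changelist numbers are hypothetical, and it assumes a stored revision always has the form `1.<changelist>` as in the `1.1422` example.

```python
from bisect import bisect_left


def changelist_from_revision(revision):
    """Extract the changelist number from a stored revision such as '1.1422'.

    Assumption: the part after the first dot is the original changelist.
    """
    return int(revision.split(".")[1])


def linkrev_for(sorted_changelists, changelist):
    """Return the position of `changelist` in the sorted list of imported
    changelists. That position is the changeset's place in the changelog,
    and therefore the linkrev."""
    index = bisect_left(sorted_changelists, changelist)
    if index == len(sorted_changelists) or sorted_changelists[index] != changelist:
        raise KeyError("changelist %d is not part of this import" % changelist)
    return index


def replay_manifests(changes):
    """Build one manifest per changelist by applying removes and adds to the
    previous state.

    `changes` is a list of (added, removed) pairs, where `added` maps
    path -> file node and `removed` is a set of paths.
    """
    current = {}
    manifests = []
    for added, removed in changes:
        for path in removed:
            current.pop(path, None)
        current.update(added)
        manifests.append(dict(current))  # snapshot for this changelist
    return manifests


# Hypothetical changelist numbers, for illustration only.
changelists = [1000, 1422, 1500]
print(linkrev_for(changelists, changelist_from_revision("1.1422")))  # prints 1
```

Replaying the three changes made in this test (add `Main/a`, `Main/b/c`, `Main/d`; delete `Main/a`; add `Main/a` again) through `replay_manifests` yields manifests with three, two, and three entries, the same shape as the `hg debugdata -m 0`, `1`, and `2` output in the test.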

Cleanup p4fastimport

Summary: p4fastimport was replaced by p4seqimport months ago. On top of that, previous assumptions have changed in ways that might make p4fastimport not work (e.g. allowing import to connect to LFS). This change cleans up a lot of unused code and a bunch of tests that no longer make sense.

Reviewed By: zhh95
Differential Revision: D8525286
fbshipit-source-id: 91d33e7530bf6df1e1ec92fae3acde0345230247
2018-06-26 23:28:54 +03:00			`$ hg p4seqimport -P $P4ROOT hg-p4-import`

p4fastimport: handle deleted files + fix wrong parents

Summary: Running `hg log --traceback --template '{file_copies}' -r XXXX` on a file with long history is slow for 2 reasons:
- p4 fast importer preserves full history for deleted and re-added files
- p4 fast importer records the wrong parent of a file

This diff tries to fix these two issues.

In Mercurial, if a file is added, deleted, and then added back, it should start a new file history when the file is added again. For example:

    commits    commit1       commit2        commit3
    actions    add a.txt     delete a.txt   add a.txt
    timeline   ------------X------------X------------------------X------------

`hg debugindex a.txt` at commit3 shows a.txt as a new file without previous history:

       rev    offset  length  delta linkrev nodeid       p1           p2
         0         0       3     -1       0 b789fdd96dc2 000000000000 000000000000

However, this is different in p4. `p4 filelog test.txt` gives you:

    //depot/Software/Apps/Main/Native/.castle/test.txt
    ... #3 change 523261 add on 2018/01/23 by zhihuih@devbig415 (text) 'test:add-again-same-file'
    ... #2 change 523254 delete on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:delete'
    ... #1 change 523253 add on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:add'

Currently, p4 fast importer preserves history the same way as p4, and this causes slowness (even timeouts) in hg when it runs `hg log --traceback --template '{file_copies}' -r XXXX` on a revision that contains files with long history in p4 (mostly contributed by automation). To mitigate this, we want the p4 fast importer to behave the same way as hg and start a new history for a file that is added again.

Currently, p4 fast importer takes the tip of a filelog and treats that as the parent of the newly written entry (diffusion/FBS/browse/master/fbcode/scm/hg/hgext/p4fastimport/importer.py;19ad9b05f50e3ff0265cdc7b4b45174dcf820343$468-469). This can be wrong when there are revisions from branches. For example, if I edit file a in master in CL1, 2, 4, and I branch at CL3, and edit the file in the branch in CL5, the current importer implementation will take the filenode at CL4 as the parent of CL3 (CL1,2,3,4,5 corresponds to rev0,1,3,2,4) {F120393661}

However, the correct behavior is to take the filenode at CL2 as the parent of CL3 (CL1,2,3,4,5 corresponds to rev0,1,3,2,4) {F120393662}

(This is also the example I use in `test-fb-hgext-p4fastimport-import-branch-filelogorder.t`, so if the description here looks confusing, please refer to the test.)

Reviewed By: dsp
Differential Revision: D6962019
fbshipit-source-id: 24de76ae009e0d6f976d247087fe4702c99e0f82
2018-02-22 08:06:40 +03:00			`3 files, 3 changesets, 3 total revisions`
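
Why a re-added file ends up with a single filelog revision can be illustrated with a toy model. This is a sketch under stated assumptions, not the importer's code: `ToyFilelog` and `import_file_history` are hypothetical names, and the model assumes a file node is the SHA-1 of the two parent nodes (sorted) followed by the file text, with a revision that hashes to an existing node not being stored a second time.

```python
import hashlib

NULLID = b"\0" * 20  # the null parent


def filenode(text, p1=NULLID, p2=NULLID):
    """Assumed node hash: SHA-1 over the sorted parents followed by the text."""
    low, high = sorted((p1, p2))
    return hashlib.sha1(low + high + text).digest()


class ToyFilelog:
    """Minimal stand-in for a filelog: an ordered list of unique nodes."""

    def __init__(self):
        self.nodes = []

    def add(self, text, p1=NULLID):
        node = filenode(text, p1)
        if node not in self.nodes:  # an identical revision is not stored twice
            self.nodes.append(node)
        return node


def import_file_history(actions):
    """Replay (action, text) pairs in changelist order.

    A delete resets the parent to NULLID, so a later add starts a new
    history instead of chaining onto the pre-delete revision.
    """
    filelog = ToyFilelog()
    parent = NULLID
    per_change = []  # node recorded in each changelist's manifest, or None
    for action, text in actions:
        if action == "delete":
            parent = NULLID
            per_change.append(None)
        else:
            parent = filelog.add(text, parent)
            per_change.append(parent)
    return filelog, per_change


# The add / delete / add sequence this test submits for Main/a.
filelog, per_change = import_file_history(
    [("add", b"a\n"), ("delete", None), ("add", b"a\n")]
)
print(len(filelog.nodes))              # prints 1
print(per_change[0] == per_change[2])  # prints True
```

In this model the re-added file has the same text and null parents as the first add, so it hashes to the same node and the filelog keeps one revision. That is consistent with the test, where manifests 0 and 2 list the same node for `Main/a` and `hg debugindex Main/a` shows a single revision with null parents. Chaining the re-add onto the old revision instead (`filenode(b"a\n", per_change[0])`) would produce a different node and a second filelog entry.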