Commit Graph

2 Commits

Author SHA1 Message Date
Kaley Huang
82cd0ac475 p4fastimport: handle deleted files + fix wrong parents
Summary:
Running `hg log --traceback --template '{file_copies}' -r XXXX` on a revision that touches a file with a long history is slow for two reasons:
- the p4 fast importer preserves the full history of files that were deleted and re-added
- the p4 fast importer records the wrong parent for a file

This diff tries to fix these two issues.

In Mercurial, if a file is added, deleted, and then added back, the re-add should start a new file history.
For example:
commits            commit1        commit2         commit3
actions            add a.txt      delete a.txt    add a.txt
timeline ----------X--------------X---------------X----------

`hg debugindex a.txt` at commit3 shows a.txt as a new file with no previous history (both parents are null):
   rev    offset  length  delta linkrev nodeid       p1           p2
     0         0       3     -1       0 b789fdd96dc2 000000000000 000000000000

However, p4 behaves differently. `p4 filelog test.txt` keeps all three revisions in a single history:

  //depot/Software/Apps/Main/Native/.castle/test.txt
  ... #3 change 523261 add on 2018/01/23 by zhihuih@devbig415 (text) 'test:add-again-same-file'
  ... #2 change 523254 delete on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:delete'
  ... #1 change 523253 add on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:add'

Currently, the p4 fast importer preserves history the same way p4 does. This makes hg slow (it can even time out) when it runs `hg log --traceback --template '{file_copies}' -r XXXX` on a revision that contains files with a long history in p4 (mostly contributed by automation). To mitigate this, we want the p4 fast importer to behave the same way as hg and start a new history for a file that is added again.
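The intended behavior can be sketched in a few lines of Python. This is only an illustration, not the importer's actual code: `split_histories` and its input shape are hypothetical, and the sketch assumes a p4 filelog can be reduced to (change number, action) pairs, oldest first. Each `delete` closes the current history, so the next `add` starts a fresh one.

```python
def split_histories(actions):
    """Group p4 file revisions (oldest first) into separate hg file histories.

    `actions` is a list of (change number, p4 action) pairs. This helper is
    hypothetical and only models the behavior described above.
    """
    histories = []
    current = []
    for change, action in actions:
        if action == "delete":
            # A delete ends the current history; it adds no file revision.
            if current:
                histories.append(current)
            current = []
        else:
            current.append(change)
    if current:
        histories.append(current)
    return histories


# The `p4 filelog test.txt` output above, oldest revision first.
filelog = [(523253, "add"), (523254, "delete"), (523261, "add")]
histories = split_histories(filelog)
```

With the example filelog, change 523261 ends up alone in the second history, so the re-added file no longer drags along the revisions that precede the delete.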

Currently, the p4 fast importer takes the tip of a filelog and treats it as the parent of the newly written entry (see diffusion/FBS/browse/master/fbcode/scm/hg/hgext/p4fastimport/importer.py;19ad9b05f50e3ff0265cdc7b4b45174dcf820343$468-469). This can be wrong when the filelog contains revisions from branches.

For example, suppose I edit file a on master in CL1, CL2, and CL4, branch at CL3, and edit the file on the branch in CL5. The current importer implementation takes the filenode at CL4 as the parent of CL3, because CL4 is the filelog tip when CL3 is written.
(CL1, 2, 3, 4, 5 correspond to rev0, 1, 3, 2, 4)
{F120393661}

However, the correct behavior is to take the filenode at CL2 as the parent of CL3.
(CL1, 2, 3, 4, 5 correspond to rev0, 1, 3, 2, 4)
{F120393662}
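The difference between the two strategies can be modeled with a small Python sketch. It is not the importer's code: `parents_from_tip`, `parents_from_lineage`, and the `FILELOG` layout are hypothetical, and the sketch assumes the changelist each revision derives from is already known (for example, from p4 metadata).

```python
NULLREV = -1  # Mercurial's "no parent" revision number

# Filelog order from the example: rev0..rev4 hold CL1, CL2, CL4, CL3, CL5.
# Each entry is (changelist, source changelist). The source values are an
# assumption of this sketch, not something the importer is known to store.
FILELOG = [(1, None), (2, 1), (4, 2), (3, 2), (5, 3)]


def parents_from_tip(filelog):
    """Old behavior: the parent is whatever revision was written last."""
    return [rev - 1 for rev in range(len(filelog))]


def parents_from_lineage(filelog):
    """Sketch of the fix: look the parent up via the source changelist."""
    rev_of_cl = {}
    parents = []
    for rev, (cl, source_cl) in enumerate(filelog):
        parents.append(rev_of_cl.get(source_cl, NULLREV))
        rev_of_cl[cl] = rev
    return parents


tip_parents = parents_from_tip(FILELOG)
lineage_parents = parents_from_lineage(FILELOG)
```

In this model, rev3 (CL3) gets rev2 (CL4) as its parent under the tip strategy and rev1 (CL2) under the lineage strategy, which matches the two pictures above.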

(This is also the example I use in `test-fb-hgext-p4fastimport-import-branch-filelogorder.t`, so if the description here looks confusing, please refer to the test)

Reviewed By: dsp

Differential Revision: D6962019

fbshipit-source-id: 24de76ae009e0d6f976d247087fe4702c99e0f82
2018-04-13 21:51:14 -07:00
Kostia Balytskyi
e75b9fc1b1 fb-hgext: move most of hgext3rd and related tests to core
Summary:
This commit moves most of the stuff in hgext3rd and related tests to
hg-crew/hgext and hg-crew/test respectively.

The things that are not moved are the ones which require some more complex
imports.


Depends on D6675309

Test Plan: tests are failing at this commit; fixes are in the following commits

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.intern.facebook.com/D6675329
2018-01-09 03:03:59 -08:00