Summary: Makes syncimporter case insensitive for p4 paths to properly handle Perforce's case insensitivity.
Reviewed By: zhh95
Differential Revision: D8777555
fbshipit-source-id: 8d39f2455f42b27a3a3341c02901ed31398cc5bd
Summary:
Previous code format attempt (D8173629) didn't cover all files because `**/*.py`
was not expanded recursively by bash. That made certain changes larger than
they should be (e.g. D8675439). Now use zsh's `**/*.py` to format them.
Also fix Python syntax so black can run on more files, and fix all lint issues.
Reviewed By: phillco
Differential Revision: D8696912
fbshipit-source-id: 95f07aa0c5eb1b63947b0f77f534957f4ab65364
Summary:
p4fastimport was replaced by p4seqimport months ago. On top of that, previous assumptions have changed in ways that might make p4fastimport not work (e.g. allowing import to connect to LFS).
This change cleans up a lot of unused code and a bunch of tests that no longer make sense.
Reviewed By: zhh95
Differential Revision: D8525286
fbshipit-source-id: 91d33e7530bf6df1e1ec92fae3acde0345230247
Summary: Mostly empty lines removed and added. A few bugfixes on excessive line splitting.
Reviewed By: quark-zju
Differential Revision: D8199128
fbshipit-source-id: 90c1616061bfd7cfbba0b75f03f89683340374d5
Summary: This change makes all p4 paths lowercase when dealing with the mapping from p4 to hg, thus making the importer work fine with different case paths from Perforce.
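A minimal sketch of the idea, not the importer's actual code: the p4-to-hg mapping is keyed by the lowercased p4 path, and lookups lowercase the incoming path the same way. The helper names and sample paths are hypothetical.

```python
def normalize_p4_path(p4_path):
    """Lowercase a depot path so lookups ignore case differences."""
    return p4_path.lower()


def build_case_insensitive_mapping(p4_to_hg):
    """Re-key a {p4 path: hg path} dict by the lowercased p4 path.

    The hg path keeps its original case; only the lookup key changes.
    """
    return {normalize_p4_path(p4): hg for p4, hg in p4_to_hg.items()}


# Hypothetical sample data: the same file reported with different casing.
mapping = build_case_insensitive_mapping(
    {"//depot/Tools/Build.py": "Tools/Build.py"}
)
hg_path = mapping.get(normalize_p4_path("//DEPOT/tools/build.PY"))
```

With this, both spellings of the depot path resolve to the single hg path `Tools/Build.py`.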
Differential Revision: D7978382
fbshipit-source-id: 134705ac27d889e80e5de589ab165e8acfd52346
Summary:
This diff prepares the importer to cleanly handle moving files in perforce while
keeping the structure in hg untouched.
It introduces a magic string that has to be present in the changelist description:
`IMPORTER_IGNORE_REORG@`
When that is present, move/add move/delete actions where hg path is the same are
ignored.
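A rough sketch of one reading of this rule, assuming each file action is available as an `(action, hg_path)` tuple (a hypothetical representation, not the importer's real data structure): when the marker is in the description, a move/add and move/delete that land on the same hg path cancel out.

```python
REORG_MARKER = "IMPORTER_IGNORE_REORG@"


def filter_reorg_moves(description, actions):
    """Drop move/add + move/delete pairs whose hg path is unchanged.

    `actions` is a hypothetical list of (p4 action, hg path) tuples.
    Without the marker in the description, nothing is filtered.
    """
    if REORG_MARKER not in description:
        return list(actions)
    added = {path for action, path in actions if action == "move/add"}
    deleted = {path for action, path in actions if action == "move/delete"}
    unchanged = added & deleted  # moved in p4, same location in hg
    return [
        (action, path)
        for action, path in actions
        if not (action in ("move/add", "move/delete") and path in unchanged)
    ]
```

Moves that do change the hg path, and all non-move actions, pass through untouched.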
Differential Revision: D7832814
fbshipit-source-id: e8323d0c3cc79ee81cb819bee63435f345069861
Summary:
We currently call "p4 where" on all files in the new client spec, which is roughly O(all files in ovrsource), and it's only going to get worse as ovrsource grows :(
This diff makes the number of calls O(files added & removed between two client specs) by filtering them in the following way:
- get a full list of files from old client (L1)
- get a full list of files from new client (L2)
- files_added = L2 - L1
- files_removed = L1 - L2
- `p4 where` on (files_added + files_removed)
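The filtering steps above can be sketched with plain set arithmetic. This is an illustration only; the function name and the sample paths are made up, and the real code obtains the two file lists from the old and new client specs.

```python
def files_needing_where(old_client_files, new_client_files):
    """Return only the files whose mapping may have changed.

    Files present in both clients are skipped, so the number of paths
    handed to `p4 where` scales with the difference between the specs.
    """
    old_files = set(old_client_files)
    new_files = set(new_client_files)
    files_added = new_files - old_files
    files_removed = old_files - new_files
    return sorted(files_added | files_removed)
```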
How much speedup do we get?
from
[2018-04-25 16:07:27,725]INFO:root: Start sync import in ovrsource-master
[2018-04-25 17:35:35,873]INFO:root: Finish sync import in ovrsource-master after 5288.147963762283 seconds
to
[2018-04-26 13:43:28,819]INFO:root: Start sync import in ovrsource-master
[2018-04-26 13:51:35,575]INFO:root: Finish sync import in ovrsource-master after 486.7560772895813 seconds
on this change D7722798
Differential Revision: D7772403
fbshipit-source-id: 05a16343264007ee3ee466621da9da888c2368d7
Summary: The current sync importer modifies mercurial storage data structures directly, which is error prone and hard to debug. This reimplements the sync importer with a higher level mercurial API (what `p4seqimport` uses).
Differential Revision: D7714322
fbshipit-source-id: 0269839b5ee3a4b45f166dce74dfd29c8ec5135a
Summary:
D7676378 would break this as it changes how parse_where behaves. This is the only
callsite left behind. AFAICT it actually wouldn't work, as it passes the hg path
to p4 where.
This change removes the option and related code.
Differential Revision: D7696956
fbshipit-source-id: a27077a9540369b5c77692185f317cf5395789ba
Summary:
We currently call p4 where with one path at a time, but it accepts a list.
This change takes advantage of that, batching p4 where calls, which speeds up
importing.
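As an illustration of the batching, here is a small sketch that splits a path list into chunks and builds one `p4 where` argument list per chunk. The helper names and the batch size are assumptions, and the sketch only builds the commands; it does not run them.

```python
def batched(paths, batch_size):
    """Yield consecutive slices of `paths` with at most `batch_size` items."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def build_where_commands(paths, batch_size=100):
    """Build one `p4 where` argv per batch instead of one per path.

    The default batch size is an arbitrary placeholder, not a tuned value.
    """
    return [["p4", "where"] + batch for batch in batched(list(paths), batch_size)]
```

Three paths with a batch size of two produce two invocations instead of three.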
Differential Revision: D7676378
fbshipit-source-id: 4a6747458555a60dd5f385604f2a25d595af947d
Summary:
Currently, the sync importer only reads files from perforce to ensure that the contents are up to date. Connecting to perforce and using `p4 print` is pretty slow, and we would like to use alternatives like `rcs` and `gzip` to speed things up, which are also what seqimporter is using.
This diff changes the reading behavior of sync importer so that it can also read from `rcs` and `gzip`.
A brief test on reading files from `Tools/...` shows that this works and the speedup is pretty good.
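A simplified sketch of the fallback order, covering only the gzip case (RCS handling is left out because it depends on details not described here). The function name, the `local_gzip_path` argument, and the `fetch_from_p4` callable are hypothetical stand-ins.

```python
import gzip
import os


def read_file_content(local_gzip_path, fetch_from_p4):
    """Return (source, data), preferring a local gzip copy when present.

    `fetch_from_p4` is a hypothetical zero-argument callable standing in
    for the slower `p4 print` path; it is only called as a fallback.
    """
    if local_gzip_path and os.path.exists(local_gzip_path):
        with gzip.open(local_gzip_path, "rb") as f:
            return "gzip", f.read()
    return "p4", fetch_from_p4()
```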
Differential Revision: D7619304
fbshipit-source-id: 563d0e40bcf7fa9a187cd3ede70878dccef4f9e9
Summary:
This has been commented out for ages
It is not needed, and it refers to self._parsed which does not exist.
Differential Revision: D7625105
fbshipit-source-id: 908809b5b31e104b01b49558febed5adbfdcd143
Summary:
We haven't used this for over a month, we're alive and well.
The raccoon asked me to remove this:
codecleanupracoon
Differential Revision: D7578280
fbshipit-source-id: ee522203095a213dc90d6146ce720b473ab5afe5
Summary:
seqimport currently has a gap: if a changelist touches files outside of
clientspec, it will blow up when trying to get the move info for it. Even if that
was not the case, it could blow up if the file was moved into the clientspec.
This change makes it resilient to that by providing the same list created by
`parse_fstat` to prefilter files we check for move info, and then making
`parse_where` take an optional parameter saying it is fine if file is not on
client (i.e. moved into clientspec)
Differential Revision: D7574415
fbshipit-source-id: 63f6a32436d3d53d6f9402575a9a13bb4187b76c
Summary:
We disabled `runworker` for every importer (fastimport, syncimport, seqimport) when we hotfixed a customized file transaction breakage in fastimport D7108127.
Since
- we're not using fastimport, which relies on the customized file transaction
- seqimport does not rely on the customized file transaction mechanism
- syncimport is super slow without `runworker` and syncimport does not rely on the customized file transaction mechanism
- the manual work involved in adding mapping to ovrsource (we have to do that if we don't have syncimport) is not trivial
I think it's a good idea to bring back `runworker`, which is configurable: it's explicitly set to `false` for fastimport and seqimport, and to `force` for syncimport.
This diff
- makes the `useworker` config available again
- removes the unnecessary test on the customized file transaction, which is still broken with `runworker`
- logs the time it takes to finish one sync import
Differential Revision: D7557856
fbshipit-source-id: 6d4105cc38b182e027512730901ce3b2a4e1d449
Summary:
Turns out I incorrectly assessed this situation before. We do use content from
perforce servers a lot. This change makes p4seqimport read from local disk
directly if possible rather than relying solely on `p4 print` to obtain file content.
```name=Checking file content src on master-importer task 0 (running for 15h+)
[15:40:23 twsvcscm@priv_global/independent_devinfra/ovrsource-master-importer/0 ~]$ egrep -o 'src: (gzip|rcs|p4)' /logs/stdout | sort | uniq -c
2567 src: gzip
24 src: p4
```
Differential Revision: D7388797
fbshipit-source-id: 5fe1a525bc211d64a75954d529edc152d22970a7
Summary:
dsp had a look at the whole stack and suggested some changes:
* Only write bookmark once at the end of the import - we are doing a single transaction anyways so updating the bookmark after every changelist import is moot
* Remove unused function seqimporter.ChangelistImporter._safe_open
* Require fncache to preserve behavior from p4fastimport
Differential Revision: D7375481
fbshipit-source-id: f4407d5d0276f96d72bf67544091640fe1c46044
Summary: Updates the importer wrapper to use the new p4seqimport, replacing p4fastimport.
Differential Revision: D7326764
fbshipit-source-id: 588486bfd747086396f47e678da05c6eafd30565
Summary:
When testing p4seqimport with remotefilelog it would barf on call to `.tip()`,
because remotefilelog doesn't have that.
This change makes use of the change context from the repo instead to get the
tip node.
Differential Revision: D7294979
fbshipit-source-id: 18b4a5107f4cbf676016d44d5134bf0d252eeff3
Summary:
Perforce supports RCS keyworded files, more info here:
http://answers.perforce.com/articles/KB/3482
We replace things back in p4fastimport, this replicates the behavior in
p4seqimport (unit test should clarify what this means)
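One reading of "replace things back" is collapsing expanded keywords to their bare form. A sketch under that assumption follows; the keyword list and the regex are illustrative and not copied from p4fastimport.

```python
import re

# Assumed set of Perforce RCS-style keywords; adjust to match the server.
KEYWORDS = ("Id", "Header", "Date", "DateTime", "Change", "File", "Revision", "Author")

_EXPANDED = re.compile(
    rb"\$(" + b"|".join(k.encode() for k in KEYWORDS) + rb"):[^$\n]*\$"
)


def collapse_keywords(data):
    """Turn expanded keywords such as `$Id: ... $` back into `$Id$`."""
    return _EXPANDED.sub(rb"$\1$", data)
```

Text that merely looks similar but uses an unknown keyword is left alone.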
Differential Revision: D7188163
fbshipit-source-id: 594f71d6114c73001753ae36c4973c2db3310e62
Summary:
Respect the executable bit on files based on perforce type.
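A hedged sketch of how the executable bit could be derived from a Perforce file type string. It assumes the `+x` modifier marks executables and handles a few legacy aliases; the alias set is partial and the function name is hypothetical.

```python
# Partial, assumed list of legacy type aliases that imply the +x modifier.
LEGACY_EXECUTABLE_TYPES = frozenset(["xtext", "xbinary", "kxtext"])


def is_executable(p4_type):
    """Return True if a type such as `text+x` or `xtext` is executable."""
    base, _, modifiers = p4_type.partition("+")
    # Modifiers are case-sensitive, so only a lowercase "x" counts here.
    return "x" in modifiers or base in LEGACY_EXECUTABLE_TYPES
```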
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7185388
fbshipit-source-id: 59afec7bd857572b8347ebe546d131017a79928c
Summary:
p4seqimport has used very high level mercurial abstractions so far (almost
equivalent to running hg add / mv / rm / commit on command line). This is very
easy to grasp as we use it day to day. It is not performant enough for our
importer:
- It does the work twice (write to working copy, then commit changing hg metadata)
- It requires the working copy (this would force us to update between revs,
materializing a prohibitively large number of files)
This change makes use of memctx, which is basically an in-memory commit. This way
we don't need a working copy and we save time + a lot of space.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7176903
fbshipit-source-id: 2773d7c001b615837496ea9db3229d9afc020124
Summary:
p4seqimport has a bookmark option, but it was completely ignored before this change.
This makes use of the option, moving the bookmark as we import changes.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7172867
fbshipit-source-id: be63765088b0583df2e1c9e0ccec869c5278d782
Summary:
Properly create files as symlinks if they are symlinks in P4
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Reviewed By: wlis
Differential Revision: D7157772
fbshipit-source-id: ac3e5010f3d15460592a449c817824c0b28a8435
Summary:
Similar to #10 (D7113181), we need to track large files.
This change adds the bits to do so, reusing the logic from p4fastimport which was
moved to lfs.py
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7115654
fbshipit-source-id: 56ccfadf6fa14dcfb8005cc5ef03fb175835bcda
Summary:
This change makes seqimport write revision info (i.e. (CL, hghash) pairs) to a
sqlite file. This is used by the importer TW job wrapper to write the info
into `xdb.p4sync` table `revmap`
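For illustration, a minimal sketch of writing (CL, hghash) pairs to a sqlite file with Python's `sqlite3` module. The local table layout and function names are hypothetical; the real schema may differ.

```python
import sqlite3


def write_revision_info(db_path, pairs):
    """Store (changelist number, hg hash) pairs in a local sqlite file.

    The `revmap` schema below is an assumed layout for this sketch.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS revmap ("
                "cl INTEGER PRIMARY KEY, hghash TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO revmap (cl, hghash) VALUES (?, ?)",
                pairs,
            )
    finally:
        conn.close()


def read_revision_info(db_path):
    """Return all stored pairs ordered by changelist number."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT cl, hghash FROM revmap ORDER BY cl").fetchall()
    finally:
        conn.close()
```

A wrapper job could then call something like `read_revision_info` and forward the rows to the remote table.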
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7113181
fbshipit-source-id: e55a8cf0b794216a4855ae7486885c3d956cd7fb
Summary:
Adds p4changelist to commit extra info
With p4changelist info, make p4seqimport incremental
Add debug message to have more accurate info on what is actually being imported
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7090090
fbshipit-source-id: 17529aa57452453cfe29c3c3dc9d9e7daa8cffb2
Summary:
Adds copy tracing to `p4seqimport` by:
- Leveraging `fromFile` from `p4 -ztag describe` to introduce source for moved
files into P4Changelist.load's
- Utilizing that info from P4 CL when creating hg commit
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7074892
fbshipit-source-id: e105a608bb953a8137ec6c9afc7e0571a902c868
Summary:
Consolidates manipulation of p4 CL info into p4 module, pulling the relevant code
out of ChangeManifestImporter creategen so it can be easily shared by
p4fastimport and p4seqimport
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7064179
fbshipit-source-id: 72c5bcad209eebf40ec8152a07f98f7f7fa544fb
Summary:
Adds logic to create the commit, using info from p4 CL + the list of added and
removed files.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7063983
fbshipit-source-id: c64e44c19d06e54fe35121a8d6128de050f93823
Summary:
Read file from perforce, write into the hg repo.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7050157
fbshipit-source-id: 4389ba11f62c8ed825d6a6ef3c001095339eb551
Summary:
Creates ChangelistImporter, which will be responsible for translating a p4 CL to
a hg commit
For now it only goes through files touched by the CL and lists what was added or
removed. Next diffs will evolve it to the point where it effectively performs the
translation.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7049961
fbshipit-source-id: 6a9f3bd57cadc2b9ea8a81373cc10dfda76311e7
Summary:
Pulls the logic to define changelists from p4fastimport into separate function
and re-uses that in p4seqimport
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7035674
fbshipit-source-id: 699e9148d35e437f306062f290c8ec2a857df480
Summary:
This change:
Moves some opts sanitizing logic into function `sanitizeopts`
Adds checks for `limit` being a positive integer
Uses `sanitizeopts` new function in p4seqimport
Adds a test covering `sanitizeopts`
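A sketch of what the `limit` check could look like. The function below is a hypothetical stand-in, not the actual `sanitizeopts`, and it raises `ValueError` where the real code may abort differently.

```python
def sanitize_limit(opts):
    """Return a copy of `opts` with `limit` validated as a positive int.

    Hypothetical helper: a missing limit is allowed, anything else must
    parse as an integer greater than zero.
    """
    sanitized = dict(opts)
    limit = sanitized.get("limit")
    if limit is None:
        return sanitized
    try:
        limit = int(limit)
    except (TypeError, ValueError):
        raise ValueError("limit must be an integer, got %r" % (limit,))
    if limit <= 0:
        raise ValueError("limit must be a positive integer, got %d" % limit)
    sanitized["limit"] = limit
    return sanitized
```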
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7035217
fbshipit-source-id: cd677fb254ff83d123673d51a1c682639de08a30
Summary:
p4seqimport will be the new command to import from p4 to hg changelist by
changelist. This should provide us with a more robust importer that doesn't rely
on fiddling with hg's data structures directly. p4fastimport was important to
create ovrsource from scratch and import thousands of changelists, but moving
forward it is probably safer and easier to understand/maintain something that is
based on higher level Mercurial APIs
All that said, this is the first change. It:
1. Creates p4seqimport command as part of the p4fastimport extension
2. Refactors the p4 client checking logic into `enforce_p4_client_exists`
3. Adds a test that checks the new function works through using `p4seqimport`.
For a high-level overview of p4seqimport, please check https://our.intern.facebook.com/intern/wiki/IDI/p4seqimport/
Differential Revision: D7015941
fbshipit-source-id: cb5c59b2f104f336a078025544a44028bf01fa85
Summary: progress.bar() is incorrectly called without passing ui.
Differential Revision: D7415250
fbshipit-source-id: 22c7419561879ed9293e2c79cc9d4271e805be76
Summary:
The breakage we had on branch importer was related to filetransaction trying to
close a file that didn't exist. We're still not sure why this happens yet, but
the workaround was to use mercurial's transaction and to force not using workers,
which is this change.
Differential Revision: D7108127
fbshipit-source-id: 71fa63824984bfb91de3b732166f7bae496187ad
Summary:
Running `hg log --traceback --template '{file_copies}' -r XXXX` on a file with long history is slow for 2 reasons
- p4 fast importer preserves full history for deleted and re-added files
- p4 fast importer records the wrong parent of a file
This diff tries to fix these two issues.
In mercurial, if a file is added, deleted, and then added back, it should start a new file history when the file is added again.
For example,
commits           commit1      commit2                  commit3
actions          add a.txt  delete a.txt               add a.txt
timeline ------------X------------X------------------------X------------
`hg debugindex a.txt` at commit3 shows a.txt as a new file without previous history
   rev    offset  length  delta linkrev nodeid       p1           p2
     0         0       3     -1       0 b789fdd96dc2 000000000000 000000000000
However, this is different in p4. `p4 filelog test.txt` gives you
//depot/Software/Apps/Main/Native/.castle/test.txt
... #3 change 523261 add on 2018/01/23 by zhihuih@devbig415 (text) 'test:add-again-same-file'
... #2 change 523254 delete on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:delete'
... #1 change 523253 add on 2018/01/23 by zhihuih@devbig415 (text) 'testfile:add'
Currently, p4 fast importer preserves history the same way as p4, and this causes slowness (even timeout) in hg when it runs `hg log --traceback --template '{file_copies}' -r XXXX` on a revision that contains files with long history in p4 (mostly contributed by automation). To mitigate this, we want the p4 fast importer to behave the same way as hg and start a new history for a file that's added again.
Currently, p4 fast importer takes the tip of a filelog and treats that as the parent of the newly written entry diffusion/FBS/browse/master/fbcode/scm/hg/hgext/p4fastimport/importer.py;19ad9b05f50e3ff0265cdc7b4b45174dcf820343$468-469. This can be wrong when there are revisions from branches.
For example, if I edit file a in master in CL1, 2, 4, and I branch at CL3, and edit the file in branch in CL5, the current importer implementation will take filenode at CL4 as the parent of CL3
(CL1,2,3,4,5 corresponds to rev0,1,3,2,4)
{F120393661}
However, the correct behavior is to take filenode at CL2 as the parent of CL3
(CL1,2,3,4,5 corresponds to rev0,1,3,2,4)
{F120393662}
(This is also the example I use in `test-fb-hgext-p4fastimport-import-branch-filelogorder.t`, so if the description here looks confusing, please refer to the test)
Reviewed By: dsp
Differential Revision: D6962019
fbshipit-source-id: 24de76ae009e0d6f976d247087fe4702c99e0f82
Summary:
Running p4 syncimport when a client spec changes is very slow, and this blocks us from using it in prod.
This diff speeds up p4 syncimport by using multiple workers when calling `p4 where` on a big list of files.
It reduces the runtime from ~14mins to ~3mins when I add a new mapping `u'//depot/Tools/...': u'Tools/...',` to ovrsource config and run p4 syncimport.
Reviewed By: dsp
Differential Revision: D6838406
fbshipit-source-id: e7260be5dea2b41e176fdc3508a78134cb1e9c35
Summary:
Previously we allowed a sync commit to happen at any time, but imported the
full repository to bring it into sync. This is slow for very large repositories
and might be accidentally triggered. We change this behavior to only do a sync
commit if only the imported client changed but nothing else.
Reviewed By: dsp
Differential Revision: D6838370
fbshipit-source-id: eecdfb0ea295585058784f8d1f70de5f8c733645
Summary:
This commit moves most of the stuff in hgext3rd and related tests to
hg-crew/hgext and hg-crew/test respectively.
The things that are not moved are the ones which require some more complex
imports.
Depends on D6675309
Test Plan: - tests are failing at this commit, fixes are in the following commits
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.intern.facebook.com/D6675329