sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 08:47:12 +03:00

Author	SHA1	Message	Date
Siddharth Agarwal	cbe9637432	hg2git: regularize mercurial imports	2015-12-31 12:27:07 -08:00
Sean Farley	3b19ebc41e	hg2git: flake8 cleanup	2015-04-22 16:42:48 -07:00
Augie Fackler	6efcd59b56	hg2git: audit path components during export (CVE-2014-9390) A user recently got confused and managed to track and export a .git directory, which confuses git and causes it to emit very odd errors. For example, cloning one such repository (which has a symlink for .git) produces this output from git: "Cloning into 'git'... done. error: Updating '.git' would lose untracked files in it", and another (which has a .git directory checked in) produces this: "Cloning into 'git'... done. error: Invalid path '.git/hooks/post-update'". If it ended there, that'd be fine, but this led to a line of investigation that ended with CVE-2014-9390, so now git will block checking these revisions out, so we should try to prevent foot-shooting on our end. Since some servers (notably github) are blocking trees that contain these entries, default to refusing to export any path component that looks like it folds to .git. Since some histories probably contain this already, we offer an escape hatch via the config option git.blockdotgit that allows users to resume foot-shooting behavior.	2014-11-23 19:06:21 -05:00
Siddharth Agarwal	c188adb4b9	hg2git: in _init_dirs, store keys without leading '/' (issue103) Previously, whenever a tree that wasn't the root ('') was stored, we'd prepend a '/' to it. Then, when we'd try retrieving the entry, we'd do so without the leading '/'. This caused data loss because existing tree entries were dropped on the floor. Fix that by only adding '/' if we're adding to a non-empty initial path. This wasn't detected in tests because most of them deal only with files in the root and not ones in subdirectories.	2014-03-25 11:11:04 -07:00
Siddharth Agarwal	f84c69b6c1	hg2git: start incremental conversion from a known commit Previously, we'd spin up the Mercurial incremental exporter from the null commit and build up state from there. This meant that for the first exported commit, we'd have to read all the files in that commit and compute Git blobs and trees based on that. The current Mercurial to Git conversion scheme makes most sense with Mercurial's current default storage format, where manifests are diffed against the numerically previous revision. At some point in the future, the default will switch to generaldelta, where manifests would be diffed against one of their parents. In that world it might make more sense to have a stateless exporter that diffed each commit against its generaldelta parent and calculated dirty trees based on that instead. However, more experiments need to be done to see what export scheme is best. For a repo with around 50,000 files, this brings down an incremental 'hg gexport' of one commit from 18 seconds with a hot file cache (and tens of minutes with a cold one) to around 2 seconds with a hot file cache.	2014-03-14 20:45:09 -07:00
Siddharth Agarwal	e5bd941852	hg2git: implement a method to initialize _dirs from a Git commit Upcoming patches will start incrementally exporting from a particular commit instead of from null. This function will be used for that.	2014-03-14 19:17:09 -07:00
Siddharth Agarwal	6b4e5f67db	hg2git: fix subrepo handling to be deterministic Previously, the correctness of _handle_subrepos was based on the order the files were processed in. For example, consider the case where a subrepo at location 'loc' is replaced with a file at 'loc', while another subrepo exists. This would cause .hgsubstate and .hgsub to be modified and the file added. If .hgsubstate was seen _before_ 'loc' in the modified/added loop, then _handle_subrepos would run and remove 'loc' correctly, before 'loc' was added back later. If, however, .hgsubstate was seen _after_ 'loc', then _handle_subrepos would run after 'loc' was added and would remove 'loc'. With this patch, _handle_subrepos merely computes the changes that need to be applied. The changes are then applied, making sure removed files and subrepos are processed before added ones. This was detected by setting a random PYTHONHASHSEED (in this case, 3910358828) and running the test suite against it. An upcoming patch will randomize the PYTHONHASHSEED in run-tests.py, just like is done in Mercurial.	2014-02-19 20:52:59 -08:00
Siddharth Agarwal	689b38dc44	hg2git: move parse_subrepos to top level durin42 expressed a desire for this function to be at the top level.	2014-02-19 20:18:43 -08:00
Siddharth Agarwal	d7dbce79bd	hg2git: call _handle_subrepos when .hgsubstate is removed Now that _handle_subrepos can handle .hgsubstate being removed, we should use it for that. The test changes make sure that the SHAs roundtrip.	2014-02-12 22:55:16 -08:00
Siddharth Agarwal	39d1c15298	hg2git: make _handle_subrepos work in the removed case A test for this will be included in an upcoming patch.	2014-02-12 21:19:04 -08:00
Siddharth Agarwal	ca74d6d967	hg2git: add 'new' prefix to _handle_subrepos variables An upcoming patch will introduce similar variables for self._ctx. This helps disambiguate.	2014-02-12 20:34:09 -08:00
Siddharth Agarwal	3cadf19b94	hg2git: factor out subrepo parsing into a separate function This code will be used in multiple contexts in an upcoming patch.	2014-02-12 20:28:28 -08:00
Siddharth Agarwal	44c13be822	hg2git: factor out remove path logic into a separate function This will be used by _handle_subrepos in an upcoming patch.	2014-02-12 19:50:56 -08:00
Siddharth Agarwal	873a402c5e	hg2git: call status on newctx, not newctx.rev() There's no benefit to calling rev().	2014-02-12 18:05:12 -08:00
Siddharth Agarwal	17657a025c	hg2git: store ctx instead of rev Storing a ctx enables values like manifests to be cached on the context.	2014-02-12 17:49:14 -08:00
Siddharth Agarwal	b470bfcf51	hg2git: rename ctx to newctx in update_changeset An upcoming patch will introduce a new field called _ctx. This helps prevent confusion.	2014-02-12 17:47:38 -08:00
Gregory Szorc	10dcc5b5c0	Only export modified Git trees Previously, we emitted every Git tree when updating between Mercurial changesets. With this patch, we now only emit Git trees that changed. A side-effect of the implementation is that we now only update in-memory Git trees objects that changed. Before, we always touched Git trees, invalidating them in the process and causing Dulwich to recalculate their SHA-1. Profiling revealed this to be expensive and removing the extra calculation shows a nice performance win. Another optimization is to not sort the order that changed paths are processed in. Previously, we sorted by length, longest to shortest. Profiling revealed that the sorts took a non-trivial amount of time. While sorted execution resulted in likely idempotent behavior, it shouldn't be strictly required. On the author's machine, conversion of the Mercurial repository itself decreased from ~493s to ~333s. Even more impressive is conversion of Firefox's main repository (which is considerably larger). Converting the first 200 revisions of that repository decreased from ~152s to ~42s.	2013-04-14 11:11:41 -07:00
Gregory Szorc	baa19027ef	Export Git objects from incremental Mercurial changes This replaces the brute force Mercurial to Git export with one that is incremental. It results in a decent performance win and paves the road for parallel export via using multiple incremental exporters.	2013-03-19 22:44:01 -07:00
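
The path audit described in commit 6efcd59b56 can be pictured with a minimal sketch. This is not the code from that commit: the names `folds_to_dotgit`, `audit_export_path`, `DotGitPathError`, and the `block_dotgit` parameter (standing in for the `git.blockdotgit` config option) are hypothetical, and the check here is a plain case-insensitive comparison, which is an assumption that is probably narrower than what the commit actually checks.

```python
class DotGitPathError(ValueError):
    """Raised when a path component looks like it folds to '.git' (hypothetical name)."""


def folds_to_dotgit(component: str) -> bool:
    # Assumption: only case folding is considered in this sketch.
    return component.lower() == ".git"


def audit_export_path(path: str, block_dotgit: bool = True) -> None:
    """Refuse to export a path if any '/'-separated component folds to '.git'.

    block_dotgit=False mimics the escape hatch: the path is let through.
    """
    for component in path.split("/"):
        if folds_to_dotgit(component) and block_dotgit:
            raise DotGitPathError(
                "refusing to export path component %r in %r" % (component, path)
            )


# Ordinary paths pass; '.gitignore' is not '.git', so it passes too.
audit_export_path("src/.gitignore")
# With the escape hatch enabled, a suspicious path is allowed through.
audit_export_path("vendor/.GIT/config", block_dotgit=False)
```

Checking every component, rather than only the first, matters because a `.git` entry nested in a subdirectory is rejected by Git just as one at the root is.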

18 Commits
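
The key-format fix described in commit c188adb4b9 (issue103) comes down to how a child tree's path is joined to its parent's. The sketch below illustrates the idea under stated assumptions: trees are modeled as nested dicts (a dict is a subtree, `None` is a file), and `child_tree_key` and `collect_tree_keys` are hypothetical helpers, not the real `_init_dirs` implementation.

```python
def child_tree_key(parent: str, name: str) -> str:
    """Join a parent tree path and an entry name.

    The '/' separator is added only when the parent path is non-empty,
    so children of the root ('') get keys like 'src', never '/src'.
    """
    if parent:
        return parent + "/" + name
    return name


def collect_tree_keys(tree: dict, path: str = "") -> dict:
    """Map every tree path to its tree, starting from the root key ''."""
    dirs = {path: tree}
    for name, entry in tree.items():
        if isinstance(entry, dict):  # subtree; files are None in this model
            dirs.update(collect_tree_keys(entry, child_tree_key(path, name)))
    return dirs


root = {"README": None, "src": {"lib": {"a.py": None}}}
dirs = collect_tree_keys(root)
# A later lookup by 'src/lib' (no leading '/') now finds the stored tree.
assert "src/lib" in dirs
```

With an unconditional `'/' + name` join, the same lookup would miss, which matches the dropped tree entries the commit message describes.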