sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 16:31:02 +03:00

Author	SHA1	Message	Date
Pierre-Yves David	80ed73689f	checkcopies: move 'movewithdir' initialisation right before its usage The 'movewithdir' variable had a lot of related logic scattered around 'mergecopies'. However, it never actually contains anything until the very last loop in that function. We move the (simplified) variable definition there for clarity.	2016-10-11 02:15:23 +02:00
Pierre-Yves David	163070ae3d	checkcopies: extract the '_related' closure There is no need for it to be a closure.	2016-10-11 01:29:08 +02:00
Pierre-Yves David	2d670597fe	checkcopies: add an inline comment about the '_related' call This helps understanding the flow of the function.	2016-10-08 23:00:55 +02:00
Pierre-Yves David	71b0c4ef9c	checkcopies: minor change to comment This helped me understand the refactoring so this must be helpful.	2016-10-08 19:03:16 +02:00
Pierre-Yves David	1f966aa892	checkcopies: rename 'ca' to 'base' This variable was named after the common ancestor. It is actually the merge base that might differ from the common ancestor in the graft case. We rename the variable before a larger refactoring to clarify the situation.	2016-10-08 18:38:42 +02:00
Gábor Stefanik	8d5c0019c5	copies: don't record divergence for files needing no merge This is left over from when _checkcopies was factored out from mergecopies. The 2nd break has "of = None" before it, so it's a functionally equivalent change. The 1st one, however, causes a divergence to be recorded when a file has been renamed, but there is nothing to be merged to it. This is currently harmless, since the extra divergence is simply ignored later. However, the new _checkcopies introduced in the rest of this series does more than just record a divergence after completing the main loop, and it's important that the "post-processing" stage is really skipped for no-merge-needed renames.	2016-10-03 13:29:59 +02:00
Gábor Stefanik	5a7c7889a5	copies: mark checkcopies as internal with the _ prefix	2016-10-03 13:24:56 +02:00
Gábor Stefanik	6f3ec7c4c3	copies: split u1/u2 to u1u/u2u and u1r/u2r These will be made different in case of grafts by another patch in this series.	2016-10-03 13:23:19 +02:00
Gábor Stefanik	88c9737beb	copies: style fixes and add comment	2016-10-03 13:18:31 +02:00
Gábor Stefanik	846bad50ac	copies: limit is an optimization, and doesn't provide guarantees	2016-10-03 16:19:55 +02:00
timeless	a1cb3173a2	py3: convert to next() function next(..) was introduced in py2.6 and .next() is not available in py3 https://docs.python.org/2/library/functions.html#next	2016-05-16 21:30:53 +00:00
Durham Goode	6f7f581f5f	copies: optimize forward copy detection logic for rebases Forward copy detection (i.e. detecting what files have been moved/copied in commit X since ancestor Y) previously required diff'ing the manifests of both X and Y. This was expensive since it required reading both entire manifests and doing a set difference (they weren't already in a set because of the lazymanifest work). This cost almost 1 second on very large repositories, and happens N times for a rebase of N commits. This patch optimizes it for the case of rebase. In a rebase, we are comparing a commit against its immediate parent, and therefore we can know what files changed by looking at ctx.files(). This lets us drastically decrease the size of the set comparison, and makes it O(# of changes) instead of O(size of manifest). This makes it take 1ms instead of 1000ms.	2016-02-05 13:23:24 -08:00
Matt Mackall	a8fcfbf03d	copies: fix detection of divergent directory renames If we move all the files out of one directory, but into two different directories, we should not consider it a directory rename. The detection of this case was broken.	2016-01-13 10:10:05 -06:00
Mads Kiilerich	09567db49a	spelling: trivial spell checking	2015-10-17 00:58:46 +02:00
Matt Mackall	3b6391ff9a	copies: group bothnew with other sets	2015-08-19 15:40:13 -05:00
Matt Mackall	a4a77851e3	copies: rename renamedelete to renamedeleteset for clarity	2015-08-19 15:32:27 -05:00
Matt Mackall	505912bcc2	copies: move _makegetfctx calls into checkcopies	2015-08-19 15:26:08 -05:00
Matt Mackall	babc6236a3	copies: factor out setupctx into _makegetfctx This reduces the scope of mergecopies a bit	2015-08-19 15:17:33 -05:00
Matt Mackall	6be7bd49e6	copies: avoid reference to c1/c2 in makectx	2015-08-21 15:12:58 -05:00
Matt Mackall	846cb052bb	copies: move debug statement to appropriate place	2015-08-19 15:11:17 -05:00
Matt Mackall	06dcd73aae	copies: rename diverge2 to divergeset for clarity	2015-08-19 14:04:54 -05:00
Matt Mackall	cf71f5908f	copies: begin separating mergecopies sides	2015-08-19 13:40:18 -05:00
Matt Mackall	a37c0c906b	copies: rename ctx() to getfctx() for clarity	2015-08-19 13:09:54 -05:00
Durham Goode	1bef6fef82	copy: add flag for disabling copy tracing Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in large repos (hundreds of thousands of files). This provides the option of turning off the majority of copy tracing. It does not turn off _forwardcopies() since that is used to carry copy information inside a commit across a rebase. This will affect the situation where a user edits a file, then rebases on top of commits that have moved that file. The move will not be detected and the user will have to manually resolve the issue (possibly by redoing the rebase with this flag off). The reason to have a flag instead of trying to fix the actual copy tracing performance is that copy tracing is fundamentally an O(number of files in the repo) operation. In order to know if file X in the rebase source was copied anywhere, we have to walk the filelog for every new file that exists in the rebase destination (i.e. a file in the destination that is not in the common ancestor). Without an index that lets us trace forward (i.e. from file Y in the common ancestor forward to the rebase destination), it will never be an O(number of changes in my branch) operation. In mozilla-central, rebasing a 3 commit stack across 20,000 revs goes from 39s to 11s.	2015-01-27 11:26:27 -08:00
Gregory Szorc	9440bd18c5	copies: use absolute_import	2015-08-08 00:41:13 -07:00
Matt Mackall	fc81d2a796	merge with stable	2015-05-26 14:52:47 -05:00
Matt Mackall	bd4df663a9	mergecopies: avoid slowdown from linkrev adjustment (issue4680) checkcopies was using fctx.rev() which it was expecting would be equivalent to linkrev() but was triggering the new _adjustlinkrev path. This was making grafts and merges with large sets of potential copies very expensive.	2015-05-26 06:45:18 -05:00
Martin von Zweigbergk	68e09d510f	copies: document hack for adding '' to set of dirs The root directory is not normally added to 'dirs' instances (although I think it should be). In copies.mergecopies, we call dirname() to get the directory of a path and then check for containment in the 'dirs' instances ('d1' and 'd2'). In order to easily handle files in the root directory, '/' is added to d1/d2. This results in the empty string being added to the sets, since what comes before the slash in '/' is an empty string. This seems less than obvious, so let's document it.	2015-05-22 14:02:04 -07:00
Durham Goode	586027b77d	copies: switch to using pathutil.dirname copies had its own dirname implementation. Now that pathutil has a common one, let's use that instead.	2015-05-22 12:58:27 -07:00
Durham Goode	610230ad03	copies: add matcher parameter to copy logic This allows passing a matcher down the pathcopies() stack to _forwardcopies(). This will let us add logic in a later patch to avoid tracing copies when not necessary (like when doing hg diff -r 1 -r 2 foo.txt).	2015-04-16 11:29:30 -07:00
Durham Goode	ce308b547b	copies: pass changectx instead of manifest to _computenonoverlap The _computenonoverlap function takes two manifests to allow extensions to hook in and read the manifest nodes produced by the function. The remotefilelog extension actually needs the entire changectx instead (which includes the manifest) so it can prefetch the subset of files necessary for a sparse checkout (and the sparse checkout depends on which commit is being accessed, hence the need for the changectx). I have tests in the remotefilelog extension that cover this.	2015-04-03 15:18:34 -07:00
Matt Mackall	db55434dfb	merge with stable	2015-03-20 17:30:38 -05:00
Pierre-Yves David	41927328f0	mergecopies: reuse ancestry context when traversing file history (issue4537) Merge copies traverses file history in search of copies and renames. Since 3.3 we do "linkrev adjustment" to ensure that duplicated filelog entries do not confuse the traversal. This "linkrev adjustment" involves ancestry testing and walking the changeset graph. If we do such a walk of the changeset graph for each file, we end up with 'O(<changesets>x<files>)' complexity, which creates massive issues. For example, grafting a changeset in Mozilla's repo went from 6 seconds to more than 3 minutes. There is a mechanism to reuse such ancestor computation across all files, but it has to be set up manually in situations where it makes sense to take such a shortcut. This changeset sets this mechanism up and brings the graft time back down from 3 minutes to 8 seconds. To do so, we need more control over the way 'filectx' objects are instantiated during each 'checkcopies' call that 'mergecopies' makes. We add a new 'setupctx' that configures and returns a 'filectx' factory. The function makes sure the ancestry context is properly created, and the factory makes sure it is properly installed on each returned 'filectx'.	2015-03-20 00:30:35 -07:00
Matt Mackall	ab01fb226c	copies: use linkrev for file tracing limit This lets us lazily evaluate _adjustlinkrev.	2015-02-01 16:25:12 -06:00
Pierre-Yves David	e94f338ab6	_adjustlinkrev: reuse ancestors set during rename detection (issue4514) The new linkrev adjustment mechanism makes rename detection very slow, because each file rewalks the ancestor dag. To mitigate the issue in Mercurial 3.3, we introduce a simplistic way to share the ancestors computation for the linkrev validation phase. We can reuse the ancestors in that case because we do not care about sub-branching in the ancestors graph. The cached set will be used to check if the linkrev is valid in the search context. This is the vast majority of the ancestors usage during copies search since the uncached one will only be used when linkrev is invalid, which is hopefully rare.	2015-01-30 16:02:28 +00:00
Durham Goode	f07fec4ff0	copies: added manifests to computenonoverlap Commit d1f83f500b47 changed the computenonoverlap api's to not require the manifests. We actually need the manifests in the remotefilelog extension so we can find the file nodes for the various files that change. Let's add it back to the function signature with a note explaining why. This doesn't affect any behavior.	2015-03-10 13:56:05 -07:00
Martin von Zweigbergk	2de043381d	copies: only calculate 'addedinm[12]' sets once Pass the addedinm1 and addedinm2 instead of m1, m2, ma into _computenonoverlap() instead of calculating the sets twice.	2015-02-27 14:26:22 -08:00
Martin von Zweigbergk	d4eabc6ccd	copies: calculate 'bothnew' from manifestdict.filesnotin() In the same spirit as the previous change, let's now calculate the 'bothnew' variable using manifestdict.filesnotin().	2015-02-27 14:03:01 -08:00
Martin von Zweigbergk	93f839cfd2	copies: replace _nonoverlap() by calls to manifestdict.filesnotin() Now that we have manifestdict.filesnotin(), we can write _nonoverlap() in terms of that method instead, enabling future speedups when filesnotin() gets optimized, and perhaps making the code a little clearer at the same time.	2015-02-27 14:02:30 -08:00
Martin von Zweigbergk	199e845f93	copies: move code into new manifestdict.filesnotin() method copies._computeforwardmissing() finds files in one context that are not in the other. Let's move this code into a new method on manifestdict, so m1.filesnotin(m2) can be optimized for various types of manifests (we expect more types of manifests soon).	2015-02-27 13:57:37 -08:00
Durham Goode	7b9ea7ac55	copy: move _forwardcopies file logic to a function Moves the _forwardcopies missingfiles logic to a separate function so that other extensions which need to prefetch information about the files being processed have a hook point. This saves extensions from having to recompute this information themselves, and thus saves several seconds off of various commands (like rebase).	2015-01-27 17:24:12 -08:00
Durham Goode	5a2bff7069	copy: move mergecopies file logic to a function Moves the mergecopies nonoverlap logic to a separate function so that other extensions which may need to prefetch information about the files being processed have a hook point. This saves extensions from having to recompute this information themselves, and thus saves several seconds off of various commands (like rebase).	2015-01-27 17:23:18 -08:00
Mads Kiilerich	523c87c1fe	spelling: fixes from proofreading of spell checker issues	2014-04-17 22:47:38 +02:00
Ryan McElroy	365c7718eb	amend: fix amending rename commit with diverged topologies (issue4405) This addresses the bug described in issue4405: when obsolescence markers are enabled, amending a commit with a file move can lead to the copy information being lost. However, the bug is more general and can be reproduced without obsmarkers as well, as demonstrated by Pierre-Yves and put into the updated test. Specifically, graph topology divergences between the filelogs and the changelog can cause copy information to be lost during amends.	2014-10-16 06:35:06 -07:00
Matt Mackall	f663e5fc01	duplicatecopies: move from cmdutil to copies This is in preparation for moving its primary caller into merge.py, which would be a layering violation in the current location.	2014-10-13 14:33:13 -05:00
Mads Kiilerich	c55887b864	copies: guard debug section with ui.debugflag	2014-02-25 20:31:53 +01:00
Mads Kiilerich	210347bb4f	copies: remove _checkcopies wrapper - it does no good mergecopies might be doomed but it is not dead yet ...	2014-02-25 20:31:51 +01:00
Mads Kiilerich	43ddf0086b	copies: when both sides made the same copy, report it as a copy Not used yet ... but shows up in debug output.	2014-02-25 20:29:14 +01:00
Mads Kiilerich	3ee1a27c56	diff: search beyond ancestor when detecting renames This removes an optimization that was introduced in 5a644704d5eb but was too aggressive, as indicated by how it changed test-mq-merge.t. We are walking filelogs to find copy sources and we can thus not be sure to hit the base revision and find the renamed file there - it could also be in the first ancestor of the base ... in the filelog. We are walking the filelog and can thus not easily know when we hit the first ancestor of the base revision and which filename to look for there. Instead, we use _findlimit like mergecopies does: the lower bound for how far we have to go is found from the lowest changelog revision that is an ancestor of only one of the compared revisions. Any filelog ancestor with a revision number lower than that revision will be the ancestor of both compared revisions, and there is thus no reason to go further back than that.	2013-11-16 15:46:29 -05:00
Durham Goode	8921ef9d64	copies: refactor checkcopies() into a top level method This moves checkcopies() out of mergecopies() and makes it a top level function in the copies module. This allows extensions to override it. For example, I'm developing a filelog replacement that doesn't have rev numbers so all the rev number dependent implementation here needs to be replaced by the extension. No logic is changed in this commit.	2013-05-01 10:44:21 -07:00

110 Commits