sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Martin von Zweigbergk	55f777a236	copies: group wdir-handling in one place I think this makes it both easier to follow and shorter. Differential Revision: https://phab.mercurial-scm.org/D1698	2017-12-14 00:25:03 -08:00
Martin von Zweigbergk	c3a801b87d	copies: extract method for getting non-wdir forward copies I may add an alternative way of getting copy metadata (from changelog, not filelog) but the chaining with the dirstate copy metadata will be the same, so it will probably help to have this extracted. Even if that doesn't happen, the next patch will show that we can simplify this a bit after this refactoring, so it seems worth it regardless. Differential Revision: https://phab.mercurial-scm.org/D1697	2017-12-14 00:18:38 -08:00
Martin von Zweigbergk	ffa0e63e83	copies: consistently use """ for docstrings Differential Revision: https://phab.mercurial-scm.org/D1696	2017-12-14 08:27:22 -08:00
Martin von Zweigbergk	3bd0ca942e	copies: always respect matcher arg to _forwardcopies() The function would ignore the matcher if the dirstate copies were requested. It doesn't matter in practice because all callers used the returned map only for looking up specific files, and those files had already been filtered by the matcher (AFAICT). Still, it's a little confusing, so let's make it clearer by respecting the matcher in this case too. Differential Revision: https://phab.mercurial-scm.org/D1695	2017-12-11 10:24:38 -08:00
Pulkit Goyal	98c13e4c70	copies: add a config to limit the number of candidates to check in heuristics The heuristics algorithm finds possible candidates for a move/copy and then checks whether they are actually a copy or move. In some cases there can be a lot of candidates, which can slow down the algorithm. This patch introduces a config option `experimental.copytrace.movecandidateslimit` with which one can limit the candidates to check. The limit defaults to 100. Thanks to Yuya for suggesting to skip copytracing for that file with a warning. Differential Revision: https://phab.mercurial-scm.org/D987	2017-10-10 02:25:03 +05:30
Phil Cohen	a04280b2f9	context: add workingfilectx.markcopied With in-memory merge, copy information needs to be stored in-memory, not in the dirstate. To make this transition easy, move the existing dirstate-based approach to workingfilectx; that way, other implementations can choose to store it somewhere else. Differential Revision: https://phab.mercurial-scm.org/D1106	2017-10-15 20:36:29 -07:00
Pulkit Goyal	2251d97e92	copies: add docs for config `experimental.copytrace.sourcecommitlimit` This patch adds documentation for the config option. The config name does not convey much and hence documentation was required. Differential Revision: https://phab.mercurial-scm.org/D986	2017-10-08 04:39:42 +05:30
Yuya Nishihara	183298cb9f	copytrace: use ctx.mutable() instead of adhoc constant of non-public phases	2017-09-22 22:45:02 +09:00
Pulkit Goyal	2c8e44b5f7	py3: explicitly convert dict.keys() and dict.items() into a list Differential Revision: https://phab.mercurial-scm.org/D853	2017-09-30 15:45:15 +05:30
Pulkit Goyal	ee400fb169	copytrace: add a new config to limit the number of drafts in heuristics The heuristics option tries the default full copytracing algorithm if both the source and destination branches consist of non-public changesets only. But this can be slow when there are a lot of drafts. This patch adds a new config option experimental.copytrace.sourcecommitlimit which defaults to 100. This value is the limit on the number of drafts from c1 to base. In case there are more changesets, even though they are draft, the heuristics algorithm will be used. Differential Revision: https://phab.mercurial-scm.org/D763	2017-09-21 15:58:44 +05:30
Pulkit Goyal	8a6be941c9	copytrace: use the full copytracing method if only drafts are involved This patch adds the functionality to use the full copytracing even if `experimental.copytrace = heuristics` in cases when only drafts are involved. This is also a part of the copytrace extension in fbext. This also adds tests which are also taken from fbext. .. feature:: The `heuristics` option for `experimental.copytrace` performs full copytracing if both source and destination branches contain non-public changesets only. Differential Revision: https://phab.mercurial-scm.org/D625	2017-09-03 20:06:45 +05:30
Pulkit Goyal	a5baff1381	copytrace: move fast heuristic copytracing algorithm to core The copytrace extension in fb-hgext has a heuristic implementation of copy tracing which is faster than the current copy tracing. The heuristic limits the search for copies to just files that are either: 1) Renames in the same directory 2) Moved to another directory with the same name The default copytrace implementation is very slow as it finds all the new files that were added from the merge base up to the head commit and, for each file, checks whether it is a copied or moved version of a different file. Stash@fb did an analysis of the above heuristics on the fb repo and found that among 2,443,768 moves/copies there are only 32,234 moves/copies which do not fall under the above heuristics, which is approx. 0.013 of total copies. This patch moves the heuristics algorithm under config `experimental.copytrace=heuristics`. While moving fbext to core, this patch removes a couple of less useful config options named `sourcecommitlimit` and `maxmovescandidatestocheck`. Tests are also added for the heuristics algorithm, which are basically copied from fbext/tests/test-copytrace.t. The tests follow a pattern of creating a server repo and then cloning to a local repo to create public and draft changesets, a distinction which will be useful in upcoming patches. After this patch `experimental.copytrace` has the following behaviour: 1) `off`: turns off copytracing 2) `heuristics`: use the heuristic algorithm added in this patch. 3) everything else: use the full copytracing algorithm .. feature:: A new fast heuristic algorithm for copytracing which assumes that file moves are either:: 1) Renames in the same directory 2) Moves in other directories with same names You can use this algorithm by setting `experimental.copytrace=heuristics`. Differential Revision: https://phab.mercurial-scm.org/D623	2017-09-03 03:49:15 +05:30
Pulkit Goyal	b8929368f2	copytrace: move the default copytracing algorithm in a new function We are going to introduce a new fast heuristic based copytracing algorithm, so lets make mergecopies the function which decides which algorithm to go with and then calls the related function. While I was here, I add a line in test-copy-move-merge.t saying its a test related to the full copytracing algorithm. Differential Revision: https://phab.mercurial-scm.org/D622	2017-09-03 02:34:01 +05:30
Pulkit Goyal	59fe14130c	copytrace: replace experimental.disablecopytrace config with copytrace (BC) This patch replaces experimental.disablecopytrace with experimental.copytrace. Since the two names do not mean the same thing, the default value is also changed. Now experimental.copytrace defaults to 'on'. The new value is not a boolean, as we will now have two different algorithms (the current one and the heuristics one to be imported from fbext), so we need more options than a boolean offers. The old config option is not kept; it is completely replaced, as it was under experimental and we don't guarantee BC for experimental things. .. bc:: The config option for copytrace `experimental.disablecopytrace` is now replaced with `experimental.copytrace` which defaults to `on`. If you need to turn off copytracing, add `[experimental] copytrace = off` to your config. Differential Revision: https://phab.mercurial-scm.org/D621	2017-09-03 01:52:19 +05:30
Gábor Stefanik	2431ab3a7d	copies: fix misaligned lines	2017-08-22 16:16:39 +02:00
Gábor Stefanik	a4ce7f6a87	copies: fix typo in comment "will not be limited" was meant to be "will not be visited". I missed this when writing the original graft-through-rename patch series.	2017-08-22 16:08:31 +02:00
Yuya Nishihara	5b29c5b3bd	copies: use intersectmatchers() in non-merge p1 optimization This enables the optimization introduced by b8d938230143 for non-rebase cases. Before, the match couldn't be narrowed if it was e.g. alwaysmatcher. The logic is copied from fca0d99edf8e.	2017-08-19 11:23:33 +09:00
Pulkit Goyal	304e4abf3a	copies: add more details to the documentation of mergecopies() This documentation is very helpful for any developer to understand what copytracing is and what the function does. Since this is the main function for doing copytracing, I have also included bits about copytracing in it. These additions are picked from a doc by Stash@Fb, so thanks to him. Differential Revision: https://phab.mercurial-scm.org/D409	2017-08-16 00:25:20 +05:30
Pulkit Goyal	b73ca160f7	py3: use dict.update() instead of constructing lists and adding them dict.items() returned a list on Python 2, whereas on Python 3 it returns a view object, so we required a workaround. Using dict.update() is better than constructing lists as it should save us some GC churn.	2017-06-01 01:14:02 +05:30
Stanislau Hlebik	8513dd5f4b	copies: introduce getdstfctx Previously `c2` may have had an incorrect linkrev because getsrcfctx set the wrong _descendantrev. getsrcfctx() sets the descendant rev equal to srcctx.rev() (see _makegetfctx()), but for `c2` the descendant rev should be dstctx. While we were lucky that it didn't break copytracing, it made it significantly slower in some cases. Besides, it broke some external extensions, for example remotefilelog.	2017-05-29 06:06:13 -07:00
Stanislau Hlebik	52f5b22254	copies: rename getfctx to getsrcfctx In the next patch we'll use getdstfctx. Let's rename getfctx to getsrcfctx in this patch.	2017-05-29 05:58:08 -07:00
Stanislau Hlebik	789df611a1	copies: remove msrc and mdst parameters This function already has lots of parameters. And we can get manifests from contexts. So let's get msrc and mdst parameters from srcctx and dstctx.	2017-05-29 05:57:25 -07:00
Stanislau Hlebik	d7b30bb1f8	copies: add dstctx parameter Add parameter with destination context	2017-05-29 05:57:03 -07:00
Stanislau Hlebik	92e5d5f67c	copies: rename ctx to srcctx In the next diff we'll pass new dstctx parameter. Let's rename ctx to srcctx in this patch.	2017-05-29 05:56:17 -07:00
Stanislau Hlebik	aa25365b25	copies: rename m2 to mdst Small refactoring to rename m2 to the clearer mdst.	2017-05-29 05:52:15 -07:00
Stanislau Hlebik	219c6a051d	copies: rename m1 to msrc Small refactoring that renames the `m1` parameter to the clearer name `msrc`.	2017-05-29 05:52:15 -07:00
Martin von Zweigbergk	c3406ac3db	cleanup: use set literals We no longer support Python 2.6, so we can now use set literals.	2017-02-10 16:56:29 -08:00
Durham Goode	3058681421	copies: remove use of manifest.matches Convert the existing use of manifest.matches to use the new api. This is part of getting rid of manifest.matches, since it is O(manifest).	2017-03-07 09:56:11 -08:00
Gábor Stefanik	b631d16eab	graft: support grafting changes to new file in renamed directory (issue5436)	2016-12-05 17:40:01 +01:00
Durham Goode	fb55c2fbf3	dirstate: change added/modified placeholder hash length to 20 bytes Previously the added/modified placeholder hash for manifests generated from the dirstate was a 21byte long string consisting of the p1 file hash plus a single character to indicate an add or a modify. Normal hashes are only 20 bytes long. This makes it complicated to implement more efficient manifest implementations which rely on the hashes being fixed length. Let's change this hash to just be 20 bytes long, and rely on the astronomical improbability of an actual hash being these 20 bytes (just like we rely on no hash ever being the nullid). This changes the possible behavior slightly in that the hash for all added/modified entries in the dirstate manifest will now be the same (so simple node comparisons would say they are equal), but we should never be doing simple node comparisons on these nodes even with the old hashes, because they did not accurately represent the content (i.e. two files based off the same p1 file node, with different working copy contents would have the same hash (even with the appended character) in the old scheme too, so we couldn't depend on the hashes period).	2016-11-10 02:19:16 -08:00
Durham Goode	03d313b5fd	dirstate: change placeholder hash length to 20 bytes Previously the new-node placeholder hash for manifests generated from the dirstate was a 21byte long string of "!" characters. Normal hashes are only 20 bytes long. This makes it complicated to implement more efficient manifest implementations which rely on the hashes being fixed length. Let's change this hash to just be 20 bytes long, and rely on the astronomical improbability of an actual hash being 20 "!" bytes in a row (just like we rely on no hash ever being the nullid). A future diff will do this for added and modified dirstate markers as well, so we're putting the new newnodeid in node.py so there's a common place for these placeholders.	2016-11-10 02:17:22 -08:00
Gábor Stefanik	5533b05a12	merge: avoid superfluous filemerges when grafting through renames (issue5407) This is a fix for a regression introduced by the patches for issue4028. The test changes are due to us doing fewer _checkcopies searches now, which makes some test outputs revert to the pre-issue4028 behavior. That issue itself remains fixed, we only skip copy tracing for files where it isn't relevant. As a nice side effect, this makes copy detection much faster when tracing backwards through lots of renames.	2016-10-25 21:01:53 +02:00
Gábor Stefanik	14dc42e666	copies: improve assertions during copy recombination - Make sure there is nothing to recombine in non-graftlike scenarios - More pythonic assert syntax	2016-10-18 02:09:08 +02:00
Gábor Stefanik	2f48be6841	copies: make _checkcopies handle copy sequences spanning the TCA (issue4028) When working in a rotated DAG (for a graftlike merge), there can be files that are renamed both between the base and the topological CA, and between the TCA and the endpoint farther from the base. Such renames span the TCA (and thus need both passes of _checkcopies to be fully detected), but may not necessarily be divergent. Make _checkcopies return "incomplete copies" and "incomplete divergences" in this case, and let mergecopies recombine them once data from both passes of _checkcopies is available. With this patch, all known cases involving renames and grafts pass. (Developed together with Pierre-Yves David)	2016-10-11 04:39:47 +02:00
Gábor Stefanik	c03c8792d5	checkcopies: add logic to handle remotebase As the two _checkcopies passes' ranges are separated by tca, not base, only one of the two passes will actually encounter the base. Pass "remotebase" to the other pass to let it know not to expect passing over the base. This is required for handling a few unusual rename cases.	2016-10-11 04:25:59 +02:00
Gábor Stefanik	912f58ada1	mergecopies: add logic to process incomplete data We first combine incomplete copies on the two sides of the topological CA into complete copies. Any leftover incomplete copies are then combined with the incomplete divergences to reconstruct divergences spanning over the topological CA. Finally we promote any divergences falsely flagged as incomplete to full divergences. Right now, there is nothing generating incomplete copy/divergence data, so this code does nothing. Changes to _checkcopies to populate these dicts are coming later in this series.	2016-10-04 12:51:54 +02:00
Gábor Stefanik	242c4897e8	checkcopies: handle divergences contained entirely in tca::ctx During a graftlike merge, _checkcopies runs from ctx to tca, possibly passing over the merge base. If there is a rename both before and after the base, then we're actually dealing with divergent renames. If there is no rename on the other side of tca, then the divergence is contained entirely in the range of one _checkcopies invocation, and should be detected "in the loop" without having to rely on the other _checkcopies pass.	2016-10-12 11:54:03 +02:00
Gábor Stefanik	d967d939d6	mergecopies: invoke _computenonoverlap for both base and tca during merges The algorithm of _checkcopies can only walk backwards in the DAG, never forward. Because of this, the two _checkcopies passes need to run from their respective endpoints to the TCA to cover the entire subgraph where the merge is being performed. However, detection of files new in both endpoints, as well as directory rename detection, need to run with respect to the merge base, so we need lists of new files both from the TCA's and the merge base's viewpoint to correctly detect renames in a graft-like merge scenario. (Series reworked by Pierre-Yves David)	2016-10-13 02:19:43 +02:00
Pierre-Yves David	cce3e9c3ad	copies: make it possible to distinguish between _computenonoverlap invocations _computenonoverlap needs to be invoked twice during a graft, and debugging messages should be distinguishable between the two invocations	2016-10-18 00:00:43 +02:00
Gábor Stefanik	7730b47e09	copies: make _checkcopies handle simple renames in a rotated DAG This introduces a distinction between "merge base" and "topological common ancestor". During a regular merge, these two are identical. Graft, however, performs a merge in a rotated DAG, where the merge base will not be a common ancestor at all in the original DAG. To correctly find copies in case of a graft, we need to take both the merge base and the topological CA into account, and track any renames between them in reverse. Fortunately we can detect this in advance, see comment in the code about "backwards". This patch only supports finding non-divergent renames contained entirely between the merge base and the topological CA. Further patches are coming to support more complex cases. (Pierre-Yves David was involved in the cleanup of this patch.)	2016-10-13 02:03:54 +02:00
Gábor Stefanik	60bab1ec6c	copies: compute a suitable TCA if base turns out to be unsuitable This will be used later in an update to _checkcopies. (Pierre-Yves David was involved in the cleanup of this patch.)	2016-10-13 02:03:49 +02:00
Gábor Stefanik	6250f7ff54	copies: detect graft-like merges Right now, nothing changes as a result of this, but we want to handle grafts differently from ordinary merges later. (Series developed together with Pierre-Yves David)	2016-10-13 01:47:33 +02:00
Gábor Stefanik	4adc2f1a6a	checkcopies: add a sanity check against false-positive copies When grafting a copy backwards through a rename, a copy is wrongly detected, which causes the graft to be applied inappropriately, in a destructive way. Make sure that the old file name really exists in the common ancestor, and bail out if it doesn't. This fixes the aggravated case of bug 5343, although the basic issue (failure to duplicate the copy information) still occurs.	2016-10-12 21:33:45 +02:00
Pierre-Yves David	604c8243a9	mergecopies: rename 'ca' to 'base' This variable was named after the common ancestor. It is actually the merge base that might differ from the common ancestor in the graft case. We rename the variable before a larger refactoring to clarify the situation. Similar rename was also applied to 'checkcopies' in a prior changeset.	2016-10-13 01:30:14 +02:00
Pierre-Yves David	4eabc75da9	copies: move variable document from checkcopies to mergecopies It appears that 'mergecopies' is the function consuming these data so we move the documentation there.	2016-10-13 01:26:33 +02:00
Pierre-Yves David	9df147eb63	checkcopies: pass data as a dictionary of dictionaries more are coming	2016-10-11 02:21:42 +02:00
Pierre-Yves David	80ed73689f	checkcopies: move 'movewithdir' initialisation right before its usage The 'movewithdir' had a lot of related logic all around the 'mergecopies'. However it is actually never containing anything until the very last loop in that function. We move the (simplified) variable definition there for clarity	2016-10-11 02:15:23 +02:00
Pierre-Yves David	163070ae3d	checkcopies: extract the '_related' closure There is no need for it to be a closure.	2016-10-11 01:29:08 +02:00
Pierre-Yves David	2d670597fe	checkcopies: add an inline comment about the '_related' call This helps understanding the flow of the function.	2016-10-08 23:00:55 +02:00
Pierre-Yves David	71b0c4ef9c	checkcopies: minor change to comment This helped me understand the refactoring so this must be helpful.	2016-10-08 19:03:16 +02:00
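
The heuristic described in a5baff1381 and the candidate limit from 98c13e4c70 can be illustrated with a minimal sketch. This is not the code from `copies.py`: the function name `heuristic_candidates` and its signature are hypothetical, and returning `None` when the limit is exceeded is an assumption standing in for "skip copytracing for that file with a warning". Only the two matching rules (same directory, or same file name in another directory) and the default limit of 100 come from the commit messages above.

```python
import posixpath

# Mirrors the documented default of experimental.copytrace.movecandidateslimit.
MOVE_CANDIDATES_LIMIT = 100


def heuristic_candidates(missing_path, added_paths, limit=MOVE_CANDIDATES_LIMIT):
    """Return added paths that might be a rename or move of missing_path.

    A path qualifies if it lives in the same directory (a rename in place)
    or has the same base name (a move to another directory).
    Returns None when there are more candidates than `limit`; that return
    value is an assumption of this sketch, standing in for skipping the file.
    """
    basename = posixpath.basename(missing_path)
    dirname = posixpath.dirname(missing_path)
    candidates = [
        path
        for path in added_paths
        if path != missing_path
        and (
            posixpath.basename(path) == basename
            or posixpath.dirname(path) == dirname
        )
    ]
    if len(candidates) > limit:
        return None  # too many candidates: skip this file
    return candidates
```

Each surviving candidate would still need to be checked against file history to confirm it really is a copy or move; the sketch covers only the cheap filtering step.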


156 Commits