sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Boris Feld	cc076020d2	setdiscover: allow to ignore part of the local graph Currently, the push discovery first determines the full set of common nodes before looking into what changesets are outgoing. When pushing a specific subset, this can lead to pathological situations where we search for the status of thousands of local heads that are unrelated to the requested pushes. To fix this, we need to teach the discovery to ignore part of the graph. Most of the necessary pieces were already in place. This changeset just makes them available to higher level API and tests them. Changes actually impacting pushes are coming in a later changeset.	2017-12-06 22:44:51 +01:00
Pierre-Yves David	c82b13f1cf	setdiscovery: improves logged message The 'srvheads' list contains all server heads including the common ones. We adjust 'ui.log' message to provide more useful information about server heads locally unknown. The performance impact of turning the list to set is negligible (about 1e-4s) compared to the rest of the discovery cost, so I'm taking the easy path.	2017-06-10 18:47:09 +01:00
Pierre-Yves David	ef5b27290d	discovery: log discovery result in non-trivial cases We log the discovery summary, the number of roundtrips and the elapsed time. This is useful to understand where a slow push might come from when looking at the blackbox.	2017-06-07 10:44:11 +01:00
Pierre-Yves David	4db3d34a4b	discovery: include timing in the debug output Having such data easily available is useful. It also prepares the inclusion of some discovery related data in blackbox.	2017-06-07 10:29:39 +01:00
Martin von Zweigbergk	c3406ac3db	cleanup: use set literals We no longer support Python 2.6, so we can now use set literals.	2017-02-10 16:56:29 -08:00
Augie Fackler	b6dda02542	setdiscovery: use iterbatch interface instead of batch It's a little more concise, and gives us some simple test coverage.	2016-03-01 17:44:41 -05:00
Pierre-Yves David	30913031d4	error: get Abort from 'error' instead of 'util' The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be confused about that and gives all the credit to 'util' instead of the hardworking 'error'. In a spirit of equity, we break the cycle of injustice and give back to 'error' the respect it deserves. And screw that 'util' poser. For great justice.	2015-10-08 12:55:45 -07:00
Gregory Szorc	fa0db1b2ec	setdiscovery: use absolute_import	2015-08-08 19:53:25 -07:00
Augie Fackler	11540756a2	discovery: always use batching now that all peers support batching Some peers will transparently downgrade batched requests to non-batched ones, but that simplifies code for everyone using batching.	2015-08-05 14:21:46 -04:00
Augie Fackler	9c2e980a64	cleanup: use __builtins__.all instead of util.all	2015-05-16 14:34:19 -04:00
Martin von Zweigbergk	4127f67694	util: drop alias for collections.deque Now that util.deque is just an alias for collections.deque, let's just remove it.	2015-05-16 11:28:04 -07:00
Pierre-Yves David	4f0528a29a	setdiscovery: remove '_setupsample' function It is now unused.	2015-01-06 17:19:21 -08:00
Pierre-Yves David	c84e477509	setdiscovery: document '_takequicksample'	2015-01-07 20:44:20 -08:00
Pierre-Yves David	2ee7c25abf	setdiscovery: drop '_setupsample' usage in '_takequicksample' For '_takefullsample' we can just retrieve the list of heads directly and ignore the rest of the complex return values. This was the last call to the infamous '_updatesample' function.	2015-01-06 17:07:44 -08:00
Pierre-Yves David	c05e3eea5d	setdiscovery: drop the 'always' argument to '_updatesample' This argument exists because of the complex code flow in '_takequicksample'. It first gets the list of heads and then calls '_updatesample' on an empty initial sample and a size limit matching the differences between the number of heads and the target sample size. Finally the heads and the sample from '_updatesample' were added. To ensure this addition result had the exact target length, the code had to ensure no elements from the heads were added to the '_updatesample' content and therefore was passing this "always included set of heads". Instead we can just update the initial heads sample directly and use the final target size as target size for the update. This removes the need for this 'always' parameter to the '_updatesample' function. The tests are affected because a different set building order results in different random sampling.	2015-01-07 10:32:17 -08:00
Pierre-Yves David	6ff053fa11	setdiscovery: always add exponential sample to the heads As explained in a previous changeset, prioritizing heads too much behaves pathologically when there are more heads than the sample size. To counter this, we always inject exponential samples before reducing to the sample size limit. This already shows some benefit in the tests themselves, but on a real-world example this moves my discovery for push to a pathologically headed repo from 45 rounds to 17 of them. We should maybe ensure that at least 25% of the result sample is heads, but I think the random sampling will be fine in practice.	2015-01-07 17:28:51 -08:00
Pierre-Yves David	60a9cd0334	setdiscovery: directly run '_updatesample' The heads and exponential sample are going to end up in the same set before any extra processing happens. We simplify the code by directly updating a set with heads. Changes in the order the set is built lead to small changes in the random sampling output. But after double checking, I can confirm the input data to the random sampling is consistent.	2015-01-07 17:23:21 -08:00
Pierre-Yves David	252ba1a3c3	setdiscovery: stop using '_setupsample' in '_takefullsample' Very few of the return values of '_setupsample' remain in use, so we directly retrieve the value we care about and drop the '_setupsample' call.	2015-01-07 17:17:56 -08:00
Pierre-Yves David	e3605ecf1f	setdiscovery: randomly pick between heads and sample when taking full sample Before this changeset, the discovery protocol was too heads-centric. Heads of the undiscovered set were always sent for discovery and any room remaining in the sample was filled with exponential samples (and random ones if any room remained). This behaved extremely poorly when the number of heads exceeded the sample size, because we keep just asking about the existence of heads, then their direct parent and so on. As a result, the 'O(log(len(repo)))' discovery turns into a 'O(len(repo))' one. As a solution we take a random sample of the heads plus exponential samples. This way we ensure some exponential sampling is achieved, bringing back some logarithmic convergence of the discovery again. This patch only applies this principle in one place. More places will be updated in future patches. One test is impacted because the random sample happens to be different. By chance, it helps a bit in this case.	2015-01-07 12:09:51 -08:00
Pierre-Yves David	6141054495	setdiscovery: document the '_updatesample' function This function is central in the sample building process; having it documented helps code readability a lot.	2015-01-06 17:02:32 -08:00
Pierre-Yves David	2fde36047b	setdiscovery: avoid calling any sample building if the undecided set is small If the length of undecided is smaller than the sample size, we can just request information for all of them. This conditional was previously handled by '_setupsample'. But '_setupsample' is in my opinion a problematic function with blurry semantics. Having this conditional explicitly earlier makes the code more explicit and moves us closer to removing this '_setupsample' function.	2015-01-06 16:40:33 -08:00
Pierre-Yves David	ef881538c4	setdiscovery: delay sample building calls to gather them in a single place Some of the logic around sample building is duplicated in the sample builders. It would clean things up to extract it into the top function, but this requires all the code to be in the same place. This changeset mostly exists to make the next one more clear.	2015-01-07 09:30:06 -08:00
Pierre-Yves David	ebee9c1c62	setdiscovery: drop unused 'initial' argument for '_takequicksample' There is a single call site, and it is always using 'initial=True'. So we just drop the argument and the associated condition.	2015-01-06 16:32:23 -08:00
Pierre-Yves David	e1de1d2c0d	setdiscovery: factorize similar sampling code We are using full sampling of 'fullsamplesize' in both cases. The only difference is the debug message. So we factorise the sampling code and put the message in an extra conditional. This is going to help making changes around the sampling logic. Such changes are needed to improve discovery performance on highly headed repositories.	2015-01-06 16:30:52 -08:00
Pierre-Yves David	04778d65fb	setdiscovery: drop shadowed 'undecided' assignment The 'undecided' variable was never used before being overwritten a few lines later.	2015-01-06 16:30:37 -08:00
Siddharth Agarwal	fba9f14547	setdiscovery: avoid a full changelog graph traversal We were definitely being suboptimal here: we were constructing two full sets, one with the full set of common nodes (i.e. a graph traversal) and one with all nodes. Then we subtract one set from the other. This whole process is O(commits) and causes discovery to be significantly slower than it should be. Instead, keep track of common incrementally and keep undecided as small as possible. This makes discovery massively faster on large repos: on one such repo, 'hg debugdiscovery' over SSH with one commit missing on the client and five on the server went from 4.5 seconds to 1.5. (An 'hg debugdiscovery' with no commits missing on the client, i.e. connection startup time, was 1.2 seconds.)	2014-11-16 00:40:29 -08:00
Mads Kiilerich	c5488ba34c	discovery: indices between sample and yesno must match (issue4438) 2ec3e28dea6b changed 'sample' from a list to a set. The iteration order is thus undefined and the yesno indices are not stable. To solve this, repeat the listification and comment from elsewhere in the code. Note: the randomness in the discovery protocol can make this problem hard to reproduce.	2014-11-05 13:05:32 +01:00
Mads Kiilerich	8079358ce3	discovery: limit 'all local heads known remotely' to real 'all' (issue4438) 2ec3e28dea6b made it possible that the initial head check didn't include all heads. If that is the case, don't use the early exit just because this random sample happened to be 'all known'. Note: the randomness in the discovery protocol can make this problem hard to reproduce.	2014-11-05 13:05:29 +01:00
Pierre-Yves David	1b8f2c7e41	setdiscovery: limit the size of all sample (issue4411) Further digging on this issue shows that the limit on the sample size used in discovery never works for heads. Here is a quote from the code itself: desiredlen = size - len(always) if desiredlen <= 0: # This could be bad if there are very many heads, all unknown to the # server. We're counting on long request support here. The long request support never landed and evolution makes the "very many heads, all unknown to the server" case quite common. We implement a simple and stupid hard limit of sample size for all queries. This should prevent HTTP 414 errors with the current state of the code.	2014-11-01 23:52:53 +00:00
Pierre-Yves David	e107a615ed	setdiscovery: limit the size of the initial sample (issue4411) The set discovery starts by sending a "known" command with all local heads. When the number of local heads is massive (e.g. using hidden changesets) such a request becomes too large. This leads to a 414 error over HTTP, aborting the whole process. We limit the size of the sample used by the first query to fix this. The tests are impacted because they do test a massive number of heads. But they do not test it over a real-world HTTP setup.	2014-10-27 17:52:33 +01:00
Pierre-Yves David	d1263d8d84	setdiscovery: extract sample limitation in a `_limitsample` function We need to reuse this logic for the initial query. We extract it in a function to ensure sample limiting is applied consistently in all cases.	2014-10-27 17:40:32 +01:00
Olle Lundberg	492c3a2ebf	setdiscovery: document algorithms used This is taken from: http://programmers.stackexchange.com/questions/208998 And modified slightly.	2014-03-06 12:37:28 +01:00
Augie Fackler	9f876f6c89	cleanup: move stdlib imports to their own import statement There are a few warnings still produced by my import checker, but those are false positives produced by modules that share a name with stdlib modules.	2013-11-06 16:48:06 -05:00
Mads Kiilerich	520076e707	delete some dead comments and docstrings	2012-08-21 02:41:20 +02:00
Mads Kiilerich	2f4504e446	fix trivial spelling errors	2012-08-15 22:38:42 +02:00
Pierre-Yves David	ae5abd69f4	localpeer: return only visible heads and branchmap Now that we have localpeer, we can apply filtering on heads and branchmap the same way it's done for wireprotocol peer.	2012-07-17 01:04:45 +02:00
Sune Foldager	ffe56435bf	peer: introduce peer methods to prepare for peer classes This introduces a peer method into all repository classes, which currently simply returns self. It also changes hg.repository so it now raises an exception if the supplied paths does not resolve to a localrepo or descendant. Finally, all call sites are changed to use the peer and local methods as appropriate, where peer is used whenever the code is dealing with a remote repository (even if it's on local disk).	2012-07-13 21:46:53 +02:00
Bryan O'Sullivan	abdf4a8227	util: subclass deque for Python 2.4 backwards compatibility It turns out that Python 2.4's deque type is lacking a remove method. We can't implement remove in terms of find, because it doesn't have find either.	2012-06-01 17:05:31 -07:00
Brodie Rao	d6a6abf2b0	cleanup: eradicate long lines	2012-05-12 15:54:54 +02:00
Pierre-Yves David	8abd0aa7c9	phases: do not exchange secret changesets Any secret changesets will be excluded from pull and push. Phase data are properly synchronized on pull and push if a changeset is seen as secret locally but is non-secret remote side. This patch does not handle the case of a changeset secret on remote but known locally.	2011-12-22 00:42:25 +01:00
Mads Kiilerich	065de91b14	add missing localization markup	2011-11-11 01:07:10 +01:00
Peter Arrenbrecht	83352215f8	setdiscovery: fix hang when #heads>200 (issue2971) When setting up the next sample, we always add all of the heads, regardless of the desired max sample size. But if the number of heads exceeds this size, then we don't add any more nodes from the still undecided set. (This is debatable per se, and I'll investigate it, but it's how we designed it at the moment.) The bug was that we always added the overall heads, not the heads of the remaining undecided set. Thus, if #heads>200 (desired sample size), we did not make progress any longer.	2011-08-25 21:25:14 +02:00
Andrew Pritchard	f23118834a	setdiscovery: return anyincoming=False when remote's only head is nullid This fixes (issue2907) a crash when using 'hg incoming --bundle' with an empty remote repo and a non-empty local repo. This also fixes an unreported bug that 'hg summary --remote' erroneously reports incoming changes when the remote repo is empty and the local is not. Also, add a test to make sure issue2907 stays fixed	2011-07-27 18:32:54 -04:00
Matt Mackall	b06b887ee8	discovery: quiet note about heads This was changing output on in/out -v for no good reason.	2011-07-05 14:30:42 -05:00
Peter Arrenbrecht	9a2d2f747c	setdiscovery: batch heads and known(ownheads) This means that we now discover both subset conditions (local<remote and remote<local) in a single roundtrip without ever constructing an actual sample (which takes a bit of client CPU).	2011-06-14 22:58:00 +02:00
Steven Brown	411d547af4	setdiscovery: limit lines to 80 characters	2011-05-05 23:21:37 +08:00
Peter Arrenbrecht	75fa0e5ea9	discovery: add new set-based discovery Adds a new discovery method based on repeatedly sampling the still undecided subset of the local node graph to determine the set of nodes common to both the client and the server. For small differences between client and server, it uses about the same or slightly fewer roundtrips than the old tree-based discovery. For larger differences, it typically reduces the number of roundtrips drastically (from 150 to 4, for instance). The old discovery code now lives in treediscovery.py, the new code is in setdiscovery.py. Still missing is a hook for extensions to contribute nodes to the initial sample. For instance, Augie's remotebranches could contribute the last known state of the server's heads. Credits for the actual sampler and computing common heads instead of bases go to Benoit Boissinot.	2011-05-02 19:21:30 +02:00

47 Commits
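
The set-based discovery that these commits introduce and refine (75fa0e5ea9, 1b8f2c7e41, 2fde36047b, c5488ba34c) can be illustrated with a small toy model. This is a minimal sketch and not the code from setdiscovery.py: the names `limit_sample` and `find_common`, the `parents` dict, and the `remote_known` set standing in for the server's answer to a "known" query are all hypothetical, and plain uniform random sampling replaces the heads-plus-exponential sampling described in the log. It assumes the remote's set of known nodes is closed under ancestry.

```python
import random


def ancestors(parents, nodes):
    """Return the given nodes plus all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(parents, nodes):
    """Return the given nodes plus all of their descendants."""
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen


def limit_sample(sample, size, rng):
    """Hard-limit a sample to `size` elements (hypothetical helper)."""
    if len(sample) > size:
        return set(rng.sample(sorted(sample), size))
    return set(sample)


def find_common(parents, remote_known, sample_size=2, seed=0):
    """Toy discovery loop: returns (common nodes, number of roundtrips)."""
    rng = random.Random(seed)
    undecided, common, roundtrips = set(parents), set(), 0
    while undecided:
        # If the undecided set is small, ask about all of it; otherwise
        # build a sample and cap its size.
        if len(undecided) <= sample_size:
            sample = undecided
        else:
            sample = limit_sample(undecided, sample_size, rng)
        # Listify once so that sample and answer indices stay aligned.
        sample = sorted(sample)
        yesno = [node in remote_known for node in sample]  # fake "known" query
        roundtrips += 1
        for node, known in zip(sample, yesno):
            if known:
                # A known node implies all of its ancestors are common.
                newly_common = ancestors(parents, [node])
                common |= newly_common
                undecided -= newly_common
            else:
                # An unknown node implies all of its descendants are missing.
                undecided -= descendants(parents, [node])
    return common, roundtrips


# Linear history 0 <- 1 <- ... <- 9; the remote only has nodes 0..5.
chain = {i: ([i - 1] if i else []) for i in range(10)}
common, roundtrips = find_common(chain, remote_known=set(range(6)))
print(sorted(common), roundtrips)
```

Whatever the sampling order, the loop ends with `common` equal to the local nodes the remote knows; only the number of roundtrips depends on how the sample is chosen, which is what the sampling-related commits above tune.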