Commit Graph

47 Commits

Author SHA1 Message Date
Boris Feld
cc076020d2 setdiscover: allow to ignore part of the local graph
Currently, the push discovery first determines the full set of common nodes
before looking into what changesets are outgoing. When pushing a specific
subset, this can lead to pathological situations where we search for the status
of thousands of local heads that are unrelated to the requested pushes.

To fix this, we need to teach the discovery to ignore part of the graph. Most
of the necessary pieces were already in place. This changeset just makes them
available to higher level API and tests them.

Changes actually impacting pushes are coming in a later changeset.
2017-12-06 22:44:51 +01:00
Pierre-Yves David
c82b13f1cf setdiscovery: improves logged message
The 'srvheads' list contains all server heads including the common ones. We
adjust 'ui.log' message to provide more useful information about server heads
locally unknown. The performance impact of turning the list to set is
negligible (about 1e-4s) compared to the rest of the discovery cost, so I'm
taking the easy path.
2017-06-10 18:47:09 +01:00
Pierre-Yves David
ef5b27290d discovery: log discovery result in non-trivial cases
We log the discovery summary, the number of roundtrips and the elapsed time.
This is useful to understand where a slow push might come from when looking at
the blackbox.
2017-06-07 10:44:11 +01:00
Pierre-Yves David
4db3d34a4b discovery: include timing in the debug output
Having such data easily available is useful. It also prepares the inclusion of
some discovery related data in blackbox.
2017-06-07 10:29:39 +01:00
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Augie Fackler
b6dda02542 setdiscovery: use iterbatch interface instead of batch
It's a little more concise, and gives us some simple test coverage.
2016-03-01 17:44:41 -05:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Gregory Szorc
fa0db1b2ec setdiscovery: use absolute_import
2015-08-08 19:53:25 -07:00
Augie Fackler
11540756a2 discovery: always use batching now that all peers support batching
Some peers will transparently downgrade batched requests to
non-batched ones, but that simplifies code for everyone using
batching.
2015-08-05 14:21:46 -04:00
Augie Fackler
9c2e980a64 cleanup: use __builtins__.all instead of util.all
2015-05-16 14:34:19 -04:00
Martin von Zweigbergk
4127f67694 util: drop alias for collections.deque
Now that util.deque is just an alias for collections.deque, let's just
remove it.
2015-05-16 11:28:04 -07:00
Pierre-Yves David
4f0528a29a setdiscovery: remove '_setupsample' function
It is now unused.
2015-01-06 17:19:21 -08:00
Pierre-Yves David
c84e477509 setdiscovery: document '_takequicksample'
2015-01-07 20:44:20 -08:00
Pierre-Yves David
2ee7c25abf setdiscovery: drop '_setupsample' usage in '_takequicksample'
For '_takefullsample' we can just retrieve the list of head directly and
ignore the rest of the complex return values. This was the last call to the
infamous '_updatesample' function.
2015-01-06 17:07:44 -08:00
Pierre-Yves David
c05e3eea5d setdiscovery: drop the 'always' argument to '_updatesample'
This argument exists because of the complex code flow in '_takequicksample'. It
first gets the list of heads and then calls '_updatesample' on an empty initial
sample and a size limit matching the differences between the number of heads and
the target sample size. Finally the heads and the sample from '_updatesample'
were added. To ensure this addition result had the exact target length, the code
had to ensure no elements from the heads were added to the '_updatesample'
content and therefore was passing this "always included set of heads".

Instead we can just update the initial heads sample directly and use the final
target size as target size for the update.

This removes the need for this 'always' parameter to the '_updatesample' function.

The tests are affected because a different set building order results in different
random sampling.
2015-01-07 10:32:17 -08:00
Pierre-Yves David
6ff053fa11 setdiscovery: always add exponential sample to the heads
As explained in a previous changeset, prioritizing heads too much behaves
pathologically when there are more heads than the sample size. To counter this,
we always inject exponential samples before reducing to the sample size limit.

This already shows some benefit in the tests themselves, but on a real-world example
this moves my discovery for push to a pathologically headed repo from 45 rounds to
17 of them.

We should maybe ensure that at least 25% of the result sample is heads, but I
think the random sampling will be fine in practice.
2015-01-07 17:28:51 -08:00
Pierre-Yves David
60a9cd0334 setdiscovery: directly run '_updatesample'
The heads and exponential sample are going to end up in the same set
before any extra processing happens. We simplify the code by directly
updating a set with heads.

Changes in the order the set is built lead to small changes in the random
sampling output. But after double checking, I can confirm the input data to
the random sampling is consistent.
2015-01-07 17:23:21 -08:00
Pierre-Yves David
252ba1a3c3 setdiscovery: stop using '_setupsample' in '_takefullsample'
Very few of the return values of '_setupsample' remain in use, so we
directly retrieve the value we care about and drop the '_setupsample'
call.
2015-01-07 17:17:56 -08:00
Pierre-Yves David
e3605ecf1f setdiscovery: randomly pick between heads and sample when taking full sample
Before this changeset, the discovery protocol was too heads-centric. Heads of the
undiscovered set were always sent for discovery and any room remaining in the
sample was filled with exponential samples (and random ones if any room
remained).

This behaved extremely poorly when the number of heads exceeded the sample size,
because we keep just asking about the existence of heads, then their direct parents
and so on. As a result, the 'O(log(len(repo)))' discovery turns into a
'O(len(repo))' one. As a solution we take a random sample of the heads plus
exponential samples. This way we ensure some exponential sampling is achieved,
bringing back some logarithmic convergence of the discovery again.

This patch only applies this principle in one place. More places will be updated
in future patches.

One test is impacted because the random sample happens to be different. By
chance, it helps a bit in this case.
2015-01-07 12:09:51 -08:00
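The sampling strategy described in the commit above can be illustrated with a minimal sketch. It assumes a toy graph given as a dict mapping each node to a list of its parents; the function names `exponential_sample` and `take_full_sample` are hypothetical and are not Mercurial's API. The sketch only shows the idea of mixing heads with nodes at power-of-two distances before cutting the result down at random, and it leaves out any other refinements the real code may apply.

```python
import random
from collections import deque


def exponential_sample(parents, heads):
    """Return nodes whose distance from the heads is 1, 2, 4, 8, ...

    `parents` maps each node to a list of its parents (toy representation,
    an assumption of this sketch). A head itself is at distance 1.
    """
    sample = set()
    dist = {}
    seen = set()
    visit = deque(heads)
    factor = 1
    while visit:
        node = visit.popleft()
        if node in seen:
            continue
        seen.add(node)
        d = dist.setdefault(node, 1)
        if d > factor:
            factor *= 2
        if d == factor:
            sample.add(node)
        for parent in parents.get(node, ()):
            dist.setdefault(parent, d + 1)
            visit.append(parent)
    return sample


def take_full_sample(parents, heads, size, rng=random):
    """Mix heads with exponential samples, then cut down to `size` at random."""
    sample = set(heads) | exponential_sample(parents, heads)
    if len(sample) > size:
        # Sort first so the population is a sequence with a stable order.
        sample = set(rng.sample(sorted(sample), size))
    return sample
```

On a 16-node linear chain numbered 0 to 15 with node 15 as the only head, `exponential_sample` picks nodes 15, 14, 12, 8 and 0, which is what gives the search its logarithmic flavour.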
Pierre-Yves David
6141054495 setdiscovery: document the '_updatesample' function
This function is central to the sample building process; having it documented
helps code readability a lot.
2015-01-06 17:02:32 -08:00
Pierre-Yves David
2fde36047b setdiscovery: avoid calling any sample building if the undecided set is small
If the length of undecided is smaller than the sample size, we can just request
information for all of them.

This conditional was previously handled by '_setupsample'. But '_setupsample' is
in my opinion a problematic function with blurry semantics. Having this
conditional explicitly earlier makes the code more explicit and moves us closer
to removing this '_setupsample' function.
2015-01-06 16:40:33 -08:00
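The early conditional described in the commit above could look roughly like the following sketch. The name `pick_sample` and the `build_sample` callback are hypothetical stand-ins for illustration, not the actual functions in setdiscovery.py.

```python
def pick_sample(undecided, sample_size, build_sample):
    """Return the nodes to ask the remote about in this round.

    `build_sample(undecided, sample_size)` is a hypothetical callback that
    stands in for whichever sample builder would otherwise be used.
    """
    if len(undecided) < sample_size:
        # Few enough nodes: ask about every one, no sampling needed.
        return list(undecided)
    return list(build_sample(undecided, sample_size))
```

Keeping the size check at the call site means the sample builders never have to handle the small-input case themselves.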
Pierre-Yves David
ef881538c4 setdiscovery: delay sample building calls to gather them in a single place
Some of the logic around sample building is duplicated in the sample builders.
It would clean things up to extract it into the top function, but this requires
all the code to be in the same place.

This changeset mostly exists to make the next one more clear.
2015-01-07 09:30:06 -08:00
Pierre-Yves David
ebee9c1c62 setdiscovery: drop unused 'initial' argument for '_takequicksample'
There is a single call site, and it is always using 'initial=True'. So we just drop
the argument and the associated condition.
2015-01-06 16:32:23 -08:00
Pierre-Yves David
e1de1d2c0d setdiscovery: factorize similar sampling code
We are using full sampling of 'fullsamplesize' in both cases. The only
difference is the debug message. So we factorise the sampling code and put the
message in an extra conditional.

This is going to help make changes around the sampling logic. Such changes are
needed to improve discovery performance on highly headed repositories.
2015-01-06 16:30:52 -08:00
Pierre-Yves David
04778d65fb setdiscovery: drop shadowed 'undecided' assignment
The 'undecided' variable was never used before being overwritten a few lines
later.
2015-01-06 16:30:37 -08:00
Siddharth Agarwal
fba9f14547 setdiscovery: avoid a full changelog graph traversal
We were definitely being suboptimal here: we were constructing two full sets,
one with the full set of common nodes (i.e. a graph traversal) and one with all
nodes. Then we subtract one set from the other. This whole process is
O(commits) and causes discovery to be significantly slower than it should be.

Instead, keep track of common incrementally and keep undecided as small as
possible.

This makes discovery massively faster on large repos: on one such repo, 'hg
debugdiscovery' over SSH with one commit missing on the client and five on the
server went from 4.5 seconds to 1.5. (An 'hg debugdiscovery' with no commits
missing on the client, i.e. connection startup time, was 1.2 seconds.)
2014-11-16 00:40:29 -08:00
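As a rough illustration of the incremental bookkeeping described in the commit above, the sketch below updates the two sets in place each time a node is confirmed as common, instead of rebuilding them from a full traversal. The toy graph (a dict of node to parent list) and the name `mark_common` are assumptions of this sketch; the actual change may rely on different helpers.

```python
def mark_common(parents, undecided, common, node):
    """Record `node` and its not-yet-common ancestors as common.

    The walk stops as soon as it reaches a node that is already in `common`,
    so each call only touches the newly confirmed part of the graph.
    """
    stack = [node]
    while stack:
        current = stack.pop()
        if current in common:
            continue
        common.add(current)
        undecided.discard(current)
        stack.extend(parents.get(current, ()))
```

With this approach the cost of each update is proportional to the number of newly decided nodes rather than to the size of the whole changelog.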
Mads Kiilerich
c5488ba34c discovery: indices between sample and yesno must match (issue4438)
2ec3e28dea6b changed 'sample' from a list to a set. The iteration order is thus
undefined and the yesno indices are not stable.

To solve this, repeat the listification and comment from elsewhere in the code.

Note: the randomness in the discovery protocol can make this problem hard to
reproduce.
2014-11-05 13:05:32 +01:00
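The index mismatch described in the commit above comes from iterating a set twice and assuming the same order both times. A minimal sketch of the fix is to freeze the sample into a list once and pair each answer with its node by position. The `known` callable here is a hypothetical stand-in for the remote query.

```python
def classify_sample(sample, known):
    """Split a sample into (common, missing) from the remote's answers.

    `known(nodes)` is a hypothetical callable that returns one boolean per
    node, in the same order as the list it was given.
    """
    ordered = list(sample)  # freeze one iteration order for both steps
    yesno = known(ordered)
    common = {node for node, yes in zip(ordered, yesno) if yes}
    missing = {node for node, yes in zip(ordered, yesno) if not yes}
    return common, missing
```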
Mads Kiilerich
8079358ce3 discovery: limit 'all local heads known remotely' to real 'all' (issue4438)
2ec3e28dea6b made it possible that the initial head check didn't include all
heads. If that is the case, don't use the early exit just because this random
sample happened to be 'all known'.

Note: the randomness in the discovery protocol can make this problem hard to
reproduce.
2014-11-05 13:05:29 +01:00
Pierre-Yves David
1b8f2c7e41 setdiscovery: limit the size of all sample (issue4411)
Further digging on this issue shows that the limit on the sample size used in
discovery never works for heads. Here is a quote from the code itself:

  desiredlen = size - len(always)
  if desiredlen <= 0:
      # This could be bad if there are very many heads, all unknown to the
      # server. We're counting on long request support here.

The long request support never landed and evolution makes the "very many heads,
all unknown to the server" case quite common.

We implement a simple and stupid hard limit on the sample size for all queries. This
should prevent HTTP 414 errors with the current state of the code.
2014-11-01 23:52:53 +00:00
Pierre-Yves David
e107a615ed setdiscovery: limit the size of the initial sample (issue4411)
The set discovery starts by sending a "known" command with all local heads. When
the number of local heads is massive (e.g. using hidden changesets), such a request
becomes too large. This leads to a 414 error over HTTP, aborting the whole
process.

We limit the size of the sample used by the first query to fix this.

The tests are impacted because they do test a massive number of heads. But they do
not test it over a real-world HTTP setup.
2014-10-27 17:52:33 +01:00
Pierre-Yves David
d1263d8d84 setdiscovery: extract sample limitation in a _limitsample function
We need to reuse this logic for the initial query. We extract it in a function
to ensure sample limiting is applied consistently in all cases.
2014-10-27 17:40:32 +01:00
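A sample-limiting helper of the kind described in the commit above can be sketched as follows. This is an illustration under assumptions, not a copy of the real function: the name `limit_sample` and the `rng` parameter are hypothetical, and the population is sorted before sampling because recent Python versions no longer accept a set as the population for `random.sample`.

```python
import random


def limit_sample(sample, desired_len, rng=random):
    """Return a random subset of `sample` with at most `desired_len` items."""
    if len(sample) > desired_len:
        return set(rng.sample(sorted(sample), desired_len))
    return set(sample)
```

Applying the same helper to the initial query and to later rounds keeps every request under one size cap, which is the point of extracting it.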
Olle Lundberg
492c3a2ebf setdiscovery: document algorithms used
This is taken from:
http://programmers.stackexchange.com/questions/208998
And modified slightly.
2014-03-06 12:37:28 +01:00
Augie Fackler
9f876f6c89 cleanup: move stdlib imports to their own import statement
There are a few warnings still produced by my import checker, but
those are false positives produced by modules that share a name with
stdlib modules.
2013-11-06 16:48:06 -05:00
Mads Kiilerich
520076e707 delete some dead comments and docstrings
2012-08-21 02:41:20 +02:00
Mads Kiilerich
2f4504e446 fix trivial spelling errors
2012-08-15 22:38:42 +02:00
Pierre-Yves David
ae5abd69f4 localpeer: return only visible heads and branchmap
Now that we have localpeer, we can apply filtering on heads and branchmap the
same way it's done for wireprotocol peer.
2012-07-17 01:04:45 +02:00
Sune Foldager
ffe56435bf peer: introduce peer methods to prepare for peer classes
This introduces a peer method into all repository classes, which currently
simply returns self. It also changes hg.repository so it now raises an
exception if the supplied path does not resolve to a localrepo or descendant.

Finally, all call sites are changed to use the peer and local methods as
appropriate, where peer is used whenever the code is dealing with a remote
repository (even if it's on local disk).
2012-07-13 21:46:53 +02:00
Bryan O'Sullivan
abdf4a8227 util: subclass deque for Python 2.4 backwards compatibility
It turns out that Python 2.4's deque type is lacking a remove method.
We can't implement remove in terms of find, because it doesn't have
find either.
2012-06-01 17:05:31 -07:00
Brodie Rao
d6a6abf2b0 cleanup: eradicate long lines
2012-05-12 15:54:54 +02:00
Pierre-Yves David
8abd0aa7c9 phases: do not exchange secret changesets
Any secret changesets will be excluded from pull and push. Phase data are
properly synchronized on pull and push if a changeset is seen as secret locally
but is non-secret on the remote side.

This patch does not handle the case of a changeset secret on remote but known
locally.
2011-12-22 00:42:25 +01:00
Mads Kiilerich
065de91b14 add missing localization markup
2011-11-11 01:07:10 +01:00
Peter Arrenbrecht
83352215f8 setdiscovery: fix hang when #heads>200 (issue2971)
When setting up the next sample, we always add all of the heads, regardless
of the desired max sample size. But if the number of heads exceeds this
size, then we don't add any more nodes from the still undecided set.
(This is debatable per se, and I'll investigate it, but it's how we designed
it at the moment.)

The bug was that we always added the overall heads, not the heads of the
remaining undecided set. Thus, if #heads>200 (desired sample size), we
did not make progress any longer.
2011-08-25 21:25:14 +02:00
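The distinction in the commit above, between the overall heads and the heads of the remaining undecided set, can be made concrete with a small sketch. It assumes a toy graph given as a dict of node to parent list; `heads_of` is a hypothetical helper name.

```python
def heads_of(parents, nodes):
    """Return the members of `nodes` that have no child inside `nodes`."""
    nodes = set(nodes)
    non_heads = {
        parent
        for node in nodes
        for parent in parents.get(node, ())
        if parent in nodes
    }
    return nodes - non_heads
```

On a linear chain 0 to 5, the overall head is node 5, but once nodes 3, 4 and 5 are decided the heads of the undecided set {0, 1, 2} become {2}. Sampling from those is what lets each round make progress.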
Andrew Pritchard
f23118834a setdiscovery: return anyincoming=False when remote's only head is nullid
This fixes (issue2907) a crash when using 'hg incoming --bundle' with an empty
remote repo and a non-empty local repo.

This also fixes an unreported bug that 'hg summary --remote' erroneously
reports incoming changes when the remote repo is empty and the local is not.

Also, add a test to make sure issue2907 stays fixed.
2011-07-27 18:32:54 -04:00
Matt Mackall
b06b887ee8 discovery: quiet note about heads
This was changing output on in/out -v for no good reason.
2011-07-05 14:30:42 -05:00
Peter Arrenbrecht
9a2d2f747c setdiscovery: batch heads and known(ownheads)
This means that we now discover both subset conditions (local<remote and
remote<local) in a single roundtrip without ever constructing an actual
sample (which takes a bit of client CPU).
2011-06-14 22:58:00 +02:00
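The two subset conditions mentioned in the commit above can be checked from the answers to the two batched queries alone. The sketch below assumes the remote's heads and its yes/no answers about the local heads are already available; the function name and its parameters are hypothetical.

```python
def subset_conditions(local_nodes, local_heads, remote_heads, yesno):
    """Return (remote_is_subset, local_is_subset) from one batched reply.

    `remote_heads` is the reply to a "heads" query and `yesno` the reply to
    "known(local_heads)", one boolean per local head in the same order.
    """
    if len(yesno) != len(local_heads):
        raise ValueError("one answer is expected per local head")
    # Every remote head known locally means the remote has nothing new.
    remote_is_subset = all(head in local_nodes for head in remote_heads)
    # Every local head known remotely means the local side has nothing new.
    local_is_subset = all(yesno)
    return remote_is_subset, local_is_subset
```

Both checks rely on the fact that knowing a node implies knowing all of its ancestors, so heads are enough to decide either inclusion.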
Steven Brown
411d547af4 setdiscovery: limit lines to 80 characters
2011-05-05 23:21:37 +08:00
Peter Arrenbrecht
75fa0e5ea9 discovery: add new set-based discovery
Adds a new discovery method based on repeatedly sampling the still
undecided subset of the local node graph to determine the set of nodes
common to both the client and the server.

For small differences between client and server, it uses about the same
or slightly fewer roundtrips than the old tree-based discovery. For
larger differences, it typically reduces the number of roundtrips
drastically (from 150 to 4, for instance).

The old discovery code now lives in treediscovery.py, the new code is
in setdiscovery.py.

Still missing is a hook for extensions to contribute nodes to the
initial sample. For instance, Augie's remotebranches could contribute
the last known state of the server's heads.

Credits for the actual sampler and computing common heads instead of
bases go to Benoit Boissinot.
2011-05-02 19:21:30 +02:00
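The idea of repeatedly sampling the undecided subset, as described in the commit above, can be sketched with a toy loop. Several simplifications are assumptions of this sketch: the graph is a dict of node to parent list, the remote is simulated by a plain set, the sample is drawn uniformly at random rather than with the real sampler, and the function returns the full common set rather than common heads. The names `ancestors` and `discover_common` are hypothetical.

```python
import random


def ancestors(parents, node):
    """Return `node` together with all of its ancestors."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def discover_common(parents, remote_nodes, sample_size=2, seed=0):
    """Find the local nodes the remote also has, by repeated sampling.

    `remote_nodes` stands in for the server; a real client would send each
    sample over the wire instead of testing membership locally.
    """
    rng = random.Random(seed)
    undecided = set(parents)
    common = set()
    roundtrips = 0
    while undecided:
        size = min(sample_size, len(undecided))
        sample = rng.sample(sorted(undecided), size)
        roundtrips += 1
        yesno = [node in remote_nodes for node in sample]
        for node, known in zip(sample, yesno):
            if known:
                # The remote has this node, hence all of its ancestors too.
                found = ancestors(parents, node)
                common |= found
                undecided -= found
            else:
                # The remote lacks this node, hence all of its descendants.
                undecided -= {
                    n for n in undecided if node in ancestors(parents, n)
                }
    return common, roundtrips
```

Each answer decides a whole region of the graph at once, ancestors for a "yes" and descendants for a "no", which is why a few well-chosen samples can replace many roundtrips.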