Thomas Orozco a3a0347639 mononoke/rendezvous: introduce query batching
Summary:
This introduces a basic building block for query batching. I called this
rendezvous, since it's about multiple queries meeting up in the same place :)

There are a few (somewhat conflicting) goals this tries to satisfy, so let's go
over them:

1) We'd like to reduce the total number of queries made by batch jobs. For
example, we could group the hg bonsai lookups made by the walker. Those jobs are
characterized by the fact that they have a lot of queries to make, all the
time. Here's an example: https://fburl.com/ods/zuiep7yh.

2) We'd like to reduce the overall number of connections held to MySQL by
our tasks. The main way we achieve this is by reducing the maximum number of
concurrent queries. Indeed, a high total number of queries doesn't necessarily
result in a lot of connections as long as they're not concurrent, because we
can reuse connections. On the other hand, if you dispatch 100 concurrent
queries, that _does_ use 100 connections. This is something that applies to
batch jobs due to their query volume, but also to "interactive" jobs like
Mononoke Server or SCS, just not all the time. Here's an example:
https://fburl.com/ods/o6gp07qp (you can see the query count is overall low, but
sometimes spikes substantially).

2.1) It's also worth noting that concurrent queries are often the result of
many clients wanting the same data, so deduplication is also useful here.

3) We also don't want to impact the latency of interactive jobs when they
need to make a little query here or there (i.e. it's largely fine if our jobs all
hold a few connections to MySQL and use them somewhat consistently).

4) We'd like this to make it easier to do batching right. For example, if
you have 100 Bonsais to map to hg, you should be able to just map over them,
call `future::try_join_all`, and have that do the right thing (see the sketch
right after this list).

5) We don't want "bad" queries to affect other queries negatively. One
example would be the occasional queries we make to the Bonsai <-> Hg mapping
in `known` for thousands (if not more) of rows.

6) We want this to be easy to incorporate into the codebase.
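
To make #4 concrete, here's a minimal, hypothetical sketch of the calling
pattern this should enable. The `Mapping` type, its `get` method, and the id
types are stand-ins invented for illustration, not the real Mononoke API:

```rust
use futures::future;

// Hypothetical stand-ins, just to make the sketch self-contained.
#[derive(Clone, Copy)]
struct ChangesetId(u64);
struct HgChangesetId(u64);
struct Error;

struct Mapping;

impl Mapping {
    // Looks like a single-row lookup to the caller; a rendezvous-backed
    // implementation is free to coalesce concurrent calls into one query.
    async fn get(&self, cs_id: ChangesetId) -> Result<HgChangesetId, Error> {
        Ok(HgChangesetId(cs_id.0))
    }
}

// Goal #4: callers just map + try_join_all; batching happens underneath.
async fn lookup_all(
    mapping: &Mapping,
    ids: Vec<ChangesetId>,
) -> Result<Vec<HgChangesetId>, Error> {
    future::try_join_all(ids.into_iter().map(|id| mapping.get(id))).await
}
```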

So, how do we try to address all of this? Here's how:

- We ... do batching, and we deduplicate requests within a batch (see the
  sketch after this list). This is the easier bit, and it should address #1,
  #2, #2.1, and #4.
- However, batching is conditional. Notably, we don't batch very large
  requests with the rest (this addresses #5). We also don't batch small
  queries all the time: we only batch when the observed query throughput
  suggests batching will actually help (this targets #3).
- Finally, we have some utilities for common cases, like having to group by
  repo id (this is `MultiRendezVous`), and it's all configurable via tunables
  (the default is to not do anything).
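
For a feel of the mechanism, here's a heavily simplified, hypothetical sketch
of the core rendezvous idea (tokio-based; `fetch_batch`, the key/value types,
and the fixed wait window are illustrative assumptions, and the conditional
size/throughput logic and tunables described above are omitted):

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Duration;

use tokio::sync::oneshot;

// Illustrative stand-in for the real batched SQL query.
async fn fetch_batch(keys: &[u64]) -> HashMap<u64, String> {
    keys.iter().map(|k| (*k, format!("value-{}", k))).collect()
}

struct RendezVous {
    // Key -> all callers currently waiting on that key. Deduplication falls
    // out of the map structure: one key in the batch, many waiters.
    pending: Mutex<HashMap<u64, Vec<oneshot::Sender<Option<String>>>>>,
    // How long the first caller waits for others to join its batch.
    wait: Duration,
}

impl RendezVous {
    async fn get(&self, key: u64) -> Option<String> {
        let (tx, rx) = oneshot::channel();
        let is_leader = {
            let mut pending = self.pending.lock().unwrap();
            let was_empty = pending.is_empty();
            pending.entry(key).or_default().push(tx);
            was_empty
        };
        if is_leader {
            // First caller in: give concurrent callers a short window to
            // rendezvous, then issue one deduplicated query for the batch.
            tokio::time::sleep(self.wait).await;
            let batch = std::mem::take(&mut *self.pending.lock().unwrap());
            let keys: Vec<u64> = batch.keys().copied().collect();
            let results = fetch_batch(&keys).await;
            for (key, senders) in batch {
                for tx in senders {
                    let _ = tx.send(results.get(&key).cloned());
                }
            }
        }
        rx.await.expect("the leader always replies")
    }
}
```

This sketch also ignores details a production version has to care about, such
as a cancelled leader stranding its followers, and the real thing additionally
keeps one such instance per repo id via `MultiRendezVous`.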

Reviewed By: StanislavGlebik

Differential Revision: D27010317

fbshipit-source-id: 4a2397255f9785c6722c02e4d419438fd0aafa07