Summary:
Fixes for the handling of blobstores after heal:
1. If all blobstores are successfully healed for a key, no need to requeue it
2. Where all heal puts fail, make sure we requeue with at least the original source blobstore we loaded the blob from
3. When we do write to the queue, write with all blobstore ids where we know we have good data, so that when it is read later it is not considered missing.
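The three rules above can be sketched as one decision function. This is a minimal std-only sketch with illustrative names, not the actual Mononoke API:

```rust
use std::collections::HashSet;

/// Given the source blobstore we loaded the blob from, the full set of
/// blobstores, and the set where the heal put succeeded, return the
/// blobstore ids to requeue, or None if no requeue is needed.
fn requeue_ids(
    source: &str,
    all_blobstores: &HashSet<String>,
    healed_ok: &HashSet<String>,
) -> Option<Vec<String>> {
    if healed_ok.is_superset(all_blobstores) {
        // 1. All blobstores healed successfully: drop the queue entry.
        return None;
    }
    if healed_ok.is_empty() {
        // 2. All heal puts failed: requeue with at least the original
        //    source, so the blob is not considered lost.
        return Some(vec![source.to_string()]);
    }
    // 3. Partial success: requeue with every blobstore id known to hold
    //    good data (the source plus each successful put target).
    let mut known_good: Vec<String> = healed_ok.iter().cloned().collect();
    if !known_good.contains(&source.to_string()) {
        known_good.push(source.to_string());
    }
    known_good.sort();
    Some(known_good)
}
```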
Reviewed By: krallin
Differential Revision: D15911853
fbshipit-source-id: 1c81ce4ec5f975e5230b27934662e02ec515cb8f
Summary: make blobstore_healer auto-heal source blobstores found to be missing data, so long as at least one other source blobstore from the queue has the data for the missing key
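A minimal in-memory sketch of that rule; the store layout and names are illustrative, not the real blobstore API:

```rust
use std::collections::HashMap;

/// Auto-heal rule: if at least one source blobstore on the queue has the
/// blob for `key`, copy it into each source missing it. `stores` maps a
/// blobstore id to its key/value contents.
fn auto_heal(
    key: &str,
    sources: &[&str],
    stores: &mut HashMap<String, HashMap<String, Vec<u8>>>,
) -> bool {
    // Find one source that still has the blob.
    let good = sources
        .iter()
        .find_map(|id| stores.get(*id).and_then(|s| s.get(key)).cloned());
    let blob = match good {
        Some(b) => b,
        None => return false, // no source has the data; nothing we can do
    };
    // Put the blob into every source that is missing it.
    for id in sources {
        stores
            .entry(id.to_string())
            .or_default()
            .entry(key.to_string())
            .or_insert_with(|| blob.clone());
    }
    true
}
```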
Reviewed By: krallin
Differential Revision: D16464895
fbshipit-source-id: 32549e58933f39bb20c173caf02a35c91123fe8d
Summary: Since we're only running a single healer in the process for a single blobstore, it's easy to bound the concurrency by limiting the number of entries we deal with at once. As a result, we don't need a separate mechanism for overall concurrency control.
Reviewed By: StanislavGlebik
Differential Revision: D15912818
fbshipit-source-id: 3087b88cfdfed2490664cd0df10bd6f126267b83
Summary: Basically notes I took for myself to truly understand the code.
Reviewed By: StanislavGlebik
Differential Revision: D15908406
fbshipit-source-id: 3f21f7a1ddce8e15ceeeffdb5518fd7f5b1749c4
Summary:
Allow blobstore_healer to be directly configured to operate on a blobstore.
This makes two changes:
- The blobstore to operate on is defined in storage.toml (doesn't
currently support server.toml-local storage configs)
- Only heal one blobstore at a time. We can run multiple separate instances of the
healer to heal multiple blobstores.
Reviewed By: HarveyHunt
Differential Revision: D15065422
fbshipit-source-id: 5bc9f1a16fc83ca5966d804b5715b09d359a3832
Summary:
The healer is a blobstore-level operation, which is orthogonal to the concept of a repo; therefore, there should be no mention of repoid in any of the healer's structures or tables.
For now this leaves the schema unmodified, and fills the repoid with a dummy value (0). We can clean that up later.
Reviewed By: lukaspiatkowski, HarveyHunt
Differential Revision: D15051896
fbshipit-source-id: 438b4c6885f18934228f43d85cdb8bf2f0e542f1
Summary: RepositoryId shouldn't leak into the blobstore layer. This leaves repoid in the schema, but just populates it with a dummy value (0). We can clean up the schema and this code in a later diff.
Reviewed By: StanislavGlebik
Differential Revision: D15021285
fbshipit-source-id: 3ecb04a76ce74409ed0cced3d2a0217eacd0e2fb
Summary:
Before this change, we would always include the shard id in our mysql-related fb303 counters. This is not perfect for two reasons:
- for the xdb blobstore we have 4K shards and 24 counters, so we were reporting 96K counters in total
- we rarely care about per-shard metrics anyway, since in most cases queries are uniformly distributed across shards
Therefore, let's stop using per-shard counters and use per-shardmap ones instead (when sharding is involved).
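For illustration, the shape of the change might look like the following; the counter-name format here is an assumption, not the real fb303 key layout:

```rust
// Before (illustrative key format): one counter per shard per metric --
// with ~4K shards and 24 metrics that is ~96K counters.
fn per_shard_key(shardmap: &str, shard: u32, counter: &str) -> String {
    format!("mysql.{}.shard-{}.{}", shardmap, shard, counter)
}

// After: one counter per shardmap per metric -- 24 counters in total.
fn per_shardmap_key(shardmap: &str, counter: &str) -> String {
    format!("mysql.{}.{}", shardmap, counter)
}
```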
Reviewed By: krallin
Differential Revision: D16360591
fbshipit-source-id: b2df94a3ca9cacbf5c1f328b48e87b48cd18287e
Summary:
In earlier diffs in this stack, I updated the callsites that reference XDB tiers to use concrete `&str` types (which is what they were receiving until now ... but it wasn't spelled out as-is).
In this diff, I'm updating them to use owned `String` instead, which lets us hoist `to_string()` and `clone()` calls up the stack, rather than pass down references only to copy them later on.
This allows us to skip some unnecessary copies. It turns out we were doing quite a few rounds of "turn this String into a reference, pass it down the stack, then turn it back into a String".
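A tiny sketch of the ownership change; the function names are hypothetical:

```rust
// Before (sketch): the tier travels down the stack as &str and the
// callee clones it back into a String at the bottom.
fn open_with_borrowed(tier: &str) -> String {
    tier.to_string() // copy made late, on every call
}

// After: the caller hands over an owned String, so intermediate frames
// just move it and no extra copy is made.
fn open_with_owned(tier: String) -> String {
    tier // moved through, not copied
}
```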
Reviewed By: farnz
Differential Revision: D16260372
fbshipit-source-id: faec402a575833f6555130cccdc04e79ddb8cfef
Summary:
This implements and uses the `add_many` method of the blob healer queue. This method allows us to do batched adds, which in turn allows us to use `chunks` on Manifold iteration.
NB 1: I deliberately removed control symbols from the progress print message. If we only print it on the same line, we lose it when the job crashes.
NB 2: I deliberately use the range of `entries[0]`, as I want to pessimistically restart from the earliest entry in case of a failure.
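A simplified sketch of the batching shape; the `Queue` type here is illustrative, not the real `BlobstoreSyncQueue`:

```rust
/// Illustrative stand-in for the sync queue (not the real type).
struct Queue {
    rows: Vec<String>,
}

impl Queue {
    /// Batched add: one write per slice instead of one write per entry.
    fn add_many(&mut self, entries: &[String]) {
        self.rows.extend_from_slice(entries);
    }
}

/// Feed the queue in fixed-size chunks, as the Manifold iteration does.
fn enqueue_chunked(queue: &mut Queue, keys: &[String], chunk_size: usize) {
    for chunk in keys.chunks(chunk_size) {
        queue.add_many(chunk);
    }
}
```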
Reviewed By: krallin
Differential Revision: D16327788
fbshipit-source-id: 8d9f3cf85ee7cbca657a8003a787b5ea84a1b9b0
Summary:
Instantiating a new DB connection may require remote calls to be made to e.g. Hipster to allocate a new certificate (this is only the case when connecting to MySQL).
Currently, our bindings to our underlying DB locator make a blocking call to pretend that this operation is synchronous: https://fburl.com/ytmljxkb
This isn't ideal, because this call might actually take time, and we might also occasionally want to retry it (we've had issues in our MySQL tests with acquiring certificates that retrying should resolve). Running this synchronously makes doing so inefficient.
This patch doesn't update that, but it fixes everything on the Rust side of things to stop expecting connections to return a `Result` (and to start expecting a Future instead).
In a follow up diff, I'll work on making the changes in common/rust/sql to start returning a Future here.
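The signature change, roughly. This is a std-only sketch with hypothetical names; the real code goes through the sql crate:

```rust
use std::future::{ready, Future};

#[derive(Debug)]
struct Connection {
    tier: String,
}

// Before (sketch): instantiating a connection blocked the caller and
// returned a Result directly.
fn open_conn_sync(tier: &str) -> Result<Connection, String> {
    Ok(Connection { tier: tier.to_string() })
}

// After: the same operation is exposed as a Future, so callers can await
// it, retry it, or start several at once without blocking a thread.
fn open_conn_async(tier: &str) -> impl Future<Output = Result<Connection, String>> {
    ready(open_conn_sync(tier))
}
```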
Reviewed By: StanislavGlebik
Differential Revision: D16221857
fbshipit-source-id: 263f9237ff9394477c65e455de91b19a9de24a20
Summary:
The `local_instances` option was used to create a file blobstore or sqlite blobstore.
Now we use the mononoke config for this purpose. Since this option is no longer
useful, let's delete it.
Reviewed By: krallin
Differential Revision: D16120065
fbshipit-source-id: 375a168b27e7f2cf1a6a77f487c5e013f9004546
Summary:
This migrates the internal structures representing the repo and storage config,
while retaining the existing config file format.
The `RepoType` type has been replaced by `BlobConfig`, an enum containing all
the config information for all the supported blobstores. In addition there's
the `StorageConfig` type which includes `BlobConfig`, and also
`MetadataDBConfig` for the local or remote SQL database for metadata.
Reviewed By: StanislavGlebik
Differential Revision: D15065421
fbshipit-source-id: 47636074fceb6a7e35524f667376a5bb05bd8612
Summary:
In a later diff we'll add batching of BlobstoreSyncQueue writes. It would be
much harder to add the batching if we also had to return this boolean.
And since nobody uses it, let's just remove it.
Reviewed By: farnz
Differential Revision: D15248290
fbshipit-source-id: 72c64770c1b023e9de23a5dfccd8b4482302fe96
Summary:
In the case of mononoke's admin tool it's annoying for users to be required to run myrouter in the background and provide myrouter port to every command.
Thanks to this change it is no longer necessary to run admin commands through myrouter - the tool will simply use a direct connection to XDB using the sql crate.
It is important to note that the raw XDB connection via the sql crate doesn't have connection pooling and doesn't handle XDB failover, so it is crucial that it never be used for long-lived or request-heavy use cases like running the mononoke server or blobimport.
Reviewed By: jsgf
Differential Revision: D15174538
fbshipit-source-id: 299d3d7941ae6aec31961149f926c2a4965ed970
Summary:
The main fix speeds up the sql query that returns entries to heal.
The query was slow when there are many entries for one repo and few entries
for another: selecting entries for the smaller repo could become very slow
because mysql had to scan the whole table in order to sort the entries.
Since ordering by id doesn't look necessary, I suggest we just remove it.
Also, waiting for 1 minute between heal attempts is too long.
There are a few smaller fixes as well: replacing join_all with the more
efficient futures_unordered, and batch-deleting entries from the sync queue.
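A before/after sketch of the query; table and column names are assumptions, not the actual schema:

```rust
// Before: ORDER BY forces mysql to sort (potentially scanning the whole
// table) before it can apply LIMIT.
const SLOW_QUERY: &str =
    "SELECT * FROM blobstore_sync_queue WHERE repo_id = ? ORDER BY id LIMIT ?";

// After: without ORDER BY, mysql can stop as soon as LIMIT rows match.
const FAST_QUERY: &str =
    "SELECT * FROM blobstore_sync_queue WHERE repo_id = ? LIMIT ?";
```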
Reviewed By: aslpavel
Differential Revision: D14598578
fbshipit-source-id: e8d302aab7b5a4bc16c63e14228713b75295e97a
Summary: Slim down the blobstore trait crate as much as possible.
Reviewed By: aslpavel
Differential Revision: D14542675
fbshipit-source-id: faf09255f7fe2236a491742cd836226474f5967c
Summary:
- The healer runs on all repositories at once, and queries for some repositories are timing out
- It is now possible to run the healer for just a specified repository
Reviewed By: HarveyHunt
Differential Revision: D14539978
fbshipit-source-id: 9139999da97b2655ae9312c33c9e8c15f0b24016
Summary:
- convert to 2018 edition, and removed all `extern crate`
- wait for `myrouter` to be available before actually doing anything
Reviewed By: HarveyHunt
Differential Revision: D14524247
fbshipit-source-id: ebe2e2e74935f00c87945129370f268c794fcab7
Summary:
Currently, if a crate depends on even a single type from metaconfig, then in
order to compile that crate buck first compiles the metaconfig crate with all
the logic for parsing the configs.
This diff splits metaconfig into two crates. The first one just holds the types for
"external consumption" by other crates. The second holds the parsing logic.
That makes builds faster.
Reviewed By: jsgf, lukaspiatkowski
Differential Revision: D13877592
fbshipit-source-id: f353fb2d1737845bf1fa0de515ff8ef131020063
Summary:
This version is still missing:
- proper production-ready logging
- smarter handling of the case where queue entries related to each other do not fit within the `limit` or `older_than` bounds, so the healer may heal many more entries without realizing it shouldn't do so
Reviewed By: aslpavel
Differential Revision: D13528686
fbshipit-source-id: 0245becea7e4f0ac69383a7885ff3746d81c4add