ref
https://linear.app/ghost/issue/ENG-1783/add-time-to-create-connection-metric
- Since we added the "time to acquire" metric to get visibility into
contention in the connection pool, we've seen some anomalies where it
takes a surprisingly long time to acquire a connection (~60ms) when not
under load. Our hypothesis is that these anomalies occur when there
aren't any open connections available, so Ghost has to establish a new
connection to the DB, and that connection setup is what actually takes
most of that time. This new metric should help confirm or rule out that
hypothesis (a rough sketch of how it could be wired up follows this
list).
- This will also be an interesting metric to keep an eye on and/or
alert on — if Ghost can't create new connections to its database
quickly, it's not going to perform well overall.
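A minimal sketch of how the "time to create" metric could be wired up,
assuming knex's underlying tarn pool and its `createRequest`/`createSuccess`
events; the metric name below is a placeholder, not necessarily what Ghost
uses:

```js
// Sketch only: times how long the pool takes to open a brand-new DB
// connection. Assumes tarn pool events; the metric name is a placeholder.
const client = require('prom-client');

function instrumentConnectionCreation(knex) {
    const createDuration = new client.Summary({
        name: 'ghost_db_connection_created_seconds', // placeholder name
        help: 'Time taken to establish a new connection to the database',
        percentiles: [0.5, 0.9, 0.99]
    });

    const pendingCreates = new Map(); // eventId -> start time

    knex.client.pool.on('createRequest', (eventId) => {
        pendingCreates.set(eventId, process.hrtime.bigint());
    });

    knex.client.pool.on('createSuccess', (eventId) => {
        const start = pendingCreates.get(eventId);
        if (start === undefined) {
            return;
        }
        pendingCreates.delete(eventId);
        createDuration.observe(Number(process.hrtime.bigint() - start) / 1e9);
    });
}
```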
ref
https://linear.app/ghost/issue/ENG-1796/reuse-tcp-connections-when-sending-metrics-to-the-pushgateway
- When we rolled out the prometheus metrics collection, it overwhelmed
the pushgateway. Our hypothesis is that Ghost was creating too many new
TCP connections to the pushgateway.
- The prometheus client was opening a new TCP connection to the
pushgateway on every push, i.e. every 15 seconds.
- This commit changes the prometheus client to keep the connection
alive and reuse it instead of creating a new one on every push
(sketched below).
- It also limits the number of retries if pushing the metrics fails —
after 3 consecutive failures, Ghost will stop retrying and log an error.
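A rough sketch of both changes, assuming prom-client's `Pushgateway` with
its promise-based `pushAdd` API (prom-client v14+); the URL, job name and
interval are placeholders:

```js
// Sketch only: reuse one TCP connection via a keep-alive agent and stop
// pushing after 3 consecutive failures. URL/jobName/interval are placeholders.
const http = require('http');
const client = require('prom-client');

const gateway = new client.Pushgateway('http://localhost:9091', {
    agent: new http.Agent({keepAlive: true, maxSockets: 1}) // reuse the same socket between pushes
}, client.register);

let consecutiveFailures = 0;
const MAX_FAILURES = 3;

const timer = setInterval(async () => {
    try {
        await gateway.pushAdd({jobName: 'ghost'});
        consecutiveFailures = 0; // reset on any successful push
    } catch (err) {
        consecutiveFailures += 1;
        if (consecutiveFailures >= MAX_FAILURES) {
            console.error('Failed to push metrics 3 times in a row; giving up', err);
            clearInterval(timer);
        }
    }
}, 15000);
```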
ref
https://linear.app/ghost/issue/ENG-1769/improve-pool-utilization-metric
- Currently the connection pool metrics are all point-in-time metrics,
and with a scrape interval of 15s they don't tell us much about what's
actually happening in the pool.
- This commit adds a Summary metric to track how long each transaction
has to wait to acquire a connection from the pool, which should be a
good indicator of contention in the pool (see the sketch after this
list).
- Also moved the call to `prometheusClient.instrumentKnex` to after
`initCore` in the boot process, because the metric depends on event
listeners on `knex.client.pool`, and the pool is destroyed and
recreated in `initCore`, which removes those listeners.
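A minimal sketch of the acquire-wait Summary, assuming tarn's
`acquireRequest`/`acquireSuccess` pool events; the metric name is a
placeholder. Because the listeners are attached to `knex.client.pool`, this
has to run after `initCore` has recreated the pool:

```js
// Sketch only: measures how long callers wait for a pooled connection.
// Assumes tarn pool events; the metric name is a placeholder.
const client = require('prom-client');

function instrumentAcquireWait(knex) {
    const acquireDuration = new client.Summary({
        name: 'ghost_db_connection_acquire_seconds', // placeholder name
        help: 'Time spent waiting to acquire a connection from the pool',
        percentiles: [0.5, 0.9, 0.99]
    });

    const pendingAcquires = new Map(); // eventId -> start time

    knex.client.pool.on('acquireRequest', (eventId) => {
        pendingAcquires.set(eventId, process.hrtime.bigint());
    });

    knex.client.pool.on('acquireSuccess', (eventId) => {
        const start = pendingAcquires.get(eventId);
        if (start === undefined) {
            return;
        }
        pendingAcquires.delete(eventId);
        acquireDuration.observe(Number(process.hrtime.bigint() - start) / 1e9);
    });
}
```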
no issue
- The `instrumentKnex` method was directly accessing the `promClient`
instance to create custom metrics and keeping track of them manually in
a `customMetrics` map. This isn't necessary, since the metrics are all
tracked within the `promClient` instance's registry. The method now
uses the `prometheusClient.register...()` methods to create the metrics
and retrieves them with `getMetric()`, which reduces duplicated work
and manual bookkeeping (see the sketch below).
- This also removes the query count metric, as a count is already
included in the query duration Summary metric.
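A rough sketch of the refactored `instrumentKnex` under these assumptions:
the helper names (`registerSummary`, `getMetric`), the import path and the
metric name are illustrative, based on the description above, not a
verbatim copy of the Ghost code:

```js
const prometheusClient = require('./prometheus-client'); // hypothetical import path

function instrumentKnex(knex) {
    // Register once; the prom-client registry is the single source of truth,
    // so the old `customMetrics` map is no longer needed.
    prometheusClient.registerSummary({
        name: 'ghost_db_query_duration_seconds', // placeholder name
        help: 'Duration of SQL queries',
        percentiles: [0.5, 0.9, 0.99]
    });

    const queryStartTimes = new Map(); // __knexQueryUid -> start time

    knex.on('query', (query) => {
        queryStartTimes.set(query.__knexQueryUid, process.hrtime.bigint());
    });

    knex.on('query-response', (response, query) => {
        const start = queryStartTimes.get(query.__knexQueryUid);
        if (start === undefined) {
            return;
        }
        queryStartTimes.delete(query.__knexQueryUid);
        const seconds = Number(process.hrtime.bigint() - start) / 1e9;
        // Retrieve the metric from the registry instead of a local map
        prometheusClient.getMetric('ghost_db_query_duration_seconds').observe(seconds);
    });
}
```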
ref
https://linear.app/ghost/issue/ENG-1771/add-utility-functions-to-easily-create-custom-metrics
- Currently, adding custom metrics to our prometheus client requires
you to access `prometheusClient.client` directly to create the metrics.
- This isn't very convenient, as you then have to either keep the
metric in a local variable or manually fetch it from
`prometheusClient.client.register`.
- This commit exposes some utility functions on the `prometheusClient`
class for registering metrics and for retrieving metrics that have
already been registered (see the sketch below).
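One way such helpers might look, assuming prom-client's default registry;
the class shape, method names and the metric in the usage example are
illustrative:

```js
const client = require('prom-client');

class PrometheusClient {
    constructor() {
        this.client = client;
    }

    // Metrics register themselves on the default registry when constructed,
    // so no extra bookkeeping is needed here.
    registerCounter({name, help, labelNames = []}) {
        return new this.client.Counter({name, help, labelNames});
    }

    registerSummary({name, help, percentiles = [0.5, 0.9, 0.99]}) {
        return new this.client.Summary({name, help, percentiles});
    }

    // Look the metric up in the registry instead of keeping it in a local
    // variable or a custom map.
    getMetric(name) {
        return this.client.register.getSingleMetric(name);
    }
}

// Usage (hypothetical metric name):
// const prometheusClient = new PrometheusClient();
// prometheusClient.registerCounter({name: 'ghost_emails_sent', help: 'Emails sent'});
// prometheusClient.getMetric('ghost_emails_sent').inc();
```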
ref
https://linear.app/ghost/issue/ENG-1592/start-monitoring-connection-pool-utilization-in-ghost
- This commit adds prometheus metrics to the connection pool so we can
start tracking connection pool utilization and the number of pending
acquires, and also adds some basic SQL query summary metrics like
queries per minute and query duration percentiles (a sketch of the pool
gauges follows this list).
- The connection pool has long been theorized to be a main constraint
on Ghost's performance, but it's been challenging to get real
visibility into its state. With this change, we should be able to
directly observe, monitor and alert on the connection pool.
- Updated the grafana version to pick up a query editor bug fix that
shipped in 8.3, even though this puts us a couple of versions ahead of
production.
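A minimal sketch of the point-in-time pool gauges, assuming knex's tarn
pool API (`numUsed`, `numFree`, `numPendingAcquires`) and prom-client's
`collect()` hook; the metric names are placeholders:

```js
// Sketch only: gauges sampled at scrape/push time from the tarn pool
// exposed at knex.client.pool. Metric names are placeholders.
const client = require('prom-client');

function instrumentPool(knex) {
    const pool = knex.client.pool;

    new client.Gauge({
        name: 'ghost_db_connection_pool_used', // placeholder name
        help: 'Connections currently checked out of the pool',
        collect() {
            this.set(pool.numUsed());
        }
    });

    new client.Gauge({
        name: 'ghost_db_connection_pool_free', // placeholder name
        help: 'Idle connections currently available in the pool',
        collect() {
            this.set(pool.numFree());
        }
    });

    new client.Gauge({
        name: 'ghost_db_connection_pool_pending_acquires', // placeholder name
        help: 'Callers currently waiting to acquire a connection',
        collect() {
            this.set(pool.numPendingAcquires());
        }
    });
}
```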
ref
https://linear.app/ghost/issue/ENG-1746/enable-ghost-to-push-metrics-to-a-pushgateway
- We'd like to use prometheus to expose metrics from Ghost, but the
"standard" approach of having prometheus scrape the `/metrics` endpoint
adds some complexity and additional challenges on Pro.
- A simpler suggested alternative is to use a pushgateway: have Ghost
_push_ metrics to prometheus, rather than have prometheus scrape the
running instances.
- This PR introduces this functionality behind a configuration flag.
- It also includes a refactor of the current metrics-server
implementation so all the prometheus-related code is colocated and the
configuration is a bit more organized. `@tryghost/metrics-server` has
been renamed to `@tryghost/prometheus-metrics`, and it now includes the
metrics server and the prometheus client code itself (including the
pushgateway code).
- To enable the prometheus client alone, `prometheus:enabled` must be
true. This will _not_ enable the metrics server or the pushgateway — it
will essentially collect the metrics, but not do anything with them.
- To enable the metrics server, set `prometheus:metrics_server:enabled`
to true. You can also configure the host and port that the metrics
server should export the `/metrics` endpoint on in the
`prometheus:metrics_server` block.
- To enable the pushgateway, set `prometheus:pushgateway:enabled` to
true. You can also configure the pushgateway's `url`, the `interval` at
which it should push metrics (in milliseconds), and the `jobName` in
the `prometheus:pushgateway` block (see the example config below).
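Putting it together, a hypothetical config snippet with all three pieces
enabled. Key names follow the description above; the host, port, URL,
interval and job name values are placeholders, not Ghost defaults:

```json
{
    "prometheus": {
        "enabled": true,
        "metrics_server": {
            "enabled": true,
            "host": "127.0.0.1",
            "port": 9416
        },
        "pushgateway": {
            "enabled": true,
            "url": "http://localhost:9091",
            "interval": 15000,
            "jobName": "ghost"
        }
    }
}
```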