no ref
Added Prometheus metrics for job queue throughput and email analytics throughput. We'll likely keep these around as useful metrics to monitor, but for now their main purpose is to establish a baseline for users without the job queue enabled, so we can observe the full impact once we switch it on.
ref
https://linear.app/ghost/issue/ENG-1769/improve-pool-utilization-metric
- Currently the connection pool metrics are all point-in-time metrics,
and with a scrape interval of 15s they don't tell us much about
what's happening in the pool between scrapes.
- This commit adds a Summary metric to track the elapsed time each
transaction has to wait to acquire a connection from the pool, which
should be a good indication of contention in the pool.
- Also moved the call to `prometheusClient.instrumentKnex` to after `initCore` in the boot process. The metric depends on event listeners attached to `knex.client.pool`, and `initCore` destroys and recreates the pool, which drops those listeners.
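A minimal sketch of the wait-time measurement and of the boot-order problem. It assumes a tarn.js-style pool (the pool library Knex uses) that emits `acquireRequest` and `acquireSuccess` events carrying an event ID. To stay self-contained, it uses a hypothetical `TinyPool` stand-in and collects waits in an array rather than observing a prom-client Summary; `TinyPool` and `instrumentPool` are illustrative names, not Ghost code.

```typescript
// Hypothetical stand-in for a tarn.js-style pool. It only models the two
// acquire events the metric listens to; it is not the Knex or tarn.js API.
type Listener = (eventId: number) => void;

class TinyPool {
  private listeners: Record<string, Listener[]> = {};
  private nextId = 0;

  on(event: string, fn: Listener): void {
    (this.listeners[event] = this.listeners[event] || []).push(fn);
  }

  private emit(event: string, eventId: number): void {
    for (const fn of this.listeners[event] || []) fn(eventId);
  }

  // An acquire is modeled as a request now and a success later.
  requestAcquire(): number {
    const id = this.nextId++;
    this.emit("acquireRequest", id);
    return id;
  }

  completeAcquire(id: number): void {
    this.emit("acquireSuccess", id);
  }
}

// Records how long each acquire waited. A real version would call
// observe() on a Summary; this sketch pushes into an array instead.
function instrumentPool(pool: TinyPool, now: () => number, waits: number[]): void {
  const startedAt: Record<number, number> = {};
  pool.on("acquireRequest", (id) => {
    startedAt[id] = now();
  });
  pool.on("acquireSuccess", (id) => {
    const start = startedAt[id];
    if (start === undefined) return; // request was made before instrumenting
    delete startedAt[id];
    waits.push(now() - start);
  });
}

// Usage with a fake clock so the numbers are deterministic.
let clock = 0;
const now = () => clock;

let pool = new TinyPool();
const staleWaits: number[] = [];
instrumentPool(pool, now, staleWaits); // instrumented too early...
pool = new TinyPool(); // ...then the pool is recreated, as in initCore

const waits: number[] = [];
instrumentPool(pool, now, waits); // instrumenting after recreation works
const id = pool.requestAcquire();
clock = 25;
pool.completeAcquire(id);
```

Listeners registered on the discarded pool never fire, so only `waits` receives the 25 ms observation; that is the reason the instrumentation has to run after the pool is recreated.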
no issue
- The `instrumentKnex` method was directly accessing the `promClient`
instance to create custom metrics, and keeping track of them manually in
a `customMetrics` map. This isn't necessary, since the metrics are all
tracked within the `promClient` instance's registry. This method now
uses the `prometheusClient.register...()` methods to create the metrics
and retrieves them with the `getMetric()` method, which removes the
duplicated work and manual bookkeeping
- This also removes the query count metric, since the query duration
Summary metric already includes a count
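Prometheus Summary metrics expose `<name>_count` and `<name>_sum` series alongside their quantiles, so a separate query counter duplicates data the duration Summary already carries. The sketch below models that with a hypothetical `TinySummary` class (not prom-client's API) and a simple nearest-rank quantile, assuming durations are recorded in milliseconds.

```typescript
// Hypothetical minimal Summary. It keeps every observation and derives
// count, sum and nearest-rank quantiles; real clients use a sliding-window
// estimator instead of storing everything.
class TinySummary {
  private values: number[] = [];

  observe(value: number): void {
    this.values.push(value);
  }

  // Corresponds to the <name>_count series: one per observe() call.
  get count(): number {
    return this.values.length;
  }

  // Corresponds to the <name>_sum series.
  get sum(): number {
    return this.values.reduce((acc, v) => acc + v, 0);
  }

  // Nearest-rank quantile for q in (0, 1]; NaN when nothing was observed.
  quantile(q: number): number {
    if (this.values.length === 0) return NaN;
    const sorted = this.values.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil(q * sorted.length));
    return sorted[rank - 1];
  }
}

// Each query duration (ms) is observed once, so the query count comes for free.
const queryDuration = new TinySummary();
for (const ms of [12, 3, 48, 7, 20]) queryDuration.observe(ms);
// count: 5, sum: 90, p50: 12, p99: 48
```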
ref
https://linear.app/ghost/issue/ENG-1771/add-utility-functions-to-easily-create-custom-metrics
- Currently, adding custom metrics to our Prometheus client requires you
to access `prometheusClient.client` directly to create the metrics
- This is inconvenient, because you then have to either keep the
metric in a local variable or look it up manually from
`prometheusClient.client.register`
- This commit exposes some utility functions for registering metrics on
the `prometheusClient` class, and for retrieving metrics that have
already been registered
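A minimal sketch of the register-then-look-up pattern, assuming a simplified metric model. `getMetric` follows the name used in this log; `MetricsRegistry`, `registerCounter`, `registerGauge` and the metric classes are hypothetical and do not reflect the actual `prometheusClient` signatures.

```typescript
// Simplified metric types; real Prometheus clients also carry labels,
// help text, exposition formatting, etc.
class CounterMetric {
  readonly name: string;
  value = 0;
  constructor(name: string) {
    this.name = name;
  }
  inc(by = 1): void {
    this.value += by;
  }
}

class GaugeMetric {
  readonly name: string;
  value = 0;
  constructor(name: string) {
    this.name = name;
  }
  set(v: number): void {
    this.value = v;
  }
}

type Metric = CounterMetric | GaugeMetric;

// Hypothetical wrapper: the registry is the single source of truth, so callers
// look metrics up by name instead of keeping their own map.
class MetricsRegistry {
  private metrics: Record<string, Metric> = {};

  private has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.metrics, name);
  }

  private add<T extends Metric>(metric: T): T {
    if (this.has(metric.name)) {
      throw new Error(`Metric "${metric.name}" is already registered`);
    }
    this.metrics[metric.name] = metric;
    return metric;
  }

  registerCounter(name: string): CounterMetric {
    return this.add(new CounterMetric(name));
  }

  registerGauge(name: string): GaugeMetric {
    return this.add(new GaugeMetric(name));
  }

  getMetric(name: string): Metric | undefined {
    return this.has(name) ? this.metrics[name] : undefined;
  }
}

// Register once at setup...
const registry = new MetricsRegistry();
registry.registerCounter("jobs_processed_total");

// ...and retrieve it later wherever it is needed.
const jobs = registry.getMetric("jobs_processed_total");
if (jobs instanceof CounterMetric) jobs.inc();
```

Rejecting duplicate names keeps the registry the only place a metric lives; prom-client's own registry similarly refuses a second metric registered under the same name.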
ref
https://linear.app/ghost/issue/ENG-1592/start-monitoring-connection-pool-utilization-in-ghost
- This commit adds prometheus metrics to the connection pool so we can
start to track connection pool utilization, number of pending acquires,
and also adds some basic SQL query summary metrics like queries per
minute and query duration percentiles.
- We've suspected for some time that the connection pool is one of the
main constraints on Ghost, but it's been challenging to get actual
visibility into the state of the pool. With this change, we should be
able to directly observe, monitor and alert on the connection pool.
- Updated Grafana to pick up a query editor bug fix released in 8.3,
even though this is a couple of versions ahead of production
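The pool gauges are point-in-time values derived from the pool's counts, and queries per minute can be derived from the query Summary's count. A minimal sketch, assuming a pool that reports counts similar to tarn.js's `numUsed()`, `numFree()` and `numPendingAcquires()`; the `PoolStats` shape, `poolGauges`, `perMinute`, and the choice to measure utilization against the configured max are illustrative assumptions.

```typescript
// Hypothetical snapshot of pool state, loosely modeled on the counts a
// tarn.js pool reports (numUsed, numFree, numPendingAcquires).
interface PoolStats {
  used: number; // connections currently checked out
  free: number; // idle connections
  pendingAcquires: number; // callers waiting for a connection
  max: number; // configured maximum pool size
}

// Point-in-time gauges. Utilization is measured against the configured max
// here; measuring against used + free is another reasonable choice.
function poolGauges(stats: PoolStats): { utilization: number; pendingAcquires: number } {
  const utilization = stats.max > 0 ? stats.used / stats.max : 0;
  return { utilization, pendingAcquires: stats.pendingAcquires };
}

// Queries per minute from two samples of the query Summary's count taken
// elapsedMs apart. PromQL's rate() does this per second and also handles
// counter resets, which this sketch ignores.
function perMinute(countBefore: number, countAfter: number, elapsedMs: number): number {
  return ((countAfter - countBefore) * 60000) / elapsedMs;
}

const gauges = poolGauges({ used: 8, free: 2, pendingAcquires: 5, max: 10 });
// gauges: { utilization: 0.8, pendingAcquires: 5 }
const qpm = perMinute(100, 130, 15000); // 30 queries in 15s
// qpm: 120
```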
ref
https://linear.app/ghost/issue/ENG-1746/enable-ghost-to-push-metrics-to-a-pushgateway
- We're trying to get Ghost working with the Prometheus pushgateway in
staging, but it logs an error each time it tries to push the
metrics. The error output isn't useful for debugging, so this
commit makes the error messages more descriptive.
ref
https://linear.app/ghost/issue/ENG-1746/enable-ghost-to-push-metrics-to-a-pushgateway
- We'd like to use prometheus to expose metrics from Ghost, but the
"standard" approach of having prometheus scrape the `/metrics` endpoint
adds some complexity and additional challenges on Pro.
- A suggested simpler alternative is to use a pushgateway, to have Ghost
_push_ metrics to prometheus, rather than have prometheus scrape the
running instances.
- This PR introduces this functionality behind configuration flags.
- It also refactors the current metrics-server
implementation so all the Prometheus-related code is colocated and
the configuration is a bit more organized. `@tryghost/metrics-server`
has been renamed to `@tryghost/prometheus-metrics`, and it now includes
the metrics server and prometheus-client code itself (including the
pushgateway code).
- To enable the prometheus client alone, `prometheus:enabled` must be
true. This will _not_ enable the metrics server or the pushgateway; it
will collect the metrics but not do anything with them.
- To enable the metrics server, set `prometheus:metrics_server:enabled`
to true. You can also configure the host and port that the metrics
server should export the `/metrics` endpoint on in the
`prometheus:metrics_server` block.
- To enable the pushgateway, set `prometheus:pushgateway:enabled` to
true. You can also configure the pushgateway's `url`, the `interval` at
which it pushes metrics (in milliseconds) and the `jobName` in the
`prometheus:pushgateway` block.
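To illustrate how these flags combine, here is a minimal sketch that resolves which components would start from a config object shaped like the keys above. The TypeScript types, the `resolvePrometheusSetup` function, the placeholder defaults, the `pushgateway` hostname in the usage, and the assumption that the metrics server and pushgateway only start when `prometheus:enabled` is true are all illustrative, not taken from Ghost. The push URL follows the Pushgateway's `/metrics/job/<job name>` path.

```typescript
// Config shape assumed from the keys described above; field types are guesses.
interface PrometheusConfig {
  enabled?: boolean;
  metrics_server?: { enabled?: boolean; host?: string; port?: number };
  pushgateway?: { enabled?: boolean; url?: string; interval?: number; jobName?: string };
}

interface PrometheusSetup {
  collect: boolean;
  metricsServer: { host: string; port: number } | null;
  pushgateway: { pushUrl: string; intervalMs: number } | null;
}

// Hypothetical resolver. Assumption: the metrics server and pushgateway only
// start when the client itself is enabled. Defaults are placeholders.
function resolvePrometheusSetup(cfg: PrometheusConfig): PrometheusSetup {
  if (!cfg.enabled) {
    return { collect: false, metricsServer: null, pushgateway: null };
  }
  const server = cfg.metrics_server;
  const push = cfg.pushgateway;
  return {
    collect: true,
    metricsServer:
      server && server.enabled
        ? { host: server.host || "127.0.0.1", port: server.port || 9000 }
        : null,
    pushgateway:
      push && push.enabled && push.url
        ? {
            // The Pushgateway accepts pushes at <base>/metrics/job/<job name>.
            pushUrl:
              push.url.replace(/\/+$/, "") +
              "/metrics/job/" +
              encodeURIComponent(push.jobName || "ghost"),
            intervalMs: push.interval || 15000,
          }
        : null,
  };
}

const setup = resolvePrometheusSetup({
  enabled: true,
  pushgateway: { enabled: true, url: "http://pushgateway:9091/", interval: 10000, jobName: "ghost" },
});
// setup.metricsServer: null
// setup.pushgateway: { pushUrl: "http://pushgateway:9091/metrics/job/ghost", intervalMs: 10000 }
```

A push loop would then send the collected metrics to `pushUrl` every `intervalMs`; that part is omitted here.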