ref
https://linear.app/tryghost/issue/ENG-1505/add-prometheus-metrics-server-to-allow-monitoring-ghost-metrics
# Summary
This commit includes two main components: a prometheus client class to
collect metrics from Ghost, and a standalone metrics server that exposes
a /metrics endpoint at a separate port (9416 by default) from the main
Ghost app.
The prometheus client is a very thin wrapper around
[prom-client](https://github.com/siimon/prom-client). We could use
prom-client directly, but this approach should make it easier to switch
to a different prometheus client package (or make our own) if we ever
need to down the line.
The list of default metrics this enables is specified in an e2e test
[here](https://github.com/TryGhost/Ghost/pull/21192/files#diff-ebc52236be2cd14b40be89220ae961f48d3f837693f7d1da76db292348915941R66-R92).
This also gives us the ability to create and collect custom metrics,
although none are included in this commit yet.
# Configuration
The prometheus client and the metrics server are both disabled by
default, but can be enabled by setting the metrics_server:enabled flag
to true.
You can also define a custom host and port using `metrics_server:host`
and `metrics_server:port`.
## Why not expose the /metrics endpoint in one of the existing express
apps?
The standalone express app exists for two main reasons:
1. We don't want these metrics to be public, and the easiest way to
accomplish that is to expose the /metrics endpoint at a different port
that won't be exposed to the internet.
2. Creating a standalone express instance decouples the metrics endpoint
from the Ghost server, so if Ghost is not responding for whatever
reason, we should still be able to scrape metrics to understand what's
going on internally.
## Impact on Boot & Shut down time
The prometheus client is initialized early in the boot process so we can
collect metrics during the boot sequence. Testing locally has shown that
this increases boot time by ~20ms. The metrics server which exposes the
/metrics endpoint is not initialized until after the background
services, and it is not awaited, to avoid impacting boot time. None of
this code, including the requires, will run if the
metrics_server:enabled flag is set to false (or not set).
Shutting down the metrics server is added as a cleanup task for the main
Ghost server instance, and is setup to shut down with 0 grace period to
avoid impacting shut down time.