ref https://linear.app/ghost/issue/ENG-1782/move-docker-related-files-to-a-better-location-that-githubscripts
- The docker files are currently located in `.github/scripts`. This location doesn't make a lot of sense — you wouldn't think to look there unless you already knew they were there. This also requires you to specify the path to the `compose.yml` file whenever running a `docker compose ...` command.
- This commit moves the `compose.yml` file to the root of the repo, so you can simply run `docker compose up` and it will automatically find the file in the root, without having to specify `-f .github/scripts/docker-compose.yml`. This is a major win for convenience over the current setup.
- It also moves all the related files, including the `Dockerfile` used by the Dev Container setup and configuration files for supporting services into a new `.docker` directory, which is a more logical location, and should be easier to find.
- Also updated the existing convenience commands in the `package.json` scripts block (`yarn docker:reset` and `yarn docker:down`) to work with the new file locations.
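  For illustration, with `compose.yml` in the repo root those scripts no longer need a `-f` flag. A rough sketch of what the entries might look like (the exact commands here are assumptions, not the literal ones in the repo):

  ```json
  {
    "scripts": {
      "docker:down": "docker compose down",
      "docker:reset": "docker compose down -v && docker compose up -d"
    }
  }
  ```

  The `-v` flag on `docker:reset` is what removes the data volumes, so that command still wipes the local database.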
ref
https://linear.app/ghost/issue/ENG-1769/improve-pool-utilization-metric
- Currently the connection pool metrics are all point-in-time metrics,
and with a scrape interval of 15s they don't tell us a whole lot about
what's happening in the pool.
- This commit adds a Summary metric to track the elapsed time each
transaction has to wait to acquire a connection from the pool, which
should be a good indication of contention in the pool.
- Also moved the call to `prometheusClient.instrumentKnex` to after `initCore` in the boot process, because the metric depends on event listeners on `knex.client.pool`, and the pool gets destroyed and recreated in `initCore`, which removes the listeners
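
A minimal sketch of how such a Summary can be wired up to knex's underlying tarn pool — the metric name and helper function are illustrative, not the actual Ghost implementation:

```js
const client = require('prom-client');

const acquireDuration = new client.Summary({
    name: 'ghost_db_pool_acquire_duration_seconds', // assumed name
    help: 'Time spent waiting to acquire a connection from the pool',
    percentiles: [0.5, 0.9, 0.99]
});

function instrumentPool(knex) {
    const pool = knex.client.pool; // tarn pool; recreated in initCore, so attach after that
    const pending = new Map();

    // tarn emits a correlation id for each acquire request
    pool.on('acquireRequest', (eventId) => {
        pending.set(eventId, process.hrtime.bigint());
    });

    pool.on('acquireSuccess', (eventId) => {
        const start = pending.get(eventId);
        if (start !== undefined) {
            pending.delete(eventId);
            acquireDuration.observe(Number(process.hrtime.bigint() - start) / 1e9);
        }
    });

    pool.on('acquireFail', (eventId) => {
        pending.delete(eventId);
    });
}
```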
ref
https://linear.app/ghost/issue/ENG-1592/start-monitoring-connection-pool-utilization-in-ghost
- This commit adds prometheus metrics to the connection pool so we can
start to track connection pool utilization and the number of pending
acquires, and also adds some basic SQL query summary metrics like
queries per minute and query duration percentiles (a rough sketch of
the pool gauges follows below).
- The connection pool has been theorized to be a main constraint of
Ghost for some time, but it's been challenging to get actual visibility
into the state of the connection pool. With this change, we should be
able to directly observe, monitor and alert on the connection pool.
- Updated the Grafana version to pick up a query editor bug fix that
landed in 8.3, even though this is a couple of versions ahead of
production
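
Sketch of how pool utilization and pending acquires can be exposed with prom-client gauges that read from knex's tarn pool on each scrape — metric names are illustrative, not necessarily the ones used in Ghost:

```js
const client = require('prom-client');

function registerPoolGauges(knex) {
    const pool = knex.client.pool; // tarn pool used by knex

    new client.Gauge({
        name: 'ghost_db_pool_used_connections', // assumed name
        help: 'Connections currently checked out of the pool',
        collect() {
            this.set(pool.numUsed());
        }
    });

    new client.Gauge({
        name: 'ghost_db_pool_pending_acquires', // assumed name
        help: 'Requests currently waiting for a connection',
        collect() {
            this.set(pool.numPendingAcquires());
        }
    });
}
```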
ref
https://linear.app/ghost/issue/ENG-1746/enable-ghost-to-push-metrics-to-a-pushgateway
- We'd like to use prometheus to expose metrics from Ghost, but the
"standard" approach of having prometheus scrape the `/metrics` endpoint
adds some complexity and additional challenges on Pro.
- A suggested simpler alternative is to use a pushgateway, to have Ghost
_push_ metrics to prometheus, rather than have prometheus scrape the
running instances.
- This PR introduces this functionality behind configuration flags.
- It also includes a refactor to the current metrics-server
implementation so all the related code for prometheus is colocated, and
the configuration is a bit more organized. `@tryghost/metrics-server`
has been renamed to `@tryghost/prometheus-metrics`, and it now includes
the metrics server and prometheus-client code itself (including the
pushgateway code)
- To enable the prometheus client alone, `prometheus:enabled` must be
true. This will _not_ enable the metrics server or the pushgateway — it
will essentially collect the metrics, but not do anything with them.
- To enable the metrics server, set `prometheus:metrics_server:enabled`
to true. You can also configure the host and port that the metrics
server should export the `/metrics` endpoint on in the
`prometheus:metrics_server` block.
- To enable the pushgateway, set `prometheus:pushgateway:enabled` to
true. You can also configure the pushgateway's `url`, the `interval`
(in milliseconds) at which it should push metrics, and the `jobName` in
the `prometheus:pushgateway` block.
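
Putting that together, a Ghost config sketch with everything switched on might look like the following — the keys match the description above, but the specific host, port, url, interval and jobName values are illustrative:

```json
{
  "prometheus": {
    "enabled": true,
    "metrics_server": {
      "enabled": true,
      "host": "127.0.0.1",
      "port": 9416
    },
    "pushgateway": {
      "enabled": true,
      "url": "http://localhost:9091",
      "interval": 15000,
      "jobName": "ghost"
    }
  }
}
```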
no issue
- The browser test output in CI is really noisy, because the `NX_DAEMON`
doesn't run in CI, but we're trying to use NX to watch and rebuild the
TypeScript modules. This outputs a ton of "NX Daemon is not running"
type errors, which make it difficult to sift through the actual test
results.
- We don't actually need to watch the typescript files, we just need to
build them once before starting. This is defined as an NX dependency for
the browser tests target, so we don't need to explicitly build the TS
packages at all. Removing the typescript watch & build command removes
the noisy errors, without impacting how the tests actually run.
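
For context, the one-off build is handled by Nx's target dependencies rather than an explicit watch; conceptually the target config looks something like this (the target and dependency names are assumptions for illustration):

```json
{
  "targets": {
    "test:browser": {
      "dependsOn": ["^build"]
    }
  }
}
```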
no issue
- Browser tests in CI were yielding a passing result even if one or more
tests failed (including retries).
- The `yarn dev` command that triggers the browser tests in CI was
catching any errors and exiting with code 0, resulting in a ✅ in CI.
- This commit changes `yarn dev` to exit with code 1 if the browser
tests fail, so that CI will correctly fail if any of the browser tests
fail.
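
Conceptually the change boils down to propagating the browser tests' exit code instead of swallowing it. A simplified sketch — the command and script shape are illustrative, not the actual dev script:

```js
const {spawn} = require('child_process');

// Run the browser tests and mirror their exit code
const tests = spawn('yarn', ['test:browser'], {stdio: 'inherit'}); // command is illustrative

tests.on('close', (code) => {
    process.exit(code === 0 ? 0 : 1); // previously failures here still exited 0
});
```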
no issue
- Dev Containers let you work on Ghost in a consistent, isolated
environment with all the necessary development dependencies
pre-installed. VSCode (or Cursor) can effectively run _inside_ the
container, providing a local-quality development experience while
working in a well-defined, isolated environment.
- For now the default setup only works with "Clone repository in
Container Volume" or "Clone PR in Container Volume" — this allows for a
super quick and simple setup. We can also introduce another
configuration to allow opening an existing local checkout in a Dev
Container, but that's not quite ready yet.
- This PR also adds the `yarn clean:hard` command (sketched below),
which deletes all `node_modules` directories, cleans the yarn cache, and
clears the NX cache. This will be necessary for opening a local checkout
in a Dev Container.
- To learn more about Dev Containers, read this guide from VSCode:
https://code.visualstudio.com/docs/devcontainers/containers#_personalizing-with-dotfile-repositories
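
A rough sketch of what `clean:hard` could look like in the root `package.json` — the exact command is an assumption, though `yarn cache clean` and `nx reset` are the standard ways to clear the yarn and NX caches:

```json
{
  "scripts": {
    "clean:hard": "find . -name node_modules -type d -prune -exec rm -rf {} + && yarn cache clean && nx reset"
  }
}
```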
---------
Co-authored-by: Joe Grigg <joe@ghost.org>
Co-authored-by: Steve Larson <9larsons@gmail.com>
- we should be running most of CI on Node 20
- this has fallen out of sync because we declare the version in so many
places
- to help with this, I've extracted the Node version to an env var, and
re-used that across the workflow
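
In practice the workflow declares the Node version once as an env var and references it wherever a toolchain is set up; a sketch of the pattern (job and step details are illustrative):

```yaml
env:
  NODE_VERSION: 20

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
```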
no issue
# Before
The pre-commit hook would abort the commit if any submodules were staged
for commit, and prompt the user to manually un-stage them and retry the
commit.
# Now
The pre-commit hook automatically un-stages any staged submodules, then
allows the commit to proceed.
# Why?
This was a daily annoyance that caused many common git commands to
abort, and required manual un-staging of the submodules before retrying
the commit:
- `git commit -a`
- `git add . && git commit`
- `git add -A && git commit`
If we ever _do_ need to commit submodules, we can always add them back
and run `git commit --no-verify` to accomplish that (which we would have
needed to do before regardless). This should accomplish the same goal of
not allowing submodules to be committed, but reduce the day to day
friction of making commits in Ghost.
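For reference, a minimal sketch of the idea behind the hook, written as a Node script — this is not the actual hook code, just an illustration of the approach:

```js
const {execSync} = require('child_process');

// Paths of all submodules registered in .gitmodules
const submodules = execSync('git config --file .gitmodules --get-regexp path')
    .toString()
    .trim()
    .split('\n')
    .filter(Boolean)
    .map(line => line.split(' ')[1]);

// Everything currently staged for this commit
const staged = execSync('git diff --cached --name-only')
    .toString()
    .trim()
    .split('\n');

// Quietly un-stage any submodule that slipped into the index, then allow the commit
for (const path of submodules) {
    if (staged.includes(path)) {
        execSync(`git restore --staged ${path}`);
    }
}
```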
ref
https://linear.app/tryghost/issue/ENG-1591/add-prometheus-and-grafana-services-to-docker-compose
This commit adds 2 new services to the docker compose file to enable
monitoring metrics from Ghost locally in real-time:
1. Prometheus - a service that scrapes Ghost's new `/metrics` endpoint
introduced in this
[commit](768336efad).
2. Grafana - a service that consumes the metrics from prometheus and
exposes them in a dashboard that you can view locally at
`localhost:3000`.
# Usage
Both of these services are selectively enabled using docker compose
[profiles](https://docs.docker.com/compose/how-tos/profiles/). This way,
if you don't opt-in to using these monitoring tools, they won't start
and consume resources on your host machine. To enable these services,
enable the `monitoring` profile by either setting the `COMPOSE_PROFILES`
environment variable to `monitoring`, or specifying the `--profile
monitoring` CLI argument to any `docker compose ...` commands.
I've found the easiest way to configure this in an 'always on' fashion
is to create a `.env` file in the project's root directory and add
`COMPOSE_PROFILES=monitoring` to it. As an added convenience, you can
also set `COMPOSE_FILE=.github/scripts/docker-compose.yml`, which will
allow you to run `docker compose ...` commands from the root directory
without specifying the full path each time.
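For example, a `.env` file in the project root with both settings:

```
COMPOSE_PROFILES=monitoring
COMPOSE_FILE=.github/scripts/docker-compose.yml
```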
# Intended for development only
These services are meant for local development only, and are not
configured for a production use-case. For example, the Grafana instance
is configured to have _no authorization_ so you won't need a
username/password to log in at `localhost:3000`. Prometheus is also
configured to scrape the metrics once every second, which is likely
excessive for production use-cases, but may be useful for getting more
granular metrics while e.g. load testing locally.
# Dashboards
The Grafana instance includes a default dashboard covering most of the
main default metrics provided by our prometheus client integration. The
dashboard is defined in a JSON file at
`.github/scripts/docker/grafana/dashboards/main-dashboard.json` and can
be modified & committed to add new visualizations that will be available
to anyone working on Ghost locally. You can also add other dashboards to
the same directory for specific use-cases, which should be picked up and
made available in the Grafana UI. [Read
more](https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/view-dashboard-json-model/)
about Grafana's JSON schema for dashboards.
refs https://ghost.slack.com/archives/C02G9E68C/p1727704490753759
- if you open a PR and it becomes outdated enough that the base commit
is more than 100 commits back, the workflow starts to fail
- to help prevent this, we can increase it by 1000, which should more
than cover enough use-cases but still keep checkout quick
- users like `renovate[bot]` have brackets in the username
- this breaks the command and it exits with `exit code 3`
- to fix this, we can encode the username before passing it in
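
For example, assuming JavaScript's `encodeURIComponent` is the encoding step (the actual call used may differ):

```js
encodeURIComponent('renovate[bot]'); // => 'renovate%5Bbot%5D'
```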
ref https://linear.app/tryghost/issue/DEV-31/staging-deploys-of-feature-branchesprs
- we want the ability to ship a PR to staging, so we can test and QA
without merging to `main`
- most of the infrastructure is already in place for this, so it's
mostly a case of wiring it all up
- this commit will send a slightly different payload to the build
process, to indicate it's coming from a PR
- I've also added a check that the user is a member of the org, so we
don't get random builds from non-members
- to trigger this, we should be able to add the `deploy-to-staging`
label and it Just Works :TM:
refs https://github.com/nrwl/nx/releases/tag/19.8.0
- this commit updates Nx to v19
- we need to add some extra commands to the dev script to stop and
restart the Nx daemon, so it's ready and running before we execute a
bunch of Nx commands concurrently
- this also updates nx.json to the format needed for the latest version
ref https://linear.app/tryghost/issue/DEV-25/move-version-bumping-logic-into-ghost-repo
- we're slowly migrating our build code into the OSS repo, which means
we need to move scripts over
- we have this as a bash script, but I've rewritten it in JS so it's a
little more maintainable
- this script will just bump the version in the package.json files and
set the GHA output
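
A hypothetical sketch of the shape of such a script — the package paths, argument handling and output name are assumptions, not the actual implementation:

```js
const fs = require('fs');

const version = process.argv[2]; // e.g. "5.101.0"
const packages = ['ghost/core/package.json', 'ghost/admin/package.json']; // illustrative paths

for (const pkgPath of packages) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
    pkg.version = version;
    fs.writeFileSync(pkgPath, JSON.stringify(pkg, null, 2) + '\n');
}

// Expose the version to later workflow steps via the GitHub Actions output file
fs.appendFileSync(process.env.GITHUB_OUTPUT, `version=${version}\n`);
```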
- we shouldn't try and load the Stripe CLI via the dev script because
it's done in the browser tests and involves more setup than the dev
script contains
- this cuts 2mins from the browser tests because they're no longer
waiting for the Stripe CLI to be auth'd
- we should be able to trust Nx enough that we can sustain the build
cache across commits, which will speed up the workflow because we
don't need to rebuild our TS projects all the time
no issue
- OpenTelemetry has been problematic in a number of ways (boot time,
breaking the frontend). May revisit it at some point in the future, but
for now it is only exporting metrics via prometheus and not traces, so
there's currently nothing sending data to this jaeger container
- Cleaning it up for now as it's just sitting there idly consuming
resources
no issue
- This allows us to run `docker-compose down` or to restart docker
desktop without losing all our local databases
- Added a data volume to the MySQL service in the `docker-compose.yml`
file to persist the data between container restarts
- The `yarn docker:reset` command will still reset all the data in the
database since it uses `down -v` to remove the volumes as well
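
In compose terms the change looks roughly like this — the service name, volume name and image tag are illustrative:

```yaml
services:
  mysql:
    image: mysql:8.0
    volumes:
      - mysql-data:/var/lib/mysql # persists across `docker compose down` and restarts

volumes:
  mysql-data: {}
```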
ref DOGM-32
Using the dev script as a template, this script runs the tests with
local copies of the applications needed instead of the released CDN
versions
- this was an early attempt to group PRs together by labels, so we can
triage PRs easier, but it's not finished and actually producing more
noise than signal
- we might want to re-add this in the future, but for now, silence 🧘
- We have browser tests which only run if the browser tests flag is added to the PR
- The label has to be present on PR creation, which is hard to remember/doesn't fit with various workflows
- The default activity types for the `pull_request` trigger are `opened`, `synchronize` and `reopened`
- This PR adds `labeled` and `unlabeled` to those, which I think will help us to run the tests as expected
- The expectation is that adding the browser test label will now trigger the tests to run
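
i.e. the trigger in the workflow ends up looking something like:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled, unlabeled]
```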
no issue
- knex can behave differently with SQLite and MySQL, which can cause
migrations to behave differently in each database. This PR adds a check
to the migration review checklist to remind us to test the migration in
both databases before merging.
This commit adds OpenTelemetry instrumentation to Ghost's backend, which
allows us to view traces similar to what we see in Sentry Performance
locally.
OpenTelemetry is enabled if `NODE_ENV === 'development'` or if it is
explicitly enabled via config with `opentelemetry:enabled`.
It also adds a [Jaeger](https://www.jaegertracing.io/) container to
Ghost's docker-compose file for viewing the traces. There's no setup
required (beyond running `yarn docker:reset` to pick up the changes in
the docker-compose file the first time — but this will also reset your
DB so be careful). This will launch the Jaeger container, and you can
view the UI to see the traces at `http://localhost:16686/search`.
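
Outside of development, enabling it explicitly comes down to a config block along these lines:

```json
{
  "opentelemetry": {
    "enabled": true
  }
}
```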
no refs
Redis can be utilised for various caching purposes within Ghost. This PR
adds a Redis service to the docker-compose file to allow for easier
local development when Redis is required
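
A rough sketch of the kind of service this adds to the compose file — the image tag and port mapping are illustrative:

```yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```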