docs: update observability best practices

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/10115
GitOrigin-RevId: 6f0d8278265724d18a9e1ff90a2ce67c2e8efd5c
Toan Nguyen 2023-08-21 19:26:52 +07:00 committed by hasura-bot
parent c9a061918f
commit 7f48bb9df6


@ -8,6 +8,8 @@ sidebar_label: Best Practices
sidebar_position: 2
---
import Thumbnail from '@site/src/components/Thumbnail';
# Observability Best Practices
## Introduction
@ -16,10 +18,14 @@ The purpose of this document is to provide an overview of some of the best pract
observability for your Hasura-driven product. We will cover the fundamentals of observability and provide general
recommendations on what Hasura considers observability best practices.
While specifics of your product or system will define your configurations, we have used Hasura Cloud, Postgres, and
Datadog to build this guide. Wherever applicable, links are provided to the mentioned products documentation.
While the specifics of your product or system will define your configuration, we have used Hasura Cloud, Postgres,
Prometheus, and Grafana to build this guide. Wherever applicable, links are provided to the documentation of the
products mentioned.
A sample dashboard based on Datadog is provided for you to replicate.
We also provide [pre-built Grafana Dashboards](/observability/enterprise-edition/prometheus/pre-built-dashboards.mdx)
for you to replicate.
<Thumbnail src="/img/observability/grafana-overview-dashboard.png" alt="Hasura Overview Dashboard" width="1000px" />
## The basics
@ -82,11 +88,26 @@ Depending on your Hasura Enterprise Edition deployment mode, you may access, exp
deployment using [this](/deployment/logging.mdx#log-types) document. Generally, you should configure your container logs
to be exported to your observability platform using the appropriate log drivers.
#### Metrics via Prometheus integration
#### Metrics via Prometheus
You can export metrics of your Hasura Cloud project to Prometheus. You can configure this on the `Integrations` tab on
the project's settings page. You can find more information on this
[here](/observability/cloud/prometheus.mdx).
You can export metrics of your Hasura Enterprise project to Prometheus by enabling the `metrics` API. You can find
more information on this [here](/observability/enterprise-edition/prometheus/integrate-prometheus-grafana.mdx).
For security reasons, the metrics endpoint should not be exposed to the internet. If exposure is unavoidable, keep the
Prometheus secret confidential to prevent misuse.
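As a minimal sketch of what an authenticated scrape looks like, the snippet below builds (but does not send) a request for the metrics endpoint. The `/v1/metrics` path, the bearer-token header, the host name, and the secret value are assumptions made for illustration. Check the integration guide linked above for the values that apply to your deployment.

```python
import urllib.request


def build_metrics_request(base_url: str, metrics_secret: str) -> urllib.request.Request:
    """Build, without sending, a GET request for the metrics endpoint."""
    # Assumed path: adjust it if your deployment serves metrics elsewhere.
    url = base_url.rstrip("/") + "/v1/metrics"
    return urllib.request.Request(
        url,
        # Assumed scheme: the secret is passed as a bearer token.
        headers={"Authorization": f"Bearer {metrics_secret}"},
        method="GET",
    )


# Hypothetical internal host and secret, for illustration only.
request = build_metrics_request("http://hasura.internal:8080", "example-secret")
print(request.full_url)
```

Because the secret travels in a request header, anyone who can read that traffic can read the secret, which is one more reason to keep the endpoint on a private network.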
[Pre-built Grafana Dashboards](/observability/enterprise-edition/prometheus/pre-built-dashboards.mdx) are provided to
visualize golden signal metrics for real-time monitoring.
#### Metrics via OpenTelemetry
Hasura Enterprise is OpenTelemetry-compliant and can export metrics to third-party observability
platforms that support OpenTelemetry. Check out [the OpenTelemetry page](/observability/opentelemetry.mdx) for more information.
#### Distributed traces
Hasura Enterprise can also export distributed traces via OpenTelemetry. Read more on
[the OpenTelemetry page](/observability/opentelemetry.mdx).
## Database observability
@ -109,8 +130,8 @@ be implemented:
[Query Tags](/observability/query-tags.mdx) are SQL comments that consist of `key=value` pairs that are appended to
generated SQL statements. When you issue a query or mutation with query tags, the generated SQL has some extra
information. Database analytics tools can use that information (metadata) in these comments to analyze DB load
and track or monitor performance.
information. Database analytics tools can use that information (metadata) to analyze DB load and track
or monitor performance.
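To make the idea concrete, here is a small illustrative sketch of appending `key=value` pairs to a SQL statement as a trailing comment. The function name, the tag keys, and the exact comment layout are hypothetical and are not necessarily the format Hasura emits. See the Query Tags page for the supported formats.

```python
def append_query_tags(sql: str, tags: dict[str, str]) -> str:
    """Append key=value pairs to a SQL statement as a trailing comment."""

    def clean(value: str) -> str:
        # Drop asterisks so a value cannot open or close a SQL comment.
        return value.replace("*", "")

    pairs = ", ".join(f"{clean(key)}={clean(value)}" for key, value in tags.items())
    return f"{sql} /* {pairs} */"


# Hypothetical tag keys and values, for illustration only.
tagged = append_query_tags(
    "SELECT id, name FROM users",
    {"request_id": "req-123", "operation_name": "GetUsers"},
)
print(tagged)
# SELECT id, name FROM users /* request_id=req-123, operation_name=GetUsers */
```

The database ignores the comment when it runs the statement, but the text survives in query logs and statistics, where an analytics tool can parse it back into metadata.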
### Using Query Tags and **pganalyze**
@ -130,3 +151,8 @@ Integrating your observability tools with an incident response platform (IRP) is
propagation. Integrating with an IRP allows high visibility and actionable intelligence across the entire incident
lifecycle. Most IRPs enable your organization to respond quickly to incidents, automate responses, and
build more reliable services and platforms.
If you use Prometheus for metrics observability, you can also consider using
[Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) to configure
[common alert rules](https://github.com/hasura/graphql-engine/blob/master/community/boilerplates/observability/enterprise/prometheus/alert.rules#L22)
for performance and error incidents.