docs: update observability best practices

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/10115
GitOrigin-RevId: 6f0d8278265724d18a9e1ff90a2ce67c2e8efd5c
2024-12-14 08:02:15 +03:00 · 2023-08-21 19:26:52 +07:00 · 2023-08-21 19:26:52 +07:00 · 7f48bb9df6
commit 7f48bb9df6
parent c9a061918f
1 changed file with 35 additions and 9 deletions
--- a/docs/docs/observability/observability-best-practices.mdx
+++ b/docs/docs/observability/observability-best-practices.mdx
@@ -8,6 +8,8 @@ sidebar_label: Best Practices
 sidebar_position: 2
 ---

+import Thumbnail from '@site/src/components/Thumbnail';
+
 # Observability Best Practices

 ## Introduction
@@ -16,10 +18,14 @@ The purpose of this document is to provide an overview of some of the best pract
 observability for your Hasura-driven product. We will cover the fundamentals of observability and provide general
 recommendations on what Hasura considers to be observability best practices.

-While specifics of your product or system will define your configurations, we have used Hasura Cloud, Postgres, and
-Datadog to build this guide. Wherever applicable, links are provided to the mentioned product’s documentation.
+While the specifics of your product or system will define your configuration, we have used Hasura Cloud, Postgres,
+Prometheus, and Grafana to build this guide. Wherever applicable, links are provided to the mentioned products’
+documentation.

-A sample dashboard based on Datadog is provided for you to replicate.
+We also provide [pre-built Grafana Dashboards](/observability/enterprise-edition/prometheus/pre-built-dashboards.mdx)
+for you to replicate.
+
+<Thumbnail src="/img/observability/grafana-overview-dashboard.png" alt="Hasura Overview Dashboard" width="1000px" />

 ## The basics

@@ -82,11 +88,26 @@ Depending on your Hasura Enterprise Edition deployment mode, you may access, exp
 deployment using [this](/deployment/logging.mdx#log-types) document. Generally, you should configure your container logs
 to be exported to your observability platform using the appropriate log drivers.

-#### Metrics via Prometheus integration
+#### Metrics via Prometheus

-You can export metrics of your Hasura Cloud project to Prometheus. You can configure this on the `Integrations` tab on
-the project's settings page. You can find more information on this
-[here](/observability/cloud/prometheus.mdx).
+You can export metrics from your Hasura Enterprise project to Prometheus by enabling the `metrics` API. You can find
+more information in the [Prometheus integration guide](/observability/enterprise-edition/prometheus/integrate-prometheus-grafana.mdx).
+
+For security reasons, the metrics endpoint should not be exposed to the internet. If exposure is unavoidable, keep the
+Prometheus secret confidential to prevent misuse.
+
+[Pre-built Grafana Dashboards](/observability/enterprise-edition/prometheus/pre-built-dashboards.mdx) are provided to
+visualize golden-signal metrics for real-time monitoring.
+
+#### Metrics via OpenTelemetry
+
+Hasura Enterprise is OpenTelemetry-compliant and can export metrics to third-party observability
+platforms that support OpenTelemetry. See [the OpenTelemetry page](/observability/opentelemetry.mdx) for more information.
+
+#### Distributed traces
+
+Hasura Enterprise can also export distributed traces via OpenTelemetry. See
+[the OpenTelemetry page](/observability/opentelemetry.mdx) for more information.

 ## Database observability

@@ -109,8 +130,8 @@ be implemented:

 [Query Tags](/observability/query-tags.mdx) are SQL comments that consist of `key=value` pairs that are appended to
 generated SQL statements. When you issue a query or mutation with query tags, the generated SQL has some extra
-information. Database analytics tools can use that information (metadata) in these comments to analyze DB load
-and track or monitor performance.
+information. Database analytics tools can use that information (metadata) to analyze DB load and track
+or monitor performance.

 ### Using Query Tags and **pganalyze**

@@ -130,3 +151,8 @@ Integrating your observability tools with an incident response platform (IRP) is
 propagation. Integrating with an IRP allows high visibility and actionable intelligence across the entire incident
 lifecycle. Most IRPs enable your organization to respond quickly to incidents, automate responses, and
 build more reliable services and platforms.
+
+If you use Prometheus for metrics observability, you can also consider using
+[Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) to configure
+[common alert rules](https://github.com/hasura/graphql-engine/blob/master/community/boilerplates/observability/enterprise/prometheus/alert.rules#L22)
+for performance and error incidents.