From 8389388d5416a68e4fb5c43bb72357a6f93281e0 Mon Sep 17 00:00:00 2001
From: paritosh-08 <85472423+paritosh-08@users.noreply.github.com>
Date: Mon, 13 May 2024 17:50:08 +0530
Subject: [PATCH] document unit of prometheus metrics
PR-URL: https://github.com/hasura/graphql-engine-mono/pull/10806
GitOrigin-RevId: 9358230950c0fe488e3fc14f93c1cea158066296
---
.../enterprise-edition/prometheus/metrics.mdx | 30 ++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/docs/docs/observability/enterprise-edition/prometheus/metrics.mdx b/docs/docs/observability/enterprise-edition/prometheus/metrics.mdx
index 7d966a57a8f..7979050333b 100644
--- a/docs/docs/observability/enterprise-edition/prometheus/metrics.mdx
+++ b/docs/docs/observability/enterprise-edition/prometheus/metrics.mdx
@@ -16,6 +16,19 @@ import ProductBadge from '@site/src/components/ProductBadge';
+Hasura exports three types of Prometheus metrics:
+- Histogram: Represents the distribution of a set of values across a set of buckets. Note that the histogram
+ buckets are [cumulative](https://en.wikipedia.org/wiki/Histogram#Cumulative_histogram): each bucket counts every
+ observation less than or equal to its upper bound (the `le` label). You can read more about the histogram metric type
+ [here](https://prometheus.io/docs/concepts/metric_types/#histogram). For example,
+ `hasura_event_webhook_processing_time_seconds` is a histogram metric.
+- Counter: Represents a cumulative metric whose value can only increase, or be reset to zero on restart. You can read
+ more about the counter metric type [here](https://prometheus.io/docs/concepts/metric_types/#counter). For example,
+ `hasura_graphql_requests_total` is a counter metric.
+- Gauge: Represents a single numerical value that can arbitrarily go up and down. You can read more about the gauge
+ metric type [here](https://prometheus.io/docs/concepts/metric_types/#gauge). For example, `hasura_active_subscriptions`
+ is a gauge metric.
+
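To make "cumulative" concrete, the following minimal Python sketch (not Hasura code; `cumulative_buckets` is a hypothetical helper) turns raw observations into the per-bucket counts a Prometheus histogram exposes. Each `le` bucket counts every observation less than or equal to its upper bound, and the implicit `+Inf` bucket equals the total count. The boundaries are the ones documented for `hasura_event_webhook_processing_time_seconds`. The sample values are invented for illustration.

```python
import math
from typing import Dict, List


def cumulative_buckets(observations: List[float], bounds: List[float]) -> Dict[float, int]:
    """Return Prometheus-style cumulative bucket counts (hypothetical helper).

    Each key is a bucket's upper bound (the `le` label); its value is the number
    of observations less than or equal to that bound. The +Inf bucket always
    equals the total number of observations.
    """
    upper_bounds = sorted(bounds) + [math.inf]
    return {le: sum(1 for v in observations if v <= le) for le in upper_bounds}


# Bucket boundaries of hasura_event_webhook_processing_time_seconds (in seconds).
BOUNDS = [0.01, 0.03, 0.1, 0.3, 1, 3, 10]

# Hypothetical webhook processing times, in seconds.
samples = [0.005, 0.02, 0.02, 0.25, 2.5, 12.0]

for le, count in cumulative_buckets(samples, BOUNDS).items():
    label = "+Inf" if math.isinf(le) else le
    print(f'le="{label}" {count}')
```

The counts never decrease as `le` grows. For example, the `0.1` bucket still includes the three observations already counted in the `0.03` bucket, and the `12.0` observation shows up only in `+Inf`.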
## Metrics exported
The following metrics are exported by Hasura GraphQL Engine:
@@ -32,6 +45,7 @@ buckets, you should consider [tuning the performance](/deployment/performance-tu
| Name | `hasura_graphql_execution_time_seconds` |
| Type | Histogram
Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
| Labels | `operation_type`: query \| mutation |
+| Unit | seconds |
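Because the unit is seconds, an average execution time can be derived from the `_sum` and `_count` series that every Prometheus histogram exposes. Divide the increase of `hasura_graphql_execution_time_seconds_sum` by the increase of `hasura_graphql_execution_time_seconds_count`. The Python sketch below does that arithmetic on two hypothetical scrapes and converts the result to milliseconds. `HistogramScrape` and `mean_seconds` are illustrative names, the scrape values are invented, and counter resets are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class HistogramScrape:
    """One scrape of a histogram's _sum (seconds) and _count series (hypothetical type)."""
    sum_seconds: float
    count: int


def mean_seconds(earlier: HistogramScrape, later: HistogramScrape) -> float:
    """Average observed value between two scrapes, in seconds.

    Assumes no counter reset happened between the scrapes.
    """
    delta_count = later.count - earlier.count
    if delta_count <= 0:
        raise ValueError("no new observations between the scrapes")
    return (later.sum_seconds - earlier.sum_seconds) / delta_count


# Hypothetical values for hasura_graphql_execution_time_seconds_sum / _count.
t0 = HistogramScrape(sum_seconds=120.0, count=4000)
t1 = HistogramScrape(sum_seconds=135.0, count=4500)

avg = mean_seconds(t0, t1)
print(f"average execution time: {avg:.3f} s ({avg * 1000:.1f} ms)")
```

In PromQL the same calculation is usually written as `rate(hasura_graphql_execution_time_seconds_sum[5m]) / rate(hasura_graphql_execution_time_seconds_count[5m])`, where the `5m` window is only an example.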
:::info GraphQL request execution time
@@ -71,6 +85,7 @@ of your database.
| Name | `hasura_event_fetch_time_per_batch_seconds` |
| Type | Histogram
Buckets: 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
| Labels | none |
+| Unit | seconds |
#### Event invocations total
@@ -101,6 +116,7 @@ Time taken for an event to be processed.
| Name | `hasura_event_processing_time_seconds` |
| Type | Histogram
Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `trigger_name`, `source_name` |
+| Unit | seconds |
The processing of an event involves the following steps:
@@ -148,6 +164,7 @@ server.
| Name | `hasura_event_queue_time_seconds` |
| Type | Histogram
Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `trigger_name`, `source_name` |
+| Unit | seconds |
#### Event Triggers HTTP Workers
@@ -172,6 +189,7 @@ A higher processing time indicates slow webhook, you should try to optimize the
| Name | `hasura_event_webhook_processing_time_seconds` |
| Type | Histogram
Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
| Labels | `trigger_name`, `source_name` |
+| Unit | seconds |
#### Events fetched per batch
@@ -283,6 +301,7 @@ some extra process time for other tasks the poller does during a single poll. In
| Name | `hasura_subscription_total_time_seconds` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `subscription_kind`: streaming \| live-query, `operation_name`, `parameterized_query_hash` |
+| Unit | seconds |
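Since the buckets are cumulative and measured in seconds, a latency percentile can only be estimated from them. The Python sketch below follows the general approach of PromQL's `histogram_quantile()`: find the first bucket whose cumulative count reaches the target rank, then interpolate linearly between that bucket's lower and upper bounds. It is a simplified illustration, not the Prometheus implementation. It assumes non-negative observations, so the first bucket starts at 0, and it skips the edge-case handling Prometheus performs. `estimate_quantile` is a hypothetical helper, and the bucket counts are invented.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    Observations are assumed non-negative, so the first bucket starts at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lies above the largest finite bound: report that bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            # Linear interpolation inside the bucket that contains the rank.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound  # Not reached when buckets end with (math.inf, total).


# Hypothetical cumulative counts for hasura_subscription_total_time_seconds_bucket.
BUCKETS = [
    (0.000001, 0), (0.0001, 0), (0.01, 40), (0.1, 70), (0.3, 90),
    (1, 96), (3, 99), (10, 100), (30, 100), (100, 100), (math.inf, 100),
]

for q in (0.5, 0.95, 0.99):
    print(f"p{round(q * 100)}: {estimate_quantile(q, BUCKETS):.4f} s")
```

With these counts, the p50 estimate of about 0.04 s assumes the observations are spread evenly inside the 0.01 to 0.1 second bucket. The true median could be anywhere in that range, which is why the bucket boundaries limit how precise such estimates can be.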
#### Subscription Database Execution Time
@@ -302,6 +321,7 @@ consider investigating the subscription query and see if indexes can help improv
| Name | `hasura_subscription_db_execution_time_seconds` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `subscription_kind`: streaming \| live-query, `operation_name`, `parameterized_query_hash` |
+| Unit | seconds |
#### WebSocket Egress
@@ -312,6 +332,7 @@ The total size of WebSocket messages sent in bytes.
| Name | `hasura_websocket_messages_sent_bytes_total` |
| Type | Counter |
| Labels | `operation_name`, `parameterized_query_hash` |
+| Unit | bytes |
#### WebSocket Ingress
@@ -322,6 +343,7 @@ The total size of WebSocket messages received in bytes.
| Name | `hasura_websocket_messages_received_bytes_total` |
| Type | Counter |
| Labels | none |
+| Unit | bytes |
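Because these byte counters only increase, or drop to zero on restart, throughput is usually derived as a per-second rate. The Python sketch below computes the average bytes per second over hypothetical `(timestamp, value)` samples of `hasura_websocket_messages_received_bytes_total`. It treats any decrease as a counter reset, which is how Prometheus's `rate()` and `increase()` interpret drops. Unlike those PromQL functions, it does not extrapolate to the edges of the time window. `counter_increase` and `counter_rate` are hypothetical helpers, and the sample values are invented.

```python
from typing import List, Tuple


def counter_increase(samples: List[Tuple[float, float]]) -> float:
    """Total increase of a counter across (timestamp, value) samples.

    A value lower than its predecessor is treated as a reset to zero,
    so the new value itself counts as the increase since the reset.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return counter_increase(samples) / elapsed


# Hypothetical scrapes of hasura_websocket_messages_received_bytes_total,
# taken every 15 seconds; the server restarts between t=30 and t=45.
scrapes = [(0.0, 10_000.0), (15.0, 16_000.0), (30.0, 22_000.0), (45.0, 3_000.0), (60.0, 9_000.0)]

print(f"{counter_rate(scrapes):.1f} bytes/s")
```

Without reset handling, the drop from 22,000 to 3,000 bytes would be read as a negative increase and would understate the throughput.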
#### Websocket Message Queue Time
@@ -332,6 +354,7 @@ The time for which a websocket message remains queued in the GraphQL engine's we
| Name | `hasura_websocket_message_queue_time` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | none |
+| Unit | seconds |
#### Websocket Message Write Time
@@ -342,6 +365,7 @@ The time taken to write a websocket message into the TCP send buffer.
| Name | `hasura_websocket_message_write_time` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | none |
+| Unit | seconds |
### Cache metrics
@@ -349,7 +373,7 @@ See more details on caching metrics [here](/caching/caching-metrics.mdx)
#### Hasura cache request count
-Tracks cache hit and miss requests, which helps in monitoring and optimizing cache utilization.
+Total number of cache hit and miss requests. This helps in monitoring and optimizing cache utilization.
| | |
| ------ | ---------------------------- |
@@ -448,6 +472,7 @@ The time taken to establish and initialize a PostgreSQL connection.
| Name | `hasura_postgres_connection_init_time` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `source_name`: name of the database
`conn_info`: connection url string (password omitted) or name of the connection url environment variable
`role`: primary \| replica |
+| Unit | seconds |
### Hasura Postgres Pool Wait Time
@@ -458,6 +483,7 @@ The time taken to acquire a connection from the pool.
| Name | `hasura_postgres_pool_wait_time` |
| Type | Histogram
Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
| Labels | `source_name`: name of the database
`conn_info`: connection url string (password omitted) or name of the connection url environment variable
`role`: primary \| replica |
+| Unit | seconds |
### Hasura source health
@@ -481,6 +507,7 @@ and `/v1/version` endpoints or any other undefined resource/endpoint (for exampl
| Name | `hasura_http_response_bytes_total` |
| Type | Counter |
| Labels | none |
+| Unit | bytes |
### HTTP Ingress
@@ -492,6 +519,7 @@ Total size of HTTP request bodies received via the HTTP server excluding request
| Name | `hasura_http_request_bytes_total` |
| Type | Counter |
| Labels | none |
+| Unit | bytes |
### OpenTelemetry OTLP Export Metrics