mirror of
https://github.com/hasura/graphql-engine.git
synced 2024-12-14 17:02:49 +03:00
docs: refactor metrics table for EE
PR-URL: https://github.com/hasura/graphql-engine-mono/pull/9013 GitOrigin-RevId: f867dc5daa22a9a76357280f5079eae5942963eb
parent
c1705a09df
commit
f9c55a2a04
@ -34,205 +34,234 @@ HASURA_GRAPHQL_METRICS_SECRET=<secret>
|
||||
curl 'http://127.0.0.1:8080/v1/metrics' -H 'Authorization: Bearer <secret>'
|
||||
```
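The same request can be made from a script. The following is a minimal Python sketch, assuming the endpoint path and `Authorization: Bearer` header shown in the `curl` command above; the helper names `build_metrics_request` and `fetch_metrics` are hypothetical and not part of Hasura.

```python
import urllib.request


def build_metrics_request(base_url: str, secret: str) -> urllib.request.Request:
    """Build a GET request for /v1/metrics with the bearer secret attached."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/metrics",
        headers={"Authorization": f"Bearer {secret}"},
    )


def fetch_metrics(base_url: str, secret: str, timeout: float = 5.0) -> str:
    """Return the raw metrics text; raises urllib.error.URLError on failure."""
    request = build_metrics_request(base_url, secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (needs a running server, so it is left commented out):
# print(fetch_metrics("http://127.0.0.1:8080", "<secret>"))
```

Keeping the request construction separate from the network call makes it possible to check the URL and header without a running server.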
|
||||
|
||||
:::note Note
|
||||
:::info Configure a secret
|
||||
|
||||
- The metrics endpoint should be configured with a secret to prevent misuse and should not be exposed over the internet.
|
||||
The metrics endpoint should be configured with a secret to prevent misuse and should not be exposed over the internet.
|
||||
|
||||
:::
|
||||
|
||||
## Metrics exported
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<td>Name</td>
|
||||
<td>Description</td>
|
||||
<td>Type</td>
|
||||
<td>Labels</td>
|
||||
<td>Comment</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_http_connections</code></td>
|
||||
<td>Current number of active HTTP connections (excluding WebSocket connections)</td>
|
||||
<td>Gauge</td>
|
||||
<td>none</td>
|
||||
<td>Represents the HTTP load on the server</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_websocket_connections</code></td>
|
||||
<td>Current number of active WebSocket connections</td>
|
||||
<td>Gauge</td>
|
||||
<td>none</td>
|
||||
<td>Represents the websocket load on the server.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_active_subscriptions</code></td>
|
||||
<td>Current number of active subscriptions</td>
|
||||
<td>Gauge</td>
|
||||
<td>none</td>
|
||||
<td>Represents the subscription load on the server.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_graphql_requests_total</code></td>
|
||||
<td>Number of GraphQL requests received </td>
|
||||
<td>Counter</td>
|
||||
<td>• "operation_type": query|mutation|subscription|unknown <br/>
|
||||
• The "unknown" operation type will be returned for queries that fail authorization, parsing, or certain
|
||||
validations<br/>
|
||||
• "response_status": success|failed
|
||||
</td>
|
||||
<td>Represents the graphql query/mutation traffic on the server.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_graphql_execution_time_seconds</code></td>
|
||||
<td>Execution time of successful GraphQL requests (excluding subscriptions)</td>
|
||||
<td>Histogram<br/><br/>Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10</td>
|
||||
<td>• "operation_type": query|mutation</td>
|
||||
<td>If more requests are falling in the higher buckets, you should consider <a href="/latest/deployment/performance-tuning">tuning the performance</a>.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_queue_time_seconds</code></td>
|
||||
<td>Queue time for an event already in the processing queue</td>
|
||||
<td>Histogram<br/><br/>Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100</td>
|
||||
<td>none</td>
|
||||
<td>More events in the higher buckets imply slow processing; consider increasing the <a href="/latest/deployment/graphql-engine-flags/reference/#events-http-pool-size">HTTP pool size</a> or optimizing the webhook server.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_fetch_time_per_batch_seconds</code></td>
|
||||
<td>Latency of fetching a batch of events</td>
|
||||
<td>Histogram<br/><br/>Buckets: 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10</td>
|
||||
<td>none</td>
|
||||
<td>A higher value indicates slower polling of events from the database; consider looking into the performance of your database.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_webhook_processing_time_seconds</code></td>
|
||||
<td>The time between when an HTTP worker picks an event for delivery to the time its response is updated in the DB</td>
|
||||
<td>Histogram<br/><br/>Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10</td>
|
||||
<td>none</td>
|
||||
<td>A higher processing time indicates a slow webhook; you should try to optimize the event webhook.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_processing_time_seconds</code></td>
|
||||
<td>The time taken for an event to be delivered since it's been created (if first attempt) or retried (after first attempt).</td>
|
||||
<td>Histogram<br/><br/>Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100</td>
|
||||
<td>none</td>
|
||||
<td>This metric can be considered as the end-to-end processing time for an event.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_trigger_http_workers</code></td>
|
||||
<td>Current number of active Event Trigger HTTP workers</td>
|
||||
<td>Gauge</td>
|
||||
<td>none</td>
|
||||
<td>Compare this number to the <a href="/latest/deployment/graphql-engine-flags/reference/#events-http-pool-size">HTTP pool size</a>. Consider increasing it if the metric is near the current configured value.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_processed_total</code></td>
|
||||
<td>Total number of events processed</td>
|
||||
<td>Counter</td>
|
||||
<td>• "status": success|failed</td>
|
||||
<td>Represents the Event Trigger egress.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_event_invocations_total</code></td>
|
||||
<td>Total number of events invoked</td>
|
||||
<td>Counter</td>
|
||||
<td>• "status": success|failed</td>
|
||||
<td>Represents the Event Trigger webhook HTTP requests made.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_postgres_connections</code></td>
|
||||
<td>Current number of active PostgreSQL connections</td>
|
||||
<td>Gauge</td>
|
||||
<td>• "source_name": name of the database<br/>
|
||||
• "conn_info": connection url string (password omitted) or name of the connection url environment variable<br/>
|
||||
• "role": primary|replica
|
||||
</td>
|
||||
<td>Compare this to <a href="/latest/api-reference/syntax-defs/#pgpoolsettings">pool settings</a>.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_cron_events_invocation_total</code></td>
|
||||
<td>Total number of cron events invoked</td>
|
||||
<td>Counter</td>
|
||||
<td>• "status": success|failed<br /></td>
|
||||
<td>Total number of invocations made for cron events.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_cron_events_processed_total</code></td>
|
||||
<td>Total number of cron events processed</td>
|
||||
<td>Counter</td>
|
||||
<td>• "status": success|failed<br /></td>
|
||||
<td>
|
||||
Compare this to <code>hasura_cron_events_invocation_total</code>. A high difference between the two metrics
|
||||
indicates high failure rate of the cron webhook.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>hasura_oneoff_events_invocation_total</code></td>
|
||||
<td>Total number of one-off events invoked</td>
|
||||
<td>Counter</td>
|
||||
<td>• "status": success|failed<br /></td>
|
||||
<td>Total number of invocations made for one-off events.</td>
|
||||
</tr>
|
||||
The following metrics are exported by Hasura GraphQL Engine:
|
||||
|
||||
<tr>
|
||||
<td>
|
||||
<code>hasura_oneoff_events_processed_total</code>
|
||||
</td>
|
||||
<td>Total number of one-off events processed</td>
|
||||
<td>Counter</td>
|
||||
<td>
|
||||
• "status": success|failed
|
||||
<br />
|
||||
</td>
|
||||
<td>
|
||||
Compare this to <code>hasura_oneoff_events_invocation_total</code>. A high difference between the two metrics
|
||||
indicates high failure rate of the one-off webhook.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
<code>hasura_active_subscription_pollers</code>
|
||||
</td>
|
||||
<td>Current number of active subscription pollers. A subscription poller <a href="https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#idea-3-batch-multiple-live-queries-into-one-sql-query">multiplexes </a> similar subscriptions together.
|
||||
</td>
|
||||
<td>Gauge</td>
|
||||
<td>
|
||||
• "subscription_kind": streaming|live-query
|
||||
<br />
|
||||
</td>
|
||||
<td>
|
||||
The value of this metric is supposed to be proportional to the number of uniquely parameterised subscriptions i.e. subscriptions with the same selection set
|
||||
but with different input arguments and session variables are multiplexed on the same poller.
|
||||
If this metric is high then it may be an indication that there are too many uniquely parameterised subscriptions
|
||||
which could be optimized for better performance.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
<code>hasura_active_subscription_pollers_in_error_state</code>
|
||||
</td>
|
||||
<td>Current number of active subscription pollers that are in the error state.
|
||||
A subscription poller <a href="https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#idea-3-batch-multiple-live-queries-into-one-sql-query">multiplexes </a>
|
||||
similar subscriptions together.
|
||||
</td>
|
||||
<td>Gauge</td>
|
||||
<td>
|
||||
• "subscription_kind": streaming|live-query
|
||||
<br />
|
||||
</td>
|
||||
<td>
|
||||
A non-zero value of this metric indicates that there are runtime errors in at least one of the subscription pollers that are running
|
||||
in Hasura. In most of the cases, runtime errors in subscriptions are caused due to the changes at the data model layer and fixing the
|
||||
issue at the data model layer should automatically fix the runtime errors.
|
||||
</td>
|
||||
</tr>
|
||||
### Hasura active subscription pollers
|
||||
|
||||
Current number of active subscription pollers. A subscription poller
|
||||
[multiplexes](https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#idea-3-batch-multiple-live-queries-into-one-sql-query)
|
||||
similar subscriptions together. The value of this metric should be proportional to the number of uniquely parameterized
|
||||
subscriptions (i.e., subscriptions with the same selection set, but with different input arguments and session variables
|
||||
are multiplexed on the same poller). If this metric is high then it may be an indication that there are too many
|
||||
uniquely parameterized subscriptions which could be optimized for better performance.
|
||||
|
||||
| | |
|
||||
| ------ | -------------------------------------------- |
|
||||
| Name | `hasura_active_subscription_pollers` |
|
||||
| Type | Gauge |
|
||||
| Labels | `subscription_kind`: streaming \| live-query |
|
||||
|
||||
</table>
|
||||
### Hasura active subscription pollers in error state
|
||||
|
||||
:::note Note
|
||||
Current number of active subscription pollers that are in the error state. A subscription poller
|
||||
[multiplexes](https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#idea-3-batch-multiple-live-queries-into-one-sql-query)
|
||||
similar subscriptions together. A non-zero value of this metric indicates that there are runtime errors in at least one
|
||||
of the subscription pollers that are running in Hasura. In most cases, runtime errors in subscriptions are caused
|
||||
by changes at the data model layer, and fixing the issue at the data model layer should automatically fix the
|
||||
runtime errors.
|
||||
|
||||
The GraphQL request execution time:
|
||||
| | |
|
||||
| ------ | --------------------------------------------------- |
|
||||
| Name | `hasura_active_subscription_pollers_in_error_state` |
|
||||
| Type | Gauge |
|
||||
| Labels | `subscription_kind`: streaming \| live-query |
|
||||
|
||||
### Hasura active subscriptions
|
||||
|
||||
Current number of active subscriptions, representing the subscription load on the server.
|
||||
|
||||
| | |
|
||||
| ------ | ----------------------------- |
|
||||
| Name | `hasura_active_subscriptions` |
|
||||
| Type | Gauge |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura cron events invocation total
|
||||
|
||||
Total number of cron events invoked, representing the number of invocations made for cron events.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------------- |
|
||||
| Name | `hasura_cron_events_invocation_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
|
||||
|
||||
### Hasura cron events processed total
|
||||
|
||||
Total number of cron events processed, representing the number of invocations made for cron events. Compare this to
|
||||
`hasura_cron_events_invocation_total`. A large difference between the two metrics indicates a high failure rate of the cron
|
||||
webhook.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------------ |
|
||||
| Name | `hasura_cron_events_processed_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
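
The comparison described above can be sketched in Python. This is a minimal illustration that assumes the counters appear in the Prometheus text format with a single `status` label; the sample values and the helper names `totals_by_metric` and `invocation_gap` are hypothetical.

```python
import re

# Hypothetical scrape excerpt; real values come from the /v1/metrics endpoint.
SAMPLE = """\
hasura_cron_events_invocation_total{status="success"} 180
hasura_cron_events_invocation_total{status="failed"} 20
hasura_cron_events_processed_total{status="success"} 95
hasura_cron_events_processed_total{status="failed"} 5
"""

# Matches lines of the form: name{status="value"} number
LINE = re.compile(r'^(\w+)\{status="(\w+)"\}\s+([0-9.eE+-]+)$')


def totals_by_metric(text: str) -> dict:
    """Sum each counter across its status label values."""
    totals = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, _status, value = match.groups()
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def invocation_gap(totals: dict) -> float:
    """Difference between invocations made and events processed."""
    return (totals["hasura_cron_events_invocation_total"]
            - totals["hasura_cron_events_processed_total"])


print(invocation_gap(totals_by_metric(SAMPLE)))
```

What counts as a "high" gap depends on the workload, so any alerting threshold built on this value is a judgment call.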
|
||||
|
||||
### Hasura event fetch time per batch
|
||||
|
||||
Latency of fetching a batch of events. A higher value indicates slower polling of events from the database; you should
|
||||
consider looking into the performance of your database.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------------------------------------------------------------------ |
|
||||
| Name | `hasura_event_fetch_time_per_batch_seconds` |
|
||||
| Type | Histogram<br /><br />Buckets: 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura event invocations total
|
||||
|
||||
Total number of events invoked. Represents the Event Trigger webhook HTTP requests made.
|
||||
|
||||
| | |
|
||||
| ------ | -------------------------------- |
|
||||
| Name | `hasura_event_invocations_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
|
||||
|
||||
### Hasura event processed total
|
||||
|
||||
Total number of events processed. Represents the Event Trigger egress.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------ |
|
||||
| Name | `hasura_event_processed_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
|
||||
|
||||
### Hasura event processing time
|
||||
|
||||
The time taken for an event to be delivered since it's been created (if first attempt) or retried (after first attempt).
|
||||
This metric can be considered as the end-to-end processing time for an event.
|
||||
|
||||
| | |
|
||||
| ------ | --------------------------------------------------------------------- |
|
||||
| Name | `hasura_event_processing_time_seconds` |
|
||||
| Type | Histogram<br /><br />Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100 |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura event queue time
|
||||
|
||||
Queue time for an event already in the processing queue. More events in the higher buckets imply slow processing. In this
|
||||
case, you can consider increasing the
|
||||
[HTTP pool size](/deployment/graphql-engine-flags/reference.mdx/#events-http-pool-size) or optimizing the webhook
|
||||
server.
|
||||
|
||||
| | |
|
||||
| ------ | --------------------------------------------------------------------- |
|
||||
| Name | `hasura_event_queue_time_seconds` |
|
||||
| Type | Histogram<br /><br />Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100 |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura event trigger HTTP workers
|
||||
|
||||
Current number of active Event Trigger HTTP workers. Compare this number to the
|
||||
[HTTP pool size](/deployment/graphql-engine-flags/reference.mdx/#events-http-pool-size). Consider increasing it if the
|
||||
metric is near the current configured value.
|
||||
|
||||
| | |
|
||||
| ------ | ----------------------------------- |
|
||||
| Name | `hasura_event_trigger_http_workers` |
|
||||
| Type | Gauge |
|
||||
| Labels | none |
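
As a rough illustration of the comparison above, the following Python sketch checks how close the worker gauge is to the configured pool size. The 0.9 threshold and the function names are assumptions chosen for the example, not Hasura recommendations.

```python
def pool_utilization(active_workers: float, pool_size: int) -> float:
    """Share of the configured HTTP pool currently in use."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return active_workers / pool_size


def near_capacity(active_workers: float, pool_size: int,
                  threshold: float = 0.9) -> bool:
    """True when the gauge is at or above the chosen share of the pool size."""
    return pool_utilization(active_workers, pool_size) >= threshold


# active_workers would be the current value of hasura_event_trigger_http_workers;
# pool_size is whatever the events HTTP pool size is configured to.
print(near_capacity(95, 100))
```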
|
||||
|
||||
### Hasura event webhook processing time
|
||||
|
||||
The time between when an HTTP worker picks an event for delivery to the time its response is updated in the DB. A higher
|
||||
processing time indicates a slow webhook; you should try to optimize the event webhook.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------------------------------------ |
|
||||
| Name | `hasura_event_webhook_processing_time_seconds` |
|
||||
| Type | Histogram<br /><br />Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura GraphQL execution time seconds
|
||||
|
||||
Execution time of successful GraphQL requests (excluding subscriptions). If more requests are falling in the higher
|
||||
buckets, you should consider [tuning the performance](/deployment/performance-tuning.mdx).
|
||||
|
||||
| | |
|
||||
| ------ | -------------------------------------------------------------- |
|
||||
| Name | `hasura_graphql_execution_time_seconds` |
|
||||
| Type | Histogram<br /><br />Buckets: 0.01, 0.03, 0.1, 0.3, 1, 3, 10 |
|
||||
| Labels | `operation_type`: query \| mutation                            |
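
To judge whether requests are "falling in the higher buckets", one option is to compute the share of requests slower than a chosen bucket bound. The sketch below assumes the standard Prometheus histogram layout, where each `le` bucket holds a cumulative count and a final `+Inf` bucket holds the total. The bucket bounds match those listed above; the counts are hypothetical.

```python
INF = float("inf")

# Hypothetical cumulative counts keyed by bucket upper bound (the "le" label).
CUMULATIVE_BUCKETS = {
    0.01: 120, 0.03: 340, 0.1: 700, 0.3: 900,
    1: 960, 3: 990, 10: 999, INF: 1000,
}


def fraction_slower_than(buckets: dict, bound: float) -> float:
    """Share of observations above `bound`, which must be a bucket bound."""
    total = buckets[INF]
    if total == 0:
        return 0.0
    return 1.0 - buckets[bound] / total


# Share of requests that took longer than 1 second.
print(round(fraction_slower_than(CUMULATIVE_BUCKETS, 1), 4))
```

Because bucket counts are cumulative, only bounds that exist as buckets can be queried exactly; anything in between would need interpolation.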
|
||||
|
||||
### Hasura GraphQL requests total
|
||||
|
||||
Number of GraphQL requests received, representing the GraphQL query/mutation traffic on the server.
|
||||
|
||||
| | |
|
||||
| ------ | -------------------------------------------------------------- |
|
||||
| Name | `hasura_graphql_requests_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `operation_type`: query \| mutation \| subscription \| unknown<br />`response_status`: success \| failed |
|
||||
|
||||
The `unknown` operation type will be returned for queries that fail authorization, parsing, or certain validations. The
|
||||
`response_status` label will be `success` for successful requests and `failed` for failed requests.
|
||||
|
||||
### Hasura HTTP connections
|
||||
|
||||
Current number of active HTTP connections (excluding WebSocket connections), representing the HTTP load on the server.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------- |
|
||||
| Name | `hasura_http_connections` |
|
||||
| Type | Gauge |
|
||||
| Labels | none |
|
||||
|
||||
### Hasura one-off events invocation total
|
||||
|
||||
Total number of one-off events invoked, representing the number of invocations made for one-off events.
|
||||
|
||||
| | |
|
||||
| ------ | --------------------------------------- |
|
||||
| Name | `hasura_oneoff_events_invocation_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
|
||||
|
||||
### Hasura one-off events processed total
|
||||
|
||||
Total number of one-off events processed, representing the number of invocations made for one-off events. Compare this
|
||||
to `hasura_oneoff_events_invocation_total`. A large difference between the two metrics indicates a high failure rate of the
|
||||
one-off webhook.
|
||||
|
||||
| | |
|
||||
| ------ | -------------------------------------- |
|
||||
| Name | `hasura_oneoff_events_processed_total` |
|
||||
| Type | Counter |
|
||||
| Labels | `status`: success \| failed |
|
||||
|
||||
### Hasura postgres connections
|
||||
|
||||
Current number of active PostgreSQL connections. Compare this to
|
||||
[pool settings](/api-reference/syntax-defs.mdx/#pgpoolsettings).
|
||||
|
||||
| | |
|
||||
| ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Name | `hasura_postgres_connections` |
|
||||
| Type | Gauge |
|
||||
| Labels | `source_name`: name of the database<br />`conn_info`: connection URL string (password omitted) or name of the connection URL environment variable<br />`role`: primary \| replica |
|
||||
|
||||
### Hasura WebSocket connections
|
||||
|
||||
Current number of active WebSocket connections, representing the WebSocket load on the server.
|
||||
|
||||
| | |
|
||||
| ------ | ------------------------------ |
|
||||
| Name | `hasura_websocket_connections` |
|
||||
| Type | Gauge |
|
||||
| Labels | none |
|
||||
|
||||
:::info GraphQL request execution time
|
||||
|
||||
- Uses wall-clock time, so it includes time spent waiting on I/O.
|
||||
- Includes authorization, parsing, validation, planning, and execution (calls to databases, Remote Schemas).
|
||||
|