change traffic golden signal in event trigger observability

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/9350
Co-authored-by: Sean Park-Ross <94021366+seanparkross@users.noreply.github.com>
GitOrigin-RevId: d1f0b7684b5d758b7e3e3beb70dfc387cdcde792
2024-12-15 01:12:56 +03:00 · 2023-06-28 14:22:02 +05:30 · 2023-06-28 14:22:02 +05:30 · 7ef547e1f4
commit 7ef547e1f4
parent 8e6ec8b60d
1 changed files with 12 additions and 13 deletions
--- a/docs/docs/event-triggers/observability-and-performance.mdx
+++ b/docs/docs/event-triggers/observability-and-performance.mdx
@@ -73,10 +73,10 @@ Triggers system performance.

 ### Latency

-Latency for the Hasura Event Triggers system references the total time taken by the graphql engine in delivering the
-events. To monitor the latency, you can use the [`hasura_event_processing_time_seconds`](#event-processing-time) metric.
+Latency for the Event Triggers system is the time taken by Hasura GraphQL Engine to deliver events. To monitor this
+latency, you can use the [`hasura_event_processing_time_seconds`](#event-processing-time) metric.

-If the value of this metric is high, it maybe an indication that events are taking longer time to be processed and
+If the value of this metric is high, it may be an indication that events are taking a longer time to be processed and
 delivered.

 The following are few things you can do to analyze and diagnose the latency issue:
@@ -118,18 +118,17 @@ To monitor saturation, you can use the following:

 ### Traffic

-Traffic for Event Triggers means the number of new events created at a given point of time. Since it's complicated to
-figure out the number of events created, you can use the number of Event Triggers processed as a proxy for traffic.
+Traffic for Event Triggers is the number of new events created in a given time frame (like 1000 events per minute).
+Events can be created even if mutations don't go through Hasura i.e. using some other client. Hence, Hasura doesn't
+give the number of events as metrics, but you can find this out by using metadata APIs like
+[pg_get_event_logs](/latest/api-reference/metadata-api/event-triggers/#metadata-pg-get-event-logs). "Proxy"
+metrics for traffic are the number of mutations, number of events processed and number of events fetched per batch.

-To monitor traffic, you can use the [`hasura_event_processed_total`](#event-processed-total) metric.
+To monitor traffic, you can use the [`hasura_event_processed_total`](#event-processed-total) and the
+[`hasura_events_fetched_per_batch`](#events-fetched-per-batch) metrics.

-If the value of this metric is high (and above your established baseline), and the Hasura Event Triggers system is also
-saturated (`hasura_event_trigger_http_workers` nearing the configured HTTP worker pool size and
-`hasura_event_queue_time_seconds` is also high), then you may want to consider doing the following:
-
-1.  Increasing the number of HTTP workers by increasing the
-    [Events HTTP Pool Size](/deployment/graphql-engine-flags/reference.mdx/#events-http-pool-size)
-2.  [Scaling](/latest/faq/index/#faq-scaling) your Hasura instance horizontally to handle more events.
+If the value of `hasura_events_fetched_per_batch` is close to the configured max batch size, then it hints that there
+may be some pending events in the database yet to be fetched and processed.

 ### Errors