docs: add guide on performance tuning (fix #4805)

GITHUB_PR_NUMBER: 9052 GITHUB_PR_URL: https://github.com/hasura/graphql-engine/pull/9052 PR-URL: https://github.com/hasura/graphql-engine-mono/pull/6163 Co-authored-by: Aniketh Varma <55805574+aniketh-varma@users.noreply.github.com> Co-authored-by: Rob Dominguez <24390149+robertjdominguez@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> GitOrigin-RevId: 0e9fac8795211e211aa9909d4f18dcfdc261f1b9
2024-09-19 14:37:32 +03:00 · 2022-11-07 22:57:10 +05:30 · 2022-11-07 22:57:10 +05:30 · 78c36392eb
commit 78c36392eb
parent 0961e4fd74
5 changed files with 208 additions and 0 deletions
--- a/docs/docs/deployment/performance-tuning.mdx
+++ b/docs/docs/deployment/performance-tuning.mdx
@ -0,0 +1,208 @@
+---
+description: Tune performance of the Hasura GraphQL Engine
+keywords:
+  - hasura
+  - docs
+  - deployment
+  - performance
+  - tuning
+sidebar_position: 120
+sidebar_label: Performance tuning
+---
+
+# Tune Hasura Performance
+
+This page serves as a reference for suggestions on fine tuning the performance of your Hasura GraphQL Engine instance.
+We cover database configurations, scaling, observability, and software architecture to help you get the most out of your
+set up.
+
+## Configuration and deployment
+
+### Postgres configuration
+
+PostgreSQL's basic configuration is tuned for wide compatibility rather than performance. The default parameters are
+very undersized for your system. You can read more details about the configuration settings at the
+[PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Main_Page). There are also config generator tools that can help us:
+
+- [Tuning Your PostgreSQL Server](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
+- [Config generator](https://pgtune.leopard.in.ua)
+
+### Hasura configuration
+
+With respect to Hasura, several environment variables can be configured to tune performance.
+
+#### `HASURA_GRAPHQL_PG_CONNECTIONS`
+
+- Minimum **2** connections
+- Default value: **50**
+- Max connections: = Max connections of Postgres - 5 (keep free connections for another services, e.g PGAdmin, metrics
+  tools)
+
+However, how many connections is the best setting? Of course, it depends on the Postgres server's hardware
+specifications. Moreover, too many connections doesn't mean query performance will be the highest. There are many great
+articles that analyze this deeper:
+
+- [https://brandur.org/postgres-connections](https://brandur.org/postgres-connections)
+- [https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing](https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing)
+
+There isn't a silver bullet for all server specs. Your developer has to test and benchmark carefully for the final
+result. However, as a starting point, you can estimate with this formula, and then test around this value:
+
+```
+connections = ((core_count * 2) + effective_spindle_count)
+```
+
+For example, if your server has a 4 Core i7 CPU and 1 hard disk, it should have a connection pool of:
+
+9 = ((4 \* 2) +1).
+
+:::tip Tip
+
+Call it 10, as a nice round number.
+
+:::
+
+For high-transaction applications, a horizontal scale with multiple GraphQL Engine clusters is the recommended best
+practice. However, you should be aware of the total connections of all nodes. The number must be lower than the maximum
+connections of the Postgres instance.
+
+#### `HASURA_GRAPHQL_CONNECTIONS_PER_READ_REPLICA`
+
+With Read replicas, Hasura can load balance multiple databases. However, you will need to balance connections between
+database nodes too:
+
+- Master connections (`HASURA_GRAPHQL_PG_CONNECTIONS`) are now used for writing only. You can decrease max connections
+  lower if Hasura doesn't write much, or share connections with other Hasura nodes.
+- Currently, read-replica connections use one setting for all databases. It can't flexibly configure specific values for
+  each node. Therefore, you need to be aware of the total number of connections when scaling Hasura to multiple nodes.
+
+#### `HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL`
+
+Default: **1000** (ms)
+
+In brief, live query subscribers are grouped with the same query and variables. The GraphQL Engine needs to execute one
+query and return the same results to clients once every refetch interval.
+
+The smaller the interval is, the faster the update clients receive. However, everything has a cost. Small intervals with
+a large number of subscriptions need larger CPU and memory resources. If you don't need such high frequency, the
+interval can be set a bit longer. In contrast, with a small to medium number of subscriptions, the default value will
+suffice.
+
+#### `HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE`
+
+Default: **100**
+
+Imagine there are 1,000,000 subscribers to the same query. Emitting to millions of websockets in sequence can cause
+delays and eat more memory in a long queue. However, a small batch size can increase the number of SQL transactions.
+This value needs to be kept in balance. If you don't have an idea to determine what value to use, use the default.
+
+### Scalability
+
+The Hasura GraphQL Engine binary is containerized by default, so it is easy to scale horizontally. You need to estimate
+concurrent requests/second, benchmark how many requests one Hasura node can load, then scale multiple nodes with a
+simple calculation:
+
+```
+total_nodes = required_ccu / requests_per_node + backup_node
+```
+
+:::info Note
+
+`backup_node` is `0` or `1`, depending on your plan.
+
+:::
+
+However, you need to be aware of the total Postgres connections, too. The default `HASURA_GRAPHQL_PG_CONNECTIONS` value
+is `50`, meanwhile the default Postgres `max_connections` configuration is `100`. The Postgres server will easily be out
+of connections with `3` Hasura nodes, or `2` nodes with events/action services that connect directly to the database.
+
+```
+pg_max_connections >= hasura_nodes * hasura_pg_connections + event_nodes * event_pg_connections
+```
+
+### Observability
+
+Observability tools help us track issues, alert us to errors, and allow us to monitor performance and hardware usage. It
+is critical in production. There are many open-source and commercial services. However, you may have to combine many
+tools because of the architectural complexity. For more information, check out our
+[observability section](/observability/index.mdx) and our
+[observability best practices](/guides/best-practices-for-production/observability.mdx).
+
+## Software architecture and best practices
+
+### Hasura as a data service
+
+#### Connection Pooler
+
+Database connection management isn't an easy task, especially when scaling to multiple application nodes. There are
+common issues such as connection leaking, and maximum connections exceeded. If you use serverless applications for
+Actions/event triggers that connect directly to a database, connection leaking is unavoidable because every invocation
+may result in a new connection to the database. This is especially true when the number of services has grown into the
+hundreds.
+
+You can use many solutions such as [PgBouncer](https://www.pgbouncer.org/), vertical scaling, and increasing
+`max_connections` on Postgres configurations. However, increasing too many connections can have unintended consequences
+for your server and degrade performance.
+
+Therefore, you can reduce connection usage by querying data from the GraphQL Engine instead. Connection pooling will be
+centralized in Hasura nodes.
+
+![hasura connection pooling](/img/deployment/hasura-connection-poller.png)
+
+You can read more in the
+[Hasura Blog](https://hasura.io/blog/level-up-your-serverless-game-with-a-graphql-data-as-a-service-layer/).
+
+### Understand your data
+
+Hasura's query performance relies on database performance. When there is any performance issue, you need profiling to
+identify the bottleneck point and optimize your database queries. Utilizing the power of Postgres can help boost
+application speed.
+
+Fundamental knowledge you should know and practice:
+
+- Index your table.
+- Optimize queries with view, materialized view, and functions.
+- Use trigger to update data instead of using 2 or more request calls.
+- Normalize data structure.
+- [EXPLAIN, ANALYZE](https://www.postgresql.org/docs/current/sql-explain.html).
+
+### Query tips
+
+- Avoid too many `_or` conditions. This doesn't utilize the table index.
+- `like`, `ilike` is expensive, especially on long text. Use
+  [Full-text search](https://hasura.io/blog/full-text-search-with-hasura-graphql-api-postgres/) instead.
+- Pagination is necessary.
+- Fetching batched multiple queries in one request is useful. However, you shouldn't overuse it. One common case is
+  using aggregate count with data in the same pagination query. It is okay for small-to-medium tables. However, when the
+  table size is large, the query will be slow because of the scanning the entire table.
+- Avoid joining too many tables.
+
+### Microservices
+
+Hardware has its limits. It's expensive to scale servers as well as optimize data. Moreover, Postgres doesn't support
+master-master replica, so it will be a bottleneck if you store all the data in one database. Therefore, you can divide
+your business logic into multiple smaller services, or microservices.
+
+Hasura encourages microservices with [Remote Schemas](/remote-schemas/index.mdx). This can act as an API Gateway that
+routes to multiple, smaller GraphQL servers.
+
+![hasura load-balancer](/img/deployment/hasura-microservices.png)
+
+For example, in an e-commerce application, you can design three Hasura services:
+
+- User + Authentication
+- Product management
+- Order + Transactions
+
+Depending on a project's scope, the design philosophy is flexible. On a small project, one database will suffice.
+
+### Postgres ecosystem
+
+Thanks to the open-source community, Postgres has many extensions for various types of applications:
+
+- Time-series data, metrics, IoT: [TimescaleDB](https://www.timescale.com/), [CitusDB](https://www.citusdata.com/).
+- Spatial and geographic objects: [PostGIS](https://postgis.net/).
+- Image processing: [PostPic](https://github.com/drotiro/postpic).
+
+With Remote Schemas, you can use Hasura with multiple databases for various use cases. For example, Postgres for user
+data, TimescaleDB for transaction logs, and Postgis for Geographic services.
--- a/docs/static/img/deployment/hasura-connection-poller.png
+++ b/docs/static/img/deployment/hasura-connection-poller.png
--- a/docs/static/img/deployment/hasura-load-balancer.png
+++ b/docs/static/img/deployment/hasura-load-balancer.png
--- a/docs/static/img/deployment/hasura-microservices.png
+++ b/docs/static/img/deployment/hasura-microservices.png
--- a/docs/static/img/deployment/hasura-usage.png
+++ b/docs/static/img/deployment/hasura-usage.png