docs: add guide on performance tuning (fix #4805)

GITHUB_PR_NUMBER: 9052
GITHUB_PR_URL: https://github.com/hasura/graphql-engine/pull/9052

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/6163
Co-authored-by: Aniketh Varma <55805574+aniketh-varma@users.noreply.github.com>
Co-authored-by: Rob Dominguez <24390149+robertjdominguez@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
GitOrigin-RevId: 0e9fac8795211e211aa9909d4f18dcfdc261f1b9
This commit is contained in:
hasura-bot 2022-11-07 22:57:10 +05:30
parent 0961e4fd74
commit 78c36392eb
5 changed files with 208 additions and 0 deletions

View File

@ -0,0 +1,208 @@
---
description: Tune performance of the Hasura GraphQL Engine
keywords:
- hasura
- docs
- deployment
- performance
- tuning
sidebar_position: 120
sidebar_label: Performance tuning
---
# Tune Hasura Performance
This page serves as a reference for suggestions on fine tuning the performance of your Hasura GraphQL Engine instance.
We cover database configurations, scaling, observability, and software architecture to help you get the most out of your
set up.
## Configuration and deployment
### Postgres configuration
PostgreSQL's basic configuration is tuned for wide compatibility rather than performance. The default parameters are
very undersized for your system. You can read more details about the configuration settings at the
[PostgreSQL Wiki](https://wiki.postgresql.org/wiki/Main_Page). There are also config generator tools that can help us:
- [Tuning Your PostgreSQL Server](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
- [Config generator](https://pgtune.leopard.in.ua)
### Hasura configuration
With respect to Hasura, several environment variables can be configured to tune performance.
#### `HASURA_GRAPHQL_PG_CONNECTIONS`
- Minimum **2** connections
- Default value: **50**
- Max connections: = Max connections of Postgres - 5 (keep free connections for another services, e.g PGAdmin, metrics
tools)
However, how many connections is the best setting? Of course, it depends on the Postgres server's hardware
specifications. Moreover, too many connections doesn't mean query performance will be the highest. There are many great
articles that analyze this deeper:
- [https://brandur.org/postgres-connections](https://brandur.org/postgres-connections)
- [https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing](https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing)
There isn't a silver bullet for all server specs. Your developer has to test and benchmark carefully for the final
result. However, as a starting point, you can estimate with this formula, and then test around this value:
```
connections = ((core_count * 2) + effective_spindle_count)
```
For example, if your server has a 4 Core i7 CPU and 1 hard disk, it should have a connection pool of:
9 = ((4 \* 2) +1).
:::tip Tip
Call it 10, as a nice round number.
:::
For high-transaction applications, a horizontal scale with multiple GraphQL Engine clusters is the recommended best
practice. However, you should be aware of the total connections of all nodes. The number must be lower than the maximum
connections of the Postgres instance.
#### `HASURA_GRAPHQL_CONNECTIONS_PER_READ_REPLICA`
With Read replicas, Hasura can load balance multiple databases. However, you will need to balance connections between
database nodes too:
- Master connections (`HASURA_GRAPHQL_PG_CONNECTIONS`) are now used for writing only. You can decrease max connections
lower if Hasura doesn't write much, or share connections with other Hasura nodes.
- Currently, read-replica connections use one setting for all databases. It can't flexibly configure specific values for
each node. Therefore, you need to be aware of the total number of connections when scaling Hasura to multiple nodes.
#### `HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL`
Default: **1000** (ms)
In brief, live query subscribers are grouped with the same query and variables. The GraphQL Engine needs to execute one
query and return the same results to clients once every refetch interval.
The smaller the interval is, the faster the update clients receive. However, everything has a cost. Small intervals with
a large number of subscriptions need larger CPU and memory resources. If you don't need such high frequency, the
interval can be set a bit longer. In contrast, with a small to medium number of subscriptions, the default value will
suffice.
#### `HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE`
Default: **100**
Imagine there are 1,000,000 subscribers to the same query. Emitting to millions of websockets in sequence can cause
delays and eat more memory in a long queue. However, a small batch size can increase the number of SQL transactions.
This value needs to be kept in balance. If you don't have an idea to determine what value to use, use the default.
### Scalability
The Hasura GraphQL Engine binary is containerized by default, so it is easy to scale horizontally. You need to estimate
concurrent requests/second, benchmark how many requests one Hasura node can load, then scale multiple nodes with a
simple calculation:
```
total_nodes = required_ccu / requests_per_node + backup_node
```
:::info Note
`backup_node` is `0` or `1`, depending on your plan.
:::
However, you need to be aware of the total Postgres connections, too. The default `HASURA_GRAPHQL_PG_CONNECTIONS` value
is `50`, meanwhile the default Postgres `max_connections` configuration is `100`. The Postgres server will easily be out
of connections with `3` Hasura nodes, or `2` nodes with events/action services that connect directly to the database.
```
pg_max_connections >= hasura_nodes * hasura_pg_connections + event_nodes * event_pg_connections
```
### Observability
Observability tools help us track issues, alert us to errors, and allow us to monitor performance and hardware usage. It
is critical in production. There are many open-source and commercial services. However, you may have to combine many
tools because of the architectural complexity. For more information, check out our
[observability section](/observability/index.mdx) and our
[observability best practices](/guides/best-practices-for-production/observability.mdx).
## Software architecture and best practices
### Hasura as a data service
#### Connection Pooler
Database connection management isn't an easy task, especially when scaling to multiple application nodes. There are
common issues such as connection leaking, and maximum connections exceeded. If you use serverless applications for
Actions/event triggers that connect directly to a database, connection leaking is unavoidable because every invocation
may result in a new connection to the database. This is especially true when the number of services has grown into the
hundreds.
You can use many solutions such as [PgBouncer](https://www.pgbouncer.org/), vertical scaling, and increasing
`max_connections` on Postgres configurations. However, increasing too many connections can have unintended consequences
for your server and degrade performance.
Therefore, you can reduce connection usage by querying data from the GraphQL Engine instead. Connection pooling will be
centralized in Hasura nodes.
![hasura connection pooling](/img/deployment/hasura-connection-poller.png)
You can read more in the
[Hasura Blog](https://hasura.io/blog/level-up-your-serverless-game-with-a-graphql-data-as-a-service-layer/).
### Understand your data
Hasura's query performance relies on database performance. When there is any performance issue, you need profiling to
identify the bottleneck point and optimize your database queries. Utilizing the power of Postgres can help boost
application speed.
Fundamental knowledge you should know and practice:
- Index your table.
- Optimize queries with view, materialized view, and functions.
- Use trigger to update data instead of using 2 or more request calls.
- Normalize data structure.
- [EXPLAIN, ANALYZE](https://www.postgresql.org/docs/current/sql-explain.html).
### Query tips
- Avoid too many `_or` conditions. This doesn't utilize the table index.
- `like`, `ilike` is expensive, especially on long text. Use
[Full-text search](https://hasura.io/blog/full-text-search-with-hasura-graphql-api-postgres/) instead.
- Pagination is necessary.
- Fetching batched multiple queries in one request is useful. However, you shouldn't overuse it. One common case is
using aggregate count with data in the same pagination query. It is okay for small-to-medium tables. However, when the
table size is large, the query will be slow because of the scanning the entire table.
- Avoid joining too many tables.
### Microservices
Hardware has its limits. It's expensive to scale servers as well as optimize data. Moreover, Postgres doesn't support
master-master replica, so it will be a bottleneck if you store all the data in one database. Therefore, you can divide
your business logic into multiple smaller services, or microservices.
Hasura encourages microservices with [Remote Schemas](/remote-schemas/index.mdx). This can act as an API Gateway that
routes to multiple, smaller GraphQL servers.
![hasura load-balancer](/img/deployment/hasura-microservices.png)
For example, in an e-commerce application, you can design three Hasura services:
- User + Authentication
- Product management
- Order + Transactions
Depending on a project's scope, the design philosophy is flexible. On a small project, one database will suffice.
### Postgres ecosystem
Thanks to the open-source community, Postgres has many extensions for various types of applications:
- Time-series data, metrics, IoT: [TimescaleDB](https://www.timescale.com/), [CitusDB](https://www.citusdata.com/).
- Spatial and geographic objects: [PostGIS](https://postgis.net/).
- Image processing: [PostPic](https://github.com/drotiro/postpic).
With Remote Schemas, you can use Hasura with multiple databases for various use cases. For example, Postgres for user
data, TimescaleDB for transaction logs, and Postgis for Geographic services.

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 62 KiB