.. meta::
:description: Performance of Hasura GraphQL queries
:keywords: hasura, docs, schema, queries, performance
.. _query_performance:
Query performance
=================
.. contents:: Table of contents
:backlinks: none
:depth: 2
:local:
Introduction
------------
Queries can become slow when they touch large volumes of data or are deeply nested.
This page explains how to analyse query performance, how query plan caching works in Hasura, and how queries can be optimized.
.. _analysing_query_performance:
Analysing query performance
---------------------------
Let's say we want to analyse the following query:

.. code-block:: graphql

  query {
    authors(where: {name: {_eq: "Mario"}}) {
      rating
    }
  }

To analyse the performance of a query, click the ``Analyze`` button in the Hasura console:
.. thumbnail:: ../../../img/graphql/manual/queries/analyze-query.png
:class: no-shadow
:width: 75%
:alt: Query analyze button on Hasura console
The following query execution plan is generated:
.. thumbnail:: ../../../img/graphql/manual/queries/query-analysis-before-index.png
:class: no-shadow
:width: 75%
:alt: Execution plan for Hasura GraphQL query
We can see that a sequential scan is conducted on the ``authors`` table. This means that Postgres goes through every row of the ``authors`` table in order to check if the author's name equals "Mario".
The ``cost`` of a query is an estimate computed by the Postgres planner in arbitrary units. It is useful for comparing alternative plans for the same query, not as an absolute measure of execution time.
Read more about query performance analysis in the `Postgres explain statement docs <https://www.postgresql.org/docs/current/sql-explain.html>`__.
.. _query_plan_caching:
Query plan caching
------------------
How it works
^^^^^^^^^^^^
Hasura executes GraphQL queries as follows:
1. The incoming GraphQL query is parsed into an `abstract syntax tree <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__ (AST), a tree representation of the query's structure.
2. The GraphQL AST is validated against the schema to generate an internal representation.
3. The internal representation is converted into an SQL statement (a `prepared statement <https://www.postgresql.org/docs/current/sql-prepare.html>`__ whenever possible).
4. The (prepared) statement is executed on Postgres to retrieve the result of the query.
For most use cases, Hasura constructs a "plan" for a query, so that a new instance of the same query can be executed without the overhead of steps 1 to 3.
For example, let's consider the following query:

.. code-block:: graphql

  query getAuthor($id: Int!) {
    authors(where: {id: {_eq: $id}}) {
      name
      rating
    }
  }

With the following variable:

.. code-block:: json

  {
    "id": 1
  }

Hasura now tries to map a GraphQL query to a prepared statement where the parameters have a one-to-one correspondence to the variables defined in the GraphQL query.
The first time a query comes in, Hasura generates a plan for the query which consists of two things:
1. The prepared statement
2. Information necessary to convert variables into the prepared statement's arguments
For the above query, Hasura generates the following prepared statement (simplified):
.. code-block:: plpgsql
select name, rating from author where id = $1
With the following prepared variables:
.. code-block:: plpgsql
$1 = 1
This plan is then saved in a data structure called ``Query Plan Cache``. The next time the same query is executed,
Hasura uses the plan to convert the provided variables into the prepared statement's arguments and then executes the statement.
This will significantly cut down the execution time for a GraphQL query resulting in lower latencies and higher throughput.
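
The caching behaviour described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not Hasura's actual implementation: the names ``QueryPlan``, ``PlanCache`` and ``compile_query`` are hypothetical, the cache is keyed on the raw query text, and the compilation step (steps 1 to 3) is stubbed out with a fixed, simplified statement.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QueryPlan:
    """A cached plan: the prepared statement plus the variable order."""
    sql: str                         # statement text with $1, $2, ...
    variable_order: Tuple[str, ...]  # GraphQL variable behind each parameter


class PlanCache:
    """Toy model of a query plan cache keyed on the GraphQL query text."""

    def __init__(self, compile_query: Callable[[str], QueryPlan]) -> None:
        self._compile = compile_query
        self._plans: Dict[str, QueryPlan] = {}
        self.misses = 0  # how often steps 1 to 3 had to run

    def prepare(self, query: str, variables: Dict[str, Any]) -> Tuple[str, List[Any]]:
        plan = self._plans.get(query)
        if plan is None:
            # Cache miss: parse, validate and generate SQL (stubbed here).
            self.misses += 1
            plan = self._compile(query)
            self._plans[query] = plan
        # In both cases, map the variables onto the statement's arguments.
        args = [variables[name] for name in plan.variable_order]
        return plan.sql, args


def compile_query(query: str) -> QueryPlan:
    # Hypothetical stand-in for steps 1 to 3; always returns one statement.
    return QueryPlan("select name, rating from author where id = $1", ("id",))


GET_AUTHOR = "query getAuthor($id: Int!) { authors(where: {id: {_eq: $id}}) { name rating } }"

cache = PlanCache(compile_query)
first = cache.prepare(GET_AUTHOR, {"id": 1})   # compiles and caches the plan
second = cache.prepare(GET_AUTHOR, {"id": 2})  # reuses the cached plan
```

In this model the second call skips compilation entirely and only substitutes the new variable value, which is where the saving comes from.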
Caveats
^^^^^^^
The above optimization is not possible for all types of queries. For example, consider this query:

.. code-block:: graphql

  query getAuthorWithCondition($condition: author_bool_exp!) {
    author(where: $condition) {
      name
      rating
    }
  }

The statement generated for ``getAuthorWithCondition`` is now dependent on the variables.
With the following variables:

.. code-block:: json

  {
    "condition": {"id": {"_eq": 1}}
  }

the generated statement will be:
.. code-block:: plpgsql
select name, rating from author where id = $1
However, with the following variables:

.. code-block:: json

  {
    "condition": {"name": {"_eq": "John"}}
  }

the generated statement will be:
.. code-block:: plpgsql
select name, rating from author where name = 'John'
A plan cannot be generated for such queries because the variables defined in the GraphQL query don't have a one-to-one correspondence to the parameters in the prepared statement.
Query optimization
------------------
Using GraphQL variables
^^^^^^^^^^^^^^^^^^^^^^^
In order to leverage Hasura's query plan caching (as explained in the :ref:`previous section <query_plan_caching>`) to the full extent, GraphQL queries should be defined with
variables whose types are **non-nullable scalars** whenever possible.
To make a variable non-nullable, add a ``!`` at the end of its type:

.. code-block:: graphql
  :emphasize-lines: 1

  query getAuthor($id: Int!) {
    authors(where: {id: {_eq: $id}}) {
      name
      rating
    }
  }

If the ``!`` is omitted, the variable is nullable, and the generated statement differs depending on whether an ``id`` is passed or the variable is ``null``
(in the latter case, no ``where`` clause is generated). Hasura therefore cannot create a reusable plan for such a query.
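
To see why a nullable variable prevents plan reuse, consider the following toy model. It is a sketch only, not Hasura's SQL generation: ``statement_for`` is a hypothetical helper, and the statements are simplified in the same way as the examples on this page.

```python
from typing import List, Optional, Tuple


def statement_for(author_id: Optional[int]) -> Tuple[str, List[int]]:
    """Return the (simplified) statement and arguments for a nullable $id."""
    base = "select name, rating from author"
    if author_id is None:
        # Null variable: no filter is generated at all.
        return base, []
    # Non-null variable: the filter appears and takes one argument.
    return base + " where id = $1", [author_id]


with_id = statement_for(1)
without_id = statement_for(None)
```

The two calls produce statements with a different shape and a different number of parameters, so a single prepared statement cannot serve both.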
.. note::
Hasura is fast even for queries which cannot have a reusable plan.
This should concern you only if you face a high volume of traffic (thousands of requests per second).
Using PG indexes
^^^^^^^^^^^^^^^^
`Postgres indexes <https://www.tutorialspoint.com/postgresql/postgresql_indexes.htm>`__ are special lookup tables that Postgres can use to speed up data lookup.
An index acts as a pointer to data in a table, and it works much like the index at the back of a book.
If you look in the index first, you'll find the data much quicker than searching the whole book (or - in this case - database).
Let's say we know that the ``authors`` table is frequently queried by ``name``:

.. code-block:: graphql

  query {
    authors(where: {name: {_eq: "Mario"}}) {
      rating
    }
  }

We've seen in the :ref:`above example <analysing_query_performance>` that, by default, Postgres conducts a sequential scan, i.e. it goes through all the rows.
A sequential scan on a frequently filtered column can often be optimized by adding an index.

.. rst-class:: api_tabs
.. tabs::

  .. tab:: Console

    An index can be added in the ``Data -> SQL`` tab in the Hasura console:

  .. tab:: API

    An index can be added via the :ref:`run_sql <run_sql>` metadata API.

The following statement creates an index on ``name`` in the ``authors`` table.
.. code-block:: plpgsql
CREATE INDEX ON authors (name);
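
When going through the API, the statement has to be wrapped in a request body. The snippet below is a minimal sketch that only builds that body, assuming the ``run_sql`` request shape of a ``type`` field plus ``args.sql``. The helper name ``run_sql_payload`` is hypothetical, and the endpoint URL and any authentication headers depend on your deployment and are deliberately left out.

```python
import json


def run_sql_payload(sql: str) -> str:
    """Serialize a run_sql request body for the given SQL statement."""
    # Assumed request shape: {"type": "run_sql", "args": {"sql": "..."}}
    return json.dumps({"type": "run_sql", "args": {"sql": sql}})


body = run_sql_payload("CREATE INDEX ON authors (name);")
```

The resulting string can then be sent as the body of a POST request with whichever HTTP client you prefer.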
Let's compare the performance analysis to :ref:`the one before adding the index <analysing_query_performance>`.
What was a ``sequential scan`` in the example earlier is now an ``index scan``. ``Index scans`` are usually more performant than ``sequential scans``.
We can also see that the ``cost`` of the query is now lower than the one before we added the index.
.. thumbnail:: ../../../img/graphql/manual/queries/query-analysis-after-index.png
:class: no-shadow
:width: 75%
:alt: Execution plan for Hasura GraphQL query
.. note::
In some cases sequential scans can still be faster than index scans, e.g. if the result returns a high percentage of the rows in the table.
Postgres considers multiple query plans and chooses the kind of scan it estimates to be cheapest.
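
The effect of an index on the chosen plan can be reproduced locally without a running Postgres instance. The sketch below uses SQLite from Python's standard library purely as a stand-in: its planner, its ``EXPLAIN QUERY PLAN`` output and its cost model all differ from Postgres, so treat it as an illustration of the scan-versus-index idea rather than of Postgres behaviour. The index name ``authors_name_idx`` and the sample rows are made up for the example.

```python
import sqlite3
from typing import List


def plan_for(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the plan detail strings SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO authors (name, rating) VALUES (?, ?)",
    [("Mario", 5), ("Luigi", 4), ("Peach", 3)],
)

query = "SELECT rating FROM authors WHERE name = 'Mario'"

plan_before = plan_for(conn, query)  # full table scan expected
conn.execute("CREATE INDEX authors_name_idx ON authors (name)")
plan_after = plan_for(conn, query)   # index lookup expected

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table. Afterwards it reports a search using the index, mirroring the change from a sequential scan to an index scan shown above.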