graphql-engine/dc-agents
David Overton 346804fc67 Support nested object fields in DC API and use this to implement nest…
## Description

This change adds support for nested object fields in HGE IR and Schema Cache, the Data Connectors backend and API, and the MongoDB agent.

### Data Connector API changes

- The `/schema` endpoint response now includes an optional set of GraphQL type definitions. Table column types can refer to these definitions by name.
- Queries can now include a new field type `object` which contains a column name and a nested query. This allows querying into a nested object within a field.

### MongoDB agent changes

- Add support for querying into nested documents using the new `object` field type.

### HGE changes

- The `Backend` type class has a new type family `XNestedObjects b` which controls whether or not a backend supports querying into nested objects. This is currently enabled only for the `DataConnector` backend.
- For backends that support nested objects, the `FieldInfo` type gets a new constructor `FINestedObject`, and the `AnnFieldG` type gets a new constructor `AFNestedObject`.
- If the DC `/schema` endpoint returns any custom GraphQL type definitions they are stored in the `TableInfo` for each table in the source.
- During schema cache building, the function `addNonColumnFields` will check whether any column types match custom GraphQL object types stored in the `TableInfo`. If so, they are converted into `FINestedObject` instead of `FIColumn` in the `FieldInfoMap`.
- When building the `FieldParser`s from `FieldInfo` (function `fieldSelection`) any `FINestedObject` fields are converted into nested object parsers returning `AFNestedObject`.
- The `DataConnector` query planner converts `AFNestedObject` fields into `object` field types in the query sent to the agent.

## Limitations

### HGE not yet implemented:
- Support for nested arrays
- Support for nested objects/arrays in mutations
- Support for nested objects/arrays in order-by
- Support for filters (`where`) in nested objects/arrays
- Support for adding custom GraphQL types via track table metadata API
- Support for interface and union types
- Tests for nested objects

### Mongo agent not yet implemented:

- Generate nested object types from validation schema
- Support for aggregates
- Support for order-by
- Configure agent port
- Build agent in CI
- Agent tests for nested objects and MongoDB agent

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/7844
GitOrigin-RevId: aec9ec1e4216293286a68f9b1af6f3f5317db423
2023-04-11 01:30:37 +00:00

Data Connectors

This document describes the current specification of the new _data connectors_ feature of graphql-engine, which is under active development.

The data connectors feature allows graphql-engine to delegate the execution of operations to external web services called agents. Such agents provide access to a data set, allowing graphql-engine to query that data set over a web API.

This document specifies (1) the web API that must be presented by agents, and (2) the precise behaviour of agents for specific reference data sets.

For further reference, the directory in which this document resides contains implementations of different agents, including a reference agent (reference) and a SQLite agent (sqlite).

Stability

This specification is complete with regard to the current implementation, but should be considered unstable until the Data Connectors feature is officially released and explicitly marked as a non-experimental feature.

Setting up Data Connector agents with graphql-engine

In order to run one of the example agents, follow the steps in its respective README document.

Once an agent is running, import the following metadata into graphql-engine:

POST /v1/metadata

{
  "type": "replace_metadata",
  "args": {
    "metadata": {
      "version": 3,
      "backend_configs": {
        "dataconnector": {
          "reference": {
            "uri": "http://localhost:8100/"
          }
        }
      },
      "sources": [
        {
          "name": "chinook",
          "kind": "reference",
          "tables": [
            {
              "table": ["Album"],
              "object_relationships": [
                {
                  "name": "Artist",
                  "using": {
                    "manual_configuration": {
                      "remote_table": ["Artist"],
                      "column_mapping": {
                        "ArtistId": "ArtistId"
                      }
                    }
                  }
                }
              ]
            },
            {
              "table": ["Artist"],
              "array_relationships": [
                {
                  "name": "Album",
                  "using": {
                    "manual_configuration": {
                      "remote_table": ["Album"],
                      "column_mapping": {
                        "ArtistId": "ArtistId"
                      }
                    }
                  }
                }
              ]
            }
          ],
          "configuration": {
            "value": {
              "tables": [ "Artist", "Album" ]
            }
          }
        }
      ]
    }
  }
}

The backend_configs.dataconnector section lets you set the URIs for as many agents as you'd like. In this case, we've defined one called "reference". When you create a source, the kind of the source should be set to the name you gave the agent in the backend_configs.dataconnector section (in this case, "reference").

The configuration property under the source can contain an arbitrary JSON object, and this JSON will be sent to the agent on every request via the X-Hasura-DataConnector-Config header. The example here is the configuration that the reference agent uses. The JSON object must conform to the schema specified by the agent from its /capabilities endpoint.

The name property under the source will be sent to the agent on every request via the X-Hasura-DataConnector-SourceName header. This name uniquely identifies a source within an instance of HGE.

The Artist and Album tables should now be available in the GraphiQL console. You should be able to issue queries via the web service. For example:

query {
  Artist {
    Name
    Album {
      Title
    }
  }
}

Implementing Data Connector agents

This section is a guide to implementing Data Connector agents for graphql-engine. You may find it useful to consult the code examples for reference.

The entry point to the reference agent application is a Fastify HTTP server. Raw data is loaded from JSON files on disk, and the server provides the following endpoints:

  • GET /capabilities, which returns the capabilities of the agent and a schema that describes the type of the configuration expected to be sent on the X-Hasura-DataConnector-Config header
  • GET /schema, which returns information about the provided data schema, its tables and their columns
  • POST /query, which receives a query structure to be executed, encoded as the JSON request body, and returns JSON containing the requested fields. The query will be over the data schema described by the /schema endpoint.
  • GET /health, which can be used to either check if the agent is running, or if a particular data source is healthy
  • POST /mutation, which receives a request to mutate (i.e. change) data described by the /schema endpoint.

The /schema, /query and /mutation endpoints require the request to have the X-Hasura-DataConnector-Config header set. That header contains configuration information that the agent can use to configure itself. For example, the header could contain a connection string to the database, if the agent requires a connection string to know how to connect to a specific database. The header must be a JSON object, but the specific properties that are required are up to the agent to define.

The /schema, /query and /mutation endpoints also require the request to have the X-Hasura-DataConnector-SourceName header set. This header contains the name of the data source configured in HGE that will be querying the agent. This can be used by the agent to maintain things like connection pools and configuration maps on a per-source basis.
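As a minimal sketch of how an agent might read these two headers, the helper below takes a plain map of header names to values and returns the parsed configuration together with the source name. The names readAgentHeaders and AgentRequestContext are hypothetical and not part of the specification; the sketch also assumes the HTTP server has already lower-cased the header names, as Node-based servers typically do.

```typescript
// Hypothetical shape for the per-request context; not defined by the specification.
interface AgentRequestContext {
  sourceName: string;
  config: Record<string, unknown>;
}

// Reads the two Data Connector headers from a map of lower-cased header names.
function readAgentHeaders(
  headers: Record<string, string | undefined>
): AgentRequestContext {
  const rawConfig = headers["x-hasura-dataconnector-config"];
  const sourceName = headers["x-hasura-dataconnector-sourcename"];
  if (rawConfig === undefined) {
    throw new Error("Missing X-Hasura-DataConnector-Config header");
  }
  if (sourceName === undefined) {
    throw new Error("Missing X-Hasura-DataConnector-SourceName header");
  }
  // The config header must contain a JSON object; which properties it holds is up to the agent.
  const parsed: unknown = JSON.parse(rawConfig);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("X-Hasura-DataConnector-Config must be a JSON object");
  }
  return { sourceName, config: parsed as Record<string, unknown> };
}
```

An agent could use the returned sourceName as the key for a per-source connection pool, and validate config against its own expectations before touching the data source.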

We'll look at the implementation of each of the endpoints in turn.

Capabilities and configuration schema

The GET /capabilities endpoint is used by graphql-engine to discover the capabilities supported by the agent, and to learn the shape of the configuration data that needs to be collected from the user and sent to the agent in the X-Hasura-DataConnector-Config header. It should return a JSON object similar to the following:

{
  "capabilities": {
    "queries": {
      "foreach": {}
    },
    "data_schema": {
      "supports_primary_keys": true,
      "supports_foreign_keys": true,
      "column_nullability": "nullable_and_non_nullable"
    },
    "relationships": {},
    "scalar_types": {
      "DateTime": {"comparison_operators": {"in_year": "Number"}}
    }
  },
  "config_schemas": {
    "config_schema": {
      "type": "object",
      "nullable": false,
      "properties": {
        "tables": { "$ref": "#/other_schemas/Tables" }
      }
    },
    "other_schemas": {
      "Tables": {
        "description": "List of tables to make available in the schema and for querying",
        "type": "array",
        "items": { "$ref": "#/other_schemas/TableName" },
        "nullable": true
      },
      "TableName": {
        "nullable": false,
        "type": "string"
      }
    }
  }
}

The capabilities section describes the capabilities of the service. This includes:

  • queries: The query capabilities that the agent supports
  • data_schema: What sorts of features the agent supports when describing its data schema
  • relationships: whether or not the agent supports relationships
  • scalar_types: scalar types and the operations they support. See Scalar type capabilities.

The config_schema property contains an OpenAPI 3 Schema object that represents the schema of the configuration object. It can use references ($ref) to refer to other schemas defined in the other_schemas object by name.

graphql-engine will use the config_schema OpenAPI 3 Schema to validate the user's configuration JSON before putting it into the X-Hasura-DataConnector-Config header.

Query capabilities

The agent can declare whether or not it supports "foreach queries" by including a foreach property with an empty object assigned to it. Foreach query support is optional, but is required if the agent is to be used as the target of remote relationships in HGE.

Data schema capabilities

The agent can declare whether or not it supports primary keys or foreign keys by setting the supports_primary_keys and supports_foreign_keys properties under the data_schema object on capabilities. If it does not declare support, it is expected that it will not return any such primary/foreign keys in the schema it exposes on the /schema endpoint.

If the agent only supports table columns that are always nullable, then it should set column_nullability to "only_nullable". However, if it supports both nullable and non-nullable columns, then it should set "nullable_and_non_nullable".

Scalar type capabilities

Agents should declare the scalar types they support, along with the comparison operators and aggregate functions on those types. The agent may optionally specify how to parse values of a scalar type by associating it with one of the built-in GraphQL types (Int, Float, String, Boolean or ID).

Scalar types are declared by adding a property to the scalar_types section of the capabilities.

Comparison operators can be defined by adding a comparison_operators property to the scalar type capabilities object. The comparison_operators property is an object where each key specifies a comparison operator name. The operator name must be a valid GraphQL name. The value associated with each key should be a string specifying the argument type, which must be a valid scalar type. All scalar types must also support the built-in comparison operators eq, gt, gte, lt and lte.

Aggregate functions can be defined by adding an aggregate_functions property to the scalar type capabilities object. The aggregate_functions property must be an object mapping aggregate function names to their result types. Aggregate function names must be valid GraphQL names. Result types must be valid scalar types.

Update column operators are operators that can be defined to allow custom mutation operations to be performed on columns of the particular scalar type. These can be defined by adding an update_column_operators property, which maps the operator name to an object that defines the operator's argument_type. The name specified in capabilities will be prefixed with _ when it is used in the GraphQL mutation schema. The argument_type must be a valid scalar type.

The graphql_type property can be used to tell Hasura GraphQL Engine to parse values of the scalar type as though they were one of the built-in GraphQL scalar types Int, Float, String, Boolean, or ID.

Example:

capabilities:
  scalar_types:
    DateTime:
      comparison_operators:
        in_year: Number
      aggregate_functions:
        max: DateTime
        min: DateTime
      update_column_operators:
        set_year:
          argument_type: Number
      graphql_type: String

This example declares a scalar type DateTime which should be parsed as though it were a GraphQL String. The type supports a comparison operator in_year, which takes an argument of type Number.

An example GraphQL query using the comparison operator might look like below:

query MyQuery {
  Employee(where: {BirthDate: {in_year: 1962}}) {
    Name
    BirthDate
  }
}

In this query we have an Employee field with a BirthDate property of type DateTime. The in_year comparison operator is being used to request all employees with a birth date in the year 1962.
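How the agent evaluates a custom operator such as in_year is left to the agent. The following is a minimal sketch under the assumption that DateTime values are stored as ISO 8601 strings and that the year is compared in UTC; the function name inYear is hypothetical.

```typescript
// Hypothetical evaluator for the in_year comparison operator.
// Assumes DateTime values are ISO 8601 strings; null or unparseable values never match.
function inYear(value: string | null, year: number): boolean {
  if (value === null) {
    return false;
  }
  const parsed = new Date(value);
  if (isNaN(parsed.getTime())) {
    return false;
  }
  // Compare in UTC so the result does not depend on the agent's local time zone.
  return parsed.getUTCFullYear() === year;
}
```

For the query above, the agent would apply such a check to the BirthDate of each Employee row with year set to 1962.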

The example also defines two aggregate functions min and max, both of which have a result type of DateTime.

The example also defines a set_year update operator, which could be used in an update mutation:

mutation MyMutation {
  update_Employee(_set_year: {BirthDate: 1980}) {
    returning {
      BirthDate
    }
    affected_rows
  }
}

Mutations capabilities

The agent can declare whether it supports mutations (i.e. changing data) against its data source. If it supports mutations, it needs to declare a mutations capability with agent-specific values for the following properties:

{
  "capabilities": {
    "mutations": {
      "insert": {
        "supports_nested_inserts": true
      },
      "update": {},
      "delete": {},
      "atomicity_support_level": "heterogeneous_operations",
      "returning": {}
    }
  }
}

The agent is able to specify whether or not it supports inserts, updates and deletes separately. For inserts, it can specify whether it supports nested inserts, where the user can insert related rows nested inside a single row insert.

It also should specify its supported level of transactional atomicity when performing mutations. It can choose between the following levels:

  • row: If multiple rows are affected in a single operation but one fails, only the failed row's changes will be reverted. For example, if one mutation operation inserts four rows, but one row fails, the other three rows will still be inserted, and the failed one will not.
  • single_operation: If multiple rows are affected in a single operation but one fails, all affected rows in the operation will be reverted. For example, if one mutation operation inserts four rows, but one row fails, none of the rows will be inserted.
  • homogeneous_operations: If multiple operations of only the same type exist in the one mutation request, a failure in one will result in all changes being reverted. For example, if one mutation request contains two insert operations, one to Table A and one to Table B, and Table B's insert fails, no rows will have been inserted into either Table A or Table B.
  • heterogeneous_operations: If multiple operations of any type exist in the one mutation request, a failure in one will result in all changes being reverted. For example, if one mutation request contains three operations, one to insert some rows, one to update some rows, and one to delete some rows, and the deletion fails, all changes (inserts, updates and deletes) will be reverted.

The preference would be to support the highest level of atomicity possible (i.e. heterogeneous_operations is preferred over row). It is also possible to omit the property, which would imply no atomicity at all (failures cannot be rolled back whatsoever).
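The ordering of the atomicity levels can be sketched as a small helper that checks whether a declared level is at least as strong as a required one. The type and function names here are hypothetical; only the four level names come from the list above, and an omitted property is treated as no atomicity.

```typescript
// The four levels named in this document, from weakest to strongest.
type AtomicitySupportLevel =
  | "row"
  | "single_operation"
  | "homogeneous_operations"
  | "heterogeneous_operations";

const atomicityOrder: AtomicitySupportLevel[] = [
  "row",
  "single_operation",
  "homogeneous_operations",
  "heterogeneous_operations",
];

// Returns true when the declared level is at least as strong as the required one.
// An undefined declared level means the agent omitted the property (no atomicity).
function supportsAtLeast(
  declared: AtomicitySupportLevel | undefined,
  required: AtomicitySupportLevel
): boolean {
  if (declared === undefined) {
    return false;
  }
  return atomicityOrder.indexOf(declared) >= atomicityOrder.indexOf(required);
}
```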

The agent can also specify whether or not it supports returning data from mutations. This refers to the ability to return the data that was mutated by mutation operations (for example, the updated rows in an update, or the deleted rows in a delete).

Dataset Capabilities

The agent can declare whether it supports datasets (i.e. an API for creating/cloning schemas). If it supports datasets, it needs to declare a datasets capability:

{
  "capabilities": {
    "datasets": { }
  }
}

See Datasets for information on how this capability is used.

Schema

The GET /schema endpoint is called whenever the metadata is (re)loaded by graphql-engine. It returns the following JSON object:

{
  "tables": [
    {
      "name": ["Artist"],
      "type": "table",
      "primary_key": ["ArtistId"],
      "description": "Collection of artists of music",
      "columns": [
        {
          "name": "ArtistId",
          "type": "number",
          "nullable": false,
          "description": "Artist primary key identifier",
          "insertable": true,
          "updatable": false,
          "value_generated": { "type": "auto_increment" }
        },
        {
          "name": "Name",
          "type": "string",
          "nullable": true,
          "description": "The name of the artist",
          "insertable": true,
          "updatable": true
        }
      ],
      "insertable": true,
      "updatable": true,
      "deletable": true
    },
    {
      "name": ["Album"],
      "type": "table",
      "primary_key": ["AlbumId"],
      "description": "Collection of music albums created by artists",
      "columns": [
        {
          "name": "AlbumId",
          "type": "number",
          "nullable": false,
          "description": "Album primary key identifier",
          "insertable": true,
          "updatable": false,
          "value_generated": { "type": "auto_increment" }
        },
        {
          "name": "Title",
          "type": "string",
          "nullable": false,
          "description": "The title of the album",
          "insertable": true,
          "updatable": true
        },
        {
          "name": "ArtistId",
          "type": "number",
          "nullable": false,
          "description": "The ID of the artist that created this album",
          "insertable": true,
          "updatable": true
        }
      ],
      "insertable": true,
      "updatable": true,
      "deletable": true
    }
  ]
}

The tables section describes two available tables, as well as their columns, including types and nullability information.

Notice that the names of tables and columns are used in the metadata document to describe tracked tables and relationships.

Table names are described as an array of strings. This allows agents to fully qualify their table names with whatever namespacing requirements they have. For example, if the agent connects to a database that puts tables inside schemas, the agent could use table names such as ["my_schema", "my_table"].

The type of a table can be either a "table" or a "view".

Tables have mutability properties, namely, whether they are "insertable", "updatable" and "deletable", which refers to whether rows can be inserted/updated/deleted from the table. Typically, in an RDBMS, tables are insertable, updatable and deletable, but views may not be. However, an agent may declare the mutability properties in any combination that suits its data source.

Columns also have "insertable" and "updatable" mutability properties. Typically, in an RDBMS, computed columns are neither insertable nor updatable, primary keys are insertable but not updatable, and normal columns are insertable and updatable. Agents may declare whatever combination suits their data source.

Columns can have their value generated by the database, for example auto-incrementing primary key columns. This can be described on the column schema using the value_generated property. It can contain an object with a type field and the type can be one of:

  • auto_increment: The column's value can be generated by the database using an auto-incrementing integer ID
  • unique_identifier: The column's value can be generated by the database using a unique identifier
  • default_value: The column's value can be generated by the database using a default value

If the agent declares a lack of mutability support in its capabilities, it should not declare tables/columns as mutable in its schema here.
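To illustrate how the column properties interact, here is a simplified sketch of the table and column shapes shown in the example response, together with a hypothetical helper that lists the columns a caller would have to supply on insert. The interfaces are a reduced approximation written for this example, not the types from the reference implementation, and the rule used (insertable, non-nullable, and without a generated value) is an assumption made for illustration.

```typescript
// Simplified, illustrative shapes based on the example /schema response above.
interface ColumnInfo {
  name: string;
  type: string;
  nullable: boolean;
  description?: string;
  insertable?: boolean;
  updatable?: boolean;
  value_generated?: { type: "auto_increment" | "unique_identifier" | "default_value" };
}

interface TableInfo {
  name: string[];
  type?: "table" | "view";
  primary_key?: string[];
  description?: string;
  columns: ColumnInfo[];
  insertable?: boolean;
  updatable?: boolean;
  deletable?: boolean;
}

// Hypothetical helper: columns that must be given a value when inserting a row.
// Assumption: a column is required when it is insertable, non-nullable and
// the database cannot generate its value.
function requiredInsertColumns(table: TableInfo): string[] {
  if (table.insertable !== true) {
    return [];
  }
  return table.columns
    .filter(
      (column) =>
        column.insertable === true &&
        !column.nullable &&
        column.value_generated === undefined
    )
    .map((column) => column.name);
}
```

Applied to the Album table from the example response, this helper would return Title and ArtistId, since AlbumId is generated by auto-increment.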

Type definitions

The SchemaResponse TypeScript type from the reference implementation describes the valid response body for the GET /schema endpoint.

Responding to queries

The POST /query endpoint is invoked when the user requests data from graphql-engine which is resolved by the service.

The service logs queries from the request body in the console. Here is a simple example based on a GraphQL query which fetches all artist data:

query {
  Artist {
    ArtistId
    Name
  }
}

and here is the resulting query request payload:

{
  "table": ["Artist"],
  "table_relationships": [],
  "query": {
    "where": {
      "expressions": [],
      "type": "and"
    },
    "order_by": null,
    "limit": null,
    "offset": null,
    "fields": {
      "ArtistId": {
        "type": "column",
        "column": "ArtistId",
        "column_type": "number"
      },
      "Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    }
  }
}

The implementation of the service is responsible for interpreting this data structure and producing a JSON response body which is compatible with both the query and the schema.

Let's break down the request:

  • The table field tells us which table to fetch the data from, namely the Artist table. The table name (ie. the array of strings) must be one that was returned previously by the /schema endpoint.
  • The table_relationships field lists any relationships used to join between tables in the query. This query does not use any relationships, so this is just an empty list here.
  • The query field contains further information about how to query the specified table:
    • The where field tells us that there is currently no (interesting) predicate being applied to the rows of the data set (just an empty conjunction, which ought to return every row).
    • The order_by field tells us that there is no particular ordering to use, and that we can return data in its natural order.
    • The limit and offset fields tell us that there is no pagination required.
    • The fields field tells us that we ought to return two fields per row (ArtistId and Name), and that these fields should be fetched from the columns with the same names. The scalar types of the columns are also denoted with column_type.
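The breakdown above can be sketched as a small projection over in-memory rows. This is a minimal illustration that handles only column fields and ignores where, order_by, limit and offset; the names projectRows, ColumnField and Row are hypothetical.

```typescript
// A column field as it appears in the fields object of the query request.
type ColumnField = { type: "column"; column: string; column_type: string };
type Row = Record<string, unknown>;

// Builds the response rows by copying each requested column into the
// response under its field name (the key in the fields object).
function projectRows(
  tableRows: Row[],
  fields: Record<string, ColumnField>
): { rows: Row[] } {
  const rows = tableRows.map((source) => {
    const projected: Row = {};
    for (const fieldName of Object.keys(fields)) {
      const value = source[fields[fieldName].column];
      // Columns missing from the source row are returned as null.
      projected[fieldName] = value === undefined ? null : value;
    }
    return projected;
  });
  return { rows };
}
```

Note that the key of each entry in fields is the name used in the response row, while the column property names the source column; the two happen to be identical in the request above.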

Response Body Structure

The response body for a call to POST /query must conform to a specific query response format. Here's an example:

{
  "rows": [
    {
      "ArtistId": 1,
      "Name": "AC/DC"
    },
    {
      "ArtistId": 2,
      "Name": "Accept"
    }
  ]
}

The rows returned by the query must be put into the rows property array in the query response object. Each object within this array represents a row, and the row object properties are the fields requested in the query. The value of the row object properties can be one of two types:

  • column: If the field was a column field, then the value of that column for this row is used
  • relationship: If the field was a relationship field, then a new query response object that contains the results of navigating that relationship for the current row must be used. (The query response structure is recursive via relationship-typed field values). Examples of this can be seen in the Relationships section below.

Pagination

There are three properties that are used to control pagination of queried data:

  • aggregates_limit: The maximum number of rows to consider in aggregations calculated and returned in the aggregates property. aggregates_limit does not influence the rows returned in the rows property. It will only be used if there are aggregates in the query.
  • limit: The maximum number of rows to return from a query in the rows property. limit does not influence the rows considered by aggregations.
  • offset: The index of the first row to return. This affects the rows returned, and also the rows considered by aggregations.

limit and aggregates_limit are set when the user uses a limit parameter in their GraphQL query. This restricts the dataset considered when returning rows as well as when calculating aggregates. HGE also has a row limit setting in a table's select permissions. This row limit will be used in the limit property if no limit is specified in the GraphQL query, or if it is smaller than the limit specified in the GraphQL query itself.

To illustrate the difference between limit and aggregates_limit, consider this GraphQL query, where a row limit of 2 has been placed on the Artist table in its select permissions.

query ArtistsQuery {
  Artist_aggregate {
    aggregate {
      count
    }
    nodes {
      Name
    }
  }
}

This would produce the following agent query request JSON:

{
  "table": ["Artist"],
  "table_relationships": [],
  "query": {
    "aggregates_limit": null,
    "limit": 2,
    "offset": null,
    "aggregates": {
      "aggregate_count": {
        "type": "star_count"
      }
    },
    "fields": {
      "nodes_Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    }
  }
}

The expected response to that query request would be the following JSON. Note that the row count has counted all rows (since aggregates_limit was null), but the query has only returned the maximum number of rows as specified by the limit property: 2.

{
  "aggregates": {
    "aggregate_count": 275
  },
  "rows":[
    { "nodes_Name": "AC/DC" },
    { "nodes_Name": "Accept" }
  ]
}

By comparison, if we added a limit to our GraphQL query:

query ArtistsQuery {
  Artist_aggregate(limit: 5) {
    aggregate {
      count
    }
    nodes {
      Name
    }
  }
}

This would produce the following agent query request JSON:

{
  "table": ["Artist"],
  "table_relationships": [],
  "query": {
    "aggregates_limit": 5,
    "limit": 2,
    "offset": null,
    "aggregates": {
      "aggregate_count": {
        "type": "star_count"
      }
    },
    "fields": {
      "nodes_Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    }
  }
}

We would expect the following result:

{
  "aggregates": {
    "aggregate_count": 5
  },
  "rows":[
    { "nodes_Name": "AC/DC" },
    { "nodes_Name": "Accept" }
  ]
}

Note that the row count aggregation has been limited to 5 because aggregates_limit was 5, and the rows returned were limited by the value of limit: 2.
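The interaction of the three pagination properties can be sketched over an in-memory array of rows. This is an illustrative helper, not the reference agent's implementation; the names paginate and Pagination are hypothetical, and the only aggregate computed is a star count.

```typescript
interface Pagination {
  limit: number | null;
  aggregates_limit: number | null;
  offset: number | null;
}

// Applies offset to both rows and aggregates, limit to the returned rows only,
// and aggregates_limit to the rows considered by the aggregate only.
function paginate<T>(
  allRows: T[],
  pagination: Pagination
): { rows: T[]; starCount: number } {
  const afterOffset = allRows.slice(pagination.offset === null ? 0 : pagination.offset);
  const returnedRows =
    pagination.limit === null ? afterOffset : afterOffset.slice(0, pagination.limit);
  const aggregatedRows =
    pagination.aggregates_limit === null
      ? afterOffset
      : afterOffset.slice(0, pagination.aggregates_limit);
  return { rows: returnedRows, starCount: aggregatedRows.length };
}
```

With 275 artists, a limit of 2 and an aggregates_limit of null, this yields 2 rows and a count of 275; setting aggregates_limit to 5 reduces the count to 5 while still returning 2 rows, matching the two responses shown above.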

Filters

The where field contains a recursive expression data structure which should be interpreted as a predicate in the context of each record.

Each node of this recursive expression structure is tagged with a type property, which indicates the type of that node, and the node will contain one or more additional fields depending on that type. The valid expression types are enumerated below, along with these additional fields:

| type | Additional fields | Description |
|------|-------------------|-------------|
| and | expressions | A conjunction of several subexpressions |
| or | expressions | A disjunction of several subexpressions |
| not | expression | The negation of a single subexpression |
| exists | in_table, where | Test if a row exists that matches the where subexpression in the specified table (in_table) |
| binary_op | operator, column, value | Test the specified column against a single value using a particular binary comparison operator |
| binary_arr_op | operator, column, values, value_type | Test the specified column against an array of values using a particular binary comparison operator where the type of each value is a value_type |
| unary_op | operator, column | Test the specified column against a particular unary comparison operator |

The value of the in_table property of the exists expression is an object that describes which table to look for rows in. The object is tagged with a type property:

| type | Additional fields | Description |
|------|-------------------|-------------|
| related | relationship | The table is related to the current table via the relationship name specified in relationship (this means it should be joined to the current table via the relationship) |
| unrelated | table | The table specified by table is unrelated to the current table and therefore is not explicitly joined to the current table |

The "current table" during expression evaluation is the table specified by the closest ancestor exists expression, or if there is no exists ancestor, it is the table involved in the Query that the whole where Expression is from.

The available binary comparison operators that can be used against a single value in binary_op are:

| Binary comparison operator | Description |
|----------------------------|-------------|
| less_than | The < operator |
| less_than_or_equal | The <= operator |
| greater_than | The > operator |
| greater_than_or_equal | The >= operator |
| equal | The = operator |

The available binary comparison operators that can be used against an array of values in binary_arr_op are:

| Binary array comparison operator | Description |
|----------------------------------|-------------|
| in | The SQL IN operator (i.e. the column must be any of the array of specified values) |

The available unary comparison operators that can be used against a column:

| Unary comparison operator | Description |
|---------------------------|-------------|
| is_null | Tests if a column is null |

Values (as used in value in binary_op and the values array in binary_arr_op) are specified as either a literal value, or a reference to another column, which could potentially be in another related table in the same query. The value object is tagged with a type property and has different fields based on the type.

| type | Additional fields | Description |
|------|-------------------|-------------|
| scalar | value | A scalar value to compare against |
| column | column | A column in the current table being queried to compare against |

Columns (as used in column fields in binary_op, binary_arr_op, unary_op and in column-typed Values) are specified as a column name, a column_type to denote the scalar type of the column, as well as optionally a path to the table that contains the column. If the path property is missing/null or an empty array, then the column is on the current table. However, if the path is ["$"], then the column is on the table involved in the Query that the whole where expression is from. At this point in time, these are the only valid values of path.

Here is a simple example, which corresponds to the predicate "first_name is John and last_name is Smith":

{
  "type": "and",
  "expressions": [
    {
      "type": "binary_op",
      "operator": "equal",
      "column": {
        "name": "first_name",
        "column_type": "string"
      },
      "value": {
        "type": "scalar",
        "value": "John",
        "value_type": "string"
      }
    },
    {
      "type": "binary_op",
      "operator": "equal",
      "column": {
        "name": "last_name",
        "column_type": "string"
      },
      "value": {
        "type": "scalar",
        "value": "Smith",
        "value_type": "string"
      }
    }
  ]
}

Here's another example, which corresponds to the predicate "first_name is the same as last_name":

{
  "type": "binary_op",
  "operator": "equal",
  "column": {
    "name": "first_name",
    "column_type": "string"
  },
  "value": {
    "type": "column",
    "column": {
      "name": "last_name",
      "column_type": "string"
    }
  }
}
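
The comparison expressions above can be evaluated against an in-memory row with a small interpreter. The following is a minimal Python sketch under stated assumptions, not the reference agent's implementation: the names `BINARY_OPS`, `resolve_value` and `evaluate` are hypothetical, column `path` handling is omitted (every column is assumed to be on the current table), and comparisons involving null are assumed never to match, as in SQL.

```python
import operator

# Maps the binary comparison operator names onto Python comparisons.
BINARY_OPS = {
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
    "equal": operator.eq,
}


def resolve_value(value, row):
    """Return the concrete value of a tagged value object for the given row."""
    if value["type"] == "scalar":
        return value["value"]
    if value["type"] == "column":
        return row[value["column"]["name"]]
    raise ValueError(f"unknown value type: {value['type']}")


def evaluate(expr, row):
    """Evaluate a where expression against one row (a dict of column name to value)."""
    kind = expr["type"]
    if kind == "and":
        return all(evaluate(e, row) for e in expr["expressions"])
    if kind == "binary_op":
        left = row[expr["column"]["name"]]
        right = resolve_value(expr["value"], row)
        # Assumption: a comparison involving null never matches.
        if left is None or right is None:
            return False
        return BINARY_OPS[expr["operator"]](left, right)
    if kind == "binary_arr_op":
        if expr["operator"] != "in":
            raise ValueError(f"unknown array operator: {expr['operator']}")
        return row[expr["column"]["name"]] in expr["values"]
    if kind == "unary_op":
        if expr["operator"] != "is_null":
            raise ValueError(f"unknown unary operator: {expr['operator']}")
        return row[expr["column"]["name"]] is None
    raise ValueError(f"unsupported expression type: {kind}")
```

Dispatching on the `type` tag keeps each expression kind isolated, so further kinds can be added as extra branches.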

In this example, a person table is filtered by whether or not that person has any children 18 years of age or older:

{
  "type": "exists",
  "in_table": {
    "type": "related",
    "relationship": "children"
  },
  "where": {
    "type": "binary_op",
    "operator": "greater_than_or_equal",
    "column": {
      "name": "age",
      "column_type": "number"
    },
    "value": {
      "type": "scalar",
      "value": 18,
      "value_type": "number"
    }
  }
}

In this example, a person table is filtered by whether or not that person has any children that have the same first name as them:

{
  "type": "exists",
  "in_table": {
    "type": "related",
    "relationship": "children"
  },
  "where": {
    "type": "binary_op",
    "operator": "equal",
    "column": {
      "name": "first_name", // This column refers to the child's name
      "column_type": "string"
    },
    "value": {
      "type": "column",
      "column": {
        "path": ["$"],
        "name": "first_name", // This column refers to the parent's name
        "column_type": "string"
      }
    }
  }
}

Exists expressions can be nested, but the ["$"] path always refers to the query table. So in this example, a person table is filtered by whether or not that person has any children that have any friends that have the same first name as the parent:

{
  "type": "exists",
  "in_table": {
    "type": "related",
    "relationship": "children"
  },
  "where": {
    "type": "exists",
    "in_table": {
      "type": "related",
      "relationship": "friends"
    },
    "where": {
      "type": "binary_op",
      "operator": "equal",
      "column": {
        "name": "first_name", // This column refers to the children's friend's name
        "column_type": "string"
      },
      "value": {
        "type": "column",
        "column": {
          "path": ["$"],
          "name": "first_name", // This column refers to the parent's name
          "column_type": "string"
        }
      }
    }
  }
}

In this example, a table is filtered by whether or not an unrelated administrators table contains an admin called "superuser". Note that this means if the administrators table contains the "superuser" admin, then all rows of the table are returned, but if not, no rows are returned.

{
  "type": "exists",
  "in_table": {
    "type": "unrelated",
    "table": ["administrators"]
  },
  "where": {
    "type": "binary_op",
    "operator": "equal",
    "column": {
      "name": "username",
      "column_type": "string"
    },
    "value": {
      "type": "scalar",
      "value": "superuser",
      "value_type": "string"
    }
  }
}
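
The exists examples above depend on two rows being tracked during evaluation: the row of the current table (which changes at each exists) and the row of the query table (which the ["$"] path always refers to). Here is a minimal in-memory Python sketch of that bookkeeping. It is an illustration only: `db`, `rels`, `column_value`, `evaluate` and `filter_rows` are hypothetical names, tables are keyed by the tuple form of their name, and only the equal operator is handled.

```python
def column_value(column, current_row, query_row):
    """Read a column from the query row for path ["$"], else from the current row."""
    path = column.get("path") or []
    if path == ["$"]:
        return query_row[column["name"]]
    if path:
        raise ValueError(f"unsupported path: {path}")
    return current_row[column["name"]]


def evaluate(expr, db, rels, table, current_row, query_row):
    """Evaluate an exists or equality expression; table is the current table's key."""
    kind = expr["type"]
    if kind == "exists":
        in_table = expr["in_table"]
        if in_table["type"] == "related":
            # Join to the related table using the relationship's column mapping.
            rel = rels[table][in_table["relationship"]]
            target = tuple(rel["target_table"])
            mapping = rel["column_mapping"]
            candidates = [
                r for r in db[target]
                if all(current_row[src] == r[dst] for src, dst in mapping.items())
            ]
        else:
            # Unrelated tables are not joined: every row is a candidate.
            target = tuple(in_table["table"])
            candidates = db[target]
        # The matched row becomes the current row; the query row is unchanged.
        return any(
            evaluate(expr["where"], db, rels, target, r, query_row)
            for r in candidates
        )
    if kind == "binary_op" and expr["operator"] == "equal":
        left = column_value(expr["column"], current_row, query_row)
        value = expr["value"]
        if value["type"] == "scalar":
            right = value["value"]
        else:
            right = column_value(value["column"], current_row, query_row)
        return left == right
    raise ValueError(f"unsupported expression: {kind}")


def filter_rows(db, rels, table, where):
    """Return the rows of the query table that satisfy the where expression."""
    key = tuple(table)
    return [row for row in db[key] if evaluate(where, db, rels, key, row, row)]
```

Because an unrelated exists ignores the current row entirely, it evaluates to the same result for every row of the query table, which is why it returns either all rows or none.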

Relationships

If the call to GET /capabilities returns a capabilities record with a relationships field then the query structure may include fields corresponding to relationships.

Note: if the relationships capability is not present, then graphql-engine will not send queries involving relationships to this agent.

Relationship fields are indicated by a type field containing the string relationship. Such fields will also include the name of the relationship in a field called relationship. This name refers to a relationship that is specified on the top-level query request object in the table_relationships field.

The table_relationships field is a list of tables and, for each table, a map of relationship name to relationship information. The information is an object that has a field target_table that specifies the name of the related table. It has a field called relationship_type that specifies either an object (many to one) or an array (one to many) relationship. There is also a column_mapping field that indicates the mapping from columns in the source table to columns in the related table.

It is intended that the backend should execute the query contained in the relationship field and return the resulting query response as the value of this field, with the additional record-level predicate that any mapped columns should be equal in the context of the current record of the current table.

An example will illustrate this. Consider the following GraphQL query:

query {
  Artist {
    Name
    Albums {
      Title
    }
  }
}

This will generate the following JSON query if the agent supports relationships:

{
  "table": ["Artist"],
  "table_relationships": [
    {
      "source_table": ["Artist"],
      "relationships": {
        "ArtistAlbums": {
          "target_table": ["Album"],
          "relationship_type": "array",
          "column_mapping": {
            "ArtistId": "ArtistId"
          }
        }
      }
    }
  ],
  "query": {
    "where": {
      "expressions": [],
      "type": "and"
    },
    "offset": null,
    "order_by": null,
    "limit": null,
    "fields": {
      "Albums": {
        "type": "relationship",
        "relationship": "ArtistAlbums",
        "query": {
          "where": {
            "expressions": [],
            "type": "and"
          },
          "offset": null,
          "order_by": null,
          "limit": null,
          "fields": {
            "Title": {
              "type": "column",
              "column": "Title",
              "column_type": "string"
            }
          }
        }
      },
      "Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    }
  }
}

Note the Albums field in particular, which traverses from the Artist table to the Album table via the ArtistAlbums relationship:

{
  "type": "relationship",
  "relationship": "ArtistAlbums",
  "query": {
    "where": {
      "expressions": [],
      "type": "and"
    },
    "offset": null,
    "order_by": null,
    "limit": null,
    "fields": {
      "Title": {
        "type": "column",
        "column": "Title",
        "column_type": "string"
      }
    }
  }
}

The top-level table_relationships can be looked up by starting from the source table (in this case Artist), locating the ArtistAlbums relationship under that table, then extracting the relationship information. This information includes the target_table field which indicates the table to be queried when following this relationship is the Album table. The relationship_type field indicates that this relationship is an array relationship (ie. that it will return zero to many Album rows per Artist row). The column_mapping field indicates the column mapping for this relationship, namely that the Artist's ArtistId must equal the Album's ArtistId.

Back on the relationship field inside the query, there is another query field. This indicates the query that should be executed against the Album table, but we must remember to enforce the additional constraint between Artist's ArtistId and Album's ArtistId. That is, in the context of any single outer Artist record, we should populate the Albums field with the query response containing the array of Album records for which the ArtistId field is equal to the outer record's ArtistId field.

Here's an example (truncated) response:

{
  "rows": [
    {
      "Albums": {
        "rows": [
          {
            "Title": "For Those About To Rock We Salute You"
          },
          {
            "Title": "Let There Be Rock"
          }
        ]
      },
      "Name": "AC/DC"
    },
    {
      "Albums": {
        "rows": [
          {
            "Title": "Balls to the Wall"
          },
          {
            "Title": "Restless and Wild"
          }
        ]
      },
      "Name": "Accept"
    }
    // Truncated, more Artist rows here
  ]
}
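
The way a relationship field is resolved, per outer row and constrained by the column mapping, can be sketched in Python as follows. This is a simplified illustration, not the reference agent's code: `run_query` and the in-memory `db` (keyed by the tuple form of the table name) are hypothetical, and where, order_by, limit and offset are ignored.

```python
def run_query(db, table_relationships, table, query, rows=None):
    """Project the query's fields over rows, recursing into relationship fields."""
    if rows is None:
        rows = db[tuple(table)]
    # Find the relationships declared for this source table, if any.
    rels = next(
        (t["relationships"] for t in table_relationships if t["source_table"] == table),
        {},
    )
    result = []
    for row in rows:
        out = {}
        for name, field in query["fields"].items():
            if field["type"] == "column":
                out[name] = row[field["column"]]
            elif field["type"] == "relationship":
                rel = rels[field["relationship"]]
                # Record-level predicate: mapped columns must equal the outer row's.
                joined = [
                    r for r in db[tuple(rel["target_table"])]
                    if all(row[src] == r[dst] for src, dst in rel["column_mapping"].items())
                ]
                # The nested query response becomes the value of the field.
                out[name] = run_query(
                    db, table_relationships, rel["target_table"], field["query"], joined
                )
            else:
                raise ValueError(f"unsupported field type: {field['type']}")
        result.append(out)
    return {"rows": result}
```

A real agent would usually push the join down to the data source rather than loop over rows, but the response shape is the same.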

Cross-Table Filtering

It is possible to form queries that filter their results by comparing columns across tables via relationships. One way this can happen in Hasura GraphQL Engine is when configuring permissions on a table. It is possible to configure a filter on a table such that it joins to another table in order to compare some data in the filter expression.

The following metadata when used with HGE configures a Customer and Employee table, and sets up a select permission rule on Customer such that only customers that live in the same country as their SupportRep Employee would be visible to users in the user role:

POST /v1/metadata

{
  "type": "replace_metadata",
  "args": {
    "metadata": {
      "version": 3,
      "backend_configs": {
        "dataconnector": {
          "reference": {
            "uri": "http://localhost:8100/"
          }
        }
      },
      "sources": [
        {
          "name": "chinook",
          "kind": "reference",
          "tables": [
            {
              "table": ["Customer"],
              "object_relationships": [
                {
                  "name": "SupportRep",
                  "using": {
                    "manual_configuration": {
                      "remote_table": ["Employee"],
                      "column_mapping": {
                        "SupportRepId": "EmployeeId"
                      }
                    }
                  }
                }
              ],
              "select_permissions": [
                {
                  "role": "user",
                  "permission": {
                    "columns": [
                      "CustomerId",
                      "FirstName",
                      "LastName",
                      "Country",
                      "SupportRepId"
                    ],
                    "filter": {
                      "SupportRep": {
                        "Country": {
                          "_ceq": ["$","Country"]
                        }
                      }
                    }
                  }
                }
              ]
            },
            {
              "table": ["Employee"]
            }
          ],
          "configuration": {}
        }
      ]
    }
  }
}

Given this GraphQL query (where the X-Hasura-Role session variable is set to user):

query getCustomer {
  Customer {
    CustomerId
    FirstName
    LastName
    Country
    SupportRepId
  }
}

We would get the following query request JSON:

{
  "table": ["Customer"],
  "table_relationships": [
    {
      "source_table": ["Customer"],
      "relationships": {
        "SupportRep": {
          "target_table": ["Employee"],
          "relationship_type": "object",
          "column_mapping": {
            "SupportRepId": "EmployeeId"
          }
        }
      }
    }
  ],
  "query": {
    "fields": {
      "Country": {
        "type": "column",
        "column": "Country",
        "column_type": "string"
      },
      "CustomerId": {
        "type": "column",
        "column": "CustomerId",
        "column_type": "number"
      },
      "FirstName": {
        "type": "column",
        "column": "FirstName",
        "column_type": "string"
      },
      "LastName": {
        "type": "column",
        "column": "LastName",
        "column_type": "string"
      },
      "SupportRepId": {
        "type": "column",
        "column": "SupportRepId",
        "column_type": "number"
      }
    },
    "where": {
      "type": "and",
      "expressions": [
        {
          "type": "exists",
          "in_table": {
            "type": "related",
            "relationship": "SupportRep"
          },
          "where": {
            "type": "binary_op",
            "operator": "equal",
            "column": {
              "name": "Country",
              "column_type": "string"
            },
            "value": {
              "type": "column",
              "column": {
                "path": ["$"],
                "name": "Country",
                "column_type": "string"
              }
            }
          }
        }
      ]
    }
  }
}

The key point of interest here is the where field, where we compare columns across tables. The first expression is an exists expression that specifies that a row must exist in the table related to the Customer table by the SupportRep relationship (ie. the Employee table). These rows must match a subexpression that tests whether the related Employee's Country column is equal to the Customer's Country column (as indicated by the ["$"] path). So, in order to evaluate this condition, we'd need to join the Employee table using the column_mapping specified in the SupportRep relationship. Then, if any of the related rows (in this case, only one, because it is an object relationship) contains a Country that is equal to the Customer row's Country, the binary_op evaluates to true. This means a matching row exists, so the exists expression evaluates to true, and we don't filter out the Customer row.
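
For an agent backed by a SQL database, one possible approach is to translate such a filter into a correlated EXISTS subquery. The Python sketch below shows the idea for this kind of expression only. Everything in it is an assumption made for illustration: the helper names, the `q` alias for the query table, the `t1`, `t2`, ... aliases for exists tables, and `rels` keyed by plain table name. Scalar values are deliberately unsupported because they should be passed as query parameters rather than spliced into SQL text.

```python
def quote(identifier):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def column_sql(column, current_alias, query_alias):
    """Qualify a column with the query table alias for path ["$"], else the current alias."""
    alias = query_alias if column.get("path") == ["$"] else current_alias
    return f"{alias}.{quote(column['name'])}"


def where_sql(expr, rels, table, current_alias, query_alias="q", depth=0):
    """Translate a (limited) where expression into an SQL boolean expression."""
    kind = expr["type"]
    if kind == "and":
        parts = [
            where_sql(e, rels, table, current_alias, query_alias, depth)
            for e in expr["expressions"]
        ]
        return "(" + " AND ".join(parts) + ")" if parts else "TRUE"
    if kind == "exists" and expr["in_table"]["type"] == "related":
        rel = rels[table][expr["in_table"]["relationship"]]
        target = rel["target_table"][-1]
        alias = f"t{depth + 1}"
        # The column mapping becomes the correlation condition of the subquery.
        join = " AND ".join(
            f"{current_alias}.{quote(src)} = {alias}.{quote(dst)}"
            for src, dst in rel["column_mapping"].items()
        )
        inner = where_sql(expr["where"], rels, target, alias, query_alias, depth + 1)
        return f"EXISTS (SELECT 1 FROM {quote(target)} AS {alias} WHERE {join} AND {inner})"
    if kind == "binary_op" and expr["operator"] == "equal" and expr["value"]["type"] == "column":
        left = column_sql(expr["column"], current_alias, query_alias)
        right = column_sql(expr["value"]["column"], current_alias, query_alias)
        return f"{left} = {right}"
    raise NotImplementedError(f"unsupported expression: {kind}")
```

Applied to the where expression of the request above, with `Customer` as the query table aliased to `q`, this yields an EXISTS subquery over `"Employee"` correlated on `q."SupportRepId" = t1."EmployeeId"`.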

Filtering by Unrelated Tables

It is possible to filter a table by a predicate evaluated against a completely unrelated table. This can happen in Hasura GraphQL Engine when configuring permissions on a table.

In the following example, we are configuring HGE's metadata such that when the Customer table is queried by the employee role, the employee currently doing the query (as specified by the X-Hasura-EmployeeId session variable) must be an employee from the city of Calgary, otherwise no rows are returned.

POST /v1/metadata

{
  "type": "replace_metadata",
  "args": {
    "metadata": {
      "version": 3,
      "backend_configs": {
        "dataconnector": {
          "reference": {
            "uri": "http://localhost:8100/"
          }
        }
      },
      "sources": [
        {
          "name": "chinook",
          "kind": "reference",
          "tables": [
            {
              "table": ["Customer"],
              "select_permissions": [
                {
                  "role": "employee",
                  "permission": {
                    "columns": [
                      "CustomerId",
                      "FirstName",
                      "LastName",
                      "Country",
                      "SupportRepId"
                    ],
                    "filter": {
                      "_exists": {
                        "_table": ["Employee"],
                        "_where": {
                          "_and": [
                            { "EmployeeId": { "_eq": "X-Hasura-EmployeeId" } },
                            { "City": { "_eq": "Calgary" } }
                          ]
                        }
                      }
                    }
                  }
                }
              ]
            },
            {
              "table": ["Employee"]
            }
          ],
          "configuration": {}
        }
      ]
    }
  }
}

Given this GraphQL query (where the X-Hasura-Role session variable is set to employee, and the X-Hasura-EmployeeId session variable is set to 2):

query getCustomer {
  Customer {
    CustomerId
    FirstName
    LastName
    Country
    SupportRepId
  }
}

We would get the following query request JSON:

{
  "table": ["Customer"],
  "table_relationships": [],
  "query": {
    "fields": {
      "Country": {
        "type": "column",
        "column": "Country",
        "column_type": "string"
      },
      "CustomerId": {
        "type": "column",
        "column": "CustomerId",
        "column_type": "number"
      },
      "FirstName": {
        "type": "column",
        "column": "FirstName",
        "column_type": "string"
      },
      "LastName": {
        "type": "column",
        "column": "LastName",
        "column_type": "string"
      },
      "SupportRepId": {
        "type": "column",
        "column": "SupportRepId",
        "column_type": "number"
      }
    },
    "where": {
      "type": "exists",
      "in_table": {
        "type": "unrelated",
        "table": ["Employee"]
      },
      "where": {
        "type": "and",
        "expressions": [
          {
            "type": "binary_op",
            "operator": "equal",
            "column": {
              "name": "EmployeeId",
              "column_type": "number"
            },
            "value": {
              "type": "scalar",
              "value": 2,
              "value_type": "number"
            }
          },
          {
            "type": "binary_op",
            "operator": "equal",
            "column": {
              "name": "City",
              "column_type": "string"
            },
            "value": {
              "type": "scalar",
              "value": "Calgary",
              "value_type": "string"
            }
          }
        ]
      }
    }
  }
}

The key part in this query is the where expression. The root expression in the where is an exists expression which specifies that at least one row must exist in the unrelated ["Employee"] table that satisfies a subexpression. This subexpression asserts that the rows from the Employee table have both EmployeeId as 2 and City as Calgary. The columns referenced inside this subexpression don't have path properties, which means they refer to the columns on the Employee table because that is the closest ancestor exists table.

Aggregates

HGE supports forming GraphQL queries that allow clients to aggregate over the data in their data sources. This type of query can be passed through to Data Connector agents as a part of the Query structure sent to /query.

For example, consider the following GraphQL query:

query {
  Artist_aggregate {
    aggregate {
      max {
        ArtistId
      }
    }
  }
}

This would cause the following query request to be performed:

{
  "table": ["Artist"],
  "table_relationships": [],
  "query": {
    "aggregates": {
      "aggregate_max_ArtistId": {
        "type": "single_column",
        "function": "max",
        "column": "ArtistId",
        "result_type": "number"
      }
    }
  }
}

Notice the Query has an aggregates property; this property contains an object where the property name is the field name of the aggregate, and the value is a description of the aggregate. In the example above, we're using the max function on the ArtistId column. The max function is a function that operates on a single column, so the type of the aggregate is single_column.

The supported single_column functions are defined by the agent in its capabilities. They are associated with the scalar types that the agent declares, and each scalar type declares the set of functions that can be used on columns of that scalar type (the aggregate_functions property in scalar type capabilities).

Every aggregate function declared in capabilities has a result scalar type (ie. the type of the value returned by the aggregate function). This is usually, but not always, the same as the scalar type the aggregate function is for (ie. the type of the column the function is used with). This result scalar type is provided in the result_type property on single_column query aggregations.

The aggregate function is to be run over all rows that match the Query. In this case, the query has no filters on it (ie. no where, limit or offset properties), so the query would be selecting all rows in the Artist table.

There are two other types of aggregates, column_count and star_count, as demonstrated in this GraphQL query, and its resultant QueryRequest:

query {
  Album_aggregate {
    aggregate {
      distinct_count: count(columns: Title, distinct: true)
      count
    }
  }
}

{
  "table": ["Album"],
  "table_relationships": [],
  "query": {
    "aggregates": {
      "aggregate_distinct_count": {
        "type": "column_count",
        "columns": ["Title"],
        "distinct": true
      },
      "aggregate_count": {
        "type": "star_count"
      }
    }
  }
}

A column_count aggregate counts the number of rows that have non-null values in the specified columns. If distinct is set to true, then the count should only count unique values of those columns. This is like a COUNT(x,y,z) or a COUNT(DISTINCT x,y,z) in SQL.

A star_count aggregate simply counts the number of rows matched by the query (similar to a COUNT(*) in SQL).

The results of the aggregate functions must be returned in an aggregates property on the query response. For example:

{
  "aggregates": {
    "aggregate_distinct_count": 347,
    "aggregate_count": 347
  }
}
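
The three aggregate types can be computed over a set of already-filtered rows as in the following Python sketch. It is illustrative only: `FUNCTIONS`, `compute_aggregate` and `run_aggregates` are hypothetical names, and only `max` appears in this document, so `min` and `sum` are included as assumed examples of functions an agent might declare in its capabilities. The sketch also assumes that column_count skips a row when any of the listed columns is null, and that an aggregate over no values returns null.

```python
# Assumption: the function names an agent supports come from its capabilities.
FUNCTIONS = {"max": max, "min": min, "sum": sum}


def compute_aggregate(aggregate, rows):
    """Compute one aggregate description over the rows matched by the query."""
    kind = aggregate["type"]
    if kind == "star_count":
        return len(rows)
    if kind == "column_count":
        columns = aggregate["columns"]
        # Assumption: a row counts only if all listed columns are non-null.
        values = [
            tuple(row[c] for c in columns)
            for row in rows
            if all(row[c] is not None for c in columns)
        ]
        return len(set(values)) if aggregate["distinct"] else len(values)
    if kind == "single_column":
        values = [
            row[aggregate["column"]]
            for row in rows
            if row[aggregate["column"]] is not None
        ]
        if not values:
            return None
        return FUNCTIONS[aggregate["function"]](values)
    raise ValueError(f"unsupported aggregate type: {kind}")


def run_aggregates(query, rows):
    """Build the aggregates part of a query response, keyed by aggregate field name."""
    return {
        "aggregates": {
            name: compute_aggregate(aggregate, rows)
            for name, aggregate in query["aggregates"].items()
        }
    }
```

The keys of the returned object are exactly the field names from the request's aggregates property.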

HGE's aggregate GraphQL queries can also return the rows involved in the aggregates, as well as apply all the standard filtering operations, for example:

query {
  Artist_aggregate(where: {Name: {_gt: "Z"}}) {
    aggregate {
      count
    }
    nodes {
      ArtistId
      Name
    }
  }
}

The nodes part of the query ends up as standard fields in the Query, and therefore are treated exactly the same as discussed in previous sections:

{
  "table": ["Artist"],
  "table_relationships": [],
  "query": {
    "aggregates": {
      "aggregate_count": {
        "type": "star_count"
      }
    },
    "fields": {
      "nodes_ArtistId": {
        "type": "column",
        "column": "ArtistId",
        "column_type": "number"
      },
      "nodes_Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    },
    "where": {
      "type": "binary_op",
      "operator": "greater_than",
      "column": {
        "name": "Name",
        "column_type": "string"
      },
      "value": {
        "type": "scalar",
        "value": "Z",
        "value_type": "string"
      }
    }
  }
}

The response from this query would include both the aggregates and the matching rows containing the specified fields:

{
  "aggregates": {
    "aggregate_count": 1
  },
  "rows": [
    {
      "nodes_ArtistId": 155,
      "nodes_Name": "Zeca Pagodinho"
    }
  ]
}

Aggregate queries can also appear in relationship fields. Consider the following query:

query {
  Artist(limit: 2, offset: 1) {
    Name
    Albums_aggregate {
      aggregate {
        count
      }
    }
  }
}

This would generate the following QueryRequest:

{
  "table": ["Artist"],
  "table_relationships": [
    {
      "source_table": ["Artist"],
      "relationships": {
        "Albums": {
          "target_table": ["Album"],
          "relationship_type": "array",
          "column_mapping": {
            "ArtistId": "ArtistId"
          }
        }
      }
    }
  ],
  "query": {
    "fields": {
      "Albums_aggregate": {
        "type": "relationship",
        "relationship": "Albums",
        "query": {
          "aggregates": {
            "aggregate_count": {
              "type": "star_count"
            }
          }
        }
      },
      "Name": {
        "type": "column",
        "column": "Name",
        "column_type": "string"
      }
    },
    "limit": 2,
    "offset": 1
  }
}

This would be expected to return the following response, with the rows from the Artist table, and the aggregates from the related Albums nested under the relationship field values for each Artist row:

{
  "rows": [
    {
      "Albums_aggregate": {
        "aggregates": {
          "aggregate_count": 2
        }
      },
      "Name": "Accept"
    },
    {
      "Albums_aggregate": {
        "aggregates": {
          "aggregate_count": 1
        }
      },
      "Name": "Aerosmith"
    }
  ]
}

Ordering

The order_by field can either be null, which means no particular ordering is required, or an object with two properties, relations and elements:

{
  "relations": {},
  "elements": [
    {
      "target_path": [],
      "target": {
        "type": "column",
        "column": "last_name",
        "column_type": "string"
      },
      "order_direction": "asc"
    },
    {
      "target_path": [],
      "target": {
        "type": "column",
        "column": "first_name",
        "column_type": "string"
      },
      "order_direction": "desc"
    }
  ]
}

The elements field specifies an array of one-or-more ordering elements. Each element represents a "target" to order, and a direction to order by. The direction can either be asc (ascending) or desc (descending). If there are multiple elements specified, then rows should be ordered with earlier elements in the array taking precedence. In the above example, rows are principally ordered by last_name, delegating to first_name in the case where two last names are equal.
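
The precedence rule can be implemented in memory by applying stable sorts from the last element to the first. The Python sketch below assumes column targets on the current table only (an empty target_path) and non-null column values; `apply_order_by` is a hypothetical helper name.

```python
def apply_order_by(rows, order_by):
    """Return rows sorted by the order_by elements; earlier elements take precedence."""
    ordered = list(rows)
    if order_by is None:
        return ordered
    # Sorting by the least significant element first works because Python's
    # sort is stable, including when reverse=True.
    for element in reversed(order_by["elements"]):
        target = element["target"]
        if element["target_path"] or target["type"] != "column":
            raise NotImplementedError("only columns on the current table are supported")
        column = target["column"]
        ordered.sort(
            key=lambda row: row[column],
            reverse=element["order_direction"] == "desc",
        )
    return ordered
```

With the example order_by above, two rows sharing a last_name end up ordered by first_name descending.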

The order by element target is specified as an object, whose type property selects the kind of ordering target:

| type | Additional fields | Description |
|------|-------------------|-------------|
| column | column | Sort by the column specified |
| star_count_aggregate | - | Sort by the count of all rows on the related target table (a non-empty target_path will always be specified) |
| single_column_aggregate | function, column, result_type | Sort by the value of applying the specified aggregate function to the column values of the rows in the related target table (a non-empty target_path will always be specified). The aggregate function will return result_type scalar type |

The target_path property is a list of relationships to navigate before finding the target to sort on. This is how sorting on columns or aggregates on related tables is expressed. Note that aggregate-typed targets will never be found on the current table (ie. a target_path of []) and are always applied to a related table.

Here's an example of applying an ordering by a related table; the Album table is being queried and sorted by the Album's Artist's Name.

{
  "table": ["Album"],
  "table_relationships": [
    {
      "source_table": ["Album"],
      "relationships": {
        "Artist": {
          "target_table": ["Artist"],
          "relationship_type": "object",
          "column_mapping": {
            "ArtistId": "ArtistId"
          }
        }
      }
    }
  ],
  "query": {
    "fields": {
      "Title": { "type": "column", "column": "Title", "column_type": "string" }
    },
    "order_by": {
      "relations": {
        "Artist": {
          "where": null,
          "subrelations": {}
        }
      },
      "elements": [
        {
          "target_path": ["Artist"],
          "target": {
            "type": "column",
            "column": "Name"
          },
          "order_direction": "desc"
        }
      ]
    }
  }
}

Note that the target_path specifies the relationship path of ["Artist"], and that this relationship is defined in the top-level table_relationships. The ordering element target column Name would therefore be found on the Artist table after joining to it from each Album. (See the Relationships section for more information about relationships.)

The relations property of order_by will contain all the relations used in the order by, for the purpose of specifying filters that must be applied to the joined tables before using them for sorting. The relations property captures all target_paths used in the order_by in a recursive fashion, so for example, if the following target_paths were used in the order_by's elements:

  • ["Artist", "Albums"]
  • ["Artist"]
  • ["Tracks"]

Then the value of the relations property would look like this:

{
  "Artist": {
    "where": null,
    "subrelations": {
      "Albums": {
        "where": null,
        "subrelations": {}
      }
    }
  },
  "Tracks": {
    "where": null,
    "subrelations": {}
  }
}
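
The recursive shape of relations can be illustrated by building it from a list of target_paths. The following Python sketch uses a hypothetical helper, `build_relations`, and leaves every where as null; it only demonstrates how paths that share a prefix are merged into one tree.

```python
def build_relations(target_paths):
    """Merge target_paths into the nested relations structure used by order_by."""
    relations = {}
    for path in target_paths:
        level = relations
        for name in path:
            # Reuse the node if an earlier path already created it.
            node = level.setdefault(name, {"where": None, "subrelations": {}})
            level = node["subrelations"]
    return relations
```

For the three paths listed above, this produces the same structure as the JSON shown.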

The where properties may contain filtering expressions that must be applied to the joined table before using it for sorting. The filtering expressions are defined in the same manner as specified in the Filters section of this document, where they are used on the where property of Queries.

For example, here's a query that retrieves artists ordered descending by the count of all their albums where the album title is greater than 'T'.

{
  "table": ["Artist"],
  "table_relationships": [
    {
      "source_table": ["Artist"],
      "relationships": {
        "Albums": {
          "target_table": ["Album"],
          "relationship_type": "array",
          "column_mapping": {
            "ArtistId": "ArtistId"
          }
        }
      }
    }
  ],
  "query": {
    "fields": {
      "Name": { "type": "column", "column": "Name", "column_type": "string" }
    },
    "order_by": {
      "relations": {
        "Albums": {
          "where": {
            "type": "binary_op",
            "operator": "greater_than",
            "column": {
              "name": "Title",
              "column_type": "string"
            },
            "value": {
              "type": "scalar",
              "value": "T",
              "value_type": "string"
            }
          },
          "subrelations": {}
        }
      },
      "elements": [
        {
          "target_path": ["Albums"],
          "target": {
            "type": "star_count_aggregate"
          },
          "order_direction": "desc"
        }
      ]
    }
  }
}
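
To make the semantics of this request concrete, here is a minimal in-memory Python sketch of ordering by a filtered star_count_aggregate. The names are hypothetical, and for brevity the relation's where expression is represented by a plain Python predicate rather than an evaluated expression tree.

```python
def order_by_related_count(db, relationship, predicate, rows, direction):
    """Sort rows by the number of related rows that satisfy the relation's filter."""
    target_rows = db[tuple(relationship["target_table"])]
    mapping = relationship["column_mapping"]

    def count_for(row):
        # Count related rows: the join condition plus the relation's where filter.
        return sum(
            1
            for r in target_rows
            if all(row[src] == r[dst] for src, dst in mapping.items()) and predicate(r)
        )

    return sorted(rows, key=count_for, reverse=direction == "desc")
```

Note that the filter only affects the count used for sorting; artists with no matching albums are still returned, with a count of zero.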

Foreach Queries

HGE has the ability to perform joins between tables in different databases, known as "remote relationships". In order to be able to do this relatively efficiently, HGE requires data connector agents to support performing "foreach" queries, which are a variant of a normal query.

Data connector agents that support foreach queries must declare it in their capabilities:

{
  "queries": {
    "foreach": {}
  }
}

Agents that do not declare foreach query support cannot be used as the target data source in remote relationships in HGE and will not receive foreach queries.

Foreach queries are very similar to standard queries, except they include an additional foreach property on the Query Request object:

{
  "table": ["Album"],
  "table_relationships": [],
  "query": {
    "fields": {
      "AlbumId": {
        "type": "column",
        "column": "AlbumId",
        "column_type": "number"
      },
      "Title": {
        "type": "column",
        "column": "Title",
        "column_type": "string"
      }
    }
  },
  "foreach": [
    { "ArtistId": {"value": 1, "value_type": "number"} },
    { "ArtistId": {"value": 2, "value_type": "number"} }
  ]
}

The easiest (and least performance-efficient) way of describing the purpose of foreach queries is that the query must be run for each element in the foreach array, and its results additionally filtered where the specified columns equal the specified values. For the above example, the query would need to be executed twice, first selecting only the rows where the column ArtistId is 1, and again, but this time selecting only the rows where the column ArtistId is 2. These filters must be applied in addition to any where filter in the query itself.

The results of a foreach query must be returned in the following format:

{
  "rows": [ // The results of each foreach query must be put into this array. The order of results in here must be in the same order as in the query request foreach.
    {
      "query": { // The results of running each query must be put under a field called "query"
        "rows": [
          { "AlbumId": 1, "Title": "For Those About To Rock We Salute You" },
          { "AlbumId": 4, "Title": "Let There Be Rock" }
        ]
      }
    },
    {
      "query": {
        "rows": [
          { "AlbumId": 2, "Title": "Balls to the Wall" },
          { "AlbumId": 3, "Title": "Restless and Wild" }
        ]
      }
    }
  ]
}

This is the standard query response format, except that the results of running each foreach query are nested inside a top-level query response that contains a row per foreach array element. Each row has the field query which contains the specific query results for that iteration of the foreach query. The order of the result rows in this top-level response object must match the order of the foreach array in the request.
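
The naive "run it once per element" semantics and the nested response shape can be sketched as below. This is a minimal in-memory illustration, not the reference implementation: the runForeach function, the albums array and the flat list of column names standing in for the query's fields are all assumptions made for the example.

```typescript
type ScalarValue = string | number | boolean | null;
type Row = Record<string, ScalarValue>;

// One element of the request's "foreach" array: column name -> value.
type ForeachElement = Record<string, { value: ScalarValue; value_type: string }>;

interface ForeachResponse {
  rows: { query: { rows: Row[] } }[];
}

// Naive evaluation: filter the table once per foreach element, then project
// the requested columns. `table` stands in for a real data source and
// `columns` for the "fields" of the query.
function runForeach(table: Row[], columns: string[], foreach: ForeachElement[]): ForeachResponse {
  const rows = foreach.map((element) => {
    const matching = table.filter((row) =>
      Object.keys(element).every((column) => row[column] === element[column].value)
    );
    const projected = matching.map((row) => {
      const out: Row = {};
      columns.forEach((column) => { out[column] = row[column]; });
      return out;
    });
    // Each iteration's results go under a "query" field, in foreach order.
    return { query: { rows: projected } };
  });
  return { rows };
}

const albums: Row[] = [
  { AlbumId: 1, Title: "For Those About To Rock We Salute You", ArtistId: 1 },
  { AlbumId: 2, Title: "Balls to the Wall", ArtistId: 2 },
  { AlbumId: 3, Title: "Restless and Wild", ArtistId: 2 },
  { AlbumId: 4, Title: "Let There Be Rock", ArtistId: 1 },
];

const foreachResponse = runForeach(albums, ["AlbumId", "Title"], [
  { ArtistId: { value: 1, value_type: "number" } },
  { ArtistId: { value: 2, value_type: "number" } },
]);
```

Because Array.prototype.map preserves order, the top-level rows come out in the same order as the foreach array, which is the ordering requirement described above.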

Performance

Obviously, running the query many times over in a loop for each item in the foreach array has the potential to be slow. One way to implement this efficiently in a relational database would be to perform a LATERAL join from a rowset of the foreach values to the query table.

For example, taking the foreach query above and PostgreSQL as the RDBMS, one could perform all foreach queries in a single operation using a LEFT OUTER JOIN LATERAL:

SELECT foreach."Index", foreach."ArtistId", album."AlbumId", album."Title"
FROM (VALUES (0, 1), (1, 2)) AS foreach ("Index", "ArtistId")
LEFT OUTER JOIN LATERAL (
	SELECT "AlbumId", "Title"
	FROM "Album" album
	WHERE album."ArtistId" = foreach."ArtistId"
) AS album ON true
ORDER BY foreach."Index" ASC

This returns:

Index ArtistId AlbumId Title
0 1 1 For Those About To Rock We Salute You
0 1 4 Let There Be Rock
1 2 2 Balls to the Wall
1 2 3 Restless and Wild

It is important to point out that a LATERAL join is necessary instead of a regular join, because a lateral join preserves the necessary "for each" semantics; without it, performing other query operations like pagination using LIMIT and OFFSET in the subquery would not work correctly.

The artificial Index column is inserted into the foreach rowset to ensure that the ordering of the results matches the original ordering of the foreach array in the query request.
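
The flat result set from such a join still has to be regrouped into the nested foreach response. A minimal sketch of that regrouping follows; the FlatRow shape mirrors the columns of the SQL above, while groupByIndex and the sample data (including a third, non-matching foreach element) are assumptions made for illustration. It also assumes AlbumId is never null for a real Album row, so a null AlbumId identifies the padding row that the LEFT OUTER join emits for an element with no matches.

```typescript
type Value = string | number | null;

// One row of the flat result set produced by the lateral-join SQL.
interface FlatRow {
  Index: number;
  ArtistId: number;
  AlbumId: number | null;
  Title: string | null;
}

interface ForeachResponse {
  rows: { query: { rows: Record<string, Value>[] } }[];
}

// Regroups flat rows into one nested query response per foreach element.
// `foreachCount` is the length of the request's foreach array.
function groupByIndex(flat: FlatRow[], foreachCount: number): ForeachResponse {
  const rows: { query: { rows: Record<string, Value>[] } }[] = [];
  for (let i = 0; i < foreachCount; i++) {
    rows.push({ query: { rows: [] } });
  }
  flat.forEach((row) => {
    // Skip the null-padded row emitted for an element with no matches.
    if (row.AlbumId === null) return;
    rows[row.Index].query.rows.push({ AlbumId: row.AlbumId, Title: row.Title });
  });
  return { rows };
}

const flatRows: FlatRow[] = [
  { Index: 0, ArtistId: 1, AlbumId: 1, Title: "For Those About To Rock We Salute You" },
  { Index: 0, ArtistId: 1, AlbumId: 4, Title: "Let There Be Rock" },
  { Index: 1, ArtistId: 2, AlbumId: 2, Title: "Balls to the Wall" },
  { Index: 1, ArtistId: 2, AlbumId: 3, Title: "Restless and Wild" },
  { Index: 2, ArtistId: 999, AlbumId: null, Title: null },
];

const grouped = groupByIndex(flatRows, 3);
```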

Type Definitions

The QueryRequest TypeScript type in the reference implementation describes the valid request body payloads which may be passed to the POST /query endpoint. The response body structure is captured by the QueryResponse type.

Health endpoint

Agents must expose a /health endpoint which must return a 204 No Content HTTP response code if the agent is up and running. This does not mean that the agent is able to connect to any data source it performs queries against, only that the agent is running and can accept requests, even if some of those requests might fail because a dependent service is unavailable.

However, this endpoint can also be used to check whether the ability of the agent to talk to a particular data source is healthy. If the endpoint is sent the X-Hasura-DataConnector-Config and X-Hasura-DataConnector-SourceName headers, then the agent is expected to check that it can successfully talk to whatever data source is being specified by those headers. If it can do so, then it must return a 204 No Content response code.
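
The two modes of the endpoint can be sketched as a small decision function. Everything here is illustrative: healthStatus and ConnectivityCheck are hypothetical names, header names are assumed to arrive lower-cased, and the 500 returned for a failed data source check is an assumption, since the text above only specifies the 204 success code.

```typescript
// Header names are assumed to be lower-cased already.
type RequestHeaders = Record<string, string | undefined>;

// Hypothetical connectivity probe supplied by the agent implementation.
type ConnectivityCheck = (config: string, sourceName: string) => boolean;

const HEALTHY = 204;
// Assumption: the status for a failed data source check is not specified above.
const UNHEALTHY = 500;

function healthStatus(headers: RequestHeaders, canConnect: ConnectivityCheck): number {
  const config = headers["x-hasura-dataconnector-config"];
  const sourceName = headers["x-hasura-dataconnector-sourcename"];
  if (config === undefined || sourceName === undefined) {
    // Plain liveness check: the agent is running and can accept requests.
    return HEALTHY;
  }
  // Both headers present: also verify the specified data source is reachable.
  return canConnect(config, sourceName) ? HEALTHY : UNHEALTHY;
}

const sourceHeaders: RequestHeaders = {
  "x-hasura-dataconnector-config": "{}",
  "x-hasura-dataconnector-sourcename": "chinook",
};
```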

Reporting Errors

Any non-200 response code from an Agent (except for the /health endpoint) will be interpreted as an error. These should be handled gracefully by graphql-engine but provide limited details to users. If you wish to return structured error information to users you can return a status of 500, or 400 from the /capabilities, /schema, and /query endpoints with the following JSON format:

{
  "type": String,    // A specific error type, see below
  "message": String, // A plain-text message for display purposes
  "details": Value   // An arbitrary JSON Value containing error details
}

The available error types are:

  • mutation-constraint-violation: For when a mutation request fails because the mutation causes a violation of data constraints (for example, primary key constraint) in the data source
  • mutation-permission-check-failure: For when a permissions check fails during a mutation and the mutation is rejected
  • uncaught-error: For all other errors
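
A typed version of this error payload might look like the sketch below. The names AgentErrorType, AgentErrorBody, agentErrorBody and agentErrorStatus are hypothetical. The status mapping is also an assumption: 400 for a constraint violation is what the Mutations section specifies, while 400 for a permission check failure and 500 for uncaught errors are choices made for this example.

```typescript
type AgentErrorType =
  | "mutation-constraint-violation"
  | "mutation-permission-check-failure"
  | "uncaught-error";

interface AgentErrorBody {
  type: AgentErrorType; // A specific error type
  message: string;      // A plain-text message for display purposes
  details: unknown;     // An arbitrary JSON value containing error details
}

// Builds the JSON body; `details` defaults to an empty object.
function agentErrorBody(type: AgentErrorType, message: string, details: unknown = {}): AgentErrorBody {
  return { type, message, details };
}

// Assumed mapping: mutation errors use 400, everything else 500.
function agentErrorStatus(type: AgentErrorType): 400 | 500 {
  return type === "uncaught-error" ? 500 : 400;
}

const pkViolation = agentErrorBody(
  "mutation-constraint-violation",
  "Violation of PRIMARY KEY constraint PK_Artist.",
  { constraint_name: "PK_Artist" }
);
```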

Mutations

The POST /mutation endpoint is invoked when the user issues a mutation GraphQL request to graphql-engine, assuming the agent has declared itself capable of mutations in its capabilities. The basic structure of a mutation request is as follows:

{
  "table_relationships": [], // Any relationships between tables are described in here in the same manner as in queries
  "operations": [ // A mutation request can contain multiple mutation operations
    {
      "type": "insert", // Also: "update" and "delete"
      "returning_fields": { // The fields to return for every affected row
        "ArtistId": {
          "type": "column",
          "column": "ArtistId",
          "column_type": "number"
        }
      },
      ... // Other operation type-specific properties, detailed below
    }
  ]
}

There are three types of mutation operations: insert, update and delete. A request can involve multiple mutation operations, potentially of differing types. A mutation operation can specify returning_fields which are the fields that are expected to be returned in the response for each row affected by the mutation operation.

The response to a mutation request takes this basic structure:

{
  "operation_results": [ // There will be a result object per operation, returned here in the same order as in the request
    {
      "affected_rows": 1, // The number of rows affected by the mutation operation
      "returning": [ // The rows that were affected; each row object contains the fields requested in `returning_fields`
        {
          "FieldName": "FieldValue"
        }
      ]
    }
  ]
}

If any mutation operation causes an error, for example, if a mutation violates a constraint such as a primary key or a foreign key constraint in an RDBMS, then an error should be returned as a response in the same manner as described in the Reporting Errors section. Changes should be rolled back to the extent described by the atomicity_level declared in the agent's mutation capabilities. The error type should be mutation-constraint-violation and the HTTP response code should be 400. For example:

{
  "type": "mutation-constraint-violation",
  "message": "Violation of PRIMARY KEY constraint PK_Artist. Cannot insert duplicate key in table Artist. The duplicate key value is (1).", // Can be any helpfully descriptive error message
  "details": { // Any helpful structured error information, the below is just an example
    "constraint_name": "PK_Artist",
    "table": ["Artist"],
    "key_value": 1
  }
}

If a mutation fails because it fails a permissions check (e.g. a post-insert check), then the error type that should be used is mutation-permission-check-failure.

Insert Operations

Here's an example GraphQL mutation that inserts two artists:

mutation InsertArtists {
  insert_Artist(objects: [
    {ArtistId: 300, Name: "Taylor Swift"},
    {ArtistId: 301, Name: "Phil Collins"}
  ]) {
    affected_rows
    returning {
      ArtistId
      Name
    }
  }
}

This would result in a mutation request like this:

{
  "table_relationships": [],
  "insert_schema": [
    {
      "table": ["Artist"],
      "primary_key": ["ArtistId"],
      "fields": {
        "ArtistId": {
          "type": "column",
          "column": "ArtistId",
          "column_type": "number",
          "nullable": false
        },
        "Name": {
          "type": "column",
          "column": "Name",
          "column_type": "string",
          "nullable": false
        }
      }
    }
  ],
  "operations": [
    {
      "type": "insert",
      "table": ["Artist"],
      "rows": [
        [
          {
            "ArtistId": 300,
            "Name": "Taylor Swift"
          },
          {
            "ArtistId": 301,
            "Name": "Phil Collins"
          }
        ]
      ],
      "post_insert_check": {
        "type": "and",
        "expressions": []
      },
      "returning_fields": {
        "ArtistId": {
          "type": "column",
          "column": "ArtistId",
          "column_type": "number"
        },
        "Name": {
          "type": "column",
          "column": "Name",
          "column_type": "string"
        }
      }
    }
  ]
}

The first thing to notice is the insert_schema property at the mutation request level. This contains the definition of the fields that will be used inside any insert operation in this request on a per-table basis. The schema for the row data to insert is placed here, separate to the row data itself, in order to reduce the amount of duplicate data that would exist were it inlined into the row data structures themselves. We also have information about which columns belong to the table's primary key (if any) in the primary_key property.

So, because in this request we are inserting into the Artist table, we have the definition of what "ArtistId" and "Name" properties mean when they are found in the rows to insert for the Artist table. In this case, both fields are columns ("type": "column") with their names (column), types (column_type) and nullability (nullable) specified. There can also be an optional value_generated property that reflects the same property returned by the agent from its schema endpoint. In fact, all the schema information reflected here in the insert_schema is simply a reflection of what the agent originally has declared from its schema endpoint.

Next, let's break the insert-typed operation's properties down:

  • table: specifies the table we're inserting rows into
  • rows: An array of rows to insert. Each row is an object with properties, where what the properties correspond to (eg. column values) is defined by the insert_schema.
  • post_insert_check: The post-insert check is an expression (in the same format as Query's where property) that all inserted rows must match; otherwise, their insertion must be reverted. This expression comes from graphql-engine's permissions system. It is a "post-insert" check because it can involve joins via relationships to other tables and potentially data that is only available post-insert, such as computed columns. If the agent knows it can compute the result of such a check without actually performing an insert, it is free to do so, but it must produce a result that is indistinguishable from one computed post-insert. If the post-insert check fails, the mutation request should fail with an error of type mutation-permission-check-failure.
  • returning_fields: This specifies a list of fields to return in the response. The property takes the same format as the fields property on Queries. It is expected that the specified fields will be returned for all rows affected by the insert (ie. all inserted rows).

The result of this request would be the following response:

{
  "operation_results": [
    {
      "affected_rows": 2,
      "returning": [
        {
          "ArtistId": 300,
          "Name": "Taylor Swift"
        },
        {
          "ArtistId": 301,
          "Name": "Phil Collins"
        }
      ]
    }
  ]
}

Notice that the two affected rows in returning are the two that we inserted.
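
To make the insert, check, revert and return sequence concrete, here is a minimal in-memory sketch. It is not how the reference agent is implemented: insertRows, projectColumns and the InsertResult type are hypothetical, the post_insert_check expression is replaced by a plain predicate function, and returning_fields is reduced to a list of column names.

```typescript
type Scalar = string | number | boolean | null;
type TableRow = Record<string, Scalar>;

type InsertResult =
  | { ok: true; affected_rows: number; returning: TableRow[] }
  | { ok: false; error_type: "mutation-permission-check-failure" };

// Keeps only the named columns (a stand-in for returning_fields).
function projectColumns(row: TableRow, columns: string[]): TableRow {
  const out: TableRow = {};
  columns.forEach((column) => { out[column] = row[column]; });
  return out;
}

function insertRows(
  table: TableRow[],
  newRows: TableRow[],
  postInsertCheck: (row: TableRow) => boolean, // stand-in for post_insert_check
  returningColumns: string[]
): InsertResult {
  const sizeBefore = table.length;
  newRows.forEach((row) => { table.push(row); });
  const inserted = table.slice(sizeBefore);
  if (!inserted.every((row) => postInsertCheck(row))) {
    // Revert every row added by this operation.
    table.length = sizeBefore;
    return { ok: false, error_type: "mutation-permission-check-failure" };
  }
  return {
    ok: true,
    affected_rows: inserted.length,
    returning: inserted.map((row) => projectColumns(row, returningColumns)),
  };
}

const artistTable: TableRow[] = [{ ArtistId: 1, Name: "AC/DC" }];
const insertOk = insertRows(
  artistTable,
  [{ ArtistId: 300, Name: "Taylor Swift" }, { ArtistId: 301, Name: "Phil Collins" }],
  () => true, // the empty "and" expression in the example always passes
  ["ArtistId", "Name"]
);
const insertRejected = insertRows(
  artistTable,
  [{ ArtistId: 302, Name: "Rejected" }],
  (row) => row["ArtistId"] !== 302,
  ["ArtistId"]
);
```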

Nested Insert Operations

If a user wishes to insert multiple related rows in one go, they can issue a nested insert GraphQL mutation:

mutation InsertAlbum {
  insert_Album(objects: [
    {
      AlbumId: 400,
      Title: "Fearless",
      Artist: {
        data: {
          ArtistId: 300,
          Name: "Taylor Swift"
        }
      },
      Tracks: {
        data: [
          { TrackId: 4000, Name: "Fearless" },
          { TrackId: 4001, Name: "Fifteen" }
        ]
      }
    }
  ]) {
    affected_rows
    returning {
      AlbumId
      Title
      Artist {
        ArtistId
        Name
      }
      Tracks {
        TrackId
        Name
      }
    }
  }
}

This would result in the following request:

{
  "table_relationships": [
    {
      "source_table": ["Album"],
      "relationships": {
        "Artist": {
          "target_table": ["Artist"],
          "relationship_type": "object",
          "column_mapping": {
            "ArtistId": "ArtistId"
          }
        },
        "Tracks": {
          "target_table": ["Track"],
          "relationship_type": "array",
          "column_mapping": {
            "AlbumId": "AlbumId"
          }
        }
      }
    }
  ],
  "insert_schema": [
    {
      "table": ["Album"],
      "fields": {
        "AlbumId": {
          "type": "column",
          "column": "AlbumId",
          "column_type": "number"
        },
        "Title": {
          "type": "column",
          "column": "Title",
          "column_type": "string"
        },
        "Artist": {
          "type": "object_relation",
          "relationship": "Artist",
          "insert_order": "before_parent"
        },
        "Tracks": {
          "type": "array_relation",
          "relationship": "Tracks"
        }
      }
    },
    {
      "table": ["Artist"],
      "fields": {
        "ArtistId": {
          "type": "column",
          "column": "ArtistId",
          "column_type": "number"
        },
        "Name": {
          "type": "column",
          "column": "Name",
          "column_type": "string"
        }
      }
    },
    {
      "table": ["Track"],
      "fields": {
        "TrackId": {
          "type": "column",
          "column": "TrackId",
          "column_type": "number"
        },
        "Name": {
          "type": "column",
          "column": "Name",
          "column_type": "string"
        }
      }
    }
  ],
  "operations": [
    {
      "type": "insert",
      "table": ["Album"],
      "rows": [
        [
          {
            "AlbumId": 400,
            "Title": "Fearless",
            "Artist": {
              "ArtistId": 300,
              "Name": "Taylor Swift"
            },
            "Tracks": [
              {
                "TrackId": 4000,
                "Name": "Fearless"
              },
              {
                "TrackId": 4001,
                "Name": "Fifteen"
              }
            ]
          }
        ]
      ],
      "post_insert_check": {
        "type": "and",
        "expressions": []
      },
      "returning_fields": {
        "AlbumId": {
          "type": "column",
          "column": "AlbumId",
          "column_type": "number"
        },
        "Title": {
          "type": "column",
          "column": "Title",
          "column_type": "string"
        },
        "Artist": {
          "type": "relationship",
          "relationship": "Artist",
          "query": {
            "fields": {
              "ArtistId": {
                "type": "column",
                "column": "ArtistId",
                "column_type": "number"
              },
              "Name": {
                "type": "column",
                "column": "Name",
                "column_type": "string"
              }
            }
          }
        },
        "Tracks": {
          "type": "relationship",
          "relationship": "Tracks",
          "query": {
            "fields": {
              "TrackId": {
                "type": "column",
                "column": "TrackId",
                "column_type": "number"
              },
              "Name": {
                "type": "column",
                "column": "Name",
                "column_type": "string"
              }
            }
          }
        }
      }
    }
  ]
}

Note that there are two new types of fields in the insert_schema in this request to capture the nested inserts:

  • object_relation: This captures a nested insert across an object relationship. In this case, we're inserting the related Artist row.
    • relationship: The name of the relationship across which to insert the related row. The information about this relationship can be looked up in table_relationships.
    • insert_order: This can be either before_parent or after_parent and indicates whether or not the related row needs to be inserted before the parent row or after it.
  • array_relation: This captures a nested insert across an array relationship. In this case, we're inserting the related Tracks rows.
    • relationship: The name of the relationship across which to insert the related rows. The information about this relationship can be looked up in table_relationships.

The agent is expected to set the necessary values of foreign key columns itself when inserting all the rows. In this example, the agent would:

  • First insert the Artist.
  • Then insert the Album, using the Artist.ArtistId primary key column for the Album.ArtistId foreign key column.
  • Then insert the two Track rows, using the Album.AlbumId primary key column for the Track.AlbumId foreign key column.

This is particularly important where the values of primary keys are not known until they are generated in the database itself and cannot be provided by the user.

Note that in returning_fields we have used fields of type relationship to navigate relationships in the returned affected rows. This works in the same way as in Queries.

The response to this mutation request would be:

{
  "operation_results": [
    {
      "affected_rows": 4,
      "returning": [
        {
          "AlbumId": 400,
          "Title": "Fearless",
          "Artist": {
            "rows": [
              {
                "ArtistId": 300,
                "Name": "Taylor Swift"
              }
            ]
          },
          "Tracks": {
            "rows": [
              {
                "TrackId": 4000,
                "Name": "Fearless"
              },
              {
                "TrackId": 4001,
                "Name": "Fifteen"
              }
            ]
          }
        }
      ]
    }
  ]
}

Note that relationship fields are returned in the response in the same fashion as they are in a Query response; ie. inside a nested object with rows and aggregates (if specified) properties.
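
The insertion ordering and foreign key propagation described above can be sketched with a small recursive function over in-memory tables. This is only an illustration under simplifying assumptions: insertNested and the Db structure are hypothetical, tables are keyed by a single name rather than a name array, and checks, primary key generation and returning_fields are omitted.

```typescript
type Scalar = string | number | null;
type FlatRow = Record<string, Scalar>;
interface NestedRow { [field: string]: Scalar | NestedRow | NestedRow[]; }

type InsertField =
  | { type: "column"; column: string }
  | { type: "object_relation"; relationship: string; insert_order: "before_parent" | "after_parent" }
  | { type: "array_relation"; relationship: string };

// column_mapping maps source (parent) columns to target (related) columns.
interface Relationship { target_table: string; column_mapping: Record<string, string>; }

interface Db {
  tables: Record<string, FlatRow[]>;
  insertSchema: Record<string, Record<string, InsertField>>;
  relationships: Record<string, Record<string, Relationship>>;
}

// Inserts `row` and its nested rows; `order` records the sequence of inserts.
function insertNested(db: Db, tableName: string, row: NestedRow, presets: FlatRow, order: string[]): FlatRow {
  const flat: FlatRow = {};
  Object.keys(presets).forEach((column) => { flat[column] = presets[column]; });
  const afterParent: (() => void)[] = [];

  Object.keys(row).forEach((fieldName) => {
    const field = db.insertSchema[tableName][fieldName];
    const value = row[fieldName];
    if (field.type === "column") {
      flat[field.column] = value as Scalar;
      return;
    }
    const rel = db.relationships[tableName][field.relationship];
    const mapping = rel.column_mapping;
    // Foreign key values a related row takes from the already inserted parent.
    const childPresets = (): FlatRow => {
      const out: FlatRow = {};
      Object.keys(mapping).forEach((src) => { out[mapping[src]] = flat[src]; });
      return out;
    };
    if (field.type === "object_relation" && field.insert_order === "before_parent") {
      // Insert the related row first, then copy its key into the parent.
      const related = insertNested(db, rel.target_table, value as NestedRow, {}, order);
      Object.keys(mapping).forEach((src) => { flat[src] = related[mapping[src]]; });
    } else if (field.type === "object_relation") {
      afterParent.push(() => { insertNested(db, rel.target_table, value as NestedRow, childPresets(), order); });
    } else {
      (value as NestedRow[]).forEach((child) => {
        afterParent.push(() => { insertNested(db, rel.target_table, child, childPresets(), order); });
      });
    }
  });

  db.tables[tableName].push(flat);
  order.push(tableName);
  afterParent.forEach((run) => run());
  return flat;
}

const nestedDb: Db = {
  tables: { Album: [], Artist: [], Track: [] },
  insertSchema: {
    Album: {
      AlbumId: { type: "column", column: "AlbumId" },
      Title: { type: "column", column: "Title" },
      Artist: { type: "object_relation", relationship: "Artist", insert_order: "before_parent" },
      Tracks: { type: "array_relation", relationship: "Tracks" },
    },
    Artist: { ArtistId: { type: "column", column: "ArtistId" }, Name: { type: "column", column: "Name" } },
    Track: { TrackId: { type: "column", column: "TrackId" }, Name: { type: "column", column: "Name" } },
  },
  relationships: {
    Album: {
      Artist: { target_table: "Artist", column_mapping: { ArtistId: "ArtistId" } },
      Tracks: { target_table: "Track", column_mapping: { AlbumId: "AlbumId" } },
    },
  },
};

const insertOrder: string[] = [];
insertNested(nestedDb, "Album", {
  AlbumId: 400,
  Title: "Fearless",
  Artist: { ArtistId: 300, Name: "Taylor Swift" },
  Tracks: [{ TrackId: 4000, Name: "Fearless" }, { TrackId: 4001, Name: "Fifteen" }],
}, {}, insertOrder);
```

In this sketch the Album row receives its ArtistId from the Artist row inserted before it, and each Track row receives its AlbumId from the Album row inserted before it, matching the three steps listed above.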

Update Operations

Here's an example of a mutation that updates a Track row:

mutation UpdateTrack {
  update_Track(
    where: { TrackId: { _eq: 1 } },
    _set: { UnitPrice: 2.50 },
    _inc: { Milliseconds: 100 }
    ) {
    affected_rows
    returning {
      TrackId
      Milliseconds
    }
  }
}

This would get translated into a mutation request like so:

{
  "table_relationships": [],
  "operations": [
    {
      "type": "update",
      "table": ["Track"],
      "where": {
        "type": "binary_op",
        "operator": "equal",
        "column": {
          "name": "TrackId",
          "column_type": "number"
        },
        "value": {
          "type": "scalar",
          "value": 1,
          "value_type": "number"
        }
      },
      "updates": [
        {
          "type": "set",
          "column": "UnitPrice",
          "value": 2.50,
          "value_type": "number"
        },
        {
          "type": "custom_operator",
          "operator_name": "inc",
          "column": "Milliseconds",
          "value": 100,
          "value_type": "number"
        }
      ],
      "post_update_check": {
        "type": "and",
        "expressions": []
      },
      "returning_fields": {
        "TrackId": {
          "type": "column",
          "column": "TrackId",
          "column_type": "number"
        },
        "Milliseconds": {
          "type": "column",
          "column": "Milliseconds",
          "column_type": "number"
        }
      }
    }
  ]
}

Breaking down the properties in the update-typed mutation operation:

  • table: specifies the table we're updating rows in
  • where: An expression (same as the expression in a Query's where property) that is used to select the matching rows to update
  • updates: An array of RowUpdate objects that describe the individual updates to be applied to each row that matches the expression in where. There are two types of RowUpdates:
    • set - This sets the specified column to the specified value
    • custom_operator - This defines a mutation to a column using a custom update column operator defined by the agent in its capabilities. The operator_name property specifies which operator was used. In this case, inc is the custom operator.
  • post_update_check: The post-update check is an expression (in the same format as Query's where property) that all updated rows must match; otherwise, the changes made must be reverted. This expression comes from graphql-engine's permissions system. It is a "post-update" check because it operates on the post-update data (such as the results of increment updates), can involve joins via relationships to other tables, and can potentially involve data that is only available post-update, such as computed columns. If the agent knows it can compute the result of such a check without actually performing an update, it is free to do so, but it must produce a result that is indistinguishable from one computed post-update. If the post-update check fails, the mutation request should fail with an error of type mutation-permission-check-failure.
  • returning_fields: This specifies a list of fields to return in the response. The property takes the same format as the fields property on Queries. It is expected that the specified fields will be returned for all rows affected by the update (ie. all updated rows).

Update operations return responses that are the same as insert operations, except the affected rows in returning are naturally the updated rows instead.
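
The following sketch applies set and custom_operator updates to in-memory rows. It is illustrative only: updateRows, applyUpdates and the customOperators registry are hypothetical names, the where and post_update_check expressions are replaced by predicate functions, and returning_fields is reduced to a list of column names. The check is evaluated against the computed rows before anything is written, which the text above permits as long as the outcome matches a true post-update check.

```typescript
type Scalar = string | number | boolean | null;
type TableRow = Record<string, Scalar>;

type RowUpdate =
  | { type: "set"; column: string; value: Scalar }
  | { type: "custom_operator"; operator_name: string; column: string; value: Scalar };

type UpdateResult =
  | { ok: true; affected_rows: number; returning: TableRow[] }
  | { ok: false; error_type: "mutation-permission-check-failure" };

// Hypothetical operator registry; "inc" mirrors the custom operator in the example.
const customOperators: Record<string, (current: Scalar, operand: Scalar) => Scalar> = {
  inc: (current, operand) => Number(current) + Number(operand),
};

// Returns an updated copy of `row`; the original row is left untouched.
function applyUpdates(row: TableRow, updates: RowUpdate[]): TableRow {
  const next: TableRow = {};
  Object.keys(row).forEach((column) => { next[column] = row[column]; });
  updates.forEach((update) => {
    if (update.type === "set") {
      next[update.column] = update.value;
    } else {
      const operator = customOperators[update.operator_name];
      if (operator === undefined) throw new Error("Unknown operator: " + update.operator_name);
      next[update.column] = operator(next[update.column], update.value);
    }
  });
  return next;
}

function updateRows(
  table: TableRow[],
  where: (row: TableRow) => boolean,
  updates: RowUpdate[],
  postUpdateCheck: (row: TableRow) => boolean,
  returningColumns: string[]
): UpdateResult {
  const changes: { index: number; row: TableRow }[] = [];
  table.forEach((row, index) => {
    if (where(row)) changes.push({ index, row: applyUpdates(row, updates) });
  });
  // Nothing is written unless every updated row passes the check.
  if (!changes.every((change) => postUpdateCheck(change.row))) {
    return { ok: false, error_type: "mutation-permission-check-failure" };
  }
  changes.forEach((change) => { table[change.index] = change.row; });
  const returning = changes.map((change) => {
    const out: TableRow = {};
    returningColumns.forEach((column) => { out[column] = change.row[column]; });
    return out;
  });
  return { ok: true, affected_rows: changes.length, returning };
}

const trackTable: TableRow[] = [
  { TrackId: 1, UnitPrice: 0.99, Milliseconds: 1000 },
  { TrackId: 2, UnitPrice: 0.99, Milliseconds: 2000 },
];
const updateOk = updateRows(
  trackTable,
  (row) => row["TrackId"] === 1,
  [
    { type: "set", column: "UnitPrice", value: 2.5 },
    { type: "custom_operator", operator_name: "inc", column: "Milliseconds", value: 100 },
  ],
  () => true,
  ["TrackId", "Milliseconds"]
);
```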

Delete Operations

Here's an example of a mutation that deletes a Track row:

mutation DeleteTrack {
  delete_Track(
    where: { TrackId: { _eq: 1 } },
    ) {
    affected_rows
    returning {
      TrackId
      Milliseconds
    }
  }
}

This would cause a mutation request to be sent that looks like this:

{
  "table_relationships": [],
  "operations": [
    {
      "type": "delete",
      "table": ["Track"],
      "where": {
        "type": "binary_op",
        "operator": "equal",
        "column": {
          "name": "TrackId",
          "column_type": "number"
        },
        "value": {
          "type": "scalar",
          "value": 1,
          "value_type": "number"
        }
      },
      "returning_fields": {
        "TrackId": {
          "type": "column",
          "column": "TrackId",
          "column_type": "number"
        },
        "Milliseconds": {
          "type": "column",
          "column": "Milliseconds",
          "column_type": "number"
        }
      }
    }
  ]
}

Breaking down the properties in the delete-typed mutation operation:

  • table: specifies the table we're deleting rows from
  • where: An expression (same as the expression in a Query's where property) that is used to select the matching rows to delete
  • returning_fields: This specifies a list of fields to return in the response. The property takes the same format as the fields property on Queries. It is expected that the specified fields will be returned for all rows affected by the deletion (ie. all deleted rows).

Delete operations return responses that are the same as insert and update operations, except the affected rows in returning are the deleted rows instead.
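
A delete operation over in-memory rows can be sketched as follows. As with the other sketches, deleteRows is a hypothetical helper, the where expression is replaced by a predicate function, and returning_fields is reduced to a list of column names.

```typescript
type Scalar = string | number | boolean | null;
type TableRow = Record<string, Scalar>;

interface DeleteResult {
  affected_rows: number;
  returning: TableRow[];
}

function deleteRows(
  table: TableRow[],
  where: (row: TableRow) => boolean,
  returningColumns: string[]
): DeleteResult {
  const deleted: TableRow[] = [];
  const kept: TableRow[] = [];
  table.forEach((row) => { (where(row) ? deleted : kept).push(row); });
  // Replace the table contents in place with the surviving rows.
  table.length = 0;
  kept.forEach((row) => { table.push(row); });
  // The deleted rows are what is reported back in "returning".
  const returning = deleted.map((row) => {
    const out: TableRow = {};
    returningColumns.forEach((column) => { out[column] = row[column]; });
    return out;
  });
  return { affected_rows: deleted.length, returning };
}

const tracksToPrune: TableRow[] = [
  { TrackId: 1, Milliseconds: 343719 },
  { TrackId: 2, Milliseconds: 342562 },
];
const deleteOutcome = deleteRows(tracksToPrune, (row) => row["TrackId"] === 1, ["TrackId", "Milliseconds"]);
```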

Datasets

The /datasets resource is available to use in order to create new databases/schemas from templates.

Datasets are identified by abstract names. A name refers either to a database-schema template, which can be cloned, or to a clone, which can be used via its config and later deleted. This feature is required for testing the mutations feature, but may also have non-test uses, for example spinning up interactive demo projects.

The /datasets resource has the following methods:

  • GET /datasets/templates/:template_name -> {"exists": true|false}
  • POST /datasets/clones/:clone_name {"from": template_name} -> {"config": {...}}
  • DELETE /datasets/clones/:clone_name -> {"message": "success"}

The POST method is the most significant way to interact with the API. It allows for cloning a dataset template to a new name. The new name can be used to delete the dataset, and the config returned from the POST API call can be used as the config header for non-dataset interactions such as queries.
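
The lifecycle of templates and clones can be modelled in memory as below. This is a sketch of the three methods' behavior only, not of an HTTP server. The DatasetStore and SourceConfig shapes are hypothetical (the real config is agent-specific), and the conditions that produce a 400, such as an unknown template or a duplicate clone name, are assumptions; the diagram only shows that a 400 is possible.

```typescript
// Hypothetical source config; the real shape is agent-specific.
interface SourceConfig { db: string; }

interface DatasetStore {
  templates: string[];
  clones: Record<string, SourceConfig>;
}

type DatasetReply<T> = { httpStatus: 200; body: T } | { httpStatus: 400 };

function hasClone(store: DatasetStore, cloneName: string): boolean {
  return Object.prototype.hasOwnProperty.call(store.clones, cloneName);
}

// GET /datasets/templates/:template_name
function getTemplate(store: DatasetStore, templateName: string): { exists: boolean } {
  return { exists: store.templates.indexOf(templateName) !== -1 };
}

// POST /datasets/clones/:clone_name {"from": template_name}
function createClone(store: DatasetStore, cloneName: string, from: string): DatasetReply<{ config: SourceConfig }> {
  // Assumed failure conditions: unknown template or clone name already in use.
  if (store.templates.indexOf(from) === -1 || hasClone(store, cloneName)) {
    return { httpStatus: 400 };
  }
  const config: SourceConfig = { db: cloneName };
  store.clones[cloneName] = config;
  return { httpStatus: 200, body: { config } };
}

// DELETE /datasets/clones/:clone_name
function deleteClone(store: DatasetStore, cloneName: string): DatasetReply<{ message: string }> {
  if (!hasClone(store, cloneName)) {
    return { httpStatus: 400 };
  }
  delete store.clones[cloneName];
  return { httpStatus: 200, body: { message: "success" } };
}

const datasetStore: DatasetStore = { templates: ["Chinook"], clones: {} };
const cloneReply = createClone(datasetStore, "test_clone", "Chinook");
```

The config returned by createClone plays the role of the value a caller would then send as the config header on /schema, /query and /mutation requests.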

The following diagram shows the interactions between the various datatypes and resource methods:

flowchart TD;
    NAME["Dataset Name"] --> GET["GET /datasets/templates/:template_name"];
    style NAME stroke:#0f3,stroke-width:2px
    NAME -- clone_name --> POST;
    NAME -- from --> POST["POST /datasets/clones/:clone_name { from: TEMPLATE_NAME }"];
    GET --> EXISTS["{ exists: true }"];
    GET --> EXISTSF["{ exists: false }"];
    GET --> FAILUREG["400"];
    style FAILUREG stroke:#f33,stroke-width:2px
    POST --> FAILUREP["400"];
    style FAILUREP stroke:#f33,stroke-width:2px
    NAME --> DELETE["DELETE /datasets/clones/:clone_name"];
    POST --> CONFIG["Source Config"];
    style CONFIG stroke:#0f3,stroke-width:2px
    DELETE --> SUCCESSD["{ message: 'success' }"];
    DELETE --> FAILURED["400"];
    style FAILURED stroke:#f33,stroke-width:2px
    CONFIG --> SCHEMA["POST /schema"];
    CONFIG --> QUERY["POST /query"];
    CONFIG --> MUTATION["POST /mutation"];