Add new documentation folder and two documentation files

As discussed on Slack, we're thinking of updating the documentation that was in the old graphql-engine wiki and bringing it into the repo. This PR creates the new folder and adds two pieces of documentation: a high-level overview of the system, and a description of our schema-building code (the so-called "PDV parsers").

One thing to be discussed: how do we do cross-file references? To an extent, this will depend on what tool we use for rendering them and on where we upload the generated files.

Another point is: we should update our CI to avoid running tests on changes in `server/documentation`.

Rendered:
- [high level overview](https://github.com/hasura/graphql-engine-mono/blob/nicuveo/server-docs/server/documentation/overview.md)
- [schema code](https://github.com/hasura/graphql-engine-mono/blob/nicuveo/server-docs/server/documentation/schema.md)

https://github.com/hasura/graphql-engine-mono/pull/2219

GitOrigin-RevId: f729616d28e8f2f30c0a07583b53460510bafd80
Antoine Leblanc 2021-09-15 11:03:31 +01:00 committed by hasura-bot
parent 982b5a3d15
commit 75d374bf26
2 changed files with 575 additions and 0 deletions

server/documentation/overview.md

@@ -0,0 +1,190 @@
## Table of contents
<!--
Please make sure you update the table of contents when modifying this file. If
you're using emacs, you can automatically do so using the command mentioned in
the generated comment below (provided by the package markdown-toc).
-->
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
- [High-level architecture overview](#high-level-architecture-overview)
- [Schema cache](#schema-cache)
- [Schema parsers](#schema-parsers)
- [Query execution](#query-execution)
- [High-level code structure](#high-level-code-structure)
- [Non-Hasura code](#non-hasura-code)
- [Hasura.Base](#hasurabase)
- [Hasura.Server](#hasuraserver)
- [Hasura.Backends](#hasurabackends)
- [Hasura.RQL](#hasurarql)
- [Hasura.RQL.DML](#hasurarqldml)
- [Hasura.RQL.IR](#hasurarqlir)
- [Hasura.RQL.DDL](#hasurarqlddl)
- [Hasura.RQL.Types](#hasurarqltypes)
- [Hasura.GraphQL](#hasuragraphql)
- [Hasura.GraphQL.Parser](#hasuragraphqlparser)
- [Hasura.GraphQL.Schema](#hasuragraphqlschema)
- [Hasura.GraphQL.Execute](#hasuragraphqlexecute)
- [Hasura.GraphQL.Transport](#hasuragraphqltransport)
<!-- markdown-toc end -->
## High-level architecture overview
This diagram shows the high-level components of the system.
![high-level architecture overview](imgs/architecture.png)
### Schema cache
The _schema cache_ is a live, annotated copy of our metadata that lives in the
Haskell process. It is defined in `Hasura.RQL.Types.SchemaCache`. It contains a
`SourceInfo` for each source that has been added (see
`Hasura.RQL.Types.Source`), which holds everything we know about that source:
tracked tables, tracked functions, connection information, and so on. It also
contains information about tracked remote schemas. Most importantly, it
contains the cached version of the schema parsers. More information in the
[Metadata](#metadata) section.
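As a rough mental model, here is a deliberately simplified sketch of that shape; every type and field name below is made up for illustration, and the real definitions carry far more information:
```haskell
import Data.HashMap.Strict (HashMap)
import Data.Text (Text)

type SourceName = Text
type RoleName   = Text

-- Stand-ins for the real, much richer types:
data TableInfo      = TableInfo
data FunctionInfo   = FunctionInfo
data ConnectionInfo = ConnectionInfo
data RemoteSchema   = RemoteSchema
data RoleContext    = RoleContext

data SourceInfo = SourceInfo
  { siTables     :: HashMap Text TableInfo     -- all tracked tables
  , siFunctions  :: HashMap Text FunctionInfo  -- all tracked functions
  , siConnection :: ConnectionInfo             -- how to reach the source
  }

data SchemaCache = SchemaCache
  { scSources       :: HashMap SourceName SourceInfo
  , scRemoteSchemas :: HashMap Text RemoteSchema
  , scContexts      :: HashMap RoleName RoleContext  -- the cached schema parsers, per role
  }
```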
### Schema parsers
We have a unified piece of code that is used to generate the GraphQL schema and
its corresponding parsers at the same time from information in the schema cache
(see our schema cache documentation (TODO: add link) for more information). That
code lives in `Hasura/GraphQL/Schema`. The result is stored back in the schema
cache, as a hashmap from role name to the corresponding "context" (its set of
parsers).
The parsers themselves transform the incoming GraphQL query into an intermediary
representation, defined in `Hasura.RQL.IR`.
### Query execution
When a GraphQL query is received, it hits the transport layer (HTTP or
WebSocket, see `Hasura.GraphQL.Transport`). Based on the user's role, the
correct parser is taken from the schema cache and applied to the query to
translate it into the IR. We treat each "root field" independently (see
`RootField` in `Hasura.RQL.IR.Root`).
From the IR, we can generate an execution plan: the set of monadic actions that
correspond to each root field; that code lives in
`Hasura.GraphQL.Execute`. After that, the transport layer, which received and
translated the query, runs the generated actions and joins the results.
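In very simplified terms, that per-request flow can be pictured as follows; everything in this sketch is a stand-in for the real machinery in `Hasura.GraphQL.Transport` and `Hasura.GraphQL.Execute`, not actual engine code:
```haskell
-- A deliberately abstract sketch of the flow described above.
processQuery
  :: Monad m
  => (query -> m [rootField])  -- the role's parsers: GraphQL AST -> IR, one entry per root field
  -> (rootField -> m result)   -- generate and run the execution plan of one root field
  -> ([result] -> response)    -- join the individual results into one response
  -> query
  -> m response
processQuery parse execute joinResults query = do
  rootFields <- parse query
  results    <- traverse execute rootFields
  pure (joinResults results)
```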
## High-level code structure
The code structure isn't perfect and has suffered from the heavy changes made to
the codebase over time. Here's a rough outline that points to both the current
structure and, to some extent, what the structure should be.
### Non-Hasura code
Code outside of the `Hasura` folder (in `Control` and `Data`, for instance)
consists of standalone libraries that could be open-sourced, as they are not
specific to our project; `Control.Arrow.Trans` is one example. Most commonly, we
extend existing third-party libraries to add useful missing functions; see for
instance:
- `Data.Text.Extended`
- `Control.Concurrent.Extended`
As a rule, those "Extended" modules should follow these guidelines:
- they re-export the original module they extend, so that the rest of the code can always import the extended module
- they do not depend on any `Hasura` code, with the (arguable) exception of the `Prelude`
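As an illustration of the pattern, here is what a hypothetical `Data.List.Extended` could look like; the module and its helper are invented for this example, but `Data.Text.Extended` and friends follow the same shape:
```haskell
module Data.List.Extended
  ( module Data.List  -- re-export the module we extend...
  , duplicates        -- ...alongside our additions
  ) where

import           Data.List
import qualified Data.Set as Set

-- | Elements that appear more than once in the input, each reported once.
duplicates :: Ord a => [a] -> [a]
duplicates = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = x : go (Set.delete x seen) xs
      | otherwise           = go (Set.insert x seen) xs
```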
### Hasura.Base
The goal of `Hasura.Base` is to be the place where all "base" code lives: code
that does not depend on any other Hasura code but the Prelude, but that is
specific enough that it doesn't belong outside of the `Hasura` folder. It
currently contains, for instance:
- `Hasura.Base.Error`, which defines our error-handling code, and the one error
sum type we use throughout the entire codebase
- `Hasura.Base.Instances`, which defines all missing instances from third-party
code
More code will be moved into it as we make progress. For instance, tracing and
logging could belong in `Base`, as they are "base" pieces of code that we want
the entire codebase to have access to.
### Hasura.Server
`Hasura.Server` contains all of the network stack, the initialisation code, and
the endpoints' routes. APIs are defined in `Hasura.Server.API`, authentication
code in `Hasura.Server.Auth`.
### Hasura.Backends
Most of the code in `Hasura.GraphQL` is generic, and expressed in a
backend-agnostic way. The code for each backend lives in that backend's
`Hasura.Backends.[Name]` folder. More details in our backend architecture
documentation (TODO: insert link here).
### Hasura.RQL
Before the graphql-engine was the graphql-engine, it supported a JSON-based API
called RQL. A lot of the code still lives in the RQL "primordial soup".
#### Hasura.RQL.DML
This DML is the aforementioned JSON API: this folder contains the data
structures that represent old-style, Postgres-only, table-level Select/Insert/Update/Delete requests.
This API is deprecated, not used by any of our users to our knowledge,
and is only kept in the code because the console still uses it.
#### Hasura.RQL.IR
This is our intermediary representation, into which an incoming query (RQL or
GraphQL) gets translated.
#### Hasura.RQL.DDL
This is where our _Metadata API_'s code lives: the API, as described by
`Hasura.Server.API.Metadata.RQLMetadataV1`, is routed to corresponding functions
in `Hasura.RQL.DDL`.
#### Hasura.RQL.Types
This is a somewhat outdated folder that we need to break down into individual
components. As of now, this folder contains all the "base types" of our codebase
and, despite the folder's name, a lot of the code of the engine's core
behaviour. This includes, among others, most of our metadata code: our
representation types (how we represent information about tables or columns, for
instance) and some of the metadata building process (including all code
related to the internal dependency graph of metadata objects).
### Hasura.GraphQL
This folder contains most of the GraphQL API stack; specifically:
#### Hasura.GraphQL.Parser
This folder is where the schema-building combinators are defined. It is intended
as an almost standalone library (see the
[schema building section](schema.md#building-the-schema)), used by our schema code.
#### Hasura.GraphQL.Schema
In turn, this is where the aforementioned parsers are used to build the GraphQL
schema from our metadata. The folder itself contains all the individual
components of our schema: how to construct a parser for a table's selection set,
for a remote schema's input arguments, and so on. It's in the root file,
`Hasura/GraphQL/Schema.hs`, that they're all put together to build the schema;
see more information about this in our [schema documentation](schema.md).
#### Hasura.GraphQL.Execute
While most of the actual translation from the `IR` into a SQL query string is
backend-specific, the top-level processing of GraphQL requests (be they queries,
mutations, or subscriptions) is backend-agnostic and lives in
`Hasura.GraphQL.Execute`.
At the time of this writing it also still contains Postgres-specific code that has yet to
be generalized.
Furthermore, it contains all code related to the execution of queries targeting
remote schemas, and high-level code for remote joins.
#### Hasura.GraphQL.Transport
The code in this folder is both the entry point of query processing
(`Transport` calls `Execute`, which calls the parsers) and the place where the
root fields are actually executed: each file (HTTP and WebSocket) contains the
long case switch that decides how to execute each step over the network.

server/documentation/schema.md

@@ -0,0 +1,385 @@
## Table of contents
<!--
Please make sure you update the table of contents when modifying this file. If
you're using emacs, you can automatically do so using the command mentioned in
the generated comment below (provided by the package markdown-toc).
-->
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
- [Terminology](#terminology)
- [The `Parser` type](#the-parser-type)
- [Output](#output)
- [Input](#input)
- [Recursion, and tying the knot](#recursion-and-tying-the-knot)
- [The GraphQL context](#the-graphql-context)
<!-- markdown-toc end -->
## Building the schema
We use the same piece of code to generate the GraphQL schema and to parse
incoming queries against it, to ensure those two parts of the code are always
consistent. In practice, we do this by building _introspectable_ parsers, in the
style of parser combinators, which turn an incoming GraphQL AST into
our internal representation ([IR](#ir)).
### Terminology
The schema building code takes as input our metadata: what sources do
we have, what tables are tracked, what columns do they have... and
builds the corresponding GraphQL schema. More precisely, its output is
a series of _parsers_. That term is ambiguous, so it deserves clarification:
an incoming request is first parsed from a raw string into
a GraphQL AST using our
[graphql-parser-hs](https://github.com/hasura/graphql-parser-hs)
library. At that point, no semantic analysis has been performed:
the output of that phase is a GraphQL document, a plain AST.
The second step is to apply
those "schema parsers": their input is that GraphQL AST, and their
output is a semantic representation of the query (see the
[Output](#output) section).
To summarize: ahead of time, based on our metadata, we generate
*schema parsers*: they will parse an incoming GraphQL AST into a
transformed semantic AST, based on whether that incoming query is
consistent with our metadata.
### The `Parser` type
We have different types depending on what we're parsing: `Parser` for types in
the GraphQL schema, `FieldParser` for a field in an output type, and
`InputFieldsParser` for fields in input types. All three of them share a
similar structure: they combine static type information with the actual parsing
function:
```haskell
data Parser n a = Parser
  { parserType :: TypeInfo
  , parserFunc :: ParserInput -> n a
  }
```
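For comparison, the other two follow the same pattern, pairing static schema information with a parsing function; in similarly simplified (and not real) form, where `FieldInfo`, `Field`, `InputFieldInfo`, and `InputFields` stand in for the real schema and AST types:
```haskell
data FieldParser n a = FieldParser
  { fieldInfo :: FieldInfo     -- static description of one output field
  , fieldFunc :: Field -> n a  -- parses one field of a selection set
  }

data InputFieldsParser n a = InputFieldsParser
  { inputFieldsInfo :: [InputFieldInfo]    -- static description of a group of input fields
  , inputFieldsFunc :: InputFields -> n a  -- parses the whole group at once
  }
```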
The GraphQL schema is a graph of types, stemming from one of the three roots: the
`query_root`, `mutation_root`, and `subscription_root` types. Consequently, if
we correctly build the `Parser` for the `query_root` type, then its `TypeInfo`
will be a "root" of the full graph.
What our combinators do is recursively build both at the same time. For
instance, consider `nullable` from `Hasura.GraphQL.Parser.Internal.Parser` (here
simplified a bit for readability):
```haskell
nullable :: MonadParse m => Parser m a -> Parser m (Maybe a)
nullable parser = Parser
  { pType = nullableType $ pType parser
  , pParser = \case
      JSONValue A.Null -> pure Nothing
      GraphQLValue VNull -> pure Nothing
      value -> Just <$> pParser parser value
  }
```
Given a parser for a value type `a`, this function translates it into a parser
of `Maybe a` that tolerates "null" values and updates its internal type
information to transform the corresponding GraphQL non-nullable `TypeName!` into
a nullable `TypeName`.
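For example, using the simplified signatures above, and assuming an `int` parser for the built-in `Int` scalar (exposed as the non-nullable `Int!`), one could write:
```haskell
-- Accepts an integer or null; its GraphQL type is the nullable `Int`.
optionalInt :: MonadParse m => Parser m (Maybe Int32)
optionalInt = nullable int
```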
### Output
While the parsers keep track of the GraphQL types, their output is our IR: we
transform the incoming GraphQL AST into a semantic representation. This is
clearly visible with input fields, such as field arguments. For instance, this
is the definition of the parser for the arguments to a table (here again
slightly simplified for readability):
```haskell
tableArguments
  :: (MonadSchema m, MonadParse n)
  => SourceName      -- ^ name of the source we're building the schema for
  -> TableInfo b     -- ^ internal information about the given table (e.g. columns)
  -> SelPermInfo b   -- ^ selection permissions for that table
  -> m (
       -- parser for a group of input fields, such as arguments to a field;
       -- has an Applicative instance, to allow writing one parser for a group
       -- of arguments
       InputFieldsParser n (
         -- internal representation of the arguments to a table select
         IR.SelectArgs b
       )
     )
tableArguments sourceName tableInfo selectPermissions = do
  -- construct other parsers in the outer `m` monad
  whereParser    <- tableWhereArg    sourceName tableInfo selectPermissions
  orderByParser  <- tableOrderByArg  sourceName tableInfo selectPermissions
  distinctParser <- tableDistinctArg sourceName tableInfo selectPermissions
  -- combine them using an "applicative do"
  pure do
    whereArg    <- whereParser
    orderByArg  <- orderByParser
    limitArg    <- tableLimitArg
    offsetArg   <- tableOffsetArg
    distinctArg <- distinctParser
    pure $ IR.SelectArgs
      { IR._saWhere    = whereArg
      , IR._saOrderBy  = orderByArg
      , IR._saLimit    = limitArg
      , IR._saOffset   = offsetArg
      , IR._saDistinct = distinctArg
      }
```
Running the parser on the input will yield the `SelectArgs`; if used for a field
named `article`, it will result in the following GraphQL schema when introspected
(null fields omitted for brevity):
```json
{
  "fields": [
    {
      "name": "article",
      "args": [
        {
          "name": "distinct_on",
          "type": {
            "name": null,
            "kind": "LIST",
            "ofType": {
              "name": null,
              "kind": "NON_NULL",
              "ofType": {
                "name": "article_select_column",
                "kind": "ENUM"
              }
            }
          }
        },
        {
          "name": "limit",
          "type": {
            "name": "Int",
            "kind": "SCALAR",
            "ofType": null
          }
        },
        {
          "name": "offset",
          "type": {
            "name": "Int",
            "kind": "SCALAR",
            "ofType": null
          }
        },
        {
          "name": "order_by",
          "type": {
            "name": null,
            "kind": "LIST",
            "ofType": {
              "name": null,
              "kind": "NON_NULL",
              "ofType": {
                "name": "article_order_by",
                "kind": "INPUT_OBJECT"
              }
            }
          }
        },
        {
          "name": "where",
          "type": {
            "name": "article_bool_exp",
            "kind": "INPUT_OBJECT",
            "ofType": null
          }
        }
      ]
    }
  ]
}
```
### Input
There is a noteworthy peculiarity with the input of the parsers for
GraphQL input types: some of the values we parse are JSON values,
supplied to a query by means of variable assignment:
```graphql
mutation($name: String!, $shape: geometry!) {
  insert_location_one(object: {name: $name, shape: $shape}) {
    id
  }
}
```
The GraphQL spec doesn't mandate a transport format for the variables;
the fact that they are encoded using JSON is a choice on our
part. However, this poses a problem: a variable's JSON value might not
be representable as a GraphQL value, as GraphQL values are a strict
subset of JSON values. For instance, the JSON object `{"1": "2"}` is
not representable in GraphQL, as `"1"` is not a valid key. This is not
an issue, as the spec doesn't mandate that the variables be translated
into GraphQL values; their parsing and validation is left entirely to
the service. However, it means that we need to keep track, when
parsing input values, of whether that value is coming from a GraphQL
literal or from a JSON value. Furthermore, we delay the expansion of
variables, in order to do proper type-checking.
Consequently, we represent input variables with the following type
(defined in `Hasura.GraphQL.Parser.Schema`):
```haskell
data InputValue v
  = GraphQLValue (G.Value v)
  | JSONValue J.Value
```
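For illustration, here is the same number as a parser might see it from each of the two sources; the bindings are made up, and the imports follow the module names used in this document:
```haskell
import qualified Data.Aeson                    as J
import qualified Language.GraphQL.Draft.Syntax as G

import Hasura.GraphQL.Parser.Schema (InputValue (..))

-- an argument written inline in the query, e.g. `limit: 42`
fromLiteral :: InputValue v
fromLiteral = GraphQLValue (G.VInt 42)

-- the value of a JSON-encoded variable, as seen once the variable is "peeled"
-- (see below)
fromVariable :: InputValue v
fromVariable = JSONValue (J.Number 42)
```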
Whenever we need to inspect an input value, we usually start by "peeling the
variable" (see `peelVariable` in `Hasura.GraphQL.Parser.Internal.TypeChecking`),
to guarantee that we have an actual literal to inspect; we still end up with
either a GraphQL value or a JSON value. Parsers usually deal with both, as
`nullable` does (see section [The Parser type](#the-parser-type)), or as scalar
parsers do (see `Hasura.GraphQL.Parser.Internal.Scalars`):
```haskell
float :: MonadParse m => Parser 'Both m Double
float = Parser
  { pType = schemaType
  , pParser =
      -- we first unpack the variable, if any:
      peelVariable (toGraphQLType schemaType) >=> \case
        -- we deal with valid GraphQL values
        GraphQLValue (VFloat f) -> convertWith scientificToFloat f
        GraphQLValue (VInt i) -> convertWith scientificToFloat $ fromInteger i
        -- we deal with valid JSON values
        JSONValue (A.Number n) -> convertWith scientificToFloat n
        -- we reject everything else
        v -> typeMismatch floatScalar "a float" v
  }
  where
    schemaType = NonNullable $ TNamed $ mkDefinition "Float" Nothing TIScalar
```
This allows us to incrementally unpack JSON values without having to fully
transform them into GraphQL values in the first place; the following is
therefore accepted:
```
# graphql
query($w: foo_bool_exp!) {
  foo(where: $w) {
    id
  }
}

# json
{
  "w": {
    # graphql boolean expression
    "foo_json_field": {
      "_eq": {
        # json value that cannot be translated
        "1": "2"
      }
    }
  }
}
```
The parsers will unpack the variable into the `JSONValue` constructor, and the
`object` combinator will unpack the fields when parsing the boolean expression;
but the remaining `JSONValue $ Object [("1", String "2")]` will not be
translated into a GraphQL value: it will be parsed directly from JSON by the
appropriate value parser.
Step by step:
- the value given to our `where` argument is a `GraphQLValue` that
  contains a `VVariable` and its JSON value;
- when parsing the argument's `foo_bool_exp` type, we expect an
  object: we "peel" the variable, and our input value becomes a
  `JSONValue`, containing one entry, `"foo_json_field"`;
- we then parse each field one by one; to parse our one field, we
  first focus our input value on the actual field, and refine our
  input value to the content of the field: `JSONValue $ Object
  [("_eq", Object [("1", String "2")])]`;
- that field's argument is a boolean expression, which is also an
  object: we repeat the same process;
- when finally parsing the argument to `_eq`, we are no longer in the
  realm of GraphQL syntax: the argument to `_eq` is whatever a value
  of that database column is; we use the appropriate column parser to
  interpret `{"1": "2"}` without treating it as a GraphQL value.
### Recursion, and tying the knot
One major hurdle that we face when building the schema is that, due to
relationships, our schema is not a tree, but a graph. Consider for instance two
tables, `author` and `article`, joined by two opposite relationships (one-to-one
AKA "object relationship", and one-to-many AKA "array relationship",
respectively); the GraphQL schema will end up with something akin to:
```graphql
type Author {
  id: Int!
  name: String!
  articles: [Article!]!
}

type Article {
  id: Int!
  name: String!
  author: Author!
}
```
To build the schema parsers for a query that selects those tables, we are going to
end up with code that would be essentially equivalent to:
```haskell
selectTable tableName tableInfo = do
  arguments    <- tableArguments tableInfo
  selectionSet <- traverse mkField $ fields tableInfo
  pure $ selection tableName arguments selectionSet

mkField = \case
  TableColumn c  -> field_ (name c)
  Relationship r -> field (name r) $ selectTable (target r)
```
We would end up with an infinite recursion building the parsers:
```
-> selectTable "author"
  -> mkField "articles"
    -> selectTable "article"
      -> mkField "author"
        -> selectTable "author"
```
To avoid this, we *memoize* the parsers as we build them. This is, however,
quite tricky: since building a parser might require it to know about itself, we
cannot memoize it after it's built; we have to memoize it *before*. What we end
up doing is relying on `unsafeInterleaveIO` to store in the memoization cache a
value whose evaluation will be delayed, and that can be updated after we're done
evaluating the parser. The relevant code lives in `Hasura.GraphQL.Parser.Monad`.
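The following is a stripped-down, self-contained sketch of that trick; it is not the engine's actual code, just the core idea:
```haskell
import           Data.IORef       (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map         as Map  -- a *lazy* map: the placeholder must not be forced on insertion
import           System.IO.Unsafe (unsafeInterleaveIO)

-- On a cache miss, we immediately insert a placeholder whose evaluation is
-- delayed with `unsafeInterleaveIO`; recursive occurrences of the same key
-- then receive that placeholder instead of looping forever. Once the real
-- value has been built, we fill the reference that the placeholder reads.
memoizeOn :: IORef (Map.Map String a) -> String -> IO a -> IO a
memoizeOn cacheRef name build = do
  cache <- readIORef cacheRef
  case Map.lookup name cache of
    Just value -> pure value
    Nothing -> do
      resultRef   <- newIORef (error "parser forced during its own construction")
      placeholder <- unsafeInterleaveIO (readIORef resultRef)
      modifyIORef' cacheRef (Map.insert name placeholder)
      value <- build  -- may recursively look up `name`, getting the placeholder
      writeIORef resultRef value
      pure value
```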
This all works just fine as long as building a parser doesn't require forcing
its own evaluation: as long as the newly built parser only references itself
lazily (from within its fields, for instance), the graph will be properly
constructed, and the knot will be tied.
In practice, that means that the schema building code has *two layers of
monads*: most functions in `Hasura.GraphQL.Schema.*` that build parsers for
various GraphQL types return the constructed parser in an outer monad `m`, which
is an instance of `MonadSchema`; the parser itself operates in an inner monad
`n`, which is an instance of `MonadParse`.
### The GraphQL context
It's in `Hasura.GraphQL.Schema` that we build the actual "context" that is
stored in the [schema cache](overview.md#schema-cache): for each role we build the
`query_root` and the `mutation_root` (if any) by going over each source's table
cache and function cache. We do not have dedicated code for the subscription
root: we reuse the appropriate subset of the query root to build the
subscription root.