Add new documentation folder and two documentation files

As discussed on Slack, we're thinking of updating the documentation that was in the old graphql-engine wiki and bringing it into the repo. This PR creates the new folder and adds two pieces of documentation: a high-level overview of the system, and a description of our schema-building code (the so-called "PDV parsers").

One thing to be discussed: how do we do cross-file references? To an extent, this will depend on what tool we use for rendering them and on where we upload the generated files.

Another point is: we should update our CI to avoid running tests on changes in `server/documentation`.

Rendered:
- [high level overview](https://github.com/hasura/graphql-engine-mono/blob/nicuveo/server-docs/server/documentation/overview.md)
- [schema code](https://github.com/hasura/graphql-engine-mono/blob/nicuveo/server-docs/server/documentation/schema.md)

https://github.com/hasura/graphql-engine-mono/pull/2219

GitOrigin-RevId: f729616d28e8f2f30c0a07583b53460510bafd80
Antoine Leblanc 2021-09-15 11:03:31 +01:00 committed by hasura-bot
parent 982b5a3d15
commit 75d374bf26
2 changed files with 575 additions and 0 deletions

server/documentation/overview.md

@@ -0,0 +1,190 @@
## Table of contents
<!--
Please make sure you update the table of contents when modifying this file. If
you're using emacs, you can automatically do so using the command mentioned in
the generated comment below (provided by the package markdown-toc).
-->
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
- [High-level architecture overview](#high-level-architecture-overview)
- [Schema cache](#schema-cache)
- [Schema parsers](#schema-parsers)
- [Query execution](#query-execution)
- [High-level code structure](#high-level-code-structure)
- [Non-Hasura code](#non-hasura-code)
- [Hasura.Base](#hasurabase)
- [Hasura.Server](#hasuraserver)
- [Hasura.Backends](#hasurabackends)
- [Hasura.RQL](#hasurarql)
- [Hasura.RQL.DML](#hasurarqldml)
- [Hasura.RQL.IR](#hasurarqlir)
- [Hasura.RQL.DDL](#hasurarqlddl)
- [Hasura.RQL.Types](#hasurarqltypes)
- [Hasura.GraphQL](#hasuragraphql)
- [Hasura.GraphQL.Parser](#hasuragraphqlparser)
- [Hasura.GraphQL.Schema](#hasuragraphqlschema)
- [Hasura.GraphQL.Execute](#hasuragraphqlexecute)
- [Hasura.GraphQL.Transport](#hasuragraphqltransport)
<!-- markdown-toc end -->
## High-level architecture overview
This diagram shows the high-level components of the system.
![high-level architecture overview](imgs/architecture.png)
### Schema cache
The _schema cache_ is a live, annotated copy of our metadata that lives in the
Haskell process. It is defined in `Hasura.RQL.Types.SchemaCache`. It contains a
`SourceInfo` for each source that has been added (see
`Hasura.RQL.Types.Source`), which holds everything we know about that source:
tracked tables, tracked functions, connection information, and so on. It also
contains information about tracked remote schemas. Most importantly, it
contains the cached version of the schema parsers. More information in the
[Metadata](#metadata) section.
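As a rough mental model, here is a deliberately simplified sketch of that shape; every type and field name below is made up for illustration, and the real definitions carry far more information:
```haskell
import Data.HashMap.Strict (HashMap)
import Data.Text (Text)

type SourceName = Text
type RoleName   = Text

-- Stand-ins for the real, much richer types:
data TableInfo      = TableInfo
data FunctionInfo   = FunctionInfo
data ConnectionInfo = ConnectionInfo
data RemoteSchema   = RemoteSchema
data RoleContext    = RoleContext

data SourceInfo = SourceInfo
  { siTables     :: HashMap Text TableInfo     -- all tracked tables
  , siFunctions  :: HashMap Text FunctionInfo  -- all tracked functions
  , siConnection :: ConnectionInfo             -- how to reach the source
  }

data SchemaCache = SchemaCache
  { scSources       :: HashMap SourceName SourceInfo
  , scRemoteSchemas :: HashMap Text RemoteSchema
  , scContexts      :: HashMap RoleName RoleContext  -- the cached schema parsers, per role
  }
```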
### Schema parsers
We have a unified piece of code that is used to generate the GraphQL schema and
its corresponding parsers at the same time from information in the schema cache
(see our schema cache documentation (TODO: add link) for more information). That
code lives in `Hasura/GraphQL/Schema`. The result is stored back in the schema
cache, as a hashmap from role name to the corresponding "context" (its set of
parsers).
The parsers themselves transform the incoming GraphQL query into an intermediary
representation, defined in `Hasura.RQL.IR`.
### Query execution
When a GraphQL query is received, it hits the transport layer (HTTP or
WebSocket, see `Hasura.GraphQL.Transport`). Based on the user's role, the
correct parser is taken from the schema cache and applied to the query to
translate it into the IR. We treat each "root field" independently (see
`RootField` in `Hasura.RQL.IR.Root`).
From the IR, we can generate an execution plan: the set of monadic actions that
correspond to each root field; that code lives in
`Hasura.GraphQL.Execute`. After that, the transport layer, which received and
translated the query, runs the generated actions and joins the results.
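In very simplified terms, that per-request flow can be pictured as follows; everything in this sketch is a stand-in for the real machinery in `Hasura.GraphQL.Transport` and `Hasura.GraphQL.Execute`, not actual engine code:
```haskell
-- A deliberately abstract sketch of the flow described above.
processQuery
  :: Monad m
  => (query -> m [rootField])  -- the role's parsers: GraphQL AST -> IR, one entry per root field
  -> (rootField -> m result)   -- generate and run the execution plan of one root field
  -> ([result] -> response)    -- join the individual results into one response
  -> query
  -> m response
processQuery parse execute joinResults query = do
  rootFields <- parse query
  results    <- traverse execute rootFields
  pure (joinResults results)
```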
## High-level code structure
The code structure isn't perfect and has suffered from the heavy changes made to
the codebase over time. Here's a rough outline that points to both the current
structure and, to some extent, what the structure should be.
### Non-Hasura code
Code outside of the `Hasura` folder (in `Control` and `Data`, for instance)
consists of standalone libraries that could be open-sourced, as they are not
specific to our project; `Control.Arrow.Trans` is one example. Most commonly, we
extend existing third-party libraries to add useful missing functions; see for
instance:
- `Data.Text.Extended`
- `Control.Concurrent.Extended`
As a rule, those "Extended" modules should follow these guidelines:
- they re-export the original module they extend, so that the rest of the code can always import the extended module
- they do not depend on any `Hasura` code, with the (arguable) exception of the `Prelude`
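As an illustration of the pattern, here is what a hypothetical `Data.List.Extended` could look like; the module and its helper are invented for this example, but `Data.Text.Extended` and friends follow the same shape:
```haskell
module Data.List.Extended
  ( module Data.List  -- re-export the module we extend...
  , duplicates        -- ...alongside our additions
  ) where

import           Data.List
import qualified Data.Set as Set

-- | Elements that appear more than once in the input, each reported once.
duplicates :: Ord a => [a] -> [a]
duplicates = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = x : go (Set.delete x seen) xs
      | otherwise           = go (Set.insert x seen) xs
```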
### Hasura.Base
The goal of `Hasura.Base` is to be the place where all "base" code lives: code
that does not depend on any other Hasura code but the Prelude, but that is
specific enough that it doesn't belong outside of the `Hasura` folder. It
currently contains, for instance:
- `Hasura.Base.Error`, which defines our error-handling code, and the one error
sum type we use throughout the entire codebase
- `Hasura.Base.Instances`, which defines all missing instances from third-party
code
More code will be moved into it as we make progress. For instance, tracing and
logging could belong in `Base`, as they are "base" pieces of code that we want
the entire codebase to have access to.
### Hasura.Server
`Hasura.Server` contains all of the network stack, the initialisation code, and
the endpoints' routes. APIs are defined in `Hasura.Server.API`, authentication
code in `Hasura.Server.Auth`.
### Hasura.Backends
Most of the code in `Hasura.GraphQL` is generic, and expressed in a
backend-agnostic way. The code for each backend lives in that backend's
`Hasura.Backends.[Name]` folder. More details in our backend architecture
documentation (TODO: insert link here).
### Hasura.RQL
Before the graphql-engine was the graphql-engine, it supported a JSON-based API
called RQL. A lot of the code still lives in the RQL "primordial soup".
#### Hasura.RQL.DML
This DML is the aforementioned JSON API: this folder contains the data
structures that represent old-style, Postgres-only, table-level Select/Insert/Update/Delete requests.
This API is deprecated, not used by any of our users to our knowledge,
and is only kept in the code because the console still uses it.
#### Hasura.RQL.IR
This is our intermediary representation, into which an incoming query (RQL or
GraphQL) gets translated.
#### Hasura.RQL.DDL
This is where our _Metadata API_'s code lives: the API, as described by
`Hasura.Server.API.Metadata.RQLMetadataV1`, is routed to corresponding functions
in `Hasura.RQL.DDL`.
#### Hasura.RQL.Types
This is a somewhat outdated folder that we need to break down into individual
components. As of now, this folder contains all the "base types" of our codebase
and, despite the folder's name, a lot of the code of the engine's core
behaviour. This includes, among others, most of our metadata code: our
representation types (how we represent information about tables or columns, for
instance) and some of the metadata building process (including all code
related to the internal dependency graph of metadata objects).
### Hasura.GraphQL
This folder contains most of the GraphQL API stack; specifically:
#### Hasura.GraphQL.Parser
This folder is where the schema-building combinators are defined. It is intended
as an almost standalone library (see the
[schema building section](schema.md#building-the-schema)), used by our schema code.
#### Hasura.GraphQL.Schema
In turn, this is where the aforementioned parsers are used to build the GraphQL
schema from our metadata. The folder itself contains all the individual
components of our schema: how to construct a parser for a table's selection set,
for a remote schema's input arguments, and so on. It's in the root file,
`Hasura/GraphQL/Schema.hs`, that they're all put together to build the schema;
see more information about this in our [schema documentation](schema.md).
#### Hasura.GraphQL.Execute
While most of the actual translation from the `IR` into a SQL query string is
backend-specific, the top-level processing of GraphQL requests (be they queries,
mutations, or subscriptions) is backend-agnostic and lives in
`Hasura.GraphQL.Execute`.
At the time of this writing it also still contains Postgres-specific code that has yet to
be generalized.
Furthermore, it contains all code related to the execution of queries targeting
remote schemas, and high-level code for remote joins.
#### Hasura.GraphQL.Transport
The code in this folder is both the entry point of query processing
(`Transport` calls `Execute`, which calls the parsers) and the place where the
root fields are actually executed: each file (HTTP and WebSocket) contains the
long case switch that decides how to execute each step over the network.

server/documentation/schema.md

@@ -0,0 +1,385 @@
## Table of contents
<!--
Please make sure you update the table of contents when modifying this file. If
you're using emacs, you can automatically do so using the command mentioned in
the generated comment below (provided by the package markdown-toc).
-->
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
- [Terminology](#terminology)
- [The `Parser` type](#the-parser-type)
- [Output](#output)
- [Input](#input)
- [Recursion, and tying the knot](#recursion-and-tying-the-knot)
- [The GraphQL context](#the-graphql-context)
<!-- markdown-toc end -->
## Building the schema
We use the same piece of code to generate the GraphQL schema and to parse
incoming queries against it, to ensure those two parts of the code are always
consistent. In practice, we do this by building _introspectable_ parsers, in the
style of parser combinators, which turn an incoming GraphQL AST into
our internal representation ([IR](#ir)).
### Terminology
The schema building code takes as input our metadata: what sources do
we have, what tables are tracked, what columns do they have... and
builds the corresponding GraphQL schema. More precisely, its output is
a series of _parsers_. That term is ambiguous, so it deserves clarification:
an incoming request is first parsed from a raw string into
a GraphQL AST using our
[graphql-parser-hs](https://github.com/hasura/graphql-parser-hs)
library. At that point, no semantic analysis has been performed:
the output of that phase is a GraphQL document, a plain AST.
The second step is to apply
those "schema parsers": their input is that GraphQL AST, and their
output is a semantic representation of the query (see the
[Output](#output) section).
To summarize: ahead of time, based on our metadata, we generate
*schema parsers*: they will parse an incoming GraphQL AST into a
transformed semantic AST, based on whether that incoming query is
consistent with our metadata.
### The `Parser` type
We have different types depending on what we're parsing: `Parser` for types in
the GraphQL schema, `FieldParser` for a field in an output type, and
`InputFieldsParser` for fields in input types. All three of them share a
similar structure: they combine static type information with the actual parsing
function:
```haskell
data Parser n a = Parser
  { parserType :: TypeInfo
  , parserFunc :: ParserInput -> n a
  }
```
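For comparison, the other two follow the same pattern, pairing static schema information with a parsing function; in similarly simplified (and not real) form, where `FieldInfo`, `Field`, `InputFieldInfo`, and `InputFields` stand in for the real schema and AST types:
```haskell
data FieldParser n a = FieldParser
  { fieldInfo :: FieldInfo     -- static description of one output field
  , fieldFunc :: Field -> n a  -- parses one field of a selection set
  }

data InputFieldsParser n a = InputFieldsParser
  { inputFieldsInfo :: [InputFieldInfo]    -- static description of a group of input fields
  , inputFieldsFunc :: InputFields -> n a  -- parses the whole group at once
  }
```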
The GraphQL schema is a graph of types, stemming from one of the three roots: the
`query_root`, `mutation_root`, and `subscription_root` types. Consequently, if
we correctly build the `Parser` for the `query_root` type, then its `TypeInfo`
will be a "root" of the full graph.
What our combinators do is recursively build both at the same time. For
instance, consider `nullable` from `Hasura.GraphQL.Parser.Internal.Parser` (here
simplified a bit for readability):
```haskell
nullable :: MonadParse m => Parser m a -> Parser m (Maybe a)
nullable parser = Parser
  { pType = nullableType $ pType parser
  , pParser = \case
      JSONValue A.Null -> pure Nothing
      GraphQLValue VNull -> pure Nothing
      value -> Just <$> pParser parser value
  }
```
Given a parser for a value type `a`, this function translates it into a parser
of `Maybe a` that tolerates "null" values and updates its internal type
information to transform the corresponding GraphQL non-nullable `TypeName!` into
a nullable `TypeName`.
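For example, using the simplified signatures above, and assuming an `int` parser for the built-in `Int` scalar (exposed as the non-nullable `Int!`), one could write:
```haskell
-- Accepts an integer or null; its GraphQL type is the nullable `Int`.
optionalInt :: MonadParse m => Parser m (Maybe Int32)
optionalInt = nullable int
```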
### Output
While the parsers keep track of the GraphQL types, their output is our IR: we
transform the incoming GraphQL AST into a semantic representation. This is
clearly visible with input fields, such as field arguments. For instance, this
is the definition of the parser for the arguments to a table (here again
slightly simplified for readability):
```haskell
tableArguments
  :: (MonadSchema m, MonadParse n)
  => SourceName      -- ^ name of the source we're building the schema for
  -> TableInfo b     -- ^ internal information about the given table (e.g. columns)
  -> SelPermInfo b   -- ^ selection permissions for that table
  -> m (
       -- parser for a group of input fields, such as arguments to a field;
       -- has an Applicative instance, to allow writing one parser for a group
       -- of arguments
       InputFieldsParser n (
         -- internal representation of the arguments to a table select
         IR.SelectArgs b
       )
     )
tableArguments sourceName tableInfo selectPermissions = do
  -- construct other parsers in the outer `m` monad
  whereParser    <- tableWhereArg    sourceName tableInfo selectPermissions
  orderByParser  <- tableOrderByArg  sourceName tableInfo selectPermissions
  distinctParser <- tableDistinctArg sourceName tableInfo selectPermissions
  -- combine them using an "applicative do"
  pure do
    whereArg    <- whereParser
    orderByArg  <- orderByParser
    limitArg    <- tableLimitArg
    offsetArg   <- tableOffsetArg
    distinctArg <- distinctParser
    pure $ IR.SelectArgs
      { IR._saWhere    = whereArg
      , IR._saOrderBy  = orderByArg
      , IR._saLimit    = limitArg
      , IR._saOffset   = offsetArg
      , IR._saDistinct = distinctArg
      }
```
Running the parser on the input will yield the `SelectArgs`; if used for a field
named `article`, it will result in the following GraphQL schema when introspected
(null fields omitted for brevity):
```json
{
  "fields": [
    {
      "name": "article",
      "args": [
        {
          "name": "distinct_on",
          "type": {
            "name": null,
            "kind": "LIST",
            "ofType": {
              "name": null,
              "kind": "NON_NULL",
              "ofType": {
                "name": "article_select_column",
                "kind": "ENUM"
              }
            }
          }
        },
        {
          "name": "limit",
          "type": {
            "name": "Int",
            "kind": "SCALAR",
            "ofType": null
          }
        },
        {
          "name": "offset",
          "type": {
            "name": "Int",
            "kind": "SCALAR",
            "ofType": null
          }
        },
        {
          "name": "order_by",
          "type": {
            "name": null,
            "kind": "LIST",
            "ofType": {
              "name": null,
              "kind": "NON_NULL",
              "ofType": {
                "name": "article_order_by",
                "kind": "INPUT_OBJECT"
              }
            }
          }
        },
        {
          "name": "where",
          "type": {
            "name": "article_bool_exp",
            "kind": "INPUT_OBJECT",
            "ofType": null
          }
        }
      ]
    }
  ]
}
```
### Input
There is a noteworthy peculiarity with the input of the parsers for
GraphQL input types: some of the values we parse are JSON values,
supplied to a query by means of variable assignment:
```graphql
mutation($name: String!, $shape: geometry!) {
  insert_location_one(object: {name: $name, shape: $shape}) {
    id
  }
}
```
The GraphQL spec doesn't mandate a transport format for the variables;
the fact that they are encoded using JSON is a choice on our
part. However, this poses a problem: a variable's JSON value might not
be representable as a GraphQL value, as GraphQL values are a strict
subset of JSON values. For instance, the JSON object `{"1": "2"}` is
not representable in GraphQL, as `"1"` is not a valid key. This is not
an issue, as the spec doesn't mandate that the variables be translated
into GraphQL values; their parsing and validation is left entirely to
the service. However, it means that we need to keep track, when
parsing input values, of whether that value is coming from a GraphQL
literal or from a JSON value. Furthermore, we delay the expansion of
variables, in order to do proper type-checking.
Consequently, we represent input variables with the following type
(defined in `Hasura.GraphQL.Parser.Schema`):
```haskell
data InputValue v
  = GraphQLValue (G.Value v)
  | JSONValue J.Value
```
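For illustration, here is the same number as a parser might see it from each of the two sources; the bindings are made up, and the imports follow the module names used in this document:
```haskell
import qualified Data.Aeson                    as J
import qualified Language.GraphQL.Draft.Syntax as G

import Hasura.GraphQL.Parser.Schema (InputValue (..))

-- an argument written inline in the query, e.g. `limit: 42`
fromLiteral :: InputValue v
fromLiteral = GraphQLValue (G.VInt 42)

-- the value of a JSON-encoded variable, as seen once the variable is "peeled"
-- (see below)
fromVariable :: InputValue v
fromVariable = JSONValue (J.Number 42)
```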
Whenever we need to inspect an input value, we usually start by "peeling the
variable" (see `peelVariable` in `Hasura.GraphQL.Parser.Internal.TypeChecking`),
to guarantee that we have an actual literal to inspect; we still end up with
either a GraphQL value or a JSON value. Parsers usually deal with both, as
`nullable` does (see section [The Parser type](#the-parser-type)), or as scalar
parsers do (see `Hasura.GraphQL.Parser.Internal.Scalars`):
```haskell
float :: MonadParse m => Parser 'Both m Double
float = Parser
  { pType = schemaType
  , pParser =
      -- we first unpack the variable, if any:
      peelVariable (toGraphQLType schemaType) >=> \case
        -- we deal with valid GraphQL values
        GraphQLValue (VFloat f) -> convertWith scientificToFloat f
        GraphQLValue (VInt i) -> convertWith scientificToFloat $ fromInteger i
        -- we deal with valid JSON values
        JSONValue (A.Number n) -> convertWith scientificToFloat n
        -- we reject everything else
        v -> typeMismatch floatScalar "a float" v
  }
  where
    schemaType = NonNullable $ TNamed $ mkDefinition "Float" Nothing TIScalar
```
This allows us to incrementally unpack JSON values without having to fully
transform them into GraphQL values in the first place; the following is
therefore accepted:
```
# graphql
query($w: foo_bool_exp!) {
  foo(where: $w) {
    id
  }
}

# json
{
  "w": {
    # graphql boolean expression
    "foo_json_field": {
      "_eq": {
        # json value that cannot be translated
        "1": "2"
      }
    }
  }
}
```
The parsers will unpack the variable into the `JSONValue` constructor, and the
`object` combinator will unpack the fields when parsing the boolean expression;
but the remaining `JSONValue $ Object [("1", String "2")]` will not be
translated into a GraphQL value: it will be parsed directly from JSON by the
appropriate value parser.
Step by step:
- the value given to our `where` argument is a `GraphQLValue` that
  contains a `VVariable` and its JSON value;
- when parsing the argument's `foo_bool_exp` type, we expect an
  object: we "peel" the variable, and our input value becomes a
  `JSONValue`, containing one entry, `"foo_json_field"`;
- we then parse each field one by one; to parse our one field, we
  first focus our input value on the actual field, and refine our
  input value to the content of the field: `JSONValue $ Object
  [("_eq", Object [("1", String "2")])]`;
- that field's argument is a boolean expression, which is also an
  object: we repeat the same process;
- when finally parsing the argument to `_eq`, we are no longer in the
  realm of GraphQL syntax: the argument to `_eq` is whatever a value
  of that database column is; we use the appropriate column parser to
  interpret `{"1": "2"}` without treating it as a GraphQL value.
### Recursion, and tying the knot
One major hurdle that we face when building the schema is that, due to
relationships, our schema is not a tree, but a graph. Consider for instance two
tables, `author` and `article`, joined by two opposite relationships (one-to-one
AKA "object relationship", and one-to-many AKA "array relationship",
respectively); the GraphQL schema will end up with something akin to:
```graphql
type Author {
  id: Int!
  name: String!
  articles: [Article!]!
}

type Article {
  id: Int!
  name: String!
  author: Author!
}
```
To build the schema parsers for a query that selects those tables, we are going to
end up with code that would be essentially equivalent to:
```haskell
selectTable tableName tableInfo = do
  arguments    <- tableArguments tableInfo
  selectionSet <- traverse mkField $ fields tableInfo
  pure $ selection tableName arguments selectionSet

mkField = \case
  TableColumn c  -> field_ (name c)
  Relationship r -> field (name r) $ selectTable (target r)
```
We would end up with an infinite recursion building the parsers:
```
-> selectTable "author"
  -> mkField "articles"
    -> selectTable "article"
      -> mkField "author"
        -> selectTable "author"
```
To avoid this, we *memoize* the parsers as we build them. This is, however,
quite tricky: since building a parser might require it to know about itself, we
cannot memoize it after it's built; we have to memoize it *before*. What we end
up doing is relying on `unsafeInterleaveIO` to store in the memoization cache a
value whose evaluation will be delayed, and that can be updated after we're done
evaluating the parser. The relevant code lives in `Hasura.GraphQL.Parser.Monad`.
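The following is a stripped-down, self-contained sketch of that trick; it is not the engine's actual code, just the core idea:
```haskell
import           Data.IORef       (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map         as Map  -- a *lazy* map: the placeholder must not be forced on insertion
import           System.IO.Unsafe (unsafeInterleaveIO)

-- On a cache miss, we immediately insert a placeholder whose evaluation is
-- delayed with `unsafeInterleaveIO`; recursive occurrences of the same key
-- then receive that placeholder instead of looping forever. Once the real
-- value has been built, we fill the reference that the placeholder reads.
memoizeOn :: IORef (Map.Map String a) -> String -> IO a -> IO a
memoizeOn cacheRef name build = do
  cache <- readIORef cacheRef
  case Map.lookup name cache of
    Just value -> pure value
    Nothing -> do
      resultRef   <- newIORef (error "parser forced during its own construction")
      placeholder <- unsafeInterleaveIO (readIORef resultRef)
      modifyIORef' cacheRef (Map.insert name placeholder)
      value <- build  -- may recursively look up `name`, getting the placeholder
      writeIORef resultRef value
      pure value
```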
This all works just fine as long as building a parser doesn't require forcing
its own evaluation: as long as the newly built parser only references itself
lazily (from within its fields, for instance), the graph will be properly
constructed, and the knot will be tied.
In practice, that means that the schema building code has *two layers of
monads*: most functions in `Hasura.GraphQL.Schema.*` that build parsers for
various GraphQL types return the constructed parser in an outer monad `m`, which
is an instance of `MonadSchema`; the parser itself operates in an inner monad
`n`, which is an instance of `MonadParse`.
### The GraphQL context
It's in `Hasura.GraphQL.Schema` that we build the actual "context" that is
stored in the [schema cache](overview.md#schema-cache): for each role we build the
`query_root` and the `mutation_root` (if any) by going over each source's table
cache and function cache. We do not have dedicated code for the subscription
root: we reuse the appropriate subset of the query root to build the
subscription root.