graphql-engine/server/documentation/overview.md
Antoine Leblanc 8a999bd44d Structural updates to the server's engineering documentation

# Overview

## Table of contents

## High-level architecture overview

This diagram shows the high-level components of the system.

*(figure: high-level architecture overview)*

## Schema cache

The schema cache is a live, annotated copy of our metadata that lives in the Haskell process. It is defined in `Hasura.RQL.Types.SchemaCache`. It contains a `SourceInfo` for each source that has been added (see `Hasura.RQL.Types.Source`), which holds everything we know about that source: all tracked tables, all tracked functions, connection information, and so on. It also contains information about tracked remote schemas.

Most importantly, it contains the cached version of the schema parsers; see the Schema parsers section below for more information.
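
As a rough mental model, the shape of the cache can be sketched as follows. This is a deliberately simplified illustration: the real types in `Hasura.RQL.Types.SchemaCache` and `Hasura.RQL.Types.Source` are far richer, and all field names below are made up for the example.

```haskell
-- Simplified, illustrative sketch of the schema cache; the real types
-- live in Hasura.RQL.Types.SchemaCache and Hasura.RQL.Types.Source.
module Main where

import qualified Data.Map.Strict as Map

type SourceName = String
type TableName  = String

-- Per-source information: tracked tables, tracked functions,
-- connection details.
data SourceInfo = SourceInfo
  { siTables     :: Map.Map TableName [String]  -- table -> column names
  , siFunctions  :: [String]                    -- tracked function names
  , siConnection :: String                      -- connection details
  } deriving (Show)

-- The cache maps each added source to its info, and also tracks
-- remote schemas (represented here as plain names).
data SchemaCache = SchemaCache
  { scSources       :: Map.Map SourceName SourceInfo
  , scRemoteSchemas :: [String]
  } deriving (Show)

examplePG :: SourceInfo
examplePG = SourceInfo
  { siTables     = Map.fromList [("users", ["id", "name"])]
  , siFunctions  = ["search_users"]
  , siConnection = "postgres://localhost/db"
  }

exampleCache :: SchemaCache
exampleCache =
  SchemaCache (Map.fromList [("default", examplePG)]) ["countries-api"]

main :: IO ()
main = print (Map.keys (scSources exampleCache))
```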

## Schema parsers

We have a unified piece of code that is used to generate the GraphQL schema and its corresponding parsers at the same time, from information in the schema cache (see our schema cache documentation (TODO: add link) for more information). That code lives in `Hasura.GraphQL.Schema`. The result is stored back in the schema cache, as a hashmap from role name to that role's "context" (its set of parsers).

The parsers themselves transform the incoming GraphQL query into an intermediary representation, defined in `Hasura.RQL.IR`.
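
The core idea, that a single value describes both a piece of the GraphQL schema and how to interpret inputs against it, can be sketched like this (all types and names below are illustrative assumptions, not the real `Hasura.GraphQL.Parser` API):

```haskell
-- Illustrative sketch of the "schema and parser built together" idea.
module Main where

-- A Parser bundles the GraphQL type it advertises in the schema with
-- the function that interprets an incoming value of that type.
data Parser a = Parser
  { pType  :: String            -- the GraphQL type name, for introspection
  , pParse :: String -> Maybe a -- how to turn an input into a value
  }

int :: Parser Int
int = Parser "Int" parseInt
  where
    parseInt s = case reads s of
      [(n, "")] -> Just n
      _         -> Nothing

-- Building a field parser yields both halves at once: the schema
-- fragment (name and type) and the runtime interpretation.
field :: String -> Parser a -> (String, String -> Maybe a)
field name p = (name ++ ": " ++ pType p, pParse p)

main :: IO ()
main = do
  let (sdl, parse) = field "limit" int
  putStrLn sdl
  print (parse "10")
```

Because the two halves come from the same definition, the advertised schema and the accepted inputs can never drift apart.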

## Query execution

When a GraphQL query is received, it hits the transport layer (HTTP or WebSocket; see `Hasura.GraphQL.Transport`). Based on the user's role, the correct parser is taken from the schema cache and applied to the query, to translate it into the IR. We treat each "root field" independently (see `RootField` in `Hasura.RQL.IR.Root`).

From the IR, we can generate an execution plan: the set of monadic actions that corresponds to each root field; that code lives in `Hasura.GraphQL.Execute`. After that, the transport layer, which received and translated the query, runs the generated actions and joins the results.
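
The whole pipeline described above can be sketched as follows; every type and function here is an illustrative stand-in for the much richer machinery in `Hasura.GraphQL.Transport` and `Hasura.GraphQL.Execute`:

```haskell
-- Highly simplified sketch of the request pipeline:
-- transport -> parse to IR -> plan -> run actions -> join results.
module Main where

type Role = String
newtype IR = IR String deriving (Show, Eq)  -- intermediary representation
newtype Plan = Plan [IO String]             -- one monadic action per root field

-- Transport: pick the parser for the caller's role and produce the IR,
-- one entry per root field (here, one per word of the "query").
parseRequest :: Role -> String -> [IR]
parseRequest role query = [IR (role ++ "/" ++ f) | f <- words query]

-- Execute: turn each root field's IR into a monadic action.
mkPlan :: [IR] -> Plan
mkPlan irs = Plan [pure ("result of " ++ s) | IR s <- irs]

-- Transport again: run the actions and join the results.
runPlan :: Plan -> IO [String]
runPlan (Plan actions) = sequence actions

main :: IO ()
main = do
  results <- runPlan (mkPlan (parseRequest "admin" "users orders"))
  mapM_ putStrLn results
```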

## High-level code structure

The code structure isn't perfect, and has suffered from the heavy changes made to the codebase over time. Here's a rough outline that describes both the current structure and, to some extent, what the structure should be.

### Non-Hasura code

Code outside of the `Hasura` folder (in `Control` and `Data`, for instance) consists of standalone libraries that could be open-sourced, as they are not specific to our project, such as `Control.Arrow.Trans`. Most commonly, we extend existing third-party libraries to add useful missing functions; see for instance:

- `Data.Text.Extended`
- `Control.Concurrent.Extended`

As a rule, those "Extended" modules should follow these guidelines:

- they re-export the original module they extend, so that the rest of the code can always import the extended version
- they do not depend on any Hasura code, with the (arguable) exception of the Prelude
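
A hypothetical `Data.List.Extended` module following those guidelines might look like this (`commaSeparated` is a made-up helper for the example, not an actual function of ours):

```haskell
-- Sketch of an "Extended" module: it re-exports the module it extends
-- and adds a missing helper, without depending on any Hasura code.
module Data.List.Extended
  ( module Data.List  -- re-export, so callers can import only this module
  , commaSeparated
  ) where

import Data.List

-- A small utility the base module lacks.
commaSeparated :: [String] -> String
commaSeparated = intercalate ", "
```

Callers then import `Data.List.Extended` instead of `Data.List` and get both the original API and the additions.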

### `Hasura.Base`

The goal of `Hasura.Base` is to be the place for all "base" code: code that does not depend on any other Hasura code but the Prelude, yet is specific enough that it doesn't belong outside of the `Hasura` folder. It currently contains, for instance:

- `Hasura.Base.Error`, which defines our error-handling code, and the one error sum type we use throughout the entire codebase
- `Hasura.Base.Instances`, which defines all missing instances for third-party code

More code will be moved into it as we make progress. For instance, tracing and logging could belong in `Hasura.Base`, as they are "base" pieces of code that we want the entire codebase to have access to.
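
To illustrate the "one error sum type" point, the pattern looks roughly like the sketch below; the actual type in `Hasura.Base.Error` is far richer, and the constructors and helper here are simplified assumptions.

```haskell
-- Illustrative sketch of a single error sum type shared by all
-- fallible code, which then runs in one error monad (here, plain Either).
module Main where

data AppError
  = NotFound String          -- a missing table, source, ...
  | ParseFailed String       -- a malformed request
  | Internal String          -- an unexpected invariant violation
  deriving (Show, Eq)

-- All fallible code returns the same error type.
lookupTable :: String -> Either AppError [String]
lookupTable "users" = Right ["id", "name"]
lookupTable t       = Left (NotFound ("table " ++ t))

main :: IO ()
main = do
  print (lookupTable "users")
  print (lookupTable "orders")
```

Sharing a single error type means every layer can propagate, inspect, or render errors uniformly, without per-module conversion code.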

### `Hasura.Server`

`Hasura.Server` contains all of the network stack, the initialisation code, and the endpoints' routes. APIs are defined in `Hasura.Server.API`, and authentication code in `Hasura.Server.Auth`.

### `Hasura.Backends`

Most of the code in `Hasura.GraphQL` is generic and expressed in a backend-agnostic way. The code for each backend lives in that backend's `Hasura.Backends.[Name]` folder. More details can be found in our backend architecture documentation (TODO: insert link here).

### `Hasura.RQL`

Before the graphql-engine was the graphql-engine, it supported a JSON-based API called RQL. A lot of the code still lives in the RQL "primordial soup".

#### `Hasura.RQL.DML`

This DML is the aforementioned JSON API: this folder contains the data structures that represent the old-style, Postgres-only, table-level Select/Insert/Update/Delete requests. This API is deprecated and, to our knowledge, not used by any of our users; it is only kept in the code because the console still uses it.

#### `Hasura.RQL.IR`

This is our intermediary representation, into which an incoming query (RQL or GraphQL) gets translated.
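
A sketch of what such a representation might look like; the real `RootField` in `Hasura.RQL.IR.Root` is parameterized over the backend and considerably richer, and all constructors below are illustrative:

```haskell
-- Illustrative sketch of an IR: one sum type per kind of root field,
-- into which both RQL and GraphQL requests can be translated.
module Main where

data RootField
  = QueryDB String [String]   -- a database select: table, requested columns
  | ActionField String        -- a user-defined action
  | RemoteField String        -- a field resolved against a remote schema
  deriving (Show, Eq)

-- A request is treated as a list of independent root fields.
type Request = [RootField]

exampleRequest :: Request
exampleRequest = [QueryDB "users" ["id", "name"], RemoteField "countries"]

main :: IO ()
main = print exampleRequest
```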

#### `Hasura.RQL.DDL`

This is where our Metadata API's code lives: the API, as described by `Hasura.Server.API.Metadata.RQLMetadataV1`, is routed to corresponding functions in `Hasura.RQL.DDL`.

#### `Hasura.RQL.Types`

This is a somewhat outdated folder that we need to break down into individual components. As of now, it contains all the "base types" of our codebase and, despite the folder's name, a lot of the code of the engine's core behaviour. This includes, among other things, most of our metadata code: our representation types (how we represent information about tables or columns, for instance) and some of the metadata building process (including all the code related to the internal dependency graph of metadata objects).

### `Hasura.GraphQL`

This folder contains most of the GraphQL API stack; specifically:

#### `Hasura.GraphQL.Parser`

This folder is where the schema-building combinators are defined. It is intended to be an almost standalone library, used by our schema code; see the Schema parsers section above.

#### `Hasura.GraphQL.Schema`

In turn, this is where the aforementioned parsers are used to build the GraphQL schema from our metadata. The folder itself contains all the individual components of our schema: how to construct a parser for a table's selection set, for a remote schema's input arguments, and so on. It's in the root file, `Hasura/GraphQL/Schema.hs`, that they're all put together to build the schema; more information can be found in our schema documentation.
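
That "put together" step ultimately yields one context per role, as mentioned in the Schema parsers section. A minimal sketch of the idea (role names, `Context`, and the selection logic are all invented for the example):

```haskell
-- Illustrative sketch of building a per-role map of schema contexts:
-- each role sees only the fields its permissions allow.
module Main where

import qualified Data.Map.Strict as Map

type Role = String
newtype Context = Context [String] deriving (Show, Eq)  -- visible fields

buildContexts :: [Role] -> Map.Map Role Context
buildContexts roles = Map.fromList [(r, contextFor r) | r <- roles]
  where
    -- A stand-in for permission-driven schema generation.
    contextFor "admin" = Context ["users", "orders", "internal_audit"]
    contextFor _       = Context ["users"]

main :: IO ()
main = print (buildContexts ["admin", "viewer"])
```

At request time, looking up the caller's role in this map selects the right set of parsers.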

#### `Hasura.GraphQL.Execute`

While most of the actual translation from the IR into a SQL query string is backend-specific, the top-level processing of GraphQL requests (be they queries, mutations, or subscriptions) is backend-agnostic and lives in `Hasura.GraphQL.Execute`. At the time of this writing, it also still contains Postgres-specific code that has yet to be generalized.

Furthermore, it contains all code related to the execution of queries targeting remote schemas, and high-level code for remote joins.

#### `Hasura.GraphQL.Transport`

The code in this folder is both the actual entry point of query processing (Transport calls Execute, which calls the parsers) and the place where root fields actually get executed: each file (HTTP and WebSocket) contains the long case switch that determines how to execute each step over the network.
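
That per-step dispatch can be sketched as follows; the `Step` constructors and helpers are illustrative assumptions, not the real types:

```haskell
-- Illustrative sketch of the transport-level case switch: each kind of
-- execution step is run over the network differently.
module Main where

data Step
  = DBStep String       -- run a SQL string against a source
  | RemoteStep String   -- forward a query to a remote schema
  | ActionStep String   -- call an action webhook

-- Pure description of what each step would do; the real code performs
-- the corresponding network call here.
describeStep :: Step -> String
describeStep (DBStep sql)   = "db: " ++ sql
describeStep (RemoteStep q) = "remote: " ++ q
describeStep (ActionStep u) = "action: " ++ u

executeStep :: Step -> IO String
executeStep = pure . describeStep

main :: IO ()
main = mapM_ (\s -> executeStep s >>= putStrLn)
             [DBStep "SELECT 1", RemoteStep "{ countries }"]
```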