graphql-engine/v3/docs/architecture.md

# Architecture

## Project structure

The most important parts of the code from a server point of view are illustrated
here. Explanations of each are given below:

```
crates
├── open-dds
├── lang-graphql
│   ├── src
│   │   ├── ast
│   │   ├── normalized_ast
│   │   ├── lexer
│   │   ├── parser
│   │   ├── schema
│   │   ├── introspection
│   │   ├── validation
├── metadata-resolve
├── schema
│   ├── operations
│   ├── types
├── execute
├── engine
│   ├── bin
│   │   ├── engine
```

### `open-dds`

This crate contains the Open DDS metadata structure and an accessor library.
This metadata is used to specify the data models, permissions, connectors, and
essentially everything the engine needs to know about the project when it
starts.

### `lang-graphql`

This crate is an implementation of the GraphQL specification in Rust. It
provides types for the GraphQL AST, implements the lexer and parser, as well as
validation and introspection operations.

#### `lang-graphql/src/ast`

The raw GraphQL AST (abstract syntax tree) types that are emitted by the parser.

#### `lang-graphql/src/normalized_ast`

The normalized AST types. The raw AST can be validated and elaborated with
respect to a GraphQL schema, producing the normalized AST.

#### `lang-graphql/src/lexer`

Lexer that emits tokens (e.g. String, Number, Punctuation) for a raw GraphQL
document string.
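
To make the idea of a token stream concrete, here is a minimal sketch of a lexer. It is not the `lang-graphql` implementation: the `Token` enum and the `lex` function are hypothetical names, and the sketch ignores escapes, comments, block strings, floats, and error reporting.

```rust
// Hypothetical token type; the real types in `lang-graphql` may differ.
#[derive(Debug, PartialEq)]
pub enum Token {
    Name(String),
    Number(String),
    Str(String),
    Punctuation(char),
}

/// Splits a document string into tokens (simplified, see the caveats above).
pub fn lex(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() || c == ',' {
            // GraphQL treats commas as insignificant, like whitespace.
            i += 1;
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Name(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else if c == '"' {
            i += 1; // skip the opening quote
            let start = i;
            while i < chars.len() && chars[i] != '"' {
                i += 1;
            }
            tokens.push(Token::Str(chars[start..i].iter().collect()));
            i += 1; // skip the closing quote, if there was one
        } else {
            // Everything else is treated as a single punctuation character.
            tokens.push(Token::Punctuation(c));
            i += 1;
        }
    }
    tokens
}
```

Under these assumptions, lexing `{ user(id: 1) }` yields the punctuation `{`, the name `user`, the punctuation `(`, the name `id`, the punctuation `:`, the number `1`, and the punctuation `)` and `}`.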

#### `lang-graphql/src/parser`

Parser for GraphQL documents (executable operations and schema documents).

#### `lang-graphql/src/schema`

Types to define a GraphQL schema.

#### `lang-graphql/src/introspection`

Provides schema and type introspection for GraphQL schemas.

#### `lang-graphql/src/validation`

Validates GraphQL requests against a schema, and produces normalized ASTs, which
contain additional relevant data from the schema.
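
The relationship between the raw AST, the schema, and the normalized AST can be sketched as follows. All types and the `normalize` function are hypothetical and heavily simplified; the real crate handles selection sets, arguments, variables, fragments, and much more.

```rust
use std::collections::HashMap;

/// Hypothetical raw AST node: just what the parser saw.
pub struct RawField {
    pub name: String,
}

/// Hypothetical schema: maps each root field name to its type name.
pub struct Schema {
    pub query_fields: HashMap<String, String>,
}

/// Hypothetical normalized node: the raw field plus data taken from the schema.
#[derive(Debug, PartialEq)]
pub struct NormalizedField {
    pub name: String,
    pub type_name: String,
}

/// Validates a raw field against the schema and, on success, elaborates it
/// with the type information the schema holds for that field.
pub fn normalize(schema: &Schema, field: &RawField) -> Result<NormalizedField, String> {
    match schema.query_fields.get(&field.name) {
        Some(type_name) => Ok(NormalizedField {
            name: field.name.clone(),
            type_name: type_name.clone(),
        }),
        None => Err(format!("field '{}' is not defined in the schema", field.name)),
    }
}
```

The point of the sketch is that later stages only ever see `NormalizedField` values, so they never have to look up schema information or handle invalid requests themselves.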

### `metadata-resolve`

Resolves and validates the input Open DDS metadata and creates intermediate
structures that are used in the `schema` crate for schema generation.

### `schema`

Provides functions to resolve the Open DDS metadata, generate the GraphQL schema
from it, and execute queries against the schema.

#### `schema/operations`

Contains the logic to define and execute the operations defined by the Open DDS
spec.

Technically, these are fields of the `query_root`, `subscription_root` or
`mutation_root`, and as such could be defined in the corresponding
`schema::types::*_root` module. However, keeping them separate makes them easier
to organize (for example, `subscription_root` can also import the same set of
operations).

Each module under `operations` roughly defines the following:

- IR: To capture the specified operation.
- Logic to generate schema for the given operation using data from resolved
  metadata.
- Logic to parse a normalized field from the request into the defined IR format.
- Logic to execute the operation.
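
As a rough illustration of the four parts listed above, a module for a "select one by primary key" operation might be shaped like this. Every name here (`ResolvedModel`, `SchemaField`, `NormalizedField`, `SelectOneIr`, and the three functions) is hypothetical, and `execute` only returns a description string where the real engine would talk to a data connector.

```rust
// All names are hypothetical; they only mirror the four parts listed above.

/// Stand-in for the resolved metadata of one model.
pub struct ResolvedModel {
    pub name: String,
    pub select_one_field: String,
}

/// Stand-in for a field definition in the generated GraphQL schema.
pub struct SchemaField {
    pub name: String,
    pub argument_name: String,
}

/// Stand-in for a validated field taken from the normalized AST.
pub struct NormalizedField {
    pub name: String,
    pub id_argument: Option<i64>,
}

/// 1. IR: captures the requested operation.
#[derive(Debug, PartialEq)]
pub struct SelectOneIr {
    pub model: String,
    pub id: i64,
}

/// 2. Schema generation, driven only by resolved metadata.
pub fn generate_schema(model: &ResolvedModel) -> SchemaField {
    SchemaField {
        name: model.select_one_field.clone(),
        argument_name: "id".to_string(),
    }
}

/// 3. Parsing a normalized field from the request into the IR.
pub fn parse_ir(model: &ResolvedModel, field: &NormalizedField) -> Result<SelectOneIr, String> {
    if field.name != model.select_one_field {
        return Err(format!("unexpected field '{}'", field.name));
    }
    let id = field
        .id_argument
        .ok_or_else(|| "missing 'id' argument".to_string())?;
    Ok(SelectOneIr { model: model.name.clone(), id })
}

/// 4. Execution. Placeholder: describes the work instead of performing it.
pub fn execute(ir: &SelectOneIr) -> String {
    format!("select {} where id = {}", ir.model, ir.id)
}
```

Keeping the IR as a plain data type between parsing and execution means a root type such as `query_root` or `subscription_root` can reuse the same operation by calling these functions, which is the organizational benefit described above.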

#### `schema/types`

TODO: This is a bit outdated, so we should fix this.

Contains one module for each GraphQL type that we generate in the schema. For
example:

- `model_selection`: An object type for selecting fields from a table (e.g. the
  type of the `table_by_pk` field in `query_root`).
- `query_root`: An object type that represents the entry point for all queries.

Each module under `types` defines the following:

- IR: A container for a value of the type.
- Logic to generate schema for the given type using data from resolved metadata.
- Logic to parse a normalized object (selection set or input value) from the
  request into the defined IR format.

### `execute`

Responsible for the core operation of the engine in the context of user-provided
metadata, including the web server, request processing, request execution, etc.

#### `engine/bin`

Entry point to the program. The executable takes in a metadata file and starts
the v3 engine according to that file.

## Design Principles

### Separation of concerns: Open DDS vs NDC

NDC (formerly known as GDC) was introduced in v2, primarily as a way to improve
development speed of new backends, but it also had several ancillary benefits
which can be attributed to the separation of concerns between the NDC agent and
the API engine.

In v3, we will work exclusively against the NDC abstraction to access data in
databases.

There will be a separation between Open DDS (the metadata which the user
provides to describe their data models and APIs), the v3 engine which
implements the specification (as GDS), and NDC which provides access to data.

### Server should start reliably and instantly

A major problem in v2 was that the construction of a schema from the user’s
metadata was slow, and could fail for several reasons (e.g. a database might
have been unavailable). This meant that the server could fail to come back up
after a restart, or replicas could end up with subtly different versions of the
metadata.

In v3, the schema will be completely and uniquely determined by the Open DDS
metadata.

NDC can be unavailable, or its schema can differ from what is in the Open DDS
metadata. Both cases are fine, because the schema is determined only by the Open
DDS metadata.

In fact, it is useful to allow these cases, because they will allow different
deployment workflows in which the Open DDS metadata is updated before a database
migration, for example.
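
One way to picture this principle is as a pure function from metadata to schema. The sketch below uses hypothetical, simplified types (`Metadata`, `ModelMetadata`, `Schema`) and a hypothetical `build_schema` function; it is not the engine's actual code. What matters is what the function does not do: it takes no connection handle and performs no I/O, so it cannot fail because a database or connector is unreachable.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one model entry in the Open DDS metadata.
pub struct ModelMetadata {
    pub graphql_type_name: String,
    pub select_many_field: String,
}

/// Hypothetical stand-in for the whole metadata document.
pub struct Metadata {
    pub models: Vec<ModelMetadata>,
}

/// Hypothetical schema: maps each query root field to its type name.
/// A `BTreeMap` keeps the field order deterministic.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub query_root: BTreeMap<String, String>,
}

/// Builds the schema from metadata alone: no network access, no database
/// access, and no state other than the argument.
pub fn build_schema(metadata: &Metadata) -> Schema {
    let mut query_root = BTreeMap::new();
    for model in &metadata.models {
        query_root.insert(
            model.select_many_field.clone(),
            model.graphql_type_name.clone(),
        );
    }
    Schema { query_root }
}
```

Because the output depends only on the input, two replicas given the same metadata build the same schema, and a restart rebuilds exactly the schema that was served before.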

### Open DDS: configuration over convention

In v2, there were several conventions baked into the construction of the schema
from metadata.

For example, all table root fields were named after the database table by
default, and could be renamed after the fact in metadata. However, this meant
that we had to prefix table names when we added new databases, in case their
default names overlapped.

Several other type names and root field names have defaults in v2 metadata.

V3 adopts the principle that Open DDS metadata will be explicit about everything
needed to determine the schema, so that no overlaps can occur if the data
connector schema changes.

Open DDS metadata in general will favor configuration over convention
everywhere, and any conventions that we want to add to improve the user
experience should be implemented in the CLI or console instead.
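
A small sketch of what "explicit about everything" can mean in practice: the root field name is a required part of each model entry instead of being derived from the table name. The `ModelConfig` type and the `collect_root_fields` check are hypothetical and only illustrate the idea; they are not the Open DDS format or the engine's validation code.

```rust
use std::collections::BTreeSet;

/// Hypothetical model entry. `root_field_name` is required, so nothing is
/// inferred from `table` or `connector`.
pub struct ModelConfig {
    pub connector: String,
    pub table: String,
    pub root_field_name: String,
}

/// Collects the explicitly configured root field names and reports an error
/// if the same name is configured twice.
pub fn collect_root_fields(models: &[ModelConfig]) -> Result<BTreeSet<String>, String> {
    let mut seen = BTreeSet::new();
    for model in models {
        // `insert` returns false when the name was already present.
        if !seen.insert(model.root_field_name.clone()) {
            return Err(format!(
                "root field '{}' is declared more than once (table '{}' on connector '{}')",
                model.root_field_name, model.table, model.connector
            ));
        }
    }
    Ok(seen)
}
```

With this shape, two connectors that both expose a `users` table do not conflict as long as the metadata names their root fields differently, for example `app_users` and `crm_users`. A tool such as the CLI or console could still suggest those names, which keeps the convention outside the metadata itself.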