graphql-engine/rfcs/identity-columns.md

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

140 lines
5.7 KiB
Markdown
Raw Permalink Normal View History

# Handling of Identity Columns
## Metadata
```
---
authors: Philip Lykke Carlsen <philip@hasura.io>
discussion:
https://github.com/hasura/graphql-engine-mono/issues/2407
https://github.com/hasura/graphql-engine-mono/pull/2507
state: pending answers to unresolved questions
---
```
## Description
This RFC collects discussion and decisions on how we want Identity Columns to
work in the GraphQL Engine.
## Problem
Identity Columns are an SQL standard database feature that attempts to solve
the problem of generating row identifiers in a more sound way than naive
auto-incrementing columns. This works by imposing restrictions on how such
columns may be updated.
This means that, in order for the GraphQL engine to correctly deal with tables
that have identity columns it has to observe these restrictions, specifically
when updating and inserting.
It is also possible to sometimes override the constraints imposed by Identity
Columns, and we need to decide what we want to support and how we want to
support it.
## Available options
Overall, there are two flavors of identity columns we may encounter:
* (Postgres only) Identity columns declared `GENERATED BY DEFAULT AS IDENTITY`
work just like regular `SERIAL` columns and impose no further constraints.
We can view these as being identity-columns-in-name-only.
* The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not
updatable and only insertable using and override mechanism:
* In MSSQL, a column declared `IDENTITY(..)` may be inserted into only when `SET
IDENTITY_INSERT` is applied to that table.
* In Postgres, a column declared `GENERATED ALWAYS AS IDENTITY`
may be inserted into by giving the clause `OVERRIDING SYSTEM VALUE` in an
`INSERT` statement.
**We need to decide how/when/if we want to expose the overriding mechanism in
our GraphQL API** (see the Unresolved Questions section below).
## How
Implementing the handling of identity columns should apply the architecture
described in [Column Mutability](/rfcs/column-mutability.md).
If we go with the non-overriding policy described above there should not be
any changes necessary to SQL translation for either MSSQL or PostgreSQL.
The only necessary change then ought to be amending the table metadata
extraction (for both MSSQL and PostgreSQL) to identify identity columns
and set column mutability accordingly (i.e. not insertable, not updatable).
## Unresolved Questions
_When, if ever, should we make use of the constraints overriding mechanisms
described above? Do we want to never override? Always? Make it configurable?_
Note that:
* Column Mutability guides us for how to implement the schema generation aspects
of either choice (of "non-overriding" vs "overriding")
* Leaving this unanswered does not block implementation of basically correct
handling of identity columns.
* But the implementation will have to make an (arbitrary) choice between the two.
A reasonable choice would be to select "non-overriding".
* We don't expect any complications to result from amending the implementation
at a later point in time.
## Appendix
The purpose of this appendix is to collect relevant information on the concept
of _Identity Columns_ and inform the implementation of GraphQL Engine.
* Part of the SQL standard.
* Motivation is to standardise DB-supplied identifiers (i.e. autoincrement/serial/..)
* Note: This is a concept distinct from primary keys. Identity Columnss don't introduce
uniqueness constraints by themselves!
* Also provide better semantics than naive auto-increment/serial solutions, by
prohibiting updating and inserting of Identity Columns (to an extent), in order to
avoid issue where auto-increment logic produces duplicates because conflicts
with manual inserts/updates.
* Interestingly, no-one seems to actually link to the standard they implement from.
* Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically)
* Not implemented in MySQL or SQLite
* Introduces some complications/extra coordination for replication/backup.
In a sentence:
> Identity columns are immutable, sequentially distinct values
> provided only by the DBMS
### MSSQL semantics
[MSSQL TSQL Identity Columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-ver15)
* Possible to `INSERT` values for Identity Columns, but guarded by a `SET INSERT_IDENTITY <tablename> ON` statement.
* Impossible to `UPDATE` values for Identity Columns.
* Syntax differs from SQL standard: `column IDENTITY(type, seed, increment)`.
### PostgreSQL Semantics
[PG Create table syntax (including GENERATED)](https://www.postgresql.org/docs/devel/sql-createtable.html)
* Syntax closer to SQL standard: `column GENERATED BY DEFAULT AS IDENTITY`, `column GENERATED ALWAYS AS IDENTITY`.
* Implemented on top of `series`.
* Columns `GENERATED BY DEFAULT` may be both `INSERT`ed and and `UPDATE`d.
* Columns `GENERATED ALWAYS` may be `INSERT`ed (guarded by an `OVERRIDE SYSTEM VALUE` keyword), but never `UPDATE`d.
### Links
[Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial):
> For new applications, identity columns should be used instead.
>
> Why not?
>
> The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome.
[SE: pg serial vs identity](https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity)
[Implementers blog post](https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/)
[Technical details blog post](https://www.depesz.com/2017/04/10/waiting-for-postgresql-10-identity-columns/)
[Wikipedia: Identity Columns](https://en.wikipedia.org/wiki/Identity_column)