# Handling of Identity Columns

## Metadata

```
---
authors: Philip Lykke Carlsen <philip@hasura.io>
discussion:
  https://github.com/hasura/graphql-engine-mono/issues/2407
  https://github.com/hasura/graphql-engine-mono/pull/2507
state: pending answers to unresolved questions
---
```

## Description

This RFC collects discussion and decisions on how we want Identity Columns to
work in the GraphQL Engine.

## Problem

Identity Columns are a SQL-standard database feature that attempts to generate
row identifiers more soundly than naive auto-incrementing columns. They do so
by imposing restrictions on how such columns may be inserted into and updated.

This means that, for the GraphQL Engine to deal correctly with tables that
have identity columns, it has to observe these restrictions, specifically
when updating and inserting.

It is sometimes also possible to override the constraints imposed by Identity
Columns, and we need to decide what we want to support and how we want to
support it.

## Available options

Overall, there are two flavors of identity columns we may encounter:

* (Postgres only) Identity columns declared `GENERATED BY DEFAULT AS IDENTITY`
  work just like regular `SERIAL` columns and impose no further constraints.
  We can view these as being identity-columns-in-name-only.

* The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not
  updatable and only insertable using an override mechanism:
  * In MSSQL, a column declared `IDENTITY(..)` may be inserted into only when
    `SET IDENTITY_INSERT` is applied to that table.
  * In Postgres, a column declared `GENERATED ALWAYS AS IDENTITY`
    may be inserted into by giving the clause `OVERRIDING SYSTEM VALUE` in an
    `INSERT` statement.

**We need to decide how/when/if we want to expose the overriding mechanism in
our GraphQL API** (see the Unresolved Questions section below).
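
To make the difference between the two override mechanisms concrete, here is a minimal sketch that renders the SQL text for an insert with and without the override. The function `render_insert`, its parameters, and the table and column names are hypothetical and chosen for illustration only; this is not how GraphQL Engine translates mutations.

```python
from typing import Sequence


def render_insert(backend: str, table: str, columns: Sequence[str],
                  override_identity: bool = False) -> str:
    """Render a parameterised INSERT, optionally using the identity override.

    Hypothetical helper: identifiers are assumed to be trusted and quoted.
    """
    column_list = ", ".join(columns)
    if backend == "postgres":
        placeholders = ", ".join(["%s"] * len(columns))
        # PostgreSQL: the override is a clause inside the INSERT statement.
        overriding = " OVERRIDING SYSTEM VALUE" if override_identity else ""
        return (f"INSERT INTO {table} ({column_list}){overriding} "
                f"VALUES ({placeholders})")
    if backend == "mssql":
        placeholders = ", ".join(["?"] * len(columns))
        insert = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
        if not override_identity:
            return insert
        # MSSQL: the override is session state switched on around the INSERT,
        # and the INSERT must name its columns explicitly.
        return (f"SET IDENTITY_INSERT {table} ON;\n"
                f"{insert};\n"
                f"SET IDENTITY_INSERT {table} OFF")
    raise ValueError(f"unsupported backend: {backend}")


if __name__ == "__main__":
    print(render_insert("postgres", "author", ["id", "name"], True))
    print(render_insert("mssql", "author", ["id", "name"], True))
```

The sketch highlights one practical asymmetry: on PostgreSQL the override is local to a single statement, whereas on MSSQL it is per-table session state that has to be switched off again.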

## How

Implementing the handling of identity columns should apply the architecture
described in [Column Mutability](/rfcs/column-mutability.md).

If we go with a non-overriding policy (i.e. never using the override mechanisms
described above), there should not be any changes necessary to SQL translation
for either MSSQL or PostgreSQL.

The only necessary change then ought to be amending the table metadata
extraction (for both MSSQL and PostgreSQL) to identify identity columns
and set column mutability accordingly (i.e. not insertable, not updatable).
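
As a rough illustration of that change, the sketch below maps the kind of identity column found during metadata extraction to a mutability setting under the non-overriding policy. The names `IdentityKind`, `ColumnMutability` and `mutability_non_overriding` are assumptions made for this example and are not taken from the Column Mutability RFC or the GraphQL Engine code base.

```python
from dataclasses import dataclass
from enum import Enum


class IdentityKind(Enum):
    """How a column is declared, as far as identity is concerned (hypothetical)."""
    NOT_IDENTITY = "not_identity"
    BY_DEFAULT = "by_default"  # Postgres: GENERATED BY DEFAULT AS IDENTITY
    ALWAYS = "always"          # Postgres: GENERATED ALWAYS; MSSQL: IDENTITY(..)


@dataclass(frozen=True)
class ColumnMutability:
    """Whether a column may appear in insert and update mutations."""
    insertable: bool
    updatable: bool


def mutability_non_overriding(kind: IdentityKind) -> ColumnMutability:
    """Mutability when the engine never uses an override mechanism."""
    if kind is IdentityKind.ALWAYS:
        # "True" identity columns: values come only from the DBMS.
        return ColumnMutability(insertable=False, updatable=False)
    # BY DEFAULT identity columns behave like SERIAL, so they stay writable,
    # as do ordinary columns.
    return ColumnMutability(insertable=True, updatable=True)


if __name__ == "__main__":
    for kind in IdentityKind:
        print(kind.name, mutability_non_overriding(kind))
```

Under an overriding policy only the `ALWAYS` branch would change (to insertable but still not updatable), which is why switching policies later should be a small amendment.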

## Unresolved Questions

_When, if ever, should we make use of the constraint-overriding mechanisms
  described above? Do we want to never override? Always? Make it configurable?_

Note that:
* Column Mutability guides how we implement the schema generation aspects
  of either choice ("non-overriding" vs "overriding").
* Leaving this unanswered does not block implementation of basically correct
  handling of identity columns.
* But the implementation will have to make an (arbitrary) choice between the two.
  A reasonable choice would be to select "non-overriding".
* We don't expect any complications to result from amending the implementation
  at a later point in time.


## Appendix

The purpose of this appendix is to collect relevant information on the concept
of _Identity Columns_ and inform the implementation of GraphQL Engine.

* Part of the SQL standard.
* Motivation is to standardise DB-supplied identifiers (e.g. autoincrement/serial/..)
  * Note: This is a concept distinct from primary keys. Identity Columns don't introduce
    uniqueness constraints by themselves!
  * Also provide better semantics than naive auto-increment/serial solutions, by
    prohibiting updating and inserting of Identity Columns (to an extent), in order to
    avoid issues where auto-increment logic produces duplicates because of conflicts
    with manual inserts/updates.
  * Interestingly, no one seems to actually link to the standard they implement.
* Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically)
* Not implemented in MySQL or SQLite
* Introduces some complications/extra coordination for replication/backup.
    
In a sentence:
> Identity columns are immutable, sequentially distinct values
> provided only by the DBMS

### MSSQL semantics

[MSSQL TSQL Identity Columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-ver15)

* Possible to `INSERT` values for Identity Columns, but guarded by a `SET IDENTITY_INSERT <tablename> ON` statement.
* Impossible to `UPDATE` values for Identity Columns.
* Syntax differs from SQL standard: `column type IDENTITY(seed, increment)`.

### PostgreSQL Semantics

[PG Create table syntax (including GENERATED)](https://www.postgresql.org/docs/devel/sql-createtable.html)

* Syntax closer to SQL standard: `column GENERATED BY DEFAULT AS IDENTITY`, `column GENERATED ALWAYS AS IDENTITY`.
* Implemented on top of sequences.
* Columns `GENERATED BY DEFAULT` may be both `INSERT`ed and `UPDATE`d.
* Columns `GENERATED ALWAYS` may be `INSERT`ed (guarded by an `OVERRIDING SYSTEM VALUE` clause), but never `UPDATE`d (other than to `DEFAULT`).
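
For metadata extraction, PostgreSQL records which of the two flavors a column uses in the `attidentity` field of the `pg_catalog.pg_attribute` catalog. The following is a minimal sketch of reading and decoding that flag; the query text and the helper `describe_attidentity` are illustrative assumptions, not the query GraphQL Engine actually runs.

```python
# Query sketch: one row per live user column of the table passed as parameter
# (e.g. 'public.author'). Written with a psycopg-style %s placeholder.
PG_IDENTITY_QUERY = """
SELECT a.attname, a.attidentity
FROM pg_catalog.pg_attribute AS a
WHERE a.attrelid = %s::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
"""

# Values of pg_attribute.attidentity as documented by PostgreSQL.
_ATTIDENTITY_MEANING = {
    "": "not an identity column",
    "a": "GENERATED ALWAYS AS IDENTITY",
    "d": "GENERATED BY DEFAULT AS IDENTITY",
}


def describe_attidentity(flag: str) -> str:
    """Decode a pg_attribute.attidentity value (hypothetical helper)."""
    try:
        return _ATTIDENTITY_MEANING[flag]
    except KeyError:
        raise ValueError(f"unexpected attidentity value: {flag!r}") from None


if __name__ == "__main__":
    for flag in ("", "a", "d"):
        print(repr(flag), "->", describe_attidentity(flag))
```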


### Links

[Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial):
> For new applications, identity columns should be used instead.
>
> Why not?
> 
> The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome.

[SE: pg serial vs identity](https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity)

[Implementers blog post](https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/)

[Technical details blog post](https://www.depesz.com/2017/04/10/waiting-for-postgresql-10-identity-columns/)
    
[Wikipedia: Identity Columns](https://en.wikipedia.org/wiki/Identity_column)