# Handling of Identity Columns ## Metadata ``` --- authors: Philip Lykke Carlsen discussion: https://github.com/hasura/graphql-engine-mono/issues/2407 https://github.com/hasura/graphql-engine-mono/pull/2507 state: pending answers to unresolved questions --- ``` ## Description This RFC collects discussion and decisions on how we want Identity Columns to work in the GraphQL Engine. ## Problem Identity Columns are an SQL standard database feature that attempts to solve the problem of generating row identifiers in a more sound way than naive auto-incrementing columns. This works by imposing restrictions on how such columns may be updated. This means that, in order for the GraphQL engine to correctly deal with tables that have identity columns it has to observe these restrictions, specifically when updating and inserting. It is also possible to sometimes override the constraints imposed by Identity Columns, and we need to decide what we want to support and how we want to support it. ## Available options Overall, there are two flavors of identity columns we may encounter: * (Postgres only) Identity columns declared `GENERATED BY DEFAULT AS IDENTITY` work just like regular `SERIAL` columns and impose no further constraints. We can view these as being identity-columns-in-name-only. * The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not updatable and only insertable using and override mechanism: * In MSSQL, a column declared `IDENTITY(..)` may be inserted into only when `SET IDENTITY_INSERT` is applied to that table. * In Postgres, a column declared `GENERATED ALWAYS AS IDENTITY` may be inserted into by giving the clause `OVERRIDING SYSTEM VALUE` in an `INSERT` statement. **We need to decide how/when/if we want to expose the overriding mechanism in our GraphQL API** (see the Unresolved Questions section below). ## How Implementing the handling of identity columns should apply the architecture described in [Column Mutability](/rfcs/column-mutability.md). If we go with the non-overriding policy described above there should not be any changes necessary to SQL translation for either MSSQL or PostgreSQL. The only necessary change then ought to be amending the table metadata extraction (for both MSSQL and PostgreSQL) to identify identity columns and set column mutability accordingly (i.e. not insertable, not updatable). ## Unresolved Questions _When, if ever, should we make use of the constraints overriding mechanisms described above? Do we want to never override? Always? Make it configurable?_ Note that: * Column Mutability guides us for how to implement the schema generation aspects of either choice (of "non-overriding" vs "overriding") * Leaving this unanswered does not block implementation of basically correct handling of identity columns. * But the implementation will have to make an (arbitrary) choice between the two. A reasonable choice would be to select "non-overriding". * We don't expect any complications to result from amending the implementation at a later point in time. ## Appendix The purpose of this appendix is to collect relevant information on the concept of _Identity Columns_ and inform the implementation of GraphQL Engine. * Part of the SQL standard. * Motivation is to standardise DB-supplied identifiers (i.e. autoincrement/serial/..) * Note: This is a concept distinct from primary keys. Identity Columnss don't introduce uniqueness constraints by themselves! * Also provide better semantics than naive auto-increment/serial solutions, by prohibiting updating and inserting of Identity Columns (to an extent), in order to avoid issue where auto-increment logic produces duplicates because conflicts with manual inserts/updates. * Interestingly, no-one seems to actually link to the standard they implement from. * Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically) * Not implemented in MySQL or SQLite * Introduces some complications/extra coordination for replication/backup. In a sentence: > Identity columns are immutable, sequentially distinct values > provided only by the DBMS ### MSSQL semantics [MSSQL TSQL Identity Columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-ver15) * Possible to `INSERT` values for Identity Columns, but guarded by a `SET INSERT_IDENTITY ON` statement. * Impossible to `UPDATE` values for Identity Columns. * Syntax differs from SQL standard: `column IDENTITY(type, seed, increment)`. ### PostgreSQL Semantics [PG Create table syntax (including GENERATED)](https://www.postgresql.org/docs/devel/sql-createtable.html) * Syntax closer to SQL standard: `column GENERATED BY DEFAULT AS IDENTITY`, `column GENERATED ALWAYS AS IDENTITY`. * Implemented on top of `series`. * Columns `GENERATED BY DEFAULT` may be both `INSERT`ed and and `UPDATE`d. * Columns `GENERATED ALWAYS` may be `INSERT`ed (guarded by an `OVERRIDE SYSTEM VALUE` keyword), but never `UPDATE`d. ### Links [Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial): > For new applications, identity columns should be used instead. > > Why not? > > The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome. [SE: pg serial vs identity](https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity) [Implementers blog post](https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/) [Technical details blog post](https://www.depesz.com/2017/04/10/waiting-for-postgresql-10-identity-columns/) [Wikipedia: Identity Columns](https://en.wikipedia.org/wiki/Identity_column)