Philip Lykke Carlsen ef4d194d79 RFC: Column Mutability
This PR contains two RFCs:

[Identity Columns](https://github.com/hasura/graphql-engine-mono/blob/rfc/identity-columns/rfcs/identity-columns.md) collects information and product decisions about identity columns. There are some decisions we need to make explicit.

[Column Mutability](https://github.com/hasura/graphql-engine-mono/blob/rfc/identity-columns/rfcs/column-mutability.md) proposes an implementation strategy for identity columns and similar features that should elegantly accommodate differences among backends. The idea is to model the notion of _column mutability_ rather than, e.g., identity columns directly.

Please volunteer your opinions and perspectives on these topics in the PR comments.

---
Closes #2407

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/2507
GitOrigin-RevId: 5eb14a53504985fd32933c182bee4cc13bb70a02
2021-11-02 13:43:28 +00:00

Handling of Identity Columns

Metadata

---
authors: Philip Lykke Carlsen <philip@hasura.io>
discussion:
  https://github.com/hasura/graphql-engine-mono/issues/2407
  https://github.com/hasura/graphql-engine-mono/pull/2507
state: pending answers to unresolved questions
---

Description

This RFC collects discussion and decisions on how we want Identity Columns to work in the GraphQL Engine.

Problem

Identity Columns are an SQL-standard database feature that attempts to solve the problem of generating row identifiers more soundly than naive auto-incrementing columns. It does so by restricting how values in such columns may be inserted and updated.

This means that, to handle tables with identity columns correctly, the GraphQL Engine has to observe these restrictions, specifically when inserting and updating rows.

It is sometimes possible to override the constraints imposed by Identity Columns, and we need to decide whether and how we want to support that.

Available options

Overall, there are two flavors of identity columns we may encounter:

  • (Postgres only) Identity columns declared GENERATED BY DEFAULT AS IDENTITY work just like regular SERIAL columns and impose no further constraints. We can view these as being identity-columns-in-name-only.

  • The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not updatable and are only insertable using an override mechanism:

    • In MSSQL, a column declared IDENTITY(..) may be inserted into only while SET IDENTITY_INSERT <table> ON is in effect for that table.
    • In Postgres, a column declared GENERATED ALWAYS AS IDENTITY may be inserted into by giving the clause OVERRIDING SYSTEM VALUE in an INSERT statement.

We need to decide how/when/if we want to expose the overriding mechanism in our GraphQL API (see the Unresolved Questions section below).
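The following minimal Python sketch makes the two override mechanisms concrete by rendering the SQL that an "overriding" policy would have to emit on each backend. `InsertRequest`, `render_postgres_override` and `render_mssql_override` are hypothetical names, not part of the GraphQL Engine. The sketch only covers single-row inserts and assumes identifiers and values are already quoted and escaped.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class InsertRequest:
    """Hypothetical, simplified single-row insert."""
    table: str
    columns: Sequence[str]
    values: Sequence[str]  # pre-rendered SQL expressions; no escaping is done here


def _lists(req: InsertRequest) -> Tuple[str, str]:
    return ", ".join(req.columns), ", ".join(req.values)


def render_postgres_override(req: InsertRequest) -> str:
    cols, vals = _lists(req)
    # In PostgreSQL the OVERRIDING SYSTEM VALUE clause sits between the
    # column list and VALUES.
    return f"INSERT INTO {req.table} ({cols}) OVERRIDING SYSTEM VALUE VALUES ({vals});"


def render_mssql_override(req: InsertRequest) -> str:
    cols, vals = _lists(req)
    # In SQL Server, IDENTITY_INSERT is a per-table session setting, and an
    # explicit column list is required while it is ON. We switch it back OFF
    # afterwards.
    return (
        f"SET IDENTITY_INSERT {req.table} ON;\n"
        f"INSERT INTO {req.table} ({cols}) VALUES ({vals});\n"
        f"SET IDENTITY_INSERT {req.table} OFF;"
    )


if __name__ == "__main__":
    req = InsertRequest("authors", ["id", "name"], ["42", "'Ada'"])
    print(render_postgres_override(req))
    print(render_mssql_override(req))
```

Switching IDENTITY_INSERT back OFF matters in practice because SQL Server allows only one table per session to have it ON at a time. An overriding policy would therefore also need to manage this session state, whereas PostgreSQL's clause is local to the statement.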

How

Implementing the handling of identity columns should apply the architecture described in Column Mutability.

If we adopt the non-overriding policy (i.e. never using the override mechanisms described above), no changes should be necessary to SQL translation for either MSSQL or PostgreSQL.

The only necessary change then ought to be amending the table metadata extraction (for both MSSQL and PostgreSQL) to identify identity columns and set column mutability accordingly (i.e. not insertable, not updatable).
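The metadata-extraction change could look roughly like the following Python sketch. It assumes the input rows come from PostgreSQL's `information_schema.columns` (columns `is_identity` and `identity_generation`) and from SQL Server's `sys.columns` (column `is_identity`). `IdentityKind`, `ColumnMutability` and the helper functions are hypothetical names, and the actual Column Mutability types may well differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IdentityKind(Enum):
    NONE = "none"
    # PG GENERATED BY DEFAULT AS IDENTITY: behaves like SERIAL, no restrictions.
    BY_DEFAULT = "by_default"
    # PG GENERATED ALWAYS AS IDENTITY or MSSQL IDENTITY(..): insertable only via
    # an override mechanism, never updatable.
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class ColumnMutability:
    insertable: bool
    updatable: bool


# PostgreSQL rows, e.g. from:
#   SELECT column_name, is_identity, identity_generation
#   FROM information_schema.columns WHERE table_schema = ... AND table_name = ...
def pg_identity_kind(is_identity: str, identity_generation: Optional[str]) -> IdentityKind:
    if is_identity != "YES":
        return IdentityKind.NONE
    if identity_generation == "ALWAYS":
        return IdentityKind.RESTRICTED
    if identity_generation == "BY DEFAULT":
        return IdentityKind.BY_DEFAULT
    raise ValueError(f"unexpected identity_generation: {identity_generation!r}")


# MSSQL rows, e.g. from:
#   SELECT name, is_identity FROM sys.columns WHERE object_id = OBJECT_ID(...)
def mssql_identity_kind(is_identity: bool) -> IdentityKind:
    return IdentityKind.RESTRICTED if is_identity else IdentityKind.NONE


def non_overriding_mutability(kind: IdentityKind) -> ColumnMutability:
    # Under the non-overriding policy, restricted identity columns are neither
    # insertable nor updatable. All other columns are unaffected.
    restricted = kind is IdentityKind.RESTRICTED
    return ColumnMutability(insertable=not restricted, updatable=not restricted)


if __name__ == "__main__":
    pg_rows = [("id", "YES", "ALWAYS"), ("legacy_id", "YES", "BY DEFAULT"), ("name", "NO", None)]
    for name, is_id, gen in pg_rows:
        print(name, non_overriding_mutability(pg_identity_kind(is_id, gen)))
```

In this sketch, PostgreSQL's GENERATED ALWAYS columns and MSSQL's IDENTITY columns map to the same kind. The two backends then differ only in how the catalog is read, not in how mutability is decided.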

Unresolved Questions

When, if ever, should we make use of the constraint-overriding mechanisms described above? Do we want to never override? Always? Make it configurable?

Note that:

  • Column Mutability guides how we implement the schema generation aspects of either choice ("non-overriding" vs. "overriding").
  • Leaving this unanswered does not block implementation of basically correct handling of identity columns.
  • But the implementation will have to make an (arbitrary) choice between the two. A reasonable choice would be to select "non-overriding".
  • We don't expect any complications to result from amending the implementation at a later point in time.
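To illustrate the "make it configurable" option, here is a Python sketch of how schema generation might consume a policy. It assumes each column has already been classified by its identity kind. `OverridePolicy`, `Column`, `input_fields` and the other names are hypothetical and do not come from the Column Mutability RFC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class IdentityKind(Enum):
    NONE = "none"
    BY_DEFAULT = "by_default"  # PG GENERATED BY DEFAULT AS IDENTITY
    RESTRICTED = "restricted"  # PG GENERATED ALWAYS AS IDENTITY / MSSQL IDENTITY(..)


class OverridePolicy(Enum):
    NON_OVERRIDING = "non_overriding"
    OVERRIDING = "overriding"


@dataclass(frozen=True)
class Column:
    name: str
    identity: IdentityKind


def is_insertable(col: Column, policy: OverridePolicy) -> bool:
    # Restricted identity columns are insertable only if we are willing to emit
    # OVERRIDING SYSTEM VALUE (PG) or SET IDENTITY_INSERT (MSSQL).
    if col.identity is IdentityKind.RESTRICTED:
        return policy is OverridePolicy.OVERRIDING
    return True


def is_updatable(col: Column) -> bool:
    # Neither override mechanism permits updating to an explicit value,
    # so the policy plays no role here.
    return col.identity is not IdentityKind.RESTRICTED


def input_fields(columns: Sequence[Column], policy: OverridePolicy) -> Tuple[List[str], List[str]]:
    """Field names for the generated insert and update input types."""
    insert = [c.name for c in columns if is_insertable(c, policy)]
    update = [c.name for c in columns if is_updatable(c)]
    return insert, update
```

Take columns `id` (restricted), `seq` (by default) and `name` (plain). Only the insert field list changes between policies: `["seq", "name"]` under non-overriding versus `["id", "seq", "name"]` under overriding. The update list is `["seq", "name"]` either way. This is in line with the expectation that amending the choice later should be cheap, at least at the schema level.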

Appendix

The purpose of this appendix is to collect relevant information on the concept of Identity Columns and to inform the implementation in the GraphQL Engine.

  • Part of the SQL standard.
  • Motivation is to standardise DB-supplied identifiers (i.e. autoincrement/serial/..)
    • Note: This is a concept distinct from primary keys. Identity Columns don't introduce uniqueness constraints by themselves!
    • They also provide better semantics than naive auto-increment/serial solutions by prohibiting inserts and updates of Identity Columns (to an extent). This avoids issues where the auto-increment logic produces duplicates because of conflicts with manual inserts/updates.
    • Interestingly, none of the vendors' documentation seems to actually link to the part of the standard it implements.
  • Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically)
  • Not implemented in MySQL or SQLite
  • Introduces some complications/extra coordination for replication/backup.

In a sentence:

Identity columns are immutable, sequentially distinct values provided only by the DBMS

MSSQL semantics

MSSQL TSQL Identity Columns

  • Possible to INSERT values for Identity Columns, but guarded by a SET IDENTITY_INSERT <tablename> ON statement.
  • Impossible to UPDATE values for Identity Columns.
  • Syntax differs from the SQL standard: column type IDENTITY(seed, increment).

PostgreSQL Semantics

PG Create table syntax (including GENERATED)

  • Syntax closer to SQL standard: column GENERATED BY DEFAULT AS IDENTITY, column GENERATED ALWAYS AS IDENTITY.
  • Implemented on top of sequences.
  • Columns GENERATED BY DEFAULT may be both INSERTed and UPDATEd.
  • Columns GENERATED ALWAYS may be INSERTed (guarded by an OVERRIDING SYSTEM VALUE clause), but never UPDATEd (except to DEFAULT).

Don't use serial:

For new applications, identity columns should be used instead.

Why not?

The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome.

SE: pg serial vs identity

Implementers blog post

Technical details blog post

Wikipedia: Identity Columns