mirror of
https://github.com/hasura/graphql-engine.git
synced 2024-08-15 13:40:29 +03:00
RFC: Column Mutability
This PR contains two pull requests: [Identity Columns](https://github.com/hasura/graphql-engine-mono/blob/rfc/identity-columns/rfcs/identity-columns.md) collects information and product decisions about identity columns. There are some decisions we need to make explicit. [Column Mutability](https://github.com/hasura/graphql-engine-mono/blob/rfc/identity-columns/rfcs/column-mutability.md) proposes an implementation strategy for identity columns and similar that should be able to elegantly accommodate differences among backends. The idea is to model the notion of _column mutability_ rather than e.g. identity columns directly. Please volunteer your opinions and perspectives on these topics in the PR comments. --- Closes #2407 PR-URL: https://github.com/hasura/graphql-engine-mono/pull/2507 GitOrigin-RevId: 5eb14a53504985fd32933c182bee4cc13bb70a02
This commit is contained in:
parent
5694d41025
commit
ef4d194d79
243
rfcs/column-mutability.md
Normal file
243
rfcs/column-mutability.md
Normal file
@ -0,0 +1,243 @@
|
|||||||
|
# Column Mutability
|
||||||
|
|
||||||
|
## Metadata
|
||||||
|
|
||||||
|
```
|
||||||
|
---
|
||||||
|
authors: Philip Lykke Carlsen <philip@hasura.io>
|
||||||
|
discussion:
|
||||||
|
https://github.com/hasura/graphql-engine-mono/issues/2407
|
||||||
|
https://github.com/hasura/graphql-engine-mono/pull/2507
|
||||||
|
---
|
||||||
|
```
|
||||||
|
|
||||||
|
## Description
|
||||||
|
|
||||||
|
Various features of databases influence the set of columns that should be
|
||||||
|
exposed under different circumstances. Examples of this include Identity Columns
|
||||||
|
and (DB) Generated Columns [^1].
|
||||||
|
[^1]: https://www.postgresql.org/docs/14/sql-createtable.html
|
||||||
|
|
||||||
|
This RFC proposes to add the basic, shared concept of _Column Mutability_ with
|
||||||
|
the goal of simplifying the implementation of features including those mentioned
|
||||||
|
above and similar, as well as improving code reuse, specifically for Schema
|
||||||
|
Generation.
|
||||||
|
|
||||||
|
### Problem
|
||||||
|
|
||||||
|
In order to motivate this proposal, we'll start with a historical summary a
|
||||||
|
previous effort of implementing support for handling identity columns.
|
||||||
|
|
||||||
|
Identity Columns were introduced as first class concept, present throughout the
|
||||||
|
different parts of the implementation that needed to adapt its behavior
|
||||||
|
depending on them.
|
||||||
|
|
||||||
|
What we didn't know at the time was that this would turn out to be awkward,
|
||||||
|
because identity columns in Postgresql and MSSQL have different semantics. And
|
||||||
|
because Postgres even has two variations of identity columns which need to be
|
||||||
|
treated differently. (More on this later)
|
||||||
|
|
||||||
|
The specific way in which this turned awkward was in the schema code. We want to
|
||||||
|
avoid duplicating this code, so that we produce consistent schemas across
|
||||||
|
backends. However, in order for this code to deal correctly with different
|
||||||
|
behaviors of different identity column variants it had to accept further
|
||||||
|
backend-specific customization options (or, hypothetically, be refactored).
|
||||||
|
|
||||||
|
Concretely, apart from case-switching on whether columns were identity columns
|
||||||
|
or not the code was amended with the (`class Backend b` type-level) feature flag
|
||||||
|
`type XOnConflict b`, indicating whether a backend would support
|
||||||
|
`_on_conflict`-based upserts. [^2]
|
||||||
|
|
||||||
|
[^2]: MSSQL would not support upserts, as a concession for us missing an
|
||||||
|
implementation that would distinguish between what columns to insert and what
|
||||||
|
columns to update owning to `_on_conflict` being part of `insert_` mutations.
|
||||||
|
|
||||||
|
Then came [issue #7557](https://github.com/hasura/graphql-engine/issues/7557),
|
||||||
|
and we realised the importance of distinguishing between `GENERATED BY DEFAULT AS
|
||||||
|
IDENTITY` and `GENERATED ALWAYS AS IDENTITY` identity columns in Postgres. [^3]
|
||||||
|
|
||||||
|
[^3]: `BY DEFAULT` may be inserted but not updated, while `ALWAYS` may neither.
|
||||||
|
So now that the PG backend had a concept of id cols it would interpret them the
|
||||||
|
same way as in MSSQL (which are more similar to the PG `ALWAYS` kind), meaning
|
||||||
|
one user's insert mutations on a table with a `GENERATED BY DEFAULT` identity
|
||||||
|
col suddenly stopped working, because the id col was elided from the schema!
|
||||||
|
|
||||||
|
To summarize, it seems our approach to modelling identity columns was
|
||||||
|
inappropriate: We introduced a single _Identity Column_ modelling concept which
|
||||||
|
had to cover for the various idiosyncratic variants between (and within!)
|
||||||
|
databases, and as a result the shared schema generation code needed to know
|
||||||
|
about different backends' idiosyncracies.
|
||||||
|
|
||||||
|
### Why is it important?
|
||||||
|
|
||||||
|
At the level of concrete features our product should support, Identity columns
|
||||||
|
are already something our customers rely on (as evidenced by issue #7557). On Postgresql,
|
||||||
|
Identity Columns are even the recommended successor to `SERIAL` columns [^4].
|
||||||
|
|
||||||
|
[^4]: [Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial):
|
||||||
|
> For new applications, identity columns should be used instead.
|
||||||
|
>
|
||||||
|
> Why not?
|
||||||
|
>
|
||||||
|
> The serial types have some weird behaviors that make schema, dependency,
|
||||||
|
> and permission management unnecessarily cumbersome.
|
||||||
|
|
||||||
|
Additionally, a generic type of generated columns (database backed rather than
|
||||||
|
GraphQL Engine backed) could also be an attractive feature to support, which
|
||||||
|
should be easy to build on top of this proposal.
|
||||||
|
|
||||||
|
On a more meta-feature level, faithfully exposing relational databases on
|
||||||
|
GraphQL is our bread and butter. Being able to develop new features confidently
|
||||||
|
with less effort is an important enabler of this.
|
||||||
|
|
||||||
|
Our ability to achieve this (and thus the quality of the finished product)
|
||||||
|
is affected by our choices of the abstractions that we use to express our solution.
|
||||||
|
Therefore it's important we pick those that enable us rather than hamper us.
|
||||||
|
|
||||||
|
### Success
|
||||||
|
|
||||||
|
This effort is a success once insert, update, and upsert mutations are implemented
|
||||||
|
using a backend-agnostic notion of column mutability, unaffected by backend-specific
|
||||||
|
concepts such as Identity Columns and Generated Columns.
|
||||||
|
|
||||||
|
## How
|
||||||
|
|
||||||
|
This specific proposal describes an implementation strategy and does not itself
|
||||||
|
have any user visible aspects.
|
||||||
|
|
||||||
|
Implementing this proposal in a basic form amounts to three conceptually simple
|
||||||
|
things:
|
||||||
|
|
||||||
|
1. Amend the table metadata with column mutability information. E.g. by changing
|
||||||
|
`Hasura.RQL.Types.Column.ColumnInfo` or
|
||||||
|
`Hasura.RQL.Types.Table.TableCoreInfoG`. We should be able to record whether
|
||||||
|
each column is insertable and updatable at the least.
|
||||||
|
2. Updating the schema generation code to respect this new metadata. Points of
|
||||||
|
interest include update mutations, insert mutations, and insert with
|
||||||
|
`on_conflict` (the `update_columns` field should only contain updatable
|
||||||
|
columns).
|
||||||
|
3. Updating the permissions code to respect this new metadata. Column Update
|
||||||
|
Permissions should apply to only updatable columns etc.
|
||||||
|
|
||||||
|
In the actual table metadata collection code we may initially just mark every
|
||||||
|
column as insertable and updatable, unless the work is in the context of
|
||||||
|
e.g. implementing identity columns.
|
||||||
|
|
||||||
|
### Effects and Interactions
|
||||||
|
|
||||||
|
This feature may be perfectly well confined to the server. It might be
|
||||||
|
interesting however to show or use the column mutability information in the
|
||||||
|
Console, for example to reflect to a user how HGE is understanding their
|
||||||
|
database schema.
|
||||||
|
|
||||||
|
### Alternatives
|
||||||
|
|
||||||
|
If you do not have a generic concept of column mutability, all features that
|
||||||
|
influence the set of columns exposed in certain situations will have to
|
||||||
|
themselves deal directly with those situations.
|
||||||
|
|
||||||
|
Examples of code in an implementation without generic column mutability:
|
||||||
|
* Schema generation needs to know and understand the concepts of Identity
|
||||||
|
Columns, and Generated Columns. These work differently across backends, which
|
||||||
|
hampers code reuse.
|
||||||
|
* Permissions need to know about Identity Columns, and Generated Columns,
|
||||||
|
because Update Permissions don't make sense those columns. But only sometimes,
|
||||||
|
and that in backend specific ways.
|
||||||
|
|
||||||
|
### Costs and Drawbacks
|
||||||
|
|
||||||
|
A drawback of using this approach over the alternatives is that, as a feature
|
||||||
|
implementer you need to know prior knowledge of the column mutability feature in
|
||||||
|
order to exploit it. Nothing prohibits you from implementing some feature in
|
||||||
|
parallel, that could have benefited from using it.
|
||||||
|
|
||||||
|
But then again, if not exploiting the column mutability feature, you would have
|
||||||
|
to know about all the relevant interactions of column mutability in order to
|
||||||
|
produce a well-rounded feature.
|
||||||
|
|
||||||
|
Another drawback is that this could potentially result in poor usability if it
|
||||||
|
turns out that some features need to be handled more specifically than just
|
||||||
|
including or excluding columns depending on their mutability affords.
|
||||||
|
|
||||||
|
A somewhat mild but illustrative example of this is a user wanting to update a
|
||||||
|
non-updatable column and being puzzled as to why this column is not present as a
|
||||||
|
field in the update mutation.
|
||||||
|
|
||||||
|
This concrete example could be remedied by annotating the metadata that
|
||||||
|
indicates mutability with a human-readable text describing the cause, which we
|
||||||
|
could then display in the console.
|
||||||
|
|
||||||
|
But care must be taken: In the (contrived) case that we wanted to, say, give
|
||||||
|
rich error messages in response only to update mutations that mention identity
|
||||||
|
columns but not those that mention generated columns, we would have encountered
|
||||||
|
a need that is not met by a generic column mutability concept. That would leave
|
||||||
|
us having to overload/pollute column mutability with data and logic for dealing
|
||||||
|
with non-universal edge cases, and we wouldn't have achieved anything.
|
||||||
|
|
||||||
|
### Unresolved Questions
|
||||||
|
|
||||||
|
> A place for authors to list out open questions
|
||||||
|
> Ideally, it would serve as a collection point for others to chime in during
|
||||||
|
> the review process to either identify their own questions or offer potential
|
||||||
|
> solutions
|
||||||
|
|
||||||
|
### Future Work / Out of Scope
|
||||||
|
|
||||||
|
As stated earlier, this RFC does not directly imply any application of its
|
||||||
|
ideas. Applications such as Identity Columns are treated separately.
|
||||||
|
|
||||||
|
Here is a first attempt at looking more closely at how Identity Columns would
|
||||||
|
use Column Mutability to fulfill its needs:
|
||||||
|
|
||||||
|
* In MSSQL.
|
||||||
|
|
||||||
|
We have a couple of options for how we want to identity columns to work.
|
||||||
|
|
||||||
|
```
|
||||||
|
IDENTITY(..), with SET IDENTITY_INSERT => not updatable, insertable
|
||||||
|
```
|
||||||
|
(This variant will also require specific handling at SQL translation)
|
||||||
|
|
||||||
|
```
|
||||||
|
IDENTITY(..), w/o SET_IDENTITY_INSERT => not updatable, not insertable
|
||||||
|
```
|
||||||
|
|
||||||
|
* In Postgres
|
||||||
|
|
||||||
|
Similarly to MSSQL, we need to choose how to handle one flavor of identity column:
|
||||||
|
|
||||||
|
```
|
||||||
|
GENERATED ALWAYS AS IDENTITY, with OVERRIDING SYSTEM VALUE => not updatable, insertable
|
||||||
|
```
|
||||||
|
(This variant will also require specific handling at SQL translation)
|
||||||
|
|
||||||
|
```
|
||||||
|
GENERATED ALWAYS AS IDENTITY, w/o OVERRIDING SYSTEM VALUE => not updatable, not insertable
|
||||||
|
```
|
||||||
|
|
||||||
|
The third variant does not even register:
|
||||||
|
```
|
||||||
|
GENERATED BY DEFAULT AS IDENTITY => updatable, insertable
|
||||||
|
```
|
||||||
|
|
||||||
|
* General Computed Columns, as supported by all of MSSQL, PG 14, and MySQL:
|
||||||
|
`generated => not updatable, not insertable`
|
||||||
|
|
||||||
|
Note: Careful inspection of the above suggests that the
|
||||||
|
upsert-via-insert+on_conflict schema should work for both MSSQL and Postgresql,
|
||||||
|
regardless of the flavors we choose to support. The only troublesome combination
|
||||||
|
is `updatable, not insertable` which is absent above. [^why-troublesome]
|
||||||
|
|
||||||
|
[^why-troublesome]: The reason `updatable, not insertable` is troublesome is that the values to be upserted are given as:
|
||||||
|
```
|
||||||
|
insert_some_table(
|
||||||
|
objects: [{ <values here> }],
|
||||||
|
on_conflict: { update_columns: [<columns to be updated>], ... }
|
||||||
|
)
|
||||||
|
```
|
||||||
|
We want to elide fields `not insertable` from the schema of `<values here>`,
|
||||||
|
but that is also the place where values upserted go.
|
||||||
|
|
||||||
|
Also, for something completely different, it might be interesting to make column
|
||||||
|
mutability something a user can specify. This use case does however have some
|
||||||
|
overlap with the Columns Permissions feature.
|
139
rfcs/identity-columns.md
Normal file
139
rfcs/identity-columns.md
Normal file
@ -0,0 +1,139 @@
|
|||||||
|
# Handling of Identity Columns
|
||||||
|
|
||||||
|
## Metadata
|
||||||
|
|
||||||
|
```
|
||||||
|
---
|
||||||
|
authors: Philip Lykke Carlsen <philip@hasura.io>
|
||||||
|
discussion:
|
||||||
|
https://github.com/hasura/graphql-engine-mono/issues/2407
|
||||||
|
https://github.com/hasura/graphql-engine-mono/pull/2507
|
||||||
|
state: pending answers to unresolved questions
|
||||||
|
---
|
||||||
|
```
|
||||||
|
|
||||||
|
## Description
|
||||||
|
|
||||||
|
This RFC collects discussion and decisions on how we want Identity Columns to
|
||||||
|
work in the GraphQL Engine.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
Identity Columns are an SQL standard database feature that attempts to solve
|
||||||
|
the problem of generating row identifiers in a more sound way than naive
|
||||||
|
auto-incrementing columns. This works by imposing restrictions on how such
|
||||||
|
columns may be updated.
|
||||||
|
|
||||||
|
This means that, in order for the GraphQL engine to correctly deal with tables
|
||||||
|
that have identity columns it has to observe these restrictions, specifically
|
||||||
|
when updating and inserting.
|
||||||
|
|
||||||
|
It is also possible to sometimes override the constraints imposed by Identity
|
||||||
|
Columns, and we need to decide what we want to support and how we want to
|
||||||
|
support it.
|
||||||
|
|
||||||
|
## Available options
|
||||||
|
|
||||||
|
Overall, there are two flavors of identity columns we may encounter:
|
||||||
|
|
||||||
|
* (Postgres only) Identity columns declared `GENERATED BY DEFAULT AS IDENTITY`
|
||||||
|
work just like regular `SERIAL` columns and impose no further constraints.
|
||||||
|
We can view these as being identity-columns-in-name-only.
|
||||||
|
|
||||||
|
* The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not
|
||||||
|
updatable and only insertable using and override mechanism:
|
||||||
|
* In MSSQL, a column declared `IDENTITY(..)` may be inserted into only when `SET
|
||||||
|
IDENTITY_INSERT` is applied to that table.
|
||||||
|
* In Postgres, a column declared `GENERATED ALWAYS AS IDENTITY`
|
||||||
|
may be inserted into by giving the clause `OVERRIDING SYSTEM VALUE` in an
|
||||||
|
`INSERT` statement.
|
||||||
|
|
||||||
|
**We need to decide how/when/if we want to expose the overriding mechanism in
|
||||||
|
our GraphQL API** (see the Unresolved Questions section below).
|
||||||
|
|
||||||
|
## How
|
||||||
|
|
||||||
|
Implementing the handling of identity columns should apply the architecture
|
||||||
|
described in [Column Mutability](/rfcs/column-mutability.md).
|
||||||
|
|
||||||
|
If we go with the non-overriding policy described above there should not be
|
||||||
|
any changes necessary to SQL translation for either MSSQL or PostgreSQL.
|
||||||
|
|
||||||
|
The only necessary change then ought to be amending the table metadata
|
||||||
|
extraction (for both MSSQL and PostgreSQL) to identify identity columns
|
||||||
|
and set column mutability accordingly (i.e. not insertable, not updatable).
|
||||||
|
|
||||||
|
## Unresolved Questions
|
||||||
|
|
||||||
|
_When, if ever, should we make use of the constraints overriding mechanisms
|
||||||
|
described above? Do we want to never override? Always? Make it configurable?_
|
||||||
|
|
||||||
|
Note that:
|
||||||
|
* Column Mutability guides us for how to implement the schema generation aspects
|
||||||
|
of either choice (of "non-overriding" vs "overriding")
|
||||||
|
* Leaving this unanswered does not block implementation of basically correct
|
||||||
|
handling of identity columns.
|
||||||
|
* But the implementation will have to make an (arbitrary) choice between the two.
|
||||||
|
A reasonable choice would be to select "non-overriding".
|
||||||
|
* We don't expect any complications to result from amending the implementation
|
||||||
|
at a later point in time.
|
||||||
|
|
||||||
|
|
||||||
|
## Appendix
|
||||||
|
|
||||||
|
The purpose of this appendix is to collect relevant information on the concept
|
||||||
|
of _Identity Columns_ and inform the implementation of GraphQL Engine.
|
||||||
|
|
||||||
|
* Part of the SQL standard.
|
||||||
|
* Motivation is to standardise DB-supplied identifiers (i.e. autoincrement/serial/..)
|
||||||
|
* Note: This is a concept distinct from primary keys. Identity Columnss don't introduce
|
||||||
|
uniqueness constraints by themselves!
|
||||||
|
* Also provide better semantics than naive auto-increment/serial solutions, by
|
||||||
|
prohibiting updating and inserting of Identity Columns (to an extent), in order to
|
||||||
|
avoid issue where auto-increment logic produces duplicates because conflicts
|
||||||
|
with manual inserts/updates.
|
||||||
|
* Interestingly, no-one seems to actually link to the standard they implement from.
|
||||||
|
* Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically)
|
||||||
|
* Not implemented in MySQL or SQLite
|
||||||
|
* Introduces some complications/extra coordination for replication/backup.
|
||||||
|
|
||||||
|
In a sentence:
|
||||||
|
> Identity columns are immutable, sequentially distinct values
|
||||||
|
> provided only by the DBMS
|
||||||
|
|
||||||
|
### MSSQL semantics
|
||||||
|
|
||||||
|
[MSSQL TSQL Identity Columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-ver15)
|
||||||
|
|
||||||
|
* Possible to `INSERT` values for Identity Columns, but guarded by a `SET INSERT_IDENTITY <tablename> ON` statement.
|
||||||
|
* Impossible to `UPDATE` values for Identity Columns.
|
||||||
|
* Syntax differs from SQL standard: `column IDENTITY(type, seed, increment)`.
|
||||||
|
|
||||||
|
### PostgreSQL Semantics
|
||||||
|
|
||||||
|
[PG Create table syntax (including GENERATED)](https://www.postgresql.org/docs/devel/sql-createtable.html)
|
||||||
|
|
||||||
|
* Syntax closer to SQL standard: `column GENERATED BY DEFAULT AS IDENTITY`, `column GENERATED ALWAYS AS IDENTITY`.
|
||||||
|
* Implemented on top of `series`.
|
||||||
|
* Columns `GENERATED BY DEFAULT` may be both `INSERT`ed and and `UPDATE`d.
|
||||||
|
* Columns `GENERATED ALWAYS` may be `INSERT`ed (guarded by an `OVERRIDE SYSTEM VALUE` keyword), but never `UPDATE`d.
|
||||||
|
|
||||||
|
|
||||||
|
### Links
|
||||||
|
|
||||||
|
[Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial):
|
||||||
|
> For new applications, identity columns should be used instead.
|
||||||
|
>
|
||||||
|
> Why not?
|
||||||
|
>
|
||||||
|
> The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome.
|
||||||
|
|
||||||
|
[SE: pg serial vs identity](https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity)
|
||||||
|
|
||||||
|
[Implementers blog post](https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/)
|
||||||
|
|
||||||
|
[Technical details blog post](https://www.depesz.com/2017/04/10/waiting-for-postgresql-10-identity-columns/)
|
||||||
|
|
||||||
|
[Wikipedia: Identity Columns](https://en.wikipedia.org/wiki/Identity_column)
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue
Block a user