diff --git a/rfcs/column-mutability.md b/rfcs/column-mutability.md new file mode 100644 index 00000000000..eccc4062e66 --- /dev/null +++ b/rfcs/column-mutability.md @@ -0,0 +1,243 @@ +# Column Mutability + +## Metadata + +``` +--- +authors: Philip Lykke Carlsen +discussion: + https://github.com/hasura/graphql-engine-mono/issues/2407 + https://github.com/hasura/graphql-engine-mono/pull/2507 +--- +``` + +## Description + +Various features of databases influence the set of columns that should be +exposed under different circumstances. Examples of this include Identity Columns +and (DB) Generated Columns [^1]. +[^1]: https://www.postgresql.org/docs/14/sql-createtable.html + +This RFC proposes to add the basic, shared concept of _Column Mutability_ with +the goal of simplifying the implementation of features including those mentioned +above and similar, as well as improving code reuse, specifically for Schema +Generation. + +### Problem + +In order to motivate this proposal, we'll start with a historical summary a +previous effort of implementing support for handling identity columns. + +Identity Columns were introduced as first class concept, present throughout the +different parts of the implementation that needed to adapt its behavior +depending on them. + +What we didn't know at the time was that this would turn out to be awkward, +because identity columns in Postgresql and MSSQL have different semantics. And +because Postgres even has two variations of identity columns which need to be +treated differently. (More on this later) + +The specific way in which this turned awkward was in the schema code. We want to +avoid duplicating this code, so that we produce consistent schemas across +backends. However, in order for this code to deal correctly with different +behaviors of different identity column variants it had to accept further +backend-specific customization options (or, hypothetically, be refactored). + +Concretely, apart from case-switching on whether columns were identity columns +or not the code was amended with the (`class Backend b` type-level) feature flag +`type XOnConflict b`, indicating whether a backend would support +`_on_conflict`-based upserts. [^2] + +[^2]: MSSQL would not support upserts, as a concession for us missing an + implementation that would distinguish between what columns to insert and what + columns to update owning to `_on_conflict` being part of `insert_` mutations. + +Then came [issue #7557](https://github.com/hasura/graphql-engine/issues/7557), +and we realised the importance of distinguishing between `GENERATED BY DEFAULT AS +IDENTITY` and `GENERATED ALWAYS AS IDENTITY` identity columns in Postgres. [^3] + +[^3]: `BY DEFAULT` may be inserted but not updated, while `ALWAYS` may neither. + So now that the PG backend had a concept of id cols it would interpret them the + same way as in MSSQL (which are more similar to the PG `ALWAYS` kind), meaning + one user's insert mutations on a table with a `GENERATED BY DEFAULT` identity + col suddenly stopped working, because the id col was elided from the schema! + +To summarize, it seems our approach to modelling identity columns was +inappropriate: We introduced a single _Identity Column_ modelling concept which +had to cover for the various idiosyncratic variants between (and within!) +databases, and as a result the shared schema generation code needed to know +about different backends' idiosyncracies. + +### Why is it important? + +At the level of concrete features our product should support, Identity columns +are already something our customers rely on (as evidenced by issue #7557). On Postgresql, +Identity Columns are even the recommended successor to `SERIAL` columns [^4]. + +[^4]: [Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial): + > For new applications, identity columns should be used instead. + > + > Why not? + > + > The serial types have some weird behaviors that make schema, dependency, + > and permission management unnecessarily cumbersome. + +Additionally, a generic type of generated columns (database backed rather than +GraphQL Engine backed) could also be an attractive feature to support, which +should be easy to build on top of this proposal. + +On a more meta-feature level, faithfully exposing relational databases on +GraphQL is our bread and butter. Being able to develop new features confidently +with less effort is an important enabler of this. + +Our ability to achieve this (and thus the quality of the finished product) +is affected by our choices of the abstractions that we use to express our solution. +Therefore it's important we pick those that enable us rather than hamper us. + +### Success + +This effort is a success once insert, update, and upsert mutations are implemented +using a backend-agnostic notion of column mutability, unaffected by backend-specific +concepts such as Identity Columns and Generated Columns. + +## How + +This specific proposal describes an implementation strategy and does not itself +have any user visible aspects. + +Implementing this proposal in a basic form amounts to three conceptually simple +things: + +1. Amend the table metadata with column mutability information. E.g. by changing + `Hasura.RQL.Types.Column.ColumnInfo` or + `Hasura.RQL.Types.Table.TableCoreInfoG`. We should be able to record whether + each column is insertable and updatable at the least. +2. Updating the schema generation code to respect this new metadata. Points of + interest include update mutations, insert mutations, and insert with + `on_conflict` (the `update_columns` field should only contain updatable + columns). +3. Updating the permissions code to respect this new metadata. Column Update + Permissions should apply to only updatable columns etc. + +In the actual table metadata collection code we may initially just mark every +column as insertable and updatable, unless the work is in the context of +e.g. implementing identity columns. + +### Effects and Interactions + +This feature may be perfectly well confined to the server. It might be +interesting however to show or use the column mutability information in the +Console, for example to reflect to a user how HGE is understanding their +database schema. + +### Alternatives + +If you do not have a generic concept of column mutability, all features that +influence the set of columns exposed in certain situations will have to +themselves deal directly with those situations. + +Examples of code in an implementation without generic column mutability: +* Schema generation needs to know and understand the concepts of Identity + Columns, and Generated Columns. These work differently across backends, which + hampers code reuse. +* Permissions need to know about Identity Columns, and Generated Columns, + because Update Permissions don't make sense those columns. But only sometimes, + and that in backend specific ways. + +### Costs and Drawbacks + +A drawback of using this approach over the alternatives is that, as a feature +implementer you need to know prior knowledge of the column mutability feature in +order to exploit it. Nothing prohibits you from implementing some feature in +parallel, that could have benefited from using it. + +But then again, if not exploiting the column mutability feature, you would have +to know about all the relevant interactions of column mutability in order to +produce a well-rounded feature. + +Another drawback is that this could potentially result in poor usability if it +turns out that some features need to be handled more specifically than just +including or excluding columns depending on their mutability affords. + +A somewhat mild but illustrative example of this is a user wanting to update a +non-updatable column and being puzzled as to why this column is not present as a +field in the update mutation. + +This concrete example could be remedied by annotating the metadata that +indicates mutability with a human-readable text describing the cause, which we +could then display in the console. + +But care must be taken: In the (contrived) case that we wanted to, say, give +rich error messages in response only to update mutations that mention identity +columns but not those that mention generated columns, we would have encountered +a need that is not met by a generic column mutability concept. That would leave +us having to overload/pollute column mutability with data and logic for dealing +with non-universal edge cases, and we wouldn't have achieved anything. + +### Unresolved Questions + +> A place for authors to list out open questions +> Ideally, it would serve as a collection point for others to chime in during +> the review process to either identify their own questions or offer potential +> solutions + +### Future Work / Out of Scope + +As stated earlier, this RFC does not directly imply any application of its +ideas. Applications such as Identity Columns are treated separately. + +Here is a first attempt at looking more closely at how Identity Columns would +use Column Mutability to fulfill its needs: + +* In MSSQL. + + We have a couple of options for how we want to identity columns to work. + + ``` + IDENTITY(..), with SET IDENTITY_INSERT => not updatable, insertable + ``` + (This variant will also require specific handling at SQL translation) + + ``` + IDENTITY(..), w/o SET_IDENTITY_INSERT => not updatable, not insertable + ``` + +* In Postgres + + Similarly to MSSQL, we need to choose how to handle one flavor of identity column: + + ``` + GENERATED ALWAYS AS IDENTITY, with OVERRIDING SYSTEM VALUE => not updatable, insertable + ``` + (This variant will also require specific handling at SQL translation) + + ``` + GENERATED ALWAYS AS IDENTITY, w/o OVERRIDING SYSTEM VALUE => not updatable, not insertable + ``` + + The third variant does not even register: + ``` + GENERATED BY DEFAULT AS IDENTITY => updatable, insertable + ``` + +* General Computed Columns, as supported by all of MSSQL, PG 14, and MySQL: + `generated => not updatable, not insertable` + +Note: Careful inspection of the above suggests that the +upsert-via-insert+on_conflict schema should work for both MSSQL and Postgresql, +regardless of the flavors we choose to support. The only troublesome combination +is `updatable, not insertable` which is absent above. [^why-troublesome] + +[^why-troublesome]: The reason `updatable, not insertable` is troublesome is that the values to be upserted are given as: + ``` + insert_some_table( + objects: [{ }], + on_conflict: { update_columns: [], ... } + ) + ``` + We want to elide fields `not insertable` from the schema of ``, + but that is also the place where values upserted go. + +Also, for something completely different, it might be interesting to make column +mutability something a user can specify. This use case does however have some +overlap with the Columns Permissions feature. diff --git a/rfcs/identity-columns.md b/rfcs/identity-columns.md new file mode 100644 index 00000000000..7373383f3b0 --- /dev/null +++ b/rfcs/identity-columns.md @@ -0,0 +1,139 @@ +# Handling of Identity Columns + +## Metadata + +``` +--- +authors: Philip Lykke Carlsen +discussion: + https://github.com/hasura/graphql-engine-mono/issues/2407 + https://github.com/hasura/graphql-engine-mono/pull/2507 +state: pending answers to unresolved questions +--- +``` + +## Description + +This RFC collects discussion and decisions on how we want Identity Columns to +work in the GraphQL Engine. + +## Problem + +Identity Columns are an SQL standard database feature that attempts to solve +the problem of generating row identifiers in a more sound way than naive +auto-incrementing columns. This works by imposing restrictions on how such +columns may be updated. + +This means that, in order for the GraphQL engine to correctly deal with tables +that have identity columns it has to observe these restrictions, specifically +when updating and inserting. + +It is also possible to sometimes override the constraints imposed by Identity +Columns, and we need to decide what we want to support and how we want to +support it. + +## Available options + +Overall, there are two flavors of identity columns we may encounter: + +* (Postgres only) Identity columns declared `GENERATED BY DEFAULT AS IDENTITY` + work just like regular `SERIAL` columns and impose no further constraints. + We can view these as being identity-columns-in-name-only. + +* The more "true" Identity Columns, supported by both MSSQL and PostgreSQL, are not + updatable and only insertable using and override mechanism: + * In MSSQL, a column declared `IDENTITY(..)` may be inserted into only when `SET + IDENTITY_INSERT` is applied to that table. + * In Postgres, a column declared `GENERATED ALWAYS AS IDENTITY` + may be inserted into by giving the clause `OVERRIDING SYSTEM VALUE` in an + `INSERT` statement. + +**We need to decide how/when/if we want to expose the overriding mechanism in +our GraphQL API** (see the Unresolved Questions section below). + +## How + +Implementing the handling of identity columns should apply the architecture +described in [Column Mutability](/rfcs/column-mutability.md). + +If we go with the non-overriding policy described above there should not be +any changes necessary to SQL translation for either MSSQL or PostgreSQL. + +The only necessary change then ought to be amending the table metadata +extraction (for both MSSQL and PostgreSQL) to identify identity columns +and set column mutability accordingly (i.e. not insertable, not updatable). + +## Unresolved Questions + +_When, if ever, should we make use of the constraints overriding mechanisms + described above? Do we want to never override? Always? Make it configurable?_ + +Note that: +* Column Mutability guides us for how to implement the schema generation aspects + of either choice (of "non-overriding" vs "overriding") +* Leaving this unanswered does not block implementation of basically correct + handling of identity columns. +* But the implementation will have to make an (arbitrary) choice between the two. + A reasonable choice would be to select "non-overriding". +* We don't expect any complications to result from amending the implementation + at a later point in time. + + +## Appendix + +The purpose of this appendix is to collect relevant information on the concept +of _Identity Columns_ and inform the implementation of GraphQL Engine. + +* Part of the SQL standard. +* Motivation is to standardise DB-supplied identifiers (i.e. autoincrement/serial/..) + * Note: This is a concept distinct from primary keys. Identity Columnss don't introduce + uniqueness constraints by themselves! + * Also provide better semantics than naive auto-increment/serial solutions, by + prohibiting updating and inserting of Identity Columns (to an extent), in order to + avoid issue where auto-increment logic produces duplicates because conflicts + with manual inserts/updates. + * Interestingly, no-one seems to actually link to the standard they implement from. +* Implemented in PG, MSSQL, DB2 and Oracle (also Oracle-NoSQL, ironically) +* Not implemented in MySQL or SQLite +* Introduces some complications/extra coordination for replication/backup. + +In a sentence: +> Identity columns are immutable, sequentially distinct values +> provided only by the DBMS + +### MSSQL semantics + +[MSSQL TSQL Identity Columns](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-ver15) + +* Possible to `INSERT` values for Identity Columns, but guarded by a `SET INSERT_IDENTITY ON` statement. +* Impossible to `UPDATE` values for Identity Columns. +* Syntax differs from SQL standard: `column IDENTITY(type, seed, increment)`. + +### PostgreSQL Semantics + +[PG Create table syntax (including GENERATED)](https://www.postgresql.org/docs/devel/sql-createtable.html) + +* Syntax closer to SQL standard: `column GENERATED BY DEFAULT AS IDENTITY`, `column GENERATED ALWAYS AS IDENTITY`. +* Implemented on top of `series`. +* Columns `GENERATED BY DEFAULT` may be both `INSERT`ed and and `UPDATE`d. +* Columns `GENERATED ALWAYS` may be `INSERT`ed (guarded by an `OVERRIDE SYSTEM VALUE` keyword), but never `UPDATE`d. + + +### Links + +[Don't use serial](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_serial): +> For new applications, identity columns should be used instead. +> +> Why not? +> +> The serial types have some weird behaviors that make schema, dependency, and permission management unnecessarily cumbersome. + +[SE: pg serial vs identity](https://stackoverflow.com/questions/55300370/postgresql-serial-vs-identity) + +[Implementers blog post](https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/) + +[Technical details blog post](https://www.depesz.com/2017/04/10/waiting-for-postgresql-10-identity-columns/) + +[Wikipedia: Identity Columns](https://en.wikipedia.org/wiki/Identity_column) + +