Remove all but the last inferred migration in a group of inferred
migrations that occur at the same timestamp with the same migration SQL.
Such inferred migrations are typically part of a batched transaction for
which we want to ignore all but the effect of the last statement in the
batch.
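The grouping logic can be sketched in Go as a single pass that collapses consecutive duplicates; the `Migration` struct and `dedupe` helper here are illustrative stand-ins, not pgroll's actual types:

```go
package main

import "fmt"

// Migration is an illustrative stand-in for a row in the migrations
// table; the field names are hypothetical, not pgroll's actual schema.
type Migration struct {
	Timestamp string // created_at
	SQL       string // the inferred migration SQL
}

// dedupe keeps only the last entry of each consecutive run of inferred
// migrations sharing a timestamp and SQL text, mirroring the
// "keep the last statement in the batch" behaviour described above.
func dedupe(ms []Migration) []Migration {
	var out []Migration
	for _, m := range ms {
		if n := len(out); n > 0 && out[n-1].Timestamp == m.Timestamp && out[n-1].SQL == m.SQL {
			out[n-1] = m // the later duplicate replaces the earlier one
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	ms := []Migration{
		{"t1", "ALTER TABLE a ADD COLUMN x int"},
		{"t1", "ALTER TABLE a ADD COLUMN x int"},
		{"t2", "DROP TABLE b"},
	}
	fmt.Println(len(dedupe(ms))) // 2
}
```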
Add a new `WithNoVersionSchemaForRawSQL` option to control whether or
not version schema should be created for raw SQL migrations.
With the option set, a raw SQL migration:
* Has no version schema created on migration start.
* Leaves the previous version schema in place on migration completion.
Set `created_at` and `updated_at` explicitly when inserting inferred
migrations into the migrations table.
When two statements are run in a transaction, we need to explicitly
insert `statement_timestamp()` into the `created_at` and `updated_at`
fields rather than relying on the table default of `current_timestamp`.
`current_timestamp` is the same for all statements in a transaction,
which causes problems when ordering statements by `created_at`.
Fixes #361
The state query changes from #358 are causing performance issues in an
upstream integration of `pgroll`.
Revert the PR until the cause of the regression is found and fixed.
This reverts commit f994a42192.
Fixes https://github.com/xataio/pgroll/issues/355
Postgres stores index names with uppercase characters in the `pg_index`
catalog using the quoted version of the name. For example:
```
"idx_USERS_name"
```
whereas a lowercase index name would be stored as:
```
idx_users_name
```
This differs from how other object types are stored in their
respective catalogs. For example, table names are stored in the
`pg_class` catalog without quotes, regardless of whether they contain
uppercase characters.
This makes it necessary to strip quotes from index names when retrieving
them from the `pg_index` catalog when building the internal schema
representation.
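The quote stripping can be sketched as follows; the helper name is hypothetical, not pgroll's actual code, and it assumes names arrive exactly as stored in `pg_index`:

```go
package main

import "fmt"

// unquoteIndexName strips the surrounding double quotes that Postgres
// stores around mixed-case index names in the pg_index catalog, so the
// internal schema representation holds the bare name.
func unquoteIndexName(name string) string {
	if len(name) >= 2 && name[0] == '"' && name[len(name)-1] == '"' {
		return name[1 : len(name)-1]
	}
	return name
}

func main() {
	fmt.Println(unquoteIndexName(`"idx_USERS_name"`)) // idx_USERS_name
	fmt.Println(unquoteIndexName("idx_users_name"))   // idx_users_name
}
```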
Restrict the search path of the connection used by the state package to
only the state schema (the schema that contains the `migrations` table).
This ensures that any column types residing in other schemas are
always qualified with the schema name in the internal schema
representation. Without this restriction, such type names would be
left unqualified whenever their schema happened to be on the search
path used by the state connection.
The representation (qualified/unqualified) of type names in the internal
schema is ultimately down to how the
[`format_type`](https://www.postgresql.org/docs/9.5/functions-info.html)
function behaves: types in the caller's `search_path` are unqualified;
otherwise they are qualified.
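The effect of `format_type`'s qualification rule can be sketched in Go; the helper and its parameters are illustrative, not real pgroll code:

```go
package main

import "fmt"

// qualifyType mirrors format_type's behaviour: a type is left
// unqualified when its schema is on the caller's search_path, and
// schema-qualified otherwise.
func qualifyType(typeName, typeSchema string, searchPath []string) string {
	for _, s := range searchPath {
		if s == typeSchema {
			return typeName // schema on the search path: unqualified
		}
	}
	return typeSchema + "." + typeName // otherwise: qualified
}

func main() {
	// With the search path restricted to the state schema, a type
	// living in an application schema is always qualified.
	fmt.Println(qualifyType("my_enum", "app", []string{"pgroll"})) // app.my_enum
	fmt.Println(qualifyType("my_enum", "app", []string{"app"}))    // my_enum
}
```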
Add the `ON DELETE` setting of a foreign key to the information stored
about the key in the internal schema representation.
The schema representation for a foreign key now looks like:
```json
{
  "some_table": {
    ...
    "foreignKeys": {
      "fk_users_id": {
        "name": "fk_users_id",
        "columns": [
          "user_id"
        ],
        "onDelete": "NO ACTION",
        "referencedTable": "users",
        "referencedColumns": [
          "id"
        ]
      }
    }
  }
}
```
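Decoding such an entry into Go can be sketched as follows; the `ForeignKey` struct mirrors the JSON above but is illustrative, not necessarily pgroll's actual type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ForeignKey mirrors the JSON shape shown above, including the new
// onDelete field; treat this struct as illustrative.
type ForeignKey struct {
	Name              string   `json:"name"`
	Columns           []string `json:"columns"`
	OnDelete          string   `json:"onDelete"`
	ReferencedTable   string   `json:"referencedTable"`
	ReferencedColumns []string `json:"referencedColumns"`
}

// parseForeignKey decodes one foreign key entry from the schema JSON.
func parseForeignKey(data []byte) (ForeignKey, error) {
	var fk ForeignKey
	err := json.Unmarshal(data, &fk)
	return fk, err
}

func main() {
	fk, err := parseForeignKey([]byte(`{
	  "name": "fk_users_id",
	  "columns": ["user_id"],
	  "onDelete": "NO ACTION",
	  "referencedTable": "users",
	  "referencedColumns": ["id"]
	}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(fk.Name, fk.OnDelete) // fk_users_id NO ACTION
}
```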
Fixes https://github.com/xataio/pgroll/issues/309
Ensure that only one `inferred` migration is created in the
`pgroll.migrations` table when a column is dropped outside of a
migration.
From the Postgres
[docs](https://www.postgresql.org/docs/current/event-trigger-definition.html):
> The sql_drop event occurs just before the ddl_command_end event
trigger for any operation that drops database objects
This means that when the `raw_migration` function is run in response to
`sql_drop` and `ddl_command_end`, duplicate entries will be created in
`pgroll.migrations`: once when the function runs for `sql_drop` and
again when it runs for `ddl_command_end`.
Change the definition of the `pg_roll_handle_drop` event trigger to only
run on those kinds of drops that won't result in duplicates when the
`pg_roll_handle_ddl` trigger runs for the same change. `DROP TABLE` and
`DROP VIEW` won't result in duplicate migrations because their schema
can't be inferred by the `ddl_command_end` trigger because the object
has already been dropped when the trigger runs.
Update the inferred migration tests with two new testcases covering
dropping tables and columns.
Fixes #304
UNIQUE NOT NULL columns can also be used to perform backfills.
This change relaxes the check on backfill requirements to use such
columns when a primary key is not available.
Validation will still error out if no suitable column is found.
Note: this also fixes the `unique` retrieval from the schema, where we
failed to account for composite unique indexes, resulting in columns
being flagged as unique when they weren't really unique.
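The relaxed selection logic can be sketched in Go; all types and names below are illustrative, not pgroll's actual code:

```go
package main

import "fmt"

// Column is an illustrative column description.
type Column struct {
	Name     string
	Nullable bool
}

// Table is an illustrative table description: its primary key columns
// plus columns covered by a *single-column* unique index (columns in a
// composite unique index are not individually unique, per the fix
// described above).
type Table struct {
	PrimaryKey    []string
	UniqueColumns []Column
}

// backfillIdentity picks the columns used to identify rows during a
// backfill: the primary key if present, otherwise a UNIQUE NOT NULL
// column; it errors if neither is available.
func backfillIdentity(t Table) ([]string, error) {
	if len(t.PrimaryKey) > 0 {
		return t.PrimaryKey, nil
	}
	for _, c := range t.UniqueColumns {
		if !c.Nullable {
			return []string{c.Name}, nil
		}
	}
	return nil, fmt.Errorf("no suitable column for backfill")
}

func main() {
	cols, _ := backfillIdentity(Table{UniqueColumns: []Column{{Name: "email", Nullable: false}}})
	fmt.Println(cols) // [email]
}
```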
Fix some hard-coded occurrences of the `pgroll` schema name where the
schema name should instead be parameterized.
This isn't the first time we've made a mistake like this so also add a
test to guard against it.
Remove the schema component from index names in the internal schema
representation.
A `pgroll` migration is always run in the context of a specific schema,
so the extra qualification is redundant.
Removing the schema component from the index names makes it easy to
identify duplicated indexes by name when temporary columns and
associated attributes are renamed on migration completion.
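Stripping the schema component can be sketched as follows; the helper is hypothetical and assumes the schema qualifier, when present, precedes the first `.` in the name:

```go
package main

import (
	"fmt"
	"strings"
)

// trimSchema drops a leading "schema." qualifier from an index name so
// that indexes can be compared by bare name within a single schema.
func trimSchema(name string) string {
	if i := strings.IndexByte(name, '.'); i >= 0 {
		return name[i+1:]
	}
	return name
}

func main() {
	fmt.Println(trimSchema("foo.idx_users_name")) // idx_users_name
	fmt.Println(trimSchema("idx_users_name"))     // idx_users_name
}
```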
Tests for this change are already in place (#276, #277). As of this PR,
migration tests run in a non-public schema will pass:
```bash
PGROLL_TEST_SCHEMA=foo go test ./...
```
Part of #273
Inferred migrations do not follow the same format as other stored
migrations. From the schema history point of view, all migrations in the
list should apply correctly, including inferred ones.
This change ensures we store them in the same format, so replaying a
migration history is possible.
In the internal schema representation, include the key columns on which
an index is defined in the details for an index. With the new `columns`
field, the representation for an index now looks like:
```json
"indexes": {
  "_pgroll_new_products_pkey": {
    "name": "_pgroll_new_products_pkey",
    "unique": true,
    "columns": [
      "id"
    ]
  }
}
```
Add uniqueness information to each index recorded in `pgroll`'s internal
schema representation.
With the extra information, the entry for indexes in the schema
representation now looks like:
```json
"indexes": {
  "products_name_unique": {
    "name": "products_name_unique",
    "unique": true
  },
  "_pgroll_new_products_pkey": {
    "name": "_pgroll_new_products_pkey",
    "unique": true
  }
}
```
Having index uniqueness recorded in the schema representation will help
with preserving uniqueness constraints on columns duplicated for
backfilling (see https://github.com/xataio/pgroll/issues/227).
Add knowledge of `UNIQUE` constraints defined on a table to `pgroll`'s
internal schema representation.
`UNIQUE` constraints were already present as indexes, but this PR adds
them as their own top-level field in the `Table` schema and includes
information about the columns involved in the constraint.
Part of https://github.com/xataio/pgroll/issues/105
Improve the reliability and performance of the test suite by moving the
test isolation model from 'container per test' to 'database per test'.
The current test suite works well locally but is [very
flaky](https://github.com/xataio/pgroll/actions) when run on GitHub
Actions.
The cause of the flakiness is the 'container per test' isolation model,
under which each testcase in each test starts its own Postgres
container. This model worked well initially but as the number of tests
has increased the actions runner often fails to make available the
required number of containers for such a large number of parallel tests.
This PR changes the isolation model to 'database per test'. Each package
creates one Postgres container and then each testcase in each test
creates a database within that container. This greatly reduces the
number of simultaneous containers required, making the test suite faster
and more reliable.
Each job in the test matrix sees a 60-70% reduction in duration and
(anecdotally) far fewer failures with no failures observed in ~20 runs.
Move the logic for the `pgroll status` command out of the CLI and into
the `migrations` and `state` packages, and add tests for it.
This makes it possible to consume migration status information from
packages using `pgroll` as a module.
Update the definition of the `previous_version` function to use a
recursive CTE to find the first non-'inferred' parent of the current
version.
This requires adding a new column to the `pgroll.migrations` table to
record whether a migration is a pgroll migration or an inferred DDL
change made outside of pgroll.
Together, this means that the previous version schema is removed
correctly when there have been inferred DDL changes between the current
and previous pgroll migration.
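What the recursive CTE computes can be sketched in Go as a walk up the parent chain that skips inferred migrations; the types and field names are illustrative, not pgroll's actual schema:

```go
package main

import "fmt"

// Migration is an illustrative migration record: Parent names the
// previous migration ("" for the first one) and Inferred marks DDL
// captured outside pgroll.
type Migration struct {
	Parent   string
	Inferred bool
}

// previousVersion walks the parent chain from the current migration and
// returns the first non-inferred ancestor, mirroring the recursive CTE.
func previousVersion(byName map[string]Migration, current string) (string, bool) {
	for name := byName[current].Parent; name != ""; name = byName[name].Parent {
		if !byName[name].Inferred {
			return name, true
		}
	}
	return "", false
}

func main() {
	ms := map[string]Migration{
		"01_create": {Parent: ""},
		"inferred1": {Parent: "01_create", Inferred: true},
		"02_alter":  {Parent: "inferred1"},
	}
	v, _ := previousVersion(ms, "02_alter")
	fmt.Println(v) // 01_create
}
```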
Fixes #196
Ping the database after establishing a connection for better error
messages.
Without this change, an error establishing a connection would be
prefixed with the message from the next failure.
**before**:
```
Error: unable to set pgroll.internal to true: dial tcp [::1]:7432: connect: connection refused
```
**after**:
```
Error: dial tcp [::1]:7432: connect: connection refused
```
Fixes https://github.com/xataio/pgroll/issues/133
We want to work on the security model for the `pgroll` schema, allowing
admins to remove access while keeping functionality intact. For this
reason, some functions must remain executable by any user running
migrations directly against Postgres.
This PR adds `SECURITY DEFINER` to these functions.
---------
Co-authored-by: Carlos Pérez-Aradros Herce <exekias@gmail.com>
Fix the `previous_version` function so that it works correctly for
migrations applied in schemas other than `public`.
Previously, the previous version schema would not be removed on
migration completion for migrations applied in schemas other than
`public`.
Extend the test that checks the `--schema` flag is respected with the
extra coverage that would have caught this.
Add a `primaryKey` field to each table in the internal schema store to
record the column(s) that make up a table's primary key.
This will be used when backfilling rows
(https://github.com/xataio/pgroll/issues/127)
An example schema now looks like:
```json
{
  "name": "public",
  "tables": {
    "users": {
      "oid": "16412",
      "name": "users",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": "nextval('users_id_seq'::regclass)",
          "nullable": false
        },
        "name": {
          "name": "name",
          "type": "text",
          "comment": null,
          "default": null,
          "nullable": true
        }
      },
      "comment": null,
      "indexes": {
        "users_pkey": {
          "name": "users_pkey"
        }
      },
      "primaryKey": [
        "id"
      ]
    }
  }
}
```
Where the `primaryKey` field is the new field.
When applying a first migration against a schema, return the schema as
read from Postgres instead of an empty one.
This change allows migrations to run against schemas created before
`pg-roll` was installed.
This change ensures we also catch DROP statements for inclusion in the
migrations log.
DROP statements don't make it to the `ddl_command_end` trigger, so we
need to explicitly listen for them under `sql_drop`.
This change adds a new `sql` operation that allows defining an `up`
SQL statement to perform a migration on the schema.
An optional `down` field can be provided; it will be used when rolling
back later (for instance, in case of migration failure).
A new trigger is installed to capture DDL events coming from direct user
manipulations (not done by `pg-roll`), so they are stored as migrations
and the resulting schema is known in all cases.
Change the representation of a schema in `pg-roll`'s state store from:
```go
type Schema struct {
	// Tables is a map of virtual table name -> table mapping
	Tables map[string]Table `json:"tables"`
}
```
to:
```go
type Schema struct {
	// Name is the name of the schema
	Name string `json:"name"`
	// Tables is a map of virtual table name -> table mapping
	Tables map[string]Table `json:"tables"`
}
```
i.e. store the schema's name.
This allows the signature of `Start` to be simplified, removing the
`schemaName` parameter; the name can be retrieved from the
`schema.Schema` struct that is already provided.
Change the module name to match its import path.
In order for `pg-roll` to be usable as a module, we need to be able to
import `"github.com/xataio/pg-roll/pkg/roll"` etc. from other modules.
Changing the name of the module to match its import path ensures that
this is possible.
Add information about indexes on a table to `pg-roll`'s internal state
storage.
For each table, store an additional JSON object mapping each index name
on the table to details of the index (initially just its name).
An example of the resulting JSON is:
```json
{
  "tables": {
    "fruits": {
      "oid": "16497",
      "name": "fruits",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": "nextval('_pgroll_new_fruits_id_seq'::regclass)",
          "nullable": false
        },
        "name": {
          "name": "name",
          "type": "varchar(255)",
          "comment": null,
          "default": null,
          "nullable": false
        }
      },
      "comment": null,
      "indexes": {
        "_pgroll_idx_fruits_name": {
          "name": "_pgroll_idx_fruits_name"
        },
        "_pgroll_new_fruits_pkey": {
          "name": "_pgroll_new_fruits_pkey"
        },
        "_pgroll_new_fruits_name_key": {
          "name": "_pgroll_new_fruits_name_key"
        }
      }
    }
  }
}
```
Also add fields to the `Schema` model structs to allow the new `indexes`
field to be unmarshalled.
Add a new field `Up` to **add column** migrations:
```json
{
  "name": "03_add_column_to_products",
  "operations": [
    {
      "add_column": {
        "table": "products",
        "up": "UPPER(name)",
        "column": {
          "name": "description",
          "type": "varchar(255)",
          "nullable": true
        }
      }
    }
  ]
}
```
The SQL specified by the `up` field will be run whenever a row is
inserted into the underlying table while the session's `search_path` is
not set to the latest version of the schema.
The `up` SQL snippet can refer to existing columns in the table by name
(as in the above example, where the `description` field is set to
`UPPER(name)`).
Add a `status` command to show the status of each schema that `pg-roll`
knows about (i.e. those schemas that have had at least one migration
run in them).
`go run . status`
**Example output**:
```json
[
  {
    "Schema": "public",
    "Version": "01_create_tables",
    "Status": "In Progress"
  }
]
```
or:
```json
[
  {
    "Schema": "public",
    "Version": "01_create_tables",
    "Status": "Complete"
  }
]
```
In the future, the `json` output of the command should be behind a
`-o json` switch and the default output should be human-readable.
This change will retrieve and store the resulting schema after a
migration is completed. This schema will be used as the base to execute
the next migration, making it possible to create views that are aware of
the full schema, and not only the one created by the last migration.
We use a function to retrieve the schema directly from Postgres instead
of building it from the migration files. This allows for more features
in the future, like doing an initial sync on top of the existing schema
or automatically detecting and storing out of band migrations from
triggers.
Example JSON stored schema:
```json
{
  "tables": {
    "bills": {
      "oid": "18272",
      "name": "bills",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": null,
          "nullable": false
        },
        "date": {
          "name": "date",
          "type": "time with time zone",
          "comment": null,
          "default": null,
          "nullable": false
        },
        "quantity": {
          "name": "quantity",
          "type": "integer",
          "comment": null,
          "default": null,
          "nullable": false
        }
      },
      "comment": null
    },
    "products": {
      "oid": "18286",
      "name": "products",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": "nextval(_pgroll_new_products_id_seq::regclass)",
          "nullable": false
        },
        "name": {
          "name": "name",
          "type": "varchar(255)",
          "comment": null,
          "default": null,
          "nullable": false
        },
        "price": {
          "name": "price",
          "type": "numeric(10,2)",
          "comment": null,
          "default": null,
          "nullable": false
        }
      },
      "comment": null
    },
    "customers": {
      "oid": "18263",
      "name": "customers",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": null,
          "nullable": false
        },
        "name": {
          "name": "name",
          "type": "varchar(255)",
          "comment": null,
          "default": null,
          "nullable": false
        },
        "credit_card": {
          "name": "credit_card",
          "type": "text",
          "comment": null,
          "default": null,
          "nullable": true
        }
      },
      "comment": null
    }
  }
}
```
After this change, I believe that the `create_table` operation is
feature complete and can be used for many sequential migrations.
Add a sentinel error `ErrNoActiveMigration` for the case where there is
no active migration. This improves the error strings presented to users
by not mentioning SQL errors.
**`pg-roll start` when there is a migration in progress:**
```
Error: a migration for schema "public" is already in progress
```
**`pg-roll rollback` when there is no migration in progress:**
```
Error: unable to get active migration: no active migration
```
**`pg-roll complete` when there is no active migration:**
```
Error: unable to get active migration: no active migration
```
This change introduces state handling by creating a dedicated
`pgroll` schema (name configurable). We will store migrations there, as
well as their state, keeping useful information such as the migration
definition (so we don't need it again for the `complete` step).
The schema includes the proper constraints to guarantee that:
* Only one migration is active at a time
* Migration history is linear (all migrations have a unique parent,
except the first one, whose parent is NULL)
* We know the current migration at all times
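The linear-history invariant can be sketched as a check in Go; this is illustrative only, since the real guarantees are enforced by constraints in Postgres:

```go
package main

import "fmt"

// checkLinearHistory verifies the invariant described above: every
// migration has a unique parent, and exactly one root migration has no
// parent. parents maps migration name -> parent name ("" for the root).
func checkLinearHistory(parents map[string]string) error {
	seen := map[string]bool{}
	roots := 0
	for _, parent := range parents {
		if parent == "" {
			roots++
			continue
		}
		if seen[parent] {
			return fmt.Errorf("history forks at parent %q", parent)
		}
		seen[parent] = true
	}
	if roots != 1 {
		return fmt.Errorf("want exactly one root migration, got %d", roots)
	}
	return nil
}

func main() {
	ok := map[string]string{"01": "", "02": "01", "03": "02"}
	fmt.Println(checkLinearHistory(ok)) // <nil>
}
```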
Some helper functions are included:
* `is_active_migration_period()` will return true if there is an active
migration.
* `latest_version()` will return the name of the latest version of the
schema.