# Reshape
[![Test status badge](https://github.com/fabianlindfors/Reshape/actions/workflows/test.yaml/badge.svg)](https://github.com/fabianlindfors/reshape/actions/workflows/test.yaml) [![Latest release](https://shields.io/github/v/release/fabianlindfors/reshape?display_name=tag&sort=semver&color=blue)](https://github.com/fabianlindfors/reshape/releases)

Reshape is an easy-to-use, zero-downtime schema migration tool for Postgres. It automatically handles complex migrations that would normally require downtime or manual multi-step changes. During a migration, Reshape ensures both the old and new schema are available at the same time, allowing you to gradually roll out your application. It will also perform all changes without excessive locking, avoiding downtime caused by blocking other queries. For a more thorough introduction to Reshape, check out the [introductory blog post](https://fabianlindfors.se/blog/schema-migrations-in-postgres-using-reshape/).

Designed for Postgres 12 and later.

*Note: Reshape is **experimental** and should not be used in production. It can (and probably will) break your application.*
- [How it works](#how-it-works)
- [Getting started](#getting-started)
  - [Installation](#installation)
  - [Creating your first migration](#creating-your-first-migration)
  - [Preparing your application](#preparing-your-application)
  - [Running your migration](#running-your-migration)
  - [Using during development](#using-during-development)
- [Writing migrations](#writing-migrations)
  - [Basics](#basics)
  - [Tables](#tables)
    - [Create table](#create-table)
    - [Rename table](#rename-table)
    - [Remove table](#remove-table)
    - [Add foreign key](#add-foreign-key)
    - [Remove foreign key](#remove-foreign-key)
  - [Columns](#columns)
    - [Add column](#add-column)
    - [Alter column](#alter-column)
    - [Remove column](#remove-column)
  - [Indices](#indices)
    - [Add index](#add-index)
    - [Remove index](#remove-index)
  - [Enums](#enums)
    - [Create enum](#create-enum)
    - [Remove enum](#remove-enum)
  - [Custom](#custom)
- [Commands and options](#commands-and-options)
  - [`reshape migrate`](#reshape-migrate)
  - [`reshape complete`](#reshape-complete)
  - [`reshape abort`](#reshape-abort)
  - [`reshape generate-schema-query`](#reshape-generate-schema-query)
  - [Connection options](#connection-options)
- [License](#license)
## How it works
Reshape works by creating views that encapsulate the underlying tables, which your application will interact with. During a migration, Reshape will automatically create a new set of views and set up triggers to translate inserts and updates between the old and new schema. This means that every deployment is a three-phase process:

1. **Start migration** (`reshape migrate`): Sets up views and triggers to ensure both the new and old schema are usable at the same time.
2. **Roll out application**: Your application can be gradually rolled out without downtime. The existing deployment will continue using the old schema whilst the new deployment uses the new schema.
3. **Complete migration** (`reshape complete`): Removes the old schema and any intermediate data and triggers.

If the application deployment fails, you should run `reshape abort` which will roll back any changes made by `reshape migrate` without losing data.
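
The three phases, including the abort path, can be sketched as a small deployment script. This is an illustration rather than an official workflow: `deploy` and `roll_out_application` are hypothetical names, and the latter stands in for whatever command deploys your application.

```shell
# Hypothetical placeholder: replace the body with your own deployment step.
roll_out_application() {
  echo "replace this with your own deployment step"
}

deploy() {
  # Phase 1: make the old and new schema usable side by side
  reshape migrate || return 1

  # Phase 2: roll out the application against the new schema
  if roll_out_application; then
    # Phase 3: remove the old schema, triggers and intermediate data
    reshape complete
  else
    # Roll back everything `reshape migrate` set up
    reshape abort
    return 1
  fi
}
```

Calling `deploy` runs `reshape complete` only when the rollout step succeeds, so a failed rollout leaves the database on the old schema.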
## Getting started
### Installation
#### Binaries
Binaries are available for macOS and Linux under [Releases](https://github.com/fabianlindfors/reshape/releases).
#### Cargo
Reshape can be installed using [Cargo](https://doc.rust-lang.org/cargo/) (requires Rust 1.58 or later):
```shell
cargo install reshape
```
#### Docker
Reshape is available as a Docker image on [Docker Hub](https://hub.docker.com/repository/docker/fabianlindfors/reshape).
```shell
docker run -v "$(pwd)":/usr/share/app fabianlindfors/reshape reshape migrate
```
### Creating your first migration
Each migration should be stored as a separate file in a `migrations/` directory. The files can be in either JSON or TOML format and the name of the file will become the name of your migration. We recommend prefixing every migration with an incrementing number as migrations are sorted by file name.
Let's create a simple migration to set up a new table `users` with two fields, `id` and `name`. We'll create a file called `migrations/1_create_users_table.toml`:
```toml
[[actions]]
type = "create_table"
name = "users"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions.columns]]
name = "name"
type = "TEXT"
```
This is the equivalent of running `CREATE TABLE users (id INTEGER GENERATED ALWAYS AS IDENTITY, name TEXT, PRIMARY KEY (id))`.
### Preparing your application
Reshape relies on your application using a specific schema. When establishing the connection to Postgres in your application, you need to run a query to select the most recent schema. This query can be generated using: `reshape generate-schema-query`.
To pass it along to your application, you can for example use an environment variable in your run script: `RESHAPE_SCHEMA_QUERY=$(reshape generate-schema-query)`. Then in your application:
```python
# Example for Python; `db` stands for your application's database connection
import os

reshape_schema_query = os.getenv("RESHAPE_SCHEMA_QUERY")
db.execute(reshape_schema_query)
```
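
The run script mentioned above might look like the following sketch. The function name `run_with_schema_query` and the `./my-app` command are placeholders, not part of Reshape.

```shell
# Hypothetical wrapper: runs the given command with RESHAPE_SCHEMA_QUERY set.
run_with_schema_query() {
  local query
  # The query is generated from the migration files
  query="$(reshape generate-schema-query)" || return 1
  RESHAPE_SCHEMA_QUERY="$query" "$@"
}

# Usage (./my-app is a placeholder for your application's start command):
#   run_with_schema_query ./my-app
```

Generating the query in a separate step means the application is not started if `reshape generate-schema-query` fails.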
If your application is written in Rust, you might prefer the [Rust helper library](https://github.com/fabianlindfors/reshape-helper) which embeds the query directly in your application with a macro.
### Running your migration
To create your new `users` table, run:
```bash
reshape migrate --complete
```
We use the `--complete` flag to automatically complete the migration. During a production deployment, you should first run `reshape migrate` followed by `reshape complete` once your application has been fully rolled out.
If nothing else is specified, Reshape will try to connect to a Postgres database running on `localhost` using `postgres` as both username and password. See [Connection options](#connection-options) for details on how to change the connection settings.
### Using during development
When adding new migrations during development, we recommend running `reshape migrate` but skipping `reshape complete`. This way, the new migrations can be iterated on by updating the migration file and running `reshape abort` followed by `reshape migrate`.
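
That edit-and-retry loop can be wrapped in a small helper. `remigrate` is a hypothetical name used only for this sketch.

```shell
# Hypothetical helper for iterating on a migration during development
remigrate() {
  # Roll back the in-progress migration, then apply the edited files again
  reshape abort && reshape migrate
}
```

Run `remigrate` after each change to the migration file; `reshape migrate` is only attempted if the abort succeeds.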
## Writing migrations
### Basics
Every migration consists of one or more actions. The actions will be run sequentially. Here's an example of a migration with two actions to create two tables, `customers` and `products`:
```toml
[[actions]]
type = "create_table"
name = "customers"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions]]
type = "create_table"
name = "products"
primary_key = ["sku"]
[[actions.columns]]
name = "sku"
type = "TEXT"
```
Every action has a `type`. The supported types are detailed below.
### Tables
#### Create table
The `create_table` action will create a new table with the specified columns, indices and constraints.
*Example: create a `customers` table with a few columns and a primary key*
```toml
[[actions]]
type = "create_table"
name = "customers"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions.columns]]
name = "name"
type = "TEXT"
# Columns default to nullable
nullable = false
# default can be any valid SQL value, in this case a string literal
default = "'PLACEHOLDER'"
```
*Example: create `users` and `items` tables with a foreign key between them*
```toml
[[actions]]
type = "create_table"
name = "users"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions]]
type = "create_table"
name = "items"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions.columns]]
name = "user_id"
type = "INTEGER"
[[actions.foreign_keys]]
columns = ["user_id"]
referenced_table = "users"
referenced_columns = ["id"]
```
#### Rename table
The `rename_table` action will change the name of an existing table.
*Example: change name of `users` table to `customers`*
```toml
[[actions]]
type = "rename_table"
table = "users"
new_name = "customers"
```
#### Remove table
The `remove_table` action will remove an existing table.
*Example: remove `users` table*
```toml
[[actions]]
type = "remove_table"
table = "users"
```
#### Add foreign key
The `add_foreign_key` action will add a foreign key between two existing tables. The migration will fail if the existing column values aren't valid references.
*Example: create foreign key from `items` to `users` table*
```toml
[[actions]]
type = "add_foreign_key"
table = "items"
[actions.foreign_key]
columns = ["user_id"]
referenced_table = "users"
referenced_columns = ["id"]
```
#### Remove foreign key
The `remove_foreign_key` action will remove an existing foreign key. The foreign key will only be removed once the migration is completed, which means that your new application must continue to adhere to the foreign key constraint.
*Example: remove foreign key `items_user_id_fkey` from `items` table*
```toml
[[actions]]
type = "remove_foreign_key"
table = "items"
foreign_key = "items_user_id_fkey"
```
### Columns
#### Add column
The `add_column` action will add a new column to an existing table. You can optionally provide an `up` setting. This should be an SQL expression which will be run for all existing rows to backfill the new column.
*Example: add a new column `reference` to table `products`*
```toml
[[actions]]
type = "add_column"
table = "products"
[actions.column]
name = "reference"
type = "INTEGER"
nullable = false
default = "10"
```
*Example: replace an existing `name` column with two new columns, `first_name` and `last_name`*
```toml
[[actions]]
type = "add_column"
table = "users"
# Extract the first name from the existing name column
up = "(STRING_TO_ARRAY(name, ' '))[1]"
[actions.column]
name = "first_name"
type = "TEXT"
[[actions]]
type = "add_column"
table = "users"
# Extract the last name from the existing name column
up = "(STRING_TO_ARRAY(name, ' '))[2]"
[actions.column]
name = "last_name"
type = "TEXT"
[[actions]]
type = "remove_column"
table = "users"
column = "name"
# Reconstruct name column by concatenating first and last name
down = "first_name || ' ' || last_name"
```
*Example: extract nested value from unstructured JSON `data` column to new `name` column*
```toml
[[actions]]
type = "add_column"
table = "users"
# #>> '{}' converts the JSON string value to TEXT
up = "data['path']['to']['value'] #>> '{}'"
[actions.column]
name = "name"
type = "TEXT"
```
#### Alter column
The `alter_column` action enables many different changes to an existing column, for example renaming, changing type and changing existing values.
When performing more complex changes than a rename, `up` and `down` should be provided. These should be SQL expressions which determine how to transform between the new and old version of the column. Inside those expressions, you can reference the current column value by the column name.
*Example: rename `last_name` column on `users` table to `family_name`*
```toml
[[actions]]
type = "alter_column"
table = "users"
column = "last_name"
[actions.changes]
name = "family_name"
```
*Example: change the type of `reference` column from `INTEGER` to `TEXT`*
```toml
[[actions]]
type = "alter_column"
table = "users"
column = "reference"
up = "CAST(reference AS TEXT)" # Converts from integer value to text
down = "CAST(reference AS INTEGER)" # Converts from text value to integer
[actions.changes]
type = "TEXT" # Previous type was 'INTEGER'
```
*Example: increment all values of an `index` column by one*
```toml
[[actions]]
type = "alter_column"
table = "users"
column = "index"
up = "index + 1" # Increment for new schema
down = "index - 1" # Decrement to revert for old schema
[actions.changes]
name = "index"
```
*Example: make `name` column not nullable*
```toml
[[actions]]
type = "alter_column"
table = "users"
column = "name"
# Use "N/A" for any rows that currently have a NULL name
up = "COALESCE(name, 'N/A')"
[actions.changes]
nullable = false
```
*Example: change default value of `created_at` column to current time*
```toml
[[actions]]
type = "alter_column"
table = "users"
column = "created_at"
[actions.changes]
default = "NOW()"
```
#### Remove column
The `remove_column` action will remove an existing column from a table. You can optionally provide a `down` setting. This should be an SQL expression which will be used to determine values for the old schema when inserting or updating rows using the new schema. The `down` setting must be provided when the removed column is `NOT NULL` or doesn't have a default value.
Any indices that cover the column will be removed.
*Example: remove column `name` from table `users`*
```toml
[[actions]]
type = "remove_column"
table = "users"
column = "name"
# Use a default value of "N/A" for the old schema when inserting/updating rows
down = "'N/A'"
```
### Indices
#### Add index
The `add_index` action will add a new index to an existing table.
*Example: create a `users` table with a unique index on the `name` column*
```toml
[[actions]]
type = "create_table"
name = "users"
primary_key = ["id"]
[[actions.columns]]
name = "id"
type = "INTEGER"
generated = "ALWAYS AS IDENTITY"
[[actions.columns]]
name = "name"
type = "TEXT"
[[actions]]
type = "add_index"
table = "users"
[actions.index]
name = "name_idx"
columns = ["name"]
# Defaults to false
unique = true
```
*Example: add GIN index to `data` column on `products` table*
```toml
[[actions]]
type = "add_index"
table = "products"
[actions.index]
name = "data_idx"
columns = ["data"]
# One of: btree (default), hash, gist, spgist, gin, brin
type = "gin"
```
#### Remove index
The `remove_index` action will remove an existing index. The index won't actually be removed until the migration is completed.
*Example: remove the `name_idx` index*
```toml
[[actions]]
type = "remove_index"
index = "name_idx"
```
### Enums
#### Create enum
The `create_enum` action will create a new [enum type](https://www.postgresql.org/docs/current/datatype-enum.html) with the specified values.
*Example: add a new `mood` enum type with three possible values*
```toml
[[actions]]
type = "create_enum"
name = "mood"
values = ["happy", "ok", "sad"]
```
#### Remove enum
The `remove_enum` action will remove an existing [enum type](https://www.postgresql.org/docs/current/datatype-enum.html). Make sure all usages of the enum have been removed before running the migration. The enum will only be removed once the migration is completed.
*Example: remove the `mood` enum type*
```toml
[[actions]]
type = "remove_enum"
enum = "mood"
```
### Custom
The `custom` action lets you create a migration which runs custom SQL. It should be used with great care as it provides no guarantees of zero-downtime and will simply run whatever SQL is provided. Use other actions whenever possible as they are explicitly designed for zero downtime.
There are three optional settings available which all accept SQL queries. All queries need to be idempotent, for example by using `IF NOT EXISTS` wherever available.
- `start`: run when a migration is started using `reshape migrate`
- `complete`: run when a migration is completed using `reshape complete`
- `abort`: run when a migration is aborted using `reshape abort`
*Example: enable PostGIS and pg_stat_statements extensions*
```toml
[[actions]]
type = "custom"
start = """
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
"""
abort = """
DROP EXTENSION IF EXISTS postgis;
DROP EXTENSION IF EXISTS pg_stat_statements;
"""
```
## Commands and options
### `reshape migrate`
Starts a new migration, applying all migrations under `migrations/` that haven't yet been applied. After the command has completed, both the old and new schema will be usable at the same time. When you have rolled out the new version of your application which uses the new schema, you should run `reshape complete`.
#### Options
*See also [Connection options](#connection-options)*
| Option | Default | Description |
| ------ | ------- | ----------- |
| `--complete`, `-c` | `false` | Automatically complete migration after applying it. |
| `--dirs` | `migrations/` | Directories to search for migration files. Multiple directories can be specified using `--dirs dir1 dir2 dir3`. |
### `reshape complete`
Completes migrations previously started with `reshape migrate`.
#### Options
See [Connection options](#connection-options)
### `reshape abort`
Aborts any migrations which haven't yet been completed.
#### Options
See [Connection options](#connection-options)
### `reshape generate-schema-query`
Generates the SQL query you need to run in your application before using the database. This command does not require a database connection. Instead it will generate the query based on the latest migration in the `migrations/` directory (or the directories specified by `--dirs`).
The query should look something like `SET search_path TO migration_1_initial_migration`.
#### Options
| Option | Default | Description |
| ------ | ------- | ----------- |
| `--dirs` | `migrations/` | Directories to search for migration files. Multiple directories can be specified using `--dirs dir1 dir2 dir3`. |
### Connection options
The options below can be used with all commands that communicate with Postgres. Use either a [connection URL](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING) or specify each connection option individually.
All options can also be set using environment variables instead of flags. If a `.env` file exists, then variables will be automatically loaded from there.
| Option | Default | Environment variable | Description |
| ------ | ------- | -------------------- | ----------- |
| `--url` | | `DB_URL` | URL to your Postgres database |
| `--host` | `localhost` | `DB_HOST` | Hostname to use when connecting to Postgres |
| `--port` | `5432` | `DB_PORT` | Port which Postgres is listening on |
| `--database` | `postgres` | `DB_NAME` | Database name |
| `--username` | `postgres` | `DB_USERNAME` | Postgres username |
| `--password` | `postgres` | `DB_PASSWORD` | Postgres password |
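
As a sketch, the same connection settings could be supplied in any of the three ways described above. The host name, database name, user and the `APP_DB_PASSWORD` variable are placeholders, the function names are hypothetical, and placing the flags after the subcommand is an assumption based on the `reshape migrate --complete` example.

```shell
# All host names, database names and credentials below are placeholders.

# Variant 1: individual flags
migrate_with_flags() {
  reshape migrate --host db.example.internal --port 5432 \
    --database app --username app --password "${APP_DB_PASSWORD}"
}

# Variant 2: the same settings as environment variables
migrate_with_env() {
  DB_HOST=db.example.internal DB_PORT=5432 DB_NAME=app \
    DB_USERNAME=app DB_PASSWORD="${APP_DB_PASSWORD}" reshape migrate
}

# Variant 3: a single connection URL
migrate_with_url() {
  reshape migrate --url "postgres://app:${APP_DB_PASSWORD}@db.example.internal:5432/app"
}
```

Environment variables (or a `.env` file) keep credentials out of the command line, which can matter when process listings or shell history are visible to others.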
## License
Reshape is released under the [MIT license](https://choosealicense.com/licenses/mit/).