# PGmigrate tutorial

We'll play around with the example database `foodb`.

## Base directory structure of our example

Our [Example db](foodb) migrations dir structure looks like this:

```
foodb
├── callbacks           # directory with sql callbacks
│   ├── afterAll        # will be executed before commit and after last migration
│   ├── afterEach       # will be executed after each migration
│   ├── beforeAll       # will be executed after begin and before first migration
│   └── beforeEach      # will be executed before each migration
├── grants              # use this dir to set special callbacks for grants
├── migrations          # migrations dir
└── migrations.yml      # pgmigrate configuration
```

Every sql file performs a special operation on the table `ops`.
This helps in understanding what is going on in each pgmigrate run.

## Configuration

Let's start with the [Example configuration](foodb/migrations.yml).

### Callbacks

Callbacks can be configured via command-line arguments like this:

```
admin@localhost foodb $ pgmigrate -a beforeAll:callbacks/beforeAll,afterAll:callbacks/afterAll ...
```

But if we have a lot of callbacks we can use the configuration file for them:

```
callbacks:
    beforeAll:
        - callbacks/beforeAll
    beforeEach:
        - callbacks/beforeEach
    afterEach:
        - callbacks/afterEach
    afterAll:
        - callbacks/afterAll
        - grants
```

### Connection

We can use command-line arguments for connection configuration:

```
admin@localhost foodb $ pgmigrate -c 'dbname=foodb' ...
```

Or the configuration file:

```
conn: dbname=foodb
```

The third option for setting connection params is using [environment variables](https://www.postgresql.org/docs/current/libpq-envars.html):

```
admin@localhost foodb $ PGDATABASE=foodb pgmigrate -c '' ...
```

Note: we need to explicitly set the connstring to an empty value via the command-line argument or the configuration file to force psycopg2 to pick the fields from the environment.

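
To make the relation between the `-a` argument and the `callbacks` section of the configuration file more concrete, here is a minimal sketch of how such a string could be turned into the same mapping. The function name `parse_callbacks` is hypothetical and this is not pgmigrate's own parsing code. It also assumes that repeating a callback type in the string appends another directory, which the tutorial does not state.

```python
def parse_callbacks(arg):
    """Turn 'type:path,type:path' into {'type': ['path', ...]}.

    Illustrative only: mirrors the shape of the `callbacks` section
    in migrations.yml, not pgmigrate's internal data structures.
    """
    callbacks = {}
    for item in arg.split(','):
        kind, _, path = item.partition(':')
        if not kind or not path:
            raise ValueError('expected <type>:<path>, got %r' % item)
        # Assumption: repeating a type adds one more directory to its list.
        callbacks.setdefault(kind, []).append(path)
    return callbacks


if __name__ == '__main__':
    print(parse_callbacks(
        'beforeAll:callbacks/beforeAll,afterAll:callbacks/afterAll,afterAll:grants'))
```

Under these assumptions the result has the same layout as the YAML above: one list of directories per callback type.
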
## Migration file name pattern

All migration files should have versions and names in the following format:

```
V<version>__<description>.sql
```

Note: files not matching this pattern will be skipped.

## Creating `foo` user and `foodb`

We'll need a dummy user and database for our experiments.

```
postgres=# CREATE ROLE foo WITH LOGIN PASSWORD 'foo';
CREATE ROLE
postgres=# CREATE DATABASE foodb;
CREATE DATABASE
```

## Getting migrations info before first migration

```
admin@localhost foodb $ pgmigrate -t 1 info
{
    "1": {
        "description": "Initial schema foo",
        "transactional": true,
        "version": 1,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    }
}
```

Here we see a json description of the migrations that will be applied if we want to get to version 1.

Let's check the steps needed to get to version 3 while ignoring version 1:

```
admin@localhost foodb $ pgmigrate -b 1 -t 3 info
{
    "2": {
        "description": "Add baz column to foo",
        "transactional": true,
        "version": 2,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    },
    "3": {
        "description": "NONTRANSACTIONAL Add index on baz column",
        "transactional": false,
        "version": 3,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    }
}
```

## Migrating to first version

```
admin@localhost foodb $ pgmigrate -t 1 migrate
admin@localhost foodb $ echo $?
0
```

Ok. Migration applied. Let's see what is in our db now.

```
admin@localhost foodb $ psql foodb
psql (9.5.4)
Type "help" for help.

foodb=# SELECT * FROM ops;
 seq |                   op
-----+-----------------------------------------
   1 | beforeAll 00_create_database_ops.sql
   2 | beforeEach 00_dummy_before_each.sql
   3 | migration V0001__Initial_schema_foo.sql
   4 | afterEach 00_dummy_after_each.sql
   5 | afterAll 00_dummy_after_all.sql
   6 | grants foo
(6 rows)

foodb=# \dt foo.foo
       List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 foo    | foo  | table | admin
(1 row)

foodb=# \dS+ foo.foo
                         Table "foo.foo"
 Column |  Type  | Modifiers | Storage  | Stats target | Description
--------+--------+-----------+----------+--------------+-------------
 id     | bigint | not null  | plain    |              |
 bar    | text   | not null  | extended |              |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
```

Let's check whether the `foo` user can really do something with our new table.

```
psql "dbname=foodb user=foo password=foo host=localhost"
psql (9.5.4)
Type "help" for help.

foodb=> SELECT * FROM foo.foo;
 id | bar
----+-----
(0 rows)
```

## Mixing transactional and nontransactional migrations

Let's try to go to version 3.

```
admin@localhost foodb $ pgmigrate -t 3 migrate
2016-09-29 00:14:35,402 ERROR   : Unable to mix transactional and nontransactional migrations
Traceback (most recent call last):
  File "/usr/local/bin/pgmigrate", line 9, in <module>
    load_entry_point('yandex-pgmigrate==1.0.0', 'console_scripts', 'pgmigrate')()
  File "/usr/local/lib/python2.7/dist-packages/pgmigrate.py", line 663, in _main
    COMMANDS[args.cmd](config)
  File "/usr/local/lib/python2.7/dist-packages/pgmigrate.py", line 549, in migrate
    raise MigrateError('Unable to mix transactional and '
pgmigrate.MigrateError: Unable to mix transactional and nontransactional migrations
```

Oops! It complained. But why? The main reason is quite simple: your production databases are likely larger than the test ones, so the migration to version 3 could take a lot of time. You definitely should stop at version 2, check that everything is working fine, and then move to version 3.

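
The rule above can be illustrated with a small sketch. It is not pgmigrate's implementation: the names `describe_migration` and `is_mixed` are hypothetical, the regular expression is inferred from the example file names in this tutorial, and treating a `NONTRANSACTIONAL` marker in the file name as the switch for the `transactional` flag is an assumption based on the `info` output shown earlier.

```python
import re

# Assumed file name pattern, inferred from names such as
# V0001__Initial_schema_foo.sql used in this tutorial.
MIGRATION_RE = re.compile(r'^V(?P<version>\d+)__(?P<description>.+)\.sql$')


def describe_migration(file_name):
    """Return version, description and transactional flag, or None if skipped."""
    match = MIGRATION_RE.match(file_name)
    if match is None:
        return None  # files not matching the pattern are skipped
    description = match.group('description').replace('_', ' ')
    return {
        'version': int(match.group('version')),
        'description': description,
        # Assumption: a NONTRANSACTIONAL marker in the name disables the transaction.
        'transactional': 'NONTRANSACTIONAL' not in description,
    }


def is_mixed(migrations):
    """True when the batch holds both transactional and nontransactional steps."""
    return len({m['transactional'] for m in migrations}) > 1


if __name__ == '__main__':
    files = [
        'V0001__Initial_schema_foo.sql',
        'V0002__Add_baz_column_to_foo.sql',
        'V0003__NONTRANSACTIONAL_Add_index_on_baz_column.sql',
    ]
    pending = [describe_migration(name) for name in files]
    print(is_mixed(pending[1:]))   # versions 2 and 3 in one run
    print(is_mixed(pending[1:2]))  # version 2 alone
```

With version 1 already applied, a run targeting version 3 would have to apply versions 2 and 3 together, which this sketch reports as a mix. Targeting version 2 first leaves a batch of one kind only.
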
## Migrating to second version

Ok. Now let's try version 2.

```
admin@localhost foodb $ pgmigrate -t 2 migrate
admin@localhost foodb $ echo $?
0
```

Looks good. But what is in the db?

```
admin@localhost foodb $ psql foodb
psql (9.5.4)
Type "help" for help.

foodb=# SELECT * FROM ops;
 seq |                     op
-----+--------------------------------------------
   1 | beforeAll 00_create_database_ops.sql
   2 | beforeEach 00_dummy_before_each.sql
   3 | migration V0001__Initial_schema_foo.sql
   4 | afterEach 00_dummy_after_each.sql
   5 | afterAll 00_dummy_after_all.sql
   6 | grants foo
   7 | beforeAll 00_create_database_ops.sql
   8 | beforeEach 00_dummy_before_each.sql
   9 | migration V0002__Add_baz_column_to_foo.sql
  10 | afterEach 00_dummy_after_each.sql
  11 | afterAll 00_dummy_after_all.sql
  12 | grants foo
(12 rows)

foodb=# \dS+ foo.foo
                              Table "foo.foo"
 Column |  Type  |     Modifiers      | Storage  | Stats target | Description
--------+--------+--------------------+----------+--------------+-------------
 id     | bigint | not null           | plain    |              |
 bar    | text   | not null           | extended |              |
 baz    | bigint | not null default 0 | plain    |              |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
```

As we can see, the migration steps are almost the same as for version 1.

## Migrating to version 3 with nontransactional migration

```
admin@localhost foodb $ pgmigrate -t 3 migrate
admin@localhost foodb $ echo $?
0
```

In the database:

```
admin@localhost foodb $ psql foodb
psql (9.5.4)
Type "help" for help.

foodb=# SELECT * FROM ops;
 seq |                              op
-----+---------------------------------------------------------------
   1 | beforeAll 00_create_database_ops.sql
   2 | beforeEach 00_dummy_before_each.sql
   3 | migration V0001__Initial_schema_foo.sql
   4 | afterEach 00_dummy_after_each.sql
   5 | afterAll 00_dummy_after_all.sql
   6 | grants foo
   7 | beforeAll 00_create_database_ops.sql
   8 | beforeEach 00_dummy_before_each.sql
   9 | migration V0002__Add_baz_column_to_foo.sql
  10 | afterEach 00_dummy_after_each.sql
  11 | afterAll 00_dummy_after_all.sql
  12 | grants foo
  13 | migration V0003__NONTRANSACTIONAL_Add_index_on_baz_column.sql
(13 rows)

foodb=# \dS+ foo.foo
                              Table "foo.foo"
 Column |  Type  |     Modifiers      | Storage  | Stats target | Description
--------+--------+--------------------+----------+--------------+-------------
 id     | bigint | not null           | plain    |              |
 bar    | text   | not null           | extended |              |
 baz    | bigint | not null default 0 | plain    |              |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "i_foo_baz" btree (baz)
```

No callbacks were applied this time (we try to run the absolute minimum of operations outside of transactions).

## Baseline

Let's suppose that you already have a database with the schema at version 3, but you reached this state without using pgmigrate. How should you migrate to version 4 and beyond with it?

Let's remove the schema_version info from our database:

```
admin@localhost foodb $ pgmigrate clean
```

Now let's check how pgmigrate will bring us to version 3:

```
admin@localhost foodb $ pgmigrate -t 3 info
{
    "1": {
        "description": "Initial schema foo",
        "transactional": true,
        "version": 1,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    },
    "2": {
        "description": "Add baz column to foo",
        "transactional": true,
        "version": 2,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    },
    "3": {
        "description": "NONTRANSACTIONAL Add index on baz column",
        "transactional": false,
        "version": 3,
        "installed_by": null,
        "type": "auto",
        "installed_on": null
    }
}
```

This looks really bad. Our migration v1 will definitely fail (because schema `foo` already exists). Let's tell pgmigrate that our database is already at version 3.

```
admin@localhost foodb $ pgmigrate -b 3 baseline
admin@localhost foodb $ pgmigrate -t 3 info
{
    "3": {
        "description": "Forced baseline",
        "transactional": true,
        "version": 3,
        "installed_on": "2016-09-29 00:37:27",
        "type": "manual",
        "installed_by": "admin"
    }
}
```

## Migrations on empty database

When you have hundreds of migrations, some of them nontransactional, you really don't want to stop at each of them to bring an empty database to a specific version (consider creating a new database for some experiments). PGmigrate is able to run such a migration in a single command run (but you should definitely know what you are doing).

Let's try it. Drop and create an empty `foodb`:

```
postgres=# DROP DATABASE foodb;
DROP DATABASE
postgres=# CREATE DATABASE foodb;
CREATE DATABASE
```

Now migrate to the latest available version:

```
admin@localhost foodb $ pgmigrate -t latest migrate
```

The operations log will look like this:

```
admin@localhost foodb $ psql foodb
psql (9.5.4)
Type "help" for help.

foodb=# SELECT * FROM ops;
 seq |                              op
-----+---------------------------------------------------------------
   1 | beforeAll 00_create_database_ops.sql
   2 | beforeEach 00_dummy_before_each.sql
   3 | migration V0001__Initial_schema_foo.sql
   4 | afterEach 00_dummy_after_each.sql
   5 | beforeEach 00_dummy_before_each.sql
   6 | migration V0002__Add_baz_column_to_foo.sql
   7 | afterEach 00_dummy_after_each.sql
   8 | afterAll 00_dummy_after_all.sql
   9 | grants foo
  10 | migration V0003__NONTRANSACTIONAL_Add_index_on_baz_column.sql
(10 rows)
```

## UTF-8 Migrations

In most cases you should avoid non-ascii characters in your migrations, so PGmigrate will complain about them with:

```
pgmigrate.MalformedStatement: Non ascii symbols in file
```

But sometimes there is no way to avoid a migration with UTF-8 (imagine inserting some initial data into your database). You can insert a modeline into the migration file to disable the non-ascii characters check:

```
/* pgmigrate-encoding: utf-8 */
```

## Session setup

Sometimes you need to set some session options before migrating (e.g. isolation level). This is possible with the `-s` option or `session` in the config. For example, to set the `serializable` isolation level and a lock timeout of 30 seconds one could do something like this:

```
pgmigrate -s "SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE" \
    -s "SET lock_timeout = '30s'" ...
```

This feature will not work with a connection pooler (such as `odyssey` or `pgbouncer`) in non-session mode.

## Terminating blocking pids

On heavily loaded production environments running some migrations could block queries from application backends. Unfortunately, if a migration is itself blocked by some other query, this could lead to really slow database queries. For example a lock queue like this:

```
```

makes the database almost unavailable for at least `idle_in_transaction_session_timeout`.

To mitigate such issues there is the `-l <interval>` option in pgmigrate, which starts a separate thread running `pg_terminate_backend(pid)` for each pid blocking any of the pgmigrate conn pids every `interval` seconds.

Of course pgmigrate should be able to terminate other pids, so the migration user should be the app user or have the `pg_signal_backend` role granted. To terminate superuser (e.g. `postgres`) pids one could run pgmigrate as a superuser.

Session setup should not manipulate the `application_name` setting, because conflict termination expects application names in `pg_stat_activity` to match internal dsn values.

Note: this feature relies on the `pg_blocking_pids()` function available since PostgreSQL 9.6.

## Schema restriction

In some cases you need to use several independent schemas in one database. For example, large SaaS applications tend to use such a setup for client separation. To use a non-default schema for migrations one could pass the `-m <schema>` option. By default schema restriction is enabled (it prevents access to and modification of relations outside the system schemas and the selected schema). Some restrictions are hard to implement with the current approach: relation drop and nontransactional migrations support. Schema restriction can be disabled with the `--disable_schema_check` option.

## Avoiding gaps in versions

Sometimes the development process leads to gaps in migration versions. E.g. the current version is `N`. Alice works on branch `a` and expects branch `b` (Bob is working on it) to be merged first. So she decides to select version `N+2` instead of `N+1` (Bob selects this version). But for some reason branch `a` is merged before `b`. If we run migrations on a database before branch `b` is merged, we'll need to change the version in `b` to `N+3` or it will be skipped. One could run migrate with the `--check_serial_versions` option to avoid applying migrations with gaps in versions.
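
The Alice and Bob scenario can be sketched as a simple check for missing version numbers. This is only an illustration of the idea behind `--check_serial_versions`, not pgmigrate's implementation. The function name `find_gaps` and the concrete value `N = 10` are hypothetical.

```python
def find_gaps(current_version, pending_versions):
    """Return the versions missing between the current one and the highest pending one."""
    if not pending_versions:
        return []
    present = set(pending_versions)
    expected = range(current_version + 1, max(pending_versions) + 1)
    return [version for version in expected if version not in present]


if __name__ == '__main__':
    n = 10  # hypothetical current version N
    # Branch `a` (version N+2) is merged, branch `b` (version N+1) is not yet.
    print(find_gaps(n, [n + 2]))
    # Both branches are merged: versions N+1 and N+2 are consecutive.
    print(find_gaps(n, [n + 1, n + 2]))
```

A non-empty result in the first case is the situation in which applying Alice's migration would leave Bob's version behind. An empty result means the pending versions follow the current one without gaps.
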