squeal/squeal-presentation-raveline.md

18 KiB

Squeal

A bridge between SQL and Haskell

by @Raveline


Rationale

Using postgresql-simple you have a typical "Trial, error, despair" workflow:

  • Write query manually or semi-automatically

  • Write haskell code using the query

  • Pray that the query syntax is correct

  • Pray that it returns what you want

  • Realize the inefficiency of prayer when it comes to SQL

  • Iterate till it works


Can't take this anymore


Use your best friend: GHC

Squeal provides several eDSL to make your SQL typesafe:

  • Type level eDSL to express schema;

  • Value level eDSL to manipulate schema (plus migration, yeehaw !);

  • Type level eDSL to express queries;

  • Value level eDSL to perform queries

However, it's not an ORM. There's no caching, lazy loading - you retain control over your memory. Also, joins - and mostly aggregation after joins - have to be handled manually.


Part I. Schema & migration


The Schema

  • The schema will be used to validate everything: migration, queries, etc.

  • A schema is mostly a collection of tables.

  • In this presentation, we will create a very basic database modeling a parliament. We will store members of parliament and parliamentary groups.

  • A simple Schema to represent a Parliament could be:

type Schema = '[ "mp" ::: 'Table MemberOfParliament
               , "groupp" ::: 'Table ParliamentGroup ]

Note

We need the DataKinds extension to be able to express heterogenous lists containing specific types like this one.

You can perfectly call your table "group" and not "groupp" even though it is a keyword in SQL - Squeal queries will be properly escaped.


A small example

type ParliamentaryGroup =
    '[ "pk_groupp" ::: 'PrimaryKey '["groupp_id"]]
      :=>
      '[ "groupp_id" ::: 'NoDef  :=> 'NotNull 'PGuuid
       , "name" ::: 'NoDef :=> 'NotNull 'PGtext
      ]
  • That's a whole simple table defined in one go.

  • ::: lets us define a column or a constraint.

  • :=> associates the constraints to the column.

  • Let's split constraints and column to study the syntax a bit more.


Note

We are using ::: and :=> to quickly express associations when writing our schema. We need the TypeOperators extension.]


What does this means ?!


Defining a table

type MemberOfParliament =
    '[ "pk_mp" ::: 'PrimaryKey '["mp_id"]
    , "fk_mp_groupp" ::: 'ForeignKey '["mp_group"] "groupp" '["groupp_id"]
    ] :=> MpCols
  • A table is a collection of constraints associated to a collection of columns.

  • We defined a table named "mp".

  • It has a constraint named "pk_mp", defining its primary key on column "mp_id".

  • It has a constraint named "fk_mp_groupp" defining a foreign key on column "mg_group", connected to the column "groupp_id" of table "groupp".

  • These constraints will be associated to the columns defined in MpCols.


Defining columns

type MpCols =
      '[ "mp_id" ::: 'NoDef :=> 'NotNull 'PGuuid
       , "first_name" ::: 'NoDef  :=> 'NotNull 'PGtext
       , "last_name" ::: 'NoDef :=> 'NotNull 'PGtext
       ]
  • A listing of column associate, for each element:

    • a name;

    • the mention of an eventual default value;

    • the nullability;

    • the type (obviously).


Note

GHC is already helping. If I named the "mp_id" column differently, GHC would yell because I promised a primary key constraint on a column named mp_id, so there must be one.]


Implementating the schema

  • We will carry this schema type pretty much everywhere.

  • But before we play with this schema, we need to implement it.


setup :: Definition '[] Schema
setup =
  createTable #groupp
    ( notNullable uuid `as` #groupp_id
    :* notNullable text `as` #name)
    ( primaryKey #groupp_id `as` #pk_groupp )
  >>> createTable #mp
    ( notNullable uuid `as` #mp_id
    :* notNullable text `as` #first_name
    :* notNullable text `as` #last_name
    :* notNullable text `as` #mp_group)
    ( primaryKey #mp_id `as` #pk_mp
    :* foreignKey #mp_group #groupp #groupp_id
        OnDeleteCascade OnUpdateCascade 
        `as` #fk_mp_groupp)

Damn, that's verbose

  • Yes. But you need verbosity to get type safety.

  • On the plus side, it's fairly straightforward.

  • You use :* to compose the element of the heterogenous list of columns.

  • You use >>> to compose table creation.

Note

The compiler will catch any mistype between Schema and definition; wrong nullability, wrong type, wrong name, etc.


Note

You'll also need OverloadedLabels, for naming stuff. This is mostly to avoid having to write manual proxies all the time and for convenience.


Setting up the schema

  • Squeal comes with a very good migration manager, handling upgrades AND downgrades.

  • A migration is a simple type:

Migration io schema0 schema1
  • The first parameter is a BaseMonad (typically, IO).

  • The second parameter is "the current schema of your DB".

  • The last one is "what you will migrate to".

  • The Definition type used to define upgrade and downgrade functions use the same logic (from one schema to the other).


Defining our first migration

  • We have the setup bit, we need the teardown:
tearDown :: Definition Schema '[]
tearDown = dropTable #mp >>> dropTable #groupp

Note

GHC will also detect the proper order of what you typed in downgrade and upgrade should there be any conflict (with foreign keys).

initDB :: Migration IO '[] Schema
initDB =
  Migration { name = "Schema creation"
            , up = void $ define setup
            , down = void $ define tearDown }

Simple migrator example

main :: IO ()
main = do
  printSQL setup
  void $ withConnection connectionString $
    migrateUp $ single initDB
  • We print the migration query; you can do this for all queries generated through Squeal.

  • We use migrateUp to perform the migration.

  • You can run several migration at the same time, to run only one use single.


That was easy


Part II. Insertions


Manipulations & Queries

Besides the specific case of migration, you will mostly perform:

  • Manipulation : inserting, updating and deleting data.

  • Query : fetching data.

  • Both types are parametric over the same things:

    • A schema

    • Input parameters

    • Output

  • Simplified, they look like this:

Manipulation schema params columns

Query schema params columns

Our Haskell model

type Parliament = [Group]

data Group =
  Group { name :: Text
        , members :: [MemberOfParliament] }

data MemberOfParliament =
  MemberOfParliament { firstName :: Text
                     , lastName :: Text }

Note

We didn't use anything from Squeal. The model can be entirely separated from the persistence layer.


Inserting a Parliamentary Group

  • A group is very simple: it's a uuid and a name. Let's define our params:
type GroupInsertionParams = '[ 'NotNull 'PGuuid
                             , 'NotNull 'PGtext ]
  • Params are not named, but they are indexed. You just need nullability and type.

groupInsertion :: Manipulation Schema GroupInsertionParams '[]
groupInsertion =
  insertRow_ #groupp ( Set (param @1) `as` #groupp_id
                     :* Set (param @2) `as` #name )

  • TypeApplication lets us use the index of parameters (counting from 1).

  • Once again: all this is checked by the compiler. Wrong name, wrong type... you'll get a compile error.

  • insertRow_ is a simplified version of insertRow. insertRow lets you express a RETURNING clause and the expected behaviour in case of conflict.


Inserting a Member of Parliament

Doing an INSERT INTO ... SELECT

  • We could create a naive query that takes MP uuid, first name, last name and group uuid...

  • But that's boring. So let's use the INSERT INTO ... SELECT.

  • We will build a query that will return as constants our MP's uuid, first name and last name...

  • ... and fetch the uuid of a group given the name of the group.

  • There's a insertQuery utility function for that. All we have to do is write the select !
mpInsertion :: Manipulation Schema MpParams '[]
mpInsertion =
  insertQuery_ #mp selectGroup -- we just have to write selectGroup !

Inserting a Member of Parliament

The query type

  • A query looks just like a Manipulation:
Query schema params returns
  • Returns use a special syntax, that demands names, nullability and types:
type ExampleReturnType = '[ "some_column" ::: 'NotNull 'PGuuid
                          , "other_column" ::: 'Null 'PGtext ]
  • It is not the same as the column definition we used in the schema. That one also expects that you specify, for each column, an eventual default:
type ExampleSchemaCols =
      '[ "some_column" ::: 'NoDef :=> 'NotNull 'PGuuid
       , "other_column" ::: 'NoDef  :=> 'Null 'PGtext ]

Inserting a Member of Parliament

Don't rewrite column definitions

  • We want our return type to be "all columns from table MP expressed as return type".

  • But we cannot reuse our neat MpCols alias, since the type do not match as we've just seen.

  • Hopefully, there's a neat Type Family that will let you convert any table you defined to the type of the equivalent row: TableToRow.

  • We finally have the signature for our SELECT query.

selectGroup :: Query Schema MpParams (TableToRow MemberOfParliament)

Inserting a Member of Parliament

Our final Select query

  • Our intermediate query looks like this:
selectGroup =
  select
    ( param @1 `as` #mp_id
    :* param @2 `as` #first_name
    :* param @3 `as` #last_name
    :* #groupp ! #groupp_id `as` #mp_group
    )
  ( from (table #groupp)
    & where_ ( #groupp ! #name .== param @4 )
  )
  • The first three columns are constant defined through our params;

  • ... we alias the column name using as...

  • ... and when we need the result from the table we use the #table ! #column syntax.

  • We'll go back to the "from" block later.


Dealing with the connection context

  • Queries are runned in a connection context:

    • When dealing with a single-connection context, use the type PQ.

    • When dealing with a pool of connection, use the type PoolPQ.

  • Or get rid of the context and use mtl style, with the typeclass MonadPQ.

  • We'll use that to demonstrate how to actually run our queries.


Inserting a whole group

insertGroup :: (MonadPQ Schema m, MonadBaseControl IO m) => Group -> m ()
insertGroup g@(Group name _) = do
  uuid' <- liftBase nextRandom
  void $ manipulateParams groupInsertion (uuid', name)
  insertMps g
  • Remember that we defined a group as a name and a list of MPs.

  • We are in MonadBaseControl, so we cannot use liftIO, we need liftBase.

  • To perform a simple insertion, use manipulateParams. It takes into parameter instances of ToParam, but you will typically use tuples or Generic-SOP.


Inserting a bunch of MPs

insertMps :: (MonadPQ Schema m, MonadBaseControl IO m) => Group -> m ()
insertMps (Group groupName mps) =
  let tuplify Mp{..} = (, firstName, lastName, groupName) <$> nextRandom
      params = traverse tuplify mps
  in void $ liftBase params >>= traversePrepared mpInsertion
  • This time, we want to do a preparedStatement. We'll use traversePrepare which behaves like manipulateParams, but expect a list of ToParam instances.

  • We build our tuple manually again, mostly because we want to generate UUID on the fly and we don't want to have them in our model.


Inserting a whole parliament

insertParliament ::
    (MonadPQ Schema m, MonadBaseControl IO m) => Parliament -> m ()
insertParliament = traverse_ insertGroup
  • That's simple enough !

  • If we want to be a bit safer, though, we can wrap this call in a transactionally_ function, which will put all that in a transaction.


Very simple


Part III. Selects


Composable queries

  • Building basic queries is easy, and is well documented.

  • However, the real interest of a tool like Squeal is in DRYness, and the documentation is still lacking in "how-to" related to composability.

  • I'll build an example showing how column selections and from clause can be factorized.

  • We want to write two queries:

    • One to get all members of a specific group.

    • One to get all the parliament.


Decomposing a Query

  • A query typically stars with select (variants are available).

  • It then takes:

    • A heterogenous list of fields with a scary signature;

    • A virtual table (the from clause and filter clauses), called a TableExpression.

  • And return fields. Ours will look like this:

type GroupRowResult =
  '[ "groupName" ::: 'NotNull 'PGtext
   , "firstName" ::: 'NotNull 'PGtext
   , "lastName" ::: 'NotNull 'PGtext ]

The Table Expression

  • It necessarily contains a fromClause (table, view or subquery, plus optional joins).

  • You can add where, groups, "HAVING" clause, order, etc.

  • from creates a basic TableExpression that you extend through various function to add clauses.

  • For our queries, we will share a common FromClause.


The From Clause

FromClause schema params from
  • We need Schema, input parameters and a FromType giving the available fields of the expression.

  • In our case, we'll join the table mp and the table groupp, meaning the from will be all fields of these tables.

  • We can use TableToRow to be more DRY:

type BaseParliamentSelection =
  '[ "g" ::: TableToRow GroupCols
   , "m" ::: TableToRow MpCols ]

Note

We've also put everything with table aliases: "g" and "m".


Writing our from clause

baseParliamentTables ::
    FromClause Schema (param :: [NullityType]) BaseParliamentSelection
baseParliamentTables =
  table
    (#groupp `as` #g)
    & innerJoin
      (table (#mp `as` #m))
      (#m ! #mp_group .== #g ! #groupp_id)
  • When picking "from" something, you need to specify the type with a function: table for a table, view for a view, etc.

  • All joins are available. innerJoin is the most basic one. It takes the joined table and the joining condition.

  • We have to specifiy the type of param (current limitation of the lib), even though any params will be compatible with this, so we can plug where clauses depending on params should we need to !


Typing the common selection

  • Both our querie will need the same fields. This has to be factorized too.

  • The scary signature of selection fields:

NP (Aliased (Expression schema from grouping params)) cols
  • NP is for heterogenous lists ("n-ary product").

  • from is the virtual table type, so our BaseParliamentSelection.

  • grouping is there to make a distinction between aggregated / unaggregated queries.

  • And finally, cols is the return type.


Our common selection

groupSelection ::
    NP (Aliased (Expression Schema BaseParliamentSelection 'Ungrouped param))
        GroupRowResult
groupSelection =
       #g ! #name `as` #groupName
    :* #m ! #first_name `as` #firstName
    :* #m ! #last_name `as` #lastName
  • I hope you like big signatures.

  • But it's what let GHC checks that all alias and columns are available.

  • And that you're returning what you really intend to return.


Putting it all together

selectParliament :: Query Schema '[] GroupRowResult
selectParliament =
  select groupSelection (from baseParliamentTables)
selectGroupMembers :: Query Schema '[ 'NotNull 'PGtext] GroupRowResult
selectGroupMembers =
  select groupSelection
    (from baseParliamentTables
      & where_ (#g ! #name .== param @1))
  • That is fairly DRY. And entirely typesafe.

Actually fetching the data

  • We'll build an intermediary datatype representing our rows.

  • We'll make it match our row.

  • And we'll add generic-SOP so that we can build them from query results.

type GroupRowResult =
  '[ "groupName" ::: 'NotNull 'PGtext
   , "firstName" ::: 'NotNull 'PGtext
   , "lastName" ::: 'NotNull 'PGtext ]
data GroupRow =
  GroupRow { groupName :: Text
           , firstName :: Text
           , lastName :: Text }
  deriving (Generic)

instance SOP.Generic GroupRow
instance SOP.HasDatatypeInfo GroupRow

Aggregate result logic

  • We will get tabular, SQL data.

  • Our results will be [GroupRow].

  • We'll build Groups from this:

buildGroup :: NE.NonEmpty GroupRow -> Group
buildGroup grs =
  let buildMP (GroupRow _ f l) = Mp f l
      name = groupName . NE.head $ grs
      members = NE.toList (buildMP <$> grs)
  in Group{..}

rowToGroups :: [GroupRow] -> [Group]
rowToGroups =
  let grouped = fmap NE.fromList . L.groupBy ((==) `on` groupName)
  in fmap buildGroup . grouped

Finally calling our queries

  • Query with params will use runQueryParams.

  • Query without params will use runQuery.

  • Result is inside a MonadPQ and of type K from generic-sop.

  • You get your actual result using getRows. Exemple:

getParliament :: (MonadPQ Schema m, MonadBaseControl IO m) => m [Group]
getParliament = do
  res <- runQuery selectParliament
  rows <- getRows res
  pure $ rowToGroups rows
  • In real life of course you'll write:
getParliament =
    rowToGroups <$> (runQuery selectParliament >>= getRows)

Our query with params

getGroupMembers ::
    (MonadPQ Schema m, MonadBaseControl IO m) => Text -> m (Maybe Group)
getGroupMembers =
    listToMaybe . rowToGroups <$>
        (runQueryParams selectGroupMembers (Only t) >>= getRows)
  • And that's it.

Eazy


Conclusion


Pros & Cons

  • Not everything is included: some cool stuff like array_agg and window functions are not available yet.

  • But development is very active. IN clauses were missing but were added in the 4.0 released recently.

  • It's bleeding edge. You need latest LTS to be comfortable and psql >= 9.5.

  • It's Postgres-only but I would say that's a feature. Multi-DB tools are even more complex.

  • The author & maintainer is <3.

  • Typesafe. 'nuff said.


Thank you !