* Fix some unescaped markup in haddocks
This addresses #315
* Expand documentation on unsafeDefault
This copies the documentation in the cookbook to the haddocks,
addressing #233.
* Update docs/concepts/insert.rst
Co-authored-by: Ollie Charles <ollie@ocharles.org.uk>
---------
Co-authored-by: Ollie Charles <ollie@ocharles.org.uk>
Somewhere between PostgreSQL 11 and PostgreSQL 15, PostgreSQL's optimiser gained the ability to see "through" subqueries, and it seems to choose to do this even when we don't really want it to.
E.g., it started transforming the following:
```haskell
SELECT
x * y + x * y
FROM (
SELECT
a + b + c AS x
d + e + f AS y
FROM
foo
) _
```
into:
```haskell
SELECT
(a + b + c) * (d + e + f) + (a + b + c) * (d + e + f)
FROM
foo
```
before evaluating.
You can see how more complicated expressions nested several levels deep could get expanded into crazy big expressions. This seems to be what PostgreSQL actually does on Rel8 code that uses `rebind`. Compared to older versions of PostgreSQL, this increases the planning time and execution time dramatically.
Given that Rel8's `rebind` is intended to function as a "let binding", and the user needs to go out of their way to choose to use it (they could just use `pure` if they wanted the fully expanded expression), we want a way to force PostgreSQL to evaluate the `a + b + c` and the `d + e + f` first before worrying about trying to simplify `x * y + x * y`. Adding `OFFSET 0` to the inner query seems to achieve that.
```haskell
SELECT
x * y + x * y
FROM (
SELECT
a + b + c AS x
d + e + f AS y
FROM
foo
OFFSET
0
) _
```
The `Sql DBEq a` constraint on the return type of the aggregator was wrong. It also isn't quite right to have a `EqTable i` constraint on the input type of the `Aggregator`, because technically what we want is a `Sql DBEq` constraint on whichever column(s) within `i` are used by aggregation functions, but we don't know which columns were used at this point. We could give `distinctAggregate` a type like `Sql DBEq i => Aggregator (Expr i) a` and make people run it through `lmap` manually, but that makes it impractical to use with `ListTable` without exposing more machinery. So we just drop the equality constraint for now.
This is one possible "fix" to #168. With this we can `catListTable` arbitrarily deep trees of `ListTable`s.
It comes at a relatively high cost, however.
Currently we represent nested arrays with anonymous records. This works reasonably well, except that we can't extract the field from the anonymous record when we need it (PostgreSQL [theoretically](https://www.postgresql.org/docs/13/release-13.html#id-1.11.6.16.5.6) suports `.f1` syntax since PG13 but it only works in very limited situations). But it does mean we can decode the results using Hasql's binary decoders, and ordering works how we expect ('array[row(array[9])] < array[row(array[10])]'.
What this PR does is instead represent nested arrays as text. To be able to decode this, we need each 'DBType' to supply a text parser in addition to a binary decoder. It also means that ordering is no longer intuitive, because `array[array[9]::text] > array[array[10]::text]`. However, it does mean we can nest `catListTable`s to our heart's content and it will always just work.
Types in PostgreSQL can also be qualified with a schema. However, it's not sufficient to just change the type of `TypeInformation`'s `typeName` to `QualifiedName`, because a type isn't *just* a name. Postgres types can also be parameterised by modifiers (e.g., `numeric(7, 2)`) and array types of arbitrary depth (e.g., `int4[][]`).
To accomodate this, a new type is introduced, `TypeName`. Like `QualifiedName`, it has an `IsString` instance, so the common case (`schema` set to `Nothing`, no modifiers, scalar type) will continue working as before.
The main reason I wasn't happy with this before is that there was nothing stopping you from writing `materialize query pure`. This returned query would produce invalid SQL if you actually tried to use it, because it would attempt to reference a common table expression outside the scope of the `WITH` statement.
The "solution" here is just to throw a `rebind` around the result such that `materialize` incurs an extra `Table Expr b` constraint, which means that you can't return `Query (Query a)` because `Query a` can't satisfy a `Table Expr` constraint.
This does away with the weird variadic arguments thing we had going on with `function`.
Functions with no arguments are now written as:
```haskell
now :: Expr UTCTime
now = function "now" ()
```
Functions with multiple arguments are now written as:
```haskell
quot :: Sql DBIntegral a => Expr a -> Expr a -> Expr a
quot n d = function "div" (n, d)
```
Single-argument functions are written exactly as before.
This adds a new type `QualifiedName` for named PostgreSQL objects (tables, views, functions and sequences) that can optionally be qualified by a schema. Previously only `TableSchema` could be qualified in this way.
`QualifiedName` has an `IsString` instance so the common case (where the schema is `Nothing`) doesn't have to care about schemas (if `OverloadedStrings` is enabled).
This also refactors `TableSchema` to use `QualifiedName` for its `name` field and drops its `schema` field.
Thanks to @elldritch for the bug report and the inspiration.
The motivation behind this PR is to add support for PostreSQL's `WITH` syntax at the statement level, which gives the ability to, e.g., delete some rows from a table and then re-insert those deleted rows into another table, without any round-trips between the application and the database.
To support this, this PR introduces a new type called `Statement`, which represents a single PostgreSQL statement. It has a `Monad` instance which allows sub-statements (such as `DELETE` and `INSERT` statements) to be composed together and their results bound to values that can be referenced in subsequent sub-statements. These "compound" statements are then rendered as a `WITH` statement.
`select`, `insert`, `update` and `delete` have all been altered to produce the `Statement` type described above instead of the `Hasql.Statement` type.
Some changes were necessary to the `Returning` type. `Returning` previously bundled two different concepts together: whether or not to generate a `RETURNING` clause in the SQL for a manipulation statement, and how to decode the returned rows (if any). It was necessary to break these concepts apart because with `WITH` we need the ability to generate manipulation statements with `RETURNING` clauses that are never actually decoded at all (the results just get passed to the next statement without touching the application).
Now, the `Returning` type is only concerned with whether or not to generate a `RETURNING` clause, and the question of how to decode the returned the result of the statement is handled by the `run` functions. `run` converts a `Statement` into a runnable `Hasql.Statement`, decoding the result of the statement as a list of rows. The other variations, `run_`, `runN`, `run1`, `runMaybe` and `runVector` can be used when you want to decode as something other than a list of rows.
This also gains us support for decoding the result of a query directly to a `Vector` for the first time, which brings a performance improvement over lists for those who need it.
This is another possible "fix" to #168 (as opposed to #242). It doesn't really fix the problem, but it allows us to use two levels of `catListTable` instead of only one. Instead of trying to use Postgres's broken `.f1` syntax, we cast the anonymous record to text, remove the parentheses and quotes and unescape any escaped quotes or backslashes, and then cast the resulting text back to the appropriate type. The reason this only works one level deep is that if the type we cast the text back to is itself an anonymous record, then PostgreSQL doesn't know how to parse the text.
It's kind of ugly and hacky but it does work and otherwise maintains the status quo. Comparison operators on nested lists continue to work as before and we don't need to burden `DBType` with parsing nonsense.
This PR makes a number of changes to how aggregation works in Rel8.
The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:
```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
foo <- each fooSchema
pure (sum foo.x, count foo.x)
```
The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.
The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.
The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.
Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.
This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.
It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).
Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.
Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.
`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.
`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.