This work is related to restricting tag union sizes in input positions.
As an example, for something like
```
\x -> when x is
A M -> X
A N -> X
A _ -> X
```
we'd like to infer `[A [M, N]* ]` rather than the `[A, [M, N]* ]*` we
infer today. Notice the difference is that the former type tells us we
only accepts `A`s, but the argument of the `A` can be `M`, `N` or
anything else (hence the `_`).
So what's the idea? It's an encoding of the "must have"/"might have"
design discussed in https://github.com/rtfeldman/roc/issues/1758. Let's
take our example above and walk through unification of each branch.
Suppose `x` starts off as a flex var `t`.
```
\x -> when x is
A M -> X
```
Now we introduce a new kind of constraint called a "presence"
constraint. It says "t has at least [A [M]]". I'll notate this as `t +=
[A [M]]`. When `t` is free as it is here, this is equivalent to `t ~
[A [M]]`.
```
\x -> when x is
...
A N -> X
```
At this branch we introduce the presence constraint `[A [M]] += [A [N]]`.
Notice that there's two tag unions we care about resolving here - one is
the toplevel one that says "I have an `A ...` inside of me", and the
other one is the tag union that's the tyarg to `A`. They are distinct
and at different depths.
For the toplevel one, we first figure out if the number of tags in the
union needs to expand. It does not - we're hoping to resolve the type
`[A [M, N]]`, which only has `A` in the toplevel union. So, we don't
need to do anything extra there, other than the merge the nested tag
unions.
We recurse on the shared tags, and now we have the presence constraint
`[M] += [N]`. At this point it's important to remember that the left and
right hand types are backed by type variables, so this is really
something like `t11 [M] += t12 [N]`, where `[M]` and `[N]` are just what
we know the variables `t11` and `t12` to be at this moment. So how do we
solve for `t11 [M, N]` from here? Well, we can encode this constraint as
a type variable definition and a unification constraint we already know
how to solve:
```
New definition: t11 [M]a (a fresh)
New constraint: a ~ t12 [N]
```
That's it; upon unification, `t11 [M, N]` falls out.
Okay, last step.
```
\x -> when x is
...
A _ -> X
```
We now have `[A [M, N]] += [A a]`, where `a` is a fresh unbound
variable. Again nothing has to happen on the toplevel. We walk down and
find `t11 [M, N] += t21 a`. This is actually called an "open constraint"; we
differentiate it at the time we generate constraints because it follows
syntactically from the presence of an `_`, but it's semantically
equivalent to the presence constraint `t11 [M, N] += t21 a`. It's just
called opening because literally the only way `t11 [M, N] += t21 a` can
be true is if we set `t11 a`. Well, actually, we assume `a` is a tag
union, so we just make `t11` the open tag union `[M, N]a`. Since `a` is
unbound, this eventually becomes a wildcard and hence falls out `[M, N]*`.
Also, once we open a tag union with an open constraint, we never close
it again.
That's it. The rest falls out recursively. This gives us a really easy
way to encode these ordering constraints in the unification-based system
we have today with minimal additional intervention. We do have to patch
variables in-place sometimes, and the additive nature of these
constraints feels about out-of-place relative to unification, but it
seems to work well.
Resolves#1758
By emitting a runtime error rather than panicking when we can't layout
a record, we help programs like
```
main =
get = \{a} -> a
get {b: "hello world"}
```
execute as
```
Mismatch in compiler/unify/src/unify.rs Line 1071 Column 13
Trying to unify two flat types that are incompatible: EmptyRecord ~ { 'a' : Demanded(122), }<130>
🔨 Rebuilding host...
── TYPE MISMATCH ───────────────────────────────────────────────────────────────
The 1st argument to get is not what I expect:
8│ get {b: "hello world"}
^^^^^^^^^^^^^^^^^^
This argument is a record of type:
{ b : Str }
But get needs the 1st argument to be:
{ a : a }b
Tip: Seems like a record field typo. Maybe a should be b?
Tip: Can more type annotations be added? Type annotations always help
me give more specific messages, and I think they could help a lot in
this case
────────────────────────────────────────────────────────────────────────────────
'+fast-variable-shuffle' is not a recognized feature for this target (ignoring feature)
'+fast-variable-shuffle' is not a recognized feature for this target (ignoring feature)
Done!
Application crashed with message
Can't create record with improper layout
Shutting down
```
rather than the hanging
```
Mismatch in compiler/unify/src/unify.rs Line 1071 Column 13
Trying to unify two flat types that are incompatible: EmptyRecord ~ { 'a' : Demanded(122), }<130>
thread '<unnamed>' panicked at 'invalid layout from var: UnresolvedTypeVar(104)', compiler/mono/s
rc/layout.rs:1510:52
```
that was previously produced.
Part of #2227
This reverts commit 6e4fd5f06a1ae6138659b0073b4e2b375a499588.
This idea didn't work out because cloning the type and storing it on a
variable still resulted in the solver trying to uify the variable with
the type. When there were errors, which there certainly would be if we
tried to unify the variable with a structure that had nested flex/rigid
vars, the nested flex/rigid vars would inherit those errors, and the
program wouldn't typecheck.
Since the motivation here was to expose the signature type to
`reporting` so that we could modify it with suggestions, we should
instead pass that information along in something analogous to the
`Expected` struct.
Previously, a program like
```roc
word = "word"
if True then 1 else "\(word) is a word"
```
would report an error like
```
── TYPE MISMATCH ───────────────────────────────────────────────────────────────
This `if` has an `else` branch with a different type from its `then` branch:
3│ if True then 1 else "\(word) is a word"
^^^^^^^^^^^^^^^^^^
This concat all produces:
Str
but the `then` branch has the type:
Num a
I need all branches in an `if` to have the same type!
```
but this is a little bit confusing, since the user shouldn't have to
know (or care) that string interpolations are equivalent to
concatenations under the current implementation.
Indeed we should make this fully transparent. We now word the error
message by taking into account the way calls are made. To support the
case shown above, we introduce the `CalledVia::Sugar` variant to
represent the fact that some calls may be the result of desugaring the
surface syntax.
This commit also demonstrates the usage of `CalledVia` to produce better
error messages where we use binary comparison operators like `<`. There
are more improvements we can make here for all `CalledVia` variants, but
this is a good starting point to demonstrate the usage of the new
procedure.
Closes#1714
These tests fail currently because we have not implemented type
reconstruction in the presence of annonation-only declarations, but it's
good to keep track of them for later.
This is mostly the same as the procedure for unannotated `FunctionDef`s.
The only differences are that we need to
1. note the rigid type variables that are a part of the FunctionDef
2. generate fresh type variables for the function arguments, and
constrain them to the types of the arguments passed to us in the
annotation
This is enough to get the constraint generation working. There are more
interesting cases that properly typecheck in the main compiler but do
not typecheck given the current state of the editor AST (see the second
test in this commit), but this is a bug due to canonicalization, not
type reconstruction.
We ened to store the mapping of `rigid type vars -> user-declared names`
for presentation later on when we reconstruct the type of an expression
and want to pretty-print it for the user. This information is most
naturally captured on `output.introduced_variables`, as the main
compiler does so there.
We don't currently record this mapping while translating `Annotation`s
to `Annotation2`s (though we could), so we store it when we hit an
annotated expression that now needs to become an `Expr2`.
Moving from storing these as slices to `Lowercase`s makes it easier to
transform to shapes we'll need later during type reconstruction, for
example a mapping of rigid type variables to their lowercase variable
names.
Rigid vars are added to types after solutions are already found and are
only relevant for pretty printing elaborated types, so we don't keep
track of/care about their rigid names until the end of solution.
Previously we didn't provide this information in tests, though - now we
do.
The size of a `(PatternId, Type2)` tuple is 40 bytes (4 for the
`PatternId`, 4 bytes padding, 32 for the `Type2`). This doesn't fit in
an item slot allocated by the pool, which has a max of 32 bytes. So, we
allocate the Type2 itself on the pool, and then reference its pool ID in
the resulting tuple, which lowers the total size of the tuple to 8
bytes. This is a bit wasteful, but I couldn't find a better solution
without significantly more rework.
We also reorder the Type2 and PatternId fields in the tuple to better
align with the typical `(type|type variable, pattern|expression)` tuple
structure that exists in e.g. `FunctionDef::NoAnnotation`.
We do this by treating function definition bodies as equivalent to
closures, and piggy-backing on existing work to generate constraints
over closures. Then, we just bind the function name with the resolved
type of the function body.
Support for constraint generation in the presence of annotated functions
will be added later.
Prior to this patch we would not explicitly name solved type variables,
causing the elaborated type to appear unconstrained even when the
internal representation was constrained. For example, given a definition
like
```
\a, b -> Pair a b
```
we would generate distinct, fresh type variables for `a` and `b` but not
name them after solution. So even though the compiler knows they are
distinct, printing to the surface syntax would emit
```
*, * -> [ Pair * * ]*
```
which is incorrect, as the result type is constrained on the input type.
Instead, we now properly emit
```
a, b -> [ Pair a b ]*
```
naming variables where dependencies do exist. Where type variables are
don't constrain anything else, we can and do continue to emit the
wildcard type.