
# Differences from Boquist's PhD thesis RISC backend

- support for multiple types (float, int, etc.)
- LLVM is typed ; values must have proper types
- no register pinning ; instead pass registers (e.g. the heap pointer) as local parameters
- rely on LLVM register allocator


Boquist's RISC backend only supports the int type ; expressions that mix value types are problematic, e.g. `unit (A Int | B Float)`

IDEA:
  - GRIN must be monomorphic in the LLVM type system
  - Vectorisation is an efficient mapping to product types
```
node = tag + simple values + node pointers
con data = simple values + node pointers
slim layout: tag + con data
fat layout: tag + con1 data + ... + conN data
```
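
The two layouts can be sketched in Haskell as lists of slots. This is only an illustration of the scheme above: the names `Field`, `Con`, `Slot`, `slimLayout` and `fatLayout` are hypothetical and do not come from the GRIN code base.

```haskell
-- Hypothetical types ; names are illustrative only.
type Tag = String

-- A field of a constructor: a simple value (named by its type) or a node pointer.
data Field = Simple String | NodePtr
  deriving (Eq, Show)

-- One constructor: its tag and the fields that make up its "con data".
data Con = Con Tag [Field]
  deriving (Eq, Show)

-- A slot of the final layout: the tag slot, or a field owned by a given constructor.
data Slot = TagSlot | FieldSlot Tag Field
  deriving (Eq, Show)

-- slim layout: tag + con data (a single constructor)
slimLayout :: Con -> [Slot]
slimLayout (Con t fs) = TagSlot : map (FieldSlot t) fs

-- fat layout: tag + con1 data + ... + conN data
fatLayout :: [Con] -> [Slot]
fatLayout cons = TagSlot : concat [ map (FieldSlot t) fs | Con t fs <- cons ]
```

In the fat layout every constructor keeps its own slots, so each slot has exactly one type even when the constructors disagree on their field types.
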
IDEA:
  - support array literals
  - support struct literals

HINT:
  - fetch, store, update must be layout monomorphic
  - the only supported cast is pointer cast

## Legalisation
  implement new GRIN transformations
  - llvm monomorphisation ; split polymorphic (in LLVM type system) GRIN node components
  - node layout calculation

  It is possible to put monomorphisation into the vectorisation transformation, as it already relies on the HPT result.
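
One possible reading of "split polymorphic node components" is sketched below. It assumes the LLVM type of every node component is already known per tag (in practice this would come from the HPT result). The names `NodeTypes`, `typesAt`, `isPolymorphic` and `monoSlots` are hypothetical, and LLVM types are plain strings to keep the sketch small.

```haskell
import Data.List (nub)

type Tag      = String
type LLVMType = String  -- e.g. "i64", "float" ; an assumption of this sketch

-- Per tag, the LLVM type of each node component (hypothetical input).
type NodeTypes = [(Tag, [LLVMType])]

-- All LLVM types that can occur at component position i.
typesAt :: Int -> NodeTypes -> [LLVMType]
typesAt i nts = nub [ tys !! i | (_, tys) <- nts, i < length tys ]

-- A component is polymorphic in the LLVM type system if it has more than one type.
isPolymorphic :: Int -> NodeTypes -> Bool
isPolymorphic i nts = length (typesAt i nts) > 1

-- Split every component position into one monomorphic slot per LLVM type.
monoSlots :: NodeTypes -> [(Int, LLVMType)]
monoSlots nts = [ (i, t) | i <- [0 .. arity - 1], t <- typesAt i nts ]
  where
    arity = maximum (0 : [ length tys | (_, tys) <- nts ])
```

For a node set like `A Int | B Float`, component 0 is polymorphic and is split into two slots, one per LLVM type.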

## Notes
  - CodeGen supports monomorphic GRIN ; monomorphic GRIN validation pass
  - new transformation: grin-monomorphisation
      turns variadic typed GRIN into monomorphic type GRIN (using HPT result)
  - GRIN type system
      node
      heap location
      simple type
      tag

  - GRIN type system for LLVM codegen
      node TAG
      heap location
      simple type LLVM TYPE
      tag N
  - new transformation: llvm-monomorphisation
      makes GRIN monomorphic in LLVM type system (using HPT result)

  - GRIN transformations must not lose information. If some information is only stored in the TAG info table, then the TAG info table must be part of the GRIN language, e.g. as types.

LLVM bitcast experiments
  - convert i16 to <i8,i8>
  - convert i8 to <i2,i2,i2,i2>
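
The bit-level effect of these casts can be modelled in Haskell. This is a sketch, not LLVM IR: it assumes the lowest bits end up in the first lane (the order expected on a little-endian target), and the function names are hypothetical.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word8)

-- Split a 16-bit value into two 8-bit lanes, low byte first (assumed lane order).
splitI16 :: Word16 -> (Word8, Word8)
splitI16 w = (fromIntegral (w .&. 0xFF), fromIntegral (w `shiftR` 8))

-- Inverse of splitI16: no bits are lost by the reinterpretation.
joinI16 :: (Word8, Word8) -> Word16
joinI16 (lo, hi) = fromIntegral lo .|. (fromIntegral hi `shiftL` 8)

-- Split an 8-bit value into four 2-bit lanes, lowest bits first.
-- Each lane is stored in a Word8 because Haskell has no 2-bit integer type.
splitI8 :: Word8 -> [Word8]
splitI8 w = [ (w `shiftR` s) .&. 0x3 | s <- [0, 2, 4, 6] ]
```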

# LLVM codegen without analysis

It is possible to compile to LLVM from high level GRIN without analysis.
However, a universal value representation is required, where every GRIN register is mapped to a vector of universal values.
Basically an interpreter can be generated for the input source code.
If the source language can provide type information then the value representation can be more efficient.
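
A minimal sketch of such a universal value representation, assuming only ints, floats, tags and heap locations exist. The names `UValue`, `Register`, `nodeReg` and `addInts` are hypothetical.

```haskell
-- Hypothetical universal value: every GRIN value is forced into one shape.
data UValue
  = UTag   String   -- node tag
  | UInt   Int
  | UFloat Double
  | ULoc   Int      -- heap location
  deriving (Eq, Show)

-- Every GRIN register is a vector of universal values.
type Register = [UValue]

-- A node is stored as its tag followed by its arguments.
nodeReg :: String -> [UValue] -> Register
nodeReg tag args = UTag tag : args

-- Without static types the generated code has to inspect the values at run time,
-- which is what makes the result behave like an interpreter.
addInts :: Register -> Register -> Maybe Register
addInts [UInt a] [UInt b] = Just [UInt (a + b)]
addInts _        _        = Nothing
```

With type information from the source language, the run-time check in `addInts` and the wrapper constructors become unnecessary.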

# Node representation

The Heap-Points-To analysis calculates a type set (set of possible value types) for every GRIN variable and heap location.
The result of the HPT analysis describes the type set of each variable and heap location, e.g.
```haskell
Heap
    1      -> {CInt[{T_Int64}]}
    2      -> {CInt[{T_Int64}]}
    3      -> {CInt[{T_Int64}]
              ,Fadd[{1},{2}]}
Env
    a      -> {1,2,3}
    b      -> {T_Int64}
    c      -> {CInt[{T_Int64}]
              ,Fadd[{1},{2}]}
```
Each type set describes the possible values that a variable or heap location can hold.
A type set is a disjoint union of value types: at any given time the variable stores a value of exactly one of them.
A GRIN value never holds several of the possible values at once.
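
The HPT result shown above can be written down as plain Haskell data. The types `SimpleType`, `ValueType` and `TypeSet` are a hypothetical encoding for illustration, not the data types used by the GRIN implementation.

```haskell
-- Only the simple type that appears in the example is modelled.
data SimpleType = T_Int64
  deriving (Eq, Show)

type Loc = Int

-- A value type is a simple type, a node (tag plus one type set per argument),
-- or a heap location.
data ValueType
  = Simple SimpleType
  | Node String [TypeSet]
  | Location Loc
  deriving (Eq, Show)

-- A type set lists the value types a variable or heap location may hold.
type TypeSet = [ValueType]

heap :: [(Loc, TypeSet)]
heap =
  [ (1, [Node "CInt" [[Simple T_Int64]]])
  , (2, [Node "CInt" [[Simple T_Int64]]])
  , (3, [Node "CInt" [[Simple T_Int64]], Node "Fadd" [[Location 1], [Location 2]]])
  ]

env :: [(String, TypeSet)]
env =
  [ ("a", [Location 1, Location 2, Location 3])
  , ("b", [Simple T_Int64])
  , ("c", [Node "CInt" [[Simple T_Int64]], Node "Fadd" [[Location 1], [Location 2]]])
  ]
```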

GRIN value types:
  - simple type
  - node
  - location

Currently in GRIN only the following value type combinations can form a valid type set:
  - `{simple type}` - singleton type set of a simple type
  - `{node+}` - type set of one or more node types
  - `{location+}` - type set of one or more location types

Due to the disjoint property of the GRIN values, they can be represented as tagged unions.
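
The three valid combinations can be checked mechanically. The sketch below uses a deliberately simplified, hypothetical `ValueType` (nodes carry only their tag) and returns `Nothing` for any type set that mixes kinds or is empty.

```haskell
data ValueType = Simple String | Node String | Location Int
  deriving (Eq, Show)

-- The three kinds of valid type sets.
data Shape = SimpleSet | NodeSet | LocationSet
  deriving (Eq, Show)

isNode, isLoc :: ValueType -> Bool
isNode (Node _) = True
isNode _        = False
isLoc (Location _) = True
isLoc _            = False

-- Classify a type set, or reject it if it is not one of the valid combinations.
classify :: [ValueType] -> Maybe Shape
classify [Simple _] = Just SimpleSet        -- {simple type}: singleton only
classify ts@(_ : _)
  | all isNode ts = Just NodeSet            -- {node+}
  | all isLoc  ts = Just LocationSet        -- {location+}
classify _ = Nothing                        -- empty or mixed
```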

## Tag construction

Besides the value types themselves, type sets need tags that mark their current content.
The type set tags can be constructed as follows:
  - `{simple type}` - singleton set, no tag needed
  - `{node+}` - node tags can be reused
  - `{location+}` - location values are raw pointers ; the abstract location index can be used as the tag,
    giving the tagged union value `{location, pointer}`
  - `{location}` - singleton set, no tag is needed
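
The location cases can be sketched as follows. It is assumed here that a raw pointer fits in a `Word64` ; the names `LocValue`, `mkLoc` and `locTag` are hypothetical.

```haskell
import Data.Word (Word64)

-- Run-time representation of a location value.
data LocValue
  = RawLoc Word64          -- {location}: singleton set, the pointer alone is enough
  | TaggedLoc Int Word64   -- {location+}: (abstract location index, pointer)
  deriving (Eq, Show)

-- Choose the representation from the static type set (the list of abstract
-- location indices), the index of the current location, and its pointer.
mkLoc :: [Int] -> Int -> Word64 -> LocValue
mkLoc [_] _   ptr = RawLoc ptr
mkLoc _   idx ptr = TaggedLoc idx ptr

-- The tag, if the representation carries one.
locTag :: LocValue -> Maybe Int
locTag (RawLoc _)        = Nothing
locTag (TaggedLoc idx _) = Just idx
```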

## Operations

Type set tagged union operations:
  - `pack    (value :: type :: type set) = (tagged union :: type set)`
  - `unpack  (tagged union :: type set) (tag/witness :: type :: type set) = (value :: type :: type set)`

Node operations:
  - `build    (tag :: node tag) (values :: [type]) = (node :: type)`
  - `project  (node :: type) (elemIndex :: Int) = (element :: type)`
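
A minimal Haskell sketch of the four operations, restricted to `{node+}` type sets where the node tag doubles as the union tag. The types `Value`, `Node` and `Union` are hypothetical, and failure is modelled with `Maybe` as an assumption of this sketch.

```haskell
data Value = VInt Int | VFloat Double | VLoc Int
  deriving (Eq, Show)

-- A node: its tag and its elements.
data Node = Node String [Value]
  deriving (Eq, Show)

-- A tagged union over a {node+} type set ; the node tag is reused as the union tag.
newtype Union = Union Node
  deriving (Eq, Show)

-- build: construct a node from a tag and its element values.
build :: String -> [Value] -> Node
build = Node

-- project: select one element of a node by index.
project :: Node -> Int -> Maybe Value
project (Node _ vs) i
  | i >= 0 && i < length vs = Just (vs !! i)
  | otherwise               = Nothing

-- pack: inject a value of one member type into the tagged union.
pack :: Node -> Union
pack = Union

-- unpack: extract the value, given the tag (witness) of the expected member type.
unpack :: Union -> String -> Maybe Node
unpack (Union n@(Node t _)) witness
  | t == witness = Just n
  | otherwise    = Nothing
```

`unpack` only succeeds when the witness matches the stored tag, which mirrors a case on the node tag before the fields are read.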

## TODO
  - prune dead variables from HPTResult before converting to TypeEnv
  - hash cons TypeEnv to get rid of duplicate types
  - use better variable names in the generated LLVM IR
  - remove special heap pointer handling from codegen ; expose it in GRIN via a transformation ; the heap pointer should be a parameter and return value of `store`.