Differences from Boquist's PhD thesis RISC backend

support for multiple types (float, int, etc.)
LLVM is typed; values must have proper LLVM types
no register pinning; instead, pass registers (e.g. the heap pointer) as local parameters
rely on the LLVM register allocator

Boquist's RISC backend supports only the int type; in a typed backend some expressions become problematic because their values can have different machine types, e.g. unit (A Int | B Float)

IDEA:

GRIN must be monomorphic in the LLVM type system
Vectorisation is an efficient mapping of node values to product types
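To make the monomorphism requirement concrete, here is a small Haskell sketch; all names are hypothetical and not taken from the GRIN code base. It assumes that vectorisation overlays field i of every constructor in a node type set onto the same register, and reports the field slots that would need more than one LLVM type, as in unit (A Int | B Float).

```haskell
import qualified Data.Map as Map
import Data.List (nub)

-- Simplified LLVM-level types of GRIN simple values (assumed subset).
data LLVMType = TI64 | TFloat | TPtr
  deriving (Eq, Show)

-- A node type set: constructor tag -> LLVM types of its fields.
type NodeSet = Map.Map String [LLVMType]

-- Assumed vectorised layout: field i of every constructor shares slot i.
-- Collect the distinct LLVM types that may occupy slot i.
slotTypes :: NodeSet -> Int -> [LLVMType]
slotTypes ns i = nub [ fs !! i | fs <- Map.elems ns, i < length fs ]

-- Slots that are polymorphic in the LLVM type system and need splitting.
polymorphicSlots :: NodeSet -> [Int]
polymorphicSlots ns =
  [ i | i <- [0 .. maxArity - 1], length (slotTypes ns i) > 1 ]
  where
    maxArity = maximum (0 : map length (Map.elems ns))
```

For the type set {A [i64], B [float]} slot 0 is reported, since the same register would have to hold both an i64 and a float.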

node = tag + simple values + node pointers
con data = simple values + node pointers
slim layout: tag + con data
fat layout: tag + con1 data + ... + conN data
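The two layouts can be sketched in Haskell as follows. The constructor description, the i64 tag type and the helper names are assumptions for illustration only.

```haskell
import Data.List (intercalate)

-- Field types are written as LLVM type names; using i64 for the tag is an
-- assumption made for this sketch.
tagType :: String
tagType = "i64"

-- A constructor and its "con data" (simple values + node pointers).
data Con = Con { conName :: String, conData :: [String] }

-- slim layout: tag + con data, one struct type per constructor.
slimLayout :: Con -> [String]
slimLayout c = tagType : conData c

-- fat layout: tag + con1 data + ... + conN data, one struct for all.
fatLayout :: [Con] -> [String]
fatLayout cs = tagType : concatMap conData cs

-- Index of the first data field of the k-th constructor in the fat layout.
fatOffset :: [Con] -> Int -> Int
fatOffset cs k = 1 + sum (map (length . conData) (take k cs))

-- Render a layout as an LLVM literal struct type, e.g. { i64, double }.
llvmStruct :: [String] -> String
llvmStruct fs = "{ " ++ intercalate ", " fs ++ " }"
```

With constructors A [i64] and B [double], the fat layout is { i64, i64, double } with A's data at index 1 and B's at index 2, while the slim layouts are { i64, i64 } and { i64, double }. The fat layout never shares a slot between constructors, so it avoids LLVM type conflicts at the cost of size; the slim layouts presumably need pointer casts between the per-constructor struct types.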

IDEA:

support array literals
support struct literals

HINT:

fetch, store, update must be layout monomorphic
the only supported cast is pointer cast

Legalisation

implement new GRIN transformations

LLVM monomorphisation: split GRIN node components that are polymorphic in the LLVM type system
node layout calculation
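A minimal sketch of the splitting step, assuming each node component is described by the set of LLVM types the HPT result allows for it (the names are hypothetical): a component with several possible types is replaced by one monomorphic component per type.

```haskell
import Data.List (nub)

-- Simplified LLVM-level types (assumed subset).
data LLVMType = TI64 | TFloat | TPtr deriving (Eq, Show)

-- A node component as described by HPT: the LLVM types it may hold.
type Component = [LLVMType]

-- Split every component into one monomorphic slot per possible LLVM type.
-- Each new slot records the original component index and its single type.
splitComponents :: [Component] -> [(Int, LLVMType)]
splitComponents comps =
  [ (i, t) | (i, ts) <- zip [0 ..] comps, t <- nub ts ]

-- Slot of the split layout that receives a value of type t from component i.
slotIndex :: [Component] -> Int -> LLVMType -> Maybe Int
slotIndex comps i t = lookup (i, t) (zip (splitComponents comps) [0 ..])
```

For the components [[TI64, TFloat], [TPtr]] the split yields [(0,TI64),(0,TFloat),(1,TPtr)], so an Int stored in the first field goes to slot 0 and a Float to slot 1.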

Monomorphisation could be merged into the vectorisation transformation, as that transformation already relies on the HPT result.

Notes

CodeGen supports monomorphic GRIN; a monomorphic GRIN validation pass is needed
new transformation: grin-monomorphisation turns variadic-typed GRIN into monomorphically typed GRIN (using the HPT result)
GRIN type system: node, heap location, simple type, tag
GRIN type system for LLVM codegen: node, TAG, heap location, simple type, LLVM TYPE, tag N
new transformation: llvm-monomorphisation makes GRIN monomorphic in the LLVM type system (using the HPT result)
GRIN transformations must not lose information. If some information is stored only in the TAG info table, then the TAG info table must be part of the GRIN language, e.g. as types.

LLVM bitcast experiments

convert i16 to <i8,i8>
convert i8 to <i2,i2,i2,i2>
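A small helper that prints the corresponding LLVM instructions, assuming the <i8,i8> notation above means a two-element vector (written <2 x i8> in LLVM syntax). bitcast requires the source and target types to have the same bit width, so the helper checks that first; the function name and its interface are hypothetical.

```haskell
-- Render "dst = bitcast iN src to <lanes x iM>", rejecting casts whose
-- source and target bit widths differ (bitcast must preserve the size).
bitcastToVector :: String -> String -> Int -> Int -> Int -> Either String String
bitcastToVector dst src srcBits lanes laneBits
  | srcBits /= lanes * laneBits =
      Left ("size mismatch: i" ++ show srcBits ++ " vs "
            ++ show lanes ++ " x i" ++ show laneBits)
  | otherwise =
      Right (dst ++ " = bitcast i" ++ show srcBits ++ " " ++ src
             ++ " to <" ++ show lanes ++ " x i" ++ show laneBits ++ ">")
```

bitcastToVector "%v" "%x" 16 2 8 yields Right "%v = bitcast i16 %x to <2 x i8>". As far as we know, LLVM defines such a cast as if the value were stored to memory and loaded back, so which bits land in lane 0 depends on the target's endianness.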

LLVM codegen without analysis

It is possible to compile high-level GRIN to LLVM without analysis. However, a universal value representation is then required, where every GRIN register is mapped to a vector of universal values. Essentially, an interpreter is generated for the input source code. If the source language can provide type information, the value representation can be more efficient.
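A sketch of one possible universal value representation, assuming every GRIN register becomes a list of 64-bit words, node tags are numbered from 0, and floating-point values are stored by bit-casting (castDoubleToWord64 from GHC.Float). The tag numbering and helper names are hypothetical; the dispatch on the tag word is the interpreter-like part.

```haskell
import Data.Int (Int64)
import Data.Word (Word64)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Universal value: every component of a GRIN value fits in one 64-bit word.
type UWord = Word64

-- A GRIN register is a vector of universal words; a node uses word 0 for
-- its tag and the following words for its fields (assumed layout).
type URegister = [UWord]

encodeInt :: Int64 -> UWord
encodeInt = fromIntegral          -- two's complement reinterpretation

decodeInt :: UWord -> Int64
decodeInt = fromIntegral

encodeDouble :: Double -> UWord
encodeDouble = castDoubleToWord64 -- keeps the exact bit pattern

decodeDouble :: UWord -> Double
decodeDouble = castWord64ToDouble

-- Hypothetical tag numbering for unit (A Int | B Float): A = 0, B = 1.
nodeA :: Int64 -> URegister
nodeA n = [0, encodeInt n]

nodeB :: Double -> URegister
nodeB d = [1, encodeDouble d]

-- Interpreter-style dispatch on the runtime tag word.
describe :: URegister -> String
describe (0 : w : _) = "A " ++ show (decodeInt w)
describe (1 : w : _) = "B " ++ show (decodeDouble w)
describe _           = "unknown"
```

Both alternatives of the problematic unit (A Int | B Float) fit the same register shape here, which is what makes analysis unnecessary, at the price of extra encoding and decoding work.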

Node representation

The Heap-Points-To (HPT) analysis calculates a type set (the set of possible value types) for every GRIN variable and heap location. The HPT result lists the type set of each variable and heap location, for example:

Heap
    1      -> {CInt[{T_Int64}]}
    2      -> {CInt[{T_Int64}]}
    3      -> {CInt[{T_Int64}]
              ,Fadd[{1},{2}]}
Env
    a      -> {1,2,3}
    b      -> {T_Int64}
    c      -> {CInt[{T_Int64}]
              ,Fadd[{1},{2}]}
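The listing above could be encoded in Haskell roughly as follows; these are hypothetical types, not the project's HPTResult. fetchType computes the type set obtained by fetching through a location-typed variable as the union of the heap type sets it may point to; for a it yields exactly the type set of c.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical encoding of HPT values, mirroring the listing above.
data Value
  = T_Int64                       -- simple type
  | Loc Int                       -- abstract heap location
  | Node String [Set.Set Value]   -- constructor tag with field type sets
  deriving (Eq, Ord, Show)

type TypeSet = Set.Set Value

cInt, fAdd :: Value
cInt = Node "CInt" [Set.fromList [T_Int64]]
fAdd = Node "Fadd" [Set.fromList [Loc 1], Set.fromList [Loc 2]]

heap :: Map.Map Int TypeSet
heap = Map.fromList
  [ (1, Set.fromList [cInt])
  , (2, Set.fromList [cInt])
  , (3, Set.fromList [cInt, fAdd])
  ]

env :: Map.Map String TypeSet
env = Map.fromList
  [ ("a", Set.fromList [Loc 1, Loc 2, Loc 3])
  , ("b", Set.fromList [T_Int64])
  , ("c", Set.fromList [cInt, fAdd])
  ]

-- Union of the heap type sets of every location the type set may point to.
fetchType :: TypeSet -> TypeSet
fetchType ts =
  Set.unions [ Map.findWithDefault Set.empty l heap | Loc l <- Set.toList ts ]
```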

Each type set describes the possible values that a variable or heap location can hold. A type set is a disjoint union of value types: at any given time, the variable holds a value of exactly one of these types, never several at once.

GRIN value types:

simple type
node
location

Currently in GRIN only the following value type combinations can form a valid type set:

{simple type} - singleton type set of a simple type
{node+} - type set of one or more node types
{location+} - type set of one or more location types
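The three valid shapes can be checked with a small classifier; the ValueType encoding and the function names are assumptions for illustration.

```haskell
-- Hypothetical encoding of GRIN value types.
data ValueType = SimpleT String | NodeT String | LocationT Int
  deriving (Eq, Show)

data TypeSetKind = SimpleSet | NodeSet | LocationSet
  deriving (Eq, Show)

-- Classify a type set by the rules above; Nothing means it is invalid.
classify :: [ValueType] -> Maybe TypeSetKind
classify [SimpleT _] = Just SimpleSet            -- {simple type}
classify ts
  | not (null ts), all isNode ts     = Just NodeSet     -- {node+}
  | not (null ts), all isLocation ts = Just LocationSet -- {location+}
  | otherwise                        = Nothing
  where
    isNode (NodeT _) = True
    isNode _         = False
    isLocation (LocationT _) = True
    isLocation _             = False
```

A mixed set such as [NodeT "CInt", LocationT 1], or a set with two simple types, is rejected.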

Because type sets are disjoint unions, GRIN values can be represented as tagged unions.

Tag construction

In addition to nodes, other type sets also need tags to mark their current content. Type set tags can be constructed as follows:

{simple type} - singleton set, no tag needed
{node+} - node tags can be reused
{location+} - location values are raw pointers, so the abstract location index can be used as the tag; a location is then represented as the tagged union value {location, pointer}
{location} - singleton set, no tag is needed
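A sketch of these rules as a function from type set to runtime representation; all names are hypothetical. Values from multi-location sets carry the abstract location index next to the raw pointer.

```haskell
import Data.Word (Word64)

-- Hypothetical description of a type set by its shape.
data TypeSet
  = SimpleSet String     -- {simple type}
  | NodeSet [String]     -- {node+}, given by the node tags
  | LocationSet [Int]    -- {location+}, given by abstract location indices

-- Runtime representation chosen for a type set.
data Repr
  = Untagged             -- singleton set: no tag needed
  | NodeTagged [String]  -- node tags are reused as union tags
  | LocationTagged [Int] -- abstract location index used as the tag
  deriving (Eq, Show)

-- A location value from a multi-location set: {location, pointer}.
data TaggedLocation = TaggedLocation
  { absLocation :: Int     -- abstract location index (the tag)
  , rawPointer  :: Word64  -- actual heap address
  } deriving (Eq, Show)

representation :: TypeSet -> Repr
representation (SimpleSet _)     = Untagged
representation (NodeSet tags)    = NodeTagged tags
representation (LocationSet [_]) = Untagged        -- {location}
representation (LocationSet ls)  = LocationTagged ls
```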

Operations

Type set tagged union operations:

pack (value :: type :: type set) = (tagged union :: type set)
unpack (tagged union :: type set) (tag/witness :: type :: type set) = (value :: type :: type set)
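A Haskell model of pack and unpack for a hypothetical node type set {A Int, B Float}, reusing the node tags as witnesses. unpack returns Maybe here so that a tag mismatch is explicit; how a mismatch should be handled in generated code is not specified above.

```haskell
-- Node tags double as union tags (witnesses), as under "Tag construction".
data Tag = A | B deriving (Eq, Show)

-- Payloads of the members of the type set.
data Payload = PInt Int | PFloat Double deriving (Eq, Show)

-- Tagged union value: the tag says which member the payload belongs to.
data Union = Union Tag Payload deriving (Eq, Show)

tagOf :: Payload -> Tag
tagOf (PInt _)   = A
tagOf (PFloat _) = B

-- pack: inject a member value into the tagged union of its type set.
pack :: Payload -> Union
pack p = Union (tagOf p) p

-- unpack: project with a witness; fails if the stored tag differs.
unpack :: Union -> Tag -> Maybe Payload
unpack (Union t p) witness
  | t == witness = Just p
  | otherwise    = Nothing
```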

Node operations:

build (tag :: node tag) (values :: [type]) = (node :: type)
project (node :: type) (elemIndex :: Int) = (element :: type)
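A sketch of build and project for a hypothetical node representation; project returns Maybe rather than failing on an out-of-range index, and uses 0-based indices, both choices of this sketch. In LLVM IR these operations would presumably map onto insertvalue and extractvalue.

```haskell
-- Hypothetical field values: a simple value or a heap location.
data Field = FInt Int | FLoc Int deriving (Eq, Show)

-- A node: constructor tag plus its field values.
data Node = Node { nodeTag :: String, nodeFields :: [Field] }
  deriving (Eq, Show)

-- build: combine a node tag and field values into a node.
build :: String -> [Field] -> Node
build = Node

-- project: extract the field at elemIndex (0-based here).
project :: Node -> Int -> Maybe Field
project (Node _ fs) i
  | i >= 0 && i < length fs = Just (fs !! i)
  | otherwise               = Nothing
```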

TODO

prune dead variables from HPTResult before converting to TypeEnv
hash cons TypeEnv to get rid of duplicate types
use better variable names in the generated LLVM IR
remove special heap pointer handling from codegen; expose it in GRIN via a transformation; the heap pointer should be a parameter and return value of store
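A sketch of the proposed heap pointer handling, assuming a word-addressed heap modelled as a Map: store takes the heap pointer as a parameter and returns the advanced one along with the node's location, so no register needs to be pinned. HeapPtr, Memory and the word layout are hypothetical.

```haskell
import qualified Data.Map as Map

-- Hypothetical word-addressed heap and explicit heap pointer.
type Memory = Map.Map Int Int
newtype HeapPtr = HeapPtr Int deriving (Eq, Show)

-- store writes the node's words at the current heap pointer and returns the
-- updated memory, the node's location, and the advanced heap pointer.
store :: Memory -> HeapPtr -> [Int] -> (Memory, Int, HeapPtr)
store mem (HeapPtr hp) ws =
  ( Map.union (Map.fromList (zip [hp ..] ws)) mem
  , hp
  , HeapPtr (hp + length ws)
  )
```

Storing [0, 42] and then [1, 0, 2] from HeapPtr 0 yields locations 0 and 2 and a final heap pointer of HeapPtr 5. Because the heap pointer is an ordinary value, the LLVM register allocator is free to keep it in a register.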
