This allows a much wider range of terms to be decompiled. We remember
the original term that we're about to float, and use it when the floated
combinator is decompiled.
We do this for every closed term that is being floated. Before floating,
we do a pass to close all lambda terms, so the only free variables in
them are actual recursive references. This means that every (locally)
non-recursive function has retained decompilation information.
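As a rough illustration of the idea (not the actual implementation), the sketch below uses a toy term type. `Term`, `Originals`, `floatTerm`, and `decompile` are hypothetical names introduced only for this example; the real term representation is considerably richer.

```haskell
import qualified Data.Map.Strict as Map

-- Toy term language (hypothetical; far simpler than the real term type).
data Term
  = Var String
  | Lam String Term
  | App Term Term
  | Comb Int -- reference to a floated combinator
  deriving (Eq, Show)

-- Remembers, for each floated combinator, the original term it replaced.
type Originals = Map.Map Int Term

-- Float a closed term: allocate a fresh combinator number and record
-- the original term so it can be recovered later.
floatTerm :: Term -> Originals -> (Term, Originals)
floatTerm tm origs =
  let n = Map.size origs
   in (Comb n, Map.insert n tm origs)

-- Decompile by substituting the remembered original terms back in.
decompile :: Originals -> Term -> Term
decompile origs = go
  where
    go (Comb n) = maybe (Comb n) go (Map.lookup n origs)
    go (Lam v b) = Lam v (go b)
    go (App f x) = App (go f) (go x)
    go t = t
```

Because each floated term can only mention combinators created before it, the substitution in `decompile` terminates in this sketch.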
Resolves #1648.
Technically I think that this has slightly different RNG characteristics
than the old implementation, but I wouldn't expect it to matter for
the sake of the test.
Before this change, when I ran the tests I got the following "hints"
printed to the console:
```
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
```
I think that this means that the tests could potentially fail, depending
on the user's local git config and/or version.
I tried using the `trunk` branch and the tests failed. It seems that
the tests are dependent on the `master` branch being used, so I figured
it was probably best to explicitly specify that branch name.
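One way to pin the branch name, sketched here as an assumption rather than a description of what the test harness actually does, is to pass `--initial-branch` to `git init` (available in Git 2.28 and later). `gitInitArgs` is a hypothetical helper that only builds the argument list:

```haskell
-- Build the arguments for `git init` with an explicit initial branch.
-- Assumes Git 2.28 or later, where --initial-branch is supported.
gitInitArgs :: String -> FilePath -> [String]
gitInitArgs branch dir = ["init", "--initial-branch=" ++ branch, dir]
```

The resulting list could be handed to whatever process-spawning function the tests already use.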
- This test case expected `{ Some x -> k }` to parse, but since
data/effect references are resolved at parse time, this isn't
possible. Parsing effect matches is likely covered by other tests
with legitimate abilities anyway.
- This is necessary for dynamically loading intermediate code during
execution. Local assignments of various code are now stored in
transactional variables that can be mutated as necessary.
- Various functions for adding to this cache have naturally been moved
to the machine implementation rather than the interface. The interface
mainly retains functions for keeping track of mappings between the
surface and intermediate levels, which won't be updated during
execution.
- Helper functions for finding references in intermediate code have been
added, since we may not have surface terms for some intermediate code
we're dealing with.
- Terms have been monomorphized to Symbol for this caching, just for
simplicity.
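A minimal sketch of a mutable code cache along these lines, assuming the `stm` and `containers` packages. `Code`, `CodeCache`, `addCode`, and `lookupCode` are hypothetical names for illustration and do not mirror the real record:

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Stand-in for a piece of intermediate code (hypothetical).
newtype Code = Code String
  deriving (Eq, Show)

-- The cache lives in a transactional variable so it can grow at runtime.
newtype CodeCache = CodeCache {combs :: TVar (Map.Map Word64 Code)}

newCache :: IO CodeCache
newCache = CodeCache <$> newTVarIO Map.empty

-- Register newly loaded code while the machine is running.
addCode :: CodeCache -> Word64 -> Code -> IO ()
addCode cc w c = atomically $ modifyTVar' (combs cc) (Map.insert w c)

-- Look up previously registered code by its number.
lookupCode :: CodeCache -> Word64 -> IO (Maybe Code)
lookupCode cc w = Map.lookup w <$> readTVarIO (combs cc)
```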
- A let may be classified as either Direct or Indirect. Indirect lets
contain a number, so that we can represent their return location in a
continuation. Direct indicates that this is not necessary.
- Currently, only simple direct lets are allowed, which may be turned
into individual instructions. This is because anything else still
requires stack/continuation management, and would require additional
code forms to represent compound expressions that involve only local
instructions.
- This obviates generating these numbers in the MCode translation.
Instead, the relevant lets are numbered with Word16s, and each section
gets 48 bits for numbering, while their lets use the low 16 bits for
identification.
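The numbering scheme can be pictured roughly as follows. This is a sketch under the assumption that section numbers fit in 48 bits; `packLet`, `unpackLet`, and `returnAddr` are hypothetical helpers, not names from the code base:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word64)

-- Whether a let needs a return location in the continuation.
data Direction = Indirect Word16 | Direct
  deriving (Eq, Show)

-- Combine a section number (assumed to fit in 48 bits) with a let's
-- Word16, placing the let number in the low 16 bits.
packLet :: Word64 -> Word16 -> Word64
packLet section l = (section `shiftL` 16) .|. fromIntegral l

-- Recover the section number and let number from a packed word.
unpackLet :: Word64 -> (Word64, Word16)
unpackLet w = (w `shiftR` 16, fromIntegral (w .&. 0xFFFF))

-- Only indirect lets have a return address; direct lets need none.
returnAddr :: Word64 -> Direction -> Maybe Word64
returnAddr section (Indirect l) = Just (packLet section l)
returnAddr _ Direct = Nothing
```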
- Previously we were ensuring that a `let` does not immediately contain
another `let`. However, there is actually no need for this from the
perspective of the machine code. Removing this requirement makes the
definition somewhat less complicated.
- Includes some preparation for numbering lets for continuation
serialization.
- The previous strategy was to store a `Section` directly in the
continuation to know where to resume. However, this is undesirable
both from the perspective of treating them as values (with ordering
etc.) and from the perspective of serialization.
- Instead, the continuation now stores an index that is looked up the
same way as any function. To make this possible, during the process
of compiling from ANF to machine code, we create code sections for
any let block that actually does an indirect code jump (i.e. not a
literal, primop, or similar). Then the continuation is conceptually
a sequence of these indexes (among other things).
- Various structures/functions have been shuffled and modified to
facilitate this change.
- For actual mobile code, some of this numbering should probably be
tracked at the ANF level. However, this was just an initial step to
demonstrate the feasibility.
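To make the difference concrete, here is a simplified sketch of a continuation that stores indexes instead of sections. `K`, `Section`, and `resume` are illustrative stand-ins and omit the other data a real continuation frame carries:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Stand-in for a compiled code section (hypothetical).
newtype Section = Section String
  deriving (Eq, Show)

-- Frames hold an index rather than a Section, so equality and
-- ordering can be derived and the structure is easy to serialize.
data K = KE | Push Word64 K
  deriving (Eq, Ord, Show)

-- Resume by looking the index up in the same table used for functions.
resume :: Map.Map Word64 Section -> K -> Maybe (Section, K)
resume _ KE = Nothing
resume env (Push i k) = (\s -> (s, k)) <$> Map.lookup i env
```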
- Make ANF representation work entirely in terms of Reference. Since ANF
is significantly closer to something that could actually be moved
between machines, it doesn't make much sense to work in terms of
machine-local numberings at that stage.
- Reference => Numbering resolution has been moved to the process of
compiling ANF code to MCode.
- References are now stored directly in closures for partial
applications and data. This means that Foreign functions can
potentially inspect them without making a reverse mapping function
available to them.
* The interpreter mostly ignores these references, working in terms of
an ephemeral numbering of these references.
* This makes the RTag/CTag packing scheme unnecessary. Runtime tags
are now just storing the constructor number, and the full Word64
space is available for data/effect and constructor numbering for
now.
* The performance implications of this haven't been tested. If this
is too slow, the strategy may need to be rethought.
- Universal comparison has been tweaked to use these references. It's
also been tweaked to try to avoid comparing references in situations
where it should be impossible for them to not match.
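The reference-to-number resolution step might look something like the sketch below. `Reference` here is a toy newtype and `number` is a hypothetical helper; the real compilation pass does considerably more than this:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Toy stand-in for a reference (hypothetical).
newtype Reference = Reference String
  deriving (Eq, Ord, Show)

-- Assign each previously unseen reference the next free number.
-- References that already have a number keep it.
number :: Map.Map Reference Word64 -> [Reference] -> Map.Map Reference Word64
number = foldl step
  where
    step m r
      | Map.member r m = m
      | otherwise = Map.insert r (fromIntegral (Map.size m)) m
```

Since the numbering is ephemeral, two runs may assign different numbers to the same reference; only the references themselves are stable.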
- Previously the new runtime expected the code it was to run to be
completely contained in a single letrec. The new code no longer has
this restriction, and it will even break down top-level letrecs it's
given into separate referenced combinators.
- Internal letrecs are not split out currently; they are lifted after
the top group is split up. This results in each top-level reference
potentially having a set of mutually recursive definitions.
- The mutually recursive definitions of each reference are preserved all
the way to machine code, resulting in (potentially) multiple
combinators making up a single referenced definition. One of these is
distinguished as the main entry point, and local entry points can only
be referenced from within the combinator (at this point).
- The new runtime interface will actually chase term dependencies now,
and compile them.
- Debugging and test code has been tweaked to work with these changes.
- Decompilation of 'internal' function sections likely still needs work.
- The tests are specifically for the IO ability used in the old runtime.
It was anticipated that the new one would use the same IO, but that
ended up not being the case.
- So instead, just run the IO test with the old runtime, since that is
simple, and doesn't require any complicated logic to make the test
valid in both situations.
- As part of implementing MVar, it became clear that the existing FFI
methodology wasn't set up to handle 'foreign' code working
parametrically in closures.
- Rather than wrapping closures in Foreign or the like, things have been
reworked to enable operating on closures directly, while still having
convenience machinery for working on wrapped Haskell values.
- This required some additional refactoring to avoid cyclic or collapsed
module structures.
* Foreign functions are now stored in a map and referred to by word,
similar to combinators.
* The map is threaded through execution, and various 'static
environment' maps have been combined into a single record for easier
threading.
* The ForeignFunc type has been given its own module, because it
involves operating on stacks, but the stack module contains the
closure type which has wrapped foreign values.
* IOp to foreign function translation has been moved to the builtin
module. Code emission is accomplished by using the Enum instance to
pick words for the corresponding function.
- Some of the foreign machinery has been used to overload various
peek/poke convenience operators rather than having multiple functions.
- Runtime exceptions have been moved to their own module.
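The word-indexed foreign function map described above can be sketched as follows. The `IOp` constructors listed and the shape of `ForeignFunc` are assumptions made for illustration; the real type operates on machine stacks:

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical subset of IO operations.
data IOp = OpenFile | CloseFile | GetLine
  deriving (Eq, Show, Enum, Bounded)

-- Stand-in for a foreign function (hypothetical shape).
newtype ForeignFunc = FF ([String] -> IO [String])

-- The word emitted for an operation is chosen via its Enum instance.
iopWord :: IOp -> Word64
iopWord = fromIntegral . fromEnum

-- Build the map that gets threaded through execution.
foreignMap :: (IOp -> ForeignFunc) -> Map.Map Word64 ForeignFunc
foreignMap impl =
  Map.fromList [(iopWord op, impl op) | op <- [minBound .. maxBound]]
```

Keying the map by `fromEnum` means emitted code and the runtime agree on words as long as both are built from the same `IOp` definition.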