The previous implementation included primops directly in the generated
intermediate code, which involved unboxed literal arguments. These fake
builtins instead generate boxed literals, so that no unboxed data
appears on the surface stack.
I believe this was the last source of unboxed data on captured stacks.
This covers values that are stored as pseudo data types in the Haskell
runtime, which would otherwise need to be serialized using unboxed data
in the byte format.
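As a rough illustration of the idea, using entirely hypothetical types
rather than the actual intermediate representation:

```haskell
import Data.Word (Word64)

-- Hypothetical, simplified instruction type for illustration only.
data Instr
  = UnboxedLit Word64   -- old: raw unboxed literal embedded in code
  | CallBuiltin String  -- new: builtin call yielding a boxed literal
  deriving (Show)

-- Old shape: the literal itself is unboxed in the generated code.
oldShape :: Instr
oldShape = UnboxedLit 42

-- New shape: nothing unboxed appears in the code; the (hypothetical)
-- builtin boxes the value when it runs.
newShape :: Instr
newShape = CallBuiltin "boxedNat42"
```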
The implementation was producing a Maybe, but the calling convention
just called the primop in tail position, which left the Just tag on the
unboxed stack afterwards.
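As a toy model of the bug (the stack representation and names below are
invented, not the actual runtime's): the tag is pushed, then the primop
is tail called, so nothing ever pops the tag afterwards.

```haskell
import Data.Word (Word64)

type UStk = [Word64]  -- toy stand-in for the unboxed stack

justTag :: Word64
justTag = 1

-- Buggy shape: push the Just tag, then tail call the primop; the tag
-- is still sitting on the unboxed stack when the call completes.
buggyWrap :: (UStk -> UStk) -> UStk -> UStk
buggyWrap primop stk = primop (justTag : stk)
```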
The goal of these changes is to remove unboxed details from the
interchange formats, and instead exchange values that somewhat resemble
surface-level Unison values. In some cases, the way this works is
obvious. For instance, the MatchNumeric/NMatch and BLit constructs avoid
having unboxed details in code. However, what might not be so obvious is
the request/data matching constructs. These prevent unboxed values from
occurring on the stack in normal Unison functions, which means they no
longer end up in captured continuations.
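As a sketch, heavily simplified from the real ANF types (the
constructor shapes here are assumptions for illustration), these
constructs keep unboxed representations out of the code by tagging
everything with a type reference:

```haskell
import qualified Data.Map as Map
import Data.Word (Word64)

-- Hypothetical stand-in for a type reference.
newtype Reference = Reference String deriving (Show)

-- A boxed literal carries the reference of its boxed type, so the
-- serialized code never contains a bare unboxed value.
data Lit = BLit Reference Word64

-- Numeric matching scrutinizes a boxed numeric value, branching on
-- its contents, with an optional default case.
data Branched e
  = MatchNumeric Reference (Map.Map Word64 e) (Maybe e)
```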
The unboxed details were originally intended to support optimizations
that would turn surface Unison functions into more efficient versions
that operate without any boxing. However, that didn't materialize, and
it seems unlikely that we'll implement it for the Haskell end of things
(we aren't really obliged to do this ourselves for good performance in
Scheme). So, some of these new methods of doing things might actually be
more efficient than what was happening previously.
If a user arbitrarily permutes the links associated with code, it's
possible to end up with a load request that appears to have mutually
recursive definitions without common hashes. So, this has been made into
a catchable exception instead of throwing a Haskell error.
The initial hashing in the runtime still just calls error when this
happens, since that would mean some internal code generation produced
bad SCCs, not that any kind of user error occurred.
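A minimal sketch of the load-side check, with a hypothetical reference
type and exception (none of these names are the real ones): every
member of an SCC must share a common hash, and a violation raises
something the caller can catch.

```haskell
import Control.Exception (Exception, throwIO)
import Data.List (nub)

-- Hypothetical derived reference: a component hash plus an index.
data Ref = Derived String Int deriving (Show, Eq)

newtype BadSCC = BadSCC [Ref] deriving (Show)

instance Exception BadSCC

-- All references in an SCC must share a hash; if a permuted load
-- request violates that, throw a catchable exception, not 'error'.
checkSCC :: [Ref] -> IO ()
checkSCC refs
  | length (nub [h | Derived h _ <- refs]) <= 1 = pure ()
  | otherwise = throwIO (BadSCC refs)
```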
Solves one observed problem and potentially some others.
The observed problem was that the process of unhashing and rehashing
did not replace any term links in the original terms. This is because term link
literals can't be turned into variables and subsequently replaced with
new hashes. So, instead we use our available variable mappings to
replace the literals manually.
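A sketch of the manual replacement over an invented miniature term type
(the real traversal works on the actual term representation): term link
literals are rewritten with the same old-to-new mapping used elsewhere.

```haskell
import qualified Data.Map as Map

data Ref = Ref String deriving (Show, Eq, Ord)

-- Miniature term type for illustration only.
data Term
  = TermLink Ref
  | App Term Term
  | Lam Term
  | Var Int
  deriving (Show)

-- Rewrite term link literals using the available reference mapping,
-- since they can't be abstracted into variables and substituted.
remapLinks :: Map.Map Ref Ref -> Term -> Term
remapLinks m = go
  where
    go (TermLink r) = TermLink (Map.findWithDefault r r m)
    go (App f x) = App (go f) (go x)
    go (Lam b) = Lam (go b)
    go t = t
```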
A superior methodology might be to replicate the SCC behavior already in
hashTermComponents, and incrementally remap individual components.
However, that is a considerable amount of additional work, and the
post-floating references are just used as a translation layer between
the codebase and verifiable hashes, so making them completely consistent
doesn't seem necessary.
I've also added some codebase->verifiable replacements from the context
in some places. It's possible that not doing this would have caused problems
during UCM sessions where some terms are loaded incrementally due to
multiple calls into the runtime. I didn't observe any failing tests due
to this, though.
When the interpreter is called for things in a scratch file, the code
path is slightly different, because all the definitions are pre-combined
into a letrec. If the definitions are subsequently added to the
codebase, we will have already loaded and compiled their code. However,
previously the remapping from base to floated references would not
exist, because that was only being generated for loaded dependencies in
the other path.
So, this code adds a similar remapping for a top-level letrec in this
code path. This fixes a problem with compiling a definition that had
just been added from a scratch file. The remap was expected to be there,
but it wasn't, so the compiler couldn't find the code to emit.
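Conceptually, the fix just extends the remapping with one entry per
letrec binding, mirroring what the dependency-loading path already
does. A tiny sketch with invented names:

```haskell
import qualified Data.Map as Map

data Ref = Ref String deriving (Show, Eq, Ord)

-- Record a base->floated entry for each binding of the top-level
-- letrec, so later compilation can find the corresponding code.
addLetrecRemap :: [(Ref, Ref)] -> Map.Map Ref Ref -> Map.Map Ref Ref
addLetrecRemap binds remap = foldr (uncurry Map.insert) remap binds
```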
The pair marshaling code was mistakenly using only a single layer of
data nesting, but Unison pairs are shaped like 2-element cons lists.
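For example (constructor names assumed for illustration), the pair
(x, y) needs two nested data layers terminated by a unit, rather than a
single node holding both fields:

```haskell
-- Toy runtime value type for illustration.
data Val = Data String [Val] deriving (Show)

unit :: Val
unit = Data "Unit" []

-- Wrong (single layer): Data "Pair" [x, y]
-- Right: nested like a 2-element cons list terminated by unit.
pair :: Val -> Val -> Val
pair x y = Data "Pair" [x, Data "Pair" [y, unit]]
```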
The rehashing code was not sorting the SCCs into a canonical order, so
the exact input order for components with more than one binding could
influence the hash. Sorting by input reference order fixes this, as all
references in an SCC are required to have the same hash, and differ only
by index.
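A sketch of the canonicalization, assuming a hypothetical
derived-reference type: bindings are ordered by their component index,
which is well defined because every member of the SCC shares a hash.

```haskell
import Data.List (sortOn)

-- Hypothetical derived reference: shared hash plus component index.
data Ref = Derived String Int deriving (Show, Eq)

refIndex :: Ref -> Int
refIndex (Derived _ i) = i

-- Put an SCC's bindings into a canonical order so that hashing is
-- insensitive to the input order of the component.
canonicalize :: [(Ref, term)] -> [(Ref, term)]
canonicalize = sortOn (refIndex . fst)
```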
I had mistakenly been putting floated terms in the decompile info, and
that was causing some mismatches with previous decompiling outputs.
Switching back to remembering the unfloated definitions seems to have
cleared up the discrepancies.
Previously, we were taking already hashed terms as input to the various
intermediate compiler stages. We would then do things like lambda
lifting, and hash the resulting definitions afterwards. However, this
led to problems like mutually recursive definitions that did not share
a common hash.
To rectify this, we instead do the intermediate operations on an
_unhashed_ version of code looked up from the codebase. Essentially, we
turn relevant definitions back into a letrec, do the floating, then hash
the results of that processing. This gives proper hashes to the
processed terms so that the compiled terms can be rehashed with relative
ease.
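At a very high level (every name below is a hypothetical stub, not the
actual API), the change reorders the pipeline so that hashing happens
after floating:

```haskell
type Hash = String

data Term = Term -- stand-in for the real term type

-- Stubs standing in for the real operations.
unhashComponent :: [(Hash, Term)] -> Term -- rebuild an unhashed letrec
unhashComponent = undefined

float :: Term -> [Term] -- lambda lifting / floating
float = undefined

hashComponents :: [Term] -> [(Hash, Term)] -- hash processed definitions
hashComponents = undefined

-- Old order: hash, then float (mutually recursive definitions could
-- end up without a common hash). New order: float, then hash.
pipeline :: [(Hash, Term)] -> [(Hash, Term)]
pipeline = hashComponents . float . unhashComponent
```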
The system for this is unfortunately quite intricate. To get good
decompiled output, we need to keep track of associations between
original and floated terms _and_ floated and rehashed terms, then
mediate between them in various places.
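One way to picture the bookkeeping (maps and names invented for
illustration): composing the original-to-floated and floated-to-rehashed
associations lets decompilation trace a rehashed reference back to the
original term it came from.

```haskell
import qualified Data.Map as Map

data Ref = Ref String deriving (Show, Eq, Ord)

-- Compose the two association maps so a rehashed reference can be
-- mapped back to its original definition.
backToOriginal :: Map.Map Ref Ref -> Map.Map Ref Ref -> Map.Map Ref Ref
backToOriginal origToFloated floatedToHashed =
  Map.fromList
    [ (hashed, orig)
    | (orig, floated) <- Map.toList origToFloated
    , Just hashed <- [Map.lookup floated floatedToHashed]
    ]
```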