Previously, macaw-ppc (and macaw-arm) would call `error` if there were no
semantics available for a decoded instruction. This was useful during initial
development, but it is a problem for deployment. Now just turn missing
semantics into TranslationErrors, which appear as block terminators in macaw IR.
This will require more diligence in monitoring TranslationErrors for patterns
that need to be addressed.
This includes a minor change: a new required field for the blocks returned by
the machine-specific disassembler. The information was already readily
available in this backend.
There was an error case in function interpretation in the TH generated
code (when a function couldn't be evaluated for a given operand). This
shouldn't happen for well-formed code, but can be a problem when macaw finds
invalid code that happens to decode as a real instruction (with an invalid
operand).
The old code called error, which caused macaw to fail with a hard stop. This
commit changes the call to be to `fail` instead, which fires off an exception
that is properly handled (giving us a ClassifyFailure instead).
We now thread a snapshot of the register state from the beginning of the
instruction evaluation through each instruction's semantics instead of
re-fetching register values each time we need it and potentially seeing
incorrect, partially modified register values.
In macaw core, the type of the arch-specific 'disassemble' function changed to
no longer take a Memory, and to pass the maximum offset as an Int instead of a
MemWord. It also removed the jump table entry size (which is no longer
required).
The removal of the Memory parameter required a bit of a change in how the
instruction parsers are structured, but it isn't a huge change (the "memory
contents after an address" can be computed from a MemSegmentOff, too).
Most instructions don't reference this variable, but it is in the signature of
every semantics function, leading to many unused variable warnings. This commit
adds an underscore prefix to the variable name to silence the warning.
The previous generator put all of the code for each matcher in a single large
case expression. While there were individual functions broken out for each case
body, they were all still in the same let expression, which created a huge term.
This refactoring lifts all of the semantics definition bodies to the top
level (with NOINLINE pragmas) to give the code generator less to chew on at a
time.
This improves compile times a little, but, more importantly, works around a bug
in the register allocator in GHC 8.4 that caused a crash in the PowerPC
semantics functions.
With caching disabled, we weren't doing any sharing recovery when
emitting Template Haskell. The cache in the `MacawQ` monad wasn't
helping because the expression builder generates a fresh nonce for each
expression it doesn't find in the cache; with no cache, every
subexpression was distinct from every other one.
It is now (optionally) pure via the MonadThrow class. It also exposes a new
binary format repr, which currently only has constructors for ELF containers.
The generic binary loading interface is instantiated once for each
architecture/binary container pair. This isn't great, but there is enough
custom work in each setting to justify it.
The binary loading interface isn't finished yet, and needs to learn some
additional operations to support relocation. It already supports additional
information that is architecture specific and binary container format
specific (that operations will have to use on a per-format basis).
On the PowerPC side, the Table of Contents (TOC) is now architecture-specific
information constructed by the loader (currently from ELF binaries). The new
TOC data type is in place to support this more easily (the old format was just a
function).
This change is in the core generator monad and applied in the PowerPC backend.
This change includes some macaw updates (which required a new elf-edit version).
Original version was pushing error into generated TH, which was
generating the error statement into the SSA formula; this breaks
formula interpretation at compile time but hides the error. Instead,
this changes it so that the error is thrown during TH evaluation.
Pass operand and architecture types and instead of
case opcode of
ADD -> case operands of
Just GPR gpr0 :< Nil of ->
SSA-semantics
Generate:
let opc_ADD operands = case operands of
Just GPR gpr0 :< Nil of ->
SSA-semantics
in case opcode of
ADD -> opc_ADD operand
This provides better encapsulation for the individual operands and
more specific control over the types (at the cost of a pair of
additional type specifications in the call). This also seems to
reduce memory consumption by about half.