The improved string support in Crucible adds a parameter to string reprs; this
change accommodates that. Earlier changes added the necessary support in the
rest of macaw.
Update macaw-ppc and macaw-arm for API changes in macaw-base
The "block label" abstraction (used during arch-specific disassembly) was removed from the base macaw library some time ago. This change updates macaw-ppc and macaw-arm to remove their uses of block labels. The major change is that the disassembly function now returns a single block at a time instead of a sequence of blocks.
To facilitate this, the PowerPC conditional trap instruction (trap doubleword) is now handled as an architecture-specific terminator rather than by encoding the conditional trapping logic directly. That logic will now have to be encoded in macaw-ppc-symbolic; note that we have not done so yet.
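A minimal sketch of the single-block shape described above, under loudly stated assumptions: `Insn`, `Stmt`, `PPCTerm`, `Terminator`, `Block`, and `disassembleBlock` are hypothetical stand-ins, not macaw's actual types. It shows a disassembler that produces exactly one block and ends it with an opaque architecture-specific terminator when it meets a trap doubleword.

```haskell
-- Hypothetical, simplified model; these are NOT macaw's real types.

-- A decoded instruction (only two kinds, for illustration).
data Insn
  = Addi String Integer   -- add an immediate to a register
  | Td String String      -- trap doubleword comparing two registers
  deriving (Show, Eq)

-- A non-terminating statement in the produced block.
data Stmt = AddImm String Integer
  deriving (Show, Eq)

-- Arch-specific terminator: the conditional trap stays opaque here; its
-- trapping logic is left for a later (e.g. symbolic) stage to interpret.
data PPCTerm = TrapDoubleword String String
  deriving (Show, Eq)

data Terminator
  = FallThrough Integer        -- continue at this address
  | ArchTerm PPCTerm Integer   -- arch-specific terminator, then resume address
  deriving (Show, Eq)

data Block = Block
  { blockStmts :: [Stmt]
  , blockTerm  :: Terminator
  } deriving (Show, Eq)

-- Disassemble exactly one block starting at 'addr': stop at the first
-- terminator, at the end of the input, or after 'maxInsns' instructions.
-- PowerPC instructions are 4 bytes wide.
disassembleBlock :: Integer -> Int -> [Insn] -> Block
disassembleBlock addr maxInsns = go addr maxInsns []
  where
    go a n acc insns
      | n <= 0 = Block (reverse acc) (FallThrough a)
      | otherwise = case insns of
          [] -> Block (reverse acc) (FallThrough a)
          Addi r v : rest -> go (a + 4) (n - 1) (AddImm r v : acc) rest
          Td r1 r2 : _ ->
            Block (reverse acc) (ArchTerm (TrapDoubleword r1 r2) (a + 4))
```

The caller is then responsible for asking for the next block at the resume address, which is what "one block at a time" buys: no intra-disassembler bookkeeping of labels between several blocks.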
This commit also updates the expected results of the PowerPC tests; the number of discovered blocks is different, but not significantly so. It is hard to tell if this is a regression or an improvement.
Previously, macaw-ppc (and macaw-arm) would call `error` if there were no
semantics available for a decoded instruction. This was useful during initial
development, but it is a problem for deployment. Missing semantics are now
turned into TranslationErrors, which appear as block terminators in macaw IR.
This means TranslationErrors will need to be monitored more diligently for
patterns that need to be addressed.
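A small sketch of the behavioral change, assuming a hypothetical semantics table keyed by mnemonic (`Semantics`, `Terminator`, `Block`, and `translateBlock` are illustrative names, not macaw's API): when an instruction has no semantics, the block ends with a `TranslationError` terminator instead of the whole analysis aborting.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical semantics table: opcode mnemonic -> rendered IR statement.
type Semantics = Map.Map String String

data Terminator
  = FallThrough Integer               -- next address
  | TranslationError Integer String   -- address and reason
  deriving (Show, Eq)

data Block = Block
  { blockStmts :: [String]
  , blockTerm  :: Terminator
  } deriving (Show, Eq)

-- Translate instructions until one has no semantics. Instead of calling
-- 'error' (which aborts everything), the missing semantics become the
-- block's terminator and analysis of other blocks can continue.
translateBlock :: Semantics -> Integer -> [String] -> Block
translateBlock sems = go []
  where
    go acc addr [] = Block (reverse acc) (FallThrough addr)
    go acc addr (opc : rest) =
      case Map.lookup opc sems of
        Just stmt -> go (stmt : acc) (addr + 4) rest
        Nothing ->
          Block (reverse acc) (TranslationError addr ("no semantics for " ++ opc))
```

Because the reason string is kept in the terminator, scanning the produced blocks for recurring messages is one way to do the monitoring mentioned above.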
This includes a minor change: a new required field for the blocks returned by
the machine-specific disassembler. The information was already readily
available in this backend.
There was an error case in function interpretation in the TH-generated
code (when a function couldn't be evaluated for a given operand). This
shouldn't happen for well-formed code, but can be a problem when macaw finds
invalid code that happens to decode as a real instruction (with an invalid
operand).
The old code called `error`, which caused macaw to fail with a hard stop. This
commit replaces that call with `fail`, which raises an exception that is
properly handled (yielding a ClassifyFailure instead).
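The difference between `error` and `fail` can be sketched as follows, assuming a hypothetical `Classify` monad whose `MonadFail` instance records the failure as a value (the real generated code runs in macaw's own monad; `Classify`, `interpOperand`, and `Outcome` are illustrative only). `error` would throw an uncatchable-in-pure-code exception; `fail` is routed through the monad, so the caller can turn it into a `ClassifyFailure`. Requires GHC 8.8 or later, where `MonadFail` is exported from the Prelude.

```haskell
-- Hypothetical stand-in for the monad the generated semantics run in;
-- a failure is modelled as a Left value rather than a runtime exception.
newtype Classify a = Classify { runClassify :: Either String a }

instance Functor Classify where
  fmap f (Classify e) = Classify (fmap f e)

instance Applicative Classify where
  pure = Classify . Right
  Classify f <*> Classify x = Classify (f <*> x)

instance Monad Classify where
  Classify (Left err) >>= _ = Classify (Left err)
  Classify (Right x)  >>= k = k x

instance MonadFail Classify where
  fail = Classify . Left

data Outcome = ClassifySuccess Integer | ClassifyFailure String
  deriving (Show, Eq)

-- Operand interpretation that can fail on invalid operands. Calling
-- 'error' here would abort the whole program; 'fail' defers to the monad.
interpOperand :: MonadFail m => Int -> m Integer
interpOperand r
  | r >= 0 && r < 32 = pure (toInteger r)
  | otherwise        = fail ("cannot interpret operand: register " ++ show r)

classify :: Int -> Outcome
classify r = case runClassify (interpOperand r) of
  Right v  -> ClassifySuccess v
  Left msg -> ClassifyFailure msg
```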
We now thread a snapshot of the register state from the beginning of the
instruction evaluation through each instruction's semantics instead of
re-fetching register values each time we need them and potentially seeing
incorrect, partially modified register values.
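A minimal sketch of why the snapshot matters, assuming a toy register file as a `Map` and a hypothetical "swap r3 and r4" instruction written as two sequential writes (`Regs`, `Semantics`, `swapR3R4`, and `execInsn` are illustrative names). Reads go to the snapshot taken at instruction start; the re-fetching variant observes its own partial update and loses the old value of `r3`.

```haskell
import qualified Data.Map.Strict as Map

type Regs = Map.Map String Integer

-- Hypothetical semantics shape: read from the snapshot taken when the
-- instruction began, write into the evolving current state.
type Semantics = Regs -> Regs -> Regs   -- snapshot -> current -> updated

-- A hypothetical "swap r3 and r4" expressed as two sequential writes.
swapR3R4 :: Semantics
swapR3R4 snap cur =
  let cur' = Map.insert "r3" (snap Map.! "r4") cur
  in Map.insert "r4" (snap Map.! "r3") cur'

-- Run one instruction: the snapshot is the state at instruction start.
execInsn :: Semantics -> Regs -> Regs
execInsn sem regs = sem regs regs

-- For contrast: re-fetching from the evolving state sees the partially
-- modified r3, so both registers end up holding the old r4.
swapR3R4Refetch :: Regs -> Regs
swapR3R4Refetch cur =
  let cur' = Map.insert "r3" (cur Map.! "r4") cur
  in Map.insert "r4" (cur' Map.! "r3") cur'
```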
In macaw core, the type of the arch-specific `disassemble` function changed to
no longer take a Memory, and to pass the maximum offset as an Int instead of a
MemWord. It also removed the jump table entry size (which is no longer
required).
The removal of the Memory parameter required a bit of a change in how the
instruction parsers are structured, but it isn't a huge change (the "memory
contents after an address" can be computed from a MemSegmentOff, too).
Most instructions don't reference this variable, but it is in the signature of
every semantics function, leading to many unused variable warnings. This commit
adds an underscore prefix to the variable name to silence the warning.
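For readers less familiar with this GHC convention, a tiny sketch (the `Operands` type and the `semSync`/`semLi` functions are hypothetical; the original text does not name the variable): under `-Wall`, GHC's `-Wunused-matches` reports an unused pattern variable, but stays silent when the name starts with an underscore.

```haskell
{-# OPTIONS_GHC -Wall #-}
-- Hypothetical semantics-function signature: every function receives the
-- decoded operands, but some instructions ignore them.
type Operands = [Integer]

-- Had this parameter been named 'operands', -Wall would report
-- "Defined but not used"; the leading underscore suppresses that warning.
semSync :: Operands -> String
semSync _operands = "sync"

-- An instruction that does use its operands keeps the plain name.
semLi :: Operands -> String
semLi operands = case operands of
  (imm : _) -> "li " ++ show imm
  []        -> "li <missing operand>"
```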
The previous generator put all of the code for each matcher in a single large
case expression. While there were individual functions broken out for each case
body, they were all still in the same let expression, which created a huge term.
This refactoring lifts all of the semantics definition bodies to the top
level (with NOINLINE pragmas) to give the code generator less to chew on at a
time.
This improves compile times a little, but, more importantly, works around a bug
in the register allocator in GHC 8.4 that caused a crash in the PowerPC
semantics functions.
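The shape of the refactored output can be sketched by hand, assuming a toy opcode set and Integer arithmetic (the opcodes, `sem*` names, and `semantics` dispatcher are hypothetical, not the generator's actual output). Each body is its own top-level binding with a `NOINLINE` pragma, so GHC compiles it separately rather than folding every body back into one huge dispatch term.

```haskell
-- Before (sketched only in this comment): one large term such as
--   semantics opc = let semADD = ...; semSUBF = ...; ... in case opc of ...
-- After: each body is a top-level binding marked NOINLINE.

data Opcode = ADD | SUBF | MULLW
  deriving (Show, Eq, Enum, Bounded)

-- Operand widths are ignored in this sketch (plain Integer arithmetic).
semADD :: Integer -> Integer -> Integer
semADD ra rb = ra + rb
{-# NOINLINE semADD #-}

-- subf computes rB - rA.
semSUBF :: Integer -> Integer -> Integer
semSUBF ra rb = rb - ra
{-# NOINLINE semSUBF #-}

semMULLW :: Integer -> Integer -> Integer
semMULLW ra rb = ra * rb
{-# NOINLINE semMULLW #-}

-- The dispatcher now only refers to the lifted bodies, so it stays small.
semantics :: Opcode -> Integer -> Integer -> Integer
semantics opc = case opc of
  ADD   -> semADD
  SUBF  -> semSUBF
  MULLW -> semMULLW
```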
With caching disabled, we weren't doing any sharing recovery when
emitting Template Haskell. The cache in the `MacawQ` monad wasn't
helping because the expression builder generates a fresh nonce for each
expression it doesn't find in the cache; with no cache, every
subexpression was distinct from every other one.
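To illustrate what sharing recovery buys in general, here is a hash-consing sketch, assuming a toy expression type (`Expr`, `Node`, `Shared`, `intern`, and `share` are hypothetical; this is not the `MacawQ` implementation). Structurally identical subterms are looked up in a cache and reuse an existing nonce; without that lookup, every subexpression would get a fresh nonce and be emitted separately.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical expression type handed to a code emitter.
data Expr = Var String | Add Expr Expr | Mul Expr Expr
  deriving (Show, Eq, Ord)

-- One emitted binding: children are referred to by their nonce.
data Node = NVar String | NAdd Int Int | NMul Int Int
  deriving (Show, Eq, Ord)

data Shared = Shared
  { cache    :: Map.Map Node Int   -- structural key -> existing nonce
  , bindings :: [(Int, Node)]      -- let-bindings to emit, newest first
  , next     :: Int                -- next fresh nonce
  }

-- Reuse the nonce of a structurally identical node, or allocate a new one.
intern :: Node -> Shared -> (Int, Shared)
intern n st = case Map.lookup n (cache st) of
  Just i  -> (i, st)
  Nothing ->
    let i = next st
    in (i, st { cache = Map.insert n i (cache st)
              , bindings = (i, n) : bindings st
              , next = i + 1 })

share :: Expr -> Shared -> (Int, Shared)
share e st = case e of
  Var v   -> intern (NVar v) st
  Add a b -> binop NAdd a b
  Mul a b -> binop NMul a b
  where
    binop mk a b =
      let (ia, st1) = share a st
          (ib, st2) = share b st1
      in intern (mk ia ib) st2

-- Number of bindings emitted with sharing recovery.
sharedSize :: Expr -> Int
sharedSize e = length (bindings (snd (share e (Shared Map.empty [] 0))))

-- Number of nodes emitted when every subexpression gets a fresh nonce.
treeSize :: Expr -> Int
treeSize (Var _)   = 1
treeSize (Add a b) = 1 + treeSize a + treeSize b
treeSize (Mul a b) = 1 + treeSize a + treeSize b
```

For `x = Add (Var "a") (Var "b")`, the term `Mul x x` needs 4 bindings with the cache but 7 nodes without it; the gap grows quickly for the deeply shared terms typical of instruction semantics.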
It is now (optionally) pure via the MonadThrow class. It also exposes a new
binary format repr, which currently only has constructors for ELF containers.
The generic binary loading interface is instantiated once for each
architecture/binary container pair. This isn't great, but there is enough
custom work in each setting to justify it.
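One way to picture "instantiated once for each architecture/binary container pair" is a two-parameter type class, sketched here under the assumption that the tags `PPC64`, `ARM32`, `ELF` and the class `BinaryLoader` with its methods are hypothetical, not the library's actual interface. Each pair gets its own instance, which is where the per-setting custom work lives.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
-- Hypothetical architecture and container tags.
data PPC64 = PPC64
data ARM32 = ARM32
data ELF   = ELF

-- Hypothetical loader interface indexed by architecture and container.
class BinaryLoader arch fmt where
  loaderName :: arch -> fmt -> String
  -- Extra, arch- and format-specific information produced by loading.
  archExtras :: arch -> fmt -> [String]

instance BinaryLoader PPC64 ELF where
  loaderName _ _ = "ppc64/ELF"
  archExtras _ _ = ["table of contents"]

instance BinaryLoader ARM32 ELF where
  loaderName _ _ = "arm32/ELF"
  archExtras _ _ = []
```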
The binary loading interface isn't finished yet and needs to learn some
additional operations to support relocation. It already supports additional
information that is architecture-specific and binary-container-format-specific
(which operations will have to use on a per-format basis).
On the PowerPC side, the Table of Contents (TOC) is now architecture-specific
information constructed by the loader (currently from ELF binaries). The new
TOC data type is in place to support this more easily (the old format was just a
function).
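A sketch of the difference between the old and new TOC representations, assuming the TOC maps a function's entry address to its TOC base (`Addr`, `TOCFn`, `TOC`, and the helper functions are hypothetical names). A data value can be built by the loader, inspected, and counted, while still being convertible to the old function-style interface.

```haskell
import qualified Data.Map.Strict as Map

type Addr = Integer

-- Old representation (sketch): an opaque lookup function.
type TOCFn = Addr -> Maybe Addr

-- New representation (sketch): explicit data the loader builds.
newtype TOC = TOC (Map.Map Addr Addr)   -- function entry -> TOC base
  deriving (Show, Eq)

tocFromEntries :: [(Addr, Addr)] -> TOC
tocFromEntries = TOC . Map.fromList

lookupTOC :: Addr -> TOC -> Maybe Addr
lookupTOC a (TOC m) = Map.lookup a m

-- The data form can still provide the old functional interface.
toTOCFn :: TOC -> TOCFn
toTOCFn toc a = lookupTOC a toc

-- Unlike a bare function, the data form can be inspected directly.
entryCount :: TOC -> Int
entryCount (TOC m) = Map.size m
```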