The generic binary loading interface is instantiated once for each
architecture/binary container pair. This isn't great, but there is enough
custom work in each setting to justify it.
The binary loading interface isn't finished yet, and needs to learn some
additional operations to support relocation. It already supports additional
information that is architecture specific and binary container format
specific (that operations will have to use on a per-format basis).
On the PowerPC side, the Table of Contents (TOC) is now architecture-specific
information constructed by the loader (currently from ELF binaries). The new
TOC data type is in place to support this more easily (the old format was just a
function).
This code was mostly architecture independent already, so this commit moves it
to the macaw-semmc module so that it can be shared with the ARM backend. I
still plan to move the main TH module with the SimpleBuilder to macaw
translation, but that requires a few other changes first.
The code pointer discovery in macaw can't handle this case because we never
write the code pointers into memory - we only read them. We really need a way
to tell macaw about code pointers.
The easy workaround is to pull all of the function entry points out of the TOC
and just seed the macaw search with them, but it would be nice to be able to
identify them from first principles.
This change now memoizes translations of SimpleBuilder expression fragments,
which allows us to restore the sharing in semantics formulas. The generator
re-uses shared sub-expressions automatically now. This generates less Haskell
code, yielding better code density and fewer terms constructed at run time. It
also reduces compile times.
It seems to cut the size of the generated TH code by about half. It also
generates less deeply-nested Haskell code, making the resulting TH splices human
readable.
It runs code discovery over a large-ish binary to test coverage. We currently
fail due to unsupported instructions (expected). This test will guide
priorities on implementing new semantics.
The main function is 'extractValue', which takes an operand and returns a macaw
bitvector for it (in the PPCGenerator monad).
There are still some missing cases for the memory operands.
These lists come from semmc and contain the bytestrings of the semantics files
for each opcode.
NOTE: The lists are currently empty (presumably due to bugs), but the logic for
moving data around and setting up a SimpleBuilder instance is at least right.