Now test to ensure that no blocks end in a classification failure (or a
disassembly failure). Previously, many blocks were not classified, which caused
problems downstream. Fixing this required changes to macaw core in two places:
1. The simplifier needed some additional rules to remove some redundant
constructions that threw off the abstract interpretation of values. This was
particularly an issue while reading return values off of the stack in
PowerPC.
2. The abstract interpretation needed to handle more operations (shiftl).
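To illustrate the second point, here is a minimal sketch of a transfer
function for a left shift over a toy abstract domain. The AbsVal type, finSet,
absShiftL and absUnknownOp are hypothetical names invented for this example;
they are not macaw's actual abstract value types or API.

```haskell
import Data.Bits (shiftL)
import Data.List (nub, sort)
import Data.Word (Word64)

-- | Toy abstract value (hypothetical, not macaw's type): either a small
-- finite set of possible concrete values, or "could be anything".
data AbsVal
  = FinSet [Word64]
  | Top
  deriving (Eq, Show)

-- | Build a canonical finite set (sorted, no duplicates).
finSet :: [Word64] -> AbsVal
finSet = FinSet . sort . nub

-- | Transfer function for a left shift by a known amount. With this rule a
-- finite set stays precise across the shift.
absShiftL :: AbsVal -> Int -> AbsVal
absShiftL Top _ = Top
absShiftL (FinSet vs) n = finSet [v `shiftL` n | v <- vs]

-- | What happens to an operation that has no transfer function: all
-- precision is lost.
absUnknownOp :: AbsVal -> AbsVal
absUnknownOp _ = Top
```

Without a rule like absShiftL, a shift would fall through to the absUnknownOp
case, and any later analysis of the value would only see Top.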
Returns need special treatment: PowerPC clears the low two bits of the return
address, so we can't just rely on pattern matching against the ReturnAddr in
the IP register.
identifyReturn was previously unused because Macaw Discovery performed this
test inline. Some architectures have different return semantics, so the
Discovery process now calls identifyReturn. This commit implements a return
check that should be sufficient for PPC.
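A minimal sketch of the idea, assuming a toy value type: the check looks
through a mask that clears the low two bits before comparing against the
symbolic return address. Value, stripLowBitMask and identifyReturnToy are
hypothetical stand-ins for illustration, not the real macaw types or the real
identifyReturn signature.

```haskell
import Data.Bits (complement)
import Data.Word (Word64)

-- | Toy symbolic values (hypothetical; much simpler than macaw's values).
data Value
  = ReturnAddr          -- ^ the symbolic return address at function entry
  | Const Word64        -- ^ a concrete constant
  | BVAnd Value Value   -- ^ bitwise and
  deriving (Eq, Show)

-- | Mask that clears exactly the low two bits.
lowTwoBitsMask :: Word64
lowTwoBitsMask = complement 3

-- | Look through an "and" with the low-two-bits mask, on either side.
stripLowBitMask :: Value -> Value
stripLowBitMask (BVAnd v (Const m)) | m == lowTwoBitsMask = v
stripLowBitMask (BVAnd (Const m) v) | m == lowTwoBitsMask = v
stripLowBitMask v = v

-- | Toy return check: the next IP is the return address, possibly with its
-- low two bits cleared.
identifyReturnToy :: Value -> Bool
identifyReturnToy ip = stripLowBitMask ip == ReturnAddr
```

A plain equality test against ReturnAddr would reject the masked form, which
is the case this commit is concerned with.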
Recent changes in macaw(-base) mean that we split blocks more aggressively. The
old expected outputs were conservative - these new values are much more in line
with intuitive expectation (with more aggressive splitting of blocks and less
code duplication between blocks).
Pass operand and architecture types, and instead of generating

    case opcode of
      ADD -> case operands of
        Just GPR gpr0 :< Nil ->
          SSA-semantics

generate:

    let opc_ADD operands = case operands of
          Just GPR gpr0 :< Nil ->
            SSA-semantics
    in case opcode of
         ADD -> opc_ADD operands

This provides better encapsulation for the individual operands and
more specific control over the types (at the cost of a pair of
additional type specifications in the call). This also seems to
reduce memory consumption by about half.
The system call instructions TRAP and SC were updating the IP twice, which led
to skipping instructions. The IP increment for these instructions was already
handled in the abstract interpretation of arch-specific terminators.
Macaw has removed all floating point expression types, so we duplicate those as
arch-specific functions for PowerPC until the more general floating point
support is ready.
The old method involved providing the TH code a list of match expressions. This
made it very difficult to inspect arguments of instructions. The new approach
has the architecture backend provide a function that gets the first opportunity
to process instructions, which is much more flexible. This commit also includes
support for a number of cache hint instructions that use the new features.
The semantics for many of the vector instructions are incomplete and just set
the target register to undefined. This is enough for code discovery (for now).
This code was mostly architecture independent already, so this commit moves it
to the macaw-semmc module so that it can be shared with the ARM backend. I
still plan to move the main TH module with the SimpleBuilder to macaw
translation, but that requires a few other changes first.
The TOC parser no longer requires a Memory object, making it easier to
instantiate in derived tools (where the TOC parser needs to be used before
a memory is available). To do this, we use MemAddr as the base type for the TOC
instead of MemSegmentOff.
The recursive simplifier could exhibit exponential behavior in cases where a
nested tree of irreducible terms was accumulated. The recursive calls quickly
exploded execution times.
The fix was to remove the recursive calls from the simplifier and instead
incrementally simplify expressions to constants as they are added (via the
addExpr function). This simplifies as much as the recursive case, but more
efficiently. This change required exporting the simplifyApp function.
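A minimal sketch of simplifying at construction time, assuming a toy
expression type. Expr, simplifyNode, addE and mulE are hypothetical names;
simplifyNode plays the role that simplifyApp plays in the text, and the smart
constructors play the role of addExpr, but none of this is the real macaw
code.

```haskell
-- | Toy expression language (hypothetical).
data Expr
  = Lit Integer
  | Var String
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- | Simplify only the top node. It never recurses, because the children are
-- assumed to have been simplified already when they were built.
simplifyNode :: Expr -> Expr
simplifyNode (Add (Lit a) (Lit b)) = Lit (a + b)
simplifyNode (Add (Lit 0) e) = e
simplifyNode (Add e (Lit 0)) = e
simplifyNode (Mul (Lit a) (Lit b)) = Lit (a * b)
simplifyNode (Mul (Lit 1) e) = e
simplifyNode (Mul e (Lit 1)) = e
simplifyNode e = e

-- | Smart constructors: every expression is simplified once, as it is added.
addE, mulE :: Expr -> Expr -> Expr
addE a b = simplifyNode (Add a b)
mulE a b = simplifyNode (Mul a b)
```

Each node is visited a constant number of times, so a deep tree of
irreducible terms costs work proportional to its size instead of being
re-walked on every simplification.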
This code now pulls all of the function addresses from the TOC as entry points
for the code discovery search. This lets us trivially find code reachable via
indirect calls, as the function pointer discovery heuristic doesn't seem to be
well-suited to PowerPC. I'd like to push on that, but it seems like a good
start for now.
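As a rough sketch of the seeding step, assume the TOC can be viewed as a list
of (function entry address, TOC base value) pairs. That representation, and
the names Addr, TOC, entryPointsFromTOC and discoverySeeds, are assumptions
made for this example rather than the actual data structures.

```haskell
import Data.List (nub, sort)
import Data.Word (Word64)

-- | Toy address type (hypothetical).
type Addr = Word64

-- | Toy TOC: pairs of (function entry address, TOC base value).
type TOC = [(Addr, Addr)]

-- | Every function address recorded in the TOC.
entryPointsFromTOC :: TOC -> [Addr]
entryPointsFromTOC = map fst

-- | Seeds for code discovery: the program entry point plus every TOC
-- function address, sorted and without duplicates.
discoverySeeds :: Addr -> TOC -> [Addr]
discoverySeeds progEntry toc = sort (nub (progEntry : entryPointsFromTOC toc))
```

Functions that are only ever called through a pointer still appear in the
seed list, so discovery reaches them without any pointer heuristics.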