The goal is to support a jumptable testcase that is not supported by
the current jump bounds check. The jump bounds check needs to be
augmented so that it understands equality relationships between stack
values and registers, and bounds on both.
This patch tracks when a register points to a concrete stack offset.
As part of this, we dropped the AbsDomain instance for AbsBlockState.
Clients should most likely now use `fnStartAbsBlockState` in lieu of `top`.
The other client-visible change is that the ClassifyFailure
constructor now has an extra argument with details about why
classification failed.
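The idea of relating registers to stack slots and sharing bounds between
them can be illustrated with a minimal sketch. This is not the macaw
implementation: every type and function name below (`RegInfo`,
`AbsState`, `regUpperBound`, `indexInBounds`) is hypothetical and the
model is deliberately simplified.

```haskell
-- Hypothetical, simplified model; none of these names come from macaw.
data Reg = R0 | R1 | R2 | SP
  deriving (Eq, Show)

-- What the analysis knows about the contents of a register.
data RegInfo
  = StackPtr Integer   -- register holds (initial stack pointer + offset)
  | EqStack Integer    -- register equals the value stored at this stack offset
  | Unknown
  deriving (Eq, Show)

-- Inclusive unsigned upper bound known for a location.
type Bound = Integer

data AbsState = AbsState
  { regInfo     :: [(Reg, RegInfo)]
  , stackBounds :: [(Integer, Bound)]  -- bounds on stack slots, keyed by offset
  , regBounds   :: [(Reg, Bound)]      -- bounds known directly on registers
  }

-- Upper bound for a register: use a direct bound if there is one,
-- otherwise inherit the bound of a stack slot the register is equal to.
regUpperBound :: AbsState -> Reg -> Maybe Bound
regUpperBound st r =
  case lookup r (regBounds st) of
    Just b  -> Just b
    Nothing ->
      case lookup r (regInfo st) of
        Just (EqStack off) -> lookup off (stackBounds st)
        _                  -> Nothing

-- A jump table index is acceptable when its known bound is below the table size.
indexInBounds :: AbsState -> Reg -> Integer -> Bool
indexInBounds st r tableSize = maybe False (< tableSize) (regUpperBound st r)
```

In this sketch a register that was reloaded from a bounds-checked stack
slot passes the jump table check even though no bound was ever recorded
on the register itself.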
Update to API changes in macaw-base in macaw-ppc and macaw-arm
The "block label" abstraction (used during arch-specific disassembly) was removed some time ago in the base macaw library. This change updates macaw-ppc and macaw-arm to remove uses of block labels. The major change is that the disassembly function only returns a single block at a time instead of a sequence of blocks.
To facilitate this, the PowerPC conditional trap instruction (trap doubleword) is now translated to an architecture-specific terminator instruction instead of having its conditional trapping logic encoded inline. That logic will now have to be encoded in macaw-ppc-symbolic; note that this has not been done yet.
This commit also updates the expected results of the PowerPC tests; the number of discovered blocks is different, but not significantly so. It is hard to tell if this is a regression or an improvement.
The ArchitectureInfo checkForReturnAddr is used to check if a specific
value corresponds to the symbolic "ReturnAddr", indicating that the
target is the original call location (this is used to identify
tail-call recursion or identify that a return has been performed from
the primary function via identifyReturn).
The current implementation simply checks for ReturnAddr in the Link
Register (LR), but it needs to be enhanced to detect ARM semantic
manipulation of ReturnAddr (clearing the low bit(s), etc.).
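One way the enhanced check might look is sketched below. It is an
illustration only: the `Value` type, `isLowBitMask` and `isReturnAddr`
are hypothetical stand-ins, not the macaw value representation or the
actual checkForReturnAddr code.

```haskell
import Data.Bits (complement)
import Data.Word (Word32)

-- Hypothetical, tiny expression language standing in for macaw values.
data Value
  = ReturnAddr          -- the symbolic return address at function entry
  | Const Word32
  | And Value Value     -- bitwise and
  deriving (Eq, Show)

-- True for masks that clear only the lowest one or two bits.
isLowBitMask :: Word32 -> Bool
isLowBitMask m = m == complement 1 || m == complement 3

-- Recognize ReturnAddr even after its low bit(s) have been masked off.
isReturnAddr :: Value -> Bool
isReturnAddr ReturnAddr        = True
isReturnAddr (And v (Const m)) = isLowBitMask m && isReturnAddr v
isReturnAddr (And (Const m) v) = isLowBitMask m && isReturnAddr v
isReturnAddr _                 = False
```

A check of this shape accepts `And ReturnAddr (Const 0xFFFFFFFE)` but
still rejects masks that discard more than the low bits.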
We now thread a snapshot of the register state from the beginning of the
instruction evaluation through each instruction's semantics instead of
re-fetching register values each time we need them and potentially seeing
incorrect, partially modified register values.
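The difference can be shown with a toy example, assuming a hypothetical
two-register machine and a made-up "swap" instruction (neither is taken
from the actual semantics code):

```haskell
-- Hypothetical two-register machine state.
data Regs = Regs { r0 :: Int, r1 :: Int }
  deriving (Eq, Show)

-- Re-fetching from the live state: the second read observes the
-- already-updated r0, so the swap silently degenerates into a copy.
swapLive :: Regs -> Regs
swapLive s0 =
  let s1 = s0 { r0 = r1 s0 }
      s2 = s1 { r1 = r0 s1 }   -- reads the partially modified state
  in s2

-- Threading a snapshot: every read uses the state from the start of
-- the instruction, so the order of the writes no longer matters.
swapSnapshot :: Regs -> Regs
swapSnapshot snap =
  let s1 = snap { r0 = r1 snap }
      s2 = s1   { r1 = r0 snap }  -- reads the snapshot
  in s2
```

Starting from `Regs 1 2`, `swapLive` yields `Regs 2 2` while
`swapSnapshot` yields the intended `Regs 2 1`.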
The field it contains is supposed to be the instruction offset in its basic
block; overflowing it can cause significant problems during symbolic simulation.
There is a new metadata statement that tracks the start address of each
instruction. This is used in the translation to Crucible to provide better
error messages. The x86 backend was already updated; this commit adds the
metadata to the ARM and PowerPC backends.
In macaw core, the type of the arch-specific 'disassemble' function changed: it
no longer takes a Memory, and it takes the maximum offset as an Int instead of a
MemWord. The same change also removed the jump table entry size (which is no
longer required).
The removal of the Memory parameter required a bit of a change in how the
instruction parsers are structured, but it isn't a huge change (the "memory
contents after an address" can be computed from a MemSegmentOff, too).
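As a rough sketch of why the Memory parameter is not needed, assume a
simplified model in which a segment offset carries a reference to its
own segment. The `Segment` and `SegmentOff` types and the
`bytesToDecode` helper below are hypothetical and much simpler than the
real macaw memory types.

```haskell
import Data.Word (Word8)

-- Hypothetical, simplified stand-ins for the macaw memory types.
data Segment = Segment
  { segBase  :: Int       -- base address of the segment
  , segBytes :: [Word8]   -- raw contents of the segment
  }

-- An offset that remembers which segment it points into.
data SegmentOff = SegmentOff
  { offSegment :: Segment
  , offIndex   :: Int     -- byte index within the segment
  }

-- The bytes available to the instruction parser: everything in the
-- segment after the offset, limited to at most maxOff bytes.
bytesToDecode :: SegmentOff -> Int -> [Word8]
bytesToDecode (SegmentOff seg i) maxOff = take maxOff (drop i (segBytes seg))
```

Because the offset already identifies its segment, the parser can get
the bytes it needs without consulting a separate memory object.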
The previous generator put all of the code for each matcher in a single large
case expression. While there were individual functions broken out for each case
body, they were all still in the same let expression, which created a huge term.
This refactoring lifts all of the semantics definition bodies to the top
level (with NOINLINE pragmas) to give the code generator less to chew on at a
time.
This improves compile times a little, but, more importantly, works around a bug
in the register allocator in GHC 8.4 that caused a crash in the PowerPC
semantics functions.
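The shape of the refactored output is roughly as follows. This is a
hand-written illustration with hypothetical opcodes and function names;
the real definitions are generated and far larger.

```haskell
-- Hypothetical opcodes; the real generator handles many more cases.
data Opcode = Add | Sub | Neg
  deriving (Eq, Show)

-- Each semantics body is its own top-level definition. NOINLINE keeps
-- GHC from merging the bodies back into the dispatcher.
semAdd :: [Int] -> Maybe Int
semAdd [a, b] = Just (a + b)
semAdd _      = Nothing
{-# NOINLINE semAdd #-}

semSub :: [Int] -> Maybe Int
semSub [a, b] = Just (a - b)
semSub _      = Nothing
{-# NOINLINE semSub #-}

semNeg :: [Int] -> Maybe Int
semNeg [a] = Just (negate a)
semNeg _   = Nothing
{-# NOINLINE semNeg #-}

-- The matcher is now a small case expression that only selects a body.
dispatch :: Opcode -> [Int] -> Maybe Int
dispatch op =
  case op of
    Add -> semAdd
    Sub -> semSub
    Neg -> semNeg
```

The dispatcher stays small regardless of how large the individual
bodies are, so each unit handed to the code generator is bounded by a
single semantics definition.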
Macaw discovery now calls identifyReturn to identify return
statements. This commit supplies it for ARM, but at present it simply
replicates the original inline code, which does not properly detect ARM
return operations because the low bit(s) of the address are always
cleared when writing to the instruction pointer in ARM.