The changes include:
Clean up elf loading to fix a bug in rel addend parsing.
Introduce block preconditions for populating reopt-vcg fields.
Change load options to match reopt's interface.
The goal is to support a jumptable testcase that is not supported by
the current jump bounds check. The jump bounds check needs to be
augmented so that it understands equality relationships between stack
values and registers, and bounds on both.
This patch tracks when a register points to a concrete stack offset.
As part of this, we droped the AbsDomain instance for AbsBlockState.
Clients should now likely use `fnStartAbsBlockState` in lieu of `top`.
The other client visible change is that the ClassifyFailure
constructor now has an extra argument with details about why
classification failure occured.
This introduces a new datatype CValue for representing constants
in Macaw programs, modifies the existing Value datatype to use then,
and introduces patterns for compatibility with existing datatypes.
The patch also updates the function argument analysis to use more
explicit argument passing rather than monadic updates. The intent is
to help clarify when data is initialized rather than updated.
Finally this updates a README and does some minor updates.
This primarily refines the abstract state propagated to branch
pairs. It was needed on the ARM platform to support the IT blocks
with the changes to the Core representation in macaw-base 0.3.6.
This also includes a few simplifications added and comment
improvements.
`MonadFail` being a forward-compatibility measure, overriding
`Control.Monad.Fail.fail` in GHC versions <= 8.6 doesn't do anything
unless `Control.Monad.Fail.fail` is invoked explicitly (or the importing
module happens to have `-XMonadFailDesugaring` on).
These `getCallTarget` and `doJump` were calling `fail` if they saw an argument
type that we hadn't thought to handle yet. This change turns those errors into
TranslationError statements, allowing macaw to continue exploring code.
This came up recently in a glibc-based example where macaw ended up exploring
unaligned code and creating a strange jump to a far pointer, which doesn't make
much sense in x86_64 mode.
Before, we just discarded them during the translation. They are useful metadata
for generating diagnostics in Crucible, so this commit translates them. They
are no-ops during symbolic evaluation.
To make them truly useful, they need to include the address of the block that
they belong to (their data payload in macaw is just an offset from the start of
a block). This information wasn't available before, so it has to be plumbed
through in macaw-x86.
This is not currently an error, as this function is only used in the definition
of the semantics for push, which doesn't accept a signed immediate value. This
fix is defensive in case someone decides to re-use this helper in another
context where the missing cases could cause a problem.
We were hitting a translation error for imul in another application - this test
case is a reduced example demonstrating the problem.
The root cause was that there were a few missing cases for the new signed
immediate values from flexdis; this caused a fallthrough that mis-identified
signed immediates as non-immediates, triggering an error.
This update renames many of the declarations exported by
Data.Macaw.Memory so that we have more consistent names.
The majority of the existing names are now exported with DEPRECATION
warnings. Some of the symbol declarations that were not used by the
Memory datatype have been moved to other modules.
The minor version of macaw-base has been incremented.
This patch adds initial support for relocations in Macaw code
discovery, and adds other refactoring.
* It introduces a SymbolValue constructor to represent references to
symbols within Macaw.
* The various cases for x86 mov are made explicit after the flexdis refactor
broke the previous code. We should now support segment register movs and
give better error messages when seeing mov with control or debug registers.
* The generic exception operation is replaced with Hlt and UD2 terminal
x86-specific statements.
* CodeAddrReason is split into FunctionExploreReason and BlockExploreReason to
clarify whether a function or block was discovered.
* The Macaw pretty printer is changed to use write_mem in place of pointer syntax.
* Various other refactoring is made to clarify code.
The tests now check to make sure that no blocks end in a classification failure.
This exposed a problem where some simple cases (where the return address was
read from the stack) where we were getting classification failures.
It turns out that the problem was due to the code being PIE and loaded at a very
low address. This made a number of small constants look like code pointers,
which threw off the abstract interpretation.
The fix is to load the test binaries at a large offset (0x400000 or so) to
reduce the likelihood of overlap.
Instead of inline analysis of whether the instruction pointer has been
updated to contain the ReturnAddr symbolic value, defer the
determination of the call return to the (previously defined but
unused) architecture-specific handling. This allows architectures
like ARM that perform modifications on the values loaded to the
instruction pointer (e.g. clearing lower bits) to provide their own
recognition of a return operation.
Also modifies the signature of identifyReturn to return a Sequence of
statements to match the identifyCall type signature.
Replaces the previously unused identifyX86Return with the inline
detection of IP == ReturnAddr.