ArchMemAddr is easier to use than ArchAddrWord in downstream clients, and is
probably more faithful in the case where we want to support shared libraries
and/or object files.
The new statement is called `ArchState`, and has two fields: an address and a
map. The address is the address of the instruction it is standing in for. The
map contains a mapping from the *machine registers* that the instruction updated
to the *macaw values* that were assigned to those locations.
This is useful metadata for debugging, but is also required to do some types of
architecture-independent analysis (where we can still reason about machine
register contents).
The tests now check to make sure that no blocks end in a classification failure.
This exposed a problem where some simple cases (where the return address was
read from the stack) where we were getting classification failures.
It turns out that the problem was due to the code being PIE and loaded at a very
low address. This made a number of small constants look like code pointers,
which threw off the abstract interpretation.
The fix is to load the test binaries at a large offset (0x400000 or so) to
reduce the likelihood of overlap.
Instead of inline analysis of whether the instruction pointer has been
updated to contain the ReturnAddr symbolic value, defer the
determination of the call return to the (previously defined but
unused) architecture-specific handling. This allows architectures
like ARM that perform modifications on the values loaded to the
instruction pointer (e.g. clearing lower bits) to provide their own
recognition of a return operation.
Also modifies the signature of identifyReturn to return a Sequence of
statements to match the identifyCall type signature.
Replaces the previously unused identifyX86Return with the inline
detection of IP == ReturnAddr.
When computing pointers we don't always check that the results are valid.
Instead, we do the check whenever we use the pointers.
The reason is to support code where pointers are temporarily "bad"
but are never used that way. For example:
subq $10, %aex # aex contains a pointer
Loop:
addq $10, %aex
...
We don't really do anything with alignment, but sometime asm code
ands pointers to align them. For example `andq $(-64), %rsp`
aligns the pointer to a multiple of 64.
To support code like this we treat "and"-ing a pointer with a special
constant of the form 0xFFFF...FF000 (i.e., and alignment) as a subtracting
`0x0000...00XXX` where the `XXX` is symbolic.
This looses some information (i.e., we don't know that the result is aligned).
However, it is good enough for checking memory safety, as it covers
all possible results of the alignment.