It needs to take (and return) a Crucible state so that we can insert the new
function handle into the handle map, where the Crucible `Call` statement can
find it.
The GlobalMap is a mapping from virtual addresses computed by a program to the
corresponding logical addresses in the LLVM memory model during symbolic
simulation. It is needed because addresses in binaries are computed from
bitvectors, which are not valid pointers in the LLVM memory model.
This change turns the GlobalMap from a Data.Map into a function, which is more
flexible and allows for a wider range of possible implementations of this
functionality, especially implementations that introduce numerous disjoint
segments for the original binary contents.
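The Python sketch below shows why a function is more general than a finite map. It is not the macaw-symbolic API, and all names are hypothetical. It assumes an address is a pair of a region index and an offset, with region 0 holding absolute addresses, and it models an LLVM pointer as a (block, offset) pair. A table-backed map can still be written as a function. The function form can also route each address to whichever of many disjoint segments contains it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LLVMPointer:
    block: int   # allocation block in the (modelled) LLVM memory model
    offset: int  # byte offset within that block


# Hypothetical: a global map as a *function* from (region, offset) to a pointer.
GlobalMap = Callable[[int, int], Optional[LLVMPointer]]


def from_table(table: Dict[int, LLVMPointer]) -> GlobalMap:
    """Old, Data.Map-like style: one base pointer per region index."""
    def lookup(region: int, off: int) -> Optional[LLVMPointer]:
        base = table.get(region)
        if base is None:
            return None
        return LLVMPointer(base.block, base.offset + off)
    return lookup


def from_segments(segments: List[Tuple[int, int, int]]) -> GlobalMap:
    """Many disjoint segments, given as (start address, size, block).
    Assumption: absolute addresses live in region 0."""
    def lookup(region: int, addr: int) -> Optional[LLVMPointer]:
        if region != 0:
            return None
        for start, size, block in segments:
            if start <= addr < start + size:
                return LLVMPointer(block, addr - start)
        return None  # address not backed by any segment
    return lookup
```

Both constructors produce the same type, so the simulator does not need to know which strategy a client picked.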
The previous strategy was to represent macaw calls using a macaw-specific
MacawCall statement. It was interpreted by a call handler that took
registers+memory as input and produced new registers+memory as output. This
worked when the callee had a summary, but it did not allow the called function
to be simulated inline. Moreover, the OverrideSim monad doesn't admit recursive
calls in this context. We can make the call, but we can't get the final
simulator state out, and we would need that state to implement a call handler
in macaw-symbolic.
The new strategy is to translate macaw calls into two separate statements:
1. A `LookupFunctionHandle` call, which returns a Crucible FunctionHandle, and
2. A normal Crucible `Call`
The interpretation of LookupFunctionHandle has the full register+memory state
available. It can inspect the IP to determine which function has been called
and provide the corresponding FunctionHandle, which Crucible then interprets in
the standard way. Note that the handler runs in IO, so client code can
translate called functions into Crucible on demand.
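The two-step scheme can be sketched in Python as follows. This is a hypothetical model, not the macaw-symbolic or Crucible API:

- `SimState` stands in for the Crucible state.
- Handles are plain records.
- `translate` stands in for client code that turns the function at an address into Crucible on demand.

It also shows one reason a lookup like this might thread the state through. A freshly created handle has to be recorded in the handle map so that the following `Call` can resolve it.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]
Body = Callable[[Regs], Regs]


@dataclass(frozen=True)
class FunctionHandle:
    ident: int
    name: str


@dataclass(frozen=True)
class SimState:
    # Stand-in for the Crucible state. The handle map is what lets a later
    # Call statement find the body behind a handle.
    handle_map: Dict[int, Body] = field(default_factory=dict)
    by_address: Dict[int, FunctionHandle] = field(default_factory=dict)


def lookup_function_handle(
    state: SimState, regs: Regs, translate: Callable[[int], Tuple[str, Body]]
) -> Tuple[FunctionHandle, SimState]:
    """Inspect the IP, translating the callee on demand the first time."""
    ip = regs["ip"]
    known = state.by_address.get(ip)
    if known is not None:
        return known, state
    name, body = translate(ip)  # on-demand translation (IO in the real code)
    handle = FunctionHandle(len(state.handle_map), name)
    new_state = replace(
        state,
        handle_map={**state.handle_map, handle.ident: body},
        by_address={**state.by_address, ip: handle},
    )
    return handle, new_state  # the updated state is returned with the handle


def call(state: SimState, handle: FunctionHandle, regs: Regs) -> Regs:
    """The ordinary Call step: resolve the handle and run its body."""
    return state.handle_map[handle.ident](regs)
```

In this model, a second lookup of the same IP reuses the cached handle, so each callee is translated at most once.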
This commit introduces a new syntax extension for the macaw translation to
represent the ArchState statement: MacawArchStateUpdate.
It also adds some new instances for MacawCrucibleValue.
The new statement is called `ArchState` and has two fields: an address and a
map. The address is that of the instruction the statement stands in for. The
map sends each *machine register* that the instruction updated to the *macaw
value* that was assigned to it.
This is useful metadata for debugging, but it is also required for some kinds
of architecture-independent analysis (where we can still reason about machine
register contents).
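Here is one example of the kind of architecture-independent analysis this enables. The Python sketch below uses hypothetical types, not the macaw data structures, and models macaw values as strings. It folds the `ArchState` statements of a block into a per-instruction view of machine register contents.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ArchState:
    address: int              # address of the instruction this stands in for
    updates: Dict[str, str]   # machine register -> macaw value assigned to it


Stmt = Union[str, ArchState]  # other statements are opaque strings here


def register_views(stmts: List[Stmt]) -> List[Tuple[int, Dict[str, str]]]:
    """After each instruction, which macaw value does each register hold?"""
    current: Dict[str, str] = {}
    views = []
    for s in stmts:
        if isinstance(s, ArchState):
            current = {**current, **s.updates}  # later writes win
            views.append((s.address, current))
    return views
```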
When computing pointers, we don't always check that the results are valid.
Instead, we check them whenever the pointers are used.
This supports code where pointers are temporarily "bad"
but are never used in that state. For example:
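    subq $10, %rax    # %rax contains a pointer
  Loop:
    addq $10, %rax
    ...
Here `%rax` may briefly point outside its allocation (e.g., 10 bytes before
its start), but it is back in bounds at the top of the loop, before it is used.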
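The Python sketch below models this policy with hypothetical types: byte-granular blocks, not the real memory model. Pointer arithmetic never checks bounds, while loads and stores do.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class InvalidPointer(Exception):
    pass


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def ptr_add(p: Ptr, n: int) -> Ptr:
    # Deliberately unchecked: the result may point outside its allocation.
    return Ptr(p.block, p.offset + n)


class Memory:
    def __init__(self, sizes: Dict[int, int]) -> None:
        self.sizes = sizes                        # block -> size in bytes
        self.cells: Dict[Tuple[int, int], int] = {}

    def _check(self, p: Ptr) -> None:
        # Validity is checked only here, when the pointer is actually used.
        size = self.sizes.get(p.block)
        if size is None or not 0 <= p.offset < size:
            raise InvalidPointer(p)

    def load(self, p: Ptr) -> int:
        self._check(p)
        return self.cells.get((p.block, p.offset), 0)

    def store(self, p: Ptr, value: int) -> None:
        self._check(p)
        self.cells[(p.block, p.offset)] = value
```

Replaying the assembly example, `ptr_add(Ptr(1, 0), -10)` followed by `ptr_add(..., 10)` yields a pointer that loads and stores accept. Loading through the intermediate `Ptr(1, -10)` would raise `InvalidPointer`.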
We don't really do anything with alignment, but assembly code sometimes
ANDs pointers to align them. For example, `andq $(-64), %rsp`
rounds `%rsp` down to a multiple of 64.
To support code like this, we treat AND-ing a pointer with a special
constant of the form 0xFFFF...FF000 (i.e., an alignment mask) as subtracting
`0x0000...00XXX`, where the `XXX` is symbolic.
This loses some information (i.e., we don't know that the result is aligned).
However, it is good enough for checking memory safety, as it covers
all possible results of the alignment.
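Here is a Python sketch of the rewrite. It assumes 64-bit values and uses hypothetical helper names. First it recognises an alignment mask. Then it represents `p & mask` as `p - k` for a symbolic `k` in `[0, 2**n)`. The `range` of possible results always contains the true aligned value, but it also contains non-aligned values, which is the information loss described above.

```python
from typing import Optional

WIDTH = 64
FULL = (1 << WIDTH) - 1


def alignment_bits(mask: int) -> Optional[int]:
    """Return n if mask looks like 0xFFFF...F000 (ones, then n zero bits), else None."""
    low = ~mask & FULL            # e.g. 0x0000...0FFF
    if low & (low + 1) != 0:      # low must have the form 2**n - 1
        return None
    return low.bit_length()


def possible_results(offset: int, n: int) -> range:
    """All values of offset - k for a symbolic k with 0 <= k < 2**n."""
    return range(offset - (2**n - 1), offset + 1)
```

For `andq $(-64), %rsp`, `alignment_bits(-64)` is 6, so the result is modelled as `%rsp` minus some value in `[0, 63]`.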
Previously, we asserted that certain bogus expressions never occur.
Unfortunately, these expressions can occur in code that was not
directly written by the user (e.g., comparisons for setting various
machine flags). To allow for that, we now permit these expressions but
give them undefined values, so a proof succeeds only if it
does not depend on the values of these bogus comparisons.
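The Python sketch below illustrates this policy. It assumes, hypothetically, that the bogus case is an ordering comparison between pointers into different allocations. It models an undefined boolean as the set of values it may take. A goal that ignores such a flag is still proven, while a goal that depends on it is not.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, FrozenSet

Possible = FrozenSet[bool]  # the set of values a boolean might take


@dataclass(frozen=True)
class Ptr:
    block: int
    offset: int


def ptr_lt(p: Ptr, q: Ptr) -> Possible:
    if p.block == q.block:
        return frozenset({p.offset < q.offset})
    # Assumed "bogus" case: pointers into different allocations. Rather than
    # asserting it cannot happen, return an undefined (unconstrained) value.
    return frozenset({True, False})


def proves(goal: Callable[..., bool], *inputs: Possible) -> bool:
    """The goal is proven only if it holds for every possible input value."""
    return all(goal(*vals) for vals in itertools.product(*inputs))
```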
We basically punt by passing in a function to use as the implementation of
all functions. This function is expected to inspect the IP and
decide what to do.
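Here is the shape of such a catch-all implementation as a hypothetical Python sketch. The names are illustrative, and registers and memory are plain dictionaries. A table of per-address summaries is consulted using the IP, with a fallback for addresses that have no summary.

```python
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]
Mem = Dict[int, int]
Handler = Callable[[Regs, Mem], Tuple[Regs, Mem]]


def make_call_handler(summaries: Dict[int, Handler], default: Handler) -> Handler:
    """One function implements every call: it dispatches on the IP."""
    def handler(regs: Regs, mem: Mem) -> Tuple[Regs, Mem]:
        return summaries.get(regs["ip"], default)(regs, mem)
    return handler
```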