It runs code discovery over a large-ish binary to test coverage. We currently
fail due to unsupported instructions (expected). This test will guide
priorities on implementing new semantics.
This helper additionally simplifies constants. This is very useful for dealing
with simplifying the instruction pointer. That is required by the rest of
macaw, which expects IP values it wants to explore to be fully reduced.
The current heuristic isn't great, but is probably okay for now. It just checks
to see if the LNK register is an address plus four. Something more precise
would require knowing the address of the next instruction, but we can't get that
from the IP, which has already been changed due to the call.
The semantics of each instruction are atomic updates over the register state.
Prior to this commit, changes were not atomic and updates to register values
were visible to later register definitions, which causes a huge number of
problems. Now, we take a snapshot of the register state at the beginning of the
instruction and read all values we need from that snapshot. This way, updates
are isolated from one another.
My understanding of how macaw splits up blocks was incorrect when I wrote the
test initially. Macaw doesn't split blocks just because a jump happens to land
in the middle of the block, so the middle block in this example is actually a
few instructions longer.
It now recursively traverses its arguments. This isn't great from an efficiency
perspective, but we need it to be able to simplify instruction pointers computed
from relative jumps (which involve some sign extensions and shifts).
These values are new values of the IP to explore, and the code consuming these
values expects them to be BV literals (i.e., simplified from expressions to
values).
The simplifier isn't currently powerful enough to simplify everything we throw
at it, but this is at least the right place to apply it. If we don't simplify
here, the core of macaw won't know how to follow the IP changes and will miss
blocks.
These operations generate a lot of code, so it is helpful to factor them out and
reduce the burden on the type checker. Factoring these two definitions out cuts
the generated code nearly in half.
The change is actually in the semantics (semmc), where we were extracting the
wrong part of the 128 bit vector registers to operate on. Many operations were
being simplified to zero, which manifest as unused fprc registers.
The previous implementation missed an IP update, which is required to prevent
macaw from treating the syscall instruction as its own basic block. Also factor
out the implementation of SC so that we can re-use it later for TW.
If we don't do this, the saved IP is unsimplified and contains expressions,
which means that the next decoding step won't simplify properly (it would
require recursive simplification, which we would prefer to avoid).