Verifies that the number of blocks found matches what should generally
be expected from this particular executable.
The specific value checked for is not independently verified, it just
happens to be a reasonable-looking value that the discovery process
currently identifies, and encoding it here ensures that if discovery
ever changes that the change will be seen and explicitly accepted or
fixed as needed.
This change is in the core generator monad and applied in the PowerPC backend.
This change includes some macaw updates (which required a new elf-edit version).
Now test to ensure that no blocks end in a classification failure (or a
disassembly failure). Before, many blocks were not classified, which causes
problems downstream. This required some changes in macaw core in two places:
1. The simplifier needed some additional rules to remove some redundant
constructions that threw off the abstract interpretation of values. This was
particularly an issue while reading return values off of the stack in
PowerPC.
2. Extending the abstract interpretation to be able to handle more operations (shiftl)
Recent changes in macaw(-base) mean that we split blocks more aggressively. The
old expected outputs were conservative - these new values are much more in line
with intuitive expectation (with more aggressive splitting of blocks and less
code duplication between blocks).
Macaw has removed all floating point expression types, so we duplicate those as
arch-specific functions for PowerPC until the more general floating point
support is ready.
The TOC parser now doesn't require a Memory object, making it easier to actually
instantiate this in derived tools (where the TOC parser needs to be used before
a memory is available). To do this, we use MemAddr as the base type for the TOC
instead of MemSegmentOff
This code now pulls all of the function addresses from the TOC as entry points
for the code discovery search. This lets us trivially find code reachable via
indirect calls, as the function pointer discovery heuristic doesn't seem to be
well-suited to PowerPC. I'd like to push on that, but it seems like a good
start for now.
The code pointer discovery in macaw can't handle this case because we never
write the code pointers into memory - we only read them. We really need a way
to tell macaw about code pointers.
The easy workaround is to pull all of the function entry points out of the TOC
and just seed the macaw search with them, but it would be nice to be able to
identify them from first principles.
It runs code discovery over a large-ish binary to test coverage. We currently
fail due to unsupported instructions (expected). This test will guide
priorities on implementing new semantics.
My understanding of how macaw splits up blocks was incorrect when I wrote the
test initially. Macaw doesn't split blocks just because a jump happens to land
in the middle of the block, so the middle block in this example is actually a
few instructions longer.