* Update to the bv-sized branch of what4, along with other changes
* Remove the parameterized-utils submodule completely
* Update submodules
* Fix macaw-symbolic for crucible-llvm changes
Co-authored-by: Ben Selfridge <ben@000548-benselfridge.local>
These packages replace the old macaw-arm (which has been removed). The only
change to the core macaw is to introduce a `Lift` instance for the Endianness
data type, which is used in macaw-semmc.
The macaw-aarch32 package uses the official ARM semantics (via the
asl-translator package). In its current state, macaw-aarch32 seems to handle
the common idioms of simple ARM binaries. Position independent executables have
not been tested yet. The semantics and disassemblers for Thumb are present, but
not integrated into code discovery at this time. There are some tests in
macaw-aarch32. Compile times are longer than desired.
macaw-aarch32 can be compiled in two modes: lite mode (cabal flag -fasl-lite),
which uses a restricted set of instructions for testing and takes less time to
compile, and full mode, the default, which covers the full instruction set. A
few undefined functions, mostly relating to floating point operations, are not
yet handled in full mode.
The macaw-aarch32-symbolic package is currently a stub, but is implemented to
provide a few necessary instances.
This commit updates macaw-refinement to work with the latest macaw/crucible and makes a few improvements along the way.
The major changes involved in this are:
* Block labels were removed from macaw, so we needed an alternative to synthetic blocks for representing dispatch that macaw-refinement resolves but that is not really a jump table. We considered adding a new terminator that encoded "computed IP-based dispatch", but there was concern about the impact on client code. Instead, we added a field to the `DiscoveryFunInfo` that records "external" resolutions of indirect control flow (e.g., resolutions found by an SMT solver in macaw-refinement). The hook by which we feed SMT-based resolutions back into macaw (`addDiscoveredFunctionBlockTargets`) was modified accordingly.
* Solver invocation changed to allow solver selection and parallel solver application.
* Logging is now done via the `lumberjack` library.
* macaw-symbolic now uses the "external" resolutions in `DiscoveryFunInfo` while building crucible CFGs.
* The path creation code in macaw-refinement was simplified significantly and the approach to path creation has been documented.
* The run-refinement tool is now more featureful.
* The test suite is a bit more structured and no longer depends on the printed output of the discovery process.
Updates for GHC 8.8
The two main classes of update are related to MonadFail and type alias expansion.
The MonadFail updates introduce explicit MonadFail instances and backward-compatible `fail` implementations under `Monad` for older GHC versions.
The type alias expansion rules changed in GHC 8.8 in a way that breaks the `Simple Lens` idiom; instead, we have to use `Lens'`. Lens started supporting this alias in version 3.8, which was released in 2013.
This change includes necessary submodule updates, as well as the update for the split of what4 into its own repository.
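The MonadFail pattern described above can be sketched on a toy monad. This is a minimal illustration, not macaw code: `Result` and `firstElem` are hypothetical names, and the sketch assumes the `__GLASGOW_HASKELL__` CPP macro (808 corresponds to GHC 8.8) is used to guard the legacy `fail` method.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import qualified Control.Monad.Fail as Fail

-- | Hypothetical toy monad: a computation that either fails with a
-- message or produces a value.
newtype Result a = Result (Either String a)
  deriving (Show, Eq)

instance Functor Result where
  fmap f (Result r) = Result (fmap f r)

instance Applicative Result where
  pure = Result . Right
  Result f <*> Result x = Result (f <*> x)

instance Monad Result where
  Result (Left e) >>= _ = Result (Left e)
  Result (Right x) >>= k = k x
#if __GLASGOW_HASKELL__ < 808
  -- Before GHC 8.8, 'fail' was still a method of 'Monad'; forward it to
  -- the MonadFail implementation so that both agree.
  fail = Fail.fail
#endif

-- | Explicit MonadFail instance, required from GHC 8.8 onwards for
-- failable pattern matches in do-notation.
instance Fail.MonadFail Result where
  fail msg = Result (Left msg)

-- | The pattern (y:_) can fail, so this desugars to a call to 'fail'.
firstElem :: [Int] -> Result Int
firstElem xs = do
  (y:_) <- pure xs
  pure y

main :: IO ()
main = do
  print (firstElem [1, 2, 3])
  print (firstElem [])
```

Forwarding the old method to `Fail.fail` keeps a single source of truth for the failure behavior across compiler versions.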
The new registerUse analysis uses a three-phase process:
Phase 1 computes invariants about the start state of each block. It
will indicate when registers/stack locations store stack offsets, and
where callee saved registers are stashed. It also memoizes
information about stack reads and writes to simplify later passes.
Phase 2 is a demand analysis that computes which registers and stack
locations must be available to execute the program. It then
propagates those constraints across blocks in the function.
Phase 3 combines the information into a form relevant for function
recovery.
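Phase 2 can be pictured as a backward, liveness-style fixpoint over the blocks of a function. The sketch below is a minimal illustration under stated assumptions: `Loc`, `BlockSummary`, `transfer`, and `demandFixpoint` are hypothetical names rather than macaw's API, and the per-block use/def sets stand in for the information phase 1 would compute.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- | Hypothetical locations: a named register or a stack slot offset.
data Loc = Reg String | StackSlot Integer
  deriving (Eq, Ord, Show)

type Label = Int

-- | Hypothetical per-block summary: locations read before being
-- written, locations written, and successor blocks.
data BlockSummary = BlockSummary
  { bsUses  :: Set Loc
  , bsDefs  :: Set Loc
  , bsSuccs :: [Label]
  }

-- | Demand at a block's entry: its own uses, plus whatever its
-- successors demand that this block does not itself define.
transfer :: Map Label (Set Loc) -> BlockSummary -> Set Loc
transfer demand b =
  bsUses b `Set.union` (succDemand `Set.difference` bsDefs b)
  where
    succDemand =
      Set.unions [ Map.findWithDefault Set.empty s demand | s <- bsSuccs b ]

-- | Re-apply the transfer function to every block until nothing changes.
-- Demand sets only grow, so this terminates for finitely many locations.
demandFixpoint :: Map Label BlockSummary -> Map Label (Set Loc)
demandFixpoint blocks = go (Map.map (const Set.empty) blocks)
  where
    go demand =
      let demand' = Map.map (transfer demand) blocks
      in if demand' == demand then demand else go demand'

-- | Block 0 writes rax and jumps to block 1, which reads rax and rbx.
exampleBlocks :: Map Label BlockSummary
exampleBlocks = Map.fromList
  [ (0, BlockSummary Set.empty (Set.fromList [Reg "rax"]) [1])
  , (1, BlockSummary (Set.fromList [Reg "rax", Reg "rbx"]) Set.empty [])
  ]

main :: IO ()
main = print (demandFixpoint exampleBlocks)
```

In the example, block 0 only needs `rbx` from its caller, because it supplies `rax` itself.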
The changes include:
* Clean up elf loading to fix a bug in rel addend parsing.
* Introduce block preconditions for populating reopt-vcg fields.
* Change load options to match reopt's interface.
I believe this is due to an ambiguity about whether the
post-documentation comment refers to the previous constructor or just
to the last argument of the previous constructor. Moving the haddocks
to pre-doc comments seems to work better.
Haddock also cannot handle the UNPACK pragma. The pragma could probably
be removed, since -funbox-small-strict-fields is on by default, but
haddock would still fail on the strictness annotation. Moving the
haddocks to the constructor instead of the individual fields avoids
the strictness restriction.
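The placement described above can be illustrated with a small datatype. This is a hypothetical sketch (`Segment` is not a macaw type): the haddock comment is a pre-doc (`-- |`) attached to the constructor, so no post-doc (`-- ^`) sits next to the strict, unpacked fields.

```haskell
module Main (main) where

-- | Hypothetical datatype showing the workaround: documentation lives on
-- the constructor rather than on the strict/unpacked fields.
data Segment
  = -- | A segment described by its base address and size in bytes.
    Segment {-# UNPACK #-} !Int !Int
  deriving (Show)

main :: IO ()
main = print (Segment 4096 512)
```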
The goal is to support a jumptable testcase that is not supported by
the current jump bounds check. The jump bounds check needs to be
augmented so that it understands equality relationships between stack
values and registers, and bounds on both.
This patch tracks when a register points to a concrete stack offset.
As part of this, we dropped the AbsDomain instance for AbsBlockState.
Clients should now likely use `fnStartAbsBlockState` instead of `top`.
The other client-visible change is that the ClassifyFailure
constructor now has an extra argument with details about why
classification failed.
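Tracking when a register points to a concrete stack offset amounts to a small abstract interpretation. The sketch below assumes a toy instruction set (`Instr`, `AbsValue`, `step` are hypothetical, not macaw's abstract domain) and does not model memory, so stack contents and bounds are left out.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- | Either a known offset from the stack pointer at function entry, or
-- nothing known.
data AbsValue = StackOffset Integer | Unknown
  deriving (Eq, Show)

-- | Hypothetical straight-line instructions over named registers.
data Instr
  = Copy String String           -- ^ dst := src
  | AddImm String String Integer -- ^ dst := src + imm
  | Load String                  -- ^ dst := value loaded from memory

type AbsState = Map String AbsValue

-- | At entry, the stack pointer is at offset 0 from itself.
entryState :: AbsState
entryState = Map.singleton "rsp" (StackOffset 0)

lookupReg :: String -> AbsState -> AbsValue
lookupReg = Map.findWithDefault Unknown

step :: AbsState -> Instr -> AbsState
step s (Copy dst src) = Map.insert dst (lookupReg src s) s
step s (AddImm dst src k) = Map.insert dst shifted s
  where
    -- Adding a constant to a stack offset yields another stack offset.
    shifted = case lookupReg src s of
      StackOffset o -> StackOffset (o + k)
      Unknown -> Unknown
-- Memory is not modeled here, so loaded values are unknown.
step s (Load dst) = Map.insert dst Unknown s

run :: [Instr] -> AbsState
run = foldl step entryState

prologue :: [Instr]
prologue =
  [ AddImm "rsp" "rsp" (-16)
  , Copy "rbp" "rsp"
  , AddImm "rax" "rbp" 8
  , Load "rcx"
  ]

main :: IO ()
main = print (run prologue)
```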
This introduces a new datatype CValue for representing constants
in Macaw programs, modifies the existing Value datatype to use it,
and introduces patterns for compatibility with existing datatypes.
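Compatibility patterns of this kind are typically written with GHC's `PatternSynonyms` extension. The sketch below is loosely modeled on the description above and is not macaw's actual definition: the constructor names and the `RegValue` stand-in are assumptions for illustration.

```haskell
{-# LANGUAGE PatternSynonyms #-}
module Main (main) where

-- | Hypothetical constant representation.
data CValue
  = BoolCValue Bool
  | BVCValue Int Integer -- ^ width in bits, value
  deriving (Eq, Show)

-- | Values wrap constants; 'RegValue' is a stand-in for other cases.
data Value
  = CValue CValue
  | RegValue String
  deriving (Eq, Show)

-- | Old-style names, usable both for matching and for construction.
pattern BoolValue :: Bool -> Value
pattern BoolValue b = CValue (BoolCValue b)

pattern BVValue :: Int -> Integer -> Value
pattern BVValue w i = CValue (BVCValue w i)

-- Tell the exhaustiveness checker that these patterns cover 'Value'.
{-# COMPLETE BoolValue, BVValue, RegValue #-}

-- | Code written against the old constructors keeps compiling.
describe :: Value -> String
describe (BoolValue b) = "bool " ++ show b
describe (BVValue w i) = "bv" ++ show w ++ " " ++ show i
describe (RegValue r) = "reg " ++ r

main :: IO ()
main = mapM_ (putStrLn . describe) [BoolValue True, BVValue 64 42, RegValue "rax"]
```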
The patch also updates the function argument analysis to use more
explicit argument passing rather than monadic updates. The intent is
to help clarify when data is initialized rather than updated.
Finally, this updates a README and makes some minor cleanups.
The parseFetchAndExecute function in Discovery attempts to identify ParsedITE
terminal statements by examining the value of the ip_reg via
valueAsApp and pattern matching on a Mux statement. This patch adds
specific handling in the Macaw CFG Rewriter to attempt to float Mux
statements upwards (aka "Mux Head Normal Form") so that they will be
the top-most ip_reg value and therefore be recognized as a ParsedITE
terminator.
For macaw-ppc testing of the 988KB gzip binary, this increased the
number of blocks found from 1339 to 37950 (and increased the test
runtime from 1.36s to 88.14s).
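Floating a Mux upward means rewriting `op(mux c a b)` to `mux c (op a) (op b)` until any Mux is the outermost term. The sketch below uses a hypothetical expression type (`Expr`, `muxHNF` are not macaw names) with only addition, and leaves Mux conditions unnormalized.

```haskell
module Main (main) where

-- | Hypothetical expression language standing in for macaw's App terms.
data Expr
  = Const Integer
  | Var String
  | Add Expr Expr
  | Mux Expr Expr Expr -- ^ condition, then-value, else-value
  deriving (Eq, Show)

-- | Push operations below muxes so that, if a mux remains, it is the
-- outermost constructor ("Mux Head Normal Form").
muxHNF :: Expr -> Expr
muxHNF e = case e of
  Add x y -> pushAdd (muxHNF x) (muxHNF y)
  Mux c t f -> Mux c (muxHNF t) (muxHNF f)
  _ -> e
  where
    -- Both arguments are already normalized; distribute over any mux.
    pushAdd (Mux c t f) y = Mux c (pushAdd t y) (pushAdd f y)
    pushAdd x (Mux c t f) = Mux c (pushAdd x t) (pushAdd x f)
    pushAdd x y = Add x y

main :: IO ()
main = print (muxHNF (Add (Var "base") (Mux (Var "c") (Const 4) (Const 8))))
```

Distributing an operation into both branches duplicates work, which is one plausible contributor to the longer runtime reported above.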
This patch focuses on function argument analysis, but includes some
other cleanups.
The main changes are to add additional comments and cleanups to the
function argument analysis code. This also extends the analysis so
that we can annotate the types of some of the functions and use those
types during analysis.
As part of this, we tighten the PLTStub checking and clean up the
ELF loader in some minor ways.
WidthEqProofs are now irrelevant. Two proofs with the same
coercion source and destination are equal. This allows us to
add a transitivity constructor without introducing spurious
inequalities, and will in the future allow us to collapse multiple
bitcasts into a single bitcast.
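Proof irrelevance here means equality looks only at the endpoints of a proof, not at how it was built. The sketch below is an untyped simplification (the real proofs are type-indexed); `Ty`, `WidthEqProof`'s constructors, and `collapse` are hypothetical, and width agreement is assumed rather than checked.

```haskell
module Main (main) where

-- | Hypothetical stand-in for a type representation.
data Ty = BVTy Int | FloatTy Int
  deriving (Eq, Show)

-- | Hypothetical proof terms relating two same-width types.
data WidthEqProof
  = Refl Ty
  | Bitcast Ty Ty
  | Trans WidthEqProof WidthEqProof -- ^ assumes target of first = source of second
  deriving (Show)

source, target :: WidthEqProof -> Ty
source (Refl t) = t
source (Bitcast s _) = s
source (Trans p _) = source p
target (Refl t) = t
target (Bitcast _ d) = d
target (Trans _ q) = target q

-- | Proofs are equal exactly when they relate the same endpoints, so
-- adding 'Trans' introduces no spurious inequalities.
instance Eq WidthEqProof where
  p == q = source p == source q && target p == target q

-- | Collapse any chain of casts into a single bitcast (or reflexivity).
collapse :: WidthEqProof -> WidthEqProof
collapse p
  | source p == target p = Refl (source p)
  | otherwise = Bitcast (source p) (target p)

main :: IO ()
main = print (collapse (Trans (Bitcast (BVTy 32) (FloatTy 32)) (Bitcast (FloatTy 32) (BVTy 32))))
```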
adjustedLoadRegionIndex is exported for reopt.
TypeRepr now has a pretty instance.
This primarily refines the abstract state propagated to branch
pairs. It was needed on the ARM platform to support the IT blocks
with the changes to the Core representation in macaw-base 0.3.6.
This also includes a few simplifications and comment
improvements.
The caller (e.g. macaw-refinement) can provide an additional Rewriter
operation that can operate on TermStmts for blocks (typically those
for which a previous Discovery was unable to determine a transfer
target). There is an additional
entrypoint ('addDiscoveredFunctionBlockTargets') that will allow this
additional rewriter to be supplied for updating an existing
DiscoveryState.
Some TermStmts and other elements might generate new blocks as
part of the Rewrite operation (e.g. adding a new 'Branch' TermStmt), so
this change allows the rewrite context to update the generated block
labels and collect these newly generated blocks for inclusion in the
results passed to the parser.
Before, we just discarded them during the translation. They are useful metadata
for generating diagnostics in Crucible, so this commit translates them. They
are no-ops during symbolic evaluation.
To make them truly useful, they need to include the address of the block that
they belong to (their data payload in macaw is just an offset from the start of
a block). This information wasn't available before, so it has to be plumbed
through in macaw-x86.
The use of `Data.Parameterized.Map.fromList` in `mkRegStateM` was
showing up in profiling as a huge time sink. We don't actually need to
build the map from scratch there, though, since the keys are known ahead
of time. Adding an `archRegSet` variable to the `RegisterInfo` class
(with the obvious default implementation) ensures that a `MapF` with the
right keys will be built once and then reused.
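The idea of building the key set once and reusing its shape can be sketched with `Data.Map` standing in for parameterized-utils' `MapF`. `Reg`, `regSet`, `mkRegStateSlow`, and `mkRegState` are hypothetical names for illustration, not macaw's definitions.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- | Hypothetical register type for a toy architecture.
data Reg = RAX | RBX | RCX | RIP
  deriving (Eq, Ord, Show, Enum, Bounded)

-- | The key set, built once as a top-level value and then shared.
regSet :: Map Reg ()
regSet = Map.fromList [ (r, ()) | r <- [minBound .. maxBound] ]

-- | Rebuilds the whole map from a list on every call.
mkRegStateSlow :: (Reg -> a) -> Map Reg a
mkRegStateSlow f = Map.fromList [ (r, f r) | r <- [minBound .. maxBound] ]

-- | Reuses the existing tree shape and only fills in values;
-- 'mapWithKey' needs no key comparisons or rebalancing.
mkRegState :: (Reg -> a) -> Map Reg a
mkRegState f = Map.mapWithKey (\r () -> f r) regSet

main :: IO ()
main = print (mkRegState show)
```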
This mostly affects x86. Previously, we threw away the write of the return
address to the stack when identifying calls for macaw-x86. This was partly for
hygiene and partly to support the "addresses written to memory are function
pointers" heuristic. Treating the return address as a potential function
pointer breaks function identification, so excluding it is important.
The problem arises in the translation of macaw into crucible: we never write the
return address to the stack, but returns still read the return address from the
stack. Because it was never written, this read is from (potentially)
uninitialized memory, which causes errors in the symbolic simulator. There are
two solutions:
1. Make returns not read from the stack
2. Keep the write of the return address to the stack
Solution 1 is a problem, as we have a data dependency on the read. Eliding it
breaks Crucible generation later and produces an invalid CFG.
Solution 2 works well. The implementation is actually simple. We can keep
identifyCall the same for x86 and just construct the basic block not from the
return value but from the original list of statements (unaltered). We do need
to have identifyCall still give us the reduced statement list, which we use for
identifying possible function pointers written onto the stack (but not the
return address, which we do not want to treat as a function pointer).
This update renames many of the declarations exported by
Data.Macaw.Memory so that we have more consistent names.
The majority of the existing names are now exported with DEPRECATION
warnings. Some of the symbol declarations that were not used by the
Memory datatype have been moved to other modules.
The minor version of macaw-base has been incremented.
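Keeping old names exported with deprecation warnings is usually done with GHC's `DEPRECATED` pragma. The names below are hypothetical and are not the actual renamed Data.Macaw.Memory declarations.

```haskell
module Main (main, memSegmentSize, segmentSize) where

-- | Hypothetical function under its new, consistent name.
memSegmentSize :: [Int] -> Int
memSegmentSize = sum

-- | The old name is kept as an alias so existing clients still compile;
-- modules that import and use it get a warning naming the replacement.
segmentSize :: [Int] -> Int
segmentSize = memSegmentSize
{-# DEPRECATED segmentSize "Use memSegmentSize instead" #-}

main :: IO ()
main = print (memSegmentSize [1, 2, 3])
```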
This should cut down on the number of proxies/explicit type arguments
needed when dealing with these types.
Awkwardly, ArchTermStmt isn't injective, because PPC32 and PPC64 happen
to use exactly the same type. We could add an argument to that type and
then all the families could be injective.
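Injectivity annotations come from GHC's `TypeFamilyDependencies` extension. The sketch below uses hypothetical architecture tags and a hypothetical `RegFile` family; it shows why a class method mentioning `arch` only under the family needs no proxy once the family is declared injective. Mapping two tags to the same type, as PPC32 and PPC64 do for ArchTermStmt, would make GHC reject the annotation.

```haskell
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeFamilyDependencies #-}
module Main (main) where

-- | Hypothetical architecture tags.
data X86
data ARM

-- | "r -> arch" declares that the result type determines the argument.
type family RegFile arch = r | r -> arch where
  RegFile X86 = [Int]
  RegFile ARM = [Integer]

class Arch arch where
  -- | Without injectivity this signature would be ambiguous, because
  -- 'arch' appears only under a type family.
  archName :: RegFile arch -> String

instance Arch X86 where
  archName _ = "x86"

instance Arch ARM where
  archName _ = "arm"

x86Regs :: RegFile X86
x86Regs = [0, 0, 0]

main :: IO ()
main = putStrLn (archName x86Regs)
```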
The pretty-printer for Stmts takes a pretty-printer function as an
argument. This is used when a Stmt stores an offset from the beginning of a
block, since the Stmt itself has no information about that block.
An ArchState Stmt stores an ArchMemAddr, which is independent of the
block it's in. Previously we were treating the ArchMemAddr as an offset
and passing it to the pretty-printer function for offsets; in practice
this means most of them were printed as values about twice as big as
they were supposed to be.