* Updates to the bv-sized branch of what4, among other changes
* Removes the parameterized-utils submodule completely
* Updates submodules
* Fixes macaw-symbolic w.r.t. crucible-llvm changes
Co-authored-by: Ben Selfridge <ben@000548-benselfridge.local>
symbolic: Add some documentation on pointer operations
Their behavior is not entirely obvious, so hopefully this should be useful to
someone in the future.
This commit updates macaw-refinement to work with the latest macaw/crucible and makes a few improvements along the way.
The major changes involved in this are:
* Block labels were removed from macaw, so we had to come up with an alternative approach to making synthetic blocks to represent dispatch resolved by macaw-refinement that is not really a jump table. We considered adding a new terminator that encoded "computed IP-based dispatch", but there was concern about the impact on client code. Instead, we added a field to the `DiscoveryFunInfo` that records "external" resolutions to indirect control flow (e.g., those found by an SMT solver in macaw-refinement). The hook by which we feed SMT-based resolutions back into macaw was modified accordingly (`addDiscoveredFunctionBlockTargets`).
* Solver invocation changed to allow solver selection and parallel solver application.
* Logging is now done via the `lumberjack` library.
* macaw-symbolic now uses the "external" resolutions in `DiscoveryFunInfo` while building crucible CFGs.
* The path creation code in macaw-refinement was simplified significantly and the approach to path creation has been documented.
* The run-refinement tool is now more featureful.
* The test suite is a bit more structured and no longer depends on the printed output of the discovery process.
Updates for GHC 8.8
The two main classes of update are related to MonadFail and type alias expansion.
The MonadFail updates introduce explicit MonadFail instances and backward-compatible `fail` implementations under `Monad` for older GHC versions.
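The MonadFail pattern described above can be sketched as follows. This is a minimal illustration, not code from macaw: the `Parser` type, `item`, and `expectA` are hypothetical names invented for the example, and the CPP guard assumes that `fail` stopped being a `Monad` method in GHC 8.8.

```haskell
{-# LANGUAGE CPP #-}
import qualified Control.Monad.Fail as Fail

-- Hypothetical parser type, used only to illustrate the instance layout.
newtype Parser a = Parser { runParser :: String -> Either String (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Left e          -> Left e
    Right (a, rest) -> Right (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Left e          -> Left e
    Right (f, rest) -> case pa rest of
      Left e           -> Left e
      Right (a, rest') -> Right (f a, rest')

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Left e          -> Left e
    Right (a, rest) -> runParser (k a) rest
#if __GLASGOW_HASKELL__ < 808
  -- Older compilers still have `fail` in Monad; delegate to MonadFail.
  fail = Fail.fail
#endif

-- The explicit instance that newer compilers require.
instance Fail.MonadFail Parser where
  fail msg = Parser $ \_ -> Left msg

-- Consume one character, failing at end of input.
item :: Parser Char
item = Parser $ \s -> case s of
  []     -> Left "unexpected end of input"
  (c:cs) -> Right (c, cs)

-- The failable pattern below is what makes the MonadFail instance necessary.
expectA :: Parser ()
expectA = do
  'a' <- item
  return ()
```

With this layout, the same source compiles on both sides of the GHC 8.8 boundary, and a failed pattern match in `do` notation is routed through the `MonadFail` instance either way.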
The type alias expansion rules changed in GHC 8.8 in a way that breaks the `Simple Lens` idiom; instead, we have to use `Lens'`. The lens library has supported this alias since version 3.8, which was released in 2013.
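As a rough illustration of the `Lens'` spelling, the sketch below defines local stand-ins for the lens library's aliases so that it needs only base. The `Config` record, its fields, and the `view` and `set` helpers are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const(..))
import Data.Functor.Identity (Identity(..))

-- Local stand-ins for the aliases exported by the lens library.
type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t
type Lens' s a = Lens s s a a

-- Hypothetical record used only for illustration.
data Config = Config { _verbosity :: Int, _name :: String }
  deriving (Eq, Show)

-- Under the older idiom this signature would have read
-- `Simple Lens Config Int`; `Lens' Config Int` is the replacement.
verbosity :: Lens' Config Int
verbosity f c = fmap (\v -> c { _verbosity = v }) (f (_verbosity c))

-- Minimal getter and setter built directly on the lens representation.
view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l v = runIdentity . l (const (Identity v))
```

Only the type signature changes when migrating; the body of a lens definition such as `verbosity` stays the same.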
This change includes necessary submodule updates, as well as the update for the split of what4 into its own repository.
There are two major changes:
- The interface to memory models in Data.Macaw.Symbolic has changed
- The suggested implementation in Data.Macaw.Symbolic.Memory has changed
The change improves performance and fixes a soundness bug.
* `macawExtensions` (Data.Macaw.Symbolic) takes a new argument: a `MkGlobalPointerValidityPred`. Use `mkGlobalPointerValidityPred` to provide one.
* `mapRegionPointers` no longer takes a default pointer argument (delete it at call sites)
* `GlobalMap` returns an `LLVMPtr sym w` instead of a `Maybe (LLVMPtr sym w)`
Users of the suggested memory model do not need to worry about the last change,
as it has been migrated. If you provided your own address mapping function, it
must now be total. This is annoying, but the old API was unsound because
macaw-symbolic did not have enough information to correctly handle the `Nothing`
case. The idea of the change is that the mapping function should translate any
concrete pointers as appropriate, while symbolic pointers should generate a mux
over all possible allocations. Unfortunately, macaw-symbolic does not have
enough information to generate the mux itself, as there may be allocations
created externally.
This interface and implementation are concerned with handling pointers to static
memory in a binary. These are distinguished from pointers to
dynamically-allocated or stack memory because many machine code instructions
compute bitvectors and treat them as pointers. In the LLVM memory model used by
macaw-symbolic, each memory allocation has a block identifier (a natural
number). The stack and each heap allocation get unique block identifiers.
However, pointers to static memory have no block identifier and must be mapped
to a block in order to fit into the LLVM memory model.
The previous implementation assigned each mapped memory segment in a binary to
its own LLVM memory allocation. This had the nice property of implicitly
proving that no memory access was touching unmapped memory. Unfortunately, it
was especially inefficient in the presence of symbolic reads and writes, as it
generated mux trees over all possible allocations and significantly slowed
symbolic execution.
The new memory model implementation (in Data.Macaw.Symbolic.Memory) instead uses
a single allocation for all static allocations. This pushes more of the logic
for resolving reads and writes into the SMT solver and the theory of arrays. In
cases where sufficient constraints exist in path conditions, this means that we
can support symbolic reads and writes. Additionally, since we have only a
single SMT array backing all allocations, mapping bitvectors to LLVM pointers in
the memory model is trivial: we just change their block identifier from zero
(denoting a bitvector) to the block identifier of the unique allocation backing
static data.
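The block-identifier rewrite described above can be modeled with a toy, fully concrete pointer type. This is only a sketch of the idea: `Ptr`, `isBitvector`, and `toStaticPtr` are hypothetical names rather than the macaw-symbolic or crucible-llvm API, and leaving pointers with a nonzero block untouched is an assumption made for the example. The real implementation works over symbolic values.

```haskell
import Data.Word (Word64)
import Numeric.Natural (Natural)

-- Hypothetical stand-in for an LLVM-style pointer: block identifier plus offset.
data Ptr = Ptr { ptrBlock :: Natural, ptrOffset :: Word64 }
  deriving (Eq, Show)

-- Block identifier zero denotes a raw bitvector rather than a real pointer.
isBitvector :: Ptr -> Bool
isBitvector p = ptrBlock p == 0

-- Map a bitvector into the single allocation backing static data by
-- swapping in that allocation's block identifier. The offset is unchanged.
-- Values that already carry a block identifier are returned as-is
-- (an assumption of this sketch).
toStaticPtr :: Natural -> Ptr -> Ptr
toStaticPtr staticBlock p
  | isBitvector p = p { ptrBlock = staticBlock }
  | otherwise     = p
```

Note that the function is total: there is no `Nothing` case to handle, which mirrors the `GlobalMap` change to return a pointer directly.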
This change has to do some extra work to ensure safety (especially that unmapped
memory is never written to or read from). This is handled with the
MkGlobalPointerValidityPred interface in Data.Macaw.Symbolic. This function,
which is passed to the macaw-symbolic initialization, constructs well-formedness
predicates for all pointers used to access memory. Symbolic execution tasks
that do not need to enforce this property can simply provide a function that
never returns any predicates to check. Implementations that want a reasonable
default can use the mkGlobalPointerValidityPred from Data.Macaw.Symbolic.Memory.
The default implementation ensures that no reads or writes touch unmapped memory
and that writes to static data never write to read-only segments.
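The two properties enforced by the default implementation can be illustrated with a concrete check. The sketch below is not the `mkGlobalPointerValidityPred` implementation, which builds symbolic predicates; `Segment`, `AccessKind`, `covers`, and `validAccess` are hypothetical names, and addresses are plain machine words.

```haskell
import Data.Word (Word64)

-- Hypothetical description of a mapped segment of static memory.
data Segment = Segment
  { segBase     :: Word64
  , segSize     :: Word64
  , segWritable :: Bool
  }

data AccessKind = Read | Write
  deriving (Eq, Show)

-- True when the access [addr, addr + len) lies entirely inside the segment.
-- The comparisons are ordered so that none of the subtractions can wrap.
covers :: Segment -> Word64 -> Word64 -> Bool
covers seg addr len =
  addr >= segBase seg
    && len <= segSize seg
    && addr - segBase seg <= segSize seg - len

-- An access is valid if some mapped segment contains it and, for writes,
-- that segment is not read-only.
validAccess :: [Segment] -> AccessKind -> Word64 -> Word64 -> Bool
validAccess segs kind addr len = any ok segs
  where
    ok seg = covers seg addr len && (kind == Read || segWritable seg)
```

A client that does not need these guarantees corresponds to a check that always succeeds, which is the "never returns any predicates" option mentioned above.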
This change also converts the examples in macaw-symbolic haddocks to use doctest
to ensure that they do not get out of date. These are checked as part of CI.
The only real code change required is that simulation failure messages have an
extra argument. The goal with this update is to pull in some fixes to the
solver feature detection for yices in the latest crucible.
This version constructs a Crucible CFG for a collection of blocks while
preserving control flow between them. It allows the caller to specify blocks
that are considered "terminal": those blocks return the current register state.
Control flow to blocks not included in the "slice" is directed to synthetic
blocks that assume False in order to stop the symbolic simulator from exploring
those branches.
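The redirection of out-of-slice control flow can be sketched over a plain adjacency list. This models only the edge rewriting, not Crucible CFG construction or the handling of terminal blocks; `BlockId`, `Target`, and `sliceEdges` are hypothetical names.

```haskell
type BlockId = Int

-- Where an edge leads after slicing: either a real block in the slice, or a
-- synthetic block (standing in for the original target) that assumes False.
data Target = ToBlock BlockId | AssumeFalse BlockId
  deriving (Eq, Show)

-- Keep only the blocks in the slice, and redirect every edge that leaves
-- the slice to a synthetic assume-False target.
sliceEdges :: [BlockId] -> [(BlockId, [BlockId])] -> [(BlockId, [Target])]
sliceEdges slice cfg =
  [ (b, map redirect succs) | (b, succs) <- cfg, b `elem` slice ]
  where
    redirect s
      | s `elem` slice = ToBlock s
      | otherwise      = AssumeFalse s
```

For example, slicing blocks 1 and 2 out of a graph in which block 1 branches to 2 and 3 keeps the edge to 2 and turns the edge to 3 into an assume-False target, so the simulator never explores past it.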
The goal is to support a jumptable testcase that is not supported by
the current jump bounds check. The jump bounds check needs to be
augmented so that it understands equality relationships between stack
values and registers, and bounds on both.
This patch tracks when a register points to a concrete stack offset.
As part of this, we dropped the AbsDomain instance for AbsBlockState.
Clients should now likely use `fnStartAbsBlockState` in lieu of `top`.
The other client-visible change is that the ClassifyFailure
constructor now has an extra argument with details about why
classification failed.
Most of the interface functions took a map from addresses to segments; however, this map
was never actually used in macaw-symbolic.
The migration for this change is simply to remove the unused parameter from all
call sites in client code.