These names are so short that they might conflict with local names in
files that import this module. Instead, export them from `Regs` and
encourage qualified use of that module.
Previously, the `macaw-x86` semantics for `call` would retrieve the call target
*after* pushing the next instruction's address to the stack, but if the call
target involves the stack pointer, then this would mean that it would get the
next instruction's address when retrieving the call target. This is not what is
intended!
This patch fixes the issue by always retrieving the call target *before*
pushing the next instruction's address to the stack. I have added a test case
to the `macaw-x86-symbolic` test suite which demonstrates that this fix works
as intended.
Fixes#420.
When converting a Macaw value with the Macaw type `TupleType [x_1, ..., x_n]`
to Crucible, the resulting Crucible value will have the Crucible type
`StructType (EmptyCtx ::> ToCrucibleType x_n ::> ... ::> ToCrucibleType x_1)`.
(See `macawListToCrucible(M)` in `Data.Macaw.Symbolic.PersistentState` for
where this is implemented.) Note that the order of the tuple's fields is
reversed in the process of converting it to a Crucible struct. This is a
convention that one must keep in mind when dealing with Macaw tuples at the
Crucible level.
As it turns out, the part of `macaw-x86-symbolic` reponsible for interpreting
the semantics of the `idiv` instruction (for signed quotient/remainder) and the
`div` instruction (for unsigned quotient/remainder) were _not_ respecting this
convention. This is because the `macaw-x86-symbolic` semantics were returning a
Crucible struct consisting of `Empty :> quotient :> remainder)`, but at the
Macaw level, this was interpreted as the tuple `(remainder, quotient)`, which
is the opposite of the intended order. This led to subtle bugs such as those
observed in #393.
The solution is straightforward: have the `macaw-x86-symbolic` semantics
compute `Empty :> remainder :> quotient` instead. Somewhat counterintuitive,
but it does work.
Fixes#393.
This extends `Data.Macaw.Symbolic.Testing` in `macaw-symbolic` to be able to
handle binaries that depend on shared libraries. This is fully functional for
the x86-64 and AArch32 symbolic backends, and I have added test cases to the
respective repos demonstrating that it works. (The PowerPC backend is not yet
supported. At a minimum, this is blocked on GaloisInc/elf-edit#35.)
To implement this, I also needed to add some additional infrastructure to
`macaw-base` (I put this infrastructure here as it doesn't depend on any
Crucible-specific functionality):
* `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: a basic ELF dynamic
loader that performs a breadth-first search over all `DT_NEEDED` entries
that an ELF binary depends on (both directly and indirectly).
* `Data.Macaw.Memory.ElfLoader.PLTStubs`: a collection of heuristics for
detecting the addresses of PLT stubs in a dynamically linked binary.
It is worth noting that shared libraries are rife with nuance and subtlety,
and the way `macaw` models shared libraries is not 100% accurate. I have
written a length `Note [Shared libraries]` in `Data.Macaw.Symbolic.Testing`
to describe where corners had to be cut.
Fixes#318.
GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of
previously unnoticed warnings in `macaw`. This patch fixes them, mostly by
adding explicit fall-through cases.
The tail call classifier came after the jump classifier, which was a problem because it is less strict than the tail call classifier, meaning it would always fire. This commit moves direct jump to be the last classifier applied, giving the others a chance.
Includes a test case in the ARM backend.
This requires some updates to some of the expected test results, as a few blocks are now classified as tail calls that were
plain jumps before. They really could be considered either. I think it would be nice if these could be classified as jumps instead, but the reason they are flagged as tail calls is mostly down to the fact that their surrounding context is so simple that either interpretation works.
Correcting this would require some heuristics based on additional analysis passes.
The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before. Adding support required a bit of plumbing, including a more extensive code discovery pass.
Fixes#285
This mostly deals with the splitting of the old `sym` type into
two: one for dealing with expression creation, and a new simulator
backend type for dealing with control-flow and assertions.
See the writeup in Crucible.hs in this commit for details. In short, the recent
changes to generalize `PtrAdd` triggered a failing proof obligation due to a use
of `llvmPointer_bv`. The new implementation is as sound as the previous one,
but more general.
Fixes#260
See `Note [Sign-extending immediate operands in push]` in
`Data.Macaw.X86.Semantics` for the full story. I have also added a test case
in `macaw-x86-symbolic` which ensures that the stack-pointer-decrementing
logic behaves as one would expect.
Bumps in the `flexdis86` submodule to bring in GaloisInc/flexdis86#37.
Fixes#235.
This change adds an optional argument to `genArchVals` that allows client code to override the backend translation behavior of `MacawArchStmtExtension`s on a statement-by-statement basis. The new argument has type `Maybe (MacawArchStmtExtensionOverride arch)`, where `MacawArchStmtExtensionOverride` is a function that takes a statement and a crucible state, and returns an optional tuple containing the value produced by the statement, as well as an updated state. Returning 'Nothing' indicates that the backend should use its default handler for the statement.
Client code that wishes to maintain the existing default behavior in all cases can simply pass `Nothing` for the new argument to `genArchVals`.
GaloisInc/crucible#794 increases the number of functions that use
implicit `MemOptions`, including a handful of key LLVM memory model–related
functions. As a result, many parts of `macaw` need to add implicit `?memOpts`
parameters to accommodate to this change.
The new test suites cover x86_64, PowerPC, and ARM. They test that the semantics are actually correct (rather than just seeing if symbolic execution produces any result). The `Data.Macaw.Symbolic.Testing` module in macaw-symbolic provides some common utilities for symbolic execution engine setup, while there are tailored test harnesses for each architecture.
The semantics of the test harnesses are documented in each architecture test suite, but they:
1. Discover all of the test binaries (which are generated from the included makefiles)
2. Treat each function whose name begins with `test_` as a test entry point
3. Symbolically executes each test case with fully symbolic register states
4. Extracts the return value after symbolic execution, which is treated as the predicate to an assertion that must be proved
- If the test case is in the `pass` subdirectory, it is proved and expected to hold
- If the test case is in the `fail` subdirectory, it is proved and expected to not hold.
Each test harness supports two options for debugging:
- Dumping generated SMT queries
- Dumping generated Macaw IR for inspection
This testing uncovered a bug in the (previously untested) macaw-aarch32-symbolic code. It required a number of submodule updates to:
- Adapt to some what4 changes
- Fix a bug in the LLVM memory model that lets these tests pass
- Adapt to changes to some crucible APIs
This change also modifies the CI configuration to install SMT solvers earlier (which are now needed for all of the symbolic package tests).
There are two major changes:
- The interface to memory models in Data.Macaw.Symbolic has changed
- The suggested implementation in Data.Macaw.Symbolic.Memory has changed
The change improves performance and fixes a soundness bug.
* `macawExtensions` (Data.Macaw.Symbolic) takes a new argument: a `MkGlobalPointerValidityPred`. Use `mkGlobalPointerValidityPred` to provide one.
* `mapRegionPointers` no longer takes a default pointer argument (delete it at call sites)
* `GlobalMap` returns an `LLVMPtr sym w` instead of a `Maybe (LLVMPtr sym w)`
Users of the suggested memory model do not need to worry about the last change,
as it has been migrated. If you provided your own address mapping function, it
must now be total. This is annoying, but the old API was unsound because
macaw-symbolic did not have enough information to correctly handle the `Nothing`
case. The idea of the change is that the mapping function should translate any
concrete pointers as appropriate, while symbolic pointers should generate a mux
over all possible allocations. Unfortunately, macaw-symbolic does not have
enough information to generate the mux itself, as there may be allocations
created externally.
This interface and implementation is concerned with handling pointers to static
memory in a binary. These are distinguished from pointers to
dynamically-allocated or stack memory because many machine code instructions
compute bitvectors and treat them as pointers. In the LLVM memory model used by
macaw-symbolic, each memory allocation has a block identifier (a natural
number). The stack and each heap allocation get unique block identifiers.
However, pointers to static memory have no block identifier and must be mapped
to a block in order to fit into the LLVM memory model.
The previous implementation assigned each mapped memory segment in a binary to
its own LLVM memory allocation. This had the nice property of implicitly
proving that no memory access was touching unmapped memory. Unfortunately, it
was especially inefficient in the presence of symbolic reads and writes, as it
generated mux trees over all possible allocations and significantly slowed
symbolic execution.
The new memory model implementation (in Data.Macaw.Symbolic.Memory) instead uses
a single allocation for all static allocations. This pushes more of the logic
for resolving reads and writes into the SMT solver and the theory of arrays. In
cases where sufficient constraints exist in path conditions, this means that we
can support symbolic reads and writes. Additionally, since we have only a
single SMT array backing all allocations, mapping bitvectors to LLVM pointers in
the memory model is trivial: we just change their block identifier from zero
(denoting a bitvector) to the block identifier of the unique allocation backing
static data.
This change has to do some extra work to ensure safety (especially that unmapped
memory is never written to or read from). This is handled with the
MkGlobalPointerValidityPred interface in Data.Macaw.Symbolic. This function,
which is passed to the macaw-symbolic initialization, constructs well-formedness
predicates for all pointers used to access memory. Symbolic execution tasks
that do not need to enforce this property can simply provide a function that
never returns any predicates to check. Implementations that want a reasonable
default can use the mkGlobalPointerValidityPred from Data.Macaw.Symbolic.Memory.
The default implementation ensures that no reads or writes touch unmapped memory
and that writes to static data never write to read-only segments.
This change also converts the examples in macaw-symbolic haddocks to use doctest
to ensure that they do not get out of date. These are checked as part of CI.
The changes include:
Clean up elf loading to fix a bug in rel addend parsing.
Introduce block preconditions for populating reopt-vcg fields.
Change load options to match reopt's interface.