The bug arose in the handling of `StackOffsetAbsVal`, which tracks an abstraction
of references relative to the stack pointer. The offsets in `StackOffsetAbsVal`
are `Int64`; they are signed because references can be both above and below the
stack pointer. The code constructing new values of this type was incorrectly
zero-extending new offsets instead of sign-extending them. This did not matter
on 64-bit architectures, as it happened to result in the same values. However,
it substantially corrupted the abstract stack on PowerPC 32. It did not seem to
affect AArch32, but that is likely just luck: the code the compiler generates
there does not require this level of precision in the abstract stack.
The resulting errors manifested in the `absEvalCall` function. Because of the
lack of sign extension in `StackOffsetAbsVal`s, the current stack pointer looked
like a huge number, which caused *all* stack entries to be dropped after
function calls.
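To make the difference concrete, here is a standalone sketch using only `base` (it is not macaw's `StackOffsetAbsVal` code). It widens the 32-bit encoding of the offset -16 to `Int64` both ways. The names `zeroExtend32`, `signExtend32`, and `minus16` are illustrative, and `dropBelow` is a hypothetical, much-simplified model of pruning stack entries after a call: it keeps only entries at or above the current stack pointer offset.

```haskell
import Data.Int (Int32, Int64)
import Data.Word (Word32)

-- Zero extension: treat the 32-bit pattern as unsigned, then widen.
-- This corresponds to the buggy behaviour.
zeroExtend32 :: Word32 -> Int64
zeroExtend32 = fromIntegral

-- Sign extension: reinterpret the pattern as a signed Int32 first, so the
-- top bit is replicated when widening to 64 bits.
signExtend32 :: Word32 -> Int64
signExtend32 w = fromIntegral (fromIntegral w :: Int32)

-- The 32-bit encoding of the offset -16 (e.g. sp - 16), i.e. 0xFFFFFFF0.
minus16 :: Word32
minus16 = fromIntegral (-16 :: Int32)

-- Hypothetical model of post-call pruning: keep only the abstract stack
-- entries whose offset is at or above the current stack pointer offset.
dropBelow :: Int64 -> [(Int64, String)] -> [(Int64, String)]
dropBelow sp = filter (\(off, _) -> off >= sp)
```

With the sign-extended stack pointer (-16), entries at offsets -8 and -16 survive `dropBelow`. With the zero-extended one (4294967280), every entry is dropped, which matches the symptom described above.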
This fix simplifies the stack offset abstract value computation substantially
and ensures that offsets are sign-extended correctly. The commit adds a
PowerPC 32 test case that only passes with this fix.
GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of
previously unnoticed warnings in `macaw`. This patch fixes them, mostly by
adding explicit fall-through cases.
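For readers unfamiliar with the warning, here is a minimal standalone illustration. The function names are hypothetical and this is not code from `macaw`. A pattern binding such as `let (x : _) = xs` matches only one constructor, so GHC 9.2's `-Wall` flags it. Adding an explicit fall-through case silences the warning and makes the failure mode explicit.

```haskell
-- Before (warns with GHC 9.2's -Wall, because [] is not matched):
--   firstEntry xs = let (x : _) = xs in x

-- After: an explicit fall-through case handles the missing constructor.
firstEntry :: [a] -> a
firstEntry xs = case xs of
  (x : _) -> x
  []      -> error "firstEntry: unexpected empty list"

-- A total alternative when a sensible default exists.
firstEntryOr :: a -> [a] -> a
firstEntryOr def xs = case xs of
  (x : _) -> x
  []      -> def
```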
GHC 9.2 adds `-Wnoncanonical-monad-instances` to `-Wall`, which warns whenever
one has explicit implementations of `return` or `(>>)` that aren't simply
`return = pure` or `(>>) = (*>)`. Since `base-4.11`, these are the default
implementations of `return` and `(>>)`, so the simplest way to fix the
warnings is to remove all explicit definitions of `return` and `(>>)` and
rely on the defaults, which this patch does.
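The shape of an instance that satisfies the warning can be sketched with a toy state monad. This is a hypothetical example, not a type from `macaw`. The `Monad` instance defines only `(>>=)`, so `return` and `(>>)` fall back to the defaults.

```haskell
-- A toy state monad whose Monad instance defines only (>>=).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s ->
    let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s ->
    let (a, s') = g s in runState (k a) s'
  -- No `return` or (>>) here: the defaults `return = pure` and
  -- `(>>) = (*>)` are used, so -Wnoncanonical-monad-instances stays quiet.

-- Increment the counter held in the state.
tick :: State Int ()
tick = State $ \n -> ((), n + 1)
```

Running `tick >> tick >> return ()` from state `0` ends in state `2`, using the default `(>>)` and `return`.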
In `base-4.16.*`, `Nat` is now a type synonym for `Natural`, and `GHC.TypeLits`
now re-exports `Natural`. This causes a `-Wunused-imports` warning in
`macaw-base`. I fixed the warning by tightening up the imports slightly.
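The exact imports changed are not listed here. Assuming the module imported `GHC.TypeLits` wholesale alongside `Numeric.Natural`, one way to tighten such imports is sketched below. `widthOf` is a hypothetical helper, not a `macaw` function.

```haskell
{-# LANGUAGE DataKinds #-}

import Data.Proxy (Proxy (..))
-- An explicit import list: on base-4.16 and later, an open
-- `import GHC.TypeLits` would also bring Natural into scope and can make the
-- Numeric.Natural import below redundant.
import GHC.TypeLits (KnownNat, natVal)
import Numeric.Natural (Natural)

-- Reflect a type-level width as a term-level Natural.
widthOf :: KnownNat n => proxy n -> Natural
widthOf p = fromInteger (natVal p)
```

For example, `widthOf (Proxy :: Proxy 32)` evaluates to `32` on GHC versions both before and after the `Nat`/`Natural` unification.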
The core of macaw cannot represent conditional calls: the existing block terminators are not sufficiently expressive, and it does not support creating synthetic blocks to represent control flow that is not directly tied to machine addresses.
To work around this, we introduce ARM-specific block terminators for conditional calls and plumb them through to macaw-aarch32-symbolic.
Fixes #288
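A conditional call, such as a conditional `BL` on AArch32, ends a block in either a call or a fall-through, depending on a runtime condition. The deliberately simplified, concrete model below shows this. `Term`, `Outcome`, and `step` are hypothetical and are not macaw's terminator types. In macaw the condition would be symbolic rather than a `Bool`.

```haskell
type Addr = Integer

data Term
  = Jump Addr                -- unconditional jump
  | Call Addr Addr           -- call target, return address
  | CondCall Bool Addr Addr  -- condition, call target, return/fall-through address
  deriving Show

-- Where execution continues and, for taken calls, the link-register value.
data Outcome = Outcome { nextPC :: Addr, linkReg :: Maybe Addr }
  deriving (Eq, Show)

step :: Term -> Outcome
step (Jump tgt)               = Outcome tgt Nothing
step (Call tgt ret)           = Outcome tgt (Just ret)
step (CondCall True  tgt ret) = Outcome tgt (Just ret)  -- behaves like a call
step (CondCall False _   ret) = Outcome ret Nothing     -- falls through
```

Neither `Jump` nor `Call` alone captures both outcomes of `CondCall`, which is why a dedicated terminator is needed.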
The tail call classifier came after the jump classifier, which was a problem because the jump classifier is less strict than the tail call classifier and would therefore always fire first. This commit moves the direct jump classifier to be the last classifier applied, giving the others a chance to match.
Includes a test case in the ARM backend.
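The ordering problem is easy to reproduce with a first-match-wins list of classifiers. This is a hypothetical, simplified model: `Block`, its fields, and the classifier definitions are illustrative and do not reflect macaw's API. A permissive classifier placed early in the list hides every classifier after it.

```haskell
import Data.Foldable (asum)

data Class = TailCall | DirectJump
  deriving (Eq, Show)

-- Facts a classifier might inspect about a block ending in a jump.
data Block = Block
  { targetIsFunctionEntry :: Bool
  , stackRestored         :: Bool
  }

type Classifier = Block -> Maybe Class

-- Strict: only fires when the jump looks like a tail call.
tailCallClassifier :: Classifier
tailCallClassifier b
  | targetIsFunctionEntry b && stackRestored b = Just TailCall
  | otherwise                                  = Nothing

-- Permissive: accepts any jump, so it must be tried last.
directJumpClassifier :: Classifier
directJumpClassifier _ = Just DirectJump

-- Try classifiers in order; the first one that returns Just wins.
classify :: [Classifier] -> Block -> Maybe Class
classify cs b = asum (map ($ b) cs)
```

With `[directJumpClassifier, tailCallClassifier]`, every block is a `DirectJump`. Reversing the order lets `tailCallClassifier` match first and still falls back to `DirectJump` otherwise.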
This requires updates to some of the expected test results, as a few blocks that were plain jumps before are now classified as tail calls. They could reasonably be considered either. I think it would be nice if these could be classified as jumps instead, but they are flagged as tail calls mostly because their surrounding context is so simple that either interpretation works.
Correcting this would require heuristics based on additional analysis passes.
The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before. Adding support required a bit of plumbing, including a more extensive code discovery pass.
Fixes #285
When a user overrides a system call on an architecture that supports returning two values from a system call, and provides a context containing the result of the system call in the form
```
empty :> v0 :> v1
```
macaw will perform the register assignment
```
r0 := v1
r1 := v0
```
This change reverses this behavior so that the assignment becomes
```
r0 := v0
r1 := v1
```
This brings the expected ordering of the result context in agreement
with the left-to-right ordering of the argument context:
```
empty :> arg1 :> arg2 :> ...
```
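The change in ordering can be modelled with a small snoc-list whose `:>` reads like the contexts above. This is a hypothetical stand-in, not the parameterized-utils `Assignment` type or macaw's assignment code, and it models only the symptom: the old behaviour put the rightmost result in `r0`, while the new behaviour assigns left to right.

```haskell
-- Snoc-list standing in for a context, so `Empty :> v0 :> v1` parses the same way.
data Snoc a = Empty | Snoc a :> a
infixl 5 :>

-- Elements in left-to-right order.
toListL :: Snoc a -> [a]
toListL Empty     = []
toListL (xs :> x) = toListL xs ++ [x]

resultRegs :: [String]
resultRegs = ["r0", "r1"]

-- Old behaviour: the rightmost element ended up in r0.
assignOld :: Snoc a -> [(String, a)]
assignOld ctx = zip resultRegs (reverse (toListL ctx))

-- New behaviour: left to right, matching the argument context.
assignNew :: Snoc a -> [(String, a)]
assignNew ctx = zip resultRegs (toListL ctx)
```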
* remove/generalize MacawBlockEnd from CFG slicing
* expose functions in symbolic backend
* hide bvLit from Backend import
* add CI version to workflow
This bumps the `elf-edit` submodule to bring in the changes from
https://github.com/GaloisInc/elf-edit/pull/29, which adds an additional
`VersionDefMap` argument to `elf-edit` to make it aware of version definitions.
This requires some changes to the API in `Data.Macaw.Memory.ElfLoader` to
accommodate.
This would come in handy for an application where I wish to pass a
`NonEmptyVector` to `newMergedGlobalMemoryWith`. Currently, I have to convert
the `NonEmptyVector` to a `NonEmpty` list to accomplish this, which seems
wasteful given that `newMergedGlobalMemoryWith` only needs to use the
`Foldable` interface.
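The benefit of a `Foldable` constraint can be seen in a small standalone function. `totalSize` is a hypothetical stand-in, not `newMergedGlobalMemoryWith`. Because it only traverses its input, it accepts lists, `NonEmpty`, and any other `Foldable` container directly, with no intermediate conversion.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Only the Foldable interface is needed to traverse the input.
totalSize :: Foldable t => t Int -> Int
totalSize = sum
```

For example, `totalSize [1, 2, 3]` and `totalSize (1 :| [2, 3])` both give `6` without first converting one container into the other.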
The return address gets masked and has the low-bit set in an obtuse way due to
the semantics. This threw off the call detection.
This change matches against the quirky pattern.
This change adds support for RV32GC RISCV binaries. Specifically, it:
* Updates the return matcher to recognize returns in 32-bit binaries
* Updates detection of unsupported binaries to allow RV32GC binaries
* Adds RV32GC versions of the RV64GC tests
This change adds a function `newMergedGlobalMemoryWith`, which acts like
`newGlobalMemoryWith` but takes a list of macaw memories and merges them
into a flat address space. This aids in reasoning about dynamically linked
programs.
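The commit does not describe how addresses are assigned when memories are merged. The toy illustration below shows one thing "merging into a flat address space" can mean: placing images one after another at aligned, non-overlapping bases. It is a hypothetical model and is not presented as macaw's algorithm. `alignUp`, `layoutBases`, and `toFlat` are illustrative names.

```haskell
-- Round x up to the next multiple of align.
alignUp :: Integer -> Integer -> Integer
alignUp align x = ((x + align - 1) `div` align) * align

-- Base address assigned to each image (given only its size), in order.
layoutBases :: Integer -> [Integer] -> [Integer]
layoutBases align sizes =
  take (length sizes) (scanl (\base size -> alignUp align (base + size)) 0 sizes)

-- Translate (image index, offset within image) to a flat address.
toFlat :: [Integer] -> (Int, Integer) -> Integer
toFlat bases (i, off) = bases !! i + off
```

With a 4 KiB alignment, images of sizes `0x1800`, `0x10`, and `0x2000` land at bases `0`, `0x2000`, and `0x3000`.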
Before, the API provided by macaw-symbolic asserted the initial value of each byte of memory individually. This was fairly expensive for large binaries, as each such assertion flushed the solver pipe.
This change generates a large conjunction of assertions and sends them all at once. In unscientific testing, this saved half an hour on a large binary.
API Changes:
- Note that it introduces a minor API change. The optimization required that the `sym` parameter be concretely an `ExprBuilder`.
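The batching idea can be sketched with toy types. `Expr`, `assertEach`, and `assertBatched` are hypothetical stand-ins, not What4 or macaw-symbolic definitions. Each element of the returned list models one message sent to the solver, and therefore one flush of the pipe.

```haskell
-- Toy stand-in for solver terms: "byte at address has value", conjunction, true.
data Expr = ByteEq Integer Int | And Expr Expr | T
  deriving (Eq, Show)

-- One assertion per byte: n messages for n bytes.
assertEach :: [(Integer, Int)] -> [Expr]
assertEach bytes = [ByteEq addr val | (addr, val) <- bytes]

-- One big conjunction: a single message regardless of the number of bytes.
assertBatched :: [(Integer, Int)] -> [Expr]
assertBatched bytes = [conjoin (assertEach bytes)]

conjoin :: [Expr] -> Expr
conjoin [] = T
conjoin es = foldr1 And es
```

For three bytes, `assertEach` produces three messages, while `assertBatched` produces one containing `And` of the three per-byte terms.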
* riscv: added grift as submodule
* added macaw-riscv project
* make arch polykinded everywhere in macaw base
* stubbed out riscv_info
* update grift
* started on RISCVReg
* started on RISCVReg
* RegisterInfo instance for RISCVReg (a few unimplemented fields)
* filled out archRegSet
* filled out withArchConstraints, archAddrWidth, and archEndianness
* added Arch module
* RISCV initialBlockRegs
* preliminary work on disassembleFn
* wip: disassembleFn
* made things more lens-y
* wip: disassemble instruction
* finished disassembly of grift assignment statements
* separated out DisInstM into separate module
* disassembly wip
* finished disassembleBlock
* Finished riscvDisassembleFn
* bump grift submodule
* made macaw discovery poly-kinded
* added risc-v test suite
* added risc-v test suite
* fixed macaw semantics to hardwire x0 to constant value 0
* added riscvPreserveReg based on assembler's manual
* riscvDemandContext
* successfully disassembled a block!
* enhanced tests to allow optional entry point spec
* stubbed out identifyCall
* stubbed out identifyReturn
* passing initial test
* added checkForReturnAddr stub
* fleshed out identifyCall and identifyReturn
* update grift submodule
* bug fix and exception handling
* added EXC register, which tracks whether or not we've attempted to
read from/write to any system registers.
* Replaced custom CSR type with GRIFT's (but we're not using it
currently)
* added better show instance for GPRs (we should migrate this to a
GRIFT pretty printer at some point)
* Fixed a vicious bug in the semantics; unsigned and signed LT were
getting swapped in translation
* added pattern synonyms for GPRs
* improved docs and fixed RISCVReg bug (GP was 3 instead of 4)
* changed undefineds to errors
* changed RISCV class to RISCVConstraints
* wrapped GRIFT's "RV" parameter in a type to remove the need to make
macaw architecture parameter polykinded
* rolled back all changes to macaw base that made things poly-kinded
* reverted two more macaw core changes, updated license, removed old PPC test
* macaw: update to upstream changes in bv-sized and grift
* address code review comments
* macaw-riscv: expose fewer modules
* Update RISCVTermStmt definition
* Update riscv_info. macaw-riscv now builds against master
* Update bv-sized and cabal freeze files
* Update cabal freeze files with satisfying lens version
* Get tests building
* Fix printf runtime error
* Add simpler tests
* Change RISCV target version and update grift pointer
[skip ci]
* Compressed branch test passes
[skip ci]
* Add additional small tests
[skip ci]
* Introduce a syscall PrimFn
* Syscalls now correctly classified
* Fix return regs from syscall
* Extract syscall arguments
* Update expected riscv test results
* Add macaw-riscv build + test to CI
* Get building with GHC 9.0.2
* Revert "Update cabal freeze files with satisfying lens version"
This reverts commit 4aa95c19c3.
* Install softfloat in CI
* Update Grift
* Some initial cleanup
* More cleanup
* Resolve FIXME on getReg
* Detect and only accept rv64gc rvreprs
* Address Tristan's PR comments
* Update Grift pointer
* Add info on installing Softfloat to README for macaw-riscv
* Add missing submodule step to softfloat build instructions
Co-authored-by: Ben Selfridge <benselfridge@000279.local>
Co-authored-by: Valentin Robert <val@galois.com>
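The "unsigned and signed LT were getting swapped" bug is easy to underestimate, because the two comparisons agree on small positive values. The standalone sketch below follows the RISC-V specification of RV32 `slt` and `sltu`, not the GRIFT or macaw translation code. Both instructions compare the same register bits, as signed and unsigned values respectively, and write 1 or 0.

```haskell
import Data.Int (Int32)
import Data.Word (Word32)

-- slt: signed comparison of the two 32-bit register values.
-- sltu: unsigned comparison of the same bits.
slt, sltu :: Word32 -> Word32 -> Word32
slt a b  = if (fromIntegral a :: Int32) < fromIntegral b then 1 else 0
sltu a b = if a < b then 1 else 0
```

For `0xFFFFFFFF` versus `1`, `slt` yields `1` (since -1 < 1), but `sltu` yields `0`, so swapping them silently inverts such comparisons.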
* Translate PLTStubs as tail calls
This change modifies `addMacawParsedTermStmt` to translate `PLTStub`s as
tail calls.
* Replace CR.Call + CR.Return with CR.TailCall
* Add support for standalone PIEs
This changeset adds support for standalone position independent
executables (PIEs) that do not make use of procedure linkage tables. It
does so by adding relative address support to `populateSegmentChunk` and
adding an additional simplification rule for Aarch32.
This covers part of the work for #234.
* Remove NoOp + replace mult with left shift
This mostly deals with the splitting of the old `sym` type into
two: one for dealing with expression creation, and a new simulator
backend type for dealing with control-flow and assertions.
See the writeup in Crucible.hs in this commit for details. In short, the recent
changes to generalize `PtrAdd` triggered a failing proof obligation due to a use
of `llvmPointer_bv`. The new implementation is as sound as the previous one,
but more general.
Fixes #260
GHC 9.0 uncovered this type family as being unused. (See
https://gitlab.haskell.org/ghc/ghc/-/issues/18470, which made
`-Wunused-top-binds` more clever about detecting unused, closed
type families like `FloatInfoFromSSEType`.) Let's remove it to avoid
an `-Wunused-top-binds` warning.
While pre-9.0 versions of GHC would silently turn negative right shifts into
left shifts, GHC 9.0 will throw an `arithmetic overflow` exception instead.
This patch makes this behavior explicit in `macaw-semmc` to allow the code to
work on GHC 9.0.
Fixes #212.
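Making the behaviour explicit amounts to branching on the sign of the shift amount before calling `shiftR` or `shiftL`. The helper below, `shiftRightSigned`, is hypothetical and is not the `macaw-semmc` code. It is a minimal sketch of that idea.

```haskell
import Data.Bits (shiftL, shiftR)

-- Right shift by a possibly negative amount, treating a negative amount as a
-- left shift instead of passing it to shiftR (which throws on GHC 9.0).
shiftRightSigned :: Integer -> Int -> Integer
shiftRightSigned x n
  | n >= 0    = x `shiftR` n
  | otherwise = shiftL x (negate n)
```

`Data.Bits.shift`, which treats a negative amount as a right shift, would also work (as `shift x (negate n)`). The guarded form keeps the intent visible at the call site.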
This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:
* GHC 9.0 implements simplified subsumption (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
In most cases, adapting to this is a matter of manually eta expanding
definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
`evalExt` had to be made more specific to adapt to the loss of contravariance
when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
sooner (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
groups with `$(return [])` Template Haskell splices. The previous locations
of these splices made it so that the TH-generated instances in those packages
were not available to any code before the splice, resulting in type errors
when compiled with GHC 9.0.
To overcome this, I rearranged the TH-generated instances so that they appear
before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
later, so this patch also updates the lower bounds on `base` to `>= 4.9` in
the appropriate `.cabal` files. (I'm fairly certain that this requirement was
already present implicitly, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
allow them to build with GHC 9.0. The `llvm-pretty` and
`llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
raised to allow it to build with GHC 9.0.
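Two of the adaptations above can be shown in a standalone snippet. The names are hypothetical and are not taken from `macaw`. Under simplified subsumption, a function whose `forall` sits to the right of an arrow can no longer be used directly at an instantiated type, but an eta-expanded definition is accepted. Separately, writing kinds with `Data.Kind.Type` instead of `*` avoids `-Wstar-is-type`.

```haskell
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- The forall sits to the right of an arrow.
constId :: Int -> forall a. a -> a
constId _ = id

-- Rejected under GHC 9.0's simplified subsumption (no deep instantiation):
--   useConstId :: Int -> Bool -> Bool
--   useConstId = constId
-- Accepted: eta expansion lets instantiation happen at the application.
useConstId :: Int -> Bool -> Bool
useConstId n = constId n

-- Kind annotation written with Type rather than `*`.
newtype Box (a :: Type) = Box { unBox :: a }
```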
There were two identical definitions of `toCrucibleEndian`, one in
`D.M.S.Memory` and another in `D.M.S.Testing`. This commit removes the
latter in favor of the former, which is actually exported.