This is the main repository for the Macaw binary analysis framework, which has
two key goals: binary code discovery and symbolic execution of machine code. The
framework is designed to offer extensible architecture support (i.e., library
clients can add their own architectures and opt in to only the architecture
support they need).
# Overview
The code discovery algorithm is based on forced execution and is able to
discover code from one or more entry points. Symbols are optional but can
significantly improve the quality of the results. Stripped binaries can pose a
challenge for Macaw (especially statically linked stripped binaries). Macaw provides
support for lifting discovered machine code into an IR suitable for symbolic
execution via the Crucible library.

Currently, Macaw supports:
* x86-64
* PowerPC (32 and 64 bit)
* ARM (32 bit)
* RISC-V
## Repository Structure
The Macaw libraries are:
* macaw-base -- The core architecture-independent operations and
algorithms.
* macaw-symbolic -- Library that provides symbolic simulation of Macaw
programs via Crucible.
* macaw-x86 -- Provides definitions enabling Macaw to be used on
x86-64 programs.
* macaw-x86-symbolic -- Adds Macaw-symbolic extensions needed to
support x86.
* macaw-semmc -- Contains the architecture-independent components of
the translation from semmc semantics into macaw IR. This provides
the shared infrastructure for all of our backends; this will include
the Template Haskell function to create a state transformer function
from learned semantics files provided by the _semmc_ library.
* macaw-arm -- Enables macaw for ARM (32-bit) binaries by reading the
semantics files generated by _semmc_ and using Template Haskell to
generate a function that transforms machine states according to the
learned semantics.
* macaw-arm-symbolic -- Enables macaw/crucible symbolic simulation for
ARM (32-bit) architectures.
* macaw-ppc -- Enables macaw for PPC (32-bit and 64-bit) binaries by reading the
semantics files generated by _semmc_ and using Template Haskell to
generate a function that transforms machine states according to the
learned semantics.
* macaw-ppc-symbolic -- Enables macaw/crucible symbolic simulation for
PPC architectures.
* macaw-riscv -- Enables macaw for RISC-V (RV32GC and RV64GC variants) binaries.
* macaw-refinement -- Enables additional architecture-independent
refinement of code discovery. This can enable discovery of more
functionality than is revealed by the analysis in macaw-base.
The libraries that make up Macaw are released under the BSD3 license.

These Macaw core libraries depend on a number of different supporting libraries, including:
* elf-edit -- loading and parsing of ELF binary files
* galois-dwarf -- retrieval of DWARF debugging information from binary files
* flexdis86 -- disassembly and semantics for x86 architectures
* dismantle -- disassembly for ARM and PPC architectures
* semmc -- semantics definitions for ARM and PPC architectures
* crucible -- symbolic execution and analysis
* what4 -- symbolic representation for the crucible backend
* parameterized-utils -- utilities for working with parameterized types
## Documentation
A set of high-level design documents can be found in the [`doc`](doc)
subdirectory. Documentation for individual API functions and data types can be
found in the Haddock comments throughout the code.

We have also written some other resources about Macaw:
* [Macaw: A Machine Code Toolbox for the Busy Binary
Analyst](http://www.arxiv.org/abs/2407.06375): an unpublished paper about
Macaw, as well as binary analysis tools built on top of Macaw.
* [Making a scalable, SMT-based machine code memory
model](https://galois.com/blog/2023/03/making-a-scalable-smt-based-machine-code-memory-model/):
a blog post about `macaw-symbolic`'s lazy memory model (implemented in
[`Data.Macaw.Symbolic.Memory.Lazy`](symbolic/src/Data/Macaw/Symbolic/Memory/Lazy.hs)).
# Building
## Preparation
Dependencies for building Macaw that are not obtained from Hackage are
supported via Git submodules:

    $ git submodule update --init

### Preparing Softfloat for RISC-V Backend
The RISC-V backend depends on softfloat-hs, which in turn depends on the
softfloat library. Macaw's build system will automatically build softfloat,
but the softfloat-hs repo must be recursively cloned to enable this. If you
are not building `macaw-riscv` you can skip this step. To recursively clone
softfloat-hs, run:
```shell
$ cd deps/softfloat-hs
$ git submodule update --init --recursive
```
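To check whether the recursive clone worked, one option is to look at `git submodule status`, which prefixes each uninitialized submodule with `-`. The helper below is a minimal sketch and not part of Macaw: the name `list_uninitialized_submodules` is hypothetical, and it assumes submodule paths contain no spaces.

```shell
# Hypothetical helper (not part of Macaw): reads `git submodule status`
# output on stdin and prints the path of every uninitialized submodule.
# Git marks such entries with a leading "-" before the commit hash.
list_uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}
```

For example, from `deps/softfloat-hs`, running `git submodule status --recursive | list_uninitialized_submodules` should print nothing once every nested submodule has been initialized.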
## Building with Cabal
The Macaw libraries can be built individually or collectively with Cabal:

    $ ln -s cabal.project.dist cabal.project
    $ cabal configure
    $ cabal build all

To build a single library, either specify that library name instead of
`all`, or change to that library's subdirectory before building:

    $ cabal build macaw-refinement

or

    $ cd refinement
    $ cabal build

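Note that a plain `ln -s` fails if `cabal.project` already exists, for example when the setup steps are run a second time. The following is a minimal sketch of a repeatable version of that step; the function name `ensure_cabal_project` is hypothetical and not part of Macaw.

```shell
# Hypothetical helper (not part of Macaw): create the cabal.project symlink
# only if nothing named cabal.project exists yet, so reruns do not fail.
ensure_cabal_project() {
  if [ -e cabal.project ] || [ -L cabal.project ]; then
    echo "cabal.project already exists; leaving it unchanged"
  else
    ln -s cabal.project.dist cabal.project
  fi
}
```

Run `ensure_cabal_project` from the repository root before `cabal configure`. The sketch deliberately never overwrites an existing `cabal.project`, so local edits to that file are preserved.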
# Notes on Freeze Files
We use the `cabal.project.freeze.ghc-*` files to constrain dependency versions
in CI. To build with a known-working set of Hackage dependencies:
```
ln -s cabal.project.freeze.ghc-<VER> cabal.project.freeze
```
These freeze files were generated using the `scripts/regenerate-freeze-files.sh` script.
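
Picking the freeze file that matches the installed compiler can be scripted. The sketch below assumes the freeze files are named `cabal.project.freeze.ghc-<VER>`, following the `cabal.project.freeze.ghc-*` pattern mentioned above, with `<VER>` being the full GHC version string. The function name `link_freeze_file` is hypothetical and not part of Macaw.

```shell
# Hypothetical helper (not part of Macaw): link the freeze file for the
# given GHC version, assuming files named cabal.project.freeze.ghc-<VER>.
link_freeze_file() {
  freeze_ver="$1"
  freeze_file="cabal.project.freeze.ghc-${freeze_ver}"
  if [ ! -f "$freeze_file" ]; then
    echo "no freeze file found for GHC ${freeze_ver}" >&2
    return 1
  fi
  # -f replaces an existing cabal.project.freeze link.
  ln -sf "$freeze_file" cabal.project.freeze
}
```

A typical call from the repository root would be `link_freeze_file "$(ghc --numeric-version)"`. If no matching file exists, the function reports this on stderr and leaves any existing `cabal.project.freeze` untouched.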
Note that at present, these configuration files assume a Unix-like operating
system, as we do not currently test Windows on CI. If you would like to use
these configuration files on Windows, you will need to make some manual changes
to remove certain packages and flags:
```
regex-posix
tasty +unix
unix
unix-compat
```
Note that if any of the macaw packages fail to build *without* the freeze files,
it is a bug in the dependency version bounds specified in the `.cabal` files
that should be reported (https://github.com/GaloisInc/macaw/issues).
# License
This code is made available under the BSD3 license and without any support.