This is the main repository for the Macaw binary analysis framework with two key goals: binary code discovery and symbolic execution of machine code. This framework is implemented to offer extensible support for architectures (i.e., library clients can add their own architectures and opt in to the architecture support they need).
Overview
The code discovery algorithm is based on forced execution and can discover code from one or more entry points. Symbols are optional but can significantly improve the quality of the results. Stripped binaries can pose a challenge for macaw (especially statically linked stripped binaries). Macaw provides support for lifting discovered machine code into an IR suitable for symbolic execution via the Crucible library.
Currently, macaw supports:
- x86-64
- PowerPC (32 and 64 bit)
- ARM (32 bit)
- RISC-V
Repository Structure
The Macaw libraries are:
- macaw-base -- The core architecture-independent operations and algorithms.
- macaw-symbolic -- Library that provides symbolic simulation of Macaw programs via Crucible.
- macaw-x86 -- Provides definitions enabling Macaw to be used on X86_64 programs.
- macaw-x86-symbolic -- Adds Macaw-symbolic extensions needed to support x86.
- macaw-semmc -- Contains the architecture-independent components of the translation from semmc semantics into macaw IR. This provides the shared infrastructure for all of our backends; this will include the Template Haskell function to create a state transformer function from learned semantics files provided by the semmc library.
- macaw-arm -- Enables macaw for ARM (32-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
- macaw-arm-symbolic -- Enables macaw/crucible symbolic simulation for ARM (32-bit) architectures.
- macaw-ppc -- Enables macaw for PPC (32-bit and 64-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
- macaw-ppc-symbolic -- Enables macaw/crucible symbolic simulation for PPC architectures.
- macaw-riscv -- Enables macaw for RISC-V (RV32GC and RV64GC variants) binaries.
- macaw-refinement -- Enables additional architecture-independent refinement of code discovery. This can enable discovery of more functionality than is revealed by the analysis in macaw-base.
The libraries that make up Macaw are released under the BSD license.
These Macaw core libraries depend on a number of different supporting libraries, including:
- elf-edit -- loading and parsing of ELF binary files
- galois-dwarf -- retrieval of Dwarf debugging information from binary files
- flexdis86 -- disassembly and semantics for x86 architectures
- dismantle -- disassembly for ARM and PPC architectures
- semmc -- semantics definitions for ARM and PPC architectures
- crucible -- Symbolic execution and analysis
- what4 -- Symbolic representation for the crucible backend
- parameterized-utils -- utilities for working with parameterized types
Documentation
A set of high-level design documents can be found in the doc
subdirectory. Documentation for individual API functions and data types can be
found in the Haddock comments throughout the code.
We have also written some other resources about Macaw:
- Macaw: A Machine Code Toolbox for the Busy Binary Analyst: an unpublished paper about Macaw, as well as binary analysis tools built on top of Macaw.
- Making a scalable, SMT-based machine code memory model: a blog post about macaw-symbolic's lazy memory model (implemented in Data.Macaw.Symbolic.Memory.Lazy).
Building
Preparation
Dependencies for building Macaw that are not obtained from Hackage are supported via Git submodules:
$ git submodule update --init
Preparing Softfloat for RISC-V Backend
The RISC-V backend depends on softfloat-hs, which in turn depends on the
softfloat library. Macaw's build system will automatically build softfloat,
but the softfloat-hs repo must be recursively cloned to enable this. If you
are not building macaw-riscv, you can skip this step. To recursively clone
softfloat-hs, run:
$ cd deps/softfloat-hs
$ git submodule update --init --recursive
Building with Cabal
The Macaw libraries can be individually built or collectively built with Cabal:
$ ln -s cabal.project.dist cabal.project
$ cabal configure
$ cabal build all
To build a single library, either specify that library name instead of all,
or change to that library's subdirectory before building:
$ cabal build macaw-refinement
or
$ cd refinement
$ cabal build
Notes on Freeze Files
We use the cabal.project.freeze.ghc-* files to constrain dependency versions
in CI. To build with a known-working set of Hackage dependencies, link the
freeze file that matches your GHC version:
$ ln -s cabal.project.freeze.ghc-<VER> cabal.project.freeze
These freeze files were generated using the scripts/regenerate-freeze-files.sh
script.
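As a minimal sketch, the freeze file for the installed compiler could be selected automatically. The helper names freeze_file_for and use_freeze_file are hypothetical and not part of the repository; the sketch assumes it is run from the repository root and that a freeze file exists for the exact GHC version reported by ghc --numeric-version.

```shell
#!/bin/sh
# Hypothetical helper: map a GHC version string to the freeze file name
# following the cabal.project.freeze.ghc-* naming pattern.
freeze_file_for() {
  printf 'cabal.project.freeze.ghc-%s\n' "$1"
}

# Hypothetical helper: link the freeze file for the given GHC version
# as cabal.project.freeze, or fail if no such file is present.
use_freeze_file() {
  f=$(freeze_file_for "$1")
  if [ -f "$f" ]; then
    ln -sf "$f" cabal.project.freeze
  else
    echo "no freeze file for GHC $1" >&2
    return 1
  fi
}

# Usage (from the repository root):
#   use_freeze_file "$(ghc --numeric-version)"
```

If no freeze file matches the installed GHC exactly, the helper fails instead of guessing a nearby version, since a freeze file generated for one compiler may pin packages that do not build with another.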
Note that at present, these configuration files assume a Unix-like operating
system, as we do not currently test Windows on CI. If you would like to use
these configuration files on Windows, you will need to make some manual changes
to remove certain packages and flags:
regex-posix
tasty +unix
unix
unix-compat
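Before editing a freeze file by hand, it may help to list the lines that mention these packages. The following is a sketch only: the helper name unix_only_lines is hypothetical, and the pattern assumes the usual cabal freeze layout of one constraint per line (for example "any.unix ==..." or "tasty +unix"). It only prints candidate lines with their line numbers and does not modify the file.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the lines of a freeze file
# that mention regex-posix, unix, unix-compat, or the "tasty +unix" flag.
# Packages that merely start with "unix-" (e.g. unix-time) are not matched.
unix_only_lines() {
  grep -n -E \
    '(^|[[:space:]])(any\.)?(regex-posix|unix-compat|unix)[[:space:]]|tasty[[:space:]]+\+unix' \
    "$1"
}

# Usage:
#   unix_only_lines cabal.project.freeze
```

When removing the reported lines, check that the remaining constraints list is still well-formed, in particular that the final entry does not end with a trailing comma.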
Note that if any of the macaw packages fail to build without the freeze files,
it is a bug in the dependency version bounds specified in the .cabal files
that should be reported (https://github.com/GaloisInc/macaw/issues).
License
This code is made available under the BSD3 license and without any support.