This is the main repository for the Macaw binary analysis framework, which has two key
goals: binary code discovery and symbolic execution of machine code. The
framework is designed to offer extensible support for architectures (i.e.,
library clients can add their own architectures and opt in to only the architecture
support they need).

# Overview

The code discovery algorithm is based on forced execution and can
discover code from one or more entry points. Symbols are optional but can
significantly improve the quality of the results. Stripped binaries can pose a
challenge for macaw (especially statically linked stripped binaries). Macaw provides
support for lifting discovered machine code into an IR suitable for symbolic
execution via the Crucible library.

Currently, macaw supports:

* x86-64
* PowerPC (32 and 64 bit)
* ARM (32 bit)
* RISC-V

## Repository Structure

The Macaw libraries are:

* macaw-base -- The core architecture-independent operations and
  algorithms.
* macaw-symbolic -- Library that provides symbolic simulation of Macaw
  programs via Crucible.
* macaw-x86 -- Provides definitions enabling Macaw to be used on
  x86-64 programs.
* macaw-x86-symbolic -- Adds the macaw-symbolic extensions needed to
  support x86.
* macaw-semmc -- Contains the architecture-independent components of
  the translation from semmc semantics into macaw IR. This provides
  the shared infrastructure for all of our backends, including
  the Template Haskell function to create a state transformer function
  from learned semantics files provided by the _semmc_ library.
* macaw-arm -- Enables macaw for ARM (32-bit) binaries by reading the
  semantics files generated by _semmc_ and using Template Haskell to
  generate a function that transforms machine states according to the
  learned semantics.
* macaw-arm-symbolic -- Enables macaw/crucible symbolic simulation for
  ARM (32-bit) architectures.
* macaw-ppc -- Enables macaw for PPC (32-bit and 64-bit) binaries by reading the
  semantics files generated by _semmc_ and using Template Haskell to
  generate a function that transforms machine states according to the
  learned semantics.
* macaw-ppc-symbolic -- Enables macaw/crucible symbolic simulation for
  PPC architectures.
* macaw-riscv -- Enables macaw for RISC-V (RV32GC and RV64GC variants) binaries.
* macaw-refinement -- Enables additional architecture-independent
  refinement of code discovery. This can enable discovery of more
  functionality than is revealed by the analysis in macaw-base.

The libraries that make up Macaw are released under the BSD3 license.

These Macaw core libraries depend on a number of supporting libraries, including:

* elf-edit -- loading and parsing of ELF binary files
* galois-dwarf -- retrieval of DWARF debugging information from binary files
* flexdis86 -- disassembly and semantics for x86 architectures
* dismantle -- disassembly for ARM and PPC architectures
* semmc -- semantics definitions for ARM and PPC architectures
* crucible -- symbolic execution and analysis
* what4 -- symbolic representation for the crucible backend
* parameterized-utils -- utilities for working with parameterized types

## Documentation

A set of high-level design documents can be found in the [`doc`](doc)
subdirectory. Documentation for individual API functions and data types can be
found in the Haddock comments throughout the code.

We have also written some other resources about Macaw:

* [Macaw: A Machine Code Toolbox for the Busy Binary
  Analyst](http://www.arxiv.org/abs/2407.06375): an unpublished paper about
  Macaw, as well as binary analysis tools built on top of Macaw.
* [Making a scalable, SMT-based machine code memory
  model](https://galois.com/blog/2023/03/making-a-scalable-smt-based-machine-code-memory-model/):
  a blog post about `macaw-symbolic`'s lazy memory model (implemented in
  [`Data.Macaw.Symbolic.Memory.Lazy`](symbolic/src/Data/Macaw/Symbolic/Memory/Lazy.hs)).

# Building

## Preparation

Dependencies for building Macaw that are not obtained from Hackage are
provided as Git submodules:

    $ git submodule update --init

### Preparing Softfloat for RISC-V Backend

The RISC-V backend depends on softfloat-hs, which in turn depends on the
softfloat library. Macaw's build system will automatically build softfloat,
but the softfloat-hs repo must be recursively cloned to enable this. If you
are not building `macaw-riscv`, you can skip this step. To recursively clone
softfloat-hs, run:

```shell
$ cd deps/softfloat-hs
$ git submodule update --init --recursive
```

## Building with Cabal

The Macaw libraries can be built individually or collectively with Cabal:

    $ ln -s cabal.project.dist cabal.project
    $ cabal configure
    $ cabal build all

To build a single library, either specify that library's name instead of
`all`, or change to that library's subdirectory before building:

    $ cabal build macaw-refinement

or

    $ cd refinement
    $ cabal build

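The `ln -s` step fails if `cabal.project` already exists, which makes the sequence awkward to re-run. The following is a minimal sketch of a re-runnable wrapper, not part of the repository: the function names `ensure_project_link` and `build_macaw` are hypothetical, and the sketch assumes it is run from the repository root with submodules already initialized. It leaves out the `cabal configure` step.

```shell
# Create the cabal.project symlink only if nothing by that name exists yet,
# so the setup step can be repeated safely. (Hypothetical helper.)
ensure_project_link() {
    if [ ! -e cabal.project ] && [ ! -L cabal.project ]; then
        ln -s cabal.project.dist cabal.project
    fi
}

# Build the given targets, defaulting to every package in the project.
# (Hypothetical helper; it only wraps the commands shown in this section.)
build_macaw() {
    ensure_project_link
    if [ "$#" -eq 0 ]; then
        set -- all
    fi
    cabal build "$@"
}
```

With these definitions sourced into a shell, `build_macaw` builds everything and `build_macaw macaw-refinement` builds a single library.
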
# Notes on Freeze Files

We use the `cabal.project.freeze.ghc-*` files to constrain dependency versions
in CI. To build with a known-working set of Hackage dependencies, link the
freeze file that matches your GHC version:

```
ln -s cabal.project.freeze.ghc-<VER> cabal.project.freeze
```
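
Picking the freeze file by hand can be avoided by asking the installed compiler for its version. This is a sketch under assumptions: it takes the `cabal.project.freeze.ghc-*` naming mentioned above to mean `cabal.project.freeze.ghc-<VER>` with the full GHC version number, and the function names `freeze_file_for` and `link_freeze_file` are hypothetical.

```shell
# Print the freeze file name for a given GHC version.
# Assumes the cabal.project.freeze.ghc-<VER> naming scheme. (Hypothetical helper.)
freeze_file_for() {
    printf 'cabal.project.freeze.ghc-%s\n' "$1"
}

# Link the freeze file that matches the installed GHC, if one exists.
# (Hypothetical helper; run from the repository root.)
link_freeze_file() {
    ver=$(ghc --numeric-version) || return 1
    file=$(freeze_file_for "$ver")
    if [ -f "$file" ]; then
        # -f replaces an existing cabal.project.freeze link.
        ln -sf "$file" cabal.project.freeze
    else
        echo "no freeze file found for GHC $ver" >&2
        return 1
    fi
}
```

If no freeze file exists for the installed compiler, `link_freeze_file` reports that and returns a non-zero status instead of creating a dangling link.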

These freeze files were generated using the `scripts/regenerate-freeze-files.sh` script.
Note that at present, these configuration files assume a Unix-like operating
system, as we do not currently test Windows on CI. If you would like to use
these configuration files on Windows, you will need to make some manual changes
to remove certain packages and flags:

```
regex-posix
tasty +unix
unix
unix-compat
```
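
The manual edit can be approximated with a filter. This is only a sketch: `strip_unix_entries` is a hypothetical helper, and it assumes the usual freeze-file layout of one constraint per line (for example `any.unix ==<version>,` or `tasty +unix,`). It deletes whole lines, so the result should be checked by hand if one of the removed entries sits on the opening `constraints:` line, is the last entry in the list, or if `tasty` carries other flags on the same line.

```shell
# Read a freeze file on stdin and drop the Unix-only entries listed above.
# (Hypothetical helper, not part of the repository.)
strip_unix_entries() {
    # First pass: constraints on the regex-posix, unix and unix-compat packages,
    # with or without the "any." prefix. Names such as unix-time are kept.
    # Second pass: the line that enables the +unix flag on tasty.
    grep -v -E '(^|[[:space:]])(any\.)?(regex-posix|unix|unix-compat)[[:space:]]' \
      | grep -v -E '(^|[[:space:]])tasty[[:space:]]+\+unix'
}
```

A possible use is `strip_unix_entries < "$freeze_file" > cabal.project.freeze`, where `$freeze_file` stands for whichever freeze file you selected.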

Note that if any of the macaw packages fail to build *without* the freeze files,
that indicates a bug in the dependency version bounds specified in the `.cabal` files.
Please report it at https://github.com/GaloisInc/macaw/issues.

# License

This code is made available under the BSD3 license and without any support.