macaw/README.org
2018-05-21 17:53:01 -07:00

68 lines
2.9 KiB
Org Mode

* Overview
The high level goal is to write and/or generate architecture-specific backends
for /macaw/ based on the semantics discovered by /semmc/. In particular, we
are interested in making /macaw-ppc/ and /macaw-arm/. We will hand-write some
of the code, but we will generate as much as possible automatically. We will
read in the semantics files generated by /semmc/ and use Template
Haskell to generate a function that transforms machine
states according to the learned semantics.
We will implement a base package (/macaw-semmc/) that provides shared
infrastructure for all of our backends; this will include the Template Haskell
function to create a state transformer function from learned semantics files.
** Repository Layout
- ~macaw-semmc~ contains the architecture-independent components of the translation from semmc semantics into macaw IR.
- ~macaw-ppc~ implements the PowerPC-specific backend of the translation.
- ~macaw-ppc-symbolic~ implements a translation of macaw IR (with PowerPC architecture-specific functions) into Crucible IR, which is suitable for symbolic execution.
- ~macaw-arm~ implements the ARM-specific backend of the translation.
** Building
The dependencies of this project that are not available on Hackage are tracked via git submodules. To build with a reasonably modern version of ~cabal~ (i.e., one that supports ~new-build~):
#+BEGIN_SRC
$ git submodule update --init
$ ln -s cabal.project.dist cabal.project
$ cabal new-configure
$ cabal new-build macaw-ppc
#+END_SRC
** Code dependencies and related packages
- macaw (binary code discovery)
- macaw-x86 (x86_64 backend for macaw)
- semmc (semantics learning and code synthesis)
- semmc-ppc (PowerPC backend for synthesis)
- dismantle-tablegen (disassembler infrastructure)
- dismantle-ppc (PowerPC disassembler)
- crucible (interface to SMT solvers)
- parameterized-utils (utilities for working with parameterized types)
* Semantics background
The /semmc/ library is designed to learn semantics for machine code
instructions. Its output, for each Instruction Set Architecture (ISA), is a
directory of files where each file contains a formula corresponding to the
semantics for an opcode in the ISA. For example, the ~ADDI.sem~ file
contains the semantics for the add immediate instruction in PowerPC.
There are functions in /semmc/ for dealing with this representation.
Formulas are loaded into a data type called ~ParameterizedFormula~, which
contains formula fragments based on the ~ExprBuilder~ representation of /crucible/. This can be thought of as a convenient representation of SMT
formulas.
* Status
This codebase is a work in progress. PowerPC support (both 32 and 64 bit) is reasonably robust. Support for ARM is ongoing.
* License
This code is made available under the BSD3 license and without any support.