Open source binary analysis tools.
Tristan Ravitch ff80d7e676 Improve the TH generator for instruction matchers (i.e., execInstruction)
The previous generator put all of the code for each matcher in a single large
case expression.  While there were individual functions broken out for each case
body, they were all still in the same let expression, which created a huge term.

This refactoring lifts all of the semantics definition bodies to the top
level (with NOINLINE pragmas) to give the code generator less to chew on at a
time.

This improves compile times a little, but, more importantly, works around a bug
in the register allocator in GHC 8.4 that caused a crash in the PowerPC
semantics functions.
2018-07-26 17:17:09 -07:00
doc Documentation updates 2018-05-21 17:53:01 -07:00
macaw-arm Improve the TH generator for instruction matchers (i.e., execInstruction) 2018-07-26 17:17:09 -07:00
macaw-ppc Improve the TH generator for instruction matchers (i.e., execInstruction) 2018-07-26 17:17:09 -07:00
macaw-ppc-symbolic Update submodules 2018-07-24 16:57:36 -07:00
macaw-semmc Improve the TH generator for instruction matchers (i.e., execInstruction) 2018-07-26 17:17:09 -07:00
submodules Update submodules 2018-07-24 16:57:36 -07:00
.gitignore Add generated output files and editor backup files to gitignore. 2017-12-20 10:13:31 -08:00
.gitmodules Submodule updates 2018-05-21 17:53:10 -07:00
cabal.project.dist Update for crucible reorganization and new what4 module. 2018-05-18 08:33:58 -07:00
README.org Documentation updates 2018-05-21 17:53:01 -07:00
stack.yaml fixed stack and used Data.Functor.Product instead of hand-rolled type 2018-05-24 14:09:49 -07:00

Overview

The high-level goal is to write or generate architecture-specific backends for macaw based on the semantics discovered by semmc. In particular, we are interested in building macaw-ppc and macaw-arm. We will hand-write some of the code, but we will generate as much as possible automatically. We will read in the semantics files generated by semmc and use Template Haskell to generate a function that transforms machine states according to the learned semantics.

We will implement a base package (macaw-semmc) that provides shared infrastructure for all of our backends; this will include the Template Haskell function to create a state transformer function from learned semantics files.
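To make the idea of a "state transformer built from semantics" concrete, here is a minimal sketch in plain Haskell. It is not the macaw-semmc implementation: the project generates code at compile time with Template Haskell, whereas this sketch interprets a description at run time. The names MachineState, Expr, Semantics, eval, and toTransformer are all hypothetical and were chosen only for illustration.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word32)

-- Hypothetical, simplified machine state: register name -> 32-bit value.
type MachineState = Map.Map String Word32

-- Hypothetical expression language standing in for a learned formula.
data Expr
  = Reg String      -- value of a register in the input state
  | Lit Word32      -- constant
  | Add Expr Expr   -- wrapping 32-bit addition

-- One semantics entry: each listed register receives the value of its expression.
newtype Semantics = Semantics [(String, Expr)]

-- Evaluate an expression against a state; unknown registers read as 0.
eval :: MachineState -> Expr -> Word32
eval st (Reg r)   = Map.findWithDefault 0 r st
eval _  (Lit n)   = n
eval st (Add a b) = eval st a + eval st b

-- Turn a semantics description into a state transformer. Every right-hand
-- side is evaluated against the input state, so the updates are simultaneous.
-- Map.union is left-biased, so the freshly computed values win.
toTransformer :: Semantics -> MachineState -> MachineState
toTransformer (Semantics defs) st =
  Map.union (Map.fromList [ (r, eval st e) | (r, e) <- defs ]) st
```

For example, applying `toTransformer (Semantics [("r3", Add (Reg "r4") (Lit 16))])` to a state where r4 holds 7 yields a state where r3 holds 23 and r4 is unchanged.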

Repository Layout

  • macaw-semmc contains the architecture-independent components of the translation from semmc semantics into macaw IR.
  • macaw-ppc implements the PowerPC-specific backend of the translation.
  • macaw-ppc-symbolic implements a translation of macaw IR (with PowerPC architecture-specific functions) into Crucible IR, which is suitable for symbolic execution.
  • macaw-arm implements the ARM-specific backend of the translation.

Building

The dependencies of this project that are not available on Hackage are tracked via git submodules. To build with a reasonably modern version of cabal (i.e., one that supports new-build):

 $ git submodule update --init
 $ ln -s cabal.project.dist cabal.project
 $ cabal new-configure
 $ cabal new-build macaw-ppc

Code dependencies and related packages

  • macaw (binary code discovery)
  • macaw-x86 (x86_64 backend for macaw)
  • semmc (semantics learning and code synthesis)
  • semmc-ppc (PowerPC backend for synthesis)
  • dismantle-tablegen (disassembler infrastructure)
  • dismantle-ppc (PowerPC disassembler)
  • crucible (interface to SMT solvers)
  • parameterized-utils (utilities for working with parameterized types)

Semantics background

The semmc library is designed to learn semantics for machine code instructions. Its output, for each Instruction Set Architecture (ISA), is a directory of files where each file contains a formula corresponding to the semantics for an opcode in the ISA. For example, the ADDI.sem file contains the semantics for the add immediate instruction in PowerPC.

There are functions in semmc for dealing with this representation. Formulas are loaded into a data type called ParameterizedFormula, which contains formula fragments based on the ExprBuilder representation of crucible. This can be thought of as a convenient representation of SMT formulas.
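As a rough illustration of what "parameterized" means here, the sketch below models a formula whose operands are formal names that get replaced by the concrete operands of a decoded instruction. This is a toy under stated assumptions, not semmc's ParameterizedFormula type or API: Operand, Term, ToyFormula, substTerm, and instantiate are hypothetical names, and the terms are a tiny stand-in for real SMT-style expressions.

```haskell
-- A concrete operand taken from a decoded instruction (hypothetical).
data Operand = OpReg String | OpImm Integer
  deriving (Eq, Show)

-- Formula terms. 'Param' names a formal operand that is unknown until
-- the formula is instantiated for a particular instruction.
data Term
  = Param String
  | RegVal String
  | Const Integer
  | Plus Term Term
  deriving (Eq, Show)

-- A toy parameterized formula: formal operand names, plus one definition
-- per written location as (formal naming the destination, new value).
data ToyFormula = ToyFormula
  { formals :: [String]
  , defs    :: [(String, Term)]
  }

-- Replace every formal parameter in a term with its concrete operand.
substTerm :: [(String, Operand)] -> Term -> Either String Term
substTerm env (Param p) = case lookup p env of
  Just (OpReg r) -> Right (RegVal r)
  Just (OpImm n) -> Right (Const n)
  Nothing        -> Left ("unbound operand: " ++ p)
substTerm env (Plus a b) = Plus <$> substTerm env a <*> substTerm env b
substTerm _   t          = Right t

-- Instantiate a formula with operands given in the order of 'formals'.
-- The result maps each concrete destination register to its new value.
instantiate :: ToyFormula -> [Operand] -> Either String [(String, Term)]
instantiate f ops
  | length ops /= length (formals f) = Left "operand count mismatch"
  | otherwise = mapM one (defs f)
  where
    env = zip (formals f) ops
    one (dst, rhs) = case lookup dst env of
      Just (OpReg r) -> (,) r <$> substTerm env rhs
      _              -> Left ("destination is not a register operand: " ++ dst)
```

A formula loosely modeled on an add-immediate could be written as `ToyFormula ["rT", "rA", "imm"] [("rT", Plus (Param "rA") (Param "imm"))]`; this ignores the details of the real PowerPC instruction. Instantiating it with `[OpReg "r3", OpReg "r4", OpImm 16]` gives a definition of r3 as `Plus (RegVal "r4") (Const 16)`.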

Status

This codebase is a work in progress. PowerPC support (both 32-bit and 64-bit) is reasonably robust. Support for ARM is ongoing.

License

This code is made available under the BSD3 license and without any support.