Open source binary analysis tools.
Ryan Scott a6ff58f473 macaw-x86-symbolic: Fix idiv/div semantics
When converting a Macaw value with the Macaw type `TupleType [x_1, ..., x_n]`
to Crucible, the resulting Crucible value will have the Crucible type
`StructType (EmptyCtx ::> ToCrucibleType x_n ::> ... ::> ToCrucibleType x_1)`.
(See `macawListToCrucible(M)` in `Data.Macaw.Symbolic.PersistentState` for
where this is implemented.) Note that the order of the tuple's fields is
reversed in the process of converting it to a Crucible struct. This is a
convention that one must keep in mind when dealing with Macaw tuples at the
Crucible level.

As it turns out, the part of `macaw-x86-symbolic` responsible for interpreting
the semantics of the `idiv` instruction (for signed quotient/remainder) and the
`div` instruction (for unsigned quotient/remainder) was _not_ respecting this
convention. This is because the `macaw-x86-symbolic` semantics were returning a
Crucible struct consisting of `Empty :> quotient :> remainder`, but at the
Macaw level, this was interpreted as the tuple `(remainder, quotient)`, which
is the opposite of the intended order. This led to subtle bugs such as those
observed in #393.

The solution is straightforward: have the `macaw-x86-symbolic` semantics
compute `Empty :> remainder :> quotient` instead. Somewhat counterintuitive,
but it does work.

Fixes #393.
2024-07-12 16:56:51 -04:00
.github CI: Regenerate freeze files using latest GHC minor versions 2024-06-13 04:25:12 -04:00
base parse subroutine type declaration formal parameters (#382) 2024-05-22 10:47:16 -07:00
deps Bump submodules to allow building with what4-1.6.* 2024-06-13 04:25:12 -04:00
doc Add some additional documentation (#303) 2022-08-09 18:40:55 -07:00
macaw-aarch32 macaw-{aarch32,ppc}: Remove vestigial InstructionAtUnmappedAddr error types 2023-11-14 13:03:52 -05:00
macaw-aarch32-symbolic aarch32-symbolic: Export AArch32Exception 2024-01-24 11:03:37 -05:00
macaw-ppc macaw-{aarch32,ppc}: Remove vestigial InstructionAtUnmappedAddr error types 2023-11-14 13:03:52 -05:00
macaw-ppc-symbolic macaw-symbolic: Test both memory model configurations in test suites 2023-03-14 13:27:07 -04:00
macaw-riscv Fix -Wincomplete-uni-patterns warnings 2022-05-31 15:50:48 -04:00
macaw-semmc Bump submodules to allow building with what4-1.6.* 2024-06-13 04:25:12 -04:00
refinement Fix refinement panic caused by duplicate cfg edges 2024-05-10 15:40:11 -07:00
scripts Add a script for regenerating CI freeze files 2022-06-30 13:44:35 -07:00
symbolic Bump submodules to allow building with what4-1.6.* 2024-06-13 04:25:12 -04:00
symbolic-syntax Bump Crucible submodule, adapt to crucible-syntax changes 2023-12-08 09:46:20 -05:00
utils/compare-dwarfdump Cleanup compare-dwarfdump; bump submodules. 2021-05-26 07:25:51 -07:00
x86 Add semantics for prefetch instructions. (#365) 2024-01-10 11:31:36 -08:00
x86_symbolic macaw-x86-symbolic: Fix idiv/div semantics 2024-07-12 16:56:51 -04:00
.gitignore Fix .gitignore update. 2021-04-03 18:16:25 -07:00
.gitmodules Update llvm-pretty submodule target 2024-05-23 16:23:39 -07:00
cabal.project.dist symbolic-syntax: Reuse type alias parsers from crucible-llvm-syntax 2023-11-02 16:34:01 -04:00
cabal.project.freeze.ghc-9.2.8 CI: Regenerate freeze files using latest GHC minor versions 2024-06-13 04:25:12 -04:00
cabal.project.freeze.ghc-9.4.8 CI: Regenerate freeze files using latest GHC minor versions 2024-06-13 04:25:12 -04:00
cabal.project.freeze.ghc-9.6.5 CI: Regenerate freeze files using latest GHC minor versions 2024-06-13 04:25:12 -04:00
cabal.project.werror macaw-symbolic-syntax: Concrete syntax for macaw-symbolic CFGs 2023-11-01 17:19:13 -04:00
LICENSE Update license dates 2020-11-12 23:43:38 -08:00
README.md Add some additional documentation (#303) 2022-08-09 18:40:55 -07:00

This is the main repository for the Macaw binary analysis framework, which has two key goals: binary code discovery and symbolic execution of machine code. The framework is designed to offer extensible support for architectures (i.e., library clients can add their own architectures and opt in to only the architecture support they need).

Overview

The code discovery algorithm is based on forced execution and is able to discover code from one or more entry points. Symbols are optional but can significantly improve the quality of the results. Stripped binaries can pose a challenge for macaw (especially statically linked stripped binaries). Macaw provides support for lifting discovered machine code into an IR suitable for symbolic execution via the Crucible library.

Currently, macaw supports:

  • x86-64
  • PowerPC (32 and 64 bit)
  • ARM (32 bit)
  • RISC-V

Repository Structure

The Macaw libraries are:

  • macaw-base -- The core architecture-independent operations and algorithms.
  • macaw-symbolic -- Library that provides symbolic simulation of Macaw programs via Crucible.
  • macaw-x86 -- Provides definitions enabling Macaw to be used on X86_64 programs.
  • macaw-x86-symbolic -- Adds Macaw-symbolic extensions needed to support x86.
  • macaw-semmc -- Contains the architecture-independent components of the translation from semmc semantics into macaw IR. This provides the shared infrastructure for all of our semmc-based backends, including the Template Haskell function that creates a state transformer function from the learned semantics files provided by the semmc library.
  • macaw-aarch32 -- Enables macaw for ARM (32-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
  • macaw-aarch32-symbolic -- Enables macaw/crucible symbolic simulation for ARM (32-bit) architectures.
  • macaw-ppc -- Enables macaw for PPC (32-bit and 64-bit) binaries by reading the semantics files generated by semmc and using Template Haskell to generate a function that transforms machine states according to the learned semantics.
  • macaw-ppc-symbolic -- Enables macaw/crucible symbolic simulation for PPC architectures.
  • macaw-riscv -- Enables macaw for RISC-V (RV32GC and RV64GC variants) binaries.
  • macaw-refinement -- Enables additional architecture-independent refinement of code discovery. This can enable discovery of more functionality than is revealed by the analysis in macaw-base.

The libraries that make up Macaw are released under the BSD license.

These Macaw core libraries depend on a number of different supporting libraries, including:

  • elf-edit -- loading and parsing of ELF binary files
  • galois-dwarf -- retrieval of DWARF debugging information from binary files
  • flexdis86 -- disassembly and semantics for x86 architectures
  • dismantle -- disassembly for ARM and PPC architectures
  • semmc -- semantics definitions for ARM and PPC architectures
  • crucible -- Symbolic execution and analysis
  • what4 -- Symbolic representation for the crucible backend
  • parameterized-utils -- utilities for working with parameterized types

Building

Preparation

Dependencies for building Macaw that are not obtained from Hackage are supported via Git submodules:

$ git submodule update --init

Preparing Softfloat for RISC-V Backend

The RISC-V backend depends on softfloat-hs, which in turn depends on the softfloat library. Macaw's build system will automatically build softfloat, but the softfloat-hs repo must be recursively cloned to enable this. If you are not building macaw-riscv you can skip this step. To recursively clone softfloat-hs, run:

$ cd deps/softfloat-hs
$ git submodule update --init --recursive

Building with Cabal

The Macaw libraries can be individually built or collectively built with Cabal:

$ ln -s cabal.project.dist cabal.project
$ cabal configure
$ cabal build all

To build a single library, either specify that library's name instead of all, or change to that library's subdirectory before building:

$ cabal build macaw-refinement

or

$ cd refinement
$ cabal build

Notes on Freeze Files

We use the cabal.project.freeze.ghc-* files to constrain dependency versions in CI. To build with a known-working set of Hackage dependencies:

$ ln -s cabal.project.freeze.ghc-<VER> cabal.project.freeze
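
As a rough sketch, the link step could be wrapped in a small helper that checks that a freeze file exists for the requested GHC version before linking it. The function name `link_freeze_file` is hypothetical and not part of this repository. It assumes the freeze files follow the cabal.project.freeze.ghc-* naming described above and live in the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Macaw repository).
# Usage: link_freeze_file <ghc-version> [repo-dir]
link_freeze_file() {
  ver="$1"
  dir="${2:-.}"
  src="cabal.project.freeze.ghc-$ver"
  # Refuse to create a dangling link if no freeze file matches this version.
  if [ ! -f "$dir/$src" ]; then
    echo "no freeze file for GHC $ver in $dir" >&2
    return 1
  fi
  # -f replaces an existing cabal.project.freeze; the link target is
  # relative, so it resolves inside $dir.
  ln -sf "$src" "$dir/cabal.project.freeze"
}
```

With `ghc` on the PATH, it might be called as `link_freeze_file "$(ghc --numeric-version)"`. This only succeeds when the installed compiler matches one of the frozen versions exactly.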

These freeze files were generated using the scripts/regenerate-freeze-files.sh script. Note that at present, these configuration files assume a Unix-like operating system, as we do not currently test Windows on CI. If you would like to use these configuration files on Windows, you will need to make some manual changes to remove certain packages and flags:

regex-posix
tasty +unix
unix
unix-compat
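
The manual edit for Windows could be approximated with a line filter, sketched below. The function name `strip_unix_constraints` is hypothetical. The sketch assumes the freeze file lists one constraint per line, indented under the `constraints:` field (the layout `cabal freeze` normally writes). It does not repair a dangling trailing comma if the last constraint is removed, so the result should still be checked by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Macaw repository).
# Prints the freeze file named by $1 without the Unix-only entries:
# the regex-posix, unix, and unix-compat packages, and the tasty +unix flag.
strip_unix_constraints() {
  # The package name must be followed by whitespace, so packages that merely
  # start with "unix-" are kept. Only indented continuation lines are
  # matched; the first line of the constraints field is never dropped.
  grep -v -E \
    -e '^[[:space:]]+(any\.)?(regex-posix|unix|unix-compat)[[:space:]]' \
    -e '^[[:space:]]+tasty[[:space:]].*\+unix' \
    "$1"
}
```

For example, `strip_unix_constraints cabal.project.freeze.ghc-9.6.5 > cabal.project.freeze` would write a filtered copy instead of creating a symlink.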

Note that if any of the macaw packages fail to build without the freeze files, it is a bug in the dependency version bounds specified in the .cabal files that should be reported (https://github.com/GaloisInc/macaw/issues).

License

This code is made available under the BSD3 license and without any support.