Open source binary analysis tools.
Ben Davis d9a516f09f Replace error on duplicate symbol names with just a warning
Symbol names are not always unique (multiple addresses are associated with same
name) on code we want to analyze, so just continue anyway.
2017-03-29 10:50:23 -04:00

The macaw library implements architecture-independent binary code discovery.  Support for specific architectures is provided by implementing the semantics of that architecture.  The library is written in terms of an abstract interface to memory, for which an ELF backend is provided (via the elf-edit_ library).  There is also a dependency on flexdis86_, which is an x86_64 disassembler, but that does not tie the discovery algorithm to x86_64.  The basic code discovery is based on a variant of Value Set Analysis (VSA).
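To give a rough intuition for what a value-set style analysis tracks, here is a minimal, self-contained sketch. It is not macaw's implementation: the ``ValueSet`` type, the ``maxSetSize`` bound, and all function names are hypothetical and chosen only for illustration. Each abstract value is either a small finite set of possible concrete values or "unknown", and an indirect jump can only be resolved when the set is finite.

```haskell
module Main where

import qualified Data.Set as Set
import Data.Word (Word64)

-- | Toy abstract value (hypothetical, not macaw's type): either a small
-- finite set of possible concrete values, or Top ("could be anything").
data ValueSet
  = Finite (Set.Set Word64)
  | Top
  deriving (Eq, Show)

-- | Assumed bound on set size; larger sets are widened to Top so the
-- analysis stays cheap.
maxSetSize :: Int
maxSetSize = 16

-- | Build a value set, widening to Top when it grows past the bound.
mkValueSet :: Set.Set Word64 -> ValueSet
mkValueSet s
  | Set.size s > maxSetSize = Top
  | otherwise               = Finite s

-- | Join: merge the facts arriving from two control-flow paths.
joinVS :: ValueSet -> ValueSet -> ValueSet
joinVS (Finite a) (Finite b) = mkValueSet (Set.union a b)
joinVS _ _                   = Top

-- | Abstract addition: add every pair of possible values.
addVS :: ValueSet -> ValueSet -> ValueSet
addVS (Finite a) (Finite b) =
  mkValueSet (Set.fromList [x + y | x <- Set.toList a, y <- Set.toList b])
addVS _ _ = Top

-- | Candidate targets of an indirect jump, if the value is known precisely
-- enough to enumerate them.
jumpTargets :: ValueSet -> Maybe [Word64]
jumpTargets (Finite s) = Just (Set.toList s)
jumpTargets Top        = Nothing

main :: IO ()
main = do
  -- A jump-table style computation: a fixed base plus one of three offsets.
  let base    = Finite (Set.singleton 0x400000)
      offsets = Finite (Set.fromList [0, 8, 16])
  print (jumpTargets (addVS base offsets))
```

In this toy model, a discovery algorithm would add each enumerated target to its set of addresses to explore, and would give up on the jump when the result is ``Top``.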

The most important user-facing abstractions are:

* The ``Memory`` type, defined in ``Data.Macaw.Memory``, which provides an abstract interface to an address space containing both code and data.
* The ``memoryForElfSegments`` function, defined in ``Data.Macaw.Memory.ElfLoader``, which is a helper that builds a ``Memory`` from an ELF file.
* The ``cfgFromAddrs`` function, defined in ``Data.Macaw.Discovery``, which performs code discovery on a ``Memory`` given some initial parameters (the semantics to use, supplied via ``ArchitectureInfo``, and some entry points).
* The ``DiscoveryInfo`` type, which is the result of ``cfgFromAddrs``; it contains a collection of ``DiscoveryFunInfo`` records, each of which represents a discovered function.  Every basic block is assigned to at least one function.

An abbreviated example of using macaw on an ELF file looks like::

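  {-# LANGUAGE RankNTypes #-}

  import qualified Data.Map as M
  import Data.Word (Word64)

  import qualified Data.ElfEdit as E
  import qualified Data.Parameterized.Some as PU
  import qualified Data.Macaw.Architecture.Info as AI
  import qualified Data.Macaw.Memory as MM
  import qualified Data.Macaw.Memory.ElfLoader as MM
  import qualified Data.Macaw.Discovery as MD
  import qualified Data.Macaw.Discovery.Info as MD

  -- X86_64 is the architecture tag type; its import is omitted here because
  -- it comes from the package that supplies the x86_64 semantics.

  -- | Run code discovery from the ELF entry point and pass the result to the
  -- continuation.  The continuation is rank-2 because @ids@ is existential.
  discoverCode :: E.Elf Word64 -> AI.ArchitectureInfo X86_64 -> (forall ids . MD.DiscoveryInfo X86_64 ids -> a) -> a
  discoverCode elf archInfo k =
    withMemory MM.Addr64 elf $ \mem ->
      -- Partial pattern: this fails at runtime if the entry point does not
      -- fall inside a loaded segment.
      let Just entryPoint = MM.absoluteAddrSegment mem (fromIntegral (E.elfEntry elf))
      in case MD.cfgFromAddrs archInfo mem M.empty [entryPoint] [] of
        PU.Some di -> k di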

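  -- | Load the ELF segments into a 'MM.Memory' and pass it to the
  -- continuation.  The result type is a plain @a@ so that it matches the
  -- pure result of 'discoverCode'.
  withMemory :: forall w a
              . (MM.MemWidth w, Integral (E.ElfWordType w))
             => MM.AddrWidthRepr w
             -> E.Elf (E.ElfWordType w)
             -> (MM.Memory w -> a)
             -> a
  withMemory relaWidth e k =
    case MM.memoryForElfSegments relaWidth e of
      -- A real client should report the load error instead of crashing.
      Left err -> error (show err)
      Right (_sim, mem) -> k mem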


In the callback, the ``DiscoveryInfo`` can be analyzed as desired.

Implementing support for an architecture is more involved and requires implementing an ``ArchitectureInfo``, which is defined in ``Data.Macaw.Architecture.Info``.  This structure contains architecture-specific information like:

* The pointer width
* A disassembler from bytes to abstract instructions
* ABI information regarding registers and calling conventions
* A transfer function for architecture-specific features not represented in the common IR
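The split between a generic discovery loop and an architecture-specific record can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: ``ToyArchInfo``, ``Step``, ``discover``, and the three-opcode toy instruction set are stand-ins invented for this example and do not mirror the real ``ArchitectureInfo`` fields. It only shows the general shape, in which the worklist code never inspects instruction encodings itself and instead asks the record how to decode.

```haskell
module Main where

import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = Int

-- | What the generic loop needs to know about one decoded instruction.
data Step = Step
  { stepSize    :: Int     -- ^ number of bytes the instruction occupies
  , stepTargets :: [Addr]  -- ^ explicit branch targets
  , stepFalls   :: Bool    -- ^ does control fall through to the next one?
  }

-- | Hypothetical, heavily simplified stand-in for architecture information.
data ToyArchInfo = ToyArchInfo
  { archPtrBytes :: Int
    -- ^ pointer width in bytes (unused by this tiny loop)
  , archDecode   :: Map.Map Addr Int -> Addr -> Maybe Step
    -- ^ disassembler: decode the instruction at an address, if any
  }

-- | Architecture-independent worklist: returns every instruction address
-- reachable from the given entry points.
discover :: ToyArchInfo -> Map.Map Addr Int -> [Addr] -> Set.Set Addr
discover arch mem = go Set.empty
  where
    go seen [] = seen
    go seen (a : rest)
      | a `Set.member` seen = go seen rest
      | otherwise =
          case archDecode arch mem a of
            Nothing -> go seen rest  -- undecodable: skip this address
            Just s ->
              let next = [a + stepSize s | stepFalls s]
              in go (Set.insert a seen) (stepTargets s ++ next ++ rest)

-- | Toy ISA: 0 = nop, 1 = jump to the address stored in the next byte,
-- 2 = return.
toyArch :: ToyArchInfo
toyArch = ToyArchInfo
  { archPtrBytes = 1
  , archDecode = \mem a ->
      case Map.lookup a mem of
        Just 0 -> Just (Step 1 [] True)
        Just 1 -> do
          t <- Map.lookup (a + 1) mem
          Just (Step 2 [t] False)
        Just 2 -> Just (Step 1 [] False)
        _      -> Nothing
  }

main :: IO ()
main = do
  -- Layout: 0: nop, 1: jmp 4, 3: ret (unreachable), 4: nop, 5: ret
  let mem = Map.fromList (zip [0 ..] [0, 1, 4, 2, 0, 2])
  print (Set.toList (discover toyArch mem [0]))
```

Supporting a second toy architecture would mean writing another ``ToyArchInfo`` value while reusing ``discover`` unchanged, which is the same division of labour the list above describes.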

.. _elf-edit: https://github.com/GaloisInc/elf-edit
.. _flexdis86: https://github.com/GaloisInc/flexdis86