macaw/base
Tristan Ravitch 96129be6de Keep the write of the return address to the stack (x86)
This mostly affects x86.  Previously, we discarded the write of the return
address to the stack when identifying calls for macaw-x86.  This was partly for
hygiene and partly to support the "addresses written to memory are function
pointers" heuristic.  Treating the return address as a potential function
pointer breaks function identification, so excluding it from that heuristic is
important.

The problem comes in the translation of macaw into crucible: the return address
is never written to the stack, but returns still read the return address from
the stack.  Since it was never written in the first place, this is a read
from (potentially) uninitialized memory, which causes errors in the symbolic
simulator.  There are two solutions:

1. Make returns not read from the stack
2. Keep the write of the return address to the stack

Solution 1 is a problem, as we have a data dependency on the read.  Eliding it
breaks Crucible generation later and produces an invalid CFG.

Solution 2 works well, and the implementation is simple.  We can keep
identifyCall the same for x86 and construct the basic block not from the
statement list that identifyCall returns but from the original, unaltered list
of statements.  We still need identifyCall to give us the reduced statement
list, which we use for identifying possible function pointers written onto the
stack (but not the return address, which we do not want to treat as a function
pointer).
2018-12-07 15:11:39 -08:00
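
The approach in solution 2 can be sketched with toy types.  This is only an illustration under stated assumptions: ``Stmt``, ``identifyCallToy``, ``candidateFunctionPointers``, and ``classifyCall`` are hypothetical names invented here, and the real identifyCall has a different signature and works on macaw's own statement type.  The point is that the block keeps the original statements, while the reduced list feeds only the function pointer heuristic.

```haskell
import Data.Word (Word64)

-- Toy statement type (hypothetical); macaw's real statements are much richer.
data Stmt
  = WriteMem Word64 Word64  -- address written to, value written
  | OtherStmt String        -- anything that is not a memory write
  deriving (Eq, Show)

-- Hypothetical stand-in for identifyCall: if the block ends by writing the
-- expected return address to memory, return the statements with that final
-- write removed; otherwise the block is not treated as a call.
identifyCallToy :: Word64 -> [Stmt] -> Maybe [Stmt]
identifyCallToy retAddr stmts =
  case reverse stmts of
    WriteMem _ v : rest | v == retAddr -> Just (reverse rest)
    _ -> Nothing

-- Values written to memory in the reduced list are candidate function
-- pointers.  The return address is absent because its write was removed.
candidateFunctionPointers :: [Stmt] -> [Word64]
candidateFunctionPointers reduced = [v | WriteMem _ v <- reduced]

-- For a call, keep the ORIGINAL statements for the basic block and use the
-- reduced list only for the function pointer heuristic.
classifyCall :: Word64 -> [Stmt] -> Maybe ([Stmt], [Word64])
classifyCall retAddr stmts = do
  reduced <- identifyCallToy retAddr stmts
  pure (stmts, candidateFunctionPointers reduced)
```

In this sketch, the statements returned for the block still contain the write of the return address, so a later read of it by the return finds initialized memory.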

The macaw library implements architecture-independent binary code
discovery.  Support for specific architectures is provided by
implementing the semantics of that architecture.  The library is
written in terms of an abstract interface to memory, for which an ELF
backend is provided (via the elf-edit_ library).  The basic code
discovery is based on a variant of Value Set Analysis (VSA).

The most important user-facing abstractions are:

* The ``Memory`` type, defined in ``Data.Macaw.Memory``, which provides an abstract interface to an address space containing both code and data.
* The ``memoryForElfSegments`` function is a useful helper to produce a ``Memory`` from an ELF file.
* The ``cfgFromAddrs`` function, defined in ``Data.Macaw.Discovery``, which performs code discovery on a ``Memory`` given some initial parameters (the semantics to use, via ``ArchitectureInfo``, and some entry points).
* The ``DiscoveryInfo`` type, which is the result of ``cfgFromAddrs``; it contains a collection of ``DiscoveryFunInfo`` records, each of which represents a discovered function.  Every basic block is assigned to at least one function.

Architecture-specific code goes into separate libraries.  X86-specific code is in the macaw-x86 repo.

An abbreviated example of using macaw on an ELF file looks like::

  {-# LANGUAGE RankNTypes #-}

  import qualified Data.Map as M
  import Data.Word (Word64)

  import qualified Data.ElfEdit as E
  import qualified Data.Parameterized.Some as PU
  import qualified Data.Macaw.Architecture.Info as AI
  import qualified Data.Macaw.Memory as MM
  import qualified Data.Macaw.Memory.ElfLoader as MM
  import qualified Data.Macaw.Discovery as MD
  import qualified Data.Macaw.Discovery.Info as MD

  -- X86_64 is the architecture tag type from macaw-x86 (import not shown).

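  -- Run code discovery starting from the ELF entry point.  The ids parameter
  -- is hidden inside PU.Some, so the result is handed to a continuation that
  -- must work for any ids.
  discoverCode :: E.Elf Word64 -> AI.ArchitectureInfo X86_64 -> (forall ids . MD.DiscoveryInfo X86_64 ids -> a) -> a
  discoverCode elf archInfo k =
    withMemory MM.Addr64 elf $ \mem ->
      -- This pattern is partial: it fails if the entry point does not resolve.
      let Just entryPoint = MM.absoluteAddrSegment mem (fromIntegral (E.elfEntry elf))
      in case MD.cfgFromAddrs archInfo mem M.empty [entryPoint] [] of
        PU.Some di -> k di

  -- Load the ELF segments into a Memory and pass it to the continuation.
  -- Loading failures abort via error.
  withMemory :: forall w m a
              . (MM.MemWidth w, Integral (E.ElfWordType w))
             => MM.AddrWidthRepr w
             -> E.Elf (E.ElfWordType w)
             -> (MM.Memory w -> m a)
             -> m a
  withMemory relaWidth e k =
    case MM.memoryForElfSegments relaWidth e of
      Left err -> error (show err)
      Right (_sim, mem) -> k mem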


In the callback, the ``DiscoveryInfo`` can be analyzed as desired.
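
The callback style exists because the ``ids`` parameter is existentially quantified.  The self-contained sketch below shows the same pattern without depending on macaw.  Everything in it is a stand-in invented for illustration: ``Some`` is a local copy of the idea behind ``Data.Parameterized.Some``, and ``ToyInfo``, ``toyDiscover``, ``withToyInfo``, ``functionNames``, and ``totalBlocks`` are hypothetical and do not reflect the real ``DiscoveryInfo`` API.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}

import Data.Kind (Type)

-- Local stand-in for the existential wrapper: the type index is hidden.
data Some (f :: Type -> Type) where
  Some :: f ids -> Some f

-- Toy stand-in for a discovery result (hypothetical).  The ids parameter is
-- a phantom; the payload is a list of (function name, number of blocks).
newtype ToyInfo (ids :: Type) = ToyInfo [(String, Int)]

-- Pretend discovery: the caller cannot know which ids was chosen.
toyDiscover :: Some ToyInfo
toyDiscover = Some (ToyInfo [("main", 3), ("helper", 1)])

-- Unpack the existential and hand the result to a continuation that must
-- work for every possible ids, mirroring the shape of discoverCode.
withToyInfo :: (forall ids . ToyInfo ids -> a) -> a
withToyInfo k =
  case toyDiscover of
    Some di -> k di

-- Two example analyses that can be passed as the callback.
functionNames :: ToyInfo ids -> [String]
functionNames (ToyInfo fs) = map fst fs

totalBlocks :: ToyInfo ids -> Int
totalBlocks (ToyInfo fs) = sum (map snd fs)
```

A caller would write ``withToyInfo functionNames`` or ``withToyInfo totalBlocks``.  The result type ``a`` cannot mention ``ids``, which is why analysis results have to be computed inside the callback.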

Implementing support for an architecture is more involved and requires implementing an ``ArchitectureInfo``, which is defined in ``Data.Macaw.Architecture.Info``.  This structure contains architecture-specific information like:

* The pointer width
* A disassembler from bytes to abstract instructions
* ABI information regarding registers and calling conventions
* A transfer function for architecture-specific features not represented in the common IR
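
As a rough picture of how these four pieces fit together, the sketch below bundles them into a record.  It is a toy and not macaw's actual ``ArchitectureInfo`` definition: ``ToyArchInfo``, its fields, ``ToyInstr``, ``decodeToy``, ``clobber``, ``toyX86``, and ``disassembleAll`` are all hypothetical names, and the decoder covers only three x86 encodings (``0x90`` NOP, ``0xC3`` RET, ``0x0F 0xA2`` CPUID).

```haskell
import Data.Word (Word8)

-- Toy abstract instructions (hypothetical).
data ToyInstr = Nop | Ret | Cpuid
  deriving (Eq, Show)

-- Toy analogue of an architecture description, one field per bullet above.
data ToyArchInfo = ToyArchInfo
  { toyPointerWidth :: Int
    -- pointer width in bits
  , toyDisassemble :: [Word8] -> Maybe (ToyInstr, [Word8])
    -- decode one instruction, returning it and the remaining bytes
  , toyCalleeSaved :: [String]
    -- ABI information: registers preserved across calls
  , toyTransfer :: ToyInstr -> [String] -> [String]
    -- transfer function: given registers with known values before the
    -- instruction, return those still known after it
  }

-- Decoder for the three encodings this sketch knows about.
decodeToy :: [Word8] -> Maybe (ToyInstr, [Word8])
decodeToy (0x90 : rest) = Just (Nop, rest)
decodeToy (0xC3 : rest) = Just (Ret, rest)
decodeToy (0x0F : 0xA2 : rest) = Just (Cpuid, rest)
decodeToy _ = Nothing

-- CPUID overwrites rax, rbx, rcx, and rdx; the other toy instructions
-- leave the set of known registers unchanged.
clobber :: ToyInstr -> [String] -> [String]
clobber Cpuid known = filter (`notElem` ["rax", "rbx", "rcx", "rdx"]) known
clobber _ known = known

toyX86 :: ToyArchInfo
toyX86 = ToyArchInfo
  { toyPointerWidth = 64
  , toyDisassemble = decodeToy
  , toyCalleeSaved = ["rbx", "rbp", "r12", "r13", "r14", "r15"]
  , toyTransfer = clobber
  }

-- Architecture-independent driver: decode until the decoder gives up.
disassembleAll :: ToyArchInfo -> [Word8] -> [ToyInstr]
disassembleAll info bytes =
  case toyDisassemble info bytes of
    Just (instr, rest) -> instr : disassembleAll info rest
    Nothing -> []
```

The driver ``disassembleAll`` only ever touches the record, which is the property the real design relies on: the generic discovery code stays unchanged when a new architecture is added.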

.. _elf-edit: https://github.com/GaloisInc/elf-edit
.. _flexdis86: https://github.com/GaloisInc/flexdis86