macaw/base
Ryan Scott 049096c506 Support building with GHC 9.0
This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:

* GHC 9.0 implements simplified subsumption (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
  In most cases, adapting to this is a matter of manually eta expanding
  definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
  of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
  `evalExt` had to be made more specific to adapt to the loss of contravariance
  when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
  sooner (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
  This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
  groups with `$(return [])` Template Haskell splices. The previous locations
  of these splices meant that the TH-generated instances in those packages
  were not available to any code before the splice, resulting in type errors
  when compiled with GHC 9.0.

  To overcome this, I rearranged the TH-generated instances so that they appear
  before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
  uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
  later, so this patch also updates the lower bounds on `base` to `>= 4.9` in
  the appropriate `.cabal` files. (I'm fairly certain that this requirement was
  already present implicitly, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
  allow them to build with GHC 9.0. The `llvm-pretty` and
  `llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
  changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
  raised to allow it to build with GHC 9.0.
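
The eta-expansion and `Data.Kind.Type` adaptations described above can be sketched in a standalone module. This is a minimal illustration, not code from the `macaw` repo: `applyBoth`, `second`, `etaExpanded`, and `Wrap` are hypothetical names chosen for the example.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Kind (Type)

-- Hypothetical consumer whose argument type has a nested forall.
applyBoth :: (forall p. p -> forall q. q -> q) -> (Int, Bool)
applyBoth h = (h 'x' 1, h () True)

-- Hypothetical function with all quantifiers at the front. Before GHC 9.0
-- it could be passed to applyBoth directly.
second :: forall a b. a -> b -> b
second _ y = y

-- Under simplified subsumption, `applyBoth second` is rejected.
-- Manually eta expanding the argument is accepted by old and new GHCs.
etaExpanded :: (Int, Bool)
etaExpanded = applyBoth (\x -> second x)

-- `Type` from Data.Kind stands in for `*` in kind signatures, which
-- avoids the -Wstar-is-type warning.
newtype Wrap (f :: Type -> Type) (a :: Type) = Wrap (f a)

unwrap :: Wrap f a -> f a
unwrap (Wrap x) = x

main :: IO ()
main = do
  print etaExpanded
  print (unwrap (Wrap (Just (3 :: Int))))
```

The lambda gives the typechecker a place to introduce `q` after `x` has been bound, which is what the nested `forall` in the type of `applyBoth` requires.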
2022-01-10 16:40:23 -05:00
src/Data/Macaw Support building with GHC 9.0 2022-01-10 16:40:23 -05:00
ChangeLog.md AArch32: Support conditional returns (#243) 2021-11-19 16:20:50 -08:00
LICENSE Update license information. 2017-09-27 15:59:06 -07:00
macaw-base.cabal Expose block classification in the ArchitectureInfo 2021-11-05 18:25:03 -07:00
README.rst Haddock and README fixes. 2019-01-08 16:38:38 -08:00

The macaw library implements architecture-independent binary code
discovery.  Support for specific architectures is provided by
implementing the semantics of that architecture.  The library is
written in terms of an abstract interface to memory, for which an ELF
backend is provided (via the elf-edit_ library).  The basic code
discovery is based on a variant of Value Set Analysis (VSA).

The most important user-facing abstractions are:

* The ``Memory`` type, defined in ``Data.Macaw.Memory``, which provides an abstract interface to an address space containing both code and data.
* The ``memoryForElfSegments`` function is a useful helper to produce a ``Memory`` from an ELF file.
* The ``cfgFromAddrs`` function, defined in ``Data.Macaw.Discovery``, which performs code discovery on a ``Memory`` given some initial parameters (semantics to use via ``ArchitectureInfo`` and some entry points).
* The ``DiscoveryInfo`` type, which is the result of ``cfgFromAddrs``; it contains a collection of ``DiscoveryFunInfo`` records, each of which represents a discovered function.  Every basic block is assigned to at least one function.
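
To make the roles of these abstractions concrete, here is a toy model of code discovery. It is only a sketch under strong assumptions: it uses plain worklist reachability rather than macaw's VSA-based analysis, and every name in it (``ToyMemory``, ``Instr``, ``exploreFunction``, ``discover``) is hypothetical and not part of the macaw API. The map from addresses to instructions plays the role of ``Memory``, the entry list plays the role of the initial entry points, and the result maps each discovered function entry to the addresses assigned to it.

```haskell
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical toy types; these are not macaw's definitions.
type Addr = Int

data Instr
  = Fallthrough   -- continue at the next address
  | Jump Addr     -- unconditional jump
  | Branch Addr   -- conditional: either the target or the next address
  | Call Addr     -- call the target, then continue at the next address
  | Ret           -- return: no successors inside the function
  deriving (Show)

type ToyMemory = M.Map Addr Instr

-- Successors within the current function, plus any call targets.
step :: Addr -> Instr -> ([Addr], [Addr])
step a Fallthrough = ([a + 1], [])
step _ (Jump t)    = ([t], [])
step a (Branch t)  = ([t, a + 1], [])
step a (Call t)    = ([a + 1], [t])
step _ Ret         = ([], [])

-- Explore one function from its entry address. Returns the addresses
-- reached and the call targets seen along the way.
exploreFunction :: ToyMemory -> Addr -> (S.Set Addr, S.Set Addr)
exploreFunction mem entry = go [entry] S.empty S.empty
  where
    go [] seen callees = (seen, callees)
    go (a : rest) seen callees
      | a `S.member` seen = go rest seen callees
      | otherwise =
          case M.lookup a mem of
            Nothing -> go rest seen callees  -- unmapped address: skip it
            Just i ->
              let (succs, calls) = step a i
              in go (succs ++ rest) (S.insert a seen)
                    (S.union callees (S.fromList calls))

-- Discover functions starting from the given entry points, adding each
-- newly seen call target to the worklist.
discover :: ToyMemory -> [Addr] -> M.Map Addr (S.Set Addr)
discover mem = go M.empty
  where
    go funs [] = funs
    go funs (e : es)
      | e `M.member` funs = go funs es
      | otherwise =
          let (body, callees) = exploreFunction mem e
          in go (M.insert e body funs) (S.toList callees ++ es)

exampleMem :: ToyMemory
exampleMem = M.fromList
  [ (0, Call 10), (1, Branch 3), (2, Fallthrough), (3, Ret)
  , (10, Fallthrough), (11, Ret) ]

main :: IO ()
main = print (discover exampleMem [0])
```

Starting from entry point ``0``, the sketch finds a second function at address ``10`` through the call instruction, so the result contains two entries.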

Architecture-specific code goes into separate libraries.  X86-specific code is in the macaw-x86 repo.

An abbreviated example of using macaw on an X86_64 ELF file looks like::

  {-# LANGUAGE RankNTypes #-}

  import qualified Data.Map as M
  import           Data.Word (Word64)
  import qualified Data.ElfEdit as E
  import qualified Data.Parameterized.Some as PU
  import qualified Data.Macaw.X86 as MX86
  import qualified Data.Macaw.Memory.ElfLoader as ML
  import qualified Data.Macaw.Discovery as MD

  -- Load the ELF file into a Memory, run discovery from its entry point,
  -- and pass the result to the callback.
  discoverCode :: E.Elf Word64 -> (forall ids . MD.DiscoveryInfo MX86.X86_64 ids -> IO a) -> IO a
  discoverCode elf k =
    case ML.resolveElfContents ML.defaultLoadOptions elf of
      Left e -> error (show e)
      Right (_, _, Nothing, _) -> error "Unable to determine entry point"
      Right (warn, mem, Just entryPoint, _) -> do
        -- Report loader warnings before running discovery.
        mapM_ print warn
        case MD.cfgFromAddrs MX86.x86_64_linux_info mem M.empty [entryPoint] [] of
          PU.Some di -> k di


In the callback ``k``, the ``DiscoveryInfo`` can be analyzed as desired.
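
The callback takes a ``forall ids`` argument because the ``ids`` parameter is hidden inside ``Some`` and cannot appear in the result type. The standalone sketch below models that pattern without depending on macaw: ``Some`` is redefined locally for illustration, and ``ToyInfo``, ``withToyInfo``, and ``functionCount`` are hypothetical stand-ins, not the library's definitions.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Kind (Type)

-- Local stand-in for an existential wrapper: the type index is hidden.
data Some (f :: Type -> Type) where
  Some :: f ids -> Some f

-- Hypothetical result type indexed by a phantom parameter. Here it only
-- carries a list of function names.
newtype ToyInfo ids = ToyInfo [String]

-- Unpack the existential and hand the contents to a callback that must
-- work for every possible index.
withToyInfo :: Some ToyInfo -> (forall ids. ToyInfo ids -> a) -> a
withToyInfo (Some di) k = k di

-- An example analysis: count the discovered functions.
functionCount :: ToyInfo ids -> Int
functionCount (ToyInfo names) = length names

main :: IO ()
main = print (withToyInfo (Some (ToyInfo ["main", "helper"])) functionCount)
```

Any analysis whose result type does not mention ``ids`` can be passed as the callback in this way.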

Implementing support for an architecture is more involved and requires implementing an ``ArchitectureInfo``, which is defined in ``Data.Macaw.Architecture.Info``.  This structure contains architecture-specific information like:

* The pointer width
* A disassembler from bytes to abstract instructions
* ABI information regarding registers and calling conventions
* A transfer function for architecture-specific features not represented in the common IR

.. _elf-edit: https://github.com/GaloisInc/elf-edit
.. _flexdis86: https://github.com/GaloisInc/flexdis86