Commit Graph

50 Commits

Author SHA1 Message Date
Ryan Scott
049096c506 Support building with GHC 9.0
This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:

* GHC 9.0 implements simplified subsumption (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
  In most cases, adapting to this is a matter of manually eta expanding
  definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
  of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
  `evalExt` had to be made more specific to adapt to the loss of contravariance
  when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
  sooner (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
  This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
  groups with `$(return [])` Template Haskell splices. The previous locations
  of these splices made it so that the TH-generated instances in that package
  were not available to any code before the splice, resulting in type errors
  when compiled with GHC 9.0.

  To overcome this, I rearranged the TH-generated instances so that they appear
  before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
  uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
  later, so this patch also updates thes lower bounds on `base` to `>= 4.9` in
  the appropriate `.cabal` files. (I'm fairly certain that this requirement was
  already present implicity, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
  allow them to build with GHC 9.0. The `llvm-pretty` and
  `llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
  changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
  raised to allow it to build with GHC 9.0.
2022-01-10 16:40:23 -05:00
Tristan Ravitch
8047a5d0ed
arm: Add transfer functions for division (#249)
Add support for abstract interpretation of division operations in the ARM
backend (when the operands to division are concretely known). 

This commit also adds extended documentation on the semantics of these
operations.

The concrete evaluation eliminates constant division operations.  The abstract
cases are probably obsolete in light of that, but are still interesting...

These changes were motivated by insufficient simplification around some of the syscall/errno handling code in musl
2021-12-07 14:11:59 -08:00
Tristan Ravitch
3e918f8b51
Revise handling of syscalls in AArch32 to match X86 (#246)
The old formulation (with system calls as block terminators) proved to be
impossible to implement properly. Handlers for syntax overrides have very
limited types (`IO`, rather than `OverrideSim`), which made symbolic branching
and reusing overrides impossible.

This change replaces the system call block terminator with an arch-specific
function that is translated into a function handle lookup (which is then
dispatched to with a call).

Unfortunately, this refactoring required combining the AArch32 simplification
module with the architecture extension definitions, due to the new translation
relying on the simplifier instance.
2021-11-24 11:59:56 -08:00
Tristan Ravitch
9ce3d43188
AArch32: Support conditional returns (#243)
Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators.

This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`).  This meant that they could not contain macaw (or crucible) values.  Some work on. AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances.

In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic.

Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided.  These new changes will enable a handler to be written for the conditional PowerPC trap instructions.

PowerPC, x86, and ARM have been updated.

Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each
discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This
shared functionality should be extracted into a macaw-testing library.
2021-11-19 16:20:50 -08:00
Tristan Ravitch
c6eae87724 Update the aarch32 disassembler
This fixes some critical bugs in thumb2 disassembly.  This change adds some
macaw-level tests to ensure that thumb2 is working properly.
2021-11-05 20:01:11 -07:00
Tristan Ravitch
2c85dce18e Expose block classification in the ArchitectureInfo
This change makes the block classifier heuristic part of the `ArchitectureInfo`
structure.  This enables clients and architecture backends to customize the
block classification heuristics.  This is most useful for architectures that
have complex architecture-specific block terminators that require analysis to
generate (e.g., conditional returns).  It will also make macaw-refinement
simpler in the future, as the SMT-based refinement is just an additional block
classifier (but is currently implemented via a hacky side channel).

This change introduces an ancillary change, which should not be very
user-visible.

It splits the Macaw.Discovery and Macaw.Discovery.State modules to break
module import cycles in a way that enables us to expose the classifier.  This
should not be user-visible, as Macaw.Discovery still exports the same
names (with one minor exception that should not appear in user code).

It also moves the definition of the `ArchBlockPrecond` type family; the few
affected places should be updated. User code should probably not be able to see
this.
2021-11-05 18:25:03 -07:00
Tristan Ravitch
37d8029c00 Remove an incorrect assumption about addresses in the ARM decode logic
It was previously assuming that addresses are absolute, which is not true for
position independent executables.  Extracting the offset from the address is
sufficient for our purposes here (note that taking the offset from the
`MemSegmentOffset` would not be right, as that offset is relative to the segment
start).
2021-07-21 11:44:20 -07:00
Tristan Ravitch
6fa6010473 Add more detail to ARM memory errors
Record the decode mode in the error, which can help those interpreting errors
understand what state the decoder was in when it failed.
2021-07-21 11:43:49 -07:00
Tristan Ravitch
f13eb7c01f Add an additional case handling reading/writing the PC directly
We originally left this case out to catch cases where the PC was written to
directly (skipping the special `LoadWritePC` logic in the semantics). At this
point we are confident that the ARM semantics handle that correctly. The
translation of the semantics into macaw (via TH) are not entirely lazy, and the
interpretation will call this function indirectly (via `get_gpr_uf`); returning
Nothing in the `r15` case caused that part of the translation to fail. The
resulting value is never actually used (because the ARM semantics have special
behavior when reading from `r15`), but the error was too eager and caused a
crash.

This change just lets that code continue.
2021-07-09 10:54:58 -07:00
Tristan Ravitch
196a81ad29
Fix a bug in the AArch32-specific simplifier (#188)
Some important simplifications for classification were failing to fire because
other simplifications fired first, short circuiting the search.  It turns out
that more than one rule may apply at any given step (and it is important to
apply all of the rules that can be applied).  This commit modifies the
simplifier to apply rules until saturation.
2021-01-27 23:55:44 -08:00
Brian Huffman
2a620d41de Switch from ansi-wl-pprint to the prettyprinter package.
This patch relies on the following submodule updates:
- GaloisInc/what4#77
- GaloisInc/elf-edit#20
- GaloisInc/crucible#586
- GaloisInc/asl-translator#28

This patch updates the following packages:
- macaw-base
- macaw-symbolic
- macaw-x86
- macaw-x86-symbolic
- macaw-aarch32
- macaw-ppc
- macaw-semmc
- macaw-refinement
2020-12-02 11:38:19 -08:00
Joe Hendrix
6d879215e5 Fix macaw-aarch32 tests. 2020-11-12 19:22:20 -08:00
Joe Hendrix
5aad8ca32e Upgrade to elf-edit 0.39 and other libraries. 2020-11-10 17:15:47 -08:00
Tristan Ravitch
37861df8c7
Support for mixed ARM/Thumb binaries (#174)
aarch32: Support mixed ARM/Thumb1 binaries

This updates the aarch32 backend to decode Thumb instructions and generate the Thumb semantics. The major implementation change is to use the `ArchBlockPrecond` feature of macaw to track the Thumb state (`PSTATE_T`) across block boundaries.

The ARM code discovery decides whether or not a function entry point should be decoded as Thumb by examining the low bit of the function address. If the low bit is set, it is a Thumb entry point. This has the slightly odd effect of causing macaw to say that the function is at the address with the low bit set, which is not technically true. This is documented in the README, but not obvious on inspection. Most use cases should not care, and can in any case account for it. In the future, it should be possible to fix this (though it will require some changes to the core of macaw).
2020-11-02 12:48:01 -08:00
Daniel Matichuk
c8d057b715 compatibility fix for template-haskell-2.16
See: https://gitlab.haskell.org/ghc/ghc/-/issues/15843
2020-10-15 16:32:46 -07:00
Tristan Ravitch
cbc7a3ca31
Feature/aarch32 symbolic backend (#162)
aarch32-symbolic: Implement most of the remaining macaw-aarch32-symbolic bits

It should be usable now, modulo some execution-time semantics for the floating
point operations.  There will be a separate ticket covering the changes required
for them (some refactoring of how they are handled during translation is required).
2020-10-05 12:31:39 -07:00
Daniel Matichuk
75d998f719 aarch32: avoid symbolic addresses
this changes the write action model to instead index
writes based on the macaw term representing the address
to be written
2020-08-05 16:53:27 -07:00
Daniel Matichuk
b42cce3f1c aarch32: support non-concrete conditional writes 2020-08-04 23:57:50 -07:00
Daniel Matichuk
f2defdcdc4 aarch32: use empty tuple for unit type 2020-07-31 00:31:29 -07:00
Daniel Matichuk
dd8b78bb7b aarch32: fix floating point uf calls 2020-07-28 11:57:07 -07:00
Daniel Matichuk
278d365a31 aarch32: simplify write action representation 2020-07-28 11:57:07 -07:00
Daniel Matichuk
f53ea84cd9 module import cleanup 2020-07-28 11:57:07 -07:00
Daniel Matichuk
838ef3924d aarch32: wrap up stateful operations as values 2020-07-28 11:57:07 -07:00
Daniel Matichuk
98a429b7e0 avoid using applicative binds for eager values 2020-07-28 11:57:07 -07:00
Daniel Matichuk
62dd08f5a1 add more cases for simplifying boolean Muxes 2020-07-28 11:57:07 -07:00
Ben Selfridge
039b8497fc
updates what4, crucible, etc. (#146)
* update to bv-sized branch of what4 and other things

* removed parameterized-utils submodule completely

* Updates submodules

* Fixes macaw-symbolic w.r.t. crucible-llvm changes

Co-authored-by: Ben Selfridge <ben@000548-benselfridge.local>
2020-06-16 16:49:55 -07:00
Tristan Ravitch
7ec8df5e92
aarch32: Two bug fixes
* Fix block size accounting in the disassembler

The value in the early failure combinator is used as the *block size* in the
resulting macaw block.  The code was actually using the offset from the
beginning of the segment, which is wrong.  This produced very large blocks that
didn't reflect the results of code discovery and led to decode errors later in
the pipeline.

* Do not throw an error if concreteIte has a symbolic argument

The `concreteIte` combinator turns formula conditionals with concrete operands
into Haskell-level conditional execution.  It would fail because we believed
that there were no cases that could fail to satisfy that condition.  That
assumption was not true - we need to fall back to generating a mux when we have
a symbolic condition.
2020-06-11 15:28:23 -07:00
Kevin Quick
c625c2cf92
Update for GHC 8.4 type management.
Under GHC8.4, a let binding is independent of the surrounding context,
so the let statements encountered errors related to type matching on
synthesized internal type parameters that could not be identified as
the same due to rigid skolem type binding inside the let.
2020-06-02 11:35:40 -07:00
Tristan Ravitch
89fc5a73f7
Tr/full arm intrinsics (#137)
Improve the TH codegen for macaw-semmc

This change lazily translates as much as possible.  It also generates somewhat more compact code. This change also finishes implementing primitives for the aarch32 backend.  Complementing the aarch32 changes, the macaw-semmc interface has been modified to allow macaw-aarch32 to avoid a redundant serialize-deserialize round.

Co-authored-by: Kevin Quick <kquick@galois.com>
2020-05-26 09:24:45 -07:00
Kevin Quick
aff97bec6a
Update bv-sized lower constraint to allow parameterized-utils 2.1.0. 2020-05-15 10:25:12 -07:00
Kevin Quick
5da67a8ec1
Update bv-sized package constraints. 2020-05-15 10:22:20 -07:00
Kevin Quick
3bee174f5f
[macaw-aarch32] Update for bv-sized API changes in version 1.0 2020-05-14 16:48:54 -07:00
Tristan Ravitch
a824fc4051
Tr/warning cleanups (#127)
Warning and style cleanups in macaw-semmc and macaw-aarch32
2020-04-14 00:07:15 -07:00
Tristan Ravitch
e536e43f1b Introduce macaw-aarch32 and macaw-aarch32-symbolic
These packages replace the old macaw-arm (which has been removed).  The only
change to the core macaw is to introduce a `Lift` instance for the Endianness
data type, which is used in macaw-semmc.

The macaw-aarch32 package uses the official ARM semantics (via the
asl-translator package).  In its current state, macaw-aarch32 seems to handle
the common idioms of simple ARM binaries.  Position independent executables have
not been tested yet.  The semantics and disassemblers for Thumb are present, but
not integrated into code discovery at this time.  There are some tests in
macaw-aarch32.  Compile times are longer than necessarily desired.
macaw-aarch32 can be compiled in two modes: lite mode (cabal flag -fasl-lite),
which uses a restricted set of instructions for testing, and takes less time to
compile.  The full instruction set is the default, though there are a few
undefined functions that are not yet handled for the full set, mostly relating
to floating point operations.

The macaw-aarch32-symbolic package is currently a stub, but is implemented to
provide a few necessary instances.
2020-04-12 19:53:00 -07:00
Tristan Ravitch
73f758544d Update tests and expected outputs
The tests were issuing the exit syscall incorrectly (they didn't set the sycall
number) and were not executable.
2020-04-08 21:21:28 -07:00
Tristan Ravitch
3e1c2aa487 Warning cleanup 2020-04-08 20:27:29 -07:00
Tristan Ravitch
fabb8799d8 Make tests less chatty 2020-04-08 19:57:26 -07:00
Tristan Ravitch
958aeaa3ed Remove the nested mux match rule from macaw core
We can now do enough rewriting in the ARM backend that it isn't needed.  This
adds extra ARM rewriting rules and a term cache to make matching easier.
2020-04-08 19:46:32 -07:00
Tristan Ravitch
d865811701 In ARM, read the current register value from a snapshot
We were reading partially updated values that were committed to the register
state out-of-order, yielding some bad results.

This commit takes a snapshot of the register state before executing each
instruction and only reads register values from the snapshot.
2020-04-08 11:54:28 -07:00
Tristan Ravitch
36c67eb586 Fix an error introduced during cleanup 2020-04-08 07:27:28 -07:00
Tristan Ravitch
b0683c06a9 Poke the ARM simplifier into working
The generic simplifier needed a case to handle xor.

The more specific simplifier needed a case to coalesce adjacent additions.
2020-04-08 02:23:47 -07:00
Tristan Ravitch
997e435a0c WIP Debugging rewriting rules
They fire sometimes and definitely clean up the IR, but they are missing a few
key cases still
2020-04-08 00:04:51 -07:00
Daniel Matichuk
2257c18d65 use appendStmt properly for SIMD write mode 2020-04-06 21:37:47 -07:00
Daniel Matichuk
464835403e import fixup 2020-04-06 20:53:06 -07:00
Daniel Matichuk
a2ee426714 Merge remote-tracking branch 'origin/feature/asl' into feature/asl 2020-04-06 20:48:34 -07:00
Daniel Matichuk
c3bdbfd191 add guarded register/memory writes 2020-04-06 20:38:39 -07:00
Tristan Ravitch
5e5c90c993 Updates to the simplifier and call recognition 2020-04-06 17:56:22 -07:00
Tristan Ravitch
b8c3e65389 Add a test with a call 2020-04-06 15:56:43 -07:00
Tristan Ravitch
c5fe84f97c Add a missing instance for ARMReg 2020-04-05 21:17:47 -07:00
Tristan Ravitch
1fa9b86b26 Rename macaw-asl to macaw-aarch32
This is more descriptive, especially since we will eventually have
macaw-aarch32 (also derived from the ASL specs)
2020-04-05 15:16:39 -07:00