242 Commits

Author SHA1 Message Date
Ryan Scott
1add47389a macaw-x86: Fix call semantics when call target involves the stack pointer
Previously, the `macaw-x86` semantics for `call` would retrieve the call target
*after* pushing the next instruction's address to the stack, but if the call
target involves the stack pointer, then this would mean that it would get the
next instruction's address when retrieving the call target. This is not what is
intended!

This patch fixes the issue by always retrieving the call target *before*
pushing the next instruction's address to the stack. I have added a test case
to the `macaw-x86-symbolic` test suite which demonstrates that this fix works
as intended.

Fixes #420.
2024-08-13 12:31:09 -04:00
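
The ordering problem fixed in 1add47389a can be illustrated with a toy model. This is only a sketch and not the `macaw-x86` semantics: the `Machine` type, `push`, `callPushFirst`, and `callTargetFirst` are hypothetical names invented here, and memory is modeled as an association list of 8-byte slots. It models something like `call [rsp]`, where the call target is read through the stack pointer.

```haskell
import Data.Maybe (fromMaybe)

-- | A toy machine (hypothetical): a stack pointer plus a sparse memory.
data Machine = Machine
  { sp  :: Int
  , mem :: [(Int, Int)]  -- newest writes are at the front of the list
  } deriving (Eq, Show)

-- | Read a slot; unwritten slots read as 0 in this toy model.
readMem :: Int -> Machine -> Int
readMem addr m = fromMaybe 0 (lookup addr (mem m))

-- | Push a value: decrement the stack pointer, then store at the new top.
push :: Int -> Machine -> Machine
push v m = let sp' = sp m - 8 in Machine sp' ((sp', v) : mem m)

-- | Old order: push the return address, then read the target via the
-- (already updated) stack pointer. The "target" is the return address.
callPushFirst :: Int -> Machine -> (Int, Machine)
callPushFirst retAddr m =
  let m' = push retAddr m
  in (readMem (sp m') m', m')

-- | Fixed order: read the target first, then push the return address.
callTargetFirst :: Int -> Machine -> (Int, Machine)
callTargetFirst retAddr m =
  let target = readMem (sp m) m
  in (target, push retAddr m)

main :: IO ()
main = do
  -- The intended call target 0x4000 is stored at the top of the stack.
  let m0 = Machine { sp = 0x1000, mem = [(0x1000, 0x4000)] }
  print (fst (callPushFirst 0x2005 m0))    -- jumps to the return address
  print (fst (callTargetFirst 0x2005 m0))  -- jumps to the stored target
```

In both orderings the final stack contents are the same; only the value used as the jump target differs.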
Ryan Scott
9954dd6d01 Fix -Wx-partial warnings uncovered by GHC 9.8 2024-08-08 09:34:03 -04:00
Ryan Scott
c1a1449ec2 Enable -Wno-orphans to fix warnings uncovered by GHC 9.8
GHC 9.8 is better about reporting orphan type family instances, which are used
in various spots in Macaw. Enable `-Wno-orphans` to suppress these warnings.
2024-08-08 09:34:03 -04:00
Andrei Stefanescu
9d8cdcc587
Add semantics for prefetch instructions. (#365) 2024-01-10 11:31:36 -08:00
Valentin Robert
28d3c587fc fix incorrect documentation 2023-09-08 10:17:15 -07:00
Ryan Scott
984f7cb368 Support building with GHC 9.6
This patch contains a handful of tweaks needed to make the libraries in the
`macaw` repo build with GHC 9.6:

* GHC 9.6 bundles `mtl-2.3.*`, which no longer re-exports `Control.Monad`,
  `Control.Monad.Trans`, and similar modules from `mtl`-related modules. To
  accommodate this, various imports have been made more explicit.
* I have disambiguated a use of `Data.Parameterized.NatRepr.withKnownNat` in
  `macaw-aarch32` to avoid clashing with a newly exported function of the same
  name in `GHC.TypeNats`.
* I have bumped various upper version bounds on `doctest`,
  `optparse-applicative`, and `what4` to allow building these libraries with
  GHC 9.6.
* I have bumped the following submodules to bring in GHC 9.6–related changes:
  * `asl-translator`: GaloisInc/asl-translator#53
  * `crucible`: GaloisInc/crucible#1102
  * `dwarf`: GaloisInc/dwarf#6
  * `elf-edit`: GaloisInc/elf-edit#38
  * `flexdis86`: GaloisInc/flexdis86#54
  * `grift`: GaloisInc/grift#9
  * `llvm-pretty`: elliottt/llvm-pretty#112
  * `llvm-pretty-bc-parser`: GaloisInc/llvm-pretty-bc-parser#225
  * `semmc`: GaloisInc/semmc#80
  * `what4`: GaloisInc/what4#235
2023-08-21 08:16:10 -04:00
Valentin Robert
417e8b780b remove redundant pragmas 2023-08-09 14:31:50 -07:00
Ryan Scott
97c61e471a Add basic support for simulating PLT stubs and shared libraries
This extends `Data.Macaw.Symbolic.Testing` in `macaw-symbolic` to be able to
handle binaries that depend on shared libraries. This is fully functional for
the x86-64 and AArch32 symbolic backends, and I have added test cases to the
respective repos demonstrating that it works. (The PowerPC backend is not yet
supported. At a minimum, this is blocked on GaloisInc/elf-edit#35.)

To implement this, I also needed to add some additional infrastructure to
`macaw-base` (I put this infrastructure here as it doesn't depend on any
Crucible-specific functionality):

* `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: a basic ELF dynamic
  loader that performs a breadth-first search over all `DT_NEEDED` entries
  that an ELF binary depends on (both directly and indirectly).
* `Data.Macaw.Memory.ElfLoader.PLTStubs`: a collection of heuristics for
  detecting the addresses of PLT stubs in a dynamically linked binary.

It is worth noting that shared libraries are rife with nuance and subtlety,
and the way `macaw` models shared libraries is not 100% accurate. I have
written a lengthy `Note [Shared libraries]` in `Data.Macaw.Symbolic.Testing`
to describe where corners had to be cut.

Fixes #318.
2023-02-23 17:16:12 -05:00
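
The breadth-first search over `DT_NEEDED` entries described in 97c61e471a can be sketched roughly as follows. This is not the code in `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: the dependency table, `neededBy`, and `loadOrder` are hypothetical stand-ins, and a real loader would read the entries out of each ELF file's dynamic section rather than from a list.

```haskell
import Data.List (nub)
import Data.Maybe (fromMaybe)

type Library = String

-- | Hypothetical stand-in for reading the DT_NEEDED entries of a library.
neededBy :: [(Library, [Library])] -> Library -> [Library]
neededBy table lib = fromMaybe [] (lookup lib table)

-- | Breadth-first traversal from a root binary. Returns every library that
-- is reachable (directly or indirectly), in order of first discovery.
loadOrder :: (Library -> [Library]) -> Library -> [Library]
loadOrder needed root = go [root] [root]
  where
    -- 'seen' is kept in reverse discovery order and flipped at the end.
    go [] seen = reverse seen
    go (x : queue) seen =
      let new = nub [d | d <- needed x, d `notElem` seen]
      in go (queue ++ new) (reverse new ++ seen)

main :: IO ()
main = do
  let table =
        [ ("app",       ["libfoo.so", "libc.so"])
        , ("libfoo.so", ["libbar.so", "libc.so"])
        , ("libbar.so", ["libc.so"])
        ]
  mapM_ putStrLn (loadOrder (neededBy table) "app")
```

Tracking the visited set keeps a library that is needed by several others from being loaded more than once.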
Tristan Ravitch
6a4f406c68 Revisit handling of tail calls
It turns out that we have to be more conservative with tail call identification,
as incorrectly identifying a block as the target of a tail call (instead of a
branch) can cause other branch classifiers to fail if that block is the target
of another jump.

Ultimately, we will need to give up some tail call recognition (since they are
in general indistinguishable from jumps), and instead only identify known call
targets as tail call candidates.

With additional global analysis we could do better.

Fixes #294
2022-06-27 15:02:43 -07:00
Tristan Ravitch
857bb72b31 [x86] Add an option to save macaw IR from test cases 2022-06-27 15:02:43 -07:00
Ryan Scott
6e020bcde6 Fix -Wincomplete-uni-patterns warnings
GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of
previously unnoticed warnings in `macaw`. This patch fixes them, mostly by
adding explicit fall-through cases.
2022-05-31 15:50:48 -04:00
Ryan Scott
6237d615c3 Fix -Wnoncanonical-monad-instances warnings
GHC 9.2 adds `-Wnoncanonical-monad-instances` to `-Wall`, which warns whenever
one has explicit implementations of `return` or `(>>)` that aren't simply
`return = pure` or `(>>) = (*>)`. Since these are the default
implementations of `return` and `(>>)` since `base-4.11`, the simplest
way to fix the warnings is to simply remove all explicit definitions of
`return` and `(>>)` and rely on the defaults, which this patch accomplishes.
2022-05-31 15:50:48 -04:00
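
As an illustration of the shape of instance that 6237d615c3 moves towards, here is a small state-like monad whose `Monad` instance defines only `(>>=)`. `Counter` and `tick` are hypothetical names made up for this sketch and do not appear in `macaw`.

```haskell
-- | A tiny state-like monad over an Int counter (hypothetical example).
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }

instance Functor Counter where
  fmap f (Counter g) = Counter $ \n -> let (a, n') = g n in (f a, n')

instance Applicative Counter where
  pure a = Counter $ \n -> (a, n)
  Counter f <*> Counter g = Counter $ \n ->
    let (h, n')  = f n
        (a, n'') = g n'
    in (h a, n'')

instance Monad Counter where
  -- No explicit 'return' or '(>>)' here: the class defaults are used,
  -- so -Wnoncanonical-monad-instances has nothing to report.
  Counter g >>= k = Counter $ \n -> let (a, n') = g n in runCounter (k a) n'

-- | Return the current counter value and increment it.
tick :: Counter Int
tick = Counter $ \n -> (n, n + 1)

main :: IO ()
main = do
  print (snd (runCounter (tick >> tick) 0))
  print (runCounter (return 'x') 5)
```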
Tristan Ravitch
8e10643b0f
Fix tail call classification (#286)
The tail call classifier came after the jump classifier, which was a problem because the jump classifier is less strict than the tail call classifier, meaning the jump classifier would always fire first.  This commit moves direct jump to be the last classifier applied, giving the others a chance.

Includes a test case in the ARM backend.

This requires some updates to some of the expected test results, as a few blocks are now classified as tail calls that were
plain jumps before.  They really could be considered either.  I think it would be nice if these could be classified as jumps instead, but the reason they are flagged as tail calls is mostly down to the fact that their surrounding context is so simple that either interpretation works.

Correcting this would require some heuristics based on additional analysis passes.

The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before.  Adding support required a bit of plumbing, including a more extensive code discovery pass.


Fixes #285
2022-05-10 07:29:55 -07:00
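
Why classifier order matters in 8e10643b0f can be shown with a first-match chain. This is a simplified sketch, not the classifiers in `macaw`: `Block`, `Classification`, and both classifier functions are hypothetical, and the "target is a known function" test is just one plausible stricter condition assumed for illustration.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- | Hypothetical summary of the facts a classifier looks at.
data Block = Block
  { endsInJump            :: Bool
  , targetIsKnownFunction :: Bool
  }

data Classification = DirectJump | TailCall
  deriving (Eq, Show)

type Classifier = Block -> Maybe Classification

-- | Lenient: accepts any block that ends in a jump.
jumpClassifier :: Classifier
jumpClassifier b
  | endsInJump b = Just DirectJump
  | otherwise    = Nothing

-- | Stricter: also requires the target to be a known function entry.
tailCallClassifier :: Classifier
tailCallClassifier b
  | endsInJump b && targetIsKnownFunction b = Just TailCall
  | otherwise                               = Nothing

-- | Apply classifiers in order and keep the first one that succeeds.
classifyWith :: [Classifier] -> Block -> Maybe Classification
classifyWith classifiers b = listToMaybe (mapMaybe ($ b) classifiers)

main :: IO ()
main = do
  let blk = Block { endsInJump = True, targetIsKnownFunction = True }
  -- Lenient classifier first: the stricter one never gets a chance.
  print (classifyWith [jumpClassifier, tailCallClassifier] blk)
  -- Lenient classifier last: the stricter one can fire.
  print (classifyWith [tailCallClassifier, jumpClassifier] blk)
```

Putting the most permissive classifier last lets it act as a fallback without shadowing the more specific ones.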
Brett Boston
a5796fc955
Reverse syscall override return register ordering (#284)
When a user overrides a system call on an architecture that supports returning two values from a system call and they provide a context containing the result of the system call in the form

```
empty :> v0 :> v1
```

macaw will perform the register assignment

```
r0 := v1
r1 := v0
```

This change reverses this behavior so that the assignment becomes

```
r0 := v0
r1 := v1
```

This brings the expected ordering of the result context in agreement
with the left-to-right ordering of the argument context:

```
empty :> arg1 :> arg2 :> ...
```
2022-05-04 12:41:02 -07:00
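
The ordering change in a5796fc955 can be mimicked with a toy snoc-list. The `Ctx` type below is a hypothetical stand-in written for this sketch, not the context type that `macaw` uses, and `assignOld` and `assignNew` are invented names for the behavior before and after the change.

```haskell
infixl 5 :>

-- | A minimal snoc-list: elements are appended on the right.
data Ctx a = Empty | Ctx a :> a

-- | List the elements in left-to-right order.
ctxToList :: Ctx a -> [a]
ctxToList = reverse . go
  where
    go Empty     = []
    go (xs :> x) = x : go xs

-- | Old behavior: the rightmost value lands in the first register.
assignOld :: [String] -> Ctx a -> [(String, a)]
assignOld regs ctx = zip regs (reverse (ctxToList ctx))

-- | New behavior: values are assigned left to right, matching arguments.
assignNew :: [String] -> Ctx a -> [(String, a)]
assignNew regs ctx = zip regs (ctxToList ctx)

main :: IO ()
main = do
  let results = Empty :> "v0" :> "v1"
  print (assignOld ["r0", "r1"] results)
  print (assignNew ["r0", "r1"] results)
```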
Ryan Scott
ce10bc9243 Drop support for GHC 8.6
This allows us to remove gobs of CPP as a consequence.
2022-01-10 16:40:23 -05:00
Tristan Ravitch
9ce3d43188
AArch32: Support conditional returns (#243)
Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators.

This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`).  This meant that they could not contain macaw (or crucible) values.  Some work on AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances.

In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic.

Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided.  These new changes will enable a handler to be written for the conditional PowerPC trap instructions.

PowerPC, x86, and ARM have been updated.

Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each
discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This
shared functionality should be extracted into a macaw-testing library.
2021-11-19 16:20:50 -08:00
Tristan Ravitch
2c85dce18e Expose block classification in the ArchitectureInfo
This change makes the block classifier heuristic part of the `ArchitectureInfo`
structure.  This enables clients and architecture backends to customize the
block classification heuristics.  This is most useful for architectures that
have complex architecture-specific block terminators that require analysis to
generate (e.g., conditional returns).  It will also make macaw-refinement
simpler in the future, as the SMT-based refinement is just an additional block
classifier (but is currently implemented via a hacky side channel).

This change introduces an ancillary change, which should not be very
user-visible.

It splits the Macaw.Discovery and Macaw.Discovery.State modules to break
module import cycles in a way that enables us to expose the classifier.  This
should not be user-visible, as Macaw.Discovery still exports the same
names (with one minor exception that should not appear in user code).

It also moves the definition of the `ArchBlockPrecond` type family; the few
affected places should be updated. User code should probably not be able to see
this.
2021-11-05 18:25:03 -07:00
Ryan Scott
5547632f65 macaw-x86: Handle sign-extended immediates in def_push
See `Note [Sign-extending immediate operands in push]` in
`Data.Macaw.X86.Semantics` for the full story. I have also added a test case
in `macaw-x86-symbolic` which ensures that the stack-pointer-decrementing
logic behaves as one would expect.

Bumps in the `flexdis86` submodule to bring in GaloisInc/flexdis86#37.

Fixes #235.
2021-10-12 16:37:21 -04:00
Andrew Kent
5906f34a63 doc: fix MemCmp docs w.r.t. semantics of return value 2021-09-10 16:16:41 -07:00
Tristan Ravitch
380d732d0e
Implement system call support for x86 (#226)
Implement support for symbolically executing system calls in macaw-symbolic. **To update code that does not need to symbolically execute system calls (i.e., most clients of macaw-symbolic), just pass the new `unsupportedSyscalls` default handler as the fifth argument of `macawExtensions`.**

The primary interface is via the new `LookupSyscallHandle` callback passed to `macawExtensions`. This callback inspects the environment and returns a Crucible `FunctionHandle` that models the behavior of the requested system call. Note that this mechanism only supports concrete system calls (i.e., system calls where the system call number is concrete). The x86 backend has been updated to support this new functionality.

The representation of system calls in macaw is still architecture-specific (because there are interesting differences between system call instructions across architectures). The idea is that system calls are now treated in two steps:
1. A macaw-symbolic extension statement that looks up the override to invoke for the given syscall (returned as a Crucible FunctionHandle)
2. A call to that handle

We need this two-step approach because the handlers that interpret syntax extension statements cannot symbolically branch (and thus cannot call overrides). The extension interpreter just looks up the necessary handle and uses the standard call/override machinery to handle any branching required to support the system call model functionality.

The major complication to this approach is that system calls need to update values in registers when they return. To capture these updates, the architecture-specific syntax extension needs to explicitly update any machine registers that could possibly be affected. The explicit updates are necessary because machine registers do not exist anymore at the macaw-symbolic level (at least within a block). To handle all of these constraints:
1. System calls are represented as extension functions at the macaw level when lifted from machine code.
2. During translation into crucible (via macaw-symbolic), the extension functions are translated into two statements: a function handle lookup and then a function call (with the return values being explicitly threaded through the Crucible function).
3. During symbolic execution, the lookup statement examines the environment to return the necessary function handle, while the handle is called via the normal machinery.

Note that the feature is entirely controlled by the `LookupSyscallHandle` function, which determines the system call dispatch policy. No system call models are included with this change.

Co-authored-by: Brett Boston <boston@galois.com>
2021-08-27 15:47:40 -07:00
Joe Hendrix
cdc90bd846 Update to more recent flexdis 2021-06-14 13:22:46 -07:00
Joe Hendrix
ceb64be843 Sort x86 functions for easier browsing. 2021-04-27 23:54:11 -07:00
Sam Breese
8a0c760886
x86: Add semantics for SHA256 instructions (#196)
* Add semantics for SHA256 instructions

* Use an additional helper function

* Address comments
2021-03-30 18:32:35 -04:00
Joe Hendrix
8756d2e9d3 Minor layout changes 2021-01-29 12:01:16 -08:00
Sam Breese
d5e4a441cd
x86: Add semantics for aesimc (#177)
* x86: Add semantics for aesimc

* x86: Use safeSymbol rather than userSymbol
2021-01-11 13:24:16 -05:00
Sam Breese
2bd0633ba8
x86: Fix semantics for pinsrw, add semantics for pinsr{b,d,q} (#183)
* x86: Fix semantics for pinsrw, add semantics for pinsr{b,d,q}

* x86: Add comments on exec_pinsrx parameters
2020-12-22 15:44:55 -05:00
Brian Huffman
b3af7d63e9 Use OverloadedStrings for the prettyprinter Doc type. 2020-12-02 17:23:47 -08:00
Brian Huffman
2a620d41de Switch from ansi-wl-pprint to the prettyprinter package.
This patch relies on the following submodule updates:
- GaloisInc/what4#77
- GaloisInc/elf-edit#20
- GaloisInc/crucible#586
- GaloisInc/asl-translator#28

This patch updates the following packages:
- macaw-base
- macaw-symbolic
- macaw-x86
- macaw-x86-symbolic
- macaw-aarch32
- macaw-ppc
- macaw-semmc
- macaw-refinement
2020-12-02 11:38:19 -08:00
Sam Breese
2a56e404bd
x86: Special case for sbb with duplicated operand (#176)
This better handles cases like sbb rax, rax, where we know that the result will be -cf regardless of the value in rax.
2020-11-18 04:19:27 -05:00
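
The identity behind the special case in 2a56e404bd can be checked on plain machine words. This is only an arithmetic sketch, not the `macaw-x86` semantics; `sbb` and `sbbSame` are hypothetical helper names, and flag updates are ignored.

```haskell
import Data.Word (Word64)

-- | Subtract with borrow on 64-bit words: dst - src - cf (wrapping).
sbb :: Word64 -> Word64 -> Bool -> Word64
sbb dst src cf = dst - src - borrow
  where borrow = if cf then 1 else 0

-- | Result of @sbb r, r@: the operand cancels out, leaving -cf.
sbbSame :: Bool -> Word64
sbbSame cf = negate (if cf then 1 else 0)

main :: IO ()
main = do
  let samples = [0, 1, 0xdeadbeef, maxBound] :: [Word64]
  -- For every sample and both carry values, the general and special
  -- forms agree, so the register's prior value is irrelevant.
  print (and [sbb x x cf == sbbSame cf | x <- samples, cf <- [False, True]])
```

With the carry flag set the result is all ones, and with it clear the result is zero, which is why this idiom is often used to materialize the carry flag as a mask.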
Joe Hendrix
d2b81d3c2f Fixes for jump table tests.
* Update macaw-x86-tests to build properly.
* Fix off by two error in memMapOverwrite
* Introduce some special handling for unsigned-extension in stack
  analysis so it knows one value is the unsigned extension of another.
* Error report formatting improvements
* Slightly more precise treatment of archfn is bound updates.
2020-11-12 11:25:30 -08:00
Joe Hendrix
22a9104faa Various cleanups.
Consolidate three different checks that control when to explore
a function into a single one defined in exploreFunPred.

Modify noreturn function calls to not treat the return address
as a potential function entry point.

Add basic checking of LSDA address to compare-dwarfdump.

Minor code refactoring and submodule updates.
2020-11-06 14:37:13 -08:00
Joe Hendrix
9203a37b94 Minor cleanups; dwarf updates 2020-11-06 14:35:06 -08:00
Lisanna Dettwyler
47544e4b2d Fix warnings in GHC 8.10 2020-10-20 13:53:22 -07:00
Sam Breese
34e7394c14
x86: Implement semantics for a few instructions (#167)
* x86: Add aesenc, aesenclast, aesdec, aesdeclast

* x86: Add vpcmpgtd

* WIP

* Implement Pshufb

* Fix AESNI_AESEncLast.

* Fix PtCmpGt.

* Refactor AESNI instructions a bit

* Finish refactoring

* Forgot MultiWayIf

* Reduce duplication a bit

* Address comments

Co-authored-by: Andrei Stefanescu <andrei@stefanescu.io>
2020-10-08 19:37:17 -04:00
Sam Breese
5fdfdb2eaa
x86: Add some entries to the export list of Data.Macaw.X86 (#164) 2020-10-08 18:25:30 -04:00
Sam Breese
b248cf7f45
x86: Add semantics for vpunpckhqdq, vxorps, vpaddb, and movbe (#155)
* Add semantics for vpunpckhqdq

* Add semantics for movbe

* Add semantics for vxorps and vpaddb
2020-08-20 14:50:20 -04:00
Sam Breese
48737990f3
x86: Add semantics for some AVX2 instructions (#149)
* x86: Add semantics for the vpsrld and vpsrlq instructions

* x86: Add semantics for vpaddq

* Fix Haddock for PointwiseLogicalShiftR

* x86: Change vpsubd to PtSub rather than PtAdd
2020-07-14 14:41:16 -04:00
Tristan Ravitch
b160e480a7
x86: Add semantics for the endbr instructions (#147)
This change treats them as no-ops (which is what they do on all released
hardware).  We could represent them with arch extensions.  This has a supporting
change in flexdis86 (included as a submodule).
2020-06-25 13:43:15 -07:00
Sam Breese
02c6cc3cb5
Handle bitwise operations on stack offset abstract values (#136)
- Generalize handling of bitwise operations to also apply them to stack offsets
- Use the extended bitwise handling on AND
2020-05-28 14:04:06 -04:00
Brian Huffman
f65c80d7b1 Make code compile without warnings in ghc-8.6 and ghc-8.8. 2020-04-23 20:22:30 -07:00
Tristan Ravitch
c825332f39
Update/ghc 8.8 (#112)
Updates for GHC 8.8

The two main classes of update are related to MonadFail and type alias expansion.

The MonadFail updates introduce explicit MonadFail instances and backward-compatible `fail` implementations under `Monad` for older GHC versions.

The type alias expansion rules changed in GHC 8.8 in a way that breaks the `Simple Lens` idiom; instead, we have to use `Lens'`.  Lens started supporting this alias in version 3.8, which was released in 2013.

This change includes necessary submodule updates, as well as the update for the split of what4 into its own repository.
2020-03-03 13:28:26 -08:00
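
The kind of explicit `MonadFail` instance mentioned in c825332f39 looks roughly like the following. This is a sketch for GHC 8.8 or later only (it omits the backward-compatible `fail` under `Monad` that the commit describes for older compilers), and `Result` and `firstTwo` are hypothetical names that do not come from `macaw`.

```haskell
-- | A small error monad (hypothetical example).
newtype Result a = Result { runResult :: Either String a }

instance Functor Result where
  fmap f (Result e) = Result (fmap f e)

instance Applicative Result where
  pure = Result . Right
  Result f <*> Result a = Result (f <*> a)

instance Monad Result where
  Result e >>= k = Result (e >>= runResult . k)

-- | 'fail' lives in its own class, so a monad that wants failable
-- pattern matches in do-blocks needs this separate instance.
instance MonadFail Result where
  fail = Result . Left

-- | The pattern bind below can fail, so it requires 'MonadFail Result'.
firstTwo :: [Int] -> Result (Int, Int)
firstTwo xs = do
  (a : b : _) <- pure xs
  pure (a, b)

main :: IO ()
main = do
  print (runResult (firstTwo [1, 2, 3]))
  -- A list that is too short yields a Left carrying GHC's failure message.
  print (either (const "failed") show (runResult (firstTwo [1])))
```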
Joe Hendrix
46be7aa52b Implement new registerUse analysis.
The new registerUse analysis uses a three phase process:

Phase 1 computes invariants about the start state of each block.  It
will indicate when registers/stack locations store stack offsets, and
where callee saved registers are stashed.  It also memoizes
information about stack reads and writes to simplify later passes.

Phase 2 is a demand analysis that computes which registers and stack
locations must be available to execute the program.  It then
propagates those constraints across blocks in the function.

Phase 3 combines the information into a form relevant for function
recovery.
2020-02-06 19:26:46 -08:00
Samuel Breese
fb1611a127
Semantics for MULX from BMI2 and all of ADX 2019-12-19 10:43:54 -05:00
Joe Hendrix
1ed99917b4
Add testcase for non-zero index jumptable. 2019-12-04 14:31:45 -08:00
Joe Hendrix
df9b5bbe27
Support for offset jump tables. 2019-11-19 14:52:58 -08:00
Joe Hendrix
1be68af2a0
Fix warnings. 2019-10-21 21:18:54 -07:00
Joe Hendrix
81d0469fbe
Group mod/div x86 functions. 2019-10-21 14:59:43 -07:00
Joe Hendrix
744424d28b
Remove unused X86PrimLoc. 2019-09-20 15:19:37 -07:00
Joe Hendrix
5e834122d1
Segment register updates; stack offset calculation. 2019-09-20 13:58:05 -07:00
Joe Hendrix
7aee0cd803
Remove unused debug reg code. 2019-09-09 00:55:28 -07:00