Commit Graph

1942 Commits

Author SHA1 Message Date
Ryan Scott
c3b8bac579 CI: Test GHC 9.0.2 2022-01-10 16:40:23 -05:00
Ryan Scott
ce3e9cba13 Add notes on freeze file generation to README 2022-01-10 16:40:23 -05:00
Ryan Scott
dfeef5f332 Regenerate freeze files 2022-01-10 16:40:23 -05:00
Ryan Scott
7819377d84 macaw-x86-symbolic: Remove unused FloatInfoFromSSEType definition
GHC 9.0 uncovered this type family as being unused. (See
https://gitlab.haskell.org/ghc/ghc/-/issues/18470, which made
`-Wunused-top-binds` more clever about detecting unused, closed
type families like `FloatInfoFromSSEType`.) Let's remove it to avoid
an `-Wunused-top-binds` warning.
2022-01-10 16:40:23 -05:00
Ryan Scott
2f2ba118bd macaw-semmc: Avoid arithmetic overflow from negative shift amounts
While pre-9.0 versions of GHC would silently turn negative right shifts into
left shifts, GHC 9.0 will throw an `arithmetic overflow` exception instead.
This patch makes this behavior explicit in `macaw-semmc` to allow the code to
work on GHC 9.0.

Fixes #212.
2022-01-10 16:40:23 -05:00
Ryan Scott
049096c506 Support building with GHC 9.0
This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:

* GHC 9.0 implements simplified subsumption (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
  In most cases, adapting to this is a matter of manually eta expanding
  definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
  of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
  `evalExt` had to be made more specific to adapt to the loss of contravariance
  when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
  sooner (see
  [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
  This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
  groups with `$(return [])` Template Haskell splices. The previous locations
  of these splices made it so that the TH-generated instances in that package
  were not available to any code before the splice, resulting in type errors
  when compiled with GHC 9.0.

  To overcome this, I rearranged the TH-generated instances so that they appear
  before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
  uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
  later, so this patch also updates thes lower bounds on `base` to `>= 4.9` in
  the appropriate `.cabal` files. (I'm fairly certain that this requirement was
  already present implicity, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
  allow them to build with GHC 9.0. The `llvm-pretty` and
  `llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
  changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
  raised to allow it to build with GHC 9.0.
2022-01-10 16:40:23 -05:00
Ryan Scott
f9b300a3d1 macaw-symbolic: Add fromCrucibleEndian function
This is like `toCrucibleEndian`, but in the opposite direction.
2022-01-04 16:27:57 -06:00
Ryan Scott
45f991ccdf macaw-symbolic: Consolidate duplicate definitions of toCrucibleEndian
There were two identical definitions of `toCrucibleEndian`, one in
`D.M.S.Memory` and another in `D.M.S.Testing`. This commit removes the
latter in favor of the former, which is actually exported.
2022-01-04 16:27:57 -06:00
Ryan Scott
c900dc4de8 macaw-symbolic: Replace mkName with safeSymbol
Fixes #251.
2022-01-04 16:27:24 -06:00
Tristan Ravitch
9c426986ff Revise handling of some pointer operations
Have all additions (at any bit width) go through the special PtrAdd
handler (rather than BVAdd). Also add special handlers for truncation and
extension.

These changes support architectures that do pointer operations at non-pointer
widths (e.g., to detect overflow).  These new operations apply the named
operations over just the offset of pointers, preserving the block id.
2021-12-23 13:54:29 -08:00
Tristan Ravitch
21366abc23
symbolic: Make relocation handling configurable in the symbolic backend (#250)
The `Data.Macaw.Symbolic.Memory` module provides a default memory model and
initial memory setup that is suitable for many symbolic execution
workloads. However, the defaults cannot handle dynamically-linked programs, as
it calls `error` when it attempts to determine an initial value for relocations
it finds in memory.

There are no good defaults for this, as what those values should be depend a lot
on what the verifier wants to prove.

This commit adds some hooks to configure this behavior in the verifier, and is
designed to be extensible and enable other configuration choices where
reasonable.

The original API is unchanged, as it calls the added `newGlobalMemoryWith`
function with a default set of hooks.  Callers with special memory handling
needs are directed to use that function.
2021-12-16 15:22:38 -08:00
Tristan Ravitch
8047a5d0ed
arm: Add transfer functions for division (#249)
Add support for abstract interpretation of division operations in the ARM
backend (when the operands to division are concretely known). 

This commit also adds extended documentation on the semantics of these
operations.

The concrete evaluation eliminates constant division operations.  The abstract
cases are probably obsolete in light of that, but are still interesting...

These changes were motivated by insufficient simplification around some of the syscall/errno handling code in musl
2021-12-07 14:11:59 -08:00
Tristan Ravitch
3e918f8b51
Revise handling of syscalls in AArch32 to match X86 (#246)
The old formulation (with system calls as block terminators) proved to be
impossible to implement properly. Handlers for syntax overrides have very
limited types (`IO`, rather than `OverrideSim`), which made symbolic branching
and reusing overrides impossible.

This change replaces the system call block terminator with an arch-specific
function that is translated into a function handle lookup (which is then
dispatched to with a call).

Unfortunately, this refactoring required combining the AArch32 simplification
module with the architecture extension definitions, due to the new translation
relying on the simplifier instance.
2021-11-24 11:59:56 -08:00
Ryan Scott
d3a53a6769 Update crucible, semmc submodules; adapt to GaloisInc/crucible#906
This updates the `crucible` submodule to include GaloisInc/crucible#906
(`Control granularity of reading uninitialized memory`), as well as the
`semmc` submodule to bring in corresponding changes on its side
(GaloisInc/semmc#69). Some additional `?memOpts :: MemOptions` constraints
needed to be added to some functions in `macaw-symbolic` and
`macaw-refinement` as a result.
2021-11-22 18:27:46 -05:00
Ryan Scott
9efbbf1753 Update llvm-pretty-bc-parser submodule
This updates the `llvm-pretty-bc-parser` to include these PRs:

* GaloisInc/llvm-pretty-bc-parser#159 (`Support parsing fneg instructions`)
* GaloisInc/llvm-pretty-bc-parser#166 (`Support parsing freeze instructions`)
* GaloisInc/llvm-pretty-bc-parser#164 (`Parse DebugInfoEnumerator properly on LLVM 12+`)
* GaloisInc/llvm-pretty-bc-parser#162 (`Support parsing dict{Associated,Allocated,Rank} fields introduced in LLVM 12`)

The `llvm-pretty` submodule had corresponding changes as well. These were
included as part of a previous commit to update the `llvm-pretty` submodule
(952fe5578d), but the `llvm-pretty-bc-parser`
submodule was not updated at the same time. This commit brings the two back
into harmony.
2021-11-22 18:27:46 -05:00
Tristan Ravitch
9ce3d43188
AArch32: Support conditional returns (#243)
Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators.

This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`).  This meant that they could not contain macaw (or crucible) values.  Some work on. AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances.

In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic.

Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided.  These new changes will enable a handler to be written for the conditional PowerPC trap instructions.

PowerPC, x86, and ARM have been updated.

Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each
discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This
shared functionality should be extracted into a macaw-testing library.
2021-11-19 16:20:50 -08:00
Tristan Ravitch
952fe5578d Submodule updates 2021-11-18 21:40:09 -08:00
Tristan Ravitch
c6eae87724 Update the aarch32 disassembler
This fixes some critical bugs in thumb2 disassembly.  This change adds some
macaw-level tests to ensure that thumb2 is working properly.
2021-11-05 20:01:11 -07:00
Tristan Ravitch
2c85dce18e Expose block classification in the ArchitectureInfo
This change makes the block classifier heuristic part of the `ArchitectureInfo`
structure.  This enables clients and architecture backends to customize the
block classification heuristics.  This is most useful for architectures that
have complex architecture-specific block terminators that require analysis to
generate (e.g., conditional returns).  It will also make macaw-refinement
simpler in the future, as the SMT-based refinement is just an additional block
classifier (but is currently implemented via a hacky side channel).

This change introduces an ancillary change, which should not be very
user-visible.

It splits the Macaw.Discovery and Macaw.Discovery.State modules to break
module import cycles in a way that enables us to expose the classifier.  This
should not be user-visible, as Macaw.Discovery still exports the same
names (with one minor exception that should not appear in user code).

It also moves the definition of the `ArchBlockPrecond` type family; the few
affected places should be updated. User code should probably not be able to see
this.
2021-11-05 18:25:03 -07:00
Ryan Scott
2152108e81
Make LookupFunctionHandle a newtype (#238)
`LookupSyscallHandle` is already a newtype. `LookupFunctionHandle` deserves to
share the love.
2021-10-13 12:10:36 -04:00
Ryan Scott
5547632f65 macaw-x86: Handle sign-extended immediates in def_push
See `Note [Sign-extending immediate operands in push]` in
`Data.Macaw.X86.Semantics` for the full story. I have also added a test case
in `macaw-x86-symbolic` which ensures that the stack-pointer-decrementing
logic behaves as one would expect.

Bumps in the `flexdis86` submodule to bring in GaloisInc/flexdis86#37.

Fixes #235.
2021-10-12 16:37:21 -04:00
Ryan Scott
4fc8a46ed6 Bump semmc submodule to bring in GaloisInc/semmc#67 2021-10-12 16:37:21 -04:00
Ryan Scott
49a6d9d7b8
stripPLTRead: Avoid referencing dropped IDs in ArchStates (#233)
It is possible for later `ArchState` updates to reference the `AssignId` of
the `AssignStmt` that is dropped as a part of `stripPLTRead`, so make sure to
prune such updates from the `ArchState` to avoid referencing the now
out-of-scope `AssignId`.
2021-09-28 09:17:20 -04:00
Ryan Scott
fbd7bce217
Add Show{,F} instances for MacawCrucibleValue (#232)
I found these useful while printf debugging.
2021-09-28 08:52:05 -04:00
Brett Boston
a336895da7
Add optional override for MacawArchStmtExtensions to genArchVals (#230)
This change adds an optional argument to `genArchVals` that allows client code to override the backend translation behavior of `MacawArchStmtExtension`s on a statement-by-statement basis.  The new argument has type `Maybe (MacawArchStmtExtensionOverride arch)`, where `MacawArchStmtExtensionOverride` is a function that takes a statement and a crucible state, and returns an optional tuple containing the value produced by the statement, as well as an updated state.  Returning 'Nothing' indicates that the backend should use its default handler for the statement.

Client code that wishes to maintain the existing default behavior in all cases can simply pass `Nothing` for the new argument to `genArchVals`.
2021-09-14 18:24:47 -07:00
Andrew Kent
5906f34a63 doc: fix MemCmp docs w.r.t. semantics of return value 2021-09-10 16:16:41 -07:00
Tristan Ravitch
380d732d0e
Implement system call support for x86 (#226)
Implement support for symbolically executing system calls in macaw-symbolic. **To update code that does not need to symbolically execute system calls (i.e., most clients of macaw-symbolic), just pass the new `unsupportedSyscalls` default handler as the fifth argument of `macawExtensions`.**

The primary interface is via the new `LookupSyscallHandle` callback passed to `macawExtensions`. This callback inspects the environment and returns a Crucible `FunctionHandle` that models the behavior of the requested system call. Note that this mechanism only supports concrete system calls (i.e., system calls where the system call number is concrete). The x86 backend has been updated to support this new functionality.

The representation of system calls in macaw is still architecture-specific (because there are interesting differences between system call instructions across architectures). The idea is that system calls are now treated in two steps:
1. A macaw-symbolic extension statement that looks up the override to invoke for the given syscall (returned as a Crucible FunctionHandle)
2. A call to that handle

We need this two step approach because the handlers that interpret syntax extension statements cannot symbolically branch (and thus cannot call overrides). The extension interpreter just looks up the necessary handle and uses the standard call/override machinery to handle any branching required to support the system call model functionality.

The major complication to this approach is that system calls need to update values in registers when they return. To capture these updates, the architecture-specific syntax extension needs to explicitly update any machine registers that could possibly be affected. The explicit updates are necessary because machine registers do not exist anymore at the macaw-symbolic level (at least within a block). To handle all of these constraints:
1. System calls are represented as extension functions at the macaw level when lifted from machine code.
2. During translation into crucible (via macaw-symbolic), the extension functions are translated into two statements: a function handle lookup and then a function call (with the return values being explicitly threaded through the Crucible function).
3. During symbolic execution, the lookup statement examines the environment to return the necessary function handle, while the handle is called via the normal machinery.

Note that the feature is entirely controlled by the `LookupSyscallHandle` function, which determines the system call dispatch policy. No system call models are included with this change.

Co-authored-by: Brett Boston <boston@galois.com>
2021-08-27 15:47:40 -07:00
Ryan Scott
7c95f5d874
Adapt to crucible-symio and even more MemOptions (#225)
This bumps the `crucible` submodule to bring in:

* `crucible-symio` (GaloisInc/crucible#788). This requires adding a new
  project dependency in `cabal.project.dist`.
* GaloisInc/crucible#808, which adds yet another `?memOpts :: MemOptions`
  constraint, this time in `doPtrAddOffset`.
2021-08-26 08:31:28 -04:00
Ryan Scott
7f7de2a59b
Adapt to GaloisInc/crucible#794 (#224)
GaloisInc/crucible#794 increases the number of functions that use
implicit `MemOptions`, including a handful of key LLVM memory model–related
functions. As a result, many parts of `macaw` need to add implicit `?memOpts`
parameters to accommodate to this change.
2021-08-23 20:39:08 -04:00
Tristan Ravitch
a0107f1f9d
Merge pull request #223 from GaloisInc/tr/ppc32-ipalign
ppc32: Add IP alignment rules for code discovery
2021-08-23 10:43:51 -07:00
Tristan Ravitch
7d1fa366c2 ppc32: Add IP alignment rules for code discovery
This was stubbed out before. It is basically identical to the ppc64 version,
except with offsets corrected.
2021-08-23 08:35:56 -07:00
Tristan Ravitch
6b712c6280
Fix the default ppc32 ABI (#222)
None of the common default ppc32 ABIs use a Table of Contents (TOC), so default
our code to not assume it either. This has accompanying changes in
macaw-loader-ppc, which also made incorrect assumptions about ppc32.

Note that we may eventually need to support rarely-used ABIs that do use a
TOC (or similar dedicated registers, e.g., the Small Data Area mode). When we
do, we will probably want that to be a data-oriented decision rather than a
type-level one, as each architecture supports multiple ABIs. We may also need to
modify ppc64 to support ABIs without TOCs, but we'll do it when we need to.
2021-08-19 07:41:29 -07:00
Tristan Ravitch
9069e114fd
Merge pull request #220 from GaloisInc/tr/pic-arm-fixes
Fix issues analyzing Position Independent Executables in AArch32
2021-07-21 13:16:55 -07:00
Tristan Ravitch
37d8029c00 Remove an incorrect assumption about addresses in the ARM decode logic
It was previously assuming that addresses are absolute, which is not true for
position independent executables.  Extracting the offset from the address is
sufficient for our purposes here (note that taking the offset from the
`MemSegmentOffset` would not be right, as that offset is relative to the segment
start).
2021-07-21 11:44:20 -07:00
Tristan Ravitch
6fa6010473 Add more detail to ARM memory errors
Record the decode mode in the error, which can help those interpreting errors
understand what state the decoder was in when it failed.
2021-07-21 11:43:49 -07:00
Tristan Ravitch
764de152ce
Merge pull request #219 from GaloisInc/tr/submodules
Tr/submodules
2021-07-16 07:37:09 -07:00
Tristan Ravitch
10ef849895 Warning cleanup 2021-07-15 21:22:57 -07:00
Tristan Ravitch
df839de678 Update submodules
This commit adapts to recent changes in crucible and raises the version bounds
to admit newer versions of what4.
2021-07-15 21:22:30 -07:00
Sam Breese
135fb062bb
x86: Fix semantics for BSF and BSR instructions (#216)
* x86: Fix semantics for BSF and BSR instructions

* Add a test for BSR and BSF
2021-07-13 14:14:59 -04:00
Joe Hendrix
e01155e32f
Merge pull request #218 from GaloisInc/jhx/fix
Specify function name in register use error message
2021-07-12 16:22:39 -07:00
Joe Hendrix
4afe6fd82e Specify function name in register use error message 2021-07-12 11:59:43 -07:00
Tristan Ravitch
a0aaa6e083
Attempt to use self-hosted CI runners (#215)
Switch the github actions configuration to use internally-hosted runners, which have enough memory to complete all of the CI jobs
2021-07-12 11:58:21 -07:00
Tristan Ravitch
f2e3d9652c
Merge pull request #214 from GaloisInc/tr/arm-pc-fix
Tr/arm pc fix
2021-07-10 20:28:22 -07:00
Tristan Ravitch
cb02157ca6
Merge branch 'master' into tr/arm-pc-fix 2021-07-09 11:03:18 -07:00
Tristan Ravitch
f13eb7c01f Add an additional case handling reading/writing the PC directly
We originally left this case out to catch cases where the PC was written to
directly (skipping the special `LoadWritePC` logic in the semantics). At this
point we are confident that the ARM semantics handle that correctly. The
translation of the semantics into macaw (via TH) are not entirely lazy, and the
interpretation will call this function indirectly (via `get_gpr_uf`); returning
Nothing in the `r15` case caused that part of the translation to fail. The
resulting value is never actually used (because the ARM semantics have special
behavior when reading from `r15`), but the error was too eager and caused a
crash.

This change just lets that code continue.
2021-07-09 10:54:58 -07:00
Joe Hendrix
e4b198ab2d
Merge pull request #211 from GaloisInc/jhx/event
Improve event handling
2021-06-14 22:19:16 -07:00
Joe Hendrix
a646d5fab2 Update submodules 2021-06-14 15:41:32 -07:00
Joe Hendrix
cdc90bd846 Update to more recent flexdis 2021-06-14 13:22:46 -07:00
Joe Hendrix
a50faaebd1 Move isExecutableSegOff check in discovery 2021-06-13 11:55:08 -07:00
Joe Hendrix
cda1d50b28 Expand register use error reasons.
This allows architecture-specific functions to add additional detail.
2021-06-12 12:33:42 -07:00