macaw

mirror of https://github.com/GaloisInc/macaw.git synced 2024-11-25 21:54:51 +03:00

Author	SHA1	Message	Date
Ryan Scott	1add47389a	macaw-x86: Fix `call` semantics when call target involves the stack pointer Previously, the `macaw-x86` semantics for `call` would retrieve the call target after pushing the next instruction's address to the stack, but if the call target involves the stack pointer, then this would mean that it would get the next instruction's address when retrieving the call target. This is not what is intended! This patch fixes the issue by always retrieving the call target before pushing the next instruction's address to the stack. I have added a test case to the `macaw-x86-symbolic` test suite which demonstrates that this fix works as intended. Fixes #420.	2024-08-13 12:31:09 -04:00
Ryan Scott	9954dd6d01	Fix -Wx-partial warnings uncovered by GHC 9.8	2024-08-08 09:34:03 -04:00
Ryan Scott	c1a1449ec2	Enable -Wno-orphans to fix warnings uncovered by GHC 9.8 GHC 9.8 is better about reporting orphan type family instances, which are used in various spots in Macaw. Enable `-Wno-orphans` to suppress these warnings.	2024-08-08 09:34:03 -04:00
Andrei Stefanescu	9d8cdcc587	Add semantics for prefetch instructions. (#365)	2024-01-10 11:31:36 -08:00
Valentin Robert	28d3c587fc	fix incorrect documentation	2023-09-08 10:17:15 -07:00
Ryan Scott	984f7cb368	Support building with GHC 9.6 This patch contains a handful of tweaks needed to make the libraries in the `macaw` repo build with GHC 9.6: * GHC 9.6 bundles `mtl-2.3.`, which no longer re-exports `Control.Monad`, `Control.Monad.Trans`, and similar modules from `mtl`-related modules. To accommodate this, various imports have been made more explicit. I have disambiguated a use of `Data.Parameterized.NatRepr.withKnownNat` in `macaw-aarch32` to avoid clashing with a newly exported function of the same name in `GHC.TypeNats`. * I have bumped various upper version bounds on `doctest`, `optparse-applicative`, and `what4` to allow building these libraries with GHC 9.6. * I have bumped the following submodules to bring in GHC 9.6–related changes: * `asl-translator`: GaloisInc/asl-translator#53 * `crucible`: GaloisInc/crucible#1102 * `dwarf`: GaloisInc/dwarf#6 * `elf-edit`: GaloisInc/elf-edit#38 * `flexdis86`: GaloisInc/flexdis86#54 * `grift`: GaloisInc/grift#9 * `llvm-pretty`: elliottt/llvm-pretty#112 * `llvm-pretty-bc-parser`: GaloisInc/llvm-pretty-bc-parser#225 * `semmc`: GaloisInc/semmc#80 * `what4`: GaloisInc/what4#235	2023-08-21 08:16:10 -04:00
Valentin Robert	417e8b780b	remove redundant pragmas	2023-08-09 14:31:50 -07:00
Ryan Scott	97c61e471a	Add basic support for simulating PLT stubs and shared libraries This extends `Data.Macaw.Symbolic.Testing` in `macaw-symbolic` to be able to handle binaries that depend on shared libraries. This is fully functional for the x86-64 and AArch32 symbolic backends, and I have added test cases to the respective repos demonstrating that it works. (The PowerPC backend is not yet supported. At a minimum, this is blocked on GaloisInc/elf-edit#35.) To implement this, I also needed to add some additional infrastructure to `macaw-base` (I put this infrastructure here as it doesn't depend on any Crucible-specific functionality): * `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: a basic ELF dynamic loader that performs a breadth-first search over all `DT_NEEDED` entries that an ELF binary depends on (both directly and indirectly). * `Data.Macaw.Memory.ElfLoader.PLTStubs`: a collection of heuristics for detecting the addresses of PLT stubs in a dynamically linked binary. It is worth noting that shared libraries are rife with nuance and subtlety, and the way `macaw` models shared libraries is not 100% accurate. I have written a lengthy `Note [Shared libraries]` in `Data.Macaw.Symbolic.Testing` to describe where corners had to be cut. Fixes #318.	2023-02-23 17:16:12 -05:00
Tristan Ravitch	6a4f406c68	Revisit handling of tail calls It turns out that we have to be more conservative with tail call identification, as incorrectly identifying a block as the target of a tail call (instead of a branch) can cause other branch classifiers to fail if that block is the target of another jump. Ultimately, we will need to give up some tail call recognition (since they are in general indistinguishable from jumps), and instead only identify known call targets as tail call candidates. With additional global analysis we could do better. Fixes #294	2022-06-27 15:02:43 -07:00
Tristan Ravitch	857bb72b31	[x86] Add an option to save macaw IR from test cases	2022-06-27 15:02:43 -07:00
Ryan Scott	6e020bcde6	Fix -Wincomplete-uni-patterns warnings GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of previously unnoticed warnings in `macaw`. This patch fixes them, mostly by adding explicit fall-through cases.	2022-05-31 15:50:48 -04:00
Ryan Scott	6237d615c3	Fix -Wnoncanonical-monad-instances warnings GHC 9.2 adds `-Wnoncanonical-monad-instances` to `-Wall`, which warns whenever one has explicit implementations of `return` or `(>>)` that aren't simply `return = pure` or `(>>) = (*>)`. Since these are the default implementations of `return` and `(>>)` since `base-4.11`, the simplest way to fix the warnings is to simply remove all explicit definitions of `return` and `(>>)` and rely on the defaults, which this patch accomplishes.	2022-05-31 15:50:48 -04:00
Tristan Ravitch	8e10643b0f	Fix tail call classification (#286) The tail call classifier came after the jump classifier, which was a problem because it is less strict than the tail call classifier, meaning it would always fire. This commit moves direct jump to be the last classifier applied, giving the others a chance. Includes a test case in the ARM backend. This requires some updates to some of the expected test results, as a few blocks are now classified as tail calls that were plain jumps before. They really could be considered either. I think it would be nice if these could be classified as jumps instead, but the reason they are flagged as tail calls is mostly down to the fact that their surrounding context is so simple that either interpretation works. Correcting this would require some heuristics based on additional analysis passes. The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before. Adding support required a bit of plumbing, including a more extensive code discovery pass. Fixes #285	2022-05-10 07:29:55 -07:00
Brett Boston	a5796fc955	Reverse syscall override return register ordering (#284) When a user overrides a system call on an architecture that supports returning two values from a system call and they provide a context containing the result of the system call in the form ``` empty :> v0 :> v1 ``` macaw will perform the register assignment ``` r0 := v1 r1 := v0 ``` This change reverses this behavior so that the assignment becomes ``` r0 := v0 r1 := v1 ``` This brings the expected ordering of the result context in agreement with the left-to-right ordering of the argument context: ``` empty :> arg1 :> arg2 :> ... ```	2022-05-04 12:41:02 -07:00
Ryan Scott	ce10bc9243	Drop support for GHC 8.6 This allows us to remove gobs of CPP as a consequence.	2022-01-10 16:40:23 -05:00
Tristan Ravitch	9ce3d43188	AArch32: Support conditional returns (#243) Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators. This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`). This meant that they could not contain macaw (or crucible) values. Some work on AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances. In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic. Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided. These new changes will enable a handler to be written for the conditional PowerPC trap instructions. PowerPC, x86, and ARM have been updated. Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This shared functionality should be extracted into a macaw-testing library.	2021-11-19 16:20:50 -08:00
Tristan Ravitch	2c85dce18e	Expose block classification in the ArchitectureInfo This change makes the block classifier heuristic part of the `ArchitectureInfo` structure. This enables clients and architecture backends to customize the block classification heuristics. This is most useful for architectures that have complex architecture-specific block terminators that require analysis to generate (e.g., conditional returns). It will also make macaw-refinement simpler in the future, as the SMT-based refinement is just an additional block classifier (but is currently implemented via a hacky side channel). This change introduces an ancillary change, which should not be very user-visible. It splits the Macaw.Discovery and Macaw.Discovery.State modules to break module import cycles in a way that enables us to expose the classifier. This should not be user-visible, as Macaw.Discovery still exports the same names (with one minor exception that should not appear in user code). It also moves the definition of the `ArchBlockPrecond` type family; the few affected places should be updated. User code should probably not be able to see this.	2021-11-05 18:25:03 -07:00
Ryan Scott	5547632f65	`macaw-x86`: Handle sign-extended immediates in `def_push` See `Note [Sign-extending immediate operands in push]` in `Data.Macaw.X86.Semantics` for the full story. I have also added a test case in `macaw-x86-symbolic` which ensures that the stack-pointer-decrementing logic behaves as one would expect. Bumps in the `flexdis86` submodule to bring in GaloisInc/flexdis86#37. Fixes #235.	2021-10-12 16:37:21 -04:00
Andrew Kent	5906f34a63	doc: fix MemCmp docs w.r.t. semantics of return value	2021-09-10 16:16:41 -07:00
Tristan Ravitch	380d732d0e	Implement system call support for x86 (#226) Implement support for symbolically executing system calls in macaw-symbolic. To update code that does not need to symbolically execute system calls (i.e., most clients of macaw-symbolic), just pass the new `unsupportedSyscalls` default handler as the fifth argument of `macawExtensions`. The primary interface is via the new `LookupSyscallHandle` callback passed to `macawExtensions`. This callback inspects the environment and returns a Crucible `FunctionHandle` that models the behavior of the requested system call. Note that this mechanism only supports concrete system calls (i.e., system calls where the system call number is concrete). The x86 backend has been updated to support this new functionality. The representation of system calls in macaw is still architecture-specific (because there are interesting differences between system call instructions across architectures). The idea is that system calls are now treated in two steps: 1. A macaw-symbolic extension statement that looks up the override to invoke for the given syscall (returned as a Crucible FunctionHandle) 2. A call to that handle We need this two step approach because the handlers that interpret syntax extension statements cannot symbolically branch (and thus cannot call overrides). The extension interpreter just looks up the necessary handle and uses the standard call/override machinery to handle any branching required to support the system call model functionality. The major complication to this approach is that system calls need to update values in registers when they return. To capture these updates, the architecture-specific syntax extension needs to explicitly update any machine registers that could possibly be affected. The explicit updates are necessary because machine registers do not exist anymore at the macaw-symbolic level (at least within a block). To handle all of these constraints: 1. System calls are represented as extension functions at the macaw level when lifted from machine code. 2. During translation into crucible (via macaw-symbolic), the extension functions are translated into two statements: a function handle lookup and then a function call (with the return values being explicitly threaded through the Crucible function). 3. During symbolic execution, the lookup statement examines the environment to return the necessary function handle, while the handle is called via the normal machinery. Note that the feature is entirely controlled by the `LookupSyscallHandle` function, which determines the system call dispatch policy. No system call models are included with this change. Co-authored-by: Brett Boston <boston@galois.com>	2021-08-27 15:47:40 -07:00
Joe Hendrix	cdc90bd846	Update to more recent flexdis	2021-06-14 13:22:46 -07:00
Joe Hendrix	ceb64be843	Sort x86 functions for easier browsing.	2021-04-27 23:54:11 -07:00
Sam Breese	8a0c760886	x86: Add semantics for SHA256 instructions (#196) * Add semantics for SHA256 instructions * Use an additional helper function * Address comments	2021-03-30 18:32:35 -04:00
Joe Hendrix	8756d2e9d3	Minor layout changes	2021-01-29 12:01:16 -08:00
Sam Breese	d5e4a441cd	x86: Add semantics for aesimc (#177) * x86: Add semantics for aesimc * x86: Use safeSymbol rather than userSymbol	2021-01-11 13:24:16 -05:00
Sam Breese	2bd0633ba8	x86: Fix semantics for pinsrw, add semantics for pinsr{b,d,q} (#183) * x86: Fix semantics for pinsrw, add semantics for pinsr{b,d,q} * x86: Add comments on exec_pinsrx parameters	2020-12-22 15:44:55 -05:00
Brian Huffman	b3af7d63e9	Use OverloadedStrings for the prettyprinter Doc type.	2020-12-02 17:23:47 -08:00
Brian Huffman	2a620d41de	Switch from `ansi-wl-pprint` to the `prettyprinter` package. This patch relies on the following submodule updates: - GaloisInc/what4#77 - GaloisInc/elf-edit#20 - GaloisInc/crucible#586 - GaloisInc/asl-translator#28 This patch updates the following packages: - macaw-base - macaw-symbolic - macaw-x86 - macaw-x86-symbolic - macaw-aarch32 - macaw-ppc - macaw-semmc - macaw-refinement	2020-12-02 11:38:19 -08:00
Sam Breese	2a56e404bd	x86: Special case for sbb with duplicated operand (#176) This better handles cases like sbb rax, rax, where we know that the result will be -cf regardless of the value in rax.	2020-11-18 04:19:27 -05:00
Joe Hendrix	d2b81d3c2f	Fixes for jump table tests. * Update macaw-x86-tests to build properly. * Fix off by two error in memMapOverwrite * Introduce some special handling for unsigned-extension in stack analysis so it knows one value is the unsigned extension of another. * Error report formatting improvements * Slightly more precise treatment of archfn is bound updates.	2020-11-12 11:25:30 -08:00
Joe Hendrix	22a9104faa	Various cleanups. Consolidate three different checks that control when to explore a function into a single one defined in exploreFunPred. Modify noreturn function calls to not treat the return address as a potential function entry point. Add basic checking of LSDA address to compare-dwarfdump. Minor code refactoring and submodule updates.	2020-11-06 14:37:13 -08:00
Joe Hendrix	9203a37b94	Minor cleanups; dwarf updates	2020-11-06 14:35:06 -08:00
Lisanna Dettwyler	47544e4b2d	Fix warnings in GHC 8.10	2020-10-20 13:53:22 -07:00
Sam Breese	34e7394c14	x86: Implement semantics for a few instructions (#167) * x86: Add aesenc, aesenclast, aesdec, aesdeclast * x86: Add vpcmpgtd * WIP * Implement Pshufb * Fix AESNI_AESEncLast. * Fix PtCmpGt. * Refactor AESNI instructions a bit * Finish refactoring * Forgot MultiWayIf * Reduce duplication a bit * Address comments Co-authored-by: Andrei Stefanescu <andrei@stefanescu.io>	2020-10-08 19:37:17 -04:00
Sam Breese	5fdfdb2eaa	x86: Add some entries to the export list of Data.Macaw.X86 (#164)	2020-10-08 18:25:30 -04:00
Sam Breese	b248cf7f45	x86: Add semantics for vpunpckhqdq, vxorps, vpaddb, and movbe (#155) * Add semantics for vpunpckhqdq * Add semantics for movbe * Add semantics for vxorps and vpaddb	2020-08-20 14:50:20 -04:00
Sam Breese	48737990f3	x86: Add semantics for some AVX2 instructions (#149) * x86: Add semantics for the vpsrld and vpsrlq instructions * x86: Add semantics for vpaddq * Fix Haddock for PointwiseLogicalShiftR * x86: Change vpsubd to PtSub rather than PtAdd	2020-07-14 14:41:16 -04:00
Tristan Ravitch	b160e480a7	x86: Add semantics for the endbr instructions (#147) This change treats them as no-ops (which is what they do on all released hardware). We could represent them with arch extensions. This has a supporting change in flexdis86 (included as a submodule).	2020-06-25 13:43:15 -07:00
Sam Breese	02c6cc3cb5	Handle bitwise operations on stack offset abstract values (#136) - Generalize handling of bitwise operations to also apply them to stack offsets - Use the extended bitwise handling on AND	2020-05-28 14:04:06 -04:00
Brian Huffman	f65c80d7b1	Make code compile without warnings in ghc-8.6 and ghc-8.8.	2020-04-23 20:22:30 -07:00
Tristan Ravitch	c825332f39	Update/ghc 8.8 (#112) Updates for GHC 8.8 The two main classes of update are related to MonadFail and type alias expansion. The MonadFail updates introduce explicit MonadFail instances and backward-compatible `fail` implementations under `Monad` for older GHC versions. The type alias expansion rules changed in GHC 8.8 in a way that breaks the `Simple Lens` idiom; instead, we have to use `Lens'`. Lens started supporting this alias in version 3.8, which was released in 2013. This change includes necessary submodule updates, as well as the update for the split of what4 into its own repository.	2020-03-03 13:28:26 -08:00
Joe Hendrix	46be7aa52b	Implement new registerUse analysis. The new registerUse analysis uses a three phase process: Phase 1 computes invariants about the start state of each block. It will indicate when registers/stack locations store stack offsets, and where callee saved registers are stashed. It also memoizes information about stack reads and writes to simplify later passes. Phase 2 is a demand analysis that computes which registers and stack locations must be available to execute the program. It then propagates those constraints across blocks in the function. Phase 3 combines the information into a form relevant for function recovery.	2020-02-06 19:26:46 -08:00
Samuel Breese	fb1611a127	Semantics for MULX from BMI2 and all of ADX	2019-12-19 10:43:54 -05:00
Joe Hendrix	1ed99917b4	Add testcase for non-zero index jumptable.	2019-12-04 14:31:45 -08:00
Joe Hendrix	df9b5bbe27	Support for offset jump tables.	2019-11-19 14:52:58 -08:00
Joe Hendrix	1be68af2a0	Fix warnings.	2019-10-21 21:18:54 -07:00
Joe Hendrix	81d0469fbe	Group mod/div x86 functions.	2019-10-21 14:59:43 -07:00
Joe Hendrix	744424d28b	Remove unused X86PrimLoc.	2019-09-20 15:19:37 -07:00
Joe Hendrix	5e834122d1	Segment register updates; stack offset calculation.	2019-09-20 13:58:05 -07:00
Joe Hendrix	7aee0cd803	Remove unused debug reg code.	2019-09-09 00:55:28 -07:00
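
The ordering problem described in commit 1add47389a (the `call` fix) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Machine`, `Target`, `callPushFirst`, and `callTargetFirst` are hypothetical names invented for illustration and are not part of `macaw-x86`; the model tracks just a stack pointer and a map of 8-byte memory slots. It compares resolving a stack-relative call target after the push (the old behavior the commit message describes) with resolving it before the push (the fixed behavior).

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- | Toy machine state (hypothetical; not macaw's representation):
-- a stack pointer and a sparse memory of 8-byte slots.
data Machine = Machine
  { rsp :: Word64
  , mem :: Map.Map Word64 Word64
  } deriving (Eq, Show)

-- | Read the slot at an address; unmapped slots read as 0.
readMem :: Word64 -> Machine -> Word64
readMem addr m = Map.findWithDefault 0 addr (mem m)

-- | Push a value: decrement the stack pointer by 8, then store.
push :: Word64 -> Machine -> Machine
push v m =
  let sp' = rsp m - 8
  in m { rsp = sp', mem = Map.insert sp' v (mem m) }

-- | A call target of the form [rsp + offset].
newtype Target = StackSlot Word64

-- | Resolve a target against the current machine state.
evalTarget :: Target -> Machine -> Word64
evalTarget (StackSlot off) m = readMem (rsp m + off) m

-- | Old ordering: push the return address, then resolve the target.
-- The target is read relative to the already-modified stack pointer.
callPushFirst :: Word64 -> Target -> Machine -> (Word64, Machine)
callPushFirst retAddr tgt m =
  let m' = push retAddr m
  in (evalTarget tgt m', m')

-- | Fixed ordering: resolve the target, then push the return address.
callTargetFirst :: Word64 -> Target -> Machine -> (Word64, Machine)
callTargetFirst retAddr tgt m =
  let dest = evalTarget tgt m
  in (dest, push retAddr m)
```

With `rsp = 0x1000`, the slot at `0x1000` holding `0x401000`, and a return address of `0x400005`, `callPushFirst` applied to `StackSlot 0` resolves the target to `0x400005` (the return address it just pushed), while `callTargetFirst` resolves it to `0x401000`. Both leave the same stack contents behind; only the resolved destination differs.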


242 Commits