macaw/symbolic/macaw-symbolic.cabal

62 lines
1.3 KiB
Plaintext
Raw Normal View History

name: macaw-symbolic
version: 0.0.1
author: Galois, Inc.
maintainer: jhendrix@galois.com
build-type: Simple
2019-02-27 22:52:18 +03:00
cabal-version: >= 1.10
2017-09-28 01:59:06 +03:00
license: BSD3
license-file: LICENSE
library
build-depends:
base >= 4.13,
bv-sized >= 1.0.0,
containers,
Add a testing framework for macaw-symbolic (#184) The new test suites cover x86_64, PowerPC, and ARM. They test that the semantics are actually correct (rather than just seeing if symbolic execution produces any result). The `Data.Macaw.Symbolic.Testing` module in macaw-symbolic provides some common utilities for symbolic execution engine setup, while there are tailored test harnesses for each architecture. The semantics of the test harnesses are documented in each architecture test suite, but they: 1. Discover all of the test binaries (which are generated from the included makefiles) 2. Treat each function whose name begins with `test_` as a test entry point 3. Symbolically executes each test case with fully symbolic register states 4. Extracts the return value after symbolic execution, which is treated as the predicate to an assertion that must be proved - If the test case is in the `pass` subdirectory, it is proved and expected to hold - If the test case is in the `fail` subdirectory, it is proved and expected to not hold. Each test harness supports two options for debugging: - Dumping generated SMT queries - Dumping generated Macaw IR for inspection This testing uncovered a bug in the (previously untested) macaw-aarch32-symbolic code. It required a number of submodule updates to: - Adapt to some what4 changes - Fix a bug in the LLVM memory model that lets these tests pass - Adapt to changes to some crucible APIs This change also modifies the CI configuration to install SMT solvers earlier (which are now needed for all of the symbolic package tests).
2021-03-01 20:21:44 +03:00
elf-edit,
filepath,
2019-01-11 05:10:30 +03:00
IntervalMap >= 0.6 && < 0.7,
2018-06-06 21:48:45 +03:00
crucible >= 0.4,
crucible-llvm,
indexed-traversable,
lens,
macaw-base,
mtl,
parameterized-utils,
prettyprinter >= 1.7.0,
split,
text,
2019-01-11 05:10:30 +03:00
vector,
bytestring,
what4 >= 1.1 && < 1.7
hs-source-dirs: src
exposed-modules:
2019-01-11 05:10:30 +03:00
Data.Macaw.Symbolic
Data.Macaw.Symbolic.Backend
Data.Macaw.Symbolic.Concretize
2019-01-11 05:10:30 +03:00
Data.Macaw.Symbolic.Memory
Data.Macaw.Symbolic.Memory.Lazy
2020-09-02 03:07:04 +03:00
Data.Macaw.Symbolic.MemOps
Add a testing framework for macaw-symbolic (#184) The new test suites cover x86_64, PowerPC, and ARM. They test that the semantics are actually correct (rather than just seeing if symbolic execution produces any result). The `Data.Macaw.Symbolic.Testing` module in macaw-symbolic provides some common utilities for symbolic execution engine setup, while there are tailored test harnesses for each architecture. The semantics of the test harnesses are documented in each architecture test suite, but they: 1. Discover all of the test binaries (which are generated from the included makefiles) 2. Treat each function whose name begins with `test_` as a test entry point 3. Symbolically executes each test case with fully symbolic register states 4. Extracts the return value after symbolic execution, which is treated as the predicate to an assertion that must be proved - If the test case is in the `pass` subdirectory, it is proved and expected to hold - If the test case is in the `fail` subdirectory, it is proved and expected to not hold. Each test harness supports two options for debugging: - Dumping generated SMT queries - Dumping generated Macaw IR for inspection This testing uncovered a bug in the (previously untested) macaw-aarch32-symbolic code. It required a number of submodule updates to: - Adapt to some what4 changes - Fix a bug in the LLVM memory model that lets these tests pass - Adapt to changes to some crucible APIs This change also modifies the CI configuration to install SMT solvers earlier (which are now needed for all of the symbolic package tests).
2021-03-01 20:21:44 +03:00
Data.Macaw.Symbolic.Testing
2019-01-11 05:10:30 +03:00
other-modules:
This commit re-implements the memory model used by macaw symbolic There are two major changes: - The interface to memory models in Data.Macaw.Symbolic has changed - The suggested implementation in Data.Macaw.Symbolic.Memory has changed The change improves performance and fixes a soundness bug. * `macawExtensions` (Data.Macaw.Symbolic) takes a new argument: a `MkGlobalPointerValidityPred`. Use `mkGlobalPointerValidityPred` to provide one. * `mapRegionPointers` no longer takes a default pointer argument (delete it at call sites) * `GlobalMap` returns an `LLVMPtr sym w` instead of a `Maybe (LLVMPtr sym w)` Users of the suggested memory model do not need to worry about the last change, as it has been migrated. If you provided your own address mapping function, it must now be total. This is annoying, but the old API was unsound because macaw-symbolic did not have enough information to correctly handle the `Nothing` case. The idea of the change is that the mapping function should translate any concrete pointers as appropriate, while symbolic pointers should generate a mux over all possible allocations. Unfortunately, macaw-symbolic does not have enough information to generate the mux itself, as there may be allocations created externally. This interface and implementation is concerned with handling pointers to static memory in a binary. These are distinguished from pointers to dynamically-allocated or stack memory because many machine code instructions compute bitvectors and treat them as pointers. In the LLVM memory model used by macaw-symbolic, each memory allocation has a block identifier (a natural number). The stack and each heap allocation get unique block identifiers. However, pointers to static memory have no block identifier and must be mapped to a block in order to fit into the LLVM memory model. The previous implementation assigned each mapped memory segment in a binary to its own LLVM memory allocation. This had the nice property of implicitly proving that no memory access was touching unmapped memory. Unfortunately, it was especially inefficient in the presence of symbolic reads and writes, as it generated mux trees over all possible allocations and significantly slowed symbolic execution. The new memory model implementation (in Data.Macaw.Symbolic.Memory) instead uses a single allocation for all static allocations. This pushes more of the logic for resolving reads and writes into the SMT solver and the theory of arrays. In cases where sufficient constraints exist in path conditions, this means that we can support symbolic reads and writes. Additionally, since we have only a single SMT array backing all allocations, mapping bitvectors to LLVM pointers in the memory model is trivial: we just change their block identifier from zero (denoting a bitvector) to the block identifier of the unique allocation backing static data. This change has to do some extra work to ensure safety (especially that unmapped memory is never written to or read from). This is handled with the MkGlobalPointerValidityPred interface in Data.Macaw.Symbolic. This function, which is passed to the macaw-symbolic initialization, constructs well-formedness predicates for all pointers used to access memory. Symbolic execution tasks that do not need to enforce this property can simply provide a function that never returns any predicates to check. Implementations that want a reasonable default can use the mkGlobalPointerValidityPred from Data.Macaw.Symbolic.Memory. The default implementation ensures that no reads or writes touch unmapped memory and that writes to static data never write to read-only segments. This change also converts the examples in macaw-symbolic haddocks to use doctest to ensure that they do not get out of date. These are checked as part of CI.
2019-12-08 08:03:28 +03:00
Data.Macaw.Symbolic.Bitcast
2018-01-03 04:31:42 +03:00
Data.Macaw.Symbolic.CrucGen
Data.Macaw.Symbolic.Memory.Common
Data.Macaw.Symbolic.PersistentState
ghc-options: -Wall -Wcompat
ghc-prof-options: -O2 -fprof-auto-top
2019-02-27 22:52:18 +03:00
default-language: Haskell2010
default-extensions: NoStarIsType
This commit re-implements the memory model used by macaw symbolic There are two major changes: - The interface to memory models in Data.Macaw.Symbolic has changed - The suggested implementation in Data.Macaw.Symbolic.Memory has changed The change improves performance and fixes a soundness bug. * `macawExtensions` (Data.Macaw.Symbolic) takes a new argument: a `MkGlobalPointerValidityPred`. Use `mkGlobalPointerValidityPred` to provide one. * `mapRegionPointers` no longer takes a default pointer argument (delete it at call sites) * `GlobalMap` returns an `LLVMPtr sym w` instead of a `Maybe (LLVMPtr sym w)` Users of the suggested memory model do not need to worry about the last change, as it has been migrated. If you provided your own address mapping function, it must now be total. This is annoying, but the old API was unsound because macaw-symbolic did not have enough information to correctly handle the `Nothing` case. The idea of the change is that the mapping function should translate any concrete pointers as appropriate, while symbolic pointers should generate a mux over all possible allocations. Unfortunately, macaw-symbolic does not have enough information to generate the mux itself, as there may be allocations created externally. This interface and implementation is concerned with handling pointers to static memory in a binary. These are distinguished from pointers to dynamically-allocated or stack memory because many machine code instructions compute bitvectors and treat them as pointers. In the LLVM memory model used by macaw-symbolic, each memory allocation has a block identifier (a natural number). The stack and each heap allocation get unique block identifiers. However, pointers to static memory have no block identifier and must be mapped to a block in order to fit into the LLVM memory model. The previous implementation assigned each mapped memory segment in a binary to its own LLVM memory allocation. This had the nice property of implicitly proving that no memory access was touching unmapped memory. Unfortunately, it was especially inefficient in the presence of symbolic reads and writes, as it generated mux trees over all possible allocations and significantly slowed symbolic execution. The new memory model implementation (in Data.Macaw.Symbolic.Memory) instead uses a single allocation for all static allocations. This pushes more of the logic for resolving reads and writes into the SMT solver and the theory of arrays. In cases where sufficient constraints exist in path conditions, this means that we can support symbolic reads and writes. Additionally, since we have only a single SMT array backing all allocations, mapping bitvectors to LLVM pointers in the memory model is trivial: we just change their block identifier from zero (denoting a bitvector) to the block identifier of the unique allocation backing static data. This change has to do some extra work to ensure safety (especially that unmapped memory is never written to or read from). This is handled with the MkGlobalPointerValidityPred interface in Data.Macaw.Symbolic. This function, which is passed to the macaw-symbolic initialization, constructs well-formedness predicates for all pointers used to access memory. Symbolic execution tasks that do not need to enforce this property can simply provide a function that never returns any predicates to check. Implementations that want a reasonable default can use the mkGlobalPointerValidityPred from Data.Macaw.Symbolic.Memory. The default implementation ensures that no reads or writes touch unmapped memory and that writes to static data never write to read-only segments. This change also converts the examples in macaw-symbolic haddocks to use doctest to ensure that they do not get out of date. These are checked as part of CI.
2019-12-08 08:03:28 +03:00
test-suite doctests
type: exitcode-stdio-1.0
default-language: Haskell2010
hs-source-dirs: test
main-is: doctest.hs
ghc-options: -Wall -Wcompat -threaded
build-depends: base, macaw-base, macaw-symbolic, doctest >= 0.10 && < 0.23
Support building with GHC 9.0 This contains a variety of fixes needed to make the packages in the `macaw` repo compile with GHC 9.0: * GHC 9.0 implements simplified subsumption (see [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)). In most cases, adapting to this is a matter of manually eta expanding definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of `evalExt` had to be made more specific to adapt to the loss of contravariance when typechecking `(->)`. * GHC's constraint solver now solves constraints in each top-level group sooner (see [here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)). This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level groups with `$(return [])` Template Haskell splices. The previous locations of these splices made it so that the TH-generated instances in that package were not available to any code before the splice, resulting in type errors when compiled with GHC 9.0. To overcome this, I rearranged the TH-generated instances so that they appear before the top-level groups that make use of them. * GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or later, so this patch also updates thes lower bounds on `base` to `>= 4.9` in the appropriate `.cabal` files. (I'm fairly certain that this requirement was already present implicity, but better to be explicit about it.) * The `asl-translator`, `crucible`, and `semmc` submodules were updated to allow them to build with GHC 9.0. The `llvm-pretty` and `llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated changes in `crucible` that were brought in. * The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were raised to allow it to build with GHC 9.0.
2021-06-24 01:15:00 +03:00
This commit re-implements the memory model used by macaw symbolic There are two major changes: - The interface to memory models in Data.Macaw.Symbolic has changed - The suggested implementation in Data.Macaw.Symbolic.Memory has changed The change improves performance and fixes a soundness bug. * `macawExtensions` (Data.Macaw.Symbolic) takes a new argument: a `MkGlobalPointerValidityPred`. Use `mkGlobalPointerValidityPred` to provide one. * `mapRegionPointers` no longer takes a default pointer argument (delete it at call sites) * `GlobalMap` returns an `LLVMPtr sym w` instead of a `Maybe (LLVMPtr sym w)` Users of the suggested memory model do not need to worry about the last change, as it has been migrated. If you provided your own address mapping function, it must now be total. This is annoying, but the old API was unsound because macaw-symbolic did not have enough information to correctly handle the `Nothing` case. The idea of the change is that the mapping function should translate any concrete pointers as appropriate, while symbolic pointers should generate a mux over all possible allocations. Unfortunately, macaw-symbolic does not have enough information to generate the mux itself, as there may be allocations created externally. This interface and implementation is concerned with handling pointers to static memory in a binary. These are distinguished from pointers to dynamically-allocated or stack memory because many machine code instructions compute bitvectors and treat them as pointers. In the LLVM memory model used by macaw-symbolic, each memory allocation has a block identifier (a natural number). The stack and each heap allocation get unique block identifiers. However, pointers to static memory have no block identifier and must be mapped to a block in order to fit into the LLVM memory model. The previous implementation assigned each mapped memory segment in a binary to its own LLVM memory allocation. This had the nice property of implicitly proving that no memory access was touching unmapped memory. Unfortunately, it was especially inefficient in the presence of symbolic reads and writes, as it generated mux trees over all possible allocations and significantly slowed symbolic execution. The new memory model implementation (in Data.Macaw.Symbolic.Memory) instead uses a single allocation for all static allocations. This pushes more of the logic for resolving reads and writes into the SMT solver and the theory of arrays. In cases where sufficient constraints exist in path conditions, this means that we can support symbolic reads and writes. Additionally, since we have only a single SMT array backing all allocations, mapping bitvectors to LLVM pointers in the memory model is trivial: we just change their block identifier from zero (denoting a bitvector) to the block identifier of the unique allocation backing static data. This change has to do some extra work to ensure safety (especially that unmapped memory is never written to or read from). This is handled with the MkGlobalPointerValidityPred interface in Data.Macaw.Symbolic. This function, which is passed to the macaw-symbolic initialization, constructs well-formedness predicates for all pointers used to access memory. Symbolic execution tasks that do not need to enforce this property can simply provide a function that never returns any predicates to check. Implementations that want a reasonable default can use the mkGlobalPointerValidityPred from Data.Macaw.Symbolic.Memory. The default implementation ensures that no reads or writes touch unmapped memory and that writes to static data never write to read-only segments. This change also converts the examples in macaw-symbolic haddocks to use doctest to ensure that they do not get out of date. These are checked as part of CI.
2019-12-08 08:03:28 +03:00