This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:
* GHC 9.0 implements simplified subsumption (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
In most cases, adapting to this is a matter of manually eta expanding
definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
`evalExt` had to be made more specific to adapt to the loss of contravariance
when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
sooner (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
groups with `$(return [])` Template Haskell splices. The previous locations
of these splices made it so that the TH-generated instances in that package
were not available to any code before the splice, resulting in type errors
when compiled with GHC 9.0.
To overcome this, I rearranged the TH-generated instances so that they appear
before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
later, so this patch also updates thes lower bounds on `base` to `>= 4.9` in
the appropriate `.cabal` files. (I'm fairly certain that this requirement was
already present implicity, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
allow them to build with GHC 9.0. The `llvm-pretty` and
`llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
raised to allow it to build with GHC 9.0.
Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators.
This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`). This meant that they could not contain macaw (or crucible) values. Some work on. AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances.
In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic.
Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided. These new changes will enable a handler to be written for the conditional PowerPC trap instructions.
PowerPC, x86, and ARM have been updated.
Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each
discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This
shared functionality should be extracted into a macaw-testing library.
This change makes the block classifier heuristic part of the `ArchitectureInfo`
structure. This enables clients and architecture backends to customize the
block classification heuristics. This is most useful for architectures that
have complex architecture-specific block terminators that require analysis to
generate (e.g., conditional returns). It will also make macaw-refinement
simpler in the future, as the SMT-based refinement is just an additional block
classifier (but is currently implemented via a hacky side channel).
This change introduces an ancillary change, which should not be very
user-visible.
It splits the Macaw.Discovery and Macaw.Discovery.State modules to break
module import cycles in a way that enables us to expose the classifier. This
should not be user-visible, as Macaw.Discovery still exports the same
names (with one minor exception that should not appear in user code).
It also moves the definition of the `ArchBlockPrecond` type family; the few
affected places should be updated. User code should probably not be able to see
this.
None of the common default ppc32 ABIs use a Table of Contents (TOC), so default
our code to not assume it either. This has accompanying changes in
macaw-loader-ppc, which also made incorrect assumptions about ppc32.
Note that we may eventually need to support rarely-used ABIs that do use a
TOC (or similar dedicated registers, e.g., the Small Data Area mode). When we
do, we will probably want that to be a data-oriented decision rather than a
type-level one, as each architecture supports multiple ABIs. We may also need to
modify ppc64 to support ABIs without TOCs, but we'll do it when we need to.
* update to bv-sized branch of what4 and other things
* removed parameterized-utils submodule completely
* Updates submodules
* Fixes macaw-symbolic w.r.t. crucible-llvm changes
Co-authored-by: Ben Selfridge <ben@000548-benselfridge.local>
Improve the TH codegen for macaw-semmc
This change lazily translates as much as possible. It also generates somewhat more compact code. This change also finishes implementing primitives for the aarch32 backend. Complementing the aarch32 changes, the macaw-semmc interface has been modified to allow macaw-aarch32 to avoid a redundant serialize-deserialize round.
Co-authored-by: Kevin Quick <kquick@galois.com>
This code was confusing what "offset" to pass to the `failAt` function. Some
sites were passing the offset from the beginning of the block (correct), while
others passed the offset from the start of the segment (incorrect). The
incorrect values were later used as block sizes, which caused some downstream
failures (in renovate).
This commit uniformly fails with the offset from the start of the block.
This change is probably due to the BitTrie modifications in
dismantle-tablegen. It's not clear whether the older or newer number
of discovered blocks is correct; testing at this point is focused more
on getting roughly the correct order of magnitude rather than being
refined enough for high precision values.
This commit reduces duplication in the PowerPC backend. Instances are now in terms of the generic `AnyPPC` type, rather than having separate instances for 32 and 64 bit. Shuffling some type parameters also allows us to remove a large number of type equalities that e.g., fix the arch register type to `PPCReg`.
The main changes are:
- `postCallAbsState` was removed from the architecture info
- `mkInitialRegsForBlock` was renamed to `initialBlockRegs` and takes slightly
different parameters
- There is a new type family and some new functions in the architecture info
relating to post-block/terminator abstract state construction
PowerPC doesn't need any extra information to compute post-block abstract
states, so we use () as the ArchBlockPrecond type.
The goal is to support a jumptable testcase that is not supported by
the current jump bounds check. The jump bounds check needs to be
augmented so that it understands equality relationships between stack
values and registers, and bounds on both.
This patch tracks when a register points to a concrete stack offset.
As part of this, we droped the AbsDomain instance for AbsBlockState.
Clients should now likely use `fnStartAbsBlockState` in lieu of `top`.
The other client visible change is that the ClassifyFailure
constructor now has an extra argument with details about why
classification failure occured.
Update to API changes in macaw-base in macaw-ppc and macaw-arm
The "block label" abstraction (used during arch-specific disassembly) was removed some time ago in the base macaw library. This change updates macaw-ppc and macaw-arm to remove uses of block labels. The major change is that the disassembly function only returns a single block at a time instead of a sequence of blocks.
To facilitate this, the handling of the PowerPC conditional trap instruction (trap doubleword) is now an architecture-specific terminator instruction instead of encoding the logic of conditional trapping. We will now have to encode the conditional trapping logic in macaw-ppc-symbolic. Note that we have not done so yet.
This commit also updates the expected results of the PowerPC tests; the number of discovered blocks is different, but not significantly so. It is hard to tell if this is a regression or an improvement.
Verifies that the number of blocks found matches what should generally
be expected from this particular executable.
The specific value checked for is not independently verified, it just
happens to be a reasonable-looking value that the discovery process
currently identifies, and encoding it here ensures that if discovery
ever changes that the change will be seen and explicitly accepted or
fixed as needed.
We now thread a snapshot of the register state from the beginning of the
instruction evaluation through each instruction's semantics instead of
re-fetching register values each time we need it and potentially seeing
incorrect, partially modified register values.
The field it contains is supposed to be the instruction offset in its basic
block; overflowing it can cause significant problems during symbolic simulation.
There is a new metadata statement that tracks the start address of each
instruction. This is used in the translation to Crucible to provide better
error messages. The x86 backend was already updated, this commit adds the
metadata to the ARM and PowerPC backends.