This patch contains a handful of tweaks needed to make the libraries in the
`macaw` repo build with GHC 9.6:
* GHC 9.6 bundles `mtl-2.3.*`, which no longer re-exports `Control.Monad`,
`Control.Monad.Trans`, and similar modules from `mtl`-related modules. To
accommodate this, various imports have been made more explicit.
* I have disambiguated a use of `Data.Parameterized.NatRepr.withKnownNat` in
`macaw-aarch32` to avoid clashing with a newly exported function of the same
name in `GHC.TypeNats`.
* I have bumped various upper version bounds on `doctest`,
`optparse-applicative`, and `what4` to allow building these libraries with
GHC 9.6.
* I have bumped the following submodules to bring in GHC 9.6–related changes:
* `asl-translator`: GaloisInc/asl-translator#53
* `crucible`: GaloisInc/crucible#1102
* `dwarf`: GaloisInc/dwarf#6
* `elf-edit`: GaloisInc/elf-edit#38
* `flexdis86`: GaloisInc/flexdis86#54
* `grift`: GaloisInc/grift#9
* `llvm-pretty`: elliottt/llvm-pretty#112
* `llvm-pretty-bc-parser`: GaloisInc/llvm-pretty-bc-parser#225
* `semmc`: GaloisInc/semmc#80
* `what4`: GaloisInc/what4#235
This is cargo-culted from the `*.Panic` module of a similar name in
`macaw-aarch32`. This will be useful in a subsequent commit in which we replace
some unreachable calls to `fail` with `panic`.
This extends `Data.Macaw.Symbolic.Testing` in `macaw-symbolic` to be able to
handle binaries that depend on shared libraries. This is fully functional for
the x86-64 and AArch32 symbolic backends, and I have added test cases to the
respective repos demonstrating that it works. (The PowerPC backend is not yet
supported. At a minimum, this is blocked on GaloisInc/elf-edit#35.)
To implement this, I also needed to add some additional infrastructure to
`macaw-base` (I put this infrastructure here as it doesn't depend on any
Crucible-specific functionality):
* `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: a basic ELF dynamic
loader that performs a breadth-first search over all `DT_NEEDED` entries
that an ELF binary depends on (both directly and indirectly).
* `Data.Macaw.Memory.ElfLoader.PLTStubs`: a collection of heuristics for
detecting the addresses of PLT stubs in a dynamically linked binary.
It is worth noting that shared libraries are rife with nuance and subtlety,
and the way `macaw` models shared libraries is not 100% accurate. I have
written a length `Note [Shared libraries]` in `Data.Macaw.Symbolic.Testing`
to describe where corners had to be cut.
Fixes#318.
This:
* Bumps the `elf-edit` submodule to bring in the changes from
GaloisInc/elf-edit#34.
* Updates `Data.Macaw.Memory.ElfLoader` to consolidate the symbol table logic
with the corresponding functions from `elf-edit`.
Fixes#277.
We already have support for `R_X86_64_COPY` relocations, so adding support
for `R_ARM_COPY` on the AArch32 side is straightforward.
This is related to #47, although this is not a full fix for the issue.
It turns out that we have to be more conservative with tail call identification,
as incorrectly identifying a block as the target of a tail call (instead of a
branch) can cause other branch classifiers to fail if that block is the target
of another jump.
Ultimately, we will need to give up some tail call recognition (since they are
in general indistinguishable from jumps), and instead only identify known call
targets as tail call candidates.
With additional global analysis we could do better.
Fixes#294
The bug arose in the handling of `StackOffsetAbsVal`, which track an abstraction
of references relative to the stack pointer. The offsets in `StackOffsetAbsVal`
are `Int64`; they are signed because references are both above and below the
stack pointer. The code constructing new values of this type was incorrectly
zero-extending new offsets instead of sign extending them. This did not matter
on 64 bit architectures, as it happened to result in the same values. It
substantially corrupted the abstract stack on PowerPC 32. It did not seem to
affect AArch32, but that is likely just due to luck in compiler code generation
that does not require this level of precision in the abstract stack.
The resulting errors manifest in the `absEvalCall` function. Because of the lack
of sign extension in `StackOffsetAbsVal`s, it made the current stack pointer
look like a huge number, which caused *all* stack entries to be dropped after
function calls.
This fix simplifies the stack offset abstract value computation substantially
and ensures that signs are extended correctly. The commit adds a PowerPC32 test
case that only passes with this fix.
GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of
previously unnoticed warnings in `macaw`. This patch fixes them, mostly by
adding explicit fall-through cases.
In `base-4.16.*`, `Nat` is now a type synonym for `Natural`, and `GHC.TypeLits`
now re-exports `Natural`. This causes a `-Wunused-imports` warning in
`macaw-base` as a consequence. I fixed the warning by tightening up the imports
slightly.
The tail call classifier came after the jump classifier, which was a problem because it is less strict than the tail call classifier, meaning it would always fire. This commit moves direct jump to be the last classifier applied, giving the others a chance.
Includes a test case in the ARM backend.
This requires some updates to some of the expected test results, as a few blocks are now classified as tail calls that were
plain jumps before. They really could be considered either. I think it would be nice if these could be classified as jumps instead, but the reason they are flagged as tail calls is mostly down to the fact that their surrounding context is so simple that either interpretation works.
Correcting this would require some heuristics based on additional analysis passes.
The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before. Adding support required a bit of plumbing, including a more extensive code discovery pass.
Fixes#285
This bump the `elf-edit` submodule to bring in the changes from
https://github.com/GaloisInc/elf-edit/pull/29, which adds an additional
`VersionDefMap` argument to `elf-edit` to make it aware of version definitions.
This requires some changes to the API in `Data.Macaw.Memory.ElfLoader` to
accommodate.
This contains a variety of fixes needed to make the packages in the `macaw`
repo compile with GHC 9.0:
* GHC 9.0 implements simplified subsumption (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#simplified-subsumption)).
In most cases, adapting to this is a matter of manually eta expanding
definitions, such as in `base:Data.Macaw.Analysis.RegisterUse`. In the case
of `macaw-x86-symbolic:Data.Macaw.X86.Crucible`, the type signature of
`evalExt` had to be made more specific to adapt to the loss of contravariance
when typechecking `(->)`.
* GHC's constraint solver now solves constraints in each top-level group
sooner (see
[here](https://gitlab.haskell.org/ghc/ghc/-/wikis/migration/9.0?version_id=5fcd0a50e0872efb3c38a32db140506da8310d87#the-order-of-th-splices-is-more-important)).
This affects `macaw-aarch32` and `macaw-symbolic`, as they separate top-level
groups with `$(return [])` Template Haskell splices. The previous locations
of these splices made it so that the TH-generated instances in that package
were not available to any code before the splice, resulting in type errors
when compiled with GHC 9.0.
To overcome this, I rearranged the TH-generated instances so that they appear
before the top-level groups that make use of them.
* GHC 9.0 now enables `-Wstar-is-type` in `-Wall`, so this patch replaces some
uses of `*` with `Data.Kind.Type`. `Data.Kind` requires the use of GHC 8.0 or
later, so this patch also updates thes lower bounds on `base` to `>= 4.9` in
the appropriate `.cabal` files. (I'm fairly certain that this requirement was
already present implicity, but better to be explicit about it.)
* The `asl-translator`, `crucible`, and `semmc` submodules were updated to
allow them to build with GHC 9.0. The `llvm-pretty` and
`llvm-pretty-bc-parser` submodules were also bumped to accommodate unrelated
changes in `crucible` that were brought in.
* The upper version bounds on `doctest` in `macaw-symbolic`'s test suite were
raised to allow it to build with GHC 9.0.
Adds support in macaw-aarch32 for conditional returns. These are not supported in core macaw, and are thus architecture-specific block terminators.
This required changes to the type of arch-specific block terminators. Before, `ArchTermStmt` was only parameterized by a state thread (`ids`). This meant that they could not contain macaw (or crucible) values. Some work on. AArch32 requires being able to store condition values in arch terminators (to support conditional returns). This change modifies the `ArchTermStmt` to enable this, which requires a bit of plumbing through various definitions and some extra instances.
In support of actually using this, it also became necessary to plumb fallthrough block labels through the architecture-specific terminator translation in macaw-symbolic.
Note that this change was overdue, as the PowerPC backend was storing macaw values in a way that would have rendered them unusable in the macaw-ppc-symbolic translation, had any interpretation been provided. These new changes will enable a handler to be written for the conditional PowerPC trap instructions.
PowerPC, x86, and ARM have been updated.
Improves the macaw-aarch32 tests. There is now a command line option to save the generated macaw IR for each
discovered function to /tmp. Note that this reuses some infrastructure from the macaw-symbolic tests. This
shared functionality should be extracted into a macaw-testing library.
This change makes the block classifier heuristic part of the `ArchitectureInfo`
structure. This enables clients and architecture backends to customize the
block classification heuristics. This is most useful for architectures that
have complex architecture-specific block terminators that require analysis to
generate (e.g., conditional returns). It will also make macaw-refinement
simpler in the future, as the SMT-based refinement is just an additional block
classifier (but is currently implemented via a hacky side channel).
This change introduces an ancillary change, which should not be very
user-visible.
It splits the Macaw.Discovery and Macaw.Discovery.State modules to break
module import cycles in a way that enables us to expose the classifier. This
should not be user-visible, as Macaw.Discovery still exports the same
names (with one minor exception that should not appear in user code).
It also moves the definition of the `ArchBlockPrecond` type family; the few
affected places should be updated. User code should probably not be able to see
this.
It is possible for later `ArchState` updates to reference the `AssignId` of
the `AssignStmt` that is dropped as a part of `stripPLTRead`, so make sure to
prune such updates from the `ArchState` to avoid referencing the now
out-of-scope `AssignId`.
This refactors demand analysis so we export assignment inferred values
and also no longer compute demands for assigments when inference
infers a value for them.
This exposes more information but requires changes to consumers of
information.