macaw/macaw-aarch32/REAMDE.md
Tristan Ravitch 37861df8c7
Support for mixed ARM/Thumb binaries (#174)
aarch32: Support mixed ARM/Thumb1 binaries

This updates the aarch32 backend to decode Thumb instructions and generate the Thumb semantics. The major implementation change is to use the `ArchBlockPrecond` feature of macaw to track the Thumb state (`PSTATE_T`) across block boundaries.

The ARM code discovery decides whether or not a function entry point should be decoded as Thumb by examining the low bit of the function address. If the low bit is set, it is a Thumb entry point. This has the slightly odd effect of causing macaw to say that the function is at the address with the low bit set, which is not technically true. This is documented in the README, but not obvious on inspection. Most use cases should not care, and can in any case account for it. In the future, it should be possible to fix this (though it will require some changes to the core of macaw).
2020-11-02 12:48:01 -08:00


Overview

This package provides support in macaw for the 32-bit ARM architecture (both the ARM and Thumb encodings). The semantics are derived from the official ARM semantics (encoded in ASL and processed by the asl-translator Haskell package).

Differences from other architecture backends

This backend relies on extensive additional simplification rules (see `Data.Macaw.ARM.Simplify`) to reduce some redundant syntactic constructs to constants. This simplification infrastructure is provided by macaw-semmc rather than macaw-base, because the simplification rules in macaw-base do not have the correct form to support the transformations that we need.

Limitations

  • Currently, this package does not support vector instructions. The semantics are available, but they are disabled by default because the complex vector instruction semantics increase compile times. The set of disabled instructions can be changed on a per-instruction basis via the `isUninterpretedOpcode` predicate in `Data.Macaw.ARM.Arch`.

Quirks

  • Thumb functions discovered by macaw have the low bit of their address set. This is an artifact of the Thumb calling convention, which indicates mode switches by setting the low bit of the jump address. The address reported for the recovered function is therefore not quite correct: the code actually starts at the address with the low bit cleared, and disassembling from the address with the low bit set would yield the wrong results. This is not very important for most uses of macaw, but should be kept in mind when processing discovered Thumb functions. Fixing this is possible but potentially fairly invasive.
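
Client code that needs the real instruction address of a discovered function can account for this quirk by inspecting and then clearing the low bit. The following is a minimal sketch of that idea, not part of macaw: it models addresses as plain `Word32` values, and the names `DecodeMode`, `entryMode`, `instructionAddr`, and `normalizeEntry` are hypothetical helpers introduced only for illustration.

```haskell
import Data.Bits (clearBit, testBit)
import Data.Word (Word32)

-- | Instruction set used to decode a function.
-- Hypothetical type for this sketch; it is not a macaw type.
data DecodeMode = ARMMode | ThumbMode
  deriving (Eq, Show)

-- | Bit 0 of the reported entry address selects the decode mode:
-- set means Thumb, clear means ARM.
entryMode :: Word32 -> DecodeMode
entryMode addr
  | testBit addr 0 = ThumbMode
  | otherwise      = ARMMode

-- | Address of the first instruction: the reported address with bit 0
-- cleared. For ARM entry points the bit is already clear, so this is a no-op.
instructionAddr :: Word32 -> Word32
instructionAddr addr = clearBit addr 0

-- | Split a reported entry address into its decode mode and the address
-- at which disassembly should actually start.
normalizeEntry :: Word32 -> (DecodeMode, Word32)
normalizeEntry addr = (entryMode addr, instructionAddr addr)
```

For example, `normalizeEntry 0x10435` evaluates to `ThumbMode` paired with the address `0x10434`, while `normalizeEntry 0x10430` evaluates to `ARMMode` paired with the unchanged address `0x10430`.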