This folds the menagerie of various configuration option arguments to
`macawExtensions` into the `MemModelConfig` data type. The advantage to doing
this is that is will make it easier to extend the memory model configuration
options in the future without needlessly foisting breaking changes on all
`macaw-symbolic` users.
Unfortunately, it does require a breaking change to get to this point, but the
migration path is straightforward for existing code. I have included this
migration story in the `macaw-symbolic` changelog.
The current `macaw-symbolic` memory model has issues when scaling up to
binaries that have several megabytes or more in size. This patch introduces a
new memory model (in `Data.Macaw.Symbolic.Memory.Lazy`) that serves as a mostly
drop-in replacement for the existing memory model (which I now refer to as the
"default" memoy model). The lazy memory model scales better by incrementally
populating the SMT array backing global memory over the course of a run of the
simulator. For the full details, see `Note [Lazy memory model]`.
I performed some refactoring to share common bits between the default and lazy
memory models.
Fixes#282.
This patch introduces a `MemModelConfig` data type for configuring the finer
details of `macaw-symbolic`'s memory model. For now, there is a single option,
which configures whether the memory model should attempt to concretize pointers
during a read or write, which can sometimes be beneficial for performance
reasons. The details of how concretization work can be found in the new
`Data.Macaw.Symbolic.Concretize` module.
Subsequent commits will add more configurable knobs to `MemModelConfig`.
Fixes#323.
This extends `Data.Macaw.Symbolic.Testing` in `macaw-symbolic` to be able to
handle binaries that depend on shared libraries. This is fully functional for
the x86-64 and AArch32 symbolic backends, and I have added test cases to the
respective repos demonstrating that it works. (The PowerPC backend is not yet
supported. At a minimum, this is blocked on GaloisInc/elf-edit#35.)
To implement this, I also needed to add some additional infrastructure to
`macaw-base` (I put this infrastructure here as it doesn't depend on any
Crucible-specific functionality):
* `Data.Macaw.Memory.ElfLoader.DynamicDependencies`: a basic ELF dynamic
loader that performs a breadth-first search over all `DT_NEEDED` entries
that an ELF binary depends on (both directly and indirectly).
* `Data.Macaw.Memory.ElfLoader.PLTStubs`: a collection of heuristics for
detecting the addresses of PLT stubs in a dynamically linked binary.
It is worth noting that shared libraries are rife with nuance and subtlety,
and the way `macaw` models shared libraries is not 100% accurate. I have
written a length `Note [Shared libraries]` in `Data.Macaw.Symbolic.Testing`
to describe where corners had to be cut.
Fixes#318.
This:
* Bumps the `elf-edit` submodule to bring in the changes from
GaloisInc/elf-edit#34.
* Updates `Data.Macaw.Memory.ElfLoader` to consolidate the symbol table logic
with the corresponding functions from `elf-edit`.
Fixes#277.
This is needed to bring in the changes from #77, which adds support for
`hashable-1.4.*`. With this change, everything in the `macaw` repo now
builds with `hashable-1.4.*`.
This requires bumping the `asl-translator` submodule to bring the changes from
GaloisInc/asl-translator#47, which are necessary for the test case to work.
When populating `COPY` relocations, it is helpful to know the address of the
relocation so that it can be related back to the name of the global symbol
whose value it is copying. Unfortunately, the type of `populateRelocation` does
not make it straightforward to compute this address. This patch includes three
additional arguments to `populateRelocation` (the relocation's `Memory`, its
`MemSegment`, and its `MemAddr`) to more easily facilitate computing the
address.
This is a breaking API change, albet it is a fairly straightforward change to
adapt to for most consumers.
This is related to #47, although this is not a full fix for the issue.
We already have support for `R_X86_64_COPY` relocations, so adding support
for `R_ARM_COPY` on the AArch32 side is straightforward.
This is related to #47, although this is not a full fix for the issue.
This patch was motivated by the need to call `doGetGlobal` from a Crucible
override, where the `SimState` is instantiated with `OverrideLang` rather than
`CrucibleLang`, the latter of which is used in the `CrucibleState` type
synonym. While I was in town, I generalized the types of other operations in
`Data.Macaw.Symbolic.MemOps` where it was reasonable.
This bumps the `crucible` submodule to bring in the changes from
GaloisInc/crucible#998, which adds `?memOpts :: MemOptions` constraints to
a handful of additional functions. This requires adding constraints to
some functions in `macaw-symbolic` to accommodate, as well as bumping the
`semmc` submodule to bring in analogous changes from GaloisInc/semmc#76.
It turns out that we have to be more conservative with tail call identification,
as incorrectly identifying a block as the target of a tail call (instead of a
branch) can cause other branch classifiers to fail if that block is the target
of another jump.
Ultimately, we will need to give up some tail call recognition (since they are
in general indistinguishable from jumps), and instead only identify known call
targets as tail call candidates.
With additional global analysis we could do better.
Fixes#294
The bug arose in the handling of `StackOffsetAbsVal`, which track an abstraction
of references relative to the stack pointer. The offsets in `StackOffsetAbsVal`
are `Int64`; they are signed because references are both above and below the
stack pointer. The code constructing new values of this type was incorrectly
zero-extending new offsets instead of sign extending them. This did not matter
on 64 bit architectures, as it happened to result in the same values. It
substantially corrupted the abstract stack on PowerPC 32. It did not seem to
affect AArch32, but that is likely just due to luck in compiler code generation
that does not require this level of precision in the abstract stack.
The resulting errors manifest in the `absEvalCall` function. Because of the lack
of sign extension in `StackOffsetAbsVal`s, it made the current stack pointer
look like a huge number, which caused *all* stack entries to be dropped after
function calls.
This fix simplifies the stack offset abstract value computation substantially
and ensures that signs are extended correctly. The commit adds a PowerPC32 test
case that only passes with this fix.
GHC 9.2 adds `-Wincomplete-uni-patterns` to `-Wall`, which uncovers a slew of
previously unnoticed warnings in `macaw`. This patch fixes them, mostly by
adding explicit fall-through cases.
GHC 9.2 adds `-Wnoncanonical-monad-instances` to `-Wall`, which warns whenever
one has explicit implementations of `return` or `(>>)` that aren't simply
`return = pure` or `(>>) = (*>)`. Since these are the default
implementations of `return` and `(>>)` since `base-4.11`, the simplest
way to fix the warnings is to simply remove all explicit definitions of
`return` and `(>>)` and rely on the defaults, which this patch accomplishes.
In `base-4.16.*`, `Nat` is now a type synonym for `Natural`, and `GHC.TypeLits`
now re-exports `Natural`. This causes a `-Wunused-imports` warning in
`macaw-base` as a consequence. I fixed the warning by tightening up the imports
slightly.
The core of macaw cannot represent conditional calls because the existing block terminators are not sufficiently expressive and it doesn't support creating synthetic blocks to represent control flow not directly tied to machine addresses.
To work around this, we introduce ARM-specific block terminators for conditional calls and plumb them through up to macaw-aarch32-symbolic.
Fixes#288
The tail call classifier came after the jump classifier, which was a problem because it is less strict than the tail call classifier, meaning it would always fire. This commit moves direct jump to be the last classifier applied, giving the others a chance.
Includes a test case in the ARM backend.
This requires some updates to some of the expected test results, as a few blocks are now classified as tail calls that were
plain jumps before. They really could be considered either. I think it would be nice if these could be classified as jumps instead, but the reason they are flagged as tail calls is mostly down to the fact that their surrounding context is so simple that either interpretation works.
Correcting this would require some heuristics based on additional analysis passes.
The test harness for macaw symbolic required a few changes because the new detection of some jumps as tail calls introduces new calls into the symbolic test suites. However, the symbolic testing harness did not support calls before. Adding support required a bit of plumbing, including a more extensive code discovery pass.
Fixes#285
When a user overrides a system call on an architecture that supports returning two values from a system call and they provide a context containing the result of the system call in the form
```
empty :> v0 :> v1
```
macaw will perform the register assignment
```
r0 := v1
r1 := v0
```
This change reverses this behavior so that the assignment becomes
```
r0 := v0
r1 := v1
```
This brings the expected ordering of the result context in agreement
with the left-to-right ordering of the argument context:
```
empty :> arg1 :> arg2 :> ...
```
* remove/generalize MacawBlockEnd from CFG slicing
* expose functions in symbolic backend
* hide bvLit from Backend import
* add CI version to workflow
This bump the `elf-edit` submodule to bring in the changes from
https://github.com/GaloisInc/elf-edit/pull/29, which adds an additional
`VersionDefMap` argument to `elf-edit` to make it aware of version definitions.
This requires some changes to the API in `Data.Macaw.Memory.ElfLoader` to
accommodate.
This would come in handy for an application where I wish to pass a
`NonEmptyVector` to `newMergedGlobalMemoryWith`. Currently, I have to convert
the `NonEmptyVector` to a `NonEmpty` list to accomplish this, wish seems
wasteful given that `newMergedGlobalMemoryWith` only needs to use the
`Foldable` interface.
The return address gets masked and has the low-bit set in an obtuse way due to
the semantics. This threw off the call detection.
This change matches against the quirky pattern.