mirror of
https://github.com/GaloisInc/macaw.git
synced 2024-11-29 11:02:13 +03:00
75e31f8338
Given some symbolic value, @x@, we'd like to compute its possible upper and lower bounds after it is truncated to @w@ bits. To do this, we first find the bound of x by (recursively) calling `exprRangePred`. This bound is a statement of the following form (see `RangePred` for more info): "r bits of x are bounded between @low@ and @high@". Then, we check the following: - If x has a bound (r, l, w) AND - If r is less than or equal to w => Pass-through the bound (r, l, w) Otherwise, we deem x "unbounded" Declaring x unbounded in the second case seems to throw away useful information that causes many jump tables to remain unclassified. We attempt to improve on that in this commit. Consider an example where x is bounded by (64, 0, 10) (that is, the 64 bits of x are constrained to be between 0 and 10) and we want to find the bound of truncating x to 8 bits. With the current logic, since 64 > 8, we'd declare x unbounded. However, the bound (8, 0, 10) should also be valid: if 64 bits of x are bounded to [0, 10], then surely 8 bits of x also lie between 0 and 10. If the upper bound is instead larger than the largest 8-bit value, we can truncate it to the largest value. For example, (64, 0, 10000) becomes (8, 0, 255). Instead of losing the bound completely, we're able to tighten it! |
||
---|---|---|
.. | ||
src/Data/Macaw | ||
ChangeLog.md | ||
LICENSE | ||
macaw-base.cabal | ||
README.rst |
The macaw library implements architecture-independent binary code discovery. Support for specific architectures is provided by implementing the semantics of that architecture. The library is written in terms of an abstract interface to memory, for which an ELF backend is provided (via the elf-edit_ library). The basic code discovery is based on a variant of Value Set Analysis (VSA). The most important user-facing abstractions are: * The ``Memory`` type, defined in ``Data.Macaw.Memory``, which provides an abstract interface to an address space containing both code and data. * The ``memoryForElfSegments`` function is a useful helper to produce a ``Memory`` from an ELF file. * The ``cfgFromAddrs`` function, defined in ``Data.Macaw.Discovery``, which performs code discovery on a ``Memory`` given some initial parameters (semantics to use via ``ArchitectureInfo`` and some entry points). * The ``DiscoveryInfo`` type, which is the result of ``cfgFromAddrs``; it contains a collection of ``DiscoveryFunInfo`` records, each of which represents a discovered function. Every basic block is assigned to at least one function. Architecture-specific code goes into separate libraries. X86-specific code is in the macaw-x86 repo. An abbreviated example of using macaw on an X86_64 ELF file looks like:: import qualified Data.Map as M import qualified Data.ElfEdit as E import qualified Data.Parameterized.Some as PU import qualified Data.Macaw.X86 as MX86 import qualified Data.Macaw.Memory.ElfLoader as ML import qualified Data.Macaw.Discovery as MD discoverCode :: E.Elf Word64 -> (forall ids . MD.DiscoveryInfo X86_64 ids -> a) -> a discoverCode elf k = case ML.resolveElfContents ML.defaultLoadOptions elf of Left e -> error (show e) Right (_, _, Nothing, _) -> error "Unable to determine entry point" Right (warn, mem, Just entryPoint, _) -> do mapM_ print warn case MD.cfgFromAddrs MX86.x86_64_linux_info mem M.empty [entryPoint] [] of PU.Some di -> k di In the callback ``k``, the ``DiscoveryInfo`` can be analyzed as desired. Implementing support for an architecture is more involved and requires implementing an ``ArchitectureInfo``, which is defined in ``Data.Macaw.Architecture.Info``. This structure contains architecture-specific information like: * The pointer width * A disassembler from bytes to abstract instructions * ABI information regarding registers and calling conventions * A transfer function for architecture-specific features not represented in the common IR .. _elf-edit: https://github.com/GaloisInc/elf-edit .. _flexdis86: https://github.com/GaloisInc/flexdis86