symbolic: Add some documentation on pointer operations (#145)

symbolic: Add some documentation on pointer operations Their behavior is not entirely obvious, so hopefully this should be useful to someone in the future.
2024-11-26 09:22:20 +03:00 · 2020-06-13 10:27:43 -07:00 · 2020-06-13 10:27:43 -07:00 · 5ba28484f9
commit 5ba28484f9
parent 7ec8df5e92
1 changed files with 85 additions and 0 deletions
--- a/symbolic/src/Data/Macaw/Symbolic/MemOps.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/MemOps.hs
@ -215,6 +215,10 @@ setMem st mvar mem =
  st & stateTree . actFrame . gpGlobals %~ insertGlobal mvar mem
 -- | Classify the arguments, and continue.
 --
 -- This combinator takes a continuation that is provided a number of (SMT)
 -- /predicates/ that classify the inputs as bitvectors or pointers.  An
 -- 'LLVMPtr' is a bitvector if its region id (base) is zero.
 ptrOp ::
  ( (1 <= w) =>
    sym ->
@ -373,6 +377,10 @@ check sym valid name msg = assert sym valid
    errMsg = "[" ++ name ++ "] " ++ msg
 -- | Define an operation by cases.
 --
 -- NOTE that the cases defined using this combinator do not need to be complete;
 -- it adds a fallthrough case that asserts false (indicating that it should be
 -- impossible)
 cases ::
  (IsSymInterface sym) =>
  sym         {- ^ Simulator -} ->
@ -443,6 +451,29 @@ doPtrMux c = ptrOp $ \sym _ w xPtr xBits yPtr yBits x y ->
           endCase =<< muxLLVMPtr sym c x y
       ]
 -- | Implementation of addition of two 'LLVMPtr's
 --
 -- The translation uses the 'LLVMPtr' type for both bitvectors and pointers, as
 -- they are mostly indistinguishable at the machine code level (until they are
 -- actually used as a pointer).  This operation looks a bit complicated because
 -- there are four possible cases:
 --
 -- * Adding a pointer to a bitvector
 -- * Adding a bitvector to a pointer
 -- * Adding two bitvectors
 -- * Adding two pointers (not allowed)
 --
 -- Note that the underlying pointer addition primitive from crucible-llvm,
 -- 'ptrAdd', only accepts its operands in one order: pointer and then bitvector.
 -- The cases below rearrange the operands as necessary.
 --
 -- The final case, of adding two bitvectors together, is also special cased
 -- here.  NOTE that we do not do the tests at symbolic execution time: instead,
 -- we generate a formula that encodes the necessary tests (hence the 'cases'
 -- combinator).
 --
 -- NOTE that the case of adding two pointers is not explicitly addressed in the
 -- 'cases' call below; 'cases' adds a fallthrough that asserts false.
 doPtrAdd :: PtrOp sym w (LLVMPtr sym w)
 doPtrAdd = ptrOp $ \sym _ w xPtr xBits yPtr yBits x y ->
  do both_bits <- andPred sym xBits yBits
@ -458,6 +489,20 @@ doPtrAdd = ptrOp $ \sym _ w xPtr xBits yPtr yBits x y ->
       ]
     return a
 -- | Implementation of subtraction of 'LLVMPtr's
 --
 -- This case is substantially similar to 'doPtrAdd', except the operation matrix
 -- is:
 --
 -- * Subtracting a pointer from a bitvector (not allowed)
 -- * Subtracting a bitvector from a pointer
 -- * Subtracting two bitvectors
 -- * Subtracting two pointers
 --
 -- Note that subtracting two pointers is allowed if (and only if) they are
 -- pointers to the same region of memory.  This check is again encoded
 -- symbolically in the final case, as we can't know if it is true or not during
 -- simulation (without help from the SMT solver).
 doPtrSub :: PtrOp sym w (LLVMPtr sym w)
 doPtrSub = ptrOp $ \sym mem w xPtr xBits yPtr yBits x y ->
  do both_bits <- andPred sym xBits yBits
@ -490,6 +535,46 @@ isAlignMask v =
     guard (all (testBit k) ones)
     return (fromIntegral (length zeros))
 -- | Perform bitwise and on 'LLVMPtr' values
 --
 -- This is somewhat similar to 'doPtrAdd'.  This is a special case because many
 -- machine code programs use bitwise masking to align pointers.  There are two
 -- cases here:
 --
 -- * Both values are actually bitvectors (in which case we just delegate to the
 --   low-level 'bvAndBits' operation)
 -- * One of the values is an 'LLVMPtr' and the other is a literal that looks like a mask
 --
 -- If none of those cases is obviously true, this function generates assertions
 -- that both values are actually bitvectors and uses the straightforward
 -- operations (the last case in the Haskell-level case expression).
 --
 -- This operation is tricky for two reasons:
 --
 -- 1. The underlying 'LLVMPtr' type does not support a bitwise and operation
 --    (since it makes less sense at the LLVM level, where alignment is specified
 --    explicitly for each pointer and pointer operation)
 -- 2. We do not know the alignment of the pointer being masked
 --
 -- If we knew the pointer alignment, we could simply generate an assertion that
 -- the masking operation is safe and then mask the offset portion of the
 -- pointer.  However, we do not have a guarantee of that, as pointer bases are
 -- abstract in the LLVM memory model.
 --
 -- As a result, we do not know the exact effect of the masking operation on the
 -- pointer.  Unfortunately, this means we have to treat the operation very
 -- conservatively.  We determine how many bits the program is attempting to mask
 -- off and create a symbolic constant of that size and subtract it from the
 -- offset in the pointer.  This value represents the most that could possibly be
 -- removed from the value (assuming the original pointer was actually properly
 -- aligned); however, if the pointer was not actually sufficiently aligned, the
 -- amount subtracted could be less.
 --
 -- This is not ideal, as there are not many constraints we can express about
 -- this value being subtracted off.
 --
 -- FIXME: If we had the alignment of the pointer available, we could assert that
 -- the alignment is sufficient to safely just apply the mask to the offset.
 doPtrAnd :: PtrOp sym w (LLVMPtr sym w)
 doPtrAnd = ptrOp $ \sym _mem w xPtr xBits yPtr yBits x y ->
  let nw = M.addrWidthNatRepr w