1
1
mirror of https://github.com/github/semantic.git synced 2025-01-03 04:51:57 +03:00

Merge branch 'master' into ses-parallelism

# Conflicts:
#	semantic-diff.cabal
#	src/Patch.hs
This commit is contained in:
Rob Rix 2016-07-12 09:50:02 -04:00
commit a0a86a0d01
35 changed files with 654 additions and 182 deletions

View File

@ -12,4 +12,6 @@ error "Avoid return" =
where note = "return is obsolete as of GHC 7.10"
error "use pure" = free . Pure ==> pure
error "use wrap" = free . Free ==> wrap
error "use extract" = headF . runCofree ==> extract

View File

@ -1,17 +1,47 @@
# Semantic diff roadmap
# Roadmap
## Q1 2016
## Things we are currently doing:
1. [Staff ship & limited beta of semantic diffing on .com](https://github.com/github/semantic-diff/milestones/Staff%20Ship). This will be an opt-in, limited release of semantic diffs for a very small set of languages. UI in general will be unchanged; well simply start showing better diffs for the languages in question. The goal is to ease ourselves into deployment of the system, and benchmark under real loads.
1. [Diff summaries][] for C & JavaScript. Q3 2016 or so.
2. [Semantic diffing on .com](https://github.com/github/semantic-diff/milestones/Dot%20Calm). General release of semantic diffs for the supported languages.
- Modelling the abstract semantics of the supported languages. Good summaries require us to know what different parts of the syntax represent.
- Performance/responsiveness. We need to be able to produce diffs more quickly, and without unicorns. Some of this will involve front-end work (e.g. requesting summaries out-of-band).
2. [Semantic diffs][] on .com for C & JavaScript. Q4 2016 or so.
## Q2Q4 2016
- Performance, as above.
- Resilience. A fault in `semantic-git-diff` should not break anything else.
- Metrics. We need to know how its behaving in the wild to know what to do about it. This also includes operational metrics such as health checks.
We will discuss future milestones at the **@github/network-intelligence** minisummit mid-Q1 2016, and document them here at that point.
## Follow-up things:
## Ongoing
1. Add support for more languages: [Ruby][], etc.
2. [Detecting & rendering moves][moves].
3. [Merging][].
4. Refining the diff summaries we produce.
- Creation, curation, and cultivation of grammars for semantic diffs.
## Things we would like to do:
1. [Interactively refining diffs][interactive].
2. [Filtering][] diffs.
3. Diff [table of contents][].
4. [Jump to symbol definition][].
5. Eliminate conflicts from renaming [variables][].
## Things we would like to do modulo interest/support from other teams:
1. APIs/tooling for data science & engineering teams.
2. Collect data on our heuristics &c. and refine them via e.g. ML.
3. Diffs as a [service][].
[Diff summaries]: https://github.com/github/semantic-diff/milestones/Summer%20Eyes
[Semantic diffs]: https://github.com/github/semantic-diff/milestones/Dot%20Calm
[Ruby]: https://github.com/github/semantic-diff/issues/282
[moves]: https://github.com/github/semantic-diff/issues/389
[Merging]: https://github.com/github/semantic-diff/issues/431
[interactive]: https://github.com/github/semantic-diff/issues/130
[Filtering]: https://github.com/github/semantic-diff/issues/428
[table of contents]: https://github.com/github/semantic-diff/issues/16
[Jump to symbol definition]: https://github.com/github/semantic-diff/issues/6
[variables]: https://github.com/github/semantic-diff/issues/91
[service]: https://github.com/github/platform/blob/master/services/README.md

View File

@ -18,7 +18,9 @@ library
, Category
, Data.Bifunctor.Join.Arbitrary
, Data.Functor.Both
, Data.RandomWalkSimilarity
, Data.OrderedMap
, Data.Record
, Data.These.Arbitrary
, Diff
, Diff.Arbitrary
@ -54,9 +56,13 @@ library
, bytestring
, containers
, directory
, dlist
, filepath
, hashable
, kdt
, mtl
, parallel
, MonadRandom
, pointed
, QuickCheck >= 2.8.1
, quickcheck-text
@ -71,7 +77,7 @@ library
, comonad
, protolude
default-language: Haskell2010
default-extensions: DeriveFunctor, DeriveFoldable, DeriveTraversable, DeriveGeneric, FlexibleInstances, OverloadedStrings, NoImplicitPrelude, RecordWildCards
default-extensions: DeriveFunctor, DeriveFoldable, DeriveTraversable, DeriveGeneric, FlexibleContexts, FlexibleInstances, OverloadedStrings, NoImplicitPrelude, RecordWildCards
ghc-options: -Wall -fno-warn-name-shadowing -O2 -threaded -fprof-auto "-with-rtsopts=-N -p -s -h -i0.1" -j
benchmark semantic-diff-bench
@ -95,18 +101,20 @@ test-suite semantic-diff-test
hs-source-dirs: test
main-is: Spec.hs
other-modules: AlignmentSpec
, ArbitraryTerm
, CorpusSpec
, Data.RandomWalkSimilarity.Spec
, Diff.Spec
, DiffSummarySpec
, InterpreterSpec
, OrderedMapSpec
, PatchOutputSpec
, TermSpec
, DiffSummarySpec
build-depends: base
, bifunctors
, bytestring
, containers
, deepseq
, dlist
, filepath
, Glob
, hspec >= 2.1.10

View File

@ -18,6 +18,7 @@ import Data.Functor.Foldable (hylo)
import Data.List (partition)
import Data.Maybe (fromJust)
import qualified Data.OrderedMap as Map
import Data.Record
import Data.These
import Diff
import Info
@ -39,15 +40,15 @@ numberedRows = countUp (both 1 1)
nextLineNumbers from row = modifyJoin (fromThese identity identity) (succ <$ row) <*> from
-- | Determine whether a line contains any patches.
hasChanges :: SplitDiff leaf Info -> Bool
hasChanges :: SplitDiff leaf annotation -> Bool
hasChanges = or . (True <$)
-- | Align a Diff into a list of Join These SplitDiffs representing the (possibly blank) lines on either side.
alignDiff :: Show leaf => Both (Source Char) -> Diff leaf Info -> [Join These (SplitDiff leaf Info)]
alignDiff :: (Show leaf, Show (Record fields), HasField fields Range) => Both (Source Char) -> Diff leaf (Record fields) -> [Join These (SplitDiff leaf (Record fields))]
alignDiff sources diff = iter (alignSyntax (runBothWith ((Join .) . These)) (free . Free) getRange sources) (alignPatch sources <$> diff)
-- | Align the contents of a patch into a list of lines on the corresponding side(s) of the diff.
alignPatch :: forall leaf. Show leaf => Both (Source Char) -> Patch (Term leaf Info) -> [Join These (SplitDiff leaf Info)]
alignPatch :: forall fields leaf. (Show leaf, Show (Record fields), HasField fields Range) => Both (Source Char) -> Patch (Term leaf (Record fields)) -> [Join These (SplitDiff leaf (Record fields))]
alignPatch sources patch = case patch of
Delete term -> fmap (pure . SplitDelete) <$> alignSyntax' this (fst sources) term
Insert term -> fmap (pure . SplitInsert) <$> alignSyntax' that (snd sources) term
@ -55,13 +56,13 @@ alignPatch sources patch = case patch of
(alignSyntax' this (fst sources) term1)
(alignSyntax' that (snd sources) term2)
where getRange = characterRange . extract
alignSyntax' :: (forall a. Identity a -> Join These a) -> Source Char -> Term leaf Info -> [Join These (Term leaf Info)]
alignSyntax' :: (forall a. Identity a -> Join These a) -> Source Char -> Term leaf (Record fields) -> [Join These (Term leaf (Record fields))]
alignSyntax' side source term = hylo (alignSyntax side cofree getRange (Identity source)) runCofree (Identity <$> term)
this = Join . This . runIdentity
that = Join . That . runIdentity
-- | The Applicative instance f is either Identity or Both. Identity is for Terms in Patches, Both is for Diffs in unchanged portions of the diff.
alignSyntax :: (Applicative f, Show term) => (forall a. f a -> Join These a) -> (CofreeF (Syntax leaf) Info term -> term) -> (term -> Range) -> f (Source Char) -> CofreeF (Syntax leaf) (f Info) [Join These term] -> [Join These term]
alignSyntax :: (Applicative f, Show term, HasField fields Range) => (forall a. f a -> Join These a) -> (CofreeF (Syntax leaf) (Record fields) term -> term) -> (term -> Range) -> f (Source Char) -> CofreeF (Syntax leaf) (f (Record fields)) [Join These term] -> [Join These term]
alignSyntax toJoinThese toNode getRange sources (infos :< syntax) = case syntax of
Leaf s -> catMaybes $ wrapInBranch (const (Leaf s)) . fmap (flip (,) []) <$> sequenceL lineRanges
Indexed children -> catMaybes $ wrapInBranch Indexed <$> alignBranch getRange (join children) bothRanges
@ -69,7 +70,7 @@ alignSyntax toJoinThese toNode getRange sources (infos :< syntax) = case syntax
Keyed children -> catMaybes $ wrapInBranch (Keyed . Map.fromList) <$> alignBranch (getRange . Prologue.snd) (Map.toList children >>= pairWithKey) bothRanges
where bothRanges = modifyJoin (fromThese [] []) lineRanges
lineRanges = toJoinThese $ actualLineRanges <$> (characterRange <$> infos) <*> sources
wrapInBranch constructor = applyThese $ toJoinThese ((\ info (range, children) -> toNode (info { characterRange = range } :< constructor children)) <$> infos)
wrapInBranch constructor = applyThese $ toJoinThese ((\ info (range, children) -> toNode (setCharacterRange info range :< constructor children)) <$> infos)
pairWithKey (key, values) = fmap ((,) key) <$> values
-- | Given a function to get the range, a list of already-aligned children, and the lists of ranges spanned by a branch, return the aligned lines.
@ -86,7 +87,7 @@ alignBranch getRange children ranges = case intersectingChildren of
_ -> case intersectionsWithHeadRanges <$> listToMaybe symmetricalChildren of
-- At least one child intersects on both sides, so align symmetrically.
Just (True, True) -> let (line, remaining) = lineAndRemaining intersectingChildren (Just headRanges) in
line $ alignBranch getRange (remaining ++ nonIntersectingChildren) (drop 1 <$> ranges)
line $ alignBranch getRange (remaining <> nonIntersectingChildren) (drop 1 <$> ranges)
-- A symmetrical child intersects on the right, so align asymmetrically on the left.
Just (False, True) -> alignAsymmetrically leftRange first
-- A symmetrical child intersects on the left, so align asymmetrically on the right.
@ -101,7 +102,7 @@ alignBranch getRange children ranges = case intersectingChildren of
Just headRanges = sequenceL (listToMaybe <$> Join (runBothWith These ranges))
(leftRange, rightRange) = splitThese headRanges
alignAsymmetrically range advanceBy = let (line, remaining) = lineAndRemaining asymmetricalChildren range in
line $ alignBranch getRange (remaining ++ symmetricalChildren ++ nonIntersectingChildren) (modifyJoin (advanceBy (drop 1)) ranges)
line $ alignBranch getRange (remaining <> symmetricalChildren <> nonIntersectingChildren) (modifyJoin (advanceBy (drop 1)) ranges)
lineAndRemaining _ Nothing = (identity, [])
lineAndRemaining children (Just ranges) = let (intersections, remaining) = alignChildren getRange children ranges in
((:) $ (,) <$> ranges `applyToBoth` (sortBy (compare `on` getRange) <$> intersections), remaining)

View File

@ -1,7 +1,9 @@
module Category where
import Prologue
import Data.Hashable
import Data.String
import Test.QuickCheck
-- | A standardized category of AST node. Used to determine the semantics for
-- | semantic diffing and define comparability of nodes.
@ -28,4 +30,27 @@ data Category
| ArrayLiteral
-- | A non-standard category, which can be used for comparability.
| Other String
deriving (Eq, Show, Ord)
deriving (Eq, Generic, Ord, Show)
-- Instances
instance Hashable Category
instance Arbitrary Category where
arbitrary = oneof
[ pure Program
, pure Error
, pure BinaryOperator
, pure DictionaryLiteral
, pure Pair
, pure FunctionCall
, pure StringLiteral
, pure IntegerLiteral
, pure SymbolLiteral
, pure ArrayLiteral
, Other <$> arbitrary
]
shrink (Other s) = Other <$> shrink s
shrink _ = []

View File

@ -1,3 +1,4 @@
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Data.Functor.Both (Both,both, runBothWith, fst, snd, module X) where
import Data.Bifunctor.Join as X

View File

@ -15,10 +15,11 @@ module Data.OrderedMap (
) where
import Prologue hiding (toList, empty)
import Test.QuickCheck
-- | An ordered map of keys and values.
newtype OrderedMap key value = OrderedMap { toList :: [(key, value)] }
deriving (Show, Eq, Functor, Foldable, Traversable)
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
-- | Construct an ordered map from a list of pairs of keys and values.
fromList :: [(key, value)] -> OrderedMap key value
@ -70,3 +71,7 @@ difference (OrderedMap a) (OrderedMap b) = OrderedMap $ filter ((`notElem` extan
instance Eq key => Monoid (OrderedMap key value) where
mempty = fromList []
mappend = union
instance (Arbitrary key, Arbitrary value) => Arbitrary (OrderedMap key value) where
arbitrary = fromList <$> arbitrary
shrink = genericShrink

View File

@ -0,0 +1,110 @@
module Data.RandomWalkSimilarity where
import Control.Arrow ((&&&))
import Control.Monad.Random
import Control.Monad.State
import qualified Data.DList as DList
import Data.Functor.Foldable as Foldable
import Data.Hashable
import qualified Data.KdTree.Static as KdTree
import qualified Data.List as List
import qualified Data.OrderedMap as Map
import qualified Data.Vector as Vector
import Diff
import Patch
import Prologue
import Syntax
import Term
import Test.QuickCheck hiding (Fixed)
import Test.QuickCheck.Random
-- | Given a function comparing two terms recursively, and a function to compute a Hashable label from an annotation, compute the diff of a pair of lists of terms using a random walk similarity metric, which completes in log-linear time. This implementation is based on the paper [_RWS-Diff—Flexible and Efficient Change Detection in Hierarchical Data_](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf).
rws :: (Hashable label, Hashable leaf, Eq leaf, Eq annotation) => (Term leaf annotation -> Term leaf annotation -> Maybe (Diff leaf annotation)) -> (annotation -> label) -> [Term leaf annotation] -> [Term leaf annotation] -> [Diff leaf annotation]
rws compare getLabel as bs
| null as, null bs = []
| null as = insert <$> bs
| null bs = delete <$> as
| otherwise = fmap snd . uncurry deleteRemaining . (`runState` (negate 1, fas)) $ traverse findNearestNeighbourTo fbs
where insert = pure . Insert
delete = pure . Delete
(p, q, d) = (2, 2, 15)
fas = zipWith featurize [0..] as
fbs = zipWith featurize [0..] bs
kdas = KdTree.build (Vector.toList . feature) fas
featurize index term = UnmappedTerm index (featureVector d (pqGrams p q getLabel term)) term
findNearestNeighbourTo kv@(UnmappedTerm _ _ v) = do
(previous, unmapped) <- get
let (UnmappedTerm i _ _) = KdTree.nearest kdas kv
fromMaybe (pure (negate 1, insert v)) $ do
found <- find ((== i) . termIndex) unmapped
guard (i >= previous)
compared <- compare (term found) v
pure $! do
put (i, List.delete found unmapped)
pure (i, compared)
deleteRemaining diffs (_, unmapped) = foldl' (flip (List.insertBy (comparing fst))) diffs ((termIndex &&& delete . term) <$> unmapped)
-- | A term which has not yet been mapped by `rws`, along with its feature vector summary & index.
data UnmappedTerm leaf annotation = UnmappedTerm { termIndex :: {-# UNPACK #-} !Int, feature :: !(Vector.Vector Double), term :: !(Term leaf annotation) }
deriving Eq
-- | A `Gram` is a fixed-size view of some portion of a tree, consisting of a `stem` of _p_ labels for parent nodes, and a `base` of _q_ labels of sibling nodes. Collectively, the bag of `Gram`s for each node of a tree (e.g. as computed by `pqGrams`) form a summary of the tree.
data Gram label = Gram { stem :: [Maybe label], base :: [Maybe label] }
deriving (Eq, Show)
-- | Compute the bag of grams with stems of length _p_ and bases of length _q_, with labels computed from annotations, which summarize the entire subtree of a term.
pqGrams :: Int -> Int -> (annotation -> label) -> Cofree (Syntax leaf) annotation -> DList.DList (Gram (label, Maybe leaf))
pqGrams p q getLabel = cata merge . setRootBase . setRootStem . hylo go project
where go (annotation :< functor) = cofree (Gram [] [ Just (getLabel annotation, leafValue functor) ] :< (assignParent (Just (getLabel annotation, leafValue functor)) p <$> functor))
leafValue (Leaf s) = Just s
leafValue _ = Nothing
merge (head :< tail) = DList.cons head (Prologue.fold tail)
assignParent parentLabel n tree
| n > 0 = let gram :< functor = runCofree tree in cofree $ prependParent parentLabel gram :< assignSiblings (assignParent parentLabel (pred n) <$> functor)
| otherwise = tree
prependParent parentLabel gram = gram { stem = parentLabel : stem gram }
assignSiblings functor = case functor of
Leaf a -> Leaf a
Indexed a -> Indexed $ windowed q setBases [] a
Fixed a -> Fixed $ windowed q setBases [] a
Keyed a -> Keyed . Map.fromList $ windowed q setBasesKV [] (Map.toList a)
setBases child siblings rest = let (gram :< further) = (runCofree child) in cofree (setBase gram (siblings >>= base . extract) :< further) : rest
setBasesKV (key, child) siblings rest = let (gram :< further) = (runCofree child) in (key, cofree (setBase gram (siblings >>= base . extract . snd) :< further)) : rest
setBase gram newBase = gram { base = take q (newBase <> repeat Nothing) }
setRootBase term = let (a :< f) = runCofree term in cofree (setBase a (base a) :< f)
setRootStem = foldr (\ p rest -> assignParent Nothing p . rest) identity [0..p]
-- | A sliding-window fold over _n_ items of a list per iteration.
windowed :: Int -> (a -> [a] -> b -> b) -> b -> [a] -> b
windowed n f seed = para alg
where alg xs = case xs of
Cons a (as, b) -> f a (take n $ a : as) b
Nil -> seed
-- | Compute a vector with the specified number of dimensions, as an approximation of a bag of `Gram`s summarizing a tree.
featureVector :: Hashable label => Int -> DList.DList (Gram label) -> Vector.Vector Double
featureVector d bag = sumVectors $ unitDVector . hash <$> bag
where unitDVector hash = normalize . (`evalRand` mkQCGen hash) $ Prologue.sequence (Vector.replicate d getRandom)
normalize vec = fmap (/ vmagnitude vec) vec
sumVectors = DList.foldr (Vector.zipWith (+)) (Vector.replicate d 0)
-- | The magnitude of a Euclidean vector, i.e. its distance from the origin.
vmagnitude :: Vector.Vector Double -> Double
vmagnitude = sqrtDouble . Vector.sum . fmap (** 2)
-- Instances
instance Hashable label => Hashable (Gram label) where
hashWithSalt _ = hash
hash gram = hash (stem gram <> base gram)
-- | Construct a generator for arbitrary `Gram`s of size `(p, q)`.
gramWithPQ :: Arbitrary label => Int -> Int -> Gen (Gram label)
gramWithPQ p q = Gram <$> vectorOf p arbitrary <*> vectorOf q arbitrary
instance Arbitrary label => Arbitrary (Gram label) where
arbitrary = join $ gramWithPQ <$> arbitrary <*> arbitrary
shrink (Gram a b) = Gram <$> shrink a <*> shrink b

67
src/Data/Record.hs Normal file
View File

@ -0,0 +1,67 @@
{-# LANGUAGE DataKinds, GADTs, KindSignatures, MultiParamTypeClasses, TypeOperators #-}
module Data.Record where
import Prologue
import Test.QuickCheck
-- | A type-safe, extensible record structure.
-- |
-- | This is heavily inspired by Aaron Levins [Extensible Effects in the van Laarhoven Free Monad](http://aaronlevin.ca/post/136494428283/extensible-effects-in-the-van-laarhoven-free-monad).
data Record :: [*] -> * where
RNil :: Record '[]
RCons :: h -> Record t -> Record (h ': t)
infixr 0 .:
-- | Infix synonym for `RCons`: `a .: b .: RNil == RCons a (RCons b RNil)`.
(.:) :: h -> Record t -> Record (h ': t)
(.:) = RCons
-- Classes
-- | HasField enables indexing a Record by (phantom) type tags.
class HasField (fields :: [*]) (field :: *) where
getField :: Record fields -> field
setField :: Record fields -> field -> Record fields
-- Instances
-- OVERLAPPABLE is required for the HasField instances so that we can handle the two cases: either the head of the non-empty h-list is the requested field, or it isnt. The third possible case (the h-list is empty) is rejected at compile-time.
instance {-# OVERLAPPABLE #-} HasField fields field => HasField (notIt ': fields) field where
getField (RCons _ t) = getField t
setField (RCons h t) f = RCons h (setField t f)
instance {-# OVERLAPPABLE #-} HasField (field ': fields) field where
getField (RCons h _) = h
setField (RCons _ t) f = RCons f t
instance (Show h, Show (Record t)) => Show (Record (h ': t)) where
showsPrec n (RCons h t) = showsPrec n h . (" : " <>) . showsPrec n t
instance Show (Record '[]) where
showsPrec _ RNil = ("'[]" <>)
instance (Eq h, Eq (Record t)) => Eq (Record (h ': t)) where
RCons h1 t1 == RCons h2 t2 = h1 == h2 && t1 == t2
instance Eq (Record '[]) where
_ == _ = True
instance (Ord h, Ord (Record t)) => Ord (Record (h ': t)) where
RCons h1 t1 `compare` RCons h2 t2 = let h = h1 `compare` h2 in
if h == EQ then t1 `compare` t2 else h
instance Ord (Record '[]) where
_ `compare` _ = EQ
instance Arbitrary fields => Arbitrary (Record '[fields]) where
arbitrary = RCons <$> arbitrary <*> pure RNil
shrink (RCons h t) = RCons <$> shrink h <*> pure t

View File

@ -1,9 +1,11 @@
{-# LANGUAGE TypeFamilies, TypeSynonymInstances #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Diff where
import Prologue
import Data.Functor.Foldable as Foldable
import Data.Functor.Both
import Data.Functor.Both as Both
import qualified Data.OrderedMap as Map
import Patch
import Syntax
import Term
@ -13,8 +15,8 @@ type DiffF leaf annotation = FreeF (CofreeF (Syntax leaf) (Both annotation)) (Pa
type Diff a annotation = Free (CofreeF (Syntax a) (Both annotation)) (Patch (Term a annotation))
type instance Base (Free f a) = FreeF f a
instance (Functor f) => Foldable.Foldable (Free f a) where project = runFree
instance (Functor f) => Foldable.Unfoldable (Free f a) where embed = free
instance Functor f => Foldable.Foldable (Free f a) where project = runFree
instance Functor f => Foldable.Unfoldable (Free f a) where embed = free
diffSum :: (Patch (Term a annotation) -> Integer) -> Diff a annotation -> Integer
diffSum patchCost diff = sum $ fmap patchCost diff
@ -22,3 +24,22 @@ diffSum patchCost diff = sum $ fmap patchCost diff
-- | The sum of the node count of the diffs patches.
diffCost :: Diff a annotation -> Integer
diffCost = diffSum $ patchSum termSize
-- | Merge a diff using a function to provide the Term (in Maybe, to simplify recovery of the before/after state) for every Patch.
mergeMaybe :: (Patch (Term leaf annotation) -> Maybe (Term leaf annotation)) -> Diff leaf annotation -> Maybe (Term leaf annotation)
mergeMaybe transform = cata algebra . fmap transform
where algebra :: FreeF (CofreeF (Syntax leaf) (Both annotation)) (Maybe (Term leaf annotation)) (Maybe (Term leaf annotation)) -> Maybe (Term leaf annotation)
algebra (Pure p) = p
algebra (Free (annotations :< syntax)) = Just . cofree $ Both.fst annotations :< case syntax of
Leaf s -> Leaf s
Indexed i -> Indexed (catMaybes i)
Fixed i -> Fixed (catMaybes i)
Keyed i -> Keyed (Map.fromList (Map.toList i >>= (\ (k, v) -> maybe [] (pure . (,) k) v)))
-- | Recover the before state of a diff.
beforeTerm :: Diff leaf annotation -> Maybe (Term leaf annotation)
beforeTerm = mergeMaybe before
-- | Recover the after state of a diff.
afterTerm :: Diff leaf annotation -> Maybe (Term leaf annotation)
afterTerm = mergeMaybe after

View File

@ -4,7 +4,6 @@ import Diff
import Data.Bifunctor.Join
import Data.Bifunctor.Join.Arbitrary ()
import Data.Functor.Foldable (unfold)
import qualified Data.List as List
import qualified Data.OrderedMap as Map
import Patch
import Patch.Arbitrary ()
@ -13,32 +12,25 @@ import Prologue
import Term.Arbitrary
import Test.QuickCheck hiding (Fixed)
newtype ArbitraryDiff leaf annotation = ArbitraryDiff { unArbitraryDiff :: FreeF (CofreeF (Syntax leaf) (Join (,) annotation)) (Patch (ArbitraryTerm leaf annotation)) (ArbitraryDiff leaf annotation) }
data ArbitraryDiff leaf annotation
= ArbitraryFree (Join (,) annotation) (Syntax leaf (ArbitraryDiff leaf annotation))
| ArbitraryPure (Patch (ArbitraryTerm leaf annotation))
deriving (Show, Eq, Generic)
unArbitraryDiff :: ArbitraryDiff leaf annotation -> FreeF (CofreeF (Syntax leaf) (Join (,) annotation)) (Patch (ArbitraryTerm leaf annotation)) (ArbitraryDiff leaf annotation)
unArbitraryDiff (ArbitraryFree a s) = Free (a :< s)
unArbitraryDiff (ArbitraryPure p) = Pure p
toDiff :: ArbitraryDiff leaf annotation -> Diff leaf annotation
toDiff = fmap (fmap toTerm) . unfold unArbitraryDiff
diffOfSize :: (Arbitrary leaf, Arbitrary annotation) => Int -> Gen (ArbitraryDiff leaf annotation)
diffOfSize n
| n <= 0 = (ArbitraryDiff .) . (Free .) . (:<) <$> arbitrary <*> syntaxOfSize n
| n <= 0 = ArbitraryFree <$> arbitrary <*> syntaxOfSize diffOfSize n
| otherwise = oneof
[ (ArbitraryDiff .) . (Free .) . (:<) <$> arbitrary <*> syntaxOfSize n
, ArbitraryDiff . Pure <$> patchOfSize n ]
where syntaxOfSize n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
| otherwise = oneof $ branchGeneratorsOfSize n
branchGeneratorsOfSize n =
[ Indexed <$> childrenOfSize (pred n)
, Fixed <$> childrenOfSize (pred n)
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
]
childrenOfSize n | n <= 0 = pure []
childrenOfSize n = do
m <- choose (1, n)
first <- diffOfSize m
rest <- childrenOfSize (n - m)
pure $! first : rest
patchOfSize 1 = oneof [ Insert <$> termOfSize 1
[ ArbitraryFree <$> arbitrary <*> syntaxOfSize diffOfSize n
, ArbitraryPure <$> patchOfSize n ]
where patchOfSize 1 = oneof [ Insert <$> termOfSize 1
, Delete <$> termOfSize 1 ]
patchOfSize n = do
m <- choose (1, n - 1)
@ -59,11 +51,4 @@ instance (Eq leaf, Eq annotation, Arbitrary leaf, Arbitrary annotation) => Arbit
m <- choose (0, n)
diffOfSize m
shrink diff@(ArbitraryDiff annotated) = case annotated of
Free (annotation :< syntax) -> (subterms diff ++) $ filter (/= diff) $
(ArbitraryDiff .) . (Free .) . (:<) <$> shrink annotation <*> case syntax of
Leaf a -> Leaf <$> shrink a
Indexed i -> Indexed <$> (List.subsequences i >>= recursivelyShrink)
Fixed f -> Fixed <$> (List.subsequences f >>= recursivelyShrink)
Keyed k -> Keyed . Map.fromList <$> (List.subsequences (Map.toList k) >>= recursivelyShrink)
Pure patch -> ArbitraryDiff . Pure <$> shrink patch
shrink = genericShrink

View File

@ -4,6 +4,7 @@ import Prologue hiding (fst, snd)
import qualified Data.ByteString.Char8 as B1
import Data.Functor.Both
import Data.Functor.Foldable
import Data.Record
import qualified Data.Text as T
import qualified Data.Text.ICU.Detect as Detect
import qualified Data.Text.ICU.Convert as Convert
@ -39,8 +40,8 @@ lineByLineParser input = pure . cofree . root $ case foldl' annotateLeaves ([],
where
lines = actualLines input
root children = let size = 1 + fromIntegral (length children) in
Info (Range 0 $ length input) (Other "program") size size :< Indexed children
leaf charIndex line = Info (Range charIndex $ charIndex + T.length line) (Other "program") 1 1 :< Leaf line
((Range 0 $ length input) .: Other "program" .: size .: Cost (unSize size) .: RNil) :< Indexed children
leaf charIndex line = ((Range charIndex $ charIndex + T.length line) .: Other "program" .: 1 .: 1 .: RNil) :< Leaf line
annotateLeaves (accum, charIndex) line =
(accum ++ [ leaf charIndex (toText line) ]
, charIndex + length line)
@ -55,13 +56,13 @@ breakDownLeavesByWord :: Source Char -> Term T.Text Info -> Term T.Text Info
breakDownLeavesByWord source = cata replaceIn
where
replaceIn :: TermF T.Text Info (Term T.Text Info) -> Term T.Text Info
replaceIn (info :< syntax) = let size' = 1 + sum (size . extract <$> syntax') in cofree $ info { size = size', cost = size' } :< syntax'
replaceIn (info :< syntax) = let size' = 1 + sum (size . extract <$> syntax') in cofree $ setCost (setSize info size') (Cost (unSize size')) :< syntax'
where syntax' = case (ranges, syntax) of
(_:_:_, Leaf _) -> Indexed (makeLeaf info <$> ranges)
_ -> syntax
ranges = rangesAndWordsInSource (characterRange info)
rangesAndWordsInSource range = rangesAndWordsFrom (start range) (toString $ slice range source)
makeLeaf info (range, substring) = cofree $ info { characterRange = range } :< Leaf (T.pack substring)
makeLeaf info (range, substring) = cofree $ setCharacterRange info range :< Leaf (T.pack substring)
-- | Transcode a file to a unicode source.
transcode :: B1.ByteString -> IO (Source Char)
@ -95,7 +96,6 @@ diffFiles parser renderer sourceBlobs = do
pure $! renderer textDiff sourceBlobs
where construct :: CofreeF (Syntax Text) (Both Info) (Diff Text Info) -> Diff Text Info
construct (info :< syntax) = free (Free ((setCost <$> info <*> sumCost syntax) :< syntax))
setCost info cost = info { cost = cost }
sumCost = fmap getSum . foldMap (fmap Sum . getCost)
getCost diff = case runFree diff of
Free (info :< _) -> cost <$> info
@ -104,6 +104,6 @@ diffFiles parser renderer sourceBlobs = do
-- | The sum of the node count of the diffs patches.
diffCostWithCachedTermSizes :: Diff a Info -> Integer
diffCostWithCachedTermSizes diff = case runFree diff of
diffCostWithCachedTermSizes diff = unCost $ case runFree diff of
Free (info :< _) -> sum (cost <$> info)
Pure patch -> sum (cost . extract <$> patch)

View File

@ -1,10 +1,40 @@
{-# LANGUAGE DataKinds, GeneralizedNewtypeDeriving #-}
module Info where
import Data.Record
import Prologue
import Category
import Range
-- | An annotation for a source file, including the source range and semantic
-- | categories.
data Info = Info { characterRange :: !Range, category :: !Category, size :: !Integer, cost :: !Integer }
deriving (Eq, Show)
newtype Size = Size { unSize :: Integer }
deriving (Eq, Num, Ord, Show)
newtype Cost = Cost { unCost :: Integer }
deriving (Eq, Num, Ord, Show)
type InfoFields = '[ Range, Category, Size, Cost ]
type Info = Record InfoFields
characterRange :: HasField fields Range => Record fields -> Range
characterRange = getField
setCharacterRange :: HasField fields Range => Record fields -> Range -> Record fields
setCharacterRange = setField
category :: HasField fields Category => Record fields -> Category
category = getField
setCategory :: HasField fields Category => Record fields -> Category -> Record fields
setCategory = setField
size :: HasField fields Size => Record fields -> Size
size = getField
setSize :: HasField fields Size => Record fields -> Size -> Record fields
setSize = setField
cost :: HasField fields Cost => Record fields -> Cost
cost = getField
setCost :: HasField fields Cost => Record fields -> Cost -> Record fields
setCost = setField

View File

@ -1,13 +1,20 @@
module Interpreter (Comparable, DiffConstructor, diffTerms) where
import Algorithm
import Category
import Data.Align
import Data.Functor.Foldable
import Data.Functor.Both
import qualified Data.OrderedMap as Map
import qualified Data.List as List
import Data.Hashable
import Data.List ((\\))
import qualified Data.List as List
import Data.OrderedMap ((!))
import qualified Data.OrderedMap as Map
import Data.RandomWalkSimilarity
import Data.Record
import Data.These
import Diff
import Info
import Operation
import Patch
import Prologue hiding (lookup)
@ -16,17 +23,17 @@ import Syntax
import Term
-- | Returns whether two terms are comparable
type Comparable a annotation = Term a annotation -> Term a annotation -> Bool
type Comparable leaf annotation = Term leaf annotation -> Term leaf annotation -> Bool
-- | Constructs a diff from the CofreeF containing its annotation and syntax. This function has the opportunity to, for example, cache properties in the annotation.
type DiffConstructor leaf annotation = CofreeF (Syntax leaf) (Both annotation) (Diff leaf annotation) -> Diff leaf annotation
-- | Diff two terms, given a function that determines whether two terms can be compared and a cost function.
diffTerms :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Term a annotation -> Term a annotation -> Diff a annotation
diffTerms :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Term leaf (Record fields) -> Term leaf (Record fields) -> Diff leaf (Record fields)
diffTerms construct comparable cost a b = fromMaybe (pure $ Replace a b) $ constructAndRun construct comparable cost a b
-- | Constructs an algorithm and runs it
constructAndRun :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Term a annotation -> Term a annotation -> Maybe (Diff a annotation)
constructAndRun :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Term leaf (Record fields) -> Term leaf (Record fields) -> Maybe (Diff leaf (Record fields))
constructAndRun _ comparable _ a b | not $ comparable a b = Nothing
constructAndRun construct _ _ a b | (() <$ a) == (() <$ b) = hylo construct runCofree <$> zipTerms a b
@ -41,21 +48,23 @@ constructAndRun construct comparable cost t1 t2 =
annotate = pure . construct . (both annotation1 annotation2 :<)
-- | Runs the diff algorithm
run :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Algorithm a annotation (Diff a annotation) -> Maybe (Diff a annotation)
run :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Algorithm leaf (Record fields) (Diff leaf (Record fields)) -> Maybe (Diff leaf (Record fields))
run construct comparable cost algorithm = case runFree algorithm of
Pure diff -> Just diff
Free (Recursive t1 t2 f) -> run construct comparable cost . f $ recur a b where
(annotation1 :< a, annotation2 :< b) = (runCofree t1, runCofree t2)
annotate = construct . (both annotation1 annotation2 :<)
recur (Indexed a') (Indexed b') | length a' == length b' = annotate . Indexed $ zipWith (diffTerms construct comparable cost) a' b'
recur (Fixed a') (Fixed b') | length a' == length b' = annotate . Fixed $ zipWith (diffTerms construct comparable cost) a' b'
recur (Indexed a') (Indexed b') = annotate . Indexed $ alignWith diffThese a' b'
recur (Fixed a') (Fixed b') = annotate . Fixed $ alignWith diffThese a' b'
recur (Keyed a') (Keyed b') | Map.keys a' == bKeys = annotate . Keyed . Map.fromList . fmap repack $ bKeys where
bKeys = Map.keys b'
repack key = (key, interpretInBoth key a' b')
interpretInBoth key x y = diffTerms construct comparable cost (x ! key) (y ! key)
recur _ _ = pure $ Replace (cofree (annotation1 :< a)) (cofree (annotation2 :< b))
diffThese = these (pure . Delete) (pure . Insert) (diffTerms construct comparable cost)
Free (ByKey a b f) -> run construct comparable cost $ f byKey where
byKey = Map.fromList $ toKeyValue <$> List.union aKeys bKeys
toKeyValue key | key `List.elem` deleted = (key, pure . Delete $ a ! key)
@ -67,3 +76,6 @@ run construct comparable cost algorithm = case runFree algorithm of
inserted = bKeys \\ aKeys
Free (ByIndex a b f) -> run construct comparable cost . f $ ses (constructAndRun construct comparable cost) cost a b
Free (ByRandomWalkSimilarity a b f) -> run construct comparable cost . f $ rws (constructAndRun construct comparable cost) getLabel a b
where getLabel = category

View File

@ -10,11 +10,11 @@ data Operation
a -- ^ The type of leaves in the syntax tree, typically String, but possibly some datatype representing different leaves more precisely.
annotation -- ^ The type of annotations.
f -- ^ The type representing another level of the diffing algorithm. Often Algorithm.
=
-- | Recursively diff two terms and pass the result to the continuation.
Recursive (Term a annotation) (Term a annotation) (Diff a annotation -> f)
= Recursive (Term a annotation) (Term a annotation) (Diff a annotation -> f)
-- | Diff two dictionaries and pass the result to the continuation.
| ByKey (OrderedMap Text (Term a annotation)) (OrderedMap Text (Term a annotation)) (OrderedMap Text (Diff a annotation) -> f)
-- | Diff two arrays and pass the result to the continuation.
| ByIndex [Term a annotation] [Term a annotation] ([Diff a annotation] -> f)
| ByRandomWalkSimilarity [Term a annotation] [Term a annotation] ([Diff a annotation] -> f)
deriving Functor

View File

@ -12,19 +12,19 @@ import Data.These
import Prologue
-- | An operation to replace, insert, or delete an item.
data Patch a =
Replace a a
data Patch a
= Replace a a
| Insert a
| Delete a
deriving (Eq, Foldable, Functor, Generic, Show, Traversable)
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
-- | Return the item from the after side of the patch.
after :: Patch a -> Maybe a
after = maybeFst . unPatch
after = maybeSnd . unPatch
-- | Return the item from the before side of the patch.
before :: Patch a -> Maybe a
before = maybeSnd . unPatch
before = maybeFst . unPatch
-- | Return both sides of a patch.
unPatch :: Patch a -> These a a

View File

@ -66,7 +66,7 @@ lineFields n term range = [ "number" .= n
]
termFields :: (ToJSON recur, KeyValue kv) => Info -> Syntax leaf recur -> [kv]
termFields Info{..} syntax = "range" .= characterRange : "category" .= category : case syntax of
termFields info syntax = "range" .= characterRange info : "category" .= category info : case syntax of
Leaf _ -> []
Indexed c -> childrenFields c
Fixed c -> childrenFields c

View File

@ -93,11 +93,11 @@ split diff blobs = TL.toStrict . renderHtml
newtype Renderable a = Renderable a
instance ToMarkup f => ToMarkup (Renderable (Source Char, Info, Syntax a (f, Range))) where
toMarkup (Renderable (source, Info {..}, syntax)) = (! A.data_ (stringValue (show size))) . classifyMarkup category $ case syntax of
Leaf _ -> span . string . toString $ slice characterRange source
Indexed children -> ul . mconcat $ wrapIn li <$> contentElements source characterRange children
Fixed children -> ul . mconcat $ wrapIn li <$> contentElements source characterRange children
Keyed children -> dl . mconcat $ wrapIn dd <$> contentElements source characterRange children
toMarkup (Renderable (source, info, syntax)) = (! A.data_ (stringValue (show (unSize (size info))))) . classifyMarkup (category info) $ case syntax of
Leaf _ -> span . string . toString $ slice (characterRange info) source
Indexed children -> ul . mconcat $ wrapIn li <$> contentElements source (characterRange info) children
Fixed children -> ul . mconcat $ wrapIn li <$> contentElements source (characterRange info) children
Keyed children -> dl . mconcat $ wrapIn dd <$> contentElements source (characterRange info) children
contentElements :: (Foldable t, ToMarkup f) => Source Char -> Range -> t (f, Range) -> [Markup]
contentElements source range children = let (elements, next) = foldr' (markupForContextAndChild source) ([], end range) children in
@ -114,13 +114,13 @@ wrapIn _ l@Blaze.Comment{} = l
wrapIn f p = f p
instance ToMarkup (Renderable (Source Char, Term a Info)) where
toMarkup (Renderable (source, term)) = Prologue.fst $ cata (\ (info@(Info{..}) :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange)) term
toMarkup (Renderable (source, term)) = Prologue.fst $ cata (\ (info :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange info)) term
instance ToMarkup (Renderable (Source Char, SplitDiff a Info)) where
toMarkup (Renderable (source, diff)) = Prologue.fst $ iter (\ (info@(Info{..}) :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange)) $ toMarkupAndRange <$> diff
toMarkup (Renderable (source, diff)) = Prologue.fst $ iter (\ (info :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange info)) $ toMarkupAndRange <$> diff
where toMarkupAndRange :: SplitPatch (Term a Info) -> (Markup, Range)
toMarkupAndRange patch = let term@(Info{..} :< _) = runCofree $ getSplitTerm patch in
((div ! A.class_ (splitPatchToClassName patch) ! A.data_ (stringValue (show size))) . toMarkup $ Renderable (source, cofree term), characterRange)
toMarkupAndRange patch = let term@(info :< _) = runCofree $ getSplitTerm patch in
((div ! A.class_ (splitPatchToClassName patch) ! A.data_ (stringValue (show (unSize (size info))))) . toMarkup $ Renderable (source, cofree term), characterRange info)
instance ToMarkup a => ToMarkup (Renderable (Bool, Int, a)) where
toMarkup (Renderable (hasChanges, num, line)) =

View File

@ -1,5 +1,6 @@
module SplitDiff where
import Data.Record
import Info
import Range
import Prologue
@ -17,7 +18,7 @@ getSplitTerm (SplitDelete a) = a
getSplitTerm (SplitReplace a) = a
-- | Get the range of a SplitDiff.
getRange :: SplitDiff leaf Info -> Range
getRange :: HasField fields Range => SplitDiff leaf (Record fields) -> Range
getRange diff = characterRange $ case runFree diff of
Free annotated -> headF annotated
Pure patch -> extract (getSplitTerm patch)

View File

@ -1,8 +1,10 @@
module Syntax where
import Prologue
import Data.OrderedMap
import Data.OrderedMap as Map
import Data.Text.Arbitrary ()
import qualified Data.Text as T
import Test.QuickCheck hiding (Fixed)
-- | A node in an abstract syntax tree.
data Syntax
@ -17,4 +19,27 @@ data Syntax
| Fixed [f]
-- | A branch of child nodes indexed by some String identity. This is useful for identifying e.g. methods & properties in a class scope by their names. Note that comments can generally occur in these scopes as well; one strategy for dealing with this is to identify comments by their text in the source.
| Keyed (OrderedMap T.Text f)
deriving (Functor, Show, Eq, Foldable, Traversable)
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
-- Instances
syntaxOfSize :: Arbitrary leaf => (Int -> Gen f) -> Int -> Gen (Syntax leaf f)
syntaxOfSize recur n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
| otherwise = oneof $ branchGeneratorsOfSize n
where branchGeneratorsOfSize n =
[ Indexed <$> childrenOfSize (pred n)
, Fixed <$> childrenOfSize (pred n)
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
]
childrenOfSize n | n <= 0 = pure []
childrenOfSize n = do
m <- choose (1, n)
first <- recur m
rest <- childrenOfSize (n - m)
pure $! first : rest
instance (Arbitrary leaf, Arbitrary f) => Arbitrary (Syntax leaf f) where
arbitrary = sized (syntaxOfSize (`resize` arbitrary) )
shrink = genericShrink

View File

@ -1,4 +1,5 @@
{-# LANGUAGE TypeFamilies, TypeSynonymInstances #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Term where
import Prologue

View File

@ -2,52 +2,36 @@
module Term.Arbitrary where
import Data.Functor.Foldable (Base, cata, unfold, Unfoldable(embed))
import qualified Data.List as List
import qualified Data.OrderedMap as Map
import Data.Text.Arbitrary ()
import Prologue
import Syntax
import Term
import Test.QuickCheck hiding (Fixed)
newtype ArbitraryTerm leaf annotation = ArbitraryTerm { unArbitraryTerm :: TermF leaf annotation (ArbitraryTerm leaf annotation) }
data ArbitraryTerm leaf annotation = ArbitraryTerm annotation (Syntax leaf (ArbitraryTerm leaf annotation))
deriving (Show, Eq, Generic)
unArbitraryTerm :: ArbitraryTerm leaf annotation -> TermF leaf annotation (ArbitraryTerm leaf annotation)
unArbitraryTerm (ArbitraryTerm a s) = a :< s
toTerm :: ArbitraryTerm leaf annotation -> Term leaf annotation
toTerm = unfold unArbitraryTerm
termOfSize :: (Arbitrary leaf, Arbitrary annotation) => Int -> Gen (ArbitraryTerm leaf annotation)
termOfSize n = (ArbitraryTerm .) . (:<) <$> arbitrary <*> syntaxOfSize n
where syntaxOfSize n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
| otherwise = oneof $ branchGeneratorsOfSize n
branchGeneratorsOfSize n =
[ Indexed <$> childrenOfSize (pred n)
, Fixed <$> childrenOfSize (pred n)
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
]
childrenOfSize n | n <= 0 = pure []
childrenOfSize n = do
m <- choose (1, n)
first <- termOfSize m
rest <- childrenOfSize (n - m)
pure $! first : rest
termOfSize n = ArbitraryTerm <$> arbitrary <*> syntaxOfSize termOfSize n
arbitraryTermSize :: ArbitraryTerm leaf annotation -> Int
arbitraryTermSize = cata (succ . sum) . toTerm
-- Instances
type instance Base (ArbitraryTerm leaf annotation) = CofreeF (Syntax leaf) annotation
instance Unfoldable (ArbitraryTerm leaf annotation) where embed = ArbitraryTerm
type instance Base (ArbitraryTerm leaf annotation) = TermF leaf annotation
instance Unfoldable (ArbitraryTerm leaf annotation) where embed (a :< s) = ArbitraryTerm a s
instance (Eq leaf, Eq annotation, Arbitrary leaf, Arbitrary annotation) => Arbitrary (ArbitraryTerm leaf annotation) where
arbitrary = sized $ \ n -> do
m <- choose (0, n)
termOfSize m
shrink term@(ArbitraryTerm (annotation :< syntax)) = (subterms term ++) $ filter (/= term) $
(ArbitraryTerm .) . (:<) <$> shrink annotation <*> case syntax of
Leaf a -> Leaf <$> shrink a
Indexed i -> Indexed <$> (List.subsequences i >>= recursivelyShrink)
Fixed f -> Fixed <$> (List.subsequences f >>= recursivelyShrink)
Keyed k -> Keyed . Map.fromList <$> (List.subsequences (Map.toList k) >>= recursivelyShrink)
shrink = genericShrink

View File

@ -1,6 +1,7 @@
module TreeSitter where
import Prologue hiding (Constructor)
import Data.Record
import Data.String
import Category
import Info
@ -61,7 +62,7 @@ documentToTerm language document contents = alloca $ \ root -> do
range <- pure $! Range { start = fromIntegral $ ts_node_p_start_char node, end = fromIntegral $ ts_node_p_end_char node }
let size' = 1 + sum (size . extract <$> children)
let info = Info range (categoriesForLanguage language name) size' size'
let info = range .: (categoriesForLanguage language name) .: size' .: Cost (unSize size') .: RNil
pure $! termConstructor contents info children
getChild node n out = do
_ <- ts_node_p_named_child node n out

View File

@ -1,7 +1,6 @@
module AlignmentSpec where
import Alignment
import ArbitraryTerm ()
import Control.Arrow ((&&&))
import Control.Monad.State
import Data.Align hiding (align)
@ -11,6 +10,7 @@ import Data.Bifunctor.Join.Arbitrary ()
import Data.Functor.Both as Both
import Data.List (nub)
import Data.Monoid
import Data.Record
import Data.String
import Data.Text.Arbitrary ()
import Data.These
@ -48,7 +48,7 @@ spec = parallel $ do
prop "covers every input line" $
\ elements -> let (_, children, ranges) = toAlignBranchInputs elements in
join <$> (traverse (modifyJoin (fromThese [] []) . fmap pure . fmap Prologue.fst) (alignBranch Prologue.snd children ranges)) `shouldBe` ranges
join <$> (traverse (modifyJoin (fromThese [] []) . fmap (pure . Prologue.fst)) (alignBranch Prologue.snd children ranges)) `shouldBe` ranges
prop "covers every input child" $
\ elements -> let (_, children, ranges) = toAlignBranchInputs elements in
@ -193,7 +193,7 @@ spec = parallel $ do
(info 8 9 `branch` [ info 8 9 `leaf` "c" ]))
]
describe "numberedRows" $
describe "numberedRows" $ do
prop "counts only non-empty values" $
\ xs -> counts (numberedRows (xs :: [Join These Char])) `shouldBe` length . catMaybes <$> Join (unalign (runJoin <$> xs))
@ -236,9 +236,9 @@ keysOfAlignedChildren lines = lines >>= these identity identity (++) . runJoin .
instance Arbitrary BranchElement where
arbitrary = oneof [ key >>= \ key -> Child key <$> joinTheseOf (contents key)
, Margin <$> joinTheseOf margin ]
where key = listOf1 (elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']))
where key = listOf1 (elements (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9']))
contents key = wrap key <$> listOf (padding '*')
wrap key contents = "(" ++ key ++ contents ++ ")" :: String
wrap key contents = "(" <> key <> contents <> ")" :: String
margin = listOf (padding '-')
padding char = frequency [ (10, pure char)
, (1, pure '\n') ]
@ -258,7 +258,7 @@ align :: Both (Source.Source Char) -> ConstructibleFree (Patch (Term String Info
align sources = PrettyDiff sources . fmap (fmap (getRange &&& identity)) . alignDiff sources . deconstruct
info :: Int -> Int -> Info
info start end = Info (Range start end) StringLiteral 0 0
info start end = Range start end .: StringLiteral .: 0 .: 0 .: RNil
prettyDiff :: Both (Source.Source Char) -> [Join These (ConstructibleFree (SplitPatch (Term String Info)) Info)] -> PrettyDiff (SplitDiff String Info)
prettyDiff sources = PrettyDiff sources . fmap (fmap ((getRange &&& identity) . deconstruct))

View File

@ -1,16 +0,0 @@
{-# OPTIONS_GHC -fno-warn-orphans #-}
module ArbitraryTerm where
import Data.Text.Arbitrary ()
import Data.These.Arbitrary ()
import Prologue hiding (fst, snd)
import Source hiding ((++))
import Test.QuickCheck hiding (Fixed)
data CategorySet = A | B | C | D deriving (Eq, Show)
instance Arbitrary CategorySet where
arbitrary = elements [ A, B, C, D ]
instance Arbitrary a => Arbitrary (Source a) where
arbitrary = Source.fromList <$> arbitrary

View File

@ -0,0 +1,36 @@
module Data.RandomWalkSimilarity.Spec where
import Category
import Data.DList as DList hiding (toList)
import Data.RandomWalkSimilarity
import Diff
import Patch
import Prologue
import Syntax
import Term
import Term.Arbitrary
import Test.Hspec
import Test.Hspec.QuickCheck
import Test.QuickCheck
spec :: Spec
spec = parallel $ do
describe "pqGrams" $ do
prop "produces grams with stems of the specified length" . forAll (arbitrary `suchThat` (\ (_, p, q) -> p > 0 && q > 0)) $
\ (term, p, q) -> pqGrams p q identity (toTerm term :: Term Text Text) `shouldSatisfy` all ((== p) . length . stem)
prop "produces grams with bases of the specified length" . forAll (arbitrary `suchThat` (\ (_, p, q) -> p > 0 && q > 0)) $
\ (term, p, q) -> pqGrams p q identity (toTerm term :: Term Text Text) `shouldSatisfy` all ((== q) . length . base)
describe "featureVector" $ do
prop "produces a vector of the specified dimension" . forAll (arbitrary `suchThat` ((> 0) . Prologue.snd)) $
\ (grams, d) -> length (featureVector d (fromList (grams :: [Gram Text]))) `shouldBe` d
describe "rws" $ do
let compare a b = if extract a == extract b then Just (pure (Replace a b)) else Nothing
prop "produces correct diffs" . forAll (scale (`div` 4) arbitrary) $
\ (as, bs) -> let tas = toTerm <$> as
tbs = toTerm <$> bs
diff = free (Free (pure Program :< Indexed (rws compare identity tas tbs :: [Diff Text Category]))) in
(beforeTerm diff, afterTerm diff) `shouldBe` (Just (cofree (Program :< Indexed tas)), Just (cofree (Program :< Indexed tbs)))

38
test/Diff/Spec.hs Normal file
View File

@ -0,0 +1,38 @@
{-# LANGUAGE DataKinds #-}
module Diff.Spec where
import Category
import Data.Record
import Data.Text.Arbitrary ()
import Diff
import Diff.Arbitrary
import Interpreter
import Prologue
import Term.Arbitrary
import Test.Hspec
import Test.Hspec.QuickCheck
import Test.QuickCheck
spec :: Spec
spec = parallel $ do
prop "equality is reflexive" $
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
diff `shouldBe` diff
prop "equal terms produce identity diffs" $
\ a -> let term = toTerm (a :: ArbitraryTerm Text (Record '[Category])) in
diffCost (diffTerms (free . Free) (==) diffCost term term) `shouldBe` 0
describe "beforeTerm" $ do
prop "recovers the before term" $
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
beforeTerm diff `shouldBe` Just (toTerm a)
describe "afterTerm" $ do
prop "recovers the after term" $
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
afterTerm diff `shouldBe` Just (toTerm b)
describe "ArbitraryDiff" $ do
prop "generates diffs of a specific size" . forAll ((arbitrary >>= \ n -> (,) n <$> diffOfSize n) `suchThat` ((> 0) . fst)) $
\ (n, diff) -> arbitraryDiffSize (diff :: ArbitraryDiff Text ()) `shouldBe` n

View File

@ -1,6 +1,7 @@
module DiffSummarySpec where
import Prologue
import Data.Record
import Data.String
import Test.Hspec
import Diff
@ -12,10 +13,10 @@ import Category
import DiffSummary
arrayInfo :: Info
arrayInfo = Info (rangeAt 0) ArrayLiteral 2 0
arrayInfo = rangeAt 0 .: ArrayLiteral .: 2 .: 0 .: RNil
literalInfo :: Info
literalInfo = Info (rangeAt 1) StringLiteral 1 0
literalInfo = rangeAt 1 .: StringLiteral .: 1 .: 0 .: RNil
testDiff :: Diff String Info
testDiff = free $ Free (pure arrayInfo :< Indexed [ free $ Pure (Insert (cofree $ literalInfo :< Leaf "a")) ])

View File

@ -1,22 +1,30 @@
{-# LANGUAGE DataKinds #-}
module InterpreterSpec where
import Prologue
import Diff
import qualified Interpreter as I
import Range
import Syntax
import Patch
import Info
import Category
import Diff
import Data.Record
import Interpreter
import Patch
import Prologue
import Syntax
import Term.Arbitrary
import Test.Hspec
import Test.Hspec.QuickCheck
spec :: Spec
spec = parallel $
describe "interpret" $
spec = parallel $ do
describe "interpret" $ do
it "returns a replacement when comparing two unicode equivalent terms" $
I.diffTerms (free . Free) ((==) `on` extract) diffCost (cofree (Info range StringLiteral 0 0 :< Leaf "t\776")) (cofree (Info range2 StringLiteral 0 0 :< Leaf "\7831")) `shouldBe`
free (Pure (Replace (cofree (Info range StringLiteral 0 0 :< Leaf "t\776")) (cofree (Info range2 StringLiteral 0 0 :< Leaf "\7831"))))
let termA = cofree $ (StringLiteral .: RNil) :< Leaf ("t\776" :: Text)
termB = cofree $ (StringLiteral .: RNil) :< Leaf "\7831" in
diffTerms (free . Free) ((==) `on` extract) diffCost termA termB `shouldBe` free (Pure (Replace termA termB))
where
range = Range 0 2
range2 = Range 0 1
prop "produces correct diffs" $
\ a b -> let diff = diffTerms (free . Free) ((==) `on` extract) diffCost (toTerm a) (toTerm b) :: Diff Text (Record '[Category]) in
(beforeTerm diff, afterTerm diff) `shouldBe` (Just (toTerm a), Just (toTerm b))
prop "constructs zero-cost diffs of equal terms" $
\ a -> let term = toTerm a
diff = diffTerms (free . Free) ((==) `on` extract) diffCost term term :: Diff Text (Record '[Category]) in
diffCost diff `shouldBe` 0

View File

@ -2,7 +2,7 @@ module PatchOutputSpec where
import Prologue
import Data.Functor.Both
import Info
import Data.Record
import Range
import Renderer.Patch
import Source
@ -14,4 +14,4 @@ spec :: Spec
spec = parallel $
describe "hunks" $
it "empty diffs have empty hunks" $
hunks (free . Free $ pure (Info (Range 0 0) StringLiteral 1 0) :< Leaf "") (both (SourceBlob (fromList "") "abcde" "path2.txt" (Just defaultPlainBlob)) (SourceBlob (fromList "") "xyz" "path2.txt" (Just defaultPlainBlob))) `shouldBe` [Hunk {offset = pure 0, changes = [], trailingContext = []}]
hunks (free . Free $ pure (Range 0 0 .: StringLiteral .: 1 .: 0 .: RNil) :< Leaf "") (both (SourceBlob (fromList "") "abcde" "path2.txt" (Just defaultPlainBlob)) (SourceBlob (fromList "") "xyz" "path2.txt" (Just defaultPlainBlob))) `shouldBe` [Hunk {offset = pure 0, changes = [], trailingContext = []}]

View File

@ -3,19 +3,23 @@ module Main where
import Prologue
import qualified AlignmentSpec
import qualified CorpusSpec
import qualified Data.RandomWalkSimilarity.Spec
import qualified Diff.Spec
import qualified DiffSummarySpec
import qualified InterpreterSpec
import qualified OrderedMapSpec
import qualified PatchOutputSpec
import qualified TermSpec
import qualified DiffSummarySpec
import Test.Hspec
main :: IO ()
main = hspec $ parallel $ do
main = hspec . parallel $ do
describe "Alignment" AlignmentSpec.spec
describe "Corpus" CorpusSpec.spec
describe "Data.RandomWalkSimilarity" Data.RandomWalkSimilarity.Spec.spec
describe "Diff.Spec" Diff.Spec.spec
describe "DiffSummary" DiffSummarySpec.spec
describe "Interpreter" InterpreterSpec.spec
describe "OrderedMap" OrderedMapSpec.spec
describe "PatchOutput" PatchOutputSpec.spec
describe "Term" TermSpec.spec
describe "DiffSummary" DiffSummarySpec.spec

View File

@ -1,11 +1,6 @@
module TermSpec where
import ArbitraryTerm
import Data.String
import Data.Text.Arbitrary ()
import Diff
import Diff.Arbitrary
import Interpreter
import Prologue
import Term.Arbitrary
import Test.Hspec
@ -16,21 +11,8 @@ spec :: Spec
spec = parallel $ do
describe "Term" $ do
prop "equality is reflexive" $
\ a -> toTerm a == toTerm (a :: ArbitraryTerm String ())
\ a -> toTerm a `shouldBe` toTerm (a :: ArbitraryTerm Text ())
describe "ArbitraryTerm" $
prop "generates terms of a specific size" $ forAll ((arbitrary >>= \ n -> (,) n <$> termOfSize n) `suchThat` ((> 0) . fst)) $
\ (n, term) -> arbitraryTermSize (term :: ArbitraryTerm String ()) `shouldBe` n
describe "ArbitraryDiff" $
prop "generates diffs of a specific size" $ forAll ((arbitrary >>= \ n -> (,) n <$> diffOfSize n) `suchThat` ((> 0) . fst)) $
\ (n, diff) -> arbitraryDiffSize (diff :: ArbitraryDiff String ()) `shouldBe` n
describe "Diff" $ do
prop "equality is reflexive" $
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm String CategorySet)) in
diff == diff
prop "equal terms produce identity diffs" $
\ a -> let term = toTerm (a :: ArbitraryTerm String CategorySet) in
diffCost (diffTerms (free . Free) (==) diffCost term term) == 0
describe "ArbitraryTerm" $ do
prop "generates terms of a specific size" . forAll ((arbitrary >>= \ n -> (,) n <$> termOfSize n) `suchThat` ((> 0) . fst)) $
\ (n, term) -> arbitraryTermSize (term :: ArbitraryTerm Text ()) `shouldBe` n

2
vendor/gitlib vendored

@ -1 +1 @@
Subproject commit 0a9bb2671d8adfc6117173d46ca10fa8b8e94559
Subproject commit 4828dbf14adfc4ce0ac3536f8b192e65828e97bc

74
weekly/2016-06-21.md Normal file
View File

@ -0,0 +1,74 @@
# Semantic Diff Problems (Mini-Summit)
### Performance (most significant problem)
- SES / Alignment are biggest time / space consumers.
- Profiling small subsets of code paths rather than the full context.
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).
##### Alignment performance
- Has to visit each child of each remaining line.
##### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
- n^3 the size of the tree.
- Can try bounded SES (looks ahead by a fixed size of nodes).
- Identify more comparisons we can skip (i.e. don't compare functions with array literals).
- Does not look like there are more easy wins here (algorithm is already implemented to prevent unnecessary comparisions).
- In some cases, the diffing is expensive because we don't have more fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions).
- Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not).
- This could result in us missing a function rename though.
- Not a catchall, but it can help increase performance in a larger number of cases.
##### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
- Random Walk Similarity.
- computes approximation to the minimal edit script.
- O(log N) rather than O(n^3).
- RWS does not rely on identifiers.
- RWS solves our performance problem in the general form.
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES).
##### Diff summaries performance
- Performance of DS is dependent on diffing (Diff Terms, Interpreter, cost functions)
### Failing too hard
- Request is not completing if Semantic Diff fails.
- How can we fail better on dotcom?
- How can we fail better when parsing? (both in Semantic Diff and dotcom)
### Responsiveness
- Async fetch diff summaries / diffs / progressive diffs or diff summaries
### Improving grammars
- Fix Ruby parser.
- Testing and verifying other grammars.
### Measure effectiveness of grammars
### Tooling
- Why isn't parallelization of SES having the expected effect?
- Should focus on low hanging fruit but we're not going to write a debugger.
### Time limitations with respect to solutions and team
### Ramp up time is extremely variable.
### Onboarding
- Pairing has been fantastic.
- SES algorithm requires some context and background to understand the code at the general / macro level.
- Plan a bit before pairing to gain context.
### Pre-launch Ideas
- Test on a couple file server nodes and run semantic diff on javascript repos.
- Collect repos, files, shas that contain error nodes to gain a % of error rates and expose errors in tree sitter grammars.
- If sources have errors, can we use a parser that validates the source is correct?
- Configure a script that is as language independent as possible that can automate the error collection process but allows us to specify an independent validating parser for each language.

36
weekly/2016-06-27.md Normal file
View File

@ -0,0 +1,36 @@
# June 27th, 2016 weekly
## What went well?
@joshvera: Pairing, minisummitting, RWS discussions.
@rewinfrey: Pairing, context on recursion schemes, started independent work on the project, minisummitting. Defined what to work on next
@robrix: Minisummit: got to know both of you better & really enjoyed that. Before that I was on vacation but you both did a great job!
## What went less well?
@joshvera: Lots more problems turned up. Lots of stuff that has taken on more importance as weve thought about it more. Feel like I couldve made more progress on diff summaries by now. Some of that has been minisummit, some of that has been every time we do more work on it there seems to be new layers peeling off exposing other issues & more work needing to be done.
@rewinfrey: Maybe Im overly optimistic but I dont have anything to point to that I didnt think went well. The challenges we identified during minisummit felt like a good sign of the project moving forward.
@robrix: Ditto. May end up being a bit distracted over the next couple of weeks figuring out some stuff re: summit & my attendance of it.
## What did you learn?
@joshvera: Type ∋ Type isnt as easy as it sounds. Learned about you both too!
@rewinfrey: Learned about you both. Recursion schemes! Relationships between algebras & projections, coalgebras & embeddings, and recursion-schemes `Base` type family. Further explored some other morphisms. RWS (albeit misreading some of it).
@robrix: Learned about you both! RWS. Some stuff about derivative-parsing. Learned a lot about communication too.
## Anything else?
@joshvera: Out Thursday/Friday.
@rewinfrey: Josh, how did the blue suit fit? (“Really well.”)
@robrix: Canada Day on Friday. Youre both invited to celebrate it as well, by being as Canadian as possible.