mirror of
https://github.com/github/semantic.git
synced 2025-01-03 04:51:57 +03:00
Merge branch 'master' into ses-parallelism
# Conflicts: # semantic-diff.cabal # src/Patch.hs
This commit is contained in:
commit
a0a86a0d01
2
HLint.hs
2
HLint.hs
@ -12,4 +12,6 @@ error "Avoid return" =
|
||||
where note = "return is obsolete as of GHC 7.10"
|
||||
|
||||
error "use pure" = free . Pure ==> pure
|
||||
error "use wrap" = free . Free ==> wrap
|
||||
|
||||
error "use extract" = headF . runCofree ==> extract
|
||||
|
48
ROADMAP.md
48
ROADMAP.md
@ -1,17 +1,47 @@
|
||||
# Semantic diff roadmap
|
||||
# Roadmap
|
||||
|
||||
## Q1 2016
|
||||
## Things we are currently doing:
|
||||
|
||||
1. [Staff ship & limited beta of semantic diffing on .com](https://github.com/github/semantic-diff/milestones/Staff%20Ship). This will be an opt-in, limited release of semantic diffs for a very small set of languages. UI in general will be unchanged; we’ll simply start showing better diffs for the languages in question. The goal is to ease ourselves into deployment of the system, and benchmark under real loads.
|
||||
1. [Diff summaries][] for C & JavaScript. Q3 2016 or so.
|
||||
|
||||
2. [Semantic diffing on .com](https://github.com/github/semantic-diff/milestones/Dot%20Calm). General release of semantic diffs for the supported languages.
|
||||
- Modelling the abstract semantics of the supported languages. Good summaries require us to know what different parts of the syntax represent.
|
||||
- Performance/responsiveness. We need to be able to produce diffs more quickly, and without unicorns. Some of this will involve front-end work (e.g. requesting summaries out-of-band).
|
||||
|
||||
2. [Semantic diffs][] on .com for C & JavaScript. Q4 2016 or so.
|
||||
|
||||
## Q2–Q4 2016
|
||||
- Performance, as above.
|
||||
- Resilience. A fault in `semantic-git-diff` should not break anything else.
|
||||
- Metrics. We need to know how it’s behaving in the wild to know what to do about it. This also includes operational metrics such as health checks.
|
||||
|
||||
We will discuss future milestones at the **@github/network-intelligence** minisummit mid-Q1 2016, and document them here at that point.
|
||||
|
||||
## Follow-up things:
|
||||
|
||||
## Ongoing
|
||||
1. Add support for more languages: [Ruby][], etc.
|
||||
2. [Detecting & rendering moves][moves].
|
||||
3. [Merging][].
|
||||
4. Refining the diff summaries we produce.
|
||||
|
||||
- Creation, curation, and cultivation of grammars for semantic diffs.
|
||||
## Things we would like to do:
|
||||
|
||||
1. [Interactively refining diffs][interactive].
|
||||
2. [Filtering][] diffs.
|
||||
3. Diff [table of contents][].
|
||||
4. [Jump to symbol definition][].
|
||||
5. Eliminate conflicts from renaming [variables][].
|
||||
|
||||
## Things we would like to do modulo interest/support from other teams:
|
||||
|
||||
1. APIs/tooling for data science & engineering teams.
|
||||
2. Collect data on our heuristics &c. and refine them via e.g. ML.
|
||||
3. Diffs as a [service][].
|
||||
|
||||
[Diff summaries]: https://github.com/github/semantic-diff/milestones/Summer%20Eyes
|
||||
[Semantic diffs]: https://github.com/github/semantic-diff/milestones/Dot%20Calm
|
||||
[Ruby]: https://github.com/github/semantic-diff/issues/282
|
||||
[moves]: https://github.com/github/semantic-diff/issues/389
|
||||
[Merging]: https://github.com/github/semantic-diff/issues/431
|
||||
[interactive]: https://github.com/github/semantic-diff/issues/130
|
||||
[Filtering]: https://github.com/github/semantic-diff/issues/428
|
||||
[table of contents]: https://github.com/github/semantic-diff/issues/16
|
||||
[Jump to symbol definition]: https://github.com/github/semantic-diff/issues/6
|
||||
[variables]: https://github.com/github/semantic-diff/issues/91
|
||||
[service]: https://github.com/github/platform/blob/master/services/README.md
|
||||
|
@ -18,7 +18,9 @@ library
|
||||
, Category
|
||||
, Data.Bifunctor.Join.Arbitrary
|
||||
, Data.Functor.Both
|
||||
, Data.RandomWalkSimilarity
|
||||
, Data.OrderedMap
|
||||
, Data.Record
|
||||
, Data.These.Arbitrary
|
||||
, Diff
|
||||
, Diff.Arbitrary
|
||||
@ -54,9 +56,13 @@ library
|
||||
, bytestring
|
||||
, containers
|
||||
, directory
|
||||
, dlist
|
||||
, filepath
|
||||
, hashable
|
||||
, kdt
|
||||
, mtl
|
||||
, parallel
|
||||
, MonadRandom
|
||||
, pointed
|
||||
, QuickCheck >= 2.8.1
|
||||
, quickcheck-text
|
||||
@ -71,7 +77,7 @@ library
|
||||
, comonad
|
||||
, protolude
|
||||
default-language: Haskell2010
|
||||
default-extensions: DeriveFunctor, DeriveFoldable, DeriveTraversable, DeriveGeneric, FlexibleInstances, OverloadedStrings, NoImplicitPrelude, RecordWildCards
|
||||
default-extensions: DeriveFunctor, DeriveFoldable, DeriveTraversable, DeriveGeneric, FlexibleContexts, FlexibleInstances, OverloadedStrings, NoImplicitPrelude, RecordWildCards
|
||||
ghc-options: -Wall -fno-warn-name-shadowing -O2 -threaded -fprof-auto "-with-rtsopts=-N -p -s -h -i0.1" -j
|
||||
|
||||
benchmark semantic-diff-bench
|
||||
@ -95,18 +101,20 @@ test-suite semantic-diff-test
|
||||
hs-source-dirs: test
|
||||
main-is: Spec.hs
|
||||
other-modules: AlignmentSpec
|
||||
, ArbitraryTerm
|
||||
, CorpusSpec
|
||||
, Data.RandomWalkSimilarity.Spec
|
||||
, Diff.Spec
|
||||
, DiffSummarySpec
|
||||
, InterpreterSpec
|
||||
, OrderedMapSpec
|
||||
, PatchOutputSpec
|
||||
, TermSpec
|
||||
, DiffSummarySpec
|
||||
build-depends: base
|
||||
, bifunctors
|
||||
, bytestring
|
||||
, containers
|
||||
, deepseq
|
||||
, dlist
|
||||
, filepath
|
||||
, Glob
|
||||
, hspec >= 2.1.10
|
||||
|
@ -18,6 +18,7 @@ import Data.Functor.Foldable (hylo)
|
||||
import Data.List (partition)
|
||||
import Data.Maybe (fromJust)
|
||||
import qualified Data.OrderedMap as Map
|
||||
import Data.Record
|
||||
import Data.These
|
||||
import Diff
|
||||
import Info
|
||||
@ -39,15 +40,15 @@ numberedRows = countUp (both 1 1)
|
||||
nextLineNumbers from row = modifyJoin (fromThese identity identity) (succ <$ row) <*> from
|
||||
|
||||
-- | Determine whether a line contains any patches.
|
||||
hasChanges :: SplitDiff leaf Info -> Bool
|
||||
hasChanges :: SplitDiff leaf annotation -> Bool
|
||||
hasChanges = or . (True <$)
|
||||
|
||||
-- | Align a Diff into a list of Join These SplitDiffs representing the (possibly blank) lines on either side.
|
||||
alignDiff :: Show leaf => Both (Source Char) -> Diff leaf Info -> [Join These (SplitDiff leaf Info)]
|
||||
alignDiff :: (Show leaf, Show (Record fields), HasField fields Range) => Both (Source Char) -> Diff leaf (Record fields) -> [Join These (SplitDiff leaf (Record fields))]
|
||||
alignDiff sources diff = iter (alignSyntax (runBothWith ((Join .) . These)) (free . Free) getRange sources) (alignPatch sources <$> diff)
|
||||
|
||||
-- | Align the contents of a patch into a list of lines on the corresponding side(s) of the diff.
|
||||
alignPatch :: forall leaf. Show leaf => Both (Source Char) -> Patch (Term leaf Info) -> [Join These (SplitDiff leaf Info)]
|
||||
alignPatch :: forall fields leaf. (Show leaf, Show (Record fields), HasField fields Range) => Both (Source Char) -> Patch (Term leaf (Record fields)) -> [Join These (SplitDiff leaf (Record fields))]
|
||||
alignPatch sources patch = case patch of
|
||||
Delete term -> fmap (pure . SplitDelete) <$> alignSyntax' this (fst sources) term
|
||||
Insert term -> fmap (pure . SplitInsert) <$> alignSyntax' that (snd sources) term
|
||||
@ -55,13 +56,13 @@ alignPatch sources patch = case patch of
|
||||
(alignSyntax' this (fst sources) term1)
|
||||
(alignSyntax' that (snd sources) term2)
|
||||
where getRange = characterRange . extract
|
||||
alignSyntax' :: (forall a. Identity a -> Join These a) -> Source Char -> Term leaf Info -> [Join These (Term leaf Info)]
|
||||
alignSyntax' :: (forall a. Identity a -> Join These a) -> Source Char -> Term leaf (Record fields) -> [Join These (Term leaf (Record fields))]
|
||||
alignSyntax' side source term = hylo (alignSyntax side cofree getRange (Identity source)) runCofree (Identity <$> term)
|
||||
this = Join . This . runIdentity
|
||||
that = Join . That . runIdentity
|
||||
|
||||
-- | The Applicative instance f is either Identity or Both. Identity is for Terms in Patches, Both is for Diffs in unchanged portions of the diff.
|
||||
alignSyntax :: (Applicative f, Show term) => (forall a. f a -> Join These a) -> (CofreeF (Syntax leaf) Info term -> term) -> (term -> Range) -> f (Source Char) -> CofreeF (Syntax leaf) (f Info) [Join These term] -> [Join These term]
|
||||
alignSyntax :: (Applicative f, Show term, HasField fields Range) => (forall a. f a -> Join These a) -> (CofreeF (Syntax leaf) (Record fields) term -> term) -> (term -> Range) -> f (Source Char) -> CofreeF (Syntax leaf) (f (Record fields)) [Join These term] -> [Join These term]
|
||||
alignSyntax toJoinThese toNode getRange sources (infos :< syntax) = case syntax of
|
||||
Leaf s -> catMaybes $ wrapInBranch (const (Leaf s)) . fmap (flip (,) []) <$> sequenceL lineRanges
|
||||
Indexed children -> catMaybes $ wrapInBranch Indexed <$> alignBranch getRange (join children) bothRanges
|
||||
@ -69,7 +70,7 @@ alignSyntax toJoinThese toNode getRange sources (infos :< syntax) = case syntax
|
||||
Keyed children -> catMaybes $ wrapInBranch (Keyed . Map.fromList) <$> alignBranch (getRange . Prologue.snd) (Map.toList children >>= pairWithKey) bothRanges
|
||||
where bothRanges = modifyJoin (fromThese [] []) lineRanges
|
||||
lineRanges = toJoinThese $ actualLineRanges <$> (characterRange <$> infos) <*> sources
|
||||
wrapInBranch constructor = applyThese $ toJoinThese ((\ info (range, children) -> toNode (info { characterRange = range } :< constructor children)) <$> infos)
|
||||
wrapInBranch constructor = applyThese $ toJoinThese ((\ info (range, children) -> toNode (setCharacterRange info range :< constructor children)) <$> infos)
|
||||
pairWithKey (key, values) = fmap ((,) key) <$> values
|
||||
|
||||
-- | Given a function to get the range, a list of already-aligned children, and the lists of ranges spanned by a branch, return the aligned lines.
|
||||
@ -86,7 +87,7 @@ alignBranch getRange children ranges = case intersectingChildren of
|
||||
_ -> case intersectionsWithHeadRanges <$> listToMaybe symmetricalChildren of
|
||||
-- At least one child intersects on both sides, so align symmetrically.
|
||||
Just (True, True) -> let (line, remaining) = lineAndRemaining intersectingChildren (Just headRanges) in
|
||||
line $ alignBranch getRange (remaining ++ nonIntersectingChildren) (drop 1 <$> ranges)
|
||||
line $ alignBranch getRange (remaining <> nonIntersectingChildren) (drop 1 <$> ranges)
|
||||
-- A symmetrical child intersects on the right, so align asymmetrically on the left.
|
||||
Just (False, True) -> alignAsymmetrically leftRange first
|
||||
-- A symmetrical child intersects on the left, so align asymmetrically on the right.
|
||||
@ -101,7 +102,7 @@ alignBranch getRange children ranges = case intersectingChildren of
|
||||
Just headRanges = sequenceL (listToMaybe <$> Join (runBothWith These ranges))
|
||||
(leftRange, rightRange) = splitThese headRanges
|
||||
alignAsymmetrically range advanceBy = let (line, remaining) = lineAndRemaining asymmetricalChildren range in
|
||||
line $ alignBranch getRange (remaining ++ symmetricalChildren ++ nonIntersectingChildren) (modifyJoin (advanceBy (drop 1)) ranges)
|
||||
line $ alignBranch getRange (remaining <> symmetricalChildren <> nonIntersectingChildren) (modifyJoin (advanceBy (drop 1)) ranges)
|
||||
lineAndRemaining _ Nothing = (identity, [])
|
||||
lineAndRemaining children (Just ranges) = let (intersections, remaining) = alignChildren getRange children ranges in
|
||||
((:) $ (,) <$> ranges `applyToBoth` (sortBy (compare `on` getRange) <$> intersections), remaining)
|
||||
|
@ -1,7 +1,9 @@
|
||||
module Category where
|
||||
|
||||
import Prologue
|
||||
import Data.Hashable
|
||||
import Data.String
|
||||
import Test.QuickCheck
|
||||
|
||||
-- | A standardized category of AST node. Used to determine the semantics for
|
||||
-- | semantic diffing and define comparability of nodes.
|
||||
@ -28,4 +30,27 @@ data Category
|
||||
| ArrayLiteral
|
||||
-- | A non-standard category, which can be used for comparability.
|
||||
| Other String
|
||||
deriving (Eq, Show, Ord)
|
||||
deriving (Eq, Generic, Ord, Show)
|
||||
|
||||
|
||||
-- Instances
|
||||
|
||||
instance Hashable Category
|
||||
|
||||
instance Arbitrary Category where
|
||||
arbitrary = oneof
|
||||
[ pure Program
|
||||
, pure Error
|
||||
, pure BinaryOperator
|
||||
, pure DictionaryLiteral
|
||||
, pure Pair
|
||||
, pure FunctionCall
|
||||
, pure StringLiteral
|
||||
, pure IntegerLiteral
|
||||
, pure SymbolLiteral
|
||||
, pure ArrayLiteral
|
||||
, Other <$> arbitrary
|
||||
]
|
||||
|
||||
shrink (Other s) = Other <$> shrink s
|
||||
shrink _ = []
|
||||
|
@ -1,3 +1,4 @@
|
||||
{-# OPTIONS_GHC -fno-warn-orphans #-}
|
||||
module Data.Functor.Both (Both,both, runBothWith, fst, snd, module X) where
|
||||
|
||||
import Data.Bifunctor.Join as X
|
||||
|
@ -15,10 +15,11 @@ module Data.OrderedMap (
|
||||
) where
|
||||
|
||||
import Prologue hiding (toList, empty)
|
||||
import Test.QuickCheck
|
||||
|
||||
-- | An ordered map of keys and values.
|
||||
newtype OrderedMap key value = OrderedMap { toList :: [(key, value)] }
|
||||
deriving (Show, Eq, Functor, Foldable, Traversable)
|
||||
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
|
||||
|
||||
-- | Construct an ordered map from a list of pairs of keys and values.
|
||||
fromList :: [(key, value)] -> OrderedMap key value
|
||||
@ -70,3 +71,7 @@ difference (OrderedMap a) (OrderedMap b) = OrderedMap $ filter ((`notElem` extan
|
||||
instance Eq key => Monoid (OrderedMap key value) where
|
||||
mempty = fromList []
|
||||
mappend = union
|
||||
|
||||
instance (Arbitrary key, Arbitrary value) => Arbitrary (OrderedMap key value) where
|
||||
arbitrary = fromList <$> arbitrary
|
||||
shrink = genericShrink
|
||||
|
110
src/Data/RandomWalkSimilarity.hs
Normal file
110
src/Data/RandomWalkSimilarity.hs
Normal file
@ -0,0 +1,110 @@
|
||||
module Data.RandomWalkSimilarity where
|
||||
|
||||
import Control.Arrow ((&&&))
|
||||
import Control.Monad.Random
|
||||
import Control.Monad.State
|
||||
import qualified Data.DList as DList
|
||||
import Data.Functor.Foldable as Foldable
|
||||
import Data.Hashable
|
||||
import qualified Data.KdTree.Static as KdTree
|
||||
import qualified Data.List as List
|
||||
import qualified Data.OrderedMap as Map
|
||||
import qualified Data.Vector as Vector
|
||||
import Diff
|
||||
import Patch
|
||||
import Prologue
|
||||
import Syntax
|
||||
import Term
|
||||
import Test.QuickCheck hiding (Fixed)
|
||||
import Test.QuickCheck.Random
|
||||
|
||||
-- | Given a function comparing two terms recursively, and a function to compute a Hashable label from an annotation, compute the diff of a pair of lists of terms using a random walk similarity metric, which completes in log-linear time. This implementation is based on the paper [_RWS-Diff—Flexible and Efficient Change Detection in Hierarchical Data_](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf).
|
||||
rws :: (Hashable label, Hashable leaf, Eq leaf, Eq annotation) => (Term leaf annotation -> Term leaf annotation -> Maybe (Diff leaf annotation)) -> (annotation -> label) -> [Term leaf annotation] -> [Term leaf annotation] -> [Diff leaf annotation]
|
||||
rws compare getLabel as bs
|
||||
| null as, null bs = []
|
||||
| null as = insert <$> bs
|
||||
| null bs = delete <$> as
|
||||
| otherwise = fmap snd . uncurry deleteRemaining . (`runState` (negate 1, fas)) $ traverse findNearestNeighbourTo fbs
|
||||
where insert = pure . Insert
|
||||
delete = pure . Delete
|
||||
(p, q, d) = (2, 2, 15)
|
||||
fas = zipWith featurize [0..] as
|
||||
fbs = zipWith featurize [0..] bs
|
||||
kdas = KdTree.build (Vector.toList . feature) fas
|
||||
featurize index term = UnmappedTerm index (featureVector d (pqGrams p q getLabel term)) term
|
||||
findNearestNeighbourTo kv@(UnmappedTerm _ _ v) = do
|
||||
(previous, unmapped) <- get
|
||||
let (UnmappedTerm i _ _) = KdTree.nearest kdas kv
|
||||
fromMaybe (pure (negate 1, insert v)) $ do
|
||||
found <- find ((== i) . termIndex) unmapped
|
||||
guard (i >= previous)
|
||||
compared <- compare (term found) v
|
||||
pure $! do
|
||||
put (i, List.delete found unmapped)
|
||||
pure (i, compared)
|
||||
deleteRemaining diffs (_, unmapped) = foldl' (flip (List.insertBy (comparing fst))) diffs ((termIndex &&& delete . term) <$> unmapped)
|
||||
|
||||
-- | A term which has not yet been mapped by `rws`, along with its feature vector summary & index.
|
||||
data UnmappedTerm leaf annotation = UnmappedTerm { termIndex :: {-# UNPACK #-} !Int, feature :: !(Vector.Vector Double), term :: !(Term leaf annotation) }
|
||||
deriving Eq
|
||||
|
||||
-- | A `Gram` is a fixed-size view of some portion of a tree, consisting of a `stem` of _p_ labels for parent nodes, and a `base` of _q_ labels of sibling nodes. Collectively, the bag of `Gram`s for each node of a tree (e.g. as computed by `pqGrams`) form a summary of the tree.
|
||||
data Gram label = Gram { stem :: [Maybe label], base :: [Maybe label] }
|
||||
deriving (Eq, Show)
|
||||
|
||||
-- | Compute the bag of grams with stems of length _p_ and bases of length _q_, with labels computed from annotations, which summarize the entire subtree of a term.
|
||||
pqGrams :: Int -> Int -> (annotation -> label) -> Cofree (Syntax leaf) annotation -> DList.DList (Gram (label, Maybe leaf))
|
||||
pqGrams p q getLabel = cata merge . setRootBase . setRootStem . hylo go project
|
||||
where go (annotation :< functor) = cofree (Gram [] [ Just (getLabel annotation, leafValue functor) ] :< (assignParent (Just (getLabel annotation, leafValue functor)) p <$> functor))
|
||||
leafValue (Leaf s) = Just s
|
||||
leafValue _ = Nothing
|
||||
merge (head :< tail) = DList.cons head (Prologue.fold tail)
|
||||
assignParent parentLabel n tree
|
||||
| n > 0 = let gram :< functor = runCofree tree in cofree $ prependParent parentLabel gram :< assignSiblings (assignParent parentLabel (pred n) <$> functor)
|
||||
| otherwise = tree
|
||||
prependParent parentLabel gram = gram { stem = parentLabel : stem gram }
|
||||
assignSiblings functor = case functor of
|
||||
Leaf a -> Leaf a
|
||||
Indexed a -> Indexed $ windowed q setBases [] a
|
||||
Fixed a -> Fixed $ windowed q setBases [] a
|
||||
Keyed a -> Keyed . Map.fromList $ windowed q setBasesKV [] (Map.toList a)
|
||||
setBases child siblings rest = let (gram :< further) = (runCofree child) in cofree (setBase gram (siblings >>= base . extract) :< further) : rest
|
||||
setBasesKV (key, child) siblings rest = let (gram :< further) = (runCofree child) in (key, cofree (setBase gram (siblings >>= base . extract . snd) :< further)) : rest
|
||||
setBase gram newBase = gram { base = take q (newBase <> repeat Nothing) }
|
||||
setRootBase term = let (a :< f) = runCofree term in cofree (setBase a (base a) :< f)
|
||||
setRootStem = foldr (\ p rest -> assignParent Nothing p . rest) identity [0..p]
|
||||
|
||||
-- | A sliding-window fold over _n_ items of a list per iteration.
|
||||
windowed :: Int -> (a -> [a] -> b -> b) -> b -> [a] -> b
|
||||
windowed n f seed = para alg
|
||||
where alg xs = case xs of
|
||||
Cons a (as, b) -> f a (take n $ a : as) b
|
||||
Nil -> seed
|
||||
|
||||
|
||||
-- | Compute a vector with the specified number of dimensions, as an approximation of a bag of `Gram`s summarizing a tree.
|
||||
featureVector :: Hashable label => Int -> DList.DList (Gram label) -> Vector.Vector Double
|
||||
featureVector d bag = sumVectors $ unitDVector . hash <$> bag
|
||||
where unitDVector hash = normalize . (`evalRand` mkQCGen hash) $ Prologue.sequence (Vector.replicate d getRandom)
|
||||
normalize vec = fmap (/ vmagnitude vec) vec
|
||||
sumVectors = DList.foldr (Vector.zipWith (+)) (Vector.replicate d 0)
|
||||
|
||||
-- | The magnitude of a Euclidean vector, i.e. its distance from the origin.
|
||||
vmagnitude :: Vector.Vector Double -> Double
|
||||
vmagnitude = sqrtDouble . Vector.sum . fmap (** 2)
|
||||
|
||||
|
||||
-- Instances
|
||||
|
||||
instance Hashable label => Hashable (Gram label) where
|
||||
hashWithSalt _ = hash
|
||||
hash gram = hash (stem gram <> base gram)
|
||||
|
||||
-- | Construct a generator for arbitrary `Gram`s of size `(p, q)`.
|
||||
gramWithPQ :: Arbitrary label => Int -> Int -> Gen (Gram label)
|
||||
gramWithPQ p q = Gram <$> vectorOf p arbitrary <*> vectorOf q arbitrary
|
||||
|
||||
instance Arbitrary label => Arbitrary (Gram label) where
|
||||
arbitrary = join $ gramWithPQ <$> arbitrary <*> arbitrary
|
||||
|
||||
shrink (Gram a b) = Gram <$> shrink a <*> shrink b
|
67
src/Data/Record.hs
Normal file
67
src/Data/Record.hs
Normal file
@ -0,0 +1,67 @@
|
||||
{-# LANGUAGE DataKinds, GADTs, KindSignatures, MultiParamTypeClasses, TypeOperators #-}
|
||||
module Data.Record where
|
||||
|
||||
import Prologue
|
||||
import Test.QuickCheck
|
||||
|
||||
-- | A type-safe, extensible record structure.
|
||||
-- |
|
||||
-- | This is heavily inspired by Aaron Levin’s [Extensible Effects in the van Laarhoven Free Monad](http://aaronlevin.ca/post/136494428283/extensible-effects-in-the-van-laarhoven-free-monad).
|
||||
data Record :: [*] -> * where
|
||||
RNil :: Record '[]
|
||||
RCons :: h -> Record t -> Record (h ': t)
|
||||
|
||||
infixr 0 .:
|
||||
|
||||
-- | Infix synonym for `RCons`: `a .: b .: RNil == RCons a (RCons b RNil)`.
|
||||
(.:) :: h -> Record t -> Record (h ': t)
|
||||
(.:) = RCons
|
||||
|
||||
|
||||
-- Classes
|
||||
|
||||
-- | HasField enables indexing a Record by (phantom) type tags.
|
||||
class HasField (fields :: [*]) (field :: *) where
|
||||
getField :: Record fields -> field
|
||||
setField :: Record fields -> field -> Record fields
|
||||
|
||||
|
||||
-- Instances
|
||||
|
||||
-- OVERLAPPABLE is required for the HasField instances so that we can handle the two cases: either the head of the non-empty h-list is the requested field, or it isn’t. The third possible case (the h-list is empty) is rejected at compile-time.
|
||||
|
||||
instance {-# OVERLAPPABLE #-} HasField fields field => HasField (notIt ': fields) field where
|
||||
getField (RCons _ t) = getField t
|
||||
setField (RCons h t) f = RCons h (setField t f)
|
||||
|
||||
instance {-# OVERLAPPABLE #-} HasField (field ': fields) field where
|
||||
getField (RCons h _) = h
|
||||
setField (RCons _ t) f = RCons f t
|
||||
|
||||
|
||||
instance (Show h, Show (Record t)) => Show (Record (h ': t)) where
|
||||
showsPrec n (RCons h t) = showsPrec n h . (" : " <>) . showsPrec n t
|
||||
|
||||
instance Show (Record '[]) where
|
||||
showsPrec _ RNil = ("'[]" <>)
|
||||
|
||||
|
||||
instance (Eq h, Eq (Record t)) => Eq (Record (h ': t)) where
|
||||
RCons h1 t1 == RCons h2 t2 = h1 == h2 && t1 == t2
|
||||
|
||||
instance Eq (Record '[]) where
|
||||
_ == _ = True
|
||||
|
||||
|
||||
instance (Ord h, Ord (Record t)) => Ord (Record (h ': t)) where
|
||||
RCons h1 t1 `compare` RCons h2 t2 = let h = h1 `compare` h2 in
|
||||
if h == EQ then t1 `compare` t2 else h
|
||||
|
||||
instance Ord (Record '[]) where
|
||||
_ `compare` _ = EQ
|
||||
|
||||
|
||||
instance Arbitrary fields => Arbitrary (Record '[fields]) where
|
||||
arbitrary = RCons <$> arbitrary <*> pure RNil
|
||||
|
||||
shrink (RCons h t) = RCons <$> shrink h <*> pure t
|
27
src/Diff.hs
27
src/Diff.hs
@ -1,9 +1,11 @@
|
||||
{-# LANGUAGE TypeFamilies, TypeSynonymInstances #-}
|
||||
{-# OPTIONS_GHC -fno-warn-orphans #-}
|
||||
module Diff where
|
||||
|
||||
import Prologue
|
||||
import Data.Functor.Foldable as Foldable
|
||||
import Data.Functor.Both
|
||||
import Data.Functor.Both as Both
|
||||
import qualified Data.OrderedMap as Map
|
||||
import Patch
|
||||
import Syntax
|
||||
import Term
|
||||
@ -13,8 +15,8 @@ type DiffF leaf annotation = FreeF (CofreeF (Syntax leaf) (Both annotation)) (Pa
|
||||
type Diff a annotation = Free (CofreeF (Syntax a) (Both annotation)) (Patch (Term a annotation))
|
||||
|
||||
type instance Base (Free f a) = FreeF f a
|
||||
instance (Functor f) => Foldable.Foldable (Free f a) where project = runFree
|
||||
instance (Functor f) => Foldable.Unfoldable (Free f a) where embed = free
|
||||
instance Functor f => Foldable.Foldable (Free f a) where project = runFree
|
||||
instance Functor f => Foldable.Unfoldable (Free f a) where embed = free
|
||||
|
||||
diffSum :: (Patch (Term a annotation) -> Integer) -> Diff a annotation -> Integer
|
||||
diffSum patchCost diff = sum $ fmap patchCost diff
|
||||
@ -22,3 +24,22 @@ diffSum patchCost diff = sum $ fmap patchCost diff
|
||||
-- | The sum of the node count of the diff’s patches.
|
||||
diffCost :: Diff a annotation -> Integer
|
||||
diffCost = diffSum $ patchSum termSize
|
||||
|
||||
-- | Merge a diff using a function to provide the Term (in Maybe, to simplify recovery of the before/after state) for every Patch.
|
||||
mergeMaybe :: (Patch (Term leaf annotation) -> Maybe (Term leaf annotation)) -> Diff leaf annotation -> Maybe (Term leaf annotation)
|
||||
mergeMaybe transform = cata algebra . fmap transform
|
||||
where algebra :: FreeF (CofreeF (Syntax leaf) (Both annotation)) (Maybe (Term leaf annotation)) (Maybe (Term leaf annotation)) -> Maybe (Term leaf annotation)
|
||||
algebra (Pure p) = p
|
||||
algebra (Free (annotations :< syntax)) = Just . cofree $ Both.fst annotations :< case syntax of
|
||||
Leaf s -> Leaf s
|
||||
Indexed i -> Indexed (catMaybes i)
|
||||
Fixed i -> Fixed (catMaybes i)
|
||||
Keyed i -> Keyed (Map.fromList (Map.toList i >>= (\ (k, v) -> maybe [] (pure . (,) k) v)))
|
||||
|
||||
-- | Recover the before state of a diff.
|
||||
beforeTerm :: Diff leaf annotation -> Maybe (Term leaf annotation)
|
||||
beforeTerm = mergeMaybe before
|
||||
|
||||
-- | Recover the after state of a diff.
|
||||
afterTerm :: Diff leaf annotation -> Maybe (Term leaf annotation)
|
||||
afterTerm = mergeMaybe after
|
||||
|
@ -4,7 +4,6 @@ import Diff
|
||||
import Data.Bifunctor.Join
|
||||
import Data.Bifunctor.Join.Arbitrary ()
|
||||
import Data.Functor.Foldable (unfold)
|
||||
import qualified Data.List as List
|
||||
import qualified Data.OrderedMap as Map
|
||||
import Patch
|
||||
import Patch.Arbitrary ()
|
||||
@ -13,32 +12,25 @@ import Prologue
|
||||
import Term.Arbitrary
|
||||
import Test.QuickCheck hiding (Fixed)
|
||||
|
||||
newtype ArbitraryDiff leaf annotation = ArbitraryDiff { unArbitraryDiff :: FreeF (CofreeF (Syntax leaf) (Join (,) annotation)) (Patch (ArbitraryTerm leaf annotation)) (ArbitraryDiff leaf annotation) }
|
||||
data ArbitraryDiff leaf annotation
|
||||
= ArbitraryFree (Join (,) annotation) (Syntax leaf (ArbitraryDiff leaf annotation))
|
||||
| ArbitraryPure (Patch (ArbitraryTerm leaf annotation))
|
||||
deriving (Show, Eq, Generic)
|
||||
|
||||
unArbitraryDiff :: ArbitraryDiff leaf annotation -> FreeF (CofreeF (Syntax leaf) (Join (,) annotation)) (Patch (ArbitraryTerm leaf annotation)) (ArbitraryDiff leaf annotation)
|
||||
unArbitraryDiff (ArbitraryFree a s) = Free (a :< s)
|
||||
unArbitraryDiff (ArbitraryPure p) = Pure p
|
||||
|
||||
toDiff :: ArbitraryDiff leaf annotation -> Diff leaf annotation
|
||||
toDiff = fmap (fmap toTerm) . unfold unArbitraryDiff
|
||||
|
||||
diffOfSize :: (Arbitrary leaf, Arbitrary annotation) => Int -> Gen (ArbitraryDiff leaf annotation)
|
||||
diffOfSize n
|
||||
| n <= 0 = (ArbitraryDiff .) . (Free .) . (:<) <$> arbitrary <*> syntaxOfSize n
|
||||
| n <= 0 = ArbitraryFree <$> arbitrary <*> syntaxOfSize diffOfSize n
|
||||
| otherwise = oneof
|
||||
[ (ArbitraryDiff .) . (Free .) . (:<) <$> arbitrary <*> syntaxOfSize n
|
||||
, ArbitraryDiff . Pure <$> patchOfSize n ]
|
||||
where syntaxOfSize n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
|
||||
| otherwise = oneof $ branchGeneratorsOfSize n
|
||||
branchGeneratorsOfSize n =
|
||||
[ Indexed <$> childrenOfSize (pred n)
|
||||
, Fixed <$> childrenOfSize (pred n)
|
||||
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
|
||||
]
|
||||
childrenOfSize n | n <= 0 = pure []
|
||||
childrenOfSize n = do
|
||||
m <- choose (1, n)
|
||||
first <- diffOfSize m
|
||||
rest <- childrenOfSize (n - m)
|
||||
pure $! first : rest
|
||||
patchOfSize 1 = oneof [ Insert <$> termOfSize 1
|
||||
[ ArbitraryFree <$> arbitrary <*> syntaxOfSize diffOfSize n
|
||||
, ArbitraryPure <$> patchOfSize n ]
|
||||
where patchOfSize 1 = oneof [ Insert <$> termOfSize 1
|
||||
, Delete <$> termOfSize 1 ]
|
||||
patchOfSize n = do
|
||||
m <- choose (1, n - 1)
|
||||
@ -59,11 +51,4 @@ instance (Eq leaf, Eq annotation, Arbitrary leaf, Arbitrary annotation) => Arbit
|
||||
m <- choose (0, n)
|
||||
diffOfSize m
|
||||
|
||||
shrink diff@(ArbitraryDiff annotated) = case annotated of
|
||||
Free (annotation :< syntax) -> (subterms diff ++) $ filter (/= diff) $
|
||||
(ArbitraryDiff .) . (Free .) . (:<) <$> shrink annotation <*> case syntax of
|
||||
Leaf a -> Leaf <$> shrink a
|
||||
Indexed i -> Indexed <$> (List.subsequences i >>= recursivelyShrink)
|
||||
Fixed f -> Fixed <$> (List.subsequences f >>= recursivelyShrink)
|
||||
Keyed k -> Keyed . Map.fromList <$> (List.subsequences (Map.toList k) >>= recursivelyShrink)
|
||||
Pure patch -> ArbitraryDiff . Pure <$> shrink patch
|
||||
shrink = genericShrink
|
||||
|
@ -4,6 +4,7 @@ import Prologue hiding (fst, snd)
|
||||
import qualified Data.ByteString.Char8 as B1
|
||||
import Data.Functor.Both
|
||||
import Data.Functor.Foldable
|
||||
import Data.Record
|
||||
import qualified Data.Text as T
|
||||
import qualified Data.Text.ICU.Detect as Detect
|
||||
import qualified Data.Text.ICU.Convert as Convert
|
||||
@ -39,8 +40,8 @@ lineByLineParser input = pure . cofree . root $ case foldl' annotateLeaves ([],
|
||||
where
|
||||
lines = actualLines input
|
||||
root children = let size = 1 + fromIntegral (length children) in
|
||||
Info (Range 0 $ length input) (Other "program") size size :< Indexed children
|
||||
leaf charIndex line = Info (Range charIndex $ charIndex + T.length line) (Other "program") 1 1 :< Leaf line
|
||||
((Range 0 $ length input) .: Other "program" .: size .: Cost (unSize size) .: RNil) :< Indexed children
|
||||
leaf charIndex line = ((Range charIndex $ charIndex + T.length line) .: Other "program" .: 1 .: 1 .: RNil) :< Leaf line
|
||||
annotateLeaves (accum, charIndex) line =
|
||||
(accum ++ [ leaf charIndex (toText line) ]
|
||||
, charIndex + length line)
|
||||
@ -55,13 +56,13 @@ breakDownLeavesByWord :: Source Char -> Term T.Text Info -> Term T.Text Info
|
||||
breakDownLeavesByWord source = cata replaceIn
|
||||
where
|
||||
replaceIn :: TermF T.Text Info (Term T.Text Info) -> Term T.Text Info
|
||||
replaceIn (info :< syntax) = let size' = 1 + sum (size . extract <$> syntax') in cofree $ info { size = size', cost = size' } :< syntax'
|
||||
replaceIn (info :< syntax) = let size' = 1 + sum (size . extract <$> syntax') in cofree $ setCost (setSize info size') (Cost (unSize size')) :< syntax'
|
||||
where syntax' = case (ranges, syntax) of
|
||||
(_:_:_, Leaf _) -> Indexed (makeLeaf info <$> ranges)
|
||||
_ -> syntax
|
||||
ranges = rangesAndWordsInSource (characterRange info)
|
||||
rangesAndWordsInSource range = rangesAndWordsFrom (start range) (toString $ slice range source)
|
||||
makeLeaf info (range, substring) = cofree $ info { characterRange = range } :< Leaf (T.pack substring)
|
||||
makeLeaf info (range, substring) = cofree $ setCharacterRange info range :< Leaf (T.pack substring)
|
||||
|
||||
-- | Transcode a file to a unicode source.
|
||||
transcode :: B1.ByteString -> IO (Source Char)
|
||||
@ -95,7 +96,6 @@ diffFiles parser renderer sourceBlobs = do
|
||||
pure $! renderer textDiff sourceBlobs
|
||||
where construct :: CofreeF (Syntax Text) (Both Info) (Diff Text Info) -> Diff Text Info
|
||||
construct (info :< syntax) = free (Free ((setCost <$> info <*> sumCost syntax) :< syntax))
|
||||
setCost info cost = info { cost = cost }
|
||||
sumCost = fmap getSum . foldMap (fmap Sum . getCost)
|
||||
getCost diff = case runFree diff of
|
||||
Free (info :< _) -> cost <$> info
|
||||
@ -104,6 +104,6 @@ diffFiles parser renderer sourceBlobs = do
|
||||
|
||||
-- | The sum of the node count of the diff’s patches.
|
||||
diffCostWithCachedTermSizes :: Diff a Info -> Integer
|
||||
diffCostWithCachedTermSizes diff = case runFree diff of
|
||||
diffCostWithCachedTermSizes diff = unCost $ case runFree diff of
|
||||
Free (info :< _) -> sum (cost <$> info)
|
||||
Pure patch -> sum (cost . extract <$> patch)
|
||||
|
38
src/Info.hs
38
src/Info.hs
@ -1,10 +1,40 @@
|
||||
{-# LANGUAGE DataKinds, GeneralizedNewtypeDeriving #-}
|
||||
module Info where
|
||||
|
||||
import Data.Record
|
||||
import Prologue
|
||||
import Category
|
||||
import Range
|
||||
|
||||
-- | An annotation for a source file, including the source range and semantic
|
||||
-- | categories.
|
||||
data Info = Info { characterRange :: !Range, category :: !Category, size :: !Integer, cost :: !Integer }
|
||||
deriving (Eq, Show)
|
||||
newtype Size = Size { unSize :: Integer }
|
||||
deriving (Eq, Num, Ord, Show)
|
||||
newtype Cost = Cost { unCost :: Integer }
|
||||
deriving (Eq, Num, Ord, Show)
|
||||
|
||||
type InfoFields = '[ Range, Category, Size, Cost ]
|
||||
|
||||
type Info = Record InfoFields
|
||||
|
||||
characterRange :: HasField fields Range => Record fields -> Range
|
||||
characterRange = getField
|
||||
|
||||
setCharacterRange :: HasField fields Range => Record fields -> Range -> Record fields
|
||||
setCharacterRange = setField
|
||||
|
||||
category :: HasField fields Category => Record fields -> Category
|
||||
category = getField
|
||||
|
||||
setCategory :: HasField fields Category => Record fields -> Category -> Record fields
|
||||
setCategory = setField
|
||||
|
||||
size :: HasField fields Size => Record fields -> Size
|
||||
size = getField
|
||||
|
||||
setSize :: HasField fields Size => Record fields -> Size -> Record fields
|
||||
setSize = setField
|
||||
|
||||
cost :: HasField fields Cost => Record fields -> Cost
|
||||
cost = getField
|
||||
|
||||
setCost :: HasField fields Cost => Record fields -> Cost -> Record fields
|
||||
setCost = setField
|
||||
|
@ -1,13 +1,20 @@
|
||||
module Interpreter (Comparable, DiffConstructor, diffTerms) where
|
||||
|
||||
import Algorithm
|
||||
import Category
|
||||
import Data.Align
|
||||
import Data.Functor.Foldable
|
||||
import Data.Functor.Both
|
||||
import qualified Data.OrderedMap as Map
|
||||
import qualified Data.List as List
|
||||
import Data.Hashable
|
||||
import Data.List ((\\))
|
||||
import qualified Data.List as List
|
||||
import Data.OrderedMap ((!))
|
||||
import qualified Data.OrderedMap as Map
|
||||
import Data.RandomWalkSimilarity
|
||||
import Data.Record
|
||||
import Data.These
|
||||
import Diff
|
||||
import Info
|
||||
import Operation
|
||||
import Patch
|
||||
import Prologue hiding (lookup)
|
||||
@ -16,17 +23,17 @@ import Syntax
|
||||
import Term
|
||||
|
||||
-- | Returns whether two terms are comparable
|
||||
type Comparable a annotation = Term a annotation -> Term a annotation -> Bool
|
||||
type Comparable leaf annotation = Term leaf annotation -> Term leaf annotation -> Bool
|
||||
|
||||
-- | Constructs a diff from the CofreeF containing its annotation and syntax. This function has the opportunity to, for example, cache properties in the annotation.
|
||||
type DiffConstructor leaf annotation = CofreeF (Syntax leaf) (Both annotation) (Diff leaf annotation) -> Diff leaf annotation
|
||||
|
||||
-- | Diff two terms, given a function that determines whether two terms can be compared and a cost function.
|
||||
diffTerms :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Term a annotation -> Term a annotation -> Diff a annotation
|
||||
diffTerms :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Term leaf (Record fields) -> Term leaf (Record fields) -> Diff leaf (Record fields)
|
||||
diffTerms construct comparable cost a b = fromMaybe (pure $ Replace a b) $ constructAndRun construct comparable cost a b
|
||||
|
||||
-- | Constructs an algorithm and runs it
|
||||
constructAndRun :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Term a annotation -> Term a annotation -> Maybe (Diff a annotation)
|
||||
constructAndRun :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Term leaf (Record fields) -> Term leaf (Record fields) -> Maybe (Diff leaf (Record fields))
|
||||
constructAndRun _ comparable _ a b | not $ comparable a b = Nothing
|
||||
|
||||
constructAndRun construct _ _ a b | (() <$ a) == (() <$ b) = hylo construct runCofree <$> zipTerms a b
|
||||
@ -41,21 +48,23 @@ constructAndRun construct comparable cost t1 t2 =
|
||||
annotate = pure . construct . (both annotation1 annotation2 :<)
|
||||
|
||||
-- | Runs the diff algorithm
|
||||
run :: (Eq a, Eq annotation) => DiffConstructor a annotation -> Comparable a annotation -> Cost (Diff a annotation) -> Algorithm a annotation (Diff a annotation) -> Maybe (Diff a annotation)
|
||||
run :: (Eq leaf, Hashable leaf, Ord (Record fields), HasField fields Category) => DiffConstructor leaf (Record fields) -> Comparable leaf (Record fields) -> SES.Cost (Diff leaf (Record fields)) -> Algorithm leaf (Record fields) (Diff leaf (Record fields)) -> Maybe (Diff leaf (Record fields))
|
||||
run construct comparable cost algorithm = case runFree algorithm of
|
||||
Pure diff -> Just diff
|
||||
Free (Recursive t1 t2 f) -> run construct comparable cost . f $ recur a b where
|
||||
(annotation1 :< a, annotation2 :< b) = (runCofree t1, runCofree t2)
|
||||
annotate = construct . (both annotation1 annotation2 :<)
|
||||
|
||||
recur (Indexed a') (Indexed b') | length a' == length b' = annotate . Indexed $ zipWith (diffTerms construct comparable cost) a' b'
|
||||
recur (Fixed a') (Fixed b') | length a' == length b' = annotate . Fixed $ zipWith (diffTerms construct comparable cost) a' b'
|
||||
recur (Indexed a') (Indexed b') = annotate . Indexed $ alignWith diffThese a' b'
|
||||
recur (Fixed a') (Fixed b') = annotate . Fixed $ alignWith diffThese a' b'
|
||||
recur (Keyed a') (Keyed b') | Map.keys a' == bKeys = annotate . Keyed . Map.fromList . fmap repack $ bKeys where
|
||||
bKeys = Map.keys b'
|
||||
repack key = (key, interpretInBoth key a' b')
|
||||
interpretInBoth key x y = diffTerms construct comparable cost (x ! key) (y ! key)
|
||||
recur _ _ = pure $ Replace (cofree (annotation1 :< a)) (cofree (annotation2 :< b))
|
||||
|
||||
diffThese = these (pure . Delete) (pure . Insert) (diffTerms construct comparable cost)
|
||||
|
||||
Free (ByKey a b f) -> run construct comparable cost $ f byKey where
|
||||
byKey = Map.fromList $ toKeyValue <$> List.union aKeys bKeys
|
||||
toKeyValue key | key `List.elem` deleted = (key, pure . Delete $ a ! key)
|
||||
@ -67,3 +76,6 @@ run construct comparable cost algorithm = case runFree algorithm of
|
||||
inserted = bKeys \\ aKeys
|
||||
|
||||
Free (ByIndex a b f) -> run construct comparable cost . f $ ses (constructAndRun construct comparable cost) cost a b
|
||||
|
||||
Free (ByRandomWalkSimilarity a b f) -> run construct comparable cost . f $ rws (constructAndRun construct comparable cost) getLabel a b
|
||||
where getLabel = category
|
||||
|
@ -10,11 +10,11 @@ data Operation
|
||||
a -- ^ The type of leaves in the syntax tree, typically String, but possibly some datatype representing different leaves more precisely.
|
||||
annotation -- ^ The type of annotations.
|
||||
f -- ^ The type representing another level of the diffing algorithm. Often Algorithm.
|
||||
=
|
||||
-- | Recursively diff two terms and pass the result to the continuation.
|
||||
Recursive (Term a annotation) (Term a annotation) (Diff a annotation -> f)
|
||||
= Recursive (Term a annotation) (Term a annotation) (Diff a annotation -> f)
|
||||
-- | Diff two dictionaries and pass the result to the continuation.
|
||||
| ByKey (OrderedMap Text (Term a annotation)) (OrderedMap Text (Term a annotation)) (OrderedMap Text (Diff a annotation) -> f)
|
||||
-- | Diff two arrays and pass the result to the continuation.
|
||||
| ByIndex [Term a annotation] [Term a annotation] ([Diff a annotation] -> f)
|
||||
| ByRandomWalkSimilarity [Term a annotation] [Term a annotation] ([Diff a annotation] -> f)
|
||||
deriving Functor
|
||||
|
10
src/Patch.hs
10
src/Patch.hs
@ -12,19 +12,19 @@ import Data.These
|
||||
import Prologue
|
||||
|
||||
-- | An operation to replace, insert, or delete an item.
|
||||
data Patch a =
|
||||
Replace a a
|
||||
data Patch a
|
||||
= Replace a a
|
||||
| Insert a
|
||||
| Delete a
|
||||
deriving (Eq, Foldable, Functor, Generic, Show, Traversable)
|
||||
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
|
||||
|
||||
-- | Return the item from the after side of the patch.
|
||||
after :: Patch a -> Maybe a
|
||||
after = maybeFst . unPatch
|
||||
after = maybeSnd . unPatch
|
||||
|
||||
-- | Return the item from the before side of the patch.
|
||||
before :: Patch a -> Maybe a
|
||||
before = maybeSnd . unPatch
|
||||
before = maybeFst . unPatch
|
||||
|
||||
-- | Return both sides of a patch.
|
||||
unPatch :: Patch a -> These a a
|
||||
|
@ -66,7 +66,7 @@ lineFields n term range = [ "number" .= n
|
||||
]
|
||||
|
||||
termFields :: (ToJSON recur, KeyValue kv) => Info -> Syntax leaf recur -> [kv]
|
||||
termFields Info{..} syntax = "range" .= characterRange : "category" .= category : case syntax of
|
||||
termFields info syntax = "range" .= characterRange info : "category" .= category info : case syntax of
|
||||
Leaf _ -> []
|
||||
Indexed c -> childrenFields c
|
||||
Fixed c -> childrenFields c
|
||||
|
@ -93,11 +93,11 @@ split diff blobs = TL.toStrict . renderHtml
|
||||
newtype Renderable a = Renderable a
|
||||
|
||||
instance ToMarkup f => ToMarkup (Renderable (Source Char, Info, Syntax a (f, Range))) where
|
||||
toMarkup (Renderable (source, Info {..}, syntax)) = (! A.data_ (stringValue (show size))) . classifyMarkup category $ case syntax of
|
||||
Leaf _ -> span . string . toString $ slice characterRange source
|
||||
Indexed children -> ul . mconcat $ wrapIn li <$> contentElements source characterRange children
|
||||
Fixed children -> ul . mconcat $ wrapIn li <$> contentElements source characterRange children
|
||||
Keyed children -> dl . mconcat $ wrapIn dd <$> contentElements source characterRange children
|
||||
toMarkup (Renderable (source, info, syntax)) = (! A.data_ (stringValue (show (unSize (size info))))) . classifyMarkup (category info) $ case syntax of
|
||||
Leaf _ -> span . string . toString $ slice (characterRange info) source
|
||||
Indexed children -> ul . mconcat $ wrapIn li <$> contentElements source (characterRange info) children
|
||||
Fixed children -> ul . mconcat $ wrapIn li <$> contentElements source (characterRange info) children
|
||||
Keyed children -> dl . mconcat $ wrapIn dd <$> contentElements source (characterRange info) children
|
||||
|
||||
contentElements :: (Foldable t, ToMarkup f) => Source Char -> Range -> t (f, Range) -> [Markup]
|
||||
contentElements source range children = let (elements, next) = foldr' (markupForContextAndChild source) ([], end range) children in
|
||||
@ -114,13 +114,13 @@ wrapIn _ l@Blaze.Comment{} = l
|
||||
wrapIn f p = f p
|
||||
|
||||
instance ToMarkup (Renderable (Source Char, Term a Info)) where
|
||||
toMarkup (Renderable (source, term)) = Prologue.fst $ cata (\ (info@(Info{..}) :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange)) term
|
||||
toMarkup (Renderable (source, term)) = Prologue.fst $ cata (\ (info :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange info)) term
|
||||
|
||||
instance ToMarkup (Renderable (Source Char, SplitDiff a Info)) where
|
||||
toMarkup (Renderable (source, diff)) = Prologue.fst $ iter (\ (info@(Info{..}) :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange)) $ toMarkupAndRange <$> diff
|
||||
toMarkup (Renderable (source, diff)) = Prologue.fst $ iter (\ (info :< syntax) -> (toMarkup $ Renderable (source, info, syntax), characterRange info)) $ toMarkupAndRange <$> diff
|
||||
where toMarkupAndRange :: SplitPatch (Term a Info) -> (Markup, Range)
|
||||
toMarkupAndRange patch = let term@(Info{..} :< _) = runCofree $ getSplitTerm patch in
|
||||
((div ! A.class_ (splitPatchToClassName patch) ! A.data_ (stringValue (show size))) . toMarkup $ Renderable (source, cofree term), characterRange)
|
||||
toMarkupAndRange patch = let term@(info :< _) = runCofree $ getSplitTerm patch in
|
||||
((div ! A.class_ (splitPatchToClassName patch) ! A.data_ (stringValue (show (unSize (size info))))) . toMarkup $ Renderable (source, cofree term), characterRange info)
|
||||
|
||||
instance ToMarkup a => ToMarkup (Renderable (Bool, Int, a)) where
|
||||
toMarkup (Renderable (hasChanges, num, line)) =
|
||||
|
@ -1,5 +1,6 @@
|
||||
module SplitDiff where
|
||||
|
||||
import Data.Record
|
||||
import Info
|
||||
import Range
|
||||
import Prologue
|
||||
@ -17,7 +18,7 @@ getSplitTerm (SplitDelete a) = a
|
||||
getSplitTerm (SplitReplace a) = a
|
||||
|
||||
-- | Get the range of a SplitDiff.
|
||||
getRange :: SplitDiff leaf Info -> Range
|
||||
getRange :: HasField fields Range => SplitDiff leaf (Record fields) -> Range
|
||||
getRange diff = characterRange $ case runFree diff of
|
||||
Free annotated -> headF annotated
|
||||
Pure patch -> extract (getSplitTerm patch)
|
||||
|
@ -1,8 +1,10 @@
|
||||
module Syntax where
|
||||
|
||||
import Prologue
|
||||
import Data.OrderedMap
|
||||
import Data.OrderedMap as Map
|
||||
import Data.Text.Arbitrary ()
|
||||
import qualified Data.Text as T
|
||||
import Test.QuickCheck hiding (Fixed)
|
||||
|
||||
-- | A node in an abstract syntax tree.
|
||||
data Syntax
|
||||
@ -17,4 +19,27 @@ data Syntax
|
||||
| Fixed [f]
|
||||
-- | A branch of child nodes indexed by some String identity. This is useful for identifying e.g. methods & properties in a class scope by their names. Note that comments can generally occur in these scopes as well; one strategy for dealing with this is to identify comments by their text in the source.
|
||||
| Keyed (OrderedMap T.Text f)
|
||||
deriving (Functor, Show, Eq, Foldable, Traversable)
|
||||
deriving (Eq, Foldable, Functor, Generic, Ord, Show, Traversable)
|
||||
|
||||
|
||||
-- Instances
|
||||
|
||||
syntaxOfSize :: Arbitrary leaf => (Int -> Gen f) -> Int -> Gen (Syntax leaf f)
|
||||
syntaxOfSize recur n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
|
||||
| otherwise = oneof $ branchGeneratorsOfSize n
|
||||
where branchGeneratorsOfSize n =
|
||||
[ Indexed <$> childrenOfSize (pred n)
|
||||
, Fixed <$> childrenOfSize (pred n)
|
||||
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
|
||||
]
|
||||
childrenOfSize n | n <= 0 = pure []
|
||||
childrenOfSize n = do
|
||||
m <- choose (1, n)
|
||||
first <- recur m
|
||||
rest <- childrenOfSize (n - m)
|
||||
pure $! first : rest
|
||||
|
||||
instance (Arbitrary leaf, Arbitrary f) => Arbitrary (Syntax leaf f) where
|
||||
arbitrary = sized (syntaxOfSize (`resize` arbitrary) )
|
||||
|
||||
shrink = genericShrink
|
||||
|
@ -1,4 +1,5 @@
|
||||
{-# LANGUAGE TypeFamilies, TypeSynonymInstances #-}
|
||||
{-# OPTIONS_GHC -fno-warn-orphans #-}
|
||||
module Term where
|
||||
|
||||
import Prologue
|
||||
|
@ -2,52 +2,36 @@
|
||||
module Term.Arbitrary where
|
||||
|
||||
import Data.Functor.Foldable (Base, cata, unfold, Unfoldable(embed))
|
||||
import qualified Data.List as List
|
||||
import qualified Data.OrderedMap as Map
|
||||
import Data.Text.Arbitrary ()
|
||||
import Prologue
|
||||
import Syntax
|
||||
import Term
|
||||
import Test.QuickCheck hiding (Fixed)
|
||||
|
||||
newtype ArbitraryTerm leaf annotation = ArbitraryTerm { unArbitraryTerm :: TermF leaf annotation (ArbitraryTerm leaf annotation) }
|
||||
data ArbitraryTerm leaf annotation = ArbitraryTerm annotation (Syntax leaf (ArbitraryTerm leaf annotation))
|
||||
deriving (Show, Eq, Generic)
|
||||
|
||||
unArbitraryTerm :: ArbitraryTerm leaf annotation -> TermF leaf annotation (ArbitraryTerm leaf annotation)
|
||||
unArbitraryTerm (ArbitraryTerm a s) = a :< s
|
||||
|
||||
toTerm :: ArbitraryTerm leaf annotation -> Term leaf annotation
|
||||
toTerm = unfold unArbitraryTerm
|
||||
|
||||
termOfSize :: (Arbitrary leaf, Arbitrary annotation) => Int -> Gen (ArbitraryTerm leaf annotation)
|
||||
termOfSize n = (ArbitraryTerm .) . (:<) <$> arbitrary <*> syntaxOfSize n
|
||||
where syntaxOfSize n | n <= 1 = oneof $ (Leaf <$> arbitrary) : branchGeneratorsOfSize n
|
||||
| otherwise = oneof $ branchGeneratorsOfSize n
|
||||
branchGeneratorsOfSize n =
|
||||
[ Indexed <$> childrenOfSize (pred n)
|
||||
, Fixed <$> childrenOfSize (pred n)
|
||||
, (Keyed .) . (Map.fromList .) . zip <$> infiniteListOf arbitrary <*> childrenOfSize (pred n)
|
||||
]
|
||||
childrenOfSize n | n <= 0 = pure []
|
||||
childrenOfSize n = do
|
||||
m <- choose (1, n)
|
||||
first <- termOfSize m
|
||||
rest <- childrenOfSize (n - m)
|
||||
pure $! first : rest
|
||||
termOfSize n = ArbitraryTerm <$> arbitrary <*> syntaxOfSize termOfSize n
|
||||
|
||||
arbitraryTermSize :: ArbitraryTerm leaf annotation -> Int
|
||||
arbitraryTermSize = cata (succ . sum) . toTerm
|
||||
|
||||
|
||||
-- Instances
|
||||
|
||||
type instance Base (ArbitraryTerm leaf annotation) = CofreeF (Syntax leaf) annotation
|
||||
instance Unfoldable (ArbitraryTerm leaf annotation) where embed = ArbitraryTerm
|
||||
type instance Base (ArbitraryTerm leaf annotation) = TermF leaf annotation
|
||||
instance Unfoldable (ArbitraryTerm leaf annotation) where embed (a :< s) = ArbitraryTerm a s
|
||||
|
||||
instance (Eq leaf, Eq annotation, Arbitrary leaf, Arbitrary annotation) => Arbitrary (ArbitraryTerm leaf annotation) where
|
||||
arbitrary = sized $ \ n -> do
|
||||
m <- choose (0, n)
|
||||
termOfSize m
|
||||
|
||||
shrink term@(ArbitraryTerm (annotation :< syntax)) = (subterms term ++) $ filter (/= term) $
|
||||
(ArbitraryTerm .) . (:<) <$> shrink annotation <*> case syntax of
|
||||
Leaf a -> Leaf <$> shrink a
|
||||
Indexed i -> Indexed <$> (List.subsequences i >>= recursivelyShrink)
|
||||
Fixed f -> Fixed <$> (List.subsequences f >>= recursivelyShrink)
|
||||
Keyed k -> Keyed . Map.fromList <$> (List.subsequences (Map.toList k) >>= recursivelyShrink)
|
||||
shrink = genericShrink
|
||||
|
@ -1,6 +1,7 @@
|
||||
module TreeSitter where
|
||||
|
||||
import Prologue hiding (Constructor)
|
||||
import Data.Record
|
||||
import Data.String
|
||||
import Category
|
||||
import Info
|
||||
@ -61,7 +62,7 @@ documentToTerm language document contents = alloca $ \ root -> do
|
||||
range <- pure $! Range { start = fromIntegral $ ts_node_p_start_char node, end = fromIntegral $ ts_node_p_end_char node }
|
||||
|
||||
let size' = 1 + sum (size . extract <$> children)
|
||||
let info = Info range (categoriesForLanguage language name) size' size'
|
||||
let info = range .: (categoriesForLanguage language name) .: size' .: Cost (unSize size') .: RNil
|
||||
pure $! termConstructor contents info children
|
||||
getChild node n out = do
|
||||
_ <- ts_node_p_named_child node n out
|
||||
|
@ -1,7 +1,6 @@
|
||||
module AlignmentSpec where
|
||||
|
||||
import Alignment
|
||||
import ArbitraryTerm ()
|
||||
import Control.Arrow ((&&&))
|
||||
import Control.Monad.State
|
||||
import Data.Align hiding (align)
|
||||
@ -11,6 +10,7 @@ import Data.Bifunctor.Join.Arbitrary ()
|
||||
import Data.Functor.Both as Both
|
||||
import Data.List (nub)
|
||||
import Data.Monoid
|
||||
import Data.Record
|
||||
import Data.String
|
||||
import Data.Text.Arbitrary ()
|
||||
import Data.These
|
||||
@ -48,7 +48,7 @@ spec = parallel $ do
|
||||
|
||||
prop "covers every input line" $
|
||||
\ elements -> let (_, children, ranges) = toAlignBranchInputs elements in
|
||||
join <$> (traverse (modifyJoin (fromThese [] []) . fmap pure . fmap Prologue.fst) (alignBranch Prologue.snd children ranges)) `shouldBe` ranges
|
||||
join <$> (traverse (modifyJoin (fromThese [] []) . fmap (pure . Prologue.fst)) (alignBranch Prologue.snd children ranges)) `shouldBe` ranges
|
||||
|
||||
prop "covers every input child" $
|
||||
\ elements -> let (_, children, ranges) = toAlignBranchInputs elements in
|
||||
@ -193,7 +193,7 @@ spec = parallel $ do
|
||||
(info 8 9 `branch` [ info 8 9 `leaf` "c" ]))
|
||||
]
|
||||
|
||||
describe "numberedRows" $
|
||||
describe "numberedRows" $ do
|
||||
prop "counts only non-empty values" $
|
||||
\ xs -> counts (numberedRows (xs :: [Join These Char])) `shouldBe` length . catMaybes <$> Join (unalign (runJoin <$> xs))
|
||||
|
||||
@ -236,9 +236,9 @@ keysOfAlignedChildren lines = lines >>= these identity identity (++) . runJoin .
|
||||
instance Arbitrary BranchElement where
|
||||
arbitrary = oneof [ key >>= \ key -> Child key <$> joinTheseOf (contents key)
|
||||
, Margin <$> joinTheseOf margin ]
|
||||
where key = listOf1 (elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']))
|
||||
where key = listOf1 (elements (['a'..'z'] <> ['A'..'Z'] <> ['0'..'9']))
|
||||
contents key = wrap key <$> listOf (padding '*')
|
||||
wrap key contents = "(" ++ key ++ contents ++ ")" :: String
|
||||
wrap key contents = "(" <> key <> contents <> ")" :: String
|
||||
margin = listOf (padding '-')
|
||||
padding char = frequency [ (10, pure char)
|
||||
, (1, pure '\n') ]
|
||||
@ -258,7 +258,7 @@ align :: Both (Source.Source Char) -> ConstructibleFree (Patch (Term String Info
|
||||
align sources = PrettyDiff sources . fmap (fmap (getRange &&& identity)) . alignDiff sources . deconstruct
|
||||
|
||||
info :: Int -> Int -> Info
|
||||
info start end = Info (Range start end) StringLiteral 0 0
|
||||
info start end = Range start end .: StringLiteral .: 0 .: 0 .: RNil
|
||||
|
||||
prettyDiff :: Both (Source.Source Char) -> [Join These (ConstructibleFree (SplitPatch (Term String Info)) Info)] -> PrettyDiff (SplitDiff String Info)
|
||||
prettyDiff sources = PrettyDiff sources . fmap (fmap ((getRange &&& identity) . deconstruct))
|
||||
|
@ -1,16 +0,0 @@
|
||||
{-# OPTIONS_GHC -fno-warn-orphans #-}
|
||||
module ArbitraryTerm where
|
||||
|
||||
import Data.Text.Arbitrary ()
|
||||
import Data.These.Arbitrary ()
|
||||
import Prologue hiding (fst, snd)
|
||||
import Source hiding ((++))
|
||||
import Test.QuickCheck hiding (Fixed)
|
||||
|
||||
data CategorySet = A | B | C | D deriving (Eq, Show)
|
||||
|
||||
instance Arbitrary CategorySet where
|
||||
arbitrary = elements [ A, B, C, D ]
|
||||
|
||||
instance Arbitrary a => Arbitrary (Source a) where
|
||||
arbitrary = Source.fromList <$> arbitrary
|
36
test/Data/RandomWalkSimilarity/Spec.hs
Normal file
36
test/Data/RandomWalkSimilarity/Spec.hs
Normal file
@ -0,0 +1,36 @@
|
||||
module Data.RandomWalkSimilarity.Spec where
|
||||
|
||||
import Category
|
||||
import Data.DList as DList hiding (toList)
|
||||
import Data.RandomWalkSimilarity
|
||||
import Diff
|
||||
import Patch
|
||||
import Prologue
|
||||
import Syntax
|
||||
import Term
|
||||
import Term.Arbitrary
|
||||
import Test.Hspec
|
||||
import Test.Hspec.QuickCheck
|
||||
import Test.QuickCheck
|
||||
|
||||
spec :: Spec
|
||||
spec = parallel $ do
|
||||
describe "pqGrams" $ do
|
||||
prop "produces grams with stems of the specified length" . forAll (arbitrary `suchThat` (\ (_, p, q) -> p > 0 && q > 0)) $
|
||||
\ (term, p, q) -> pqGrams p q identity (toTerm term :: Term Text Text) `shouldSatisfy` all ((== p) . length . stem)
|
||||
|
||||
prop "produces grams with bases of the specified length" . forAll (arbitrary `suchThat` (\ (_, p, q) -> p > 0 && q > 0)) $
|
||||
\ (term, p, q) -> pqGrams p q identity (toTerm term :: Term Text Text) `shouldSatisfy` all ((== q) . length . base)
|
||||
|
||||
describe "featureVector" $ do
|
||||
prop "produces a vector of the specified dimension" . forAll (arbitrary `suchThat` ((> 0) . Prologue.snd)) $
|
||||
\ (grams, d) -> length (featureVector d (fromList (grams :: [Gram Text]))) `shouldBe` d
|
||||
|
||||
describe "rws" $ do
|
||||
let compare a b = if extract a == extract b then Just (pure (Replace a b)) else Nothing
|
||||
prop "produces correct diffs" . forAll (scale (`div` 4) arbitrary) $
|
||||
\ (as, bs) -> let tas = toTerm <$> as
|
||||
tbs = toTerm <$> bs
|
||||
diff = free (Free (pure Program :< Indexed (rws compare identity tas tbs :: [Diff Text Category]))) in
|
||||
(beforeTerm diff, afterTerm diff) `shouldBe` (Just (cofree (Program :< Indexed tas)), Just (cofree (Program :< Indexed tbs)))
|
||||
|
38
test/Diff/Spec.hs
Normal file
38
test/Diff/Spec.hs
Normal file
@ -0,0 +1,38 @@
|
||||
{-# LANGUAGE DataKinds #-}
|
||||
module Diff.Spec where
|
||||
|
||||
import Category
|
||||
import Data.Record
|
||||
import Data.Text.Arbitrary ()
|
||||
import Diff
|
||||
import Diff.Arbitrary
|
||||
import Interpreter
|
||||
import Prologue
|
||||
import Term.Arbitrary
|
||||
import Test.Hspec
|
||||
import Test.Hspec.QuickCheck
|
||||
import Test.QuickCheck
|
||||
|
||||
spec :: Spec
|
||||
spec = parallel $ do
|
||||
prop "equality is reflexive" $
|
||||
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
|
||||
diff `shouldBe` diff
|
||||
|
||||
prop "equal terms produce identity diffs" $
|
||||
\ a -> let term = toTerm (a :: ArbitraryTerm Text (Record '[Category])) in
|
||||
diffCost (diffTerms (free . Free) (==) diffCost term term) `shouldBe` 0
|
||||
|
||||
describe "beforeTerm" $ do
|
||||
prop "recovers the before term" $
|
||||
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
|
||||
beforeTerm diff `shouldBe` Just (toTerm a)
|
||||
|
||||
describe "afterTerm" $ do
|
||||
prop "recovers the after term" $
|
||||
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm Text (Record '[Category]))) in
|
||||
afterTerm diff `shouldBe` Just (toTerm b)
|
||||
|
||||
describe "ArbitraryDiff" $ do
|
||||
prop "generates diffs of a specific size" . forAll ((arbitrary >>= \ n -> (,) n <$> diffOfSize n) `suchThat` ((> 0) . fst)) $
|
||||
\ (n, diff) -> arbitraryDiffSize (diff :: ArbitraryDiff Text ()) `shouldBe` n
|
@ -1,6 +1,7 @@
|
||||
module DiffSummarySpec where
|
||||
|
||||
import Prologue
|
||||
import Data.Record
|
||||
import Data.String
|
||||
import Test.Hspec
|
||||
import Diff
|
||||
@ -12,10 +13,10 @@ import Category
|
||||
import DiffSummary
|
||||
|
||||
arrayInfo :: Info
|
||||
arrayInfo = Info (rangeAt 0) ArrayLiteral 2 0
|
||||
arrayInfo = rangeAt 0 .: ArrayLiteral .: 2 .: 0 .: RNil
|
||||
|
||||
literalInfo :: Info
|
||||
literalInfo = Info (rangeAt 1) StringLiteral 1 0
|
||||
literalInfo = rangeAt 1 .: StringLiteral .: 1 .: 0 .: RNil
|
||||
|
||||
testDiff :: Diff String Info
|
||||
testDiff = free $ Free (pure arrayInfo :< Indexed [ free $ Pure (Insert (cofree $ literalInfo :< Leaf "a")) ])
|
||||
|
@ -1,22 +1,30 @@
|
||||
{-# LANGUAGE DataKinds #-}
|
||||
module InterpreterSpec where
|
||||
|
||||
import Prologue
|
||||
import Diff
|
||||
import qualified Interpreter as I
|
||||
import Range
|
||||
import Syntax
|
||||
import Patch
|
||||
import Info
|
||||
import Category
|
||||
import Diff
|
||||
import Data.Record
|
||||
import Interpreter
|
||||
import Patch
|
||||
import Prologue
|
||||
import Syntax
|
||||
import Term.Arbitrary
|
||||
import Test.Hspec
|
||||
import Test.Hspec.QuickCheck
|
||||
|
||||
spec :: Spec
|
||||
spec = parallel $
|
||||
describe "interpret" $
|
||||
spec = parallel $ do
|
||||
describe "interpret" $ do
|
||||
it "returns a replacement when comparing two unicode equivalent terms" $
|
||||
I.diffTerms (free . Free) ((==) `on` extract) diffCost (cofree (Info range StringLiteral 0 0 :< Leaf "t\776")) (cofree (Info range2 StringLiteral 0 0 :< Leaf "\7831")) `shouldBe`
|
||||
free (Pure (Replace (cofree (Info range StringLiteral 0 0 :< Leaf "t\776")) (cofree (Info range2 StringLiteral 0 0 :< Leaf "\7831"))))
|
||||
let termA = cofree $ (StringLiteral .: RNil) :< Leaf ("t\776" :: Text)
|
||||
termB = cofree $ (StringLiteral .: RNil) :< Leaf "\7831" in
|
||||
diffTerms (free . Free) ((==) `on` extract) diffCost termA termB `shouldBe` free (Pure (Replace termA termB))
|
||||
|
||||
where
|
||||
range = Range 0 2
|
||||
range2 = Range 0 1
|
||||
prop "produces correct diffs" $
|
||||
\ a b -> let diff = diffTerms (free . Free) ((==) `on` extract) diffCost (toTerm a) (toTerm b) :: Diff Text (Record '[Category]) in
|
||||
(beforeTerm diff, afterTerm diff) `shouldBe` (Just (toTerm a), Just (toTerm b))
|
||||
|
||||
prop "constructs zero-cost diffs of equal terms" $
|
||||
\ a -> let term = toTerm a
|
||||
diff = diffTerms (free . Free) ((==) `on` extract) diffCost term term :: Diff Text (Record '[Category]) in
|
||||
diffCost diff `shouldBe` 0
|
||||
|
@ -2,7 +2,7 @@ module PatchOutputSpec where
|
||||
|
||||
import Prologue
|
||||
import Data.Functor.Both
|
||||
import Info
|
||||
import Data.Record
|
||||
import Range
|
||||
import Renderer.Patch
|
||||
import Source
|
||||
@ -14,4 +14,4 @@ spec :: Spec
|
||||
spec = parallel $
|
||||
describe "hunks" $
|
||||
it "empty diffs have empty hunks" $
|
||||
hunks (free . Free $ pure (Info (Range 0 0) StringLiteral 1 0) :< Leaf "") (both (SourceBlob (fromList "") "abcde" "path2.txt" (Just defaultPlainBlob)) (SourceBlob (fromList "") "xyz" "path2.txt" (Just defaultPlainBlob))) `shouldBe` [Hunk {offset = pure 0, changes = [], trailingContext = []}]
|
||||
hunks (free . Free $ pure (Range 0 0 .: StringLiteral .: 1 .: 0 .: RNil) :< Leaf "") (both (SourceBlob (fromList "") "abcde" "path2.txt" (Just defaultPlainBlob)) (SourceBlob (fromList "") "xyz" "path2.txt" (Just defaultPlainBlob))) `shouldBe` [Hunk {offset = pure 0, changes = [], trailingContext = []}]
|
||||
|
10
test/Spec.hs
10
test/Spec.hs
@ -3,19 +3,23 @@ module Main where
|
||||
import Prologue
|
||||
import qualified AlignmentSpec
|
||||
import qualified CorpusSpec
|
||||
import qualified Data.RandomWalkSimilarity.Spec
|
||||
import qualified Diff.Spec
|
||||
import qualified DiffSummarySpec
|
||||
import qualified InterpreterSpec
|
||||
import qualified OrderedMapSpec
|
||||
import qualified PatchOutputSpec
|
||||
import qualified TermSpec
|
||||
import qualified DiffSummarySpec
|
||||
import Test.Hspec
|
||||
|
||||
main :: IO ()
|
||||
main = hspec $ parallel $ do
|
||||
main = hspec . parallel $ do
|
||||
describe "Alignment" AlignmentSpec.spec
|
||||
describe "Corpus" CorpusSpec.spec
|
||||
describe "Data.RandomWalkSimilarity" Data.RandomWalkSimilarity.Spec.spec
|
||||
describe "Diff.Spec" Diff.Spec.spec
|
||||
describe "DiffSummary" DiffSummarySpec.spec
|
||||
describe "Interpreter" InterpreterSpec.spec
|
||||
describe "OrderedMap" OrderedMapSpec.spec
|
||||
describe "PatchOutput" PatchOutputSpec.spec
|
||||
describe "Term" TermSpec.spec
|
||||
describe "DiffSummary" DiffSummarySpec.spec
|
||||
|
@ -1,11 +1,6 @@
|
||||
module TermSpec where
|
||||
|
||||
import ArbitraryTerm
|
||||
import Data.String
|
||||
import Data.Text.Arbitrary ()
|
||||
import Diff
|
||||
import Diff.Arbitrary
|
||||
import Interpreter
|
||||
import Prologue
|
||||
import Term.Arbitrary
|
||||
import Test.Hspec
|
||||
@ -16,21 +11,8 @@ spec :: Spec
|
||||
spec = parallel $ do
|
||||
describe "Term" $ do
|
||||
prop "equality is reflexive" $
|
||||
\ a -> toTerm a == toTerm (a :: ArbitraryTerm String ())
|
||||
\ a -> toTerm a `shouldBe` toTerm (a :: ArbitraryTerm Text ())
|
||||
|
||||
describe "ArbitraryTerm" $
|
||||
prop "generates terms of a specific size" $ forAll ((arbitrary >>= \ n -> (,) n <$> termOfSize n) `suchThat` ((> 0) . fst)) $
|
||||
\ (n, term) -> arbitraryTermSize (term :: ArbitraryTerm String ()) `shouldBe` n
|
||||
|
||||
describe "ArbitraryDiff" $
|
||||
prop "generates diffs of a specific size" $ forAll ((arbitrary >>= \ n -> (,) n <$> diffOfSize n) `suchThat` ((> 0) . fst)) $
|
||||
\ (n, diff) -> arbitraryDiffSize (diff :: ArbitraryDiff String ()) `shouldBe` n
|
||||
|
||||
describe "Diff" $ do
|
||||
prop "equality is reflexive" $
|
||||
\ a b -> let diff = diffTerms (free . Free) (==) diffCost (toTerm a) (toTerm (b :: ArbitraryTerm String CategorySet)) in
|
||||
diff == diff
|
||||
|
||||
prop "equal terms produce identity diffs" $
|
||||
\ a -> let term = toTerm (a :: ArbitraryTerm String CategorySet) in
|
||||
diffCost (diffTerms (free . Free) (==) diffCost term term) == 0
|
||||
describe "ArbitraryTerm" $ do
|
||||
prop "generates terms of a specific size" . forAll ((arbitrary >>= \ n -> (,) n <$> termOfSize n) `suchThat` ((> 0) . fst)) $
|
||||
\ (n, term) -> arbitraryTermSize (term :: ArbitraryTerm Text ()) `shouldBe` n
|
||||
|
2
vendor/gitlib
vendored
2
vendor/gitlib
vendored
@ -1 +1 @@
|
||||
Subproject commit 0a9bb2671d8adfc6117173d46ca10fa8b8e94559
|
||||
Subproject commit 4828dbf14adfc4ce0ac3536f8b192e65828e97bc
|
74
weekly/2016-06-21.md
Normal file
74
weekly/2016-06-21.md
Normal file
@ -0,0 +1,74 @@
|
||||
# Semantic Diff Problems (Mini-Summit)
|
||||
|
||||
### Performance (most significant problem)
|
||||
|
||||
- SES / Alignment are biggest time / space consumers.
|
||||
- Profiling small subsets of code paths rather than the full context.
|
||||
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).
|
||||
|
||||
##### Alignment performance
|
||||
|
||||
- Has to visit each child of each remaining line.
|
||||
|
||||
##### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
|
||||
|
||||
- n^3 the size of the tree.
|
||||
- Can try bounded SES (looks ahead by a fixed size of nodes).
|
||||
- Identify more comparisons we can skip (i.e. don't compare functions with array literals).
|
||||
- Does not look like there are more easy wins here (algorithm is already implemented to prevent unnecessary comparisions).
|
||||
- In some cases, the diffing is expensive because we don't have more fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions).
|
||||
- Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not).
|
||||
- This could result in us missing a function rename though.
|
||||
- Not a catchall, but it can help increase performance in a larger number of cases.
|
||||
|
||||
##### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
|
||||
|
||||
- Random Walk Similarity.
|
||||
- computes approximation to the minimal edit script.
|
||||
- O(log N) rather than O(n^3).
|
||||
- RWS does not rely on identifiers.
|
||||
- RWS solves our performance problem in the general form.
|
||||
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES).
|
||||
|
||||
##### Diff summaries performance
|
||||
|
||||
- Performance of DS is dependent on diffing (Diff Terms, Interpreter, cost functions)
|
||||
|
||||
### Failing too hard
|
||||
|
||||
- Request is not completing if Semantic Diff fails.
|
||||
- How can we fail better on dotcom?
|
||||
- How can we fail better when parsing? (both in Semantic Diff and dotcom)
|
||||
|
||||
### Responsiveness
|
||||
|
||||
- Async fetch diff summaries / diffs / progressive diffs or diff summaries
|
||||
|
||||
### Improving grammars
|
||||
|
||||
- Fix Ruby parser.
|
||||
- Testing and verifying other grammars.
|
||||
|
||||
### Measure effectiveness of grammars
|
||||
|
||||
### Tooling
|
||||
|
||||
- Why isn't parallelization of SES having the expected effect?
|
||||
- Should focus on low hanging fruit but we're not going to write a debugger.
|
||||
|
||||
### Time limitations with respect to solutions and team
|
||||
|
||||
### Ramp up time is extremely variable.
|
||||
|
||||
### Onboarding
|
||||
|
||||
- Pairing has been fantastic.
|
||||
- SES algorithm requires some context and background to understand the code at the general / macro level.
|
||||
- Plan a bit before pairing to gain context.
|
||||
|
||||
### Pre-launch Ideas
|
||||
|
||||
- Test on a couple file server nodes and run semantic diff on javascript repos.
|
||||
- Collect repos, files, shas that contain error nodes to gain a % of error rates and expose errors in tree sitter grammars.
|
||||
- If sources have errors, can we use a parser that validates the source is correct?
|
||||
- Configure a script that is as language independent as possible that can automate the error collection process but allows us to specify an independent validating parser for each language.
|
36
weekly/2016-06-27.md
Normal file
36
weekly/2016-06-27.md
Normal file
@ -0,0 +1,36 @@
|
||||
# June 27th, 2016 weekly
|
||||
|
||||
## What went well?
|
||||
|
||||
@joshvera: Pairing, minisummitting, RWS discussions.
|
||||
|
||||
@rewinfrey: Pairing, context on recursion schemes, started independent work on the project, minisummitting. Defined what to work on next
|
||||
|
||||
@robrix: Minisummit: got to know both of you better & really enjoyed that. Before that I was on vacation but you both did a great job!
|
||||
|
||||
|
||||
## What went less well?
|
||||
|
||||
@joshvera: Lots more problems turned up. Lots of stuff that has taken on more importance as we’ve thought about it more. Feel like I could’ve made more progress on diff summaries by now. Some of that has been minisummit, some of that has been every time we do more work on it there seems to be new layers peeling off exposing other issues & more work needing to be done.
|
||||
|
||||
@rewinfrey: Maybe I’m overly optimistic but I don’t have anything to point to that I didn’t think went well. The challenges we identified during minisummit felt like a good sign of the project moving forward.
|
||||
|
||||
@robrix: Ditto. May end up being a bit distracted over the next couple of weeks figuring out some stuff re: summit & my attendance of it.
|
||||
|
||||
|
||||
## What did you learn?
|
||||
|
||||
@joshvera: Type ∋ Type isn’t as easy as it sounds. Learned about you both too!
|
||||
|
||||
@rewinfrey: Learned about you both. Recursion schemes! Relationships between algebras & projections, coalgebras & embeddings, and recursion-scheme’s `Base` type family. Further explored some other morphisms. RWS (albeit misreading some of it).
|
||||
|
||||
@robrix: Learned about you both! RWS. Some stuff about derivative-parsing. Learned a lot about communication too.
|
||||
|
||||
|
||||
## Anything else?
|
||||
|
||||
@joshvera: Out Thursday/Friday.
|
||||
|
||||
@rewinfrey: Josh, how did the blue suit fit? (“Really well.”)
|
||||
|
||||
@robrix: Canada Day on Friday. You’re both invited to celebrate it as well, by being as Canadian as possible.
|
Loading…
Reference in New Issue
Block a user