Add Deconvolution layer and update the GAN example to use it

This cuts the runtime by about 70%, which is nice, and it's a
better algorithm for it anyway.

I've also refactored the Convolution layer such that there's
only one actual implementation instead of two, and in doing so
provided a few more instances for 2D and 3D shapes in and out.

Updates to the README and the mnist example show higher levels of composition.
Huw Campbell 2017-04-06 13:34:43 +10:00
parent e6293b8461
commit 10a6003e73
8 changed files with 439 additions and 80 deletions

README.md

@@ -12,7 +12,7 @@ Five is right out.
💣 Machine learning which might blow up in your face 💣
Grenade is a dependently typed, practical, and fast recurrent neural network library
Grenade is a composable, dependently typed, practical, and fast recurrent neural network library
for concise and precise specifications of complex networks in Haskell.
As an example, a network which can achieve ~1.5% error on MNIST can be
@@ -100,9 +100,48 @@ easy in downstream code. If the shapes of a network are not specified correctly
and a layer can not sensibly perform the operation between two shapes, then
it will result in a compile time error.
Composition
-----------
Networks and Layers in Grenade are easily composed at the type level. As a `Network`
is an instance of `Layer`, one can use a trained Network as a small component in a
larger network easily. Furthermore, we provide two layers which are designed to run
layers in parallel and merge their outputs (either by concatenating them across one
dimension or by pointwise adding their activations). This allows one to
write any Network which can be expressed as a
[series parallel graph](https://en.wikipedia.org/wiki/Series-parallel_graph).
A residual network layer specification, for instance, could be written as
```haskell
type Residual net = Merge Trivial net
```
If the type `net` is an instance of `Layer`, then `Residual net` will be too. It will
run the network, while retaining its input by passing it through the `Trivial` layer,
and merge the network's output with the original input by summing them.
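For instance, wrapping a shape-preserving convolution (a hypothetical block built
from Grenade's `Pad` and `Convolution` layers) gives a working residual unit:
```haskell
-- A small net which preserves its input shape ('D3 14 14 8), so its output
-- can be summed pointwise with the Trivial (identity) branch.
type SmallBlock = Network '[ Pad 1 1 1 1, Convolution 8 8 3 3 1 1 ]
                          '[ 'D3 14 14 8, 'D3 16 16 8, 'D3 14 14 8 ]

type ResidualBlock = Residual SmallBlock
```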
See the [MNIST](https://github.com/HuwCampbell/grenade/blob/master/examples/main/mnist.hs)
example, which has been overengineered to contain both residual-style learning and
inception-style convolutions.
Generative Adversarial Networks
-------------------------------
As Grenade is purely functional, one can compose its training functions in flexible
ways. The [GAN-MNIST](https://github.com/HuwCampbell/grenade/blob/master/examples/main/gan-mnist.hs)
example displays an interesting, type-safe way of writing a generative adversarial
training function in 10 lines of code.
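As a rough, hypothetical sketch of how it composes (see the example for the real
code; `runGradient` and `applyUpdate` here are assumed names for Grenade's
gradient-propagation and update primitives):
```haskell
-- A hedged sketch of one GAN training step. The gradient at the
-- discriminator's input is exactly the training signal the generator needs.
trainStep rate disc gen real noise =
  let (genTape,  fake)   = runNetwork gen noise              -- forge an example
      (realTape, guessR) = runNetwork disc real
      (fakeTape, guessF) = runNetwork disc fake
      (dGradR, _)   = runGradient disc realTape (guessR - 1) -- real should score 1
      (dGradF, _)   = runGradient disc fakeTape guessF       -- fake should score 0
      (_, pushBack) = runGradient disc fakeTape (guessF - 1) -- generator wants 1
      (gGrad, _)    = runGradient gen genTape pushBack
  in  ( applyUpdate rate (applyUpdate rate disc dGradR) dGradF
      , applyUpdate rate gen gGrad )
```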
Layer Zoo
---------
Grenade layers are normal Haskell data types which are an instance of `Layer`, so
it's easy to build one's own in downstream code. We do, however, provide a decent set
of layers, including convolution, deconvolution, pooling, pad, crop, logit, relu,
elu, tanh, and fully connected.
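As a minimal sketch of what that involves (modelled on Grenade's own `Trivial`
layer, using the `UpdateLayer` and `Layer` methods seen throughout the library),
an identity layer with no learnable state is just:
```haskell
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances     #-}

-- An identity layer: no weights, no gradient, no tape.
data Identity = Identity deriving Show

instance UpdateLayer Identity where
  type Gradient Identity = ()        -- nothing to learn
  runUpdate _ layer _ = layer
  createRandom        = return Identity

instance (a ~ b) => Layer Identity a b where
  type Tape Identity a b = ()        -- nothing to remember for the backward pass
  runForwards  _ x      = ((), x)    -- activations pass straight through
  runBackwards _ _ dEdy = ((), dEdy) -- gradients pass straight through
```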
Build Instructions
------------------
Grenade currently only builds with the [mafia](https://github.com/ambiata/mafia)
Grenade is most easily built with the [mafia](https://github.com/ambiata/mafia)
script that is located in the repository. You will also need the `lapack` and
`blas` libraries and development tools. Once you have all that, Grenade can be
built using:
@@ -117,7 +156,7 @@ and the tests run using:
./mafia test
```
Grenade is currently known to build with ghc 7.10 and 8.0.
Grenade builds with ghc 7.10 and 8.0.
Thanks
------

examples/main/gan-mnist.hs

@@ -10,8 +10,8 @@
-- This is a simple generative adversarial network to make pictures
-- of numbers similar to those in MNIST.
--
-- It demonstrates a different usage of the library. Within about an
-- hour it was producing examples like this:
-- It demonstrates a different usage of the library. Within about 15
-- minutes it was producing examples like this:
--
-- --.
-- .=-.--..#=###
@@ -57,11 +57,25 @@ import Options.Applicative
import Grenade
import Grenade.Utils.OneHot
type Discriminator = Network '[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu, Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, Reshape, Relu, FullyConnected 256 80, Logit, FullyConnected 80 1, Logit]
'[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10, 'D3 8 8 16, 'D3 4 4 16, 'D1 256, 'D1 256, 'D1 80, 'D1 80, 'D1 1, 'D1 1]
type Discriminator =
Network
'[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu
, Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, Relu
, Reshape, FullyConnected 256 80, Logit, FullyConnected 80 1, Logit]
'[ 'D2 28 28
, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10
, 'D3 8 8 16, 'D3 4 4 16, 'D3 4 4 16
, 'D1 256, 'D1 80, 'D1 80, 'D1 1, 'D1 1]
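-- Shape arithmetic: each convolution follows out = (in - kernel) / stride + 1 and
-- each 2x2 pool halves the image, so 28 -> 24 -> 12 -> 8 -> 4; the Reshape then
-- flattens 4 * 4 * 16 = 256.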
type Generator = Network '[ FullyConnected 100 10240, Relu, Reshape, Convolution 10 10 5 5 1 1, Relu, Convolution 10 1 1 1 1 1, Logit, Reshape]
'[ 'D1 100, 'D1 10240, 'D1 10240, 'D3 32 32 10, 'D3 28 28 10, 'D3 28 28 10, 'D3 28 28 1, 'D3 28 28 1, 'D2 28 28 ]
type Generator =
Network
'[ FullyConnected 80 256, Relu, Reshape
, Deconvolution 16 10 5 5 2 2, Relu
, Deconvolution 10 1 8 8 2 2, Logit]
'[ 'D1 80
, 'D1 256, 'D1 256, 'D3 4 4 16
, 'D3 11 11 10, 'D3 11 11 10
, 'D2 28 28, 'D2 28 28 ]
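-- Each Deconvolution grows the image by out = (in - 1) * stride + kernel:
-- 4x4 -> (4 - 1) * 2 + 5 = 11, then 11x11 -> (11 - 1) * 2 + 8 = 28.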
randomDiscriminator :: MonadRandom m => m Discriminator
randomDiscriminator = randomNetwork
@@ -69,7 +83,7 @@ randomDiscriminator = randomNetwork
randomGenerator :: MonadRandom m => m Generator
randomGenerator = randomNetwork
trainExample :: LearningParameters -> Discriminator -> Generator -> S ('D2 28 28) -> S ('D1 100) -> ( Discriminator, Generator )
trainExample :: LearningParameters -> Discriminator -> Generator -> S ('D2 28 28) -> S ('D1 80) -> ( Discriminator, Generator )
trainExample rate discriminator generator realExample noiseSource
= let (generatorTape, fakeExample) = runNetwork generator noiseSource

examples/main/mnist.hs

@@ -5,6 +5,7 @@
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative
import Control.Monad
import Control.Monad.Random
@@ -25,6 +26,15 @@ import Options.Applicative
import Grenade
import Grenade.Utils.OneHot
-- It's logistic regression!
--
-- This network is used to show how we can embed a Network as a layer in the larger MNIST
-- type.
type FL i o =
Network
'[ FullyConnected i o, Logit ]
'[ 'D1 i, 'D1 o, 'D1 o ]
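-- An 'FL i o' network squashes a 'D1 i vector to a 'D1 o one through a sigmoid;
-- it is dropped into the MNIST network below as an ordinary layer, twice.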
-- The definition of our convolutional neural network.
-- In the type signature, we have a type level list of shapes which are passed between the layers.
-- One can see that the images we are inputting are two dimensional with 28 * 28 pixels.
@@ -40,18 +50,15 @@ import Grenade.Utils.OneHot
--
type MNIST =
Network
'[ Reshape
, Inception 28 28 1 5 5 5, Pooling 2 2 2 2, Relu
, Inception 14 14 15 5 5 5, Pooling 2 2 2 2, Relu
, Reshape
, FullyConnected 735 80, Logit
, FullyConnected 80 10, Logit]
'[ 'D2 28 28, 'D3 28 28 1
, 'D3 28 28 15, 'D3 14 14 15, 'D3 14 14 15
, 'D3 14 14 15, 'D3 7 7 15, 'D3 7 7 15
, 'D1 735
, 'D1 80, 'D1 80
, 'D1 10, 'D1 10]
'[ Reshape,
Concat ('D3 28 28 1) Trivial ('D3 28 28 14) (InceptionMini 28 28 1 5 9),
Pooling 2 2 2 2, Relu,
Concat ('D3 14 14 3) (Convolution 15 3 1 1 1 1) ('D3 14 14 15) (InceptionMini 14 14 15 5 10), Crop 1 1 1 1, Pooling 3 3 3 3, Relu,
Reshape, FL 288 80, FL 80 10 ]
'[ 'D2 28 28, 'D3 28 28 1,
'D3 28 28 15, 'D3 14 14 15, 'D3 14 14 15, 'D3 14 14 18,
'D3 12 12 18, 'D3 4 4 18, 'D3 4 4 18,
'D1 288, 'D1 80, 'D1 10 ]
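-- Note: after the 1-pixel Crop and the 3x3 pool the image is 'D3 4 4 18, which
-- the final Reshape flattens to 'D1 288 (4 * 4 * 18) for the two FL stages.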
randomMnist :: MonadRandom m => m MNIST
randomMnist = randomNetwork

grenade.cabal

@@ -61,6 +61,7 @@ library
Grenade.Layers.Concat
Grenade.Layers.Convolution
Grenade.Layers.Crop
Grenade.Layers.Deconvolution
Grenade.Layers.Dropout
Grenade.Layers.Elu
Grenade.Layers.FullyConnected

src/Grenade/Layers.hs

@@ -2,6 +2,7 @@ module Grenade.Layers (
module Grenade.Layers.Concat
, module Grenade.Layers.Convolution
, module Grenade.Layers.Crop
, module Grenade.Layers.Deconvolution
, module Grenade.Layers.Elu
, module Grenade.Layers.FullyConnected
, module Grenade.Layers.Inception
@@ -19,6 +20,7 @@ module Grenade.Layers (
import Grenade.Layers.Concat
import Grenade.Layers.Convolution
import Grenade.Layers.Crop
import Grenade.Layers.Deconvolution
import Grenade.Layers.Elu
import Grenade.Layers.Pad
import Grenade.Layers.FullyConnected

src/Grenade/Layers/Convolution.hs

@@ -141,62 +141,6 @@ instance ( KnownNat channels
let mm = konst 0
return $ Convolution wN mm
-- | A two dimensional image may have a convolution filter applied to it
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat filters
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((outputRows - 1) * strideRows) ~ (inputRows - kernelRows)
, ((outputCols - 1) * strideCols) ~ (inputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * 1)
, KnownNat (outputRows * filters)
) => Layer (Convolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) where
type Tape (Convolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) = S ('D2 inputRows inputCols)
runForwards (Convolution kernel _) (S2D input) =
let ex = extract input
ek = extract kernel
kx = fromIntegral $ natVal (Proxy :: Proxy kernelRows)
ky = fromIntegral $ natVal (Proxy :: Proxy kernelCols)
sx = fromIntegral $ natVal (Proxy :: Proxy strideRows)
sy = fromIntegral $ natVal (Proxy :: Proxy strideCols)
ox = fromIntegral $ natVal (Proxy :: Proxy outputRows)
oy = fromIntegral $ natVal (Proxy :: Proxy outputCols)
c = im2col kx ky sx sy ex
mt = c LA.<> ek
r = col2vid 1 1 1 1 ox oy mt
rs = fromJust . create $ r
in (S2D input, S3D rs)
runBackwards (Convolution kernel _) (S2D input) (S3D dEdy) =
let ex = extract input
ix = fromIntegral $ natVal (Proxy :: Proxy inputRows)
iy = fromIntegral $ natVal (Proxy :: Proxy inputCols)
kx = fromIntegral $ natVal (Proxy :: Proxy kernelRows)
ky = fromIntegral $ natVal (Proxy :: Proxy kernelCols)
sx = fromIntegral $ natVal (Proxy :: Proxy strideRows)
sy = fromIntegral $ natVal (Proxy :: Proxy strideCols)
ox = fromIntegral $ natVal (Proxy :: Proxy outputRows)
oy = fromIntegral $ natVal (Proxy :: Proxy outputCols)
c = im2col kx ky sx sy ex
eo = extract dEdy
ek = extract kernel
vs = vid2col 1 1 1 1 ox oy eo
kN = fromJust . create $ tr c LA.<> vs
dW = vs LA.<> tr ek
xW = col2im kx ky sx sy ix iy dW
in (Convolution' kN, S2D . fromJust . create $ xW)
-- | A three dimensional image (or 2d with many channels) can have
-- an appropriately sized convolution filter run across it.
instance ( KnownNat kernelRows
@@ -258,3 +202,75 @@ instance ( KnownNat kernelRows
xW = col2vid kx ky sx sy ix iy dW
in (Convolution' kN, S3D . fromJust . create $ xW)
-- | A two dimensional image may have a convolution filter applied to it
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat filters
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((outputRows - 1) * strideRows) ~ (inputRows - kernelRows)
, ((outputCols - 1) * strideCols) ~ (inputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * 1)
, KnownNat (outputRows * filters)
) => Layer (Convolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) where
type Tape (Convolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) = S ('D3 inputRows inputCols 1)
runForwards c (S2D input) =
runForwards c (S3D input :: S ('D3 inputRows inputCols 1))
runBackwards c tape grads =
case runBackwards c tape grads of
(c', S3D back :: S ('D3 inputRows inputCols 1)) -> (c', S2D back)
-- | A two dimensional image may have a convolution filter applied to it producing
-- a two dimensional image if both channels and filters are 1.
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((outputRows - 1) * strideRows) ~ (inputRows - kernelRows)
, ((outputCols - 1) * strideCols) ~ (inputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * 1)
, KnownNat (outputRows * 1)
) => Layer (Convolution 1 1 kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D2 outputRows outputCols) where
type Tape (Convolution 1 1 kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D2 outputRows outputCols) = S ('D3 inputRows inputCols 1)
runForwards c (S2D input) =
case runForwards c (S3D input :: S ('D3 inputRows inputCols 1)) of
(tps, S3D back :: S ('D3 outputRows outputCols 1)) -> (tps, S2D back)
runBackwards c tape (S2D grads) =
case runBackwards c tape (S3D grads :: S ('D3 outputRows outputCols 1)) of
(c', S3D back :: S ('D3 inputRows inputCols 1)) -> (c', S2D back)
-- | A three dimensional image can produce a 2D image from a convolution with 1 filter
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, KnownNat channels
, ((outputRows - 1) * strideRows) ~ (inputRows - kernelRows)
, ((outputCols - 1) * strideCols) ~ (inputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * channels)
, KnownNat (outputRows * 1)
) => Layer (Convolution channels 1 kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D2 outputRows outputCols) where
type Tape (Convolution channels 1 kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D2 outputRows outputCols) = S ('D3 inputRows inputCols channels)
runForwards c input =
case runForwards c input of
(tps, S3D back :: S ('D3 outputRows outputCols 1)) -> (tps, S2D back)
runBackwards c tape (S2D grads) =
runBackwards c tape (S3D grads :: S ('D3 outputRows outputCols 1))

src/Grenade/Layers/Deconvolution.hs

@@ -0,0 +1,275 @@
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
module Grenade.Layers.Deconvolution (
Deconvolution (..)
, Deconvolution' (..)
, randomDeconvolution
) where
import Control.Monad.Random hiding ( fromList )
import Data.Maybe
import Data.Proxy
import Data.Serialize
import Data.Singletons.TypeLits
import GHC.TypeLits
import Numeric.LinearAlgebra hiding ( uniformSample, konst )
import qualified Numeric.LinearAlgebra as LA
import Numeric.LinearAlgebra.Static hiding ((|||), build, toRows)
import Grenade.Core
import Grenade.Layers.Internal.Convolution
import Grenade.Layers.Internal.Update
-- | A Deconvolution layer for a neural network.
-- This uses the im2col trick popularised by Caffe, which essentially turns the
-- many-nested-loop Deconvolution into a single matrix multiplication.
--
-- The Deconvolution layer takes all of the kernels for the Deconvolution, which are flattened
-- and then put into columns in the matrix.
--
-- The kernel size dictates which input and output sizes will "fit". Fitting the equation:
-- `out = (in - 1) * stride + kernel` for both dimensions.
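-- For example, an 11x11 input with an 8x8 kernel and stride 2 gives
-- (11 - 1) * 2 + 8 = 28, which is how the gan-mnist generator reaches 28x28.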
--
-- One probably shouldn't build their own layer, but rather use the randomDeconvolution function.
data Deconvolution :: Nat -- Number of channels, for the first layer this could be RGB for instance.
-> Nat -- Number of filters, this is the number of channels output by the layer.
-> Nat -- The number of rows in the kernel filter
-> Nat -- The number of columns in the kernel filter
-> Nat -- The row stride of the Deconvolution filter
-> Nat -- The column stride of the Deconvolution filter
-> * where
Deconvolution :: ( KnownNat channels
, KnownNat filters
, KnownNat kernelRows
, KnownNat kernelColumns
, KnownNat strideRows
, KnownNat strideColumns
, KnownNat kernelFlattened
, kernelFlattened ~ (kernelRows * kernelColumns * filters))
=> !(L kernelFlattened channels) -- The kernel filter weights
-> !(L kernelFlattened channels) -- The last kernel update (or momentum)
-> Deconvolution channels filters kernelRows kernelColumns strideRows strideColumns
data Deconvolution' :: Nat -- Number of channels, for the first layer this could be RGB for instance.
-> Nat -- Number of filters, this is the number of channels output by the layer.
-> Nat -- The number of rows in the kernel filter
-> Nat -- The number of columns in the kernel filter
-> Nat -- The row stride of the Deconvolution filter
-> Nat -- The column stride of the Deconvolution filter
-> * where
Deconvolution' :: ( KnownNat channels
, KnownNat filters
, KnownNat kernelRows
, KnownNat kernelColumns
, KnownNat strideRows
, KnownNat strideColumns
, KnownNat kernelFlattened
, kernelFlattened ~ (kernelRows * kernelColumns * filters))
=> !(L kernelFlattened channels) -- The kernel filter gradient
-> Deconvolution' channels filters kernelRows kernelColumns strideRows strideColumns
instance Show (Deconvolution c f k k' s s') where
show (Deconvolution a _) = renderConv a
where
renderConv mm =
let m = extract mm
ky = fromIntegral $ natVal (Proxy :: Proxy k)
rs = LA.toColumns m
ms = map (take ky) $ toLists . reshape ky <$> rs
render n' | n' <= 0.2 = ' '
| n' <= 0.4 = '.'
| n' <= 0.6 = '-'
| n' <= 0.8 = '='
| otherwise = '#'
px = (fmap . fmap . fmap) render ms
in unlines $ foldl1 (zipWith (\a' b' -> a' ++ " | " ++ b')) $ px
randomDeconvolution :: ( MonadRandom m
, KnownNat channels
, KnownNat filters
, KnownNat kernelRows
, KnownNat kernelColumns
, KnownNat strideRows
, KnownNat strideColumns
, KnownNat kernelFlattened
, kernelFlattened ~ (kernelRows * kernelColumns * filters))
=> m (Deconvolution channels filters kernelRows kernelColumns strideRows strideColumns)
randomDeconvolution = do
s <- getRandom
let wN = uniformSample s (-1) 1
mm = konst 0
return $ Deconvolution wN mm
instance ( KnownNat channels
, KnownNat filters
, KnownNat kernelRows
, KnownNat kernelColumns
, KnownNat strideRows
, KnownNat strideColumns
, KnownNat (kernelRows * kernelColumns * filters)
) => UpdateLayer (Deconvolution channels filters kernelRows kernelColumns strideRows strideColumns) where
type Gradient (Deconvolution channels filters kernelRows kernelCols strideRows strideCols) = (Deconvolution' channels filters kernelRows kernelCols strideRows strideCols)
runUpdate LearningParameters {..} (Deconvolution oldKernel oldMomentum) (Deconvolution' kernelGradient) =
let (newKernel, newMomentum) = decendMatrix learningRate learningMomentum learningRegulariser oldKernel kernelGradient oldMomentum
in Deconvolution newKernel newMomentum
createRandom = randomDeconvolution
instance ( KnownNat channels
, KnownNat filters
, KnownNat kernelRows
, KnownNat kernelColumns
, KnownNat strideRows
, KnownNat strideColumns
, KnownNat (kernelRows * kernelColumns * filters)
) => Serialize (Deconvolution channels filters kernelRows kernelColumns strideRows strideColumns) where
put (Deconvolution w _) = putListOf put . toList . flatten . extract $ w
get = do
let f = fromIntegral $ natVal (Proxy :: Proxy channels)
wN <- maybe (fail "Vector of incorrect size") return . create . reshape f . LA.fromList =<< getListOf get
let mm = konst 0
return $ Deconvolution wN mm
-- | A two dimensional image may have a Deconvolution filter applied to it
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat filters
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((inputRows - 1) * strideRows) ~ (outputRows - kernelRows)
, ((inputCols - 1) * strideCols) ~ (outputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * filters)
, KnownNat (outputRows * filters)
) => Layer (Deconvolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) where
type Tape (Deconvolution 1 filters kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D3 outputRows outputCols filters) = S ('D3 inputRows inputCols 1)
runForwards c (S2D input) =
runForwards c (S3D input :: S ('D3 inputRows inputCols 1))
runBackwards c tape grads =
case runBackwards c tape grads of
(c', S3D back :: S ('D3 inputRows inputCols 1)) -> (c', S2D back)
-- | A two dimensional image may have a Deconvolution filter applied to it,
-- producing a two dimensional image, as both channels and filters are 1
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((inputRows - 1) * strideRows) ~ (outputRows - kernelRows)
, ((inputCols - 1) * strideCols) ~ (outputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * 1)
, KnownNat (outputRows * 1)
) => Layer (Deconvolution 1 1 kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D2 outputRows outputCols) where
type Tape (Deconvolution 1 1 kernelRows kernelCols strideRows strideCols) ('D2 inputRows inputCols) ('D2 outputRows outputCols) = S ('D3 inputRows inputCols 1)
runForwards c (S2D input) =
case runForwards c (S3D input :: S ('D3 inputRows inputCols 1)) of
(tps, S3D fore :: S ('D3 outputRows outputCols 1)) -> (tps, S2D fore)
runBackwards c tape (S2D grads) =
case runBackwards c tape (S3D grads :: S ('D3 outputRows outputCols 1)) of
(c', S3D back :: S ('D3 inputRows inputCols 1)) -> (c', S2D back)
-- | A three dimensional image can produce a 2D image from a Deconvolution with 1 filter
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, ((inputRows - 1) * strideRows) ~ (outputRows - kernelRows)
, ((inputCols - 1) * strideCols) ~ (outputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * 1)
, KnownNat (outputRows * 1)
, KnownNat channels
) => Layer (Deconvolution channels 1 kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D2 outputRows outputCols) where
type Tape (Deconvolution channels 1 kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D2 outputRows outputCols) = S ('D3 inputRows inputCols channels)
runForwards c input =
case runForwards c input of
(tps, S3D fore :: S ('D3 outputRows outputCols 1)) -> (tps, S2D fore)
runBackwards c tape (S2D grads) =
runBackwards c tape (S3D grads :: S ('D3 outputRows outputCols 1))
-- | A three dimensional image (or 2d with many channels) can have
-- an appropriately sized Deconvolution filter run across it.
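-- The forward pass is the transpose of a convolution's: the input is unfolded
-- with vid2col (using a 1x1 window), multiplied by the transposed kernel
-- matrix, and folded back into an image with col2vid.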
instance ( KnownNat kernelRows
, KnownNat kernelCols
, KnownNat filters
, KnownNat strideRows
, KnownNat strideCols
, KnownNat inputRows
, KnownNat inputCols
, KnownNat outputRows
, KnownNat outputCols
, KnownNat channels
, ((inputRows - 1) * strideRows) ~ (outputRows - kernelRows)
, ((inputCols - 1) * strideCols) ~ (outputCols - kernelCols)
, KnownNat (kernelRows * kernelCols * filters)
, KnownNat (outputRows * filters)
) => Layer (Deconvolution channels filters kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D3 outputRows outputCols filters) where
type Tape (Deconvolution channels filters kernelRows kernelCols strideRows strideCols) ('D3 inputRows inputCols channels) ('D3 outputRows outputCols filters) = S ('D3 inputRows inputCols channels)
runForwards (Deconvolution kernel _) (S3D input) =
let ex = extract input
ek = extract kernel
ix = fromIntegral $ natVal (Proxy :: Proxy inputRows)
iy = fromIntegral $ natVal (Proxy :: Proxy inputCols)
kx = fromIntegral $ natVal (Proxy :: Proxy kernelRows)
ky = fromIntegral $ natVal (Proxy :: Proxy kernelCols)
sx = fromIntegral $ natVal (Proxy :: Proxy strideRows)
sy = fromIntegral $ natVal (Proxy :: Proxy strideCols)
ox = fromIntegral $ natVal (Proxy :: Proxy outputRows)
oy = fromIntegral $ natVal (Proxy :: Proxy outputCols)
c = vid2col 1 1 1 1 ix iy ex
mt = c LA.<> tr ek
r = col2vid kx ky sx sy ox oy mt
rs = fromJust . create $ r
in (S3D input, S3D rs)
runBackwards (Deconvolution kernel _) (S3D input) (S3D dEdy) =
let ex = extract input
ix = fromIntegral $ natVal (Proxy :: Proxy inputRows)
iy = fromIntegral $ natVal (Proxy :: Proxy inputCols)
kx = fromIntegral $ natVal (Proxy :: Proxy kernelRows)
ky = fromIntegral $ natVal (Proxy :: Proxy kernelCols)
sx = fromIntegral $ natVal (Proxy :: Proxy strideRows)
sy = fromIntegral $ natVal (Proxy :: Proxy strideCols)
ox = fromIntegral $ natVal (Proxy :: Proxy outputRows)
oy = fromIntegral $ natVal (Proxy :: Proxy outputCols)
c = vid2col 1 1 1 1 ix iy ex
eo = extract dEdy
ek = extract kernel
vs = vid2col kx ky sx sy ox oy eo
kN = fromJust . create . tr $ tr c LA.<> vs
dW = vs LA.<> ek
xW = col2vid 1 1 1 1 ix iy dW
in (Deconvolution' kN, S3D . fromJust . create $ xW)

src/Grenade/Layers/Inception.hs

@@ -19,6 +19,8 @@ complex multiconvolution size networks.
-}
module Grenade.Layers.Inception (
Inception
, InceptionMini
, Resnet
) where
import GHC.TypeLits
@@ -27,6 +29,8 @@ import Grenade.Core
import Grenade.Layers.Convolution
import Grenade.Layers.Pad
import Grenade.Layers.Concat
import Grenade.Layers.Merge
import Grenade.Layers.Trivial
-- | Type of an inception layer.
--
@@ -41,10 +45,10 @@ import Grenade.Layers.Concat
-- The network is effectively padded before each convolution filter
-- such that the output has the same x and y dimensions as the input.
type Inception rows cols channels chx chy chz
= Network '[ Concat ('D3 rows cols (chx + chy)) (InceptionS rows cols channels chx chy) ('D3 rows cols chz) (Inception7x7 rows cols channels chz) ]
= Network '[ Concat ('D3 rows cols (chx + chy)) (InceptionMini rows cols channels chx chy) ('D3 rows cols chz) (Inception7x7 rows cols channels chz) ]
'[ 'D3 rows cols channels, 'D3 rows cols (chx + chy + chz) ]
type InceptionS rows cols channels chx chy
type InceptionMini rows cols channels chx chy
= Network '[ Concat ('D3 rows cols chx) (Inception3x3 rows cols channels chx) ('D3 rows cols chy) (Inception5x5 rows cols channels chy) ]
'[ 'D3 rows cols channels, 'D3 rows cols (chx + chy) ]
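-- For instance, the mnist example in this commit uses 'InceptionMini 28 28 1 5 9'
-- to take a 'D3 28 28 1 image to a 'D3 28 28 14 one (5 + 9 concatenated channels).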
@@ -60,3 +64,4 @@ type Inception7x7 rows cols channels chx
= Network '[ Pad 3 3 3 3, Convolution channels chx 7 7 1 1 ]
'[ 'D3 rows cols channels, 'D3 (rows + 6) (cols + 6) channels, 'D3 rows cols chx ]
type Resnet branch = Merge Trivial branch
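-- 'Resnet' is the summing counterpart of the Concat-based types above: it runs
-- 'branch' alongside the identity ('Trivial') and pointwise adds the activations,
-- so 'branch' must preserve the shape of its input.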