Using dependent types in the deeper functions and
requiring a Proxy to reach them meant we required
dictionary passing to get the Nats. This made the
pad and crop layers almost 1000 times slower than
they should have been.
Changes shapes to get rid of the Vector, all data is
now held in contiguous memory.
Add fast c implementations for pooling layers.
Now does mnist on my laptop in 12 minutes.