
# Layers

In a typical deep neural network, the highest-level building blocks, which perform different kinds of transformations on their inputs, are called layers. A layer wraps a group of nodes and performs a specific mathematical computation, offering a shortcut for building a more complex neural network.

In Marian, for example, the `mlp::dense` layer represents a fully connected layer, which implements the operation `output = activation(input * weight + bias)`. A dense layer can be constructed in the graph with the following code:

```cpp
// add input node x
auto x = graph->constant({120, 5}, inits::fromVector(inputData));
// construct a dense layer in the graph
auto layer1 = mlp::dense()
      ("prefix", "layer1")                  // prefix name is layer1
      ("dim", 5)                            // output dimension is 5
      ("activation", (int)mlp::act::tanh)   // activation function is tanh
      .construct(graph)->apply(x);          // construct this layer in the graph
                                            // and link node x as the input
```

The options are passed to the layer as `(key, value)` pairs, where `key` is a predefined option name and `value` is the option value. `construct()` is then called to create a layer instance in the graph, and `apply()` links the input with this layer.

Alternatively, the same layer can be created by defining nodes and operations directly:

```cpp
// construct a dense layer using nodes
auto W1 = graph->param("W1", {120, 5}, inits::glorotUniform());
auto b1 = graph->param("b1", {1, 5}, inits::zeros());
auto h = tanh(affine(x, W1, b1));
```
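
Note that `affine(x, W1, b1)` computes `x * W1 + b1`, so the two snippets build the same computation; the layer API additionally takes care of parameter creation and naming via the `prefix` option.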

There are four categories of layers implemented in Marian, described in the sections below.

## Convolution layer

To use a convolution layer, you first need to install NVIDIA cuDNN. The convolution layer supported by Marian is a 2D convolution layer, which creates a convolution kernel that is convolved with the input. The options that can be passed to a convolution layer are the following:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `kernel-dims` | The height and width of the kernel | `std::pair<int, int>` | None |
| `kernel-num` | The number of kernels | `int` | None |
| `paddings` | The height and width of paddings | `std::pair<int, int>` | (0,0) |
| `strides` | The height and width of strides | `std::pair<int, int>` | (1,1) |

Example:

```cpp
// construct a convolution layer
auto conv_1 = convolution(graph)              // pass graph pointer to the layer
      ("prefix", "conv_1")                    // prefix name is conv_1
      ("kernel-dims", std::make_pair(3,3))    // kernel is 3x3
      ("kernel-num", 32)                      // number of kernels is 32
      .apply(x);                              // link node x as the input
```
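
The `paddings` and `strides` options from the table above can be set the same way; a minimal sketch with illustrative values:

```cpp
// construct a convolution layer with explicit paddings and strides
// (the padding and stride values here are illustrative)
auto conv_2 = convolution(graph)              // pass graph pointer to the layer
      ("prefix", "conv_2")                    // prefix name is conv_2
      ("kernel-dims", std::make_pair(3,3))    // kernel is 3x3
      ("kernel-num", 32)                      // number of kernels is 32
      ("paddings", std::make_pair(1,1))       // pad height and width by 1
      ("strides", std::make_pair(2,2))        // move the kernel by 2 in each direction
      .apply(x);                              // link node x as the input
```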

## MLP layers

Marian offers `mlp::mlp`, which creates a multilayer perceptron (MLP) network. It is a container that can stack multiple layers using the `push_back()` function. There are two types of MLP layers provided by Marian: `mlp::dense` and `mlp::output`.

The `mlp::dense` layer, as introduced before, is a fully connected layer, and it accepts the following options:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `dim` | Output dimension | `int` | None |
| `layer-normalization` | Whether to normalise the layer output or not | `bool` | `false` |
| `nematus-normalization` | Whether to use Nematus layer normalisation or not | `bool` | `false` |
| `activation` | Activation function | `int` | `mlp::act::linear` |

The available activation functions for `mlp` are `mlp::act::linear`, `mlp::act::tanh`, `mlp::act::sigmoid`, `mlp::act::ReLU`, `mlp::act::LeakyReLU`, `mlp::act::PReLU`, and `mlp::act::swish`.

Example:

```cpp
// construct a mlp::dense layer
auto dense_layer = mlp::dense()
      ("prefix", "dense_layer")                 // prefix name is dense_layer
      ("dim", 3)                                // output dimension is 3
      ("activation", (int)mlp::act::sigmoid)    // activation function is sigmoid
      .construct(graph)->apply(x);              // construct this layer in the graph
                                                // and link node x as the input
```

The `mlp::output` layer is used, as the name suggests, to construct an output layer. You can tie embedding layers to an `mlp::output` layer using `tieTransposed()`, or set shortlisted words using `setShortlist()`. The general options of the `mlp::output` layer are listed below:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `dim` | Output dimension | `int` | None |
| `vocab` | File path to the factored vocabulary | `std::string` | None |
| `output-omit-bias` | Whether this layer has a bias parameter | `bool` | `true` |
| `lemma-dim-emb` | Re-embedding dimension of lemma in factors, must be used with the `vocab` option | `int` | 0 |
| `output-approx-knn` | Parameters for LSH-based output approximation, i.e., k (the first element) and nbit (the second element) | `std::vector<int>` | None |

Example:

```cpp
// construct a mlp::output layer
auto last = mlp::output()
      ("prefix", "last")    // prefix name is last
      ("dim", 5);           // output dimension is 5
```

Finally, an example showing how to create an `mlp::mlp` network containing multiple layers:

```cpp
// construct a mlp::mlp network
auto mlp_networks = mlp::mlp()                                       // construct an mlp container
                     .push_back(mlp::dense()                         // construct a dense layer
                                 ("prefix", "dense")                 // prefix name is dense
                                 ("dim", 5)                          // dimension is 5
                                 ("activation", (int)mlp::act::tanh))// activation function is tanh
                     .push_back(mlp::output()                        // construct an output layer
                                 ("dim", 5))                         // dimension is 5
                     ("prefix", "mlp_network")                       // prefix name is mlp_network
                     .construct(graph);                              // construct the mlp network in the graph
```

## RNN layers

Marian offers `rnn::rnn` for creating a recurrent neural network (RNN). Just like `mlp::mlp`, `rnn::rnn` is a container that can stack multiple layers using the `push_back()` function. Unlike MLP layers, Marian only provides cell-level APIs to construct RNNs. An RNN cell processes a single timestep instead of a whole batch of input sequences. There are two types of RNN layers provided by Marian: `rnn::cell` and `rnn::stacked_cell`.

The `rnn::cell` is the base component of an RNN, and `rnn::stacked_cell` is a stack of `rnn::cell`s. The option specific to the `rnn::cell` layer is listed below:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `type` | Type of RNN cell | `std::string` | None |

There are nine types of RNN cells provided by Marian: `gru`, `gru-nematus`, `lstm`, `mlstm`, `mgru`, `tanh`, `relu`, `sru`, and `ssru`. The general options for all RNN cells are the following:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `dimInput` | Input dimension | `int` | None |
| `dimState` | Dimension of hidden state | `int` | None |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `layer-normalization` | Whether to normalise the layer output or not | `bool` | `false` |
| `dropout` | Dropout probability | `float` | 0 |
| `transition` | Whether it is a transition layer | `bool` | `false` |
| `final` | Whether it is an RNN final layer or hidden layer | `bool` | `false` |

Not all the options listed above are available for all cells. For example, the `final` option is only used for `gru` and `gru-nematus` cells.

Example for `rnn::cell`:

```cpp
// construct a rnn cell
auto rnn_cell = rnn::cell()
         ("type", "gru")              // type of rnn cell is gru
         ("prefix", "gru_cell")       // prefix name is gru_cell
         ("final", false);            // this cell is not the final layer
```

Example for `rnn::stacked_cell`:

```cpp
// construct a stack of rnn cells
auto highCell = rnn::stacked_cell();
// for loop to add rnn cells into the stack
for(size_t j = 1; j <= 512; j++) {
    auto paramPrefix = "cell" + std::to_string(j);
    highCell.push_back(rnn::cell()("prefix", paramPrefix));
}
```

The list of available options for `rnn::rnn` layers:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `type` | Type of RNN layer | `std::string` | `gru` |
| `direction` | RNN direction | `int` | `rnn::dir::forward` |
| `dimInput` | Input dimension | `int` | None |
| `dimState` | Dimension of hidden state | `int` | None |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `layer-normalization` | Whether to normalise the layer output or not | `bool` | `false` |
| `nematus-normalization` | Whether to use Nematus layer normalisation or not | `bool` | `false` |
| `dropout` | Dropout probability | `float` | 0 |
| `skip` | Whether to use skip connections | `bool` | `false` |
| `skipFirst` | Whether to use skip connections for the layer(s) with index > 0 | `bool` | `false` |

An example for `rnn::rnn()`:

```cpp
// construct a rnn::rnn() container
auto rnn_container = rnn::rnn(
               "type", "gru",                  // type of rnn cell is gru
               "prefix", "rnn_layers",         // prefix name is rnn_layers
               "dimInput", 10,                 // input dimension is 10
               "dimState", 5,                  // dimension of hidden state is 5
               "dropout", 0,                   // dropout probability is 0
               "layer-normalization", false)   // do not normalise the layer output
               .push_back(rnn::cell())         // add a rnn::cell in this rnn container
               .construct(graph);              // construct this rnn container in the graph
```

Marian provides four RNN directions in the `rnn::dir` enumerator: `rnn::dir::forward`, `rnn::dir::backward`, `rnn::dir::alternating_forward`, and `rnn::dir::alternating_backward`. For `rnn::rnn()`, you can use `transduce()` to map the input state to the output state.

An example for `transduce()`:

```cpp
auto output = rnn.construct(graph)->transduce(input);
```
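
Putting the pieces together, a minimal sketch of a backward GRU transducing an input sequence tensor; `input` is assumed to be an embedded source sequence already present in the graph, and the dimensions are illustrative:

```cpp
// a sketch: a single-layer backward GRU transducing an input sequence
auto rnn_bw = rnn::rnn(
         "type", "gru",                          // type of rnn cell is gru
         "prefix", "rnn_bw",                     // prefix name is rnn_bw
         "dimInput", 10,                         // input dimension is 10
         "dimState", 5,                          // dimension of hidden state is 5
         "direction", (int)rnn::dir::backward)   // process the sequence right to left
         .push_back(rnn::cell());                // add a rnn::cell in this container
auto context = rnn_bw.construct(graph)->transduce(input);
```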

## Embedding layer

Marian provides `embedding`, a shortcut to construct a regular embedding layer for word embeddings. The following options are available for embedding layers:

| Option Name | Definition | Value Type | Default Value |
| ----------- | ---------- | ---------- | ------------- |
| `dimVocab` | Size of vocabulary | `int` | None |
| `dimEmb` | Size of embedding vector | `int` | None |
| `dropout` | Dropout probability | `float` | 0 |
| `inference` | Whether it is used for inference | `bool` | `false` |
| `prefix` | Prefix name (used to form the parameter names) | `std::string` | None |
| `fixed` | Whether this layer is fixed (not trainable) | `bool` | `false` |
| `dimFactorEmb` | Size of factored embedding vector | `int` | None |
| `factorsCombine` | Which strategy is chosen to combine the factor embeddings; it can be `"concat"` | `std::string` | None |
| `vocab` | File path to the factored vocabulary | `std::string` | None |
| `embFile` | Paths to the factored embedding vectors | `std::vector<std::string>` | None |
| `normalization` | Whether to normalise the layer output or not | `bool` | `false` |

Example to construct an embedding layer:

```cpp
// construct an embedding layer
auto embedding_layer = embedding()
        ("prefix", "embedding")       // prefix name is embedding
        ("dimVocab", 1024)            // vocabulary size is 1024
        ("dimEmb", 512)               // size of embedding vector is 512
        .construct(graph);            // construct this embedding layer in the graph
```
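
Once constructed, the layer can be applied to word indices to obtain their embedding vectors. A minimal sketch using the `applyIndices()` interface; the word ids and the output shape layout (`{dimWords, dimBatch, dimEmb}`) are assumptions for illustration:

```cpp
// a sketch: embed a toy sequence of word ids (the ids are illustrative)
std::vector<WordIndex> ids = {12, 7, 256, 3};                     // 4 tokens, 1 sentence
auto embedded = embedding_layer->applyIndices(ids, {4, 1, 512});  // assumed layout {dimWords, dimBatch, dimEmb}
```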