mirror of
https://github.com/marian-nmt/marian.git
synced 2024-09-17 09:47:34 +03:00
Layer documentation (#892)
* More examples for MLP layers and docs about RNN layers
* Docs about embedding layer and more doxygen code docs
* Add layer and factors docs into index.rst
* Update layer documentation
* Fix typos

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
Co-authored-by: Graeme Nail <graemenail.work@gmail.com>
This commit is contained in:
parent
3b458b044e
commit
71b5454b9e
@ -1,4 +1,4 @@
|
||||
# Using marian with factors
|
||||
# Using Marian with factors
|
||||
|
||||
Following this README should allow the user to train a model with source and/or target side factors. To train with factors, the data must be formatted in a certain way. A special vocabulary file format is also required, and its extension should be `.fsv` as providing a source and/or target vocabulary file with this extension is what triggers the usage of source and/or target factors. See details below.
|
||||
|
||||
|
@ -14,7 +14,8 @@ This is developer documentation. User documentation is available at https://mari
|
||||
|
||||
graph
|
||||
operators
|
||||
|
||||
layer
|
||||
factors
|
||||
api/library_index
|
||||
|
||||
contributing
|
||||
|
241
doc/layer.md
Normal file
@ -0,0 +1,241 @@
|
||||
# Layers

In a typical deep neural network, the highest-level building blocks, which perform different kinds
of transformations on their inputs, are called layers. A layer wraps a group of nodes and performs a
specific mathematical computation, offering a shortcut for building a more complex neural network.

In Marian, for example, the `mlp::dense` layer represents a fully connected layer, which implements
the operation `output = activation(input * weight + bias)`. A dense layer in the graph can be
constructed with the following code:
```cpp
// add input node x
auto x = graph->constant({120,5}, inits::fromVector(inputData));
// construct a dense layer in the graph
auto layer1 = mlp::dense()
    ("prefix", "layer1")                  // prefix name is layer1
    ("dim", 5)                            // output dimension is 5
    ("activation", (int)mlp::act::tanh)   // activation function is tanh
    .construct(graph)->apply(x);          // construct this layer in graph
                                          // and link node x as the input
```
The options are passed to the layer using pairs of `(key, value)`, where `key` is a predefined
option, and `value` is the option value. Then `construct()` is called to create a layer instance in
the graph, and `apply()` to link the input with this layer.

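The chained `(key, value)` calls can look unusual at first. The following self-contained sketch shows one way such a call chain could be implemented. It is not Marian's actual `Accumulator` code: `OptionAccumulator`, `has`, and `get` are hypothetical names chosen for illustration, and values are stored as strings only to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for an options accumulator (not Marian's class).
class OptionAccumulator {
  std::map<std::string, std::string> options_;

public:
  // Each call stores one (key, value) pair and returns *this,
  // which is what allows calls to be chained: acc("a", 1)("b", 2).
  template <typename T>
  OptionAccumulator& operator()(const std::string& key, const T& value) {
    std::ostringstream out;
    out << value;
    options_[key] = out.str();
    return *this;
  }

  // True if the option has been set.
  bool has(const std::string& key) const { return options_.count(key) > 0; }

  // Read a stored value back, converting it to the requested type.
  template <typename T>
  T get(const std::string& key) const {
    auto it = options_.find(key);
    assert(it != options_.end() && "option was never set");
    std::istringstream in(it->second);
    T value{};
    in >> value;
    return value;
  }
};
```

With this sketch, `OptionAccumulator()("prefix", "layer1")("dim", 5)` yields an object whose options can be read back with `get<std::string>("prefix")` and `get<int>("dim")`. Returning a reference to the object itself is the design choice that makes the compact chained syntax possible.
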
Alternatively, the same layer can be created by defining nodes and operations directly:
```cpp
// construct a dense layer using nodes
// W1 maps the 5 input features of x (shape {120, 5}) to 5 outputs
auto W1 = graph->param("W1", {5, 5}, inits::glorotUniform());
auto b1 = graph->param("b1", {1, 5}, inits::zeros());
auto h = tanh(affine(x, W1, b1));
```
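To make the arithmetic behind `tanh(affine(x, W1, b1))` concrete, here is a small sketch in plain C++ that does not use Marian at all. The function `denseForward` and the row-major `Matrix` alias are hypothetical helpers introduced only for illustration; the bias is assumed to be broadcast over all rows of the input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // row-major: [rows][cols]

// Computes tanh(x * W + b) for x: [batch, in], W: [in, out], b: [out].
Matrix denseForward(const Matrix& x, const Matrix& W, const std::vector<float>& b) {
  assert(!x.empty() && !W.empty());
  assert(x[0].size() == W.size());  // inner dimensions must agree
  assert(W[0].size() == b.size());  // one bias value per output unit

  const std::size_t batch = x.size(), in = W.size(), out = b.size();
  Matrix h(batch, std::vector<float>(out, 0.0f));
  for(std::size_t i = 0; i < batch; ++i) {
    for(std::size_t j = 0; j < out; ++j) {
      float sum = b[j];  // bias is broadcast over rows
      for(std::size_t k = 0; k < in; ++k)
        sum += x[i][k] * W[k][j];
      h[i][j] = std::tanh(sum);
    }
  }
  return h;
}
```

The dimension checks at the top show the shape constraint a dense layer relies on: the weight matrix has one row per input feature and one column per output unit.
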
There are four categories of layers implemented in Marian, described in the sections below.

## Convolution layer

To use a `convolution` layer, you first need to install [NVIDIA cuDNN](https://developer.nvidia.com/cudnn).
The convolution layer supported by Marian is a 2D
[convolution layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional_layers).
This layer creates a convolution kernel which is convolved with the input. The options that
can be passed to a `convolution` layer are the following:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| kernel-dims   | The height and width of the kernel | `std::pair<int, int>` | `None`|
| kernel-num    | The number of kernels | `int` | `None` |
| paddings      | The height and width of paddings | `std::pair<int, int>` | `(0,0)`|
| strides       | The height and width of strides | `std::pair<int, int>` | `(1,1)` |

Example:
```cpp
// construct a convolution layer
auto conv_1 = convolution(graph)          // pass graph pointer to the layer
    ("prefix", "conv_1")                  // prefix name is conv_1
    ("kernel-dims", std::make_pair(3,3))  // kernel is 3*3
    ("kernel-num", 32)                    // kernel no. is 32
    .apply(x);                            // link node x as the input
```

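The `kernel-dims`, `paddings`, and `strides` options together determine the spatial size of the output. The sketch below applies the usual formula for a 2D convolution without dilation, `out = (in + 2 * padding - kernel) / stride + 1` with integer division. `convOutputDims` is a hypothetical helper, not a Marian function, and it assumes this standard convention applies.

```cpp
#include <cassert>
#include <utility>

// Output (height, width) of a 2D convolution without dilation.
std::pair<int, int> convOutputDims(std::pair<int, int> inputDims,
                                   std::pair<int, int> kernelDims,
                                   std::pair<int, int> paddings = {0, 0},
                                   std::pair<int, int> strides = {1, 1}) {
  assert(strides.first > 0 && strides.second > 0);
  // the padded input must be at least as large as the kernel
  assert(inputDims.first + 2 * paddings.first >= kernelDims.first);
  assert(inputDims.second + 2 * paddings.second >= kernelDims.second);

  int outHeight = (inputDims.first + 2 * paddings.first - kernelDims.first) / strides.first + 1;
  int outWidth = (inputDims.second + 2 * paddings.second - kernelDims.second) / strides.second + 1;
  return {outHeight, outWidth};
}
```

Under this formula, a 28x28 input with a 3x3 kernel and the default paddings `(0,0)` and strides `(1,1)` gives a 26x26 output per kernel.
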
## MLP layers

Marian offers `mlp::mlp`, which creates a
[multilayer perceptron (MLP)](https://en.wikipedia.org/wiki/Multilayer_perceptron) network.
It is a container which can stack multiple layers using the `push_back()` function. There are two
types of MLP layers provided by Marian: `mlp::dense` and `mlp::output`.

The `mlp::dense` layer, as introduced before, is a fully connected layer, and it accepts the
following options:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| dim           | Output dimension | `int` | `None` |
| layer-normalization | Whether to normalise the layer output or not | `bool` | `false` |
| nematus-normalization | Whether to use Nematus layer normalisation or not | `bool` | `false` |
| activation    | Activation function | `int` | `mlp::act::linear` |

The available activation functions for MLP layers are `mlp::act::linear`, `mlp::act::tanh`,
`mlp::act::sigmoid`, `mlp::act::ReLU`, `mlp::act::LeakyReLU`, `mlp::act::PReLU`, and
`mlp::act::swish`.

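For reference, the scalar functions usually associated with these names are sketched below. This illustrates the standard textbook definitions and is not Marian's implementation: the enum `Act` and the function `activate` are hypothetical, and the slope `alpha` (used here for both LeakyReLU and PReLU, although PReLU normally learns it) is an assumed parameter with a placeholder default of 0.01.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the activation names listed above.
enum class Act { linear, tanh, sigmoid, ReLU, LeakyReLU, PReLU, swish };

// Applies the chosen activation to a single value.
float activate(Act act, float x, float alpha = 0.01f) {
  switch(act) {
    case Act::linear:    return x;                              // identity
    case Act::tanh:      return std::tanh(x);
    case Act::sigmoid:   return 1.0f / (1.0f + std::exp(-x));
    case Act::ReLU:      return x > 0.0f ? x : 0.0f;
    case Act::LeakyReLU: // same shape as PReLU, but with a fixed slope
    case Act::PReLU:     return x > 0.0f ? x : alpha * x;
    case Act::swish:     return x / (1.0f + std::exp(-x));      // x * sigmoid(x)
  }
  assert(false && "unknown activation");
  return x;
}
```
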
Example:
```cpp
// construct a mlp::dense layer
auto dense_layer = mlp::dense()
    ("prefix", "dense_layer")               // prefix name is dense_layer
    ("dim", 3)                              // output dimension is 3
    ("activation", (int)mlp::act::sigmoid)  // activation function is sigmoid
    .construct(graph)->apply(x);            // construct this layer in graph and link node x as the input
```

The `mlp::output` layer is used, as the name suggests, to construct an output layer. You can tie
embedding layers to an `mlp::output` layer using `tieTransposed()`, or set shortlisted words using
`setShortlist()`. The general options of the `mlp::output` layer are listed below:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| dim           | Output dimension | `int` | `None` |
| vocab         | File path to the factored vocabulary | `std::string` | `None` |
| output-omit-bias | Whether to omit the bias parameter of this layer | `bool` | `false` |
| lemma-dim-emb | Re-embedding dimension of lemma in factors, must be used with `vocab` option | `int` | `0` |
| output-approx-knn | Parameters for LSH-based output approximation, i.e., `k` (the first element) and `nbit` (the second element) | `std::vector<int>` | `None` |

Example:
```cpp
// construct a mlp::output layer
auto last = mlp::output()
    ("prefix", "last")  // prefix name is last
    ("dim", 5);         // output dimension is 5
```
Finally, an example showing how to create an `mlp::mlp` network containing multiple layers:
```cpp
// construct a mlp::mlp network
auto mlp_networks = mlp::mlp()                // construct an MLP container
    .push_back(mlp::dense()                   // construct a dense layer
        ("prefix", "dense")                   // prefix name is dense
        ("dim", 5)                            // dimension is 5
        ("activation", (int)mlp::act::tanh))  // activation function is tanh
    .push_back(mlp::output()                  // construct an output layer
        ("dim", 5))                           // dimension is 5
    ("prefix", "mlp_network")                 // prefix name is mlp_network
    .construct(graph);                        // construct these MLP layers in graph
```

## RNN layers
Marian offers `rnn::rnn` for creating a [recurrent neural network
(RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network). Just like `mlp::mlp`,
`rnn::rnn` is a container which can stack multiple layers using the `push_back()` function. Unlike
for MLP layers, Marian only provides cell-level APIs for constructing RNNs. An RNN cell processes a
single timestep rather than a whole batch of input sequences. There are two types of RNN layers
provided by Marian: `rnn::cell` and `rnn::stacked_cell`.

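The distinction between a cell and a container can be illustrated without Marian. In the sketch below, `tanhCellStep` (a hypothetical name) advances a plain tanh RNN by one timestep, and `runSequence` plays the role of the container by looping over the sequence. To stay short it uses a scalar input and state; real cells operate on batched tensors, and gated cells such as GRU or LSTM add further terms.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Parameters of a scalar tanh RNN cell: h_t = tanh(w * x_t + u * h_{t-1} + b).
struct TanhCellParams {
  float w;  // input weight
  float u;  // recurrent weight
  float b;  // bias
};

// One timestep: consumes a single input and the previous state.
float tanhCellStep(const TanhCellParams& p, float x, float prevState) {
  return std::tanh(p.w * x + p.u * prevState + p.b);
}

// The container-level loop: applies the cell to every timestep in order
// and returns the state after each step.
std::vector<float> runSequence(const TanhCellParams& p,
                               const std::vector<float>& inputs,
                               float initialState = 0.0f) {
  std::vector<float> states;
  states.reserve(inputs.size());
  float state = initialState;
  for(float x : inputs) {
    state = tanhCellStep(p, x, state);
    states.push_back(state);
  }
  return states;
}
```

Keeping the per-timestep function separate from the loop is what lets the same cell be reused for different traversal orders or stacked with other cells.
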
The `rnn::cell` is the base component of an RNN, and `rnn::stacked_cell` is a stack of `rnn::cell`
layers. The options specific to the `rnn::cell` layer are listed below:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| type          | Type of RNN cell | `std::string` | `None` |

There are nine types of RNN cells provided by Marian: `gru`, `gru-nematus`, `lstm`, `mlstm`, `mgru`,
`tanh`, `relu`, `sru`, `ssru`. The general options for all RNN cells are the following:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| dimInput      | Input dimension | `int` | `None` |
| dimState      | Dimension of hidden state | `int` | `None` |
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| layer-normalization | Whether to normalise the layer output or not | `bool` | `false` |
| dropout       | Dropout probability | `float` | `0` |
| transition    | Whether it is a transition layer | `bool` | `false` |
| final         | Whether it is an RNN final layer or hidden layer | `bool` | `false` |

```{note}
Not all the options listed above are available for all the cells. For example, the `final` option
is only used for `gru` and `gru-nematus` cells.
```

Example for `rnn::cell`:
```cpp
// construct a rnn cell
auto rnn_cell = rnn::cell()
    ("type", "gru")         // type of rnn cell is gru
    ("prefix", "gru_cell")  // prefix name is gru_cell
    ("final", false);       // this cell is not the final layer
```
Example for `rnn::stacked_cell`:
```cpp
// construct a stack of rnn cells
auto highCell = rnn::stacked_cell();
// for loop to add rnn cells into the stack
for(size_t j = 1; j <= 512; j++) {
  auto paramPrefix = "cell" + std::to_string(j);
  highCell.push_back(rnn::cell()("prefix", paramPrefix));
}
```

The list of available options for `rnn::rnn` layers:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| type          | Type of RNN layer | `std::string` | `gru` |
| direction     | RNN direction | `int` | `rnn::dir::forward` |
| dimInput      | Input dimension | `int` | `None` |
| dimState      | Dimension of hidden state | `int` | `None` |
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| layer-normalization | Whether to normalise the layer output or not | `bool` | `false` |
| nematus-normalization | Whether to use Nematus layer normalisation or not | `bool` | `false` |
| dropout       | Dropout probability | `float` | `0` |
| skip          | Whether to use skip connections | `bool` | `false` |
| skipFirst     | Whether to use skip connections for the layer(s) with `index > 0` | `bool` | `false` |

Example for `rnn::rnn()`:
```cpp
// construct a `rnn::rnn()` container
auto rnn_container = rnn::rnn(
    "type", "gru",                 // type of rnn cell is gru
    "prefix", "rnn_layers",        // prefix name is rnn_layers
    "dimInput", 10,                // input dimension is 10
    "dimState", 5,                 // dimension of hidden state is 5
    "dropout", 0,                  // dropout probability is 0
    "layer-normalization", false)  // do not normalise the layer output
    .push_back(rnn::cell())        // add a rnn::cell in this rnn container
    .construct(graph);             // construct this rnn container in graph
```
Marian provides four RNN directions in the `rnn::dir` enumeration: `rnn::dir::forward`,
`rnn::dir::backward`, `rnn::dir::alternating_forward` and `rnn::dir::alternating_backward`.
For `rnn::rnn`, you can use `transduce()` to map the input state to the output state.

An example for `transduce()`:
```cpp
auto output = rnn.construct(graph)->transduce(input);
```

## Embedding layer
Marian provides a shortcut, `embedding`, to construct a regular embedding layer for word embeddings.
For `embedding` layers, the following options are available:

| Option Name   | Definition     | Value Type    | Default Value |
| ------------- |----------------|---------------|---------------|
| dimVocab      | Size of vocabulary | `int` | `None` |
| dimEmb        | Size of embedding vector | `int` | `None` |
| dropout       | Dropout probability | `float` | `0` |
| inference     | Whether it is used for inference | `bool` | `false` |
| prefix        | Prefix name (used to form the parameter names) | `std::string` | `None` |
| fixed         | Whether this layer is fixed (not trainable) | `bool` | `false` |
| dimFactorEmb  | Size of factored embedding vector | `int` | `None` |
| factorsCombine | Which strategy is chosen to combine the factor embeddings; it can be `"concat"` | `std::string` | `None` |
| vocab         | File path to the factored vocabulary | `std::string` | `None` |
| embFile       | Paths to the factored embedding vectors | `std::string` | `None` |
| normalization | Whether to normalise the layer output or not | `bool` | `false` |

Example to construct an embedding layer:
```cpp
// construct an embedding layer
auto embedding_layer = embedding()
    ("prefix", "embedding")  // prefix name is embedding
    ("dimVocab", 1024)       // vocabulary size is 1024
    ("dimEmb", 512)          // size of embedding vector is 512
    .construct(graph);       // construct this embedding layer in graph
```
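
Conceptually, an embedding layer is a lookup into a `dimVocab x dimEmb` matrix in which each word index selects one row. The sketch below shows this lookup in plain C++. `EmbeddingTable` and `lookup` are hypothetical names used only for illustration, and dropout, factors, and masking are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A dimVocab x dimEmb table stored row-major in one flat vector.
struct EmbeddingTable {
  std::size_t dimVocab;
  std::size_t dimEmb;
  std::vector<float> weights;  // size dimVocab * dimEmb

  EmbeddingTable(std::size_t vocab, std::size_t emb)
      : dimVocab(vocab), dimEmb(emb), weights(vocab * emb, 0.0f) {}

  // Returns the embedding vectors for a sequence of word indices,
  // one row of dimEmb values per index.
  std::vector<std::vector<float>> lookup(const std::vector<std::size_t>& wordIds) const {
    std::vector<std::vector<float>> rows;
    rows.reserve(wordIds.size());
    for(std::size_t id : wordIds) {
      assert(id < dimVocab && "word index out of vocabulary range");
      auto begin = weights.begin() + static_cast<std::ptrdiff_t>(id * dimEmb);
      rows.emplace_back(begin, begin + static_cast<std::ptrdiff_t>(dimEmb));
    }
    return rows;
  }
};
```

With the values from the example above, the table would hold 1024 rows of 512 values each, and a sentence of `n` word indices would map to `n` rows.
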
|
@ -1,4 +1,4 @@
|
||||
# Operations in the Expression Graph
|
||||
# Operations in the expression graph
|
||||
|
||||
Operations are responsible for manipulating the elements of an expression graph.
|
||||
In Marian, many useful operations have already been implemented and can be found
|
||||
|
@ -12,6 +12,11 @@ namespace mlp {
|
||||
* Base class for layer factories, can be used in a multi-layer network factory.
|
||||
*/
|
||||
struct LayerFactory : public Factory {
|
||||
/**
|
||||
* Construct a layer instance in a given graph.
|
||||
* @param graph a shared pointer to a graph
|
||||
* @return a shared pointer to the layer object
|
||||
*/
|
||||
virtual Ptr<IUnaryLayer> construct(Ptr<ExpressionGraph> graph) = 0;
|
||||
};
|
||||
|
||||
@ -31,18 +36,24 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
// @TODO: change naming convention
|
||||
/**
|
||||
* A convenient typedef for constructing a MLP dense layer.
|
||||
* @TODO: change naming convention
|
||||
*/
|
||||
typedef Accumulator<DenseFactory> dense;
|
||||
|
||||
/**
|
||||
* Factory for output layers, can be used in a multi-layer network factory.
|
||||
* Base factory for output layers, can be used in a multi-layer network factory.
|
||||
*/
|
||||
struct LogitLayerFactory : public Factory {
|
||||
using Factory::Factory;
|
||||
virtual Ptr<IUnaryLogitLayer> construct(Ptr<ExpressionGraph> graph) = 0;
|
||||
};
|
||||
|
||||
// @TODO: In the long run, I hope we can get rid of the abstract factories altogether.
|
||||
/**
|
||||
* Implementation of Output layer factory, can be used in a multi-layer network factory.
|
||||
* @TODO: In the long run, I hope we can get rid of the abstract factories altogether.
|
||||
*/
|
||||
class OutputFactory : public LogitLayerFactory {
|
||||
using LogitLayerFactory::LogitLayerFactory;
|
||||
|
||||
@ -74,12 +85,13 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
// @TODO: change naming convention
|
||||
/**
|
||||
* A convenient typedef for constructing a MLP output layer.
|
||||
* @TODO: change naming convention
|
||||
*/
|
||||
typedef Accumulator<OutputFactory> output;
|
||||
|
||||
/**
|
||||
* Multi-layer network, holds and applies layers.
|
||||
*/
|
||||
/** Multi-layer network, holds and applies layers. */
|
||||
class MLP : public IUnaryLogitLayer, public IHasShortList {
|
||||
protected:
|
||||
Ptr<ExpressionGraph> graph_;
|
||||
@ -88,8 +100,17 @@ protected:
|
||||
std::vector<Ptr<IUnaryLayer>> layers_;
|
||||
|
||||
public:
|
||||
/**
|
||||
* Construct a MLP container in the graph.
|
||||
* @param graph The expression graph.
|
||||
* @param options The options used for this mlp container.
|
||||
*/
|
||||
MLP(Ptr<ExpressionGraph> graph, Ptr<Options> options) : graph_(graph), options_(options) {}
|
||||
|
||||
/**
|
||||
* Apply/Link a vector of mlp layers (with the given inputs) to the expression graph.
|
||||
* @param av The vector of input expressions
|
||||
* @return The expression holding the mlp container
|
||||
*/
|
||||
Expr apply(const std::vector<Expr>& av) override {
|
||||
Expr output;
|
||||
if(av.size() == 1)
|
||||
@ -102,7 +123,12 @@ public:
|
||||
|
||||
return output;
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply/Link a vector of mlp layers (with the given inputs) to the expression graph.
|
||||
* @param av The vector of input expressions
|
||||
* @return The expression holding the mlp container as a
|
||||
* <a href=https://en.wikipedia.org/wiki/Logit>Logits</a> object
|
||||
*/
|
||||
Logits applyAsLogits(const std::vector<Expr>& av) override {
|
||||
// same as apply() except for the last layer, we invoke applyAsLogits(), which has a different
|
||||
// return type
|
||||
@ -126,13 +152,33 @@ public:
|
||||
return lastLayer->applyAsLogits(output);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply/Link a mlp layer (with the given input) to the expression graph.
|
||||
* @param e The input expression
|
||||
* @return The expression holding the mlp container
|
||||
*/
|
||||
Expr apply(Expr e) override { return apply(std::vector<Expr>{e}); }
|
||||
/**
|
||||
* Apply/Link a mlp layer (with the given input) to the expression graph.
|
||||
* @param e The input expression
|
||||
* @return The expression holding the mlp container as a
|
||||
* <a href=https://en.wikipedia.org/wiki/Logit>Logits</a> object
|
||||
*/
|
||||
Logits applyAsLogits(Expr e) override { return applyAsLogits(std::vector<Expr>{e}); }
|
||||
|
||||
/**
|
||||
* Stack a mlp layer to the mlp container.
|
||||
* @param layer The mlp layer
|
||||
*/
|
||||
void push_back(Ptr<IUnaryLayer> layer) { layers_.push_back(layer); }
|
||||
/**
|
||||
* Stack a mlp layer with <a href=https://en.wikipedia.org/wiki/Logit>Logits</a> object to the mlp container.
|
||||
* @param layer The mlp layer with <a href=https://en.wikipedia.org/wiki/Logit>Logits</a> object
|
||||
*/
|
||||
void push_back(Ptr<IUnaryLogitLayer> layer) { layers_.push_back(layer); }
|
||||
|
||||
/**
|
||||
* Set shortlisted words to the mlp container.
|
||||
* @param shortlist The given shortlisted words
|
||||
*/
|
||||
void setShortlist(Ptr<data::Shortlist> shortlist) override final {
|
||||
auto p = tryAsHasShortlist();
|
||||
ABORT_IF(
|
||||
@ -140,7 +186,7 @@ public:
|
||||
"setShortlist() called on an MLP with an output layer that does not support short lists");
|
||||
p->setShortlist(shortlist);
|
||||
}
|
||||
|
||||
/** Remove shortlisted words from the mlp container. */
|
||||
void clear() override final {
|
||||
auto p = tryAsHasShortlist();
|
||||
if(p)
|
||||
@ -154,8 +200,8 @@ private:
|
||||
};
|
||||
|
||||
/**
|
||||
* Multi-layer network factory. Can hold layer factories. Used
|
||||
* to accumulate options for later lazy construction.
|
||||
* Multi-layer network factory. Can hold layer factories.
|
||||
* Used to accumulate options for later lazy construction.
|
||||
*/
|
||||
class MLPFactory : public Factory {
|
||||
using Factory::Factory;
|
||||
@ -164,6 +210,12 @@ private:
|
||||
std::vector<Ptr<LayerFactory>> layers_;
|
||||
|
||||
public:
|
||||
/**
|
||||
* Create a MLP container instance in the expression graph.
|
||||
* Used to accumulate options for later lazy construction.
|
||||
* @param graph The expression graph
|
||||
* @return The shared pointer to the MLP container
|
||||
*/
|
||||
Ptr<MLP> construct(Ptr<ExpressionGraph> graph) {
|
||||
auto mlp = New<MLP>(graph, options_);
|
||||
for(auto layer : layers_) {
|
||||
@ -172,7 +224,11 @@ public:
|
||||
}
|
||||
return mlp;
|
||||
}
|
||||
|
||||
/**
|
||||
* Stack a layer to the mlp container.
|
||||
* @param lf The layer
|
||||
* @return The Accumulator object holding the mlp container
|
||||
*/
|
||||
template <class LF>
|
||||
Accumulator<MLPFactory> push_back(const LF& lf) {
|
||||
layers_.push_back(New<LF>(lf));
|
||||
@ -201,6 +257,11 @@ private:
|
||||
}
|
||||
|
||||
public:
|
||||
/**
|
||||
* Stack a mlp output layer to the mlp container.
|
||||
* @param lf The mlp output layer
|
||||
* @return The Accumulator object holding the mlp container
|
||||
*/
|
||||
Accumulator<MLPFactory> push_back(const Accumulator<OutputFactory>& lf) {
|
||||
push_back(AsLayerFactory<OutputFactory>(lf));
|
||||
// layers_.push_back(New<AsLayerFactory<OutputFactory>>(asLayerFactory((OutputFactory&)lf)));
|
||||
@ -208,13 +269,19 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
// @TODO: change naming convention.
|
||||
|
||||
/**
|
||||
* A convenient typedef for constructing MLP layers.
|
||||
* @TODO: change naming convention.
|
||||
*/
|
||||
typedef Accumulator<MLPFactory> mlp;
|
||||
} // namespace mlp
|
||||
|
||||
typedef ConstructingFactory<Embedding> EmbeddingFactory;
|
||||
typedef ConstructingFactory<ULREmbedding> ULREmbeddingFactory;
|
||||
|
||||
/** A convenient typedef for constructing standard embedding layers. */
|
||||
typedef Accumulator<EmbeddingFactory> embedding;
|
||||
/** A convenient typedef for constructing ULR word embedding layers. */
|
||||
typedef Accumulator<ULREmbeddingFactory> ulr_embedding;
|
||||
} // namespace marian
|
||||
|
@ -6,10 +6,12 @@ namespace marian {
|
||||
|
||||
class FactoredVocab;
|
||||
|
||||
// A regular embedding layer.
|
||||
// Note that this also applies dropout if the option is passed (pass 0 when in inference mode).
|
||||
// It is best to not use Embedding directly, but rather via getEmbeddingLayer() in
|
||||
// EncoderDecoderLayerBase, which knows to pass on all required parameters from options.
|
||||
/**
|
||||
* A regular embedding layer.
|
||||
* Note that this also applies dropout if the option is passed (pass 0 when in inference mode).
|
||||
* It is best to not use Embedding directly, but rather via getEmbeddingLayer() in
|
||||
* EncoderDecoderLayerBase, which knows to pass on all required parameters from options.
|
||||
*/
|
||||
class Embedding : public LayerBase, public IEmbeddingLayer {
|
||||
Expr E_;
|
||||
Expr FactorEmbMatrix_; // Factors embedding matrix if combining lemma and factors embeddings with concatenation
|
||||
@ -19,16 +21,43 @@ class Embedding : public LayerBase, public IEmbeddingLayer {
|
||||
bool inference_{false};
|
||||
|
||||
public:
|
||||
/**
|
||||
* Construct a regular embedding layer in the graph.
|
||||
* @param graph The expression graph.
|
||||
* @param options The options used for this embedding layer.
|
||||
*/
|
||||
Embedding(Ptr<ExpressionGraph> graph, Ptr<Options> options);
|
||||
|
||||
/**
|
||||
* Apply/Link this embedding layer (with the given batch of sentences) to the expression graph.
|
||||
* @param subBatch The batch of sentences
|
||||
* @return The expression tuple holding the embedding layer and the masking layer
|
||||
*/
|
||||
std::tuple<Expr /*embeddings*/, Expr /*mask*/> apply(
|
||||
Ptr<data::SubBatch> subBatch) const override final;
|
||||
|
||||
/**
|
||||
* Apply/Link this embedding layer (with the given words and shape) to the expression graph.
|
||||
* @param words Sequence of vocabulary items
|
||||
* @param shape Shape of the words
|
||||
* @return The expression holding the embedding layer
|
||||
*/
|
||||
Expr apply(const Words& words, const Shape& shape) const override final;
|
||||
|
||||
/**
|
||||
* Apply/Link this embedding layer (with the given WordIndex vector and shape) to the expression graph.
|
||||
* @param embIdx The vector of WordIndex objects
|
||||
* @param shape Shape of the WordIndex vector
|
||||
* @return The expression holding the embedding layer
|
||||
*/
|
||||
Expr applyIndices(const std::vector<WordIndex>& embIdx, const Shape& shape) const override final;
|
||||
};
|
||||
|
||||
/**
|
||||
* Universal Language Representation (ULR) word embedding layer.
|
||||
* It is under development.
|
||||
* @todo applyIndices() is not implemented
|
||||
*/
|
||||
class ULREmbedding : public LayerBase, public IEmbeddingLayer {
|
||||
std::vector<Expr> ulrEmbeddings_; // @TODO: These could now better be written as 6 named class members
|
||||
bool inference_{false};
|
||||
|
@ -3,7 +3,10 @@
|
||||
#include "marian.h"
|
||||
|
||||
namespace marian {
|
||||
|
||||
/**
|
||||
* Base class for constructing models or layers.
|
||||
* Its main attribute is options which hold the basic characteristics of the model or the layer.
|
||||
*/
|
||||
class Factory : public std::enable_shared_from_this<Factory> {
|
||||
protected:
|
||||
Ptr<Options> options_;
|
||||
@ -68,8 +71,7 @@ public:
|
||||
template <class Cast>
|
||||
inline bool is() { return std::dynamic_pointer_cast<Cast>(shared_from_this()) != nullptr; }
|
||||
};
|
||||
|
||||
// simplest form of Factory that just passes on options to the constructor of a layer type
|
||||
/** Simplest form of Factory that just passes on options to the constructor of a layer. */
|
||||
template<class Class>
|
||||
struct ConstructingFactory : public Factory {
|
||||
using Factory::Factory;
|
||||
@ -79,6 +81,17 @@ struct ConstructingFactory : public Factory {
|
||||
}
|
||||
};
|
||||
|
||||
/**
|
||||
* Accumulator<Factory> pattern offers a shortcut to construct models or layers.
|
||||
* The options can be passed by a pair of parentheses. E.g., to construct a fully-connected layer:
|
||||
* \code{.cpp}
|
||||
* auto hidden = mlp::dense()
|
||||
("prefix", "hidden_layer") // layer name
|
||||
("dim", outDim) // output dimension
|
||||
("activation", (int)mlp::act::sigmoid) // activation function
|
||||
.construct(graph); // construct this layer in graph
|
||||
\endcode
|
||||
*/
|
||||
template <class BaseFactory> // where BaseFactory : Factory
|
||||
class Accumulator : public BaseFactory {
|
||||
typedef BaseFactory Factory;
|
||||
|
@ -9,18 +9,19 @@
|
||||
|
||||
namespace marian {
|
||||
namespace mlp {
|
||||
/**
|
||||
* @brief Activation functions
|
||||
*/
|
||||
/** Activation functions for MLP layers. */
|
||||
enum struct act : int { linear, tanh, sigmoid, ReLU, LeakyReLU, PReLU, swish };
|
||||
} // namespace mlp
|
||||
} // namespace marian
|
||||
|
||||
namespace marian {
|
||||
|
||||
// Each layer consists of LayerBase and IXXXLayer which defines one or more apply()
|
||||
// functions for the respective layer type (different layers may require different signatures).
|
||||
// This base class contains configuration info for creating parameters and executing apply().
|
||||
/**
|
||||
* Base class for a layer.
|
||||
* Each layer consists of LayerBase and IXXXLayer which defines one or more apply()
|
||||
* functions for the respective layer type (different layers may require different signatures).
|
||||
* This base class contains configuration info for creating parameters and executing apply().
|
||||
*/
|
||||
class LayerBase {
|
||||
protected:
|
||||
Ptr<ExpressionGraph> graph_;
|
||||
@ -40,22 +41,25 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
// Simplest layer interface: Unary function
|
||||
/** Simplest layer interface: Unary function. */
|
||||
struct IUnaryLayer {
|
||||
virtual ~IUnaryLayer() {}
|
||||
/** Link a node as the input for this layer. */
|
||||
virtual Expr apply(Expr) = 0;
|
||||
/** Link a list of nodes as the inputs for this layer. */
|
||||
virtual Expr apply(const std::vector<Expr>& es) {
|
||||
ABORT_IF(es.size() > 1, "Not implemented"); // simple stub
|
||||
return apply(es.front());
|
||||
}
|
||||
};
|
||||
|
||||
/** Shortlist interface for layers. */
|
||||
struct IHasShortList {
|
||||
virtual void setShortlist(Ptr<data::Shortlist> shortlist) = 0;
|
||||
virtual void clear() = 0;
|
||||
};
|
||||
|
||||
// Embedding from corpus sub-batch to (emb, mask)
|
||||
/** Embedding from corpus sub-batch to (emb, mask). */
|
||||
struct IEmbeddingLayer {
|
||||
virtual std::tuple<Expr /*embeddings*/, Expr /*mask*/> apply(
|
||||
Ptr<data::SubBatch> subBatch) const = 0;
|
||||
@ -67,8 +71,10 @@ struct IEmbeddingLayer {
|
||||
virtual ~IEmbeddingLayer() {}
|
||||
};
|
||||
|
||||
// base class for Encoder and Decoder classes, which have embeddings and a batch index (=stream
|
||||
// index)
|
||||
/**
|
||||
* Base class for Encoder and Decoder classes.
|
||||
* It has embeddings and a batch index (= stream index).
|
||||
*/
|
||||
class EncoderDecoderLayerBase : public LayerBase {
|
||||
protected:
|
||||
const std::string prefix_;
|
||||
@ -98,16 +104,42 @@ private:
|
||||
Ptr<IEmbeddingLayer> createULREmbeddingLayer() const;
|
||||
|
||||
public:
|
||||
// get embedding layer; lazily create on first call
|
||||
/**
|
||||
* Get all embedding layer(s).
|
||||
* It lazily creates the embedding layer on first call.
|
||||
* This is lazy mostly because the constructors of the consuming objects are not
|
||||
* guaranteed presently to have access to their graph.
|
||||
* @param ulr whether to use ULREmbedding layer. false by default.
|
||||
* @return a shared pointer to the embedding layer
|
||||
*/
|
||||
Ptr<IEmbeddingLayer> getEmbeddingLayer(bool ulr = false) const;
|
||||
};
|
||||
|
||||
/**
|
||||
* The namespace mlp.
|
||||
* Declare class Dense and all the available functions for creating
|
||||
* <a href=https://en.wikipedia.org/wiki/Multilayer_perceptron>multilayer perceptron (MLP)</a>
|
||||
* network.
|
||||
*/
|
||||
namespace mlp {
|
||||
|
||||
/**
|
||||
* Base class for a fully connected layer.
|
||||
* Implement the operations `output = activation(input * weight + bias)`.
|
||||
*/
|
||||
class Dense : public LayerBase, public IUnaryLayer {
|
||||
public:
|
||||
/**
|
||||
* Construct a dense layer in the graph.
|
||||
* @param graph The expression graph.
|
||||
* @param options The options used for this dense layer.
|
||||
*/
|
||||
Dense(Ptr<ExpressionGraph> graph, Ptr<Options> options) : LayerBase(graph, options) {}
|
||||
|
||||
/**
|
||||
* Apply/Link a vector of dense layers (with the given inputs) to the expression graph.
|
||||
* @param inputs The vector of the input expressions
|
||||
* @return The expression holding the dense layers
|
||||
*/
|
||||
Expr apply(const std::vector<Expr>& inputs) override {
|
||||
ABORT_IF(inputs.empty(), "No inputs");
|
||||
|
||||
@ -161,7 +193,11 @@ public:
|
||||
}
|
||||
// clang-format on
|
||||
};
|
||||
|
||||
/**
|
||||
* Apply/Link this dense layer (with the given input) to the expression graph.
|
||||
* @param input The input expression
|
||||
* @return The expression holding the dense layer
|
||||
*/
|
||||
Expr apply(Expr input) override { return apply(std::vector<Expr>({input})); }
|
||||
};
|
||||
|
||||
|
@ -33,6 +33,7 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
/** A convenient typedef for constructing RNN attention layers. */
|
||||
typedef Accumulator<AttentionFactory> attention;
|
||||
} // namespace rnn
|
||||
} // namespace marian
|
||||
|
@ -197,6 +197,7 @@ public:
|
||||
|
||||
Expr gruOps(const std::vector<Expr>& nodes, bool final = false);
|
||||
|
||||
/** Base class for a gated recurrent unit (GRU) cell. */
|
||||
class GRU : public Cell {
|
||||
protected:
|
||||
std::string prefix_;
|
||||
|
@ -5,6 +5,12 @@
|
||||
#include "rnn/rnn.h"
|
||||
|
||||
namespace marian {
|
||||
/**
|
||||
* The namespace rnn.
|
||||
* Declare all the available functions for creating
|
||||
* <a href=https://en.wikipedia.org/wiki/Recurrent_neural_network>recurrent neural network (RNN)</a>
|
||||
* network.
|
||||
*/
|
||||
namespace rnn {
|
||||
|
||||
typedef Factory StackableFactory;
|
||||
@ -28,6 +34,12 @@ struct InputFactory : public StackableFactory {
|
||||
virtual Ptr<CellInput> construct(Ptr<ExpressionGraph> graph) = 0;
|
||||
};
|
||||
|
||||
/**
|
||||
* Base class for constructing RNN cells.
|
||||
* RNN cells only process a single timestep instead of the whole batches of input sequences.
|
||||
* There are nine types of RNN cells provided by Marian, i.e., `gru`, `gru-nematus`, `lstm`,
|
||||
* `mlstm`, `mgru`, `tanh`, `relu`, `sru`, `ssru`.
|
||||
*/
|
||||
class CellFactory : public StackableFactory {
|
||||
protected:
|
||||
std::vector<std::function<Expr(Ptr<rnn::RNN>)>> inputs_;
|
||||
@ -92,8 +104,10 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
/** A convenience typedef for constructing RNN cells. */
|
||||
typedef Accumulator<CellFactory> cell;
|
||||
|
||||
/** Base class for constructing a stack of RNN cells (`rnn::cell`). */
|
||||
class StackedCellFactory : public CellFactory {
|
||||
protected:
|
||||
std::vector<Ptr<StackableFactory>> stackableFactories_;
|
||||
@ -137,8 +151,10 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
/** A convenience typedef for constructing a stack of RNN cells. */
|
||||
typedef Accumulator<StackedCellFactory> stacked_cell;
|
||||
|
||||
/** Base class for constructing RNN layers. */
|
||||
class RNNFactory : public Factory {
|
||||
using Factory::Factory;
|
||||
protected:
|
||||
@ -195,6 +211,7 @@ public:
|
||||
}
|
||||
};
|
||||
|
||||
/** A convenience typedef for constructing RNN containers/layers. */
|
||||
typedef Accumulator<RNNFactory> rnn;
|
||||
} // namespace rnn
|
||||
} // namespace marian
|
||||
|