# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Word alignment generation in scorer
- Attention output generation in decoder and scorer with `--alignment soft`
### Fixed
- Delayed output in line-by-line translation
### Changed
- Generated word alignments include alignments for target EOS tokens
- Boost::program_options has been replaced by another CLI library
- Expansion of unambiguous command-line arguments is no longer supported
## [1.6.0] - 2018-08-08
### Added
- Faster training (20-30%) by optimizing gradient propagation of biases
- Returning Moses-style hard alignments during decoding of single models,
  ensembles and n-best lists
- Hard alignment extraction strategy taking source words that have an
  attention value greater than the threshold
- Refactored sync SGD for easier communication and integration with NCCL
- Smaller memory overhead for sync SGD
- NCCL integration (version 2.2.13)
- New binary format for saving/loading of models, can be used with the _*.bin_
  extension (can be memory mapped)
- Memory-mapping of graphs for inference with the `ExpressionGraph::mmap(const
  void* ptr)` function (assumes the _*.bin_ model is mapped or in a buffer)
- Added SRU (`--dec-cell sru`) and ReLU (`--dec-cell relu`) cells to the
  inventory of RNN cells
- RNN auto-regression layers in transformer (`--transformer-decoder-autoreg
  rnn`), works with gru, lstm, tanh, relu and sru cells
- Recurrently stacked layers in transformer (`--transformer-tied-layers 1 1 1
  2 2 2` means 6 layers where layers 1-3 and layers 4-6 share parameters, i.e.
  two groups of parameters)
- Seamless training continuation with exponential smoothing
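The threshold-based hard alignment extraction listed above can be sketched as follows. This is an illustrative Python sketch, not Marian's C++ implementation; the function name, the default threshold, and the Moses-style output format are assumptions:

```python
def extract_hard_alignments(attention, threshold=0.5):
    """Keep every (source, target) pair whose soft attention weight
    exceeds `threshold`; attention[t][s] is the attention weight of
    target word t over source word s. Returns Moses-style "s-t" pairs."""
    pairs = []
    for t, row in enumerate(attention):
        for s, weight in enumerate(row):
            if weight > threshold:
                pairs.append(f"{s}-{t}")
    return " ".join(pairs)

# Target word 0 attends mostly to source word 0, target word 1 to source 1:
print(extract_hard_alignments([[0.9, 0.1], [0.2, 0.8]]))  # -> 0-0 1-1
```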
### Fixed
- A couple of bugs in "selection" (transpose, shift, cols, rows) operators
  during back-prop for a very specific case: when one of these operators is
  the first operator after a branch, gradient propagation might be
  interrupted. This did not affect any of the existing models as such a case
  was not present, but might have caused future models to not train properly
- Bug in mini-batch-fit: tied embeddings would result in identical embeddings
  in the fake source and target batch, causing under-estimation of memory
  usage and re-allocation
## [1.5.0] - 2018-06-17
### Added
- Average Attention Networks for Transformer model
- 16-bit matrix multiplication on CPU
- Memoization for constant nodes for decoding
- Autotuning for decoding
### Fixed
- GPU decoding optimizations, about 2x faster decoding of transformer models
- Multi-node MPI-based training on GPUs
## [1.4.0] - 2018-03-13
### Added
- Data weighting with `--data-weighting` at sentence or word level
- Persistent SQLite3 corpus storage with `--sqlite file.db`
- Experimental multi-node asynchronous training
- Restoring optimizer and training parameters such as learning rate,
  validation results, etc.
- Experimental multi-CPU training/translation/scoring with `--cpu-threads=N`
- Restoring corpus iteration after training is restarted
- N-best-list scoring in marian-scorer
### Fixed
- Deterministic data shuffling with specific seed for SQLite3 corpus storage
- Mini-batch fitting with binary search for faster fitting
- Better batch packing due to sorting
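Mini-batch fitting with binary search, as listed above, can be sketched like this. Illustrative Python, not Marian's implementation; `fits` stands in for an actual memory-fitting check and is assumed to be monotonic in the batch size:

```python
def fit_mini_batch(fits, lo=1, hi=4096):
    """Binary-search the largest mini-batch size for which `fits(size)`
    returns True (i.e. the batch still fits into memory). The search
    bounds `lo` and `hi` are hypothetical defaults."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid      # mid fits: try larger batches
            lo = mid + 1
        else:
            hi = mid - 1    # mid does not fit: try smaller batches
    return best

# Example with a fake memory limit where batches up to 1000 "fit":
print(fit_mini_batch(lambda size: size <= 1000))  # -> 1000
```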
## [1.3.1] - 2018-02-04
### Fixed
- Missing final validation when done with training
- Differing summaries for marian-scorer when used with multiple GPUs
## [1.3.0] - 2018-01-24
### Added
- SQLite3 based corpus storage for on-disk shuffling etc. with `--sqlite`
- Asynchronous maxi-batch preloading
- Using transpose in SGEMM to tie embeddings in output layer
## [1.2.1] - 2018-01-19
### Fixed
- Use `valid-mini-batch` size during validation with "translation" instead of
  `mini-batch`
- Normalize gradients with multi-gpu synchronous SGD
- Fix divergence between saved models and validated models in asynchronous SGD
## [1.2.0] - 2018-01-13
### Added
- Option `--pretrained-model` to be used for network weight initialization
  with a pretrained model
- Version number saved in the model file
- CMake option `-DCOMPILE_SERVER=ON`
- Right-to-left training, scoring, decoding with `--right-left`
### Fixed
- Fixed marian-server compilation with Boost 1.66
- Fixed compilation on g++-4.8.4
- Fixed compilation without marian-server if openssl is not available
## [1.1.3] - 2017-12-06
### Added
- Added back gradient-dropping
### Fixed
- Fixed parameter initialization for `--tied-embeddings` during translation
## [1.1.2] - 2017-12-05
### Fixed
- Fixed ensembling with language model and batched decoding
- Fixed attention reduction kernel with large matrices (added missing
  `__syncthreads()`), which should fix stability with large batches and beam
  sizes during batched decoding
## [1.1.1] - 2017-11-30
### Added
- Option `--max-length-crop` to be used together with `--max-length N` to crop
  sentences to length N rather than omitting them
- Experimental model with convolution over input characters
### Fixed
- Fixed a number of bugs for vocabulary and directory handling
## [1.1.0] - 2017-11-21
### Added
- Batched translation for all model types, significant translation speed-up
- Batched translation during validation with translation
- `--maxi-batch-sort` option for `marian-decoder`
- Support for `CUBLAS_TENSOR_OP_MATH` mode for cuBLAS in CUDA 9.0
- The "marian-vocab" tool to create vocabularies
## [1.0.0] - 2017-11-13
### Added
- Multi-gpu validation, scorer and in-training translation
- Summary mode for scorer
- New "transformer" model based on [Attention Is All You
  Need](https://arxiv.org/abs/1706.03762)
- Options specific to the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset,
  repeated warmup
- Continuous inverted square root decay of the learning rate
  (`--lr-decay-inv-sqrt`) based on the number of updates
- Exposed optimizer parameters (e.g. momentum for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders,
  currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean
  cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function as default for Transformer
  (https://arxiv.org/pdf/1710.05941.pdf)
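The `--lr-decay-inv-sqrt` schedule listed above is commonly formulated as below. This is a sketch of the usual inverse-square-root formulation with hypothetical parameter values, not necessarily Marian's exact defaults:

```python
import math

def inv_sqrt_lr(step, base_lr=0.0003, warmup=16000):
    """Inverse square-root decay: the rate stays at base_lr during the
    first `warmup` updates, then decays as 1/sqrt(step). The base rate
    and warmup length here are illustrative, not Marian defaults."""
    return base_lr * math.sqrt(warmup) / math.sqrt(max(step, warmup))

# Flat until `warmup`, then shrinking with the update count:
print(inv_sqrt_lr(1000))   # still in warmup: equals base_lr
print(inv_sqrt_lr(64000))  # half the base rate, since sqrt(64000/16000) = 2
```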
### Changed
- Changed shape organization to follow numpy
- Changed option `--moving-average` to `--exponential-smoothing` and inverted
  the formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`; `\alpha` is
  now `1e-4` by default
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. The new
  behaviour is backwards-compatible and can also be specified as
  `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option name `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation; now supports perplexity and other CE
  variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, allowing to change
  the length normalization weight as `score / pow(length, arg)`
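The two formulas quoted in this list, exponential smoothing and length normalization, as a minimal sketch. The `alpha` default follows the `1e-4` value given above and `arg=0.6` mirrors the `--normalize=0.6` example; the function names are assumptions:

```python
def exponential_smoothing(s_prev, x, alpha=1e-4):
    """One update of s_t = (1 - alpha) * s_{t-1} + alpha * x_t."""
    return (1 - alpha) * s_prev + alpha * x

def normalized_score(score, length, arg=0.6):
    """Length-normalized score: score / pow(length, arg);
    arg = 1.0 is plain per-word averaging, arg = 0 disables it."""
    return score / (length ** arg)

# A large alpha moves the smoothed value quickly toward new values:
print(exponential_smoothing(0.0, 1.0, alpha=0.5))  # -> 0.5
# arg = 1.0 reduces to the average log-probability per word:
print(normalized_score(-10.0, 10, arg=1.0))        # -> -1.0
```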
### Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.