# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Automatic vectorization of elementwise operations on CPU for tensor dims that
  are divisible by 4 (AVX) and 8 (AVX2)
- Replacing std::shared_ptr<T> with custom IntrusivePtr<T> for small objects like 
  Tensors, Hypotheses and Expressions.
- First steps towards integrating FP16 support, currently no-ops.

### Fixed
- Sort parameters by lexicographical order during allocation to ensure consistent 
  memory-layout during allocation, loading, saving.
- Output empty line when input is empty line. Previous behavior might result in 
  hallucinated outputs.
- Compilation with CUDA 10.1

## [1.8.0] - 2019-09-04

### Added
- Alias options and new --task option
- Automatic detection of CPU intrinsics when building with -arch=native
- First version of BERT-training and BERT-classifier, currently not compatible with TF models
- New reduction operators
- Use CMake's ExternalProject to build NCCL and potentially other external libs
- Code for Factored Vocabulary, currently not usable yet without outside tools

### Fixed
- Issue with relative paths in automatically generated decoder config files
- Bug with overlapping CXX flags and building spm_train executable
- Compilation with gcc 8
- Overwriting and unsetting vector options
- Windows build with recent changes
- Bug with read-ahead buffer
- Handling of "dump-config: false" in YAML config
- Errors due to warnings
- Issue concerning failed saving with single GPU training and --sync-sgd option.
- NaN problem when training with Tensor Cores on Volta GPUs
- Fix pipe-handling
- Fix compilation with GCC 9.1
- Fix CMake build types

### Changed
- Error message when using left-to-right and right-to-left models together in ensembles
- Regression tests included as a submodule
- Update NCCL to 2.4.2
- Add zlib source to Marian's source tree, builds now as object lib
- -DUSE_STATIC_LIBS=on now also looks for static versions of CUDA libraries
- Include NCCL build from github.com/marian-nmt/nccl and compile within source tree
- Set nearly all warnings as errors for Marian's own targets. Disable warnings for 3rd party
- Refactored beam search

## [1.7.0] - 2018-11-27

### Added
- Word alignment generation in scorer
- Attention output generation in decoder and scorer with `--alignment soft`
- Support for SentencePiece vocabularies and run-time segmentation/desegmentation
- Support for SentencePiece vocabulary training during model training
- Group training files by filename when creating vocabularies for joint vocabularies
- Updated examples
- Synchronous multi-node training (early version)

### Fixed
- Delayed output in line-by-line translation

### Changed
- Generated word alignments include alignments for target EOS tokens
- Boost::program_options has been replaced by another CLI library
- Replace boost::filesystem with Pathie
- Expansion of unambiguous command-line arguments is no longer supported

## [1.6.0] - 2018-08-08

### Added
- Faster training (20-30%) by optimizing gradient propagation of biases
- Returning Moses-style hard alignments during decoding single models,
  ensembles and n-best lists
- Hard alignment extraction strategy taking source words that have the
  attention value greater than the threshold
- Refactored sync sgd for easier communication and integration with NCCL
- Smaller memory-overhead for sync-sgd
- NCCL integration (version 2.2.13)
- New binary format for saving/loading of models, can be used with _*.bin_
  extension (can be memory mapped)
- Memory-mapping of graphs for inference with
  `ExpressionGraph::mmap(const void* ptr)` function. (assumes _*.bin_ model is
  mapped or in buffer)
- Added SRU (--dec-cell sru) and ReLU (--dec-cell relu) cells to inventory of
  RNN cells
- RNN auto-regression layers in transformer (`--transformer-decoder-autoreg rnn`),
  work with gru, lstm, tanh, relu, sru cells
- Recurrently stacked layers in transformer (`--transformer-tied-layers 1 1 1 2
  2 2` means 6 layers with 1-3 and 4-6 tied parameters, two groups of
  parameters)
- Seamless training continuation with exponential smoothing
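
The threshold-based hard alignment extraction listed above can be illustrated with a minimal sketch. It assumes the attention weights are given as a matrix indexed `attention[target][source]` and that pairs are printed in Moses-style `source-target` order; the function name and the matrix layout are hypothetical and not taken from Marian's code.

```python
def hard_alignments(attention, threshold):
    """Extract hard alignments from a soft attention matrix.

    attention[t][s] is assumed to hold the attention weight of target
    word t on source word s (hypothetical layout for this sketch).
    """
    pairs = []
    for t, row in enumerate(attention):
        for s, weight in enumerate(row):
            # Keep every source word whose weight exceeds the threshold.
            if weight > threshold:
                pairs.append((s, t))
    # Moses-style output: space-separated "source-target" index pairs.
    return " ".join(f"{s}-{t}" for s, t in sorted(pairs))


attention = [[0.9, 0.1],
             [0.4, 0.6]]
print(hard_alignments(attention, 0.5))
```

A lower threshold keeps more source words per target word, so a single target word may align to several source words.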

### Fixed
- A couple of bugs in "selection" (transpose, shift, cols, rows) operators
  during back-prop for a very specific case: one of the operators is the first
  operator after a branch, in that case gradient propagation might be
  interrupted. This did not affect any of the existing models as such a case
  was not present, but might have caused future models to not train properly
- Bug in mini-batch-fit, tied embeddings would result in identical embeddings
  in fake source and target batch. Caused under-estimation of memory usage and
  re-allocation

## [1.5.0] - 2018-06-17

### Added
- Average Attention Networks for Transformer model
- 16-bit matrix multiplication on CPU
- Memoization for constant nodes for decoding
- Autotuning for decoding

### Fixed
- GPU decoding optimizations, about 2x faster decoding of transformer models
- Multi-node MPI-based training on GPUs

## [1.4.0] - 2018-03-13

### Added
- Data weighting with `--data-weighting` at sentence or word level
- Persistent SQLite3 corpus storage with `--sqlite file.db`
- Experimental multi-node asynchronous training
- Restoring optimizer and training parameters such as learning rate, validation
  results, etc.
- Experimental multi-CPU training/translation/scoring with `--cpu-threads=N`
- Restoring corpus iteration after training is restarted
- N-best-list scoring in marian-scorer

### Fixed
- Deterministic data shuffling with specific seed for SQLite3 corpus storage
- Mini-batch fitting with binary search for faster fitting
- Better batch packing due to sorting
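
The idea of fitting a mini-batch size by binary search can be sketched as follows. This is a generic illustration, not Marian's implementation: `fits` is a hypothetical callback that reports whether a batch of a given size fits into the available memory, and it is assumed to be monotonic (if a size fits, every smaller size fits too).

```python
def largest_fitting_batch(fits, max_size):
    """Return the largest size in [1, max_size] for which fits(size) is True.

    Returns 0 if no size fits. Assumes fits is monotonic: once a size
    fails, all larger sizes fail as well.
    """
    lo, hi = 1, max_size
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid      # mid fits, try something larger
            lo = mid + 1
        else:
            hi = mid - 1    # mid is too large, try something smaller
    return best


# Hypothetical memory model: 100 units per sentence, 4250 units available.
print(largest_fitting_batch(lambda n: n * 100 <= 4250, 1000))
```

Compared with growing the batch one step at a time, this needs only a logarithmic number of `fits` checks.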


## [1.3.1] - 2018-02-04

### Fixed
- Missing final validation when done with training
- Differing summaries for marian-scorer when used with multiple GPUs

## [1.3.0] - 2018-01-24

### Added
- SQLite3 based corpus storage for on-disk shuffling etc. with `--sqlite`
- Asynchronous maxi-batch preloading
- Using transpose in SGEMM to tie embeddings in output layer

## [1.2.1] - 2018-01-19

### Fixed
- Use valid-mini-batch size during validation with "translation" instead of
  mini-batch
- Normalize gradients with multi-gpu synchronous SGD
- Fix divergence between saved models and validated models in asynchronous SGD

## [1.2.0] - 2018-01-13

### Added
- Option `--pretrained-model` to be used for network weights initialization
  with a pretrained model
- Version number saved in the model file
- CMake option `-DCOMPILE_SERVER=ON`
- Right-to-left training, scoring, decoding with `--right-left`

### Fixed
- Fixed marian-server compilation with Boost 1.66
- Fixed compilation on g++-4.8.4
- Fixed compilation without marian-server if openssl is not available

## [1.1.3] - 2017-12-06

### Added
- Added back gradient-dropping

### Fixed
- Fixed parameters initialization for `--tied-embeddings` during translation

## [1.1.2] - 2017-12-05

### Fixed
- Fixed ensembling with language model and batched decoding
- Fixed attention reduction kernel with large matrices (added missing
  `syncthreads()`), which should fix stability with large batches and beam-size
  during batched decoding

## [1.1.1] - 2017-11-30

### Added
- Option `--max-length-crop` to be used together with `--max-length N` to crop
  sentences to length N rather than omitting them.
- Experimental model with convolution over input characters

### Fixed
- Fixed a number of bugs for vocabulary and directory handling

## [1.1.0] - 2017-11-21

### Added
- Batched translation for all model types, significant translation speed-up
- Batched translation during validation with translation
- `--maxi-batch-sort` option for `marian-decoder`
- Support for CUBLAS_TENSOR_OP_MATH mode for cuBLAS in CUDA 9.0
- The "marian-vocab" tool to create vocabularies

## [1.0.0] - 2017-11-13

### Added
- Multi-gpu validation, scorer and in-training translation
- summary-mode for scorer
- New "transformer" model based on [Attention is all you
  need](https://arxiv.org/abs/1706.03762)
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset,
  repeated warmup
- Continuous inverted square root decay of learning rate
  (`--lr-decay-inv-sqrt`) based on number of updates
- Exposed optimizer parameters (e.g. momentum etc. for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders,
  currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean
  cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function default for Transformer
  (https://arxiv.org/pdf/1710.05941.pdf)

### Changed
- Changed shape organization to follow numpy.
- Changed option `--moving-average` to `--exponential-smoothing` and inverted
  formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`, `\alpha` is now
  `1e-4` by default
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New
  behaviour is backwards-compatible and can also be specified as
  `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option name `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, now supports perplexity and other
  CE-based metrics
- Changed `--normalize (bool)` to `--normalize (float)arg`, allows changing the
  length normalization weight as `score / pow(length, arg)`
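
The two formulas in the list above, the exponential smoothing update and the length normalization, can be written out as a small sketch. The function names are hypothetical, the smoothed value is assumed to start at the first input, and the `alpha` and `arg` values in the usage lines are chosen for readability rather than taken from Marian's defaults.

```python
def exponential_smoothing(values, alpha):
    """Apply s_t = (1 - alpha) * s_{t-1} + alpha * x_t to a sequence."""
    smoothed = []
    s = values[0]  # assumption: initialize the average with the first value
    for x in values:
        s = (1 - alpha) * s + alpha * x
        smoothed.append(s)
    return smoothed


def normalized_score(score, length, arg):
    """Length-normalize a hypothesis score as score / pow(length, arg)."""
    return score / pow(length, arg)


print(exponential_smoothing([1.0, 3.0], 0.5))
print(normalized_score(-6.0, 4, 0.5))
```

With `arg` set to 0 the score is left unchanged, and with `arg` set to 1 it is divided by the full length.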

### Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.