# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [1.10.0] - 2021-02-06
### Added
- Added `intgemm8(ssse3|avx|avx512)?` and `intgemm16(sse2|avx|avx512)?` types to marian-conv, which use the intgemm backend. The intgemm8 and intgemm16 types are hardware-agnostic, the others hardware-specific.
- Shortlist sizes are now always a multiple of eight.
- Added an architecture-agnostic binary format for intgemm 8/16-bit integer models.
- Add --train-embedder-rank for fine-tuning any encoder(-decoder) model for multi-lingual similarity via softmax-margin loss
- Add --logical-epoch, which allows redefining the displayed epoch counter as a multiple of n data epochs, updates, or labels. A second argument defines the width of the fractional part.
- Add --metrics chrf for computing ChrF according to https://www.aclweb.org/anthology/W15-3049/ and the SacreBLEU reference implementation
- Add --after option, which is meant to replace --after-batches and --after-epochs and can take label-based criteria
- Add --transformer-postprocess-top option to enable correctly normalized prenorm behavior
- Add --task transformer-base-prenorm and --task transformer-big-prenorm
- Turing and Ampere GPU optimisation support, if the CUDA version supports it.
- Printing word-level scores in marian-scorer
- Optimize LayerNormalization on CPU by 6x through vectorization (-ffast-math) and fixing a performance regression introduced with strides in 77a420
- Decoding multi-source models in marian-server with --tsv
- GitHub workflows on Ubuntu, Windows, and MacOS
- LSH indexing to replace short list
- ONNX support for transformer models
- Add topk operator like PyTorch's topk
- Use *cblas_sgemm_batch* instead of a loop over *cblas_sgemm* on CPU as the batched_gemm implementation
- Supporting relative paths in shortlist and sqlite options
- Training and scoring from STDIN
- Support for reading from TSV files from STDIN and other sources during training
  and translation with options --tsv and --tsv-fields n (see the sketch after this list)
- Internal optional parameter in n-best list generation that skips empty hypotheses.
- Quantized training (fixed-point or log-based quantization) with the --quantize-bits N option
- Support for using Apple Accelerate as the BLAS library
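A minimal sketch of the TSV/STDIN usage added above (file names, vocabularies, and field counts are hypothetical; exact flag behaviour may differ between versions):

```bash
# Train from a two-column TSV (source<TAB>target) streamed on STDIN;
# --tsv-fields declares the number of tab-separated columns.
cat corpus.tsv | marian --tsv --tsv-fields 2 \
    --vocabs vocab.src.spm vocab.trg.spm --model model.npz

# Translate single-column input the same way with marian-decoder.
cat input.txt | marian-decoder --tsv --tsv-fields 1 --models model.npz
```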
### Fixed
- A segfault of spm_train when compiled with -DUSE_STATIC_LIBS=ON seems to have gone away with the update to a newer SentencePiece version.
- Fix bug causing certain reductions into scalars to be 0 on the GPU backend. Removed unnecessary warp shuffle instructions.
- Do not apply dropout in embedding layers during inference with dropout-src/trg
- Print "server is listening on port" message after it is accepting connections
- Fix compilation without BLAS installed
- Providing a single value to vector-like options using the equals sign, e.g. --models=model.npz
- Fix quiet-translation in marian-server
- CMake-based compilation on Windows
- Fix minor issues with compilation on MacOS
- Fix warnings in Windows MSVC builds using CMake
- Fix building server with Boost 1.72
- Make mini-batch scaling depend on mini-batch-words and not on mini-batch-words-ref
- In concatenation, make sure that we do not multiply 0 with NaN (which results in NaN)
- Change Approx.epsilon(0.01) to Approx.margin(0.001) in unit tests. Tolerance is now
absolute and not relative. We assumed incorrectly that epsilon is absolute tolerance.
- Fixed bug in finding .git/logs/HEAD when Marian is a submodule in another project.
- Properly record cmake variables in the cmake build directory instead of the source tree.
- Added default "none" for option shuffle in BatchGenerator, so that it works in executables where shuffle is not an option.
- Added a few missing header files in shortlist.h and beam_search.h.
- Improved handling for receiving SIGTERM during training. By default, SIGTERM triggers 'save (now) and exit'. Prior to this fix, batch pre-fetching did not check for this signal, potentially delaying exit considerably. It now pays attention to that. Also, the default behaviour of save-and-exit can now be disabled on the command line with --sigterm exit-immediately.
- Fix runtime failures of FastOpt on 32-bit builds (WASM just happens to be 32-bit) caused by hashing with an inconsistent mix of uint64_t and size_t.
### Changed
- Removed `--clip-gemm`, which is obsolete and was never used anyway
- Removed the `--optimize` switch; the compute type is now determined from the binary model.
- Updated SentencePiece repository to version 8336bbd0c1cfba02a879afe625bf1ddaf7cd93c5 from https://github.com/google/sentencepiece.
- Enabled compilation of SentencePiece by default since it no longer depends on protobuf.
- Changed default value of --sentencepiece-max-lines from 10000000 to 2000000 since the new version apparently doesn't sample automatically anymore (not quite clear how that affects the quality of the vocabulary).
- Change mini-batch-fit search stopping criterion to stop at ideal binary search threshold.
- --metric bleu now always detokenizes SacreBLEU-style if the vocabulary knows how to; use bleu-segmented to compute BLEU on word ids. bleu-detok is now a synonym for bleu.
- Move label-smoothing computation into Cross-entropy node
- Move Simple-WebSocket-Server to submodule
- Python scripts start with #!/usr/bin/env python3 instead of python
- Changed compile flags -Ofast to -O3 and removed -ffinite-math
- Moved old graph groups to deprecated folder
- Make cublas and cusparse handle inits lazy to save memory when unused
- Replaced exception-based implementation for type determination in FastOpt::makeScalar
## [1.9.0] - 2020-03-10
### Added
- An option to print cached variables from CMake
- Add support for compiling on Mac (and clang)
- An option for resetting stalled validation metrics
- Add CMake options to disable compilation for specific GPU SM types
- An option to print word-level translation scores
- An option to turn off automatic detokenization from SentencePiece
- Separate quantization types for 8-bit FBGEMM for AVX2 and AVX512
- Sequence-level unlikelihood training
- Allow file-name-templated valid-translation-output files
- Support for lexical shortlists in marian-server
- Support for 8-bit matrix multiplication with FBGEMM
- CMakeLists.txt now looks for SSE 4.2
- Purging of finished hypotheses during beam-search. A lot faster for large batches.
- Faster option look-up, up to 20-30% faster translation
- Added --cite and --authors flags
- Added optional support for ccache
- Switch to change abort calls into exceptions, only to be used in library mode
- Support for 16-bit packed models with FBGEMM
- Multiple separated parameter types in ExpressionGraph, currently inference-only
- Safe handling of the SIGTERM signal
- Automatic vectorization of elementwise operations on CPU for tensor dims that
  are divisible by 4 (AVX) and 8 (AVX2)
- Replacing std::shared_ptr<T> with custom IntrusivePtr<T> for small objects like
  Tensors, Hypotheses and Expressions.
- Fp16 inference working for translation
- Gradient-checkpointing
### Fixed
- Replace value for INVALID_PATH_SCORE with std::numeric_limits<float>::lowest()
  to avoid overflow with long sequences
- Break up potential circular references for GraphGroup*
- Fix empty source batch entries with batch purging
- Clear RNN cache in transformer model, add correct hash functions to nodes
- Gather-operation for all index sizes
- Fix word weighting with max length cropping
- Fixed compilation on CPUs without support for AVX
- FastOpt now reads "n" and "y" values as strings, not as boolean values
- Fixed multiple reduction kernels on GPU
- Fixed guided-alignment training with cross-entropy
- Replace IntrusivePtr with std::unique_ptr in FastOpt, fixes random segfaults
  due to thread-non-safety of reference counting.
- Make sure that items are 256-byte aligned during saving
- Make explicit matmul functions respect setting of cublasMathMode
- Fix memory mapping for mixed parameter models
- Removed naked pointer and potential memory-leak from file_stream.{cpp,h}
- Compilation for GCC >= 7 due to exception thrown in destructor
- Sort parameters by lexicographical order during allocation to ensure consistent
memory-layout during allocation, loading, saving.
- Output empty line when input is empty line. Previous behavior might result in
hallucinated outputs.
- Compilation with CUDA 10.1
### Changed
- Combine two for-loops in nth_element.cpp on CPU
- Revert LayerNorm eps to old position, i.e. sigma' = sqrt(sigma^2 + eps)
- Downgrade NCCL to 2.3.7 as 2.4.2 is buggy (hangs with larger models)
- Return error signal on SIGTERM
- Dropped support for CUDA 8.0; CUDA 9.0 is now the minimal requirement
- Removed autotuner for now, will be switched back on later
- Boost dependency is now optional and only required for marian_server
- Dropped support for g++-4.9
- Simplified file stream and temporary file handling
- Unified node initializers, same function API.
- Remove overstuff/understuff code
## [1.8.0] - 2019-09-04
### Added
- Alias options and new --task option
- Automatic detection of CPU intrinsics when building with -arch=native
- First version of BERT-training and BERT-classifier, currently not compatible with TF models
- New reduction operators
- Use CMake's ExternalProject to build NCCL and potentially other external libs
- Code for Factored Vocabulary, currently not usable yet without outside tools
### Fixed
- Issue with relative paths in automatically generated decoder config files
- Bug with overlapping CXX flags and building spm_train executable
- Compilation with gcc 8
- Overwriting and unsetting vector options
- Windows build with recent changes
- Bug with read-ahead buffer
- Handling of "dump-config: false" in YAML config
- Errors due to warnings
- Issue concerning failed saving with single GPU training and --sync-sgd option.
- NaN problem when training with Tensor Cores on Volta GPUs
- Fix pipe-handling
- Fix compilation with GCC 9.1
- Fix CMake build types
### Changed
- Error message when using left-to-right and right-to-left models together in ensembles
- Regression tests included as a submodule
- Update NCCL to 2.4.2
- Add zlib source to Marian's source tree, builds now as object lib
- -DUSE_STATIC_LIBS=on now also looks for static versions of CUDA libraries
- Include NCCL build from github.com/marian-nmt/nccl and compile within source tree
- Set nearly all warnings as errors for Marian's own targets. Disable warnings for 3rd-party code
- Refactored beam search
## [1.7.0] - 2018-11-27
### Added
- Word alignment generation in scorer
- Attention output generation in decoder and scorer with `--alignment soft`
- Support for SentencePiece vocabularies and run-time segmentation/desegmentation
- Support for SentencePiece vocabulary training during model training
- Group training files by filename when creating vocabularies for joint vocabularies
- Updated examples
- Synchronous multi-node training (early version)
### Fixed
- Delayed output in line-by-line translation
### Changed
- Generated word alignments include alignments for target EOS tokens
- Boost::program_options has been replaced by another CLI library
- Replace boost::file_system with Pathie
- Expansion of unambiguous command-line arguments is no longer supported
## [1.6.0] - 2018-08-08
### Added
- Faster training (20-30%) by optimizing gradient propagation of biases
- Returning Moses-style hard alignments during decoding single models,
ensembles and n-best lists
- Hard alignment extraction strategy taking source words whose attention value
  is greater than the threshold
- Refactored sync sgd for easier communication and integration with NCCL
- Smaller memory-overhead for sync-sgd
- NCCL integration (version 2.2.13)
- New binary format for saving/loading of models, can be used with _*.bin_
  extension (can be memory mapped)
- Memory-mapping of graphs for inference with the `ExpressionGraph::mmap(const void* ptr)`
  function (assumes a _*.bin_ model is mapped or in a buffer)
- Added SRU (--dec-cell sru) and ReLU (--dec-cell relu) cells to inventory of
RNN cells
- RNN auto-regression layers in transformer (`--transformer-decoder-autoreg rnn`),
  work with gru, lstm, tanh, relu, sru cells
- Recurrently stacked layers in transformer (`--transformer-tied-layers 1 1 1 2
2 2` means 6 layers with 1-3 and 4-6 tied parameters, two groups of
parameters)
- Seamless training continuation with exponential smoothing
### Fixed
- A couple of bugs in "selection" (transpose, shift, cols, rows) operators
  during back-prop for a very specific case: one of the operators is the first
  operator after a branch, in that case gradient propagation might be
  interrupted. This did not affect any of the existing models as such a case
  was not present, but might have caused future models to not train properly
- Bug in mini-batch-fit, tied embeddings would result in identical embeddings
in fake source and target batch. Caused under-estimation of memory usage and
re-allocation
## [1.5.0] - 2018-06-17
### Added
- Average Attention Networks for Transformer model
- 16-bit matrix multiplication on CPU
- Memoization for constant nodes for decoding
- Autotuning for decoding
### Fixed
- GPU decoding optimizations, about 2x faster decoding of transformer models
- Multi-node MPI-based training on GPUs
## [1.4.0] - 2018-03-13
### Added
- Data weighting with `--data-weighting` at sentence or word level
- Persistent SQLite3 corpus storage with `--sqlite file.db`
- Experimental multi-node asynchronous training
- Restoring optimizer and training parameters such as learning rate, validation
results, etc.
- Experimental multi-CPU training/translation/scoring with `--cpu-threads=N`
- Restoring corpus iteration after training is restarted
- N-best-list scoring in marian-scorer
### Fixed
- Deterministic data shuffling with specific seed for SQLite3 corpus storage
- Mini-batch fitting with binary search for faster fitting
- Better batch packing due to sorting
## [1.3.1] - 2018-02-04
### Fixed
- Missing final validation when done with training
- Differing summaries for marian-scorer when used with multiple GPUs
## [1.3.0] - 2018-01-24
### Added
- SQLite3 based corpus storage for on-disk shuffling etc. with `--sqlite`
- Asynchronous maxi-batch preloading
- Using transpose in SGEMM to tie embeddings in output layer
## [1.2.1] - 2018-01-19
### Fixed
- Use valid-mini-batch size during validation with "translation" instead of
mini-batch
- Normalize gradients with multi-gpu synchronous SGD
- Fix divergence between saved models and validated models in asynchronous SGD
## [1.2.0] - 2018-01-13
### Added
- Option `--pretrained-model` to be used for network weights initialization
with a pretrained model
- Version number saved in the model file
- CMake option `-DCOMPILE_SERVER=ON`
- Right-to-left training, scoring, decoding with `--right-left`
### Fixed
- Fixed marian-server compilation with Boost 1.66
- Fixed compilation on g++-4.8.4
- Fixed compilation without marian-server if openssl is not available
## [1.1.3] - 2017-12-06
### Added
- Added back gradient-dropping
### Fixed
- Fixed parameters initialization for `--tied-embeddings` during translation
## [1.1.2] - 2017-12-05
### Fixed
- Fixed ensembling with language model and batched decoding
- Fixed attention reduction kernel with large matrices (added missing
  `syncthreads()`), which should fix stability with large batches and beam-size
  during batched decoding
## [1.1.1] - 2017-11-30
### Added
- Option `--max-length-crop` to be used together with `--max-length N` to crop
sentences to length N rather than omitting them.
- Experimental model with convolution over input characters
### Fixed
- Fixed a number of bugs for vocabulary and directory handling
## [1.1.0] - 2017-11-21
### Added
- Batched translation for all model types, significant translation speed-up
- Batched translation during validation with translation
- `--maxi-batch-sort` option for `marian-decoder`
- Support for CUBLAS_TENSOR_OP_MATH mode for cublas in CUDA 9.0
- The "marian-vocab" tool to create vocabularies
## [1.0.0] - 2017-11-13
### Added
- Multi-gpu validation, scorer and in-training translation
- summary-mode for scorer
- New "transformer" model based on [Attention is all you
need](https://arxiv.org/abs/1706.03762)
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset,
repeated warmup
- Continuous inverted square root decay of learning rate (`--lr-decay-inv-sqrt`)
  based on number of updates
- Exposed optimizer parameters (e.g. momentum etc. for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders,
currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean
cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function default for Transformer
(https://arxiv.org/pdf/1710.05941.pdf)
### Changed
- Changed shape organization to follow numpy.
- Changed option `--moving-average` to `--exponential-smoothing` and inverted
  formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`, `\alpha` is now
  `1e-4` by default
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New
  behaviour is backwards-compatible and can also be specified as
  `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option name `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, now supports perplexity and other CE variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, allowing to change the
  length normalization weight as `score / pow(length, arg)` (worked example below)
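A worked example of the length normalization above (hypothetical invocation; only `--normalize` and the formula are taken from this entry):

```bash
# Divide each hypothesis score by pow(length, 0.6) during decoding:
# a hypothesis with log-score -8.0 and length 10 is rescored as
# -8.0 / 10^0.6 ≈ -2.01.
marian-decoder --models model.npz --normalize 0.6 < input.txt
```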
### Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.