Commit Graph

4398 Commits

Author SHA1 Message Date
Roman Grundkiewicz
c343ced9e1 Add lexical shortlists to marian-server (#560)
* Add lexical shortlists to marian-server

* Update CHANGELOG
2019-12-05 19:41:36 -08:00
Kenneth Heafield
734a8791dd Use __AVX__ compiler define instead of custom and broken NO_AVX flag. Fixes #559 (#564)
* Change to __AVX__ macro defined by compiler
* Source files should not be executable
2019-12-05 14:17:54 -08:00
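The idea behind 734a8791dd can be sketched as follows: instead of a project-specific flag, the code branches on `__AVX__`, which GCC and Clang define when AVX code generation is enabled (for example with `-mavx`) and MSVC defines with `/arch:AVX`. The function `sumFloats` is a hypothetical illustration and is not taken from Marian.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __AVX__
#include <immintrin.h>  // AVX intrinsics, only included when the compiler targets AVX
#endif

// Hypothetical helper (not Marian code): sums n floats, using AVX when the
// compiler defines __AVX__ and a plain scalar loop otherwise.
float sumFloats(const float* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  float total = 0.0f;
  std::size_t i = 0;
#ifdef __AVX__
  __m256 acc = _mm256_setzero_ps();
  for(; i + 8 <= n; i += 8)  // process 8 floats per iteration
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  for(float v : lanes)
    total += v;
#endif
  for(; i < n; ++i)  // scalar tail, or the whole array when AVX is unavailable
    total += data[i];
  return total;
}
```

Because the macro comes from the compiler, the same source builds on CPUs without AVX simply by not passing the AVX flag.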
Marcin Junczys-Dowmunt
6224cb16e2 bump patch 2019-12-05 10:44:57 -08:00
Marcin Junczys-Dowmunt
c12fd5b027 Merge branch 'master' of ssh://github.com/marian-nmt/marian-dev into pmaster 2019-12-05 10:31:04 -08:00
Marcin Junczys-Dowmunt
4993417816 Merge branch 'master' into pmaster 2019-12-05 10:30:58 -08:00
Marcin Junczys-Dowmunt
ac20f77287 abort when trying to use packed8 or packed16 without FBGEMM compiled into Marian 2019-12-05 10:30:27 -08:00
Nikolay Bogoychev
34e99da49b Fix compilation on CPUs that don't support AVX and some white space i… (#561)
* Fix compilation on CPUs that don't support AVX and some white space issues
* Change from ifdef to ifndef
2019-12-05 08:12:17 -08:00
Marcin Junczys-Dowmunt
6b3c8f5a8c Merge branch 'master' into pmaster 2019-12-04 16:36:21 -08:00
Marcin Junczys-Dowmunt
a6d0af024c bump version 2019-12-04 16:34:51 -08:00
Martin Junczys-Dowmunt
4b4e6b5412 Merged PR 10736: Unify options and type names
Make sure command line options and parameter names match, also use parameters instead of strings for passing gemmType for packing.
2019-12-05 00:33:15 +00:00
Marcin Junczys-Dowmunt
b20d9d7415 Merge branch 'master' into pmaster 2019-12-03 22:57:10 -08:00
Marcin Junczys-Dowmunt
82da7d5219 do not warn for a number of warnings for cpp files from src/3rd_party/.. 2019-12-03 20:32:50 -08:00
Young Jin Kim
67e9bc4957 Fix windows build warning 2019-12-03 12:14:32 -08:00
Young Jin Kim
183d0b8f65 Update fbgemm submodule to the master branch 2019-12-03 11:56:58 -08:00
Young Jin Kim
9c9a240354 Merged PR 10266: FBGEMM based Int8 model
FBGEMM based Int8 model - working with the master
1. Added int8 implementation into packed_gemm.h/cpp with FBGEMM
2. Update FBGEMM library to make it work on windows
3. Split 'ispacked' into packed8 and packed16
4. Change all names for PackFp32 to PackFp16 which is more accurate
2019-12-03 19:14:18 +00:00
Marcin Junczys-Dowmunt
9fd5ba99bf Update README.md 2019-11-27 19:28:16 -08:00
Marcin Junczys-Dowmunt
f007772a49 Update README.md 2019-11-27 19:27:49 -08:00
Marcin Junczys-Dowmunt
0197b89b43 bump version 2019-11-26 21:24:22 -08:00
Marcin Junczys-Dowmunt
49b54a6180 catch n and y as strings in FastOpt instead of as boolean values 2019-11-26 21:23:38 -08:00
Marcin Junczys-Dowmunt
120ab8fa75 increase tolerance for unit test 2019-11-25 20:51:35 -08:00
Marcin Junczys-Dowmunt
e12a5dbe17 regression-tests/ 2019-11-25 19:20:34 -08:00
Marcin Junczys-Dowmunt
f07042b9c3 bump version based on PRs 2019-11-25 19:20:03 -08:00
Marcin Junczys-Dowmunt
26859d2e15 Merge with ms-internal master 2019-11-25 19:18:03 -08:00
Marcin Junczys-Dowmunt
9e090e3472 bump version 2019-11-25 17:49:34 -08:00
Martin Junczys-Dowmunt
93b7ed80fe Merged PR 10553: Fix multiple problems in reduce kernels that occurred during back-prop
This fixes a number of bugs in our GPU reduce-kernels that would manifest mainly for larger matrices and during back-prop. We also drop support for CUDA 8.0 to be able to take advantage of new GPU primitives introduced by NVIDIA in CUDA 9.0.
2019-11-26 01:48:07 +00:00
Ulrich Germann
a1763e2d9e Const diligence and thread safety ... (#553)
* Const diligence and thread safety with respect to Vocab and ShortlistGenerator

- Use pointers to const Vocab instead of Vocab in corpus_base.h and shortlist.h
- Declare some member functions of ShortlistGenerator const
- Use pointers to const ShortlistGenerator instead of ShortlistGenerator in
  numerous places.
- Use a thread-local random generator in shortlist.h

* Bug fixes (missing const; instantiate *gen_) in SampledShortlistGenerator.
2019-11-22 20:35:39 -08:00
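The "thread-local random generator" item in a1763e2d9e can be illustrated with a minimal sketch. The names `threadLocalEngine` and `sampleIndex` are hypothetical, and the choice of `std::mt19937` is an assumption; the actual code in shortlist.h may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: each thread lazily constructs its own engine, so
// concurrent callers never share (or race on) generator state.
inline std::mt19937& threadLocalEngine() {
  thread_local std::mt19937 engine(std::random_device{}());
  return engine;
}

// Draws a random index in [0, size). It touches no object state, so it can be
// called from a const member function of a shared generator object.
inline std::size_t sampleIndex(std::size_t size) {
  assert(size > 0);
  std::uniform_int_distribution<std::size_t> dist(0, size - 1);
  return dist(threadLocalEngine());
}
```

Keeping the engine out of the object is what lets the sampling member functions be declared const without resorting to a lock.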
Frank Seide
b19820c8ba Merged PR 10588: bug fix: guided-alignment loss should not normalize by source length
The guided-alignment loss has two bugs related to two different types of normalization:
* The returned `RationalLoss` should not have its denominator weighted by `guided-alignment-weight`, except in `sum` mode.
* The CE loss function cannot use the alignment labels directly as ground truth, since CE requires one-hot labels while the actual labels are multi-hot. They must be renormalized.
2019-11-22 23:18:52 +00:00
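One plausible reading of "renormalized" in b19820c8ba is that each multi-hot label row is rescaled to sum to 1 before it is used as a cross-entropy target. The sketch below assumes that reading; `renormalizeLabels` is a hypothetical name, and the axis and exact scheme used in Marian are not stated in the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: rescales a multi-hot label row so its entries sum to 1.
// A row with no active labels is returned unchanged (all zeros).
std::vector<float> renormalizeLabels(const std::vector<float>& multiHot) {
  float sum = 0.0f;
  for(float v : multiHot) {
    assert(v >= 0.0f);  // labels are expected to be non-negative
    sum += v;
  }
  std::vector<float> out(multiHot);
  if(sum > 0.0f)
    for(float& v : out)
      v /= sum;
  return out;
}
```

For example, a row with two aligned positions `{1, 0, 1, 0}` becomes `{0.5, 0, 0.5, 0}`.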
Ulrich Germann
a27fda70d1 Return exit code 15 (SIGTERM) after SIGTERM. (#551)
* Return exit code 15 (SIGTERM) after SIGTERM.
When marian receives signal SIGTERM and exits gracefully (save model & exit),
it should then exit with a non-zero exit code, to signal to any parent process
that it did not exit "naturally".

* Added explanatory comment about exiting marian_train with non-zero status after SIGTERM.
2019-11-22 15:02:38 -08:00
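The behaviour described in a27fda70d1 can be sketched as a flag set by a signal handler and consulted when choosing the exit status. All names below (`onSigterm`, `installSigtermHandler`, `sigtermSeen`, `exitStatus`) are hypothetical; this is not the marian_train implementation.

```cpp
#include <cassert>
#include <csignal>

namespace {
// Written from the signal handler; sig_atomic_t is the integer type that is
// safe to assign in that context.
volatile std::sig_atomic_t g_sigtermSeen = 0;
}

extern "C" void onSigterm(int) { g_sigtermSeen = 1; }

void installSigtermHandler() {
  auto previous = std::signal(SIGTERM, onSigterm);
  assert(previous != SIG_ERR);
  (void)previous;  // avoid an unused-variable warning when NDEBUG is defined
}

bool sigtermSeen() { return g_sigtermSeen != 0; }

// Exit status to return after the model has been saved: non-zero (SIGTERM,
// which is 15 on POSIX systems) if termination was requested, 0 otherwise.
int exitStatus() { return sigtermSeen() ? SIGTERM : 0; }
```

A training loop would poll `sigtermSeen()` between updates, save the model, and then return `exitStatus()` from `main` so that a parent process can tell the run was interrupted.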
Ulrich Germann
76e229308a Merge branch 'master' into ug-const-diligence 2019-11-22 16:09:55 +00:00
Ulrich Germann
61c0195e16 Bug fixes (missing const; instantiate *gen_) in SampledShortlistGenerator. 2019-11-22 15:25:11 +00:00
Marcin Junczys-Dowmunt
d394641275 Merge pull request #546 from alvations/marian-from-origin
Added ssse4.2 support
2019-11-21 21:28:32 -08:00
Ulrich Germann
4c0698fc62 Const diligence and thread safety with respect to Vocab and ShortlistGenerator
- Use pointers to const Vocab instead of Vocab in corpus_base.h and shortlist.h
- Declare some member functions of ShortlistGenerator const
- Use pointers to const ShortlistGenerator instead of ShortlistGenerator in
  numerous places.
- Use a thread-local random generator in shortlist.h
2019-11-22 02:37:35 +00:00
Young Jin Kim
5fb31b28d2 Merged PR 10415: Fix windows build errors
1. Added template definition for 'uint64_t', since Windows has a different definition of the 'long' type.
2. Fixed warnings on windows.
2019-11-13 07:37:46 +00:00
Marcin Junczys-Dowmunt
9a4f784390 merge with internal master 2019-11-11 21:21:17 -08:00
Martin Junczys-Dowmunt
189d89e1dd Merged PR 10333: Batch-pruning in beam search
This PR introduces batch-purging in Marian, i.e. whenever a virtual beam becomes inactive (empty) the entire batch entry that corresponds to that beam can be removed from the encoder and decoder neural states. The CPU-side beam search keeps track of the hypotheses as before, but needs to perform mappings between original and shifted batch indices.
2019-11-12 05:13:15 +00:00
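The mapping between original and shifted batch indices mentioned in 189d89e1dd can be sketched as below. `BatchIndexMap` and `buildBatchIndexMap` are hypothetical names; the actual data structures in Marian's beam search are not described in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure: maps between batch positions before and after
// inactive entries have been removed.
struct BatchIndexMap {
  std::vector<int> origToShifted;          // -1 marks a purged batch entry
  std::vector<std::size_t> shiftedToOrig;  // surviving entries, in order
};

// beamActive[i] is true while the beam of batch entry i still has hypotheses.
BatchIndexMap buildBatchIndexMap(const std::vector<bool>& beamActive) {
  BatchIndexMap map;
  map.origToShifted.assign(beamActive.size(), -1);
  for(std::size_t i = 0; i < beamActive.size(); ++i) {
    if(beamActive[i]) {
      map.origToShifted[i] = static_cast<int>(map.shiftedToOrig.size());
      map.shiftedToOrig.push_back(i);
    }
  }
  assert(map.shiftedToOrig.size() <= beamActive.size());
  return map;
}
```

`shiftedToOrig` is the list of rows to keep when selecting from the encoder and decoder states, while `origToShifted` lets the CPU-side search find where an original batch entry now lives.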
Martin Junczys-Dowmunt
9353f065f8 Merged PR 10373: Replace IntrusivePtr with std::unique_ptr in FastOpt
In FastOpt we do not want to use locking during access, but that makes reference counting not thread-safe. We now use std::unique_ptr to const objects or const references everywhere. This fixes random segfaults with multi-GPU training. @TODO: clean up option merging to make options generally immutable.
2019-11-11 22:04:19 +00:00
Martin Junczys-Dowmunt
5dfd8a7026 Merged PR 10383: Align items while saving at 256-byte boundary
Align items while saving at 256-byte boundary. Maintains binary-compatibility.
2019-11-11 18:56:12 +00:00
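Aligning saved items at a 256-byte boundary, as in 5dfd8a7026, comes down to rounding each write offset up to the next multiple of 256 and emitting that many padding bytes. The helpers `alignUp` and `paddingFor` are hypothetical names used for illustration; how Marian's binary writer stores the padding is not described here.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kAlignment = 256;  // boundary named in the commit message

// Rounds offset up to the next multiple of alignment.
inline std::size_t alignUp(std::size_t offset, std::size_t alignment = kAlignment) {
  assert(alignment > 0);
  return (offset + alignment - 1) / alignment * alignment;
}

// Number of padding bytes to write before the next item starts.
inline std::size_t paddingFor(std::size_t offset, std::size_t alignment = kAlignment) {
  return alignUp(offset, alignment) - offset;
}
```

For instance, an item that would otherwise start at byte 300 gets 212 bytes of padding and starts at byte 512.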
Martin Junczys-Dowmunt
c96d709d58 Merged PR 10376: Fix memory-mapping bug for default parameter-object - [Let's not merge before discussion of comment.]
This fixes the memory increase while doing memory-mapping (we need regression tests for that in Marian, although that particular error would be hard to monitor. Maybe the current way by observing runtime behavior was just correct).

**This PR does:**
* Fixes memory-mapping for default parameter type. The buggy code was not replacing the in-memory parameter set with mmapping for the new default parameter. Now the parameter object is destroyed and replaced with a mmapable version if it is not mmapable.
* Adds a small cross-system library (mio) for memory-mapping, currently only to enable easy diagnostics of mmap on Linux. To be extended to a proper feature.

**Comments:**
* It seems we need to re-binarize the models for correct mmap behavior. Since we have identified a smaller bug in the binary file (incorrect byte-boundary alignment) this might be the right moment to fix this issue. This might help to stabilize the format for the future.
* The incorrect alignment needs to be addressed anyway, and currently is causing small losses in speed (about 3%).
2019-11-11 17:30:54 +00:00
Martin Junczys-Dowmunt
fd3404deab Merged PR 10382: Fix cublas math mode querying
While moving to the explicit matmul functions like `cublasGemmEx` the specific algorithm needs to be set. This seems to override math mode from `cublasSetMathMode`.  Now we query explicitly and choose the algorithm accordingly.
2019-11-11 17:19:04 +00:00
Ulrich Germann
66b95a5afa Sort custom cmake options alphabetically (#547) 2019-11-11 09:33:13 +00:00
Marcin Junczys-Dowmunt
13e61820fd bump version and changelog 2019-11-09 01:00:24 -08:00
Marcin Junczys-Dowmunt
ca610330d5 remove reference counting from fastopt 2019-11-09 00:30:24 -08:00
alvations
59011c8129 added ssse4.2 support 2019-11-07 07:09:40 +08:00
Marcin Junczys-Dowmunt
1171e3d5b9 Merge branch 'master' into pmaster 2019-11-05 23:53:56 -08:00
Marcin Junczys-Dowmunt
a8826c5d12 make compile with cudnn 2019-11-05 23:53:40 -08:00
Marcin Junczys-Dowmunt
0cb8125a1b Merge branch 'master' into pmaster 2019-11-05 15:19:24 -08:00
Martin Junczys-Dowmunt
233281cc26 Merged PR 10304: Remove naked pointers, add binary read mode
* Remove naked pointers in file_stream.{h,cpp}
* Add binary read mode
2019-11-05 23:18:45 +00:00
Marcin Junczys-Dowmunt
03bb51cd96 Merge branch 'master' into pmaster 2019-11-05 11:48:21 -08:00
Marcin Junczys-Dowmunt
7ba804b20b bump patch version, update changelog 2019-11-05 11:46:43 -08:00
Martin Junczys-Dowmunt
78f671c39f Merged PR 10297: Fast option look-up with lazy option construction
This PR adds the FastOpt object and extends the Options object by
* Using YAML only as a background option container that is used for lazily updating the FastOpt object.
* The FastOpt object gets rebuilt whenever an access is about to occur and the doLazyRebuild_ flag is set.
* This will be the case whenever the underlying YAML object has been modified through the public interface.
* The YAML object itself is not accessible from the outside anymore (or only through cloning) to avoid untrackable modifications.

This seems to result in a >20% speed-up for decoding.
2019-11-05 19:28:23 +00:00
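The lazy-rebuild pattern described in 78f671c39f can be sketched with a dirty flag. In this illustration a `std::map` stands in for the YAML container and a `std::unordered_map` stands in for FastOpt; the class `LazyOptions` and its members are hypothetical, and only the flag name `doLazyRebuild_` comes from the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

class LazyOptions {
public:
  // Every modification goes through the public interface and marks the
  // fast look-up structure as stale.
  void set(const std::string& key, const std::string& value) {
    source_[key] = value;
    doLazyRebuild_ = true;
  }

  // Rebuilds the fast structure only if something changed since the last read.
  const std::string& get(const std::string& key) const {
    if(doLazyRebuild_)
      rebuild();
    auto it = fast_.find(key);
    assert(it != fast_.end() && "unknown option");
    return it->second;
  }

  int rebuildCount() const { return rebuildCount_; }

private:
  void rebuild() const {
    fast_.clear();
    fast_.insert(source_.begin(), source_.end());
    doLazyRebuild_ = false;
    ++rebuildCount_;
  }

  std::map<std::string, std::string> source_;  // stand-in for the YAML node
  mutable std::unordered_map<std::string, std::string> fast_;  // stand-in for FastOpt
  mutable bool doLazyRebuild_ = true;
  mutable int rebuildCount_ = 0;
};
```

Several consecutive `set` calls cost a single rebuild on the next `get`. Note that this sketch mutates cached state inside a const method without synchronization, so it is not thread-safe as written.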