13450 Commits

Author SHA1 Message Date
Matthias Huck
9f562e0fd4 SoftSourceSyntacticConstraintsFeature: better config parameter names 2015-01-27 18:15:51 +00:00
Matthias Huck
0a0ea437bb use pragma once; avoid using cerr directly 2015-01-26 22:12:44 +00:00
Matthias Huck
eb9d8134a7 PhraseOrientationFeature: Heuristic score for boundary non-terminals is basically a lookahead. Compute a lookahead for everything.
(+ Refined feature state comparison.)
2015-01-26 21:11:37 +00:00
Matthias Huck
e51714ff7a a plain dense unaligned word count feature with two scores (source and target unaligned words) 2015-01-26 21:06:12 +00:00
Matthias Huck
c66d6a9b86 using pragma once and VERBOSE in SourceWordDeletionFeature/TargetWordInsertionFeature 2015-01-26 20:45:08 +00:00
Nicola Bertoldi
fa00c99aa3 fixings to the IRSTLM interface for textual input; code cleanup 2015-01-26 18:24:12 +01:00
Nicola Bertoldi
5d186874f4 minor fixing to the comparison script of the regression tests 2015-01-26 15:52:29 +01:00
Nicola Bertoldi
18eaf62ce3 fixings to the IRSTLM interface for textual input 2015-01-26 15:51:08 +01:00
Hieu Hoang
4202ad473c Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-01-25 15:02:51 +00:00
Hieu Hoang
1dea58e945 separate parameters into its own class 2015-01-25 15:02:33 +00:00
Hieu Hoang
5d2b0224d6 Jamfile for tokenizer 2015-01-25 14:00:35 +00:00
XapaJIaMnu
6ca1a4718c Expose learning rate as a parameter 2015-01-25 02:13:47 +00:00
Matthias Huck
55f6bbb14a Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-01-23 18:45:31 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Kenneth Heafield
98c352ed3a Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-01-23 13:38:35 -05:00
akimbal1
d38dcd89bb add glib-2.0 for better unicodification and faster implementation 2015-01-23 13:35:09 -05:00
Hieu Hoang
45ff417244 beautify 2015-01-22 22:41:56 +00:00
Hieu Hoang
4f322242e9 eclipse 2015-01-22 22:17:50 +00:00
Hieu Hoang
a6cef9382c eclipse 2015-01-22 22:06:53 +00:00
Marcin Junczys-Dowmunt
bf5280851e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-01-22 22:18:33 +01:00
Marcin Junczys-Dowmunt
4140756fdf Add missing check for empty range while flushing 2015-01-22 22:18:19 +01:00
Hieu Hoang
a165ba9005 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-01-22 16:43:07 +00:00
Kenneth Heafield
769c19d10c KenLM a6d57501dcac95a31719a8628f6cbd288f6741e2 including Marcin's fixed pruning 2015-01-22 11:42:46 -05:00
Hieu Hoang
9235534269 Merge ../hh 2015-01-22 16:11:24 +00:00
Hieu Hoang
59c4baec3f use utf8 german model 2015-01-22 16:10:12 +00:00
Marcin Junczys-Dowmunt
b5b048cf1a Set default number of scores to 4 2015-01-22 12:36:50 +01:00
Marcin Junczys-Dowmunt
e3ef09e9a4 fixed segfault for querying, set scores to 4 2015-01-22 12:35:55 +01:00
Kenneth Heafield
1dce55f413 C++ tokenizer based on RE2. Not by me.
Some differences from Moses tokenizer:  fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(.  Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-22 12:25:02 +01:00
Hieu Hoang
ad6f3a8026 option to sort translation options after EvaluateAfterSourceContext 2015-01-22 12:25:02 +01:00
Kenneth Heafield
e30065072e C++ tokenizer based on RE2. Not by me.
Some differences from Moses tokenizer:  fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(.  Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-21 12:23:44 -05:00
Hieu Hoang
f8781eaefa option to sort translation options after EvaluateAfterSourceContext 2015-01-21 16:07:50 +00:00
Matthias Huck
ec547fa56a SoftSourceSyntacticConstraintsFeature: use -inf rather than min for featureVariant=1 2015-01-20 21:43:23 +00:00
Matthias Huck
b50c197313 forgot to check this in some time ago 2015-01-20 21:41:41 +00:00
Matthias Huck
a6c09e57d0 domain features in GHKM extraction 2015-01-20 21:36:55 +00:00
Matthias Huck
db655a09e5 Revert "improved interface towards IRSTLM"
This reverts commit 8316ca5948.

Moses did not compile with the current release version of IRSTLM (irstlm-5.80.06)
2015-01-20 19:23:12 +00:00
Marcin Junczys-Dowmunt
7d9013a85b Work-around for temporary translation option collection size during phrase table binarization 2015-01-19 23:15:08 +01:00
Kenneth Heafield
7c507bfa74 May is not an abbreviation 2015-01-19 16:37:57 -05:00
Marcin Junczys-Dowmunt
fbcf2dcb56 Fixed thread-safety 2015-01-19 21:56:04 +01:00
Marcin Junczys-Dowmunt
82c603213a Thread-safety and constness 2015-01-18 23:58:28 +01:00
Marcin Junczys-Dowmunt
16ffc2c978 Added new VW feature and exception to Simple9 2015-01-18 23:26:32 +01:00
Nicola Bertoldi
95a88a17c5 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2015-01-18 14:25:40 +01:00
Marcin Junczys-Dowmunt
41f829651b Another attempt at fixing dangling alignment points 2015-01-17 00:44:04 +01:00
Matthias Huck
db09949587 PhraseOrientationFeature: distinguishStates parameter,
use TransformScore rather than std::log
2015-01-16 17:48:58 +00:00
Ales Tamchyna
44d1aaa58e Merge branch 'master' of github.com:moses-smt/mosesdecoder 2015-01-16 16:30:57 +01:00
Ales Tamchyna
9366d82785 IsCorrectTranslationOption no longer confused by matching subphrases 2015-01-16 16:30:43 +01:00
Matthias Huck
083ed44091 SoftSourceSyntacticConstraintsFeature: bugfix 2015-01-16 15:26:02 +00:00
Hieu Hoang
30e31d4a95 don't normalise quotes if tokenizing like Penn /Phil Williams 2015-01-16 12:34:22 +00:00
Hieu Hoang
19d7c44aad move normalisation of quotes into normalize-punctuation.perl /Tom Hoar 2015-01-16 11:37:31 +00:00
Hieu Hoang
b50b3164fa beautify 2015-01-15 11:18:39 +00:00
Hieu Hoang
6289b39fd8 update extract-mixed-syntax 2015-01-15 09:53:57 +00:00