Matthias Huck
9f562e0fd4
SoftSourceSyntacticConstraintsFeature: better config parameter names
2015-01-27 18:15:51 +00:00
Matthias Huck
0a0ea437bb
use pragma once; avoid using cerr directly
2015-01-26 22:12:44 +00:00
Matthias Huck
eb9d8134a7
PhraseOrientationFeature: Heuristic score for boundary non-terminals is basically a lookahead. Compute a lookahead for everything.
(+ Refined feature state comparison.)
2015-01-26 21:11:37 +00:00
Matthias Huck
e51714ff7a
a plain dense unaligned word count feature with two scores (source and target unaligned words)
2015-01-26 21:06:12 +00:00
Matthias Huck
c66d6a9b86
using pragma once and VERBOSE in SourceWordDeletionFeature/TargetWordInsertionFeature
2015-01-26 20:45:08 +00:00
Nicola Bertoldi
fa00c99aa3
fixes to the IRSTLM interface for textual input; code cleanup
2015-01-26 18:24:12 +01:00
Nicola Bertoldi
5d186874f4
minor fix to the comparison script of the regression tests
2015-01-26 15:52:29 +01:00
Nicola Bertoldi
18eaf62ce3
fixes to the IRSTLM interface for textual input
2015-01-26 15:51:08 +01:00
Hieu Hoang
4202ad473c
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-01-25 15:02:51 +00:00
Hieu Hoang
1dea58e945
separate parameters into their own class
2015-01-25 15:02:33 +00:00
Hieu Hoang
5d2b0224d6
Jamfile for tokenizer
2015-01-25 14:00:35 +00:00
XapaJIaMnu
6ca1a4718c
Expose learning rate as a parameter
2015-01-25 02:13:47 +00:00
Matthias Huck
55f6bbb14a
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2015-01-23 18:45:31 +00:00
Matthias Huck
9987beb453
SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Kenneth Heafield
98c352ed3a
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2015-01-23 13:38:35 -05:00
akimbal1
d38dcd89bb
add glib-2.0 for better unicodification and faster implementation
2015-01-23 13:35:09 -05:00
Hieu Hoang
45ff417244
beautify
2015-01-22 22:41:56 +00:00
Hieu Hoang
4f322242e9
eclipse
2015-01-22 22:17:50 +00:00
Hieu Hoang
a6cef9382c
eclipse
2015-01-22 22:06:53 +00:00
Marcin Junczys-Dowmunt
bf5280851e
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-01-22 22:18:33 +01:00
Marcin Junczys-Dowmunt
4140756fdf
Add missing check for empty range while flushing
2015-01-22 22:18:19 +01:00
Hieu Hoang
a165ba9005
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-01-22 16:43:07 +00:00
Kenneth Heafield
769c19d10c
KenLM a6d57501dcac95a31719a8628f6cbd288f6741e2 including Marcin's fixed pruning
2015-01-22 11:42:46 -05:00
Hieu Hoang
9235534269
Merge ../hh
2015-01-22 16:11:24 +00:00
Hieu Hoang
59c4baec3f
use utf8 german model
2015-01-22 16:10:12 +00:00
Marcin Junczys-Dowmunt
b5b048cf1a
Set default number of scores to 4
2015-01-22 12:36:50 +01:00
Marcin Junczys-Dowmunt
e3ef09e9a4
fixed segfault for querying, set scores to 4
2015-01-22 12:35:55 +01:00
Kenneth Heafield
1dce55f413
C++ tokenizer based on RE2. Not by me.
Some differences from Moses tokenizer: fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(. Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-22 12:25:02 +01:00
Hieu Hoang
ad6f3a8026
option to sort translation options after EvaluateAfterSourceContext
2015-01-22 12:25:02 +01:00
Kenneth Heafield
e30065072e
C++ tokenizer based on RE2. Not by me.
Some differences from Moses tokenizer: fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(. Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-21 12:23:44 -05:00
Hieu Hoang
f8781eaefa
option to sort translation options after EvaluateAfterSourceContext
2015-01-21 16:07:50 +00:00
Matthias Huck
ec547fa56a
SoftSourceSyntacticConstraintsFeature: use -inf rather than min for featureVariant=1
2015-01-20 21:43:23 +00:00
Matthias Huck
b50c197313
forgot to check this in some time ago
2015-01-20 21:41:41 +00:00
Matthias Huck
a6c09e57d0
domain features in GHKM extraction
2015-01-20 21:36:55 +00:00
Matthias Huck
db655a09e5
Revert "improved interface towards IRSTLM"
This reverts commit 8316ca5948.
Moses did not compile with the current release version of IRSTLM (irstlm-5.80.06)
2015-01-20 19:23:12 +00:00
Marcin Junczys-Dowmunt
7d9013a85b
Work-around for temporary translation option collection size during phrase table binarization
2015-01-19 23:15:08 +01:00
Kenneth Heafield
7c507bfa74
May is not an abbreviation
2015-01-19 16:37:57 -05:00
Marcin Junczys-Dowmunt
fbcf2dcb56
Fixed thread-safety
2015-01-19 21:56:04 +01:00
Marcin Junczys-Dowmunt
82c603213a
Thread-safety and constness
2015-01-18 23:58:28 +01:00
Marcin Junczys-Dowmunt
16ffc2c978
Added new VW feature and exception to Simple9
2015-01-18 23:26:32 +01:00
Nicola Bertoldi
95a88a17c5
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2015-01-18 14:25:40 +01:00
Marcin Junczys-Dowmunt
41f829651b
Another attempt at fixing dangling alignment points
2015-01-17 00:44:04 +01:00
Matthias Huck
db09949587
PhraseOrientationFeature: distinguishStates parameter,
use TransformScore rather than std::log
2015-01-16 17:48:58 +00:00
Ales Tamchyna
44d1aaa58e
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2015-01-16 16:30:57 +01:00
Ales Tamchyna
9366d82785
IsCorrectTranslationOption no longer confused by matching subphrases
2015-01-16 16:30:43 +01:00
Matthias Huck
083ed44091
SoftSourceSyntacticConstraintsFeature: bugfix
2015-01-16 15:26:02 +00:00
Hieu Hoang
30e31d4a95
don't normalise quotes if tokenizing like Penn /Phil Williams
2015-01-16 12:34:22 +00:00
Hieu Hoang
19d7c44aad
move normalisation of quotes into normalize-punctuation.perl /Tom Hoar
2015-01-16 11:37:31 +00:00
Hieu Hoang
b50b3164fa
beautify
2015-01-15 11:18:39 +00:00
Hieu Hoang
6289b39fd8
update extract-mixed-syntax
2015-01-15 09:53:57 +00:00