87 Commits

Author SHA1 Message Date
Matthias Huck
643fa18805 Merge branch 'GHKMStruct' of github.com:moses-smt/mosesdecoder into GHKMStruct 2013-09-13 17:13:20 +02:00
Matthias Huck
c39bed60c0 Tree fragments in GHKM glue rules;
output of LHS tag in tree fragments for UNKs;
GHKMParse info is now denoted as Tree info
2013-09-13 17:10:21 +02:00
maria nadejde
fad57a60a7 comment for Equal implementation 2013-09-13 16:13:36 +02:00
maria nadejde
5615a11766 sparse feature weight file 2013-09-13 16:06:48 +02:00
maria nadejde
bff123635e added Dense and Sparse feature to scorer 2013-09-13 12:45:46 +02:00
maria nadejde
43a9323d0f add feature files 2013-09-12 18:46:40 +02:00
maria nadejde
67b873b67d mock feature 2013-09-12 18:40:08 +02:00
Matthias Huck
96d14555fc GHKM tree output during extraction: modified extract-ghkm and score tools 2013-09-11 16:46:37 +02:00
Matthias Huck
004c44faf1 prototype GHKM tree output from extract-ghkm (still flawed) 2013-09-10 15:41:26 +02:00
Rico Sennrich
b421f7c9b0 refactoring to minimize overhead from flexibility score code (if off) 2013-09-07 23:04:40 +02:00
Rico Sennrich
7138056b8f flexibility scores 2013-09-07 23:04:01 +02:00
Hieu Hoang
77872f7521 beautify 2013-07-30 15:04:37 +01:00
Hieu Hoang
9cdcf713a6 phrase penalty now has its own ff. No longer in the phrase table 2013-07-29 12:55:44 +01:00
Hieu Hoang
9e8402dedd add placeholder support to extract 2013-07-26 15:46:15 +01:00
Hieu Hoang
e3917f911b add placeholder support to extract 2013-07-26 15:44:29 +01:00
Hieu Hoang
2ba7a372e8 add placeholder support to extract 2013-07-26 14:12:27 +01:00
Hieu Hoang
4fde5f7ea2 eclipse file for extract-rules 2013-07-26 12:27:55 +01:00
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
Hieu Hoang
310b26f989 beautify 2013-07-08 20:52:14 +01:00
Hieu Hoang
3eba5782c2 beautify 2013-07-08 20:25:47 +01:00
Hieu Hoang
dc33fa3d3d redo parsing of feature function parameters 2013-06-20 12:50:41 +01:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
phikoehn
41da5b2760 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-12 08:16:22 +01:00
Rico Sennrich
ce5311c076 fix undefined behaviour in rule extract (thanks Barry) 2013-05-07 10:50:19 +02:00
Rico Sennrich
a52f0a8c4d avoid costly copy operation in extract-rules
(noticeable speed-up with large number of non-terminals:
2x speed-up in benchmark with target syntax and --MaxNonTerm 5)
2013-05-03 10:48:14 +02:00
phikoehn
d19a28ae21 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-01 19:22:00 +01:00
phikoehn
cd8915647b support for Chris Dyer's fast-align; bug fix with sparse word translations feature; threshold pruning in filter 2013-05-01 19:20:05 +01:00
Rico Sennrich
4e87a012d0 fix two bugs with relax-parse:
- size of sentence was not calculated correctly
  (instead, number of positions at which a subtree starts was used)
- code entered an infinite loop sometimes; added break condition
2013-04-25 17:27:50 +02:00
phikoehn
5ba153806b fixed kneserNey phrase probability smoothing bug reported by Česlav Przywara <ceslav@przywara.cz> 2013-03-13 17:52:24 +00:00
Barry Haddow
5f1be3217b bugfix: format of extract file for instance weighting 2013-03-07 21:40:43 +00:00
Rico Sennrich
e3ea93acb7 speed up rule extraction by factor 2 (by rewriting rule consolidation to have linear instead of quadratic complexity) 2013-02-06 13:10:38 +01:00
Barry Haddow
8db90fd2ac instance weighting for lex reordering 2013-01-10 19:46:19 +00:00
Barry Haddow
2e8bad22e4 lex reordering scoring uses FilePiece/StringPiece 2013-01-09 17:38:48 +00:00
Barry Haddow
861792bfc5 extract can read an instance weights file.
Still have to parallelise.
2012-12-21 15:39:25 +00:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a extract-ghkm: tweak label collection for unknown words
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform 2012-12-17 19:02:30 +00:00
Hieu Hoang
d0cf8f47db order of lexical probability has flipped 2012-11-22 17:37:36 +00:00
Barry Haddow
a90e1861c0 Alignments on by default for phrase-based 2012-11-15 12:35:43 +00:00
Hieu Hoang
f96b33de83 only include moses root when compiling 2012-11-14 13:43:04 +00:00
Hieu Hoang
5e3ef23cef move moses/src/* to moses/ 2012-11-12 19:56:18 +00:00
Kenneth Heafield
7d692496c3 More little jamfile changes 2012-11-12 16:57:56 +00:00
Kenneth Heafield
d74b784ad2 And pcfg-common too... 2012-11-12 16:53:42 +00:00
Kenneth Heafield
ddd3cc1d8a Fix extract-ghkm compilation 2012-11-12 16:50:46 +00:00
Kenneth Heafield
62d37fa2b6 Refactor phrase-extract/Jamfile 2012-11-12 14:17:48 +00:00
Hieu Hoang
e75522b602 rename functions 2012-11-09 18:55:01 -05:00
Hieu Hoang
db29cc50d0 xcode 2012-11-09 18:13:20 -05:00
Kenneth Heafield
cd00219fa4 Missing dependency 2012-11-04 16:26:47 -05:00