22 Commits

Author SHA1 Message Date
Rico Sennrich
c8682e9420 target-syntax: use SoftMatchingFeature to assign non-terminal to unknown words 2014-03-24 14:57:24 +00:00
Matthias Huck
65811a0325 tree fragments: tiny issues with the extraction pipeline 2014-02-03 18:13:10 +00:00
Phil Williams
6bee77e207 extract-ghkm: use square brackets for glue rule internal tree structure 2013-11-12 15:49:49 +00:00
Hieu Hoang
24f95297fc compiles with clang 2013-10-31 12:46:41 +00:00
Phil Williams
2a28d1a73e Merge branch 'master' into GHKMStruct
Conflicts:
	moses-chart-cmd/IOWrapper.cpp
	moses-chart-cmd/IOWrapper.h
	moses/FF/Factory.cpp
	moses/Parameter.cpp
	moses/StaticData.h
	phrase-extract/extract-ghkm/ScfgRuleWriter.cpp
	phrase-extract/score-main.cpp
2013-09-29 15:27:09 +01:00
Phil Williams
e497dc4857 Remove NT length code missed in commit cdd9df19... 2013-09-29 15:09:14 +01:00
Phil Williams
940591a1a3 extract-ghkm: allow trailing whitespace in alignment file
Thanks to Matt Post for reporting the problem.
2013-09-26 15:49:08 +01:00
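
The idea behind commit 940591a1a3 can be sketched as follows. This is only an illustration in Python, not the extract-ghkm code (which is C++ and is not shown on this page). It assumes the usual word-alignment line format of whitespace-separated `source-target` index pairs; the function name `parse_alignment_line` is hypothetical.

```python
def parse_alignment_line(line):
    """Parse one alignment line such as "0-0 1-2 2-1" into (src, tgt) pairs.

    Hypothetical helper: the format and the name are assumptions made for
    this sketch, not taken from the extract-ghkm source.
    """
    pairs = []
    # str.split() with no argument discards leading and trailing whitespace,
    # so a trailing space, tab, or newline never yields an empty token.
    for token in line.split():
        src, sep, tgt = token.partition("-")
        if not sep or not src.isdigit() or not tgt.isdigit():
            raise ValueError(f"malformed alignment point: {token!r}")
        pairs.append((int(src), int(tgt)))
    return pairs
```

Tokenizing on runs of whitespace, rather than on a single space character, is what makes trailing whitespace harmless in this sketch.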
Phil Williams
23488e1adb extract-ghkm: use square brackets for --TreeFragments
Use square brackets instead of round brackets for internal tree
structure.  This avoids the need for additional escaping since
square brackets are already escaped in Moses.

Also: tweak code style to match the rest of the source file, and
output less whitespace to make the extract files (marginally)
smaller.
2013-09-20 14:57:40 +01:00
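
A minimal sketch of the bracket convention described in commit 23488e1adb, written in Python for illustration only. The nested-tuple tree representation and the function name `to_square_brackets` are assumptions, and the exact whitespace of the real `--TreeFragments` output is not reproduced here.

```python
def to_square_brackets(node):
    """Serialize a tree using square brackets for internal structure.

    A node is either a token (str) or a (label, children) tuple. Both the
    representation and the name are hypothetical, chosen for this sketch.
    """
    if isinstance(node, str):
        # Tokens are assumed to have been escaped already, so they contain
        # no literal '[' or ']' that could be mistaken for tree structure.
        return node
    label, children = node
    parts = [label] + [to_square_brackets(child) for child in children]
    return "[" + " ".join(parts) + "]"
```

For example, `to_square_brackets(("NP", [("DT", ["the"]), ("NN", ["cat"])]))` gives `[NP [DT the] [NN cat]]`. Because the commit message states that square brackets are already escaped in Moses, the serializer in this sketch needs no extra escaping step of its own.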
Matthias Huck
a6d172e0f1 command line option for extract-ghkm: --TreeFragments 2013-09-16 20:06:02 +01:00
Matthias Huck
c39bed60c0 Tree fragments in GHKM glue rules;
output of LHS tag in tree fragments for UNKs;
GHKMParse info is now denoted as Tree info
2013-09-13 17:10:21 +02:00
Matthias Huck
96d14555fc GHKM tree output during extraction: modified extract-ghkm and score tools 2013-09-11 16:46:37 +02:00
Matthias Huck
004c44faf1 prototype GHKM tree output from extract-ghkm (still flawed) 2013-09-10 15:41:26 +02:00
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
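
The workaround in commit b5584fdecf, and option 3 from its TODO list, can be sketched roughly as below. This is a Python illustration under stated assumptions, not the extract-ghkm or extract-parallel code. Both function names are hypothetical, and merging by dropping duplicate lines is an assumed strategy, not something the commit messages specify.

```python
def writes_shared_outputs(sentence_offset):
    """Workaround: only the instance started at sentence offset 0 writes the
    glue grammar and unknown-word label files, so that parallel instances
    never write to the same two files."""
    return sentence_offset == 0


def merge_chunk_outputs(chunks):
    """Option 3 (assumed strategy): each instance writes its own file and the
    results are combined on completion, keeping the first occurrence of each
    line in order.

    `chunks` is an iterable of iterables of lines, one per instance.
    """
    seen = set()
    merged = []
    for lines in chunks:
        for line in lines:
            entry = line.rstrip("\n")
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged
```

The guard is cheap but relies on the chunk at offset 0 being representative of the whole corpus, a limitation the TODO list itself notes. Merging per-instance outputs avoids that assumption at the cost of a final combination step.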
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a extract-ghkm: tweak label collection for unknown words
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform 2012-12-17 19:02:30 +00:00
Kenneth Heafield
ddd3cc1d8a Fix extract-ghkm compilation 2012-11-12 16:50:46 +00:00
Phil Williams
0851a4d113 extract-ghkm: add --SentenceOffset option
This should behave the same as the --SentenceOffset option for
extract-rules.  The extract-parallel.perl script expects the rule
extractor to have this option.
2012-10-03 20:04:09 +01:00
Hieu Hoang
121e258e84 namespace all classes in mert directory 2012-06-30 21:39:10 +01:00
Hieu Hoang
4eef94b121 move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00