248 Commits

Author SHA1 Message Date
Hieu Hoang
ad73919979 merge with private branch 2015-03-10 15:28:45 +00:00
Phil Williams
9e88f794e6 Add phrase-extract/postprocess-egret-forests
This performs some minor transformations to Egret forests: escaping of
Moses special characters; removal of "^g" suffixes from constituent labels;
and marking of slash/hyphen split points (using @ characters).
2015-03-10 13:51:30 +00:00
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Matthias Huck
524ed4406e pragma once 2015-03-09 21:44:54 +00:00
Matthias Huck
559077f6f8 some moderate modifications in phrase-extract/score-main.cpp
(e.g., use Moses::Scan<>() rather than atof()/atoi())
2015-03-09 18:49:32 +00:00
Matthias Huck
973fd98052 conservative update of some old code in phrase-extract/consolidate-main.cpp 2015-03-09 18:47:28 +00:00
Matthias Huck
0c79e19ff9 consolidate properties: fixing bug from commit b08d3ed 2015-03-09 18:44:02 +00:00
Hieu Hoang
b08d3ed0fe merge with private branch. Add --Count arg 2015-03-09 00:47:51 +00:00
Matthias Huck
99b8f65fb1 GHKM: POS factor in glue rules: target side only 2015-03-06 16:47:44 +00:00
Matthias Huck
aa077ab66c GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM) 2015-03-05 22:25:32 +00:00
Matthias Huck
773a16b5fd POS property in glue rules 2015-03-04 23:05:45 +00:00
Matthias Huck
638e9c3f60 POS property: map tags to indices in consolidate 2015-03-04 22:48:34 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Phil Williams
0346fbb138 filter-rule-table: stopgap (non-) filtering for T2S/SCFG 2015-02-23 11:27:20 +00:00
Hieu Hoang
32de075022 beautify 2015-02-19 12:27:23 +00:00
Phil Williams
e1d60211a4 filter-rule-table: comments + minor clean-up. 2015-02-11 12:03:27 +00:00
Phil Williams
02f5ada680 filter-rule-table: support for "hierarchical" and "s2t" model types
Output should match filter-rule-table.py, but filtering is faster.  Some rough
timings:

             That        This
  System A    0h 13m     0h 04m
  System B   18h 03m     0h 51m

System A is WMT14, en-de, string-to-tree (32M rules, 3,000 test sentences)
System B is WMT14, cs-en, string-to-tree (293M rules, 13,071 test sentences)
2015-02-10 15:11:10 +00:00
Hieu Hoang
70e8eb54ce Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
Philipp Koehn
f69c1dab02 more efficient default recaser training 2015-02-04 09:18:09 +00:00
Phil Williams
6b9da6c585 filter-rule-table: merge changes from t2s branch (still WIP) 2015-02-03 11:33:10 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Matthias Huck
b50c197313 forgot to check this in some time ago 2015-01-20 21:41:41 +00:00
Matthias Huck
a6c09e57d0 domain features in GHKM extraction 2015-01-20 21:36:55 +00:00
Hieu Hoang
b50b3164fa beautify 2015-01-15 11:18:39 +00:00
Hieu Hoang
6289b39fd8 update extract-mixed-syntax 2015-01-15 09:53:57 +00:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Hieu Hoang
05ead45e71 beautify 2015-01-14 11:07:42 +00:00
Matthias Huck
168118d252 PhraseOrientationFeature efficiency improvement 2015-01-09 14:03:18 +00:00
Phil Williams
7cc75a0fa1 score-stsg: add --TreeScore option 2014-12-30 18:57:23 +00:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Phil Williams
b9a382aa78 Add filter-rule-table
This will eventually replace filter-rule-table.py.  At the moment
it can only filter rule tables where the source-side is a STSG
fragment and when the test sentences have parse trees.
2014-12-07 14:56:48 +00:00
Phil Williams
60e56efc6b phrase-extract: add syntax-common sub-library
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Phil Williams
a2708b8431 relax-parse: fix hang
SyntaxTree::Parse() would enter a *very* long loop due to an uninitialized
member variable.
2014-12-07 12:56:41 +00:00
Hieu Hoang
4b10c59bea add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager 2014-12-05 21:33:59 +00:00
Matthias Huck
7a299de66b avoid necessity of masking "{{" in the data 2014-12-04 15:54:05 +00:00
Matthias Huck
24a8a6a511 PhraseOrientationFeature 2014-12-03 20:04:26 +00:00
Hieu Hoang
49a2ff1faa Merge branch 'merge-cmd' 2014-12-02 19:09:34 +00:00
Hieu Hoang
ba7afba9f6 move n-best code for phrase-based from IOWrapper to ChartManager 2014-12-02 17:40:53 +00:00
Phil Williams
f84f159247 Add score-stsg, a program for scoring STSG extract files 2014-12-02 17:10:20 +00:00
Phil Williams
ef1262a17f extract-ghkm: change STSG output format. 2014-11-21 15:46:12 +00:00
Phil Williams
c46fb10ec7 extract-ghkm: add --STSG option 2014-11-21 11:30:29 +00:00
Matthias Huck
0fd987a8c6 avoid necessity of masking "{{" in the data 2014-11-12 18:28:59 +00:00
Phil Williams
a5d803ee14 extract-ghkm: add -T2S option 2014-11-12 14:03:24 +00:00
Rico Sennrich
ae8b9cbfef glue grammar: alignment for <s> and </s> 2014-11-04 14:05:13 +00:00
Phil Williams
05ecc914c2 Fix a few more compiler warnings (from Clang mostly). 2014-10-10 15:47:53 +01:00
Paul Baltescu
8f74ecd8f3 Fix OxLM. 2014-10-08 22:08:42 +01:00
Matthias Huck
5ac6c42508 PhraseOrientationFeature: bugfixes 2014-09-13 00:20:17 +01:00
Matthias Huck
0cf0d595d3 GHKM glue grammar: Orientation phrase property 2014-09-12 17:30:03 +01:00
Matthias Huck
63316960a1 GHKM glue grammar: print word alignment links for <s> and </s>,
SSTART and SEND in internal tree structure
2014-09-12 17:18:31 +01:00