Commit Graph

334 Commits

Author SHA1 Message Date
Hieu Hoang
32de075022 beautify 2015-02-19 12:27:23 +00:00
Phil Williams
e1d60211a4 filter-rule-table: comments + minor clean-up. 2015-02-11 12:03:27 +00:00
Phil Williams
02f5ada680 filter-rule-table: support for "hierarchical" and "s2t" model types
Output should match filter-rule-table.py, but filtering is faster.  Some rough
timings:

             filter-rule-table.py   filter-rule-table
  System A         0h 13m                 0h 04m
  System B        18h 03m                 0h 51m

System A is WMT14, en-de, string-to-tree (32M rules, 3,000 test sentences)
System B is WMT14, cs-en, string-to-tree (293M rules, 13,071 test sentences)
2015-02-10 15:11:10 +00:00
Hieu Hoang
70e8eb54ce Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
Philipp Koehn
f69c1dab02 more efficient default recaser training 2015-02-04 09:18:09 +00:00
Phil Williams
6b9da6c585 filter-rule-table: merge changes from t2s branch (still WIP) 2015-02-03 11:33:10 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Matthias Huck
b50c197313 forgot to check this in some time ago 2015-01-20 21:41:41 +00:00
Matthias Huck
a6c09e57d0 domain features in GHKM extraction 2015-01-20 21:36:55 +00:00
Hieu Hoang
b50b3164fa beautify 2015-01-15 11:18:39 +00:00
Hieu Hoang
6289b39fd8 update extract-mixed-syntax 2015-01-15 09:53:57 +00:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Hieu Hoang
05ead45e71 beautify 2015-01-14 11:07:42 +00:00
Matthias Huck
168118d252 PhraseOrientationFeature efficiency improvement 2015-01-09 14:03:18 +00:00
Phil Williams
7cc75a0fa1 score-stsg: add --TreeScore option 2014-12-30 18:57:23 +00:00
Philipp Koehn
831f947874 long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works 2014-12-21 01:14:42 +00:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Phil Williams
b9a382aa78 Add filter-rule-table
This will eventually replace filter-rule-table.py.  At the moment
it can only filter rule tables where the source-side is a STSG
fragment and when the test sentences have parse trees.
2014-12-07 14:56:48 +00:00
Phil Williams
60e56efc6b phrase-extract: add syntax-common sub-library
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Phil Williams
a2708b8431 relax-parse: fix hang
SyntaxTree::Parse() would enter a *very* long loop due to an uninitialized
member variable.
2014-12-07 12:56:41 +00:00
Hieu Hoang
4b10c59bea add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager 2014-12-05 21:33:59 +00:00
Matthias Huck
7a299de66b avoid necessity of masking "{{" in the data 2014-12-04 15:54:05 +00:00
Matthias Huck
24a8a6a511 PhraseOrientationFeature 2014-12-03 20:04:26 +00:00
Hieu Hoang
49a2ff1faa Merge branch 'merge-cmd' 2014-12-02 19:09:34 +00:00
Hieu Hoang
ba7afba9f6 move n-best code for phrase-based from IOWrapper to ChartManager 2014-12-02 17:40:53 +00:00
Phil Williams
f84f159247 Add score-stsg, a program for scoring STSG extract files 2014-12-02 17:10:20 +00:00
Phil Williams
ef1262a17f extract-ghkm: change STSG output format. 2014-11-21 15:46:12 +00:00
Phil Williams
c46fb10ec7 extract-ghkm: add --STSG option 2014-11-21 11:30:29 +00:00
Matthias Huck
0fd987a8c6 avoid necessity of masking "{{" in the data 2014-11-12 18:28:59 +00:00
Phil Williams
a5d803ee14 extract-ghkm: add -T2S option 2014-11-12 14:03:24 +00:00
Rico Sennrich
ae8b9cbfef glue grammar: alignment for <s> and </s> 2014-11-04 14:05:13 +00:00
Phil Williams
05ecc914c2 Fix a few more compiler warnings (from Clang mostly). 2014-10-10 15:47:53 +01:00
Paul Baltescu
8f74ecd8f3 Fix OxLM. 2014-10-08 22:08:42 +01:00
Matthias Huck
5ac6c42508 PhraseOrientationFeature: bugfixes 2014-09-13 00:20:17 +01:00
Matthias Huck
0cf0d595d3 GHKM glue grammar: Orientation phrase property 2014-09-12 17:30:03 +01:00
Matthias Huck
63316960a1 GHKM glue grammar: print word alignment links for <s> and </s>,
SSTART and SEND in internal tree structure
2014-09-12 17:18:31 +01:00
Matthias Huck
1523f3315d PhraseOrientationFeature for chart-based decoding: a first simple version,
with lots of log output
2014-09-12 13:51:04 +01:00
Hieu Hoang
0b41879a3a argument --NonTermConsecSourceMixedSyntax 2014-09-03 02:33:50 +01:00
Hieu Hoang
b0ee7f68e2 argument --NonTermConsecSourceMixedSyntax 2014-09-03 02:19:49 +01:00
Hieu Hoang
2d73f6f803 jamfile error 2014-09-01 18:02:42 +01:00
Hieu Hoang
1aa5c4fa35 change size of syntactic non-term 2014-08-30 07:44:53 +01:00
Hieu Hoang
f19781e05b add Jamfile 2014-08-29 16:26:28 +01:00
Hieu Hoang
e6438e378f Add option to sort chart translation option after EvaluateWithSourceContext 2014-08-29 16:24:49 +01:00
Hieu Hoang
379da960d1 eclipse 2014-08-29 16:21:27 +01:00
Hieu Hoang
26741c6bca Roll out mixed syntax 2014-08-29 15:56:01 +01:00
Hieu Hoang
4b8d29d18d Roll out mixed syntax 2014-08-29 15:33:35 +01:00
Hieu Hoang
73f1d259a1 Roll out mixed syntax 2014-08-29 15:31:47 +01:00
Hieu Hoang
049a9a9ea7 space between ||| and {{ 2014-08-28 13:22:48 +01:00
Hieu Hoang
794b946783 space between ||| and {{ 2014-08-28 13:20:05 +01:00
Matthias Huck
33992f9af5 uninitialized variables and double include 2014-08-08 16:27:17 +01:00
Paul Baltescu
d75c4e1ae5 OxLM integration. 2014-08-08 01:18:05 +01:00
Matthias Huck
c27cbf55ea source labels: integration into EMS 2014-08-07 21:02:51 +01:00
Ulrich Germann
df3fb4ac5c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
Conflicts:
	doc/Mmsapt.howto
2014-08-04 17:26:15 +01:00
Ulrich Germann
2711360ce7 Added missing #include. 2014-08-04 17:19:58 +01:00
Barry Haddow
65b3e0b96e Missing include 2014-08-01 11:13:34 +01:00
Hieu Hoang
8d7871125f delete extract-ordering. Not part of the core functionality 2014-07-30 13:16:40 +01:00
Matthias Huck
7b02017da1 use std::numeric_limits 2014-07-28 19:49:43 +01:00
Matthias Huck
3a5dee12e8 implementation of phrase orientation in GHKM extraction
(...but a corresponding feature function for the chart-based decoder has not been written yet)
2014-07-28 18:27:12 +01:00
Matthias Huck
19a5ef4a1a relax-parse: use cin.peek()
Hope this eliminates some weird behavior
2014-07-17 20:19:28 +01:00
Hieu Hoang
d7cbef5cbe minor format change in consolidate 2014-06-25 07:04:11 -04:00
Hieu Hoang
ab3ed27f20 Merge ../mosesdecoder into hieu 2014-06-13 17:04:52 +01:00
Hieu Hoang
32b4eb5168 add NonTermContext property 2014-06-13 17:04:41 +01:00
Hieu Hoang
dddadc4c81 Merge branch 'hieu' of github.com:hieuhoang/mosesdecoder into hieu 2014-06-13 10:35:43 +01:00
Hieu Hoang
768ac1c6a8 add SpanLength to score 2014-06-12 13:26:01 +01:00
Matthias Huck
d0e92da734 GHKM extraction can add a source labels phrase property 2014-06-11 19:27:18 +01:00
Hieu Hoang
af659446bd Merge branch 'hieu' of github.com:hieuhoang/mosesdecoder into hieu 2014-06-11 04:35:33 +01:00
Hieu Hoang
92089f9726 ignore 0 span. Don't bomb out 2014-06-11 04:35:09 +01:00
Hieu Hoang
0178e5237e don't output SpanLength property if empty 2014-06-09 09:05:31 +01:00
Hieu Hoang
7c5208e6d6 Merge ../mosesdecoder into hieu 2014-06-08 17:18:16 +01:00
Hieu Hoang
29d83d94b1 delete any mention of SAFE_GETLINE so it doesn't reappear 2014-06-08 17:18:07 +01:00
Hieu Hoang
3c6a31128d Merge ../mosesdecoder into hieu 2014-06-08 17:07:41 +01:00
Hieu Hoang
1b667e3e24 delete any mention of SAFE_GETLINE so it doesn't reappear 2014-06-08 17:07:12 +01:00
Hieu Hoang
cb94a3181b use standard c++ getline instead of old Moses SAFE_GETLINE 2014-06-08 16:23:14 +01:00
Hieu Hoang
23ba0de224 use standard c++ getline instead of old Moses SAFE_GETLINE 2014-06-08 15:41:27 +01:00
Hieu Hoang
d979b24314 use standard c++ getline instead of old Moses SAFE_GETLINE 2014-06-08 14:06:33 +01:00
Hieu Hoang
45ed0a5b1f Merge ../mosesdecoder into hieu 2014-06-08 13:22:34 +01:00
Hieu Hoang
f58c7fc831 use standard c++ getline instead of old Moses SAFE_GETLINE 2014-06-08 13:17:23 +01:00
Nicola Bertoldi
4d75c889f1 merged master into dynamic-models 2014-06-08 09:39:37 +02:00
Hieu Hoang
9f2e3a4194 add SpanLength property 2014-06-03 17:26:21 +01:00
Hieu Hoang
fcf9e4b51c Merge ../mosesdecoder into hieu 2014-06-03 17:11:16 +01:00
Hieu Hoang
3ae671fc7c ||| separator after counts 2014-06-03 17:10:09 +01:00
Hieu Hoang
fe1fdb7980 Merge ../mosesdecoder into hieu 2014-06-03 14:40:19 +01:00
Hieu Hoang
23e9083514 erroneous assert 2014-06-03 14:40:00 +01:00
Hieu Hoang
280f02cd1a Merge ../mosesdecoder into hieu 2014-06-03 14:15:54 +01:00
Hieu Hoang
8ba078c1eb erroneous assert 2014-06-03 14:06:40 +01:00
Hieu Hoang
ea1fb296fe add span length to training 2014-05-31 21:39:47 +01:00
Nicola Bertoldi
0ca98837db beautify 2014-05-19 15:35:33 +02:00
Nicola Bertoldi
20b3e8929e beautify 2014-05-19 15:35:08 +02:00
Nicola Bertoldi
20381cbf89 merged master into dynamic-models and solved conflicts 2014-04-28 19:18:38 +02:00
Rico Sennrich
c8682e9420 target-syntax: use SoftMatchingFeature to assign non-terminal to unknown words 2014-03-24 14:57:24 +00:00
Rico Sennrich
ba52fa163b use &#124; as default escape sequence for "|" (for consistency with tokenizer.perl) 2014-03-21 19:19:03 +00:00
Hieu Hoang
0e308e41ca recommit Rico's change to score format 2014-03-13 18:30:24 +00:00
Ulrich Germann
a7c85780ee Merge branch 'master' into dynamic-phrase-tables
Conflicts:
	phrase-extract/score-main.cpp
2014-03-10 14:25:45 +00:00
Ulrich Germann
fdc504d47a Changes on main branch files while I was working on dynamic phrase tables. 2014-03-10 14:08:00 +00:00
Rico Sennrich
01bc3c111e swap position of alignment and scores in phrase table halves (before consolidate step).
ensures that multiple hierarchical rules with same source/target phrase, but different alignment, are sorted correctly
2014-03-02 16:55:42 +00:00
Ulrich Germann
ef2ef881a4 Merge branch 'dynamic-phrase-tables' of file:///home/germann/git/mosesdecoder into dynamic-phrase-tables 2014-02-21 01:04:02 +00:00
Ulrich Germann
e089c7463d Fixed code formatting. 2014-02-08 16:03:50 +00:00
Matthias Huck
e40fabfad5 fixed compile errors in debug mode 2014-02-06 19:46:32 +00:00
Matthias Huck
65811a0325 tree fragments: tiny issues with the extraction pipeline 2014-02-03 18:13:10 +00:00
Matthias Huck
86ee3e15a4 new version of the score tool
which is now capable of dealing with additional properties in an appropriate manner
2014-01-29 18:37:42 +00:00
Hieu Hoang
4c009e31e8 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder into hieu 2014-01-20 17:08:02 +00:00
Rico Sennrich
bc0cac59be unescape "&#124;" (for compatibility with escape-special-chars scripts) 2014-01-18 12:23:21 +00:00
Nicola Bertoldi
4b072f2097 merge master into this branch 2014-01-17 14:04:15 +01:00
Rico Sennrich
c1d8f6e267 Revert "testing the waters for C++11 adoption"
This reverts commit d2d508184e.

there are problems with gcc 4.5, and apparently different problems with new boost versions; sticking with C++03 for the time being.
2014-01-15 16:16:11 +00:00
Nicola Bertoldi
572728074d removed useless files 2014-01-15 16:52:25 +01:00
Nicola Bertoldi
e452a13062 beautify 2014-01-15 16:49:57 +01:00
Nicola Bertoldi
47bece6eac code cleanup; fixings to others' code/test 2014-01-15 16:16:37 +01:00
Rico Sennrich
d2d508184e testing the waters for C++11 adoption 2014-01-14 17:01:46 +00:00
Nicola Bertoldi
50970b2b59 merge master into this branch 2014-01-14 08:50:18 +01:00
Hieu Hoang
584af0d015 add support for --MinPhraseLength 2014-01-06 18:03:38 +00:00
Hieu Hoang
35faa887e8 add support for --MinPhraseLength 2014-01-06 17:34:04 +00:00
Hieu Hoang
abe0155f81 ordering extract in same format as my own 2014-01-06 17:21:39 +00:00
Hieu Hoang
ac5d6676f2 ordering extract in same format as my own 2014-01-06 17:04:10 +00:00
Hieu Hoang
d4d4e27511 only output ordering extract 2014-01-06 16:31:21 +00:00
Hieu Hoang
2fb99f07bb only output ordering extract 2014-01-06 13:31:47 +00:00
Hieu Hoang
63f6ea8fa7 eclipse 2014-01-06 11:55:22 +00:00
Hieu Hoang
b3a712baa0 output reordering only 2013-12-18 18:40:23 +00:00
Hieu Hoang
7d497abf41 minor verbose in consolidate-main.cpp 2013-12-06 11:46:19 +00:00
Hieu Hoang
4f6f127486 Merge pull request #53 from pengli09/master
Fix the bug in phrase-extract/extract-main.cpp: the authors forgot to change three variable names
2013-11-20 03:04:41 -08:00
Peng Li
f53825c71e Fix the bug in phrase-extract/extract-main.cpp: the authors forgot to change inBottomRight/outBottomRight to inBottomLeft/outBottomLeft in the second loops in getOrientPhraseModel() and getOrientHierModel() 2013-11-20 16:22:15 +08:00
Hieu Hoang
ccf9662748 Merge branch 'master' of ../mosesdecoder 2013-11-15 14:03:05 +00:00
Phil Williams
6bee77e207 extract-ghkm: use square brackets for glue rule internal tree structure 2013-11-12 15:49:49 +00:00
Hieu Hoang
477314cda4 Merge branch 'master' of github.com:hieuhoang/mosesdecoder 2013-11-12 12:26:35 +00:00
Hieu Hoang
24f95297fc compiles with clang 2013-10-31 12:46:41 +00:00
Hieu Hoang
125e9a8569 add debug argument 2013-10-05 10:48:01 +01:00
Hieu Hoang
902741681a reverse 7d3de78500 2013-10-04 21:27:53 +01:00
Hieu Hoang
7d3de78500 minor error with placeholder 2013-10-04 19:29:16 +01:00
Phil Williams
d6aa123d03 score: write sparse features to third field. 2013-09-29 18:58:20 +01:00
Phil Williams
2a28d1a73e Merge branch 'master' into GHKMStruct
Conflicts:
	moses-chart-cmd/IOWrapper.cpp
	moses-chart-cmd/IOWrapper.h
	moses/FF/Factory.cpp
	moses/Parameter.cpp
	moses/StaticData.h
	phrase-extract/extract-ghkm/ScfgRuleWriter.cpp
	phrase-extract/score-main.cpp
2013-09-29 15:27:09 +01:00
Phil Williams
20b96fd0a7 Oops, fix e497dc485... 2013-09-29 15:23:37 +01:00
Phil Williams
e497dc4857 Remove NT length code missed in commit cdd9df19... 2013-09-29 15:09:14 +01:00
Hieu Hoang
31ce9b510e beautify 2013-09-27 09:35:24 +01:00
Phil Williams
940591a1a3 extract-ghkm: allow trailing whitespace in alignment file
Thanks to Matt Post for reporting the problem.
2013-09-26 15:49:08 +01:00
Phil Williams
29c1089283 consolidate: don't assume input contains key-value field 2013-09-24 09:45:49 +01:00
Phil Williams
74ed066569 consolidate: expect key-value pairs in 7th field, not 6th 2013-09-20 15:50:03 +01:00
Phil Williams
23488e1adb extract-ghkm: use square brackets for --TreeFragments
Use square brackets instead of round brackets for internal tree
structure.  This avoids the need for additional escaping since
square brackets are already escaped in Moses.

Also: tweak code style to match the rest of the source file, and
output less whitespace to make the extract files (marginally)
smaller.
2013-09-20 14:57:40 +01:00
Phil Williams
ab863d1f16 consolidate: write key-value field to rule table 2013-09-20 09:42:13 +01:00
Hieu Hoang
98bb4fa1c7 placeholders work in extract 2013-09-19 12:24:57 +02:00
Hieu Hoang
a40d9082cd more placeholder code and 'NO BEST TRANSLATION' to stderr for pb 2013-09-18 23:47:50 +02:00
Matthias Huck
a6d172e0f1 command line option for extract-ghkm: --TreeFragments 2013-09-16 20:06:02 +01:00
maria nadejde
7cc284a743 comment 2013-09-14 10:50:33 +02:00
maria nadejde
df86f0e78b Merge branch 'GHKMStruct' of github.com:moses-smt/mosesdecoder into GHKMStruct 2013-09-14 10:46:17 +02:00
maria nadejde
5f37a545b1 fixed sparse feature output 2013-09-14 10:44:35 +02:00
Phil Williams
296eb6804a Merge master 2013-09-13 22:32:45 +01:00
Phil Williams
cdd9df19d2 Remove --OutputNTLengths from extract-rules, etc.
The option isn't used in master and the output is incompatible with the
current rule table format.  If anyone wants this in master it should
probably be fixed in the span-length branch then merged.
2013-09-13 22:16:42 +01:00
maria nadejde
bf5c32df6c stuff that probably doesn't work 2013-09-13 19:43:04 +02:00
Matthias Huck
643fa18805 Merge branch 'GHKMStruct' of github.com:moses-smt/mosesdecoder into GHKMStruct 2013-09-13 17:13:20 +02:00
Matthias Huck
c39bed60c0 Tree fragments in GHKM glue rules;
output of LHS tag in tree fragments for UNKs;
GHKMParse info is now denoted as Tree info
2013-09-13 17:10:21 +02:00
maria nadejde
fad57a60a7 comment for Equal implementation 2013-09-13 16:13:36 +02:00
maria nadejde
5615a11766 sparse feature weight file 2013-09-13 16:06:48 +02:00