Commit Graph

81 Commits

Author SHA1 Message Date
hieuhoang1972
8595b06dce rewrite lex prob calc
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4058 1f5c12ca-751b-0410-a591-d2e778427230
2011-07-01 05:40:46 +00:00
hieuhoang1972
024b5f9100 vs.net build
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4048 1f5c12ca-751b-0410-a591-d2e778427230
2011-06-28 19:38:57 +00:00
pjwilliams
c14723cc83 Oops, fix commit 4032: option is called --PhrasePairCount not --RuleCount.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4034 1f5c12ca-751b-0410-a591-d2e778427230
2011-06-24 16:40:17 +00:00
pjwilliams
108dc4d12e Add --PhrasePairCount option to score.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@4032 1f5c12ca-751b-0410-a591-d2e778427230
2011-06-24 16:24:33 +00:00
hieuhoang1972
7408636328 add --MaxLinesGTDiscount to usage display
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3987 1f5c12ca-751b-0410-a591-d2e778427230
2011-05-22 16:02:26 +00:00
pjwilliams
b186fcd2c7 Simple SCFG rule extraction speed-ups based on callgrind profile.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3946 1f5c12ca-751b-0410-a591-d2e778427230
2011-04-07 11:03:23 +00:00
hieuhoang1972
1e76baa978 #include for Ubuntu build
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3918 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-08 15:45:03 +00:00
hieuhoang1972
2880656d8d option of outputting scoring to stdout
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3914 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-07 02:44:34 +00:00
hieuhoang1972
cd384a1fbc option of outputting scoring to stdout
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3913 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-05 15:38:50 +00:00
hieuhoang1972
a3d97584a9 run beautify.perl. Consistent formatting for .h & .cpp files
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3902 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-24 13:57:11 +00:00
phkoehn
4c11bcd617 extensions to phrase table scoring options
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3893 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-23 10:27:54 +00:00
hieuhoang1972
fe0d53b73f vs.net
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3801 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-17 15:48:46 +00:00
pjwilliams
3dec57a518 When scoring phrase pairs, store copies of the active pairs' PHRASE objects
instead of inserting them into a PhraseTable.  In a test on a 21GB
target-syntax extract file, this reduced user time from 195 to 120 mins.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3777 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 23:49:57 +00:00
pjwilliams
627d8edf8e Fix bug affecting Good-Turing discounting: repeated phrase pairs were always
contributing a count of 1 because PhraseAlignment::addToCount() was looking
for counts in the fifth column, not the fourth.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3775 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 16:31:53 +00:00
hieuhoang1972
71093403df use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3736 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-25 13:54:40 +00:00
hieuhoang1972
dd6c1e722e use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3729 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-23 14:30:36 +00:00
hieuhoang1972
867a9bdf4b use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3728 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-23 14:15:54 +00:00
hieuhoang1972
4bc0a8e6b2 can set max num of lines for GT discount calc.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3723 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-19 20:11:10 +00:00
pjwilliams
3ca16120a2 Add --MaxScope option to extract-rules (Hopkins and Langmead, 2010)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3661 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-26 15:55:57 +00:00
rsennrich
7929e4624e more informative error message when hierarchical phrase extraction fails.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3550 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-22 12:56:11 +00:00
hieuhoang1972
083a9af215 delete alignment info for terminals
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3405 1f5c12ca-751b-0410-a591-d2e778427230
2010-08-13 10:03:13 +00:00
rafpayen
b431f951c5
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3404 1f5c12ca-751b-0410-a591-d2e778427230
2010-08-13 09:58:17 +00:00
bhaddow
f2660e8d41 Fix glue grammar generation for new ttable format
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3386 1f5c12ca-751b-0410-a591-d2e778427230
2010-08-06 14:45:37 +00:00
hieuhoang1972
7e6b3766dd visual studio
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3385 1f5c12ca-751b-0410-a591-d2e778427230
2010-08-03 15:56:41 +00:00
rafpayen
2ef133e02b add empty fields in glue grammar to accommodate the new phrase table format
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3378 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-30 15:55:14 +00:00
hieuhoang1972
8adef921ed new format for consolidate-direct
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3374 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-29 23:20:37 +00:00
hieuhoang1972
0ee6d75566 bug in Good turing
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3372 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-28 22:49:37 +00:00
hieuhoang1972
340ebbd333 bug in Good turing
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3370 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-28 21:52:32 +00:00
hieuhoang1972
ae9779dd7f separate PhraseAlignment class into separate file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3369 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-28 21:28:14 +00:00
hieuhoang1972
3d9d756055 alignment info, new format
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3363 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-27 11:04:03 +00:00
hieuhoang1972
881117d9f5 alignment info in pt
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3361 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-18 19:49:08 +00:00
hieuhoang1972
31930eb6fc alignment info in pt
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3358 1f5c12ca-751b-0410-a591-d2e778427230
2010-07-17 22:29:06 +00:00
pjwilliams
fab2e96d2f In extract-rules, if the source or target syntax contains an unsupported
escape sequence (anything other than "&lt;", "&gt;", "&amp;", "&apos;",
and "&quot;") then write a warning message and skip the sentence pair
(instead of asserting).


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3350 1f5c12ca-751b-0410-a591-d2e778427230
2010-06-29 10:41:42 +00:00
pjwilliams
2edfc16912 Merge remaining script support for tree-based models from mt3_chart.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3137 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-16 09:45:51 +00:00
hieuhoang1972
a2233d0f8d xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3136 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-14 16:53:39 +00:00
pjwilliams
5faaedc0df Copy in `consolidate,' `consolidate-direct,' and the new version of
`score' from mt3_chart.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3134 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-14 15:50:17 +00:00
hieuhoang1972
06ee9a3be3 vs
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3132 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-13 16:50:44 +00:00
pjwilliams
53cb08efca Use a generic version of the SAFE_GETLINE macro in scripts/phrase-extract
instead of defining one per source file.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3131 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-13 16:29:55 +00:00
hieuhoang1972
0440dfe079 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3130 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-13 16:13:56 +00:00
pjwilliams
580acce9e2 Integrate rule extraction code from mt3_chart. There are now two extract
programs: `extract' for the phrase-based model and `extract-rules' for
tree-based models.  They could be combined into a single program, but
they're probably sufficiently different that it isn't worthwhile.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3129 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-13 15:34:39 +00:00
pjwilliams
51ae927ede Start merging in rule extraction code from mt3_chart branch.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3126 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-12 15:22:50 +00:00
pjwilliams
9c2536417f Remove file limit option for phrase extraction.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3122 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-12 11:56:54 +00:00
pjwilliams
99f1c92edb Remove redundant --ZipFiles option from extract.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3120 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-12 10:53:08 +00:00
pjwilliams
4c6c4b71cf Remove redundant --ProperConditioning option from extract.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3118 1f5c12ca-751b-0410-a591-d2e778427230
2010-04-12 10:41:32 +00:00
bhaddow
795224736b Merge revisions 2670-2988 from track. Passes all regression except lexicalised
reordering


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2989 1f5c12ca-751b-0410-a591-d2e778427230
2010-03-19 17:52:51 +00:00
sarst
b95cc2f556 Added the check from word-based models of the alignment points in the adjacent corners, to the more complex models.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2916 1f5c12ca-751b-0410-a591-d2e778427230
2010-02-19 15:15:24 +00:00
sarst
c65945b531 Cleaned up lexical reordering scoring, and sent vectors as references instead of copying them. Fixed bugs in extract: it used to choose the wrong orientation at end of sentences, and the hierarchical model type is no longer dependent on the phrase-based model type.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2892 1f5c12ca-751b-0410-a591-d2e778427230
2010-02-12 13:46:33 +00:00
sarst
92368ba490 Rewrote the lexical reordering model scoring in C++. Adapted train-factored-phrase-model.perl to that change. Minor fixes in other places, for compatibility
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2884 1f5c12ca-751b-0410-a591-d2e778427230
2010-02-10 17:19:06 +00:00
naditomeh
242d6c6ddd word-based, phrase-based and hierarchical reordering is implemented in the training
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2823 1f5c12ca-751b-0410-a591-d2e778427230
2010-01-31 23:56:45 +00:00
sarst
f2a5678541 added new file hierarchical.h
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/hierarchical-reo@2803 1f5c12ca-751b-0410-a591-d2e778427230
2010-01-30 09:16:27 +00:00