Commit Graph

221 Commits

Author SHA1 Message Date
Hieu Hoang
d4d4e27511 only output ordering extract 2014-01-06 16:31:21 +00:00
Hieu Hoang
2fb99f07bb only output ordering extract 2014-01-06 13:31:47 +00:00
Hieu Hoang
63f6ea8fa7 eclipse 2014-01-06 11:55:22 +00:00
Hieu Hoang
b3a712baa0 output reordering only 2013-12-18 18:40:23 +00:00
Hieu Hoang
7d497abf41 minor verbose in consolidate-main.cpp 2013-12-06 11:46:19 +00:00
Hieu Hoang
4f6f127486 Merge pull request #53 from pengli09/master
Fix the bug in phrase-extract/extract-main.cpp: the authors forgot to change three variable names
2013-11-20 03:04:41 -08:00
Peng Li
f53825c71e Fix the bug in phrase-extract/extract-main.cpp: the authors forgot to change inBottomRight/outBottomRight to inBottomLeft/outBottomLeft in the second loops in getOrientPhraseModel() and getOrientHierModel() 2013-11-20 16:22:15 +08:00
Hieu Hoang
ccf9662748 Merge branch 'master' of ../mosesdecoder 2013-11-15 14:03:05 +00:00
Phil Williams
6bee77e207 extract-ghkm: use square brackets for glue rule internal tree structure 2013-11-12 15:49:49 +00:00
Hieu Hoang
477314cda4 Merge branch 'master' of github.com:hieuhoang/mosesdecoder 2013-11-12 12:26:35 +00:00
Hieu Hoang
24f95297fc compiles with clang 2013-10-31 12:46:41 +00:00
Hieu Hoang
125e9a8569 add debug argument 2013-10-05 10:48:01 +01:00
Hieu Hoang
902741681a reverse 7d3de78500 2013-10-04 21:27:53 +01:00
Hieu Hoang
7d3de78500 minor error with placeholder 2013-10-04 19:29:16 +01:00
Phil Williams
d6aa123d03 score: write sparse features to third field. 2013-09-29 18:58:20 +01:00
Phil Williams
2a28d1a73e Merge branch 'master' into GHKMStruct
Conflicts:
	moses-chart-cmd/IOWrapper.cpp
	moses-chart-cmd/IOWrapper.h
	moses/FF/Factory.cpp
	moses/Parameter.cpp
	moses/StaticData.h
	phrase-extract/extract-ghkm/ScfgRuleWriter.cpp
	phrase-extract/score-main.cpp
2013-09-29 15:27:09 +01:00
Phil Williams
20b96fd0a7 Oops, fix e497dc485... 2013-09-29 15:23:37 +01:00
Phil Williams
e497dc4857 Remove NT length code missed in commit cdd9df19... 2013-09-29 15:09:14 +01:00
Hieu Hoang
31ce9b510e beautify 2013-09-27 09:35:24 +01:00
Phil Williams
940591a1a3 extract-ghkm: allow trailing whitespace in alignment file
Thanks to Matt Post for reporting the problem.
2013-09-26 15:49:08 +01:00
Phil Williams
29c1089283 consolidate: don't assume input contains key-value field 2013-09-24 09:45:49 +01:00
Phil Williams
74ed066569 consolidate: expect key-value pairs in 7th field, not 6th 2013-09-20 15:50:03 +01:00
Phil Williams
23488e1adb extract-ghkm: use square brackets for --TreeFragments
Use square brackets instead of round brackets for internal tree
structure.  This avoids the need for additional escaping since
square brackets are already escaped in Moses.

Also: tweak code style to match the rest of the source file, and
output less whitespace to make the extract files (marginally)
smaller.
2013-09-20 14:57:40 +01:00
Phil Williams
ab863d1f16 consolidate: write key-value field to rule table 2013-09-20 09:42:13 +01:00
Hieu Hoang
98bb4fa1c7 placeholders work in extract 2013-09-19 12:24:57 +02:00
Hieu Hoang
a40d9082cd more placeholder code and 'NO BEST TRANSLATION' to stderr for pb 2013-09-18 23:47:50 +02:00
Matthias Huck
a6d172e0f1 command line option for extract-ghkm: --TreeFragments 2013-09-16 20:06:02 +01:00
maria nadejde
7cc284a743 comment 2013-09-14 10:50:33 +02:00
maria nadejde
df86f0e78b Merge branch 'GHKMStruct' of github.com:moses-smt/mosesdecoder into GHKMStruct 2013-09-14 10:46:17 +02:00
maria nadejde
5f37a545b1 fixed sparse feature output 2013-09-14 10:44:35 +02:00
Phil Williams
296eb6804a Merge master 2013-09-13 22:32:45 +01:00
Phil Williams
cdd9df19d2 Remove --OutputNTLengths from extract-rules, etc.
The option isn't used in master and the output is compatible with the
current rule table format.  If anyone wants this in master it should
probably be fixed in the span-length branch then merged.
2013-09-13 22:16:42 +01:00
maria nadejde
bf5c32df6c stuff that probably doesn't work 2013-09-13 19:43:04 +02:00
Matthias Huck
643fa18805 Merge branch 'GHKMStruct' of github.com:moses-smt/mosesdecoder into GHKMStruct 2013-09-13 17:13:20 +02:00
Matthias Huck
c39bed60c0 Tree fragments in GHKM glue rules;
output of LHS tag in tree fragments for UNKs;
GHKMParse info is now denoted as Tree info
2013-09-13 17:10:21 +02:00
maria nadejde
fad57a60a7 comment for Equal implementation 2013-09-13 16:13:36 +02:00
maria nadejde
5615a11766 sparse feature weight file 2013-09-13 16:06:48 +02:00
maria nadejde
bff123635e added Dense and Sparse feature to scorer 2013-09-13 12:45:46 +02:00
maria nadejde
43a9323d0f add feature files 2013-09-12 18:46:40 +02:00
maria nadejde
67b873b67d mock feature 2013-09-12 18:40:08 +02:00
Matthias Huck
96d14555fc GHKM tree output during extraction: modified extract-ghkm and score tools 2013-09-11 16:46:37 +02:00
Matthias Huck
004c44faf1 prototype GHKM tree output from extract-ghkm (still flawed) 2013-09-10 15:41:26 +02:00
Rico Sennrich
b421f7c9b0 refactoring to minimize overhead from flexibility score code (if off) 2013-09-07 23:04:40 +02:00
Rico Sennrich
7138056b8f flexibility scores 2013-09-07 23:04:01 +02:00
Nicola Bertoldi
614d7a0376 beautify 2013-08-11 23:43:26 +02:00
Hieu Hoang
77872f7521 beautify 2013-07-30 15:04:37 +01:00
Hieu Hoang
9cdcf713a6 phrase penalty now has its own ff. No longer in the phrase table 2013-07-29 12:55:44 +01:00
Hieu Hoang
9e8402dedd add placeholder support to extract 2013-07-26 15:46:15 +01:00
Hieu Hoang
e3917f911b add placeholder support to extract 2013-07-26 15:44:29 +01:00
Hieu Hoang
2ba7a372e8 add placeholder support to extract 2013-07-26 14:12:27 +01:00
Hieu Hoang
4fde5f7ea2 eclipse file for extract-rules 2013-07-26 12:27:55 +01:00
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
Hieu Hoang
310b26f989 beautify 2013-07-08 20:52:14 +01:00
Hieu Hoang
3eba5782c2 beautify 2013-07-08 20:25:47 +01:00
Hieu Hoang
dc33fa3d3d redo parsing of feature function parameters 2013-06-20 12:50:41 +01:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
phikoehn
41da5b2760 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-12 08:16:22 +01:00
Rico Sennrich
ce5311c076 fix undefined behaviour in rule extract (thanks Barry) 2013-05-07 10:50:19 +02:00
Rico Sennrich
a52f0a8c4d avoid costly copy operation in extract-rules
(noticeable speed-up with large number of non-terminals:
2x speed-up in benchmark with target syntax and --MaxNonTerm 5)
2013-05-03 10:48:14 +02:00
phikoehn
d19a28ae21 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-01 19:22:00 +01:00
phikoehn
cd8915647b support for Chris Dyer's fast-align; bug fix with sparse word translations feature; threshold pruning in filter 2013-05-01 19:20:05 +01:00
Rico Sennrich
4e87a012d0 fix two bugs with relax-parse:
- size of sentence was not calculated correctly
  (instead, number of positions at which a subtree starts was used)
- code entered an infinite loop sometimes; added break condition
2013-04-25 17:27:50 +02:00
phikoehn
5ba153806b fixed kneserNey phrase probability smoothing bug reported by Česlav Przywara <ceslav@przywara.cz> 2013-03-13 17:52:24 +00:00
Barry Haddow
5f1be3217b bugfix format of extract file for instance weighting 2013-03-07 21:40:43 +00:00
Rico Sennrich
e3ea93acb7 speed up rule extraction by factor 2 (by rewriting rule consolidation to have linear instead of quadratic complexity) 2013-02-06 13:10:38 +01:00
Barry Haddow
8db90fd2ac instance weighting for lex reordering 2013-01-10 19:46:19 +00:00
Barry Haddow
2e8bad22e4 lex reordering scoring uses FilePiece/StringPiece 2013-01-09 17:38:48 +00:00
Barry Haddow
861792bfc5 extract can read an instance weights file.
Still have to parallelise.
2012-12-21 15:39:25 +00:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a extract-ghkm: tweak label collection for unknown words
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform 2012-12-17 19:02:30 +00:00
Hieu Hoang
d0cf8f47db order of lexical probability has flipped 2012-11-22 17:37:36 +00:00
Barry Haddow
a90e1861c0 Alignments on by default for phrase-based 2012-11-15 12:35:43 +00:00
Hieu Hoang
f96b33de83 only include moses root when compiling 2012-11-14 13:43:04 +00:00
Hieu Hoang
5e3ef23cef move moses/src/* to moses/ 2012-11-12 19:56:18 +00:00
Kenneth Heafield
7d692496c3 More little jamfile changes 2012-11-12 16:57:56 +00:00
Kenneth Heafield
d74b784ad2 And pcfg-common too... 2012-11-12 16:53:42 +00:00
Kenneth Heafield
ddd3cc1d8a Fix extract-ghkm compilation 2012-11-12 16:50:46 +00:00
Kenneth Heafield
62d37fa2b6 Refactor phrase-extract/Jamfile 2012-11-12 14:17:48 +00:00
Hieu Hoang
e75522b602 rename functions 2012-11-09 18:55:01 -05:00
Hieu Hoang
db29cc50d0 xcode 2012-11-09 18:13:20 -05:00
Kenneth Heafield
cd00219fa4 Missing dependency 2012-11-04 16:26:47 -05:00
Barry Haddow
62fa6d6f28 Feature function interface for use in scoring 2012-11-02 23:30:51 +00:00
Barry Haddow
d1d5fe4036 Remove -SentenceId (since we have -IncludeSentenceId now) 2012-10-22 22:03:43 +01:00
Barry Haddow
848aafb644 Merge remote branch 'github/master' into miramerge
Conflicts:
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartHypothesis.cpp
	moses/src/ChartTrellisNode.cpp
	moses/src/LM/Implementation.cpp
	moses/src/LM/Ken.cpp
	moses/src/TargetPhrase.cpp
	moses/src/TargetPhrase.h
2012-10-08 17:54:59 +01:00
Phil Williams
0851a4d113 extract-ghkm: add --SentenceOffset option
This should behave the same as the --SentenceOffset option for
extract-rules.  The extract-parallel.perl script expects the rule
extractor to have this option.
2012-10-03 20:04:09 +01:00
Barry Haddow
0a950ee9f4 Merge remote branch 'github/master' into miramerge
Compiles, but not tested. Had to disable relent filter. Strangely, it seems to contain the
whole of moses-cmd.

Conflicts:
	Jamroot
	OnDiskPt/TargetPhrase.cpp
	moses-cmd/src/Main.cpp
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartTranslationOptionCollection.cpp
	moses/src/ChartTranslationOptionCollection.h
	moses/src/GenerationDictionary.cpp
	moses/src/Jamfile
	moses/src/Parameter.cpp
	moses/src/PhraseDictionary.cpp
	moses/src/StaticData.cpp
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/TranslationSystem.cpp
	moses/src/TranslationSystem.h
	moses/src/Word.cpp
	phrase-extract/score.cpp
	regression-testing/Jamfile
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-09-26 22:49:33 +01:00
phikoehn
28e8832a15 bug fix domain features 2012-09-25 01:22:09 +01:00
Kenneth Heafield
0cddf8a58b Fix compilation without threads 2012-09-21 23:11:59 +01:00
Eva Hasler
21938e4d94 initialize correct variable (includeSentenceIdFlag) 2012-09-12 20:02:57 +01:00
phikoehn
5d9859ba0e merge issues 2012-09-03 07:27:41 +01:00
phikoehn
e072a7f9a7 merge issues 2012-09-03 07:24:07 +01:00
phikoehn
0e783dc529 bug fix to enable pruned search graph output by default 2012-09-03 07:23:32 +01:00
phikoehn
d99f97297f merges 2012-09-03 07:21:47 +01:00
Hieu Hoang
c639cdbb38 binary hiero reordering feature. Integrated into train-model.perl and experiment.perl. In the 2nd to last position in phrase table, just in front of 2.718 2012-08-28 17:01:08 +01:00
Hieu Hoang
33c03edfbb binary hiero reordering feature. Implementation of 1 described in NIST 2012. 1 if non-term is reordered wrt other words or non-terms. 0 otherwise 2012-08-25 00:47:57 +01:00
Hieu Hoang
fa56d7861f word alignment info for hiero grammar 2012-08-24 19:11:35 +01:00
Hieu Hoang
69fc00faf9 singleton feature in phrase table. Like similar feature in Adam's suffix array, as implemented in cdec 2012-08-24 00:54:05 +01:00
Hieu Hoang
5dbb0e66ce option to produce rules that have boundary <s> & </s> words. Like Chris Dyer's extraction 2012-08-23 19:40:09 +01:00
phikoehn
4a1a995878 a lot of changes 2012-08-18 23:48:26 +01:00
phikoehn
366ab93f8a a lot of changes 2012-08-18 23:47:05 +01:00
Hieu Hoang
aaa2432851 get rid of threading 2012-07-31 23:34:53 +01:00
Hieu Hoang
c778113ba7 get rid of threading 2012-07-31 22:03:11 +01:00
Hieu Hoang
dd4cf4523b get rid of threading 2012-07-31 21:49:38 +01:00
Hieu Hoang
3b65a8c626 get rid of threading 2012-07-31 19:40:43 +01:00
Hieu Hoang
3302301c6d compile error in statistics program 2012-07-31 02:32:58 +01:00
Hieu Hoang
a1ab8e354a cleanup of variables. Need to delete temporary files 2012-07-31 02:21:48 +01:00
Hieu Hoang
7ae76dfe75 multi-threaded extract program. Thanks to Rohit Gupta 2012-07-18 12:46:59 +01:00
Barry Haddow
2b4e61d826 Merge branch 'trunk' into miramerge
Compiles, not tested.

Conflicts:
	Jamroot
	OnDiskPt/PhraseNode.h
	OnDiskPt/TargetPhrase.cpp
	OnDiskPt/TargetPhrase.h
	OnDiskPt/TargetPhraseCollection.cpp
	mert/BleuScorer.cpp
	mert/Data.cpp
	mert/FeatureData.cpp
	moses-chart-cmd/src/Main.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartManager.cpp
	moses/src/LM/Ken.cpp
	moses/src/LM/Ken.h
	moses/src/LMList.h
	moses/src/LexicalReordering.h
	moses/src/PhraseDictionaryTree.h
	moses/src/ScoreIndexManager.h
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/Word.cpp
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-07-17 13:36:50 +01:00
Hieu Hoang
15b95cd042 use consistent alignment info for lexical probabilities for both forward and inverse scoring 2012-07-12 16:45:43 +01:00
Eva Hasler
027a20730e merge Jamfiles 2012-07-04 11:49:07 +01:00
phikoehn
ff79f9f054 fix conflict
Merge branch 'master' of git://github.com/moses-smt/mosesdecoder

Conflicts:
	scripts/ems/experiment.perl
2012-07-03 00:05:13 +01:00
phikoehn
ce65a47f0d count bin feature 2012-07-03 00:00:21 +01:00
Hieu Hoang
75e038f4cf create namespace for all classes 2012-07-02 17:05:11 +01:00
Hieu Hoang
121e258e84 namespace all classes in mert directory 2012-06-30 21:39:10 +01:00
Hieu Hoang
1cf1c2e515 create namespace for all classes in phrase-extract 2012-06-30 16:56:53 +01:00
Hieu Hoang
ef9db932aa add namespace to phrase-extract 2012-06-30 15:43:47 +01:00
Hieu Hoang
a5ca652a76 move c++ code out of /script/ to / 2012-05-31 17:58:10 +01:00
Hieu Hoang
4eef94b121 move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00