Commit Graph

640 Commits

Author SHA1 Message Date
Nicola Bertoldi
4b072f2097 merge master into this branch 2014-01-17 14:04:15 +01:00
jiejiang
5f1217d793 merged upstream with origin for mingw 2014-01-15 18:16:56 +00:00
Nicola Bertoldi
e452a13062 beautify 2014-01-15 16:49:57 +01:00
jiejiang
744376b3fb moses windows build, with some TODO list 2013-12-18 20:15:39 +00:00
Hieu Hoang
cedc815c88 return default even if compiling with non-gcc 2013-11-25 16:38:19 +00:00
Hieu Hoang
905ab6de82 stop warning about incomplete data type 2013-11-25 16:24:51 +00:00
Hieu Hoang
d9be81596e replace CHECK with UTIL_THROW_IF in mert 2013-11-18 18:13:10 +00:00
Hieu Hoang
d6d0877ea3 add comment for future work on making fdstream more portable /Jeroen Vermeulen 2013-11-15 11:56:00 +00:00
Hieu Hoang
17887a2796 replace nth_element() with macro that executes sort() instead for gcc 4.8.1 & 4.8.2 2013-11-15 10:55:38 +00:00
Hieu Hoang
3d37a8ffda more compile errors, with clang 2013-11-14 19:30:45 +00:00
Hieu Hoang
862e1ad4ae more gcc compile errors 2013-11-14 19:15:53 +00:00
Hieu Hoang
dfbad35abd minor compile errors on gcc now 2013-11-14 18:59:36 +00:00
Hieu Hoang
0b1bb6a443 mert compiles under Mac OSX Mavericks. #ifdef PreProcessFilter.cpp and accomplices 2013-11-14 17:58:59 +00:00
Hieu Hoang
f35750bc08 beautify 2013-07-04 20:19:51 +01:00
Hieu Hoang
b10159a29f mac compile 2013-07-03 20:19:45 +01:00
Sara Stymne
b2eb42ed12 added document level Bleu scoring to mert 2013-07-03 14:03:58 +02:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
Hieu Hoang
b9f54b195a implement GenerationDictionary.Load() 2013-06-05 13:42:56 +01:00
Matous Machacek
3055be8837 fixed two bugs in CderScorer.cpp 2013-06-03 16:59:13 +02:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
phikoehn
4cdffc8a89 fixes for sparse feature handling 2013-05-17 08:37:29 +01:00
Hieu Hoang
fd4e954322 merge 2013-03-24 09:57:36 +00:00
Matous Machacek
7b9c5c1194 fixed bug in InterpolatedScorer 2013-03-19 23:08:28 +01:00
Hieu Hoang
5c10a2889e Merge github.com:moses-smt/mosesdecoder into weight-new 2013-02-20 17:02:34 +00:00
Barry Haddow
9ca364fb22 Implement brevity penalty smoothing for PRO
As in Nakov et al (Coling 2012)
2013-02-18 11:11:20 +00:00
hieu
01243e415a Merge branch 'master' into weight-new 2013-01-03 17:17:35 +00:00
hieu
1dfbe1113c delete MergeScorer in mert/ 2013-01-03 15:01:30 +00:00
Hieu Hoang
4f0d3c2032 Merge branch 'master' into weight-new 2012-12-20 20:27:50 +00:00
Tetsuo Kiso
ce1b650b53 Fix memory leak.
The object was allocated with new, but it was not deleted.
This may not be a serious problem because the program mostly runs
a short time. However, it is not a good practice.
2012-12-21 03:06:41 +09:00
Hieu Hoang
bc615bdac8 Merge branch 'master' into weight-new 2012-12-18 12:46:00 +00:00
hieu
05045d574c don't display unknown weight penalty when showing weight, don't usually tune. Also, change delimiter in mert extractor from : to = 2012-12-16 18:29:53 +00:00
Ales Tamchyna
598d65bcfd adding a simple command-line utility for computing sentence-level BLEU (+1) 2012-12-10 13:12:34 +01:00
Tetsuo Kiso
2a3c9fc679 Further optimization for extractor.
Fixes inefficient updating N-gram counts.

NOTE: Using '--binary' option (this option is not enabled by default yet)
for saving outputs would lead to significant speed up.
2012-12-07 08:45:47 +09:00
Tetsuo Kiso
8fdec9bf30 Use boost::unordered_map instead of std::map.
For storing the word vocabulary used in computation of
BLEU scores. This change will reduce the running time
of extractor by about 2-3 seconds (9% reduction).
2012-12-07 05:12:24 +09:00
Tetsuo Kiso
6c04c4ad9c Add more tests to the Data class. 2012-12-07 02:46:59 +09:00
Tetsuo Kiso
c7f6e38326 Use FilePiece to load N-best lists.
Since FilePiece is friendly with StringPiece.
2012-12-07 02:39:02 +09:00
Tetsuo Kiso
38e145e556 Use util::TokenIter to tokenize n-best lists.
Reduce creating std::string objects, too. In both ScoreArray
and FeatureArray classes, the private members to track sentence
indices (namely, "m_index") were unnecessarily declared as
std::string, but it's better to directly declare them as 'int'.
2012-12-07 01:39:22 +09:00
Tetsuo Kiso
cd3fb3b831 Untabify. 2012-12-06 23:46:22 +09:00
Tetsuo Kiso
ac045a11c1 Speed up N-gram counts when running extractor.
By replacing std::map with boost::unordered_map.

Runtime of extractor on 100-best lists of 2679 sentences:

Before:
real    0m35.314s
user    0m34.030s
sys     0m1.280s

After:
real    0m26.729s
user    0m25.420s
sys     0m1.310s
2012-12-06 22:08:33 +09:00
Hieu Hoang
f96b33de83 only include moses root when compiling 2012-11-14 13:43:04 +00:00
Hieu Hoang
f8438f80cc move moses/src/* to moses/ 2012-11-12 20:30:39 +00:00
Hieu Hoang
5e3ef23cef move moses/src/* to moses/ 2012-11-12 19:56:18 +00:00
Kenneth Heafield
e9eb7dd021 More shared build fixes 2012-11-07 23:28:42 +01:00
Kenneth Heafield
d7ecd0be1a Remove bleu_lib target. 2012-11-07 23:21:59 +01:00
Tetsuo Kiso
96f7b42eb9 Move implementation details from the header to .cpp file.
Also add const to variables that we don't want to change.
2012-11-05 01:24:16 +09:00
Tetsuo Kiso
cccfb9a0c9 Using namespace std in a header file pollutes the global namespace.
Using directives should be put into the implementation files.
2012-11-05 00:43:36 +09:00
Kenneth Heafield
dc31857bbc Reduce header pollution 2012-10-30 20:25:05 +00:00
Barry Haddow
848aafb644 Merge remote branch 'github/master' into miramerge
Conflicts:
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartHypothesis.cpp
	moses/src/ChartTrellisNode.cpp
	moses/src/LM/Implementation.cpp
	moses/src/LM/Ken.cpp
	moses/src/TargetPhrase.cpp
	moses/src/TargetPhrase.h
2012-10-08 17:54:59 +01:00
Arianna Bisazza
c78d571561 fixed bug in permutation scores due to scientific notation 2012-10-08 16:15:59 +01:00
Barry Haddow
0a950ee9f4 Merge remote branch 'github/master' into miramerge
Compiles, but not tested. Had to disable relent filter. Strangely, it seems to contain the
whole of moses-cmd.

Conflicts:
	Jamroot
	OnDiskPt/TargetPhrase.cpp
	moses-cmd/src/Main.cpp
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartTranslationOptionCollection.cpp
	moses/src/ChartTranslationOptionCollection.h
	moses/src/GenerationDictionary.cpp
	moses/src/Jamfile
	moses/src/Parameter.cpp
	moses/src/PhraseDictionary.cpp
	moses/src/StaticData.cpp
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/TranslationSystem.cpp
	moses/src/TranslationSystem.h
	moses/src/Word.cpp
	phrase-extract/score.cpp
	regression-testing/Jamfile
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-09-26 22:49:33 +01:00
Arianna Bisazza
ff276e9911 Fixed several bugs in LRscore-MERT. Namely, solved a float-to-int conversion; added hypothesis counter to the scores file to enable later computation of average reordering score; fixed special case of 1-word hypothesis; enabled reading of word-based alignments from n-best-list. 2012-09-24 15:40:18 +02:00
Colin Cherry
ae6ac1c2ae Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2012-09-14 14:07:53 -04:00
Colin Cherry
3fa95c022b Added a "--safe-hope" option to kbmira.
This will limit the influence of model score on oracle (hope) selection.
Good for cases with extremely large feature values. May make it the default.
2012-09-14 13:58:28 -04:00
Barry Haddow
021f5702a7 remove obsolete file 2012-09-13 22:21:31 +01:00
Barry Haddow
2b4e61d826 Merge branch 'trunk' into miramerge
Compiles, not tested.

Conflicts:
	Jamroot
	OnDiskPt/PhraseNode.h
	OnDiskPt/TargetPhrase.cpp
	OnDiskPt/TargetPhrase.h
	OnDiskPt/TargetPhraseCollection.cpp
	mert/BleuScorer.cpp
	mert/Data.cpp
	mert/FeatureData.cpp
	moses-chart-cmd/src/Main.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartManager.cpp
	moses/src/LM/Ken.cpp
	moses/src/LM/Ken.h
	moses/src/LMList.h
	moses/src/LexicalReordering.h
	moses/src/PhraseDictionaryTree.h
	moses/src/ScoreIndexManager.h
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/Word.cpp
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-07-17 13:36:50 +01:00
bhaddow
d0f1c15105 enable single character option 2012-07-12 19:47:57 +01:00
Barry Haddow
c303142ab2 option to skip duplicate removal 2012-07-12 19:08:55 +01:00
Colin Cherry
662e7e7f64 As requested by my bosses: added NRC copyright to kbmira. 2012-07-10 13:13:50 -04:00
Hieu Hoang
7d664b745e Integrate Lexi's LR Score into tuning 2012-07-10 09:25:00 +01:00
Eva Hasler
027a20730e merge Jamfiles 2012-07-04 11:49:07 +01:00
Hieu Hoang
75e038f4cf create namespace for all classes 2012-07-02 17:05:11 +01:00
Hieu Hoang
b5aa04feb7 compile error 2012-07-02 10:23:26 +01:00
Hieu Hoang
121e258e84 namespace all classes in mert directory 2012-06-30 21:39:10 +01:00
Hieu Hoang
e3dd3a8d2c namespace all classes in mert directory 2012-06-30 20:23:45 +01:00
Colin Cherry
65df386581 Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2012-06-26 17:07:27 -04:00
Colin Cherry
58c3280c2c HypPackEnumerator now stores MiraFeatureVectors, as opposed to
FeatureDataItems. Uses roughly half the memory.
2012-06-26 17:02:32 -04:00
Hieu Hoang
3c7b7ac9f5 rollback 2012-06-26 16:31:38 -04:00
Colin Cherry
32299593fa Added debugging info to kbmira. 2012-06-26 16:29:20 -04:00
Hieu Hoang
153e80053c lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:50 -04:00
Hieu Hoang
00f018a477 Merge https://github.com/moses-smt/mosesdecoder into lrscore 2012-06-25 16:57:17 -04:00
Hieu Hoang
2a03f275a3 change regression data download to git instead of download from edin server.
Minor change in mert/trimStr() function to prevent warning
2012-06-25 16:03:11 -04:00
Hieu Hoang
8498b17a41 gcc version-specific error 2012-06-25 14:45:45 +01:00
Hieu Hoang
0fd0adc1f6 merge Lexi Birch's LRScore from mert_mtm5 branch. Compiles and run. Hack, must double check with barry or lexi 2012-06-23 22:58:18 -04:00
Hieu Hoang
0cb63edcb9 merge Lexi Birch's LRScore from mert_mtm5 branch. Compiles and run. Hack, must double check with barry or lexi 2012-06-23 22:51:48 -04:00
Hieu Hoang
f48c348508 typo 2012-06-22 22:23:14 -04:00
Hieu Hoang
b1ca36387f mert now compiles with PermScorer. However, didn't implement score() - assert(false). Update Jamfile 2012-06-22 21:07:05 -04:00
Hieu Hoang
7d19fe13ae merge Lexi Birch's LRScore from mert_mtm5 branch 2012-06-22 18:19:16 +01:00
Colin Cherry
07a5c67ebc Merge branch 'master' into miramerge
Conflicts:
	Jamroot
	misc/queryPhraseTable.cpp
	scripts/training/train-model.perl
2012-06-14 17:08:16 -04:00
Colin Cherry
5932800489 Spurious space disagreed with master 2012-06-14 14:15:06 -04:00
Colin Cherry
a8a5f896db Fixed some bugs in BatchMira's sparse feature handling. 2012-06-14 14:09:06 -04:00
Colin Cherry
a901fc9f50 Fixed some bugs in BatchMira's sparse feature handling. 2012-06-14 13:41:47 -04:00
Tetsuo Kiso
1dbd8e5ec5 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2012-06-09 19:33:31 +09:00
Tetsuo Kiso
2599ef6dc3 Bug fix: kbmira failed to load dense weights. 2012-06-09 18:03:12 +09:00
Eva Hasler
e1c1a5343c merge 2012-06-07 11:16:52 +01:00
Eva Hasler
6a6a35c65e fix start weights in experiment.perl, add hypothesis queue for picking hope and fear translations, add variations to 1slack formulation 2012-06-01 01:49:42 +01:00
Tetsuo Kiso
713ff8c5e2 Delete mert/init.opt.
It looks like the file was no longer used.
2012-06-01 02:25:25 +09:00
Hieu Hoang
465c5cbf97 move all executables into bin 2012-05-31 12:55:05 +01:00
Colin Cherry
3c44d04baf Merge branch 'master' into miramerge
Conflicts:
	Jamroot
	mert/FeatureStats.cpp
	moses-cmd/src/IOWrapper.h
	scripts/training/mert-moses.pl
	scripts/training/train-model.perl.missing_bin_dir
2012-05-30 12:39:53 -04:00
Tetsuo Kiso
beb2256dba Move 'using namespace std' out from .h.
Add "std" to size_t, too.
2012-05-30 23:11:09 +09:00
Tetsuo Kiso
01eb60f350 Add "virtual" destructor to the HypPackEnumerator class. 2012-05-30 22:59:23 +09:00
Hieu Hoang
d25805858d xcode build supports threads. move 'using namespace' out from .h file to stop namespace pollution 2012-05-30 13:04:02 +01:00
Hieu Hoang
45870348ff xcode build supports threads. move 'using namespace' out from .h file to stop namespace pollution 2012-05-30 12:47:20 +01:00
Colin Cherry
fd577d7a65 Batch k-best MIRA is written and integrated into mert-moses.pl
Regression tests all check out, and kbmira seems to work fine
on a Hansard French->English task.

HypPackEnumerator class may be of interest to pro.cpp and future
optimizers, as it abstracts a lot of the boilerplate involved in
enumerating multiple k-best lists.

MiraWeightVector is not really mira-specific - just a weight vector
that enables efficient averaging. Could be useful to a perceptron
as well. Same goes for MiraFeatureVector.

Interaction with sparse features is written, but untested.
2012-05-29 13:38:57 -04:00
Barry Haddow
c397d2068b Merge branch 'trunk' into miramerge. Still to fix build.
Conflicts:
	Jamroot
	mert/Data.cpp
	mert/Data.h
	mert/FeatureArray.cpp
	mert/FeatureArray.h
	mert/FeatureData.cpp
	mert/FeatureData.h
	mert/FeatureStats.cpp
	mert/FeatureStats.h
	mert/mert.cpp
	moses-chart-cmd/src/IOWrapper.h
	moses-chart-cmd/src/Main.cpp
	moses-cmd/src/IOWrapper.cpp
	moses-cmd/src/IOWrapper.h
	moses-cmd/src/Main.cpp
	moses/src/GlobalLexicalModel.cpp
	moses/src/Jamfile
	moses/src/Parameter.cpp
	moses/src/PhraseDictionary.cpp
	moses/src/ScoreIndexManager.h
	moses/src/TargetPhrase.h
	regression-testing/tests/phrase.lexicalized-reordering-bin/truth/results.txt
	regression-testing/tests/phrase.lexicalized-reordering-cn/truth/results.txt
	regression-testing/tests/phrase.lexicalized-reordering/truth/results.txt
	regression-testing/tests/phrase.multiple-translation-system-lr/truth/results.txt
	regression-testing/tests/phrase.show-weights.lex-reorder/truth/results.txt
	regression-testing/tests/phrase.show-weights/truth/results.txt
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/filter-model-given-input.pl
	scripts/training/mert-moses.pl
2012-05-24 21:11:35 +01:00
Matous Machacek
a77cca4f86 Fixed CderScorer name bug 2012-05-15 00:35:08 +02:00
Matous Machacek
7da028e240 Fixed CderScorer name bug 2012-05-15 00:35:08 +02:00
Matous Machacek
3943112eb3 Fixed bug in SemposScorer.cpp 2012-05-13 11:11:13 +02:00
Matous Machacek
7a0c42b1bb Fixed bug in SemposScorer.cpp 2012-05-13 11:11:13 +02:00
Matous Machacek
97f82a3e4d Fixed interpolated scorer 2012-05-12 16:11:33 +02:00
Matous Machacek
8343a469e0 Fixed interpolated scorer 2012-05-12 16:11:33 +02:00