70 Commits

Author SHA1 Message Date
Phil Williams
f0b603e6b5 extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Phil Williams
b5584fdecf extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0.  This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.

TODO Better solutions might be:
 1. modify extract-parallel so that it only configures one instance of
    extract-ghkm to write the glue / unknown-lhs files (like the current
    workaround, this assumes file chunks are representative of the whole)
 2. add multithreading support directly to extract-ghkm
 3. write distinct output files for each extract-ghkm instance and
    combine them on completion
2013-07-23 14:55:16 +01:00
Hieu Hoang
310b26f989 beautify 2013-07-08 20:52:14 +01:00
Hieu Hoang
3eba5782c2 beautify 2013-07-08 20:25:47 +01:00
Hieu Hoang
dc33fa3d3d redo parsing of feature function parameters 2013-06-20 12:50:41 +01:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
phikoehn
41da5b2760 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-12 08:16:22 +01:00
Rico Sennrich
ce5311c076 fix undefined behaviour in rule extract (thanks Barry) 2013-05-07 10:50:19 +02:00
Rico Sennrich
a52f0a8c4d avoid costly copy operation in extract-rules
(noticeable speed-up with large number of non-terminals:
2x speed-up in benchmark with target syntax and --MaxNonTerm 5)
2013-05-03 10:48:14 +02:00
phikoehn
d19a28ae21 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-01 19:22:00 +01:00
phikoehn
cd8915647b support for Chris Dyer's fast-align; bug fix for sparse word translations feature; threshold pruning in filter 2013-05-01 19:20:05 +01:00
Rico Sennrich
4e87a012d0 fix two bugs with relax-parse:
- size of sentence was not calculated correctly
  (instead, number of positions at which a subtree starts was used)
- code sometimes entered an infinite loop; added break condition
2013-04-25 17:27:50 +02:00
phikoehn
5ba153806b fixed Kneser-Ney phrase probability smoothing bug reported by Česlav Przywara <ceslav@przywara.cz> 2013-03-13 17:52:24 +00:00
Barry Haddow
5f1be3217b bugfix: format of extract file for instance weighting 2013-03-07 21:40:43 +00:00
Rico Sennrich
e3ea93acb7 speed up rule extraction by a factor of 2 (by rewriting rule consolidation to have linear instead of quadratic complexity) 2013-02-06 13:10:38 +01:00
Barry Haddow
8db90fd2ac instance weighting for lex reordering 2013-01-10 19:46:19 +00:00
Barry Haddow
2e8bad22e4 lex reordering scoring uses FilePiece/StringPiece 2013-01-09 17:38:48 +00:00
Barry Haddow
861792bfc5 extract can read an instance weights file.
Still have to parallelise.
2012-12-21 15:39:25 +00:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Phil Williams
0ca5b8932a extract-ghkm: tweak label collection for unknown words
Produce a better label set when unary rule elimination is enabled.
2012-12-17 19:43:42 +00:00
Phil Williams
fb8d20a22f extract-ghkm: --UnknownWordMinRelFreq, --UnknownWordUniform 2012-12-17 19:02:30 +00:00
Hieu Hoang
d0cf8f47db order of lexical probability has flipped 2012-11-22 17:37:36 +00:00
Barry Haddow
a90e1861c0 Alignments on by default for phrase-based 2012-11-15 12:35:43 +00:00
Hieu Hoang
f96b33de83 only include moses root when compiling 2012-11-14 13:43:04 +00:00
Hieu Hoang
5e3ef23cef move moses/src/* to moses/ 2012-11-12 19:56:18 +00:00
Kenneth Heafield
7d692496c3 More little jamfile changes 2012-11-12 16:57:56 +00:00
Kenneth Heafield
d74b784ad2 And pcfg-common too... 2012-11-12 16:53:42 +00:00
Kenneth Heafield
ddd3cc1d8a Fix extract-ghkm compilation 2012-11-12 16:50:46 +00:00
Kenneth Heafield
62d37fa2b6 Refactor phrase-extract/Jamfile 2012-11-12 14:17:48 +00:00
Hieu Hoang
e75522b602 rename functions 2012-11-09 18:55:01 -05:00
Hieu Hoang
db29cc50d0 xcode 2012-11-09 18:13:20 -05:00
Kenneth Heafield
cd00219fa4 Missing dependency 2012-11-04 16:26:47 -05:00
Barry Haddow
62fa6d6f28 Feature function interface for use in scoring 2012-11-02 23:30:51 +00:00
Barry Haddow
d1d5fe4036 Remove -SentenceId (since we have -IncludeSentenceId now) 2012-10-22 22:03:43 +01:00
Barry Haddow
848aafb644 Merge remote branch 'github/master' into miramerge
Conflicts:
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartHypothesis.cpp
	moses/src/ChartTrellisNode.cpp
	moses/src/LM/Implementation.cpp
	moses/src/LM/Ken.cpp
	moses/src/TargetPhrase.cpp
	moses/src/TargetPhrase.h
2012-10-08 17:54:59 +01:00
Phil Williams
0851a4d113 extract-ghkm: add --SentenceOffset option
This should behave the same as the --SentenceOffset option for
extract-rules.  The extract-parallel.perl script expects the rule
extractor to have this option.
2012-10-03 20:04:09 +01:00
Barry Haddow
0a950ee9f4 Merge remote branch 'github/master' into miramerge
Compiles, but not tested. Had to disable relent filter. Strangely, it seems to contain the
whole of moses-cmd.

Conflicts:
	Jamroot
	OnDiskPt/TargetPhrase.cpp
	moses-cmd/src/Main.cpp
	moses/src/AlignmentInfo.cpp
	moses/src/AlignmentInfo.h
	moses/src/ChartTranslationOptionCollection.cpp
	moses/src/ChartTranslationOptionCollection.h
	moses/src/GenerationDictionary.cpp
	moses/src/Jamfile
	moses/src/Parameter.cpp
	moses/src/PhraseDictionary.cpp
	moses/src/StaticData.cpp
	moses/src/StaticData.h
	moses/src/TargetPhrase.h
	moses/src/TranslationSystem.cpp
	moses/src/TranslationSystem.h
	moses/src/Word.cpp
	phrase-extract/score.cpp
	regression-testing/Jamfile
	scripts/ems/experiment.meta
	scripts/ems/experiment.perl
	scripts/training/train-model.perl
2012-09-26 22:49:33 +01:00
phikoehn
28e8832a15 bug fix domain features 2012-09-25 01:22:09 +01:00
Kenneth Heafield
0cddf8a58b Fix compilation without threads 2012-09-21 23:11:59 +01:00
Eva Hasler
21938e4d94 initialize correct variable (includeSentenceIdFlag) 2012-09-12 20:02:57 +01:00
phikoehn
5d9859ba0e merge issues 2012-09-03 07:27:41 +01:00
phikoehn
e072a7f9a7 merge issues 2012-09-03 07:24:07 +01:00
phikoehn
0e783dc529 bug fix to enable pruned search graph output by default 2012-09-03 07:23:32 +01:00
phikoehn
d99f97297f merges 2012-09-03 07:21:47 +01:00
Hieu Hoang
c639cdbb38 binary hiero reordering feature. Integrated into train-model.perl and experiment.perl. Occupies the second-to-last position in the phrase table, just before the 2.718 phrase penalty. 2012-08-28 17:01:08 +01:00
Hieu Hoang
33c03edfbb binary hiero reordering feature. Implementation of 1 described in NIST 2012. Value is 1 if a non-terminal is reordered with respect to other words or non-terminals, 0 otherwise. 2012-08-25 00:47:57 +01:00
Hieu Hoang
fa56d7861f word alignment info for hiero grammar 2012-08-24 19:11:35 +01:00
Hieu Hoang
69fc00faf9 singleton feature in phrase table. Like the similar feature in Adam's suffix array, as implemented in cdec. 2012-08-24 00:54:05 +01:00
Hieu Hoang
5dbb0e66ce option to produce rules that have boundary <s> & </s> words. Like Chris Dyer's extraction 2012-08-23 19:40:09 +01:00