
2062 Commits

Author SHA1 Message Date
pjwilliams
3dec57a518 When scoring phrase pairs, store copies of the active pairs' PHRASE objects
instead of inserting them into a PhraseTable.  In a test on a 21GB
target-syntax extract file, this reduced user time from 195 to 120 mins.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3777 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 23:49:57 +00:00
pjwilliams
627d8edf8e Fix bug affecting Good-Turing discounting: repeated phrase pairs were always
contributing a count of 1 because PhraseAlignment::addToCount() was looking
for counts in the fifth column, not the fourth.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3775 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 16:31:53 +00:00
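The off-by-one column bug described in 627d8edf8e can be illustrated with a small sketch. This is not the Moses code: the `|||`-separated line layout, the `parse_count` helper, and the fallback to a count of 1 when the column is missing are all assumptions made for illustration.

```python
# Hypothetical illustration of reading a count from the wrong column.
# The line layout and the default-to-1 behaviour are assumptions, not the
# actual PhraseAlignment::addToCount() implementation.

def parse_count(line, column):
    """Return the count in the given zero-based column, or 1.0 if it is absent."""
    fields = [field.strip() for field in line.split("|||")]
    if column < len(fields) and fields[column]:
        return float(fields[column])
    return 1.0  # a missing column silently falls back to a count of 1


# A phrase pair seen 7 times, with the count stored in the fourth column.
line = "das haus ||| the house ||| 0-0 1-1 ||| 7"

buggy = parse_count(line, 4)  # fifth column: does not exist, so falls back to 1
fixed = parse_count(line, 3)  # fourth column: the real count

print(buggy, fixed)
```

Under these assumptions, every repeated phrase pair would contribute 1 in the buggy case, which skews the count-of-counts statistics that Good-Turing discounting relies on.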
hieuhoang1972
41c5b3a1c2 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3769 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-10 13:04:06 +00:00
heafield
5e9df58a3c Respect -v 0
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3768 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-09 22:13:09 +00:00
heafield
915cb22b6a Make tests run on OS X. This was an issue with the test (and its use of popen), not with the code.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3767 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-09 21:58:54 +00:00
bhaddow
4174082396 Non-breaking prefixes for Dutch
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3764 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-08 16:09:24 +00:00
bhaddow
2e77dce57e improvement to error message
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3763 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-08 10:13:19 +00:00
heafield
9062e3b73b KenLM update: allow user to specify data structure and parameters on command line to
build_binary.  Also some minor bugfixes.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3762 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-08 03:15:37 +00:00
dowobeha
44b3af7cac Re-enabled --skip-decoder in mert-moses.pl
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3759 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-03 16:44:13 +00:00
rafpayen
be92193c03 fix for multiple whitespace in dictionary
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3750 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-30 11:16:07 +00:00
rafpayen
51fd4afb79 add giza dictionary option
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3749 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-30 11:05:09 +00:00
bhaddow
0e5fbcdb4a Add show-weights for moses_chart
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3745 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-29 17:05:16 +00:00
bhaddow
50f0e6c07d Add a show-weights option. It prints out the moses features and exits. May
load tables as a side-effect.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3744 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-29 16:44:28 +00:00
mphi
ddabdf6b1b added support for arbitrary encodings via the $IO_ENCODING global variable on line 23; set to UTF8 by default
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3739 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-29 09:04:44 +00:00
heafield
eabc137306 Make kenlm tests compile on more systems by adding headers.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3738 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-28 21:16:06 +00:00
heafield
0d8b62791e kenlm update.
Improved portability:
Hopefully handle big endian architectures (trie will fail at runtime if this isn't working yet).
Remove dependence on err.h.
Handle some Solaris weirdness wrt mmap and strerror.  
Clean up murmur_hash header.  

Add comparison and ZeroRemaining requested by Chris Dyer.  

More number parsing in FilePiece.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3737 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-28 02:54:56 +00:00
hieuhoang1972
71093403df use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3736 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-25 13:54:40 +00:00
hieuhoang1972
dd6c1e722e use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3729 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-23 14:30:36 +00:00
hieuhoang1972
867a9bdf4b use gzipped extract file
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3728 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-23 14:15:54 +00:00
xandfraser
c0c617a8c4 Added print word alignment in nbest
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3726 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-22 15:34:53 +00:00
bhaddow
6255216b6a Remove gnu-specific typeof
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3725 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-22 10:05:17 +00:00
hieuhoang1972
6f5d1e4732 deleting offending comment
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3724 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-21 16:35:31 +00:00
hieuhoang1972
4bc0a8e6b2 can set max num of lines for GT discount calc.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3723 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-19 20:11:10 +00:00
bhaddow
a7e0977eea Fix compile error by using correct macro.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3720 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-18 10:27:30 +00:00
chardmeier
837a667a95 Cleaned up language modelling code by disentangling the decoder's LM feature
function from the LM toolkit abstraction layer. There are two different groups
of classes now:
- LanguageModel, which inherits from StatefulFeatureFunction and contains
  the n-gram model feature function.
- LanguageModelImplementation, which is the base class of the individual
  LM implementations (SRI, IRST, RandLM, KenLM) and provides methods to
  query LM probabilities and states.
Each LanguageModel controls a LanguageModelImplementation. Implementations can
be shared by more than one LanguageModel.
This should make it easier to use the LM libraries as a backend for other
feature functions while retaining the flexibility to use different LM toolkits.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3719 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-17 14:06:21 +00:00
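The split described in 837a667a95 can be sketched roughly as follows. This is a simplified Python illustration, not the Moses C++ classes: the method names `get_value` and `score`, the toy `UniformLM` back end, and the `weight` parameter are hypothetical, and the StatefulFeatureFunction base class is omitted.

```python
# Rough sketch of the two class groups: a feature function that delegates
# to a toolkit-specific back end. All method names here are hypothetical.

class LanguageModelImplementation:
    """Base class for toolkit back ends; answers n-gram probability queries."""

    def get_value(self, ngram):
        raise NotImplementedError


class UniformLM(LanguageModelImplementation):
    """Toy back end (hypothetical): the same log-probability for every n-gram."""

    def __init__(self, logprob):
        self.logprob = logprob

    def get_value(self, ngram):
        return self.logprob


class LanguageModel:
    """Feature function that controls a (possibly shared) implementation."""

    def __init__(self, implementation, weight):
        self.implementation = implementation
        self.weight = weight

    def score(self, words, order=3):
        # Sum the back end's log-probabilities over each word and its history.
        total = 0.0
        for i in range(len(words)):
            ngram = tuple(words[max(0, i - order + 1): i + 1])
            total += self.implementation.get_value(ngram)
        return self.weight * total


# Two feature functions sharing one back end.
shared = UniformLM(-1.0)
lm_a = LanguageModel(shared, 0.5)
lm_b = LanguageModel(shared, 2.0)
```

Because the feature function only talks to the base-class interface, swapping one toolkit for another would mean passing a different implementation object, and another feature function could reuse the same loaded model.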
chardmeier
d18ff948f5 Bugfixes in srilm adaptor.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3718 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-17 13:23:44 +00:00
bojar
6616dd3f62 prettified usage string
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3714 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-16 00:26:50 +00:00
bojar
5c3a38bc2e fixed behaviour wrt weight-d, don't expect it unconditionally as moses-chart
does not use it


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3713 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-16 00:00:17 +00:00
hieuhoang1972
57e3a92836 rollback. argument not supported by all iconv
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3712 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-15 12:50:11 +00:00
leven101
84d83480b6 function name changes
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3711 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-15 11:32:02 +00:00
leven101
5251a2823a separated source and target vocab in suffixarrays to support unequal factors
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3710 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-15 11:28:27 +00:00
hieuhoang1972
ff339e56e3 don't drop unknown char. replace it with improbable string. avoid misalignment
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3709 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-14 20:50:15 +00:00
hieuhoang1972
f7904a871c add scripts to exclude unparseable sentences
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3704 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-12 14:43:52 +00:00
hieuhoang1972
687cf9bf29 add scripts to exclude unparseable sentences
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3702 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-12 14:20:11 +00:00
hieuhoang1972
a79a6bbaec add scripts to exclude unparseable sentences
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3700 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-11 18:04:16 +00:00
hieuhoang1972
f1f04daa0a add empty line if input is empty line
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3699 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 12:11:55 +00:00
bojar
2ea140062b don't warn about probs outside [0,1] in -verbose 0
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3698 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 11:51:26 +00:00
bojar
ff56054a03 removed --inputweights, read this information from link-param-count instead
added negatable --starting-weights-from-ini (defaulting to yes)
improved documentation of --activate-features


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3697 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 11:25:40 +00:00
bojar
9838de2a81 handle also gzipped ini files
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3696 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 11:21:28 +00:00
nicolabertoldi
d38b319405 workaround to force the use of the bash shell in the SGE
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3695 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 10:32:34 +00:00
heafield
82f29bfc16 Chris Dyer says this should make things compile better on OS X.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3694 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 02:05:51 +00:00
hieuhoang1972
3b6b002df8 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3691 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-09 13:25:09 +00:00
bgottesman
518035ed05 add --possiblyUseFirstToken option, which, when selected, allows certain sentence-initial tokens to be taken into account. See comment in header or support mailing list discussion.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3690 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-09 11:05:23 +00:00
hieuhoang1972
9a72825d29 mac compile
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3689 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-08 16:09:04 +00:00
heafield
2784923899 Rename a bunch of kenlm files. A ./regenerate-makefiles.sh is required.
Make loading with MAP_POPULATE on Linux and read on other OSes the default.
Use LM #9 for lazy loading, as recommended by other devs.  
Slightly faster trie.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3688 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-06 00:40:16 +00:00
leven101
34b45c0480 removed debug messages from BilingualDynSuffixArray.cpp
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3687 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-04 18:41:04 +00:00
bhaddow
3aee6fab5d Use correct conditional compilation flag for threaded moses
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3686 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-03 18:43:18 +00:00
heafield
bf88f87d78 Fix return value of FilePiece::ReadLine at end of file. Did not impact existing kenlm (since
they don't read to the end of file) but will impact future versions.  


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3682 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-29 17:53:19 +00:00
heafield
c12c2c59d2 Autodetect model from binary format.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3675 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-28 01:05:04 +00:00
hieuhoang1972
eb374bf082 cygwin build
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3674 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 20:47:28 +00:00