Commit Graph

7 Commits

Author SHA1 Message Date
heafield
1e05ab182e Allow broken IRST ARPAs to still build but be passive-aggressive about it.
Slight update to SRI wrapper that nobody uses anyway.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3980 1f5c12ca-751b-0410-a591-d2e778427230
2011-05-17 16:43:05 +00:00
heafield
4674d9bcfd Change default <unk> to -100.0. Fix some exception printing.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3933 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-21 14:40:21 +00:00
heafield
7b5d0234c6 More error handling:
Throwing an exception for <s> and </s> is optional, but is the default.
If a binary file makes it to the ARPA parser (somebody gzipped a binary file or passed it to build_binary), the message is more informative.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3905 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-24 19:37:39 +00:00
heafield
6f63bb4161 Prevent people from loading partially built binary files. Partially built files made with the old build_binary will still load, but any partial files made with the new build_binary (this revision) will throw an error on load.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3904 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-24 17:11:53 +00:00
heafield
22ce1d2f19 kenlm update
- Fix case where "foo bar baz" appears but "bar baz" does not.  Previously probing silently returned the wrong answer and trie silently broke.  
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.  
- kenlm assumes that "foo bar" is present if "foo bar baz" is.  This is now checked.  
- Binary format version number bump because the format has changed to support the above.  
- Lower memory consumption in trie building, but it will take longer in order to ensure correct handling of blanks and aggressive recombination.
- Fix progress bar newlines on trie building.

Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 19:11:48 +00:00
heafield
9062e3b73b KenLM update: allow user to specify data structure and parameters on the command line to build_binary. Also some minor bugfixes.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3762 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-08 03:15:37 +00:00
heafield
2784923899 Rename a bunch of kenlm files. A ./regenerate-makefiles.sh is required.
Make loading with MAP_POPULATE on Linux and read on other OSes the default.
Use LM #9 for lazy loading, as recommended by other devs.  
Slightly faster trie.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3688 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-06 00:40:16 +00:00