Re-implementation of the phrasetable filtering strategy of Johnson et al.
(2007).
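
In brief (a sketch of the test, included here for context): each phrase
pair (s,t) is scored by the p-value of Fisher's exact test on the 2x2
contingency table induced by its bitext counts. Writing C(s,t) for the
number of sentence pairs containing both s and t, C(s) and C(t) for the
marginal counts, and N for the total number of sentence pairs, the p-value
is the hypergeometric tail

  p(s,t) = \sum_{k \ge C(s,t)} \binom{C(s)}{k} \binom{N-C(s)}{C(t)-k} / \binom{N}{C(t)}

Phrase pairs whose negative log p-value falls below the chosen threshold
are discarded.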

This implementation relies on Joy Zhang's SALM Suffix Array toolkit. It is
available here:

  http://projectile.is.cs.cmu.edu/research/public/tools/salm/salm.htm

--Chris Dyer <redpony@umd.edu>

BUILD INSTRUCTIONS
---------------------------------

1. Download and build SALM.

2. make SALMDIR=/path/to/SALM
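
   For example, if SALM was unpacked and built in /opt/SALM (the path here
   is illustrative), the following builds the filter-pt binary in the
   current directory:

     make SALMDIR=/opt/SALM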


USAGE INSTRUCTIONS
---------------------------------

1. Using SALM/Bin/Linux/Index/IndexSA.O32, create suffix array indexes of
   the source and target sides of your training bitext (a worked example
   follows this list).

2. cat phrase-table.txt | ./filter-pt -e TARG.suffix -f SOURCE.suffix \
    -l <FILTER-VALUE>

   FILTER-VALUE is the negative-log p-value threshold described in
     Johnson et al. (2007). It may be 'a+e' (alpha + epsilon, a threshold
     just above the significance of the 1-1-1 phrase pairs, which are
     therefore discarded), 'a-e' (alpha - epsilon, which keeps them), or
     a positive real value.

3. Run ./filter-pt with no options to see the full list of options.
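
A complete worked sketch, assuming the training bitext is in corpus.fr and
corpus.en and the phrase table in phrase-table.txt (all filenames here are
illustrative; the indexed corpora play the roles of SOURCE.suffix and
TARG.suffix above):

  SALM/Bin/Linux/Index/IndexSA.O32 corpus.en
  SALM/Bin/Linux/Index/IndexSA.O32 corpus.fr

  cat phrase-table.txt | ./filter-pt -e corpus.en -f corpus.fr -l a+e \
    > phrase-table.filtered.txt

The filtered table is written to stdout, so redirect it to a file as
shown.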


REFERENCES
---------------------------------

H. Johnson, J. Martin, G. Foster and R. Kuhn. (2007) Improving Translation
  Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007
  Joint Conference on Empirical Methods in Natural Language Processing and
  Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.