Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder

2024-12-26 05:14:36 +03:00 · 2014-06-11 13:44:22 +01:00 · 2014-06-11 13:44:22 +01:00 · 89a9c410c9
commit 89a9c410c9
parent 45648d03b9 2f752fe833
51 changed files with 1541 additions and 291 deletions
--- a/1
+++ b/1
@ -145,6 +145,7 @@ build-projects lm util phrase-extract search moses moses/LM mert moses-cmd moses
 if [ option.get "with-mm" : : "yes" ]
 {
 alias mm :  
+  moses/TranslationModel/UG//lookup_mmsapt 
  moses/TranslationModel/UG/mm//mtt-build 
  moses/TranslationModel/UG/mm//mtt-dump 
  moses/TranslationModel/UG/mm//symal2mam 
--- a/contrib/moses-speedtest/README.md
+++ b/contrib/moses-speedtest/README.md
@ -0,0 +1,122 @@
+# Moses speedtesting framework 
+
+### Description
+
+This is an automatic test framework that is designed to test the day to day performance changes in Moses.
+
+### Set up
+
+#### Set up a Moses repo
+Set up a Moses repo and build it with the desired configuration.
+```bash
+git clone https://github.com/moses-smt/mosesdecoder.git
+cd mosesdecoder
+./bjam -j10 --with-cmph=/usr/include/
+```
+You need to build Moses first, so that the testsuite knows what command you want it to use when rebuilding against newer revisions.
+
+#### Create a parent directory.
+Create a parent directory where **runtests.py**, the related scripts and the configuration file will reside.
+This should also be the location of the TEST_DIR and TEST_LOG_DIR as explained in the next section.
+
+#### Set up a global configuration file.
+You need a configuration file for the testsuite. A sample configuration file is provided in **testsuite\_config**
+<pre>
+MOSES_REPO_PATH: /home/moses-speedtest/moses-standard/mosesdecoder
+DROP_CACHES_COMM: sys_drop_caches 3
+TEST_DIR: /home/moses-speedtest/phrase_tables/tests
+TEST_LOG_DIR: /home/moses-speedtest/phrase_tables/testlogs
+BASEBRANCH: RELEASE-2.1.1
+</pre>
+
+The _MOSES\_REPO\_PATH_ is the place where you have set up and built Moses.
+The _DROP\_CACHES\_COMM_ is the command that will be used to drop caches. It should run without needing root access.
+_TEST\_DIR_ is the directory where all the tests will reside.
+_TEST\_LOG\_DIR_ is the directory where the performance logs will be gathered. It should be created before running the testsuite for the first time.
+_BASEBRANCH_ is the branch against which all new tests will be compared. It should normally be set to be the latest Moses stable release.
+
+### Creating tests
+
+To create a test, create a new folder inside the TEST_DIR. The name of that folder is used as the name of the test.
+Inside that folder one should place a configuration file named **config**. The naming is mandatory.
+An example of such a configuration file is **test\_config**:
+
+<pre>
+Command: moses -f ... -i fff #Looks for the command in the /bin directory of the repo specified in the testsuite_config
+LDPRE: ldpreloads #Comma separated LD_LIBRARY_PATH:/, 
+Variants: vanilla, cached, ldpre #Can't have cached without ldpre or vanilla
+</pre>
+
+The _Command:_ line specifies the executable (which is looked up in the /bin directory of the repo) and any arguments it needs. Before running the test, the script changes into the test's directory, so you can use relative paths.
+The _LDPRE:_ line specifies a comma-separated list of libraries to use with LD\_PRELOAD.
+The _Variants:_ line specifies which types of tests to run. This particular line will run the following tests:
+1. A vanilla test, meaning that just the command after _Command:_ is issued.
+2. A vanilla cached test, meaning that after the vanilla test the command is run again without dropping caches, in order to benchmark performance on a cached filesystem.
+3. An LD\_PRELOAD test, in which the command is run once for each comma-separated library listed under _LDPRE:_, with that library preloaded.
+4. A cached version of each LD\_PRELOAD test.
+
+### Running tests.
+Running the tests is done through the **runtests.py** script.
+
+#### Running all tests.
+To run all tests against the base branch and the latest revision (and generate new base branch test data if it is missing), run:
+```bash
+python3 runtests.py -c testsuite_config
+```
+
+#### Running specific tests.
+The script allows the user to manually run a particular test or to test against a specific branch or revision:
+<pre>
+moses-speedtest@crom:~/phrase_tables$ python3 runtests.py --help
+usage: runtests.py [-h] -c CONFIGFILE [-s SINGLETESTDIR] [-r REVISION]
+                   [-b BRANCH]
+
+A python based speedtest suite for moses.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -c CONFIGFILE, --configfile CONFIGFILE
+                        Specify test config file
+  -s SINGLETESTDIR, --singletest SINGLETESTDIR
+                        Single test name directory. Specify directory name,
+                        not full path!
+  -r REVISION, --revision REVISION
+                        Specify a specific revision for the test.
+  -b BRANCH, --branch BRANCH
+                        Specify a branch for the test.
+</pre>
+
+### Generating HTML report.
+To generate a summary of the test results use the **html\_gen.py** script. It places a file named *index.html* in the current script directory.
+```bash
+python3 html_gen.py testsuite_config
+```
+You should use the generated file with the **style.css** file provided in the html directory.
+
+### Command line regression testing.
+Alternatively you could check for regressions from the command line using the **check\_for\_regression.py** script:
+```bash
+python3 check_for_regression.py TESTLOGS_DIRECTORY
+```
+
+The results of all tests are also logged inside the specified TESTLOGS directory, so you can check them manually for additional information such as date, time, revision and branch.
+
+### Create a cron job:
+Create a cron job to run the tests daily and generate an html report. An example *cronjob* is available.
+```bash
+#!/bin/sh
+cd /home/moses-speedtest/phrase_tables
+
+python3 runtests.py -c testsuite_config #Run the tests.
+python3 html_gen.py testsuite_config #Generate html
+
+cp index.html /fs/thor4/html/www/speed-test/ #Update the html
+```
+
+Place the script in _/etc/cron.daily_ for daily testing.
+
+###### Author
+Nikolay Bogoychev, 2014
+
+###### License
+This software is licensed under the LGPL.
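Note on the README above: the _Variants:_ and _LDPRE:_ lines together determine which timed runs a single test produces. The following is a minimal sketch of that expansion, assuming the log naming used by `execute_tests` in `runtests.py` later in this diff. `expand_variants` is a hypothetical helper for illustration and is not part of the commit.

```python
def expand_variants(testname, variants, ldopts):
    """Return the log names one test would produce, in run order.

    Hypothetical helper: mirrors the naming seen in execute_tests,
    but is not part of the committed scripts.
    """
    names = []
    if 'vanilla' in variants:
        names.append(testname + '_vanilla')
        # The cached run reuses the filesystem cache warmed by the run before it.
        if 'cached' in variants:
            names.append(testname + '_vanilla_cached')
    if 'ldpre' in variants:
        for opt in ldopts:
            names.append(testname + '_ldpre_' + opt)
            if 'cached' in variants:
                names.append(testname + '_ldpre_' + opt + '_cached')
    return names


if __name__ == '__main__':
    for name in expand_variants('demo', ['vanilla', 'cached', 'ldpre'], ['liba.so']):
        print(name)
```

Under these assumptions, a `cached` entry on its own produces no runs, which matches the comment in **test\_config** that cached needs either vanilla or ldpre.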
--- a/contrib/moses-speedtest/check_for_regression.py
+++ b/contrib/moses-speedtest/check_for_regression.py
@ -0,0 +1,63 @@
+"""Checks whether any of the latest tests performed considerably differently
+ from the previous ones. Takes the log directory as an argument."""
+import os
+import sys
+from testsuite_common import Result, processLogLine, bcolors, getLastTwoLines
+
+LOGDIR = sys.argv[1] #Get the log directory as an argument
+PERCENTAGE = 5 #Default value for how much a test should change
+if len(sys.argv) == 3:
+    PERCENTAGE = float(sys.argv[2]) #Default is 5%, but we can specify another
+    #value as a second command line parameter
+
+def printResults(regressed, better, unchanged, firsttime):
+    """Pretty print the results in different colours"""
+    if regressed != []:
+        for item in regressed:
+            print(bcolors.RED + "REGRESSION! " + item.testname + " Was: "\
+            + str(item.previous) + " Is: " + str(item.current) + " Change: "\
+            + format(abs(item.percentage) * 100, '.2f') + "%. Revision: "\
+            + item.revision + bcolors.ENDC) #percentage is stored as a fraction
+    print('\n')
+    if unchanged != []:
+        for item in unchanged:
+            print(bcolors.BLUE + "UNCHANGED: " + item.testname + " Revision: " +\
+                item.revision + bcolors.ENDC)
+    print('\n')
+    if better != []:
+        for item in better:
+            print(bcolors.GREEN + "IMPROVEMENT! " + item.testname + " Was: "\
+            + str(item.previous) + " Is: " + str(item.current) + " Change: "\
+            + format(abs(item.percentage) * 100, '.2f') + "%. Revision: "\
+            + item.revision + bcolors.ENDC) #percentage is stored as a fraction
+    if firsttime != []:
+        for item in firsttime:
+            print(bcolors.PURPLE + "First time test! " + item.testname +\
+            " Took: " + str(item.real) +  " seconds. Revision: " +\
+            item.revision + bcolors.ENDC)
+
+
+all_files = os.listdir(LOGDIR)
+regressed = []
+better = []
+unchanged = []
+firsttime = []
+
+#Go through all log files and find which tests have performed better.
+for logfile in all_files:
+    (line1, line2) = getLastTwoLines(logfile, LOGDIR)
+    log1 = processLogLine(line1)
+    if line2 == '\n': # Empty line, only one test ever run
+        firsttime.append(log1)
+        continue
+    log2 = processLogLine(line2)
+    res = Result(log1.testname, log1.real, log2.real, log2.revision,\
+    log2.branch, log1.revision, log1.branch)
+    #res.percentage is a fraction (e.g. -0.05), PERCENTAGE is in percent
+    if res.percentage * 100 < -PERCENTAGE:
+        regressed.append(res)
+    elif res.percentage * 100 > PERCENTAGE:
+        better.append(res)
+    else:
+        unchanged.append(res)
+
+printResults(regressed, better, unchanged, firsttime)
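Note on the classification above: `Result.percentage` (defined in `testsuite_common.py` at the end of this diff) is a fraction computed as `1 - current/previous`, while the command line threshold is given in percent. The following is a minimal sketch of a classification that keeps both on the same scale. The function name `classify` and its return labels are assumptions made for illustration, not part of the commit.

```python
def classify(previous, current, threshold_percent=5.0):
    """Label a timing change as 'regressed', 'better' or 'unchanged'.

    Hypothetical helper. Times are in seconds; a positive change means
    the current run was faster than the previous one.
    """
    change_percent = (1 - current / previous) * 100
    if change_percent < -threshold_percent:
        return 'regressed'
    if change_percent > threshold_percent:
        return 'better'
    return 'unchanged'


if __name__ == '__main__':
    print(classify(100.0, 110.0))  # slower by 10%
    print(classify(100.0, 90.0))   # faster by 10%
    print(classify(100.0, 102.0))  # within the 5% band
```

Converting once, at the point of comparison, avoids mixing an absolute difference in seconds with a relative threshold.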
--- a/contrib/moses-speedtest/cronjob
+++ b/contrib/moses-speedtest/cronjob
@ -0,0 +1,7 @@
+#!/bin/sh
+cd /home/moses-speedtest/phrase_tables
+
+python3 runtests.py -c testsuite_config #Run the tests.
+python3 html_gen.py testsuite_config #Generate html
+
+cp index.html /fs/thor4/html/www/speed-test/ #Update the html
--- a/contrib/moses-speedtest/helpers/README.md
+++ b/contrib/moses-speedtest/helpers/README.md
@ -0,0 +1,5 @@
+### Helpers
+
+This is a python script that basically gives you the equivalent of:
+```echo 3 > /proc/sys/vm/drop_caches```
+You need to set it up so it is executed with root access without needing a password so that the tests can be automated.
--- a/contrib/moses-speedtest/helpers/sys_drop_caches.py
+++ b/contrib/moses-speedtest/helpers/sys_drop_caches.py
@ -0,0 +1,22 @@
+#!/usr/bin/spython
+from sys import argv, stderr, exit
+from os import linesep as ls
+procfile = "/proc/sys/vm/drop_caches"
+options = ["1","2","3"]
+flush_type = None
+try:
+    flush_type = argv[1][0:1] 
+    if not flush_type in options:
+        raise IndexError, "not in options"
+    with open(procfile, "w") as f:
+        f.write("%s%s" % (flush_type,ls))
+    exit(0)
+except IndexError, e:
+    stderr.write("Argument %s required.%s" % (options, ls))
+except IOError, e:
+    stderr.write("Error writing to file.%s" % ls)
+except StandardError, e:
+    stderr.write("Unknown Error.%s" % ls)
+
+exit(1)
+
--- a/contrib/moses-speedtest/html/README.md
+++ b/contrib/moses-speedtest/html/README.md
@ -0,0 +1,5 @@
+### HTML files.
+
+_index.html_ is a sample file generated by this testsuite.
+
+_style.css_ should be placed in the same directory as _index.html_ so that the test results can be viewed in a browser.
--- a/contrib/moses-speedtest/html/index.html
+++ b/contrib/moses-speedtest/html/index.html
--- a/contrib/moses-speedtest/html/style.css
+++ b/contrib/moses-speedtest/html/style.css
@ -0,0 +1,21 @@
+table,th,td
+{
+border:1px solid black;
+ border-collapse:collapse
+}
+
+tr:nth-child(odd) {
+    background-color: Gainsboro;
+}
+
+.better {
+	color: Green;
+}
+
+.worse {
+	color: Red;
+}
+
+.unchanged {
+	color: SkyBlue;
+}
--- a/contrib/moses-speedtest/html_gen.py
+++ b/contrib/moses-speedtest/html_gen.py
@ -0,0 +1,192 @@
+"""Generates HTML page containing the testresults"""
+from testsuite_common import Result, processLogLine, getLastTwoLines
+from runtests import parse_testconfig
+import os
+import sys
+
+from datetime import datetime, timedelta
+
+HTML_HEADING = """<html>
+<head>
+<title>Moses speed testing</title>
+<link rel="stylesheet" type="text/css" href="style.css"></head><body>"""
+HTML_ENDING = "</table></body></html>\n"
+
+TABLE_HEADING = """<table><tr class="heading">
+  <th>Date</th>
+  <th>Time</th> 
+  <th>Testname</th>
+  <th>Revision</th>
+  <th>Branch</th> 
+  <th>Time</th>
+  <th>Prevtime</th>
+  <th>Prevrev</th> 
+  <th>Change (%)</th>
+  <th>Time (Basebranch)</th> 
+  <th>Change (%, Basebranch)</th>
+  <th>Time (Days -2)</th> 
+  <th>Change (%, Days -2)</th>
+  <th>Time (Days -3)</th> 
+  <th>Change (%, Days -3)</th>
+  <th>Time (Days -4)</th> 
+  <th>Change (%, Days -4)</th>
+  <th>Time (Days -5)</th> 
+  <th>Change (%, Days -5)</th>
+  <th>Time (Days -6)</th> 
+  <th>Change (%, Days -6)</th>
+  <th>Time (Days -7)</th> 
+  <th>Change (%, Days -7)</th>
+  <th>Time (Days -14)</th> 
+  <th>Change (%, Days -14)</th>
+  <th>Time (Years -1)</th> 
+  <th>Change (%, Years -1)</th>
+ </tr>"""
+
+def get_prev_days(date, numdays):
+    """Gets the date numdays days before the given date so that we can
+    search for that date in the log file"""
+    date_obj = datetime.strptime(date, '%d.%m.%Y').date()
+    past_date = date_obj - timedelta(days=numdays)
+    return past_date.strftime('%d.%m.%Y')
+
+def gather_necessary_lines(logfile, date):
+    """Gathers the necessary lines corresponding to past dates
+    and parses them if they exist"""
+    #Get a dictionary of dates
+    dates = {}
+    dates[get_prev_days(date, 2)] = ('-2', None)
+    dates[get_prev_days(date, 3)] = ('-3', None)
+    dates[get_prev_days(date, 4)] = ('-4', None)
+    dates[get_prev_days(date, 5)] = ('-5', None)
+    dates[get_prev_days(date, 6)] = ('-6', None)
+    dates[get_prev_days(date, 7)] = ('-7', None)
+    dates[get_prev_days(date, 14)] = ('-14', None)
+    dates[get_prev_days(date, 365)] = ('-365', None)
+
+    openfile = open(logfile, 'r')
+    for line in openfile:
+        if line.split()[0] in dates.keys():
+            day = dates[line.split()[0]][0]
+            dates[line.split()[0]] = (day, processLogLine(line))
+    openfile.close()
+    return dates
+
+def append_date_to_table(resline):
+    """Appends past dates to the html"""
+    cur_html = '<td>' + str(resline.current) + '</td>'
+
+    if resline.percentage > 0.05: #If we have improvement of more than 5%
+        cur_html = cur_html +  '<td class="better">' + str(resline.percentage) + '</td>'
+    elif resline.percentage < -0.05: #We have a regression of more than 5%
+        cur_html = cur_html +  '<td class="worse">' + str(resline.percentage) + '</td>'
+    else:
+        cur_html = cur_html +  '<td class="unchanged">' + str(resline.percentage) + '</td>'
+    return cur_html
+
+def compare_rev(filename, rev1, rev2, branch1=False, branch2=False):
+    """Compare the test results of two lines. We can specify either a
+    revision or a branch for comparison. The first rev should be the
+    base version and the second revision should be the later version"""
+
+    #In the log file the index of the revision is 2 but the index of
+    #the branch is 12. Alternate those depending on whether we are looking
+    #for a specific revision or branch.
+    firstidx = 2
+    secondidx = 2
+    if branch1:
+        firstidx = 12
+    if branch2:
+        secondidx = 12
+
+    rev1line = ''
+    rev2line = ''
+    resfile = open(filename, 'r')
+    for line in resfile:
+        if rev1 == line.split()[firstidx]:
+            rev1line = line
+        elif rev2 == line.split()[secondidx]:
+            rev2line = line
+        if rev1line != '' and rev2line != '':
+            break
+    resfile.close()
+    if rev1line == '':
+        raise ValueError('Revision ' + rev1 + " was not found!")
+    if rev2line == '':
+        raise ValueError('Revision ' + rev2 + " was not found!")
+
+    logLine1 = processLogLine(rev1line)
+    logLine2 = processLogLine(rev2line)
+    res = Result(logLine1.testname, logLine1.real, logLine2.real,\
+        logLine2.revision, logLine2.branch, logLine1.revision, logLine1.branch)
+
+    return res
+
+def produce_html(path, global_config):
+    """Produces html file for the report."""
+    html = '' #The table HTML
+    for filenam in os.listdir(global_config.testlogs):
+        #Generate html for the newest two lines
+        #Get the lines from the log file
+        (ll1, ll2) = getLastTwoLines(filenam, global_config.testlogs)
+        logLine1 = processLogLine(ll1)
+        logLine2 = processLogLine(ll2)
+
+        #Generate html
+        res1 = Result(logLine1.testname, logLine1.real, logLine2.real,\
+            logLine2.revision, logLine2.branch, logLine1.revision, logLine1.branch)
+        html = html + '<tr><td>' + logLine2.date + '</td><td>' + logLine2.time + '</td><td>' +\
+        res1.testname + '</td><td>' + res1.revision[:10] + '</td><td>' + res1.branch + '</td><td>' +\
+        str(res1.current) + '</td><td>' + str(res1.previous) + '</td><td>' + res1.prevrev[:10] + '</td>'
+
+        #Add fancy colours depending on the change
+        if res1.percentage > 0.05: #If we have improvement of more than 5%
+            html = html +  '<td class="better">' + str(res1.percentage) + '</td>'
+        elif res1.percentage < -0.05: #We have a regression of more than 5%
+            html = html +  '<td class="worse">' + str(res1.percentage) + '</td>'
+        else:
+            html = html +  '<td class="unchanged">' + str(res1.percentage) + '</td>'
+
+        #Get comparison against the base version
+        filenam = global_config.testlogs + '/' + filenam #Get proper directory
+        res2 = compare_rev(filenam, global_config.basebranch, res1.revision, branch1=True)
+        html = html + '<td>' + str(res2.previous) + '</td>'
+
+        #Add fancy colours depending on the change
+        if res2.percentage > 0.05: #If we have improvement of more than 5%
+            html = html +  '<td class="better">' + str(res2.percentage) + '</td>'
+        elif res2.percentage < -0.05: #We have a regression of more than 5%
+            html = html +  '<td class="worse">' + str(res2.percentage) + '</td>'
+        else:
+            html = html +  '<td class="unchanged">' + str(res2.percentage) + '</td>'
+
+        #Add extra dates comparison dating from the beginning of time if they exist
+        past_dates = list(range(2, 8))
+        past_dates.append(14)
+        past_dates.append(365) # Get the 1 year ago day
+        linesdict = gather_necessary_lines(filenam, logLine2.date)
+
+        for days in past_dates:
+            act_date = get_prev_days(logLine2.date, days)
+            if linesdict[act_date][1] is not None:
+                logline_date = linesdict[act_date][1]
+                restemp = Result(logline_date.testname, logline_date.real, logLine2.real,\
+                logLine2.revision, logLine2.branch, logline_date.revision, logline_date.branch)
+                html = html + append_date_to_table(restemp)
+            else:
+                html = html + '<td>N/A</td><td>N/A</td>'
+
+
+
+        html = html + '</tr>' #End row
+
+    #Write out the file
+    basebranch_info = '<text><b>Basebranch:</b> ' + res2.prevbranch + ' <b>Revision:</b> ' +\
+    res2.prevrev + '</text>'
+    writeoutstr = HTML_HEADING + basebranch_info + TABLE_HEADING + html + HTML_ENDING
+    writefile = open(path, 'w')
+    writefile.write(writeoutstr)
+    writefile.close()
+
+if __name__ == '__main__':
+    CONFIG = parse_testconfig(sys.argv[1])
+    produce_html('index.html', CONFIG)
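Note on the look-back columns above: `html_gen.py` matches log lines by the `dd.mm.YYYY` date string at the start of each line, for a fixed set of day offsets. The following is a small sketch of how those date strings come out, using only the standard library. `lookback_dates` and `LOOKBACK_DAYS` are hypothetical names introduced here for illustration.

```python
from datetime import datetime, timedelta

# Day offsets corresponding to the extra table columns (assumed from the diff).
LOOKBACK_DAYS = (2, 3, 4, 5, 6, 7, 14, 365)


def lookback_dates(date, offsets=LOOKBACK_DAYS):
    """Map each day offset to the 'dd.mm.YYYY' date that many days earlier."""
    day = datetime.strptime(date, '%d.%m.%Y').date()
    return {n: (day - timedelta(days=n)).strftime('%d.%m.%Y') for n in offsets}


if __name__ == '__main__':
    dates = lookback_dates('11.06.2014')
    print(dates[2], dates[14], dates[365])
```

Note that the "Years -1" column is a fixed 365-day offset, so across a leap day it lands one calendar day away from the same date in the previous year.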
--- a/contrib/moses-speedtest/runtests.py
+++ b/contrib/moses-speedtest/runtests.py
@ -0,0 +1,293 @@
+"""Given a config file, runs tests"""
+import os
+import subprocess
+import time
+from argparse import ArgumentParser
+from testsuite_common import processLogLine
+
+def parse_cmd():
+    """Parse the command line arguments"""
+    description = "A python based speedtest suite for moses."
+    parser = ArgumentParser(description=description)
+    parser.add_argument("-c", "--configfile", action="store",\
+                dest="configfile", required=True,\
+                help="Specify test config file")
+    parser.add_argument("-s", "--singletest", action="store",\
+                dest="singletestdir", default=None,\
+                help="Single test name directory. Specify directory name,\
+                not full path!")
+    parser.add_argument("-r", "--revision", action="store",\
+                dest="revision", default=None,\
+                help="Specify a specific revision for the test.")
+    parser.add_argument("-b", "--branch", action="store",\
+                dest="branch", default=None,\
+                help="Specify a branch for the test.")
+
+    arguments = parser.parse_args()
+    return arguments
+
+def repoinit(testconfig):
+    """Determines revision and sets up the repo."""
+    revision = ''
+    #Update the repo
+    os.chdir(testconfig.repo)
+    #Checkout specific branch, else maintain main branch
+    if testconfig.branch != 'master':
+        subprocess.call(['git', 'checkout', testconfig.branch])
+        rev, _ = subprocess.Popen(['git', 'rev-parse', 'HEAD'],\
+            stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
+        revision = str(rev).replace("\\n'", '').replace("b'", '')
+    else:
+        subprocess.call(['git checkout master'], shell=True)
+
+    #Check a specific revision. Else checkout master.
+    if testconfig.revision:
+        subprocess.call(['git', 'checkout', testconfig.revision])
+        revision = testconfig.revision
+    elif testconfig.branch == 'master':
+        subprocess.call(['git pull'], shell=True)
+        rev, _ = subprocess.Popen(['git rev-parse HEAD'], stdout=subprocess.PIPE,\
+            stderr=subprocess.PIPE, shell=True).communicate()
+        revision = str(rev).replace("\\n'", '').replace("b'", '')
+    
+    return revision
+
+class Configuration:
+    """A simple class to hold all of the configuration constants"""
+    def __init__(self, repo, drop_caches, tests, testlogs, basebranch, baserev):
+        self.repo = repo
+        self.drop_caches = drop_caches
+        self.tests = tests
+        self.testlogs = testlogs
+        self.basebranch = basebranch
+        self.baserev = baserev
+        self.singletest = None
+        self.revision = None
+        self.branch = 'master' # Default branch
+
+    def additional_args(self, singletest, revision, branch):
+        """Additional configuration from command line arguments"""
+        self.singletest = singletest
+        if revision is not None:
+            self.revision = revision
+        if branch is not None:
+            self.branch = branch
+
+    def set_revision(self, revision):
+        """Sets the current revision that is being tested"""
+        self.revision = revision
+
+
+class Test:
+    """A simple class to contain all information about tests"""
+    def __init__(self, name, command, ldopts, permutations):
+        self.name = name
+        self.command = command
+        self.ldopts = ldopts.replace(' ', '').split(',') #Not tested yet
+        self.permutations = permutations
+
+def parse_configfile(conffile, testdir, moses_repo):
+    """Parses the config file"""
+    command, ldopts = '', ''
+    permutations = []
+    fileopen = open(conffile, 'r')
+    for line in fileopen:
+        line = line.split('#')[0] # Discard comments
+        if line == '' or line == '\n':
+            continue # Discard lines with comments only and empty lines
+        opt, args = line.split(' ', 1) # Get arguments
+
+        if opt == 'Command:':
+            command = args.replace('\n', '')
+            command = moses_repo + '/bin/' + command
+        elif opt == 'LDPRE:':
+            ldopts = args.replace('\n', '')
+        elif opt == 'Variants:':
+            permutations = args.replace('\n', '').replace(' ', '').split(',')
+        else:
+            raise ValueError('Unrecognized option ' + opt)
+    #We use the testdir as the name.
+    testcase = Test(testdir, command, ldopts, permutations)
+    fileopen.close()
+    return testcase
+
+def parse_testconfig(conffile):
+    """Parses the config file for the whole testsuite."""
+    repo_path, drop_caches, tests_dir, testlog_dir = '', '', '', ''
+    basebranch, baserev = '', ''
+    fileopen = open(conffile, 'r')
+    for line in fileopen:
+        line = line.split('#')[0] # Discard comments
+        if line == '' or line == '\n':
+            continue # Discard lines with comments only and empty lines
+        opt, args = line.split(' ', 1) # Get arguments
+        if opt == 'MOSES_REPO_PATH:':
+            repo_path = args.replace('\n', '')
+        elif opt == 'DROP_CACHES_COMM:':
+            drop_caches = args.replace('\n', '')
+        elif opt == 'TEST_DIR:':
+            tests_dir = args.replace('\n', '')
+        elif opt == 'TEST_LOG_DIR:':
+            testlog_dir = args.replace('\n', '')
+        elif opt == 'BASEBRANCH:':
+            basebranch = args.replace('\n', '')
+        elif opt == 'BASEREV:':
+            baserev = args.replace('\n', '')
+        else:
+            raise ValueError('Unrecognized option ' + opt)
+    config = Configuration(repo_path, drop_caches, tests_dir, testlog_dir,\
+    basebranch, baserev)
+    fileopen.close()
+    return config
+
+def get_config():
+    """Builds the config object with all necessary attributes"""
+    args = parse_cmd()
+    config = parse_testconfig(args.configfile)
+    config.additional_args(args.singletestdir, args.revision, args.branch)
+    revision = repoinit(config)
+    config.set_revision(revision)
+    return config
+
+def check_for_basever(testlogfile, basebranch):
+    """Checks if the base branch is present in the testlogs"""
+    #Use a context manager so the file is closed on every return path
+    with open(testlogfile, 'r') as filetoopen:
+        for line in filetoopen:
+            templine = processLogLine(line)
+            if templine.branch == basebranch:
+                return True
+    return False
+
+def split_time(filename):
+    """Splits the output of the time function into separate parts.
+    We will write time to file, because many programs output to
+    stderr which makes it difficult to get only the exact results we need."""
+    timefile = open(filename, 'r')
+    realtime = float(timefile.readline().replace('\n', '').split()[1])
+    usertime = float(timefile.readline().replace('\n', '').split()[1])
+    systime = float(timefile.readline().replace('\n', '').split()[1])
+    timefile.close()
+
+    return (realtime, usertime, systime)
+
+
+def write_log(time_file, logname, config):
+    """Writes to a logfile"""
+    log_write = open(config.testlogs + '/' + logname, 'a') # Open logfile
+    date_run = time.strftime("%d.%m.%Y %H:%M:%S") # Get the time of the test
+    realtime, usertime, systime = split_time(time_file) # Get the times in a nice form
+
+    # Append everything to a log file.
+    writestr = date_run + " " + config.revision + " Testname: " + logname +\
+    " RealTime: " + str(realtime) + " UserTime: " + str(usertime) +\
+    " SystemTime: " + str(systime) + " Branch: " + config.branch +'\n'
+    log_write.write(writestr)
+    log_write.close()
+
+
+def execute_tests(testcase, cur_directory, config):
+    """Executes timed tests based on the config file"""
+    #Figure out the order of which tests must be executed.
+    #Change to the current test directory
+    os.chdir(config.tests + '/' + cur_directory)
+    #Clear caches
+    subprocess.call(['sync'], shell=True)
+    subprocess.call([config.drop_caches], shell=True)
+    #Perform vanilla test and if a cached test exists - as well
+    print(testcase.name)
+    if 'vanilla' in testcase.permutations:
+        print(testcase.command)
+        subprocess.Popen(['time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
+         stderr=subprocess.PIPE, shell=True).communicate()
+        write_log('/tmp/time_moses_tests', testcase.name + '_vanilla', config)
+        if 'cached' in testcase.permutations:
+            subprocess.Popen(['time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
+            stderr=None, shell=True).communicate()
+            write_log('/tmp/time_moses_tests', testcase.name + '_vanilla_cached', config)
+    
+    #Now perform LD_PRELOAD tests
+    if 'ldpre' in testcase.permutations:
+        for opt in testcase.ldopts:
+            #Clear caches
+            subprocess.call(['sync'], shell=True)
+            subprocess.call([config.drop_caches], shell=True)
+
+            #test
+            subprocess.Popen(['LD_PRELOAD=' + opt + ' time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
+            stderr=None, shell=True).communicate() #VAR=value sets the variable for this command
+            write_log('/tmp/time_moses_tests', testcase.name + '_ldpre_' + opt, config)
+            if 'cached' in testcase.permutations:
+                subprocess.Popen(['LD_PRELOAD=' + opt + ' time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
+                stderr=None, shell=True).communicate() #VAR=value sets the variable for this command
+                write_log('/tmp/time_moses_tests', testcase.name + '_ldpre_' +opt +'_cached', config)
+
+# Go through all the test directories and execute the tests
+if __name__ == '__main__':
+    CONFIG = get_config()
+    ALL_DIR = os.listdir(CONFIG.tests)
+
+    #We should first check if any of the tests is run for the first time.
+    #If some of them are run for the first time we should first get their
+    #time with the base version (usually the previous release)
+    FIRSTTIME = []
+    TESTLOGS = []
+    #Strip the variant suffixes from the log file names
+    for listline in os.listdir(CONFIG.testlogs):
+        listline = listline.replace('_vanilla', '')
+        listline = listline.replace('_cached', '')
+        listline = listline.replace('_ldpre', '')
+        TESTLOGS.append(listline)
+    for directory in ALL_DIR:
+        if directory not in TESTLOGS:
+            FIRSTTIME.append(directory)
+
+    #Sometimes even though we have the log files, we will need to rerun them
+    #Against a base version, because we require a different baseversion (for
+    #example when a new version of Moses is released.) Therefore we should
+    #Check if the version of Moses that we have as a base version is in all
+    #of the log files.
+
+    for logfile in os.listdir(CONFIG.testlogs):
+        logfile_name = CONFIG.testlogs + '/' + logfile
+        if not check_for_basever(logfile_name, CONFIG.basebranch):
+            logfile = logfile.replace('_vanilla', '')
+            logfile = logfile.replace('_cached', '')
+            logfile = logfile.replace('_ldpre', '')
+            FIRSTTIME.append(logfile)
+    FIRSTTIME = list(set(FIRSTTIME)) #Deduplicate
+
+    if FIRSTTIME != []:
+        #Create a new configuration for base version tests:
+        BASECONFIG = Configuration(CONFIG.repo, CONFIG.drop_caches,\
+            CONFIG.tests, CONFIG.testlogs, CONFIG.basebranch,\
+            CONFIG.baserev)
+        BASECONFIG.additional_args(None, CONFIG.baserev, CONFIG.basebranch)
+        #Set up the repository and get its revision:
+        REVISION = repoinit(BASECONFIG)
+        BASECONFIG.set_revision(REVISION)
+        #Build
+        os.chdir(BASECONFIG.repo)
+        subprocess.call(['./previous.sh'], shell=True)
+
+        #Perform tests
+        for directory in FIRSTTIME:
+            cur_testcase = parse_configfile(BASECONFIG.tests + '/' + directory +\
+            '/config', directory, BASECONFIG.repo)
+            execute_tests(cur_testcase, directory, BASECONFIG)
+
+        #Reset back the repository to the normal configuration
+        repoinit(CONFIG)
+
+    #Builds moses
+    os.chdir(CONFIG.repo)
+    subprocess.call(['./previous.sh'], shell=True)
+
+    if CONFIG.singletest:
+        TESTCASE = parse_configfile(CONFIG.tests + '/' +\
+            CONFIG.singletest + '/config', CONFIG.singletest, CONFIG.repo)
+        execute_tests(TESTCASE, CONFIG.singletest, CONFIG)
+    else:
+        for directory in ALL_DIR:
+            cur_testcase = parse_configfile(CONFIG.tests + '/' + directory +\
+            '/config', directory, CONFIG.repo)
+            execute_tests(cur_testcase, directory, CONFIG)
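Note on `split_time` above: it reads the first three lines of the timing file and takes the second token of each, which assumes the file contains exactly the `real`, `user` and `sys` lines in that order. The following is a sketch of a parser that looks the values up by label instead, so that an unexpected extra line does not shift the fields. `parse_time_output` is a hypothetical function, not part of the commit.

```python
def parse_time_output(text):
    """Parse 'real/user/sys <seconds>' lines into a tuple of floats.

    Hypothetical helper: lines that do not match one of the three
    labels are ignored rather than read positionally.
    """
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ('real', 'user', 'sys'):
            times[parts[0]] = float(parts[1])
    # A missing label raises KeyError, which surfaces a malformed file early.
    return times['real'], times['user'], times['sys']


if __name__ == '__main__':
    sample = "real 12.50\nuser 11.00\nsys 1.20\n"
    print(parse_time_output(sample))
```

The trade-off is a few more lines of code in exchange for a clear error when the timing file is incomplete.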
--- a/contrib/moses-speedtest/sys_drop_caches.py
+++ b/contrib/moses-speedtest/sys_drop_caches.py
@ -0,0 +1,22 @@
+#!/usr/bin/spython
+from sys import argv, stderr, exit
+from os import linesep as ls
+procfile = "/proc/sys/vm/drop_caches"
+options = ["1","2","3"]
+flush_type = None
+try:
+    flush_type = argv[1][0:1] 
+    if flush_type not in options:
+        raise IndexError("not in options")
+    with open(procfile, "w") as f:
+        f.write("%s%s" % (flush_type,ls))
+    exit(0)
+except IndexError:
+    stderr.write("Argument %s required.%s" % (options, ls))
+except IOError:
+    stderr.write("Error writing to file.%s" % ls)
+except Exception:
+    stderr.write("Unknown Error.%s" % ls)
+
+exit(1)
+
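The argument check in sys_drop_caches.py can be exercised without touching `/proc`. The following is a minimal sketch, assuming a hypothetical helper `drop_caches` that takes the target path as a parameter (the committed script hard-codes it); `dry_run` is likewise an illustration only and writes to a temporary file.

```python
import os
import tempfile

PROCFILE = "/proc/sys/vm/drop_caches"  # same path the script writes to
OPTIONS = ("1", "2", "3")              # same accepted values as the script


def drop_caches(flush_type, procfile=PROCFILE):
    """Write flush_type to procfile; raise ValueError on a bad argument.

    Hypothetical helper, not part of the commit. Writing to the real
    procfile needs elevated privileges.
    """
    flush_type = str(flush_type)[0:1]  # like the script, keep only the first character
    if flush_type not in OPTIONS:
        raise ValueError("Argument %s required." % (OPTIONS,))
    with open(procfile, "w") as handle:
        handle.write(flush_type + os.linesep)
    return flush_type


def dry_run(flush_type):
    """Run drop_caches against a temporary file instead of /proc."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        drop_caches(flush_type, procfile=path)
        with open(path) as handle:
            return handle.read().strip()
    finally:
        os.remove(path)
```

Passing the path as a parameter is only a testing convenience here; the behaviour on the real file is the same single write the script performs.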
--- a/contrib/moses-speedtest/test_config
+++ b/contrib/moses-speedtest/test_config
@ -0,0 +1,3 @@
+Command: moses -f ... -i fff #Looks for the command in the /bin directory of the repo specified in the testsuite_config
+LDPRE: ldpreloads #Comma separated LD_LIBRARY_PATH:/, 
+Variants: vanilla, cached, ldpre #Can't have cached without ldpre or vanilla
--- a/contrib/moses-speedtest/testsuite_common.py
+++ b/contrib/moses-speedtest/testsuite_common.py
@ -0,0 +1,54 @@
+"""Common functions of the testsuite"""
+import os
+#Colour constants
+class bcolors:
+    PURPLE = '\033[95m'
+    BLUE = '\033[94m'
+    GREEN = '\033[92m'
+    YELLOW = '\033[93m'
+    RED = '\033[91m'
+    ENDC = '\033[0m'
+
+class LogLine:
+    """A class to contain a logfile line"""
+    def __init__(self, date, time, revision, testname, real, user, system, branch):
+        self.date = date
+        self.time = time
+        self.revision = revision
+        self.testname = testname
+        self.real = real
+        self.system = system
+        self.user = user
+        self.branch = branch
+
+class Result:
+    """A class to contain results of benchmarking"""
+    def __init__(self, testname, previous, current, revision, branch, prevrev, prevbranch):
+        self.testname = testname
+        self.previous = previous
+        self.current = current
+        self.change = previous - current
+        self.revision = revision
+        self.branch = branch
+        self.prevbranch = prevbranch
+        self.prevrev = prevrev
+        #Relative change as a fraction of the previous time, rounded to four decimals
+        self.percentage = float(format(1 - current/previous, '.4f'))
+
+def processLogLine(logline):
+    """Parses the log line into a nice datastructure"""
+    logline = logline.split()
+    log = LogLine(logline[0], logline[1], logline[2], logline[4],\
+        float(logline[6]), float(logline[8]), float(logline[10]), logline[12])
+    return log
+
+def getLastTwoLines(filename, logdir):
+    """Calls tail to fetch the last two log lines (the two most recent runs)"""
+    try:
+        line1, line2 = os.popen("tail -n2 " + logdir + '/' + filename)
+    except ValueError: #Check for new tests
+        tempfile = open(logdir + '/' + filename)
+        line1 = tempfile.readline()
+        tempfile.close()
+        return (line1, '\n')
+    return (line1, line2)
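To make the token positions used by `processLogLine` and the rounding in `Result` easier to follow, here is a small sketch. The field labels in `SAMPLE` (`Testname:`, `RealTime:` and so on) are assumptions chosen so that the values land at indices 0, 1, 2, 4, 6, 8, 10 and 12; the real log format is produced elsewhere in the testsuite. `parse_log_line` and `relative_change` are hypothetical names, not functions from the commit.

```python
from collections import namedtuple

# Same fields as the LogLine class in the diff, as a lightweight tuple.
LogLine = namedtuple(
    "LogLine", "date time revision testname real user system branch")

# Assumed layout: every odd token from index 3 onwards is a label.
SAMPLE = ("2014-06-11 13:44:22 89a9c410c9 Testname: ptable_vanilla "
          "RealTime: 8.0 UserTime: 6.5 SystemTime: 1.5 Branch: master")


def parse_log_line(line):
    """Pick the same token positions that processLogLine reads."""
    tok = line.split()
    return LogLine(tok[0], tok[1], tok[2], tok[4],
                   float(tok[6]), float(tok[8]), float(tok[10]), tok[12])


def relative_change(previous, current):
    """Fraction of the previous time that was saved, to four decimals."""
    return float(format(1 - current / previous, ".4f"))
```

A positive value means the current run was faster than the previous one; for example, going from 10.0 to 8.0 seconds gives 0.2.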
--- a/contrib/moses-speedtest/testsuite_config
+++ b/contrib/moses-speedtest/testsuite_config
@ -0,0 +1,5 @@
+MOSES_REPO_PATH: /home/moses-speedtest/moses-standard/mosesdecoder
+DROP_CACHES_COMM: sys_drop_caches 3
+TEST_DIR: /home/moses-speedtest/phrase_tables/tests
+TEST_LOG_DIR: /home/moses-speedtest/phrase_tables/testlogs
+BASEBRANCH: RELEASE-2.1.1
--- a/contrib/other-builds/consolidate/.cproject
+++ b/contrib/other-builds/consolidate/.cproject
@ -0,0 +1,132 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<?fileVersion 4.0.0?><cproject storage_type_id="org.eclipse.cdt.core.XmlProjectDescriptionStorage">
+	<storageModule moduleId="org.eclipse.cdt.core.settings">
+		<cconfiguration id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686">
+			<storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686" moduleId="org.eclipse.cdt.core.settings" name="Debug">
+				<externalSettings/>
+				<extensions>
+					<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GASErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GLDErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
+				</extensions>
+			</storageModule>
+			<storageModule moduleId="cdtBuildSystem" version="4.0.0">
+				<configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.debug,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686" name="Debug" parent="cdt.managedbuild.config.gnu.cross.exe.debug">
+					<folderInfo id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686." name="/" resourcePath="">
+						<toolChain id="cdt.managedbuild.toolchain.gnu.cross.exe.debug.1312813804" name="Cross GCC" superClass="cdt.managedbuild.toolchain.gnu.cross.exe.debug">
+							<targetPlatform archList="all" binaryParser="org.eclipse.cdt.core.ELF" id="cdt.managedbuild.targetPlatform.gnu.cross.1457158442" isAbstract="false" osList="all" superClass="cdt.managedbuild.targetPlatform.gnu.cross"/>
+							<builder buildPath="${workspace_loc:/consolidate}/Debug" id="cdt.managedbuild.builder.gnu.cross.401817170" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.builder.gnu.cross"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.c.compiler.584773180" name="Cross GCC Compiler" superClass="cdt.managedbuild.tool.gnu.cross.c.compiler">
+								<option defaultValue="gnu.c.optimization.level.none" id="gnu.c.compiler.option.optimization.level.548826159" name="Optimization Level" superClass="gnu.c.compiler.option.optimization.level" valueType="enumerated"/>
+								<option id="gnu.c.compiler.option.debugging.level.69309976" name="Debug Level" superClass="gnu.c.compiler.option.debugging.level" value="gnu.c.debugging.level.max" valueType="enumerated"/>
+								<inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1869389417" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.cpp.compiler.1684035985" name="Cross G++ Compiler" superClass="cdt.managedbuild.tool.gnu.cross.cpp.compiler">
+								<option id="gnu.cpp.compiler.option.optimization.level.1978964587" name="Optimization Level" superClass="gnu.cpp.compiler.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
+								<option id="gnu.cpp.compiler.option.debugging.level.1174628687" name="Debug Level" superClass="gnu.cpp.compiler.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
+								<option id="gnu.cpp.compiler.option.include.paths.1899244069" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
+									<listOptionValue builtIn="false" value="&quot;${workspace_loc}/../../boost/include&quot;"/>
+								</option>
+								<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1369007077" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.c.linker.988122551" name="Cross GCC Linker" superClass="cdt.managedbuild.tool.gnu.cross.c.linker"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.cpp.linker.580092188" name="Cross G++ Linker" superClass="cdt.managedbuild.tool.gnu.cross.cpp.linker">
+								<option id="gnu.cpp.link.option.libs.1224797947" name="Libraries (-l)" superClass="gnu.cpp.link.option.libs" valueType="libs">
+									<listOptionValue builtIn="false" value="z"/>
+									<listOptionValue builtIn="false" value="boost_iostreams-mt"/>
+								</option>
+								<option id="gnu.cpp.link.option.paths.845281969" superClass="gnu.cpp.link.option.paths" valueType="libPaths">
+									<listOptionValue builtIn="false" value="&quot;${workspace_loc:}/../../boost/lib64&quot;"/>
+								</option>
+								<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.1562981657" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
+									<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
+									<additionalInput kind="additionalinput" paths="$(LIBS)"/>
+								</inputType>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.archiver.1813579853" name="Cross GCC Archiver" superClass="cdt.managedbuild.tool.gnu.cross.archiver"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.assembler.660034723" name="Cross GCC Assembler" superClass="cdt.managedbuild.tool.gnu.cross.assembler">
+								<inputType id="cdt.managedbuild.tool.gnu.assembler.input.2016181080" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
+							</tool>
+						</toolChain>
+					</folderInfo>
+				</configuration>
+			</storageModule>
+			<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
+		</cconfiguration>
+		<cconfiguration id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473">
+			<storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473" moduleId="org.eclipse.cdt.core.settings" name="Release">
+				<externalSettings/>
+				<extensions>
+					<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GASErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.GLDErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
+					<extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
+				</extensions>
+			</storageModule>
+			<storageModule moduleId="cdtBuildSystem" version="4.0.0">
+				<configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.release,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473" name="Release" parent="cdt.managedbuild.config.gnu.cross.exe.release">
+					<folderInfo id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473." name="/" resourcePath="">
+						<toolChain id="cdt.managedbuild.toolchain.gnu.cross.exe.release.1193312581" name="Cross GCC" superClass="cdt.managedbuild.toolchain.gnu.cross.exe.release">
+							<targetPlatform archList="all" binaryParser="org.eclipse.cdt.core.ELF" id="cdt.managedbuild.targetPlatform.gnu.cross.1614674218" isAbstract="false" osList="all" superClass="cdt.managedbuild.targetPlatform.gnu.cross"/>
+							<builder buildPath="${workspace_loc:/consolidate}/Release" id="cdt.managedbuild.builder.gnu.cross.1921548268" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.builder.gnu.cross"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.c.compiler.1402792534" name="Cross GCC Compiler" superClass="cdt.managedbuild.tool.gnu.cross.c.compiler">
+								<option defaultValue="gnu.c.optimization.level.most" id="gnu.c.compiler.option.optimization.level.172258714" name="Optimization Level" superClass="gnu.c.compiler.option.optimization.level" valueType="enumerated"/>
+								<option id="gnu.c.compiler.option.debugging.level.949623548" name="Debug Level" superClass="gnu.c.compiler.option.debugging.level" value="gnu.c.debugging.level.none" valueType="enumerated"/>
+								<inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1960225725" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.cpp.compiler.1697856596" name="Cross G++ Compiler" superClass="cdt.managedbuild.tool.gnu.cross.cpp.compiler">
+								<option id="gnu.cpp.compiler.option.optimization.level.1575999400" name="Optimization Level" superClass="gnu.cpp.compiler.option.optimization.level" value="gnu.cpp.compiler.optimization.level.most" valueType="enumerated"/>
+								<option id="gnu.cpp.compiler.option.debugging.level.732263649" name="Debug Level" superClass="gnu.cpp.compiler.option.debugging.level" value="gnu.cpp.compiler.debugging.level.none" valueType="enumerated"/>
+								<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1685852561" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.c.linker.1332869586" name="Cross GCC Linker" superClass="cdt.managedbuild.tool.gnu.cross.c.linker"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.cpp.linker.484647585" name="Cross G++ Linker" superClass="cdt.managedbuild.tool.gnu.cross.cpp.linker">
+								<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.2140954002" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
+									<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
+									<additionalInput kind="additionalinput" paths="$(LIBS)"/>
+								</inputType>
+							</tool>
+							<tool id="cdt.managedbuild.tool.gnu.cross.archiver.620666274" name="Cross GCC Archiver" superClass="cdt.managedbuild.tool.gnu.cross.archiver"/>
+							<tool id="cdt.managedbuild.tool.gnu.cross.assembler.1478840357" name="Cross GCC Assembler" superClass="cdt.managedbuild.tool.gnu.cross.assembler">
+								<inputType id="cdt.managedbuild.tool.gnu.assembler.input.412043972" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
+							</tool>
+						</toolChain>
+					</folderInfo>
+				</configuration>
+			</storageModule>
+			<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
+		</cconfiguration>
+	</storageModule>
+	<storageModule moduleId="cdtBuildSystem" version="4.0.0">
+		<project id="consolidate.cdt.managedbuild.target.gnu.cross.exe.1166003694" name="Executable" projectType="cdt.managedbuild.target.gnu.cross.exe"/>
+	</storageModule>
+	<storageModule moduleId="scannerConfiguration">
+		<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId=""/>
+		<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686;cdt.managedbuild.config.gnu.cross.exe.debug.1847651686.;cdt.managedbuild.tool.gnu.cross.c.compiler.584773180;cdt.managedbuild.tool.gnu.c.compiler.input.1869389417">
+			<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
+		</scannerConfigBuildInfo>
+		<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.release.1197533473;cdt.managedbuild.config.gnu.cross.exe.release.1197533473.;cdt.managedbuild.tool.gnu.cross.cpp.compiler.1697856596;cdt.managedbuild.tool.gnu.cpp.compiler.input.1685852561">
+			<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
+		</scannerConfigBuildInfo>
+		<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686;cdt.managedbuild.config.gnu.cross.exe.debug.1847651686.;cdt.managedbuild.tool.gnu.cross.cpp.compiler.1684035985;cdt.managedbuild.tool.gnu.cpp.compiler.input.1369007077">
+			<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
+		</scannerConfigBuildInfo>
+		<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.release.1197533473;cdt.managedbuild.config.gnu.cross.exe.release.1197533473.;cdt.managedbuild.tool.gnu.cross.c.compiler.1402792534;cdt.managedbuild.tool.gnu.c.compiler.input.1960225725">
+			<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
+		</scannerConfigBuildInfo>
+	</storageModule>
+	<storageModule moduleId="org.eclipse.cdt.core.LanguageSettingsProviders"/>
+	<storageModule moduleId="refreshScope" versionNumber="2">
+		<configuration configurationName="Release">
+			<resource resourceType="PROJECT" workspacePath="/consolidate"/>
+		</configuration>
+		<configuration configurationName="Debug">
+			<resource resourceType="PROJECT" workspacePath="/consolidate"/>
+		</configuration>
+	</storageModule>
+</cproject>
--- a/contrib/other-builds/consolidate/.project
+++ b/contrib/other-builds/consolidate/.project
@ -0,0 +1,64 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+	<name>consolidate</name>
+	<comment></comment>
+	<projects>
+	</projects>
+	<buildSpec>
+		<buildCommand>
+			<name>org.eclipse.cdt.managedbuilder.core.genmakebuilder</name>
+			<triggers>clean,full,incremental,</triggers>
+			<arguments>
+			</arguments>
+		</buildCommand>
+		<buildCommand>
+			<name>org.eclipse.cdt.managedbuilder.core.ScannerConfigBuilder</name>
+			<triggers>full,incremental,</triggers>
+			<arguments>
+			</arguments>
+		</buildCommand>
+	</buildSpec>
+	<natures>
+		<nature>org.eclipse.cdt.core.cnature</nature>
+		<nature>org.eclipse.cdt.core.ccnature</nature>
+		<nature>org.eclipse.cdt.managedbuilder.core.managedBuildNature</nature>
+		<nature>org.eclipse.cdt.managedbuilder.core.ScannerConfigNature</nature>
+	</natures>
+	<linkedResources>
+		<link>
+			<name>InputFileStream.cpp</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/InputFileStream.cpp</locationURI>
+		</link>
+		<link>
+			<name>InputFileStream.h</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/InputFileStream.h</locationURI>
+		</link>
+		<link>
+			<name>OutputFileStream.cpp</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/OutputFileStream.cpp</locationURI>
+		</link>
+		<link>
+			<name>OutputFileStream.h</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/OutputFileStream.h</locationURI>
+		</link>
+		<link>
+			<name>consolidate-main.cpp</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/consolidate-main.cpp</locationURI>
+		</link>
+		<link>
+			<name>tables-core.cpp</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/tables-core.cpp</locationURI>
+		</link>
+		<link>
+			<name>tables-core.h</name>
+			<type>1</type>
+			<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/tables-core.h</locationURI>
+		</link>
+	</linkedResources>
+</projectDescription>
--- a/contrib/other-builds/extractor/.cproject
+++ b/contrib/other-builds/extractor/.cproject
@ -42,9 +42,11 @@
 								</option>
 								<option id="gnu.cpp.link.option.libs.585257079" name="Libraries (-l)" superClass="gnu.cpp.link.option.libs" valueType="libs">
 									<listOptionValue builtIn="false" value="mert_lib"/>
-									<listOptionValue builtIn="false" value="boost_system-mt"/>
 									<listOptionValue builtIn="false" value="util"/>
+									<listOptionValue builtIn="false" value="boost_system-mt"/>
+									<listOptionValue builtIn="false" value="boost_thread-mt"/>
 									<listOptionValue builtIn="false" value="z"/>
+									<listOptionValue builtIn="false" value="pthread"/>
 								</option>
 								<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.656319745" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
 									<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
--- a/contrib/other-builds/extractor/.project
+++ b/contrib/other-builds/extractor/.project
@ -4,6 +4,7 @@
 	<comment></comment>
 	<projects>
 		<project>mert_lib</project>
+		<project>util</project>
 	</projects>
 	<buildSpec>
 		<buildCommand>
--- a/moses/ChartManager.cpp
+++ b/moses/ChartManager.cpp
@ -125,7 +125,7 @@ void ChartManager::ProcessSentence()
 */
 void ChartManager::AddXmlChartOptions()
 {
-  const StaticData &staticData = StaticData::Instance();
+  // const StaticData &staticData = StaticData::Instance();

  const std::vector <ChartTranslationOptions*> xmlChartOptionsList = m_source.GetXmlChartTranslationOptions();
  IFVERBOSE(2) {
--- a/moses/ConfusionNet.cpp
+++ b/moses/ConfusionNet.cpp
@ -142,7 +142,7 @@ namespace Moses
  {
    Clear();

-    const StaticData   &staticData   = StaticData::Instance();
+    // const StaticData   &staticData   = StaticData::Instance();
    const InputFeature &inputFeature = InputFeature::Instance();
    size_t numInputScores   = inputFeature.GetNumInputScores();
    size_t numRealWordCount = inputFeature.GetNumRealWordsInInput();
--- a/moses/InputPath.cpp
+++ b/moses/InputPath.cpp
@ -85,7 +85,7 @@ size_t InputPath::GetTotalRuleSize() const
  size_t ret = 0;
  std::map<const PhraseDictionary*, std::pair<const TargetPhraseCollection*, const void*> >::const_iterator iter;
  for (iter = m_targetPhrases.begin(); iter != m_targetPhrases.end(); ++iter) {
-	const PhraseDictionary *pt = iter->first;
+    // const PhraseDictionary *pt = iter->first;
 	const TargetPhraseCollection *tpColl = iter->second.first;

 	if (tpColl) {
--- a/moses/PP/PhraseProperty.h
+++ b/moses/PP/PhraseProperty.h
@ -15,7 +15,7 @@ public:

  virtual void ProcessValue() {};

-  const std::string &GetValueString() { return m_value; };
+  const std::string &GetValueString() const { return m_value; };

 protected:

--- a/moses/Phrase.h
+++ b/moses/Phrase.h
@ -47,8 +47,8 @@ class WordsRange;
 class Phrase
 {
  friend std::ostream& operator<<(std::ostream&, const Phrase&);
-private:
-
+  // private:
+protected:
  std::vector<Word>			m_words;

 public:
--- a/moses/StaticData.cpp
+++ b/moses/StaticData.cpp
@ -494,7 +494,8 @@ bool StaticData::LoadData(Parameter *parameter)
    }
    m_xmlBrackets.first= brackets[0];
    m_xmlBrackets.second=brackets[1];
-    cerr << "XML tags opening and closing brackets for XML input are: " << m_xmlBrackets.first << " and " << m_xmlBrackets.second << endl;
+    VERBOSE(1,"XML tags opening and closing brackets for XML input are: " 
+	    << m_xmlBrackets.first << " and " << m_xmlBrackets.second << endl);
  }

  if (m_parameter->GetParam("placeholder-factor").size() > 0) {
@ -511,7 +512,7 @@ bool StaticData::LoadData(Parameter *parameter)
  const vector<string> &features = m_parameter->GetParam("feature");
  for (size_t i = 0; i < features.size(); ++i) {
    const string &line = Trim(features[i]);
-    cerr << "line=" << line << endl;
+    VERBOSE(1,"line=" << line << endl);
    if (line.empty())
      continue;

@ -535,7 +536,9 @@ bool StaticData::LoadData(Parameter *parameter)
  NoCache();
  OverrideFeatures();

-  LoadFeatureFunctions();
+  if (!m_parameter->isParamSpecified("show-weights")) {
+    LoadFeatureFunctions();
+  }

  if (!LoadDecodeGraphs()) return false;

@ -640,7 +643,8 @@ void StaticData::LoadNonTerminals()
    		  "Incorrect unknown LHS format: " << line);
      UnknownLHSEntry entry(tokens[0], Scan<float>(tokens[1]));
      m_unknownLHS.push_back(entry);
-      const Factor *targetFactor = factorCollection.AddFactor(Output, 0, tokens[0], true);
+      // const Factor *targetFactor = 
+      factorCollection.AddFactor(Output, 0, tokens[0], true);
    }

  }
@ -734,7 +738,7 @@ bool StaticData::LoadDecodeGraphs()
      DecodeGraph *decodeGraph;
      if (IsChart()) {
        size_t maxChartSpan = (decodeGraphInd < maxChartSpans.size()) ? maxChartSpans[decodeGraphInd] : DEFAULT_MAX_CHART_SPAN;
-        cerr << "max-chart-span: " << maxChartSpans[decodeGraphInd] << endl;
+        VERBOSE(1,"max-chart-span: " << maxChartSpan << endl);
        decodeGraph = new DecodeGraph(m_decodeGraphs.size(), maxChartSpan);
      } else {
        decodeGraph = new DecodeGraph(m_decodeGraphs.size());
@ -866,7 +870,7 @@ void StaticData::SetExecPath(const std::string &path)
  if (pos !=  string::npos) {
    m_binPath = path.substr(0, pos);
  }
-  cerr << m_binPath << endl;
+  VERBOSE(1,m_binPath << endl);
 }

 const string &StaticData::GetBinDirectory() const
@ -920,7 +924,8 @@ void StaticData::LoadFeatureFunctions()
    FeatureFunction *ff = *iter;
    bool doLoad = true;

-    if (PhraseDictionary *ffCast = dynamic_cast<PhraseDictionary*>(ff)) {
+    // if (PhraseDictionary *ffCast = dynamic_cast<PhraseDictionary*>(ff)) {
+    if (dynamic_cast<PhraseDictionary*>(ff)) {
      doLoad = false;
    }

@ -964,7 +969,7 @@ bool StaticData::CheckWeights() const
    set<string>::iterator iter;
    for (iter = weightNames.begin(); iter != weightNames.end(); ) {
      string fname = (*iter).substr(0, (*iter).find("_"));
-      cerr << fname << "\n";
+      VERBOSE(1,fname << "\n");
      if (featureNames.find(fname) != featureNames.end()) {
        weightNames.erase(iter++);
      }
@ -1039,7 +1044,7 @@ bool StaticData::LoadAlternateWeightSettings()
      vector<string> tokens = Tokenize(weightSpecification[i]);
      vector<string> args = Tokenize(tokens[0], "=");
      currentId = args[1];
-      cerr << "alternate weight setting " << currentId << endl;
+      VERBOSE(1,"alternate weight setting " << currentId << endl);
      UTIL_THROW_IF2(m_weightSetting.find(currentId) != m_weightSetting.end(),
    		  "Duplicate alternate weight id: " << currentId);
      m_weightSetting[ currentId ] = new ScoreComponentCollection;
--- a/moses/TargetPhraseCollection.h
+++ b/moses/TargetPhraseCollection.h
@ -44,6 +44,12 @@ public:
  typedef CollType::iterator iterator;
  typedef CollType::const_iterator const_iterator;

+  TargetPhrase const* 
+  operator[](size_t const i) const
+  {
+    return m_collection.at(i);
+  }  
+
  iterator begin() {
    return m_collection.begin();
  }
--- a/moses/TranslationModel/PhraseDictionaryMultiModelCounts.cpp
+++ b/moses/TranslationModel/PhraseDictionaryMultiModelCounts.cpp
@ -17,12 +17,8 @@ License along with this library; if not, write to the Free Software
 Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 ***********************************************************************/
 #include "util/exception.hh"
-
 #include "moses/TranslationModel/PhraseDictionaryMultiModelCounts.h"

-#define LINE_MAX_LENGTH 100000
-#include "phrase-extract/SafeGetline.h" // for SAFE_GETLINE()
-
 using namespace std;

 template<typename T>
@ -461,16 +457,14 @@ void PhraseDictionaryMultiModelCounts::LoadLexicalTable( string &fileName, lexic
  }
  istream *inFileP = &inFile;

-  char line[LINE_MAX_LENGTH];
-
  int i=0;
-  while(true) {
+  string line;
+
+  while(getline(*inFileP, line)) {
    i++;
    if (i%100000 == 0) cerr << "." << flush;
-    SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (inFileP->eof()) break;

-    vector<string> token = tokenize( line );
+    vector<string> token = tokenize( line.c_str() );
    if (token.size() != 4) {
      cerr << "line " << i << " in " << fileName
           << " has wrong number of tokens, skipping:\n"
--- a/moses/TranslationModel/UG/Jamfile
+++ b/moses/TranslationModel/UG/Jamfile
@ -9,6 +9,17 @@ $(TOP)/moses/TranslationModel/UG//mmsapt
 $(TOP)/util//kenutil 
 ; 

+exe lookup_mmsapt : 
+lookup_mmsapt.cc 
+$(TOP)/moses//moses
+$(TOP)/moses/TranslationModel/UG/generic//generic 
+$(TOP)//boost_iostreams 
+$(TOP)//boost_program_options 
+$(TOP)/moses/TranslationModel/UG/mm//mm 
+$(TOP)/moses/TranslationModel/UG//mmsapt 
+$(TOP)/util//kenutil 
+; 
+
 install $(PREFIX)/bin : try-align ; 

 fakelib mmsapt : [ glob *.cpp mmsapt*.cc ] ;
--- a/moses/TranslationModel/UG/lookup_mmsapt.cc
+++ b/moses/TranslationModel/UG/lookup_mmsapt.cc
@ -0,0 +1,76 @@
+#include "mmsapt.h"
+#include <boost/foreach.hpp>
+#include <boost/tokenizer.hpp>
+#include <boost/shared_ptr.hpp>
+#include <algorithm>
+#include <iostream>
+
+using namespace Moses;
+using namespace bitext;
+using namespace std;
+using namespace boost;
+
+vector<FactorType> fo(1,FactorType(0));
+
+class SimplePhrase : public Moses::Phrase
+{
+  vector<FactorType> const m_fo; // factor order
+public:
+  SimplePhrase(): m_fo(1,FactorType(0)) {}
+  
+  void init(string const& s) 
+  {
+    istringstream buf(s); string w;
+    while (buf >> w) 
+      {
+	Word wrd; 
+	this->AddWord().CreateFromString(Input,m_fo,StringPiece(w),false,false);
+      }
+  }
+};
+
+class TargetPhraseIndexSorter
+{
+  TargetPhraseCollection const& my_tpc;
+  CompareTargetPhrase cmp;
+public:
+  TargetPhraseIndexSorter(TargetPhraseCollection const& tpc) : my_tpc(tpc) {}
+  bool operator()(size_t a, size_t b) const
+  {
+    return cmp(*my_tpc[a], *my_tpc[b]);
+  }
+};
+
+int main(int argc, char* argv[])
+{
+  Parameter params;
+  if (!params.LoadParam(argc,argv) || !StaticData::LoadDataStatic(&params, argv[0]))
+    exit(1);
+
+  Mmsapt* PT = NULL;
+  BOOST_FOREACH(PhraseDictionary* pd, PhraseDictionary::GetColl())
+    if ((PT = dynamic_cast<Mmsapt*>(pd))) break;
+
+  string line;
+  while (getline(cin,line))
+    {
+      SimplePhrase p; p.init(line); 
+      cout << p << endl;
+      TargetPhraseCollection const* trg = PT->GetTargetPhraseCollectionLEGACY(p);
+      if (!trg) continue;
+      vector<size_t> order(trg->GetSize()); 
+      for (size_t i = 0; i < order.size(); ++i) order[i] = i;
+      sort(order.begin(),order.end(),TargetPhraseIndexSorter(*trg));
+      size_t k = 0;
+      BOOST_FOREACH(size_t i, order)
+	{
+	  Phrase const& phr = static_cast<Phrase const&>(*(*trg)[i]);
+	  cout << setw(3) << ++k << " " << phr << endl;
+	}
+      PT->Release(trg);
+    }
+  exit(0);
+}
+  
+  
+
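`TargetPhraseIndexSorter` in lookup_mmsapt.cc sorts a vector of indices rather than the collection itself, so the `TargetPhraseCollection` stays untouched while the output is ranked. The same idea as a minimal Python sketch, assuming plain score lists in place of target phrases and assuming that a higher score ranks first (the ordering of `CompareTargetPhrase` is not shown in this diff); `ranked_indices` and `format_ranked` are hypothetical names.

```python
def ranked_indices(scores):
    """Return positions into `scores`, ordered from highest to lowest score.

    The input list is left in its original order, as in the C++ code.
    """
    order = list(range(len(scores)))
    order.sort(key=lambda i: scores[i], reverse=True)
    return order


def format_ranked(phrases, scores):
    """Build one output line per phrase: a 3-wide rank, then the phrase."""
    lines = []
    for rank, i in enumerate(ranked_indices(scores), start=1):
        lines.append("%3d %s" % (rank, phrases[i]))
    return lines
```

Sorting indices costs one extra integer per entry and avoids copying or reordering the phrases, which matters when the collection is owned by the phrase table and has to be handed back unchanged.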
--- a/moses/TranslationModel/UG/mm/mmlex-lookup.cc
+++ b/moses/TranslationModel/UG/mm/mmlex-lookup.cc
@ -131,7 +131,7 @@ interpret_args(int ac, char* av[])
  o.add_options()
    ("help,h",    "print this message")
    ("source,s",po::value<string>(&swrd),"source word")
-    ("target,t",po::value<string>(&swrd),"target word")
+    ("target,t",po::value<string>(&twrd),"target word")
    ;
  
  h.add_options()
--- a/moses/TranslationModel/UG/mm/ug_bitext.h
+++ b/moses/TranslationModel/UG/mm/ug_bitext.h
@ -318,10 +318,10 @@ namespace Moses {
 	assert(pp.sample1);
 	assert(pp.joint);
 	assert(pp.raw2);
-	(*dest)[i] = log(pp.raw1);
-	(*dest)[++i] = log(pp.sample1);
-	(*dest)[++i] = log(pp.joint);
-	(*dest)[++i] = log(pp.raw2);
+	(*dest)[i]   = -log(pp.raw1);
+	(*dest)[++i] = -log(pp.sample1);
+	(*dest)[++i] = +log(pp.joint);
+	(*dest)[++i] = -log(pp.raw2);
      }
    };

@ -590,8 +590,9 @@ namespace Moses {
 	static ThreadSafeCounter active;
 	boost::mutex lock; 
 	friend class agenda;
-	boost::taus88 rnd; // every job has its own pseudo random generator 
-	double rnddenom;   // denominator for scaling random sampling
+	boost::taus88 rnd;  // every job has its own pseudo random generator 
+	double rnddenom;    // denominator for scaling random sampling
+	size_t min_diverse; // minimum number of distinct translations
      public:
 	size_t         workers; // how many workers are working on this job?
 	sptr<TSA<Token> const> root; // root of the underlying suffix array
@ -644,34 +645,47 @@ namespace Moses {
    step(uint64_t & sid, uint64_t & offset)
    {
      boost::lock_guard<boost::mutex> jguard(lock);
-      if ((max_samples == 0) && (next < stop))
+      bool ret = (max_samples == 0) && (next < stop);
+      if (ret)
 	{
 	  next = root->readSid(next,stop,sid);
 	  next = root->readOffset(next,stop,offset);
 	  boost::lock_guard<boost::mutex> sguard(stats->lock);
 	  if (stats->raw_cnt == ctr) ++stats->raw_cnt;
 	  stats->sample_cnt++;
-	  return true;
 	}
      else 
 	{
-	  while (next < stop && stats->good < max_samples)
+	  while (next < stop && (stats->good < max_samples || 
+				 stats->trg.size() < min_diverse))
 	    {
 	      next = root->readSid(next,stop,sid);
 	      next = root->readOffset(next,stop,offset);
-	      {
-		boost::lock_guard<boost::mutex> sguard(stats->lock);
+	      { // brackets required for lock scoping; see sguard immediately below
+		boost::lock_guard<boost::mutex> sguard(stats->lock); 
 		if (stats->raw_cnt == ctr) ++stats->raw_cnt;
-		size_t rnum = (stats->raw_cnt - ctr++)*(rnd()/(rnd.max()+1.));
+		size_t scalefac = (stats->raw_cnt - ctr++);
+		size_t rnum = scalefac*(rnd()/(rnd.max()+1.));
+#if 0
+		cerr << rnum << "/" << scalefac << " vs. " 
+		     << max_samples - stats->good << " ("
+		     << max_samples << " - " << stats->good << ")" 
+		     << endl;
+#endif
 		if (rnum < max_samples - stats->good)
 		  {
 		    stats->sample_cnt++;
-		    return true;
+		    ret = true;
+		    break;
 		  }
 	      }
 	    }
-	  return false;
 	}
+      
+      // boost::lock_guard<boost::mutex> sguard(stats->lock); 
+      // abuse of lock for clean output to cerr
+      // cerr << stats->sample_cnt++;
+      return ret;
    }

    template<typename Token>
@ -713,6 +727,13 @@ namespace Moses {
    worker::
    operator()()
    {
+      // things to do:
+      // - have each worker maintain their own pstats object and merge results at the end;
+      // - ensure the minimum size of samples considered by a non-locked counter that is only 
+      //   ever incremented -- who cares if we look at more samples than required, as long
+      //   as we look at at least the minimum required
+      // This way, we can reduce the number of lock / unlock operations we need to do during 
+      // sampling. 
      size_t s1=0, s2=0, e1=0, e2=0;
      uint64_t sid=0, offset=0; // of the source phrase
      while(sptr<job> j = ag.get_job())
@ -812,6 +833,7 @@ namespace Moses {
 	sptr<TSA<Token> > const& r, size_t maxsmpl, bool isfwd)
      : rnd(0)
      , rnddenom(rnd.max() + 1.)
+      , min_diverse(10)
      , workers(0)
      , root(r)
      , next(m.lower_bound(-1))
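Note on the sampling change in `step()`: the loop now keeps drawing while either the sample quota or the `min_diverse` floor is unmet, and each candidate is accepted by a selection-sampling test. The following is a minimal sketch of those two decisions, not the Moses code. The helper names `accept_candidate` and `keep_going` are hypothetical, `remaining` stands in for `stats->raw_cnt - ctr`, and the explicit guard against unsigned wrap-around is an addition of this sketch.

```cpp
#include <cstddef>

// Hypothetical helper: decide whether to accept the current candidate.
// 'u' is a uniform random draw in [0,1), i.e. rnd()/(rnd.max()+1.).
inline bool accept_candidate(std::size_t remaining, std::size_t good,
                             std::size_t max_samples, double u)
{
  // Assumption of this sketch: never accept once the quota is filled.
  // Without this check, max_samples - good would wrap around for
  // good > max_samples because both operands are unsigned.
  if (good >= max_samples) return false;
  const std::size_t rnum = static_cast<std::size_t>(remaining * u);
  return rnum < max_samples - good;
}

// Hypothetical helper: the loop condition. Sampling continues while the
// quota of good samples or the floor on distinct translations is unmet.
inline bool keep_going(std::size_t good, std::size_t max_samples,
                       std::size_t distinct, std::size_t min_diverse)
{
  return good < max_samples || distinct < min_diverse;
}
```

With few candidates left and many samples still needed, nearly every candidate is accepted. With many candidates left, the acceptance probability drops accordingly.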
--- a/moses/TranslationModel/UG/mmsapt.cpp
+++ b/moses/TranslationModel/UG/mmsapt.cpp
@ -122,16 +122,16 @@ namespace Moses
    if (m != param.end())
      withPbwd = m->second != "0";
      
-    m_default_sample_size = m != param.end() ? atoi(m->second.c_str()) : 1000;
-
    m = param.find("workers");
    m_workers = m != param.end() ? atoi(m->second.c_str()) : 8;
    m_workers = min(m_workers,24UL);

+    m = param.find("limit");
+    if (m != param.end()) m_tableLimit = atoi(m->second.c_str());
+
    m = param.find("cache-size");
-    m_history.reserve(m != param.end() 
-		      ? max(1000,atoi(m->second.c_str()))
-		      : 10000);
+    m_history.reserve(m != param.end()?max(1000,atoi(m->second.c_str())):10000);
+    // in plain language: cache size is at least 1000, and 10,000 by default
    
    this->m_numScoreComponents = atoi(param["num-features"].c_str());

@ -196,8 +196,8 @@ namespace Moses
    // currently always active by default; may (should) change later
    num_feats  = calc_lex.init(num_feats, bname + L1 + "-" + L2 + ".lex");

-    if (this->m_numScoreComponents%2) // a bit of a hack, for backwards compatibility
-      num_feats  = apply_pp.init(num_feats);
+    // if (this->m_numScoreComponents%2) // a bit of a hack, for backwards compatibility
+    // num_feats  = apply_pp.init(num_feats);

    if (num_feats < this->m_numScoreComponents)
      {
@ -283,8 +283,8 @@ namespace Moses
  {
    PhrasePair pp;   
    pp.init(pid1, stats, this->m_numScoreComponents);
-    if (this->m_numScoreComponents%2)
-      apply_pp(bt,pp);
+    // if (this->m_numScoreComponents%2)
+    // apply_pp(bt,pp);
    pstats::trg_map_t::const_iterator t;
    for (t = stats.trg.begin(); t != stats.trg.end(); ++t)
      {
@ -318,8 +318,8 @@ namespace Moses
      pp.init(pid1b, *statsb, this->m_numScoreComponents);
    else return false; // throw "no stats for pooling available!";

-    if (this->m_numScoreComponents%2)
-      apply_pp(bta,pp);
+    // if (this->m_numScoreComponents%2)
+    // apply_pp(bta,pp);
    pstats::trg_map_t::const_iterator b;
    pstats::trg_map_t::iterator a;
    if (statsb)
@ -368,6 +368,13 @@ namespace Moses
 	  }
 	else 
 	  pp.update(a->first,a->second);
+#if 0
+	// jstats const& j = a->second;
+	cerr << bta.T1->pid2str(bta.V1.get(),pp.p1) << " ::: " 
+	     << bta.T2->pid2str(bta.V2.get(),pp.p2) << endl;
+	cerr << pp.raw1 << " " << pp.sample1 << " " << pp.good1 << " " 
+	     << pp.joint << " " << pp.raw2 << endl;
+#endif

 	UTIL_THROW_IF2(pp.raw2 == 0, 
 		       "OOPS" 
@ -376,12 +383,6 @@ namespace Moses
 		       << pp.raw1 << " " << pp.sample1 << " " 
 		       << pp.good1 << " " << pp.joint << " " 
 		       << pp.raw2);
-#if 0
-	jstats const& j = a->second;
-	cerr << bta.T1->pid2str(bta.V1.get(),pp.p1) << " ::: " 
-	     << bta.T2->pid2str(bta.V2.get(),pp.p2) << endl;
-	cerr << j.rcnt() << " " << j.cnt2() << " " << j.wcnt() << endl;
-#endif
 	calc_lex(bta,pp);
 	if (withPfwd) calc_pfwd_fix(bta,pp);
 	if (withPbwd) calc_pbwd_fix(bta,pp);
@ -415,8 +416,8 @@ namespace Moses
    if (statsb)
      {
 	pool.init(pid1b,*statsb,0);
-	if (this->m_numScoreComponents%2)
-	  apply_pp(btb,ppdyn);
+	// if (this->m_numScoreComponents%2)
+	// apply_pp(btb,ppdyn);
 	for (b = statsb->trg.begin(); b != statsb->trg.end(); ++b)
 	  {
 	    ppdyn.update(b->first,b->second);
@ -456,8 +457,8 @@ namespace Moses
    if (statsa)
      {
 	pool.init(pid1a,*statsa,0);
-	if (this->m_numScoreComponents%2)
-	  apply_pp(bta,ppfix);
+	// if (this->m_numScoreComponents%2)
+	// apply_pp(bta,ppfix);
 	for (a = statsa->trg.begin(); a != statsa->trg.end(); ++a)
 	  {
 	    if (!a->second.valid()) continue; // done above
@ -662,7 +663,7 @@ namespace Moses
 	|| combine_pstats(src, mfix.getPid(),sfix.get(),btfix, 
 			  mdyn.getPid(),sdyn.get(),*dyn,ret))
      {
-	ret->NthElement(m_tableLimit);
+	if (m_tableLimit) ret->Prune(true,m_tableLimit);
 #if 0
 	sort(ret->begin(), ret->end(), CompareTargetPhrase());
 	cout << "SOURCE PHRASE: " << src << endl;
@ -683,6 +684,14 @@ namespace Moses
    return encache(ret);
  }

+  size_t 
+  Mmsapt::
+  SetTableLimit(size_t limit)
+  {
+    std::swap(m_tableLimit,limit);
+    return limit;
+  }
+
  void
  Mmsapt::
  CleanUpAfterSentenceProcessing(const InputType& source)
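Note on `SetTableLimit`: because it returns the prior limit, a caller can change the limit temporarily and restore it afterwards. The sketch below shows one way to use that, assuming a stand-in class. `TableLimitHolder`, `GetTableLimit` and `ScopedTableLimit` are hypothetical names for illustration and are not part of Mmsapt.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the phrase table; only the limit is modelled.
class TableLimitHolder
{
  std::size_t m_tableLimit;
public:
  explicit TableLimitHolder(std::size_t limit) : m_tableLimit(limit) {}

  // Install a new limit and return the one that was in effect before,
  // using the same swap idiom as the diff above.
  std::size_t SetTableLimit(std::size_t limit)
  {
    std::swap(m_tableLimit, limit);
    return limit;
  }

  std::size_t GetTableLimit() const { return m_tableLimit; }
};

// Hypothetical guard: sets a limit on construction and restores the
// previous one when it goes out of scope.
class ScopedTableLimit
{
  TableLimitHolder& m_table;
  std::size_t m_previous;
public:
  ScopedTableLimit(TableLimitHolder& table, std::size_t limit)
    : m_table(table), m_previous(table.SetTableLimit(limit)) {}
  ~ScopedTableLimit() { m_table.SetTableLimit(m_previous); }
};
```

A limit of 0 disables pruning in the diff above (`if (m_tableLimit) ...`), so a lookup tool could use such a guard to list all candidates for one query.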
--- a/moses/TranslationModel/UG/mmsapt.h
+++ b/moses/TranslationModel/UG/mmsapt.h
@ -71,7 +71,7 @@ namespace Moses
    PScorePfwd<Token> calc_pfwd_fix, calc_pfwd_dyn;
    PScorePbwd<Token> calc_pbwd_fix, calc_pbwd_dyn;
    PScoreLex<Token>  calc_lex; // this one I'd like to see as an external ff eventually
-    PScorePP<Token>   apply_pp; // apply phrase penalty 
+    // PScorePP<Token>   apply_pp; // apply phrase penalty 
    PScoreLogCounts<Token>   add_logcounts_fix;
    PScoreLogCounts<Token>   add_logcounts_dyn;
    void init(string const& line);
@ -168,6 +168,9 @@ namespace Moses
    void
    Load();
    
+    // returns the prior table limit
+    size_t SetTableLimit(size_t limit);
+
 #ifndef NO_MOSES
    TargetPhraseCollection const* 
    GetTargetPhraseCollectionLEGACY(const Phrase& src) const;
--- a/moses/TranslationModel/fuzzy-match/FuzzyMatchWrapper.cpp
+++ b/moses/TranslationModel/fuzzy-match/FuzzyMatchWrapper.cpp
@ -413,11 +413,9 @@ void FuzzyMatchWrapper::load_corpus( const std::string &fileName, vector< vector

  istream *fileStreamP = &fileStream;

-  char line[LINE_MAX_LENGTH];
-  while(true) {
-    SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
-    if (fileStreamP->eof()) break;
-    corpus.push_back( GetVocabulary().Tokenize( line ) );
+  string line;
+  while(getline(*fileStreamP, line)) {
+    corpus.push_back( GetVocabulary().Tokenize( line.c_str() ) );
  }
 }

@ -436,12 +434,9 @@ void FuzzyMatchWrapper::load_target(const std::string &fileName, vector< vector<
  WORD_ID delimiter = GetVocabulary().StoreIfNew("|||");

  int lineNum = 0;
-  char line[LINE_MAX_LENGTH];
-  while(true) {
-    SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
-    if (fileStreamP->eof()) break;
-
-    vector<WORD_ID> toks = GetVocabulary().Tokenize( line );
+  string line;
+  while(getline(*fileStreamP, line)) {
+    vector<WORD_ID> toks = GetVocabulary().Tokenize( line.c_str() );

    corpus.push_back(vector< SentenceAlignment >());
    vector< SentenceAlignment > &vec = corpus.back();
@ -493,11 +488,8 @@ void FuzzyMatchWrapper::load_alignment(const std::string &fileName, vector< vect
  string delimiter = "|||";

  int lineNum = 0;
-  char line[LINE_MAX_LENGTH];
-  while(true) {
-    SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
-    if (fileStreamP->eof()) break;
-
+  string line;
+  while(getline(*fileStreamP, line)) {
    vector< SentenceAlignment > &vec = corpus[lineNum];
    size_t targetInd = 0;
    SentenceAlignment *sentence = &vec[targetInd];
--- a/moses/TranslationModel/fuzzy-match/SuffixArray.cpp
+++ b/moses/TranslationModel/fuzzy-match/SuffixArray.cpp
@ -14,17 +14,16 @@ SuffixArray::SuffixArray( string fileName )
  m_endOfSentence = m_vcb.StoreIfNew( "<s>" );

  ifstream extractFile;
-  char line[LINE_MAX_LENGTH];

  // count the number of words first;
  extractFile.open(fileName.c_str());
  istream *fileP = &extractFile;
  m_size = 0;
  size_t sentenceCount = 0;
-  while(!fileP->eof()) {
-    SAFE_GETLINE((*fileP), line, LINE_MAX_LENGTH, '\n');
-    if (fileP->eof()) break;
-    vector< WORD_ID > words = m_vcb.Tokenize( line );
+  string line;
+  while(getline(*fileP, line)) {
+
+    vector< WORD_ID > words = m_vcb.Tokenize( line.c_str() );
    m_size += words.size() + 1;
    sentenceCount++;
  }
@ -43,10 +42,8 @@ SuffixArray::SuffixArray( string fileName )
  int sentenceId = 0;
  extractFile.open(fileName.c_str());
  fileP = &extractFile;
-  while(!fileP->eof()) {
-    SAFE_GETLINE((*fileP), line, LINE_MAX_LENGTH, '\n');
-    if (fileP->eof()) break;
-    vector< WORD_ID > words = m_vcb.Tokenize( line );
+  while(getline(*fileP, line)) {
+    vector< WORD_ID > words = m_vcb.Tokenize( line.c_str() );

    // add to corpus vector
    corpus.push_back(words);
--- a/moses/TranslationModel/fuzzy-match/Vocabulary.h
+++ b/moses/TranslationModel/fuzzy-match/Vocabulary.h
@ -17,20 +17,6 @@

 namespace tmmt
 {
-
-#define MAX_LENGTH 10000
-
-#define SAFE_GETLINE(_IS, _LINE, _SIZE, _DELIM) { \
-                _IS.getline(_LINE, _SIZE, _DELIM); \
-                if(_IS.fail() && !_IS.bad() && !_IS.eof()) _IS.clear(); \
-                if (_IS.gcount() == _SIZE-1) { \
-                  cerr << "Line too long! Buffer overflow. Delete lines >=" \
-                    << _SIZE << " chars or raise MAX_LENGTH in phrase-extract/tables-core.cpp" \
-                    << endl; \
-                    exit(1); \
-                } \
-              }
-
 typedef std::string WORD;
 typedef unsigned int WORD_ID;

--- a/phrase-extract/DomainFeature.cpp
+++ b/phrase-extract/DomainFeature.cpp
@ -2,9 +2,6 @@
 #include "ExtractionPhrasePair.h"
 #include "tables-core.h"
 #include "InputFileStream.h"
-#include "SafeGetline.h"
-
-#define TABLE_LINE_MAX_LENGTH 1000

 using namespace std;

@ -16,12 +13,11 @@ void Domain::load( const std::string &domainFileName )
 {
  Moses::InputFileStream fileS( domainFileName );
  istream *fileP = &fileS;
-  while(true) {
-    char line[TABLE_LINE_MAX_LENGTH];
-    SAFE_GETLINE((*fileP), line, TABLE_LINE_MAX_LENGTH, '\n', __FILE__);
-    if (fileP->eof()) break;
+
+  string line;
+  while(getline(*fileP, line)) {
    // read
-    vector< string > domainSpecLine = tokenize( line );
+    vector< string > domainSpecLine = tokenize( line.c_str() );
    int lineNumber;
    if (domainSpecLine.size() != 2 ||
        ! sscanf(domainSpecLine[0].c_str(), "%d", &lineNumber)) {
--- a/phrase-extract/ExtractionPhrasePair.cpp
+++ b/phrase-extract/ExtractionPhrasePair.cpp
@ -19,7 +19,6 @@

 #include <sstream>
 #include "ExtractionPhrasePair.h"
-#include "SafeGetline.h"
 #include "tables-core.h"
 #include "score.h"
 #include "moses/Util.h"
--- a/phrase-extract/SafeGetline.h
+++ b/phrase-extract/SafeGetline.h
@ -1,35 +0,0 @@
-/***********************************************************************
-  Moses - factored phrase-based language decoder
-  Copyright (C) 2010 University of Edinburgh
-
-  This library is free software; you can redistribute it and/or
-  modify it under the terms of the GNU Lesser General Public
-  License as published by the Free Software Foundation; either
-  version 2.1 of the License, or (at your option) any later version.
-
-  This library is distributed in the hope that it will be useful,
-  but WITHOUT ANY WARRANTY; without even the implied warranty of
-  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-  Lesser General Public License for more details.
-
-  You should have received a copy of the GNU Lesser General Public
-  License along with this library; if not, write to the Free Software
-  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
- ***********************************************************************/
-
-#pragma once
-#ifndef SAFE_GETLINE_INCLUDED_
-#define SAFE_GETLINE_INCLUDED_
-
-#define SAFE_GETLINE(_IS, _LINE, _SIZE, _DELIM, _FILE) {            \
-    _IS.getline(_LINE, _SIZE, _DELIM);                              \
-    if(_IS.fail() && !_IS.bad() && !_IS.eof()) _IS.clear();         \
-    if (_IS.gcount() == _SIZE-1) {                                  \
-      cerr << "Line too long! Buffer overflow. Delete lines >="     \
-       << _SIZE << " chars or raise LINE_MAX_LENGTH in " << _FILE   \
-       << endl;                                                     \
-      exit(1);                                                      \
-    }                                                               \
-  }
-
-#endif
--- a/phrase-extract/SentenceAlignment.cpp
+++ b/phrase-extract/SentenceAlignment.cpp
@ -54,7 +54,11 @@ bool SentenceAlignment::processSourceSentence(const char * sourceString, int, bo
  return true;
 }

-bool SentenceAlignment::create( char targetString[], char sourceString[], char alignmentString[], char weightString[], int sentenceID, bool boundaryRules)
+bool SentenceAlignment::create(const char targetString[],
+							const char sourceString[],
+							const char alignmentString[],
+							const char weightString[],
+							int sentenceID, bool boundaryRules)
 {
  using namespace std;
  this->sentenceID = sentenceID;
--- a/phrase-extract/SentenceAlignment.h
+++ b/phrase-extract/SentenceAlignment.h
@ -43,8 +43,11 @@ public:

  virtual bool processSourceSentence(const char *, int, bool boundaryRules);

-  bool create(char targetString[], char sourceString[],
-              char alignmentString[], char weightString[], int sentenceID, bool boundaryRules);
+  bool create(const char targetString[],
+		  	  const char sourceString[],
+		  	  const char alignmentString[],
+		  	  const char weightString[],
+		  	  int sentenceID, bool boundaryRules);

  void invertAlignment();

--- a/phrase-extract/consolidate-direct-main.cpp
+++ b/phrase-extract/consolidate-direct-main.cpp
@ -26,16 +26,9 @@
 #include "InputFileStream.h"
 #include "OutputFileStream.h"

-#include "SafeGetline.h"
-
-#define LINE_MAX_LENGTH 10000
-
 using namespace std;

-char line[LINE_MAX_LENGTH];
-
-
-vector< string > splitLine()
+vector< string > splitLine(const char *line)
 {
  vector< string > item;
  int start=0;
@ -61,14 +54,15 @@ bool getLine( istream &fileP, vector< string > &item )
 {
  if (fileP.eof())
    return false;
-
-  SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-  if (fileP.eof())
+  
+  string line;
+  if (getline(fileP, line)) {
+    item = splitLine(line.c_str());
-    return false;
+    return true;
-
-  item = splitLine();
-
-  return true;
+  }
+  else {
+    return false;
+  }
 }


--- a/phrase-extract/consolidate-main.cpp
+++ b/phrase-extract/consolidate-main.cpp
@ -26,12 +26,9 @@
 #include <cstring>

 #include "tables-core.h"
-#include "SafeGetline.h"
 #include "InputFileStream.h"
 #include "OutputFileStream.h"

-#define LINE_MAX_LENGTH 10000
-
 using namespace std;

 bool hierarchicalFlag = false;
@ -46,12 +43,11 @@ inline float maybeLogProb( float a )
  return logProbFlag ? log(a) : a;
 }

-char line[LINE_MAX_LENGTH];
 void processFiles( char*, char*, char*, char* );
 void loadCountOfCounts( char* );
 void breakdownCoreAndSparse( string combined, string &core, string &sparse );
 bool getLine( istream &fileP, vector< string > &item );
-vector< string > splitLine();
+vector< string > splitLine(const char *line);
 vector< int > countBin;
 bool sparseCountBinFeatureFlag = false;

@ -140,14 +136,13 @@ void loadCountOfCounts( char* fileNameCountOfCounts )
  istream &fileP = fileCountOfCounts;

  countOfCounts.push_back(0.0);
-  while(1) {
-    if (fileP.eof()) break;
-    SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (fileP.eof()) break;
+
+  string line;
+  while (getline(fileP, line)) {
    if (totalCount < 0)
-      totalCount = atof(line); // total number of distinct phrase pairs
+      totalCount = atof(line.c_str()); // total number of distinct phrase pairs
    else
-      countOfCounts.push_back( atof(line) );
+      countOfCounts.push_back( atof(line.c_str()) );
  }
  fileCountOfCounts.Close();

@ -370,16 +365,16 @@ bool getLine( istream &fileP, vector< string > &item )
  if (fileP.eof())
    return false;

-  SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-  if (fileP.eof())
+  string line;
+  if (!getline(fileP, line))
    return false;

-  item = splitLine();
+  item = splitLine(line.c_str());

  return true;
 }

-vector< string > splitLine()
+vector< string > splitLine(const char *line)
 {
  vector< string > item;
  int start=0;
--- a/phrase-extract/consolidate-reverse-main.cpp
+++ b/phrase-extract/consolidate-reverse-main.cpp
@ -27,23 +27,19 @@
 #include <cstring>

 #include "tables-core.h"
-#include "SafeGetline.h"
 #include "InputFileStream.h"

-#define LINE_MAX_LENGTH 10000
-
 using namespace std;

 bool hierarchicalFlag = false;
 bool onlyDirectFlag = false;
 bool phraseCountFlag = true;
 bool logProbFlag = false;
-char line[LINE_MAX_LENGTH];

 void processFiles( char*, char*, char* );
 bool getLine( istream &fileP, vector< string > &item );
 string reverseAlignment(const string &alignments);
-vector< string > splitLine();
+vector< string > splitLine(const char *line);

 inline void Tokenize(std::vector<std::string> &output
                     , const std::string& str
@ -190,17 +186,18 @@ bool getLine( istream &fileP, vector< string > &item )
 {
  if (fileP.eof())
    return false;
-
-  SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-  if (fileP.eof())
+  
+  string line;
+  if (getline(fileP, line)) {
+    item = splitLine(line.c_str());
-    return false;
+    return true;
-
-  item = splitLine();
-
-  return true;
+  }
+  else {
+    return false;
+  }
 }

-vector< string > splitLine()
+vector< string > splitLine(const char *line)
 {
  vector< string > item;
  bool betweenWords = true;
--- a/phrase-extract/extract-main.cpp
+++ b/phrase-extract/extract-main.cpp
@ -19,7 +19,6 @@
 #include <set>
 #include <vector>

-#include "SafeGetline.h"
 #include "SentenceAlignment.h"
 #include "tables-core.h"
 #include "InputFileStream.h"
@ -32,10 +31,6 @@ using namespace MosesTraining;
 namespace MosesTraining
 {

-
-const long int LINE_MAX_LENGTH = 500000 ;
-
-
 // HPhraseVertex represents a point in the alignment matrix
 typedef pair <int, int> HPhraseVertex;

@ -277,20 +272,18 @@ int main(int argc, char* argv[])

  int i = sentenceOffset;

-  while(true) {
+  string englishString, foreignString, alignmentString, weightString;
+
+  while(getline(*eFileP, englishString)) {
    i++;
    if (i%10000 == 0) cerr << "." << flush;
-    char englishString[LINE_MAX_LENGTH];
-    char foreignString[LINE_MAX_LENGTH];
-    char alignmentString[LINE_MAX_LENGTH];
-    char weightString[LINE_MAX_LENGTH];
-    SAFE_GETLINE((*eFileP), englishString, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (eFileP->eof()) break;
-    SAFE_GETLINE((*fFileP), foreignString, LINE_MAX_LENGTH, '\n', __FILE__);
-    SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
+
+    getline(*fFileP, foreignString);
+    getline(*aFileP, alignmentString);
    if (iwFileP) {
-      SAFE_GETLINE((*iwFileP), weightString, LINE_MAX_LENGTH, '\n', __FILE__);
+      getline(*iwFileP, weightString);
    }
+
    SentenceAlignment sentence;
    // cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
    //az: output src, tgt, and alingment line
@ -300,7 +293,11 @@ int main(int argc, char* argv[])
      cout << "LOG: ALT: " << alignmentString << endl;
      cout << "LOG: PHRASES_BEGIN:" << endl;
    }
-    if (sentence.create( englishString, foreignString, alignmentString, weightString, i, false)) {
+    if (sentence.create( englishString.c_str(),
+    					foreignString.c_str(),
+    					alignmentString.c_str(),
+    					weightString.c_str(),
+    					i, false)) {
      if (options.placeholders.size()) {
        sentence.invertAlignment();
      }
--- a/phrase-extract/extract-ordering-main.cpp
+++ b/phrase-extract/extract-ordering-main.cpp
@ -19,7 +19,6 @@
 #include <set>
 #include <vector>

-#include "SafeGetline.h"
 #include "SentenceAlignment.h"
 #include "tables-core.h"
 #include "InputFileStream.h"
@ -32,10 +31,6 @@ using namespace MosesTraining;
 namespace MosesTraining
 {

-
-const long int LINE_MAX_LENGTH = 500000 ;
-
-
 // HPhraseVertex represents a point in the alignment matrix
 typedef pair <int, int> HPhraseVertex;

@ -246,20 +241,20 @@ int main(int argc, char* argv[])

  int i = sentenceOffset;

-  while(true) {
+  string englishString, foreignString, alignmentString, weightString;
+
+  while(getline(*eFileP, englishString)) {
    i++;
-    if (i%10000 == 0) cerr << "." << flush;
-    char englishString[LINE_MAX_LENGTH];
-    char foreignString[LINE_MAX_LENGTH];
-    char alignmentString[LINE_MAX_LENGTH];
-    char weightString[LINE_MAX_LENGTH];
-    SAFE_GETLINE((*eFileP), englishString, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (eFileP->eof()) break;
-    SAFE_GETLINE((*fFileP), foreignString, LINE_MAX_LENGTH, '\n', __FILE__);
-    SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
+
+    // englishString was already read by the loop condition above
+    getline(*fFileP, foreignString);
+    getline(*aFileP, alignmentString);
    if (iwFileP) {
-      SAFE_GETLINE((*iwFileP), weightString, LINE_MAX_LENGTH, '\n', __FILE__);
+      getline(*iwFileP, weightString);
    }
+
+    if (i%10000 == 0) cerr << "." << flush;
+
    SentenceAlignment sentence;
    // cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
    //az: output src, tgt, and alingment line
@ -269,7 +264,7 @@ int main(int argc, char* argv[])
      cout << "LOG: ALT: " << alignmentString << endl;
      cout << "LOG: PHRASES_BEGIN:" << endl;
    }
-    if (sentence.create( englishString, foreignString, alignmentString, weightString, i, false)) {
+    if (sentence.create( englishString.c_str(), foreignString.c_str(), alignmentString.c_str(), weightString.c_str(), i, false)) {
      ExtractTask *task = new ExtractTask(i-1, sentence, options, extractFileOrientation);
      task->Run();
      delete task;
--- a/phrase-extract/extract-rules-main.cpp
+++ b/phrase-extract/extract-rules-main.cpp
@ -39,7 +39,6 @@
 #include "Hole.h"
 #include "HoleCollection.h"
 #include "RuleExist.h"
-#include "SafeGetline.h"
 #include "SentenceAlignmentWithSyntax.h"
 #include "SyntaxTree.h"
 #include "tables-core.h"
@ -47,8 +46,6 @@
 #include "InputFileStream.h"
 #include "OutputFileStream.h"

-#define LINE_MAX_LENGTH 500000
-
 using namespace std;
 using namespace MosesTraining;

@ -326,17 +323,15 @@ int main(int argc, char* argv[])

  // loop through all sentence pairs
  size_t i=sentenceOffset;
-  while(true) {
-    i++;
-    if (i%1000 == 0) cerr << i << " " << flush;
+  string targetString, sourceString, alignmentString;

-    char targetString[LINE_MAX_LENGTH];
-    char sourceString[LINE_MAX_LENGTH];
-    char alignmentString[LINE_MAX_LENGTH];
-    SAFE_GETLINE((*tFileP), targetString, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (tFileP->eof()) break;
-    SAFE_GETLINE((*sFileP), sourceString, LINE_MAX_LENGTH, '\n', __FILE__);
-    SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
+  while(getline(*tFileP, targetString)) {
+    i++;
+
+    getline(*sFileP, sourceString);
+    getline(*aFileP, alignmentString);
+
+    if (i%1000 == 0) cerr << i << " " << flush;

    SentenceAlignmentWithSyntax sentence
    (targetLabelCollection, sourceLabelCollection,
@ -349,7 +344,7 @@ int main(int argc, char* argv[])
      cout << "LOG: PHRASES_BEGIN:" << endl;
    }

-    if (sentence.create(targetString, sourceString, alignmentString,"", i, options.boundaryRules)) {
+    if (sentence.create(targetString.c_str(), sourceString.c_str(), alignmentString.c_str(),"", i, options.boundaryRules)) {
      if (options.unknownWordLabelFlag) {
        collectWordLabelCounts(sentence);
      }
--- a/phrase-extract/relax-parse-main.cpp
+++ b/phrase-extract/relax-parse-main.cpp
@ -20,8 +20,6 @@
 ***********************************************************************/

 #include "relax-parse.h"
-
-#include "SafeGetline.h"
 #include "tables-core.h"

 using namespace std;
@ -33,17 +31,13 @@ int main(int argc, char* argv[])

  // loop through all sentences
  int i=0;
-  char inBuffer[LINE_MAX_LENGTH];
-  while(true) {
+  string inBuffer;
+  while(getline(cin, inBuffer)) {
    i++;
    if (i%1000 == 0) cerr << "." << flush;
    if (i%10000 == 0) cerr << ":" << flush;
    if (i%100000 == 0) cerr << "!" << flush;

-    // get line from stdin
-    SAFE_GETLINE( cin, inBuffer, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (cin.eof()) break;
-
    // process into syntax tree representation
    string inBufferString = string( inBuffer );
    set< string > labelCollection;         // set of labels, not used
--- a/phrase-extract/score-main.cpp
+++ b/phrase-extract/score-main.cpp
@ -29,7 +29,6 @@
 #include <vector>
 #include <algorithm>

-#include "SafeGetline.h"
 #include "ScoreFeature.h"
 #include "tables-core.h"
 #include "ExtractionPhrasePair.h"
@ -40,8 +39,6 @@
 using namespace std;
 using namespace MosesTraining;

-#define LINE_MAX_LENGTH 100000
-
 namespace MosesTraining
 {
 LexicalTable lexTable;
@ -232,7 +229,7 @@ int main(int argc, char* argv[])
  }

  // loop through all extracted phrase translations
-  char line[LINE_MAX_LENGTH], lastLine[LINE_MAX_LENGTH];
+  string line, lastLine;
-  lastLine[0] = '\0';
+  // lastLine starts out empty; a std::string needs no explicit terminator
  ExtractionPhrasePair *phrasePair = NULL;
  std::vector< ExtractionPhrasePair* > phrasePairsWithSameSource;
@ -245,8 +242,8 @@ int main(int argc, char* argv[])
  float tmpCount=0.0f, tmpPcfgSum=0.0f;

  int i=0;
-  SAFE_GETLINE( (extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__ );
-  if ( !extractFileP.eof() ) {
+  // TODO why read only the 1st line?
+  if ( getline(extractFileP, line)) {
    ++i;
    tmpPhraseSource = new PHRASE();
    tmpPhraseTarget = new PHRASE();
@ -265,23 +262,21 @@ int main(int argc, char* argv[])
    if ( hierarchicalFlag ) {
      phrasePairsWithSameSourceAndTarget.push_back( phrasePair );
    }
-    strcpy( lastLine, line );
-    SAFE_GETLINE( (extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__ );
+    lastLine = line;
  }

-  while ( !extractFileP.eof() ) {
+  while ( getline(extractFileP, line) ) {

    if ( ++i % 100000 == 0 ) {
      std::cerr << "." << std::flush;
    }

    // identical to last line? just add count
-    if (strcmp(line,lastLine) == 0) {
+    if (line == lastLine) {
      phrasePair->IncrementPrevious(tmpCount,tmpPcfgSum);
-      SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
      continue;
    } else {
-      strcpy( lastLine, line );
+      lastLine = line;
    }

    tmpPhraseSource = new PHRASE();
@ -359,8 +354,6 @@ int main(int argc, char* argv[])
      }
    }

-    SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-
  }

  processPhrasePairs( phrasePairsWithSameSource, *phraseTableFile, featureManager, maybeLogProb );
@ -750,11 +743,9 @@ void loadFunctionWords( const string &fileName )
  }
  istream *inFileP = &inFile;

-  char line[LINE_MAX_LENGTH];
-  while(true) {
-    SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (inFileP->eof()) break;
-    std::vector<string> token = tokenize( line );
+  string line;
+  while(getline(*inFileP, line)) {
+    std::vector<string> token = tokenize( line.c_str() );
    if (token.size() > 0)
      functionWordList.insert( token[0] );
  }
@ -799,16 +790,13 @@ void LexicalTable::load( const string &fileName )
  }
  istream *inFileP = &inFile;

-  char line[LINE_MAX_LENGTH];
-
+  string line;
  int i=0;
-  while(true) {
+  while(getline(*inFileP, line)) {
    i++;
    if (i%100000 == 0) std::cerr << "." << flush;
-    SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (inFileP->eof()) break;

-    std::vector<string> token = tokenize( line );
+    std::vector<string> token = tokenize( line.c_str() );
    if (token.size() != 3) {
        std::cerr << "line " << i << " in " << fileName
           << " has wrong number of tokens, skipping:" << std::endl
--- a/phrase-extract/statistics-main.cpp
+++ b/phrase-extract/statistics-main.cpp
@ -12,15 +12,12 @@
 #include <time.h>

 #include "AlignmentPhrase.h"
-#include "SafeGetline.h"
 #include "tables-core.h"
 #include "InputFileStream.h"

 using namespace std;
 using namespace MosesTraining;

-#define LINE_MAX_LENGTH 10000
-
 namespace MosesTraining
 {

@ -31,7 +28,7 @@ public:
  vector< vector<size_t> > alignedToE;
  vector< vector<size_t> > alignedToF;

-  bool create( char*, int );
+  bool create( const char*, int );
  void clear();
  bool equals( const PhraseAlignment& );
 };
@ -106,16 +103,14 @@ int main(int argc, char* argv[])
  vector< PhraseAlignment > phrasePairsWithSameF;
  int i=0;
  int fileCount = 0;
-  while(true) {
+
+  string line;
+  while(getline(extractFileP, line)) {
-    if (extractFileP.eof()) break;
+    // no eof() check here: it would skip a final line without a newline
    if (++i % 100000 == 0) cerr << "." << flush;
-    char line[LINE_MAX_LENGTH];
-    SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    //    if (fileCount>0)
-    if (extractFileP.eof())
-      break;
+
    PhraseAlignment phrasePair;
-    bool isPhrasePair = phrasePair.create( line, i );
+    bool isPhrasePair = phrasePair.create( line.c_str(), i );
    if (lastForeign >= 0 && lastForeign != phrasePair.foreign) {
      processPhrasePairs( phrasePairsWithSameF );
      for(size_t j=0; j<phrasePairsWithSameF.size(); j++)
@ -124,7 +119,7 @@ int main(int argc, char* argv[])
      phraseTableE.clear();
      phraseTableF.clear();
      phrasePair.clear(); // process line again, since phrase tables flushed
-      phrasePair.create( line, i );
+      phrasePair.create( line.c_str(), i );
      phrasePairBase = 0;
    }
    lastForeign = phrasePair.foreign;
@ -242,7 +237,7 @@ void processPhrasePairs( vector< PhraseAlignment > &phrasePair )
  }
 }

-bool PhraseAlignment::create( char line[], int lineID )
+bool PhraseAlignment::create(const char line[], int lineID )
 {
  vector< string > token = tokenize( line );
  int item = 1;
@ -321,16 +316,14 @@ void LexicalTable::load( const string &filePath )
  }
  istream *inFileP = &inFile;

-  char line[LINE_MAX_LENGTH];
+  string line;

  int i=0;
-  while(true) {
+  while(getline(*inFileP, line)) {
    i++;
    if (i%100000 == 0) cerr << "." << flush;
-    SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
-    if (inFileP->eof()) break;

-    vector<string> token = tokenize( line );
+    vector<string> token = tokenize( line.c_str() );
    if (token.size() != 3) {
      cerr << "line " << i << " in " << filePath << " has wrong number of tokens, skipping:\n" <<
           token.size() << " " << token[0] << " " << line << endl;
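Note on `LexicalTable::load`: each line is expected to have exactly three whitespace-separated tokens, and malformed lines are skipped. A minimal sketch of that check follows. `LexEntry` and `parse_lex_line` are hypothetical names, and treating the third token as a probability is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one lexical-table line. The field names are
// illustrative only.
struct LexEntry {
  std::string first;
  std::string second;
  double prob;
};

// Split 'line' on whitespace and accept it only if it has exactly three
// tokens. Returns false for malformed lines so the caller can skip them.
inline bool parse_lex_line(const std::string& line, LexEntry& out)
{
  std::istringstream in(line);
  std::vector<std::string> token;
  std::string t;
  while (in >> t) token.push_back(t);
  if (token.size() != 3) return false;  // includes the empty-line case
  out.first = token[0];
  out.second = token[1];
  out.prob = std::atof(token[2].c_str());
  return true;
}
```

Because the size check comes first, an empty line never reaches an access to `token[0]`.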
--- a/scripts/training/wrappers/conll2mosesxml.py
+++ b/scripts/training/wrappers/conll2mosesxml.py
@ -0,0 +1,188 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+# Author: Rico Sennrich
+
+# takes a file in the CoNLL dependency format (from the CoNLL-X shared task on dependency parsing; http://ilk.uvt.nl/conll/#dataformat )
+# and produces Moses XML format. Note that the structure is built based on fields 9 and 10 (projective HEAD and RELATION),
+# which not all parsers produce.
+
+# usage: conll2mosesxml.py [--brackets] [--no_preterminals] < input_file > output_file
+
+from __future__ import print_function, unicode_literals
+import sys
+import re
+import codecs
+from collections import namedtuple,defaultdict
+from lxml import etree as ET
+
+
+Word = namedtuple('Word', ['pos','word','lemma','tag','head','func', 'proj_head', 'proj_func'])
+
+def main(output_format='xml'):
+    sentence = []
+
+    for line in sys.stdin:
+
+        # process sentence
+        if line == "\n":
+            sentence.insert(0,[])
+            if is_projective(sentence):
+                write(sentence,output_format)
+            else:
+                sys.stderr.write(' '.join(w.word for w in sentence[1:]) + '\n')
+                sys.stdout.write('\n')
+            sentence = []
+            continue
+
+        try:
+            pos, word, lemma, tag, tag2, morph, head, func, proj_head, proj_func = line.split()
+        except ValueError: # word may be unicode whitespace
+            pos, word, lemma, tag, tag2, morph, head, func, proj_head, proj_func = re.split('[ \t]+', line.strip())
+
+        word = escape_special_chars(word)
+        lemma = escape_special_chars(lemma)
+
+        if proj_head == '_':
+            proj_head = head
+            proj_func = func
+
+        sentence.append(Word(int(pos), word, lemma, tag2,int(head), func, int(proj_head), proj_func))
+
+
+# this script performs the same escaping as escape-special-chars.perl in Moses.
+# most of it is done in function write(), but quotation marks need to be processed first
+def escape_special_chars(line):
+
+    line = line.replace('\'','&apos;') # xml
+    line = line.replace('"','&quot;') # xml
+
+    return line
+
+
+# check whether the structure is projective
+def is_projective(sentence):
+    dominates = defaultdict(set)
+    for i,w in enumerate(sentence):
+        dominates[i].add(i)
+        if not i:
+            continue
+        head = int(w.proj_head)
+        while head != 0:
+            if i in dominates[head]:
+                break
+            dominates[head].add(i)
+            head = int(sentence[head].proj_head)
+
+    for i in dominates:
+        dependents = dominates[i]
+        if max(dependents) - min(dependents) != len(dependents)-1:
+            sys.stderr.write("error: non-projective structure.\n")
+            return False
+    return True
+
+
+def write(sentence, output_format='xml'):
+
+    if output_format == 'xml':
+        tree = create_subtree(0,sentence)
+        out = ET.tostring(tree, encoding = 'UTF-8').decode('UTF-8')
+
+    if output_format == 'brackets':
+        out = create_brackets(0,sentence)
+
+    out = out.replace('|','&#124;') # factor separator
+    out = out.replace('[','&#91;') # syntax non-terminal
+    out = out.replace(']','&#93;') # syntax non-terminal
+
+    out = out.replace('&amp;apos;','&apos;') # lxml is buggy if input is escaped
+    out = out.replace('&amp;quot;','&quot;') # lxml is buggy if input is escaped
+
+    print(out)
+
+# write node in Moses XML format
+def create_subtree(position, sentence):
+
+    element = ET.Element('tree')
+
+    if position:
+        element.set('label', sentence[position].proj_func)
+    else:
+        element.set('label', 'sent')
+
+    for i in range(1,position):
+        if sentence[i].proj_head == position:
+            element.append(create_subtree(i, sentence))
+
+    if position:
+
+        if preterminals:
+            head = ET.Element('tree')
+            head.set('label', sentence[position].tag)
+            head.text = sentence[position].word
+            element.append(head)
+
+        else:
+            if len(element):
+                element[-1].tail = sentence[position].word
+            else:
+                element.text = sentence[position].word
+
+    for i in range(position, len(sentence)):
+        if i and sentence[i].proj_head == position:
+            element.append(create_subtree(i, sentence))
+
+    return element
+
+
+# write node in bracket format (Penn treebank style)
+def create_brackets(position, sentence):
+
+    if position:
+        element = "( " + sentence[position].proj_func + ' '
+    else:
+        element = "( sent "
+
+    for i in range(1,position):
+        if sentence[i].proj_head == position:
+            element += create_brackets(i, sentence)
+
+    if position:
+        word = sentence[position].word
+        if word == ')':
+            word = 'RBR'
+        elif word == '(':
+            word = 'LBR'
+
+        tag = sentence[position].tag
+        if tag == '$(':
+            tag = '$BR'
+
+        if preterminals:
+            element += '( ' + tag + ' ' + word + ' ) '
+        else:
+            element += word + ' ) '
+
+    for i in range(position, len(sentence)):
+        if i and sentence[i].proj_head == position:
+            element += create_brackets(i, sentence)
+
+    if preterminals or not position:
+        element += ') '
+
+    return element
+
+if __name__ == '__main__':
+    if sys.version_info < (3,0,0):
+        sys.stdin = codecs.getreader('UTF-8')(sys.stdin)
+        sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)
+        sys.stderr = codecs.getwriter('UTF-8')(sys.stderr)
+
+    if '--no_preterminals' in sys.argv:
+        preterminals = False
+    else:
+        preterminals = True
+
+    if '--brackets' in sys.argv:
+        main('brackets')
+    else:
+        main('xml')
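
The projectivity test in `is_projective` above can be read in isolation as follows. This is a minimal sketch, not the script's own interface: it assumes the sentence is reduced to a plain list `heads` (a hypothetical simplification of the `Word` tuples), where `heads[i]` is the projective head of token `i` and index 0 is a dummy root entry. A structure counts as projective when every node dominates a contiguous span of positions.

```python
from collections import defaultdict


def is_projective(heads):
    """Return True if every node dominates a contiguous span of tokens.

    `heads` is an assumed simplification: heads[i] is the head index of
    token i (1-based), and heads[0] is a placeholder for the root.
    """
    dominates = defaultdict(set)
    for i in range(len(heads)):
        dominates[i].add(i)          # every node dominates itself
        if i == 0:
            continue
        head = heads[i]
        while head != 0:             # walk up to the root
            if i in dominates[head]:
                break                # already recorded (also stops on cycles)
            dominates[head].add(i)
            head = heads[head]

    # A span is contiguous when its width equals its number of members.
    return all(max(d) - min(d) == len(d) - 1 for d in dominates.values())


# Token 2 is the root; tokens 1 and 3 attach to it.
print(is_projective([0, 2, 0, 2]))
# Token 2 attaches to token 4 across the root token 3.
print(is_projective([0, 3, 4, 0, 3]))
```

In the second call, token 4 dominates positions 2 and 4 but not 3, so its span has a gap and the check fails, which is the case where the script writes the sentence to stderr and emits an empty line instead of a tree.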