mirror of
https://github.com/moses-smt/mosesdecoder.git
synced 2024-12-27 14:05:29 +03:00
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
commit
89a9c410c9
1
Jamroot
@@ -145,6 +145,7 @@ build-projects lm util phrase-extract search moses moses/LM mert moses-cmd moses
 if [ option.get "with-mm" : : "yes" ]
 {
   alias mm :
+    moses/TranslationModel/UG//lookup_mmsapt
     moses/TranslationModel/UG/mm//mtt-build
     moses/TranslationModel/UG/mm//mtt-dump
     moses/TranslationModel/UG/mm//symal2mam
122
contrib/moses-speedtest/README.md
Normal file
@@ -0,0 +1,122 @@
# Moses speedtesting framework

### Description

This is an automatic test framework designed to track day-to-day performance changes in Moses.

### Set up

#### Set up a Moses repo
Set up a Moses repo and build it with the desired configuration.
```bash
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder
./bjam -j10 --with-cmph=/usr/include/
```
You need to build Moses first, so that the testsuite knows which build command to use when rebuilding against newer revisions.

#### Create a parent directory.
Create a parent directory in which **runtests.py**, the related scripts and the configuration file reside.
This should also be the location of the TEST_DIR and TEST_LOG_DIR, as explained in the next section.

#### Set up a global configuration file.
You need a configuration file for the testsuite. A sample configuration file is provided in **testsuite\_config**:
<pre>
MOSES_REPO_PATH: /home/moses-speedtest/moses-standard/mosesdecoder
DROP_CACHES_COMM: sys_drop_caches 3
TEST_DIR: /home/moses-speedtest/phrase_tables/tests
TEST_LOG_DIR: /home/moses-speedtest/phrase_tables/testlogs
BASEBRANCH: RELEASE-2.1.1
</pre>

_MOSES\_REPO\_PATH_ is the place where you have set up and built Moses.
_DROP\_CACHES\_COMM_ is the command that is used to drop caches. It should run without needing root access.
_TEST\_DIR_ is the directory where all the tests reside.
_TEST\_LOG\_DIR_ is the directory where the performance logs are gathered. It should be created before running the testsuite for the first time.
_BASEBRANCH_ is the branch against which all new tests are compared. It should normally be set to the latest stable Moses release.
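
The `KEY: value` layout of the configuration file is simple enough to read by hand. The following is a minimal sketch, assuming one option per line and `#` as the comment character; `parse_key_value_config` is a hypothetical helper written for illustration and is not a function of the testsuite.

```python
def parse_key_value_config(text):
    """Parse 'KEY: value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split('#')[0].strip()  # Discard comments
        if not line:
            continue  # Nothing left on this line
        # Split on the first colon only, so values may contain colons
        key, _, value = line.partition(':')
        settings[key.strip()] = value.strip()
    return settings


# The sample values are the ones shown in the configuration file above
SAMPLE = """\
MOSES_REPO_PATH: /home/moses-speedtest/moses-standard/mosesdecoder
DROP_CACHES_COMM: sys_drop_caches 3
TEST_DIR: /home/moses-speedtest/phrase_tables/tests
TEST_LOG_DIR: /home/moses-speedtest/phrase_tables/testlogs
BASEBRANCH: RELEASE-2.1.1
"""

config = parse_key_value_config(SAMPLE)
print(config['BASEBRANCH'])
```

Keeping the settings in a plain dictionary makes it easy to check that all five options are present before any test is started.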

### Creating tests

To create a test, go into the TEST_DIR and create a new folder. The folder name is used as the name of the test.
Inside that folder, place a configuration file named **config**. The naming is mandatory.
An example of such a configuration file is **test\_config**:

<pre>
Command: moses -f ... -i fff #Looks for the command in the /bin directory of the repo specified in the testsuite_config
LDPRE: ldpreloads #Comma separated LD_LIBRARY_PATH:/,
Variants: vanilla, cached, ldpre #Can't have cached without ldpre or vanilla
</pre>

The _Command:_ line specifies the executable (which is looked up in the /bin directory of the repo) and any necessary arguments. Before running the test, the script changes to the current test directory, so you can use relative paths.
The _LDPRE:_ line specifies the libraries, if any, that the tests should be run with via LD\_PRELOAD.
The _Variants:_ line specifies which types of tests should be run. This particular line runs the following tests:
1. A vanilla test, meaning that just the command after _Command:_ is issued.
2. A vanilla cached test, meaning that after the vanilla test the command is run again without dropping caches, in order to benchmark performance on a cached filesystem.
3. An LD\_PRELOAD test for each library in the comma-separated _LDPRE:_ list, in which the command is run with that library preloaded.
4. A cached version of all LD\_PRELOAD tests.
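
The way the _Variants:_ line expands into individual runs can be sketched as follows. This is an illustration only, assuming the log names follow the `_vanilla`, `_vanilla_cached`, `_ldpre_<library>` and `_ldpre_<library>_cached` suffixes that **runtests.py** uses; `planned_runs` and the library name `libexample.so` are hypothetical.

```python
def planned_runs(test_name, variants, ld_preloads):
    """List the log names of the runs a Variants line would produce."""
    runs = []
    if 'vanilla' in variants:
        runs.append(test_name + '_vanilla')
        if 'cached' in variants:
            # Same command again, without dropping caches in between
            runs.append(test_name + '_vanilla_cached')
    if 'ldpre' in variants:
        for lib in ld_preloads:
            runs.append(test_name + '_ldpre_' + lib)
            if 'cached' in variants:
                runs.append(test_name + '_ldpre_' + lib + '_cached')
    return runs


runs = planned_runs('mytest', ['vanilla', 'cached', 'ldpre'], ['libexample.so'])
for run in runs:
    print(run)
```

With one preloaded library, the three variants therefore yield four timed runs per test.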

### Running tests.
Running the tests is done through the **runtests.py** script.

#### Running all tests.
To run all tests against the base branch and the latest revision (and to generate new base branch test data if it is missing), run:
```bash
python3 runtests.py -c testsuite_config
```

#### Running specific tests.
The script allows the user to manually run a particular test or to test against a specific branch or revision:
<pre>
moses-speedtest@crom:~/phrase_tables$ python3 runtests.py --help
usage: runtests.py [-h] -c CONFIGFILE [-s SINGLETESTDIR] [-r REVISION]
                   [-b BRANCH]

A python based speedtest suite for moses.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIGFILE, --configfile CONFIGFILE
                        Specify test config file
  -s SINGLETESTDIR, --singletest SINGLETESTDIR
                        Single test name directory. Specify directory name,
                        not full path!
  -r REVISION, --revision REVISION
                        Specify a specific revision for the test.
  -b BRANCH, --branch BRANCH
                        Specify a branch for the test.
</pre>

### Generating HTML report.
To generate a summary of the test results, use the **html\_gen.py** script. It places a file named *index.html* in the current script directory.
```bash
python3 html_gen.py testsuite_config
```
You should use the generated file together with the **style.css** file provided in the html directory.

### Command line regression testing.
Alternatively, you can check for regressions from the command line using the **check\_for\_regression.py** script:
```bash
python3 check_for_regression.py TESTLOGS_DIRECTORY
```
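
The kind of comparison such a check performs can be sketched as follows. This is a minimal sketch, assuming the change is measured as the percentage difference in wall-clock time relative to the previous run and using the script's default threshold of 5%; `classify_change` is a hypothetical helper, not the script's own code.

```python
def classify_change(previous, current, threshold=5.0):
    """Classify a run by its percentage change in wall-clock time.

    A positive change means the current run was faster than the previous one.
    """
    change = (previous - current) / previous * 100.0
    if change < -threshold:
        return 'regressed'
    if change > threshold:
        return 'better'
    return 'unchanged'


print(classify_change(100.0, 110.0))
print(classify_change(100.0, 90.0))
print(classify_change(100.0, 102.0))
```

A threshold keeps ordinary run-to-run timing noise from being reported as a regression or an improvement.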

The results of all tests are also logged inside the specified TESTLOGS directory, so you can check them manually for additional information such as date, time, revision and branch.
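
Each log line is a single whitespace-separated record, so it can be inspected with a few lines of Python. The sketch below assumes the field order written by **runtests.py** (date, time, revision, then labelled test name, real, user and system times, and branch); `parse_log_line`, `LogEntry` and the sample line are made up for illustration.

```python
from collections import namedtuple

LogEntry = namedtuple(
    'LogEntry', 'date time revision testname real user system branch')


def parse_log_line(line):
    """Split one log line into its fields, skipping the text labels."""
    fields = line.split()
    return LogEntry(
        date=fields[0],
        time=fields[1],
        revision=fields[2],
        testname=fields[4],       # fields[3] is the label 'Testname:'
        real=float(fields[6]),    # fields[5] is the label 'RealTime:'
        user=float(fields[8]),    # fields[7] is the label 'UserTime:'
        system=float(fields[10]), # fields[9] is the label 'SystemTime:'
        branch=fields[12],        # fields[11] is the label 'Branch:'
    )


# A made-up line in the assumed format, not taken from a real log
SAMPLE_LINE = ("01.02.2014 03:04:05 0123abc Testname: mytest_vanilla "
               "RealTime: 12.5 UserTime: 11.0 SystemTime: 1.25 Branch: master\n")

entry = parse_log_line(SAMPLE_LINE)
print(entry.testname, entry.real, entry.branch)
```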

### Create a cron job:
Create a cron job to run the tests daily and generate an HTML report. An example *cronjob* is available.
```bash
#!/bin/sh
cd /home/moses-speedtest/phrase_tables

python3 runtests.py -c testsuite_config #Run the tests.
python3 html_gen.py testsuite_config #Generate html

cp index.html /fs/thor4/html/www/speed-test/ #Update the html
```

Place the script in _/etc/cron.daily_ for daily testing.

###### Author
Nikolay Bogoychev, 2014

###### License
This software is licensed under the LGPL.
63
contrib/moses-speedtest/check_for_regression.py
Normal file
@@ -0,0 +1,63 @@
"""Checks if any of the latest tests has performed considerably differently
than the previous ones. Takes the log directory as an argument."""
import os
import sys
from testsuite_common import Result, processLogLine, bcolors, getLastTwoLines

LOGDIR = sys.argv[1] #Get the log directory as an argument
PERCENTAGE = 5 #Default value for how much a test should change
if len(sys.argv) == 3:
    PERCENTAGE = float(sys.argv[2]) #Default is 5%, but a second command
                                    #line parameter can override it

def printResults(regressed, better, unchanged, firsttime):
    """Pretty print the results in different colours"""
    if regressed != []:
        for item in regressed:
            print(bcolors.RED + "REGRESSION! " + item.testname + " Was: "\
            + str(item.previous) + " Is: " + str(item.current) + " Change: "\
            + str(abs(item.percentage)) + "%. Revision: " + item.revision\
            + bcolors.ENDC)
        print('\n')
    if unchanged != []:
        for item in unchanged:
            print(bcolors.BLUE + "UNCHANGED: " + item.testname + " Revision: " +\
            item.revision + bcolors.ENDC)
        print('\n')
    if better != []:
        for item in better:
            print(bcolors.GREEN + "IMPROVEMENT! " + item.testname + " Was: "\
            + str(item.previous) + " Is: " + str(item.current) + " Change: "\
            + str(abs(item.percentage)) + "%. Revision: " + item.revision\
            + bcolors.ENDC)
    if firsttime != []:
        for item in firsttime:
            print(bcolors.PURPLE + "First time test! " + item.testname +\
            " Took: " + str(item.real) + " seconds. Revision: " +\
            item.revision + bcolors.ENDC)


all_files = os.listdir(LOGDIR)
regressed = []
better = []
unchanged = []
firsttime = []

#Go through all log files and find which tests have performed better.
for logfile in all_files:
    (line1, line2) = getLastTwoLines(logfile, LOGDIR)
    log1 = processLogLine(line1)
    if line2 == '\n': # Empty line, only one test ever run
        firsttime.append(log1)
        continue
    log2 = processLogLine(line2)
    res = Result(log1.testname, log1.real, log2.real, log2.revision,\
    log2.branch, log1.revision, log1.branch)
    if res.percentage < -PERCENTAGE:
        regressed.append(res)
    elif res.change > PERCENTAGE:
        better.append(res)
    else:
        unchanged.append(res)

printResults(regressed, better, unchanged, firsttime)
7
contrib/moses-speedtest/cronjob
Normal file
@@ -0,0 +1,7 @@
#!/bin/sh
cd /home/moses-speedtest/phrase_tables

python3 runtests.py -c testsuite_config #Run the tests.
python3 html_gen.py testsuite_config #Generate html

cp index.html /fs/thor4/html/www/speed-test/ #Update the html
5
contrib/moses-speedtest/helpers/README.md
Normal file
@@ -0,0 +1,5 @@
### Helpers

This directory contains a Python script that gives you the equivalent of:
```echo 3 > /proc/sys/vm/drop_caches```
You need to set it up so that it is executed with root access without needing a password, so that the tests can be automated.
22
contrib/moses-speedtest/helpers/sys_drop_caches.py
Normal file
@@ -0,0 +1,22 @@
#!/usr/bin/spython
from sys import argv, stderr, exit
from os import linesep as ls
procfile = "/proc/sys/vm/drop_caches"
options = ["1","2","3"]
flush_type = None
try:
    flush_type = argv[1][0:1]
    if not flush_type in options:
        raise IndexError, "not in options"
    with open(procfile, "w") as f:
        f.write("%s%s" % (flush_type,ls))
    exit(0)
except IndexError, e:
    stderr.write("Argument %s required.%s" % (options, ls))
except IOError, e:
    stderr.write("Error writing to file.%s" % ls)
except StandardError, e:
    stderr.write("Unknown Error.%s" % ls)

exit(1)

5
contrib/moses-speedtest/html/README.md
Normal file
@@ -0,0 +1,5 @@
### HTML files.

_index.html_ is a sample file generated by this testsuite.

_style.css_ should be placed in the same directory as _index.html_ so that the test results are styled when viewed in a browser.
32
contrib/moses-speedtest/html/index.html
Normal file
File diff suppressed because one or more lines are too long
21
contrib/moses-speedtest/html/style.css
Normal file
@@ -0,0 +1,21 @@
table,th,td
{
    border:1px solid black;
    border-collapse:collapse
}

tr:nth-child(odd) {
    background-color: Gainsboro;
}

.better {
    color: Green;
}

.worse {
    color: Red;
}

.unchanged {
    color: SkyBlue;
}
192
contrib/moses-speedtest/html_gen.py
Normal file
@@ -0,0 +1,192 @@
"""Generates HTML page containing the testresults"""
from testsuite_common import Result, processLogLine, getLastTwoLines
from runtests import parse_testconfig
import os
import sys

from datetime import datetime, timedelta

HTML_HEADING = """<html>
<head>
<title>Moses speed testing</title>
<link rel="stylesheet" type="text/css" href="style.css"></head><body>"""
HTML_ENDING = "</table></body></html>\n"

TABLE_HEADING = """<table><tr class="heading">
<th>Date</th>
<th>Time</th>
<th>Testname</th>
<th>Revision</th>
<th>Branch</th>
<th>Time</th>
<th>Prevtime</th>
<th>Prevrev</th>
<th>Change (%)</th>
<th>Time (Basebranch)</th>
<th>Change (%, Basebranch)</th>
<th>Time (Days -2)</th>
<th>Change (%, Days -2)</th>
<th>Time (Days -3)</th>
<th>Change (%, Days -3)</th>
<th>Time (Days -4)</th>
<th>Change (%, Days -4)</th>
<th>Time (Days -5)</th>
<th>Change (%, Days -5)</th>
<th>Time (Days -6)</th>
<th>Change (%, Days -6)</th>
<th>Time (Days -7)</th>
<th>Change (%, Days -7)</th>
<th>Time (Days -14)</th>
<th>Change (%, Days -14)</th>
<th>Time (Years -1)</th>
<th>Change (%, Years -1)</th>
</tr>"""

def get_prev_days(date, numdays):
    """Gets the date numdays days earlier so that we can search for
    that test in the log file"""
    date_obj = datetime.strptime(date, '%d.%m.%Y').date()
    past_date = date_obj - timedelta(days=numdays)
    return past_date.strftime('%d.%m.%Y')

def gather_necessary_lines(logfile, date):
    """Gathers the necessary lines corresponding to past dates
    and parses them if they exist"""
    #Get a dictionary of dates
    dates = {}
    dates[get_prev_days(date, 2)] = ('-2', None)
    dates[get_prev_days(date, 3)] = ('-3', None)
    dates[get_prev_days(date, 4)] = ('-4', None)
    dates[get_prev_days(date, 5)] = ('-5', None)
    dates[get_prev_days(date, 6)] = ('-6', None)
    dates[get_prev_days(date, 7)] = ('-7', None)
    dates[get_prev_days(date, 14)] = ('-14', None)
    dates[get_prev_days(date, 365)] = ('-365', None)

    openfile = open(logfile, 'r')
    for line in openfile:
        if line.split()[0] in dates.keys():
            day = dates[line.split()[0]][0]
            dates[line.split()[0]] = (day, processLogLine(line))
    openfile.close()
    return dates

def append_date_to_table(resline):
    """Appends past dates to the html"""
    cur_html = '<td>' + str(resline.current) + '</td>'

    if resline.percentage > 0.05: #If we have improvement of more than 5%
        cur_html = cur_html + '<td class="better">' + str(resline.percentage) + '</td>'
    elif resline.percentage < -0.05: #We have a regression of more than 5%
        cur_html = cur_html + '<td class="worse">' + str(resline.percentage) + '</td>'
    else:
        cur_html = cur_html + '<td class="unchanged">' + str(resline.percentage) + '</td>'
    return cur_html

def compare_rev(filename, rev1, rev2, branch1=False, branch2=False):
    """Compare the test results of two lines. We can specify either a
    revision or a branch for comparison. The first rev should be the
    base version and the second revision should be the later version"""

    #In the log file the index of the revision is 2 but the index of
    #the branch is 12. Alternate those depending on whether we are looking
    #for a specific revision or branch.
    firstidx = 2
    secondidx = 2
    if branch1 == True:
        firstidx = 12
    if branch2 == True:
        secondidx = 12

    rev1line = ''
    rev2line = ''
    resfile = open(filename, 'r')
    for line in resfile:
        if rev1 == line.split()[firstidx]:
            rev1line = line
        elif rev2 == line.split()[secondidx]:
            rev2line = line
        if rev1line != '' and rev2line != '':
            break
    resfile.close()
    if rev1line == '':
        raise ValueError('Revision ' + rev1 + " was not found!")
    if rev2line == '':
        raise ValueError('Revision ' + rev2 + " was not found!")

    logLine1 = processLogLine(rev1line)
    logLine2 = processLogLine(rev2line)
    res = Result(logLine1.testname, logLine1.real, logLine2.real,\
    logLine2.revision, logLine2.branch, logLine1.revision, logLine1.branch)

    return res

def produce_html(path, global_config):
    """Produces html file for the report."""
    html = '' #The table HTML
    for filenam in os.listdir(global_config.testlogs):
        #Generate html for the newest two lines
        #Get the lines from the log file
        (ll1, ll2) = getLastTwoLines(filenam, global_config.testlogs)
        logLine1 = processLogLine(ll1)
        logLine2 = processLogLine(ll2)

        #Generate html
        res1 = Result(logLine1.testname, logLine1.real, logLine2.real,\
        logLine2.revision, logLine2.branch, logLine1.revision, logLine1.branch)
        html = html + '<tr><td>' + logLine2.date + '</td><td>' + logLine2.time + '</td><td>' +\
        res1.testname + '</td><td>' + res1.revision[:10] + '</td><td>' + res1.branch + '</td><td>' +\
        str(res1.current) + '</td><td>' + str(res1.previous) + '</td><td>' + res1.prevrev[:10] + '</td>'

        #Add fancy colours depending on the change
        if res1.percentage > 0.05: #If we have improvement of more than 5%
            html = html + '<td class="better">' + str(res1.percentage) + '</td>'
        elif res1.percentage < -0.05: #We have a regression of more than 5%
            html = html + '<td class="worse">' + str(res1.percentage) + '</td>'
        else:
            html = html + '<td class="unchanged">' + str(res1.percentage) + '</td>'

        #Get comparison against the base version
        filenam = global_config.testlogs + '/' + filenam #Get proper directory
        res2 = compare_rev(filenam, global_config.basebranch, res1.revision, branch1=True)
        html = html + '<td>' + str(res2.previous) + '</td>'

        #Add fancy colours depending on the change
        if res2.percentage > 0.05: #If we have improvement of more than 5%
            html = html + '<td class="better">' + str(res2.percentage) + '</td>'
        elif res2.percentage < -0.05: #We have a regression of more than 5%
            html = html + '<td class="worse">' + str(res2.percentage) + '</td>'
        else:
            html = html + '<td class="unchanged">' + str(res2.percentage) + '</td>'

        #Add extra dates comparison dating from the beginning of time if they exist
        past_dates = list(range(2, 8))
        past_dates.append(14)
        past_dates.append(365) # Get the 1 year ago day
        linesdict = gather_necessary_lines(filenam, logLine2.date)

        for days in past_dates:
            act_date = get_prev_days(logLine2.date, days)
            if linesdict[act_date][1] is not None:
                logline_date = linesdict[act_date][1]
                restemp = Result(logline_date.testname, logline_date.real, logLine2.real,\
                logLine2.revision, logLine2.branch, logline_date.revision, logline_date.branch)
                html = html + append_date_to_table(restemp)
            else:
                html = html + '<td>N/A</td><td>N/A</td>'



        html = html + '</tr>' #End row

    #Write out the file
    basebranch_info = '<text><b>Basebranch:</b> ' + res2.prevbranch + ' <b>Revision:</b> ' +\
    res2.prevrev + '</text>'
    writeoutstr = HTML_HEADING + basebranch_info + TABLE_HEADING + html + HTML_ENDING
    writefile = open(path, 'w')
    writefile.write(writeoutstr)
    writefile.close()

if __name__ == '__main__':
    CONFIG = parse_testconfig(sys.argv[1])
    produce_html('index.html', CONFIG)
293
contrib/moses-speedtest/runtests.py
Normal file
@ -0,0 +1,293 @@
|
|||||||
|
"""Given a config file, runs tests"""
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
from argparse import ArgumentParser
|
||||||
|
from testsuite_common import processLogLine
|
||||||
|
|
||||||
|
def parse_cmd():
|
||||||
|
"""Parse the command line arguments"""
|
||||||
|
description = "A python based speedtest suite for moses."
|
||||||
|
parser = ArgumentParser(description=description)
|
||||||
|
parser.add_argument("-c", "--configfile", action="store",\
|
||||||
|
dest="configfile", required=True,\
|
||||||
|
help="Specify test config file")
|
||||||
|
parser.add_argument("-s", "--singletest", action="store",\
|
||||||
|
dest="singletestdir", default=None,\
|
||||||
|
help="Single test name directory. Specify directory name,\
|
||||||
|
not full path!")
|
||||||
|
parser.add_argument("-r", "--revision", action="store",\
|
||||||
|
dest="revision", default=None,\
|
||||||
|
help="Specify a specific revison for the test.")
|
||||||
|
parser.add_argument("-b", "--branch", action="store",\
|
||||||
|
dest="branch", default=None,\
|
||||||
|
help="Specify a branch for the test.")
|
||||||
|
|
||||||
|
arguments = parser.parse_args()
|
||||||
|
return arguments
|
||||||
|
|
||||||
|
def repoinit(testconfig):
|
||||||
|
"""Determines revision and sets up the repo."""
|
||||||
|
revision = ''
|
||||||
|
#Update the repo
|
||||||
|
os.chdir(testconfig.repo)
|
||||||
|
#Checkout specific branch, else maintain main branch
|
||||||
|
if testconfig.branch != 'master':
|
||||||
|
subprocess.call(['git', 'checkout', testconfig.branch])
|
||||||
|
rev, _ = subprocess.Popen(['git', 'rev-parse', 'HEAD'],\
|
||||||
|
stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
|
||||||
|
revision = str(rev).replace("\\n'", '').replace("b'", '')
|
||||||
|
else:
|
||||||
|
subprocess.call(['git checkout master'], shell=True)
|
||||||
|
|
||||||
|
#Check a specific revision. Else checkout master.
|
||||||
|
if testconfig.revision:
|
||||||
|
subprocess.call(['git', 'checkout', testconfig.revision])
|
||||||
|
revision = testconfig.revision
|
||||||
|
elif testconfig.branch == 'master':
|
||||||
|
subprocess.call(['git pull'], shell=True)
|
||||||
|
rev, _ = subprocess.Popen(['git rev-parse HEAD'], stdout=subprocess.PIPE,\
|
||||||
|
stderr=subprocess.PIPE, shell=True).communicate()
|
||||||
|
revision = str(rev).replace("\\n'", '').replace("b'", '')
|
||||||
|
|
||||||
|
return revision
|
||||||
|
|
||||||
|
class Configuration:
|
||||||
|
"""A simple class to hold all of the configuration constatns"""
|
||||||
|
def __init__(self, repo, drop_caches, tests, testlogs, basebranch, baserev):
|
||||||
|
self.repo = repo
|
||||||
|
self.drop_caches = drop_caches
|
||||||
|
self.tests = tests
|
||||||
|
self.testlogs = testlogs
|
||||||
|
self.basebranch = basebranch
|
||||||
|
self.baserev = baserev
|
||||||
|
self.singletest = None
|
||||||
|
self.revision = None
|
||||||
|
self.branch = 'master' # Default branch
|
||||||
|
|
||||||
|
def additional_args(self, singletest, revision, branch):
|
||||||
|
"""Additional configuration from command line arguments"""
|
||||||
|
self.singletest = singletest
|
||||||
|
if revision is not None:
|
||||||
|
self.revision = revision
|
||||||
|
if branch is not None:
|
||||||
|
self.branch = branch
|
||||||
|
|
||||||
|
def set_revision(self, revision):
|
||||||
|
"""Sets the current revision that is being tested"""
|
||||||
|
self.revision = revision
|
||||||
|
|
||||||
|
|
||||||
|
class Test:
|
||||||
|
"""A simple class to contain all information about tests"""
|
||||||
|
def __init__(self, name, command, ldopts, permutations):
|
||||||
|
self.name = name
|
||||||
|
self.command = command
|
||||||
|
self.ldopts = ldopts.replace(' ', '').split(',') #Not tested yet
|
||||||
|
self.permutations = permutations
|
||||||
|
|
||||||
|
def parse_configfile(conffile, testdir, moses_repo):
|
||||||
|
"""Parses the config file"""
|
||||||
|
command, ldopts = '', ''
|
||||||
|
permutations = []
|
||||||
|
fileopen = open(conffile, 'r')
|
||||||
|
for line in fileopen:
|
||||||
|
line = line.split('#')[0] # Discard comments
|
||||||
|
if line == '' or line == '\n':
|
||||||
|
continue # Discard lines with comments only and empty lines
|
||||||
|
opt, args = line.split(' ', 1) # Get arguments
|
||||||
|
|
||||||
|
if opt == 'Command:':
|
||||||
|
command = args.replace('\n', '')
|
||||||
|
command = moses_repo + '/bin/' + command
|
||||||
|
elif opt == 'LDPRE:':
|
||||||
|
ldopts = args.replace('\n', '')
|
||||||
|
elif opt == 'Variants:':
|
||||||
|
permutations = args.replace('\n', '').replace(' ', '').split(',')
|
||||||
|
else:
|
||||||
|
raise ValueError('Unrecognized option ' + opt)
|
||||||
|
#We use the testdir as the name.
|
||||||
|
testcase = Test(testdir, command, ldopts, permutations)
|
||||||
|
fileopen.close()
|
||||||
|
return testcase
|
||||||
|
|
||||||
|
def parse_testconfig(conffile):
|
||||||
|
"""Parses the config file for the whole testsuite."""
|
||||||
|
repo_path, drop_caches, tests_dir, testlog_dir = '', '', '', ''
|
||||||
|
basebranch, baserev = '', ''
|
||||||
|
fileopen = open(conffile, 'r')
|
||||||
|
for line in fileopen:
|
||||||
|
line = line.split('#')[0] # Discard comments
|
||||||
|
if line == '' or line == '\n':
|
||||||
|
continue # Discard lines with comments only and empty lines
|
||||||
|
opt, args = line.split(' ', 1) # Get arguments
|
||||||
|
if opt == 'MOSES_REPO_PATH:':
|
||||||
|
repo_path = args.replace('\n', '')
|
||||||
|
elif opt == 'DROP_CACHES_COMM:':
|
||||||
|
drop_caches = args.replace('\n', '')
|
||||||
|
elif opt == 'TEST_DIR:':
|
||||||
|
tests_dir = args.replace('\n', '')
|
||||||
|
elif opt == 'TEST_LOG_DIR:':
|
||||||
|
testlog_dir = args.replace('\n', '')
|
||||||
|
elif opt == 'BASEBRANCH:':
|
||||||
|
basebranch = args.replace('\n', '')
|
||||||
|
elif opt == 'BASEREV:':
|
||||||
|
baserev = args.replace('\n', '')
|
||||||
|
else:
|
||||||
|
raise ValueError('Unrecognized option ' + opt)
|
||||||
|
config = Configuration(repo_path, drop_caches, tests_dir, testlog_dir,\
|
||||||
|
basebranch, baserev)
|
||||||
|
fileopen.close()
|
||||||
|
return config
|
||||||
|
|
||||||
|
def get_config():
|
||||||
|
"""Builds the config object with all necessary attributes"""
|
||||||
|
args = parse_cmd()
|
||||||
|
config = parse_testconfig(args.configfile)
|
||||||
|
config.additional_args(args.singletestdir, args.revision, args.branch)
|
||||||
|
revision = repoinit(config)
|
||||||
|
config.set_revision(revision)
|
||||||
|
return config
|
||||||
|
|
||||||
|
def check_for_basever(testlogfile, basebranch):
|
||||||
|
"""Checks if the base revision is present in the testlogs"""
|
||||||
|
filetoopen = open(testlogfile, 'r')
|
||||||
|
for line in filetoopen:
|
||||||
|
templine = processLogLine(line)
|
||||||
|
if templine.branch == basebranch:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
def split_time(filename):
|
||||||
|
"""Splits the output of the time function into seperate parts.
|
||||||
|
We will write time to file, because many programs output to
|
||||||
|
stderr which makes it difficult to get only the exact results we need."""
|
||||||
|
timefile = open(filename, 'r')
|
||||||
|
realtime = float(timefile.readline().replace('\n', '').split()[1])
|
||||||
|
usertime = float(timefile.readline().replace('\n', '').split()[1])
|
||||||
|
systime = float(timefile.readline().replace('\n', '').split()[1])
|
||||||
|
timefile.close()
|
||||||
|
|
||||||
|
return (realtime, usertime, systime)
|
||||||
|
|
||||||
|
|
||||||
|
def write_log(time_file, logname, config):
|
||||||
|
"""Writes to a logfile"""
|
||||||
|
log_write = open(config.testlogs + '/' + logname, 'a') # Open logfile
|
||||||
|
date_run = time.strftime("%d.%m.%Y %H:%M:%S") # Get the time of the test
|
||||||
|
realtime, usertime, systime = split_time(time_file) # Get the times in a nice form
|
||||||
|
|
||||||
|
# Append everything to a log file.
|
||||||
|
writestr = date_run + " " + config.revision + " Testname: " + logname +\
|
||||||
|
" RealTime: " + str(realtime) + " UserTime: " + str(usertime) +\
|
||||||
|
" SystemTime: " + str(systime) + " Branch: " + config.branch +'\n'
|
||||||
|
log_write.write(writestr)
|
||||||
|
log_write.close()
|
||||||
|
|
||||||
|
|
||||||
|
def execute_tests(testcase, cur_directory, config):
|
||||||
|
"""Executes timed tests based on the config file"""
|
||||||
|
#Figure out the order of which tests must be executed.
|
||||||
|
#Change to the current test directory
|
||||||
|
os.chdir(config.tests + '/' + cur_directory)
|
||||||
|
#Clear caches
|
||||||
|
subprocess.call(['sync'], shell=True)
|
||||||
|
subprocess.call([config.drop_caches], shell=True)
|
||||||
|
#Perform vanilla test and if a cached test exists - as well
|
||||||
|
print(testcase.name)
|
||||||
|
if 'vanilla' in testcase.permutations:
|
||||||
|
print(testcase.command)
|
||||||
|
subprocess.Popen(['time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
|
||||||
|
stderr=subprocess.PIPE, shell=True).communicate()
|
||||||
|
write_log('/tmp/time_moses_tests', testcase.name + '_vanilla', config)
|
||||||
|
if 'cached' in testcase.permutations:
|
||||||
|
subprocess.Popen(['time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
|
||||||
|
stderr=None, shell=True).communicate()
|
||||||
|
write_log('/tmp/time_moses_tests', testcase.name + '_vanilla_cached', config)
|
||||||
|
|
||||||
|
#Now perform LD_PRELOAD tests
|
||||||
|
if 'ldpre' in testcase.permutations:
|
||||||
|
for opt in testcase.ldopts:
|
||||||
|
#Clear caches
|
||||||
|
subprocess.call(['sync'], shell=True)
|
||||||
|
subprocess.call([config.drop_caches], shell=True)
|
||||||
|
|
||||||
|
#test
|
||||||
|
subprocess.Popen(['LD_PRELOAD=' + opt + ' time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
|
||||||
|
stderr=None, shell=True).communicate()
|
||||||
|
write_log('/tmp/time_moses_tests', testcase.name + '_ldpre_' + opt, config)
|
||||||
|
if 'cached' in testcase.permutations:
|
||||||
|
subprocess.Popen(['LD_PRELOAD=' + opt + ' time -p -o /tmp/time_moses_tests ' + testcase.command], stdout=None,\
|
||||||
|
stderr=None, shell=True).communicate()
|
||||||
|
write_log('/tmp/time_moses_tests', testcase.name + '_ldpre_' +opt +'_cached', config)
|
||||||
|
|
||||||
|
# Go through all the test directories and execute the tests
|
||||||
|
if __name__ == '__main__':
|
||||||
|
CONFIG = get_config()
|
||||||
|
ALL_DIR = os.listdir(CONFIG.tests)
|
||||||
|
|
||||||
|
#We should first check if any of the tests is run for the first time.
|
||||||
|
#If some of them are run for the first time we should first get their
|
||||||
|
#time with the base version (usually the previous release)
|
||||||
|
FIRSTTIME = []
|
||||||
|
TESTLOGS = []
|
||||||
|
#Strip the variant suffixes (_vanilla, _cached, _ldpre) from the log filenames
|
||||||
|
for listline in os.listdir(CONFIG.testlogs):
|
||||||
|
listline = listline.replace('_vanilla', '')
|
||||||
|
listline = listline.replace('_cached', '')
|
||||||
|
listline = listline.replace('_ldpre', '')
|
||||||
|
TESTLOGS.append(listline)
|
||||||
|
for directory in ALL_DIR:
|
||||||
|
if directory not in TESTLOGS:
|
||||||
|
FIRSTTIME.append(directory)
|
||||||
|
|
||||||
|
#Sometimes even though we have the log files, we will need to rerun them
|
||||||
|
#against a base version, because we require a different base version (for
|
||||||
|
#example when a new version of Moses is released.) Therefore we should
|
||||||
|
#check if the version of Moses that we have as a base version is in all
|
||||||
|
#of the log files.
|
||||||
|
|
||||||
|
for logfile in os.listdir(CONFIG.testlogs):
|
||||||
|
logfile_name = CONFIG.testlogs + '/' + logfile
|
||||||
|
if not check_for_basever(logfile_name, CONFIG.basebranch):
|
||||||
|
logfile = logfile.replace('_vanilla', '')
|
||||||
|
logfile = logfile.replace('_cached', '')
|
||||||
|
logfile = logfile.replace('_ldpre', '')
|
||||||
|
FIRSTTIME.append(logfile)
|
||||||
|
FIRSTTIME = list(set(FIRSTTIME)) #Deduplicate
|
||||||
|
|
||||||
|
if FIRSTTIME != []:
|
||||||
|
#Create a new configuration for base version tests:
|
||||||
|
BASECONFIG = Configuration(CONFIG.repo, CONFIG.drop_caches,\
|
||||||
|
CONFIG.tests, CONFIG.testlogs, CONFIG.basebranch,\
|
||||||
|
CONFIG.baserev)
|
||||||
|
BASECONFIG.additional_args(None, CONFIG.baserev, CONFIG.basebranch)
|
||||||
|
#Set up the repository and get its revision:
|
||||||
|
REVISION = repoinit(BASECONFIG)
|
||||||
|
BASECONFIG.set_revision(REVISION)
|
||||||
|
#Build
|
||||||
|
os.chdir(BASECONFIG.repo)
|
||||||
|
subprocess.call(['./previous.sh'], shell=True)
|
||||||
|
|
||||||
|
#Perform tests
|
||||||
|
for directory in FIRSTTIME:
|
||||||
|
cur_testcase = parse_configfile(BASECONFIG.tests + '/' + directory +\
|
||||||
|
'/config', directory, BASECONFIG.repo)
|
||||||
|
execute_tests(cur_testcase, directory, BASECONFIG)
|
||||||
|
|
||||||
|
#Reset back the repository to the normal configuration
|
||||||
|
repoinit(CONFIG)
|
||||||
|
|
||||||
|
#Builds moses
|
||||||
|
os.chdir(CONFIG.repo)
|
||||||
|
subprocess.call(['./previous.sh'], shell=True)
|
||||||
|
|
||||||
|
if CONFIG.singletest:
|
||||||
|
TESTCASE = parse_configfile(CONFIG.tests + '/' +\
|
||||||
|
CONFIG.singletest + '/config', CONFIG.singletest, CONFIG.repo)
|
||||||
|
execute_tests(TESTCASE, CONFIG.singletest, CONFIG)
|
||||||
|
else:
|
||||||
|
for directory in ALL_DIR:
|
||||||
|
cur_testcase = parse_configfile(CONFIG.tests + '/' + directory +\
|
||||||
|
'/config', directory, CONFIG.repo)
|
||||||
|
execute_tests(cur_testcase, directory, CONFIG)
|
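The LD_PRELOAD permutations in `execute_tests` put the preloaded library in front of the timed command. A minimal sketch of how such a command string could be assembled is shown here. `build_timed_command` is a hypothetical helper, not part of runtests.py, and the example command and library path are made up. It assumes the shell's `VAR=value command` form for setting an environment variable on a single command.

```python
def build_timed_command(command, time_file="/tmp/time_moses_tests", ld_preload=None):
    """Return a shell command string that times `command` with `time -p -o`.

    Hypothetical helper for illustration; name and signature are assumptions.
    """
    timed = "time -p -o " + time_file + " " + command
    if ld_preload:
        # Environment assignments need the VAR=value form in front of the command.
        timed = "LD_PRELOAD=" + ld_preload + " " + timed
    return timed


# Example values below are placeholders, not paths from the repository.
print(build_timed_command("moses -f moses.ini", ld_preload="/usr/lib/libtcmalloc.so"))
```

Keeping the string construction in one place would let the vanilla, cached and ldpre runs share the same timing prefix.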
22
contrib/moses-speedtest/sys_drop_caches.py
Normal file
@ -0,0 +1,22 @@
|
|||||||
|
#!/usr/bin/spython
|
||||||
|
from sys import argv, stderr, exit
|
||||||
|
from os import linesep as ls
|
||||||
|
procfile = "/proc/sys/vm/drop_caches"
|
||||||
|
options = ["1","2","3"]
|
||||||
|
flush_type = None
|
||||||
|
try:
|
||||||
|
flush_type = argv[1][0:1]
|
||||||
|
if not flush_type in options:
|
||||||
|
raise IndexError, "not in options"
|
||||||
|
with open(procfile, "w") as f:
|
||||||
|
f.write("%s%s" % (flush_type,ls))
|
||||||
|
exit(0)
|
||||||
|
except IndexError, e:
|
||||||
|
stderr.write("Argument %s required.%s" % (options, ls))
|
||||||
|
except IOError, e:
|
||||||
|
stderr.write("Error writing to file.%s" % ls)
|
||||||
|
except StandardError, e:
|
||||||
|
stderr.write("Unknown Error.%s" % ls)
|
||||||
|
|
||||||
|
exit(1)
|
||||||
|
|
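The script above accepts a single argument, 1, 2 or 3, and writes it to `/proc/sys/vm/drop_caches` (on Linux, 1 frees the page cache, 2 frees dentries and inodes, 3 frees both). The argument check can be sketched separately from the privileged write. `parse_flush_type` and `VALID_FLUSH_TYPES` are hypothetical names, and the sketch uses Python 3 syntax rather than the Python 2 syntax of the script.

```python
VALID_FLUSH_TYPES = ("1", "2", "3")


def parse_flush_type(argv):
    """Return the flush type given as the first command-line argument.

    Raises ValueError when the argument is missing or not one of 1, 2, 3.
    Hypothetical helper for illustration only.
    """
    if len(argv) < 2 or not argv[1]:
        raise ValueError("argument required, one of %s" % (VALID_FLUSH_TYPES,))
    # Only the first character is considered, as in the script above.
    flush_type = argv[1][0:1]
    if flush_type not in VALID_FLUSH_TYPES:
        raise ValueError("argument must be one of %s" % (VALID_FLUSH_TYPES,))
    return flush_type


print(parse_flush_type(["sys_drop_caches", "3"]))
```

Separating validation from the write makes the check testable without root privileges.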
3
contrib/moses-speedtest/test_config
Normal file
@ -0,0 +1,3 @@
|
|||||||
|
Command: moses -f ... -i fff #Looks for the command in the /bin directory of the repo specified in the testsuite_config
|
||||||
|
LDPRE: ldpreloads #Comma separated LD_LIBRARY_PATH:/,
|
||||||
|
Variants: vanilla, cached, ldpre #Can't have cached without ldpre or vanilla
|
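The per-test config above is a list of `Key: value` lines with optional `#` comments. A minimal sketch of reading that format follows. It is not the `parse_configfile` function that runtests.py calls, whose implementation is not shown in this part of the diff; `parse_test_config` is a hypothetical name.

```python
def parse_test_config(text):
    """Parse 'Key: value #comment' lines into a dict (illustrative sketch)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """Command: moses -f ... -i fff #Looks for the command in the repo
LDPRE: ldpreloads
Variants: vanilla, cached, ldpre
"""
config = parse_test_config(sample)
variants = [v.strip() for v in config["Variants"].split(",")]
print(config["Command"], variants)
```

Splitting on the first colon only keeps values that themselves contain colons intact.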
54
contrib/moses-speedtest/testsuite_common.py
Normal file
@ -0,0 +1,54 @@
|
|||||||
|
"""Common functions of the testsuite"""
|
||||||
|
import os
|
||||||
|
#Colour constants
|
||||||
|
class bcolors:
|
||||||
|
PURPLE = '\033[95m'
|
||||||
|
BLUE = '\033[94m'
|
||||||
|
GREEN = '\033[92m'
|
||||||
|
YELLOW = '\033[93m'
|
||||||
|
RED = '\033[91m'
|
||||||
|
ENDC = '\033[0m'
|
||||||
|
|
||||||
|
class LogLine:
|
||||||
|
"""A class to contain logfile line"""
|
||||||
|
def __init__(self, date, time, revision, testname, real, user, system, branch):
|
||||||
|
self.date = date
|
||||||
|
self.time = time
|
||||||
|
self.revision = revision
|
||||||
|
self.testname = testname
|
||||||
|
self.real = real
|
||||||
|
self.system = system
|
||||||
|
self.user = user
|
||||||
|
self.branch = branch
|
||||||
|
|
||||||
|
class Result:
|
||||||
|
"""A class to contain results of benchmarking"""
|
||||||
|
def __init__(self, testname, previous, current, revision, branch, prevrev, prevbranch):
|
||||||
|
self.testname = testname
|
||||||
|
self.previous = previous
|
||||||
|
self.current = current
|
||||||
|
self.change = previous - current
|
||||||
|
self.revision = revision
|
||||||
|
self.branch = branch
|
||||||
|
self.prevbranch = prevbranch
|
||||||
|
self.prevrev = prevrev
|
||||||
|
#Produce a percentage with fewer digits
|
||||||
|
self.percentage = float(format(1 - current/previous, '.4f'))
|
||||||
|
|
||||||
|
def processLogLine(logline):
|
||||||
|
"""Parses the log line into a nice datastructure"""
|
||||||
|
logline = logline.split()
|
||||||
|
log = LogLine(logline[0], logline[1], logline[2], logline[4],\
|
||||||
|
float(logline[6]), float(logline[8]), float(logline[10]), logline[12])
|
||||||
|
return log
|
||||||
|
|
||||||
|
def getLastTwoLines(filename, logdir):
|
||||||
|
"""Calls tail to get the last two log lines, so the last two runs can be compared"""
|
||||||
|
try:
|
||||||
|
line1, line2 = os.popen("tail -n2 " + logdir + '/' + filename)
|
||||||
|
except ValueError: #Check for new tests
|
||||||
|
tempfile = open(logdir + '/' + filename)
|
||||||
|
line1 = tempfile.readline()
|
||||||
|
tempfile.close()
|
||||||
|
return (line1, '\n')
|
||||||
|
return (line1, line2)
|
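`processLogLine` relies on the token positions of the line that `write_log` produces, and `Result` turns two timings into a rounded relative change. The sketch below walks through both steps on a made-up log line. `parse_log_line` and `relative_change` are hypothetical names, and the sample values are invented for illustration.

```python
def parse_log_line(line):
    """Split a log line into named fields (mirrors the indices used by processLogLine)."""
    tokens = line.split()
    return {
        "date": tokens[0],
        "time": tokens[1],
        "revision": tokens[2],
        "testname": tokens[4],    # tokens[3] is the label "Testname:"
        "real": float(tokens[6]),
        "user": float(tokens[8]),
        "system": float(tokens[10]),
        "branch": tokens[12],
    }


def relative_change(previous, current):
    """Fraction of time saved relative to the previous run, rounded to 4 places."""
    return float(format(1 - current / previous, ".4f"))


# Made-up sample values, for illustration only.
sample = ("01.01.2014 12:00:00 abc1234 Testname: demo_vanilla RealTime: 10.0 "
          "UserTime: 8.5 SystemTime: 1.5 Branch: master")
entry = parse_log_line(sample)
print(entry["testname"], entry["real"], relative_change(10.0, 8.0))
```

A positive value means the current run was faster than the previous one, and a negative value means it was slower.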
5
contrib/moses-speedtest/testsuite_config
Normal file
@ -0,0 +1,5 @@
|
|||||||
|
MOSES_REPO_PATH: /home/moses-speedtest/moses-standard/mosesdecoder
|
||||||
|
DROP_CACHES_COMM: sys_drop_caches 3
|
||||||
|
TEST_DIR: /home/moses-speedtest/phrase_tables/tests
|
||||||
|
TEST_LOG_DIR: /home/moses-speedtest/phrase_tables/testlogs
|
||||||
|
BASEBRANCH: RELEASE-2.1.1
|
132
contrib/other-builds/consolidate/.cproject
Normal file
@ -0,0 +1,132 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
|
<?fileVersion 4.0.0?><cproject storage_type_id="org.eclipse.cdt.core.XmlProjectDescriptionStorage">
|
||||||
|
<storageModule moduleId="org.eclipse.cdt.core.settings">
|
||||||
|
<cconfiguration id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686">
|
||||||
|
<storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686" moduleId="org.eclipse.cdt.core.settings" name="Debug">
|
||||||
|
<externalSettings/>
|
||||||
|
<extensions>
|
||||||
|
<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GASErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GLDErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
|
||||||
|
</extensions>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
|
||||||
|
<configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.debug,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686" name="Debug" parent="cdt.managedbuild.config.gnu.cross.exe.debug">
|
||||||
|
<folderInfo id="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686." name="/" resourcePath="">
|
||||||
|
<toolChain id="cdt.managedbuild.toolchain.gnu.cross.exe.debug.1312813804" name="Cross GCC" superClass="cdt.managedbuild.toolchain.gnu.cross.exe.debug">
|
||||||
|
<targetPlatform archList="all" binaryParser="org.eclipse.cdt.core.ELF" id="cdt.managedbuild.targetPlatform.gnu.cross.1457158442" isAbstract="false" osList="all" superClass="cdt.managedbuild.targetPlatform.gnu.cross"/>
|
||||||
|
<builder buildPath="${workspace_loc:/consolidate}/Debug" id="cdt.managedbuild.builder.gnu.cross.401817170" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.builder.gnu.cross"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.c.compiler.584773180" name="Cross GCC Compiler" superClass="cdt.managedbuild.tool.gnu.cross.c.compiler">
|
||||||
|
<option defaultValue="gnu.c.optimization.level.none" id="gnu.c.compiler.option.optimization.level.548826159" name="Optimization Level" superClass="gnu.c.compiler.option.optimization.level" valueType="enumerated"/>
|
||||||
|
<option id="gnu.c.compiler.option.debugging.level.69309976" name="Debug Level" superClass="gnu.c.compiler.option.debugging.level" value="gnu.c.debugging.level.max" valueType="enumerated"/>
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1869389417" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.cpp.compiler.1684035985" name="Cross G++ Compiler" superClass="cdt.managedbuild.tool.gnu.cross.cpp.compiler">
|
||||||
|
<option id="gnu.cpp.compiler.option.optimization.level.1978964587" name="Optimization Level" superClass="gnu.cpp.compiler.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
|
||||||
|
<option id="gnu.cpp.compiler.option.debugging.level.1174628687" name="Debug Level" superClass="gnu.cpp.compiler.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
|
||||||
|
<option id="gnu.cpp.compiler.option.include.paths.1899244069" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
|
||||||
|
<listOptionValue builtIn="false" value=""${workspace_loc}/../../boost/include""/>
|
||||||
|
</option>
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1369007077" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.c.linker.988122551" name="Cross GCC Linker" superClass="cdt.managedbuild.tool.gnu.cross.c.linker"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.cpp.linker.580092188" name="Cross G++ Linker" superClass="cdt.managedbuild.tool.gnu.cross.cpp.linker">
|
||||||
|
<option id="gnu.cpp.link.option.libs.1224797947" name="Libraries (-l)" superClass="gnu.cpp.link.option.libs" valueType="libs">
|
||||||
|
<listOptionValue builtIn="false" value="z"/>
|
||||||
|
<listOptionValue builtIn="false" value="boost_iostreams-mt"/>
|
||||||
|
</option>
|
||||||
|
<option id="gnu.cpp.link.option.paths.845281969" superClass="gnu.cpp.link.option.paths" valueType="libPaths">
|
||||||
|
<listOptionValue builtIn="false" value=""${workspace_loc:}/../../boost/lib64""/>
|
||||||
|
</option>
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.1562981657" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
|
||||||
|
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
|
||||||
|
<additionalInput kind="additionalinput" paths="$(LIBS)"/>
|
||||||
|
</inputType>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.archiver.1813579853" name="Cross GCC Archiver" superClass="cdt.managedbuild.tool.gnu.cross.archiver"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.assembler.660034723" name="Cross GCC Assembler" superClass="cdt.managedbuild.tool.gnu.cross.assembler">
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.assembler.input.2016181080" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
|
||||||
|
</tool>
|
||||||
|
</toolChain>
|
||||||
|
</folderInfo>
|
||||||
|
</configuration>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
|
||||||
|
</cconfiguration>
|
||||||
|
<cconfiguration id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473">
|
||||||
|
<storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473" moduleId="org.eclipse.cdt.core.settings" name="Release">
|
||||||
|
<externalSettings/>
|
||||||
|
<extensions>
|
||||||
|
<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GASErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.GLDErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
|
||||||
|
<extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
|
||||||
|
</extensions>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
|
||||||
|
<configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.release,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473" name="Release" parent="cdt.managedbuild.config.gnu.cross.exe.release">
|
||||||
|
<folderInfo id="cdt.managedbuild.config.gnu.cross.exe.release.1197533473." name="/" resourcePath="">
|
||||||
|
<toolChain id="cdt.managedbuild.toolchain.gnu.cross.exe.release.1193312581" name="Cross GCC" superClass="cdt.managedbuild.toolchain.gnu.cross.exe.release">
|
||||||
|
<targetPlatform archList="all" binaryParser="org.eclipse.cdt.core.ELF" id="cdt.managedbuild.targetPlatform.gnu.cross.1614674218" isAbstract="false" osList="all" superClass="cdt.managedbuild.targetPlatform.gnu.cross"/>
|
||||||
|
<builder buildPath="${workspace_loc:/consolidate}/Release" id="cdt.managedbuild.builder.gnu.cross.1921548268" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.builder.gnu.cross"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.c.compiler.1402792534" name="Cross GCC Compiler" superClass="cdt.managedbuild.tool.gnu.cross.c.compiler">
|
||||||
|
<option defaultValue="gnu.c.optimization.level.most" id="gnu.c.compiler.option.optimization.level.172258714" name="Optimization Level" superClass="gnu.c.compiler.option.optimization.level" valueType="enumerated"/>
|
||||||
|
<option id="gnu.c.compiler.option.debugging.level.949623548" name="Debug Level" superClass="gnu.c.compiler.option.debugging.level" value="gnu.c.debugging.level.none" valueType="enumerated"/>
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1960225725" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.cpp.compiler.1697856596" name="Cross G++ Compiler" superClass="cdt.managedbuild.tool.gnu.cross.cpp.compiler">
|
||||||
|
<option id="gnu.cpp.compiler.option.optimization.level.1575999400" name="Optimization Level" superClass="gnu.cpp.compiler.option.optimization.level" value="gnu.cpp.compiler.optimization.level.most" valueType="enumerated"/>
|
||||||
|
<option id="gnu.cpp.compiler.option.debugging.level.732263649" name="Debug Level" superClass="gnu.cpp.compiler.option.debugging.level" value="gnu.cpp.compiler.debugging.level.none" valueType="enumerated"/>
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1685852561" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.c.linker.1332869586" name="Cross GCC Linker" superClass="cdt.managedbuild.tool.gnu.cross.c.linker"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.cpp.linker.484647585" name="Cross G++ Linker" superClass="cdt.managedbuild.tool.gnu.cross.cpp.linker">
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.2140954002" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
|
||||||
|
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
|
||||||
|
<additionalInput kind="additionalinput" paths="$(LIBS)"/>
|
||||||
|
</inputType>
|
||||||
|
</tool>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.archiver.620666274" name="Cross GCC Archiver" superClass="cdt.managedbuild.tool.gnu.cross.archiver"/>
|
||||||
|
<tool id="cdt.managedbuild.tool.gnu.cross.assembler.1478840357" name="Cross GCC Assembler" superClass="cdt.managedbuild.tool.gnu.cross.assembler">
|
||||||
|
<inputType id="cdt.managedbuild.tool.gnu.assembler.input.412043972" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
|
||||||
|
</tool>
|
||||||
|
</toolChain>
|
||||||
|
</folderInfo>
|
||||||
|
</configuration>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
|
||||||
|
</cconfiguration>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
|
||||||
|
<project id="consolidate.cdt.managedbuild.target.gnu.cross.exe.1166003694" name="Executable" projectType="cdt.managedbuild.target.gnu.cross.exe"/>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="scannerConfiguration">
|
||||||
|
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId=""/>
|
||||||
|
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686;cdt.managedbuild.config.gnu.cross.exe.debug.1847651686.;cdt.managedbuild.tool.gnu.cross.c.compiler.584773180;cdt.managedbuild.tool.gnu.c.compiler.input.1869389417">
|
||||||
|
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
|
||||||
|
</scannerConfigBuildInfo>
|
||||||
|
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.release.1197533473;cdt.managedbuild.config.gnu.cross.exe.release.1197533473.;cdt.managedbuild.tool.gnu.cross.cpp.compiler.1697856596;cdt.managedbuild.tool.gnu.cpp.compiler.input.1685852561">
|
||||||
|
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
|
||||||
|
</scannerConfigBuildInfo>
|
||||||
|
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.debug.1847651686;cdt.managedbuild.config.gnu.cross.exe.debug.1847651686.;cdt.managedbuild.tool.gnu.cross.cpp.compiler.1684035985;cdt.managedbuild.tool.gnu.cpp.compiler.input.1369007077">
|
||||||
|
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
|
||||||
|
</scannerConfigBuildInfo>
|
||||||
|
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.cross.exe.release.1197533473;cdt.managedbuild.config.gnu.cross.exe.release.1197533473.;cdt.managedbuild.tool.gnu.cross.c.compiler.1402792534;cdt.managedbuild.tool.gnu.c.compiler.input.1960225725">
|
||||||
|
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
|
||||||
|
</scannerConfigBuildInfo>
|
||||||
|
</storageModule>
|
||||||
|
<storageModule moduleId="org.eclipse.cdt.core.LanguageSettingsProviders"/>
|
||||||
|
<storageModule moduleId="refreshScope" versionNumber="2">
|
||||||
|
<configuration configurationName="Release">
|
||||||
|
<resource resourceType="PROJECT" workspacePath="/consolidate"/>
|
||||||
|
</configuration>
|
||||||
|
<configuration configurationName="Debug">
|
||||||
|
<resource resourceType="PROJECT" workspacePath="/consolidate"/>
|
||||||
|
</configuration>
|
||||||
|
</storageModule>
|
||||||
|
</cproject>
|
64
contrib/other-builds/consolidate/.project
Normal file
@ -0,0 +1,64 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<projectDescription>
|
||||||
|
<name>consolidate</name>
|
||||||
|
<comment></comment>
|
||||||
|
<projects>
|
||||||
|
</projects>
|
||||||
|
<buildSpec>
|
||||||
|
<buildCommand>
|
||||||
|
<name>org.eclipse.cdt.managedbuilder.core.genmakebuilder</name>
|
||||||
|
<triggers>clean,full,incremental,</triggers>
|
||||||
|
<arguments>
|
||||||
|
</arguments>
|
||||||
|
</buildCommand>
|
||||||
|
<buildCommand>
|
||||||
|
<name>org.eclipse.cdt.managedbuilder.core.ScannerConfigBuilder</name>
|
||||||
|
<triggers>full,incremental,</triggers>
|
||||||
|
<arguments>
|
||||||
|
</arguments>
|
||||||
|
</buildCommand>
|
||||||
|
</buildSpec>
|
||||||
|
<natures>
|
||||||
|
<nature>org.eclipse.cdt.core.cnature</nature>
|
||||||
|
<nature>org.eclipse.cdt.core.ccnature</nature>
|
||||||
|
<nature>org.eclipse.cdt.managedbuilder.core.managedBuildNature</nature>
|
||||||
|
<nature>org.eclipse.cdt.managedbuilder.core.ScannerConfigNature</nature>
|
||||||
|
</natures>
|
||||||
|
<linkedResources>
|
||||||
|
<link>
|
||||||
|
<name>InputFileStream.cpp</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/InputFileStream.cpp</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>InputFileStream.h</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/InputFileStream.h</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>OutputFileStream.cpp</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/OutputFileStream.cpp</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>OutputFileStream.h</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/OutputFileStream.h</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>consolidate-main.cpp</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/consolidate-main.cpp</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>tables-core.cpp</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/tables-core.cpp</locationURI>
|
||||||
|
</link>
|
||||||
|
<link>
|
||||||
|
<name>tables-core.h</name>
|
||||||
|
<type>1</type>
|
||||||
|
<locationURI>PARENT-3-PROJECT_LOC/phrase-extract/tables-core.h</locationURI>
|
||||||
|
</link>
|
||||||
|
</linkedResources>
|
||||||
|
</projectDescription>
|
@ -42,9 +42,11 @@
|
|||||||
</option>
|
</option>
|
||||||
<option id="gnu.cpp.link.option.libs.585257079" name="Libraries (-l)" superClass="gnu.cpp.link.option.libs" valueType="libs">
|
<option id="gnu.cpp.link.option.libs.585257079" name="Libraries (-l)" superClass="gnu.cpp.link.option.libs" valueType="libs">
|
||||||
<listOptionValue builtIn="false" value="mert_lib"/>
|
<listOptionValue builtIn="false" value="mert_lib"/>
|
||||||
<listOptionValue builtIn="false" value="boost_system-mt"/>
|
|
||||||
<listOptionValue builtIn="false" value="util"/>
|
<listOptionValue builtIn="false" value="util"/>
|
||||||
|
<listOptionValue builtIn="false" value="boost_system-mt"/>
|
||||||
|
<listOptionValue builtIn="false" value="boost_thread-mt"/>
|
||||||
<listOptionValue builtIn="false" value="z"/>
|
<listOptionValue builtIn="false" value="z"/>
|
||||||
|
<listOptionValue builtIn="false" value="pthread"/>
|
||||||
</option>
|
</option>
|
||||||
<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.656319745" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
|
<inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.656319745" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
|
||||||
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
|
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
|
||||||
|
@ -4,6 +4,7 @@
|
|||||||
<comment></comment>
|
<comment></comment>
|
||||||
<projects>
|
<projects>
|
||||||
<project>mert_lib</project>
|
<project>mert_lib</project>
|
||||||
|
<project>util</project>
|
||||||
</projects>
|
</projects>
|
||||||
<buildSpec>
|
<buildSpec>
|
||||||
<buildCommand>
|
<buildCommand>
|
||||||
|
@ -125,7 +125,7 @@ void ChartManager::ProcessSentence()
|
|||||||
*/
|
*/
|
||||||
void ChartManager::AddXmlChartOptions()
|
void ChartManager::AddXmlChartOptions()
|
||||||
{
|
{
|
||||||
const StaticData &staticData = StaticData::Instance();
|
// const StaticData &staticData = StaticData::Instance();
|
||||||
|
|
||||||
const std::vector <ChartTranslationOptions*> xmlChartOptionsList = m_source.GetXmlChartTranslationOptions();
|
const std::vector <ChartTranslationOptions*> xmlChartOptionsList = m_source.GetXmlChartTranslationOptions();
|
||||||
IFVERBOSE(2) {
|
IFVERBOSE(2) {
|
||||||
|
@ -142,7 +142,7 @@ namespace Moses
|
|||||||
{
|
{
|
||||||
Clear();
|
Clear();
|
||||||
|
|
||||||
const StaticData &staticData = StaticData::Instance();
|
// const StaticData &staticData = StaticData::Instance();
|
||||||
const InputFeature &inputFeature = InputFeature::Instance();
|
const InputFeature &inputFeature = InputFeature::Instance();
|
||||||
size_t numInputScores = inputFeature.GetNumInputScores();
|
size_t numInputScores = inputFeature.GetNumInputScores();
|
||||||
size_t numRealWordCount = inputFeature.GetNumRealWordsInInput();
|
size_t numRealWordCount = inputFeature.GetNumRealWordsInInput();
|
||||||
|
@ -85,7 +85,7 @@ size_t InputPath::GetTotalRuleSize() const
|
|||||||
size_t ret = 0;
|
size_t ret = 0;
|
||||||
std::map<const PhraseDictionary*, std::pair<const TargetPhraseCollection*, const void*> >::const_iterator iter;
|
std::map<const PhraseDictionary*, std::pair<const TargetPhraseCollection*, const void*> >::const_iterator iter;
|
||||||
for (iter = m_targetPhrases.begin(); iter != m_targetPhrases.end(); ++iter) {
|
for (iter = m_targetPhrases.begin(); iter != m_targetPhrases.end(); ++iter) {
|
||||||
const PhraseDictionary *pt = iter->first;
|
// const PhraseDictionary *pt = iter->first;
|
||||||
const TargetPhraseCollection *tpColl = iter->second.first;
|
const TargetPhraseCollection *tpColl = iter->second.first;
|
||||||
|
|
||||||
if (tpColl) {
|
if (tpColl) {
|
||||||
|
@ -15,7 +15,7 @@ public:
|
|||||||
|
|
||||||
virtual void ProcessValue() {};
|
virtual void ProcessValue() {};
|
||||||
|
|
||||||
const std::string &GetValueString() { return m_value; };
|
const std::string &GetValueString() const { return m_value; };
|
||||||
|
|
||||||
protected:
|
protected:
|
||||||
|
|
||||||
|
@ -47,8 +47,8 @@ class WordsRange;
|
|||||||
class Phrase
|
class Phrase
|
||||||
{
|
{
|
||||||
friend std::ostream& operator<<(std::ostream&, const Phrase&);
|
friend std::ostream& operator<<(std::ostream&, const Phrase&);
|
||||||
private:
|
// private:
|
||||||
|
protected:
|
||||||
std::vector<Word> m_words;
|
std::vector<Word> m_words;
|
||||||
|
|
||||||
public:
|
public:
|
||||||
|
@ -494,7 +494,8 @@ bool StaticData::LoadData(Parameter *parameter)
|
|||||||
}
|
}
|
||||||
m_xmlBrackets.first= brackets[0];
|
m_xmlBrackets.first= brackets[0];
|
||||||
m_xmlBrackets.second=brackets[1];
|
m_xmlBrackets.second=brackets[1];
|
||||||
cerr << "XML tags opening and closing brackets for XML input are: " << m_xmlBrackets.first << " and " << m_xmlBrackets.second << endl;
|
VERBOSE(1,"XML tags opening and closing brackets for XML input are: "
|
||||||
|
<< m_xmlBrackets.first << " and " << m_xmlBrackets.second << endl);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (m_parameter->GetParam("placeholder-factor").size() > 0) {
|
if (m_parameter->GetParam("placeholder-factor").size() > 0) {
|
||||||
@ -511,7 +512,7 @@ bool StaticData::LoadData(Parameter *parameter)
|
|||||||
const vector<string> &features = m_parameter->GetParam("feature");
|
const vector<string> &features = m_parameter->GetParam("feature");
|
||||||
for (size_t i = 0; i < features.size(); ++i) {
|
for (size_t i = 0; i < features.size(); ++i) {
|
||||||
const string &line = Trim(features[i]);
|
const string &line = Trim(features[i]);
|
||||||
cerr << "line=" << line << endl;
|
VERBOSE(1,"line=" << line << endl);
|
||||||
if (line.empty())
|
if (line.empty())
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
@ -535,7 +536,9 @@ bool StaticData::LoadData(Parameter *parameter)
|
|||||||
NoCache();
|
NoCache();
|
||||||
OverrideFeatures();
|
OverrideFeatures();
|
||||||
|
|
||||||
|
if (!m_parameter->isParamSpecified("show-weights")) {
|
||||||
LoadFeatureFunctions();
|
LoadFeatureFunctions();
|
||||||
|
}
|
||||||
|
|
||||||
if (!LoadDecodeGraphs()) return false;
|
if (!LoadDecodeGraphs()) return false;
|
||||||
|
|
||||||
@ -640,7 +643,8 @@ void StaticData::LoadNonTerminals()
|
|||||||
"Incorrect unknown LHS format: " << line);
|
"Incorrect unknown LHS format: " << line);
|
||||||
UnknownLHSEntry entry(tokens[0], Scan<float>(tokens[1]));
|
UnknownLHSEntry entry(tokens[0], Scan<float>(tokens[1]));
|
||||||
m_unknownLHS.push_back(entry);
|
m_unknownLHS.push_back(entry);
|
||||||
const Factor *targetFactor = factorCollection.AddFactor(Output, 0, tokens[0], true);
|
// const Factor *targetFactor =
|
||||||
|
factorCollection.AddFactor(Output, 0, tokens[0], true);
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
@ -734,7 +738,7 @@ bool StaticData::LoadDecodeGraphs()
|
|||||||
DecodeGraph *decodeGraph;
|
DecodeGraph *decodeGraph;
|
||||||
if (IsChart()) {
|
if (IsChart()) {
|
||||||
size_t maxChartSpan = (decodeGraphInd < maxChartSpans.size()) ? maxChartSpans[decodeGraphInd] : DEFAULT_MAX_CHART_SPAN;
|
size_t maxChartSpan = (decodeGraphInd < maxChartSpans.size()) ? maxChartSpans[decodeGraphInd] : DEFAULT_MAX_CHART_SPAN;
|
||||||
cerr << "max-chart-span: " << maxChartSpans[decodeGraphInd] << endl;
|
VERBOSE(1,"max-chart-span: " << maxChartSpans[decodeGraphInd] << endl);
|
||||||
decodeGraph = new DecodeGraph(m_decodeGraphs.size(), maxChartSpan);
|
decodeGraph = new DecodeGraph(m_decodeGraphs.size(), maxChartSpan);
|
||||||
} else {
|
} else {
|
||||||
decodeGraph = new DecodeGraph(m_decodeGraphs.size());
|
decodeGraph = new DecodeGraph(m_decodeGraphs.size());
|
||||||
@ -866,7 +870,7 @@ void StaticData::SetExecPath(const std::string &path)
|
|||||||
if (pos != string::npos) {
|
if (pos != string::npos) {
|
||||||
m_binPath = path.substr(0, pos);
|
m_binPath = path.substr(0, pos);
|
||||||
}
|
}
|
||||||
cerr << m_binPath << endl;
|
VERBOSE(1,m_binPath << endl);
|
||||||
}
|
}
|
||||||
|
|
||||||
const string &StaticData::GetBinDirectory() const
|
const string &StaticData::GetBinDirectory() const
|
||||||
@ -920,7 +924,8 @@ void StaticData::LoadFeatureFunctions()
|
|||||||
FeatureFunction *ff = *iter;
|
FeatureFunction *ff = *iter;
|
||||||
bool doLoad = true;
|
bool doLoad = true;
|
||||||
|
|
||||||
if (PhraseDictionary *ffCast = dynamic_cast<PhraseDictionary*>(ff)) {
|
// if (PhraseDictionary *ffCast = dynamic_cast<PhraseDictionary*>(ff)) {
|
||||||
|
if (dynamic_cast<PhraseDictionary*>(ff)) {
|
||||||
doLoad = false;
|
doLoad = false;
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -964,7 +969,7 @@ bool StaticData::CheckWeights() const
|
|||||||
set<string>::iterator iter;
|
set<string>::iterator iter;
|
||||||
for (iter = weightNames.begin(); iter != weightNames.end(); ) {
|
for (iter = weightNames.begin(); iter != weightNames.end(); ) {
|
||||||
string fname = (*iter).substr(0, (*iter).find("_"));
|
string fname = (*iter).substr(0, (*iter).find("_"));
|
||||||
cerr << fname << "\n";
|
VERBOSE(1,fname << "\n");
|
||||||
if (featureNames.find(fname) != featureNames.end()) {
|
if (featureNames.find(fname) != featureNames.end()) {
|
||||||
weightNames.erase(iter++);
|
weightNames.erase(iter++);
|
||||||
}
|
}
|
||||||
@ -1039,7 +1044,7 @@ bool StaticData::LoadAlternateWeightSettings()
|
|||||||
vector<string> tokens = Tokenize(weightSpecification[i]);
|
vector<string> tokens = Tokenize(weightSpecification[i]);
|
||||||
vector<string> args = Tokenize(tokens[0], "=");
|
vector<string> args = Tokenize(tokens[0], "=");
|
||||||
currentId = args[1];
|
currentId = args[1];
|
||||||
cerr << "alternate weight setting " << currentId << endl;
|
VERBOSE(1,"alternate weight setting " << currentId << endl);
|
||||||
UTIL_THROW_IF2(m_weightSetting.find(currentId) != m_weightSetting.end(),
|
UTIL_THROW_IF2(m_weightSetting.find(currentId) != m_weightSetting.end(),
|
||||||
"Duplicate alternate weight id: " << currentId);
|
"Duplicate alternate weight id: " << currentId);
|
||||||
m_weightSetting[ currentId ] = new ScoreComponentCollection;
|
m_weightSetting[ currentId ] = new ScoreComponentCollection;
|
||||||
|
@ -44,6 +44,12 @@ public:
|
|||||||
typedef CollType::iterator iterator;
|
typedef CollType::iterator iterator;
|
||||||
typedef CollType::const_iterator const_iterator;
|
typedef CollType::const_iterator const_iterator;
|
||||||
|
|
||||||
|
TargetPhrase const*
|
||||||
|
operator[](size_t const i) const
|
||||||
|
{
|
||||||
|
return m_collection.at(i);
|
||||||
|
}
|
||||||
|
|
||||||
iterator begin() {
|
iterator begin() {
|
||||||
return m_collection.begin();
|
return m_collection.begin();
|
||||||
}
|
}
|
||||||
|
@ -17,12 +17,8 @@ License along with this library; if not, write to the Free Software
|
|||||||
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
|
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
|
||||||
***********************************************************************/
|
***********************************************************************/
|
||||||
#include "util/exception.hh"
|
#include "util/exception.hh"
|
||||||
|
|
||||||
#include "moses/TranslationModel/PhraseDictionaryMultiModelCounts.h"
|
#include "moses/TranslationModel/PhraseDictionaryMultiModelCounts.h"
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 100000
|
|
||||||
#include "phrase-extract/SafeGetline.h" // for SAFE_GETLINE()
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
template<typename T>
|
template<typename T>
|
||||||
@ -461,16 +457,14 @@ void PhraseDictionaryMultiModelCounts::LoadLexicalTable( string &fileName, lexic
|
|||||||
}
|
}
|
||||||
istream *inFileP = &inFile;
|
istream *inFileP = &inFile;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
|
||||||
|
|
||||||
int i=0;
|
int i=0;
|
||||||
while(true) {
|
string line;
|
||||||
|
|
||||||
|
while(getline(*inFileP, line)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%100000 == 0) cerr << "." << flush;
|
if (i%100000 == 0) cerr << "." << flush;
|
||||||
SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (inFileP->eof()) break;
|
|
||||||
|
|
||||||
vector<string> token = tokenize( line );
|
vector<string> token = tokenize( line.c_str() );
|
||||||
if (token.size() != 4) {
|
if (token.size() != 4) {
|
||||||
cerr << "line " << i << " in " << fileName
|
cerr << "line " << i << " in " << fileName
|
||||||
<< " has wrong number of tokens, skipping:\n"
|
<< " has wrong number of tokens, skipping:\n"
|
||||||
|
@ -9,6 +9,17 @@ $(TOP)/moses/TranslationModel/UG//mmsapt
|
|||||||
$(TOP)/util//kenutil
|
$(TOP)/util//kenutil
|
||||||
;
|
;
|
||||||
|
|
||||||
|
exe lookup_mmsapt :
|
||||||
|
lookup_mmsapt.cc
|
||||||
|
$(TOP)/moses//moses
|
||||||
|
$(TOP)/moses/TranslationModel/UG/generic//generic
|
||||||
|
$(TOP)//boost_iostreams
|
||||||
|
$(TOP)//boost_program_options
|
||||||
|
$(TOP)/moses/TranslationModel/UG/mm//mm
|
||||||
|
$(TOP)/moses/TranslationModel/UG//mmsapt
|
||||||
|
$(TOP)/util//kenutil
|
||||||
|
;
|
||||||
|
|
||||||
install $(PREFIX)/bin : try-align ;
|
install $(PREFIX)/bin : try-align ;
|
||||||
|
|
||||||
fakelib mmsapt : [ glob *.cpp mmsapt*.cc ] ;
|
fakelib mmsapt : [ glob *.cpp mmsapt*.cc ] ;
|
||||||
|
76
moses/TranslationModel/UG/lookup_mmsapt.cc
Normal file
@ -0,0 +1,76 @@
|
|||||||
|
#include "mmsapt.h"
|
||||||
|
#include <boost/foreach.hpp>
|
||||||
|
#include <boost/tokenizer.hpp>
|
||||||
|
#include <boost/shared_ptr.hpp>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <iostream>
|
||||||
|
|
||||||
|
using namespace Moses;
|
||||||
|
using namespace bitext;
|
||||||
|
using namespace std;
|
||||||
|
using namespace boost;
|
||||||
|
|
||||||
|
vector<FactorType> fo(1,FactorType(0));
|
||||||
|
|
||||||
|
class SimplePhrase : public Moses::Phrase
|
||||||
|
{
|
||||||
|
vector<FactorType> const m_fo; // factor order
|
||||||
|
public:
|
||||||
|
SimplePhrase(): m_fo(1,FactorType(0)) {}
|
||||||
|
|
||||||
|
void init(string const& s)
|
||||||
|
{
|
||||||
|
istringstream buf(s); string w;
|
||||||
|
while (buf >> w)
|
||||||
|
{
|
||||||
|
Word wrd;
|
||||||
|
this->AddWord().CreateFromString(Input,m_fo,StringPiece(w),false,false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
class TargetPhraseIndexSorter
|
||||||
|
{
|
||||||
|
TargetPhraseCollection const& my_tpc;
|
||||||
|
CompareTargetPhrase cmp;
|
||||||
|
public:
|
||||||
|
TargetPhraseIndexSorter(TargetPhraseCollection const& tpc) : my_tpc(tpc) {}
|
||||||
|
bool operator()(size_t a, size_t b) const
|
||||||
|
{
|
||||||
|
return cmp(*my_tpc[a], *my_tpc[b]);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
int main(int argc, char* argv[])
|
||||||
|
{
|
||||||
|
Parameter params;
|
||||||
|
if (!params.LoadParam(argc,argv) || !StaticData::LoadDataStatic(&params, argv[0]))
|
||||||
|
exit(1);
|
||||||
|
|
||||||
|
Mmsapt* PT;
|
||||||
|
BOOST_FOREACH(PhraseDictionary* pd, PhraseDictionary::GetColl())
|
||||||
|
if ((PT = dynamic_cast<Mmsapt*>(pd))) break;
|
||||||
|
|
||||||
|
string line;
|
||||||
|
while (getline(cin,line))
|
||||||
|
{
|
||||||
|
SimplePhrase p; p.init(line);
|
||||||
|
cout << p << endl;
|
||||||
|
TargetPhraseCollection const* trg = PT->GetTargetPhraseCollectionLEGACY(p);
|
||||||
|
if (!trg) continue;
|
||||||
|
vector<size_t> order(trg->GetSize());
|
||||||
|
for (size_t i = 0; i < order.size(); ++i) order[i] = i;
|
||||||
|
sort(order.begin(),order.end(),TargetPhraseIndexSorter(*trg));
|
||||||
|
size_t k = 0;
|
||||||
|
BOOST_FOREACH(size_t i, order)
|
||||||
|
{
|
||||||
|
Phrase const& phr = static_cast<Phrase const&>(*(*trg)[i]);
|
||||||
|
cout << setw(3) << ++k << " " << phr << endl;
|
||||||
|
}
|
||||||
|
PT->Release(trg);
|
||||||
|
}
|
||||||
|
exit(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -131,7 +131,7 @@ interpret_args(int ac, char* av[])
|
|||||||
o.add_options()
|
o.add_options()
|
||||||
("help,h", "print this message")
|
("help,h", "print this message")
|
||||||
("source,s",po::value<string>(&swrd),"source word")
|
("source,s",po::value<string>(&swrd),"source word")
|
||||||
("target,t",po::value<string>(&swrd),"target word")
|
("target,t",po::value<string>(&twrd),"target word")
|
||||||
;
|
;
|
||||||
|
|
||||||
h.add_options()
|
h.add_options()
|
||||||
|
@ -318,10 +318,10 @@ namespace Moses {
|
|||||||
assert(pp.sample1);
|
assert(pp.sample1);
|
||||||
assert(pp.joint);
|
assert(pp.joint);
|
||||||
assert(pp.raw2);
|
assert(pp.raw2);
|
||||||
(*dest)[i] = log(pp.raw1);
|
(*dest)[i] = -log(pp.raw1);
|
||||||
(*dest)[++i] = log(pp.sample1);
|
(*dest)[++i] = -log(pp.sample1);
|
||||||
(*dest)[++i] = log(pp.joint);
|
(*dest)[++i] = +log(pp.joint);
|
||||||
(*dest)[++i] = log(pp.raw2);
|
(*dest)[++i] = -log(pp.raw2);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -592,6 +592,7 @@ namespace Moses {
|
|||||||
friend class agenda;
|
friend class agenda;
|
||||||
boost::taus88 rnd; // every job has its own pseudo random generator
|
boost::taus88 rnd; // every job has its own pseudo random generator
|
||||||
double rnddenom; // denominator for scaling random sampling
|
double rnddenom; // denominator for scaling random sampling
|
||||||
|
size_t min_diverse; // minimum number of distinct translations
|
||||||
public:
|
public:
|
||||||
size_t workers; // how many workers are working on this job?
|
size_t workers; // how many workers are working on this job?
|
||||||
sptr<TSA<Token> const> root; // root of the underlying suffix array
|
sptr<TSA<Token> const> root; // root of the underlying suffix array
|
||||||
@ -644,34 +645,47 @@ namespace Moses {
|
|||||||
step(uint64_t & sid, uint64_t & offset)
|
step(uint64_t & sid, uint64_t & offset)
|
||||||
{
|
{
|
||||||
boost::lock_guard<boost::mutex> jguard(lock);
|
boost::lock_guard<boost::mutex> jguard(lock);
|
||||||
if ((max_samples == 0) && (next < stop))
|
bool ret = (max_samples == 0) && (next < stop);
|
||||||
|
if (ret)
|
||||||
{
|
{
|
||||||
next = root->readSid(next,stop,sid);
|
next = root->readSid(next,stop,sid);
|
||||||
next = root->readOffset(next,stop,offset);
|
next = root->readOffset(next,stop,offset);
|
||||||
boost::lock_guard<boost::mutex> sguard(stats->lock);
|
boost::lock_guard<boost::mutex> sguard(stats->lock);
|
||||||
if (stats->raw_cnt == ctr) ++stats->raw_cnt;
|
if (stats->raw_cnt == ctr) ++stats->raw_cnt;
|
||||||
stats->sample_cnt++;
|
stats->sample_cnt++;
|
||||||
return true;
|
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
while (next < stop && stats->good < max_samples)
|
while (next < stop && (stats->good < max_samples ||
|
||||||
|
stats->trg.size() < min_diverse))
|
||||||
{
|
{
|
||||||
next = root->readSid(next,stop,sid);
|
next = root->readSid(next,stop,sid);
|
||||||
next = root->readOffset(next,stop,offset);
|
next = root->readOffset(next,stop,offset);
|
||||||
{
|
{ // brackets required for lock scoping; see sguard immediately below
|
||||||
boost::lock_guard<boost::mutex> sguard(stats->lock);
|
boost::lock_guard<boost::mutex> sguard(stats->lock);
|
||||||
if (stats->raw_cnt == ctr) ++stats->raw_cnt;
|
if (stats->raw_cnt == ctr) ++stats->raw_cnt;
|
||||||
size_t rnum = (stats->raw_cnt - ctr++)*(rnd()/(rnd.max()+1.));
|
size_t scalefac = (stats->raw_cnt - ctr++);
|
||||||
|
size_t rnum = scalefac*(rnd()/(rnd.max()+1.));
|
||||||
|
#if 0
|
||||||
|
cerr << rnum << "/" << scalefac << " vs. "
|
||||||
|
<< max_samples - stats->good << " ("
|
||||||
|
<< max_samples << " - " << stats->good << ")"
|
||||||
|
<< endl;
|
||||||
|
#endif
|
||||||
if (rnum < max_samples - stats->good)
|
if (rnum < max_samples - stats->good)
|
||||||
{
|
{
|
||||||
stats->sample_cnt++;
|
stats->sample_cnt++;
|
||||||
return true;
|
ret = true;
|
||||||
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return false;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// boost::lock_guard<boost::mutex> sguard(stats->lock);
|
||||||
|
// abuse of lock for clean output to cerr
|
||||||
|
// cerr << stats->sample_cnt++;
|
||||||
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
||||||
template<typename Token>
|
template<typename Token>
|
||||||
@ -713,6 +727,13 @@ namespace Moses {
|
|||||||
worker::
|
worker::
|
||||||
operator()()
|
operator()()
|
||||||
{
|
{
|
||||||
|
// things to do:
|
||||||
|
// - have each worker maintain their own pstats object and merge results at the end;
|
||||||
|
// - ensure the minimum size of samples considered by a non-locked counter that is only
|
||||||
|
// ever incremented -- who cares if we look at more samples than required, as long
|
||||||
|
// as we look at at least the minimum required
|
||||||
|
// This way, we can reduce the number of lock / unlock operations we need to do during
|
||||||
|
// sampling.
|
||||||
size_t s1=0, s2=0, e1=0, e2=0;
|
size_t s1=0, s2=0, e1=0, e2=0;
|
||||||
uint64_t sid=0, offset=0; // of the source phrase
|
uint64_t sid=0, offset=0; // of the source phrase
|
||||||
while(sptr<job> j = ag.get_job())
|
while(sptr<job> j = ag.get_job())
|
||||||
@ -812,6 +833,7 @@ namespace Moses {
|
|||||||
sptr<TSA<Token> > const& r, size_t maxsmpl, bool isfwd)
|
sptr<TSA<Token> > const& r, size_t maxsmpl, bool isfwd)
|
||||||
: rnd(0)
|
: rnd(0)
|
||||||
, rnddenom(rnd.max() + 1.)
|
, rnddenom(rnd.max() + 1.)
|
||||||
|
, min_diverse(10)
|
||||||
, workers(0)
|
, workers(0)
|
||||||
, root(r)
|
, root(r)
|
||||||
, next(m.lower_bound(-1))
|
, next(m.lower_bound(-1))
|
||||||
|
@ -122,16 +122,16 @@ namespace Moses
|
|||||||
if (m != param.end())
|
if (m != param.end())
|
||||||
withPbwd = m->second != "0";
|
withPbwd = m->second != "0";
|
||||||
|
|
||||||
m_default_sample_size = m != param.end() ? atoi(m->second.c_str()) : 1000;
|
|
||||||
|
|
||||||
m = param.find("workers");
|
m = param.find("workers");
|
||||||
m_workers = m != param.end() ? atoi(m->second.c_str()) : 8;
|
m_workers = m != param.end() ? atoi(m->second.c_str()) : 8;
|
||||||
m_workers = min(m_workers,24UL);
|
m_workers = min(m_workers,24UL);
|
||||||
|
|
||||||
|
m = param.find("limit");
|
||||||
|
if (m != param.end()) m_tableLimit = atoi(m->second.c_str());
|
||||||
|
|
||||||
m = param.find("cache-size");
|
m = param.find("cache-size");
|
||||||
m_history.reserve(m != param.end()
|
m_history.reserve(m != param.end()?max(1000,atoi(m->second.c_str())):10000);
|
||||||
? max(1000,atoi(m->second.c_str()))
|
// in plain language: cache size is at least 1000, and 10,000 by default
|
||||||
: 10000);
|
|
||||||
|
|
||||||
this->m_numScoreComponents = atoi(param["num-features"].c_str());
|
this->m_numScoreComponents = atoi(param["num-features"].c_str());
|
||||||
|
|
||||||
@ -196,8 +196,8 @@ namespace Moses
|
|||||||
// currently always active by default; may (should) change later
|
// currently always active by default; may (should) change later
|
||||||
num_feats = calc_lex.init(num_feats, bname + L1 + "-" + L2 + ".lex");
|
num_feats = calc_lex.init(num_feats, bname + L1 + "-" + L2 + ".lex");
|
||||||
|
|
||||||
if (this->m_numScoreComponents%2) // a bit of a hack, for backwards compatibility
|
// if (this->m_numScoreComponents%2) // a bit of a hack, for backwards compatibility
|
||||||
num_feats = apply_pp.init(num_feats);
|
// num_feats = apply_pp.init(num_feats);
|
||||||
|
|
||||||
if (num_feats < this->m_numScoreComponents)
|
if (num_feats < this->m_numScoreComponents)
|
||||||
{
|
{
|
||||||
@ -283,8 +283,8 @@ namespace Moses
|
|||||||
{
|
{
|
||||||
PhrasePair pp;
|
PhrasePair pp;
|
||||||
pp.init(pid1, stats, this->m_numScoreComponents);
|
pp.init(pid1, stats, this->m_numScoreComponents);
|
||||||
if (this->m_numScoreComponents%2)
|
// if (this->m_numScoreComponents%2)
|
||||||
apply_pp(bt,pp);
|
// apply_pp(bt,pp);
|
||||||
pstats::trg_map_t::const_iterator t;
|
pstats::trg_map_t::const_iterator t;
|
||||||
for (t = stats.trg.begin(); t != stats.trg.end(); ++t)
|
for (t = stats.trg.begin(); t != stats.trg.end(); ++t)
|
||||||
{
|
{
|
||||||
@ -318,8 +318,8 @@ namespace Moses
|
|||||||
pp.init(pid1b, *statsb, this->m_numScoreComponents);
|
pp.init(pid1b, *statsb, this->m_numScoreComponents);
|
||||||
else return false; // throw "no stats for pooling available!";
|
else return false; // throw "no stats for pooling available!";
|
||||||
|
|
||||||
if (this->m_numScoreComponents%2)
|
// if (this->m_numScoreComponents%2)
|
||||||
apply_pp(bta,pp);
|
// apply_pp(bta,pp);
|
||||||
pstats::trg_map_t::const_iterator b;
|
pstats::trg_map_t::const_iterator b;
|
||||||
pstats::trg_map_t::iterator a;
|
pstats::trg_map_t::iterator a;
|
||||||
if (statsb)
|
if (statsb)
|
||||||
@ -368,6 +368,13 @@ namespace Moses
|
|||||||
}
|
}
|
||||||
else
|
else
|
||||||
pp.update(a->first,a->second);
|
pp.update(a->first,a->second);
|
||||||
|
#if 0
|
||||||
|
// jstats const& j = a->second;
|
||||||
|
cerr << bta.T1->pid2str(bta.V1.get(),pp.p1) << " ::: "
|
||||||
|
<< bta.T2->pid2str(bta.V2.get(),pp.p2) << endl;
|
||||||
|
cerr << pp.raw1 << " " << pp.sample1 << " " << pp.good1 << " "
|
||||||
|
<< pp.joint << " " << pp.raw2 << endl;
|
||||||
|
#endif
|
||||||
|
|
||||||
UTIL_THROW_IF2(pp.raw2 == 0,
|
UTIL_THROW_IF2(pp.raw2 == 0,
|
||||||
"OOPS"
|
"OOPS"
|
||||||
@ -376,12 +383,6 @@ namespace Moses
|
|||||||
<< pp.raw1 << " " << pp.sample1 << " "
|
<< pp.raw1 << " " << pp.sample1 << " "
|
||||||
<< pp.good1 << " " << pp.joint << " "
|
<< pp.good1 << " " << pp.joint << " "
|
||||||
<< pp.raw2);
|
<< pp.raw2);
|
||||||
#if 0
|
|
||||||
jstats const& j = a->second;
|
|
||||||
cerr << bta.T1->pid2str(bta.V1.get(),pp.p1) << " ::: "
|
|
||||||
<< bta.T2->pid2str(bta.V2.get(),pp.p2) << endl;
|
|
||||||
cerr << j.rcnt() << " " << j.cnt2() << " " << j.wcnt() << endl;
|
|
||||||
#endif
|
|
||||||
calc_lex(bta,pp);
|
calc_lex(bta,pp);
|
||||||
if (withPfwd) calc_pfwd_fix(bta,pp);
|
if (withPfwd) calc_pfwd_fix(bta,pp);
|
||||||
if (withPbwd) calc_pbwd_fix(bta,pp);
|
if (withPbwd) calc_pbwd_fix(bta,pp);
|
||||||
@ -415,8 +416,8 @@ namespace Moses
|
|||||||
if (statsb)
|
if (statsb)
|
||||||
{
|
{
|
||||||
pool.init(pid1b,*statsb,0);
|
pool.init(pid1b,*statsb,0);
|
||||||
if (this->m_numScoreComponents%2)
|
// if (this->m_numScoreComponents%2)
|
||||||
apply_pp(btb,ppdyn);
|
// apply_pp(btb,ppdyn);
|
||||||
for (b = statsb->trg.begin(); b != statsb->trg.end(); ++b)
|
for (b = statsb->trg.begin(); b != statsb->trg.end(); ++b)
|
||||||
{
|
{
|
||||||
ppdyn.update(b->first,b->second);
|
ppdyn.update(b->first,b->second);
|
||||||
@ -456,8 +457,8 @@ namespace Moses
|
|||||||
if (statsa)
|
if (statsa)
|
||||||
{
|
{
|
||||||
pool.init(pid1a,*statsa,0);
|
pool.init(pid1a,*statsa,0);
|
||||||
if (this->m_numScoreComponents%2)
|
// if (this->m_numScoreComponents%2)
|
||||||
apply_pp(bta,ppfix);
|
// apply_pp(bta,ppfix);
|
||||||
for (a = statsa->trg.begin(); a != statsa->trg.end(); ++a)
|
for (a = statsa->trg.begin(); a != statsa->trg.end(); ++a)
|
||||||
{
|
{
|
||||||
if (!a->second.valid()) continue; // done above
|
if (!a->second.valid()) continue; // done above
|
||||||
@ -662,7 +663,7 @@ namespace Moses
|
|||||||
|| combine_pstats(src, mfix.getPid(),sfix.get(),btfix,
|
|| combine_pstats(src, mfix.getPid(),sfix.get(),btfix,
|
||||||
mdyn.getPid(),sdyn.get(),*dyn,ret))
|
mdyn.getPid(),sdyn.get(),*dyn,ret))
|
||||||
{
|
{
|
||||||
ret->NthElement(m_tableLimit);
|
if (m_tableLimit) ret->Prune(true,m_tableLimit);
|
||||||
#if 0
|
#if 0
|
||||||
sort(ret->begin(), ret->end(), CompareTargetPhrase());
|
sort(ret->begin(), ret->end(), CompareTargetPhrase());
|
||||||
cout << "SOURCE PHRASE: " << src << endl;
|
cout << "SOURCE PHRASE: " << src << endl;
|
||||||
@ -683,6 +684,14 @@ namespace Moses
|
|||||||
return encache(ret);
|
return encache(ret);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
size_t
|
||||||
|
Mmsapt::
|
||||||
|
SetTableLimit(size_t limit)
|
||||||
|
{
|
||||||
|
std::swap(m_tableLimit,limit);
|
||||||
|
return limit;
|
||||||
|
}
|
||||||
|
|
||||||
void
|
void
|
||||||
Mmsapt::
|
Mmsapt::
|
||||||
CleanUpAfterSentenceProcessing(const InputType& source)
|
CleanUpAfterSentenceProcessing(const InputType& source)
|
||||||
|
@ -71,7 +71,7 @@ namespace Moses
|
|||||||
PScorePfwd<Token> calc_pfwd_fix, calc_pfwd_dyn;
|
PScorePfwd<Token> calc_pfwd_fix, calc_pfwd_dyn;
|
||||||
PScorePbwd<Token> calc_pbwd_fix, calc_pbwd_dyn;
|
PScorePbwd<Token> calc_pbwd_fix, calc_pbwd_dyn;
|
||||||
PScoreLex<Token> calc_lex; // this one I'd like to see as an external ff eventually
|
PScoreLex<Token> calc_lex; // this one I'd like to see as an external ff eventually
|
||||||
PScorePP<Token> apply_pp; // apply phrase penalty
|
// PScorePP<Token> apply_pp; // apply phrase penalty
|
||||||
PScoreLogCounts<Token> add_logcounts_fix;
|
PScoreLogCounts<Token> add_logcounts_fix;
|
||||||
PScoreLogCounts<Token> add_logcounts_dyn;
|
PScoreLogCounts<Token> add_logcounts_dyn;
|
||||||
void init(string const& line);
|
void init(string const& line);
|
||||||
@ -168,6 +168,9 @@ namespace Moses
|
|||||||
void
|
void
|
||||||
Load();
|
Load();
|
||||||
|
|
||||||
|
// returns the prior table limit
|
||||||
|
size_t SetTableLimit(size_t limit);
|
||||||
|
|
||||||
#ifndef NO_MOSES
|
#ifndef NO_MOSES
|
||||||
TargetPhraseCollection const*
|
TargetPhraseCollection const*
|
||||||
GetTargetPhraseCollectionLEGACY(const Phrase& src) const;
|
GetTargetPhraseCollectionLEGACY(const Phrase& src) const;
|
||||||
|
@ -413,11 +413,9 @@ void FuzzyMatchWrapper::load_corpus( const std::string &fileName, vector< vector
|
|||||||
|
|
||||||
istream *fileStreamP = &fileStream;
|
istream *fileStreamP = &fileStream;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
while(true) {
|
while(getline(*fileStreamP, line)) {
|
||||||
SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
|
corpus.push_back( GetVocabulary().Tokenize( line.c_str() ) );
|
||||||
if (fileStreamP->eof()) break;
|
|
||||||
corpus.push_back( GetVocabulary().Tokenize( line ) );
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -436,12 +434,9 @@ void FuzzyMatchWrapper::load_target(const std::string &fileName, vector< vector<
|
|||||||
WORD_ID delimiter = GetVocabulary().StoreIfNew("|||");
|
WORD_ID delimiter = GetVocabulary().StoreIfNew("|||");
|
||||||
|
|
||||||
int lineNum = 0;
|
int lineNum = 0;
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
while(true) {
|
while(getline(*fileStreamP, line)) {
|
||||||
SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
|
vector<WORD_ID> toks = GetVocabulary().Tokenize( line.c_str() );
|
||||||
if (fileStreamP->eof()) break;
|
|
||||||
|
|
||||||
vector<WORD_ID> toks = GetVocabulary().Tokenize( line );
|
|
||||||
|
|
||||||
corpus.push_back(vector< SentenceAlignment >());
|
corpus.push_back(vector< SentenceAlignment >());
|
||||||
vector< SentenceAlignment > &vec = corpus.back();
|
vector< SentenceAlignment > &vec = corpus.back();
|
||||||
@ -493,11 +488,8 @@ void FuzzyMatchWrapper::load_alignment(const std::string &fileName, vector< vect
|
|||||||
string delimiter = "|||";
|
string delimiter = "|||";
|
||||||
|
|
||||||
int lineNum = 0;
|
int lineNum = 0;
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
while(true) {
|
while(getline(*fileStreamP, line)) {
|
||||||
SAFE_GETLINE((*fileStreamP), line, LINE_MAX_LENGTH, '\n');
|
|
||||||
if (fileStreamP->eof()) break;
|
|
||||||
|
|
||||||
vector< SentenceAlignment > &vec = corpus[lineNum];
|
vector< SentenceAlignment > &vec = corpus[lineNum];
|
||||||
size_t targetInd = 0;
|
size_t targetInd = 0;
|
||||||
SentenceAlignment *sentence = &vec[targetInd];
|
SentenceAlignment *sentence = &vec[targetInd];
|
||||||
|
@ -14,17 +14,16 @@ SuffixArray::SuffixArray( string fileName )
|
|||||||
m_endOfSentence = m_vcb.StoreIfNew( "<s>" );
|
m_endOfSentence = m_vcb.StoreIfNew( "<s>" );
|
||||||
|
|
||||||
ifstream extractFile;
|
ifstream extractFile;
|
||||||
char line[LINE_MAX_LENGTH];
|
|
||||||
|
|
||||||
// count the number of words first;
|
// count the number of words first;
|
||||||
extractFile.open(fileName.c_str());
|
extractFile.open(fileName.c_str());
|
||||||
istream *fileP = &extractFile;
|
istream *fileP = &extractFile;
|
||||||
m_size = 0;
|
m_size = 0;
|
||||||
size_t sentenceCount = 0;
|
size_t sentenceCount = 0;
|
||||||
while(!fileP->eof()) {
|
string line;
|
||||||
SAFE_GETLINE((*fileP), line, LINE_MAX_LENGTH, '\n');
|
while(getline(*fileP, line)) {
|
||||||
if (fileP->eof()) break;
|
|
||||||
vector< WORD_ID > words = m_vcb.Tokenize( line );
|
vector< WORD_ID > words = m_vcb.Tokenize( line.c_str() );
|
||||||
m_size += words.size() + 1;
|
m_size += words.size() + 1;
|
||||||
sentenceCount++;
|
sentenceCount++;
|
||||||
}
|
}
|
||||||
@ -43,10 +42,8 @@ SuffixArray::SuffixArray( string fileName )
|
|||||||
int sentenceId = 0;
|
int sentenceId = 0;
|
||||||
extractFile.open(fileName.c_str());
|
extractFile.open(fileName.c_str());
|
||||||
fileP = &extractFile;
|
fileP = &extractFile;
|
||||||
while(!fileP->eof()) {
|
while(getline(*fileP, line)) {
|
||||||
SAFE_GETLINE((*fileP), line, LINE_MAX_LENGTH, '\n');
|
vector< WORD_ID > words = m_vcb.Tokenize( line.c_str() );
|
||||||
if (fileP->eof()) break;
|
|
||||||
vector< WORD_ID > words = m_vcb.Tokenize( line );
|
|
||||||
|
|
||||||
// add to corpus vector
|
// add to corpus vector
|
||||||
corpus.push_back(words);
|
corpus.push_back(words);
|
||||||
|
@ -17,20 +17,6 @@
|
|||||||
|
|
||||||
namespace tmmt
|
namespace tmmt
|
||||||
{
|
{
|
||||||
|
|
||||||
#define MAX_LENGTH 10000
|
|
||||||
|
|
||||||
#define SAFE_GETLINE(_IS, _LINE, _SIZE, _DELIM) { \
|
|
||||||
_IS.getline(_LINE, _SIZE, _DELIM); \
|
|
||||||
if(_IS.fail() && !_IS.bad() && !_IS.eof()) _IS.clear(); \
|
|
||||||
if (_IS.gcount() == _SIZE-1) { \
|
|
||||||
cerr << "Line too long! Buffer overflow. Delete lines >=" \
|
|
||||||
<< _SIZE << " chars or raise MAX_LENGTH in phrase-extract/tables-core.cpp" \
|
|
||||||
<< endl; \
|
|
||||||
exit(1); \
|
|
||||||
} \
|
|
||||||
}
|
|
||||||
|
|
||||||
typedef std::string WORD;
|
typedef std::string WORD;
|
||||||
typedef unsigned int WORD_ID;
|
typedef unsigned int WORD_ID;
|
||||||
|
|
||||||
|
@ -2,9 +2,6 @@
|
|||||||
#include "ExtractionPhrasePair.h"
|
#include "ExtractionPhrasePair.h"
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
|
|
||||||
#define TABLE_LINE_MAX_LENGTH 1000
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
@ -16,12 +13,11 @@ void Domain::load( const std::string &domainFileName )
|
|||||||
{
|
{
|
||||||
Moses::InputFileStream fileS( domainFileName );
|
Moses::InputFileStream fileS( domainFileName );
|
||||||
istream *fileP = &fileS;
|
istream *fileP = &fileS;
|
||||||
while(true) {
|
|
||||||
char line[TABLE_LINE_MAX_LENGTH];
|
string line;
|
||||||
SAFE_GETLINE((*fileP), line, TABLE_LINE_MAX_LENGTH, '\n', __FILE__);
|
while(getline(*fileP, line)) {
|
||||||
if (fileP->eof()) break;
|
|
||||||
// read
|
// read
|
||||||
vector< string > domainSpecLine = tokenize( line );
|
vector< string > domainSpecLine = tokenize( line.c_str() );
|
||||||
int lineNumber;
|
int lineNumber;
|
||||||
if (domainSpecLine.size() != 2 ||
|
if (domainSpecLine.size() != 2 ||
|
||||||
! sscanf(domainSpecLine[0].c_str(), "%d", &lineNumber)) {
|
! sscanf(domainSpecLine[0].c_str(), "%d", &lineNumber)) {
|
||||||
|
@ -19,7 +19,6 @@
|
|||||||
|
|
||||||
#include <sstream>
|
#include <sstream>
|
||||||
#include "ExtractionPhrasePair.h"
|
#include "ExtractionPhrasePair.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "score.h"
|
#include "score.h"
|
||||||
#include "moses/Util.h"
|
#include "moses/Util.h"
|
||||||
|
@ -1,35 +0,0 @@
|
|||||||
/***********************************************************************
|
|
||||||
Moses - factored phrase-based language decoder
|
|
||||||
Copyright (C) 2010 University of Edinburgh
|
|
||||||
|
|
||||||
This library is free software; you can redistribute it and/or
|
|
||||||
modify it under the terms of the GNU Lesser General Public
|
|
||||||
License as published by the Free Software Foundation; either
|
|
||||||
version 2.1 of the License, or (at your option) any later version.
|
|
||||||
|
|
||||||
This library is distributed in the hope that it will be useful,
|
|
||||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
|
||||||
Lesser General Public License for more details.
|
|
||||||
|
|
||||||
You should have received a copy of the GNU Lesser General Public
|
|
||||||
License along with this library; if not, write to the Free Software
|
|
||||||
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
|
|
||||||
***********************************************************************/
|
|
||||||
|
|
||||||
#pragma once
|
|
||||||
#ifndef SAFE_GETLINE_INCLUDED_
|
|
||||||
#define SAFE_GETLINE_INCLUDED_
|
|
||||||
|
|
||||||
#define SAFE_GETLINE(_IS, _LINE, _SIZE, _DELIM, _FILE) { \
|
|
||||||
_IS.getline(_LINE, _SIZE, _DELIM); \
|
|
||||||
if(_IS.fail() && !_IS.bad() && !_IS.eof()) _IS.clear(); \
|
|
||||||
if (_IS.gcount() == _SIZE-1) { \
|
|
||||||
cerr << "Line too long! Buffer overflow. Delete lines >=" \
|
|
||||||
<< _SIZE << " chars or raise LINE_MAX_LENGTH in " << _FILE \
|
|
||||||
<< endl; \
|
|
||||||
exit(1); \
|
|
||||||
} \
|
|
||||||
}
|
|
||||||
|
|
||||||
#endif
|
|
@ -54,7 +54,11 @@ bool SentenceAlignment::processSourceSentence(const char * sourceString, int, bo
|
|||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
bool SentenceAlignment::create( char targetString[], char sourceString[], char alignmentString[], char weightString[], int sentenceID, bool boundaryRules)
|
bool SentenceAlignment::create(const char targetString[],
|
||||||
|
const char sourceString[],
|
||||||
|
const char alignmentString[],
|
||||||
|
const char weightString[],
|
||||||
|
int sentenceID, bool boundaryRules)
|
||||||
{
|
{
|
||||||
using namespace std;
|
using namespace std;
|
||||||
this->sentenceID = sentenceID;
|
this->sentenceID = sentenceID;
|
||||||
|
@ -43,8 +43,11 @@ public:
|
|||||||
|
|
||||||
virtual bool processSourceSentence(const char *, int, bool boundaryRules);
|
virtual bool processSourceSentence(const char *, int, bool boundaryRules);
|
||||||
|
|
||||||
bool create(char targetString[], char sourceString[],
|
bool create(const char targetString[],
|
||||||
char alignmentString[], char weightString[], int sentenceID, bool boundaryRules);
|
const char sourceString[],
|
||||||
|
const char alignmentString[],
|
||||||
|
const char weightString[],
|
||||||
|
int sentenceID, bool boundaryRules);
|
||||||
|
|
||||||
void invertAlignment();
|
void invertAlignment();
|
||||||
|
|
||||||
|
@ -26,16 +26,9 @@
|
|||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
#include "OutputFileStream.h"
|
#include "OutputFileStream.h"
|
||||||
|
|
||||||
#include "SafeGetline.h"
|
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 10000
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
vector< string > splitLine(const char *line)
|
||||||
|
|
||||||
|
|
||||||
vector< string > splitLine()
|
|
||||||
{
|
{
|
||||||
vector< string > item;
|
vector< string > item;
|
||||||
int start=0;
|
int start=0;
|
||||||
@ -62,13 +55,14 @@ bool getLine( istream &fileP, vector< string > &item )
|
|||||||
if (fileP.eof())
|
if (fileP.eof())
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
string line;
|
||||||
if (fileP.eof())
|
if (getline(fileP, line)) {
|
||||||
|
item = splitLine(line.c_str());
|
||||||
return false;
|
return false;
|
||||||
|
}
|
||||||
item = splitLine();
|
else {
|
||||||
|
return false;
|
||||||
return true;
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@ -26,12 +26,9 @@
|
|||||||
#include <cstring>
|
#include <cstring>
|
||||||
|
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
#include "OutputFileStream.h"
|
#include "OutputFileStream.h"
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 10000
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
bool hierarchicalFlag = false;
|
bool hierarchicalFlag = false;
|
||||||
@ -46,12 +43,11 @@ inline float maybeLogProb( float a )
|
|||||||
return logProbFlag ? log(a) : a;
|
return logProbFlag ? log(a) : a;
|
||||||
}
|
}
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
|
||||||
void processFiles( char*, char*, char*, char* );
|
void processFiles( char*, char*, char*, char* );
|
||||||
void loadCountOfCounts( char* );
|
void loadCountOfCounts( char* );
|
||||||
void breakdownCoreAndSparse( string combined, string &core, string &sparse );
|
void breakdownCoreAndSparse( string combined, string &core, string &sparse );
|
||||||
bool getLine( istream &fileP, vector< string > &item );
|
bool getLine( istream &fileP, vector< string > &item );
|
||||||
vector< string > splitLine();
|
vector< string > splitLine(const char *line);
|
||||||
vector< int > countBin;
|
vector< int > countBin;
|
||||||
bool sparseCountBinFeatureFlag = false;
|
bool sparseCountBinFeatureFlag = false;
|
||||||
|
|
||||||
@ -140,14 +136,13 @@ void loadCountOfCounts( char* fileNameCountOfCounts )
|
|||||||
istream &fileP = fileCountOfCounts;
|
istream &fileP = fileCountOfCounts;
|
||||||
|
|
||||||
countOfCounts.push_back(0.0);
|
countOfCounts.push_back(0.0);
|
||||||
while(1) {
|
|
||||||
if (fileP.eof()) break;
|
string line;
|
||||||
SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
while (getline(fileP, line)) {
|
||||||
if (fileP.eof()) break;
|
|
||||||
if (totalCount < 0)
|
if (totalCount < 0)
|
||||||
totalCount = atof(line); // total number of distinct phrase pairs
|
totalCount = atof(line.c_str()); // total number of distinct phrase pairs
|
||||||
else
|
else
|
||||||
countOfCounts.push_back( atof(line) );
|
countOfCounts.push_back( atof(line.c_str()) );
|
||||||
}
|
}
|
||||||
fileCountOfCounts.Close();
|
fileCountOfCounts.Close();
|
||||||
|
|
||||||
@ -370,16 +365,16 @@ bool getLine( istream &fileP, vector< string > &item )
|
|||||||
if (fileP.eof())
|
if (fileP.eof())
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
string line;
|
||||||
if (fileP.eof())
|
if (!getline(fileP, line))
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
item = splitLine();
|
item = splitLine(line.c_str());
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
vector< string > splitLine()
|
vector< string > splitLine(const char *line)
|
||||||
{
|
{
|
||||||
vector< string > item;
|
vector< string > item;
|
||||||
int start=0;
|
int start=0;
|
||||||
|
@ -27,23 +27,19 @@
|
|||||||
#include <cstring>
|
#include <cstring>
|
||||||
|
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 10000
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
bool hierarchicalFlag = false;
|
bool hierarchicalFlag = false;
|
||||||
bool onlyDirectFlag = false;
|
bool onlyDirectFlag = false;
|
||||||
bool phraseCountFlag = true;
|
bool phraseCountFlag = true;
|
||||||
bool logProbFlag = false;
|
bool logProbFlag = false;
|
||||||
char line[LINE_MAX_LENGTH];
|
|
||||||
|
|
||||||
void processFiles( char*, char*, char* );
|
void processFiles( char*, char*, char* );
|
||||||
bool getLine( istream &fileP, vector< string > &item );
|
bool getLine( istream &fileP, vector< string > &item );
|
||||||
string reverseAlignment(const string &alignments);
|
string reverseAlignment(const string &alignments);
|
||||||
vector< string > splitLine();
|
vector< string > splitLine(const char *line);
|
||||||
|
|
||||||
inline void Tokenize(std::vector<std::string> &output
|
inline void Tokenize(std::vector<std::string> &output
|
||||||
, const std::string& str
|
, const std::string& str
|
||||||
@ -191,16 +187,17 @@ bool getLine( istream &fileP, vector< string > &item )
|
|||||||
if (fileP.eof())
|
if (fileP.eof())
|
||||||
return false;
|
return false;
|
||||||
|
|
||||||
SAFE_GETLINE((fileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
string line;
|
||||||
if (fileP.eof())
|
if (getline(fileP, line)) {
|
||||||
|
item = splitLine(line.c_str());
|
||||||
return false;
|
return false;
|
||||||
|
}
|
||||||
item = splitLine();
|
else {
|
||||||
|
return false;
|
||||||
return true;
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
vector< string > splitLine()
|
vector< string > splitLine(const char *line)
|
||||||
{
|
{
|
||||||
vector< string > item;
|
vector< string > item;
|
||||||
bool betweenWords = true;
|
bool betweenWords = true;
|
||||||
|
@ -19,7 +19,6 @@
|
|||||||
#include <set>
|
#include <set>
|
||||||
#include <vector>
|
#include <vector>
|
||||||
|
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "SentenceAlignment.h"
|
#include "SentenceAlignment.h"
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
@ -32,10 +31,6 @@ using namespace MosesTraining;
|
|||||||
namespace MosesTraining
|
namespace MosesTraining
|
||||||
{
|
{
|
||||||
|
|
||||||
|
|
||||||
const long int LINE_MAX_LENGTH = 500000 ;
|
|
||||||
|
|
||||||
|
|
||||||
// HPhraseVertex represents a point in the alignment matrix
|
// HPhraseVertex represents a point in the alignment matrix
|
||||||
typedef pair <int, int> HPhraseVertex;
|
typedef pair <int, int> HPhraseVertex;
|
||||||
|
|
||||||
@ -277,20 +272,18 @@ int main(int argc, char* argv[])
|
|||||||
|
|
||||||
int i = sentenceOffset;
|
int i = sentenceOffset;
|
||||||
|
|
||||||
while(true) {
|
string englishString, foreignString, alignmentString, weightString;
|
||||||
|
|
||||||
|
while(getline(*eFileP, englishString)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%10000 == 0) cerr << "." << flush;
|
if (i%10000 == 0) cerr << "." << flush;
|
||||||
char englishString[LINE_MAX_LENGTH];
|
|
||||||
char foreignString[LINE_MAX_LENGTH];
|
getline(*fFileP, foreignString);
|
||||||
char alignmentString[LINE_MAX_LENGTH];
|
getline(*aFileP, alignmentString);
|
||||||
char weightString[LINE_MAX_LENGTH];
|
|
||||||
SAFE_GETLINE((*eFileP), englishString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (eFileP->eof()) break;
|
|
||||||
SAFE_GETLINE((*fFileP), foreignString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (iwFileP) {
|
if (iwFileP) {
|
||||||
SAFE_GETLINE((*iwFileP), weightString, LINE_MAX_LENGTH, '\n', __FILE__);
|
getline(*iwFileP, weightString);
|
||||||
}
|
}
|
||||||
|
|
||||||
SentenceAlignment sentence;
|
SentenceAlignment sentence;
|
||||||
// cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
|
// cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
|
||||||
//az: output src, tgt, and alignment line
|
//az: output src, tgt, and alignment line
|
||||||
@ -300,7 +293,11 @@ int main(int argc, char* argv[])
|
|||||||
cout << "LOG: ALT: " << alignmentString << endl;
|
cout << "LOG: ALT: " << alignmentString << endl;
|
||||||
cout << "LOG: PHRASES_BEGIN:" << endl;
|
cout << "LOG: PHRASES_BEGIN:" << endl;
|
||||||
}
|
}
|
||||||
if (sentence.create( englishString, foreignString, alignmentString, weightString, i, false)) {
|
if (sentence.create( englishString.c_str(),
|
||||||
|
foreignString.c_str(),
|
||||||
|
alignmentString.c_str(),
|
||||||
|
weightString.c_str(),
|
||||||
|
i, false)) {
|
||||||
if (options.placeholders.size()) {
|
if (options.placeholders.size()) {
|
||||||
sentence.invertAlignment();
|
sentence.invertAlignment();
|
||||||
}
|
}
|
||||||
|
@ -19,7 +19,6 @@
|
|||||||
#include <set>
|
#include <set>
|
||||||
#include <vector>
|
#include <vector>
|
||||||
|
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "SentenceAlignment.h"
|
#include "SentenceAlignment.h"
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
@ -32,10 +31,6 @@ using namespace MosesTraining;
|
|||||||
namespace MosesTraining
|
namespace MosesTraining
|
||||||
{
|
{
|
||||||
|
|
||||||
|
|
||||||
const long int LINE_MAX_LENGTH = 500000 ;
|
|
||||||
|
|
||||||
|
|
||||||
// HPhraseVertex represents a point in the alignment matrix
|
// HPhraseVertex represents a point in the alignment matrix
|
||||||
typedef pair <int, int> HPhraseVertex;
|
typedef pair <int, int> HPhraseVertex;
|
||||||
|
|
||||||
@ -246,20 +241,20 @@ int main(int argc, char* argv[])
|
|||||||
|
|
||||||
int i = sentenceOffset;
|
int i = sentenceOffset;
|
||||||
|
|
||||||
while(true) {
|
string englishString, foreignString, alignmentString, weightString;
|
||||||
|
|
||||||
|
while(getline(*eFileP, englishString)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%10000 == 0) cerr << "." << flush;
|
|
||||||
char englishString[LINE_MAX_LENGTH];
|
// englishString has already been read by the while condition above
|
||||||
char foreignString[LINE_MAX_LENGTH];
|
getline(*fFileP, foreignString);
|
||||||
char alignmentString[LINE_MAX_LENGTH];
|
getline(*aFileP, alignmentString);
|
||||||
char weightString[LINE_MAX_LENGTH];
|
|
||||||
SAFE_GETLINE((*eFileP), englishString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (eFileP->eof()) break;
|
|
||||||
SAFE_GETLINE((*fFileP), foreignString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (iwFileP) {
|
if (iwFileP) {
|
||||||
SAFE_GETLINE((*iwFileP), weightString, LINE_MAX_LENGTH, '\n', __FILE__);
|
getline(*iwFileP, weightString);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (i%10000 == 0) cerr << "." << flush;
|
||||||
|
|
||||||
SentenceAlignment sentence;
|
SentenceAlignment sentence;
|
||||||
// cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
|
// cout << "read in: " << englishString << " & " << foreignString << " & " << alignmentString << endl;
|
||||||
//az: output src, tgt, and alignment line
|
//az: output src, tgt, and alignment line
|
||||||
@ -269,7 +264,7 @@ int main(int argc, char* argv[])
|
|||||||
cout << "LOG: ALT: " << alignmentString << endl;
|
cout << "LOG: ALT: " << alignmentString << endl;
|
||||||
cout << "LOG: PHRASES_BEGIN:" << endl;
|
cout << "LOG: PHRASES_BEGIN:" << endl;
|
||||||
}
|
}
|
||||||
if (sentence.create( englishString, foreignString, alignmentString, weightString, i, false)) {
|
if (sentence.create( englishString.c_str(), foreignString.c_str(), alignmentString.c_str(), weightString.c_str(), i, false)) {
|
||||||
ExtractTask *task = new ExtractTask(i-1, sentence, options, extractFileOrientation);
|
ExtractTask *task = new ExtractTask(i-1, sentence, options, extractFileOrientation);
|
||||||
task->Run();
|
task->Run();
|
||||||
delete task;
|
delete task;
|
||||||
|
@ -39,7 +39,6 @@
|
|||||||
#include "Hole.h"
|
#include "Hole.h"
|
||||||
#include "HoleCollection.h"
|
#include "HoleCollection.h"
|
||||||
#include "RuleExist.h"
|
#include "RuleExist.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "SentenceAlignmentWithSyntax.h"
|
#include "SentenceAlignmentWithSyntax.h"
|
||||||
#include "SyntaxTree.h"
|
#include "SyntaxTree.h"
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
@ -47,8 +46,6 @@
|
|||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
#include "OutputFileStream.h"
|
#include "OutputFileStream.h"
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 500000
|
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
using namespace MosesTraining;
|
using namespace MosesTraining;
|
||||||
|
|
||||||
@ -326,17 +323,15 @@ int main(int argc, char* argv[])
|
|||||||
|
|
||||||
// loop through all sentence pairs
|
// loop through all sentence pairs
|
||||||
size_t i=sentenceOffset;
|
size_t i=sentenceOffset;
|
||||||
while(true) {
|
string targetString, sourceString, alignmentString;
|
||||||
i++;
|
|
||||||
if (i%1000 == 0) cerr << i << " " << flush;
|
|
||||||
|
|
||||||
char targetString[LINE_MAX_LENGTH];
|
while(getline(*tFileP, targetString)) {
|
||||||
char sourceString[LINE_MAX_LENGTH];
|
i++;
|
||||||
char alignmentString[LINE_MAX_LENGTH];
|
|
||||||
SAFE_GETLINE((*tFileP), targetString, LINE_MAX_LENGTH, '\n', __FILE__);
|
getline(*sFileP, sourceString);
|
||||||
if (tFileP->eof()) break;
|
getline(*aFileP, alignmentString);
|
||||||
SAFE_GETLINE((*sFileP), sourceString, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
SAFE_GETLINE((*aFileP), alignmentString, LINE_MAX_LENGTH, '\n', __FILE__);
|
if (i%1000 == 0) cerr << i << " " << flush;
|
||||||
|
|
||||||
SentenceAlignmentWithSyntax sentence
|
SentenceAlignmentWithSyntax sentence
|
||||||
(targetLabelCollection, sourceLabelCollection,
|
(targetLabelCollection, sourceLabelCollection,
|
||||||
@ -349,7 +344,7 @@ int main(int argc, char* argv[])
|
|||||||
cout << "LOG: PHRASES_BEGIN:" << endl;
|
cout << "LOG: PHRASES_BEGIN:" << endl;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (sentence.create(targetString, sourceString, alignmentString,"", i, options.boundaryRules)) {
|
if (sentence.create(targetString.c_str(), sourceString.c_str(), alignmentString.c_str(),"", i, options.boundaryRules)) {
|
||||||
if (options.unknownWordLabelFlag) {
|
if (options.unknownWordLabelFlag) {
|
||||||
collectWordLabelCounts(sentence);
|
collectWordLabelCounts(sentence);
|
||||||
}
|
}
|
||||||
|
@ -20,8 +20,6 @@
|
|||||||
***********************************************************************/
|
***********************************************************************/
|
||||||
|
|
||||||
#include "relax-parse.h"
|
#include "relax-parse.h"
|
||||||
|
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
@ -33,17 +31,13 @@ int main(int argc, char* argv[])
|
|||||||
|
|
||||||
// loop through all sentences
|
// loop through all sentences
|
||||||
int i=0;
|
int i=0;
|
||||||
char inBuffer[LINE_MAX_LENGTH];
|
string inBuffer;
|
||||||
while(true) {
|
while(getline(cin, inBuffer)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%1000 == 0) cerr << "." << flush;
|
if (i%1000 == 0) cerr << "." << flush;
|
||||||
if (i%10000 == 0) cerr << ":" << flush;
|
if (i%10000 == 0) cerr << ":" << flush;
|
||||||
if (i%100000 == 0) cerr << "!" << flush;
|
if (i%100000 == 0) cerr << "!" << flush;
|
||||||
|
|
||||||
// get line from stdin
|
|
||||||
SAFE_GETLINE( cin, inBuffer, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (cin.eof()) break;
|
|
||||||
|
|
||||||
// process into syntax tree representation
|
// process into syntax tree representation
|
||||||
string inBufferString = string( inBuffer );
|
string inBufferString = string( inBuffer );
|
||||||
set< string > labelCollection; // set of labels, not used
|
set< string > labelCollection; // set of labels, not used
|
||||||
|
@ -29,7 +29,6 @@
|
|||||||
#include <vector>
|
#include <vector>
|
||||||
#include <algorithm>
|
#include <algorithm>
|
||||||
|
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "ScoreFeature.h"
|
#include "ScoreFeature.h"
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "ExtractionPhrasePair.h"
|
#include "ExtractionPhrasePair.h"
|
||||||
@ -40,8 +39,6 @@
|
|||||||
using namespace std;
|
using namespace std;
|
||||||
using namespace MosesTraining;
|
using namespace MosesTraining;
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 100000
|
|
||||||
|
|
||||||
namespace MosesTraining
|
namespace MosesTraining
|
||||||
{
|
{
|
||||||
LexicalTable lexTable;
|
LexicalTable lexTable;
|
||||||
@ -232,7 +229,7 @@ int main(int argc, char* argv[])
|
|||||||
}
|
}
|
||||||
|
|
||||||
// loop through all extracted phrase translations
|
// loop through all extracted phrase translations
|
||||||
char line[LINE_MAX_LENGTH], lastLine[LINE_MAX_LENGTH];
|
string line, lastLine;
|
||||||
lastLine[0] = '\0';
|
lastLine[0] = '\0';
|
||||||
ExtractionPhrasePair *phrasePair = NULL;
|
ExtractionPhrasePair *phrasePair = NULL;
|
||||||
std::vector< ExtractionPhrasePair* > phrasePairsWithSameSource;
|
std::vector< ExtractionPhrasePair* > phrasePairsWithSameSource;
|
||||||
@ -245,8 +242,8 @@ int main(int argc, char* argv[])
|
|||||||
float tmpCount=0.0f, tmpPcfgSum=0.0f;
|
float tmpCount=0.0f, tmpPcfgSum=0.0f;
|
||||||
|
|
||||||
int i=0;
|
int i=0;
|
||||||
SAFE_GETLINE( (extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__ );
|
// TODO why read only the 1st line?
|
||||||
if ( !extractFileP.eof() ) {
|
if ( getline(extractFileP, line)) {
|
||||||
++i;
|
++i;
|
||||||
tmpPhraseSource = new PHRASE();
|
tmpPhraseSource = new PHRASE();
|
||||||
tmpPhraseTarget = new PHRASE();
|
tmpPhraseTarget = new PHRASE();
|
||||||
@ -265,23 +262,21 @@ int main(int argc, char* argv[])
|
|||||||
if ( hierarchicalFlag ) {
|
if ( hierarchicalFlag ) {
|
||||||
phrasePairsWithSameSourceAndTarget.push_back( phrasePair );
|
phrasePairsWithSameSourceAndTarget.push_back( phrasePair );
|
||||||
}
|
}
|
||||||
strcpy( lastLine, line );
|
lastLine = line;
|
||||||
SAFE_GETLINE( (extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__ );
|
|
||||||
}
|
}
|
||||||
|
|
||||||
while ( !extractFileP.eof() ) {
|
while ( getline(extractFileP, line) ) {
|
||||||
|
|
||||||
if ( ++i % 100000 == 0 ) {
|
if ( ++i % 100000 == 0 ) {
|
||||||
std::cerr << "." << std::flush;
|
std::cerr << "." << std::flush;
|
||||||
}
|
}
|
||||||
|
|
||||||
// identical to last line? just add count
|
// identical to last line? just add count
|
||||||
if (strcmp(line,lastLine) == 0) {
|
if (line == lastLine) {
|
||||||
phrasePair->IncrementPrevious(tmpCount,tmpPcfgSum);
|
phrasePair->IncrementPrevious(tmpCount,tmpPcfgSum);
|
||||||
SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
continue;
|
continue;
|
||||||
} else {
|
} else {
|
||||||
strcpy( lastLine, line );
|
lastLine = line;
|
||||||
}
|
}
|
||||||
|
|
||||||
tmpPhraseSource = new PHRASE();
|
tmpPhraseSource = new PHRASE();
|
||||||
@ -359,8 +354,6 @@ int main(int argc, char* argv[])
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
processPhrasePairs( phrasePairsWithSameSource, *phraseTableFile, featureManager, maybeLogProb );
|
processPhrasePairs( phrasePairsWithSameSource, *phraseTableFile, featureManager, maybeLogProb );
|
||||||
@ -750,11 +743,9 @@ void loadFunctionWords( const string &fileName )
|
|||||||
}
|
}
|
||||||
istream *inFileP = &inFile;
|
istream *inFileP = &inFile;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
while(true) {
|
while(getline(*inFileP, line)) {
|
||||||
SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
std::vector<string> token = tokenize( line.c_str() );
|
||||||
if (inFileP->eof()) break;
|
|
||||||
std::vector<string> token = tokenize( line );
|
|
||||||
if (token.size() > 0)
|
if (token.size() > 0)
|
||||||
functionWordList.insert( token[0] );
|
functionWordList.insert( token[0] );
|
||||||
}
|
}
|
||||||
@ -799,16 +790,13 @@ void LexicalTable::load( const string &fileName )
|
|||||||
}
|
}
|
||||||
istream *inFileP = &inFile;
|
istream *inFileP = &inFile;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
|
|
||||||
int i=0;
|
int i=0;
|
||||||
while(true) {
|
while(getline(*inFileP, line)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%100000 == 0) std::cerr << "." << flush;
|
if (i%100000 == 0) std::cerr << "." << flush;
|
||||||
SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (inFileP->eof()) break;
|
|
||||||
|
|
||||||
std::vector<string> token = tokenize( line );
|
std::vector<string> token = tokenize( line.c_str() );
|
||||||
if (token.size() != 3) {
|
if (token.size() != 3) {
|
||||||
std::cerr << "line " << i << " in " << fileName
|
std::cerr << "line " << i << " in " << fileName
|
||||||
<< " has wrong number of tokens, skipping:" << std::endl
|
<< " has wrong number of tokens, skipping:" << std::endl
|
||||||
|
@ -12,15 +12,12 @@
|
|||||||
#include <time.h>
|
#include <time.h>
|
||||||
|
|
||||||
#include "AlignmentPhrase.h"
|
#include "AlignmentPhrase.h"
|
||||||
#include "SafeGetline.h"
|
|
||||||
#include "tables-core.h"
|
#include "tables-core.h"
|
||||||
#include "InputFileStream.h"
|
#include "InputFileStream.h"
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
using namespace MosesTraining;
|
using namespace MosesTraining;
|
||||||
|
|
||||||
#define LINE_MAX_LENGTH 10000
|
|
||||||
|
|
||||||
namespace MosesTraining
|
namespace MosesTraining
|
||||||
{
|
{
|
||||||
|
|
||||||
@ -31,7 +28,7 @@ public:
|
|||||||
vector< vector<size_t> > alignedToE;
|
vector< vector<size_t> > alignedToE;
|
||||||
vector< vector<size_t> > alignedToF;
|
vector< vector<size_t> > alignedToF;
|
||||||
|
|
||||||
bool create( char*, int );
|
bool create( const char*, int );
|
||||||
void clear();
|
void clear();
|
||||||
bool equals( const PhraseAlignment& );
|
bool equals( const PhraseAlignment& );
|
||||||
};
|
};
|
||||||
@ -106,16 +103,14 @@ int main(int argc, char* argv[])
|
|||||||
vector< PhraseAlignment > phrasePairsWithSameF;
|
vector< PhraseAlignment > phrasePairsWithSameF;
|
||||||
int i=0;
|
int i=0;
|
||||||
int fileCount = 0;
|
int fileCount = 0;
|
||||||
while(true) {
|
|
||||||
|
string line;
|
||||||
|
while(getline(extractFileP, line)) {
|
||||||
if (extractFileP.eof()) break;
|
if (extractFileP.eof()) break;
|
||||||
if (++i % 100000 == 0) cerr << "." << flush;
|
if (++i % 100000 == 0) cerr << "." << flush;
|
||||||
char line[LINE_MAX_LENGTH];
|
|
||||||
SAFE_GETLINE((extractFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
// if (fileCount>0)
|
|
||||||
if (extractFileP.eof())
|
|
||||||
break;
|
|
||||||
PhraseAlignment phrasePair;
|
PhraseAlignment phrasePair;
|
||||||
bool isPhrasePair = phrasePair.create( line, i );
|
bool isPhrasePair = phrasePair.create( line.c_str(), i );
|
||||||
if (lastForeign >= 0 && lastForeign != phrasePair.foreign) {
|
if (lastForeign >= 0 && lastForeign != phrasePair.foreign) {
|
||||||
processPhrasePairs( phrasePairsWithSameF );
|
processPhrasePairs( phrasePairsWithSameF );
|
||||||
for(size_t j=0; j<phrasePairsWithSameF.size(); j++)
|
for(size_t j=0; j<phrasePairsWithSameF.size(); j++)
|
||||||
@ -124,7 +119,7 @@ int main(int argc, char* argv[])
|
|||||||
phraseTableE.clear();
|
phraseTableE.clear();
|
||||||
phraseTableF.clear();
|
phraseTableF.clear();
|
||||||
phrasePair.clear(); // process line again, since phrase tables flushed
|
phrasePair.clear(); // process line again, since phrase tables flushed
|
||||||
phrasePair.create( line, i );
|
phrasePair.create( line.c_str(), i );
|
||||||
phrasePairBase = 0;
|
phrasePairBase = 0;
|
||||||
}
|
}
|
||||||
lastForeign = phrasePair.foreign;
|
lastForeign = phrasePair.foreign;
|
||||||
@ -242,7 +237,7 @@ void processPhrasePairs( vector< PhraseAlignment > &phrasePair )
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
bool PhraseAlignment::create( char line[], int lineID )
|
bool PhraseAlignment::create(const char line[], int lineID )
|
||||||
{
|
{
|
||||||
vector< string > token = tokenize( line );
|
vector< string > token = tokenize( line );
|
||||||
int item = 1;
|
int item = 1;
|
||||||
@ -321,16 +316,14 @@ void LexicalTable::load( const string &filePath )
|
|||||||
}
|
}
|
||||||
istream *inFileP = &inFile;
|
istream *inFileP = &inFile;
|
||||||
|
|
||||||
char line[LINE_MAX_LENGTH];
|
string line;
|
||||||
|
|
||||||
int i=0;
|
int i=0;
|
||||||
while(true) {
|
while(getline(*inFileP, line)) {
|
||||||
i++;
|
i++;
|
||||||
if (i%100000 == 0) cerr << "." << flush;
|
if (i%100000 == 0) cerr << "." << flush;
|
||||||
SAFE_GETLINE((*inFileP), line, LINE_MAX_LENGTH, '\n', __FILE__);
|
|
||||||
if (inFileP->eof()) break;
|
|
||||||
|
|
||||||
vector<string> token = tokenize( line );
|
vector<string> token = tokenize( line.c_str() );
|
||||||
if (token.size() != 3) {
|
if (token.size() != 3) {
|
||||||
cerr << "line " << i << " in " << filePath << " has wrong number of tokens, skipping:\n" <<
|
cerr << "line " << i << " in " << filePath << " has wrong number of tokens, skipping:\n" <<
|
||||||
token.size() << " " << token[0] << " " << line << endl;
|
token.size() << " " << token[0] << " " << line << endl;
|
||||||
|
188
scripts/training/wrappers/conll2mosesxml.py
Executable file
@ -0,0 +1,188 @@
|
|||||||
|
#!/usr/bin/python
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
# Author: Rico Sennrich
|
||||||
|
|
||||||
|
# takes a file in the CoNLL dependency format (from the CoNLL-X shared task on dependency parsing; http://ilk.uvt.nl/conll/#dataformat )
|
||||||
|
# and produces Moses XML format. Note that the structure is built based on fields 9 and 10 (projective HEAD and RELATION),
|
||||||
|
# which not all parsers produce.
|
||||||
|
|
||||||
|
# usage: conll2mosesxml.py [--brackets] < input_file > output_file
|
||||||
|
|
||||||
|
from __future__ import print_function, unicode_literals
|
||||||
|
import sys
|
||||||
|
import re
|
||||||
|
import codecs
|
||||||
|
from collections import namedtuple,defaultdict
|
||||||
|
from lxml import etree as ET
|
||||||
|
|
||||||
|
|
||||||
|
Word = namedtuple('Word', ['pos','word','lemma','tag','head','func', 'proj_head', 'proj_func'])
|
||||||
|
|
||||||
|
def main(output_format='xml'):
|
||||||
|
sentence = []
|
||||||
|
|
||||||
|
for line in sys.stdin:
|
||||||
|
|
||||||
|
# process sentence
|
||||||
|
if line == "\n":
|
||||||
|
sentence.insert(0,[])
|
||||||
|
if is_projective(sentence):
|
||||||
|
write(sentence,output_format)
|
||||||
|
else:
|
||||||
|
sys.stderr.write(' '.join(w.word for w in sentence[1:]) + '\n')
|
||||||
|
sys.stdout.write('\n')
|
||||||
|
sentence = []
|
||||||
|
continue
|
||||||
|
|
||||||
|
try:
|
||||||
|
pos, word, lemma, tag, tag2, morph, head, func, proj_head, proj_func = line.split()
|
||||||
|
except ValueError: # word may be unicode whitespace
|
||||||
|
pos, word, lemma, tag, tag2, morph, head, func, proj_head, proj_func = re.split(' *\t*',line.strip())
|
||||||
|
|
||||||
|
word = escape_special_chars(word)
|
||||||
|
lemma = escape_special_chars(lemma)
|
||||||
|
|
||||||
|
if proj_head == '_':
|
||||||
|
proj_head = head
|
||||||
|
proj_func = func
|
||||||
|
|
||||||
|
sentence.append(Word(int(pos), word, lemma, tag2,int(head), func, int(proj_head), proj_func))
|
||||||
|
|
||||||
|
|
||||||
|
# this script performs the same escaping as escape-special-chars.perl in Moses.
|
||||||
|
# most of it is done in function write(), but quotation marks need to be processed first
|
||||||
|
def escape_special_chars(line):
|
||||||
|
|
||||||
|
line = line.replace('\'','&apos;') # xml
|
||||||
|
line = line.replace('"','&quot;') # xml
|
||||||
|
|
||||||
|
return line
|
||||||
|
|
||||||
|
|
||||||
|
# make a check if structure is projective
|
||||||
|
def is_projective(sentence):
|
||||||
|
dominates = defaultdict(set)
|
||||||
|
for i,w in enumerate(sentence):
|
||||||
|
dominates[i].add(i)
|
||||||
|
if not i:
|
||||||
|
continue
|
||||||
|
head = int(w.proj_head)
|
||||||
|
while head != 0:
|
||||||
|
if i in dominates[head]:
|
||||||
|
break
|
||||||
|
dominates[head].add(i)
|
||||||
|
head = int(sentence[head].proj_head)
|
||||||
|
|
||||||
|
for i in dominates:
|
||||||
|
dependents = dominates[i]
|
||||||
|
if max(dependents) - min(dependents) != len(dependents)-1:
|
||||||
|
sys.stderr.write("error: non-projective structure.\n")
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def write(sentence, output_format='xml'):
|
||||||
|
|
||||||
|
if output_format == 'xml':
|
||||||
|
tree = create_subtree(0,sentence)
|
||||||
|
out = ET.tostring(tree, encoding = 'UTF-8').decode('UTF-8')
|
||||||
|
|
||||||
|
if output_format == 'brackets':
|
||||||
|
out = create_brackets(0,sentence)
|
||||||
|
|
||||||
|
out = out.replace('|','&#124;') # factor separator
|
||||||
|
out = out.replace('[','&#91;') # syntax non-terminal
|
||||||
|
out = out.replace(']','&#93;') # syntax non-terminal
|
||||||
|
|
||||||
|
out = out.replace('&amp;apos;','&apos;') # lxml is buggy if input is escaped
|
||||||
|
out = out.replace('&amp;quot;','&quot;') # lxml is buggy if input is escaped
|
||||||
|
|
||||||
|
print(out)
|
||||||
|
|
||||||
|
# write node in Moses XML format
|
||||||
|
def create_subtree(position, sentence):
|
||||||
|
|
||||||
|
element = ET.Element('tree')
|
||||||
|
|
||||||
|
if position:
|
||||||
|
element.set('label', sentence[position].proj_func)
|
||||||
|
else:
|
||||||
|
element.set('label', 'sent')
|
||||||
|
|
||||||
|
for i in range(1,position):
|
||||||
|
if sentence[i].proj_head == position:
|
||||||
|
element.append(create_subtree(i, sentence))
|
||||||
|
|
||||||
|
if position:
|
||||||
|
|
||||||
|
if preterminals:
|
||||||
|
head = ET.Element('tree')
|
||||||
|
head.set('label', sentence[position].tag)
|
||||||
|
head.text = sentence[position].word
|
||||||
|
element.append(head)
|
||||||
|
|
||||||
|
else:
|
||||||
|
if len(element):
|
||||||
|
element[-1].tail = sentence[position].word
|
||||||
|
else:
|
||||||
|
element.text = sentence[position].word
|
||||||
|
|
||||||
|
for i in range(position, len(sentence)):
|
||||||
|
if i and sentence[i].proj_head == position:
|
||||||
|
element.append(create_subtree(i, sentence))
|
||||||
|
|
||||||
|
return element
|
||||||
|
|
||||||
|
|
||||||
|
# write node in bracket format (Penn treebank style)
|
||||||
|
def create_brackets(position, sentence):
|
||||||
|
|
||||||
|
if position:
|
||||||
|
element = "( " + sentence[position].proj_func + ' '
|
||||||
|
else:
|
||||||
|
element = "( sent "
|
||||||
|
|
||||||
|
for i in range(1,position):
|
||||||
|
if sentence[i].proj_head == position:
|
||||||
|
element += create_brackets(i, sentence)
|
||||||
|
|
||||||
|
if position:
|
||||||
|
word = sentence[position].word
|
||||||
|
if word == ')':
|
||||||
|
word = 'RBR'
|
||||||
|
elif word == '(':
|
||||||
|
word = 'LBR'
|
||||||
|
|
||||||
|
tag = sentence[position].tag
|
||||||
|
if tag == '$(':
|
||||||
|
tag = '$BR'
|
||||||
|
|
||||||
|
if preterminals:
|
||||||
|
element += '( ' + tag + ' ' + word + ' ) '
|
||||||
|
else:
|
||||||
|
element += word + ' ) '
|
||||||
|
|
||||||
|
for i in range(position, len(sentence)):
|
||||||
|
if i and sentence[i].proj_head == position:
|
||||||
|
element += create_brackets(i, sentence)
|
||||||
|
|
||||||
|
if preterminals or not position:
|
||||||
|
element += ') '
|
||||||
|
|
||||||
|
return element
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
if sys.version_info < (3,0,0):
|
||||||
|
sys.stdin = codecs.getreader('UTF-8')(sys.stdin)
|
||||||
|
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)
|
||||||
|
sys.stderr = codecs.getwriter('UTF-8')(sys.stderr)
|
||||||
|
|
||||||
|
if '--no_preterminals' in sys.argv:
|
||||||
|
preterminals = False
|
||||||
|
else:
|
||||||
|
preterminals = True
|
||||||
|
|
||||||
|
if '--brackets' in sys.argv:
|
||||||
|
main('brackets')
|
||||||
|
else:
|
||||||
|
main('xml')
|