Matthias Huck
633e7be8f0
integer overflows in Good-Turing discounting
2015-03-30 17:42:55 +01:00
Jeroen Vermeulen
c634f6ee5b
Remove some unused variables.
...
This silences a few more compiler warnings.
2015-03-30 10:26:39 +07:00
Jeroen Vermeulen
789a2e2bc3
Fix some compile warnings (gcc 4.9.2).
...
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.
There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
2015-03-29 18:10:51 +07:00
Jeroen Vermeulen
9852a0c2ff
Modernize "C" includes in phrase-extract.
...
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as cstdio, cmath, and so on, respectively. In this
branch the #include names are updated for the phrase-extract/
subdirectory; more branches to follow.
C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
2015-03-28 19:56:20 +07:00
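
The include modernization described in commit 9852a0c2ff might look roughly like the sketch below. The function name LogOfParsedValue is hypothetical and is not taken from phrase-extract; it only illustrates that, after switching to the <c...> headers, library names are reached through namespace std.

```cpp
#include <cmath>    // formerly: #include <math.h>
#include <cstdlib>  // formerly: #include <stdlib.h>

// Hypothetical example function (not from the Moses sources).
// The <c...> headers guarantee that the library names are declared in
// namespace std, so the calls are written with an explicit std:: prefix.
double LogOfParsedValue(const char *text) {
  return std::log(std::atof(text));
}
```

For instance, LogOfParsedValue("1.0") evaluates std::log(1.0), which is zero.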
Matthias Huck
534a894c0b
glue rules with stripped BitPar labels
2015-03-10 22:02:21 +00:00
Matthias Huck
01bed83cf9
GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label)
2015-03-10 21:25:32 +00:00
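
The label stripping described in commit 01bed83cf9 (remove any suffix starting with a hyphen) can be sketched as follows. This is a minimal illustration, not the code from extract-ghkm: the name StripLabelSuffix is hypothetical, and keeping labels that begin with a hyphen unchanged is an added assumption so that the result is never empty.

```cpp
#include <string>

// Hypothetical helper (not the extract-ghkm implementation): returns the
// label with everything from the first hyphen onwards removed.
std::string StripLabelSuffix(const std::string &label) {
  const std::string::size_type pos = label.find('-');
  // Assumption: a label with no hyphen, or one that starts with a hyphen,
  // is returned unchanged so that the result is never an empty string.
  if (pos == std::string::npos || pos == 0) {
    return label;
  }
  return label.substr(0, pos);
}
```

Under these assumptions, a label such as "NP-SB" becomes "NP", while "VP" is left as it is.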
Hieu Hoang
ad73919979
merge with private branch
2015-03-10 15:28:45 +00:00
Phil Williams
9e88f794e6
Add phrase-extract/postprocess-egret-forests
...
This performs some minor transformations to Egret forests: escaping of
Moses special characters; removal of "^g" suffixes from constituent labels;
and marking of slash/hyphen split points (using @ characters).
2015-03-10 13:51:30 +00:00
Matthias Huck
25f5470216
GHKM: write target parts-of-speech as a factor
2015-03-09 21:54:03 +00:00
Matthias Huck
524ed4406e
pragma once
2015-03-09 21:44:54 +00:00
Matthias Huck
559077f6f8
some moderate modifications in phrase-extract/score-main.cpp
...
(e.g., use Moses::Scan<>() rather than atof()/atoi())
2015-03-09 18:49:32 +00:00
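
As a rough illustration of why a typed scan helper can replace atof()/atoi() (commit 559077f6f8), the sketch below parses a string into any streamable type with one template. ScanSketch is a hypothetical stand-in written for this example; it is not Moses::Scan<>() itself, whose exact implementation is not shown in this log.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a typed scan helper (not Moses::Scan<>()).
// One template covers int, float, double, and other streamable types,
// where atoi()/atof() need a separate function per result type.
template <typename T>
T ScanSketch(const std::string &input) {
  std::stringstream stream(input);
  T value = T();    // value-initialize so a failed parse yields a defined result
  stream >> value;  // stream extraction converts according to T
  return value;
}
```

With this sketch, ScanSketch<int>("42") yields 42 and ScanSketch<float>("0.5") yields 0.5f.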
Matthias Huck
973fd98052
conservative update of some old code in phrase-extract/consolidate-main.cpp
2015-03-09 18:47:28 +00:00
Matthias Huck
0c79e19ff9
consolidate properties: fixing bug from commit b08d3ed
2015-03-09 18:44:02 +00:00
Hieu Hoang
b08d3ed0fe
merge with private branch. Add --Count arg
2015-03-09 00:47:51 +00:00
Matthias Huck
99b8f65fb1
GHKM: POS factor in glue rules: target side only
2015-03-06 16:47:44 +00:00
Matthias Huck
aa077ab66c
GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM)
2015-03-05 22:25:32 +00:00
Matthias Huck
773a16b5fd
POS property in glue rules
2015-03-04 23:05:45 +00:00
Matthias Huck
638e9c3f60
POS property: map tags to indices in consolidate
2015-03-04 22:48:34 +00:00
Matthias Huck
06e87d851e
GHKM: extract POS phrase property (from preterminals in the syntactic parse tree)
2015-03-04 21:40:56 +00:00
Phil Williams
0346fbb138
filter-rule-table: stopgap (non-) filtering for T2S/SCFG
2015-02-23 11:27:20 +00:00
Hieu Hoang
32de075022
beautify
2015-02-19 12:27:23 +00:00
Phil Williams
e1d60211a4
filter-rule-table: comments + minor clean-up.
2015-02-11 12:03:27 +00:00
Phil Williams
02f5ada680
filter-rule-table: support for "hierarchical" and "s2t" model types
...
Output should match filter-rule-table.py, but filtering is faster. Some rough
timings:
            That      This
System A    0h 13m    0h 04m
System B   18h 03m    0h 51m
System A is WMT14, en-de, string-to-tree (32M rules, 3,000 test sentences)
System B is WMT14, cs-en, string-to-tree (293M rules, 13,071 test sentences)
2015-02-10 15:11:10 +00:00
Hieu Hoang
70e8eb54ce
Using boost for prefix/suffix checks /Jeroen Vermeulen
2015-02-05 16:23:47 +00:00
Philipp Koehn
f69c1dab02
more efficient default recaser training
2015-02-04 09:18:09 +00:00
Phil Williams
6b9da6c585
filter-rule-table: merge changes from t2s branch (still WIP)
2015-02-03 11:33:10 +00:00
Matthias Huck
9987beb453
SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
...
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Matthias Huck
b50c197313
forgot to check this in some time ago
2015-01-20 21:41:41 +00:00
Matthias Huck
a6c09e57d0
domain features in GHKM extraction
2015-01-20 21:36:55 +00:00
Hieu Hoang
b50b3164fa
beautify
2015-01-15 11:18:39 +00:00
Hieu Hoang
6289b39fd8
update extract-mixed-syntax
2015-01-15 09:53:57 +00:00
Hieu Hoang
6d61db28fa
use astyle 2.01. It's on Edinburgh server and doesn't screw up enum
2015-01-14 19:21:11 +00:00
Hieu Hoang
05ead45e71
beautify
2015-01-14 11:07:42 +00:00
Matthias Huck
168118d252
PhraseOrientationFeature efficiency improvement
2015-01-09 14:03:18 +00:00
Phil Williams
7cc75a0fa1
score-stsg: add --TreeScore option
2014-12-30 18:57:23 +00:00
Philipp Koehn
831f947874
long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works
2014-12-21 01:14:42 +00:00
Nicola Bertoldi
e4eb201c52
merged master into dynamic-models and solved conflicts
2014-12-13 12:52:47 +01:00
Phil Williams
b9a382aa78
Add filter-rule-table
...
This will eventually replace filter-rule-table.py. At the moment
it can only filter rule tables where the source-side is a STSG
fragment and when the test sentences have parse trees.
2014-12-07 14:56:48 +00:00
Phil Williams
60e56efc6b
phrase-extract: add syntax-common sub-library
...
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Phil Williams
a2708b8431
relax-parse: fix hang
...
SyntaxTree::Parse() would enter a *very* long loop due to an uninitialized
member variable.
2014-12-07 12:56:41 +00:00
Hieu Hoang
4b10c59bea
add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager
2014-12-05 21:33:59 +00:00
Matthias Huck
7a299de66b
avoid necessity of masking "{{" in the data
2014-12-04 15:54:05 +00:00
Matthias Huck
24a8a6a511
PhraseOrientationFeature
2014-12-03 20:04:26 +00:00
Hieu Hoang
49a2ff1faa
Merge branch 'merge-cmd'
2014-12-02 19:09:34 +00:00
Hieu Hoang
ba7afba9f6
move n-best code for phrase-based from IOWrapper to ChartManager
2014-12-02 17:40:53 +00:00
Phil Williams
f84f159247
Add score-stsg, a program for scoring STSG extract files
2014-12-02 17:10:20 +00:00
Phil Williams
ef1262a17f
extract-ghkm: change STSG output format.
2014-11-21 15:46:12 +00:00
Phil Williams
c46fb10ec7
extract-ghkm: add --STSG option
2014-11-21 11:30:29 +00:00
Matthias Huck
0fd987a8c6
avoid necessity of masking "{{" in the data
2014-11-12 18:28:59 +00:00
Phil Williams
a5d803ee14
extract-ghkm: add -T2S option
2014-11-12 14:03:24 +00:00