This only affects configurations where inline functions become regular,
non-weak symbols, leading to link conflicts. The extra definition was not
used anywhere.
The removed definition was probably less efficient. However, the only
functional difference was that it returned false for the empty nonterminal,
i.e. "[]".
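As a rough sketch of that behavioural difference (the function names and bodies below are hypothetical stand-ins, not the actual Moses definitions), a bracket check with and without a minimum-content requirement differs only on "[]":

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the surviving definition: any token wrapped in
// square brackets counts, including the empty nonterminal "[]".
inline bool IsNonTerminalKept(const std::string &word)
{
  return word.size() >= 2 && word[0] == '[' && word[word.size() - 1] == ']';
}

// Hypothetical stand-in for the removed definition: it requires at least one
// character between the brackets, so "[]" is rejected.
inline bool IsNonTerminalRemoved(const std::string &word)
{
  return word.size() >= 3 && word[0] == '[' && word[word.size() - 1] == ']';
}

// Self-check of the single input on which the two sketches disagree.
inline void CheckEmptyNonTerminal()
{
  assert(IsNonTerminalKept("[]"));
  assert(!IsNonTerminalRemoved("[]"));
}
```

Both sketches agree on ordinary labels such as "[NP]" and on plain words.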
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.
In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions. In theory this may help performance, but it's mainly for clarity.
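A minimal illustration of the pattern, assuming a call site that only reads the returned value (SplitOnSpace and FirstToken are hypothetical helpers, not functions from the tree):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper standing in for any function whose result is only read.
inline std::vector<std::string> SplitOnSpace(const std::string &line)
{
  std::vector<std::string> tokens;
  std::string::size_type start = 0;
  while (start < line.size()) {
    const std::string::size_type end = line.find(' ', start);
    if (end == std::string::npos) {
      tokens.push_back(line.substr(start));
      break;
    }
    if (end > start) tokens.push_back(line.substr(start, end - start));
    start = end + 1;
  }
  return tokens;
}

inline std::string FirstToken(const std::string &line)
{
  // The result is never modified here, so it is declared const: the reader
  // sees that it is read-only and the compiler rejects accidental writes.
  const std::vector<std::string> tokens = SplitOnSpace(line);
  assert(!tokens.empty());  // sketch only: callers must pass a non-blank line
  return tokens.front();
}
```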
The comments are based on reverse-engineering, and the unit tests are based
on the comments. It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!
I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.
There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
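For illustration, the two most common fixes look roughly like this. The class and function are hypothetical, and the warning names assume GCC-style flags (-Wreorder, -Wsign-compare):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class, used only to illustrate the warning fixes.
class SpanSketch
{
public:
  // Initializers are listed in declaration order (m_start, then m_end).
  // Members are always initialized in declaration order anyway, so matching
  // it in the list avoids the reordered-initialization warning.
  SpanSketch(std::size_t start, std::size_t end)
    : m_start(start), m_end(end) {
    assert(start <= end);
  }

  std::size_t Length() const { return m_end - m_start; }

private:
  std::size_t m_start;
  std::size_t m_end;
};

inline int SumSketch(const std::vector<int> &values)
{
  int sum = 0;
  // The index uses the container's unsigned size type, so the comparison
  // with values.size() no longer mixes signed and unsigned operands.
  for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
    sum += values[i];
  }
  return sum;
}
```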
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard, as cstdio and cmath respectively, and so on.
In this branch the #include names are updated for the phrase-extract/
subdirectory; more branches to follow.
C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
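A small sketch of what the updated includes look like in use (FormatRoot is a hypothetical function, written to compile under the pre-C++11 standard as well):

```cpp
#include <cassert>
#include <cmath>    // instead of <math.h>
#include <cstdio>   // instead of <stdio.h>
#include <string>

// Hypothetical example: format the square root of x as text.
inline std::string FormatRoot(double x)
{
  assert(x >= 0.0);
  char buffer[32];
  // With the <c...> headers the functions are available in namespace std.
  // "%g" keeps the output short enough for the fixed-size buffer.
  std::sprintf(buffer, "%g", std::sqrt(x));
  return std::string(buffer);
}
```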
This performs some minor transformations to Egret forests: escaping of
Moses special characters; removal of "^g" suffixes from constituent labels;
and marking of slash/hyphen split points (using @ characters).
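The first two transformations can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function names are hypothetical, the escape table is modelled on Moses' escape-special-chars.perl, and the marking of split points is left out.

```cpp
#include <cassert>
#include <string>

// Escape characters that Moses treats specially (assumed replacement table).
inline std::string EscapeSketch(const std::string &token)
{
  std::string out;
  for (std::string::size_type i = 0; i < token.size(); ++i) {
    switch (token[i]) {
    case '&':  out += "&amp;";  break;
    case '|':  out += "&#124;"; break;
    case '<':  out += "&lt;";   break;
    case '>':  out += "&gt;";   break;
    case '\'': out += "&apos;"; break;
    case '"':  out += "&quot;"; break;
    case '[':  out += "&#91;";  break;
    case ']':  out += "&#93;";  break;
    default:   out += token[i];
    }
  }
  return out;
}

// Drop a trailing "^g" from a constituent label, if present.
inline std::string StripGSuffixSketch(const std::string &label)
{
  const std::string suffix = "^g";
  if (label.size() >= suffix.size() &&
      label.compare(label.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return label.substr(0, label.size() - suffix.size());
  }
  return label;
}

// Apply both steps to a constituent label: strip first, then escape.
inline std::string NormalizeLabelSketch(const std::string &label)
{
  assert(!label.empty());  // sketch only: labels are assumed non-empty
  return EscapeSketch(StripGSuffixSketch(label));
}
```

Stripping before escaping keeps the suffix test independent of how the escape table evolves.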
Output should match filter-rule-table.py, but filtering is faster. Some rough
timings:

            filter-rule-table.py   this tool
System A    0h 13m                 0h 04m
System B    18h 03m                0h 51m

System A is WMT14, en-de, string-to-tree (32M rules, 3,000 test sentences)
System B is WMT14, cs-en, string-to-tree (293M rules, 13,071 test sentences)
This will eventually replace filter-rule-table.py. At the moment
it can only filter rule tables where the source side is an STSG
fragment and where the test sentences have parse trees.