Commit Graph

5 Commits

Author SHA1 Message Date
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Jeroen Vermeulen
9852a0c2ff Modernize "C" includes in phrase-extract.
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as resp. cstdio, cmath, and so on.  In this
branch the #include names are updated for the phrase-extract/
subdirectory; more branches to follow.

C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
2015-03-28 19:56:20 +07:00
Matthias Huck
d0e92da734 GHKM extraction can add a source labels phrase property 2014-06-11 19:27:18 +01:00
Hieu Hoang
ef9db932aa add namespace to phrase-extract 2012-06-30 15:43:47 +01:00
Hieu Hoang
4eef94b121 move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00