mosesdecoder/phrase-extract
Jeroen Vermeulen b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition of tokenize() works fine in environments where the
inline definition becomes a weak symbol in the object file, but if the compiler
generates it as a regular definition, the duplicates cause link problems.
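
To illustrate the idea of a single shared definition, here is a minimal sketch of a header-style inline tokenizer. It is not the Moses implementation: the namespace `example_util`, the function `tokenize_sketch`, and the choice to split on spaces and tabs are assumptions made for this example. The point is only that every caller includes one inline definition instead of carrying its own copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace example_util {  // hypothetical namespace, not from the Moses tree

// Split `input` on spaces and tabs (an assumption for this sketch).
// Runs of separators produce no empty tokens.
inline std::vector<std::string> tokenize_sketch(const std::string &input)
{
  std::vector<std::string> tokens;
  std::string::size_type pos = 0;
  while (pos < input.size()) {
    // Skip any separators before the next token.
    const std::string::size_type start = input.find_first_not_of(" \t", pos);
    if (start == std::string::npos) break;
    // The token ends at the next separator or at the end of the input.
    std::string::size_type end = input.find_first_of(" \t", start);
    if (end == std::string::npos) end = input.size();
    tokens.push_back(input.substr(start, end - start));
    pos = end;
  }
  return tokens;
}

// Small usage check for the sketch above.
inline void tokenize_sketch_usage()
{
  const std::vector<std::string> words = tokenize_sketch("a  b\tc");
  assert(words.size() == 3);
  assert(words[0] == "a" && words[1] == "b" && words[2] == "c");
}

} // namespace example_util
```

Because the function is declared `inline` and lives in one header, each translation unit that includes it sees the same definition, which is the situation the linker is expected to handle.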

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.
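
The const-at-the-call-site point can be sketched as follows. The helper `split_words` is a hypothetical stand-in for a shared tokenizer (it uses stream extraction, which is an assumption, not the Moses code), and `count_words` is a made-up call site that never modifies the result.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared tokenizer: splits on whitespace.
inline std::vector<std::string> split_words(const std::string &line)
{
  std::istringstream in(line);
  std::vector<std::string> words;
  std::string word;
  while (in >> word) words.push_back(word);
  return words;
}

// Hypothetical call site: the token list is only read, so it is const.
inline std::size_t count_words(const std::string &line)
{
  const std::vector<std::string> words = split_words(line);
  // words.push_back("x");  // would not compile: words is const
  return words.size();
}

// Small usage check for the sketch above.
inline void count_words_usage()
{
  assert(count_words("the quick  fox") == 3);
  assert(count_words("") == 0);
}
```

Declaring the local as const costs nothing at run time here and tells the reader that the tokens are not edited after the call.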

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
extract-ghkm Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
extract-mixed-syntax Fix some compile warnings (gcc 4.9.2). 2015-03-29 18:10:51 +07:00
filter-rule-table filter-rule-table: stopgap (non-) filtering for T2S/SCFG 2015-02-23 11:27:20 +00:00
lexical-reordering Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
pcfg-common Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
pcfg-extract beautify 2015-01-14 11:07:42 +00:00
pcfg-score beautify 2015-01-14 11:07:42 +00:00
postprocess-egret-forests Add phrase-extract/postprocess-egret-forests 2015-03-10 13:51:30 +00:00
score-stsg beautify 2015-01-14 11:07:42 +00:00
syntax-common Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
AlignmentPhrase.cpp add namespace to phrase-extract 2012-06-30 15:43:47 +01:00
AlignmentPhrase.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
consolidate-direct-main.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
consolidate-direct.vcxproj move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
consolidate-main.cpp conservative update of some old code in phrase-extract/consolidate-main.cpp 2015-03-09 18:47:28 +00:00
consolidate-reverse-main.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
consolidate.vcxproj move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
DomainFeature.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
DomainFeature.h Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
extract-lex-main.cpp beautify 2013-05-29 18:16:15 +01:00
extract-lex.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
extract-lex.vcxproj move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
extract-main.cpp Modernize "C" includes in phrase-extract. 2015-03-28 19:56:20 +07:00
extract-rules-main.cpp glue grammar: alignment for <s> and </s> 2014-11-04 14:05:13 +00:00
extract-rules.vcxproj move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
extract.vcxproj move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
ExtractedRule.h Merge master 2013-09-13 22:32:45 +01:00
ExtractionPhrasePair.cpp GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
ExtractionPhrasePair.h GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
gzfilebuf.h move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
hierarchical.h add namespace to phrase-extract 2012-06-30 15:43:47 +01:00
Hole.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
HoleCollection.cpp beautify 2013-05-29 18:16:15 +01:00
HoleCollection.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
InputFileStream.cpp move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
InputFileStream.h move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
InternalStructFeature.cpp beautify 2014-05-19 15:35:08 +02:00
InternalStructFeature.h Modernize "C" includes in phrase-extract. 2015-03-28 19:56:20 +07:00
Jamfile add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager 2014-12-05 21:33:59 +00:00
OutputFileStream.cpp Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
OutputFileStream.h move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
phrase-extract.sln move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00
PhraseExtractionOptions.h beautify 2015-01-14 11:07:42 +00:00
PropertiesConsolidator.cpp GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM) 2015-03-05 22:25:32 +00:00
PropertiesConsolidator.h GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM) 2015-03-05 22:25:32 +00:00
relax-parse-main.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
relax-parse.h Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
RuleExist.h add namespace to phrase-extract 2012-06-30 15:43:47 +01:00
RuleExtractionOptions.h Merge master 2013-09-13 22:32:45 +01:00
score-main.cpp integer overflows in Good-Turing discounting 2015-03-30 17:42:55 +01:00
score.h pragma once 2015-03-09 21:44:54 +00:00
score.vcxproj SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals. 2015-01-23 18:41:18 +00:00
ScoreFeature.cpp Using boost for prefix/suffix checks /Jeroen Vermeulen 2015-02-05 16:23:47 +00:00
ScoreFeature.h merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
ScoreFeatureTest.cpp beautify 2015-01-14 11:07:42 +00:00
SentenceAlignment.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
SentenceAlignment.h beautify 2015-01-14 11:07:42 +00:00
SentenceAlignmentWithSyntax.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
SentenceAlignmentWithSyntax.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
statistics-main.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
SyntaxTree.cpp fix two bugs with relax-parse: 2013-04-25 17:27:50 +02:00
SyntaxTree.h beautify 2015-01-14 11:07:42 +00:00
tables-core.cpp Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
tables-core.h Unify tokenize() into util, and unit-test it. 2015-04-22 09:59:05 +07:00
test.domain Feature function interface for use in scoring 2012-11-02 23:30:51 +00:00
XmlException.h refactor parsing of feature functiona args 2013-06-10 18:11:55 +01:00
XmlTree.cpp beautify 2015-01-14 11:07:42 +00:00
XmlTree.h extract-ghkm and friends: don't unescape special characters 2012-12-17 20:08:02 +00:00