mosesdecoder/contrib/c++tokenizer/Jamfile
Kenneth Heafield 1dce55f413 C++ tokenizer based on RE2. Not by me.
Some differences from the Moses tokenizer: fraction characters count as numbers, _ handling, URLs.
Currently 3x slower than the Perl tokenizer :'(. Looking to make it faster by composing regex substitutions.
TODO: eliminate sprintf and fixed-size buffers.
2015-01-22 12:25:02 +01:00

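# Declare the RE2 library as an external dependency (external-lib is expected to come from the project's own Jam scripts).
external-lib re2 ;
# Build the tokenizer executable from both sources, linked against re2, compiled as C++11.
exe tokenizer : tokenizer.cpp tokenizer_main.cpp re2 : <cflags>-std=c++11 ;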