mosesdecoder/contrib/c++tokenizer
Kenneth Heafield 1dce55f413 C++ tokenizer based on RE2. Not by me.
Some differences from the Moses tokenizer: fraction characters count as numbers, _ handling, URLs.
Currently 3x slower than the Perl version :'(. Looking to make it faster by composing regex substitutions.
TODO: eliminate sprintf and fixed-size buffers.
2015-01-22 12:25:02 +01:00
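
One possible reading of "composing regex substitutions" in the commit message above is to merge several independent substitution passes into a single alternation, so the input is scanned once instead of once per rule. The sketch below is only an illustration under that assumption: the four escaping rules, the function names `separate_passes` and `composed_pass`, and the use of `std::regex` in place of RE2 (to keep the example self-contained) are all hypothetical and not taken from tokenizer.cpp.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Baseline: one full scan of the string per rule.
// '&' has to be handled first, because later rules emit '&' themselves.
std::string separate_passes(const std::string &in) {
  static const std::regex amp("&");
  static const std::regex pipe("\\|");
  static const std::regex lt("<");
  static const std::regex gt(">");
  std::string out = std::regex_replace(in, amp, "&amp;");
  out = std::regex_replace(out, pipe, "&#124;");
  out = std::regex_replace(out, lt, "&lt;");
  return std::regex_replace(out, gt, "&gt;");
}

// Composed: the same rules as one alternation, one capture group per rule.
// The group that matched selects the replacement, and the output is never
// rescanned, so rule ordering no longer matters.
std::string composed_pass(const std::string &in) {
  static const std::regex composed("(&)|(\\|)|(<)|(>)");
  static const char *const replacement[] = {"&amp;", "&#124;", "&lt;", "&gt;"};

  std::string out;
  std::size_t last = 0;  // end of the previous match in `in`
  const std::sregex_iterator end;
  for (std::sregex_iterator it(in.begin(), in.end(), composed); it != end; ++it) {
    const std::smatch &m = *it;
    const std::size_t pos = static_cast<std::size_t>(m.position());
    out.append(in, last, pos - last);  // copy the unmatched text
    for (std::size_t g = 1; g < m.size(); ++g) {
      if (m[g].matched) {
        out += replacement[g - 1];
        break;
      }
    }
    last = pos + static_cast<std::size_t>(m.length());
  }
  out.append(in, last, std::string::npos);  // copy the tail
  return out;
}
```

Both functions should agree on any input, for example `composed_pass("a < b & c")` and `separate_passes("a < b & c")`. Whether a merged pattern is actually faster depends on the regex engine and the rules involved, so it would need to be measured.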
Jamfile
tokenizer_main.cpp
tokenizer.cpp
tokenizer.h
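
The TODO in the commit message about eliminating sprintf and fixed-size buffers could be approached along the following lines. This is a minimal sketch, not code from tokenizer.cpp: the helper `describe` and its message format are hypothetical, and `std::ostringstream` is just one standard way to build a string without a fixed-size buffer.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// The pattern being replaced looks roughly like this (hypothetical):
//
//   char buf[32];
//   sprintf(buf, "%s seen %zu times", token, count);  // overflows on long tokens
//
// Building the result in a std::ostringstream lets it grow as needed,
// so the token length no longer has to fit a compile-time bound.
std::string describe(const std::string &token, std::size_t count) {
  std::ostringstream out;
  out << token << " seen " << count << " times";
  return out.str();
}
```

A call such as `describe(std::string(100, 'x'), 3)` would overflow the 32-byte buffer in the commented-out version, while here the string simply grows to fit.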