mosesdecoder/Jamfile at a6cef9382c9b6dacb003006cee17f112ee36bdfa

moses-smt/mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-12-30 23:42:30 +03:00

Kenneth Heafield 1dce55f413 C++ tokenizer based on RE2. Not by me.

Some differences from the Moses tokenizer: fraction characters count as numbers, _ handling, URLs.
Currently 3x slower than Perl :'(. Looking to make it faster by composing regex substitutions.
TODO: eliminate sprintf and fixed-size buffers.

2015-01-22 12:25:02 +01:00

external-lib re2 ;
exe tokenizer : tokenizer.cpp tokenizer_main.cpp re2 : <cflags>-std=c++11 ;
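
The commit message describes the tokenizer as a series of regex substitutions. As a rough illustration of that idea only, here is a minimal sketch that applies an ordered list of substitutions to a line. It is not the code in tokenizer.cpp: it uses std::regex from the C++11 standard library in place of RE2 so that it builds without external libraries, and the names Rule and tokenize_sketch, as well as the rules themselves, are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule type (not from the Moses source): one pattern and the
// text that replaces each match.
struct Rule {
    std::regex pattern;
    std::string replacement;
};

// Hypothetical helper: applies each substitution in order to the input line.
// The rules are illustrative and far simpler than a real tokenizer's; for
// example, they would split "3.14" into three tokens.
std::string tokenize_sketch(const std::string& line) {
    static const std::vector<Rule> rules = {
        {std::regex("([,.!?;:])"), " $1 "},  // pad punctuation with spaces
        {std::regex("\\s+"), " "},           // collapse runs of whitespace
        {std::regex("^ | $"), ""},           // trim one leading/trailing space
    };
    std::string out = line;
    for (const Rule& rule : rules) {
        out = std::regex_replace(out, rule.pattern, rule.replacement);
    }
    return out;
}
```

Each rule makes a full pass over the string, which is one plausible reason a substitution-per-rule design can be slow; merging several rules into fewer passes is presumably what "composing regex substitutions" refers to, though the commit does not say how.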