OPUS-MT-train/tatoeba/SentencePieceModels.md
2022-07-12 23:08:00 +03:00

Tatoeba-MT Sentence Piece Models

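import sentencepiece as spm  # Python bindings for SentencePiece
sp = spm.SentencePieceProcessor(model_file='opusTC.eng.16k.spm')  # load the trained model from disk
print(sp.encode(['Hello world', 'This is a tokenization-test'], out_type=str))  # out_type=str returns subword pieces rather than integer ids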
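With `out_type=str`, the encoder returns subword pieces in which SentencePiece marks word boundaries with the meta symbol "▁" (U+2581). The sketch below shows how such pieces map back to plain text without needing a model file. The helper name `pieces_to_text` and the example pieces are hypothetical and chosen for illustration; the actual segmentation depends on the vocabulary of the model you load.

```python
# Minimal sketch: reverse SentencePiece-style pieces into plain text.
# NOTE: the pieces below are made up for illustration; they are NOT the
# real output of opusTC.eng.16k.spm.

WHITESPACE_MARKER = "\u2581"  # "▁", SentencePiece's whitespace meta symbol


def pieces_to_text(pieces):
    """Concatenate pieces and turn the whitespace marker back into spaces."""
    text = "".join(pieces).replace(WHITESPACE_MARKER, " ")
    # By default a marker is also prepended to the first word, so drop the
    # leading space it produces.
    return text.lstrip(" ")


example_pieces = ["\u2581Hello", "\u2581wor", "ld"]  # hypothetical segmentation
restored = pieces_to_text(example_pieces)
print(restored)
```

When a model is loaded, `sp.decode(...)` performs this reversal directly; the manual version is only meant to make the piece format easier to read.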