Attempt to stop people from publishing non-comparable BLEU scores, as discussed in statmt meeting

Kenneth Heafield 2017-10-19 22:57:36 +01:00
parent eced95d694
commit 545eee7e75


@@ -168,6 +168,9 @@ printf "BLEU = %.2f, %.1f/%.1f/%.1f/%.1f (BP=%.3f, ratio=%.3f, hyp_len=%d, ref_l
$length_translation,
$length_reference;
print STDERR "Do not publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.\n";
sub my_log {
return -9999999999 unless $_[0];
return log($_[0]);
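The warning above says that multi-bleu.perl scores depend on the tokenizer. The sketch below illustrates that claim. It scores the same hypothesis against the same reference under two tokenizations and computes clipped unigram precision, which is one of the n-gram precisions that BLEU combines. It is not code from multi-bleu.perl or mteval-v14.pl. The helper names `ngram_precision`, `tokenize_whitespace` and `tokenize_punct`, and both tokenization rules, are hypothetical choices made only for this illustration.

```perl
use strict;
use warnings;

# Clipped n-gram precision: each hypothesis n-gram counts as a match at most
# as many times as it occurs in the reference. BLEU combines such precisions
# (n = 1..4) with a brevity penalty; this sketch computes one order only.
sub ngram_precision {
    my ($hyp, $ref, $n) = @_;    # array refs of tokens, n-gram order
    my (%ref_count, %hyp_count);
    $ref_count{ join ' ', @{$ref}[$_ .. $_ + $n - 1] }++ for 0 .. @$ref - $n;
    $hyp_count{ join ' ', @{$hyp}[$_ .. $_ + $n - 1] }++ for 0 .. @$hyp - $n;
    my ($matches, $total) = (0, 0);
    for my $gram (keys %hyp_count) {
        my $r = $ref_count{$gram} // 0;
        $matches += $hyp_count{$gram} < $r ? $hyp_count{$gram} : $r;
        $total   += $hyp_count{$gram};
    }
    return $total ? $matches / $total : 0;
}

# Hypothetical tokenizer A: split on whitespace only.
sub tokenize_whitespace { return split ' ', $_[0]; }

# Hypothetical tokenizer B: also split off every punctuation character.
sub tokenize_punct {
    (my $s = $_[0]) =~ s/([[:punct:]])/ $1 /g;
    return split ' ', $s;
}

my $hyp = "The cat's mat, however, was red.";
my $ref = "The cat's mat was red.";

# Same sentences, different tokenizer, different score.
for my $tok (\&tokenize_whitespace, \&tokenize_punct) {
    printf "%.3f\n", ngram_precision([$tok->($hyp)], [$tok->($ref)], 1);
}
```

This prints `0.667` (4 of 6 tokens match under whitespace splitting) and then `0.727` (8 of 11 tokens match once punctuation is split off). Neither the hypothesis nor the reference changed, so a paper reporting one of these numbers cannot be compared with a paper reporting the other. This is the reason the message recommends detokenizing and then scoring with mteval-v14.pl's fixed tokenization.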