mosesdecoder/scripts/generic/strip-xml.perl
Jeroen Vermeulen a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00


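#!/usr/bin/env perl
use warnings;
use strict;

# Strip XML-style markup from each line of STDIN and collapse runs of spaces.
while (my $line = <STDIN>) {
    chomp($line);
    #print "$line\n";
    my $len = length($line);
    my $inXML = 0;      # nesting depth of currently open "<"
    my $prevSpace = 1;  # starts at 1 so that leading spaces are dropped
    my $prevBar = 0;    # 1 if the last character printed was "|"
    for (my $i = 0; $i < $len; ++$i) {
        my $c = substr($line, $i, 1);
        # "<" opens markup unless it directly follows "|"
        if ($c eq "<" && !$prevBar) {
            ++$inXML;
        }
        elsif ($c eq ">" && $inXML > 0) {
            --$inXML;
        }
        elsif ($prevSpace == 1 && $c eq " ") {
            # duplicate space. Do nothing
        }
        elsif ($inXML == 0) {
            # Outside markup: record what kind of character this is, then print it.
            if ($c eq " ") {
                $prevSpace = 1;
                $prevBar = 0;
            }
            elsif ($c eq "|") {
                $prevSpace = 0;
                $prevBar = 1;
            }
            else {
                $prevSpace = 0;
                $prevBar = 0;
            }
            print $c;
        }
    }
    print "\n";
}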
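
The loop above can be hard to test because it reads STDIN and prints directly. As a minimal sketch, the same per-character state machine can be wrapped in a function that returns the cleaned string. The name `strip_xml` and the sample inputs are assumptions made for illustration; they are not part of the original script.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper (not in the original script): same state machine as
# the STDIN loop, but it builds and returns a string instead of printing.
sub strip_xml {
    my ($line) = @_;
    my $out       = "";
    my $inXML     = 0;  # nesting depth of currently open "<"
    my $prevSpace = 1;  # starts at 1 so that leading spaces are dropped
    my $prevBar   = 0;  # 1 if the last character emitted was "|"
    for my $c (split //, $line) {
        if ($c eq "<" && !$prevBar) {
            ++$inXML;
        }
        elsif ($c eq ">" && $inXML > 0) {
            --$inXML;
        }
        elsif ($prevSpace == 1 && $c eq " ") {
            # duplicate (or leading) space: skip it
        }
        elsif ($inXML == 0) {
            $prevSpace = ($c eq " ") ? 1 : 0;
            $prevBar   = ($c eq "|") ? 1 : 0;
            $out .= $c;
        }
    }
    return $out;
}

# Illustrative inputs (made up for this sketch).
print strip_xml("the <np>big house</np> is red"), "\n";  # the big house is red
print strip_xml("  a   b"), "\n";                        # a b
print strip_xml("word|<unk>"), "\n";                     # word|<unk>
```

The third input shows the effect of `$prevBar`: a "<" that directly follows "|" is kept as literal text rather than treated as the start of a tag. Note also that a space left in front of a closing tag at the end of a line survives, so `strip_xml("a <b> c </b>")` returns "a c " with a trailing space.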