Added Å Ä Ö, which are not unusual initials in names, e.g. Åke, Ärling, Östen.
Added some new abbreviations, though mostly variations on the existing ones. Two forms are accepted: a dot after each letter (or pair of letters), and a dot only after the last letter. A couple of decades ago, there had to be a space after each dot, which explains the third form.
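To make the three forms concrete, here is a minimal sketch that generates them from the parts of an abbreviation. The function name `abbreviation_variants` is hypothetical, and whether the actual file stores the trailing dot is an assumption made only for illustration.

```python
def abbreviation_variants(parts):
    """Return the three written forms of an abbreviation.

    `parts` holds the letters (or letter pairs), e.g. ["t", "ex"].
    Assumption: the trailing dot is kept here for readability only.
    """
    dotted = ".".join(parts) + "."    # dot after each letter or pair
    compact = "".join(parts) + "."    # dot only after the last letter
    spaced = ". ".join(parts) + "."   # older form: space after each dot
    return [dotted, compact, spaced]


for parts in (["t", "ex"], ["m", "a", "o"]):
    print(abbreviation_variants(parts))
```

For `["t", "ex"]` this yields `t.ex.`, `tex.` and `t. ex.`.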
The file for sv is much more useful with these few additions, although it is still far from complete.
Removed: G (occurred twice).
In this list there is one item that is also a word, even when case is kept: tom.
If all words are in lowercase, then tex, mao and tom (again) may be confused with names, and iaf, etc with named entities.
I wanted to properly parse the links on https://dumps.wikimedia.org/mirrors.html when the page is copied as text.
My proposed changes do the job.
Basically, I had to replace the + at the end of line 5 with *(\/)?
The pipe symbol could lead to crashes, which is why I broke line 5 up into three lines. After reading various posts, I suggest not using the pipe (|).
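As an illustration only, the effect of swapping a trailing `+` for `*(\/)?` can be sketched as follows. Both patterns are hypothetical stand-ins, since the real line 5 is not reproduced in this description, and the second sample URL is made up.

```python
import re

# Hypothetical stand-ins for "line 5"; the real pattern is not shown here.
# OLD requires at least one "/segment" and cannot end in a slash.
OLD_PATTERN = re.compile(r"https?://[\w.-]+(?:/[\w.-]+)+")
# NEW allows zero or more segments, then an optional trailing slash.
NEW_PATTERN = re.compile(r"https?://[\w.-]+(?:/[\w.-]+)*(\/)?")


def is_whole_link(pattern, text):
    """True if `pattern` matches the entire string `text`."""
    return pattern.fullmatch(text) is not None


samples = [
    "https://dumps.wikimedia.org/mirrors.html",
    "https://dumps.wikimedia.org/",
    "https://example.org/mirror/dumps/",  # made-up URL with a trailing slash
]
for url in samples:
    print(url, is_whole_link(OLD_PATTERN, url), is_whole_link(NEW_PATTERN, url))
```

In this sketch, only the relaxed pattern accepts a bare host followed by a slash, or a path that ends in a slash.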
Causes abbreviations not to be split when they end with a full stop. E.g.
> The restructuring of IBM was essential to enable it organisationally to take up the responsibilities entrusted in the role with the recent changes in the policy and legislations, revised charter of function of IBM and the new activities and initiatives undertaken by IBM. IBM is also engaged in handholding the States for auction of mineral blocks for greater transparency in allocation of mineral concessions.
The script doesn't escape angle brackets, which can result in bad SGML / XML output. This fixes that, although ideally this should be implemented with a proper parser and dumper.
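For reference, the kind of escaping involved can be sketched in Python with the standard library. This is not the script's actual code; the helper name `wrap_segment` and the `<seg>` element are assumptions chosen for illustration.

```python
from xml.sax.saxutils import escape


def wrap_segment(text, seg_id):
    """Wrap raw text in a <seg> element, escaping &, < and > first.

    Hypothetical helper: the element name and id attribute are assumptions.
    """
    return '<seg id="{}">{}</seg>'.format(seg_id, escape(text))


print(wrap_segment("if a < b && b > c", 1))
```

Unescaped, the `<` and `>` in the sample text would be read as markup; escaped, they come out as `&lt;` and `&gt;`, and `&` as `&amp;`. A proper XML writer would handle this automatically, as noted above.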