mirror of https://github.com/google/fonts.git synced 2025-01-07 10:11:37 +03:00

This directory contains "Namelist" files, which list Unicode codepoints followed by glyph names or glyph descriptions.

TODO: clarify: Typically all the Unicodes in each file are in each font. If the fonts go beyond that list, those additional characters will not be available to Fonts API end users.

The Google Fonts API uses these files in combination with pyftsubset to generate script subsets from the full .ttf files in this repository.

There are two kinds of encodings: the "legacy" encodings, which are the files directly in this directory, and the newer "2016" encodings, which live in the GF 2016 Glyph Sets subdirectory. That subdirectory has its own [README.md]('GF 2016 Glyph Sets/README.md').

The "Namelist" file format

The extension of a Namelist file is ".nam".

Namelist files are encoded in UTF-8.

legacy subsetting

  • all encodings with the exception of khmer include latin_unique-glyphs.nam
  • all extended encodings with filenames like {lang}-ext_unique-glyphs.nam also include {lang}_unique-glyphs.nam

This is implemented in the CodepointFiles function of google_fonts.py.
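The two rules above can be sketched in a few lines of Python. This is only an illustration of the rules as written in this README, not the actual CodepointFiles implementation; the function name `legacy_codepoint_files` and the convention of passing a bare subset name such as "cyrillic-ext" are assumptions made for the example.

```python
def legacy_codepoint_files(subset):
    """Return the legacy Namelist filenames that make up `subset`.

    Hypothetical helper that mirrors the two rules in this README;
    it is not the real CodepointFiles function.
    """
    files = []
    # Rule 1: every legacy encoding except khmer includes the latin file.
    if subset != "khmer":
        files.append("latin_unique-glyphs.nam")
    # Rule 2: an extended encoding also includes its base encoding.
    if subset.endswith("-ext"):
        base = subset[: -len("-ext")]
        files.append(f"{base}_unique-glyphs.nam")
    # Finally, the file of the subset itself.
    files.append(f"{subset}_unique-glyphs.nam")
    # Drop duplicates while keeping order (e.g. for "latin" or "latin-ext").
    return list(dict.fromkeys(files))
```

For example, `legacy_codepoint_files("cyrillic-ext")` yields the latin, cyrillic and cyrillic-ext files, while `legacy_codepoint_files("khmer")` yields only the khmer file.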

2016 subsetting

The [README.md]('GF 2016 Glyph Sets/README.md') mostly describes how the Namelist files depend on each other. However, this is not implemented anywhere yet. To make the system more flexible and self-describing, I suggest using header includes.

Codepoint format

A codepoint line starts with 0x followed by 4 or 5 uppercase hex digits; whatever follows, up to the end of the line, is an arbitrary description.

Example:

0x0061  a LATIN SMALL LETTER A
0x0062  b LATIN SMALL LETTER B
0x0063  c LATIN SMALL LETTER C
0x03E4  Ϥ COPTIC CAPITAL LETTER FEI
0x10142  𐅂 GREEK ACROPHONIC ATTIC ONE DRACHMA
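As a rough sketch, a codepoint line could be recognized with a regular expression like the one below. The names `CODEPOINT_LINE` and `parse_codepoint_line` are made up for this example, and requiring whitespace between the hex digits and the description is an assumption, since the format description does not spell that out.

```python
import re

# "0x", then 4 or 5 uppercase hex digits, then an optional description.
# Assumption: the description is separated from the digits by whitespace.
CODEPOINT_LINE = re.compile(r"^0x([0-9A-F]{4,5})(?:\s+(.*))?$")


def parse_codepoint_line(line):
    """Return (codepoint, description), or None if this is not a codepoint line."""
    match = CODEPOINT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    codepoint = int(match.group(1), 16)
    description = (match.group(2) or "").strip()
    return codepoint, description
```

Lines with lowercase hex digits, or with fewer than 4 or more than 5 digits, are rejected by this pattern and return None.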

Comments

Comments are lines starting with #.

Example:

# Created by Alexei Vanyashin 2016-27-06
#$ include ../GF-latin-plus_unique-glyphs.nam

glyphs without Unicode (since 2016)

A line that starts with at least six spaces describes a glyph that has no Unicode codepoint and is usually accessible via OpenType GSUB. These glyphs are used to ensure the fonts contain certain OpenType features.

The description is a human readable glyph name that can be used in the authoring application (targeted at Glyphs) to create and process the glyph properly.

Example:

          aacute.sc
          abreve.sc
          abreveacute.sc
          abrevedotbelow.sc
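A minimal sketch of how such lines might be detected follows. The helper name `parse_unencoded_glyph` is hypothetical, and treating a line that contains only spaces as "no glyph" is an assumption of this example.

```python
def parse_unencoded_glyph(line):
    """Return the glyph name if `line` describes a glyph without Unicode, else None."""
    stripped = line.rstrip("\r\n")
    # The format requires at least six leading spaces.
    if not stripped.startswith(" " * 6):
        return None
    name = stripped.strip()
    # Assumption: a line made only of spaces does not name a glyph.
    return name or None
```

With the example lines above, `parse_unencoded_glyph("          aacute.sc")` returns the name `aacute.sc`, while a codepoint line or a comment line returns None.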

Header (proposal)

To make the Namelist format more self-contained, I suggest a file header.

The Namelist header is made of all consecutive comment lines at the beginning of the file. The first non-comment line ends the header.

Specially crafted comment lines, "header data", define the metadata of the file. Other comments are just that, comments.

A header data line begins with "#$", followed by a keyword and then by the arguments for that keyword.

The keyword is separated from its arguments by one or more space characters.

The keywords, the semantics of a keyword and the syntax of the arguments of a keyword should be documented here.

Format for header data:

#$ {keyword} {arguments}
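The header rules above could be read roughly like this. It is a sketch of the proposal, not an existing tool: the function name `parse_header` is invented for the example, and splitting on any whitespace (rather than on space characters only) is a simplifying assumption.

```python
def parse_header(lines):
    """Collect (keyword, arguments) pairs from the header of a Namelist file."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("#"):
            break  # the first non-comment line ends the header
        if not line.startswith("#$"):
            continue  # an ordinary comment, not header data
        # Assumption: any run of whitespace separates keyword and arguments.
        parts = line[2:].strip().split(None, 1)
        if not parts:
            continue  # "#$" without a keyword; ignored in this sketch
        keyword = parts[0]
        arguments = parts[1].strip() if len(parts) > 1 else ""
        entries.append((keyword, arguments))
    return entries
```

Fed the two comment lines from the Comments example, this returns a single entry for the include keyword, because the "# Created by" line is a plain comment.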

Keyword #$ include {namfile}

The include keyword can be used zero or more times. The order of appearance should have no effect on the resulting set and is therefore not important.

It specifies the namfiles on which the current namfile depends. The file plus all of its includes together define the complete char-set.

Includes are loaded recursively, meaning that the includes of an included file must also be loaded.

E.g. a 2016 "pro" encoding would include its "plus" encoding, and the latter would include its "base" encoding. The Pro charset is then the union of pro, plus and base.

Loops in the includes are not followed; because the final result is a set, its value is the same whether a file is included once or many times.

{namfile} is a file path to a namfile relative to the path of the file that contains the include statement.
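The include semantics described above (recursive loading, relative paths, loops not followed) can be sketched as follows. This illustrates the proposal and is not code from this repository; `read_charset` and the two regular expressions are names made up for the example, and the sketch collects only codepoint lines, ignoring glyphs without Unicode.

```python
import os
import re

# Hypothetical patterns for this sketch.
INCLUDE_LINE = re.compile(r"^#\$\s*include\s+(.+?)\s*$")
CODEPOINT_LINE = re.compile(r"^0x([0-9A-F]{4,5})")


def read_charset(path, _seen=None):
    """Return the set of codepoints of a namfile plus all of its includes."""
    seen = set() if _seen is None else _seen
    path = os.path.realpath(path)
    if path in seen:
        return set()  # already loaded: loops are not followed
    seen.add(path)
    codepoints = set()
    in_header = True
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if in_header and line.startswith("#"):
                include = INCLUDE_LINE.match(line)
                if include:
                    # Include paths are relative to the including file.
                    target = os.path.join(os.path.dirname(path), include.group(1))
                    codepoints |= read_charset(target, seen)
                continue
            in_header = False  # the first non-comment line ends the header
            match = CODEPOINT_LINE.match(line)
            if match:
                codepoints.add(int(match.group(1), 16))
    return codepoints
```

Because the result is a set and every file is visited at most once, calling `read_charset` on a "pro" file that includes "plus", which in turn includes "base", returns the union of all three, even if one of them includes another back.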

Possible Keywords

  • author {name} (zero or more). Since the 2016 namfiles already contain # Created by comments, we could just as well institutionalize them.

  • label {name} A human readable name for the file, to be used in user interfaces. Could also have a further {locale} argument for internationalization.

Scripts

A Python script, tools/namelist.py, can generate these files:

namelist.py Font.ttf > NameList.nam

The [wgl-latin.enc] file can be used by Fontlab Studio 5 and represents Microsoft's Windows Glyph List 4 glyph set.

A Python script, tools/unicode_names.py, can reformat these files:

python unicode_names.py --nam_file=encodings/latin_unique-glyphs.nam
0x0020    SPACE
0x0021  ! EXCLAMATION MARK
0x0022  " QUOTATION MARK
0x0023  # NUMBER SIGN
...