leo/grammar/FORMAT_ABNF_GRAMMER.md
2021-07-21 14:42:25 -07:00

Copyright (C) 2019-2021 Aleo Systems Inc. This file is part of the Leo library.

The Leo library is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Leo library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with the Leo library. If not, see https://www.gnu.org/licenses/.


Introduction

This file contains an ABNF (Augmented Backus-Naur Form) grammar of Leo string formatting. Background on ABNF is provided later in this file.

This grammar provides an official definition of how format strings are parsed for printing by the Leo compiler.


Background on ABNF

ABNF is an Internet standard: see RFC 5234 at https://www.rfc-editor.org/info/rfc5234 and RFC 7405 at https://www.rfc-editor.org/info/rfc7405. It is used to specify the syntax of JSON, HTTP, and other standards.

ABNF adds conveniences and makes slight modifications to Backus-Naur Form (BNF), without going beyond context-free grammars.

Instead of BNF's angle-bracket notation for nonterminals, ABNF uses case-insensitive names consisting of letters, digits, and dashes, e.g. HTTP-message and IPv6address. ABNF includes an angle-bracket notation for prose descriptions, e.g. <host, see [RFC3986], Section 3.2.2>, usable as a last resort in the definiens of a nonterminal.

While BNF allows arbitrary terminals, ABNF uses only natural numbers as terminals, and denotes them via: (i) binary, decimal, or hexadecimal sequences, e.g. %b1.11.1010, %d1.3.10, and %x1.3.A all denote the sequence of terminals [1, 3, 10]; (ii) binary, decimal, or hexadecimal ranges, e.g. %x30-39 denotes any singleton sequence of terminals [n] with 48 <= n <= 57 (an ASCII digit); (iii) case-sensitive ASCII strings, e.g. %s"Ab" denotes the sequence of terminals [65, 98]; and (iv) case-insensitive ASCII strings, e.g. %i"ab", or just "ab", denotes any sequence of terminals among [65, 66], [65, 98], [97, 66], and [97, 98]. ABNF terminals in suitable sets represent ASCII or Unicode characters.
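The terminal notations above can be checked with a small Python sketch. The helper names `decode_num_val` and `case_insensitive_variants` are hypothetical and exist only for this illustration; the code handles sequences and case-insensitive strings, not ranges, and does not validate input as strictly as RFC 5234 requires.

```python
from itertools import product
from typing import List


def decode_num_val(text: str) -> List[int]:
    """Decode an ABNF numeric sequence such as %d1.3.10 into its terminals."""
    bases = {"b": 2, "d": 10, "x": 16}
    if len(text) < 3 or text[0] != "%" or text[1].lower() not in bases:
        raise ValueError(f"not an ABNF numeric value: {text!r}")
    base = bases[text[1].lower()]
    # Dots separate the individual terminals of a sequence.
    return [int(part, base) for part in text[2:].split(".")]


def case_insensitive_variants(s: str) -> List[List[int]]:
    """List every terminal sequence denoted by a case-insensitive ASCII string."""
    # Each character may appear in upper or lower case.
    options = [sorted({ord(c.upper()), ord(c.lower())}) for c in s]
    return [list(combo) for combo in product(*options)]
```

Under these assumptions, `decode_num_val` maps all three spellings of the sequence to `[1, 3, 10]`, and `case_insensitive_variants("ab")` enumerates the four sequences listed above.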

ABNF allows repetition prefixes n*m, where n and m are natural numbers in decimal notation; if absent, n defaults to 0, and m defaults to infinity. For example, 1*4HEXDIG denotes one to four HEXDIGs, *3DIGIT denotes up to three DIGITs, and 1*OCTET denotes one or more OCTETs. A single n prefix abbreviates n*n, e.g. 3DIGIT denotes three DIGITs.
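As an illustration of the defaults described above, the following Python sketch turns a repetition prefix into a (minimum, maximum) pair, using `None` for infinity. The function name `repeat_bounds` is hypothetical and not part of any ABNF tooling.

```python
import re
from typing import Optional, Tuple


def repeat_bounds(prefix: str) -> Tuple[int, Optional[int]]:
    """Return (min, max) for an ABNF repetition prefix; max is None for infinity."""
    m = re.fullmatch(r"(\d*)\*(\d*)|(\d+)", prefix)
    if m is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    if m.group(3) is not None:
        # A single n abbreviates n*n.
        n = int(m.group(3))
        return (n, n)
    lower = int(m.group(1)) if m.group(1) else 0      # n defaults to 0
    upper = int(m.group(2)) if m.group(2) else None   # m defaults to infinity
    return (lower, upper)
```

For the examples in the text, `1*4` gives `(1, 4)`, `*3` gives `(0, 3)`, `1*` gives `(1, None)`, and `3` gives `(3, 3)`.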

Instead of BNF's |, ABNF uses / to separate alternatives. Repetition prefixes have precedence over juxtapositions, which have precedence over /. Round brackets group things and override the aforementioned precedence rules, e.g. *(WSP / CRLF WSP) denotes sequences of terminals obtained by repeating, zero or more times, either (i) a WSP or (ii) a CRLF followed by a WSP. Square brackets also group things but make them optional, e.g. [":" port] is equivalent to 0*1(":" port).
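The grouping and precedence rules above have a close analogue in regular expressions, where quantifiers bind tighter than concatenation, which binds tighter than alternation. The Python sketch below translates the two examples by hand. It assumes WSP is a space or horizontal tab and CRLF is a carriage return followed by a line feed (as in RFC 5234), and that `port` is a possibly empty run of decimal digits (as in RFC 3986); the constant names are hypothetical.

```python
import re

WSP = r"[ \t]"     # space or horizontal tab
CRLF = r"\r\n"     # carriage return followed by line feed
PORT = r"[0-9]*"   # assumption: port = *DIGIT

# *(WSP / CRLF WSP): round brackets group, the prefix * repeats the group.
FOLDED_WSP = re.compile(rf"(?:{WSP}|{CRLF}{WSP})*")

# [":" port] is the same as 0*1(":" port): an optional group.
OPTIONAL_PORT = re.compile(rf"(?::{PORT})?")
OPTIONAL_PORT_EXPLICIT = re.compile(rf"(?::{PORT}){{0,1}}")
```

A bare CRLF without a following WSP is rejected by `FOLDED_WSP`, which shows that juxtaposition binds tighter than the alternative separator.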

Instead of BNF's ::=, ABNF uses = to define nonterminals, and =/ to incrementally add alternatives to previously defined nonterminals. For example, the rule BIT = "0" / "1" is equivalent to BIT = "0" followed by BIT =/ "1".
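The equivalence between a single rule and an incremental definition can be sketched in Python by modelling a grammar as a mapping from rule names to lists of alternatives. The function `apply_rules` and its tuple encoding are hypothetical and serve only to illustrate the two operators.

```python
from typing import Dict, Iterable, List, Tuple

Rule = Tuple[str, str, List[str]]  # (name, operator, alternatives)


def apply_rules(rules: Iterable[Rule]) -> Dict[str, List[str]]:
    """Build a grammar table from rules that use "=" or "=/"."""
    grammar: Dict[str, List[str]] = {}
    for name, op, alternatives in rules:
        key = name.lower()  # ABNF rule names are case-insensitive
        if op == "=":
            grammar[key] = list(alternatives)
        elif op == "=/":
            if key not in grammar:
                raise KeyError(f"{name} must be defined before it is extended")
            grammar[key].extend(alternatives)
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return grammar
```

With this model, defining BIT in one step and defining it as "0" and then adding "1" produce the same table.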

The syntax of ABNF itself is formally specified in ABNF (in Section 4 of the aforementioned RFC 5234), after the syntax and semantics of ABNF are informally specified in natural language (in Sections 1, 2, and 3 of the aforementioned RFC 5234). The syntax rules of ABNF prescribe the ASCII codes allowed for white space (spaces and horizontal tabs), line endings (carriage returns followed by line feeds), and comments (semicolons to line endings).


Structure

This ABNF grammar consists of a single grammar, which describes how a Leo string literal is parsed for formatting.


Format String

not-double-quote-or-backslash-or-brace = %x0-21
                                         / %x23-5B
                                         / %x5D-7A
                                         / %x7C
                                         / %x7E-10FFFF
                                         ; anything but " or \ or { or }
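The ranges in this rule can be transcribed directly into a predicate over code points. The following Python sketch uses the hypothetical name `is_plain_char`; it is a reading aid, not the Leo compiler's implementation.

```python
def is_plain_char(ch: str) -> bool:
    """True if ch matches not-double-quote-or-backslash-or-brace."""
    cp = ord(ch)
    return (
        0x0 <= cp <= 0x21          # up to and including "!"
        or 0x23 <= cp <= 0x5B      # skips 0x22, the double quote
        or 0x5D <= cp <= 0x7A      # skips 0x5C, the backslash
        or cp == 0x7C              # "|", between "{" (0x7B) and "}" (0x7D)
        or 0x7E <= cp <= 0x10FFFF  # "~" and everything above
    )
```

Enumerating the ASCII range confirms that exactly four characters are excluded: the double quote, the backslash, and the two braces.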

double-quote = %x22   ; "

single-quote = %x27   ; '

single-quote-escape = "\" single-quote   ; \'

Go to: single-quote;

double-quote-escape = "\" double-quote   ; \"

Go to: double-quote;

backslash-escape = "\\"

line-feed-escape = %s"\n"

carriage-return-escape = %s"\r"

horizontal-tab-escape = %s"\t"

null-character-escape = "\0"

simple-character-escape = single-quote-escape
                        / double-quote-escape
                        / backslash-escape
                        / line-feed-escape
                        / carriage-return-escape
                        / horizontal-tab-escape
                        / null-character-escape

Go to: null-character-escape, line-feed-escape, single-quote-escape, double-quote-escape, backslash-escape, carriage-return-escape, horizontal-tab-escape;
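The grammar fixes only the spelling of the seven simple escapes. The Python sketch below pairs each spelling with the character it conventionally stands for; that mapping is an assumption based on common usage in other languages, not something this grammar states. `SIMPLE_ESCAPES` is a hypothetical name.

```python
# Keys are the two-character source spellings; values are the assumed meanings.
SIMPLE_ESCAPES = {
    "\\'": "'",     # single-quote-escape
    '\\"': '"',     # double-quote-escape
    "\\\\": "\\",   # backslash-escape
    "\\n": "\n",    # line-feed-escape
    "\\r": "\r",    # carriage-return-escape
    "\\t": "\t",    # horizontal-tab-escape
    "\\0": "\0",    # null-character-escape
}
```

Because the line-feed, carriage-return, and horizontal-tab rules use case-sensitive strings (%s), spellings such as `\N` are not simple escapes and do not appear in the table.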

octal-digit = %x30-37   ; 0-7

hexadecimal-digit = digit / "a" / "b" / "c" / "d" / "e" / "f"

Go to: digit;

ascii-character-escape = %s"\x" octal-digit hexadecimal-digit

Go to: octal-digit, hexadecimal-digit;

unicode-character-escape = %s"\u{" 1*6hexadecimal-digit "}"
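A minimal Python sketch of the two numeric escapes follows. It assumes that `digit`, which is referenced but not defined in this file, means the decimal digits 0-9, and that an escape denotes the character with the given hexadecimal code point; the names `ASCII_ESCAPE`, `UNICODE_ESCAPE`, and `decode_escape` are hypothetical.

```python
import re
from typing import Optional

# %s"\x" is case-sensitive; the hexadecimal letters themselves are not.
ASCII_ESCAPE = re.compile(r"\\x([0-7][0-9a-fA-F])")
UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}")


def decode_escape(text: str) -> Optional[str]:
    """Return the character denoted by a numeric escape, or None if invalid."""
    m = ASCII_ESCAPE.fullmatch(text) or UNICODE_ESCAPE.fullmatch(text)
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    # Six hexadecimal digits can exceed the Unicode range; reject those here.
    if code_point > 0x10FFFF:
        return None
    return chr(code_point)
```

The leading octal digit limits ASCII escapes to the range 00 through 7F, so `\x80` is rejected. The unicode rule accepts up to six hexadecimal digits syntactically, so the range check above is an extra step that the grammar alone does not express.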

format-string-element-not-brace = not-double-quote-or-backslash-or-brace
                                 / simple-character-escape
                                 / ascii-character-escape
                                 / unicode-character-escape

Go to: not-double-quote-or-backslash-or-brace, simple-character-escape, ascii-character-escape, unicode-character-escape;

format-string-container = "{}"

format-string-open-brace = "{{"

format-string-close-brace = "}}"

format-string-element = format-string-element-not-brace
                      / format-string-container
                      / format-string-open-brace
                      / format-string-close-brace

Go to: format-string-element-not-brace, format-string-open-brace, format-string-close-brace, format-string-container;

format-string = double-quote *format-string-element double-quote
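Putting the rules together, the following Python sketch splits a format string into its elements with one regular expression alternative per rule. It is an illustration written against this grammar, not the Leo compiler's parser; the names `tokenize_format_string` and `_ELEMENT` are hypothetical, and `digit` is assumed to mean 0-9.

```python
import re
from typing import List, Tuple

_ELEMENT = re.compile(
    r"""
    (?P<container>\{\})
  | (?P<open_brace>\{\{)
  | (?P<close_brace>\}\})
  | (?P<simple_escape>\\['"\\nrt0])
  | (?P<ascii_escape>\\x[0-7][0-9a-fA-F])
  | (?P<unicode_escape>\\u\{[0-9a-fA-F]{1,6}\})
  | (?P<plain>[^"\\{}])
    """,
    re.VERBOSE,
)


def tokenize_format_string(text: str) -> List[Tuple[str, str]]:
    """Split a quoted format string into (kind, lexeme) pairs."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("format string must be enclosed in double quotes")
    tokens: List[Tuple[str, str]] = []
    pos, end = 1, len(text) - 1  # scan only between the two quotes
    while pos < end:
        m = _ELEMENT.match(text, pos, end)
        if m is None:
            raise ValueError(f"invalid format string element at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, the six-character body `{{{}}}` is read as an escaped open brace, a container, and an escaped close brace, while a lone `{` or `}` is rejected because no rule matches it.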