Copyright (C) 2019-2021 Aleo Systems Inc. This file is part of the Leo library.
The Leo library is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The Leo library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with the Leo library. If not, see https://www.gnu.org/licenses/.
Introduction
This file contains an ABNF (Augmented Backus-Naur Form) grammar of Leo string formatting. Background on ABNF is provided later in this file.
This grammar provides an official definition of how format strings are parsed for printing by the Leo compiler.
Background on ABNF
ABNF is an Internet standard: see RFC 5234 at https://www.rfc-editor.org/info/rfc5234 and RFC 7405 at https://www.rfc-editor.org/info/rfc7405. It is used to specify the syntax of JSON, HTTP, and other standards.
ABNF adds conveniences and makes slight modifications to Backus-Naur Form (BNF), without going beyond context-free grammars.
Instead of BNF's angle-bracket notation for nonterminals, ABNF uses case-insensitive names consisting of letters, digits, and dashes, e.g. HTTP-message and IPv6address.
ABNF includes an angle-bracket notation for prose descriptions, e.g. <host, see [RFC3986], Section 3.2.2>, usable as a last resort in the definiens of a nonterminal.
While BNF allows arbitrary terminals, ABNF uses only natural numbers as terminals, and denotes them via: (i) binary, decimal, or hexadecimal sequences, e.g. %b1.11.1010, %d1.3.10, and %x1.3.A all denote the sequence of terminals [1, 3, 10]; (ii) binary, decimal, or hexadecimal ranges, e.g. %x30-39 denotes any singleton sequence of terminals [n] with 48 <= n <= 57 (an ASCII digit); (iii) case-sensitive ASCII strings, e.g. %s"Ab" denotes the sequence of terminals [65, 98]; and (iv) case-insensitive ASCII strings, e.g. %i"ab", or just "ab", denotes any sequence of terminals among [65, 66], [65, 98], [97, 66], and [97, 98].
ABNF terminals in suitable sets represent ASCII or Unicode characters.
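The terminal notations above can be checked mechanically. The following is a minimal Python sketch, not part of the Leo tooling: the helper names numeric_sequence and case_insensitive_sequences are hypothetical, and the sketch covers only dotted numeric sequences and case-insensitive strings.

```python
import itertools
import re

_BASES = {"b": 2, "d": 10, "x": 16}


def numeric_sequence(notation: str) -> list[int]:
    """Expand a dotted numeric value such as %x1.3.A into its terminals."""
    match = re.fullmatch(
        r"%([bdxBDX])([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)", notation
    )
    if match is None:
        raise ValueError(f"not a numeric sequence: {notation!r}")
    base = _BASES[match.group(1).lower()]
    # int() rejects digits that are invalid for the base, e.g. "2" in binary.
    return [int(part, base) for part in match.group(2).split(".")]


def case_insensitive_sequences(text: str) -> list[list[int]]:
    """List every terminal sequence denoted by a case-insensitive ASCII string."""
    choices = [sorted({ord(c.upper()), ord(c.lower())}) for c in text]
    return [list(sequence) for sequence in itertools.product(*choices)]
```

Under these assumptions, numeric_sequence maps each of the three sequence notations from the text to [1, 3, 10], and case_insensitive_sequences("ab") lists the four sequences given for "ab".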
ABNF allows repetition prefixes n*m, where n and m are natural numbers in decimal notation; if absent, n defaults to 0, and m defaults to infinity. For example, 1*4HEXDIG denotes one to four HEXDIGs, *3DIGIT denotes up to three DIGITs, and 1*OCTET denotes one or more OCTETs. A single n prefix abbreviates n*n, e.g. 3DIGIT denotes three DIGITs.
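As an illustration of the defaults described above, here is a small Python sketch that turns a repetition prefix into a pair of bounds. The function name repetition_bounds is hypothetical, and infinity is represented by math.inf purely for convenience.

```python
import math
import re


def repetition_bounds(prefix: str) -> tuple[int, float]:
    """Return (minimum, maximum) for a prefix such as '1*4', '*3', '1*' or '3'."""
    match = re.fullmatch(r"([0-9]*)\*([0-9]*)", prefix)
    if match is not None:
        # A missing n defaults to 0; a missing m defaults to infinity.
        low = int(match.group(1)) if match.group(1) else 0
        high = int(match.group(2)) if match.group(2) else math.inf
        return low, high
    if re.fullmatch(r"[0-9]+", prefix):
        # A single n abbreviates n*n.
        count = int(prefix)
        return count, count
    raise ValueError(f"not a repetition prefix: {prefix!r}")
```

For instance, the prefix in 1*4HEXDIG yields (1, 4), the prefix in *3DIGIT yields (0, 3), and the prefix in 3DIGIT yields (3, 3).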
Instead of BNF's |, ABNF uses / to separate alternatives. Repetition prefixes have precedence over juxtapositions, which have precedence over /. Round brackets group things and override the aforementioned precedence rules, e.g. *(WSP / CRLF WSP) denotes sequences of terminals obtained by repeating, zero or more times, either (i) a WSP or (ii) a CRLF followed by a WSP. Square brackets also group things but make them optional, e.g. [":" port] is equivalent to 0*1(":" port).
Instead of BNF's ::=, ABNF uses = to define nonterminals, and =/ to incrementally add alternatives to previously defined nonterminals. For example, the rule BIT = "0" / "1" is equivalent to BIT = "0" followed by BIT =/ "1".
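The equivalence between a one-step definition and an incremental one can be sketched in Python as follows. The add_rule helper and the dictionary representation are hypothetical, and rejecting a second = for an existing name is a choice made for this sketch only.

```python
def add_rule(rules: dict, name: str, operator: str, alternatives: list) -> None:
    """Record one ABNF rule definition in `rules` (hypothetical helper)."""
    key = name.lower()  # ABNF rule names are case-insensitive
    if operator == "=":
        if key in rules:
            raise ValueError(f"{name} is already defined; use =/ to extend it")
        rules[key] = list(alternatives)
    elif operator == "=/":
        if key not in rules:
            raise ValueError(f"{name} must be defined with = before =/")
        rules[key].extend(alternatives)
    else:
        raise ValueError(f"unknown defining operator: {operator!r}")


# BIT = "0" / "1"
one_step = {}
add_rule(one_step, "BIT", "=", ['"0"', '"1"'])

# BIT = "0" followed by BIT =/ "1"
two_steps = {}
add_rule(two_steps, "BIT", "=", ['"0"'])
add_rule(two_steps, "bit", "=/", ['"1"'])
```

Both dictionaries end up holding the same two alternatives for the rule, which mirrors the BIT example above.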
The syntax of ABNF itself is formally specified in ABNF (in Section 4 of the aforementioned RFC 5234), after the syntax and semantics of ABNF are informally specified in natural language (in Sections 1, 2, and 3 of the aforementioned RFC 5234). The syntax rules of ABNF prescribe the ASCII codes allowed for white space (spaces and horizontal tabs), line endings (carriage returns followed by line feeds), and comments (semicolons to line endings).
Structure
This file consists of a single grammar, which describes how a Leo string literal is parsed for formatting.
Format String
not-double-quote-or-backslash-or-brace = %x0-21
                                       / %x23-5B
                                       / %x5D-7A
                                       / %x7C
                                       / %x7E-10FFFF
                                       ; anything but " or \ or { or }
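To make the ranges in this rule concrete, the following Python sketch tests whether a single character belongs to not-double-quote-or-backslash-or-brace. The function name is_plain_format_char is hypothetical; the ranges are copied from the rule above.

```python
# Inclusive code point ranges taken from the rule above.
_PLAIN_RANGES = [
    (0x0, 0x21),
    (0x23, 0x5B),
    (0x5D, 0x7A),
    (0x7C, 0x7C),
    (0x7E, 0x10FFFF),
]


def is_plain_format_char(ch: str) -> bool:
    """True if `ch` is anything but a double quote, backslash, or brace."""
    code = ord(ch)
    return any(low <= code <= high for low, high in _PLAIN_RANGES)
```

The only code points left out are %x22, %x5C, %x7B, and %x7D, i.e. the double quote, the backslash, and the two braces.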
double-quote = %x22 ; "
single-quote = %x27 ; '
single-quote-escape = "\" single-quote ; \'
double-quote-escape = "\" double-quote ; \"
backslash-escape = "\\"
line-feed-escape = %s"\n"
carriage-return-escape = %s"\r"
horizontal-tab-escape = %s"\t"
null-character-escape = "\0"
simple-character-escape = single-quote-escape
                        / double-quote-escape
                        / backslash-escape
                        / line-feed-escape
                        / carriage-return-escape
                        / horizontal-tab-escape
                        / null-character-escape
octal-digit = %x30-37 ; 0-7
hexadecimal-digit = digit / "a" / "b" / "c" / "d" / "e" / "f"
ascii-character-escape = %s"\x" octal-digit hexadecimal-digit
unicode-character-escape = %s"\u{" 1*6hexadecimal-digit "}"
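The escape rules above can be exercised with a short Python sketch that maps one escape to the character it presumably stands for. The grammar only fixes the syntax, so the decoded values, the decode_escape name, and the table layout are assumptions of this sketch, not the Leo compiler's implementation.

```python
import re

# simple-character-escape: the decoded values are assumed, not given by the grammar.
_SIMPLE = {
    "\\'": "'",
    '\\"': '"',
    "\\\\": "\\",
    "\\n": "\n",
    "\\r": "\r",
    "\\t": "\t",
    "\\0": "\0",
}
# ascii-character-escape: \x, one octal digit, one hexadecimal digit.
_ASCII = re.compile(r"\\x([0-7][0-9A-Fa-f])")
# unicode-character-escape: \u{ then one to six hexadecimal digits then }.
_UNICODE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")


def decode_escape(escape: str) -> str:
    """Decode a single escape sequence; raise ValueError if it is malformed."""
    if escape in _SIMPLE:
        return _SIMPLE[escape]
    match = _ASCII.fullmatch(escape) or _UNICODE.fullmatch(escape)
    if match is None:
        raise ValueError(f"not a valid escape: {escape!r}")
    # chr() raises ValueError for values above 0x10FFFF.
    return chr(int(match.group(1), 16))
```

Because the first digit after \x is octal, the ASCII escape covers exactly \x00 through \x7F; an input such as \x80 is rejected by this sketch.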
format-string-element-not-brace = not-double-quote-or-backslash-or-brace
                                / simple-character-escape
                                / ascii-character-escape
                                / unicode-character-escape
format-string-container = "{}"
format-string-open-brace = "{{"
format-string-close-brace = "}}"
format-string-element = format-string-element-not-brace
                      / format-string-container
                      / format-string-open-brace
                      / format-string-close-brace
format-string = double-quote *format-string-element double-quote
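Putting the rules together, a format string can be split into its elements with a single regular expression. This is a minimal Python sketch under the grammar above, not the Leo compiler's parser; the name tokenize_format_string and the token labels are hypothetical.

```python
import re

# One alternative per kind of format-string-element.
_ELEMENT = re.compile(
    r"""
      (?P<container>\{\})
    | (?P<open_brace>\{\{)
    | (?P<close_brace>\}\})
    | (?P<escape>\\['"\\nrt0] | \\x[0-7][0-9A-Fa-f] | \\u\{[0-9A-Fa-f]{1,6}\})
    | (?P<plain>[^"\\{}])
    """,
    re.VERBOSE,
)


def tokenize_format_string(literal: str) -> list[tuple[str, str]]:
    """Split a quoted format string into (kind, text) pairs, or raise ValueError."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a format string starts and ends with a double quote")
    body = literal[1:-1]
    tokens = []
    position = 0
    while position < len(body):
        match = _ELEMENT.match(body, position)
        if match is None:
            raise ValueError(f"invalid element at offset {position + 1}")
        tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

With this sketch, "{{}}" splits into an open-brace element followed by a close-brace element, while a lone { or } is rejected because no rule in the grammar produces it.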