leo/grammar/format-abnf-grammar.txt
2021-07-21 14:42:25 -07:00

; Copyright (C) 2019-2021 Aleo Systems Inc.
; This file is part of the Leo library.
; The Leo library is free software: you can redistribute it and/or modify
; it under the terms of the GNU General Public License as published by
; the Free Software Foundation, either version 3 of the License, or
; (at your option) any later version.
; The Leo library is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
; You should have received a copy of the GNU General Public License
; along with the Leo library. If not, see <https://www.gnu.org/licenses/>.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Introduction
; ------------
; This file contains an ABNF (Augmented Backus-Naur Form) grammar of Leo string formatting.
; Background on ABNF is provided later in this file.
; This grammar provides an official definition of how format strings
; are parsed for printing by the Leo compiler.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Background on ABNF
; ------------------
; ABNF is an Internet standard:
; see RFC 5234 at https://www.rfc-editor.org/info/rfc5234
; and RFC 7405 at https://www.rfc-editor.org/info/rfc7405.
; It is used to specify the syntax of JSON, HTTP, and other standards.
; ABNF adds conveniences and makes slight modifications
; to Backus-Naur Form (BNF),
; without going beyond context-free grammars.
; Instead of BNF's angle-bracket notation for nonterminals,
; ABNF uses case-insensitive names consisting of letters, digits, and dashes,
; e.g. `HTTP-message` and `IPv6address`.
; ABNF includes an angle-bracket notation for prose descriptions,
; e.g. `<host, see [RFC3986], Section 3.2.2>`,
; usable as a last resort in the definiens of a nonterminal.
; While BNF allows arbitrary terminals,
; ABNF uses only natural numbers as terminals,
; and denotes them via:
; (i) binary, decimal, or hexadecimal sequences,
; e.g. `%b1.11.1010`, `%d1.3.10`, and `%x1.3.A`
; all denote the sequence of terminals [1, 3, 10];
; (ii) binary, decimal, or hexadecimal ranges,
; e.g. `%x30-39` denotes any singleton sequence of terminals
; [_n_] with 48 <= _n_ <= 57 (an ASCII digit);
; (iii) case-sensitive ASCII strings,
; e.g. `%s"Ab"` denotes the sequence of terminals [65, 98];
; and (iv) case-insensitive ASCII strings,
; e.g. `%i"ab"`, or just `"ab"`, denotes
; any sequence of terminals among
; [65, 66],
; [65, 98],
; [97, 66], and
; [97, 98].
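; The following Python sketch illustrates the four terminal notations above.
; It is not part of any ABNF tool, and the function names are hypothetical:
; it decodes dotted numeric values and numeric ranges into the natural numbers
; they denote, and enumerates the sequences matched by a case-insensitive string.

```python
from __future__ import annotations

from itertools import product

# Radix of each ABNF numeric-value prefix: %b, %d, %x.
BASES = {"b": 2, "d": 10, "x": 16}


def decode_numeric_sequence(term: str) -> list[int]:
    """Decode a dotted value such as '%d1.3.10' into its terminals."""
    base = BASES[term[1].lower()]
    return [int(part, base) for part in term[2:].split(".")]


def decode_numeric_range(term: str) -> range:
    """Decode a range such as '%x30-39' into the terminals it admits."""
    base = BASES[term[1].lower()]
    low, high = term[2:].split("-")
    return range(int(low, base), int(high, base) + 1)


def case_insensitive_matches(text: str) -> list[list[int]]:
    """All terminal sequences denoted by a case-insensitive string like "ab"."""
    choices = [sorted({ord(c.upper()), ord(c.lower())}) for c in text]
    return [list(seq) for seq in product(*choices)]


print(decode_numeric_sequence("%b1.11.1010"))  # [1, 3, 10]
print(decode_numeric_sequence("%x1.3.A"))      # [1, 3, 10]
print(decode_numeric_range("%x30-39"))         # range(48, 58)
print(case_insensitive_matches("ab"))  # [[65, 66], [65, 98], [97, 66], [97, 98]]
```

; Using Python's `sorted` on a set makes a letter contribute both code points,
; while a character without case (such as a digit) contributes only one.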
; ABNF terminals in suitable sets represent ASCII or Unicode characters.
; ABNF allows repetition prefixes `n*m`,
; where `n` and `m` are natural numbers in decimal notation;
; if absent,
; `n` defaults to 0, and
; `m` defaults to infinity.
; For example,
; `1*4HEXDIG` denotes one to four `HEXDIG`s,
; `*3DIGIT` denotes up to three `DIGIT`s, and
; `1*OCTET` denotes one or more `OCTET`s.
; A single `n` prefix
; abbreviates `n*n`,
; e.g. `3DIGIT` denotes three `DIGIT`s.
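; As a sketch of the repetition rules above, the hypothetical Python helper
; `repetition_bounds` maps a prefix such as `1*4` to its (minimum, maximum)
; bounds, using `math.inf` for an absent `m`; `allows` checks a count against them.

```python
from __future__ import annotations

import math
import re

# n, an optional "*", and m; each part may be absent.
REPEAT_PREFIX = re.compile(r"(\d*)(\*?)(\d*)")


def repetition_bounds(prefix: str) -> tuple[int, float]:
    """Return the (min, max) occurrences allowed by an ABNF repetition prefix."""
    match = REPEAT_PREFIX.fullmatch(prefix)
    if match is None or prefix == "":
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    n, star, m = match.groups()
    if not star:
        # A single n abbreviates n*n.
        return int(n), int(n)
    low = int(n) if n else 0          # n defaults to 0
    high = int(m) if m else math.inf  # m defaults to infinity
    return low, high


def allows(prefix: str, count: int) -> bool:
    """Whether `count` repetitions are permitted by `prefix`."""
    low, high = repetition_bounds(prefix)
    return low <= count <= high


print(repetition_bounds("1*4"))  # (1, 4)
print(repetition_bounds("*3"))   # (0, 3)
print(repetition_bounds("1*"))   # (1, inf)
print(repetition_bounds("3"))    # (3, 3)
```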
; Instead of BNF's `|`, ABNF uses `/` to separate alternatives.
; Repetition prefixes have precedence over juxtapositions,
; which have precedence over `/`.
; Round brackets group things and override the aforementioned precedence rules,
; e.g. `*(WSP / CRLF WSP)` denotes sequences of terminals
; obtained by repeating, zero or more times,
; either (i) a `WSP` or (ii) a `CRLF` followed by a `WSP`.
; Square brackets also group things but make them optional,
; e.g. `[":" port]` is equivalent to `0*1(":" port)`.
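; The precedence and grouping rules can be illustrated by hand-translating
; `*(WSP / CRLF WSP)`, which is also the definition of the RFC 5234 core rule
; `LWSP`, into a Python regular expression. The sketch assumes the RFC 5234
; core rules `WSP = SP / HTAB` and `CRLF = CR LF`; for the square-bracket
; example it assumes, purely for illustration, that `port` is one or more
; ASCII digits. It is not a general ABNF-to-regex converter.

```python
import re

WSP = r"[ \t]"   # WSP = SP / HTAB
CRLF = r"\r\n"   # CRLF = CR LF

# Juxtaposition binds tighter than "/", so "CRLF WSP" is one alternative;
# the round brackets make the leading "*" repeat the whole alternation.
REPEATED_WSP = re.compile("(?:" + WSP + "|" + CRLF + WSP + ")*")

# [":" port] is 0*1(":" port); "port" is assumed here to be 1*DIGIT.
OPTIONAL_PORT = re.compile("(?::[0-9]+)?")


def is_repeated_wsp(text: str) -> bool:
    return REPEATED_WSP.fullmatch(text) is not None


def is_optional_port(text: str) -> bool:
    return OPTIONAL_PORT.fullmatch(text) is not None


print(is_repeated_wsp(" \r\n\t "))  # True
print(is_repeated_wsp("\r\n"))      # False: a CRLF must be followed by a WSP
```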
; Instead of BNF's `::=`, ABNF uses `=` to define nonterminals,
; and `=/` to incrementally add alternatives
; to previously defined nonterminals.
; For example, the rule `BIT = "0" / "1"`
; is equivalent to `BIT = "0"` followed by `BIT =/ "1"`.
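; The equivalence between a single `=` definition and `=` followed by `=/`
; can be sketched with a toy rule table in Python. The hypothetical
; `build_rules` assumes one rule per line and alternatives without quoted
; `/` or `=/`, brackets, or comments, so it is only an illustration.

```python
from __future__ import annotations


def build_rules(lines: list[str]) -> dict[str, list[str]]:
    """Collect the alternatives of each rule, honoring "=" and "=/"."""
    rules: dict[str, list[str]] = {}
    for line in lines:
        incremental = "=/" in line
        name, definiens = line.split("=/" if incremental else "=", 1)
        key = name.strip().lower()  # rule names are case-insensitive
        alternatives = [alt.strip() for alt in definiens.split("/")]
        if incremental:
            if key not in rules:
                raise KeyError(f"=/ used before {key!r} was defined")
            rules[key].extend(alternatives)
        else:
            rules[key] = alternatives
    return rules


print(build_rules(['BIT = "0" / "1"']))          # {'bit': ['"0"', '"1"']}
print(build_rules(['BIT = "0"', 'BIT =/ "1"']))  # {'bit': ['"0"', '"1"']}
```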
; The syntax of ABNF itself is formally specified in ABNF
; (in Section 4 of the aforementioned RFC 5234),
; after the syntax and semantics of ABNF
; are informally specified in natural language
; (in Sections 1, 2, and 3 of the aforementioned RFC 5234).
; The syntax rules of ABNF prescribe the ASCII codes allowed for
; white space (spaces and horizontal tabs),
; line endings (carriage returns followed by line feeds),
; and comments (semicolons to line endings).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Structure
; ---------
; This file consists of a single ABNF grammar,
; which describes how a Leo string-literal is parsed
; for formatting.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Format String
; -------------
not-double-quote-or-backslash-or-brace = %x0-21
                                       / %x23-5B
                                       / %x5D-7A
                                       / %x7C
                                       / %x7E-10FFFF
                                       ; anything but " or \ or { or }
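; A Python sketch of the character class above, written as a hypothetical
; predicate over Unicode code points with the ranges copied from the rule.
; Scanning the ASCII range confirms that exactly `"`, `\`, `{`, and `}` are
; excluded; every code point from %x80 to %x10FFFF falls in the last range.

```python
# Code-point ranges of not-double-quote-or-backslash-or-brace, copied from the rule.
ALLOWED_RANGES = [
    (0x0, 0x21),
    (0x23, 0x5B),
    (0x5D, 0x7A),
    (0x7C, 0x7C),
    (0x7E, 0x10FFFF),
]


def is_not_double_quote_or_backslash_or_brace(ch: str) -> bool:
    """True if the single character `ch` lies in one of the allowed ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ALLOWED_RANGES)


excluded = [chr(cp) for cp in range(0x80)
            if not is_not_double_quote_or_backslash_or_brace(chr(cp))]
print(excluded)  # ['"', '\\', '{', '}']
```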
double-quote = %x22 ; "
single-quote = %x27 ; '
single-quote-escape = "\" single-quote ; \'
double-quote-escape = "\" double-quote ; \"
backslash-escape = "\\"
line-feed-escape = %s"\n"
carriage-return-escape = %s"\r"
horizontal-tab-escape = %s"\t"
null-character-escape = "\0"
simple-character-escape = single-quote-escape
                        / double-quote-escape
                        / backslash-escape
                        / line-feed-escape
                        / carriage-return-escape
                        / horizontal-tab-escape
                        / null-character-escape
octal-digit = %x30-37 ; 0-7
digit = %x30-39 ; 0-9
hexadecimal-digit = digit / "a" / "b" / "c" / "d" / "e" / "f"
ascii-character-escape = %s"\x" octal-digit hexadecimal-digit
unicode-character-escape = %s"\u{" 1*6hexadecimal-digit "}"
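; The escape rules above can be sketched as a small Python decoder; this is
; an illustration, not the Leo compiler's code, and `decode_escape` is a
; hypothetical name. It takes the text and the index of a backslash and
; returns the denoted character with the index just past the escape. As in
; the grammar, `\x` admits only an octal first digit (so values up to %x7F),
; the letters in `\n`, `\r`, `\t`, `\x`, and `\u` are case-sensitive, and the
; hexadecimal digits are not.

```python
from __future__ import annotations

# simple-character-escape: character after the backslash -> denoted character
SIMPLE_ESCAPES = {"'": "'", '"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t", "0": "\0"}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"  # "a".."f" are case-insensitive in ABNF


def decode_escape(text: str, i: int) -> tuple[str, int]:
    """Decode the escape starting at text[i]; return (character, next index)."""
    if text[i : i + 1] != "\\" or i + 1 >= len(text):
        raise ValueError(f"no escape at index {i}")
    kind = text[i + 1]
    if kind in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[kind], i + 2
    if kind == "x":
        # ascii-character-escape: one octal digit, then one hexadecimal digit
        digits = text[i + 2 : i + 4]
        if len(digits) == 2 and digits[0] in OCTAL_DIGITS and digits[1] in HEX_DIGITS:
            return chr(int(digits, 16)), i + 4
    elif kind == "u" and text[i + 2 : i + 3] == "{":
        # unicode-character-escape: 1 to 6 hexadecimal digits between braces
        close = text.find("}", i + 3)
        digits = text[i + 3 : close] if close != -1 else ""
        if 1 <= len(digits) <= 6 and all(d in HEX_DIGITS for d in digits):
            # chr raises ValueError above 0x10FFFF; the grammar itself
            # does not bound the value.
            return chr(int(digits, 16)), close + 1
    raise ValueError(f"invalid escape at index {i}")


print(decode_escape("a\\x41", 1))         # ('A', 5)
print(decode_escape("\\u{1F600}", 0)[1])  # 9
```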
format-string-element-not-brace = not-double-quote-or-backslash-or-brace
                                / simple-character-escape
                                / ascii-character-escape
                                / unicode-character-escape
format-string-container = "{}"
format-string-open-brace = "{{"
format-string-close-brace = "}}"
format-string-element = format-string-element-not-brace
                      / format-string-container
                      / format-string-open-brace
                      / format-string-close-brace
format-string = double-quote *format-string-element double-quote
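; Putting the rules together, the following Python sketch hand-translates
; them into regular expressions (hypothetical names; not the Leo compiler's
; parser) to validate a format string and list the kind of each element.
; Because a `{` must be followed by `}` or `{`, a `}` must be followed by `}`,
; and each escape is identified by the character after its backslash, the next
; one or two characters determine every element, so a left-to-right scan
; recovers the unique parse.

```python
from __future__ import annotations

import re

# Regular-expression counterparts of the rules above.
NOT_QUOTE_BACKSLASH_BRACE = r'[^"\\{}]'
SIMPLE_ESCAPE = r"""\\['"\\nrt0]"""
ASCII_ESCAPE = r"\\x[0-7][0-9a-fA-F]"
UNICODE_ESCAPE = r"\\u\{[0-9a-fA-F]{1,6}\}"

# format-string-element, with one named group per kind of element.
ELEMENT = re.compile(
    r"(?P<container>\{\})"
    r"|(?P<open_brace>\{\{)"
    r"|(?P<close_brace>\}\})"
    r"|(?P<not_brace>"
    + "|".join([NOT_QUOTE_BACKSLASH_BRACE, SIMPLE_ESCAPE, ASCII_ESCAPE, UNICODE_ESCAPE])
    + ")"
)
# format-string = double-quote *format-string-element double-quote
FORMAT_STRING = re.compile('"(?:' + ELEMENT.pattern + ')*"')


def format_string_elements(literal: str) -> list[str]:
    """Validate a format string (quotes included) and list its element kinds."""
    if FORMAT_STRING.fullmatch(literal) is None:
        raise ValueError(f"not a format string: {literal!r}")
    # Scan only between the surrounding double quotes.
    return [m.lastgroup for m in ELEMENT.finditer(literal, 1, len(literal) - 1)]


print(format_string_elements('"a{{}}{}"'))
# ['not_brace', 'open_brace', 'close_brace', 'container']
```

; For example, `"{} and {{}}"` contains exactly one `container`; a compiler
; could compare such a count with the number of values to be printed.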