leo/grammar/abnf-grammar.txt

1106 lines
39 KiB
Plaintext
Raw Normal View History

2022-02-02 21:07:27 +03:00
; Copyright (C) 2019-2022 Aleo Systems Inc.
; This file is part of the Leo library.
2021-03-23 19:11:42 +03:00
; The Leo library is free software: you can redistribute it and/or modify
; it under the terms of the GNU General Public License as published by
; the Free Software Foundation, either version 3 of the License, or
; (at your option) any later version.
2021-03-23 19:11:42 +03:00
; The Leo library is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
2021-03-23 19:11:42 +03:00
; You should have received a copy of the GNU General Public License
; along with the Leo library. If not, see <https://www.gnu.org/licenses/>.
2021-03-23 19:11:42 +03:00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Introduction
; ------------
; This file contains an ABNF (Augmented Backus-Naur Form) grammar of Leo.
2021-03-23 19:11:42 +03:00
; Background on ABNF is provided later in this file.
; This grammar provides an official definition of the syntax of Leo
; that is both human-readable and machine-readable.
; It will be part of an upcoming Leo language reference.
; It may also be used to generate parser tests at some point.
; We are also using this grammar
; as part of a mathematical formalization of the Leo language,
; which we are developing in the ACL2 theorem prover
; and which we plan to publish at some point.
; In particular, we have used a formally verified parser of ABNF grammars
2021-03-23 19:11:42 +03:00
; (at https://github.com/acl2/acl2/tree/master/books/kestrel/abnf;
; also see the paper at https://www.kestrel.edu/people/coglio/vstte18.pdf)
; to parse this grammar into a formal representation of the Leo concrete syntax
; and to validate that the grammar satisfies certain consistency properties.
2021-03-23 19:11:42 +03:00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Background on ABNF
; ------------------
; ABNF is an Internet standard:
; see RFC 5234 at https://www.rfc-editor.org/info/rfc5234
; and RFC 7405 at https://www.rfc-editor.org/info/rfc7405.
2021-03-23 19:11:42 +03:00
; It is used to specify the syntax of JSON, HTTP, and other standards.
; ABNF adds conveniences and makes slight modifications
; to Backus-Naur Form (BNF),
; without going beyond context-free grammars.
; Instead of BNF's angle-bracket notation for nonterminals,
; ABNF uses case-insensitive names consisting of letters, digits, and dashes,
; e.g. `HTTP-message` and `IPv6address`.
2021-03-23 19:11:42 +03:00
; ABNF includes an angle-bracket notation for prose descriptions,
; e.g. `<host, see [RFC3986], Section 3.2.2>`,
2021-03-23 19:11:42 +03:00
; usable as last resort in the definiens of a nonterminal.
; While BNF allows arbitrary terminals,
; ABNF uses only natural numbers as terminals,
; and denotes them via:
; (i) binary, decimal, or hexadecimal sequences,
; e.g. `%b1.11.1010`, `%d1.3.10`, and `%x.1.3.A`
; all denote the sequence of terminals [1, 3, 10];
2021-03-23 19:11:42 +03:00
; (ii) binary, decimal, or hexadecimal ranges,
; e.g. `%x30-39` denotes any singleton sequence of terminals
; [_n_] with 48 <= _n_ <= 57 (an ASCII digit);
2021-03-23 19:11:42 +03:00
; (iii) case-sensitive ASCII strings,
; e.g. `%s"Ab"` denotes the sequence of terminals [65, 98];
2021-03-23 19:11:42 +03:00
; and (iv) case-insensitive ASCII strings,
; e.g. `%i"ab"`, or just `"ab"`, denotes
; any sequence of terminals among
; [65, 66],
; [65, 98],
; [97, 66], and
; [97, 98].
2021-03-23 19:11:42 +03:00
; ABNF terminals in suitable sets represent ASCII or Unicode characters.
; ABNF allows repetition prefixes `n*m`,
; where `n` and `m` are natural numbers in decimal notation;
2021-03-23 19:11:42 +03:00
; if absent,
; `n` defaults to 0, and
; `m` defaults to infinity.
2021-03-23 19:11:42 +03:00
; For example,
; `1*4HEXDIG` denotes one to four `HEXDIG`s,
; `*3DIGIT` denotes up to three `DIGIT`s, and
; `1*OCTET` denotes one or more `OCTET`s.
; A single `n` prefix
; abbreviates `n*n`,
; e.g. `3DIGIT` denotes three `DIGIT`s.
; Instead of BNF's `|`, ABNF uses `/` to separate alternatives.
2021-03-23 19:11:42 +03:00
; Repetition prefixes have precedence over juxtapositions,
; which have precedence over `/`.
2021-03-23 19:11:42 +03:00
; Round brackets group things and override the aforementioned precedence rules,
; e.g. `*(WSP / CRLF WSP)` denotes sequences of terminals
2021-03-23 19:11:42 +03:00
; obtained by repeating, zero or more times,
; either (i) a `WSP` or (ii) a `CRLF` followed by a `WSP`.
2021-03-23 19:11:42 +03:00
; Square brackets also group things but make them optional,
; e.g. `[":" port]` is equivalent to `0*1(":" port)`.
2021-03-23 19:11:42 +03:00
; Instead of BNF's `::=`, ABNF uses `=` to define nonterminals,
; and `=/` to incrementally add alternatives
2021-03-23 19:11:42 +03:00
; to previously defined nonterminals.
; For example, the rule `BIT = "0" / "1"`
; is equivalent to `BIT = "0"` followed by `BIT =/ "1"`.
2021-03-23 19:11:42 +03:00
; The syntax of ABNF itself is formally specified in ABNF
; (in Section 4 of the aforementioned RFC 5234,
; after the syntax and semantics of ABNF
; are informally specified in natural language
; (in Sections 1, 2, and 3 of the aforementioned RFC 5234).
; The syntax rules of ABNF prescribe the ASCII codes allowed for
; white space (spaces and horizontal tabs),
; line endings (carriage returns followed by line feeds),
; and comments (semicolons to line endings).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Structure
; ---------
; This ABNF grammar consists of two (sub-)grammars:
2021-03-23 19:11:42 +03:00
; (i) a lexical grammar that describes how
; sequence of characters are parsed into tokens, and
; (ii) a syntactic grammar that describes how
2021-03-23 19:11:42 +03:00
; tokens are parsed into expressions, statements, etc.
; The adjectives 'lexical' and 'syntactic' are
; the same ones used in the Java language reference,
; for instance;
; alternative terms may be used in other languages,
; but the separation into these two components is quite common
; (the situation is sometimes a bit more complex, with multiple passes,
; e.g. Unicode escape processing in Java).
; This separation enables
2021-03-23 19:11:42 +03:00
; concerns of white space, line endings, etc.
; to be handled by the lexical grammar,
; with the syntactic grammar focused on the more important structure.
; Handling both aspects in a single grammar may be unwieldy,
2021-03-23 19:11:42 +03:00
; so having two grammars provides more clarity and readability.
; ABNF is a context-free grammar notation, with no procedural interpretation.
; The two grammars conceptually define two subsequent processing phases,
2021-03-23 19:11:42 +03:00
; as detailed below.
; However, a parser implementation does not need to perform
; two strictly separate phases (in fact, it typically does not),
; so long as it produces the same final result.
; The grammar is accompanied by some extra-grammatical requirements,
; which are not conveniently expressible in a context-free grammar like ABNF.
; These requirements are needed to make the grammar unambiguous,
; i.e. to ensure that, for each sequence of terminals,
; there is exactly one parse tree for that sequence terminals
; that satisfies not only the grammar rules
; but also the extra-grammatical requirements.
; These requirements are expressed as comments in this file.
2021-03-23 19:11:42 +03:00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Operator Precedence
; -------------------
; We formulate the grammar rules for expressions
; in a way that describes the relative precedence of operators,
; as often done in language syntax specifications.
; For instance, consider the rules
;
; multiplicative-expression =
; exponential-expression
; / multiplicative-expression "*" exponential-expression
; / multiplicative-expression "/" exponential-expression
;
; additive-expression =
; multiplicative-expression
; / additive-expression "+" multiplicative-expression
; / additive-expression "-" multiplicative-expression
;
; These rules tell us
; that the additive operators `+` and `-` have lower precedence
; than the multiplicative operators `*` and `/`,
2021-03-23 19:11:42 +03:00
; and that both the additive and multiplicative operators associate to the left.
; This may be best understood via the examples given below.
; According to the rules, the expression
;
; x + y * z
;
; can only be parsed as
;
; +
; / \
; x *
; / \
; y z
;
; and not as
;
; *
; / \
; + z
; / \
; x y
;
; because a multiplicative expression cannot have an additive expression
; as first sub-expression, as it would in the second tree above.
; Also according to the rules, the expression
;
; x + y + z
;
; can only be parsed as
;
; +
; / \
; + z
; / \
; x y
;
; and not as
;
; +
; / \
; x +
; / \
; y z
;
; because an additive expression cannot have an additive expression
; as second sub-expression, as it would in the second tree above.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Naming Convention
; -----------------
; This ABNF grammar uses nonterminal names
2021-03-23 19:11:42 +03:00
; that consist of complete English words, separated by dashes,
; and that describe the construct the way it is in English.
; For instance, we use the name `conditional-statement`
2021-03-23 19:11:42 +03:00
; to describe conditional statements.
; At the same time, this grammar establishes
; a precise and official nomenclature for the Leo constructs,
2021-03-23 19:11:42 +03:00
; by way of the nonterminal names that define their syntax.
; For instance, the rule
;
; group-literal = product-group-literal
; / affine-group-literal
;
; tells us that there are two kinds of group literals,
; namely product group literals and affine group literals.
; This is more precise than describing them as
; integers (which are not really group elements per se),
; or points (they are all points, just differently specified),
; or being singletons vs. pairs (which is a bit generic).
; The only exception to the nomenclature-establishing role of the grammar
; is the fact that, as discussed above,
; we write the grammar rules in a way that determines
; the relative precedence and the associativity of expression operators,
; and therefore we have rules like
2021-03-23 19:11:42 +03:00
;
; unary-expression = primary-expression
; / "!" unary-expression
; / "-" unary-expression
;
; In order to allow the recursion of the rule to stop,
; we need to regard, in the grammar, a primary expression as a unary expression
; (i.e. a primary expression is also a unary expression in the grammar;
; but note that the opposite is not true).
2021-03-23 19:11:42 +03:00
; However, this is just a grammatical artifact:
; ontologically, a primary expression is not really a unary expression,
; because a unary expression is one that consists of
; a unary operator and an operand sub-expression.
; These terminological exceptions should be easy to identify in the rules.
2021-03-23 19:11:42 +03:00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Lexical Grammar
; ---------------
; A Leo file is a finite sequence of Unicode characters,
; represented as Unicode code points,
2021-05-13 05:13:23 +03:00
; which are numbers in the range from 0 to 10FFFF.
; These are captured by the ABNF rule `character` below.
2021-03-23 19:11:42 +03:00
; The lexical grammar defines how, at least conceptually,
2021-03-23 19:11:42 +03:00
; the sequence of characters is turned into
; a sequence of tokens, comments, and whitespaces:
; these entities are all defined by the grammar rules below.
2021-03-23 19:11:42 +03:00
; As stated, the lexical grammar alone is ambiguous.
; For example, the sequence of characters `**` (i.e. two stars)
; could be equally parsed as two `*` symbol tokens or one `**` symbol token
; (see rule for `symbol` below).
; As another example, the sequence or characters `<CR><LF>`
2021-03-23 19:11:42 +03:00
; (i.e. carriage return followed by line feed)
; could be equally parsed as two line terminators or one
2022-02-02 21:07:27 +03:00
; (see rule for `line-terminator`).
2021-03-23 19:11:42 +03:00
; Thus, as often done in language syntax definitions,
; the lexical grammar is disambiguated by
; the extra-grammatical requirement that
; the longest possible sequence of characters is always parsed.
; This way, `**` must be parsed as one `**` symbol token,
; and `<CR><LF>` must be parsed as one line terminator.
2021-03-23 19:11:42 +03:00
; As mentioned above, a character is any Unicode code point.
; This grammar does not say how those are encoded in files (e.g. UTF-8):
; it starts with a decoded sequence of Unicode code points.
; Note that we allow any value,
; even though some values may not be used according to the Unicode standard.
character = %x0-10FFFF ; any Unicode code point
; We give names to certain ASCII characters.
horizontal-tab = %x9 ; <HT>
2021-03-23 19:11:42 +03:00
line-feed = %xA ; <LF>
2021-03-23 19:11:42 +03:00
carriage-return = %xD ; <CR>
2021-03-23 19:11:42 +03:00
space = %x20 ; <SP>
2021-03-23 19:11:42 +03:00
double-quote = %x22 ; "
2021-03-23 19:11:42 +03:00
single-quote = %x27 ; '
2021-03-23 19:11:42 +03:00
; We give names to complements of certain ASCII characters.
; These consist of all the Unicode characters except for one or two.
not-star = %x0-29 / %x2B-10FFFF ; anything but *
2022-02-02 21:07:27 +03:00
not-star-or-slash = %x0-29 / %x2B-2E / %x30-10FFFF ; anything but * or /
2021-03-23 19:11:42 +03:00
not-line-feed-or-carriage-return = %x0-9 / %xB-C / %xE-10FFFF
; anything but <LF> or <CR>
2021-03-23 19:11:42 +03:00
not-double-quote-or-backslash = %x0-21 / %x23-5B / %x5D-10FFFF
; anything but " or \
not-single-quote-or-backslash = %x0-26 / %x28-5B / %x5D-10FFFF
; anything but ' or \
2021-03-23 19:11:42 +03:00
; Lines in Leo may be terminated via
; a single carriage return,
; a line feed,
; or a carriage return immediately followed by a line feed.
; Note that the latter combination constitutes a single line terminator,
; according to the extra-grammatical requirement of the longest sequence,
; described above.
2021-03-23 19:11:42 +03:00
2022-02-02 21:07:27 +03:00
line-terminator = line-feed / carriage-return / carriage-return line-feed
2021-03-23 19:11:42 +03:00
; Line terminators form whitespace, along with spaces and horizontal tabs.
2022-02-02 21:07:27 +03:00
whitespace = space / horizontal-tab / line-terminator
2021-03-23 19:11:42 +03:00
; There are two kinds of comments in Leo, as in other languages.
; One is block comments of the form `/* ... */`,
; and the other is end-of-line comments of the form `// ...`.
; The first kind start at `/*` and end at the first `*/`,
2021-03-23 19:11:42 +03:00
; possibly spanning multiple (partial) lines;
; these do no nest.
; The second kind start at `//` and extend till the end of the line.
2021-03-23 19:11:42 +03:00
; The rules about comments given below are similar to
; the ones used in the Java language reference.
2021-03-23 19:11:42 +03:00
comment = block-comment / end-of-line-comment
block-comment = "/*" rest-of-block-comment
rest-of-block-comment = "*" rest-of-block-comment-after-star
/ not-star rest-of-block-comment
rest-of-block-comment-after-star = "/"
/ "*" rest-of-block-comment-after-star
/ not-star-or-slash rest-of-block-comment
2022-02-02 21:07:27 +03:00
end-of-line-comment = "//" *not-line-feed-or-carriage-return
2021-03-23 19:11:42 +03:00
; Below are the keywords in the Leo language.
; They cannot be used as identifiers.
keyword = %s"address"
/ %s"as"
/ %s"bool"
/ %s"char"
2021-03-23 19:11:42 +03:00
/ %s"circuit"
/ %s"console"
/ %s"const"
/ %s"else"
/ %s"field"
/ %s"for"
/ %s"function"
/ %s"group"
/ %s"i8"
/ %s"i16"
/ %s"i32"
/ %s"i64"
/ %s"i128"
/ %s"if"
/ %s"import"
/ %s"in"
/ %s"input"
/ %s"let"
/ %s"return"
/ %s"Self"
/ %s"self"
/ %s"static"
2021-08-19 16:04:44 +03:00
/ %s"type"
2021-03-23 19:11:42 +03:00
/ %s"u8"
/ %s"u16"
/ %s"u32"
/ %s"u64"
/ %s"u128"
; The following rules define (ASCII) letters.
2021-03-23 19:11:42 +03:00
uppercase-letter = %x41-5A ; A-Z
lowercase-letter = %x61-7A ; a-z
letter = uppercase-letter / lowercase-letter
; The following rules defines (ASCII) decimal, octal, and hexadecimal digits.
; Note that the latter are case-insensitive.
decimal-digit = %x30-39 ; 0-9
octal-digit = %x30-37 ; 0-7
hexadecimal-digit = decimal-digit / "a" / "b" / "c" / "d" / "e" / "f"
; An identifier is a non-empty sequence of
; letters, (decimal) digits, and underscores,
2021-03-23 19:11:42 +03:00
; starting with a letter.
; It must not be a keyword: this is an extra-grammatical requirement.
; It must also not be or start with `aleo1`,
; because that is used for address literals:
; this is another extra-grammatical requirement.
2021-03-23 19:11:42 +03:00
identifier = letter *( letter / decimal-digit / "_" )
; but not a keyword or a boolean literal or aleo1...
2021-03-23 19:11:42 +03:00
; Annotations have names, which are identifiers immediately preceded by `@`.
2021-03-23 19:11:42 +03:00
annotation-name = "@" identifier
2021-03-23 19:11:42 +03:00
; A natural (number) is a sequence of one or more decimal digits.
; We allow leading zeros, e.g. `007`.
2021-03-23 19:11:42 +03:00
natural = 1*decimal-digit
2021-03-23 19:11:42 +03:00
; An integer (number) is either a natural or its negation.
; We allow leading zeros also in negative numbers, e.g. `-007`.
2021-03-23 19:11:42 +03:00
integer = [ "-" ] natural
; An untyped literal is just an integer.
untyped-literal = integer
; Unsigned literals are naturals followed by unsigned types.
unsigned-literal = natural ( %s"u8" / %s"u16" / %s"u32" / %s"u64" / %s"u128" )
; Signed literals are integers followed by signed types.
signed-literal = integer ( %s"i8" / %s"i16" / %s"i32" / %s"i64" / %s"i128" )
; Field literals are integers followed by the type of field elements.
field-literal = integer %s"field"
; There are two kinds of group literals.
; One is a single integer followed by the type of group elements,
; which denotes the scalar product of the generator point by the integer.
; The other kind is not a token because it allows some whitespace inside;
; therefore, it is defined in the syntactic grammar.
2021-03-23 19:11:42 +03:00
product-group-literal = integer %s"group"
; Boolean literals are the usual two.
boolean-literal = %s"true" / %s"false"
; An address literal starts with `aleo1`
; and continues with exactly 58 lowercase letters and (decimal) digits.
; Thus an address always consists of 63 characters.
2021-03-23 19:11:42 +03:00
address-literal = %s"aleo1" 58( lowercase-letter / decimal-digit )
2021-03-23 19:11:42 +03:00
; A character literal consists of an element surrounded by single quotes.
; The element is any character other than single quote or backslash,
; or an escape, which starts with backslash.
; There are simple escapes with a single character,
; ASCII escapes with an octal and a hexadecimal digit
; (which together denote a number between 0 and 127),
; and Unicode escapes with one to six hexadecimal digits
; (which must not exceed 10FFFF).
character-literal = single-quote character-literal-element single-quote
character-literal-element = not-single-quote-or-backslash
/ simple-character-escape
/ ascii-character-escape
/ unicode-character-escape
single-quote-escape = "\" single-quote ; \'
double-quote-escape = "\" double-quote ; \"
backslash-escape = "\\"
line-feed-escape = %s"\n"
carriage-return-escape = %s"\r"
horizontal-tab-escape = %s"\t"
null-character-escape = "\0"
simple-character-escape = single-quote-escape
/ double-quote-escape
/ backslash-escape
/ line-feed-escape
/ carriage-return-escape
/ horizontal-tab-escape
/ null-character-escape
ascii-character-escape = %s"\x" octal-digit hexadecimal-digit
unicode-character-escape = %s"\u{" 1*6hexadecimal-digit "}"
; A string literal consists of one or more elements surrounded by double quotes.
; Each element is any character other than double quote or backslash,
; or an escape among the same ones used for elements of character literals.
2021-07-21 00:52:35 +03:00
string-literal = double-quote *string-literal-element double-quote
string-literal-element = not-double-quote-or-backslash
/ simple-character-escape
/ ascii-character-escape
/ unicode-character-escape
; The ones above are all the atomic literals
; (in the sense that they are tokens, without whitespace allowed in them),
; as defined by the following rule.
2021-03-23 19:11:42 +03:00
atomic-literal = untyped-literal
/ unsigned-literal
/ signed-literal
/ field-literal
/ product-group-literal
/ boolean-literal
/ address-literal
/ character-literal
/ string-literal
2021-03-23 19:11:42 +03:00
; After defining the (mostly) alphanumeric tokens above,
; it remains to define tokens for non-alphanumeric symbols such as `+` and `(`.
2021-03-23 19:11:42 +03:00
; Different programming languages used different terminologies for these,
; e.g. operators, separators, punctuators, etc.
; Here we use `symbol`, for all of them.
; We also include a token consisting of
; a closing parenthesis `)` immediately followed by `group`:
; as defined in the syntactic grammar,
; this is the final part of an affine group literal;
; even though it includes letters,
; it seems appropriate to still consider it a symbol,
; particularly since it starts with a proper symbol.
2022-02-02 21:07:27 +03:00
symbol = "!" / "&" / "&&" / "||"
2021-03-23 19:11:42 +03:00
/ "==" / "!="
/ "<" / "<=" / ">" / ">="
/ "+" / "-" / "*" / "/" / "**"
/ "=" / "+=" / "-=" / "*=" / "/=" / "**="
/ "(" / ")"
/ "[" / "]"
/ "{" / "}"
/ "," / "." / ".." / "..." / ";" / ":" / "::" / "?"
2021-03-24 18:40:26 +03:00
/ "->" / "_"
/ %s")group"
2021-03-23 19:11:42 +03:00
; Everything defined above, other than comments and whitespace,
; is a token, as defined by the following rule.
token = keyword
/ identifier
/ atomic-literal
2021-03-23 19:11:42 +03:00
/ annotation-name
/ symbol
; Tokens, comments, and whitespace are lexemes, i.e. lexical units.
lexeme = token / comment / whitespace
2021-03-23 19:11:42 +03:00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Syntactic Grammar
; -----------------
; The processing defined by the lexical grammar above
; turns the initial sequence of characters
; into a sequence of tokens, comments, and whitespaces.
; The purpose of comments and whitespaces, from a syntactic point of view,
; is just to separate tokens:
; they are discarded, leaving a sequence of tokens.
; The syntactic grammar describes how to turn
; a sequence of tokens into concrete syntax trees.
; There are unsigned and signed integer types, for five sizes.
unsigned-type = %s"u8" / %s"u16" / %s"u32" / %s"u64" / %s"u128"
signed-type = %s"i8" / %s"i16" / %s"i32" / %s"i64" / %s"i128"
integer-type = unsigned-type / signed-type
; The integer types, along with the field and group types,
; for the arithmetic types, i.e. the ones that support arithmetic operations.
field-type = %s"field"
group-type = %s"group"
arithmetic-type = integer-type / field-type / group-type
; The arithmetic types, along with the boolean, address, and character types,
; form the scalar types, i.e. the ones whose values do not contain (sub-)values.
2021-03-23 19:11:42 +03:00
boolean-type = %s"bool"
address-type = %s"address"
character-type = %s"char"
scalar-type = boolean-type / arithmetic-type / address-type / character-type
2021-03-23 19:11:42 +03:00
; A tuple type consists of zero, two, or more component types.
tuple-type = "(" [ type 1*( "," type ) ] ")"
; An array type consists of an element type
; and an indication of dimensions.
; There is either a single dimension,
; or a tuple of one or more dimensions.
; Each dimension is either a natural or is unspecified.
2021-03-23 19:11:42 +03:00
array-type = "[" type ";" array-type-dimensions "]"
2021-03-23 19:11:42 +03:00
array-type-dimension = natural / "_"
array-type-dimensions = array-type-dimension
/ "(" array-type-dimension
2022-02-02 21:07:27 +03:00
*( "," array-type-dimension ) [","] ")"
2021-03-23 19:11:42 +03:00
; The keyword `Self` denotes the enclosing circuit type.
; It is only allowed inside a circuit type declaration.
self-type = %s"Self"
; Circuit types are denoted by identifiers and by `Self`.
; Identifiers may also be type aliases;
; syntactically (i.e. without a semantic analysis),
; they cannot be distinguished from circuit types.
; Scalar types, tuple types, array types,
; identifiers (which may be circuit types or type aliases),
; and the `Self` type
; form all the types.
type = scalar-type / tuple-type / array-type / identifier / self-type
; It is convenient to introduce a rule for types that are
; either identifiers or `Self`, as this is used in other rules.
2021-03-23 19:11:42 +03:00
identifier-or-self-type = identifier / self-type
2021-03-23 19:11:42 +03:00
2022-02-02 21:07:27 +03:00
; A named type is an identifier, the `Self` type, or a scalar type.
; These are types that are named by identifiers or keywords.
named-type = identifier / self-type / scalar-type
; The lexical grammar given earlier defines product group literals.
; The other kind of group literal is a pair of integer coordinates,
; which are reduced modulo the prime to identify a point,
; which must be on the elliptic curve.
; It is also allowed to omit one coordinate (not both),
; with an indication of how to fill in the missing coordinate
; (i.e. sign high, sign low, or inferred).
; This is an affine group literal,
; because it consists of affine point coordinates.
group-coordinate = integer / "+" / "-" / "_"
affine-group-literal = "(" group-coordinate "," group-coordinate %s")group"
; A literal is either an atomic one or an affine group literal.
literal = atomic-literal / affine-group-literal
; The following rule is not directly referenced in the rules for expressions
; (which reference `literal` instead),
; but it is useful to establish terminology:
; a group literal is either a product group literal or an affine group literal.
group-literal = product-group-literal / affine-group-literal
2021-03-23 19:11:42 +03:00
; As often done in grammatical language syntax specifications,
; we define rules for different kinds of expressions,
; which also defines the relative precedence
; of operators and other expression constructs,
; and the (left or right) associativity of binary operators.
; The primary expressions are self-contained in a way,
2021-04-27 06:30:15 +03:00
; i.e. they have clear delimitations:
; Some consist of single tokens,
; while others have explicit endings.
2021-03-23 19:11:42 +03:00
; Primary expressions also include parenthesized expressions,
; i.e. any expression may be turned into a primary one
; by putting parentheses around it.
primary-expression = identifier
/ %s"self"
/ %s"input"
/ literal
/ "(" expression ")"
/ tuple-expression
/ array-expression
/ circuit-expression
; Tuple expressions construct tuples.
; Each consists of zero, two, or more component expressions.
2021-03-23 19:11:42 +03:00
tuple-construction = "(" [ expression 1*( "," expression ) ] ")"
2021-03-25 23:24:01 +03:00
tuple-expression = tuple-construction
2021-03-23 19:11:42 +03:00
; Array expressions construct arrays.
; There are two kinds:
2021-03-23 19:11:42 +03:00
; one lists the element expressions (at least one),
; including spreads (via `...`) which are arrays being spliced in;
2021-03-23 19:11:42 +03:00
; the other repeats (the value of) a single expression
; across one or more dimensions.
array-inline-construction = "["
array-inline-element
*( "," array-inline-element )
"]"
array-inline-element = expression / "..." expression
array-repeat-construction = "[" expression ";" array-expression-dimensions "]"
array-expression-dimensions = natural
/ "(" natural *( "," natural ) ")"
2021-03-23 19:11:42 +03:00
array-construction = array-inline-construction / array-repeat-construction
array-expression = array-construction
2021-03-23 19:11:42 +03:00
; Circuit expressions construct circuit values.
; Each lists values for all the member variables (in any order);
; there may be zero or more member variables.
2021-03-23 20:45:57 +03:00
; A single identifier abbreviates
; a pair consisting of the same identifier separated by colon;
2021-03-23 20:45:57 +03:00
; note that, in the expansion, the left one denotes a member name,
; while the right one denotes an expression (a variable),
; so they are syntactically identical but semantically different.
2021-03-23 19:11:42 +03:00
circuit-construction = identifier-or-self-type "{"
[ circuit-inline-element
*( "," circuit-inline-element ) [ "," ] ]
2021-03-23 19:11:42 +03:00
"}"
2021-03-23 20:45:57 +03:00
circuit-inline-element = identifier ":" expression / identifier
2021-03-23 19:11:42 +03:00
circuit-expression = circuit-construction
2021-03-23 19:11:42 +03:00
; After primary expressions, postfix expressions have highest precedence.
; They apply to primary expressions, and recursively to postfix expressions.
; There are postfix expressions to access parts of aggregate values.
; A tuple access selects a component by index (zero-based).
; There are two kinds of array accesses:
; one selects a single element by index (zero-based);
; the other selects a range via two indices,
; the first inclusive and the second exclusive --
; both are optional,
; the first defaulting to 0 and the second to the array length.
; A circuit access selects a member variable by name.
; Function calls are also postfix expressions.
2021-03-23 19:11:42 +03:00
; There are three kinds of function calls:
; top-level function calls,
; instance (i.e. non-static) member function calls, and
; static member function calls.
; What changes is the start, but they all end in an argument list.
2021-03-23 19:11:42 +03:00
2022-02-02 21:07:27 +03:00
; Accesses to static constants are also postfix expressions.
; They consist of a named type followed by the constant name,
; as static constants are associated to named types.
2021-03-23 19:11:42 +03:00
function-arguments = "(" [ expression *( "," expression ) ] ")"
postfix-expression = primary-expression
/ postfix-expression "." natural
/ postfix-expression "." identifier
/ identifier function-arguments
/ postfix-expression "." identifier function-arguments
2022-02-02 21:07:27 +03:00
/ named-type "::" identifier function-arguments
/ named-type "::" identifier
/ postfix-expression "[" expression "]"
/ postfix-expression "[" [expression] ".." [expression] "]"
; Unary operators have the highest operator precedence.
; They apply to postfix expressions,
2021-03-23 19:11:42 +03:00
; and recursively to unary expressions.
unary-expression = postfix-expression
2021-03-23 19:11:42 +03:00
/ "!" unary-expression
/ "-" unary-expression
; Next in the operator precedence is exponentiation,
; following mathematical practice.
; The current rule below makes exponentiation right-associative,
; i.e. `a ** b ** c` must be parsed as `a ** (b ** c)`.
2021-03-23 19:11:42 +03:00
exponential-expression = unary-expression
/ unary-expression "**" exponential-expression
2021-03-23 19:11:42 +03:00
; Next in precedence come multiplication and division, both left-associative.
multiplicative-expression = exponential-expression
/ multiplicative-expression "*" exponential-expression
/ multiplicative-expression "/" exponential-expression
; Then there are addition and subtraction, both left-assocative.
additive-expression = multiplicative-expression
/ additive-expression "+" multiplicative-expression
/ additive-expression "-" multiplicative-expression
; Next in the precedence order are ordering relations.
; These are not associative, because they return boolean values.
ordering-expression = additive-expression
/ additive-expression "<" additive-expression
/ additive-expression ">" additive-expression
/ additive-expression "<=" additive-expression
/ additive-expression ">=" additive-expression
; Equalities return booleans but may also operate on booleans;
; the rule below makes them left-associative.
2021-03-23 19:11:42 +03:00
equality-expression = ordering-expression
2021-07-01 23:40:29 +03:00
/ ordering-expression "==" ordering-expression
/ ordering-expression "!=" ordering-expression
2021-03-23 19:11:42 +03:00
; Next come conjunctive expressions, left-associative.
conjunctive-expression = equality-expression
/ conjunctive-expression "&&" equality-expression
; Next come disjunctive expressions, left-associative.
disjunctive-expression = conjunctive-expression
/ disjunctive-expression "||" conjunctive-expression
2021-03-24 20:38:45 +03:00
; Finally we have conditional expressions.
2021-03-23 19:11:42 +03:00
conditional-expression = disjunctive-expression
2022-02-02 22:23:05 +03:00
/ disjunctive-expression "?" expression ":" expression
2021-03-23 19:11:42 +03:00
; Those above are all the expressions.
2021-03-23 19:11:42 +03:00
; Recall that conditional expressions
; may be disjunctive expressions,
; which may be conjunctive expressions,
; and so on all the way to primary expressions.
expression = conditional-expression
; There are various kinds of statements, including blocks.
; Blocks are possibly empty sequences of statements surrounded by curly braces.
2021-03-23 19:11:42 +03:00
statement = expression-statement
/ return-statement
2021-04-28 19:41:42 +03:00
/ variable-declaration
/ constant-declaration
2021-03-23 19:11:42 +03:00
/ conditional-statement
/ loop-statement
/ assignment-statement
/ console-statement
/ block
block = "{" *statement "}"
; An expression (that must return the empty tuple, as semantically required)
2021-03-23 19:11:42 +03:00
; can be turned into a statement by appending a semicolon.
expression-statement = expression ";"
; A return statement always takes an expression, and ends with a semicolon.
2021-03-23 19:11:42 +03:00
return-statement = %s"return" expression ";"
2021-03-23 19:11:42 +03:00
; There are variable declarations and constant declarations,
2021-03-23 19:11:42 +03:00
; which only differ in the starting keyword.
; These declarations are also statements.
; The names of the variables or constants are
; either a single one or a tuple of two or more;
2021-03-23 19:11:42 +03:00
; in all cases, there is just one optional type
; and just one initializing expression.
variable-declaration = %s"let" identifier-or-identifiers [ ":" type ]
"=" expression ";"
2021-04-28 19:41:42 +03:00
constant-declaration = %s"const" identifier-or-identifiers [ ":" type ]
"=" expression ";"
2021-03-23 19:11:42 +03:00
identifier-or-identifiers = identifier
/ "(" identifier 1*( "," identifier ) ")"
; A conditional statement always starts with a condition and a block
; (which together form a branch).
; It may stop there, or it may continue with an alternative block,
; or possibly with another conditional statement, forming a chain.
; Note that blocks are required in all branches, not merely statements.
2021-03-23 19:11:42 +03:00
branch = %s"if" expression block
conditional-statement = branch
/ branch %s"else" block
/ branch %s"else" conditional-statement
; A loop statement implicitly defines a loop variable
; that goes from a starting value (inclusive) to an ending value (exclusive).
; The body is a block.
loop-statement = %s"for" identifier %s"in" expression ".." [ "=" ] expression
block
2021-03-23 19:11:42 +03:00
; An assignment statement is straightforward.
; Based on the operator, the assignment may be simple (i.e. `=`)
2021-03-23 19:11:42 +03:00
; or compound (i.e. combining assignment with an arithmetic operation).
assignment-operator = "=" / "+=" / "-=" / "*=" / "/=" / "**="
assignment-statement = expression assignment-operator expression ";"
; Console statements start with the `console` keyword,
2021-03-23 19:11:42 +03:00
; followed by a console function call.
; The call may be an assertion or a print command.
; The former takes an expression (which must be boolean) as argument.
; The latter takes either no argument,
; or a format string followed by expressions,
; whose number must match the number of containers `{}` in the format string.
2021-03-23 19:11:42 +03:00
; Note that the console function names are identifiers, not keywords.
; There are three kinds of print commands.
2021-03-23 19:11:42 +03:00
console-statement = %s"console" "." console-call ";"
2021-03-23 19:11:42 +03:00
console-call = assert-call
/ print-call
assert-call = %s"assert" "(" expression ")"
print-function = %s"debug" / %s"error" / %s"log"
2021-07-01 08:20:06 +03:00
print-arguments = "(" string-literal *( "," expression ) ")"
2021-03-23 19:11:42 +03:00
print-call = print-function print-arguments
; An annotation consists of an annotation name (which starts with `@`)
; with optional annotation arguments, which are identifiers.
2021-03-23 19:11:42 +03:00
annotation = annotation-name
2022-02-02 21:07:27 +03:00
[ "(" *( identifier "," ) [ identifier ] ")" ]
2021-03-23 19:11:42 +03:00
; A function declaration defines a function.
; The output type is optional, defaulting to the empty tuple type.
2021-03-23 19:11:42 +03:00
; In general, a function input consists of an identifier and a type,
; with an optional 'const' modifier.
; Additionally, functions inside circuits
2022-02-02 21:07:27 +03:00
; may start with a `&self` or `const self` or `self` parameter.
2021-03-23 19:11:42 +03:00
2022-02-02 21:07:27 +03:00
function-declaration = *annotation [ %s"const" ] %s"function" identifier
2021-03-23 19:11:42 +03:00
"(" [ function-parameters ] ")" [ "->" type ]
block
function-parameters = self-parameter
/ self-parameter "," function-inputs
/ function-inputs
2021-03-23 19:11:42 +03:00
2022-02-02 21:07:27 +03:00
self-parameter = [ %s"&" / %s"const" ] %s"self"
2021-03-23 19:11:42 +03:00
function-inputs = function-input *( "," function-input )
function-input = [ %s"const" ] identifier ":" type
2022-02-02 21:07:27 +03:00
; A circuit member constant declaration consists of
; the `static` and `const` keywords followed by
; an identifier and a type, then an initializer
; with a literal terminated by semicolon.
member-constant-declaration = %s"static" %s"const" identifier ":" type
"=" literal ";"
; A circuit member variable declaration consists of
; an identifier and a type, terminated by semicolon.
; For backward compatibility,
; member variable declarations may be alternatively followed by commas,
; and the last one may not be followed by anything:
; these are deprecated, and will be eventually removed,
; leaving only mandatory semicolons.
; Note that there is no rule for a single `member-variable-declaration`,
; but instead one for a sequence of them;
; see the rule `circuit-declaration`.
member-variable-declarations = *( identifier ":" type ( "," / ";" ) )
identifier ":" type ( [ "," ] / ";" )
2021-03-23 19:11:42 +03:00
; A circuit member function declaration consists of a function declaration.
2021-03-23 19:11:42 +03:00
member-function-declaration = function-declaration
; A circuit declaration defines a circuit type,
2022-02-02 21:07:27 +03:00
; as consisting of member constants, variables and functions.
; A circuit member constant declaration
; as consisting of member constants, member variables, and member functions.
; To more simply accommodate the backward compatibility
; described for the rule `member-variable-declarations`,
; all the member variables must precede all the member functions;
; this may be relaxed after the backward compatibility is removed,
; allowing member variables and member functions to be intermixed.
2021-03-23 19:11:42 +03:00
2021-07-01 08:20:06 +03:00
circuit-declaration = %s"circuit" identifier
2022-02-02 21:07:27 +03:00
"{" *member-constant-declaration
[ member-variable-declarations ]
*member-function-declaration "}"
2021-03-23 19:11:42 +03:00
; An import declaration consists of the `import` keyword
2021-03-23 19:11:42 +03:00
; followed by a package path, which may be one of the following:
; a single wildcard;
; an identifier, optionally followed by a local renamer;
; an identifier followed by a path, recursively;
2021-03-23 19:11:42 +03:00
; or a parenthesized list of package paths,
; which are "fan out" of the initial path.
; Note that we allow the last element of the parenthesized list
; to be followed by a comma, for convenience.
; The package path in the import declaration must start with an identifier
; (e.g. it cannot be a `*`):
; the rule for import declaration expresses this requirement
; by using an explicit identifier before the package path.
2021-03-23 19:11:42 +03:00
import-declaration = %s"import" identifier "." package-path ";"
2021-03-23 19:11:42 +03:00
package-path = "*"
/ identifier [ %s"as" identifier ]
/ identifier "." package-path
2021-03-23 19:11:42 +03:00
/ "(" package-path *( "," package-path ) [","] ")"
; A type alias declaration defines an identifier to stand for a type.
2021-08-19 16:04:44 +03:00
type-alias-declaration = %s"type" identifier "=" type ";"
2021-03-23 19:11:42 +03:00
; Finally, we define a file as a sequence of zero or more declarations.
; We allow constant declarations at the top level, for global constants.
; Currently variable declarations are disallowed at the top level.
2021-03-23 19:11:42 +03:00
declaration = import-declaration
/ function-declaration
/ circuit-declaration
2021-04-28 19:41:42 +03:00
/ constant-declaration
2021-08-19 16:04:44 +03:00
/ type-alias-declaration
2021-03-23 19:11:42 +03:00
file = *declaration
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Format Note
; -----------
; The ABNF standard requires grammars
; to consist of lines terminated by `<CR><LF>`
; (i.e. carriage return followed by line feed, DOS/Windows-style),
; as explained in the background on ABNF earlier in this file.
; This file's lines are therefore terminated by `<CR><LF>`.
; To avoid losing this requirement across systems,
; this file is marked as `text eol=crlf` in `.gitattributes`:
; this means that the file is textual, enabling visual diffs,
; but its lines will always be terminated by `<CR><LF>` on any system.
; Note that this `<CR><LF>` requirement only applies
; to the grammar files themselves.
; It does not apply to the lines of the languages described by the grammar.
; ABNF grammars may describe any kind of languages,
; with any kind of line terminators,
; or even without line terminators at all (e.g. for "binary" languages).