Merge pull request #890 from AleoHQ/feature/abnf-improve-doc

Feature/abnf improve doc
Collin Chin 2021-04-22 11:22:40 -07:00 committed by GitHub
commit f972794cbf
2 changed files with 217 additions and 224 deletions

@ -14,6 +14,7 @@ GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with the Leo library. If not, see <https://www.gnu.org/licenses/>.
--------
@ -56,59 +57,59 @@ without going beyond context-free grammars.
Instead of BNF's angle-bracket notation for nonterminals,
ABNF uses case-insensitive names consisting of letters, digits, and dashes,
e.g. HTTP-message and IPv6address.
e.g. `HTTP-message` and `IPv6address`.
ABNF includes an angle-bracket notation for prose descriptions,
e.g. <host, see [RFC3986], Section 3.2.2>,
e.g. `<host, see [RFC3986], Section 3.2.2>`,
usable as a last resort in the definiens of a nonterminal.
While BNF allows arbitrary terminals,
ABNF uses only natural numbers as terminals,
and denotes them via:
(i) binary, decimal, or hexadecimal sequences,
e.g. %b1.11.1010, %d1.3.10, and %x1.3.A
all denote the sequence of terminals '1 3 10';
e.g. `%b1.11.1010`, `%d1.3.10`, and `%x1.3.A`
all denote the sequence of terminals [1, 3, 10];
(ii) binary, decimal, or hexadecimal ranges,
e.g. %x30-39 denotes any singleton sequence of terminals
'n' with 48 <= n <= 57 (an ASCII digit);
e.g. `%x30-39` denotes any singleton sequence of terminals
[_n_] with 48 <= _n_ <= 57 (an ASCII digit);
(iii) case-sensitive ASCII strings,
e.g. %s"Ab" denotes the sequence of terminals '65 98';
e.g. `%s"Ab"` denotes the sequence of terminals [65, 98];
and (iv) case-insensitive ASCII strings,
e.g. %i"ab", or just "ab", denotes
e.g. `%i"ab"`, or just `"ab"`, denotes
any sequence of terminals among
'65 66',
'65 98',
'97 66', and
'97 98'.
[65, 66],
[65, 98],
[97, 66], and
[97, 98].
ABNF terminals in suitable sets represent ASCII or Unicode characters.
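
The numeric notations described above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `decode_num_val` (not part of the Leo repository) that handles only dot-separated sequences, not ranges or strings. The hexadecimal form is written `%x1.3.A`, as RFC 5234 places the first digit directly after the base letter.

```python
from typing import List

# Map from the ABNF base letter to the numeric base it selects.
BASES = {"b": 2, "d": 10, "x": 16}


def decode_num_val(text: str) -> List[int]:
    """Decode a dot-separated ABNF numeric value into its terminals.

    Hypothetical helper for illustration; ranges such as %x30-39 are
    deliberately not handled.
    """
    if len(text) < 3 or text[0] != "%" or text[1].lower() not in BASES:
        raise ValueError(f"not an ABNF numeric value: {text!r}")
    base = BASES[text[1].lower()]
    # Each dot-separated piece is one terminal, written in the chosen base.
    return [int(piece, base) for piece in text[2:].split(".")]


for notation in ("%b1.11.1010", "%d1.3.10", "%x1.3.A"):
    print(notation, "->", decode_num_val(notation))
```

All three notations decode to the same sequence of terminals, which is the point the text makes.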
ABNF allows repetition prefixes n*m,
where n and m are natural numbers in decimal notation;
ABNF allows repetition prefixes `n*m`,
where `n` and `m` are natural numbers in decimal notation;
if absent,
n defaults to 0, and
m defaults to infinity.
`n` defaults to 0, and
`m` defaults to infinity.
For example,
1*4HEXDIG denotes one to four HEXDIGs,
*3DIGIT denotes up to three DIGITs, and
1*OCTET denotes one or more OCTETs.
A single n prefix
abbreviates n*n,
e.g. 3DIGIT denotes three DIGITs.
`1*4HEXDIG` denotes one to four `HEXDIG`s,
`*3DIGIT` denotes up to three `DIGIT`s, and
`1*OCTET` denotes one or more `OCTET`s.
A single `n` prefix
abbreviates `n*n`,
e.g. `3DIGIT` denotes three `DIGIT`s.
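
As a rough illustration of the defaults just described, the sketch below splits a repetition prefix into its bounds. The function name `repetition_bounds` is an assumption made for this example, and `None` is used here to stand for infinity.

```python
import re
from typing import Optional, Tuple


def repetition_bounds(prefix: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for an ABNF repetition prefix.

    Hypothetical helper for illustration; a maximum of None means
    "no upper bound".
    """
    match = re.fullmatch(r"(\d*)\*(\d*)|(\d+)", prefix)
    if match is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    if match.group(3) is not None:
        # A single n abbreviates n*n.
        exact = int(match.group(3))
        return exact, exact
    minimum = int(match.group(1)) if match.group(1) else 0
    maximum = int(match.group(2)) if match.group(2) else None
    return minimum, maximum


for prefix in ("1*4", "*3", "1*", "3"):
    print(prefix, "->", repetition_bounds(prefix))
```

The four prefixes correspond to the `1*4HEXDIG`, `*3DIGIT`, `1*OCTET`, and `3DIGIT` examples.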
Instead of BNF's |, ABNF uses / to separate alternatives.
Instead of BNF's `|`, ABNF uses `/` to separate alternatives.
Repetition prefixes have precedence over juxtapositions,
which have precedence over /.
which have precedence over `/`.
Round brackets group things and override the aforementioned precedence rules,
e.g. *(WSP / CRLF WSP) denotes sequences of terminals
e.g. `*(WSP / CRLF WSP)` denotes sequences of terminals
obtained by repeating, zero or more times,
either (i) a WSP or (ii) a CRLF followed by a WSP.
either (i) a `WSP` or (ii) a `CRLF` followed by a `WSP`.
Square brackets also group things but make them optional,
e.g. [":" port] is equivalent to 0*1(":" port).
e.g. `[":" port]` is equivalent to `0*1(":" port)`.
Instead of BNF's ::=, ABNF uses = to define nonterminals,
and =/ to incrementally add alternatives
Instead of BNF's `::=`, ABNF uses `=` to define nonterminals,
and `=/` to incrementally add alternatives
to previously defined nonterminals.
For example, the rule BIT = "0" / "1"
is equivalent to BIT = "0" followed by BIT =/ "1".
For example, the rule `BIT = "0" / "1"`
is equivalent to `BIT = "0"` followed by `BIT =/ "1"`.
The syntax of ABNF itself is formally specified in ABNF
(in Section 4 of the aforementioned RFC 5234,
@ -130,7 +131,7 @@ Structure
This ABNF grammar consists of two (sub-)grammars:
(i) a lexical grammar that describes how
sequences of characters are parsed into tokens, and
(ii) a syntactic grammar that described how
(ii) a syntactic grammar that describes how
tokens are parsed into expressions, statements, etc.
The adjectives 'lexical' and 'syntactic' are
the same ones used in the Java language reference,
@ -197,8 +198,8 @@ additive-expression =
These rules tell us
that the additive operators '+' and '-' have lower precedence
than the multiplicative operators '*' and '/',
that the additive operators `+` and `-` have lower precedence
than the multiplicative operators `*` and `/`,
and that both the additive and multiplicative operators associate to the left.
This may be best understood via the examples given below.
@ -294,7 +295,7 @@ Naming Convention
This ABNF grammar uses nonterminal names
that consist of complete English words, separated by dashes,
and that describe the construct the way it is in English.
For instance, we use the name 'conditional-statement'
For instance, we use the name `conditional-statement`
to describe conditional statements.
At the same time, this grammar establishes
@ -353,8 +354,8 @@ Lexical Grammar
A Leo file is a finite sequence of Unicode characters,
represented as Unicode code points,
which are numbers in the range form 0 to 10FFFFh.
These are captured by the ABNF rule 'character' below.
which are numbers in the range from 0 to 10FFFFh.
These are captured by the ABNF rule `character` below.
The lexical grammar defines how, at least conceptually,
the sequence of characters is turned into
@ -362,20 +363,20 @@ a sequence of tokens, comments, and whitespaces:
these entities are all defined by the grammar rules below.
As stated, the lexical grammar alone is ambiguous.
For example, the sequence of characters '**' (i.e. two stars)
could be equally parsed as two '*' symbol tokens or one '**' symbol token
(see rule for 'symbol' below).
As another example, the sequence of characters '<CR><LF>'
For example, the sequence of characters `**` (i.e. two stars)
could be equally parsed as two `*` symbol tokens or one `**` symbol token
(see rule for `symbol` below).
As another example, the sequence of characters `<CR><LF>`
(i.e. carriage return followed by line feed)
could be equally parsed as two line terminators or one
(see rule for 'newline').
(see rule for `newline`).
Thus, as often done in language syntax definitions,
the lexical grammar is disambiguated by
the extra-grammatical requirement that
the longest possible sequence of characters is always parsed.
This way, '**' must be parsed as one '**' symbol token,
and '<CR><LF>' must be parsed as one line terminator.
This way, `**` must be parsed as one `**` symbol token,
and `<CR><LF>` must be parsed as one line terminator.
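
The longest-match requirement can be sketched as follows. This is an illustration only: the token tables `SYMBOLS` and `NEWLINES` are assumptions containing a few entries, not the full `symbol` and `newline` rules, and `tokenize` is a hypothetical name.

```python
from typing import List, Optional

# Illustrative subsets only; the grammar's rules have more alternatives.
SYMBOLS = ["**", "*", "+"]
NEWLINES = ["\r\n", "\r", "\n"]


def longest_match(text: str, start: int, candidates: List[str]) -> Optional[str]:
    """Return the longest candidate that occurs in text at position start."""
    best = None
    for candidate in candidates:
        if text.startswith(candidate, start):
            if best is None or len(candidate) > len(best):
                best = candidate
    return best


def tokenize(text: str) -> List[str]:
    """Split text into lexemes, always taking the longest possible one."""
    lexemes = []
    position = 0
    while position < len(text):
        lexeme = longest_match(text, position, SYMBOLS + NEWLINES)
        if lexeme is None:
            raise ValueError(f"unexpected character at offset {position}")
        lexemes.append(lexeme)
        position += len(lexeme)
    return lexemes


print(tokenize("**"))    # ['**']
print(tokenize("\r\n"))  # ['\r\n']
```

Without the length comparison in `longest_match`, both inputs could equally be split into two lexemes, which is the ambiguity the extra-grammatical requirement removes.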
As mentioned above, a character is any Unicode code point.
This grammar does not say how those are encoded in files (e.g. UTF-8):
@ -392,27 +393,27 @@ We give names to certain ASCII characters.
<a name="horizontal-tab"></a>
```abnf
horizontal-tab = %x9
horizontal-tab = %x9 ; <HT>
```
<a name="line-feed"></a>
```abnf
line-feed = %xA
line-feed = %xA ; <LF>
```
<a name="carriage-return"></a>
```abnf
carriage-return = %xD
carriage-return = %xD ; <CR>
```
<a name="space"></a>
```abnf
space = %x20
space = %x20 ; <SP>
```
<a name="double-quote"></a>
```abnf
double-quote = %x22
double-quote = %x22 ; "
```
We give names to complements of certain ASCII characters.
@ -431,12 +432,25 @@ not-star = %x0-29 / %x2B-10FFFF ; anything but *
<a name="not-line-feed-or-carriage-return"></a>
```abnf
not-line-feed-or-carriage-return = %x0-9 / %xB-C / %xE-10FFFF
; anything but LF or CR
; anything but <LF> or <CR>
```
<a name="not-double-quote-or-open-brace"></a>
```abnf
not-double-quote-or-open-brace = %x0-21 / %x23-7A / %x7C-10FFFF
; anything but " or {
```
<a name="not-double-quote-or-close-brace"></a>
```abnf
not-double-quote-or-close-brace = %x0-21 / %x23-7C / %x7E-10FFFF
; anything but " or }
```
<a name="not-star-or-slash"></a>
```abnf
not-star-or-slash = %x0-29 / %x2B-2E / %x30-10FFFF ; anything but * or /
not-star-or-slash = %x0-29 / %x2B-2E / %x30-10FFFF
; anything but * or /
```
Lines in Leo may be terminated via
@ -452,7 +466,7 @@ described above.
newline = line-feed / carriage-return / carriage-return line-feed
```
Go to: _[carriage-return](#user-content-carriage-return), [line-feed](#user-content-line-feed)_;
Go to: _[line-feed](#user-content-line-feed), [carriage-return](#user-content-carriage-return)_;
Line terminators form whitespace, along with spaces and horizontal tabs.
@ -462,16 +476,16 @@ Line terminators form whitespace, along with spaces and horizontal tabs.
whitespace = space / horizontal-tab / newline
```
Go to: _[newline](#user-content-newline), [horizontal-tab](#user-content-horizontal-tab), [space](#user-content-space)_;
Go to: _[horizontal-tab](#user-content-horizontal-tab), [space](#user-content-space), [newline](#user-content-newline)_;
There are two kinds of comments in Leo, as in other languages.
One is block comments of the form '/* ... */',
and the other is end-of-line comments of the form '// ...'.
The first kind start at '/*' and end at the first '*/',
One is block comments of the form `/* ... */`,
and the other is end-of-line comments of the form `// ...`.
The first kind start at `/*` and end at the first `*/`,
possibly spanning multiple (partial) lines;
these do not nest.
The second kind start at '//' and extend till the end of the line.
The second kind start at `//` and extend till the end of the line.
The rules about comments given below are similar to
the ones used in the Java language reference.
@ -480,7 +494,7 @@ the ones used in the Java language reference.
comment = block-comment / end-of-line-comment
```
Go to: _[block-comment](#user-content-block-comment), [end-of-line-comment](#user-content-end-of-line-comment)_;
Go to: _[end-of-line-comment](#user-content-end-of-line-comment), [block-comment](#user-content-block-comment)_;
<a name="block-comment"></a>
@ -497,7 +511,7 @@ rest-of-block-comment = "*" rest-of-block-comment-after-star
/ not-star rest-of-block-comment
```
Go to: _[rest-of-block-comment-after-star](#user-content-rest-of-block-comment-after-star), [not-star](#user-content-not-star), [rest-of-block-comment](#user-content-rest-of-block-comment)_;
Go to: _[not-star](#user-content-not-star), [rest-of-block-comment-after-star](#user-content-rest-of-block-comment-after-star), [rest-of-block-comment](#user-content-rest-of-block-comment)_;
<a name="rest-of-block-comment-after-star"></a>
@ -608,7 +622,7 @@ package-name = 1*( lowercase-letter / digit )
A format string is a sequence of characters, other than double quote,
surrounded by double quotes.
Within a format string, sub-strings '{}' are distinguished as containers
Within a format string, sub-strings `{}` are distinguished as containers
(these are the ones that may be matched with values
whose textual representation replaces the containers
in the printed string).
@ -618,16 +632,6 @@ in the printed string).
format-string-container = "{}"
```
<a name="not-double-quote-or-open-brace"></a>
```abnf
not-double-quote-or-open-brace = %x0-22 / %x24-7A / %x7C-10FFFF
```
<a name="not-double-quote-or-close-brace"></a>
```abnf
not-double-quote-or-close-brace = %x0-22 / %x24-7C / %x7E-10FFFF
```
<a name="format-string-element"></a>
```abnf
format-string-element = not-double-quote-or-open-brace
@ -635,8 +639,7 @@ format-string-element = not-double-quote-or-open-brace
/ format-string-container
```
Go to: _[format-string-container](#user-content-format-string-container), [not-double-quote-or-open-brace](#user-content-not-double-quote-or-open-brace), [not-double-quote-or-close-brace](#user-content-not-double-quote-or-close-brace)_;
Go to: _[not-double-quote-or-close-brace](#user-content-not-double-quote-or-close-brace), [format-string-container](#user-content-format-string-container), [not-double-quote-or-open-brace](#user-content-not-double-quote-or-open-brace)_;
<a name="format-string"></a>
@ -647,7 +650,7 @@ format-string = double-quote *format-string-element double-quote
Go to: _[double-quote](#user-content-double-quote)_;
Annotations have names, which are identifiers immediately preceded by '@'.
Annotations have names, which are identifiers immediately preceded by `@`.
<a name="annotation-name"></a>
```abnf
@ -658,7 +661,7 @@ Go to: _[identifier](#user-content-identifier)_;
A natural (number) is a sequence of one or more digits.
We allow leading zeros, e.g. '007'.
We allow leading zeros, e.g. `007`.
<a name="natural"></a>
```abnf
@ -666,7 +669,7 @@ natural = 1*digit
```
An integer (number) is either a natural or its negation.
We allow leading zeros also in negative numbers, e.g. '-007'.
We allow leading zeros also in negative numbers, e.g. `-007`.
<a name="integer"></a>
```abnf
@ -737,7 +740,7 @@ Boolean literals are the usual two.
boolean-literal = %s"true" / %s"false"
```
An address literal starts with 'aleo1'
An address literal starts with `aleo1`
and continues with exactly 58 lowercase letters and digits.
Thus an address always consists of 63 characters.
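
The length arithmetic (5 characters for the prefix plus 58 more) can be checked with a small sketch. The pattern below tests only the lexical shape described here, not whether a string is a valid address in any other sense; the sample value is made up for illustration.

```python
import re

# The prefix followed by exactly 58 lowercase letters or digits.
ADDRESS_PATTERN = re.compile(r"aleo1[a-z0-9]{58}")


def is_address_literal(text: str) -> bool:
    """Return True if text has the lexical shape of an address literal."""
    return ADDRESS_PATTERN.fullmatch(text) is not None


sample = "aleo1" + "q" * 58  # made-up value, not a real address
print(len(sample), is_address_literal(sample))
```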
@ -761,17 +764,16 @@ atomic-literal = untyped-literal
/ address-literal
```
Go to: _[untyped-literal](#user-content-untyped-literal), [field-literal](#user-content-field-literal), [product-group-literal](#user-content-product-group-literal), [signed-literal](#user-content-signed-literal), [boolean-literal](#user-content-boolean-literal), [unsigned-literal](#user-content-unsigned-literal), [address-literal](#user-content-address-literal)_;
Go to: _[signed-literal](#user-content-signed-literal), [field-literal](#user-content-field-literal), [product-group-literal](#user-content-product-group-literal), [unsigned-literal](#user-content-unsigned-literal), [untyped-literal](#user-content-untyped-literal), [boolean-literal](#user-content-boolean-literal), [address-literal](#user-content-address-literal)_;
After defining the (mostly) alphanumeric tokens above,
it remains to define tokens for non-alphanumeric symbols such as "+" and "(".
it remains to define tokens for non-alphanumeric symbols such as `+` and `(`.
Different programming languages use different terminologies for these,
e.g. operators, separators, punctuators, etc.
Here we use 'symbol', for all of them.
Here we use `symbol`, for all of them.
We also include a token consisting of
a closing parenthesis immediately followed by 'group':
a closing parenthesis `)` immediately followed by `group`:
as defined in the syntactic grammar,
this is the final part of an affine group literal;
even though it includes letters,
@ -789,7 +791,7 @@ equality-operator = "=="
and defining 'symbol' in terms of those
and defining `symbol` in terms of those
@ -804,7 +806,7 @@ but it would help establish a terminology in the grammar,
namely the exact names of some of these tokens.
On the other hand, at least some of them are perhaps simple enough
that they could be just described in terms of their symbols,
e.g. 'double dot', 'question mark', etc.
e.g. double dot, question mark, etc.
<a name="symbol"></a>
```abnf
@ -835,8 +837,7 @@ token = keyword
/ symbol
```
Go to: _[identifier](#user-content-identifier), [atomic-literal](#user-content-atomic-literal), [annotation-name](#user-content-annotation-name), [symbol](#user-content-symbol), [keyword](#user-content-keyword), [package-name](#user-content-package-name), [format-string](#user-content-format-string)_;
Go to: _[package-name](#user-content-package-name), [format-string](#user-content-format-string), [symbol](#user-content-symbol), [identifier](#user-content-identifier), [atomic-literal](#user-content-atomic-literal), [annotation-name](#user-content-annotation-name), [keyword](#user-content-keyword)_;
@ -893,7 +894,7 @@ group-type = %s"group"
arithmetic-type = integer-type / field-type / group-type
```
Go to: _[group-type](#user-content-group-type), [integer-type](#user-content-integer-type), [field-type](#user-content-field-type)_;
Go to: _[group-type](#user-content-group-type), [field-type](#user-content-field-type), [integer-type](#user-content-integer-type)_;
The arithmetic types, along with the boolean and address types,
@ -914,10 +915,10 @@ address-type = %s"address"
scalar-type = boolean-type / arithmetic-type / address-type
```
Go to: _[arithmetic-type](#user-content-arithmetic-type), [address-type](#user-content-address-type), [boolean-type](#user-content-boolean-type)_;
Go to: _[address-type](#user-content-address-type), [arithmetic-type](#user-content-arithmetic-type), [boolean-type](#user-content-boolean-type)_;
Circuit types are denoted by identifiers and the keyword 'Self'.
Circuit types are denoted by identifiers and the keyword `Self`.
The latter is only allowed inside a circuit definition,
to denote the circuit being defined.
@ -931,7 +932,7 @@ self-type = %s"Self"
circuit-type = identifier / self-type
```
Go to: _[identifier](#user-content-identifier), [self-type](#user-content-self-type)_;
Go to: _[self-type](#user-content-self-type), [identifier](#user-content-identifier)_;
A tuple type consists of zero, two, or more component types.
@ -954,7 +955,7 @@ or a tuple of one or more dimensions.
array-type = "[" type ";" array-dimensions "]"
```
Go to: _[type](#user-content-type), [array-dimensions](#user-content-array-dimensions)_;
Go to: _[array-dimensions](#user-content-array-dimensions), [type](#user-content-type)_;
<a name="array-dimensions"></a>
@ -975,7 +976,7 @@ i.e. types whose values contain (sub-)values
aggregate-type = tuple-type / array-type / circuit-type
```
Go to: _[circuit-type](#user-content-circuit-type), [tuple-type](#user-content-tuple-type), [array-type](#user-content-array-type)_;
Go to: _[array-type](#user-content-array-type), [circuit-type](#user-content-circuit-type), [tuple-type](#user-content-tuple-type)_;
Scalar and aggregate types form all the types.
@ -1021,11 +1022,11 @@ A literal is either an atomic one or an affine group literal.
literal = atomic-literal / affine-group-literal
```
Go to: _[affine-group-literal](#user-content-affine-group-literal), [atomic-literal](#user-content-atomic-literal)_;
Go to: _[atomic-literal](#user-content-atomic-literal), [affine-group-literal](#user-content-affine-group-literal)_;
The following rule is not directly referenced in the rules for expressions
(which reference 'literal' instead),
(which reference `literal` instead),
but it is useful to establish terminology:
a group literal is either a product group literal or an affine group literal.
@ -1063,8 +1064,7 @@ primary-expression = identifier
/ circuit-expression
```
Go to: _[tuple-expression](#user-content-tuple-expression), [array-expression](#user-content-array-expression), [circuit-expression](#user-content-circuit-expression), [identifier](#user-content-identifier), [literal](#user-content-literal), [expression](#user-content-expression)_;
Go to: _[array-expression](#user-content-array-expression), [identifier](#user-content-identifier), [expression](#user-content-expression), [literal](#user-content-literal), [tuple-expression](#user-content-tuple-expression), [circuit-expression](#user-content-circuit-expression)_;
Tuple expressions construct tuples.
@ -1089,7 +1089,7 @@ Go to: _[tuple-construction](#user-content-tuple-construction)_;
Array expressions construct arrays.
There are two kinds:
one lists the element expressions (at least one),
including spreads (via '...') which are arrays being spliced in;
including spreads (via `...`) which are arrays being spliced in;
the other repeats (the value of) a single expression
across one or more dimensions.
@ -1117,7 +1117,7 @@ Go to: _[expression](#user-content-expression)_;
array-repeat-construction = "[" expression ";" array-dimensions "]"
```
Go to: _[array-dimensions](#user-content-array-dimensions), [expression](#user-content-expression)_;
Go to: _[expression](#user-content-expression), [array-dimensions](#user-content-array-dimensions)_;
<a name="array-construction"></a>
@ -1125,7 +1125,7 @@ Go to: _[array-dimensions](#user-content-array-dimensions), [expression](#user-c
array-construction = array-inline-construction / array-repeat-construction
```
Go to: _[array-inline-construction](#user-content-array-inline-construction), [array-repeat-construction](#user-content-array-repeat-construction)_;
Go to: _[array-repeat-construction](#user-content-array-repeat-construction), [array-inline-construction](#user-content-array-inline-construction)_;
<a name="array-expression"></a>
@ -1148,11 +1148,12 @@ so they are syntactically identical but semantically different.
<a name="circuit-construction"></a>
```abnf
circuit-construction = circuit-type "{"
circuit-inline-element *( "," circuit-inline-element ) [ "," ]
circuit-inline-element
*( "," circuit-inline-element ) [ "," ]
"}"
```
Go to: _[circuit-inline-element](#user-content-circuit-inline-element), [circuit-type](#user-content-circuit-type)_;
Go to: _[circuit-type](#user-content-circuit-type), [circuit-inline-element](#user-content-circuit-inline-element)_;
<a name="circuit-inline-element"></a>
@ -1160,7 +1161,7 @@ Go to: _[circuit-inline-element](#user-content-circuit-inline-element), [circuit
circuit-inline-element = identifier ":" expression / identifier
```
Go to: _[expression](#user-content-expression), [identifier](#user-content-identifier)_;
Go to: _[identifier](#user-content-identifier), [expression](#user-content-expression)_;
<a name="circuit-expression"></a>
@ -1211,8 +1212,7 @@ postfix-expression = primary-expression
/ postfix-expression "[" [expression] ".." [expression] "]"
```
Go to: _[natural](#user-content-natural), [function-arguments](#user-content-function-arguments), [identifier](#user-content-identifier), [primary-expression](#user-content-primary-expression), [circuit-type](#user-content-circuit-type), [postfix-expression](#user-content-postfix-expression), [expression](#user-content-expression)_;
Go to: _[identifier](#user-content-identifier), [function-arguments](#user-content-function-arguments), [natural](#user-content-natural), [circuit-type](#user-content-circuit-type), [primary-expression](#user-content-primary-expression), [postfix-expression](#user-content-postfix-expression), [expression](#user-content-expression)_;
Unary operators have the highest operator precedence.
@ -1226,13 +1226,13 @@ unary-expression = postfix-expression
/ "-" unary-expression
```
Go to: _[unary-expression](#user-content-unary-expression), [postfix-expression](#user-content-postfix-expression)_;
Go to: _[postfix-expression](#user-content-postfix-expression), [unary-expression](#user-content-unary-expression)_;
Next in the operator precedence is exponentiation,
following mathematical practice.
The current rule below makes exponentiation left-associative,
i.e. 'a ** b ** c' must be parsed as '(a ** b) ** c'.
The current rule below makes exponentiation right-associative,
i.e. `a ** b ** c` must be parsed as `a ** (b ** c)`.
<a name="exponential-expression"></a>
```abnf
@ -1240,7 +1240,7 @@ exponential-expression = unary-expression
/ unary-expression "**" exponential-expression
```
Go to: _[exponential-expression](#user-content-exponential-expression), [unary-expression](#user-content-unary-expression)_;
Go to: _[unary-expression](#user-content-unary-expression), [exponential-expression](#user-content-exponential-expression)_;
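
To see why the rule `unary-expression "**" exponential-expression` yields right associativity, consider this minimal recursive-descent sketch. It is an assumption-laden toy: a single identifier token stands in for `unary-expression`, and `parse_exponential` is a hypothetical name, not the Leo parser.

```python
from typing import List, Tuple, Union

# A tree is either an identifier or an ("**", left, right) node.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def parse_exponential(tokens: List[str], position: int = 0) -> Tuple[Tree, int]:
    """Parse tokens from position, returning the tree and the next position."""
    # A single identifier token stands in for unary-expression here.
    operand = tokens[position]
    position += 1
    if position < len(tokens) and tokens[position] == "**":
        # Recursing on the right operand mirrors the grammar rule:
        #   unary-expression "**" exponential-expression
        right, position = parse_exponential(tokens, position + 1)
        return ("**", operand, right), position
    return operand, position


tree, _ = parse_exponential(["a", "**", "b", "**", "c"])
print(tree)  # ('**', 'a', ('**', 'b', 'c'))
```

Because the recursion sits on the right of the operator, `b ** c` is grouped first, giving `a ** (b ** c)`.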
Next in precedence come multiplication and division, both left-associative.
@ -1327,7 +1327,7 @@ conditional-expression = disjunctive-expression
":" conditional-expression
```
Go to: _[conditional-expression](#user-content-conditional-expression), [disjunctive-expression](#user-content-disjunctive-expression), [expression](#user-content-expression)_;
Go to: _[disjunctive-expression](#user-content-disjunctive-expression), [conditional-expression](#user-content-conditional-expression), [expression](#user-content-expression)_;
Those above are all the expressions.
@ -1359,7 +1359,7 @@ statement = expression-statement
/ block
```
Go to: _[expression-statement](#user-content-expression-statement), [block](#user-content-block), [variable-definition-statement](#user-content-variable-definition-statement), [conditional-statement](#user-content-conditional-statement), [assignment-statement](#user-content-assignment-statement), [return-statement](#user-content-return-statement), [loop-statement](#user-content-loop-statement), [console-statement](#user-content-console-statement)_;
Go to: _[return-statement](#user-content-return-statement), [loop-statement](#user-content-loop-statement), [assignment-statement](#user-content-assignment-statement), [expression-statement](#user-content-expression-statement), [console-statement](#user-content-console-statement), [block](#user-content-block), [conditional-statement](#user-content-conditional-statement), [variable-definition-statement](#user-content-variable-definition-statement)_;
<a name="block"></a>
@ -1378,8 +1378,7 @@ expression-statement = expression ";"
Go to: _[expression](#user-content-expression)_;
A return statement always takes an expression,
and does not end with a semicolon.
A return statement always takes an expression, and ends with a semicolon.
<a name="return-statement"></a>
```abnf
@ -1402,7 +1401,7 @@ variable-definition-statement = ( %s"let" / %s"const" )
[ ":" type ] "=" expression ";"
```
Go to: _[identifier-or-identifiers](#user-content-identifier-or-identifiers), [type](#user-content-type), [expression](#user-content-expression)_;
Go to: _[type](#user-content-type), [identifier-or-identifiers](#user-content-identifier-or-identifiers), [expression](#user-content-expression)_;
<a name="identifier-or-identifiers"></a>
@ -1435,7 +1434,7 @@ conditional-statement = branch
/ branch %s"else" conditional-statement
```
Go to: _[block](#user-content-block), [branch](#user-content-branch), [conditional-statement](#user-content-conditional-statement)_;
Go to: _[block](#user-content-block), [conditional-statement](#user-content-conditional-statement), [branch](#user-content-branch)_;
A loop statement implicitly defines a loop variable
@ -1447,11 +1446,11 @@ The body is a block.
loop-statement = %s"for" identifier %s"in" expression ".." expression block
```
Go to: _[block](#user-content-block), [identifier](#user-content-identifier), [expression](#user-content-expression)_;
Go to: _[block](#user-content-block), [expression](#user-content-expression), [identifier](#user-content-identifier)_;
An assignment statement is straightforward.
Based on the operator, the assignment may be simple (i.e. '=')
Based on the operator, the assignment may be simple (i.e. `=`)
or compound (i.e. combining assignment with an arithmetic operation).
<a name="assignment-operator"></a>
@ -1467,13 +1466,13 @@ assignment-statement = expression assignment-operator expression ";"
Go to: _[expression](#user-content-expression), [assignment-operator](#user-content-assignment-operator)_;
Console statements start with the 'console' keyword,
Console statements start with the `console` keyword,
followed by a console function call.
The call may be an assertion or a print command.
The former takes an expression (which must be boolean) as argument.
The latter takes either no argument,
or a format string followed by expressions,
whose number must match the number of containers '{}' in the format string.
whose number must match the number of containers `{}` in the format string.
Note that the console function names are identifiers, not keywords.
There are three kinds of print commands.
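
The requirement that the number of expressions match the number of containers can be sketched as below. This is a simplification under stated assumptions: it counts literal `{}` substrings and does not reproduce how the lexical grammar treats other uses of braces; both function names are hypothetical.

```python
from typing import List


def count_containers(format_string: str) -> int:
    """Count non-overlapping occurrences of the container "{}"."""
    return format_string.count("{}")


def arguments_match(format_string: str, arguments: List[str]) -> bool:
    """Check that there is exactly one argument per container."""
    return count_containers(format_string) == len(arguments)


print(arguments_match("{} + {} = {}", ["a", "b", "c"]))
print(arguments_match("value: {}", []))
```

The first call has three containers and three arguments; the second has one container and no arguments, so the counts do not match.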
@ -1491,7 +1490,7 @@ console-call = assert-call
/ print-call
```
Go to: _[print-call](#user-content-print-call), [assert-call](#user-content-assert-call)_;
Go to: _[assert-call](#user-content-assert-call), [print-call](#user-content-print-call)_;
<a name="assert-call"></a>
@ -1520,10 +1519,10 @@ Go to: _[format-string](#user-content-format-string)_;
print-call = print-function print-arguments
```
Go to: _[print-function](#user-content-print-function), [print-arguments](#user-content-print-arguments)_;
Go to: _[print-arguments](#user-content-print-arguments), [print-function](#user-content-print-function)_;
An annotation consists of an annotation name (which starts with '@')
An annotation consists of an annotation name (which starts with `@`)
with optional annotation arguments, which are identifiers.
Note that no parentheses are used if there are no arguments.
@ -1533,7 +1532,7 @@ annotation = annotation-name
[ "(" identifier *( "," identifier ) ")" ]
```
Go to: _[annotation-name](#user-content-annotation-name), [identifier](#user-content-identifier)_;
Go to: _[identifier](#user-content-identifier), [annotation-name](#user-content-annotation-name)_;
A function declaration defines a function.
@ -1541,8 +1540,7 @@ The output type is optional, defaulting to the empty tuple type.
In general, a function input consists of an identifier and a type,
with an optional 'const' modifier.
Additionally, functions inside circuits
may start with a 'mut self' or 'const self' or 'self' parameter.
Furthermore, any function may end with an 'input' parameter.
may start with a `mut self` or `const self` or `self` parameter.
<a name="function-declaration"></a>
```abnf
@ -1551,7 +1549,7 @@ function-declaration = *annotation %s"function" identifier
block
```
Go to: _[type](#user-content-type), [block](#user-content-block), [function-parameters](#user-content-function-parameters), [identifier](#user-content-identifier)_;
Go to: _[type](#user-content-type), [block](#user-content-block), [identifier](#user-content-identifier), [function-parameters](#user-content-function-parameters)_;
<a name="function-parameters"></a>
@ -1582,14 +1580,9 @@ Go to: _[function-input](#user-content-function-input)_;
function-input = [ %s"const" ] identifier ":" type
```
Go to: _[type](#user-content-type), [identifier](#user-content-identifier)_;
Go to: _[identifier](#user-content-identifier), [type](#user-content-type)_;
<a name="input-parameter"></a>
```abnf
input-parameter = %s"input"
```
A circuit member variable declaration consists of an identifier and a type.
A circuit member function declaration consists of a function declaration.
@ -1599,7 +1592,7 @@ member-declaration = member-variable-declaration
/ member-function-declaration
```
Go to: _[member-function-declaration](#user-content-member-function-declaration), [member-variable-declaration](#user-content-member-variable-declaration)_;
Go to: _[member-variable-declaration](#user-content-member-variable-declaration), [member-function-declaration](#user-content-member-function-declaration)_;
<a name="member-variable-declaration"></a>
@ -1607,7 +1600,7 @@ Go to: _[member-function-declaration](#user-content-member-function-declaration)
member-variable-declaration = identifier ":" type
```
Go to: _[type](#user-content-type), [identifier](#user-content-identifier)_;
Go to: _[identifier](#user-content-identifier), [type](#user-content-type)_;
<a name="member-function-declaration"></a>
@ -1630,7 +1623,7 @@ circuit-declaration = *annotation %s"circuit" identifier
Go to: _[identifier](#user-content-identifier), [member-declaration](#user-content-member-declaration)_;
An import declaration consists of the 'import' keyword
An import declaration consists of the `import` keyword
followed by a package path, which may be one of the following:
a single wildcard;
an identifier, optionally followed by a local renamer;
@ -1668,7 +1661,7 @@ declaration = import-declaration
/ circuit-declaration
```
Go to: _[circuit-declaration](#user-content-circuit-declaration), [import-declaration](#user-content-import-declaration), [function-declaration](#user-content-function-declaration)_;
Go to: _[function-declaration](#user-content-function-declaration), [circuit-declaration](#user-content-circuit-declaration), [import-declaration](#user-content-import-declaration)_;
<a name="file"></a>
@ -1683,16 +1676,18 @@ file = *declaration
Format Note
-----------
The ABNF standard requires grammars to consist of lines terminated by CR LF
The ABNF standard requires grammars
to consist of lines terminated by `<CR><LF>`
(i.e. carriage return followed by line feed, DOS/Windows-style),
as explained in the background on ABNF earlier in this file.
This file's lines are therefore terminated by CR LF.
This file's lines are therefore terminated by `<CR><LF>`.
To avoid losing this requirement across systems,
this file is marked as 'text eol=crlf' in .gitattributes:
this file is marked as `text eol=crlf` in `.gitattributes`:
this means that the file is textual, enabling visual diffs,
but its lines will always be terminated by CR LF on any system.
but its lines will always be terminated by `<CR><LF>` on any system.
Note that this CR LF requirement only applies to the grammar files themselves.
Note that this `<CR><LF>` requirement only applies
to the grammar files themselves.
It does not apply to the lines of the languages described by the grammar.
ABNF grammars may describe any kind of languages,
with any kind of line terminators,
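
The `<CR><LF>` requirement described in this note can be checked mechanically. The following is a minimal Python sketch, not part of the Leo tooling; the helper name `non_crlf_lines` is hypothetical, and it assumes the grammar file has been read as raw bytes.

```python
def non_crlf_lines(data):
    """Return the 1-based numbers of lines in `data` (bytes) not terminated by CR LF."""
    bad = []
    # keepends=True preserves each line's terminator so that it can be inspected.
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        if not line.endswith(b"\r\n"):
            bad.append(number)
    return bad
```

An empty result means every line, including the last one, ends with `<CR><LF>`.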


@ -53,59 +53,59 @@
; Instead of BNF's angle-bracket notation for nonterminals,
; ABNF uses case-insensitive names consisting of letters, digits, and dashes,
; e.g. HTTP-message and IPv6address.
; e.g. `HTTP-message` and `IPv6address`.
; ABNF includes an angle-bracket notation for prose descriptions,
; e.g. <host, see [RFC3986], Section 3.2.2>,
; e.g. `<host, see [RFC3986], Section 3.2.2>`,
; usable as last resort in the definiens of a nonterminal.
; While BNF allows arbitrary terminals,
; ABNF uses only natural numbers as terminals,
; and denotes them via:
; (i) binary, decimal, or hexadecimal sequences,
; e.g. %b1.11.1010, %d1.3.10, and %x.1.3.A
; all denote the sequence of terminals '1 3 10';
; e.g. `%b1.11.1010`, `%d1.3.10`, and `%x1.3.A`
; all denote the sequence of terminals [1, 3, 10];
; (ii) binary, decimal, or hexadecimal ranges,
; e.g. %x30-39 denotes any singleton sequence of terminals
; 'n' with 48 <= n <= 57 (an ASCII digit);
; e.g. `%x30-39` denotes any singleton sequence of terminals
; [_n_] with 48 <= _n_ <= 57 (an ASCII digit);
; (iii) case-sensitive ASCII strings,
; e.g. %s"Ab" denotes the sequence of terminals '65 98';
; e.g. `%s"Ab"` denotes the sequence of terminals [65, 98];
; and (iv) case-insensitive ASCII strings,
; e.g. %i"ab", or just "ab", denotes
; e.g. `%i"ab"`, or just `"ab"`, denotes
; any sequence of terminals among
; '65 66',
; '65 98',
; '97 66', and
; '97 98'.
; [65, 66],
; [65, 98],
; [97, 66], and
; [97, 98].
; ABNF terminals in suitable sets represent ASCII or Unicode characters.
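
As a rough illustration of item (iv) above, this Python sketch enumerates the terminal sequences denoted by a case-insensitive ABNF string. The helper name `case_insensitive_sequences` is hypothetical, and the sketch assumes plain ASCII input.

```python
from itertools import product

def case_insensitive_sequences(text):
    """Return every sequence of code points denoted by the ABNF string %i"text"."""
    # For each character, collect its distinct upper/lower case code points in ascending order.
    choices = [sorted({ord(ch.upper()), ord(ch.lower())}) for ch in text]
    return [list(seq) for seq in product(*choices)]
```

For `"ab"` this yields the four sequences listed above: [65, 66], [65, 98], [97, 66], and [97, 98].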
; ABNF allows repetition prefixes n*m,
; where n and m are natural numbers in decimal notation;
; ABNF allows repetition prefixes `n*m`,
; where `n` and `m` are natural numbers in decimal notation;
; if absent,
; n defaults to 0, and
; m defaults to infinity.
; `n` defaults to 0, and
; `m` defaults to infinity.
; For example,
; 1*4HEXDIG denotes one to four HEXDIGs,
; *3DIGIT denotes up to three DIGITs, and
; 1*OCTET denotes one or more OCTETs.
; A single n prefix
; abbreviates n*n,
; e.g. 3DIGIT denotes three DIGITs.
; `1*4HEXDIG` denotes one to four `HEXDIG`s,
; `*3DIGIT` denotes up to three `DIGIT`s, and
; `1*OCTET` denotes one or more `OCTET`s.
; A single `n` prefix
; abbreviates `n*n`,
; e.g. `3DIGIT` denotes three `DIGIT`s.
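
Readers familiar with regular expressions may find it helpful to see repetition prefixes mapped onto regex quantifiers. The sketch below is only an analogy under that assumption; the function name `repetition_to_quantifier` is hypothetical.

```python
def repetition_to_quantifier(prefix):
    """Translate an ABNF repetition prefix such as "1*4", "*3", "1*" or "3"
    into the corresponding regular-expression quantifier."""
    if "*" in prefix:
        low, high = prefix.split("*", 1)
        low = low or "0"  # an absent n defaults to 0
        # An absent m means "no upper bound", which regex writes as {n,}.
        return "{" + low + "," + high + "}"
    # A single n abbreviates n*n.
    return "{" + prefix + "}"
```

For instance, `1*4` maps to `{1,4}`, `*3` to `{0,3}`, `1*` to `{1,}`, and `3` to `{3}`.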
; Instead of BNF's |, ABNF uses / to separate alternatives.
; Instead of BNF's `|`, ABNF uses `/` to separate alternatives.
; Repetition prefixes have precedence over juxtapositions,
; which have precedence over /.
; which have precedence over `/`.
; Round brackets group things and override the aforementioned precedence rules,
; e.g. *(WSP / CRLF WSP) denotes sequences of terminals
; e.g. `*(WSP / CRLF WSP)` denotes sequences of terminals
; obtained by repeating, zero or more times,
; either (i) a WSP or (ii) a CRLF followed by a WSP.
; either (i) a `WSP` or (ii) a `CRLF` followed by a `WSP`.
; Square brackets also group things but make them optional,
; e.g. [":" port] is equivalent to 0*1(":" port).
; e.g. `[":" port]` is equivalent to `0*1(":" port)`.
; Instead of BNF's ::=, ABNF uses = to define nonterminals,
; and =/ to incrementally add alternatives
; Instead of BNF's `::=`, ABNF uses `=` to define nonterminals,
; and `=/` to incrementally add alternatives
; to previously defined nonterminals.
; For example, the rule BIT = "0" / "1"
; is equivalent to BIT = "0" followed by BIT =/ "1".
; For example, the rule `BIT = "0" / "1"`
; is equivalent to `BIT = "0"` followed by `BIT =/ "1"`.
; The syntax of ABNF itself is formally specified in ABNF
; (in Section 4 of the aforementioned RFC 5234,
@ -125,7 +125,7 @@
; This ABNF grammar consists of two (sub-)grammars:
; (i) a lexical grammar that describes how
; sequences of characters are parsed into tokens, and
; (ii) a syntactic grammar that described how
; (ii) a syntactic grammar that describes how
; tokens are parsed into expressions, statements, etc.
; The adjectives 'lexical' and 'syntactic' are
; the same ones used in the Java language reference,
@ -180,8 +180,8 @@
; / additive-expression "-" multiplicative-expression
;
; These rules tell us
; that the additive operators '+' and '-' have lower precedence
; than the multiplicative operators '*' and '/',
; that the additive operators `+` and `-` have lower precedence
; than the multiplicative operators `*` and `/`,
; and that both the additive and multiplicative operators associate to the left.
; This may be best understood via the examples given below.
@ -239,7 +239,7 @@
; This ABNF grammar uses nonterminal names
; that consist of complete English words, separated by dashes,
; and that describe the construct the way it is in English.
; For instance, we use the name 'conditional-statement'
; For instance, we use the name `conditional-statement`
; to describe conditional statements.
; At the same time, this grammar establishes
@ -284,8 +284,8 @@
; A Leo file is a finite sequence of Unicode characters,
; represented as Unicode code points,
; which are numbers in the range form 0 to 10FFFFh.
; These are captured by the ABNF rule 'character' below.
; which are numbers in the range from 0 to 10FFFFh.
; These are captured by the ABNF rule `character` below.
; The lexical grammar defines how, at least conceptually,
; the sequence of characters is turned into
@ -293,20 +293,20 @@
; these entities are all defined by the grammar rules below.
; As stated, the lexical grammar alone is ambiguous.
; For example, the sequence of characters '**' (i.e. two stars)
; could be equally parsed as two '*' symbol tokens or one '**' symbol token
; (see rule for 'symbol' below).
; As another example, the sequence or characters '<CR><LF>'
; For example, the sequence of characters `**` (i.e. two stars)
; could be equally parsed as two `*` symbol tokens or one `**` symbol token
; (see rule for `symbol` below).
; As another example, the sequence of characters `<CR><LF>`
; (i.e. carriage return followed by line feed)
; could be equally parsed as two line terminators or one
; (see rule for 'newline').
; (see rule for `newline`).
; Thus, as often done in language syntax definitions,
; the lexical grammar is disambiguated by
; the extra-grammatical requirement that
; the longest possible sequence of characters is always parsed.
; This way, '**' must be parsed as one '**' symbol token,
; and '<CR><LF>' must be parsed as one line terminator.
; This way, `**` must be parsed as one `**` symbol token,
; and `<CR><LF>` must be parsed as one line terminator.
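
The longest-match requirement can be sketched in Python as follows. This is an illustration only, not the Leo lexer: the function name `longest_match_tokens` is hypothetical, and the symbol set is passed in by the caller rather than taken from the grammar.

```python
def longest_match_tokens(text, symbols):
    """Split `text` into tokens, always taking the longest symbol that matches."""
    # Try longer symbols first so that "**" wins over "*".
    ordered = sorted(symbols, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        for sym in ordered:
            if text.startswith(sym, pos):
                tokens.append(sym)
                pos += len(sym)
                break
        else:
            raise ValueError(f"no symbol matches at position {pos}")
    return tokens
```

With the symbols `*` and `**`, the input `**` produces one token rather than two, and with the three newline alternatives, `<CR><LF>` produces a single line terminator.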
; As mentioned above, a character is any Unicode code point.
; This grammar does not say how those are encoded in files (e.g. UTF-8):
@ -318,15 +318,15 @@ character = %x0-10FFFF ; any Unicode code point
; We give names to certain ASCII characters.
horizontal-tab = %x9
horizontal-tab = %x9 ; <HT>
line-feed = %xA
line-feed = %xA ; <LF>
carriage-return = %xD
carriage-return = %xD ; <CR>
space = %x20
space = %x20 ; <SP>
double-quote = %x22
double-quote = %x22 ; "
; We give names to complements of certain ASCII characters.
; These consist of all the Unicode characters except for one or two.
@ -362,12 +362,12 @@ newline = line-feed / carriage-return / carriage-return line-feed
whitespace = space / horizontal-tab / newline
; There are two kinds of comments in Leo, as in other languages.
; One is block comments of the form '/* ... */',
; and the other is end-of-line comments of the form '// ...'.
; The first kind start at '/*' and end at the first '*/',
; One is block comments of the form `/* ... */`,
; and the other is end-of-line comments of the form `// ...`.
; The first kind start at `/*` and end at the first `*/`,
; possibly spanning multiple (partial) lines;
; these do not nest.
; The second kind start at '//' and extend till the end of the line.
; The second kind start at `//` and extend till the end of the line.
; The rules about comments given below are similar to
; the ones used in the Java language reference.
@ -447,7 +447,7 @@ package-name = 1*( lowercase-letter / digit )
; A format string is a sequence of characters, other than double quote,
; surrounded by double quotes.
; Within a format string, sub-strings '{}' are distinguished as containers
; Within a format string, sub-strings `{}` are distinguished as containers
; (these are the ones that may be matched with values
; whose textual representation replaces the containers
; in the printed string).
@ -460,17 +460,17 @@ format-string-element = not-double-quote-or-open-brace
format-string = double-quote *format-string-element double-quote
; Annotations have names, which are identifiers immediately preceded by '@'.
; Annotations have names, which are identifiers immediately preceded by `@`.
annotation-name = "@" identifier
; A natural (number) is a sequence of one or more digits.
; We allow leading zeros, e.g. '007'.
; We allow leading zeros, e.g. `007`.
natural = 1*digit
; An integer (number) is either a natural or its negation.
; We allow leading zeros also in negative numbers, e.g. '-007'.
; We allow leading zeros also in negative numbers, e.g. `-007`.
integer = [ "-" ] natural
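
The `natural` and `integer` rules above correspond to a simple regular expression. The Python sketch below assumes that `digit` means the ASCII digits 0 to 9 (the `digit` rule is not shown in this excerpt); the names `INTEGER_RE` and `is_integer` are hypothetical.

```python
import re

# Assumption: `digit` is an ASCII digit, so [0-9] is used instead of \d,
# which would also accept non-ASCII Unicode digits.
INTEGER_RE = re.compile(r"-?[0-9]+")

def is_integer(text):
    """Check whether `text` matches the `integer` rule, leading zeros included."""
    return INTEGER_RE.fullmatch(text) is not None
```

Both `007` and `-007` are accepted, while a lone `-` or a leading `+` is rejected.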
@ -502,7 +502,7 @@ product-group-literal = integer %s"group"
boolean-literal = %s"true" / %s"false"
; An address literal starts with 'aleo1'
; An address literal starts with `aleo1`
; and continues with exactly 58 lowercase letters and digits.
; Thus an address always consists of 63 characters.
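
The length arithmetic above (5 characters for `aleo1` plus 58 more gives 63) can be checked with a short Python sketch. It assumes that "lowercase letters and digits" means ASCII `a` to `z` and `0` to `9`; the names `ADDRESS_RE` and `is_address_literal` are hypothetical.

```python
import re

# Assumption: lowercase letters and digits are the ASCII ranges a-z and 0-9.
ADDRESS_RE = re.compile(r"aleo1[a-z0-9]{58}")

def is_address_literal(text):
    """Check the shape of an address literal: `aleo1` plus exactly 58 characters."""
    return ADDRESS_RE.fullmatch(text) is not None
```

This checks only the lexical shape described by the comment, not whether the address is otherwise valid.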
@ -521,12 +521,12 @@ atomic-literal = untyped-literal
/ address-literal
; After defining the (mostly) alphanumeric tokens above,
; it remains to define tokens for non-alphanumeric symbols such as "+" and "(".
; it remains to define tokens for non-alphanumeric symbols such as `+` and `(`.
; Different programming languages use different terminologies for these,
; e.g. operators, separators, punctuators, etc.
; Here we use 'symbol', for all of them.
; Here we use `symbol`, for all of them.
; We also include a token consisting of
; a closing parenthesis immediately followed by 'group':
; a closing parenthesis `)` immediately followed by `group`:
; as defined in the syntactic grammar,
; this is the final part of an affine group literal;
; even though it includes letters,
@ -538,7 +538,7 @@ atomic-literal = untyped-literal
;
; equality-operator = "=="
;
; and defining 'symbol' in terms of those
; and defining `symbol` in terms of those
;
; symbol = ... / equality-operator / ...
;
@ -547,7 +547,7 @@ atomic-literal = untyped-literal
; namely the exact names of some of these tokens.
; On the other hand, at least some of them are perhaps simple enough
; that they could be just described in terms of their symbols,
; e.g. 'double dot', 'question mark', etc.
; e.g. double dot, question mark, etc.
symbol = "!" / "&&" / "||"
/ "==" / "!="
@ -612,7 +612,7 @@ address-type = %s"address"
scalar-type = boolean-type / arithmetic-type / address-type
; Circuit types are denoted by identifiers and the keyword 'Self'.
; Circuit types are denoted by identifiers and the keyword `Self`.
; The latter is only allowed inside a circuit definition,
; to denote the circuit being defined.
@ -663,7 +663,7 @@ affine-group-literal = "(" group-coordinate "," group-coordinate %s")group"
literal = atomic-literal / affine-group-literal
; The following rule is not directly referenced in the rules for expressions
; (which reference 'literal' instead),
; (which reference `literal` instead),
; but it is useful to establish terminology:
; a group literal is either a product group literal or an affine group literal.
@ -702,7 +702,7 @@ tuple-expression = tuple-construction
; Array expressions construct arrays.
; There are two kinds:
; one lists the element expressions (at least one),
; including spreads (via '...') which are arrays being spliced in;
; including spreads (via `...`) which are arrays being spliced in;
; the other repeats (the value of) a single expression
; across one or more dimensions.
@ -729,7 +729,8 @@ array-expression = array-construction
; so they are syntactically identical but semantically different.
circuit-construction = circuit-type "{"
circuit-inline-element *( "," circuit-inline-element ) [ "," ]
circuit-inline-element
*( "," circuit-inline-element ) [ "," ]
"}"
circuit-inline-element = identifier ":" expression / identifier
@ -777,8 +778,8 @@ unary-expression = postfix-expression
; Next in the operator precedence is exponentiation,
; following mathematical practice.
; The current rule below makes exponentiation left-associative,
; i.e. 'a ** b ** c' must be parsed as '(a ** b) ** c'.
; The current rule below makes exponentiation right-associative,
; i.e. `a ** b ** c` must be parsed as `a ** (b ** c)`.
exponential-expression = unary-expression
/ unary-expression "**" exponential-expression
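
To see why the rule above yields right associativity, here is a minimal recursive-descent sketch in Python. It is not the Leo parser: the function name `parse_exponential` is hypothetical, and operands stand in for `unary-expression` as plain strings.

```python
def parse_exponential(tokens):
    """Parse operands separated by "**" into a right-nested tuple tree.

    Mirrors the shape of the rule:
        exponential-expression = unary-expression
                               / unary-expression "**" exponential-expression
    """
    if not tokens:
        raise ValueError("expected an operand")
    left, rest = tokens[0], tokens[1:]
    if not rest:
        return left
    if rest[0] != "**":
        raise ValueError(f"expected '**', got {rest[0]!r}")
    # The recursion is on the right operand, which is what makes "**" right-associative.
    return ("**", left, parse_exponential(rest[1:]))
```

Parsing the tokens of `a ** b ** c` nests the second `**` inside the first, i.e. `a ** (b ** c)`.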
@ -855,8 +856,7 @@ block = "{" *statement "}"
expression-statement = expression ";"
; A return statement always takes an expression,
; and does not end with a semicolon.
; A return statement always takes an expression, and ends with a semicolon.
return-statement = %s"return" expression ";"
@ -892,20 +892,20 @@ conditional-statement = branch
loop-statement = %s"for" identifier %s"in" expression ".." expression block
; An assignment statement is straightforward.
; Based on the operator, the assignment may be simple (i.e. '=')
; Based on the operator, the assignment may be simple (i.e. `=`)
; or compound (i.e. combining assignment with an arithmetic operation).
assignment-operator = "=" / "+=" / "-=" / "*=" / "/=" / "**="
assignment-statement = expression assignment-operator expression ";"
; Console statements start with the 'console' keyword,
; Console statements start with the `console` keyword,
; followed by a console function call.
; The call may be an assertion or a print command.
; The former takes an expression (which must be boolean) as argument.
; The latter takes either no argument,
; or a format string followed by expressions,
; whose number must match the number of containers '{}' in the format string.
; whose number must match the number of containers `{}` in the format string.
; Note that the console function names are identifiers, not keywords.
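
The requirement that the number of expressions match the number of `{}` containers is extra-grammatical, so it has to be checked separately from parsing. A minimal Python sketch of such a check follows; the function name `containers_match` is hypothetical, and it assumes the format string has already been stripped of its surrounding double quotes.

```python
def containers_match(format_string, arguments):
    """Check that the number of `{}` containers equals the number of arguments."""
    # str.count counts non-overlapping occurrences of the substring "{}".
    return format_string.count("{}") == len(arguments)
```

For example, a format string with two containers requires exactly two argument expressions.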
; There are three kinds of print commands.
@ -922,7 +922,7 @@ print-arguments = "(" [ format-string *( "," expression ) ] ")"
print-call = print-function print-arguments
; An annotation consists of an annotation name (which starts with '@')
; An annotation consists of an annotation name (which starts with `@`)
; with optional annotation arguments, which are identifiers.
; Note that no parentheses are used if there are no arguments.
@ -934,8 +934,7 @@ annotation = annotation-name
; In general, a function input consists of an identifier and a type,
; with an optional 'const' modifier.
; Additionally, functions inside circuits
; may start with a 'mut self' or 'const self' or 'self' parameter.
; Furthermore, any function may end with an 'input' parameter.
; may start with a `mut self` or `const self` or `self` parameter.
function-declaration = *annotation %s"function" identifier
"(" [ function-parameters ] ")" [ "->" type ]
@ -951,8 +950,6 @@ function-inputs = function-input *( "," function-input )
function-input = [ %s"const" ] identifier ":" type
input-parameter = %s"input"
; A circuit member variable declaration consists of an identifier and a type.
; A circuit member function declaration consists of a function declaration.
@ -969,7 +966,7 @@ member-function-declaration = function-declaration
circuit-declaration = *annotation %s"circuit" identifier
"{" member-declaration *( "," member-declaration ) "}"
; An import declaration consists of the 'import' keyword
; An import declaration consists of the `import` keyword
; followed by a package path, which may be one of the following:
; a single wildcard;
; an identifier, optionally followed by a local renamer;
@ -999,16 +996,17 @@ file = *declaration
; Format Note
; -----------
; The ABNF standard requires grammars to consist of lines terminated by <CR><LF>
; The ABNF standard requires grammars
; to consist of lines terminated by `<CR><LF>`
; (i.e. carriage return followed by line feed, DOS/Windows-style),
; as explained in the background on ABNF earlier in this file.
; This file's lines are therefore terminated by <CR><LF>.
; This file's lines are therefore terminated by `<CR><LF>`.
; To avoid losing this requirement across systems,
; this file is marked as 'text eol=crlf' in .gitattributes:
; this file is marked as `text eol=crlf` in `.gitattributes`:
; this means that the file is textual, enabling visual diffs,
; but its lines will always be terminated by <CR><LF> on any system.
; but its lines will always be terminated by `<CR><LF>` on any system.
; Note that this <CR><LF> requirement only applies
; Note that this `<CR><LF>` requirement only applies
; to the grammar files themselves.
; It does not apply to the lines of the languages described by the grammar.
; ABNF grammars may describe any kind of languages,