Merge pull request #1368 from AleoHQ/rfc-rec-trans

[RFC] Updates to record/transaction model
This commit is contained in:
Alessandro Coglio 2021-09-30 22:12:53 -07:00 committed by GitHub
commit a5de1d6ee3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -1,4 +1,4 @@
# Leo RFC 012: Record and Transaction Model # Leo RFC 012: Improved Record and Transaction Model
## Authors ## Authors
@ -10,7 +10,7 @@ DRAFT
## Summary ## Summary
This RFC describes how Leo programs interact with the Aleo blockchain. This RFC describes an improved model for how Leo programs interact with the Aleo blockchain.
The description is oriented to the Leo developer: The description is oriented to the Leo developer:
it does not describe the zero-knowledge details, it does not describe the zero-knowledge details,
as the whole purpose of Leo is to enable developers as the whole purpose of Leo is to enable developers
@ -23,7 +23,7 @@ While Leo can be described as a regular programming language
its purpose is to build applications for the Aleo blockchain. its purpose is to build applications for the Aleo blockchain.
It is thus important to describe precisely how Leo programs operate in the Aleo blockchain. It is thus important to describe precisely how Leo programs operate in the Aleo blockchain.
## Design ## Background
### Zexe ### Zexe
@ -36,6 +36,7 @@ The computation of the new records from the old records
is arbitrary and unknown to the blockchain; is arbitrary and unknown to the blockchain;
the blockchain only enforces that the old records satisfy known _death predicates_ the blockchain only enforces that the old records satisfy known _death predicates_
and that the new records satisfy known _birth predicates_. and that the new records satisfy known _birth predicates_.
See the [Zexe paper](https://eprint.iacr.org/2018/962.pdf) for details.
### Aleo Blockchain ### Aleo Blockchain
@ -49,7 +50,7 @@ which may involve records owned by different parties
One or both of the old records may be dummy, One or both of the old records may be dummy,
if only one old actual record is desired, if only one old actual record is desired,
or if new records are to be created "from nothing". or if new records are created "from nothing".
One or both of the new records may be dummy, One or both of the new records may be dummy,
if only one new actual record is desired, if only one new actual record is desired,
or if old records just have to be consumed. or if old records just have to be consumed.
@ -90,10 +91,10 @@ This is what 'corresponds' means, in that sentence.
However, for the high-level purpose of this RFC, these are zero-knowledge details. However, for the high-level purpose of this RFC, these are zero-knowledge details.
In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters), In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters),
and returns an output (declared as a return type), which may be a tuple to effectively represent multiple outputs. and returns an output (declared as a return type), which may be a tuple to represent "multiple" outputs.
The `const` inputs are compiled into the zero-knowledge circuit, The `const` inputs are compiled into the zero-knowledge circuit,
so they can be ignored for our purpose here, so they can be ignored for our purpose here,
leaving the non-`const` inputs and the output for consideration. leaving only the non-`const` inputs and the output for consideration.
The execution of `main` can be described as a mathematical function The execution of `main` can be described as a mathematical function
``` ```
@ -131,14 +132,23 @@ on those values and on the old records
yields the new records, along with some value in `Output`; yields the new records, along with some value in `Output`;
this is, roughly speaking, the assertion proved in zero-knowledge. this is, roughly speaking, the assertion proved in zero-knowledge.
### Proposed Leo Program Execution Model ### Input and Output Files
The current model described above seems adequate overall, but we need to: Currently the compilation of a Leo program involves:
1. Clarify how Leo code reads old records and writes new records. 1. A `.in` file, containing `const` and non-`const` inputs.
2. Generalize from one entry point (i.e. the `main` function) to multiple entry points, in line with the smart contract paradigm. 2. A `.state` file, containing transaction data.
3. A `.out` file, containing results.
Generalizing from one `main` entry point to multiple ones is conceptually easy. The compilation takes the first two files as inputs and returns the third file as output.
It means that, instead of implicitly designating `main` as the only entry point,
## Design
### Multiple Entry Points
We propose to generalize from one entry point (i.e. the `main` function) to multiple entry points,
in line with the smart contract paradigm.
Instead of implicitly designating `main` as the only entry point,
we need a mechanism to explicitly designate one or more Leo functions as entry points. we need a mechanism to explicitly designate one or more Leo functions as entry points.
A simple approach could be to use an annotation like `@entrypoint` to designate _entry point functions_: A simple approach could be to use an annotation like `@entrypoint` to designate _entry point functions_:
@ -167,12 +177,18 @@ entrypoint {
} }
``` ```
Now let us turn to the issue of clarifying how the Leo code reads old records and writes new records. In the rest of this design section we assume the annotation approach (i.e. `@entrypoint`) for concreteness,
but that can be replaced as soon as we converge on a choice.
### Types for Transaction Inputs and Outputs
We propose to add types for transaction inputs and outputs to the Leo standard library,
and possibly include them in the prelude that is implicitly imported by every Leo program.
Given that records have a fixed structure with typed slots, Given that records have a fixed structure with typed slots,
their format could be described by a Leo circuit type, their format could be described by a Leo circuit type, e.g. called `Record`,
whose member variables correspond to the slots. whose member variables correspond to the slots.
The types of the slots would be fairly low-level, The types of the slots are fairly low-level,
i.e. byte arrays (e.g. `u8[128]` for the payload) i.e. byte arrays (e.g. `u8[128]` for the payload)
and unsigned integers (e.g. `u64` for the balance), and unsigned integers (e.g. `u64` for the balance),
because they must have a clear correspondence with the serialized form of records. because they must have a clear correspondence with the serialized form of records.
@ -181,104 +197,171 @@ its own deserialization of the payload bytes into higher-level Leo values;
standard serialization/deserialization libraries for Leo types may be provided for this, standard serialization/deserialization libraries for Leo types may be provided for this,
as an independent and more generally useful feature. as an independent and more generally useful feature.
It may make sense to have a circuit type for the special `input` variable, Given that a transaction input consists of two records and possibly additional information,
which includes two slots for the two old records. it makes sense to also have a circuit type `TransactionInput`,
All these circuit types should be explicitly documented, which includes two `Record` slots and possibly additional slots.
and available to the Leo program.
However, we probably want `input` to be read-only, Additionally, it makes sense to have a circuit type `TransactionOutput`
i.e. disallow assigning to an old record slot. that describes the output data of a transaction that is produced by the Leo program.
Designating `input` as `const` does not seem right, This could also include two `Record` slots for the new records,
as that designation normally means that it is compiled into the circuit. or possibly "subsets" of records if the values of some record slots are calculated
Instead, we could provide read-only access via member function (e.g. `payload()`, `balance()`), not by the Leo program but instead by the Leo CLI (i.e. build process).
All these types should be documented, as part of the standard library.
We will need to flesh out their exact definition,
but we note that this is fairly easy to change when it is in the standard library.
### Entry Point Input and Output Types
We propose that each entry point function of a Leo program
explicitly produce transaction outputs from transaction inputs
by taking a `TransactionInput` input and returning a `TransactionOutput` output:
```
@entrypoint
function ...(input: TransactionInput, ...) -> TransactionOutput { ... }
```
This way, the calculation of transaction outputs from transaction inputs is made functional and explicit.
As special cases (both of which may apply to the same entry point):
1. We could allow the `TransactionInput` input to be absent,
when an entry point does not need access the transaction input data,
e.g. when producing new records without consuming old records.
2. We could allow the function output to be `()` instead of `TransactionOutput`,
when an entry point does not need to produce transaction outputs,
e.g. when consuming old records without producing new records.
Compared to the current Leo program execution model (described earlier in the background section),
`input` is made an explicit input here, instead of being like a built-in global variable.
Furthermore, the output type is restricted to be `TransactionOutput` (or `()`),
thus eliminating the implicit serialization and the asymmetry with the treatment of transaction inputs.
There is still no restriction on the non-`TransactionInput` inputs of an entry point function;
as noted earlier, they are existentially quantified in the zero-knowledge assertion.
Thus, a Leo program entry point can be now described as a mathematical function
```
entrypoint : Record x Record x Inputs -> Record x Record
```
where `Output` is no longer present.
(If `TransactionInput` includes additional data, besides the two old records, that may affect the transaction output,
then we would need to add that to this mathematical model;
however, the model above is sufficiently accurate for the current discussion.)
We may require the `TransactionInput` input of an entry point function, if present,
to be the first input of the function, for clarity and readability.
A question is whether we should extend that requirement to non-entry-point functions
that may be passed `TransactionInput` values.
We note that none of these restrictions are necessary, though.
A necessary restriction is that each entry point function takes at most one `TransactionInput` input.
We may require the `TransactionInput` input of an entry point function, if present,
to be called `input`, or some other predefined name.
However, this is not a necessary restriction, and we may decide to demote that to a convention rather than a requirement.
(Currently `input` is a keyword and its own kind of Leo expression, which slightly complicates the language.)
### Access to Transaction Input and Output Types
Currently the member variables of Leo circuit types are always accessible for both reading and writing.
It is thus possible for a Leo program
to read from the member variables of `TransactionInput`
and to write to the member variables of `TransactionOutput`.
Therefore, for an initial implementation,
it suffices for these two circuit types to provide member variables for the needed slots.
We might want the member variables of `TransactionInput` to be read-only.
This is not necessary for the transaction model to work:
so long as `TransactionInput` is properly initialized before calling the entry point,
and that after the call the resulting `TransactionOutput` is used to create the transaction,
there is no harm in the Leo program to modify the copy of `TransactionInput` passed to the program.
Nonetheless, we may want to enforce this restriction to encourage good coding practices
(unless we find a use case to the contrary).
There is currently no mechanism in Leo to enforce that.
Designating the transaction input as `const` is not right,
as that designation normally means that the value is compiled into the circuit.
We could provide read-only access via member function (e.g. `payload()`, `balance()`),
but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type). but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type).
As an orthogonal and more generally useful feature, As an orthogonal and more generally useful feature,
we could consider adding public/private access designations to Leo circuit members. we could consider adding public/private access designations to Leo circuit members.
Another approach is to avoid exposing the member variables, Another approach is to avoid exposing the member variables,
and just make the member functions available via an implicit import declaration. and just make the member functions available via an implicit import declaration.
All of this needs to be thought through more carefully, in the broader context of the Leo language design; All of this needs to be thought through more carefully, in the broader context of the Leo language design.
in any case, it should be clear that this can be made to work in some way,
and that Leo programs can access the old records through the special `input` variables.
One issue with the special `input` variable is whether it should be treated as a built-in global variable, If `TransactionInput` has member functions, it may also be useful for `TransactionOutput` to have member functions,
or whether it should be explicitly passed to the entry point functions and to the non-entry-point functions called by them. presumably to create new instances and to set values of member variables.
The first approach is more concise, while the second approach is more explicit.
Note that, in the second approach, we may want to enforce certain restrictions on the use of `input`,
e.g. we may not want to allow a call `f(input, input)` even if the parameters of `f` both have the same circuit type as `input`.
There is nothing inherently wrong with `f(input, input)`, i.e. with handling `input` by value,
except that perhaps `input` is a relatively large structure,
and duplicating it generates a (relatively) large number of R1CS constraints.
Another idea is to pass `input` by (immutable) reference behind the scenes,
analogously to how we pass `self` by mutable reference to functions with `mut self`.
The treatment of output records is less clear at this point. A somewhat related consideration is whether it should be allowed
As mentioned above, experimentation suggests that currently the output values of `main` are serialized into new records. to make copies of the `TransactionInput` value passed to an entry point function.
This is not "symmetric" with the treatment of input records. There is no harm in doing that: the model still works, as explained above.
It may be preferable to require the Leo code to perform its own serialization of high-level data to output records, (Since a `TransactionInput` is a relatively large structure,
which would often be the inverse of the deserialization from input records. there may be harm consisting in creating a relatively large number of R1CS constraints,
We could consider, for symmetry, to add a special `output` variable, but that may happen with user-defined types too, and is a separable problem.)
also with a known circuit type, Nonetheless, we may want to enforce a discipline of single-threadedness,
which contains (at least some of) the data in the output records, most notably the two payloads. which could also allow us to treat transaction input as an immutable reference behind the scenes,
(It may not contain all the data of the record because some slots thus reducing the number of R1CS constraints.
have to be computed by the underlying zero-knowledge mechanisms,
outside of the Leo code.)
This `output` variable would have to be read/write, unlike `input`.
Similarly to `input`, it could be either a built-in global variable
or passed around functions by reference, in a single-threaded way.
The single-threadedness is a more important requirement here,
since the variable is read/write,
i.e. it needs to be treated like a global variable,
in the sense that there is a single instance of it.
If we go the `output` variable route, a question is what happens with the outputs of the entry point functions Analogous considerations apply to `TransactionOutput`,
(i.e. the values in `Output`, in the mathematical function described earlier). namely whether it should be treated in a single-threaded way,
If all the output data is explicitly written into the output record by the Leo code, i.e. effectively as a built-in global variable,
then perhaps the Leo entry point functions should always return `()`, i.e. "nothing", which could enable compiler optimizations.
or perhaps they should be predicates, i.e. return `bool`,
where `true` indicates a successful check (e.g. "yes, this private input yields this commitment when hashed")
and `false` indicates a failed check.
Another possibility is to require entry point functions to return records as outputs. ### Input and Output Files
More precisely, these may be smaller structures than records,
because some of the slots of the records may only be calculated outside of Leo,
but for the present discussion we will assume that Leo can calculate the whole records.
As mentioned earlier, a transaction may generate 0, 1, or 2 new records.
Correspondingly, we could require entry point functions to return results of one of the following types:
```
@entrypoint function ...(...) -> () // when no new records are created
@entrypoint function ...(...) -> Record // when one new record is created
@entrypoint function ...(...) -> (Record, Record) // when two new records are created
// using an annotation for concreteness, but the point stands for the other options discussed
```
In other words, an entry point function can be now seen as a mathematical function
```
entrypoint : Record x Record x Inputs -> Record x Record
```
where one or both output records are dummy if the function creates less than two new records.
The above constrains each entry point to always return the same number of records. According to the new model proposed above, we should have just two files involved in the Leo compilation process:
Different entry point functions may return different numbers of records. 1. A `.in` file, from which the `TransactionInput` value is produced.
If we want the same entry point function 2. A `.out` file, produced from the `TransactionOutput` returned by the program.
to return different numbers of records in different situations,
then it could make sense to have a more general circuit type for the output of a transaction,
which may contain 0, 1, or 2 records, and possibly other information as needed,
and require entry point functions to uniformly return values of that type:
```
@entrypoint function ...(...) -> TransactionOutput // contains 0, 1, or 2 records
```
Earlier we discussed having a known and accessible circuit type for the `input` special variable. There seems to be no longer a need for a `.state` file and for explicit registers.
This type could be called `TransactionInput`, which mirrors `TransactionOutput`.
In this case, it seems more natural to treat `input` not as a global variable,
but as a parameter of entry functions;
it could be the first parameter, required for every entry function that accesses the transaction input:
```
@entrypoint function ...(input: TransactionInput, ...) -> TransactionOutput
```
We could even drop `input` as a special keyword/expression altogether,
and allow any name (but suggest a convention) for the `TransactionInput` parameter of entry point functions.
## Alternatives ## Alternatives
The 'Design' section above already outlines several alternatives to consider. The 'Design' section above still outlines certain alternatives to consider.
Once we make some specific choices, we can move the other options to this section. Once we make some specific choices, we can move the other options to this section.
### Built-in Global Variable for Transaction Input
Instead of having explicit `TransactionInput` inputs in entry point functions,
we could maintain the current approach of viewing `input` as a built-in global variable, of type `TransactionInput`.
Everything else would be the same, except that `input` would be implicitly available.
An advantage is that single-threadedness would be immediately guaranteed,
if we wanted to enforced that as discussed above.
On the other hand, explicating transaction inputs as entry point inputs makes the code more functional
and simplifies certain aspects of the Leo compiler.
### Built-in Global Variable for Transaction Output
Instead of having explicit `TransactionOutput` outputs in entry point functions,
we could introduce a built-in `output` global variable, of type `TransactionOutput`.
This has similar advantages and disadvantages to the ones discussed above for `input` as a built-in global variable.
In any case, we may want this `output` global variable alternative
to go hand-in-hand with the `input` global variable alternative.
That is, we either adopt both or none.
The current treatment in Leo is asymmetric in this respect.
### Implicit Serialization of Output Values
Instead of having an explicit `TransactionOutput` type
whose values describe exactly the output data for a transaction,
we could keep something like the current model,
in which an entry point function may return values of arbitrary types,
which are implicitly serialized into output records.
This may be a bit simpler for the beginning developer,
but it also introduces less control on the output data.
Futhermore, given that records have payloads of limited size,
it is not difficult to write a program that attempts to produce too much data.
In any case, if we were to go this route, there would be an asymmetry with the treatment of transaction inputs,
unless we also allow `input` (by this we mean the value passed to an entry point function with the transaction inputs)
to consist of arbitrary Leo types (subject to serialization size limitations).
Note that this requires the Leo type of `input` to potentially vary across different programs
(which appears to be the case in current Leo),
which is more complicated than having some fixed types in the standard library.
All in all, it seems that having `TransactionInput` and `TransactionOutput` types provides more explicit control.
Furthermore, in the future the Leo standard library could provide serialization and deserialization tools
that will make it easy to map between record slots and higher-level Leo types.