Mirror of https://github.com/ProvableHQ/leo.git, synced 2024-12-23 18:21:38 +03:00
Merge pull request #1368 from AleoHQ/rfc-rec-trans
[RFC] Updates to record/transaction model
Commit a5de1d6ee3
@ -1,4 +1,4 @@
|
|||||||
# Leo RFC 012: Record and Transaction Model
|
# Leo RFC 012: Improved Record and Transaction Model
|
||||||
|
|
||||||
## Authors
|
## Authors
|
||||||
|
|
||||||
@ -10,7 +10,7 @@ DRAFT
|
|||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
This RFC describes how Leo programs interact with the Aleo blockchain.
|
This RFC describes an improved model for how Leo programs interact with the Aleo blockchain.
|
||||||
The description is oriented to the Leo developer:
|
The description is oriented to the Leo developer:
|
||||||
it does not describe the zero-knowledge details,
|
it does not describe the zero-knowledge details,
|
||||||
as the whole purpose of Leo is to enable developers
|
as the whole purpose of Leo is to enable developers
|
||||||
@ -23,7 +23,7 @@ While Leo can be described as a regular programming language
|
|||||||
its purpose is to build applications for the Aleo blockchain.
|
its purpose is to build applications for the Aleo blockchain.
|
||||||
It is thus important to describe precisely how Leo programs operate in the Aleo blockchain.
|
It is thus important to describe precisely how Leo programs operate in the Aleo blockchain.
|
||||||
|
|
||||||
## Design
|
## Background
|
||||||
|
|
||||||
### Zexe
|
### Zexe
|
||||||
|
|
||||||
@ -36,6 +36,7 @@ The computation of the new records from the old records
|
|||||||
is arbitrary and unknown to the blockchain;
|
is arbitrary and unknown to the blockchain;
|
||||||
the blockchain only enforces that the old records satisfy known _death predicates_
|
the blockchain only enforces that the old records satisfy known _death predicates_
|
||||||
and that the new records satisfy known _birth predicates_.
|
and that the new records satisfy known _birth predicates_.
|
||||||
|
See the [Zexe paper](https://eprint.iacr.org/2018/962.pdf) for details.
|
||||||
|
|
||||||
### Aleo Blockchain
|
### Aleo Blockchain
|
||||||
|
|
||||||
@ -49,7 +50,7 @@ which may involve records owned by different parties
|
|||||||
|
|
||||||
One or both of the old records may be dummy,
|
One or both of the old records may be dummy,
|
||||||
if only one old actual record is desired,
|
if only one old actual record is desired,
|
||||||
or if new records are to be created "from nothing".
|
or if new records are created "from nothing".
|
||||||
One or both of the new records may be dummy,
|
One or both of the new records may be dummy,
|
||||||
if only one new actual record is desired,
|
if only one new actual record is desired,
|
||||||
or if old records just have to be consumed.
|
or if old records just have to be consumed.
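The role of dummy records can be illustrated with a small model outside Leo. The following Python sketch is only an illustration under assumptions: the `Record` class, its `is_dummy` flag, and the `pad_to_two` helper are hypothetical names, not part of Leo or Aleo. It shows how a transaction that uses fewer than two actual records can still be presented with exactly two record slots.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a record; only two fields are modeled."""
    balance: int = 0
    is_dummy: bool = False  # assumption: dummy records are marked by a flag


DUMMY = Record(balance=0, is_dummy=True)


def pad_to_two(first: Optional[Record] = None,
               second: Optional[Record] = None) -> Tuple[Record, Record]:
    """Fill missing slots with dummy records so there are always two."""
    return (
        first if first is not None else DUMMY,
        second if second is not None else DUMMY,
    )
```

Calling `pad_to_two()` with no arguments models creating new records "from nothing" (both old records are dummy), while `pad_to_two(Record(balance=10))` models a transaction with a single actual old record.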
|
||||||
@ -90,10 +91,10 @@ This is what 'corresponds' means, in that sentence.
|
|||||||
However, for the high-level purpose of this RFC, these are zero-knowledge details.
|
However, for the high-level purpose of this RFC, these are zero-knowledge details.
|
||||||
|
|
||||||
In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters),
|
In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters),
|
||||||
and returns an output (declared as a return type), which may be a tuple to effectively represent multiple outputs.
|
and returns an output (declared as a return type), which may be a tuple to represent "multiple" outputs.
|
||||||
The `const` inputs are compiled into the zero-knowledge circuit,
|
The `const` inputs are compiled into the zero-knowledge circuit,
|
||||||
so they can be ignored for our purpose here,
|
so they can be ignored for our purpose here,
|
||||||
leaving the non-`const` inputs and the output for consideration.
|
leaving only the non-`const` inputs and the output for consideration.
|
||||||
|
|
||||||
The execution of `main` can be described as a mathematical function
|
The execution of `main` can be described as a mathematical function
|
||||||
```
|
```
|
||||||
@ -131,14 +132,23 @@ on those values and on the old records
|
|||||||
yields the new records, along with some value in `Output`;
|
yields the new records, along with some value in `Output`;
|
||||||
this is, roughly speaking, the assertion proved in zero-knowledge.
|
this is, roughly speaking, the assertion proved in zero-knowledge.
|
||||||
|
|
||||||
### Proposed Leo Program Execution Model
|
### Input and Output Files
|
||||||
|
|
||||||
The current model described above seems adequate overall, but we need to:
|
Currently the compilation of a Leo program involves:
|
||||||
1. Clarify how Leo code reads old records and writes new records.
|
1. A `.in` file, containing `const` and non-`const` inputs.
|
||||||
2. Generalize from one entry point (i.e. the `main` function) to multiple entry points, in line with the smart contract paradigm.
|
2. A `.state` file, containing transaction data.
|
||||||
|
3. A `.out` file, containing results.
|
||||||
|
|
||||||
Generalizing from one `main` entry point to multiple ones is conceptually easy.
|
The compilation takes the first two files as inputs and returns the third file as output.
|
||||||
It means that, instead of implicitly designating `main` as the only entry point,
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
### Multiple Entry Points
|
||||||
|
|
||||||
|
We propose to generalize from one entry point (i.e. the `main` function) to multiple entry points,
|
||||||
|
in line with the smart contract paradigm.
|
||||||
|
|
||||||
|
Instead of implicitly designating `main` as the only entry point,
|
||||||
we need a mechanism to explicitly designate one or more Leo functions as entry points.
|
we need a mechanism to explicitly designate one or more Leo functions as entry points.
|
||||||
|
|
||||||
A simple approach could be to use an annotation like `@entrypoint` to designate _entry point functions_:
|
A simple approach could be to use an annotation like `@entrypoint` to designate _entry point functions_:
|
||||||
@ -167,12 +177,18 @@ entrypoint {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
Now let us turn to the issue of clarifying how the Leo code reads old records and writes new records.
|
In the rest of this design section we assume the annotation approach (i.e. `@entrypoint`) for concreteness,
|
||||||
|
but that can be replaced as soon as we converge on a choice.
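As a rough analogy outside Leo, explicit designation of entry points can be modeled as a decorator that registers functions in a table. The Python sketch below is illustrative only: `entrypoint`, `ENTRY_POINTS`, `call_entry_point`, `mint`, `burn`, and `helper` are hypothetical names, and nothing here reflects how the Leo compiler implements annotations.

```python
# Hypothetical model of explicit entry point designation (not Leo code).
ENTRY_POINTS = {}


def entrypoint(fn):
    """Register fn as an entry point, mimicking an `@entrypoint` annotation."""
    ENTRY_POINTS[fn.__name__] = fn
    return fn


@entrypoint
def mint(amount):
    # Creates a new value without consuming anything.
    return {"balance": amount}


@entrypoint
def burn(record):
    # Consumes its argument and produces nothing.
    return None


def helper(x):
    # Not annotated, so it is not reachable as an entry point.
    return x + 1


def call_entry_point(name, *args):
    """Dispatch by name; only registered functions can be called this way."""
    if name not in ENTRY_POINTS:
        raise KeyError(f"{name} is not an entry point")
    return ENTRY_POINTS[name](*args)
```

In this model, `mint` and `burn` are both callable from outside, whereas `helper` can only be called by other functions of the program.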
|
||||||
|
|
||||||
|
### Types for Transaction Inputs and Outputs
|
||||||
|
|
||||||
|
We propose to add types for transaction inputs and outputs to the Leo standard library,
|
||||||
|
and possibly include them in the prelude that is implicitly imported by every Leo program.
|
||||||
|
|
||||||
Given that records have a fixed structure with typed slots,
|
Given that records have a fixed structure with typed slots,
|
||||||
their format could be described by a Leo circuit type,
|
their format could be described by a Leo circuit type, e.g. called `Record`,
|
||||||
whose member variables correspond to the slots.
|
whose member variables correspond to the slots.
|
||||||
The types of the slots would be fairly low-level,
|
The types of the slots are fairly low-level,
|
||||||
i.e. byte arrays (e.g. `u8[128]` for the payload)
|
i.e. byte arrays (e.g. `u8[128]` for the payload)
|
||||||
and unsigned integers (e.g. `u64` for the balance),
|
and unsigned integers (e.g. `u64` for the balance),
|
||||||
because they must have a clear correspondence with the serialized form of records.
|
because they must have a clear correspondence with the serialized form of records.
|
||||||
@ -181,104 +197,171 @@ its own deserialization of the payload bytes into higher-level Leo values;
|
|||||||
standard serialization/deserialization libraries for Leo types may be provided for this,
|
standard serialization/deserialization libraries for Leo types may be provided for this,
|
||||||
as an independent and more generally useful feature.
|
as an independent and more generally useful feature.
|
||||||
|
|
||||||
It may make sense to have a circuit type for the special `input` variable,
|
Given that a transaction input consists of two records and possibly additional information,
|
||||||
which includes two slots for the two old records.
|
it makes sense to also have a circuit type `TransactionInput`,
|
||||||
All these circuit types should be explicitly documented,
|
which includes two `Record` slots and possibly additional slots.
|
||||||
and available to the Leo program.
|
|
||||||
|
|
||||||
However, we probably want `input` to be read-only,
|
Additionally, it makes sense to have a circuit type `TransactionOutput`
|
||||||
i.e. disallow assigning to an old record slot.
|
that describes the output data of a transaction that is produced by the Leo program.
|
||||||
Designating `input` as `const` does not seem right,
|
This could also include two `Record` slots for the new records,
|
||||||
as that designation normally means that it is compiled into the circuit.
|
or possibly "subsets" of records if the values of some record slots are calculated
|
||||||
Instead, we could provide read-only access via member function (e.g. `payload()`, `balance()`),
|
not by the Leo program but instead by the Leo CLI (i.e. build process).
|
||||||
|
|
||||||
|
All these types should be documented, as part of the standard library.
|
||||||
|
We will need to flesh out their exact definition,
|
||||||
|
but we note that this is fairly easy to change when it is in the standard library.
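To make the shape of these types concrete, here is a minimal Python model, not Leo code. It assumes only the two slots named above (a 128-byte payload and a `u64` balance); real records have more slots, and the field names `old_record_0`, `old_record_1`, `new_record_0`, and `new_record_1` are hypothetical.

```python
from dataclasses import dataclass, field

PAYLOAD_SIZE = 128       # bytes, matching the `u8[128]` payload slot
U64_MAX = 2**64 - 1      # largest value of a `u64` balance


@dataclass
class Record:
    """Model of a record with only the payload and balance slots."""
    payload: bytes = bytes(PAYLOAD_SIZE)
    balance: int = 0

    def __post_init__(self):
        # Enforce the low-level slot types described in the text.
        if len(self.payload) != PAYLOAD_SIZE:
            raise ValueError("payload must be exactly 128 bytes")
        if not 0 <= self.balance <= U64_MAX:
            raise ValueError("balance must fit in a u64")


@dataclass
class TransactionInput:
    """Two old records; additional slots are omitted in this sketch."""
    old_record_0: Record = field(default_factory=Record)
    old_record_1: Record = field(default_factory=Record)


@dataclass
class TransactionOutput:
    """Two new records produced by the program."""
    new_record_0: Record = field(default_factory=Record)
    new_record_1: Record = field(default_factory=Record)
```

Keeping the slot types low-level in the model mirrors the requirement that they correspond directly to the serialized form of records.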
|
||||||
|
|
||||||
|
### Entry Point Input and Output Types
|
||||||
|
|
||||||
|
We propose that each entry point function of a Leo program
|
||||||
|
explicitly produce transaction outputs from transaction inputs
|
||||||
|
by taking a `TransactionInput` input and returning a `TransactionOutput` output:
|
||||||
|
```
|
||||||
|
@entrypoint
|
||||||
|
function ...(input: TransactionInput, ...) -> TransactionOutput { ... }
|
||||||
|
```
|
||||||
|
This way, the calculation of transaction outputs from transaction inputs is made functional and explicit.
|
||||||
|
|
||||||
|
As special cases (both of which may apply to the same entry point):
|
||||||
|
1. We could allow the `TransactionInput` input to be absent,
|
||||||
|
when an entry point does not need to access the transaction input data,
|
||||||
|
e.g. when producing new records without consuming old records.
|
||||||
|
2. We could allow the function output to be `()` instead of `TransactionOutput`,
|
||||||
|
when an entry point does not need to produce transaction outputs,
|
||||||
|
e.g. when consuming old records without producing new records.
|
||||||
|
|
||||||
|
Compared to the current Leo program execution model (described earlier in the background section),
|
||||||
|
`input` is made an explicit input here, instead of being like a built-in global variable.
|
||||||
|
Furthermore, the output type is restricted to be `TransactionOutput` (or `()`),
|
||||||
|
thus eliminating the implicit serialization and the asymmetry with the treatment of transaction inputs.
|
||||||
|
There is still no restriction on the non-`TransactionInput` inputs of an entry point function;
|
||||||
|
as noted earlier, they are existentially quantified in the zero-knowledge assertion.
|
||||||
|
|
||||||
|
Thus, a Leo program entry point can now be described as a mathematical function
|
||||||
|
```
|
||||||
|
entrypoint : Record x Record x Inputs -> Record x Record
|
||||||
|
```
|
||||||
|
where `Output` is no longer present.
|
||||||
|
(If `TransactionInput` includes additional data, besides the two old records, that may affect the transaction output,
|
||||||
|
then we would need to add that to this mathematical model;
|
||||||
|
however, the model above is sufficiently accurate for the current discussion.)
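The mathematical view can be made concrete with a toy model in Python (not Leo). Everything in it is an assumption made for illustration: records are reduced to a balance, `transfer` is a hypothetical entry point, and `assertion_holds` checks by brute force whether some input exists that maps the old records to the new ones, which loosely mirrors the existential quantification of the non-`TransactionInput` inputs.

```python
from typing import Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    balance: int  # stand-in for a full record; only the balance is modeled


def transfer(old0: Record, old1: Record, amount: int) -> Tuple[Record, Record]:
    """Hypothetical entry point: Record x Record x Inputs -> Record x Record."""
    if not 0 <= amount <= old0.balance:
        raise ValueError("amount exceeds available balance")
    return Record(old0.balance - amount), Record(old1.balance + amount)


def assertion_holds(old: Tuple[Record, Record],
                    new: Tuple[Record, Record],
                    candidate_inputs: Iterable[int]) -> bool:
    """Return True if some candidate input maps the old records to the new ones."""
    for amount in candidate_inputs:
        try:
            if transfer(old[0], old[1], amount) == new:
                return True
        except ValueError:
            continue  # this candidate is not a valid input
    return False
```

A real proof system does not search for the inputs; the brute-force loop only illustrates what the statement "there exist inputs such that the entry point yields these new records" means.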
|
||||||
|
|
||||||
|
We may require the `TransactionInput` input of an entry point function, if present,
|
||||||
|
to be the first input of the function, for clarity and readability.
|
||||||
|
A question is whether we should extend that requirement to non-entry-point functions
|
||||||
|
that may be passed `TransactionInput` values.
|
||||||
|
We note that none of these restrictions are necessary, though.
|
||||||
|
A necessary restriction is that each entry point function takes at most one `TransactionInput` input.
|
||||||
|
|
||||||
|
We may require the `TransactionInput` input of an entry point function, if present,
|
||||||
|
to be called `input`, or some other predefined name.
|
||||||
|
However, this is not a necessary restriction, and we may decide to demote that to a convention rather than a requirement.
|
||||||
|
(Currently `input` is a keyword and its own kind of Leo expression, which slightly complicates the language.)
|
||||||
|
|
||||||
|
### Access to Transaction Input and Output Types
|
||||||
|
|
||||||
|
Currently the member variables of Leo circuit types are always accessible for both reading and writing.
|
||||||
|
It is thus possible for a Leo program
|
||||||
|
to read from the member variables of `TransactionInput`
|
||||||
|
and to write to the member variables of `TransactionOutput`.
|
||||||
|
Therefore, for an initial implementation,
|
||||||
|
it suffices for these two circuit types to provide member variables for the needed slots.
|
||||||
|
|
||||||
|
We might want the member variables of `TransactionInput` to be read-only.
|
||||||
|
This is not necessary for the transaction model to work:
|
||||||
|
so long as `TransactionInput` is properly initialized before calling the entry point,
|
||||||
|
and the resulting `TransactionOutput` is used to create the transaction after the call,
|
||||||
|
there is no harm in the Leo program modifying the copy of `TransactionInput` passed to the program.
|
||||||
|
Nonetheless, we may want to enforce this restriction to encourage good coding practices
|
||||||
|
(unless we find a use case to the contrary).
|
||||||
|
|
||||||
|
There is currently no mechanism in Leo to enforce that.
|
||||||
|
Designating the transaction input as `const` is not right,
|
||||||
|
as that designation normally means that the value is compiled into the circuit.
|
||||||
|
|
||||||
|
We could provide read-only access via member functions (e.g. `payload()`, `balance()`),
|
||||||
but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type).
|
but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type).
|
||||||
As an orthogonal and more generally useful feature,
|
As an orthogonal and more generally useful feature,
|
||||||
we could consider adding public/private access designations to Leo circuit members.
|
we could consider adding public/private access designations to Leo circuit members.
|
||||||
Another approach is to avoid exposing the member variables,
|
Another approach is to avoid exposing the member variables,
|
||||||
and just make the member functions available via an implicit import declaration.
|
and just make the member functions available via an implicit import declaration.
|
||||||
All of this needs to be thought through more carefully, in the broader context of the Leo language design;
|
All of this needs to be thought through more carefully, in the broader context of the Leo language design.
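The combination of accessor functions and prohibited assignment can be illustrated in Python with a frozen dataclass. This is an analogy only, not a proposal for Leo syntax: the class layout and the underscore-prefixed field names are assumptions, while `payload()` and `balance()` follow the accessor names suggested above.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class TransactionInput:
    """Model of a read-only input: fields cannot be reassigned after creation."""
    _payload: bytes
    _balance: int

    def payload(self) -> bytes:
        # Read-only accessor for the payload slot.
        return self._payload

    def balance(self) -> int:
        # Read-only accessor for the balance slot.
        return self._balance


def try_to_overwrite(tx_input: TransactionInput) -> bool:
    """Return True if assignment to a member variable is rejected."""
    try:
        tx_input._balance = 0
    except FrozenInstanceError:
        return True
    return False
```

Here the accessors alone would not be enough: it is the `frozen=True` setting that rejects assignment, which corresponds to the observation that assignments to member variables must be prohibited separately.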
|
||||||
in any case, it should be clear that this can be made to work in some way,
|
|
||||||
and that Leo programs can access the old records through the special `input` variables.
|
|
||||||
|
|
||||||
One issue with the special `input` variable is whether it should be treated as a built-in global variable,
|
If `TransactionInput` has member functions, it may also be useful for `TransactionOutput` to have member functions,
|
||||||
or whether it should be explicitly passed to the entry point functions and to the non-entry-point functions called by them.
|
presumably to create new instances and to set values of member variables.
|
||||||
The first approach is more concise, while the second approach is more explicit.
|
|
||||||
Note that, in the second approach, we may want to enforce certain restrictions on the use of `input`,
|
|
||||||
e.g. we may not want to allow a call `f(input, input)` even if the parameters of `f` both have the same circuit type as `input`.
|
|
||||||
There is nothing inherently wrong with `f(input, input)`, i.e. with handling `input` by value,
|
|
||||||
except that perhaps `input` is a relatively large structure,
|
|
||||||
and duplicating it generates a (relatively) large number of R1CS constraints.
|
|
||||||
Another idea is to pass `input` by (immutable) reference behind the scenes,
|
|
||||||
analogously to how we pass `self` by mutable reference to functions with `mut self`.
|
|
||||||
|
|
||||||
The treatment of output records is less clear at this point.
|
A somewhat related consideration is whether it should be allowed
|
||||||
As mentioned above, experimentation suggests that currently the output values of `main` are serialized into new records.
|
to make copies of the `TransactionInput` value passed to an entry point function.
|
||||||
This is not "symmetric" with the treatment of input records.
|
There is no harm in doing that: the model still works, as explained above.
|
||||||
It may be preferable to require the Leo code to perform its own serialization of high-level data to output records,
|
(Since a `TransactionInput` is a relatively large structure,
|
||||||
which would often be the inverse of the deserialization from input records.
|
copying it may be costly, in that it creates a relatively large number of R1CS constraints,
|
||||||
We could consider, for symmetry, to add a special `output` variable,
|
but that may happen with user-defined types too, and is a separable problem.)
|
||||||
also with a known circuit type,
|
Nonetheless, we may want to enforce a discipline of single-threadedness,
|
||||||
which contains (at least some of) the data in the output records, most notably the two payloads.
|
which could also allow us to treat transaction input as an immutable reference behind the scenes,
|
||||||
(It may not contain all the data of the record because some slots
|
thus reducing the number of R1CS constraints.
|
||||||
have to be computed by the underlying zero-knowledge mechanisms,
|
|
||||||
outside of the Leo code.)
|
|
||||||
This `output` variable would have to be read/write, unlike `input`.
|
|
||||||
Similarly to `input`, it could be either a built-in global variable
|
|
||||||
or passed around functions by reference, in a single-threaded way.
|
|
||||||
The single-threadedness is a more important requirement here,
|
|
||||||
since the variable is read/write,
|
|
||||||
i.e. it needs to be treated like a global variable,
|
|
||||||
in the sense that there is a single instance of it.
|
|
||||||
|
|
||||||
If we go the `output` variable route, a question is what happens with the outputs of the entry point functions
|
Analogous considerations apply to `TransactionOutput`,
|
||||||
(i.e. the values in `Output`, in the mathematical function described earlier).
|
namely whether it should be treated in a single-threaded way,
|
||||||
If all the output data is explicitly written into the output record by the Leo code,
|
i.e. effectively as a built-in global variable,
|
||||||
then perhaps the Leo entry point functions should always return `()`, i.e. "nothing",
|
which could enable compiler optimizations.
|
||||||
or perhaps they should be predicates, i.e. return `bool`,
|
|
||||||
where `true` indicates a successful check (e.g. "yes, this private input yields this commitment when hashed")
|
|
||||||
and `false` indicates a failed check.
|
|
||||||
|
|
||||||
Another possibility is to require entry point functions to return records as outputs.
|
### Input and Output Files
|
||||||
More precisely, these may be smaller structures than records,
|
|
||||||
because some of the slots of the records may only be calculated outside of Leo,
|
|
||||||
but for the present discussion we will assume that Leo can calculate the whole records.
|
|
||||||
As mentioned earlier, a transaction may generate 0, 1, or 2 new records.
|
|
||||||
Correspondingly, we could require entry point functions to return results of one of the following types:
|
|
||||||
```
|
|
||||||
@entrypoint function ...(...) -> () // when no new records are created
|
|
||||||
@entrypoint function ...(...) -> Record // when one new record is created
|
|
||||||
@entrypoint function ...(...) -> (Record, Record) // when two new records are created
|
|
||||||
// using an annotation for concreteness, but the point stands for the other options discussed
|
|
||||||
```
|
|
||||||
In other words, an entry point function can be now seen as a mathematical function
|
|
||||||
```
|
|
||||||
entrypoint : Record x Record x Inputs -> Record x Record
|
|
||||||
```
|
|
||||||
where one or both output records are dummy if the function creates less than two new records.
|
|
||||||
|
|
||||||
The above constrains each entry point to always return the same number of records.
|
According to the new model proposed above, we should have just two files involved in the Leo compilation process:
|
||||||
Different entry point functions may return different numbers of records.
|
1. A `.in` file, from which the `TransactionInput` value is produced.
|
||||||
If we want the same entry point function
|
2. A `.out` file, produced from the `TransactionOutput` returned by the program.
|
||||||
to return different numbers of records in different situations,
|
|
||||||
then it could make sense to have a more general circuit type for the output of a transaction,
|
|
||||||
which may contain 0, 1, or 2 records, and possibly other information as needed,
|
|
||||||
and require entry point functions to uniformly return values of that type:
|
|
||||||
```
|
|
||||||
@entrypoint function ...(...) -> TransactionOutput // contains 0, 1, or 2 records
|
|
||||||
```
|
|
||||||
|
|
||||||
Earlier we discussed having a known and accessible circuit type for the `input` special variable.
|
There no longer seems to be a need for a `.state` file or for explicit registers.
|
||||||
This type could be called `TransactionInput`, which mirrors `TransactionOutput`.
|
|
||||||
In this case, it seems more natural to treat `input` not as a global variable,
|
|
||||||
but as a parameter of entry functions;
|
|
||||||
it could be the first parameter, required for every entry function that accesses the transaction input:
|
|
||||||
```
|
|
||||||
@entrypoint function ...(input: TransactionInput, ...) -> TransactionOutput
|
|
||||||
```
|
|
||||||
We could even drop `input` as a special keyword/expression altogether,
|
|
||||||
and allow any name (but suggest a convention) for the `TransactionInput` parameter of entry point functions.
|
|
||||||
|
|
||||||
## Alternatives
|
## Alternatives
|
||||||
|
|
||||||
The 'Design' section above already outlines several alternatives to consider.
|
The 'Design' section above still outlines certain alternatives to consider.
|
||||||
Once we make some specific choices, we can move the other options to this section.
|
Once we make some specific choices, we can move the other options to this section.
|
||||||
|
|
||||||
|
### Built-in Global Variable for Transaction Input
|
||||||
|
|
||||||
|
Instead of having explicit `TransactionInput` inputs in entry point functions,
|
||||||
|
we could maintain the current approach of viewing `input` as a built-in global variable, of type `TransactionInput`.
|
||||||
|
Everything else would be the same, except that `input` would be implicitly available.
|
||||||
|
|
||||||
|
An advantage is that single-threadedness would be immediately guaranteed,
|
||||||
|
if we wanted to enforce that as discussed above.
|
||||||
|
|
||||||
|
On the other hand, making transaction inputs explicit as entry point inputs makes the code more functional
|
||||||
|
and simplifies certain aspects of the Leo compiler.
|
||||||
|
|
||||||
|
### Built-in Global Variable for Transaction Output
|
||||||
|
|
||||||
|
Instead of having explicit `TransactionOutput` outputs in entry point functions,
|
||||||
|
we could introduce a built-in `output` global variable, of type `TransactionOutput`.
|
||||||
|
|
||||||
|
This has similar advantages and disadvantages to the ones discussed above for `input` as a built-in global variable.
|
||||||
|
|
||||||
|
In any case, we may want this `output` global variable alternative
|
||||||
|
to go hand-in-hand with the `input` global variable alternative.
|
||||||
|
That is, we either adopt both or neither.
|
||||||
|
The current treatment in Leo is asymmetric in this respect.
|
||||||
|
|
||||||
|
### Implicit Serialization of Output Values
|
||||||
|
|
||||||
|
Instead of having an explicit `TransactionOutput` type
|
||||||
|
whose values describe exactly the output data for a transaction,
|
||||||
|
we could keep something like the current model,
|
||||||
|
in which an entry point function may return values of arbitrary types,
|
||||||
|
which are implicitly serialized into output records.
|
||||||
|
|
||||||
|
This may be a bit simpler for the beginning developer,
|
||||||
|
but it also gives less control over the output data.
|
||||||
|
Furthermore, given that records have payloads of limited size,
|
||||||
|
it is not difficult to write a program that attempts to produce too much data.
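The size limitation can be illustrated with a short Python sketch of a serializer, assuming a 128-byte payload as in the `u8[128]` example earlier. The function name `serialize_u64s` and the little-endian, zero-padded encoding are assumptions made for this example, not the encoding used by Leo.

```python
PAYLOAD_SIZE = 128  # bytes, matching the `u8[128]` payload slot


def serialize_u64s(values):
    """Pack unsigned 64-bit integers into a fixed-size, zero-padded payload."""
    # Assumption: each value is encoded as 8 little-endian bytes.
    data = b"".join(v.to_bytes(8, "little") for v in values)
    if len(data) > PAYLOAD_SIZE:
        raise ValueError(
            f"{len(data)} bytes do not fit in a {PAYLOAD_SIZE}-byte payload"
        )
    return data.ljust(PAYLOAD_SIZE, b"\x00")
```

With this encoding, sixteen values fill the payload exactly and a seventeenth is rejected. Under implicit serialization such a failure would surface in the build process rather than in code the developer wrote.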
|
||||||
|
|
||||||
|
In any case, if we were to go this route, there would be an asymmetry with the treatment of transaction inputs,
|
||||||
|
unless we also allow `input` (by this we mean the value passed to an entry point function with the transaction inputs)
|
||||||
|
to consist of arbitrary Leo types (subject to serialization size limitations).
|
||||||
|
Note that this requires the Leo type of `input` to potentially vary across different programs
|
||||||
|
(which appears to be the case in current Leo),
|
||||||
|
which is more complicated than having some fixed types in the standard library.
|
||||||
|
|
||||||
|
All in all, it seems that having `TransactionInput` and `TransactionOutput` types provides more explicit control.
|
||||||
|
Furthermore, in the future the Leo standard library could provide serialization and deserialization tools
|
||||||
|
that will make it easy to map between record slots and higher-level Leo types.
|
||||||
|