Merge pull request #1368 from AleoHQ/rfc-rec-trans

[RFC] Updates to record/transaction model
2024-12-19 07:32:26 +03:00 · 2021-09-30 22:12:53 -07:00 · 2021-09-30 22:12:53 -07:00 · a5de1d6ee3
commit a5de1d6ee3
parent e2ba64b79d 9daeaf41e0
1 changed files with 183 additions and 100 deletions
--- a/docs/rfc/012-record-transaction-model.md
+++ b/docs/rfc/012-record-transaction-model.md
@ -1,4 +1,4 @@
-# Leo RFC 012: Record and Transaction Model
+# Leo RFC 012: Improved Record and Transaction Model

 ## Authors

@ -10,7 +10,7 @@ DRAFT

 ## Summary

-This RFC describes how Leo programs interact with the Aleo blockchain.
+This RFC describes an improved model for how Leo programs interact with the Aleo blockchain.
 The description is oriented to the Leo developer:
 it does not describe the zero-knowledge details,
 as the whole purpose of Leo is to enable developers
@ -23,7 +23,7 @@ While Leo can be described as a regular programming language
 its purpose is to build applications for the Aleo blockchain.
 It is thus important to describe precisely how Leo programs operate in the Aleo blockchain.

-## Design
+## Background

 ### Zexe

@ -36,6 +36,7 @@ The computation of the new records from the old records
 is arbitrary and unknown to the blockchain;
 the blockchain only enforces that the old records satisfy known _death predicates_
 and that the new records satisfy known _birth predicates_.
+See the [Zexe paper](https://eprint.iacr.org/2018/962.pdf) for details.

 ### Aleo Blockchain

@ -49,7 +50,7 @@ which may involve records owned by different parties

 One or both of the old records may be dummy,
 if only one old actual record is desired,
-or if new records are to be created "from nothing".
+or if new records are created "from nothing".
 One or both of the new records may be dummy,
 if only one new actual record is desired,
 or if old records just have to be consumed.
@ -90,10 +91,10 @@ This is what 'corresponds' means, in that sentence.
 However, for the high-level purpose of this RFC, these are zero-knowledge details.

 In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters),
-and returns an output (declared as a return type), which may be a tuple to effectively represent multiple outputs.
+and returns an output (declared as a return type), which may be a tuple to represent "multiple" outputs.
 The `const` inputs are compiled into the zero-knowledge circuit,
 so they can be ignored for our purpose here,
-leaving the non-`const` inputs and the output for consideration.
+leaving only the non-`const` inputs and the output for consideration.

 The execution of `main` can be described as a mathematical function
 ```
@ -131,14 +132,23 @@ on those values and on the old records
 yields the new records, along with some value in `Output`;
 this is, roughly speaking, the assertion proved in zero-knowledge.

-### Proposed Leo Program Execution Model
+### Input and Output Files

-The current model described above seems adequate overall, but we need to:
-1. Clarify how Leo code reads old records and writes new records.
-2. Generalize from one entry point (i.e. the `main` function) to multiple entry points, in line with the smart contract paradigm.
+Currently the compilation of a Leo program involves:
+1. A `.in` file, containing `const` and non-`const` inputs.
+2. A `.state` file, containing transaction data.
+3. A `.out` file, containing results.

-Generalizing from one `main` entry point to multiple ones is conceptually easy.
-It means that, instead of implicitly designating `main` as the only entry point,
+The compilation takes the first two files as inputs and returns the third file as output.
+
+## Design
+
+### Multiple Entry Points
+
+We propose to generalize from one entry point (i.e. the `main` function) to multiple entry points,
+in line with the smart contract paradigm.
+
+Instead of implicitly designating `main` as the only entry point,
 we need a mechanism to explicitly designate one or more Leo functions as entry points.

 A simple approach could be to use an annotation like `@entrypoint` to designate _entry point functions_:
@ -167,12 +177,18 @@ entrypoint {
 }
 ```

-Now let us turn to the issue of clarifying how the Leo code reads old records and writes new records.
+In the rest of this design section we assume the annotation approach (i.e. `@entrypoint`) for concreteness,
+but that can be replaced as soon as we converge on a choice.
+
+### Types for Transaction Inputs and Outputs
+
+We propose to add types for transaction inputs and outputs to the Leo standard library,
+and possibly include them in the prelude that is implicitly imported by every Leo program.

 Given that records have a fixed structure with typed slots,
-their format could be described by a Leo circuit type,
+their format could be described by a Leo circuit type, e.g. called `Record`,
 whose member variables correspond to the slots.
-The types of the slots would be fairly low-level,
+The types of the slots are fairly low-level,
 i.e. byte arrays (e.g. `u8[128]` for the payload)
 and unsigned integers (e.g. `u64` for the balance),
 because they must have a clear correspondence with the serialized form of records.
@ -181,104 +197,171 @@ its own deserialization of the payload bytes into higher-level Leo values;
 standard serialization/deserialization libraries for Leo types may be provided for this,
 as an independent and more generally useful feature.

-It may make sense to have a circuit type for the special `input` variable,
-which includes two slots for the two old records.
-All these circuit types should be explicitly documented,
-and available to the Leo program.
+Given that a transaction input consists of two records and possibly additional information,
+it makes sense to also have a circuit type `TransactionInput`,
+which includes two `Record` slots and possibly additional slots.

-However, we probably want `input` to be read-only,
-i.e. disallow assigning to an old record slot.
-Designating `input` as `const` does not seem right,
-as that designation normally means that it is compiled into the circuit.
-Instead, we could provide read-only access via member function (e.g. `payload()`, `balance()`),
+Additionally, it makes sense to have a circuit type `TransactionOutput`
+that describes the output data of a transaction that is produced by the Leo program.
+This could also include two `Record` slots for the new records,
+or possibly "subsets" of records if the values of some record slots are calculated
+not by the Leo program but instead by the Leo CLI (i.e. build process).
+
+All these types should be documented, as part of the standard library.
+We will need to flesh out their exact definition,
+but we note that this is fairly easy to change when it is in the standard library.
+
+### Entry Point Input and Output Types
+
+We propose that each entry point function of a Leo program
+explicitly produce transaction outputs from transaction inputs
+by taking a `TransactionInput` input and returning a `TransactionOutput` output:
+```
+@entrypoint
+function ...(input: TransactionInput, ...) -> TransactionOutput { ... }
+```
+This way, the calculation of transaction outputs from transaction inputs is made functional and explicit.
+
+As special cases (both of which may apply to the same entry point):
+1. We could allow the `TransactionInput` input to be absent,
+   when an entry point does not need access the transaction input data,
+   e.g. when producing new records without consuming old records.
+2. We could allow the function output to be `()` instead of `TransactionOutput`,
+   when an entry point does not need to produce transaction outputs,
+   e.g. when consuming old records without producing new records.
+
+Compared to the current Leo program execution model (described earlier in the background section),
+`input` is made an explicit input here, instead of being like a built-in global variable.
+Furthermore, the output type is restricted to be `TransactionOutput` (or `()`),
+thus eliminating the implicit serialization and the asymmetry with the treatment of transaction inputs.
+There is still no restriction on the non-`TransactionInput` inputs of an entry point function;
+as noted earlier, they are existentially quantified in the zero-knowledge assertion.
+
+Thus, a Leo program entry point can be now described as a mathematical function
+```
+entrypoint : Record x Record x Inputs -> Record x Record
+```
+where `Output` is no longer present.
+(If `TransactionInput` includes additional data, besides the two old records, that may affect the transaction output,
+then we would need to add that to this mathematical model;
+however, the model above is sufficiently accurate for the current discussion.)
+
+We may require the `TransactionInput` input of an entry point function, if present,
+to be the first input of the function, for clarity and readability.
+A question is whether we should extend that requirement to non-entry-point functions
+that may be passed `TransactionInput` values.
+We note that none of these restrictions are necessary, though.
+A necessary restriction is that each entry point function takes at most one `TransactionInput` input.
+
+We may require the `TransactionInput` input of an entry point function, if present,
+to be called `input`, or some other predefined name.
+However, this is not a necessary restriction, and we may decide to demote that to a convention rather than a requirement.
+(Currently `input` is a keyword and its own kind of Leo expression, which slightly complicates the language.)
+
+### Access to Transaction Input and Output Types
+
+Currently the member variables of Leo circuit types are always accessible for both reading and writing.
+It is thus possible for a Leo program
+to read from the member variables of `TransactionInput`
+and to write to the member variables of `TransactionOutput`.
+Therefore, for an initial implementation,
+it suffices for these two circuit types to provide member variables for the needed slots.
+
+We might want the member variables of `TransactionInput` to be read-only.
+This is not necessary for the transaction model to work:
+so long as `TransactionInput` is properly initialized before calling the entry point,
+and that after the call the resulting `TransactionOutput` is used to create the transaction,
+there is no harm in the Leo program to modify the copy of `TransactionInput` passed to the program.
+Nonetheless, we may want to enforce this restriction to encourage good coding practices
+(unless we find a use case to the contrary).
+
+There is currently no mechanism in Leo to enforce that.
+Designating the transaction input as `const` is not right,
+as that designation normally means that the value is compiled into the circuit.
+
+We could provide read-only access via member function (e.g. `payload()`, `balance()`),
 but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type).
 As an orthogonal and more generally useful feature,
 we could consider adding public/private access designations to Leo circuit members.
 Another approach is to avoid exposing the member variables,
 and just make the member functions available via an implicit import declaration.
-All of this needs to be thought through more carefully, in the broader context of the Leo language design;
-in any case, it should be clear that this can be made to work in some way,
-and that Leo programs can access the old records through the special `input` variables.
+All of this needs to be thought through more carefully, in the broader context of the Leo language design.

-One issue with the special `input` variable is whether it should be treated as a built-in global variable,
-or whether it should be explicitly passed to the entry point functions and to the non-entry-point functions called by them.
-The first approach is more concise, while the second approach is more explicit.
-Note that, in the second approach, we may want to enforce certain restrictions on the use of `input`,
-e.g. we may not want to allow a call `f(input, input)` even if the parameters of `f` both have the same circuit type as `input`.
-There is nothing inherently wrong with `f(input, input)`, i.e. with handling `input` by value,
-except that perhaps `input` is a relatively large structure,
-and duplicating it generates a (relatively) large number of R1CS constraints.
-Another idea is to pass `input` by (immutable) reference behind the scenes,
-analogously to how we pass `self` by mutable reference to functions with `mut self`.
+If `TransactionInput` has member functions, it may also be useful for `TransactionOutput` to have member functions,
+presumably to create new instances and to set values of member variables.

-The treatment of output records is less clear at this point.
-As mentioned above, experimentation suggests that currently the output values of `main` are serialized into new records.
-This is not "symmetric" with the treatment of input records.
-It may be preferable to require the Leo code to perform its own serialization of high-level data to output records,
-which would often be the inverse of the deserialization from input records.
-We could consider, for symmetry, to add a special `output` variable,
-also with a known circuit type,
-which contains (at least some of) the data in the output records, most notably the two payloads.
-(It may not contain all the data of the record because some slots
-have to be computed by the underlying zero-knowledge mechanisms,
-outside of the Leo code.)
-This `output` variable would have to be read/write, unlike `input`.
-Similarly to `input`, it could be either a built-in global variable
-or passed around functions by reference, in a single-threaded way.
-The single-threadedness is a more important requirement here,
-since the variable is read/write,
-i.e. it needs to be treated like a global variable,
-in the sense that there is a single instance of it.
+A somewhat related consideration is whether it should be allowed
+to make copies of the `TransactionInput` value passed to an entry point function.
+There is no harm in doing that: the model still works, as explained above.
+(Since a `TransactionInput` is a relatively large structure,
+there may be harm consisting in creating a relatively large number of R1CS constraints,
+but that may happen with user-defined types too, and is a separable problem.)
+Nonetheless, we may want to enforce a discipline of single-threadedness,
+which could also allow us to treat transaction input as an immutable reference behind the scenes,
+thus reducing the number of R1CS constraints.

-If we go the `output` variable route, a question is what happens with the outputs of the entry point functions
-(i.e. the values in `Output`, in the mathematical function described earlier).
-If all the output data is explicitly written into the output record by the Leo code,
-then perhaps the Leo entry point functions should always return `()`, i.e. "nothing",
-or perhaps they should be predicates, i.e. return `bool`,
-where `true` indicates a successful check (e.g. "yes, this private input yields this commitment when hashed")
-and `false` indicates a failed check.
+Analogous considerations apply to `TransactionOutput`,
+namely whether it should be treated in a single-threaded way,
+i.e. effectively as a built-in global variable,
+which could enable compiler optimizations.

-Another possibility is to require entry point functions to return records as outputs.
-More precisely, these may be smaller structures than records,
-because some of the slots of the records may only be calculated outside of Leo,
-but for the present discussion we will assume that Leo can calculate the whole records.
-As mentioned earlier, a transaction may generate 0, 1, or 2 new records.
-Correspondingly, we could require entry point functions to return results of one of the following types:
-```
-@entrypoint function ...(...) -> () // when no new records are created
-@entrypoint function ...(...) -> Record // when one new record is created
-@entrypoint function ...(...) -> (Record, Record) // when two new records are created
-// using an annotation for concreteness, but the point stands for the other options discussed
-```
-In other words, an entry point function can be now seen as a mathematical function
-```
-entrypoint : Record x Record x Inputs -> Record x Record
-```
-where one or both output records are dummy if the function creates less than two new records.
+### Input and Output Files

-The above constrains each entry point to always return the same number of records.
-Different entry point functions may return different numbers of records.
-If we want the same entry point function
-to return different numbers of records in different situations,
-then it could make sense to have a more general circuit type for the output of a transaction,
-which may contain 0, 1, or 2 records, and possibly other information as needed,
-and require entry point functions to uniformly return values of that type:
-```
-@entrypoint function ...(...) -> TransactionOutput // contains 0, 1, or 2 records
-```
+According to the new model proposed above, we should have just two files involved in the Leo compilation process:
+1. A `.in` file, from which the `TransactionInput` value is produced.
+2. A `.out` file, produced from the `TransactionOutput` returned by the program.

-Earlier we discussed having a known and accessible circuit type for the `input` special variable.
-This type could be called `TransactionInput`, which mirrors `TransactionOutput`.
-In this case, it seems more natural to treat `input` not as a global variable,
-but as a parameter of entry functions;
-it could be the first parameter, required for every entry function that accesses the transaction input:
-```
-@entrypoint function ...(input: TransactionInput, ...) -> TransactionOutput
-```
-We could even drop `input` as a special keyword/expression altogether,
-and allow any name (but suggest a convention) for the `TransactionInput` parameter of entry point functions.
+There seems to be no longer a need for a `.state` file and for explicit registers.

 ## Alternatives

-The 'Design' section above already outlines several alternatives to consider.
+The 'Design' section above still outlines certain alternatives to consider.
 Once we make some specific choices, we can move the other options to this section.
+
+### Built-in Global Variable for Transaction Input
+
+Instead of having explicit `TransactionInput` inputs in entry point functions,
+we could maintain the current approach of viewing `input` as a built-in global variable, of type `TransactionInput`.
+Everything else would be the same, except that `input` would be implicitly available.
+
+An advantage is that single-threadedness would be immediately guaranteed,
+if we wanted to enforced that as discussed above.
+
+On the other hand, explicating transaction inputs as entry point inputs makes the code more functional
+and simplifies certain aspects of the Leo compiler.
+
+### Built-in Global Variable for Transaction Output
+
+Instead of having explicit `TransactionOutput` outputs in entry point functions,
+we could introduce a built-in `output` global variable, of type `TransactionOutput`.
+
+This has similar advantages and disadvantages to the ones discussed above for `input` as a built-in global variable.
+
+In any case, we may want this `output` global variable alternative
+to go hand-in-hand with the `input` global variable alternative.
+That is, we either adopt both or none.
+The current treatment in Leo is asymmetric in this respect.
+
+### Implicit Serialization of Output Values
+
+Instead of having an explicit `TransactionOutput` type
+whose values describe exactly the output data for a transaction,
+we could keep something like the current model,
+in which an entry point function may return values of arbitrary types,
+which are implicitly serialized into output records.
+
+This may be a bit simpler for the beginning developer,
+but it also introduces less control on the output data.
+Futhermore, given that records have payloads of limited size,
+it is not difficult to write a program that attempts to produce too much data.
+
+In any case, if we were to go this route, there would be an asymmetry with the treatment of transaction inputs,
+unless we also allow `input` (by this we mean the value passed to an entry point function with the transaction inputs)
+to consist of arbitrary Leo types (subject to serialization size limitations).
+Note that this requires the Leo type of `input` to potentially vary across different programs
+(which appears to be the case in current Leo),
+which is more complicated than having some fixed types in the standard library.
+
+All in all, it seems that having `TransactionInput` and `TransactionOutput` types provides more explicit control.
+Furthermore, in the future the Leo standard library could provide serialization and deserialization tools
+that will make it easy to map between record slots and higher-level Leo types.