From 65cfffac9952d376dbb0a5a5f7be6f3ff193fbdf Mon Sep 17 00:00:00 2001 From: Alessandro Coglio Date: Mon, 13 Sep 2021 10:13:47 -0700 Subject: [PATCH] [RFC] Initial draft of record/transaction RFC. --- docs/rfc/012-record-transaction-model.md | 243 +++++++++++++++++++++++ 1 file changed, 243 insertions(+) create mode 100644 docs/rfc/012-record-transaction-model.md diff --git a/docs/rfc/012-record-transaction-model.md b/docs/rfc/012-record-transaction-model.md new file mode 100644 index 0000000000..3e6618dfdb --- /dev/null +++ b/docs/rfc/012-record-transaction-model.md @@ -0,0 +1,243 @@ +# Leo RFC 012: Record and Transaction Model + +## Authors + +The Aleo Team. + +## Status + +DRAFT + +## Summary + +This RFC describes how Leo programs interact with the Aleo blockchain. +The description is oriented to the Leo developer: +it does not describe the zero-knowledge details, +as the whole purpose of Leo is to enable developers +to write applications with only a very high-level understanding of zero-knowledge. + +## Motivation + +While Leo can be described as a regular programming language +(albeit with certain non-standard restrictions motivated by its compilation to zero-knowledge circuits), +its purpose is to build applications for the Aleo blockchain. +It is thus important to describe precisely how Leo programs operate in the Aleo blockchain. + +## Design + +### Zexe + +The Aleo blockchain follows the Zexe model, with some variations. +It is thus useful to briefly review some aspects of Zexe first. + +In Zexe, there are _records_ that contain application-specific data, +and _transactions_ that consume _n_ old records and produce _m_ new records. +The computation of the new records from the old records +is arbitrary and unknown to the blockchain; +the blockchain only enforces that the old records satisfy known _death predicates_ +and that the new records satisfy known _birth predicates_. + +### Aleo Blockchain + +In the Aleo blockchain, a transaction always consumes 2 old records and produces 2 new records. +That is, _n_ = 2 and _m_ = 2 with respect to the Zexe model. +Other choices are possible, and may be supported in the future; +the current choice of 2 old and 2 new records is motivated by +being the minimum to represent certain computations of interest, such as token exchanges, +which may involve records owned by different parties +(and therefore need to consume more than one record, since each record has a unique owner). + +One or both of the old records may be dummy, +if only one old actual record is desired, +or if new records are to be created "from nothing". +One or both of the new records may be dummy, +if only one new actual record is desired, +or if old records just have to be consumed. + +Aleo records and transactions have a well-defined structure. +They are ordered collections of typed slots. +Of particular interest is the _payload_ slot, +which contains a fixed number of bytes (currently 128) +to store application-specific data. +(Note that the developer documentation is out of date at the time of this writing.) + +In the Aleo blockchain, unlike Zexe, there is no separation among +computation of new records from old records, death predicates, and birth predicates. +Instead, a Leo program plays the role of all three, as described below. + +### Current Leo Program Execution Model + +A Leo program is a collection of files, +with `file` as defined in the ABNF grammar, +i.e. as a sequence of declarations. +A Leo program has one main file, +which may contain import declarations, +which resolve to other files, +which may in turn contain import declarations, +and so on until a closed set of files is obtained: +that (linked) set of files is the _program_. + +In order to be used in the Aleo blockchain, +a Leo program must include a function called `main`, in its aforementioned main file. +The processing of a transaction corresponds to an invocation of this `main` function. +The word 'corresponds' in the preceding sentence is important: +unlike other blockchains like Ethereum, +the processing of the transaction does not involve executing the Leo code; +rather, it involves checking a zero-knowledge proof +of the execution of the Leo program, +which was prepared when the Leo program was compiled. +This is what 'corresponds' means, in that sentence. +However, for the high-level purpose of this RFC, these are zero-knowledge details. + +In general, the `main` function takes some `const` and some non-`const` inputs (declared as parameters), +and returns an output (declared as a return type), which may be a tuple to effectively represent multiple outputs. +The `const` inputs are compiled into the zero-knowledge circuit, +so they can be ignored for our purpose here, +leaving the non-`const` inputs and the output for consideration. + +The execution of `main` can be described as a mathematical function +``` +main : Record * Record * Inputs -> Record * Record * Output +``` +where `*` is cartesian product, +`Record` is the set of possible records, +`Inputs` is the set of possible inputs to `main`, and +`Output` is the set of possible outputs from `main`. +(These sets can be thought as "types", but mathematically we talk about sets.) +That is, this mathematical function +takes three inputs (the two old records and the `main` inputs) +and returns three outputs (the two new records and the `main` output). +While `Record` is fixed, i.e. it is the same for all Leo programs, +both `Inputs` and `Output` differ based on the Leo input and output types of `main`. + +In the Leo code, in `main` or in functions called by `main`, +the values in `Inputs` are accessed via the `main` parameters, +while the old records are accessed via the special `input` variable, +which provides access to the two old records and their slots, +including the payloads that contain application-specific data. +The picture for new records and values in `Output` is less clear form the documentation: +experimentation suggests that the new records are obtained +by serializing the output value in `Output` (which, recall, may be a tuple). + +It is important to note that the values in `Inputs` do not come from the two old records. +Rather, they are private inputs, supplied by the developer +when they compile the Leo program and generate the zero-knowledge proof. +Indeed, as mentioned above, the processing of the transaction in the blockchain +does not execute the Leo code, and thus does not need to know the values in `Inputs`. +Rather, the blockchain has to verify a zero-knowledge proof asserting that +there exist values in `Input`, known to the creator of the transaction, +such that the execution of the Leo program's `main` +on those values and on the old records +yields the new records; +this is, roughly speaking, the assertion proved in zero-knowledge. + +### Proposed Leo Program Execution Model + +The current model described above seems adequate overall, but we need to: +1. Clarify how Leo code reads old records and writes new records. +2. Generalize from one entry point (i.e. the `main` function) to multiple entry points, in line with the smart contract paradigm. + +Generalizing from one `main` entry point to multiple ones is conceptually easy. +It means that, instead of implicitly designating `main` as the only entry point, +we need a mechanism to explicitly designate one or more Leo functions as entry points. + +A simple approach could to use an annotation like `@interface` to designate _interface functions_, i.e. entry points: +``` +@interface +function mint(...) -> ... { ... } + +@interface +function transfer(...) -> ... { ... } +``` +This has a precedent, in the use of `@test` to designate Leo test functions that are not compiled to circuits. + +Another approach is to use a keyword, e.g. +``` +interface function mint(...) -> ... { ... } + +interface function transfer(...) -> ... { ... } +``` +Yet another approach is to group interface functions inside a new block construct, e.g. +``` +interface { + + function mint(...) -> ... { ... } + + function transfer(...) -> ... { ... } +} +``` + +Now let us turn to the issue of clarifying how the Leo code reads old records and writes new records. + +Given that records have a fixed structure with typed slots, +their format could be described by a Leo circuit type, +whose member variables correspond to the slots. +The types of the slots would be fairly low-level, +i.e. byte arrays (e.g. `u8[128]` for the payload) +and unsigned integers (e.g. `u64` for the balance), +because they must have a clear correspondence with the serialized form of records. +This means that the Leo code may have to do +its own deserialization of the payload bytes into higher-level Leo values; +standard serialization/deserialization libraries for Leo types may be provided for this, +as an independent and more generally useful feature. + +It may make sense to have a circuit type for the special `input` variable, +which includes two slots for the two old records. +All these circuit types should be explicitly documented, +and available to the Leo program. + +However, we probably want `input` to be read-only, +i.e. disallow assigning to an old record slot. +Designating `input` as `const` does not seem right, +as that designation normally means that it is compiled into the circuit. +Instead, we could provide read-only access via member function (e.g. `get_payload()`, `get_balance()`), +but we still have to prohibit assignments to member variables (which is currently allowed on any circuit type). +As an orthogonal and more generally useful feature, +we could consider adding public/private access designations to Leo circuit members. +Another approach is to avoid exposing the member variables, +and just make the member functions available via an implicit import declaration. +All of this needs to be thought through more carefully, in the broader context of the Leo language design; +in any case, it should be clear that this can be made to work in some way, +and that Leo programs can access the old records throught the special `input` variables. + +One issue with the special `input` variable is whether it should be treated as a built-in global variable, +or whether it should be explicitly passed to the interface functions and to the non-interface functions called by them. +The first approach is more concise, while the second approach is more explicit. +Note that, in the second approach, we may want to enforce certain restrictions on the use of `input`, +e.g. we may not want to allow a call `f(input, input)` even if the parameters of `f` both have the same circuit type as `input`. +That is, even if we make the passing of `input` explicit, +there is still something special about it; +in essence, it still needs to be treated essentially like a global variable, +i.e. in a single-threaded way. + +The treatment of output records is less clear at this point. +As mentioned above, experimentation suggests that currently the output values of `main` are serialized into new records. +This is not "symmetric" with the treatment of input records. +It may be preferable to require the Leo code to perform its own serialization of high-level data to output records, +which would often be the inverse of the deserialization from input records. +We could consider, for symmetry, to add a special `output` variable, +also with a known circuit type, +which contains (at least some of) the data in the output records, most notably the two payloads. +(It may not contain all the data of the record because some slots +have to be computed by the underlying zero-knowledge mechanisms, +outside of the Leo code.) +This `output` variable would have to be read/write, unlike `input`. +Similarly to `input`, it could be either a built-in global variable +or passed around functions by reference, in a single-threaded way. +The single-threadedness is a more important requirement here, +since the variable is read/write, +i.e. it needs to be treated like a global variable, +in the sense that there is a single instance of it. + +If we go the `output` variable route, a question is what happens with the outputs of the interface functions +(i.e. the values in `Output`, in the mathematical function described earlier). +If all the output data is explicitly written into the output record by the Leo code, +then perhaps the Leo interface functions should always return `()`, i.e. "nothing", +or perhaps they should be predicates, i.e. return `bool`, +where `true` indicates a successful check (e.g. "yes, this private input yields this commitment when hashed") +and `false` indicates a failed check. + +## Alternatives + +The 'Design' section above already outlines several alternatives to consider. +Once we make some specific choices, we can move the other options to this section.