mirror of
https://github.com/ProvableHQ/leo.git
synced 2024-11-22 22:44:47 +03:00
Add RFC for bounded recursion.
This commit is contained in:
parent
e1b59a283c
commit
696f877e5b
510
docs/rfc/002-bounded-recursion.md
Normal file
510
docs/rfc/002-bounded-recursion.md
Normal file
@ -0,0 +1,510 @@
|
||||
# Leo RFC 002: Bounded Recursion
|
||||
|
||||
## Authors
|
||||
|
||||
- Max Bruce
|
||||
- Collin Chin
|
||||
- Alessandro Coglio
|
||||
- Eric McCarthy
|
||||
- Jon Pavlik
|
||||
- Damir Shamanaev
|
||||
- Damon Sicore
|
||||
- Howard Wu
|
||||
|
||||
## Status
|
||||
|
||||
DRAFT
|
||||
|
||||
# Summary
|
||||
|
||||
This proposal provides support for recursion in Leo,
|
||||
via a user-configurable limit to the allowed depth of the recursion.
|
||||
If the recursion can be completely inlined without exceeding the limit,
|
||||
compilation succeeds;
|
||||
otherwise, an informative message is shown to the user,
|
||||
who can try and increase the limit.
|
||||
Compilation may also fail
|
||||
if a circularity is detected before exceeding the limit.
|
||||
Future analyses may also recognize cases in which the recursion terminates,
|
||||
informing the user and setting or suggesting a sufficient limit.
|
||||
|
||||
A similar approach could be also used for loops.
|
||||
User-configurable limits may be also appropriate for
|
||||
other compiler transformations that are known to terminate
|
||||
but could result in a very large number of R1CS constraints.
|
||||
|
||||
# Motivation
|
||||
|
||||
Leo currently allows functions to call other functions,
|
||||
but recursion is disallowed:
|
||||
a function cannot call itself, directly or indirectly.
|
||||
However, recursion is a natural programming idiom in some cases,
|
||||
compared to iteration (i.e. loops).
|
||||
|
||||
# Background
|
||||
|
||||
## Function Inlining
|
||||
|
||||
Since R1CS are flat collections of constraints,
|
||||
compiling Leo to R1CS involves _flattening_ the Leo code:
|
||||
unrolling loops, inlining functions, decomposing arrays, etc.
|
||||
Of interest to this RFC is the inlining of functions,
|
||||
in which a function call is replaced with the function body,
|
||||
after binding the formal parameters to the the actual arguments,
|
||||
and taking care to rename apart variables to avoid conflicts.
|
||||
|
||||
Since the `main` function is the entry point into a Leo program,
|
||||
conceptually, for the purpose of this RFC,
|
||||
we can think of function inlining as transitively inlining
|
||||
all the functions into `main`
|
||||
(this is just a conceptual model;
|
||||
it does not mean that it should be necessarily implemented this way).
|
||||
This is a simple example,
|
||||
where '`===> {<description>}`' indicates a transformation
|
||||
described in the curly braces:
|
||||
```js
|
||||
function f(x: u32) -> u32 {
|
||||
return 2 * x;
|
||||
}
|
||||
function main(a: u32) -> u32 {
|
||||
return f(a + 1);
|
||||
}
|
||||
|
||||
===> {inline call f(a + 1)}
|
||||
|
||||
function main(a: u32) -> u32 {
|
||||
let x = a + 1; // bind actual argument to formal argument
|
||||
return 2 * x; // replace call with body
|
||||
}
|
||||
```
|
||||
|
||||
## Constants and Variables
|
||||
|
||||
A Leo program has two kinds of inputs: constants and variables;
|
||||
both come from input files (which represent blockchain records).
|
||||
They are passed as arguments to the `main` functions:
|
||||
the parameters marked with `const` receive the constant inputs,
|
||||
while the other parameters receive the variable inputs.
|
||||
Leo has constants and variables,
|
||||
of which the just mentioned `main` parameters are examples;
|
||||
constants may only depend on literals and other constants,
|
||||
and therefore only on the constant inputs of the program;
|
||||
variables have no such restrictions.
|
||||
|
||||
The distinction between constants and variables
|
||||
is significant to the compilation of Leo to R1CS.
|
||||
Even though specific values of both constant and variable inputs
|
||||
are known when the Leo program is compiled and the zk-proof is generated,
|
||||
the generated R1CS does not depend
|
||||
on the specific values of the variable inputs;
|
||||
it only depends on the specific values of the constant inputs.
|
||||
Stated another way, Leo variables are represented by R1CS variables,
|
||||
while Leo constants are folded into the R1CS.
|
||||
|
||||
For instance, in
|
||||
```js
|
||||
function main(base: u32, const exponent: u32) -> u32 {
|
||||
return base ** exponent; // raise base to exponent
|
||||
}
|
||||
```
|
||||
the base is a variable while the exponent is a constant.
|
||||
Both base and exponent are known, supplied in the input file,
|
||||
e.g. the base is 2 and the exponent is 5.
|
||||
However, only the information about the exponent being 5
|
||||
is folded into the R1CS, which retains the base as a variable.
|
||||
Conceptually, the R1CS corresponds to the _partially evaluated_ Leo program
|
||||
```js
|
||||
function main(base: u32) -> u32 {
|
||||
return base ** 5;
|
||||
}
|
||||
```
|
||||
where the constant `exponent` has been replaced with its value 5.
|
||||
|
||||
This partial evaluation is carried out
|
||||
as part of the Leo program flattening transformations mentioned earlier.
|
||||
This also involves constant propagation and folding,
|
||||
e.g. a constant expression `exponent + 1` is replaced with 6
|
||||
when the constant `exponent` is known to be 5.
|
||||
(The example program above does not need any constant propagation and folding.)
|
||||
|
||||
Circling back to the topic of Leo function inlining,
|
||||
it is the case that, due to the aforementioned partial evaluation,
|
||||
the `const` arguments of function calls have known values
|
||||
when the flattening transformations are carried out.
|
||||
|
||||
# Design
|
||||
|
||||
After exemplifying how inlining of recursive functions may terminate or not,
|
||||
we discuss our approach to avoid non-termination.
|
||||
Then we discuss future optimizations,
|
||||
and how the approach to avoid non-termination of recursion
|
||||
may be used for other features of the Leo language.
|
||||
|
||||
## Inlining Recursive Functions
|
||||
|
||||
In the presence of recursion,
|
||||
attempting to exhaustively inline function calls does not terminate in general.
|
||||
However, in conjunction with the partial evaluation discussed above,
|
||||
inlining of recursive functions may terminate, under appropriate conditions.
|
||||
|
||||
This is an example:
|
||||
```js
|
||||
function double(const count: u32, sum: u32) -> u32 {
|
||||
if count > 1 {
|
||||
return double(count - 1, sum + sum)
|
||||
} else {
|
||||
return sum + sum;
|
||||
}
|
||||
}
|
||||
function main(x: u32) -> u32 {
|
||||
return double(3, x)
|
||||
}
|
||||
|
||||
===> {inline call double(3, x)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x; // bind and rename parameter of function sum
|
||||
if 3 > 1 {
|
||||
return double(2, sum1 + sum1)
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 3 > 1 to true and simplify if statement}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
return double(2, sum1 + sum1)
|
||||
}
|
||||
|
||||
===> {inine call double(2, sum1 + sum1)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
let sum2 = sum1 + sum1; // bind and rename parameter of function sum
|
||||
if 2 > 1 {
|
||||
return double(1, sum2 + sum2)
|
||||
} else {
|
||||
return sum2 + sum2;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 2 > 1 to true and simplify if statement}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
let sum2 = sum1 + sum1;
|
||||
return double(1, sum2 + sum2)
|
||||
}
|
||||
|
||||
===> {inline call double(1, sum2 + sum2)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
let sum2 = sum1 + sum1;
|
||||
let sum3 = sum2 + sum2; // bind and rename parameter of function sum
|
||||
if 1 > 1 {
|
||||
return double(0, sum3 + sum3)
|
||||
} else {
|
||||
return sum3 + sum3;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 1 > 1 to false and simplify if statement}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
let sum2 = sum1 + sum1;
|
||||
let sum3 = sum2 + sum2;
|
||||
return sum3 + sum3;
|
||||
}
|
||||
```
|
||||
|
||||
This is a slightly more complex example
|
||||
```js
|
||||
function double(const count: u32, sum: u32) -> u32 {
|
||||
if count > 1 && sum < 30 {
|
||||
return double(count - 1, sum + sum)
|
||||
} else {
|
||||
return sum + sum;
|
||||
}
|
||||
}
|
||||
function main(x: u32) -> u32 {
|
||||
return double(3, x)
|
||||
}
|
||||
|
||||
===> {inline call double(3, x)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x; // bind and rename parameter of function sum
|
||||
if 3 > 1 && sum1 < 30 {
|
||||
return double(2, sum1 + sum1)
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 3 > 1 to true and simplify if test}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
if sum1 < 30 {
|
||||
return double(2, sum1 + sum1)
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {inline call double(2, sum1 + sum1)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
if sum1 < 30 {
|
||||
let sum2 = sum1 + sum1; // bind and rename parameter of function sum
|
||||
if 2 > 1 && sum2 < 30 {
|
||||
return double(1, sum2 + sum2)
|
||||
} else {
|
||||
return sum2 + sum2;
|
||||
}
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 2 > 1 to true and simplify if test}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
if sum1 < 30 {
|
||||
let sum2 = sum1 + sum1;
|
||||
if sum2 < 30 {
|
||||
return double(1, sum2 + sum2)
|
||||
} else {
|
||||
return sum2 + sum2;
|
||||
}
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {inline call double(1, sum2 + sum2)}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
if sum1 < 30 {
|
||||
let sum2 = sum1 + sum1;
|
||||
if sum2 < 30 {
|
||||
let sum3 = sum2 + sum2; // bind and rename parameter of function sum
|
||||
if 1 > 1 && sum3 < 30 {
|
||||
return double(0, sum3 + sum3)
|
||||
} else {
|
||||
return sum3 + sum3;
|
||||
}
|
||||
} else {
|
||||
return sum2 + sum2;
|
||||
}
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
|
||||
===> {evaluate 1 > 1 to false and simplify if statement}
|
||||
|
||||
function main(x: u32) -> u32 {
|
||||
let sum1 = x;
|
||||
if sum1 < 30 {
|
||||
let sum2 = sum1 + sum1;
|
||||
if sum2 < 30 {
|
||||
let sum3 = sum2 + sum2;
|
||||
return sum3 + sum3;
|
||||
} else {
|
||||
return sum2 + sum2;
|
||||
}
|
||||
} else {
|
||||
return sum1 + sum1;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
But here is an example in which the inlining does not terminate:
|
||||
```js
|
||||
function forever(const n: u32) {
|
||||
forever(n);
|
||||
}
|
||||
function main() {
|
||||
forever(5);
|
||||
}
|
||||
|
||||
===> {inline call forever(5)}
|
||||
|
||||
function main() {
|
||||
forever(5);
|
||||
}
|
||||
|
||||
===> {inline call forever(5)}
|
||||
|
||||
...
|
||||
```
|
||||
|
||||
## Configurable Limit
|
||||
|
||||
Our proposed approach to avoid non-termination
|
||||
when inlining recursive functions is to
|
||||
(i) keep track of the depth of the inlining call stack and
|
||||
(ii) stop when a certain limit is reached.
|
||||
If the limit is reached,
|
||||
the compiler will provide an informative message to the user,
|
||||
explaining which recursive calls caused the limit to be reached.
|
||||
The limit is configurable by the user.
|
||||
In particular, based on the informative message described above,
|
||||
the user may decide to re-attempt compilation with a higher limit.
|
||||
|
||||
Both variants of the `double` example given earlier reach depth 3
|
||||
(if we start with depth 0 at `main`).
|
||||
|
||||
The default limit (i.e. when the user does not specify one)
|
||||
should be chosen in a way that
|
||||
the compiler does not take too long to reach the limit.
|
||||
Since inlining larger functions
|
||||
takes more time than inlining smaller functions,
|
||||
we may consider adjusting the default limit
|
||||
based on some measure of the complexity of the functions.
|
||||
|
||||
In any case, compiler responsiveness is a broader topic.
|
||||
As the Leo compiler sometimes performs expensive computations,
|
||||
it may be important that it provide periodic output to the user,
|
||||
to reassure them that the compiler is not stuck.
|
||||
|
||||
## Circularity Detection
|
||||
|
||||
Besides the depth of the inlining call stack,
|
||||
the compiler could also keep track of
|
||||
the values of the `const` arguments at each recursive call.
|
||||
If the same argument values are encountered twice,
|
||||
they indicate a circularity
|
||||
(see the discussion on termination analysis below):
|
||||
in that case, there is no need to continue inlining until the limit is reached,
|
||||
and the compiler can show to the user the trace of circular calls.
|
||||
|
||||
This approach would readily reject the `forever` example given earlier.
|
||||
|
||||
## Termination Analysis
|
||||
|
||||
In general, a recursive function
|
||||
(a generic kind of function, not necessarily a Leo function)
|
||||
terminates when
|
||||
there exists a _measure_ of its arguments
|
||||
that decreases at each recursive call,
|
||||
under the tests that govern the recursive call,
|
||||
according to a _well-founded relation_.
|
||||
This is well known in theorem proving,
|
||||
where termination of recursive functions
|
||||
is needed for logical consistency.
|
||||
|
||||
For example, the mathematical factorial function
|
||||
on the natural numbers (i.e. non-negative integers)
|
||||
```
|
||||
n! =def= [IF n = 0 THEN 1 ELSE n * (n-1)!]
|
||||
```
|
||||
terminates because, if `n` is not 0, we have that `n-1 < n`,
|
||||
and `<` is well-founded on the natural numbers;
|
||||
in this example, the measure of the (only) argument is the argument itself.
|
||||
(A relation is well-founded when
|
||||
it has no infinite strictly decreasing sequence;
|
||||
note that, in the factorial example,
|
||||
we are considering the relation on natural numbers only,
|
||||
not on all the integers).
|
||||
|
||||
This property is undecidable in general,
|
||||
but there are many cases in which termination can be proved automatically,
|
||||
as routinely done in theorem provers.
|
||||
|
||||
In Leo, we are interested in
|
||||
the termination of the inlining transformations.
|
||||
Therefore, the termination condition above
|
||||
must involve the `const` parameters of recursive functions:
|
||||
a recursive inlining in Leo terminates when
|
||||
there exists a measure of the `const` arguments
|
||||
that decreases at each recursive call,
|
||||
under the tests that govern the recursive call,
|
||||
according to a well-founded relation.
|
||||
The governing test must necessarily involve the `const` parameters,
|
||||
but they may involve variable parameters as well,
|
||||
as one of the `double` examples given earlier shows.
|
||||
|
||||
We could have the Leo compiler attempt to recognize
|
||||
recursive functions whose `const` parameters
|
||||
satisfy the termination condition given above.
|
||||
(This does not have to involve any proof search;
|
||||
the compiler could just recognize structures
|
||||
for which a known proof can be readily instantiated.)
|
||||
If the recognition succeed,
|
||||
we know that the recursion inlining will terminate,
|
||||
and also possibly in how many steps,
|
||||
enabling the information to be presented to the user
|
||||
in case the configurable limit is insufficient.
|
||||
If the recognition fails,
|
||||
the compiler falls back to inlining until
|
||||
either inlining terminates or the limit is reached.
|
||||
|
||||
## Application to Loops
|
||||
|
||||
Loops are conceptually not different from recursion.
|
||||
Loops and recursion are two ways to repeat computations,
|
||||
and it is well-known that each can emulate the other in various ways.
|
||||
|
||||
Currenly Leo restricts the allowed loop statements
|
||||
to a form whose unrolling always terminates at compile time.
|
||||
If we were to use a similar approach for recursion,
|
||||
we would only allow certain forms of recursion
|
||||
whose inlining always terminates at compile time
|
||||
(see the discussion above about termination analysis).
|
||||
|
||||
Turning things around,
|
||||
we could consider allowing general forms of loops (e.g. `while` loops)
|
||||
and use a configurable limit to unroll loops.
|
||||
We could also detect circularities
|
||||
(when the values of the local constants of the loop repeat themselves).
|
||||
We could also perform a termination analysis on loops,
|
||||
which in particular would readily recognize
|
||||
the currently allowed loop forms to terminate.
|
||||
All of this should be the topic of a separate RFC.
|
||||
|
||||
## Application to Potentially Slow Transformations
|
||||
|
||||
Some flattening transformations in the Leo compiler are known to terminate,
|
||||
but they may take an excessively long time to do so.
|
||||
Examples include decomposing large arrays into their elements
|
||||
or decomposing large integers (e.g. of type `u128`) into their bits.
|
||||
Long compilation times have been observed for cases like these.
|
||||
|
||||
Thus, we could consider using configurable limits
|
||||
not only for flattening transformations that may not otherwise terminate,
|
||||
but also for ones that may take a long time to do so.
|
||||
This is a broader topic that should be discussed in a separate RFC.
|
||||
|
||||
# Drawbacks
|
||||
|
||||
This proposal does not appear to bring any real drawbacks,
|
||||
other than making the compiler inevitably more complex.
|
||||
But the benefits to support recursion justifies the extra complexity.
|
||||
|
||||
# Effect on Ecosystem
|
||||
|
||||
This proposal does not appear to have any direct effects on the ecosystem.
|
||||
It simply enables certain Leo programs to be written in a more natural style.
|
||||
|
||||
# Alternatives
|
||||
|
||||
An alternative approach, hinted at in the above discussion about loops,
|
||||
is to take a similar approach to the current one for loops.
|
||||
That is, we could restrict the forms of allowed recursion
|
||||
to ones whose inlining is known to terminate at compile time.
|
||||
|
||||
However, the configurable limit approach seems more flexible.
|
||||
It does not even preclude a termination analysis, as discussed earlier.
|
||||
Furthermore, in practical terms,
|
||||
non-termination is not much different from excessively long computation.
|
||||
and the configurable limit approach may be uniformly suitable
|
||||
to avoid both non-termination and excessively long computation.
|
Loading…
Reference in New Issue
Block a user