Add RFC for bounded recursion.

2024-11-26 16:37:30 +03:00 · 2021-06-06 21:57:35 -07:00 · 2021-06-06 21:57:35 -07:00 · 696f877e5b
commit 696f877e5b
parent e1b59a283c
1 changed files with 510 additions and 0 deletions
--- a/docs/rfc/002-bounded-recursion.md
+++ b/docs/rfc/002-bounded-recursion.md
@ -0,0 +1,510 @@
+# Leo RFC 002: Bounded Recursion
+
+## Authors
+
+- Max Bruce
+- Collin Chin
+- Alessandro Coglio
+- Eric McCarthy
+- Jon Pavlik
+- Damir Shamanaev
+- Damon Sicore
+- Howard Wu
+
+## Status
+
+DRAFT
+
+# Summary
+
+This proposal provides support for recursion in Leo,
+via a user-configurable limit to the allowed depth of the recursion.
+If the recursion can be completely inlined without exceeding the limit,
+compilation succeeds;
+otherwise, an informative message is shown to the user,
+who can try and increase the limit.
+Compilation may also fail
+if a circularity is detected before exceeding the limit.
+Future analyses may also recognize cases in which the recursion terminates,
+informing the user and setting or suggesting a sufficient limit.
+
+A similar approach could be also used for loops.
+User-configurable limits may be also appropriate for
+other compiler transformations that are known to terminate
+but could result in a very large number of R1CS constraints.
+
+# Motivation
+
+Leo currently allows functions to call other functions,
+but recursion is disallowed:
+a function cannot call itself, directly or indirectly.
+However, recursion is a natural programming idiom in some cases,
+compared to iteration (i.e. loops).
+
+# Background
+
+## Function Inlining
+
+Since R1CS are flat collections of constraints,
+compiling Leo to R1CS involves _flattening_ the Leo code:
+unrolling loops, inlining functions, decomposing arrays, etc.
+Of interest to this RFC is the inlining of functions,
+in which a function call is replaced with the function body,
+after binding the formal parameters to the the actual arguments,
+and taking care to rename apart variables to avoid conflicts.
+
+Since the `main` function is the entry point into a Leo program,
+conceptually, for the purpose of this RFC,
+we can think of function inlining as transitively inlining
+all the functions into `main`
+(this is just a conceptual model;
+it does not mean that it should be necessarily implemented this way).
+This is a simple example,
+where '`===> {<description>}`' indicates a transformation
+described in the curly braces:
+```js
+function f(x: u32) -> u32 {
+    return 2 * x;
+}
+function main(a: u32) -> u32 {
+    return f(a + 1);
+}
+
+===> {inline call f(a + 1)}
+
+function main(a: u32) -> u32 {
+    let x = a + 1; // bind actual argument to formal argument
+    return 2 * x; // replace call with body
+}
+```
+
+## Constants and Variables
+
+A Leo program has two kinds of inputs: constants and variables;
+both come from input files (which represent blockchain records).
+They are passed as arguments to the `main` functions:
+the parameters marked with `const` receive the constant inputs,
+while the other parameters receive the variable inputs.
+Leo has constants and variables,
+of which the just mentioned `main` parameters are examples;
+constants may only depend on literals and other constants,
+and therefore only on the constant inputs of the program;
+variables have no such restrictions.
+
+The distinction between constants and variables
+is significant to the compilation of Leo to R1CS.
+Even though specific values of both constant and variable inputs
+are known when the Leo program is compiled and the zk-proof is generated,
+the generated R1CS does not depend
+on the specific values of the variable inputs;
+it only depends on the specific values of the constant inputs.
+Stated another way, Leo variables are represented by R1CS variables,
+while Leo constants are folded into the R1CS.
+
+For instance, in
+```js
+function main(base: u32, const exponent: u32) -> u32 {
+    return base ** exponent; // raise base to exponent
+}
+```
+the base is a variable while the exponent is a constant.
+Both base and exponent are known, supplied in the input file,
+e.g. the base is 2 and the exponent is 5.
+However, only the information about the exponent being 5
+is folded into the R1CS, which retains the base as a variable.
+Conceptually, the R1CS corresponds to the _partially evaluated_ Leo program
+```js
+function main(base: u32) -> u32 {
+    return base ** 5;
+}
+```
+where the constant `exponent` has been replaced with its value 5.
+
+This partial evaluation is carried out
+as part of the Leo program flattening transformations mentioned earlier.
+This also involves constant propagation and folding,
+e.g. a constant expression `exponent + 1` is replaced with 6
+when the constant `exponent` is known to be 5.
+(The example program above does not need any constant propagation and folding.)
+
+Circling back to the topic of Leo function inlining,
+it is the case that, due to the aforementioned partial evaluation,
+the `const` arguments of function calls have known values
+when the flattening transformations are carried out.
+
+# Design
+
+After exemplifying how inlining of recursive functions may terminate or not,
+we discuss our approach to avoid non-termination.
+Then we discuss future optimizations,
+and how the approach to avoid non-termination of recursion
+may be used for other features of the Leo language.
+
+## Inlining Recursive Functions
+
+In the presence of recursion,
+attempting to exhaustively inline function calls does not terminate in general.
+However, in conjunction with the partial evaluation discussed above,
+inlining of recursive functions may terminate, under appropriate conditions.
+
+This is an example:
+```js
+function double(const count: u32, sum: u32) -> u32 {
+    if count > 1 {
+        return double(count - 1, sum + sum)
+    } else {
+        return sum + sum;
+    }
+}
+function main(x: u32) -> u32 {
+    return double(3, x)
+}
+
+===> {inline call double(3, x)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x; // bind and rename parameter of function sum
+    if 3 > 1 {
+        return double(2, sum1 + sum1)
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {evaluate 3 > 1 to true and simplify if statement}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    return double(2, sum1 + sum1)
+}
+
+===> {inine call double(2, sum1 + sum1)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    let sum2  = sum1 + sum1; // bind and rename parameter of function sum
+    if 2 > 1 {
+        return double(1, sum2 + sum2)
+    } else {
+        return sum2 + sum2;
+    }
+}
+
+===> {evaluate 2 > 1 to true and simplify if statement}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    let sum2  = sum1 + sum1;
+    return double(1, sum2 + sum2)
+}
+
+===> {inline call double(1, sum2 + sum2)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    let sum2  = sum1 + sum1;
+    let sum3 = sum2 + sum2; // bind and rename parameter of function sum
+    if 1 > 1 {
+        return double(0, sum3 + sum3)
+    } else {
+        return sum3 + sum3;
+    }
+}
+
+===> {evaluate 1 > 1 to false and simplify if statement}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    let sum2  = sum1 + sum1;
+    let sum3 = sum2 + sum2;
+    return sum3 + sum3;
+}
+```
+
+This is a slightly more complex example
+```js
+function double(const count: u32, sum: u32) -> u32 {
+    if count > 1 && sum < 30 {
+        return double(count - 1, sum + sum)
+    } else {
+        return sum + sum;
+    }
+}
+function main(x: u32) -> u32 {
+    return double(3, x)
+}
+
+===> {inline call double(3, x)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x; // bind and rename parameter of function sum
+    if 3 > 1 && sum1 < 30 {
+        return double(2, sum1 + sum1)
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {evaluate 3 > 1 to true and simplify if test}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    if sum1 < 30 {
+        return double(2, sum1 + sum1)
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {inline call double(2, sum1 + sum1)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    if sum1 < 30 {
+        let sum2 = sum1 + sum1; // bind and rename parameter of function sum
+        if 2 > 1 && sum2 < 30 {
+            return double(1, sum2 + sum2)
+        } else {
+            return sum2 + sum2;
+        }
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {evaluate 2 > 1 to true and simplify if test}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    if sum1 < 30 {
+        let sum2 = sum1 + sum1;
+        if sum2 < 30 {
+            return double(1, sum2 + sum2)
+        } else {
+            return sum2 + sum2;
+        }
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {inline call double(1, sum2 + sum2)}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    if sum1 < 30 {
+        let sum2 = sum1 + sum1;
+        if sum2 < 30 {
+            let sum3 = sum2 + sum2; // bind and rename parameter of function sum
+            if 1 > 1 && sum3 < 30 {
+                return double(0, sum3 + sum3)
+            } else {
+                return sum3 + sum3;
+            }
+        } else {
+            return sum2 + sum2;
+        }
+    } else {
+        return sum1 + sum1;
+    }
+}
+
+===> {evaluate 1 > 1 to false and simplify if statement}
+
+function main(x: u32) -> u32 {
+    let sum1 = x;
+    if sum1 < 30 {
+        let sum2 = sum1 + sum1;
+        if sum2 < 30 {
+            let sum3 = sum2 + sum2;
+            return sum3 + sum3;
+        } else {
+            return sum2 + sum2;
+        }
+    } else {
+        return sum1 + sum1;
+    }
+}
+```
+
+But here is an example in which the inlining does not terminate:
+```js
+function forever(const n: u32) {
+    forever(n);
+}
+function main() {
+    forever(5);
+}
+
+===> {inline call forever(5)}
+
+function main() {
+    forever(5);
+}
+
+===> {inline call forever(5)}
+
+...
+```
+
+## Configurable Limit
+
+Our proposed approach to avoid non-termination
+when inlining recursive functions is to
+(i) keep track of the depth of the inlining call stack and
+(ii) stop when a certain limit is reached.
+If the limit is reached,
+the compiler will provide an informative message to the user,
+explaining which recursive calls caused the limit to be reached.
+The limit is configurable by the user.
+In particular, based on the informative message described above,
+the user may decide to re-attempt compilation with a higher limit.
+
+Both variants of the `double` example given earlier reach depth 3
+(if we start with depth 0 at `main`).
+
+The default limit (i.e. when the user does not specify one)
+should be chosen in a way that
+the compiler does not take too long to reach the limit.
+Since inlining larger functions
+takes more time than inlining smaller functions,
+we may consider adjusting the default limit
+based on some measure of the complexity of the functions.
+
+In any case, compiler responsiveness is a broader topic.
+As the Leo compiler sometimes performs expensive computations,
+it may be important that it provide periodic output to the user,
+to reassure them that the compiler is not stuck.
+
+## Circularity Detection
+
+Besides the depth of the inlining call stack,
+the compiler could also keep track of
+the values of the `const` arguments at each recursive call.
+If the same argument values are encountered twice,
+they indicate a circularity
+(see the discussion on termination analysis below):
+in that case, there is no need to continue inlining until the limit is reached,
+and the compiler can show to the user the trace of circular calls.
+
+This approach would readily reject the `forever` example given earlier.
+
+## Termination Analysis
+
+In general, a recursive function
+(a generic kind of function, not necessarily a Leo function)
+terminates when
+there exists a _measure_ of its arguments
+that decreases at each recursive call,
+under the tests that govern the recursive call,
+according to a _well-founded relation_.
+This is well known in theorem proving,
+where termination of recursive functions
+is needed for logical consistency.
+
+For example, the mathematical factorial function
+on the natural numbers (i.e. non-negative integers)
+```
+n! =def= [IF n = 0 THEN 1 ELSE n * (n-1)!]
+```
+terminates because, if `n` is not 0, we have that `n-1 < n`,
+and `<` is well-founded on the natural numbers;
+in this example, the measure of the (only) argument is the argument itself.
+(A relation is well-founded when
+it has no infinite strictly decreasing sequence;
+note that, in the factorial example,
+we are considering the relation on natural numbers only,
+not on all the integers).
+
+This property is undecidable in general,
+but there are many cases in which termination can be proved automatically,
+as routinely done in theorem provers.
+
+In Leo, we are interested in
+the termination of the inlining transformations.
+Therefore, the termination condition above
+must involve the `const` parameters of recursive functions:
+a recursive inlining in Leo terminates when
+there exists a measure of the `const` arguments
+that decreases at each recursive call,
+under the tests that govern the recursive call,
+according to a well-founded relation.
+The governing test must necessarily involve the `const` parameters,
+but they may involve variable parameters as well,
+as one of the `double` examples given earlier shows.
+
+We could have the Leo compiler attempt to recognize
+recursive functions whose `const` parameters
+satisfy the termination condition given above.
+(This does not have to involve any proof search;
+the compiler could just recognize structures
+for which a known proof can be readily instantiated.)
+If the recognition succeed,
+we know that the recursion inlining will terminate,
+and also possibly in how many steps,
+enabling the information to be presented to the user
+in case the configurable limit is insufficient.
+If the recognition fails,
+the compiler falls back to inlining until
+either inlining terminates or the limit is reached.
+
+## Application to Loops
+
+Loops are conceptually not different from recursion.
+Loops and recursion are two ways to repeat computations,
+and it is well-known that each can emulate the other in various ways.
+
+Currenly Leo restricts the allowed loop statements
+to a form whose unrolling always terminates at compile time.
+If we were to use a similar approach for recursion,
+we would only allow certain forms of recursion
+whose inlining always terminates at compile time
+(see the discussion above about termination analysis).
+
+Turning things around,
+we could consider allowing general forms of loops (e.g. `while` loops)
+and use a configurable limit to unroll loops.
+We could also detect circularities
+(when the values of the local constants of the loop repeat themselves).
+We could also perform a termination analysis on loops,
+which in particular would readily recognize
+the currently allowed loop forms to terminate.
+All of this should be the topic of a separate RFC.
+
+## Application to Potentially Slow Transformations
+
+Some flattening transformations in the Leo compiler are known to terminate,
+but they may take an excessively long time to do so.
+Examples include decomposing large arrays into their elements
+or decomposing large integers (e.g. of type `u128`) into their bits.
+Long compilation times have been observed for cases like these.
+
+Thus, we could consider using configurable limits
+not only for flattening transformations that may not otherwise terminate,
+but also for ones that may take a long time to do so.
+This is a broader topic that should be discussed in a separate RFC.
+
+# Drawbacks
+
+This proposal does not appear to bring any real drawbacks,
+other than making the compiler inevitably more complex.
+But the benefits to support recursion justifies the extra complexity.
+
+# Effect on Ecosystem
+
+This proposal does not appear to have any direct effects on the ecosystem.
+It simply enables certain Leo programs to be written in a more natural style.
+
+# Alternatives
+
+An alternative approach, hinted at in the above discussion about loops,
+is to take a similar approach to the current one for loops.
+That is, we could restrict the forms of allowed recursion
+to ones whose inlining is known to terminate at compile time.
+
+However, the configurable limit approach seems more flexible.
+It does not even preclude a termination analysis, as discussed earlier.
+Furthermore, in practical terms,
+non-termination is not much different from excessively long computation.
+and the configurable limit approach may be uniformly suitable
+to avoid both non-termination and excessively long computation.