--- layout: developer-doc title: Caching category: runtime tags: [runtime, caching, execution] order: 1 --- # Caching It is not uncommon for users in data-analysis jobs to work with data on the order of _gigabytes_ or even _terabytes_. As fast as computers have become, and as efficient as programming languages can be, you still don't want to compute on such large amounts of data unless you absolutely have to. This wouldn't usually be an issue, with such data-analysis tasks being able to run in a 'batch mode', where the user starts their job in a fire-and-forget fashion. Enso, however, is a highly _interactive_ environment for working with data, where waiting _seconds_, let alone _hours_, would severely hamper the user experience. Given that Enso is a highly interactive language and platform, we want to take every measure to ensure that we provide a highly responsive experience to our users. To that end, one of the key tenets of the new runtime's featureset for aiding in this is the inclusion of a _caching_ mechanism. Caching, in this case, refers to the runtime's ability to 'remember' the values computed in the currently observed scopes. In combination with the data dependency analysis performed by the compiler, this allows the runtime to recompute the _minimal_ set of expressions when the user makes a change, rather than having to recompute the entire program. - [Cache Candidates](#cache-candidates) - [Initial Cache Candidates](#initial-cache-candidates) - [Further Development of Cache Candidates](#further-development-of-cache-candidates) - [Partial-Evaluation and Side-Effects](#partial-evaluation-and-side-effects) - [Side Effects in the Initial Version](#side-effects-in-the-initial-version) - [In The Future](#in-the-future) - [Cache Eviction Strategy](#cache-eviction-strategy) - [Initial Eviction Strategy](#initial-eviction-strategy) - [Future Eviction Strategies](#future-eviction-strategies) - [Dataflow Analysis](#dataflow-analysis) - [Identifying Expressions](#identifying-expressions) - [Specifying Dataflow](#specifying-dataflow) - [Cache Backend](#cache-backend) - [Initial Implementation of Cache Backend](#initial-implementation-of-cache-backend) - [Further Development of Cache Backend](#further-development-of-cache-backend) - [Memory Bounded Caches](#memory-bounded-caches) - [Soft References](#soft-references) - [Serialization](#serialization) - [Instrumentation](#instrumentation) - [Manual Memory Management](#manual-memory-management) - [Comparison of Memory Management Approaches](#comparison-of-memory-management-approaches) ## Cache Candidates The key use of the Enso value cache is to support the interactive editing of user code. This means that it caches all bindings within the scope of a given function, including the function arguments. This means that, as users edit their code, we can ensure that the minimal amount of their program is recomputed. Consider the following example: ```ruby foo a b = c = a.frob b d = c.wibble b a.quux d ``` The cache is active for the _currently visible scope_ in Enso Studio, so when a user enters the function `foo`, the cache stores the intermediate results in this function (in this case `c` and `d`), as well as the inputs to the function (in this case `a`, and `b`). All intermediate results and inputs are considered as candidates, though as the cache design evolves, the _selected_ candidates may be refined. Ultimately we want to cache and reuse as much as possible to minimize the computation costs. At the same time, we want to limit the memory consumed by the cache. ### Initial Cache Candidates The initial version of the cache only stores the right-hand-sides of binding expressions. This is for two main reasons: - Firstly, this allows us to ensure that the cache does not cause the JVM to go out of memory between executions, allowing us to avoid the implementation of a [memory-bounded cache](#memory-management) for now. - It also simplifies the initial implementation of weighting program components. ### Further Development of Cache Candidates The next step for the cache is to expand the portions of the introspected scope that we cache. In general, this means the caching of intermediate expressions. However, once we do this, we can no longer guarantee that we do not push the JVM out of memory between two program executions. This is best demonstrated by example. ``` a = (computeHugeObject b).size ``` Here we compute a value that takes up a significant amount of memory, but from it we only compute a small derived value (its size). Hence, if we want to cache the intermediate result of the discarded `computeHugeObject b` expression, we need some way of tracking the sizes of individual cache entries. ## Partial-Evaluation and Side-Effects The more theoretically-minded people among those reading this document may instantly realise that there is a _problem_ with this approach. In the presence of caching, it becomes _entirely_ unpredictable as to when side effects are executed. This is problematic in that side-effecting computations are rarely idempotent, and problems might be caused by executing them over and over again. Furthermore, the nature of the interpreter's support for entering functions inherently requires that it recompute portions of that function in a different context, thereby potentially re-evaluating side-effecting computations as well. In general, it is clear that many kinds of side effect have _problems_ in the presence of caching and partial-evaluation. ### Side Effects in the Initial Version Many of the mechanisms required to deal with this kind of issue properly are complex and require deep type-level support in the compiler. To that end, the initial version of the interpreter is going to pretend that the problem doesn't really exist. - All intermediate values will be cached. - Cached values will be recomputed as necessary as described in the section on [initial eviction strategies](#initial-eviction-strategies). This can and _will_ recompute side-effecting computations indiscriminately, but we cannot initially do much better. #### A Stopgap While the compiler won't have the machinery in place to properly track information about side-effects, we can implement a stop-gap solution that at least allows the GUI to allow users to make the decision about whether or not to recompute a side-effecting value. This is very similar to the initial approach used for functions with arguments marked as `Suspended`, and works as follows: - We provide explicit signatures (containing `IO`) for functions that perform side-effects. - Whenever the runtime wants to recompute a value that performs side-effects, it can use this information to ask for user input. - We can also display a box on these types `always_reevaluate` that lets users opt in to automatic re-evaluation of these values. ### In The Future As the compiler evolves, however, we can do better than this. In particular, we can employ type-system information to determine which functions are side-effecting (in absence of annotations), and to class some kinds of said functions as safe for either caching, re-evaluation, or both. What follows is a brief sketch of how this might work: - Rather than having a single type capturing side effects (like `IO` in Haskell) we divide the type up into fine-grained descriptions of side-effects that let us better describe the particular behaviours of given functions (e.g. `IO.Read`, `IO.Write`), all of which are more-specific versions of the base `IO` type. - We provide a set of interfaces that determine whether a given kind of side effect can be safely cached or re-evaluated (e.g. `No_Cache` or `No_Reevaluate`). - We can use this information to ask the user about recomputation in far less situations. > The actionables for this section are: > > - Evolve the strategy for handling side effects as the compiler provides more > capabilities that will be useful in doing so. ## Cache Eviction Strategy The cache eviction strategy refers to the method by which we determine which entries in the cache are invalidated (if any) after a given change to the code. ### Initial Eviction Strategy In the initial version of the caching mechanism, the eviction strategies are intended to be fairly simplistic and conservative to ensure correctness. - The compiler performs data-dependency analysis between expressions. - If an expression is changed, all cached values for expressions that depend on it are evicted from the cache. - Expressions that have been evicted from the cache subsequently have to be recomputed by the runtime. The following rules are applied when an expression identified by some key `k` is changed: 1. All expressions that depend on the result of `k` are evicted from the cache. 2. If `k` is a dynamic symbol, all expressions that depend on _any instance_ of the dynamic symbol are evicted from the cache. ### Future Eviction Strategies In the future, however, the increasing sophistication of the front-end compiler for Enso will allow us to do better than this by accounting for more granular information in the eviction decisions. Cache eviction takes into account the following aspects: * **Visualization:** In the first place, we should care about nodes that have visualization attached in the IDE. * **Priority:** The runtime can assign a score to a node, meaning how valuable this node is. Less valuable nodes should be evicted first. * **Computation time:** The runtime can calculate the time that node took to compute. Less computationally intensive nodes should be evicted first. Memory limit. The cache should not exceed the specified memory limit. > The actionables for this section are: > > - Evolve the cache eviction strategy by employing more granular information > as the compiler evolves to provide it. ## Dataflow Analysis Dataflow analysis is the process by which the compiler discovers the relationships between program expressions. The output of the process is a data dependency graph that can be queried for an expression, and returns the set of all expressions that depended on that expression. Internally we represent this as a directed graph: - An edge from `a` to `b` indicates that the expression `a` is depended on by the expression `b`. - These dependencies are _direct_ dependencies on `a`. - We reconstruct transitive dependencies from this graph. An expression `a` can be any Enso expression, including definitions of dynamic symbols. Given that dynamic symbols need not be in scope, care has to be taken with registering them properly. Each expression in the compiler IR is annotated with both the set of expressions that depend on it, and the set of expressions that it depends on. ### Identifying Expressions Expressions are identified, for the purposes of dataflow analysis, by unique identifiers on every IR node. The dataflow analysis process creates a dependency graph between these identifiers. However, at runtime, the IDE uses a different and separate concept of identifiers. Translating between these external identifiers and the internal identifiers is left to the runtime and is not the responsibility of the dataflow analysis pass. ### Specifying Dataflow Dataflow analysis takes place on the core set of language constructs, defined as those that extend `IRKind.Primitive`. Their dataflow is specified as follows, with arrows representing edges in the graph. #### Atom An atom is dependent on the definitions of its arguments, particularly with regard to any defaults. ``` atom <- arguments ``` #### Method A method is dependent on the definition of its body. Methods at the point that dataflow analysis runs are 'bare' methods, meaning that they are defined as functions. ``` method <- body ``` #### Block The value of a block is dependent purely on the value of its return expression. The return expression may depend on other values. ``` block <- returnValue ``` #### Binding The value of a binding is dependent both on the name of the binding and the expression being assigned in the binding. ``` binding <- name binding <- expression ``` #### Lambda The value of a lambda is dependent on the return value from its body, as well as the definitions of any defaults for its arguments. ``` lambda <- body lambda <- argumentDefaults ``` #### Definition Argument The value of a function definition argument is dependent purely on the value of its default, if that default is present. ``` defArgument <- defaultExpression ``` #### Prefix Application The value of a prefix application is dependent on the values of both the function expression being called, and the arguments. ``` prefix <- function prefix <- arguments ``` #### Call Argument The value of a call argument is dependent both on the value that it's wrapping, as well as the name it has, if it exists. ``` callArgument <- argumentValue callArgument <- argumentName ``` #### Forced Term A forced term is purely dependent on the value of the term being forced (the `target`). ``` force <- target ``` #### Typeset Members A typeset member is dependent on the definition of its label, as well as the possibly present definitions of its type and value. ``` typesetMember <- label typesetMember <- memberType typesetMember <- memberValue ``` #### Typing Operators All typing operators in Enso (`IR.Type`) are dependent on their constituent parts: ``` typingExpr <- expressionChildren ``` #### Name An occurrence of a name is dependent on the definition site of that name. This means that it is broken down into two options: 1. **Static Dependency:** The definition site for a given usage can be statically resolved. ``` name <- staticUseSite ``` 2. **Dynamic Dependency:** The definition site for a given usage can only be determined to be a symbol resolved dynamically. ``` name <- dynamicSymbol ``` Under these circumstances, if any definition for `dynamicSymbol` changes, then _all_ usages of that symbol must be invalidated, whether or not they used the changed definition in particular. #### Case Expressions The value of a case expression depends on the value of its scrutinee, as well as the definitions of its branches. ``` case <- scrutinee case <- branches case <- fallback ``` #### Case Branches The value of a case branch depends on both the pattern expression and the result expression. ``` caseBranch <- casePattern caseBranch <- expression ``` #### Comments The value of a comment is purely dependent on the value of the commented entity. ``` comment <- commented ``` ## Cache Backend The cache is implemented as key-value storage with an eviction function. The cache stores the right-hand side expressions of the bindings in the key-value storage. The storage can be as simple as a Hash Map with values wrapped into the Soft References as a fallback strategy of clearing the cache. The eviction function purges invalidated expressions from previous program execution. ### Further development Cache intermediate results of expressions to reduce the cost of new computations and extend the eviction strategy to clear the cache based on memory consumption. Extend the eviction strategy by adding an asynchronous scoring task that computes some properties of stored objects (e.g., the size of the object). Those properties can be used in the eviction strategy as optional clues, improving the hit ratio. > The actionables for this section are: > > - Evolve the cache by storing the results of intermediate expressions > > - Evolve the cache eviction strategy implementation by employing more > information of the stored values ## Memory Bounded Caches Memory management refers to a way of controlling the size of the cache, avoiding the Out Of Memory errors. The methods below can be divided into two approaches. 1. Limiting the overall JVM memory and relying on garbage collection. 2. Calculating the object's size and using it in the eviction strategy. In general, to control the memory size on JVM, anything besides limiting the total amount involves some tradeoffs. ### Soft References Soft References is a way to mark the cache entries available for garbage collection whenever JVM runs a GC. In practice, it can cause long GC times and reduced overall performance. This strategy is generally considered as a last resort measure. The effect of the GC can be mitigated by using the _isolates_ (JSR121 implemented in GraalVM). One can think of an _isolate_ as a lightweight JVM, running in a thread with their own heap, memory limit, and garbage collection. The problem is that _isolates_ can't share objects. And even if we move the cache to the separate _isolate_, that would require creating a mechanism of sharing objects based on pointers, which requires implementing serialization. On the other hand, serialization itself can provide the size of the object, which is enough to implement the eviction policy, even without running the _isolates_. ### Serialization One approach to get the size of the value stored in the cache is _serialization_. The downside is the computational overhead of transforming the object into a byte sequence. ### Instrumentation Another way of getting the object's size is to use JVM instrumentation. This approach requires running JVM with _javaagent_ attached, which can complicate the deployment and have indirect performance penalties. ### Manual Memory Management This method implies tracking the size of the values by hand. It can be done for the values of Enso language, knowing the size of its primitive types. For interacting with other languages, Enso can provide an interface that developers should implement to have a better experience with the cache. ### Comparison of Memory Management Approaches Below are some key takeaways after experimenting with the _instrumentation_ and _serialization_ approaches. #### Enso Runtime Benchmark The existing runtime benchmark was executed with the java agent attached to measure the impact of instrumentation. #### Serialization Benchmark [FST](https://github.com/RuedigerMoeller/fast-serialization) library was used for the serialization benchmark. It doesn't require an explicit scheme and relies on Java `Serializable` interface. #### Instrumentation Benchmark Java [`Instrumentation#getObjectSize()](https://docs.oracle.com/javase/8/docs/api/java/lang/instrument/Instrumentation.html) can only provide a _shallow_ memory of the object. It does not follow the references and only takes into account public fields containing primitive types. Benchmark used the `MemoryUtil#deepMemoryUsageOf` function of [Classmexer](https://www.javamex.com/classmexer/) library. It utilizes Java reflection to follow the references and access the private fields of the object. #### Benchmark Results Benchmarks measured Java array, `java.util.LinkedList`, and a custom implementation of lined list `ConsList`, an object that maintains references for its head and tail. ``` java public static class ConsList { private A head; private ConsList tail; ... } ``` Function execution time measured in milliseconds per operation (lower is better). ``` text Benchmark (size) Mode Cnt Score Error Units FstBenchmark.serializeArray 1000 avgt 5 21.862 ± 0.503 us/op FstBenchmark.serializeConsList 1000 avgt 5 151.791 ± 45.200 us/op FstBenchmark.serializeLinkedList 1000 avgt 5 38.139 ± 12.932 us/op InstrumentBenchmark.memoryOfArray 1000 avgt 5 17.700 ± 0.068 us/op InstrumentBenchmark.memoryOfConsList 1000 avgt 5 1706.224 ± 61.631 us/op InstrumentBenchmark.memoryOfLinkedList 1000 avgt 5 1866.783 ± 557.296 us/op ``` - There are no slowdowns in running the Enso runtime benchmark with the instrumentation `javaagent` attached. - Serialization works in microsecond time range and operates on all Java objects implementing _Serializable_ interface. - Java `Instrumentation#getObjectSize() performs in nanosecond time range. The _deep_ inspection approach based on the reflection was significantly slower than the serialization. The resulting approach can be a combination of one of the approaches with the introspection of the value. For example, it can be a case statement, analyzing the value and applying the appropriate method. ```java public long getSize(Object value) { if (value instanceof Primitive) { return getPrimitiveSize(value); } else if (value instanceof EnsoNode) { return introspectNode(value); } else { return null; } } ```