enso/docs/runtime/caching.md
Dmitry Bushev 34009e1111
Add the Runtime Cache Design Docs (#764)
* doc: add runtime cache design

* Apply suggestions from code review

doc: review edits

Co-authored-by: Ara Adkins <iamrecursion@users.noreply.github.com>

* doc: fix formatting

* doc: benchmark result

* doc: add sections

* doc: update cache backend

* doc: remove redundant eviction section

* doc: rename eviction strategy section

* doc: merge two eviction sections

Co-authored-by: Ara Adkins <iamrecursion@users.noreply.github.com>
2020-05-20 16:34:23 +03:00

20 KiB

layout title category tags order
developer-doc Caching runtime
runtime
caching
execution
1

Caching

It is not uncommon for users in data-analysis jobs to work with data on the order of gigabytes or even terabytes. As fast as computers have become, and as efficient as programming languages can be, you still don't want to compute on such large amounts of data unless you absolutely have to.

This wouldn't usually be an issue, with such data-analysis tasks being able to run in a 'batch mode', where the user starts their job in a fire-and-forget fashion. Enso, however, is a highly interactive environment for working with data, where waiting seconds, let alone hours, would severely hamper the user experience.

Given that Enso is a highly interactive language and platform, we want to take every measure to ensure that we provide a highly responsive experience to our users. To that end, one of the key tenets of the new runtime's featureset for aiding in this is the inclusion of a caching mechanism.

Caching, in this case, refers to the runtime's ability to 'remember' the values computed in the currently observed scopes. In combination with the data dependency analysis performed by the compiler, this allows the runtime to recompute the minimal set of expressions when the user makes a change, rather than having to recompute the entire program.

Cache Candidates

The key use of the Enso value cache is to support the interactive editing of user code. This means that it caches all bindings within the scope of a given function, including the function arguments. This means that, as users edit their code, we can ensure that the minimal amount of their program is recomputed.

Consider the following example:

foo a b =
    c = a.frob b
    d = c.wibble b
    a.quux d

The cache is active for the currently visible scope in Enso Studio, so when a user enters the function foo, the cache stores the intermediate results in this function (in this case c and d), as well as the inputs to the function (in this case a, and b).

All intermediate results and inputs are considered as candidates, though as the cache design evolves, the selected candidates may be refined. Ultimately we want to cache and reuse as much as possible to minimize the computation costs. At the same time, we want to limit the memory consumed by the cache.

Initial Cache Candidates

The initial version of the cache only stores the right-hand-sides of binding expressions. This is for two main reasons:

  • Firstly, this allows us to ensure that the cache does not cause the JVM to go out of memory between executions, allowing us to avoid the implementation of a memory-bounded cache for now.
  • It also simplifies the initial implementation of weighting program components.

Further Development of Cache Candidates

The next step for the cache is to expand the portions of the introspected scope that we cache. In general, this means the caching of intermediate expressions.

However, once we do this, we can no longer guarantee that we do not push the JVM out of memory between two program executions. This is best demonstrated by example.

a = (computeHugeObject b).size

Here we compute a value that takes up a significant amount of memory, but from it we only compute a small derived value (its size). Hence, if we want to cache the intermediate result of the discarded computeHugeObject b expression, we need some way of tracking the sizes of individual cache entries.

Partial-Evaluation and Side-Effects

The more theoretically-minded people among those reading this document may instantly realise that there is a problem with this approach. In the presence of caching, it becomes entirely unpredictable as to when side effects are executed. This is problematic in that side-effecting computations are rarely idempotent, and problems might be caused by executing them over and over again.

Furthermore, the nature of the interpreter's support for entering functions inherently requires that it recompute portions of that function in a different context, thereby potentially re-evaluating side-effecting computations as well.

In general, it is clear that many kinds of side effect have problems in the presence of caching and partial-evaluation.

Side Effects in the Initial Version

Many of the mechanisms required to deal with this kind of issue properly are complex and require deep type-level support in the compiler. To that end, the initial version of the interpreter is going to pretend that the problem doesn't really exist.

  • All intermediate values will be cached.
  • Cached values will be recomputed as necessary as described in the section on initial eviction strategies.

This can and will recompute side-effecting computations indiscriminately, but we cannot initially do much better.

A Stopgap

While the compiler won't have the machinery in place to properly track information about side-effects, we can implement a stop-gap solution that at least allows the GUI to allow users to make the decision about whether or not to recompute a side-effecting value. This is very similar to the initial approach used for functions with arguments marked as Suspended, and works as follows:

  • We provide explicit signatures (containing IO) for functions that perform side-effects.
  • Whenever the runtime wants to recompute a value that performs side-effects, it can use this information to ask for user input.
  • We can also display a box on these types always_reevaluate that lets users opt in to automatic re-evaluation of these values.

In The Future

As the compiler evolves, however, we can do better than this. In particular, we can employ type-system information to determine which functions are side-effecting (in absence of annotations), and to class some kinds of said functions as safe for either caching, re-evaluation, or both. What follows is a brief sketch of how this might work:

  • Rather than having a single type capturing side effects (like IO in Haskell) we divide the type up into fine-grained descriptions of side-effects that let us better describe the particular behaviours of given functions (e.g. IO.Read, IO.Write), all of which are more-specific versions of the base IO type.
  • We provide a set of interfaces that determine whether a given kind of side effect can be safely cached or re-evaluated (e.g. No_Cache or No_Reevaluate).
  • We can use this information to ask the user about recomputation in far less situations.

The actionables for this section are:

  • Evolve the strategy for handling side effects as the compiler provides more capabilities that will be useful in doing so.

Cache Eviction Strategy

The cache eviction strategy refers to the method by which we determine which entries in the cache are invalidated (if any) after a given change to the code.

Initial Eviction Strategy

In the initial version of the caching mechanism, the eviction strategies are intended to be fairly simplistic and conservative to ensure correctness.

  • The compiler performs data-dependency analysis between expressions.
  • If an expression is changed, all cached values for expressions that depend on it are evicted from the cache.
  • Expressions that have been evicted from the cache subsequently have to be recomputed by the runtime.

The following rules are applied when an expression identified by some key k is changed:

  1. All expressions that depend on the result of k are evicted from the cache.
  2. If k is a dynamic symbol, all expressions that depend on any instance of the dynamic symbol are evicted from the cache.

Future Eviction Strategies

In the future, however, the increasing sophistication of the front-end compiler for Enso will allow us to do better than this by accounting for more granular information in the eviction decisions.

Cache eviction takes into account the following aspects:

  • Visualization: In the first place, we should care about nodes that have visualization attached in the IDE.
  • Priority: The runtime can assign a score to a node, meaning how valuable this node is. Less valuable nodes should be evicted first.
  • Computation time: The runtime can calculate the time that node took to compute. Less computationally intensive nodes should be evicted first. Memory limit. The cache should not exceed the specified memory limit.

The actionables for this section are:

  • Evolve the cache eviction strategy by employing more granular information as the compiler evolves to provide it.

Dataflow Analysis

Dataflow analysis is the process by which the compiler discovers the relationships between program expressions. The output of the process is a data dependency graph that can be queried for an expression, and returns the set of all expressions that depended on that expression.

Internally we represent this as a directed graph:

  • An edge from a to b indicates that the expression a is depended on by the expression b.
  • These dependencies are direct dependencies on a.
  • We reconstruct transitive dependencies from this graph.

An expression a can be any Enso expression, including definitions of dynamic symbols. Given that dynamic symbols need not be in scope, care has to be taken with registering them properly.

Each expression in the compiler IR is annotated with both the set of expressions that depend on it, and the set of expressions that it depends on.

Identifying Expressions

Expressions are identified, for the purposes of dataflow analysis, by unique identifiers on every IR node. The dataflow analysis process creates a dependency graph between these identifiers.

However, at runtime, the IDE uses a different and separate concept of identifiers. Translating between these external identifiers and the internal identifiers is left to the runtime and is not the responsibility of the dataflow analysis pass.

Specifying Dataflow

Dataflow analysis takes place on the core set of language constructs, defined as those that extend IRKind.Primitive. Their dataflow is specified as follows, with arrows representing edges in the graph.

Atom

An atom is dependent on the definitions of its arguments, particularly with regard to any defaults.

atom <- arguments

Method

A method is dependent on the definition of its body. Methods at the point that dataflow analysis runs are 'bare' methods, meaning that they are defined as functions.

method <- body

Block

The value of a block is dependent purely on the value of its return expression. The return expression may depend on other values.

block <- returnValue

Binding

The value of a binding is dependent both on the name of the binding and the expression being assigned in the binding.

binding <- name
binding <- expression

Lambda

The value of a lambda is dependent on the return value from its body, as well as the definitions of any defaults for its arguments.

lambda <- body
lambda <- argumentDefaults

Definition Argument

The value of a function definition argument is dependent purely on the value of its default, if that default is present.

defArgument <- defaultExpression

Prefix Application

The value of a prefix application is dependent on the values of both the function expression being called, and the arguments.

prefix <- function
prefix <- arguments

Call Argument

The value of a call argument is dependent both on the value that it's wrapping, as well as the name it has, if it exists.

callArgument <- argumentValue
callArgument <- argumentName

Forced Term

A forced term is purely dependent on the value of the term being forced (the target).

force <- target

Typeset Members

A typeset member is dependent on the definition of its label, as well as the possibly present definitions of its type and value.

typesetMember <- label
typesetMember <- memberType
typesetMember <- memberValue

Typing Operators

All typing operators in Enso (IR.Type) are dependent on their constituent parts:

typingExpr <- expressionChildren

Name

An occurrence of a name is dependent on the definition site of that name. This means that it is broken down into two options:

  1. Static Dependency: The definition site for a given usage can be statically resolved.

    name <- staticUseSite
    
  2. Dynamic Dependency: The definition site for a given usage can only be determined to be a symbol resolved dynamically.

    name <- dynamicSymbol
    

    Under these circumstances, if any definition for dynamicSymbol changes, then all usages of that symbol must be invalidated, whether or not they used the changed definition in particular.

Case Expressions

The value of a case expression depends on the value of its scrutinee, as well as the definitions of its branches.

case <- scrutinee
case <- branches
case <- fallback

Case Branches

The value of a case branch depends on both the pattern expression and the result expression.

caseBranch <- casePattern
caseBranch <- expression

Comments

The value of a comment is purely dependent on the value of the commented entity.

comment <- commented

Cache Backend

The cache is implemented as key-value storage with an eviction function.

The cache stores the right-hand side expressions of the bindings in the key-value storage. The storage can be as simple as a Hash Map with values wrapped into the Soft References as a fallback strategy of clearing the cache. The eviction function purges invalidated expressions from previous program execution.

Further development

Cache intermediate results of expressions to reduce the cost of new computations and extend the eviction strategy to clear the cache based on memory consumption.

Extend the eviction strategy by adding an asynchronous scoring task that computes some properties of stored objects (e.g., the size of the object). Those properties can be used in the eviction strategy as optional clues, improving the hit ratio.

The actionables for this section are:

  • Evolve the cache by storing the results of intermediate expressions

  • Evolve the cache eviction strategy implementation by employing more information of the stored values

Memory Bounded Caches

Memory management refers to a way of controlling the size of the cache, avoiding the Out Of Memory errors.

The methods below can be divided into two approaches.

  1. Limiting the overall JVM memory and relying on garbage collection.
  2. Calculating the object's size and using it in the eviction strategy.

In general, to control the memory size on JVM, anything besides limiting the total amount involves some tradeoffs.

Soft References

Soft References is a way to mark the cache entries available for garbage collection whenever JVM runs a GC. In practice, it can cause long GC times and reduced overall performance. This strategy is generally considered as a last resort measure.

The effect of the GC can be mitigated by using the isolates (JSR121 implemented in GraalVM). One can think of an isolate as a lightweight JVM, running in a thread with their own heap, memory limit, and garbage collection.

The problem is that isolates can't share objects. And even if we move the cache to the separate isolate, that would require creating a mechanism of sharing objects based on pointers, which requires implementing serialization. On the other hand, serialization itself can provide the size of the object, which is enough to implement the eviction policy, even without running the isolates.

Serialization

One approach to get the size of the value stored in the cache is serialization. The downside is the computational overhead of transforming the object into a byte sequence.

Instrumentation

Another way of getting the object's size is to use JVM instrumentation. This approach requires running JVM with javaagent attached, which can complicate the deployment and have indirect performance penalties.

Manual Memory Management

This method implies tracking the size of the values by hand. It can be done for the values of Enso language, knowing the size of its primitive types. For interacting with other languages, Enso can provide an interface that developers should implement to have a better experience with the cache.

Comparison of Memory Management Approaches

Below are some key takeaways after experimenting with the instrumentation and serialization approaches.

Enso Runtime Benchmark

The existing runtime benchmark was executed with the java agent attached to measure the impact of instrumentation.

Serialization Benchmark

FST library was used for the serialization benchmark. It doesn't require an explicit scheme and relies on Java Serializable interface.

Instrumentation Benchmark

Java `Instrumentation#getObjectSize() can only provide a shallow memory of the object. It does not follow the references and only takes into account public fields containing primitive types.

Benchmark used the MemoryUtil#deepMemoryUsageOf function of Classmexer library. It utilizes Java reflection to follow the references and access the private fields of the object.

Benchmark Results

Benchmarks measured Java array, java.util.LinkedList, and a custom implementation of lined list ConsList, an object that maintains references for its head and tail.

  public static class ConsList<A> {

    private A head;
    private ConsList<A> tail;
...
}

Function execution time measured in milliseconds per operation (lower is better).

Benchmark                               (size)  Mode  Cnt     Score     Error  Units
FstBenchmark.serializeArray               1000  avgt    5    21.862 ±   0.503  us/op
FstBenchmark.serializeConsList            1000  avgt    5   151.791 ±  45.200  us/op
FstBenchmark.serializeLinkedList          1000  avgt    5    38.139 ±  12.932  us/op
InstrumentBenchmark.memoryOfArray         1000  avgt    5    17.700 ±   0.068  us/op
InstrumentBenchmark.memoryOfConsList      1000  avgt    5  1706.224 ±  61.631  us/op
InstrumentBenchmark.memoryOfLinkedList    1000  avgt    5  1866.783 ± 557.296  us/op
  • There are no slowdowns in running the Enso runtime benchmark with the instrumentation javaagent attached.
  • Serialization works in microsecond time range and operates on all Java objects implementing Serializable interface.
  • Java `Instrumentation#getObjectSize() performs in nanosecond time range. The deep inspection approach based on the reflection was significantly slower than the serialization.

The resulting approach can be a combination of one of the approaches with the introspection of the value. For example, it can be a case statement, analyzing the value and applying the appropriate method.

public long getSize(Object value) {
  if (value instanceof Primitive) {
    return getPrimitiveSize(value);
  } else if (value instanceof EnsoNode) {
    return introspectNode(value);
  } else {
    return null;
  }
}