* doc: add runtime cache design * Apply suggestions from code review doc: review edits Co-authored-by: Ara Adkins <iamrecursion@users.noreply.github.com> * doc: fix formatting * doc: benchmark result * doc: add sections * doc: update cache backend * doc: remove redundant eviction section * doc: rename eviction strategy section * doc: merge two eviction sections Co-authored-by: Ara Adkins <iamrecursion@users.noreply.github.com>
20 KiB
layout | title | category | tags | order | |||
---|---|---|---|---|---|---|---|
developer-doc | Caching | runtime |
|
1 |
Caching
It is not uncommon for users in data-analysis jobs to work with data on the order of gigabytes or even terabytes. As fast as computers have become, and as efficient as programming languages can be, you still don't want to compute on such large amounts of data unless you absolutely have to.
This wouldn't usually be an issue, with such data-analysis tasks being able to run in a 'batch mode', where the user starts their job in a fire-and-forget fashion. Enso, however, is a highly interactive environment for working with data, where waiting seconds, let alone hours, would severely hamper the user experience.
Given that Enso is a highly interactive language and platform, we want to take every measure to ensure that we provide a highly responsive experience to our users. To that end, one of the key tenets of the new runtime's featureset for aiding in this is the inclusion of a caching mechanism.
Caching, in this case, refers to the runtime's ability to 'remember' the values computed in the currently observed scopes. In combination with the data dependency analysis performed by the compiler, this allows the runtime to recompute the minimal set of expressions when the user makes a change, rather than having to recompute the entire program.
- Cache Candidates
- Partial-Evaluation and Side-Effects
- Cache Eviction Strategy
- Dataflow Analysis
- Cache Backend
- Memory Bounded Caches
Cache Candidates
The key use of the Enso value cache is to support the interactive editing of user code. This means that it caches all bindings within the scope of a given function, including the function arguments. This means that, as users edit their code, we can ensure that the minimal amount of their program is recomputed.
Consider the following example:
foo a b =
c = a.frob b
d = c.wibble b
a.quux d
The cache is active for the currently visible scope in Enso Studio, so when a
user enters the function foo
, the cache stores the intermediate results in
this function (in this case c
and d
), as well as the inputs to the function
(in this case a
, and b
).
All intermediate results and inputs are considered as candidates, though as the cache design evolves, the selected candidates may be refined. Ultimately we want to cache and reuse as much as possible to minimize the computation costs. At the same time, we want to limit the memory consumed by the cache.
Initial Cache Candidates
The initial version of the cache only stores the right-hand-sides of binding expressions. This is for two main reasons:
- Firstly, this allows us to ensure that the cache does not cause the JVM to go out of memory between executions, allowing us to avoid the implementation of a memory-bounded cache for now.
- It also simplifies the initial implementation of weighting program components.
Further Development of Cache Candidates
The next step for the cache is to expand the portions of the introspected scope that we cache. In general, this means the caching of intermediate expressions.
However, once we do this, we can no longer guarantee that we do not push the JVM out of memory between two program executions. This is best demonstrated by example.
a = (computeHugeObject b).size
Here we compute a value that takes up a significant amount of memory, but from
it we only compute a small derived value (its size). Hence, if we want to cache
the intermediate result of the discarded computeHugeObject b
expression, we
need some way of tracking the sizes of individual cache entries.
Partial-Evaluation and Side-Effects
The more theoretically-minded people among those reading this document may instantly realise that there is a problem with this approach. In the presence of caching, it becomes entirely unpredictable as to when side effects are executed. This is problematic in that side-effecting computations are rarely idempotent, and problems might be caused by executing them over and over again.
Furthermore, the nature of the interpreter's support for entering functions inherently requires that it recompute portions of that function in a different context, thereby potentially re-evaluating side-effecting computations as well.
In general, it is clear that many kinds of side effect have problems in the presence of caching and partial-evaluation.
Side Effects in the Initial Version
Many of the mechanisms required to deal with this kind of issue properly are complex and require deep type-level support in the compiler. To that end, the initial version of the interpreter is going to pretend that the problem doesn't really exist.
- All intermediate values will be cached.
- Cached values will be recomputed as necessary as described in the section on initial eviction strategies.
This can and will recompute side-effecting computations indiscriminately, but we cannot initially do much better.
A Stopgap
While the compiler won't have the machinery in place to properly track
information about side-effects, we can implement a stop-gap solution that at
least allows the GUI to allow users to make the decision about whether or not to
recompute a side-effecting value. This is very similar to the initial approach
used for functions with arguments marked as Suspended
, and works as follows:
- We provide explicit signatures (containing
IO
) for functions that perform side-effects. - Whenever the runtime wants to recompute a value that performs side-effects, it can use this information to ask for user input.
- We can also display a box on these types
always_reevaluate
that lets users opt in to automatic re-evaluation of these values.
In The Future
As the compiler evolves, however, we can do better than this. In particular, we can employ type-system information to determine which functions are side-effecting (in absence of annotations), and to class some kinds of said functions as safe for either caching, re-evaluation, or both. What follows is a brief sketch of how this might work:
- Rather than having a single type capturing side effects (like
IO
in Haskell) we divide the type up into fine-grained descriptions of side-effects that let us better describe the particular behaviours of given functions (e.g.IO.Read
,IO.Write
), all of which are more-specific versions of the baseIO
type. - We provide a set of interfaces that determine whether a given kind of side
effect can be safely cached or re-evaluated (e.g.
No_Cache
orNo_Reevaluate
). - We can use this information to ask the user about recomputation in far less situations.
The actionables for this section are:
- Evolve the strategy for handling side effects as the compiler provides more capabilities that will be useful in doing so.
Cache Eviction Strategy
The cache eviction strategy refers to the method by which we determine which entries in the cache are invalidated (if any) after a given change to the code.
Initial Eviction Strategy
In the initial version of the caching mechanism, the eviction strategies are intended to be fairly simplistic and conservative to ensure correctness.
- The compiler performs data-dependency analysis between expressions.
- If an expression is changed, all cached values for expressions that depend on it are evicted from the cache.
- Expressions that have been evicted from the cache subsequently have to be recomputed by the runtime.
The following rules are applied when an expression identified by some key k
is
changed:
- All expressions that depend on the result of
k
are evicted from the cache. - If
k
is a dynamic symbol, all expressions that depend on any instance of the dynamic symbol are evicted from the cache.
Future Eviction Strategies
In the future, however, the increasing sophistication of the front-end compiler for Enso will allow us to do better than this by accounting for more granular information in the eviction decisions.
Cache eviction takes into account the following aspects:
- Visualization: In the first place, we should care about nodes that have visualization attached in the IDE.
- Priority: The runtime can assign a score to a node, meaning how valuable this node is. Less valuable nodes should be evicted first.
- Computation time: The runtime can calculate the time that node took to compute. Less computationally intensive nodes should be evicted first. Memory limit. The cache should not exceed the specified memory limit.
The actionables for this section are:
- Evolve the cache eviction strategy by employing more granular information as the compiler evolves to provide it.
Dataflow Analysis
Dataflow analysis is the process by which the compiler discovers the relationships between program expressions. The output of the process is a data dependency graph that can be queried for an expression, and returns the set of all expressions that depended on that expression.
Internally we represent this as a directed graph:
- An edge from
a
tob
indicates that the expressiona
is depended on by the expressionb
. - These dependencies are direct dependencies on
a
. - We reconstruct transitive dependencies from this graph.
An expression a
can be any Enso expression, including definitions of dynamic
symbols. Given that dynamic symbols need not be in scope, care has to be taken
with registering them properly.
Each expression in the compiler IR is annotated with both the set of expressions that depend on it, and the set of expressions that it depends on.
Identifying Expressions
Expressions are identified, for the purposes of dataflow analysis, by unique identifiers on every IR node. The dataflow analysis process creates a dependency graph between these identifiers.
However, at runtime, the IDE uses a different and separate concept of identifiers. Translating between these external identifiers and the internal identifiers is left to the runtime and is not the responsibility of the dataflow analysis pass.
Specifying Dataflow
Dataflow analysis takes place on the core set of language constructs, defined
as those that extend IRKind.Primitive
. Their dataflow is specified as follows,
with arrows representing edges in the graph.
Atom
An atom is dependent on the definitions of its arguments, particularly with regard to any defaults.
atom <- arguments
Method
A method is dependent on the definition of its body. Methods at the point that dataflow analysis runs are 'bare' methods, meaning that they are defined as functions.
method <- body
Block
The value of a block is dependent purely on the value of its return expression. The return expression may depend on other values.
block <- returnValue
Binding
The value of a binding is dependent both on the name of the binding and the expression being assigned in the binding.
binding <- name
binding <- expression
Lambda
The value of a lambda is dependent on the return value from its body, as well as the definitions of any defaults for its arguments.
lambda <- body
lambda <- argumentDefaults
Definition Argument
The value of a function definition argument is dependent purely on the value of its default, if that default is present.
defArgument <- defaultExpression
Prefix Application
The value of a prefix application is dependent on the values of both the function expression being called, and the arguments.
prefix <- function
prefix <- arguments
Call Argument
The value of a call argument is dependent both on the value that it's wrapping, as well as the name it has, if it exists.
callArgument <- argumentValue
callArgument <- argumentName
Forced Term
A forced term is purely dependent on the value of the term being forced (the
target
).
force <- target
Typeset Members
A typeset member is dependent on the definition of its label, as well as the possibly present definitions of its type and value.
typesetMember <- label
typesetMember <- memberType
typesetMember <- memberValue
Typing Operators
All typing operators in Enso (IR.Type
) are dependent on their constituent
parts:
typingExpr <- expressionChildren
Name
An occurrence of a name is dependent on the definition site of that name. This means that it is broken down into two options:
-
Static Dependency: The definition site for a given usage can be statically resolved.
name <- staticUseSite
-
Dynamic Dependency: The definition site for a given usage can only be determined to be a symbol resolved dynamically.
name <- dynamicSymbol
Under these circumstances, if any definition for
dynamicSymbol
changes, then all usages of that symbol must be invalidated, whether or not they used the changed definition in particular.
Case Expressions
The value of a case expression depends on the value of its scrutinee, as well as the definitions of its branches.
case <- scrutinee
case <- branches
case <- fallback
Case Branches
The value of a case branch depends on both the pattern expression and the result expression.
caseBranch <- casePattern
caseBranch <- expression
Comments
The value of a comment is purely dependent on the value of the commented entity.
comment <- commented
Cache Backend
The cache is implemented as key-value storage with an eviction function.
The cache stores the right-hand side expressions of the bindings in the key-value storage. The storage can be as simple as a Hash Map with values wrapped into the Soft References as a fallback strategy of clearing the cache. The eviction function purges invalidated expressions from previous program execution.
Further development
Cache intermediate results of expressions to reduce the cost of new computations and extend the eviction strategy to clear the cache based on memory consumption.
Extend the eviction strategy by adding an asynchronous scoring task that computes some properties of stored objects (e.g., the size of the object). Those properties can be used in the eviction strategy as optional clues, improving the hit ratio.
The actionables for this section are:
Evolve the cache by storing the results of intermediate expressions
Evolve the cache eviction strategy implementation by employing more information of the stored values
Memory Bounded Caches
Memory management refers to a way of controlling the size of the cache, avoiding the Out Of Memory errors.
The methods below can be divided into two approaches.
- Limiting the overall JVM memory and relying on garbage collection.
- Calculating the object's size and using it in the eviction strategy.
In general, to control the memory size on JVM, anything besides limiting the total amount involves some tradeoffs.
Soft References
Soft References is a way to mark the cache entries available for garbage collection whenever JVM runs a GC. In practice, it can cause long GC times and reduced overall performance. This strategy is generally considered as a last resort measure.
The effect of the GC can be mitigated by using the isolates (JSR121 implemented in GraalVM). One can think of an isolate as a lightweight JVM, running in a thread with their own heap, memory limit, and garbage collection.
The problem is that isolates can't share objects. And even if we move the cache to the separate isolate, that would require creating a mechanism of sharing objects based on pointers, which requires implementing serialization. On the other hand, serialization itself can provide the size of the object, which is enough to implement the eviction policy, even without running the isolates.
Serialization
One approach to get the size of the value stored in the cache is serialization. The downside is the computational overhead of transforming the object into a byte sequence.
Instrumentation
Another way of getting the object's size is to use JVM instrumentation. This approach requires running JVM with javaagent attached, which can complicate the deployment and have indirect performance penalties.
Manual Memory Management
This method implies tracking the size of the values by hand. It can be done for the values of Enso language, knowing the size of its primitive types. For interacting with other languages, Enso can provide an interface that developers should implement to have a better experience with the cache.
Comparison of Memory Management Approaches
Below are some key takeaways after experimenting with the instrumentation and serialization approaches.
Enso Runtime Benchmark
The existing runtime benchmark was executed with the java agent attached to measure the impact of instrumentation.
Serialization Benchmark
FST library was used
for the serialization benchmark. It doesn't require an explicit scheme and
relies on Java Serializable
interface.
Instrumentation Benchmark
Java `Instrumentation#getObjectSize() can only provide a shallow memory of the object. It does not follow the references and only takes into account public fields containing primitive types.
Benchmark used the MemoryUtil#deepMemoryUsageOf
function of
Classmexer library. It utilizes Java
reflection to follow the references and access the private fields of the object.
Benchmark Results
Benchmarks measured Java array, java.util.LinkedList
, and a custom
implementation of lined list ConsList
, an object that maintains references for
its head and tail.
public static class ConsList<A> {
private A head;
private ConsList<A> tail;
...
}
Function execution time measured in milliseconds per operation (lower is better).
Benchmark (size) Mode Cnt Score Error Units
FstBenchmark.serializeArray 1000 avgt 5 21.862 ± 0.503 us/op
FstBenchmark.serializeConsList 1000 avgt 5 151.791 ± 45.200 us/op
FstBenchmark.serializeLinkedList 1000 avgt 5 38.139 ± 12.932 us/op
InstrumentBenchmark.memoryOfArray 1000 avgt 5 17.700 ± 0.068 us/op
InstrumentBenchmark.memoryOfConsList 1000 avgt 5 1706.224 ± 61.631 us/op
InstrumentBenchmark.memoryOfLinkedList 1000 avgt 5 1866.783 ± 557.296 us/op
- There are no slowdowns in running the Enso runtime benchmark with the
instrumentation
javaagent
attached. - Serialization works in microsecond time range and operates on all Java objects implementing Serializable interface.
- Java `Instrumentation#getObjectSize() performs in nanosecond time range. The deep inspection approach based on the reflection was significantly slower than the serialization.
The resulting approach can be a combination of one of the approaches with the introspection of the value. For example, it can be a case statement, analyzing the value and applying the appropriate method.
public long getSize(Object value) {
if (value instanceof Primitive) {
return getPrimitiveSize(value);
} else if (value instanceof EnsoNode) {
return introspectNode(value);
} else {
return null;
}
}