---
layout: developer-doc
title: Runtime Guide
category: summary
tags: [contributing, guide, graal, truffle]
order: 7
---
# Runtime Guide
## GraalVM and Truffle
### Papers
1. [One VM To Rule Them All](http://lafo.ssw.uni-linz.ac.at/papers/2013_Onward_OneVMToRuleThemAll.pdf)
a high-level overview of what GraalVM is and how it works.
2. [Practical Partial Evaluation for High-Performance Dynamic Language Runtimes](https://chrisseaton.com/rubytruffle/pldi17-truffle/pldi17-truffle.pdf)
an introduction to basic Truffle concepts, including Polymorphic Inline
Caches and other frequently used techniques.
3. [Fast, Flexible, Polyglot Instrumentation Support for Debuggers and other Tools](https://arxiv.org/pdf/1803.10201.pdf)
an introduction to the Truffle instrumentation framework; this is what Enso's
runtime server for the IDE uses.
4. [Cross-Language Interoperability in a Multi-Language Runtime](https://chrisseaton.com/truffleruby/cross-language-interop.pdf)
an introduction to how Truffle cross-language interop works.
5. [The whole list of publications](https://www.graalvm.org/community/publications/)
because something may be useful at some point.
### Tutorials
1. [The list of Truffle docs on specific topics](https://github.com/oracle/graal/tree/master/truffle/docs)
   Certain more advanced topics are covered in these; use them as needed.
   1. [Optimizing Tutorial](https://github.com/oracle/graal/blob/master/truffle/docs/Optimizing.md)
      you'll want to read this one for sure.
   2. [TruffleLibrary Tutorial](https://github.com/oracle/graal/blob/master/truffle/docs/TruffleLibraries.md)
      this is an important architectural concept for building Truffle
      interpreters. We wish we had known about it sooner, and we recommend
      structuring the interpreter around it in the future.
2. [A tutorial on building a LISP in Truffle](https://cesquivias.github.io/blog/2015/01/15/writing-a-language-in-truffle-part-4-adding-features-the-truffle-way/)
It's a 4-part tutorial; the link points to part 4, but start with part 2 (part
1 is not about Truffle). This one is important: even though it is old and uses
stale APIs, it still highlights the most important concepts, in particular the
way Enso deals with lexical scopes and Tail Call Optimization.
3. [Simple Language](https://github.com/graalvm/simplelanguage) is an
implementation of a very simple toy language. Read it for a basic
understanding of Truffle concepts.
### Tips and Tricks
1. Familiarize yourself with
[IGV](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Profiling/).
It's a horrible tool. It's clunky, ugly, and painful to use. It has also
saved us more times than we can count, so it is definitely worth investing the
time to understand it. Download
[Enso Language Support for IGV](../tools/enso4igv/README.md). Use
[this tutorial](https://shopify.engineering/understanding-programs-using-graphs)
(and
[the follow-up post](https://chrisseaton.com/truffleruby/basic-truffle-graphs/))
to familiarize yourself with the representation.
2. Use our sbt
[`withDebug`](https://github.com/enso-org/enso/blob/develop/project/WithDebugCommand.scala)
utility. Familiarize yourself with the different options. It is a useful
helper for running your programs and microbenchmarks with different Truffle
debugging options.
3. Use [hsdis](https://github.com/liuzhengyang/hsdis/) for printing the
generated assembly; you can often spot obvious problems with compilations.
That being said, IGV (with
[Enso Language Support](../tools/enso4igv/README.md)) is usually the better
tool if you look at the later compilation stages.
4. Pay attention to making things `final` and `@CompilationFinal`. This is the
most important way Graal does constant-folding. Whenever a loop bound can be
compilation final, take advantage of it (and use `@ExplodeLoop`).
5. Read the code that the DSL generates for your nodes. Learning the DSL is
quite difficult and the documentation is sorely lacking. It is best to
experiment with different kinds of `@Specialization` and read the generated
code. Without this understanding, it's way too easy to introduce very subtle
bugs into the language semantics.
6. Join the [GraalVM Slack server](https://www.graalvm.org/slack-invitation/).
All the authors are there and they will happily help and answer any
questions.
7. Be aware that Truffle Instrumentation is more constrained than it could be,
because it wants to be language agnostic. The Enso runtime server is
Enso-specific and therefore you may be better served in the future by rolling
your own instrumentation. Read the instrumentation sources; they will help you
understand how non-magical it actually is.
8. Clone the sources of Truffle and TruffleRuby. Set them up as projects in your
IDE. Read the code when in doubt. Truffle documentation is really lacking
sometimes, even though it is improving.
9. Understand the boundary between the language-side APIs (see e.g.
`InteropLibrary`) and the embedder-side APIs (see `Value`). You want to make sure you
use the proper APIs in the proper places in the codebase. As a rule of thumb:
all code in the `runtime` project is language/instrumentation-side. All code
elsewhere is embedder-side. In particular, the only Graal dependency in
embedder code should be `graal-sdk`. If you find yourself pulling things like
`truffle-api`, you've done something wrong. Similarly, if you ever import
anything from `org.graalvm.polyglot` in the language code, you're doing
something wrong.
10. Avoid
[deoptimizations](https://www.graalvm.org/22.2/graalvm-as-a-platform/language-implementation-framework/Optimizing/#debugging-deoptimizations).
Understanding IGV graphs can be a very time-consuming and complex process
(even with the help of [Enso tooling for IGV](../tools/enso4igv/README.md)).
Sometimes it is sufficient to only look at the compilation traces to
discover repeated or unnecessary deoptimizations which can significantly
affect the overall performance of your program. You can tell the runner to
generate compilation traces via additional options:
    ```
    JAVA_OPTS="-Dpolyglot.engine.TracePerformanceWarnings=all -Dpolyglot.engine.TraceTransferToInterpreter=true -Dpolyglot.engine.TraceDeoptimizeFrame=true -Dpolyglot.engine.TraceCompilation=true -Dpolyglot.engine.TraceCompilationDetails=true"
    ```
    Make sure you print trace logs by using `--log-level TRACE`.
11. Occasionally a piece of code runs slower than we anticipated. Analyzing
Truffle inlining traces may reveal locations that one thought would be
inlined but Truffle decided otherwise. Rewriting such locations into builtin
methods or a more inliner-friendly representation can significantly improve
the performance. You can tell the runner to generate inlining traces via
additional options:
    ```
    JAVA_OPTS="-Dpolyglot.engine.TraceInlining=true -Dpolyglot.engine.TraceInliningDetails=true"
    ```
    Make sure you print trace logs by using `--log-level TRACE`. See the
    [documentation](https://www.graalvm.org/22.2/graalvm-as-a-platform/language-implementation-framework/Inlining/#call-tree-states)
    for an explanation of inlining decisions.
## Code & Internal Documentation Map
In addition to the subsections here, go through the
[existing documentation](https://github.com/enso-org/enso/tree/develop/docs).
### Entry Points
1. See `Main` in `engine-runner` and `Language` in `runtime`. The former is the
embedder-side entry point, the latter the language-side one. They do a bit of
ping-pong through the polyglot APIs. That is unfortunate, as this API is
stringly typed. Therefore, chase the usages of method-name constants to jump
between the language-side implementations and the embedder-side calls.
Alternatively, step through the flow in a debugger.
2. Look at the `MainModule` in `language-server` and `RuntimeServerInstrument`
in `runtime`. These are the entry points for the IDE, with the
language/embedder boundary as usual, but with a server-like message exchange
instead of polyglot API use.
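To make the stringly typed ping-pong concrete, here is a minimal plain-Java sketch. It deliberately avoids the real polyglot APIs (in Truffle terms the two sides roughly correspond to `Value.invokeMember` on the embedder side and the `invokeMember` interop message on the language side), and `ServiceMethodNames`, `LanguageSideModule`, and `EmbedderSide` are hypothetical names. It only illustrates that a shared string constant is the single link between caller and callee, which is why chasing its usages is the quickest way to jump between them.

```java
import java.util.Arrays;

/** Hypothetical shared constants: the only link between the two sides is these strings. */
final class ServiceMethodNames {
  static final String EVAL_MODULE = "eval_module";
  static final String GET_METHOD = "get_method";

  private ServiceMethodNames() {}
}

/** Stand-in for a language-side object that answers string-named messages. */
final class LanguageSideModule {
  Object invokeMember(String member, Object... args) {
    switch (member) {
      case ServiceMethodNames.EVAL_MODULE:
        return "evaluated " + args[0];
      case ServiceMethodNames.GET_METHOD:
        return "method " + args[0];
      default:
        // A misspelled literal on the caller side only surfaces here, at run time.
        throw new UnsupportedOperationException(
            "unknown member " + member + " with arguments " + Arrays.toString(args));
    }
  }
}

/** Stand-in for embedder-side code that drives the language through the string API. */
final class EmbedderSide {
  private EmbedderSide() {}

  static Object evalModule(LanguageSideModule module, String source) {
    // Using the shared constant (not a literal) keeps the call site searchable.
    return module.invokeMember(ServiceMethodNames.EVAL_MODULE, source);
  }
}
```

Because the compiler cannot check the string itself, referencing the constant on both sides is what makes "find usages" work across the boundary.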
### Compiler
Look at `Compiler` in `runtime`. It is the main compiler class and the flow
should be straightforward. At a high level, the compiler alternates
between running module-local passes (currently in 3 groups) and global join
points, where information flows between modules.
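The alternation between module-local passes and global join points can be sketched as follows. This is not the actual `Compiler` API: `ModulePass`, `JoinPoint`, and `PassDriver` are hypothetical, the "IR" is just a string, and the number of pass groups is a parameter rather than being fixed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A pass that only sees one module's IR (hypothetical; the "IR" is a String here). */
interface ModulePass {
  String run(String moduleName, String ir);
}

/** A global step that sees every module's IR, e.g. to propagate exported names. */
interface JoinPoint {
  void run(Map<String, String> irByModule);
}

/** Alternates module-local pass groups with global join points. */
final class PassDriver {
  private final List<List<ModulePass>> groups;
  private final List<JoinPoint> joins;

  PassDriver(List<List<ModulePass>> groups, List<JoinPoint> joins) {
    if (groups.size() != joins.size()) {
      throw new IllegalArgumentException("expected one join point per pass group");
    }
    this.groups = groups;
    this.joins = joins;
  }

  Map<String, String> compile(Map<String, String> sources) {
    Map<String, String> ir = new LinkedHashMap<>(sources);
    for (int g = 0; g < groups.size(); g++) {
      // Module-local phase: each module is processed independently.
      for (Map.Entry<String, String> module : ir.entrySet()) {
        String current = module.getValue();
        for (ModulePass pass : groups.get(g)) {
          current = pass.run(module.getKey(), current);
        }
        module.setValue(current);
      }
      // Join point: information may now flow between modules.
      joins.get(g).run(ir);
    }
    return ir;
  }
}
```

The key property is that every join point observes all modules after the preceding local group has finished, so global information never depends on the order in which modules were processed.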
### Interpreter
There are a few very convoluted spots in the interpreter, with non-trivial
design choices. Here's a list with some explanations:
1. **Function Call Flow**: It is quite difficult to efficiently call an Enso
function. Enso allows passing arguments by name and supports currying,
eta-expansion, and defaulted argument values. It also has to deal with
polyglot method calls. Finally, it has to be instrumentable, to enable the
"enter a method via call site" functionality of the IDE. Start reading from
`ApplicationNode` and follow the execute methods (or `@Specialization`s).
There are a lot of them, but don't get too scared. It is also outlined
[here](https://github.com/enso-org/enso/blob/develop/docs/runtime/function-call-flow.md).
2. **Polyglot Code**: While for some languages (Java, Ruby and Python) it is
straightforward and very Truffle-like, for others (JS and R) it becomes
tricky. The reason is that Truffle places strong limitations on threading in
these languages and it is impossible to call JS and R from a multithreaded
language context (like Enso's). For this reason, we have a special, internal
sub-language, running on 2 separate Truffle contexts, exposing the
single-threaded languages in a safe way (through a GIL). The language is called EPB
(Enso Polyglot Bridge) and lives in
[this subtree](https://github.com/enso-org/enso/tree/develop/engine/runtime/src/main/java/org/enso/interpreter/epb).
To really understand it, you'll need to familiarize yourself with what a
[TruffleContext](https://www.graalvm.org/truffle/javadoc/com/oracle/truffle/api/TruffleContext.html)
is and how it relates to polyglot and language contexts (oh, and also get
ready to work with about 7 different meanings of the word `Context`...).
3. **Threading & Safepoints**: Enso has its own safepointing system and a thread
manager. The job of the thread manager is to halt all the executing threads
when needed. Safepoints are polled during normal code execution (usually at
the start of every non-inlined method call and at each iteration of a TCO
loop). See
[the source](https://github.com/enso-org/enso/blob/develop/engine/runtime/src/main/java/org/enso/interpreter/runtime/ThreadManager.java).
4. **Resource Finalization**: Enso exposes a system for automatic resource
finalization. This is non-trivial on the JVM and is handled in the
[ResourceManager](https://github.com/enso-org/enso/blob/develop/engine/runtime/src/main/java/org/enso/interpreter/runtime/ResourceManager.java).
5. **Builtin Definitions**: Certain basic functions and types are exposed
directly from the interpreter. They are currently all bundled in a virtual
module called `Standard.Builtins`. See
[the Builtins class](https://github.com/enso-org/enso/blob/develop/engine/runtime/src/main/java/org/enso/interpreter/runtime/builtin/Builtins.java)
to learn how that module is constructed. There's also a Java-side
annotation-driven DSL for automatic generation of builtin method boilerplate.
See nodes in
[this tree](https://github.com/enso-org/enso/tree/develop/engine/runtime/src/main/java/org/enso/interpreter/runtime/builtin)
to get an idea of how it works. Also
[read the doc](https://github.com/enso-org/enso/blob/develop/docs/runtime/builtin-base-methods.md).
6. **Standard Library Sources**: These are very non-magical, just plain old
Enso projects that get shipped with every compiler release. They live
[in this tree](https://github.com/enso-org/enso/tree/develop/distribution/lib/Standard)
and are tested through
[these projects](https://github.com/enso-org/enso/tree/develop/test). The
standard library also makes heavy use of host interop. The Java methods used
by the standard library are located in
[this directory](https://github.com/enso-org/enso/tree/develop/std-bits).
7. **Microbenchmarks**: There are some microbenchmarks for tiny Enso programs
for basic language constructs. They are located in
[this directory](https://github.com/enso-org/enso/tree/develop/engine/runtime/src/bench).
They can be run through `sbt runtime-benchmarks/bench`. Each run will
generate (or append to) the `bench-report.xml` file. See
[Benchmarks](infrastructure/benchmarks.md) for more information about the
benchmarking infrastructure.
8. **Tests**: There are scalatests that comprehensively test all of the language
semantics and compiler passes. These are run with
`sbt runtime-integration-tests/test`. For newer functionality, we prefer
adding tests to the `Tests` project in the standard library test suite. At
this point, Enso is mature enough to self-test.
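The argument handling described in the **Function Call Flow** item above can be illustrated with a heavily simplified binding sketch: named arguments are matched first, positional ones fill the remaining slots left to right, defaults fill the rest, and any slot still empty makes the call a partial application (currying). `CallArg` and `Signature` are hypothetical, `null` marks an unfilled slot, and the binding order is an assumption for illustration; eta-expansion, polyglot calls, and instrumentation are ignored. Consult `ApplicationNode` for Enso's real rules.

```java
import java.util.Arrays;
import java.util.List;

/** One call-site argument; positional when name is null (hypothetical type). */
final class CallArg {
  final String name;
  final Object value;

  private CallArg(String name, Object value) {
    this.name = name;
    this.value = value;
  }

  static CallArg positional(Object value) { return new CallArg(null, value); }

  static CallArg named(String name, Object value) { return new CallArg(name, value); }
}

/** Declared arguments of a function; a null default means the argument is required. */
final class Signature {
  private final List<String> names;
  private final Object[] defaults;

  Signature(String[] names, Object[] defaults) {
    this.names = Arrays.asList(names.clone());
    this.defaults = defaults.clone();
  }

  /** Binds call-site arguments to slots; a null slot means "still missing". */
  Object[] bind(List<CallArg> args) {
    Object[] slots = new Object[names.size()];
    boolean[] filled = new boolean[names.size()];
    // 1. Named arguments go straight to their slot.
    for (CallArg arg : args) {
      if (arg.name != null) {
        int i = names.indexOf(arg.name);
        if (i < 0) throw new IllegalArgumentException("unknown argument: " + arg.name);
        slots[i] = arg.value;
        filled[i] = true;
      }
    }
    // 2. Positional arguments fill the remaining slots left to right.
    int next = 0;
    for (CallArg arg : args) {
      if (arg.name == null) {
        while (next < slots.length && filled[next]) next++;
        if (next == slots.length) throw new IllegalArgumentException("too many arguments");
        slots[next] = arg.value;
        filled[next] = true;
      }
    }
    // 3. Defaults fill whatever is still empty.
    for (int i = 0; i < slots.length; i++) {
      if (!filled[i] && defaults[i] != null) slots[i] = defaults[i];
    }
    return slots;
  }

  /** A missing slot means the call yields a curried function rather than a result. */
  static boolean isPartial(Object[] slots) {
    for (Object slot : slots) {
      if (slot == null) return true;
    }
    return false;
  }
}
```

For a function `f a b (c = 10)`, the call `f (b = 2) 1` binds to `[1, 2, 10]`, while `f 1` leaves `b` missing and would therefore produce a curried function.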
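The GIL mentioned in the **Polyglot Code** item boils down to funnelling every call into a single-threaded component through one lock. Below is a minimal sketch using `java.util.concurrent.locks.ReentrantLock`; `Gil` is a hypothetical name, and EPB's real mechanism, built on separate Truffle contexts, is more involved than this.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/**
 * Hypothetical global interpreter lock: every call into the guarded,
 * non-thread-safe component goes through one lock, so at most one thread
 * is inside it at any time.
 */
final class Gil<T> {
  private final ReentrantLock lock = new ReentrantLock();
  private final T guarded;

  Gil(T guarded) {
    this.guarded = guarded;
  }

  <R> R call(Function<T, R> action) {
    lock.lock();
    try {
      return action.apply(guarded);
    } finally {
      // Always release, even if the guarded code throws.
      lock.unlock();
    }
  }
}
```

A reentrant lock lets guarded code call back into the same `Gil` on the same thread without deadlocking; the price is that all callers are serialized, which is acceptable only because the guarded languages cannot be used concurrently anyway.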
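The cooperative polling behind **Threading & Safepoints** can be sketched in plain Java: running code calls a cheap `poll()` at regular points, and the manager flips a `volatile` flag and waits for every registered thread to notice. `SafepointManager` and `HaltedException` are hypothetical names, not the `ThreadManager` API; in the interpreter, the poll sites would be method entries and TCO loop iterations rather than a bare loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Thrown from a safepoint once a halt has been requested (hypothetical). */
final class HaltedException extends RuntimeException {
  private static final long serialVersionUID = 1L;

  HaltedException() {
    super("halted at safepoint");
  }
}

/** Minimal cooperative safepoint manager (hypothetical sketch). */
final class SafepointManager {
  private volatile boolean haltRequested = false;
  private final AtomicInteger active = new AtomicInteger();
  private final Object monitor = new Object();

  /** Registers the calling thread as running managed code. */
  void enter() {
    active.incrementAndGet();
  }

  /** Deregisters the calling thread and wakes a waiting haltAll() if it was the last one. */
  void leave() {
    if (active.decrementAndGet() == 0) {
      synchronized (monitor) {
        monitor.notifyAll();
      }
    }
  }

  /** Safepoint poll: a single volatile read when no halt is pending. */
  void poll() {
    if (haltRequested) {
      throw new HaltedException();
    }
  }

  /** Asks every registered thread to stop at its next poll and waits until all have left. */
  void haltAll() throws InterruptedException {
    haltRequested = true;
    synchronized (monitor) {
      while (active.get() > 0) {
        monitor.wait();
      }
    }
  }
}
```

The notification happens while holding the monitor and `haltAll()` re-checks the counter in a loop, so a thread that leaves just before the manager starts waiting cannot cause a lost wake-up.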
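For **Resource Finalization**, one standard JVM building block is `java.lang.ref.Cleaner` (Java 9+). The sketch below, using a hypothetical `ManagedResource` class, runs a cleanup action at most once: either on an explicit `close()` or some time after the wrapper becomes unreachable. It only illustrates the general technique and is not necessarily how `ResourceManager` is implemented.

```java
import java.lang.ref.Cleaner;

/**
 * Hypothetical wrapper whose cleanup runs at most once: on close(), or after
 * the wrapper becomes phantom reachable, whichever comes first.
 */
final class ManagedResource implements AutoCloseable {
  private static final Cleaner CLEANER = Cleaner.create();

  private final Cleaner.Cleanable cleanable;

  /** The action must not capture this wrapper, or it could never become unreachable. */
  ManagedResource(Runnable cleanupAction) {
    this.cleanable = CLEANER.register(this, cleanupAction);
  }

  @Override
  public void close() {
    // Cleanable.clean() unregisters the action and runs it at most once.
    cleanable.clean();
  }
}
```

Explicit `close()` (or try-with-resources) gives deterministic cleanup; the `Cleaner` path is only a safety net, since the JVM makes no promise about when, or whether, unreachable objects are processed.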
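The annotation-driven builtin DSL mentioned under **Builtin Definitions** generates boilerplate code; the idea of "annotate a plain Java method, get a language-level builtin" can be approximated with runtime reflection instead. Treat this only as an illustration: `@Builtin`, `IntegerBuiltins`, and `BuiltinRegistry` are hypothetical and unrelated to the real annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marker exposing a static Java method as `type.name` to the language. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Builtin {
  String type();

  String name();
}

/** Example builtins, written as plain static methods. */
final class IntegerBuiltins {
  @Builtin(type = "Integer", name = "+")
  static long plus(long a, long b) {
    return a + b;
  }

  @Builtin(type = "Integer", name = "negate")
  static long negate(long a) {
    return -a;
  }
}

/** Collects annotated static methods into a "Type.name" -> Method registry. */
final class BuiltinRegistry {
  private final Map<String, Method> methods = new HashMap<>();

  void scan(Class<?> holder) {
    for (Method m : holder.getDeclaredMethods()) {
      Builtin b = m.getAnnotation(Builtin.class);
      if (b != null && Modifier.isStatic(m.getModifiers())) {
        methods.put(b.type() + "." + b.name(), m);
      }
    }
  }

  Object call(String qualifiedName, Object... args) throws ReflectiveOperationException {
    Method m = methods.get(qualifiedName);
    if (m == null) {
      throw new NoSuchMethodException(qualifiedName);
    }
    // Boxed arguments are unboxed to the declared primitive parameter types.
    return m.invoke(null, args);
  }
}
```

Generating code at build time, as the real DSL does, avoids reflective dispatch entirely, which matters because reflection is opaque to partial evaluation.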
### Language Server
Talk to Dmitry! He's the main maintainer of this part.