---
layout: developer-doc
title: Runtime Guide
category: summary
tags:
order: 7
---
# Runtime Guide

## GraalVM and Truffle

### Papers
- One VM To Rule Them All: a high-level overview of what GraalVM is and how it
  works.
- Practical Partial Evaluation for High-Performance Dynamic Language Runtimes:
  an introduction to basic Truffle concepts, including Polymorphic Inline
  Caches and other frequently used techniques.
- Fast, Flexible, Polyglot Instrumentation Support for Debuggers and other
  Tools: an introduction to the Truffle instrumentation framework – this is
  what Enso's runtime server for the IDE uses.
- Cross-Language Interoperability in a Multi-Language Runtime: an introduction
  to how Truffle cross-language interop works.
- The whole list of publications, in case something there turns out to be
  useful at some point.

### Tutorials

- The list of Truffle docs on specific topics: certain more advanced topics
  are covered there; use them as needed.
- Optimizing Tutorial: you'll want to read this one for sure.
- TruffleLibrary Tutorial: this is an important architectural concept for
  building Truffle interpreters. We wish we had known about it sooner, and we
  recommend structuring the interpreter around it in the future.
- A tutorial on building a LISP in Truffle: it's a 4-part tutorial; the link
  goes to part 4, but start with part 2 (part 1 is not about Truffle). This
  one is important – even though it is old and uses stale APIs, it still
  highlights the most important concepts, in particular the way Enso deals
  with lexical scopes and Tail Call Optimization.
- Simple Language: an implementation of a very simple toy language. Read it
  for a basic understanding of core Truffle concepts.

### Tips and Tricks

- Familiarize yourself with IGV. It's a horrible tool: clunky, ugly, and
  painful to use. It has also saved us more times than we can count, so it is
  definitely worth investing the time to understand it. Download Enso Language
  Support for IGV. Use this tutorial (and the follow-up post) to familiarize
  yourself with the representation.
- Use our sbt `withDebug` utility. Familiarize yourself with its different
  options. It is a useful helper for running your programs and microbenchmarks
  with different Truffle debugging options.
- Use hsdis for printing the generated assembly – you can often spot obvious
  problems with compilations. That being said, IGV (with Enso Language Support)
  is usually the better tool if you look at the later compilation stages.
- Pay attention to making things `final` and `@CompilationFinal`. This is the
  most important way Graal does constant-folding. Whenever a loop bound can be
  compilation final, take advantage of it (and use `@ExplodeLoop`); a small
  sketch of these patterns follows after this list.
- Read the generated code for the nodes generated by the DSL. Learning the DSL
  is quite difficult and the documentation is sorely lacking. It is best to
  experiment with different kinds of `@Specialization` and read the generated
  code. Without this understanding, it is way too easy to introduce very
  subtle bugs into the language semantics.
- Join the GraalVM Slack server. All the authors are there and they will
  happily help and answer any questions.
- Be aware that Truffle Instrumentation is more constrained than it could be,
  because it wants to be language-agnostic. The Enso runtime server is
  Enso-specific, and therefore you may be better served in the future by
  rolling your own instrumentation. Read the instrumentation sources; it will
  help you understand how non-magical it actually is.
- Clone the sources of Truffle and TruffleRuby. Set them up as projects in your IDE. Read the code when in doubt. Truffle documentation is really lacking sometimes, even though it is improving.
- Understand the boundary between the language-side APIs (see e.g.
  `InteropLibrary`) and the embedder side (see `Value`). You want to make sure
  you use the proper APIs in the proper places in the codebase. As a rule of
  thumb: all code in the `runtime` project is language/instrumentation-side.
  All code elsewhere is embedder-side. In particular, the only Graal dependency
  in embedder code should be `graal-sdk`. If you find yourself pulling things
  like `truffle-api`, you've done something wrong. Similarly, if you ever
  import anything from `org.graalvm.polyglot` in the language code, you're
  doing something wrong.
- Avoid deoptimizations. Understanding IGV graphs can be a very time-consuming
  and complex process (even with the help of the Enso tooling for IGV).
  Sometimes it is sufficient to only look at the compilation traces to discover
  repeated or unnecessary deoptimizations, which can significantly affect the
  overall performance of your program. You can tell the runner to generate
  compilation traces via additional options:
  ```
  JAVA_OPTS="-Dpolyglot.engine.TracePerformanceWarnings=all -Dpolyglot.engine.TraceTransferToInterpreter=true -Dpolyglot.engine.TraceDeoptimizeFrame=true -Dpolyglot.engine.TraceCompilation=true -Dpolyglot.engine.TraceCompilationDetails=true"
  ```
  Make sure you print trace logs by using `--log-level TRACE`.
- Occasionally a piece of code runs slower than we anticipated. Analyzing
  Truffle inlining traces may reveal locations that one thought would be
  inlined but where Truffle decided otherwise. Rewriting such locations into
  builtin methods or a more inliner-friendly representation can significantly
  improve performance. You can tell the runner to generate inlining traces via
  additional options:
  ```
  JAVA_OPTS="-Dpolyglot.engine.TraceInlining=true -Dpolyglot.engine.TraceInliningDetails=true"
  ```
  Make sure you print trace logs by using `--log-level TRACE`. See the
  documentation for the explanation of inlining decisions.
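
To make the `final`/`@CompilationFinal` and DSL tips above concrete, here is a
minimal, hypothetical Truffle node (a toy example, not taken from the Enso
codebase) showing a compilation-final loop bound with `@ExplodeLoop`, a lazily
initialized `@CompilationFinal` field, and a DSL `@Specialization`:

```java
import com.oracle.truffle.api.CompilerDirectives;
import com.oracle.truffle.api.CompilerDirectives.CompilationFinal;
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.nodes.ExplodeLoop;
import com.oracle.truffle.api.nodes.Node;

// Toy node summing a fixed number of arguments; illustrative only.
public abstract class SumArgumentsNode extends Node {

  // A `final` field is a constant during partial evaluation, so it can serve
  // as an @ExplodeLoop bound and the loop below unrolls completely.
  private final int argumentCount;

  // A logically final field: written only behind a deoptimization and marked
  // @CompilationFinal, so Graal may constant-fold reads of it in compiled code.
  @CompilationFinal private long cachedOffset = -1;

  SumArgumentsNode(int argumentCount) {
    this.argumentCount = argumentCount;
  }

  public abstract long execute(Object[] arguments, long offset);

  @ExplodeLoop
  @Specialization
  long doSum(Object[] arguments, long offset) {
    if (cachedOffset == -1) {
      // Changing compilation-final state must first invalidate compiled code.
      CompilerDirectives.transferToInterpreterAndInvalidate();
      cachedOffset = offset;
    }
    long sum = 0;
    for (int i = 0; i < argumentCount; i++) {
      sum += ((Number) arguments[i]).longValue() + cachedOffset;
    }
    return sum;
  }
}
```

The DSL generates a `SumArgumentsNodeGen` class with the dispatch and
rewriting logic; reading that generated code, as suggested above, is the
quickest way to understand what `@Specialization` really does.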
## Code & Internal Documentation Map

Other than the subsections here, go through the existing documentation.

### Entry Points

- See `Main` in `engine-runner` and `Language` in `runtime`. The former is the
  embedder-side entry point, the latter the language-side one. They do a bit
  of ping-pong through the polyglot APIs. That is unfortunate, as this API is
  stringly typed. Therefore, chase the usages of method-name constants to jump
  between the language-side implementations and the embedder-side calls.
  Alternatively, step through the flow in a debugger. (A minimal embedder-side
  sketch follows after this list.)
- Look at the `MainModule` in `language-server` and `RuntimeServerInstrument`
  in `runtime`. This is the entry point for the IDE, with the language/embedder
  boundary as usual, but with a server-like message exchange instead of
  polyglot API use.
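
To illustrate the embedder side of this boundary, a minimal sketch of
evaluating Enso code purely through the polyglot API (`graal-sdk` types only)
might look as follows. The `"enso"` language id, the options, and the snippet
are assumptions for illustration; `Main` in `engine-runner` is the
authoritative reference for the real setup:

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.graalvm.polyglot.Value;

// Embedder-side sketch: only graal-sdk types (Context, Source, Value) appear,
// never truffle-api classes.
public final class EmbedderSketch {
  public static void main(String[] args) throws Exception {
    try (Context context = Context.newBuilder("enso") // "enso" id is an assumption
        .allowAllAccess(true)
        .build()) {
      Source source = Source.newBuilder("enso",
          "main = 6 * 7",        // hypothetical Enso program
          "Sketch.enso").build();
      Value result = context.eval(source);
      System.out.println(result); // inspect results via Value on the embedder side
    }
  }
}
```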

### Compiler

Look at `Compiler` in `runtime`. It is the main compiler class and the flow
should be straightforward. A high-level overview: the compiler alternates
between running module-local passes (currently in 3 groups) and global join
points, where information flows between modules.
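
As a hypothetical sketch of that flow (names here are illustrative, not the
actual `Compiler` API), the compiler effectively runs each pass group on every
module's IR and interleaves global join points between the groups:

```java
import java.util.List;

// Illustrative pass-ordering skeleton; the real Compiler class differs in
// names, types, and details.
final class PassOrderingSketch {
  interface ModulePass { void runOn(Object moduleIr); }            // module-local pass
  interface JoinPoint { void runAcross(List<Object> moduleIrs); }  // global join point

  void compileAll(List<Object> moduleIrs,
                  List<List<ModulePass>> passGroups,               // currently 3 groups
                  List<JoinPoint> joinPoints) {
    for (int group = 0; group < passGroups.size(); group++) {
      // Run this group of passes on every module independently.
      for (Object moduleIr : moduleIrs) {
        for (ModulePass pass : passGroups.get(group)) {
          pass.runOn(moduleIr);
        }
      }
      // Then a global join point lets information flow between modules.
      if (group < joinPoints.size()) {
        joinPoints.get(group).runAcross(moduleIrs);
      }
    }
  }
}
```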

### Interpreter

There are a few very convoluted spots in the interpreter, with non-trivial design choices. Here's a list with some explanations:
- Function Call Flow: It is quite difficult to efficiently call an Enso
  function. Enso allows passing arguments by name, and supports currying,
  eta-expansion, and defaulted argument values. It also has to deal with
  polyglot method calls. And it has to be instrumentable, to enable the "enter
  a method via call site" functionality of the IDE. Start reading from
  `ApplicationNode` and follow the execute methods (or `@Specialization`s).
  There are a lot of them, but don't get too scared. It is also outlined here.
- Polyglot Code: While for some languages (Java, Ruby and Python) it is
  straightforward and very Truffle-like, for others (JS and R) it becomes
  tricky. The reason is that Truffle places strong limitations on threading in
  these languages and it is impossible to call JS and R from a multithreaded
  language context (like Enso's). For this reason, we have a special, internal
  sub-language, running on 2 separate Truffle contexts, exposing the
  single-threaded languages in a safe way (through a GIL). The language is
  called EPB (Enso Polyglot Bridge) and lives in this subtree. To really
  understand it, you'll need to familiarize yourself with what a
  `TruffleContext` is and how it relates to polyglot and language contexts
  (oh, and also get ready to work with about 7 different meanings of the word
  `Context`...).
- Threading & Safepoints: Enso has its own safepointing system and a thread
  manager. The job of the thread manager is to halt all the executing threads
  when needed. Safepoints are polled during normal code execution (usually at
  the start of every non-inlined method call and at each iteration of a TCO
  loop). See the source.
- Resource Finalization: Enso exposes a system for automatic resource finalization. This is non-trivial on the JVM and is handled in the ResourceManager.
- Builtin Definitions: Certain basic functions and types are exposed directly
  from the interpreter. They are currently all bundled in a virtual module
  called `Standard.Builtins`. See the Builtins class to see how that module is
  constructed. There's also a Java-side annotation-driven DSL for automatic
  generation of builtin method boilerplate. See the nodes in this tree to get
  an idea of how it works, and also read the doc; a rough sketch of such a
  definition follows after this list.
- Standard Library Sources: These are very non-magical – just plain old Enso
  projects that get shipped with every compiler release. They live in this
  tree and are tested through these projects. The standard library also makes
  heavy use of host interop. The Java methods it uses are located in this
  directory.
- Microbenchmarks: There are some microbenchmarks of tiny Enso programs for
  basic language constructs. They are located in this directory. They can be
  run through `sbt runtime-benchmarks/bench`. Each run will generate (or append
  to) the `bench-report.xml` file. See Benchmarks for more information about
  the benchmarking infrastructure.
- Tests: There are scalatests that comprehensively test all of the language
  semantics and compiler passes. These are run with
  `sbt runtime-integration-tests/test`. For newer functionality, we prefer
  adding tests to the `Tests` project in the standard library tests. At this
  point, Enso is mature enough to self-test.
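
As an example of the annotation-driven builtin DSL mentioned above, a
definition can be roughly sketched as below. The annotation attributes and the
node/method shape shown here are assumptions based on the general pattern;
check the nodes in the referenced tree for the authoritative form:

```java
import com.oracle.truffle.api.nodes.Node;
import org.enso.interpreter.dsl.BuiltinMethod;

// Hypothetical builtin: the DSL processor generates the wrapper boilerplate
// that registers this node as a method on the given builtin type.
@BuiltinMethod(
    type = "Integer",
    name = "increment_sketch",              // made-up name, for illustration only
    description = "Adds one to an integer.")
public final class IncrementSketchNode extends Node {
  long execute(long self) {
    return self + 1;
  }
}
```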

### Language Server

Talk to Dmitry! He's the main maintainer of this part.