The unified Daml-LF interpreter and engine
==========================================

This package contains the canonical in-memory LF ASTs of both the public
interface and the full package contents, decoders from the on-wire LF
format into those ASTs, and an interpreter for LF.

Additionally, a separate package provides a standalone REPL that allows
loading `.dalf` files and interpreting pure functions, updates, and
scenarios.
We provide both Bazel-based and sbt-based builds for this project. The
sbt build exists solely for development purposes, to facilitate
incremental compilation and IDE integration. You can simply import the
sbt project for development but, if making changes, know that the Bazel
build is the sole source of truth for CI and releases.

Components
----------
- `archive` contains the Protobuf definition of the LF format, and
Protobuf utilities for reading it into a raw memory form. This should
reflect the [official LF specification][] at any given time. As with
the LF specification, changes to the Protobuf definition are governed
by the [Daml-LF Governance process][].
- `interface` is an ADT of the "public interface" of a given LF package,
meaning its templates, their choices, and serializable data types in
the package. The ADT does *not* include `def`s or expressions. A
reader from the raw protobuf is included. The ADT is usable from
Java.
- `lfpackage` is the canonical LF ADT, containing all information about
an LF package. Its main consumer is the `interpreter`, which compiles
from this faithful representation of the protobuf LF archive into
a lower-level AST that is then interpreted.
The current plan with `lfpackage` is to be able to load both old and
new LF versions into it, so that the interpreter and other consumers
can work with a common format.
For most use cases `lfpackage` is too complex and `interface` is more
convenient; the extra information is available if you need it, though
without guarantees of stability.
- `transaction` holds ADTs related to the *interpretation* of LF, whereas
`lfpackage` represents the *definitions* in LF. The base of these is
the Value ADT, representing serializable values (i.e. values of
serializable LF type). Building on that is the Transaction ADT,
representing ledger updates.
Both have associated Protobuf definitions, also contained in this
package, and are used in `interpreter` and `engine` respectively.
- `transaction-lib-test` supplies tools to generate test data for the
value and transaction ADTs provided by the `transaction` library.
- `data` contains utility datatypes used in the engine, and functions
designed around specified LF semantics. For example, if you want
LF-compatible decimal handling, the `Decimal` API is a good source of
useful functions.
- `data-scalacheck` supplies ScalaCheck `Arbitrary`s for the custom
collections provided by the `data` library (a sketch of their use
appears at the end of this section).
- `interpreter` is the "unified interpreter" used for both the sandbox
and the production ledger. It is an efficient [CEK machine][],
interpreting the `lfpackage` terms using a (non-serializable) internal
value model, ultimately producing `transaction`s. Most downstream
consumers will want to use `engine` in addition to this, because only
the pure interpreter lives here.
- `engine` holds the ledger state on `interpreter`'s behalf and
implements all of its public-facing aspects, such as the `Command`
interface, events, and loaded packages.
- `scenario-interpreter` practically demonstrates why `interpreter` is
separate from `engine`: it is a small set of library functions using
`interpreter` to evaluate scenarios from an LF package.
- `repl` is the below-described REPL, manipulating an internal engine
state and running scenarios at your command.
- `testing-tools` helps you run scenarios from ScalaTest.

[official LF specification]: spec/daml-lf-1.rst
[Daml-LF Governance process]: governance.rst
[CEK machine]: https://gist.github.com/ekmett/f081b5e36bac3fed1ea6b21eb25327c6
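
As promised above, here is a hedged sketch of how the `data` collections
and the `data-scalacheck` instances might be used together in a property
test. The import of the `Arbitrary` instances is hypothetical (the actual
location and names in `data-scalacheck` may differ), and `ImmArray.reverse`
is assumed to exist; treat this as an illustration, not verified code.

```scala
import com.daml.lf.data.ImmArray
// Hypothetical import; the real home of the Arbitrary instances in
// data-scalacheck may differ.
import com.daml.lf.data.scalacheck.DataArbitrary._
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object ImmArraySpec extends Properties("ImmArray") {
  // Relies on the implicit Arbitrary[ImmArray[Int]] assumed above.
  property("reverse is an involution") = forAll { (xs: ImmArray[Int]) =>
    xs.reverse.reverse == xs
  }
}
```
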
Building and testing
--------------------
Daml-LF uses Bazel to build and test its components. Please refer to the
top-level `BAZEL.md` and `BAZEL-JVM.md` documents for high-level
instructions on how to use Bazel and how it works within IntelliJ.

To get a list of build targets:
```
bazel query //daml-foundations/daml-lf/...
```
To build and test everything:
```
bazel build //daml-foundations/daml-lf/...
bazel test //daml-foundations/daml-lf/...
```
To watch a target and re-run tests when files change:
```
ibazel test //daml-foundations/daml-lf/...
```
All the above can of course take more fine-grained targets as arguments.
`...` means all targets under the given directory, recursively; `:all`
means all targets directly in that directory.

To load a package in the Scala REPL you will need to add a `@repl` target
to the `BUILD.bazel` file:
```
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repl")
scala_repl(
name = "interpreter@repl",
deps = [
":interpreter"
]
)
```
This target can then be invoked with `bazel run`:
```
da$ bazel run //daml-lf/interpreter:interpreter@repl
```
or, from within the `interpreter` directory:
```
interpreter$ bazel run interpreter@repl
```
Since "rules_scala" does not currently support incremental compilation
you will need to help Bazel along a bit by keep the dependency graph
lean. Try to divide your code into separate scala_library targets as
build results are cached at this level. Preferably unrelated modules
should be separate scala_library targets, unvisible to the outside.
A visible scala_library target should then collect the unrelated modules
into a single target that can be depended on from outside.
Benchmarking
------------
Benchmarks for scenario execution can be run with
```
bazel run //daml-lf/scenario-interpreter:scenario-perf
```
A run of this benchmark takes between 6 and 7 minutes. A faster, less
precise benchmark that takes around 1 minute can be invoked with
```
bazel run //daml-lf/scenario-interpreter:scenario-perf -- -f 0
```
To benchmark scenarios other than the ones configured by default, you can
invoke
```
bazel run //daml-lf/scenario-interpreter:scenario-perf -- -p dar=/path/to/some/dar -p scenario=Some.Module:test
```
This can be combined with the `-f 0` flag as well.
These benchmarks are focused on Daml execution speed and try to avoid noise
caused by, say, I/O as much as possible.

Daml-LF-REPL Usage
------------------
The REPL can be compiled with `bazel build //:daml-lf-repl` and run with
`bazel run //:daml-lf-repl -- repl`. The `//:` prefix is not needed when
running from the repository root.
Example use:

```
$ bazel run //:daml-lf-repl -- repl
daml> :load project.dar
daml> Project.double 4
8
daml> :scenario Project.tests
...
```

See `:help` for more instructions.
The REPL application also provides the commands `test` and `testAll` for
running scenarios in packages:

```
$ bazel run //:daml-lf-repl -- testAll $PWD/project.dalf
$ bazel run //:daml-lf-repl -- test Project.tests $PWD/project.dalf
```

NOTE: When running via `bazel run` one needs to specify the full path (or
the path relative to the repository root), since Bazel runs all commands
from the repository root.

Profiling scenarios
-------------------
Daml-LF-REPL provides a command to run a scenario and collect profiling
information while running it. This information is then written into a file that
can be viewed using the [speedscope](https://www.speedscope.app/) flamegraph
visualizer. The easiest way to install speedscope is to run
```shell
$ npm install -g speedscope
```
See the [Offline usage](https://github.com/jlfwong/speedscope#usage) section of
its documentation for alternatives.
Once speedscope is installed, the profiler can be invoked via
```shell
$ bazel run //:daml-lf-repl -- profile Module.Name:scenarioName /path/to.dar /path/to/output.json
```
and the profile viewed via
```shell
$ speedscope /path/to/output.json
```
Scala house rules
-----------------
* _Do not_ use `Seq` in the interpreter's code paths, with the possible
  exception of accepting inputs in external APIs. Use `ImmArray`,
  `FrontStack`, and `BackStack` as appropriate.
  The reason for this rule is that `Seq` completely hides the performance
  characteristics of operations -- for example, it defines cons, snoc,
  and append for all structures, even when an operation requires a full
  copy of an array.
  `ImmArray` should be used in cases where you do not need to append or
  prepend content often. It is, however, very cheap to slice an
  `ImmArray` (removing elements from either end).
  `FrontStack` should be used when you need to build up a list of
  elements by prepending. Both single elements and chunks in the form of
  `ImmArray`s can be prepended. A typical use case is traversing a tree
  in topological order by keeping a stack of children still to be
  visited (see the sketch after this list).
  `BackStack` is like `FrontStack`, except you append rather than
  prepend. For example, if you find yourself building and then reversing
  a list, use `BackStack` instead.
* Avoid mutable data structures in external APIs. This is not set in
  stone, but they are generally a code smell.
* Try to define functions on user-provided data structures (of which we
  have a lot in this codebase) to always be tail recursive. The typical
  way to do this is by defining little "interpreters" to perform your
  function; the sketch after this list illustrates the idea. If in
  doubt, ask Francesco Mazzoli or Gyorgy Farkas about this.
* Disable "Optimize imports on the fly" and the "Optimize Imports" shortcut in
IntelliJ IDEA, since they mess up diffs and can subtly, insidiously change
the semantics of your code (Scala imports are order sensitive). You can
disable "on the fly" at Menu -> Preferences -> Editor -> General -> Auto
Import -> Scala, and the shortcut key in Preferences -> Keymap -> search
Optimize Imports -> double-click result -> Remove.
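
Below is a minimal sketch of the collection and tail-recursion rules
above: a pre-order tree traversal written as a little tail-recursive
"interpreter", keeping a `FrontStack` of nodes still to be visited and
accumulating labels in a `BackStack`. This is an illustration, not an
established pattern from this codebase; the method names (`pop`, `++:`,
`:+`, `toImmArray`) are assumed from the `data` library's conventions.

```scala
import com.daml.lf.data.{BackStack, FrontStack, ImmArray}

object TreeTraversal {
  // A toy tree whose children are stored in an ImmArray.
  final case class Tree[A](label: A, children: ImmArray[Tree[A]])

  def preOrder[A](root: Tree[A]): ImmArray[A] = {
    // The "interpreter": a tail-recursive loop over an explicit stack
    // of remaining work, instead of recursing down the tree.
    @annotation.tailrec
    def go(todo: FrontStack[Tree[A]], acc: BackStack[A]): ImmArray[A] =
      todo.pop match {
        case None => acc.toImmArray
        case Some((node, rest)) =>
          // Prepend the children as an ImmArray chunk so they are
          // visited next; append the label so no final reverse is needed.
          go(node.children ++: rest, acc :+ node.label)
      }
    go(root +: FrontStack.empty, BackStack.empty)
  }
}
```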