# Update docs following GUI1 removal (#9670)
Parent: 32c3f5f3e8 · Commit: e81e3661b2

Changed file: `.github/PULL_REQUEST_TEMPLATE.md` (vendored)
```diff
@@ -21,9 +21,8 @@ Please ensure that the following checklist has been satisfied before submitting
 - [ ] All code follows the
       [Scala](https://github.com/enso-org/enso/blob/develop/docs/style-guide/scala.md),
       [Java](https://github.com/enso-org/enso/blob/develop/docs/style-guide/java.md),
       [TypeScript](https://github.com/enso-org/enso/blob/develop/docs/style-guide/typescript.md),
       and
       [Rust](https://github.com/enso-org/enso/blob/develop/docs/style-guide/rust.md)
       style guides. In case you are using a language not listed above, follow the
       [Rust](https://github.com/enso-org/enso/blob/develop/docs/style-guide/rust.md) style guide.
 - All code has been tested:
   - [ ] Unit tests have been written where possible.
-  - [ ] If GUI codebase was changed, the GUI was tested when built using `./run ide build`.
```
The commit also deletes the following 643-line document:

@@ -1,643 +0,0 @@
# Summary

This document provides a guide on how to achieve and maintain high-performance
evaluation of Enso (IDE, Engine, Language Server, and other services). It is
divided into the following chapters:

- [**Profiling**](#epic-1-profiler-implementation)
  Discovering performance bottlenecks and understanding their causes.
- [**Performance goals**](#epic-1-performance-goals-and-reference-hardware)
  Desired performance and comparison to other products.
- [**Performance improvements plan**](#epic-2-performance-improvements-planning)
  Strategies for improving the performance of selected components.
- [**Continuous performance monitoring**](#epic-3-continuous-performance-monitoring)
  Maintenance of high performance of all components and early discovery of new
  performance issues.

# Profiling

To understand, improve, and maintain the performance of the IDE, we need to be
able to reliably measure the performance of the actions it performs.

## Measurables

Measurements we make will fall into one of two categories:

- **Objective** measurements refer to intervals directly observable by the
  user, e.g. _time from opening the app until the loading spinner gets out of
  the way_.
- **Internal** measurements provide insight into how time is spent while
  performing the **objective**s, e.g. _time spent compiling ide.wasm_.

### Objectives to measure

The following **objective**s should be measured by the framework; it is part of
this task to identify the most important user-visible measurements. **Internal**
measurements will be defined as needed to understand how time is spent
performing **objective**s.

- **App startup**
  Time between double-clicking the app icon and having a functional hello
  screen of recent projects. Sub-processes:

  - **Hello Screen initialization**
    Time between double-clicking the app icon and displaying the loading
    spinner.
  - **App window initialization**
    Time between double-clicking the app and displaying the user log-in
    prompt.

- **Project startup**
  Time from selecting a project from the Hello Screen to a working graph view
  of the project. See also [engine startup](engine-startup.md) for non-IDE
  details on a similar topic. Sub-processes:

  - **Graph Editor initialization**
    Time needed to display a fully-functional graph editor (not colored).
    Profiling should include:
    - Time needed for the GUI backend to receive and process the required
      information from the Engine.
    - FRP event propagation. This should mainly focus on the public Graph
      Editor FRP API (used by the GUI backend):
      - Time needed to propagate FRP events through the system.
      - Count of FRP events of a particular type, e.g. how many times a text
        refresh was requested.
    - Time needed to create each node component (model, display object, etc.).
    - FPS statistics (explained later).
  - **Type information propagation**
    Time needed to color the graph. Profiling should include:
    - Count of FRP events of a particular type. For example, how many times a
      text refresh was requested.
    - Time needed to update each particular node.
    - FPS statistics (explained later).

- **Working with a project**
  This section should contain various objectives, including:

  - Creating/moving/editing/removing a node.
  - Connecting nodes.
  - Editing expressions (incl. typing a letter).
  - Entering nodes.
  - Opening/changing/closing a visualization.
  - Saving the project.
  - Switching projects.
  - Opening the node searcher.
  - Switching the selection in the node searcher.

- **Idle performance**
  The profiling statistics for an opened project that does not receive any
  user interaction.

Also, please browse the
[performance-related issues](https://enso.youtrack.cloud/issue/IDE-3209) to
check whether they describe processes missing from the above list. In
particular, please check the
[IDE Performance Issues issue](https://enso.youtrack.cloud/issue/IDE-2571).

### Measurement relationships

The data model for measurements is hierarchical, to enable uses ranging from
high-level overviews to fine-grained analysis of details.

Each of the trees comprising the hierarchy is rooted at an
[objective measurement](#measurables), because we only measure intervals that
contribute to a user-observable delay.

The parent-child relationship of the hierarchy indicates that the child was
started by the parent. Usually, but not always, a task's parent cannot complete
until the child does.

Besides start and end times, the framework logs when tasks are paused or
resumed (e.g. before and after an `.await`); this provides enough information
to define a _current profiler_: the current profiler is the most recently
started-or-resumed profiler that is not currently paused and has not yet ended.

Some profiling data, such as _FRP events_ and _per-frame statistics_, is
implicitly attributed to the current profiler [these terms will be elaborated
on in a future revision of this document].

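The current-profiler rule can be evaluated mechanically from the logged
start/pause/resume/end events. Below is a minimal sketch of that evaluation;
the event and ID types are hypothetical, not the real `profiler` crate API.

```rust
#[derive(Clone, Copy)]
enum Event {
    Start(u32),  // A profiler with this ID started.
    Pause(u32),  // It paused (e.g. reached an `.await`).
    Resume(u32), // It resumed after a pause.
    End(u32),    // It ended.
}

/// The current profiler after replaying `events`: the most recently
/// started-or-resumed profiler that is neither paused nor ended.
fn current_profiler(events: &[Event]) -> Option<u32> {
    // Profilers in started-or-resumed order; paused/ended ones are removed.
    let mut active: Vec<u32> = Vec::new();
    for event in events {
        match *event {
            Event::Start(id) | Event::Resume(id) => active.push(id),
            Event::Pause(id) | Event::End(id) => active.retain(|&p| p != id),
        }
    }
    active.last().copied()
}

fn main() {
    use Event::*;
    // Profiler 1 starts, then pauses at an await while profiler 2 runs;
    // implicitly-attributed data now belongs to profiler 2.
    assert_eq!(current_profiler(&[Start(1), Pause(1), Start(2)]), Some(2));
}
```
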
## Collecting the data

### Interactively

Users may capture profiling information at any time during an interactive
session of the application (usage in debug scenes is not yet supported). The
procedure to collect data is:

- Compile Enso with the desired [profiling level](#profiling-levels) enabled.
  The default (which is used in release builds) is to record only
  [objective measurements](#measurables), which includes information about
  performance, but little detail for identifying causes. A greater level of
  detail can be obtained by passing the `profiling-level` argument to the
  [build script (see "Development" subheading)](https://enso.org/docs/developer/ide/CONTRIBUTING.html);
  for maximum detail, use `--profiling-level=debug`. For example:
  ```shell
  ./run ide build --profiling-level=detail
  ```
- To profile in Electron:
  - Start Enso with the `profile.save` option, e.g.:
    ```shell
    ./distribution/bin/enso --profile.save=your_profile.json
    ```
    If this option is not provided, the path `profile.json` will be used.
  - Perform any activity the profiler should record (the profiler is always
    running).
  - Press Ctrl+Alt+Q to save the profile (to the path specified earlier) and
    quit.
- To profile in Chrome:
  - Open Enso in Chrome.
  - Perform any activity the profiler should record (the profiler is always
    running).
  - Press Ctrl+Alt+O to end profile recording.
  - Open the Browser Console by pressing Ctrl+Alt+J (do not do this before
    profiling;
    [having the console open can degrade performance](https://developer.chrome.com/blog/wasm-debugging-2020/#profiling)
    of WASM applications).
  - Near the end of the Console output, there will be a JSON object printed;
    it will usually be large, and truncated. Click the Copy button below it.
  - Paste the log into a `.json` file.

### Batch mode

A _batch mode_ supports recording profiling information for predefined
workflows without user interaction. It can be run as follows:

```shell
# First build with full profiling information.
./run ide build --profiling-level=debug
# Now run the generated binary in batch mode.
./distribution/bin/enso --startup.entry=profile --profile.workflow=create_node --profile.save=create_node.json
```

The `profile` entry point selects batch mode; when it is used, the
`profile.workflow` argument is mandatory. `profile.save` specifies an output
filename; `profile.workflow` selects from a list of defined batch-mode
workflows to measure. Supported workflows include:

- `new_project`: This workflow includes starting the application, opening an
  empty project, and closing the application.
- `collapse_nodes`: Measures selecting and collapsing a group of nodes.
- `create_node`: Measures adding a new node to the graph.
- `enter_collapsed_node`: Measures entering a collapsed node.
- `open_visualization`: Measures enabling a visualization for a node.
- `open_project_orders`: Measures opening the project called `Orders`.

To see a full list of currently-implemented workflows, run the `enso`
executable with the arguments `--entry-point=profile` and `--workflow=help`.
Additional workflows will be implemented in
[the future](#epic-1-performance-goals-and-reference-hardware).

### The profiling environment

An important consideration when profiling is controlling the environment. In
particular, setting a fixed CPU frequency is important for measurement
consistency (with Turbo Boost enabled, I have been seeing up to a 2x difference
between consecutive runs; with it disabled, it is only around 20%). Because
achieving this depends on hardware, it is up to the user to do so; it is
recommended to create a script that automatically disables Turbo Boost before
running benchmarks rather than doing so manually, since a manual approach is
error-prone. On Linux, frequency scaling can be controlled by writing to
`/sys` values (which values depends on the processor); on AWS instances, it
can be set with a
[kernel command line](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html#baseline-perf).
In any environment where frequency scaling cannot be disabled (or when running
with it enabled to obtain realistic measurements on reference hardware), high
variance will make it more difficult to detect performance changes.

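Such a script could be very small; the sketch below assumes a Linux machine
using the `intel_pstate` driver (other drivers and platforms expose different
controls, so treat the path as illustrative).

```rust
use std::fs;

/// Enables or disables Turbo Boost via the intel_pstate sysfs control.
/// Writing "1" to `no_turbo` disables turbo; "0" re-enables it. Requires root.
fn set_turbo(enabled: bool) -> std::io::Result<()> {
    let value = if enabled { "0" } else { "1" };
    fs::write("/sys/devices/system/cpu/intel_pstate/no_turbo", value)
}

fn main() -> std::io::Result<()> {
    set_turbo(false)?; // Fix frequency behavior for consistent measurements.
    // ... run the benchmarks here ...
    set_turbo(true) // Restore normal operation afterwards.
}
```
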
## Examining the data

### Run-graphs

A "run graph" tool provides a high-level overview of the data in the profile.
The upper part of the view is an icicle plot[^1] showing data that is similar
to a callgraph[^2], but rather than show function scopes it shows profiler
scopes (these often coincide). The lower part of the view, sharing the
(horizontal) time axis, is a graph of async tasks, showing their active and
inactive intervals.

[^1]: https://www.cs.middlebury.edu/~candrews/showcase/infovis_techniques_s16/icicle_plots/icicleplots.html
[^2]: https://en.wikipedia.org/wiki/Call_graph

Invocation:

```shell
./distribution/bin/enso --startup.entry=profiling-run-graph --profile.load=profile.json
```

It takes as input the profiling data representing a single run
([averaged data would not be suitable for graphical representation](#averaging-profile-data)).

### Command-line tools

- `intervals`
  (`cargo run -p enso-profiler-data --bin intervals < profile.json`): This
  tool produces a table showing which profilers spent the most time running.
- `measurements`
  (`cargo run -p enso-profiler-data --bin measurements < profile.json`): This
  tool produces a textual representation of all the data in a profile,
  including the hierarchy and lifetimes of async profilers, sync profilers,
  and metadata. It supports fine-grained examination of profile contents, and
  `profiler`-based debugging.

### Format conversion

- `devtools`
  (`cargo run -p enso-profiler-data --bin devtools < input.json > output.json`):
  This tool converts a profile from Enso's profile format to the format used
  by Chrome Dev Tools, for compatibility with third-party tools.

### Examining frontend-backend timing breakdowns of tasks

Some additional tools support examining how different processes contribute to
the latency of different tasks, and examining the communication between
processes (`processes` in `enso-profiler-data`, and `message_beanpoles` in
`enso-profiler-enso-data`).

### Averaging profile data

Given multiple recorded profiles of the same workflow, it is possible to
compute averages of specific measurements; this is essential for reducing
variance to support A/B testing (see [Future extensions](#future-extensions)
below). However, there is no way to produce a complete "averaged profile"
(suitable for visualization) from multiple individual profiles, for various
reasons. The average of multiple profiles is hard to even define: a particular
measurement may repeat a different number of times in one profile than in
another. Even ignoring that, an "average" profile wouldn't necessarily
represent a set of timings that could occur in a real profile: for example, a
property may hold in all profiles that Task A and Task B never occur at the
same time; yet if they occur in different orders in individual profiles, their
intervals could overlap in an averaged profile.

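Per-measurement averaging, by contrast, is well-defined. Below is a minimal
sketch, using a hypothetical measurement type rather than the real
`enso-profiler-data` API.

```rust
use std::collections::HashMap;

/// One interval extracted from a profile: profiler label and duration in ms.
struct Measurement {
    label: String,
    duration_ms: f64,
}

/// Mean duration per label across all occurrences in all profiles. A label
/// may repeat a different number of times per profile, which is exactly why
/// whole "averaged profiles" are not well-defined.
fn mean_durations(profiles: &[Vec<Measurement>]) -> HashMap<String, f64> {
    let mut totals: HashMap<String, (f64, usize)> = HashMap::new();
    for profile in profiles {
        for m in profile {
            let entry = totals.entry(m.label.clone()).or_insert((0.0, 0));
            entry.0 += m.duration_ms;
            entry.1 += 1;
        }
    }
    totals
        .into_iter()
        .map(|(label, (sum, count))| (label, sum / count as f64))
        .collect()
}
```
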
## Epic 1: Profiler Implementation

Because some measurements need to start before WASM is initialized, the
profiling framework should provide an implementation with two interfaces, for
JavaScript and Rust. Some API syntax will differ, but both APIs should support
the same functionality; when the data is emitted, profiling entries from
JavaScript and Rust will be merged into one log. There should be no difference
in behavior between calling the APIs in either language.

To minimize the performance impact of profiling, instrumentation should not
require any operation more time-consuming than obtaining a timestamp (with
`performance.now`) and pushing an entry to a buffer. These entries will be
emitted at process exit.

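A minimal sketch of this buffer-only instrumentation strategy follows; it is
illustrative only, not the real `profiler` crate internals, and in the app the
timestamp would come from `performance.now` via a web binding.

```rust
use std::cell::RefCell;

enum LogEntry {
    Start { label: &'static str, time: f64 },
    End { label: &'static str, time: f64 },
}

thread_local! {
    // Instrumentation only appends to this buffer; nothing is serialized
    // until the log is flushed at process exit.
    static EVENT_LOG: RefCell<Vec<LogEntry>> = RefCell::new(Vec::new());
}

// Stand-in for `performance.now()`; in the browser this is a JS binding.
fn now() -> f64 { 0.0 }

fn log_start(label: &'static str) {
    EVENT_LOG.with(|log| log.borrow_mut().push(LogEntry::Start { label, time: now() }));
}

fn log_end(label: &'static str) {
    EVENT_LOG.with(|log| log.borrow_mut().push(LogEntry::End { label, time: now() }));
}

/// Called once at process exit to take the buffered entries for emission.
fn flush_log() -> Vec<LogEntry> {
    EVENT_LOG.with(|log| log.borrow_mut().drain(..).collect())
}
```
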
### Profiling levels

Fine-grained profiling (like per-frame rendering profiling) is useful for
understanding performance characteristics, but it can cause significant
performance overhead. That is why the API should define a few profiling
levels:

- **objective**
  Measurements that correspond directly to aspects of the user experience. An
  **objective** can contain other **objective**s, e.g. _GUI initialization_
  (which might be defined as: time from opening the app until the app is ready
  to receive user input) contains _time until the loading spinner finishes_.
- **task**
  Coarse-grained tasks, such as app window initialization, GUI downloading, or
  WASM compilation. A **task** can contain other **task**s, e.g. GUI
  initialization contains GUI downloading.
- **details**
  All processes which can be removed at compile time for the official stable
  release for users. We might provide some users with special releases with
  **detailed** profiling enabled; however, it should be possible to debug and
  understand most user-provided logs with the **details** view disabled.
- **debug**
  All processes which should be removed at compile time by default, even
  during app development. This applies to any heavy usage of the profiling
  framework, such as per-frame rendering profiling.

When a profiling level is disabled, all instrumentation associated with that
level is eliminated at compile time. In order to avoid passing real profiler
values down the call stack when they are not needed by enabled profiling, each
type of profiler will conditionally compile to either a real object or a
zero-sized type with no-op implementations. This conditional compilation is
controlled with the `profiling-level` family of Cargo feature flags, and is
integrated with the build script, e.g. `./run watch --profiling-level=task`.

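The zero-sized no-op pattern can be sketched as follows, assuming a
hypothetical `profiling-level-task` Cargo feature (the real feature names may
differ).

```rust
#[cfg(feature = "profiling-level-task")]
mod task_profiler {
    /// Real profiler: carries state and logs start/end times.
    pub struct Task(pub u32);
    impl Task {
        pub fn start(_label: &'static str) -> Self {
            // Record a start event with a timestamp (elided in this sketch).
            Task(0)
        }
    }
}

#[cfg(not(feature = "profiling-level-task"))]
mod task_profiler {
    /// Zero-sized stand-in: all operations are no-ops the optimizer removes,
    /// so passing profilers down the call stack costs nothing when disabled.
    pub struct Task;
    impl Task {
        #[inline(always)]
        pub fn start(_label: &'static str) -> Self {
            Task
        }
    }
}
```
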
### Rust API

The primary interface is a `profile` attribute macro. When applied to a
function definition, it inserts instrumentation to measure the function's
entry and exit times. The measurement is logged under the name of the
function, with file/line information attached.

```rust
// The argument to the macro identifies the profiling level at which this measurement is enabled.
#[profile(Detail)]
fn compute_four() -> u32 {
    2 + 2
}
```

#### Low-level API

A lower-level API is available for special purposes. The `#[profile]` API
above is preferred, especially for non-`async` measurements.

```rust
// Sometimes a block that isn't a whole function may need to be measured; in that case, a RAII
// interface is available.
//
// A trait `profiler::Parent<T>` supports accepting as an argument any object representing a
// profiler that could be a parent of a profiler type T. This includes profilers of the same or
// higher profiling level, and their corresponding RAII objects (described below).
//
// Usage note:
// When the RAII guard is implicitly dropped at the end of a block, it should be created as the
// first line of the block; it measures one full block, and the line containing `start_X!` acts as a
// title for the block.
fn big_computation(prof: impl profiler::Parent<profiler::Task>) {
    match some_condition {
        // When task-level profiling is enabled, the `start_task!` line will create a measurement
        // named "condition_a". The measurement will extend from that line to the end of its
        // enclosing scope, when the `_profiler` object is dropped.
        Condition::A => {
            let _profiler = profiler::start_task!(prof, "condition_a");

            // If detail-level profiling is also enabled, "small_computation" will log a
            // child of the "condition_a" task.
            let foo = small_computation(_profiler);
            ...
        }
    }
}

// `await` in a context profiled by the low-level API should be done through a wrapper that logs the
// beginning and end of the await interval. This is necessary to have accurate information about
// what tasks are running when.
//
// If this is not done, tools interpreting profiling data will be able to detect the oversight and
// identify the profiler at fault, as long as the callee, or any async task that runs during the
// await period, is correctly profiled.
//
// However, if the above detectable case does not occur, and during the await period an unprofiled
// async task starts an implicitly-parented profiled task, then some profiler(s) that should be
// root-level may be interpreted as children of this profiler.
async fn awaiting(prof: impl profiler::Parent<profiler::Task>) {
    profiler::await_!(some_future(), prof);
}

// Sometimes we need multiple measurements from the same start time; to support this pattern, there
// is an interface to create a child profiler that starts at the same time as its parent.
fn same_start_child(prof: impl profiler::Parent<profiler::Task>) {
    let _profiler = profiler::task_with_same_start!(prof, "same_start_child");
    ...
}

// The hierarchy of measurements starts at a const Objective object, APP_LIFETIME.
//
// Since APP_LIFETIME is not a true objective measurement (i.e. we aren't, in general, trying to
// minimize its length), its direct children must be objective measurements, and form the logical
// roots of the measurement hierarchy.
fn a_root_objective() {
    let _profiler = profiler::start_objective!(profiler::APP_LIFETIME, "a_root_objective");
    ...
}

// Some measurements need to start when the app starts (the "time origin" in Web terminology);
// APP_LIFETIME can be used with `with_same_start`.
fn starts_at_beginning() {
    let _profiler = profiler::objective_with_same_start!(
        profiler::APP_LIFETIME,
        "starts_at_beginning"
    );
    ...
}

// Rarely, a single block may need to be divided into multiple measurements.
// This can be accomplished by creating sub-blocks, and using the same RAII interface as above.
fn multipart_fn(prof: impl profiler::Parent<profiler::Task>) {
    let foo;
    let bar;
    {
        let _profiler = profiler::start_task!(prof, "first_part");

        foo = calculate_foo();
        bar = calculate_bar();
    }
    {
        let _profiler = profiler::start_task!(prof, "second_part");

        do_something(foo, bar);
    }
}

// Occasionally, we need to measure an interval that doesn't start and end in any one scope.
//
// This can be accomplished by storing the RAII object returned by `start_X!` and ensuring it is
// dropped at the time the measurement should end.
//
// In this example, the measurement lasts for the life of the containing `Baz`.
struct Baz {
    ..,
    // profiler::Started<profiler::Objective> holds a profiler::Objective, and a start time.
    // When it is dropped, the end time will be set and the measurement will be logged.
    profiler: profiler::Started<profiler::Objective>,
}
impl Baz {
    fn new(parent: impl profiler::Parent<profiler::Objective>) -> Self {
        let profiler = profiler::start_objective!(parent, "doing Baz");
        Baz { .., profiler }
    }
}
```

### JavaScript API

The API in JavaScript supports the same functionality as the Rust low-level
API above, but is used a bit differently because we can't support RAII in JS.

```js
// Higher-level API, for scoped measurements: Measure a block by passing it as a closure.
await profiler.objective.measure(profiler.APP_LIFETIME, "init-gui", async (subprof) => {
    ...
});

// Lower-level API: Start and end explicitly.
const prof = profiler.objective.start(profiler.APP_LIFETIME, "a root objective");
const subprof = profiler.task.with_same_start(prof, "a subtask");
subprof.end();
prof.end();
```

### Profiling data

#### File format

After profiling data is [captured](#collecting-the-data), it is written to a
file. The data model of this file is an _event log_ that reflects the raw data
collected by the profiling framework. It is a low-level format, considered
internal to the `profiler` crates.

Some properties of the format are important to document. Its data can be
divided into two categories:

- The _interval data_ is information about the _profilers_ themselves: their
  lifetimes, state changes, and relationships.
- _Metadata_ provides information about the application, usually associated
  with specific profilers.

The definitions of the interval data will not change often; metadata is
expected to change as more data-logging is supported. It is desirable that
when a changeset alters a metadata definition, we can still tell whether that
changeset introduces a performance regression in the interval data; more
generally, we should be able to check performance against a baseline even if
metadata changes have happened since the baseline was recorded. To ensure
this:

- The definitions of _interval data_ should be as stable as possible.
- The format is designed and implemented such that when a metadata type's
  definition changes, we can still read at least the intervals. (We will also
  still be able to read metadata of other types, although this is not
  essential.)

In the initial implementation, there is no explicit versioning to detect
changes to the datatype definitions; we rely on the self-describing format
(JSON) to recognize changes. Because of this:

- Developers altering metadata definitions should try not to introduce an
  important change to the semantics of a field without a concurrent structural
  change. (E.g. if a field's meaning has changed, change its name too.)
- Developers interpreting profiling data should be aware that tools may
  misinterpret some metadata if producer and consumer are subtly incompatible
  versions.

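The tolerance to metadata drift can be illustrated with a sketch; the event
shapes below are hypothetical, not the real on-disk schema. The point is that
interval events have stable, strongly-typed definitions, while metadata is
taken as a raw JSON value that consumers interpret as far as they can.

```rust
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
enum Event {
    // Interval data: stable definitions, parsed strictly.
    Start { id: u32, time: f64 },
    End { id: u32, time: f64 },
    // Metadata: kept as a raw JSON value, so entries whose Rust definitions
    // have drifted still parse, and the intervals around them stay readable.
    Metadata { id: u32, data: Value },
}

fn read_log(json: &str) -> serde_json::Result<Vec<Event>> {
    serde_json::from_str(json)
}
```
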
#### Measurement API

The `profiler` framework includes a `data` library which interprets the
low-level _event log_ data, and makes the information available through a
higher-level API.

### Open questions

There are open questions that should be answered as part of the implementation
of this design. Please do not answer them on your own; instead, discuss them
with other team members who have broad domain-specific knowledge in these
areas. Questions:

- What will [the interface to profiling data](#measurement-api) be?
- What tools will be implemented to support working with profile files? (They
  need to support comparing mean timings between profiles, including variance
  between sets of profiles.)

## Epic 1: Performance Goals and Reference Hardware

To know where we stand in terms of performance, we need reference points. This
means we have to look at how other software behaves and compare how slow or
fast Enso is on _objective_ measures.

### Hardware

The tests should be conducted on the following hardware types:

- The newest MacBook Pro.
- An "average client laptop" without a dedicated GPU, which should be
  representative of common enterprise hardware today.
- A "minimal-spec laptop".

### Open questions

There are several open questions that should be answered as part of the
implementation of this design. Please do not answer them on your own; instead,
discuss them with other team members who have broad domain-specific knowledge
in these areas. Questions:

- What is the exact hardware specification that we should consider an "average
  client laptop" and a "minimal-spec laptop", as described above?
- What is the time needed for other software to perform the _objective_-level
  actions, and which software should we compare Enso to (e.g. Alteryx, Knime,
  Tableau, ...)?

## Epic 2: Performance improvements planning

Once we have determined our performance goals and created the tools to measure
them, we can focus on improving the most significant performance bottlenecks.
The process will be as follows:

- We will compare the current results with the results of other software, and
  we will describe our performance goals.
- We will analyze the profiling information and check which processes need
  improvement in order for the goals to be met.
- We will create user stories (tasks) for improving these components, and we
  will plan their implementation. All tasks will become sub-tasks of one
  performance-related epic.

## Epic 3: Continuous performance monitoring

As a last step, we should take what we have learned, and ensure that we do not
introduce regressions in the future. We should add a CI step that fails if we
introduce new performance regressions.

To achieve this, we should add a command-line interface that makes a reference
report, tests the application, and reports whether the test was worse in any
of the measurements than the reference report (including some margin of error
to avoid spurious failures).

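Such a comparison could look like the following sketch, which assumes a
hypothetical report format of mean duration per measurement label.

```rust
use std::collections::HashMap;

/// Mean duration per measurement label, in milliseconds.
type Report = HashMap<String, f64>;

/// Returns the labels that regressed: more than `margin` (e.g. 0.05 = 5%)
/// slower than the reference. A nonzero margin avoids spurious failures.
fn regressions(reference: &Report, current: &Report, margin: f64) -> Vec<String> {
    reference
        .iter()
        .filter_map(|(label, &baseline)| {
            let measured = *current.get(label)?;
            (measured > baseline * (1.0 + margin)).then(|| label.clone())
        })
        .collect()
}

fn main() {
    let reference = Report::from([("create_node".to_string(), 100.0)]);
    let current = Report::from([("create_node".to_string(), 112.0)]);
    // With a 5% margin, a 12% slowdown fails CI.
    assert_eq!(
        regressions(&reference, &current, 0.05),
        vec!["create_node".to_string()]
    );
}
```
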
We then add a reference performance report to the build system and compare new
PRs against this report. If a PR introduces a regression, CI will fail and the
performance of the PR will need to be checked and improved.

Moreover, the CI should generate time reports over time, in a similar fashion
to the [performance reports in Rust](https://perf.rust-lang.org).

![image](https://user-images.githubusercontent.com/1623053/141743832-f55123f4-4d32-4859-8b8b-26df21958200.png)

In order to allow high sensitivity without excess false positives, and with a
minimum of required testing resources, we should probably have a way of
tagging issues that may affect specific objective-level measures. For example,
the most expensive measurement to take is the time to initialize the app and
open a project; looking at the most recent PRs, most could safely be assumed
to have no impact on startup time. (Of course, it is important to err on the
side of safety here, but I think the chance of an issue having a completely
unforeseeable performance impact is lower than the overhead associated with
running this test for every PR; and if we run it too often, we'll get used to
seeing false positives, which reduces the value of testing.)

# Future extensions

## A/B-testing tool

## Additional profiling information

Time profiling is a good start for understanding application performance
metrics; however, it is not enough to analyze the causes of bottlenecks,
especially in async-first applications like graphical user interfaces.

The profiling toolkit should also collect per-frame statistics for the
duration of each of the processes and report their summary, including:

- Graph Editor public API (used by the GUI backend) FRP event counts, e.g. how
  many times the Graph Editor was asked to create a node or modify an
  expression.
- Min, max, and average rendering statistics:
  - WASM memory usage.
  - GPU memory usage.
  - Sprite count.
  - Shader count.
  - ...other statistics currently collected by the GUI performance monitor
    implemented in `gui/src/rust/ensogl/lib/core/src/debug/stats.rs` (you can
    open it with `CMD/CTRL+ALT+<TILDE>`, where `<TILDE>` is the key below the
    Escape key; because of a bug, it can differ from tilde on some keyboard
    layouts).

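Accumulating such per-frame summaries is straightforward; below is a minimal
sketch (illustrative only, not the `ensogl` stats monitor) for one statistic,
e.g. sprite count or frame time.

```rust
#[derive(Debug, Default)]
struct FrameStat {
    min: f64,
    max: f64,
    sum: f64,
    frames: u32,
}

impl FrameStat {
    /// Records one per-frame sample, updating the running min/max/sum.
    fn record(&mut self, sample: f64) {
        if self.frames == 0 {
            self.min = sample;
            self.max = sample;
        } else {
            self.min = self.min.min(sample);
            self.max = self.max.max(sample);
        }
        self.sum += sample;
        self.frames += 1;
    }

    /// The average over all recorded frames (0.0 if none were recorded).
    fn average(&self) -> f64 {
        if self.frames == 0 { 0.0 } else { self.sum / self.frames as f64 }
    }
}
```
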
Another important type of measurement is cold-start performance: the amount of
time it takes to reach important startup milestones on a freshly-booted
machine, when the app's resources are not yet cached in memory and the
processor has not recently run the process (so no branch-prediction data etc.
is cached yet). Because it requires many repeated measurements to control for
high variance (and a reboot for each measurement), this data will take a lot
of time to collect; it is not feasible to gather in regular CI tests. However,
it reflects important aspects of real-world usage, so it is important data to
check on a regular basis. (To be determined: how frequently.)

## Production monitoring

It's important to know how performance benchmarks compare to actual user
experience. In production builds, we could collect objective-level
measurements, aggregate them into statistical reports, and submit them through
the anonymized-data facilities. The details of this are outside the initial
scope of this design.

## More advanced run graphs

Run-graphs could support rendering callgraph data collected by V8 alongside
the measurements generated by our instrumentation.

Run-graphs could support interactive functionality, such as collapsing the
lowest-level profiling data until it is clicked.