enso/lib/rust/ensogl/README.md

392 lines
18 KiB
Markdown

# Enso App Framework
## Overview
Enso App Framework is a fully featured framework for building modern, blazing
fast web applications in the Rust programming language. It comes batteries
included, containing:
- **[Enso Canvas], a WebGL-based vector shapes rendering engine**
It is blazing-fast, pixel-perfect, uses a high-quality computational
anti-aliasing, allows _almost zero-cost_ boolean operations on shapes, and
uses sophisticated Lab CIECH color management system for unparalleled results.
- \*\*[Enso Signals], a [functional reactive programming] signal processing
engine designed exclusively for the needs of efficient GUI programming and
optimized for Rust semantics.
- [Enso GUI], a rich set of modern GUI components, including iOS-like mouse
cursor.
-
EnsoGL is a blazing fast vector rendering engine that comes batteries included.
It was developed as part of the [Enso](https://github.com/enso-org/enso)
project.
## Demo
See the demo videos of [Enso](https://github.com/enso-org/enso) to see an
example application based on EnsoGl
## Features
### High performance and small size
- **No garbage collector**
EnsoGL is written in Rust. All memory management is static, there is not
garbage collection needed, and thus, you can be sure that your creations will
run 60 frames per second without unexpected hiccups.
- **Small binary size**
EnsoGL is a very feature rich library, however, it includes all aspects needed
to build fully featured, production ready applications, including rich set of
GUI elements, animation engine, user events processing engine, keyboard
shortcut management, mouse gesture management, and even dedicated theme
resolution engine. For example, [Enso](https://github.com/enso-org/enso),
which naturally uses EnsoGl for all client-side logic weights less than 4Mb in
production mode build.
### Vector Shapes
- **Highest anti-aliasing quality possible**
The shapes are always smooth and crisp. They are described using mathematical
equations and do not use triangle-based approximation nor are they
interpolated in any way. For example, after subtracting two circles, no matter
how much you scale the resulting shape, it will always render smooth, crisp,
and without any visual glitches and imperfections. It's worth noting that
EnsoGL uses [Signed Distance Functions][sdf] to describe shapes and perform
anti-aliasing, and thus do not need
- **Pixel prefect**
Shapes align perfectly with the pixels on the screen. Rendering a rectangle
with integer position will not produce any anti-aliased borders.
- **Rich set of primitive shapes**
Including a circle, a rectangle, a rectangle with rounded corners, a triangle,
a line, a bezier curve, and many more. You can also define your own shapes by
using [Signed Distance Functions][sdf].
- **Blazing fast boolean operations on shapes**
EnsoGL allows performing boolean operations on shapes, including subtracting
shapes, finding common part of two shapes, and even merging shapes with
rounded intersection areas (bevels). All these operations are very fast and do
not depend on the shapes' complexity. Subtracting two circles is as fast as
subtracting two shapes build out of 100 circles each.
- **Infinite amount of symbols instancing** EnsoGL supports rendering of
infinite amount of shapes instances at close-to-zero performance cost (a cost
of a few GPU cycles for all instances altogether). The instancing is done by
folding the used coordinate system into cyclic space.
- **Lab CIECH color space based color management** EnsoGL uses Lab CIECH color
blending in order to output color blending results. Unlike HTML and CSS
implementations in all popular browsers nowadays, EnsoGL do not produce
[visual artifacts when blending colors together][blending in browsers].
- **Various coordinate systems** EnsoGL supports various coordinate systems
including Cartesian and Polar ones. You can freely switch between in order to
for example bend some parts of the shapes around a given point.
### Signals
EnsoGL ships with a state of the art [Functional Reactive Programming
(FRP)][frp] event processing system designed exclusively for the needs of GUI
programming and optimized for Rust semantics. FRP systems allow designing even
very complex event dependencies in a static, easy to debug way. Unlike
old-school event-listener based approach, FRP does not cause [callback hell] nor
leads to 'spaghetti' code, which is hard to read and extend.
### Animation
EnsoGL delivers a set of lightweight animation engines in a form of a reactive
FRP API. It allows attaching animations to every interface element simply by
plugging an FRP event source to FRP animation node. For example, the Inertia
Simulator enables physical-based animations of positions and colors, allowing at
the same time changing the destination values with smooth interpolation between
states. The Tween engine does not allow smooth destination value change,
however, its so lightweight, that you can consider it non-existent from the
performance point of view.
- Mixing HTML elements
### Modern GUI Components
### Built-in performance statistics
# Rendering Architecture
https://www.nomnoml.com :
```ignore
#zoom: 0.6
#gutter:100
#padding: 14
#leading: 1.4
#spacing: 60
#edgeMargin:5
#arrowSize: 0.8
#fill: #FFFFFF; #fdf6e3
#background: #FFFFFF
#.usr: visual=roundrect title=bold stroke=rgb(237,80,80)
#.dyn: visual=roundrect title=bold dashed
#.cpu: visual=roundrect title=bold
#.gpu: stroke=rgb(68,133,187) visual=roundrect
[<gpu> Buffer]
[<gpu> WebGL Context]
[<cpu> AttributeScope]
[<cpu> Attribute]
[<cpu> Mesh]
[<cpu> Material]
[<cpu> Symbol]
[<cpu> SymbolRegistry]
[<cpu> World]
[<cpu> Scene]
[<cpu> View]
[<cpu> SpriteSystem]
[<cpu> Sprite]
[<cpu> ShapeSystem]
[<dyn> ShapeView]
[<usr> *Shape]
[<usr> *ShapeSystem]
[<usr> *Component]
[<cpu> Application]
[AttributeScope] o- [Buffer]
[Buffer] o-- [Attribute]
[Mesh]* o- 4[AttributeScope]
[Symbol]* o- [Mesh]
[Symbol]* o- [Material]
[SymbolRegistry] o- [Symbol]
[Scene] - [SymbolRegistry]
[Scene] o- [View]
[Scene] - [WebGL Context]
[SpriteSystem] o- [Symbol]
[SpriteSystem] o-- [Sprite]
[ShapeSystem] o- [SpriteSystem]
[Sprite] o- [Symbol]
[Sprite] o- [Attribute]
[*Shape] o- [Sprite]
[*ShapeSystem] o- [ShapeSystem]
[*ShapeSystem] o-- [*Shape]
[*Component] o- [ShapeView]
[ShapeView] - [*Shape]
[View] o- [Symbol]
[View] o- [*ShapeSystem]
[World] o- [Scene]
[Application] - [World]
[Application] o- [*Component]
```
# Shapes Rendering
## The Current Architecture
The current implementation uses instanced rendering to display shapes. First, a
simple rectangular geometry is defined, and for each new instance, a new
attribute is added to the list of attached attribute arrays. During rendering,
we use the `draw_arrays_instanced` WebGL call to iterate over the arrays and
draw each shape. The shape placement is done from within its vertex shader.
See the documentation of [`crate::system::gpu::data::Buffer`]. See the
documentation of [`crate::system::gpu::data::Attribute`]. See the documentation
of [`crate::system::gpu::data::AttributeScope`].
### Known Issues / Ideas of Improvement
The current architecture is very efficient at shapes rendering, which comes with
a few limitations. Below, there are many other architectures described with
their own gains and problems and we should consider improving the current
approach in the future. However, keep in mind that the listed limitations allow
us for very fast rendering pipeline, so it's questionable whether we would like
to ever change it.
The most significant limitations of the current approach are:
- No possibility to depth-sort the shapes instances. The used
`draw_arrays_instanced` WebGL draw call iterates over all attrib arrays and
draws a new instance for each entry. There is no possibility to specify the
iteration order, while re-ordering the attrib arrays can be CPU heavy (with
big instance count) and would require re-sending big amount of data between
CPU and GPU (e.g. moving the top-most instance to the bottom would require
moving its attribs in all attached attrib arrays from the last position to the
front, and thus, re sending ALL attrib arrays to the GPU (for ALL INSTANCES)).
- No efficient memory management. In case an instance with a high ID exists and
many instances with lower IDs are already destroyed, the memory of the
destroyed instances cannot be freed. This is because currently the sprite
instances remember the ID (wrapper over usize) of the instance, which is used
as the attrib array index. Thus, it is impossible to update the number in all
sprite instances in memory, and sort the instances to move the destroyed ones
to the end of the buffer to free it. This could be easily solved by using
`Rc<Cell<ID>>` instead, however, it is important to benchmark how big
performance impact this will cause. Also, other architectures may provide
alternative solutions.
- No possibility to render shape instances using different cameras (in separate
draw calls). Currently, the shape instances are drawn with the
`draw_arrays_instanced` WebGL draw call. This API allows drawing all instances
at once, so it is not possible to draw only some subset of them, and thus, it
is not possible to update the view-matrix uniform between the calls. The
OpenGL 4.2 introduced a specialized draw call that would solve this issue
entirely, however, it is not accessible from within WebGL
([glDrawArraysInstancedBaseInstance](https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glDrawArraysInstancedBaseInstance.xhtml)).
### Depth-sorting, memory cleaning, and indexes re-using.
The current approach, however, doesn't allow us to depth-sort the shapes
instances. Also, it does not allow for efficient memory management in case an
instance with a high ID exists and many instances with lover IDs are already
destroyed. This section describes possible alternative architectures and
compares them from this perspective.
There are several possible implementation architectures for attribute
management. The currently used architecture may not be the best one, but the
choice is not obvious and would require complex benchmarking. However, lets
compare the available architectures and lets list their good and bad sides:
#### A. Drawing instanced geometry (the current architecture).
- Rendering. Very fast. May not be as fast as some of other methods, but that
may not be the case with modern hardware, see:
https://stackoverflow.com/a/65376034/889902, and also
https://stackoverflow.com/questions/62537968/using-opengl-instancing-for-rendering-2d-scene-with-object-depths-and-alpha-blen#answer-62538277
- Changing attribute & GPU memory consumption. Very fast and with low memory
consumption. Requires only 1 WebGL call (attribute per instance).
- Visual sorting of instances (depth management). Complex. Requires sorting of
all attribute buffers connected with a particular instance. For big buffers
(many instances) it may require significant CPU -> GPU data upload. For
example, taking the last element to the front, would require shifting all
attributes in all buffers, which basically would mean uploading all data to
the GPU from scratch for that particular geometry. Also, this would require
keeping instance IDs in some kind of `Rc<Cell<usize>>`, as during sorting, the
instance IDs will change, so all sprites would need to be updated.
#### B. Drawing non-instanced, indexed geometry.
- Rendering. Very fast. May be faster than architecture (A). See it's
description to learn more.
- Changing attribute & GPU memory consumption. 4 times slower and 4 times more
memory hungry than architecture (A). Requires setting each attribute for each
vertex (4 WebGL calls). During drawing, vertexes are re-used by using indexed
geometry rendering.
- Visual sorting of instances (depth management). The same issues as in
architecture (A). Even more CPU -> GPU heavy, as the attribute count is
bigger.
#### C. Drawing non-instanced, non-indexed geometry. Using indexing for sorting.
- Rendering. Very fast. May be faster than architecture (A). See it's
description to learn more.
- Changing attribute & GPU memory consumption. 6 times slower and 6 times more
memory hungry than architecture (A). Requires setting each attribute for each
vertex (6 WebGL calls). During drawing, vertexes are not re-used, and thus we
need to set attributes for each vertex of each triangle.
- Visual sorting of instances (depth management). Simple. We can re-use index
buffer to sort the geometry by telling GPU in what order it should render each
of the vertexes. Unlike previous architectures, this would not require to
create any more internally mutable state regarding attribute index management
(the indexes will not change during sorting).
However, sorting for the needs of memory compression (removing unused memory
for sparse attrib arrays) would still require re-uploading sorted data to GPU,
just as in architecture (A).
#### D. Keeping all attribute values in a texture and passing index buffer to the shader.
This is a very different architecture to what is currently implemented and might
require very complex refactoring in order to be even tested and benchmarked
properly. To learn more about the idea, follow the link:
https://stackoverflow.com/a/65376034/889902.
- Rendering. Fast. May be slower than architecture (A). Needs real benchmarks.
- Changing attribute & GPU memory consumption. Changing attribute would require
2 WebGL calls: the `bindTexture`, and `texParameterf` (or similar).
Performance of this solution is questionable, but in real life, it may be as
fast as architecture (A). The memory consumption should be fine as well, as
WebGL textures behave like C++ Vectors, so even if we allocate the texture of
max size, it will occupy only the needed space. This will also limit the
number of instances on the stage, but the limit will be big enough (assuming
max texture od 2048px x 2048px and 20 float attributes per shader, this will
allow us to render over 200 000 shapes). Also, this architecture would allow
us to pass more attributes to shaders than it is currently possible, which on
the other hand, would probably negatively affect the fragment shader
performance.
- Visual sorting of instances (depth management). Simple. Next to the attribute
texture, we can pass index buffer to the shader, which will dictate what
initial offset in the texture should be used. This would allow for the fastest
sorting mechanism of all of the above architectures.
However, sorting for the needs of memory compression (removing unused memory
for sparse attrib arrays) would still require re-uploading sorted data to GPU,
just as in architecture (A).
#### E. Using the depth-buffer for sorting.
As with architecture (C), this is a very different architecture to what is
currently implemented and might require very complex refactoring in order to be
even tested and benchmarked properly. This architecture, however, is the most
common architecture among all WebGL / OpenGL applications, but it is not really
well suitable for SDF-based shapes rendering, as it requires anti-aliasing to be
done by multisampling, which is not needed with SDF-based rasterization. It
lowers the quality and drastically increases the rendering time (in the case of
4x4 multisampling, the rendering time is 16x bigger than the time of
architecture (A)).
There is one additional thread to consider here, namely, with some browsers,
systems, and GPU combinations, the super-sampling anti-aliasing is not
accessible in WebGL. In such situations we could use a post-processing
anti-aliasing techniques, such as [FXAA][1] or [SMAA][2], however, the resulting
image quality will be even worse. We could also use custom multi-sampled render
buffers for implementing [multi-sampled depth buffers][3]. [1]
https://github.com/mitsuhiko/webgl-meincraft/blob/master/assets/shaders/fxaa.glsl
[2]
http://www.iryoku.com/papers/SMAA-Enhanced-Subpixel-Morphological-Antialiasing.pdf
[3]
https://stackoverflow.com/questions/50613696/whats-the-purpose-of-multisample-renderbuffers
- Rendering. May be 9x - 16x slower than architecture (A), depending on
multi-sampling level. Also, the final image quality and edge sharpness will be
lower. There is, however, an open question, whether there is an SDF-suitable
depth-buffer sorting technique which would not cause such downsides (maybe
involving SDF-based depth buffer). Currently, we don't know of any such
technique.
- Changing attribute & GPU memory consumption. Fast with low memory consumption.
The same as with architecture (A), (B), or (C).
- Visual sorting of instances (depth management). Simple and fast. Much faster
than any other architecture listed before, as it does not require upfront
CPU-side buffer sorting.
#### F. Using depth-peeling / dual depth-peeling algorithms.
As with architecture (C), this is a very different architecture to what is
currently implemented and might require very complex refactoring in order to be
even tested and benchmarked properly. The idea is to render the scene multiple
times, as long as some objects do overlap, by "peeling" the top-most (and
bottom-most) layers every time. See the [Interactive Order-Independent
Transparency][1], the [Order Independent Transparency with Dual Depth
Peeling][2], and the [sample WebGL implementation][3] to learn more.
[1] https://my.eng.utah.edu/~cs5610/handouts/order_independent_transparency.pdf
[2]
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.193.3485&rep=rep1&type=pdf
[3]
https://medium.com/@shrekshao_71662/dual-depth-peeling-implementation-in-webgl-11baa061ba4b
- Rendering. May be several times slower than architecture (A) due to the need
to render the scene by peeling components. However, in contrast to the
architecture (D), the final image quality should be as good as with
architecture (A), (B), or (C).
- Changing attribute & GPU memory consumption. Fast with low memory consumption.
The same as with architecture (A), (B), or (C).
- Visual sorting of instances (depth management). Simple and fast. As fast as
architecture (E), as it does not require upfront CPU-side buffer sorting.