mirror of
https://github.com/enso-org/enso.git
synced 2024-12-23 08:53:31 +03:00
392 lines
18 KiB
Markdown
392 lines
18 KiB
Markdown
# Enso App Framework
|
|
|
|
## Overview
|
|
|
|
Enso App Framework is a fully featured framework for building modern, blazing
|
|
fast web applications in the Rust programming language. It comes batteries
|
|
included, containing:
|
|
|
|
- **[Enso Canvas], a WebGL-based vector shapes rendering engine**
|
|
It is blazing-fast, pixel-perfect, uses a high-quality computational
|
|
anti-aliasing, allows _almost zero-cost_ boolean operations on shapes, and
|
|
uses sophisticated Lab CIECH color management system for unparalleled results.
|
|
- \*\*[Enso Signals], a [functional reactive programming] signal processing
|
|
engine designed exclusively for the needs of efficient GUI programming and
|
|
optimized for Rust semantics.
|
|
- [Enso GUI], a rich set of modern GUI components, including iOS-like mouse
|
|
cursor.
|
|
-
|
|
|
|
EnsoGL is a blazing fast vector rendering engine that comes batteries included.
|
|
It was developed as part of the [Enso](https://github.com/enso-org/enso)
|
|
project.
|
|
|
|
## Demo
|
|
|
|
See the demo videos of [Enso](https://github.com/enso-org/enso) to see an
|
|
example application based on EnsoGl
|
|
|
|
## Features
|
|
|
|
### High performance and small size
|
|
|
|
- **No garbage collector**
|
|
EnsoGL is written in Rust. All memory management is static, there is not
|
|
garbage collection needed, and thus, you can be sure that your creations will
|
|
run 60 frames per second without unexpected hiccups.
|
|
- **Small binary size**
|
|
EnsoGL is a very feature rich library, however, it includes all aspects needed
|
|
to build fully featured, production ready applications, including rich set of
|
|
GUI elements, animation engine, user events processing engine, keyboard
|
|
shortcut management, mouse gesture management, and even dedicated theme
|
|
resolution engine. For example, [Enso](https://github.com/enso-org/enso),
|
|
which naturally uses EnsoGl for all client-side logic weights less than 4Mb in
|
|
production mode build.
|
|
|
|
### Vector Shapes
|
|
|
|
- **Highest anti-aliasing quality possible**
|
|
The shapes are always smooth and crisp. They are described using mathematical
|
|
equations and do not use triangle-based approximation nor are they
|
|
interpolated in any way. For example, after subtracting two circles, no matter
|
|
how much you scale the resulting shape, it will always render smooth, crisp,
|
|
and without any visual glitches and imperfections. It's worth noting that
|
|
EnsoGL uses [Signed Distance Functions][sdf] to describe shapes and perform
|
|
anti-aliasing, and thus do not need
|
|
- **Pixel prefect**
|
|
Shapes align perfectly with the pixels on the screen. Rendering a rectangle
|
|
with integer position will not produce any anti-aliased borders.
|
|
|
|
- **Rich set of primitive shapes**
|
|
Including a circle, a rectangle, a rectangle with rounded corners, a triangle,
|
|
a line, a bezier curve, and many more. You can also define your own shapes by
|
|
using [Signed Distance Functions][sdf].
|
|
|
|
- **Blazing fast boolean operations on shapes**
|
|
EnsoGL allows performing boolean operations on shapes, including subtracting
|
|
shapes, finding common part of two shapes, and even merging shapes with
|
|
rounded intersection areas (bevels). All these operations are very fast and do
|
|
not depend on the shapes' complexity. Subtracting two circles is as fast as
|
|
subtracting two shapes build out of 100 circles each.
|
|
- **Infinite amount of symbols instancing** EnsoGL supports rendering of
|
|
infinite amount of shapes instances at close-to-zero performance cost (a cost
|
|
of a few GPU cycles for all instances altogether). The instancing is done by
|
|
folding the used coordinate system into cyclic space.
|
|
|
|
- **Lab CIECH color space based color management** EnsoGL uses Lab CIECH color
|
|
blending in order to output color blending results. Unlike HTML and CSS
|
|
implementations in all popular browsers nowadays, EnsoGL do not produce
|
|
[visual artifacts when blending colors together][blending in browsers].
|
|
- **Various coordinate systems** EnsoGL supports various coordinate systems
|
|
including Cartesian and Polar ones. You can freely switch between in order to
|
|
for example bend some parts of the shapes around a given point.
|
|
|
|
### Signals
|
|
|
|
EnsoGL ships with a state of the art [Functional Reactive Programming
|
|
(FRP)][frp] event processing system designed exclusively for the needs of GUI
|
|
programming and optimized for Rust semantics. FRP systems allow designing even
|
|
very complex event dependencies in a static, easy to debug way. Unlike
|
|
old-school event-listener based approach, FRP does not cause [callback hell] nor
|
|
leads to 'spaghetti' code, which is hard to read and extend.
|
|
|
|
### Animation
|
|
|
|
EnsoGL delivers a set of lightweight animation engines in a form of a reactive
|
|
FRP API. It allows attaching animations to every interface element simply by
|
|
plugging an FRP event source to FRP animation node. For example, the Inertia
|
|
Simulator enables physical-based animations of positions and colors, allowing at
|
|
the same time changing the destination values with smooth interpolation between
|
|
states. The Tween engine does not allow smooth destination value change,
|
|
however, its so lightweight, that you can consider it non-existent from the
|
|
performance point of view.
|
|
|
|
- Mixing HTML elements
|
|
|
|
### Modern GUI Components
|
|
|
|
### Built-in performance statistics
|
|
|
|
# Rendering Architecture
|
|
|
|
https://www.nomnoml.com :
|
|
|
|
```ignore
|
|
#zoom: 0.6
|
|
#gutter:100
|
|
#padding: 14
|
|
#leading: 1.4
|
|
#spacing: 60
|
|
#edgeMargin:5
|
|
#arrowSize: 0.8
|
|
#fill: #FFFFFF; #fdf6e3
|
|
|
|
#background: #FFFFFF
|
|
#.usr: visual=roundrect title=bold stroke=rgb(237,80,80)
|
|
#.dyn: visual=roundrect title=bold dashed
|
|
#.cpu: visual=roundrect title=bold
|
|
#.gpu: stroke=rgb(68,133,187) visual=roundrect
|
|
|
|
[<gpu> Buffer]
|
|
[<gpu> WebGL Context]
|
|
[<cpu> AttributeScope]
|
|
[<cpu> Attribute]
|
|
[<cpu> Mesh]
|
|
[<cpu> Material]
|
|
[<cpu> Symbol]
|
|
[<cpu> SymbolRegistry]
|
|
[<cpu> World]
|
|
[<cpu> Scene]
|
|
[<cpu> View]
|
|
[<cpu> SpriteSystem]
|
|
[<cpu> Sprite]
|
|
[<cpu> ShapeSystem]
|
|
[<dyn> ShapeView]
|
|
[<usr> *Shape]
|
|
[<usr> *ShapeSystem]
|
|
[<usr> *Component]
|
|
[<cpu> Application]
|
|
|
|
[AttributeScope] o- [Buffer]
|
|
[Buffer] o-- [Attribute]
|
|
[Mesh]* o- 4[AttributeScope]
|
|
[Symbol]* o- [Mesh]
|
|
[Symbol]* o- [Material]
|
|
[SymbolRegistry] o- [Symbol]
|
|
[Scene] - [SymbolRegistry]
|
|
[Scene] o- [View]
|
|
[Scene] - [WebGL Context]
|
|
|
|
[SpriteSystem] o- [Symbol]
|
|
[SpriteSystem] o-- [Sprite]
|
|
[ShapeSystem] o- [SpriteSystem]
|
|
[Sprite] o- [Symbol]
|
|
[Sprite] o- [Attribute]
|
|
[*Shape] o- [Sprite]
|
|
[*ShapeSystem] o- [ShapeSystem]
|
|
[*ShapeSystem] o-- [*Shape]
|
|
[*Component] o- [ShapeView]
|
|
[ShapeView] - [*Shape]
|
|
[View] o- [Symbol]
|
|
[View] o- [*ShapeSystem]
|
|
[World] o- [Scene]
|
|
[Application] - [World]
|
|
[Application] o- [*Component]
|
|
```
|
|
|
|
# Shapes Rendering
|
|
|
|
## The Current Architecture
|
|
|
|
The current implementation uses instanced rendering to display shapes. First, a
|
|
simple rectangular geometry is defined, and for each new instance, a new
|
|
attribute is added to the list of attached attribute arrays. During rendering,
|
|
we use the `draw_arrays_instanced` WebGL call to iterate over the arrays and
|
|
draw each shape. The shape placement is done from within its vertex shader.
|
|
|
|
See the documentation of [`crate::system::gpu::data::Buffer`]. See the
|
|
documentation of [`crate::system::gpu::data::Attribute`]. See the documentation
|
|
of [`crate::system::gpu::data::AttributeScope`].
|
|
|
|
### Known Issues / Ideas of Improvement
|
|
|
|
The current architecture is very efficient at shapes rendering, which comes with
|
|
a few limitations. Below, there are many other architectures described with
|
|
their own gains and problems and we should consider improving the current
|
|
approach in the future. However, keep in mind that the listed limitations allow
|
|
us for very fast rendering pipeline, so it's questionable whether we would like
|
|
to ever change it.
|
|
|
|
The most significant limitations of the current approach are:
|
|
|
|
- No possibility to depth-sort the shapes instances. The used
|
|
`draw_arrays_instanced` WebGL draw call iterates over all attrib arrays and
|
|
draws a new instance for each entry. There is no possibility to specify the
|
|
iteration order, while re-ordering the attrib arrays can be CPU heavy (with
|
|
big instance count) and would require re-sending big amount of data between
|
|
CPU and GPU (e.g. moving the top-most instance to the bottom would require
|
|
moving its attribs in all attached attrib arrays from the last position to the
|
|
front, and thus, re sending ALL attrib arrays to the GPU (for ALL INSTANCES)).
|
|
|
|
- No efficient memory management. In case an instance with a high ID exists and
|
|
many instances with lower IDs are already destroyed, the memory of the
|
|
destroyed instances cannot be freed. This is because currently the sprite
|
|
instances remember the ID (wrapper over usize) of the instance, which is used
|
|
as the attrib array index. Thus, it is impossible to update the number in all
|
|
sprite instances in memory, and sort the instances to move the destroyed ones
|
|
to the end of the buffer to free it. This could be easily solved by using
|
|
`Rc<Cell<ID>>` instead, however, it is important to benchmark how big
|
|
performance impact this will cause. Also, other architectures may provide
|
|
alternative solutions.
|
|
|
|
- No possibility to render shape instances using different cameras (in separate
|
|
draw calls). Currently, the shape instances are drawn with the
|
|
`draw_arrays_instanced` WebGL draw call. This API allows drawing all instances
|
|
at once, so it is not possible to draw only some subset of them, and thus, it
|
|
is not possible to update the view-matrix uniform between the calls. The
|
|
OpenGL 4.2 introduced a specialized draw call that would solve this issue
|
|
entirely, however, it is not accessible from within WebGL
|
|
([glDrawArraysInstancedBaseInstance](https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glDrawArraysInstancedBaseInstance.xhtml)).
|
|
|
|
### Depth-sorting, memory cleaning, and indexes re-using.
|
|
|
|
The current approach, however, doesn't allow us to depth-sort the shapes
|
|
instances. Also, it does not allow for efficient memory management in case an
|
|
instance with a high ID exists and many instances with lover IDs are already
|
|
destroyed. This section describes possible alternative architectures and
|
|
compares them from this perspective.
|
|
|
|
There are several possible implementation architectures for attribute
|
|
management. The currently used architecture may not be the best one, but the
|
|
choice is not obvious and would require complex benchmarking. However, lets
|
|
compare the available architectures and lets list their good and bad sides:
|
|
|
|
#### A. Drawing instanced geometry (the current architecture).
|
|
|
|
- Rendering. Very fast. May not be as fast as some of other methods, but that
|
|
may not be the case with modern hardware, see:
|
|
https://stackoverflow.com/a/65376034/889902, and also
|
|
https://stackoverflow.com/questions/62537968/using-opengl-instancing-for-rendering-2d-scene-with-object-depths-and-alpha-blen#answer-62538277
|
|
|
|
- Changing attribute & GPU memory consumption. Very fast and with low memory
|
|
consumption. Requires only 1 WebGL call (attribute per instance).
|
|
|
|
- Visual sorting of instances (depth management). Complex. Requires sorting of
|
|
all attribute buffers connected with a particular instance. For big buffers
|
|
(many instances) it may require significant CPU -> GPU data upload. For
|
|
example, taking the last element to the front, would require shifting all
|
|
attributes in all buffers, which basically would mean uploading all data to
|
|
the GPU from scratch for that particular geometry. Also, this would require
|
|
keeping instance IDs in some kind of `Rc<Cell<usize>>`, as during sorting, the
|
|
instance IDs will change, so all sprites would need to be updated.
|
|
|
|
#### B. Drawing non-instanced, indexed geometry.
|
|
|
|
- Rendering. Very fast. May be faster than architecture (A). See it's
|
|
description to learn more.
|
|
|
|
- Changing attribute & GPU memory consumption. 4 times slower and 4 times more
|
|
memory hungry than architecture (A). Requires setting each attribute for each
|
|
vertex (4 WebGL calls). During drawing, vertexes are re-used by using indexed
|
|
geometry rendering.
|
|
|
|
- Visual sorting of instances (depth management). The same issues as in
|
|
architecture (A). Even more CPU -> GPU heavy, as the attribute count is
|
|
bigger.
|
|
|
|
#### C. Drawing non-instanced, non-indexed geometry. Using indexing for sorting.
|
|
|
|
- Rendering. Very fast. May be faster than architecture (A). See it's
|
|
description to learn more.
|
|
|
|
- Changing attribute & GPU memory consumption. 6 times slower and 6 times more
|
|
memory hungry than architecture (A). Requires setting each attribute for each
|
|
vertex (6 WebGL calls). During drawing, vertexes are not re-used, and thus we
|
|
need to set attributes for each vertex of each triangle.
|
|
|
|
- Visual sorting of instances (depth management). Simple. We can re-use index
|
|
buffer to sort the geometry by telling GPU in what order it should render each
|
|
of the vertexes. Unlike previous architectures, this would not require to
|
|
create any more internally mutable state regarding attribute index management
|
|
(the indexes will not change during sorting).
|
|
|
|
However, sorting for the needs of memory compression (removing unused memory
|
|
for sparse attrib arrays) would still require re-uploading sorted data to GPU,
|
|
just as in architecture (A).
|
|
|
|
#### D. Keeping all attribute values in a texture and passing index buffer to the shader.
|
|
|
|
This is a very different architecture to what is currently implemented and might
|
|
require very complex refactoring in order to be even tested and benchmarked
|
|
properly. To learn more about the idea, follow the link:
|
|
https://stackoverflow.com/a/65376034/889902.
|
|
|
|
- Rendering. Fast. May be slower than architecture (A). Needs real benchmarks.
|
|
|
|
- Changing attribute & GPU memory consumption. Changing attribute would require
|
|
2 WebGL calls: the `bindTexture`, and `texParameterf` (or similar).
|
|
Performance of this solution is questionable, but in real life, it may be as
|
|
fast as architecture (A). The memory consumption should be fine as well, as
|
|
WebGL textures behave like C++ Vectors, so even if we allocate the texture of
|
|
max size, it will occupy only the needed space. This will also limit the
|
|
number of instances on the stage, but the limit will be big enough (assuming
|
|
max texture od 2048px x 2048px and 20 float attributes per shader, this will
|
|
allow us to render over 200 000 shapes). Also, this architecture would allow
|
|
us to pass more attributes to shaders than it is currently possible, which on
|
|
the other hand, would probably negatively affect the fragment shader
|
|
performance.
|
|
|
|
- Visual sorting of instances (depth management). Simple. Next to the attribute
|
|
texture, we can pass index buffer to the shader, which will dictate what
|
|
initial offset in the texture should be used. This would allow for the fastest
|
|
sorting mechanism of all of the above architectures.
|
|
|
|
However, sorting for the needs of memory compression (removing unused memory
|
|
for sparse attrib arrays) would still require re-uploading sorted data to GPU,
|
|
just as in architecture (A).
|
|
|
|
#### E. Using the depth-buffer for sorting.
|
|
|
|
As with architecture (C), this is a very different architecture to what is
|
|
currently implemented and might require very complex refactoring in order to be
|
|
even tested and benchmarked properly. This architecture, however, is the most
|
|
common architecture among all WebGL / OpenGL applications, but it is not really
|
|
well suitable for SDF-based shapes rendering, as it requires anti-aliasing to be
|
|
done by multisampling, which is not needed with SDF-based rasterization. It
|
|
lowers the quality and drastically increases the rendering time (in the case of
|
|
4x4 multisampling, the rendering time is 16x bigger than the time of
|
|
architecture (A)).
|
|
|
|
There is one additional thread to consider here, namely, with some browsers,
|
|
systems, and GPU combinations, the super-sampling anti-aliasing is not
|
|
accessible in WebGL. In such situations we could use a post-processing
|
|
anti-aliasing techniques, such as [FXAA][1] or [SMAA][2], however, the resulting
|
|
image quality will be even worse. We could also use custom multi-sampled render
|
|
buffers for implementing [multi-sampled depth buffers][3]. [1]
|
|
https://github.com/mitsuhiko/webgl-meincraft/blob/master/assets/shaders/fxaa.glsl
|
|
[2]
|
|
http://www.iryoku.com/papers/SMAA-Enhanced-Subpixel-Morphological-Antialiasing.pdf
|
|
[3]
|
|
https://stackoverflow.com/questions/50613696/whats-the-purpose-of-multisample-renderbuffers
|
|
|
|
- Rendering. May be 9x - 16x slower than architecture (A), depending on
|
|
multi-sampling level. Also, the final image quality and edge sharpness will be
|
|
lower. There is, however, an open question, whether there is an SDF-suitable
|
|
depth-buffer sorting technique which would not cause such downsides (maybe
|
|
involving SDF-based depth buffer). Currently, we don't know of any such
|
|
technique.
|
|
|
|
- Changing attribute & GPU memory consumption. Fast with low memory consumption.
|
|
The same as with architecture (A), (B), or (C).
|
|
|
|
- Visual sorting of instances (depth management). Simple and fast. Much faster
|
|
than any other architecture listed before, as it does not require upfront
|
|
CPU-side buffer sorting.
|
|
|
|
#### F. Using depth-peeling / dual depth-peeling algorithms.
|
|
|
|
As with architecture (C), this is a very different architecture to what is
|
|
currently implemented and might require very complex refactoring in order to be
|
|
even tested and benchmarked properly. The idea is to render the scene multiple
|
|
times, as long as some objects do overlap, by "peeling" the top-most (and
|
|
bottom-most) layers every time. See the [Interactive Order-Independent
|
|
Transparency][1], the [Order Independent Transparency with Dual Depth
|
|
Peeling][2], and the [sample WebGL implementation][3] to learn more.
|
|
|
|
[1] https://my.eng.utah.edu/~cs5610/handouts/order_independent_transparency.pdf
|
|
[2]
|
|
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.193.3485&rep=rep1&type=pdf
|
|
[3]
|
|
https://medium.com/@shrekshao_71662/dual-depth-peeling-implementation-in-webgl-11baa061ba4b
|
|
|
|
- Rendering. May be several times slower than architecture (A) due to the need
|
|
to render the scene by peeling components. However, in contrast to the
|
|
architecture (D), the final image quality should be as good as with
|
|
architecture (A), (B), or (C).
|
|
|
|
- Changing attribute & GPU memory consumption. Fast with low memory consumption.
|
|
The same as with architecture (A), (B), or (C).
|
|
|
|
- Visual sorting of instances (depth management). Simple and fast. As fast as
|
|
architecture (E), as it does not require upfront CPU-side buffer sorting.
|