enso/lib/rust/ensogl
2022-11-14 10:09:49 +01:00
..
app/theme add scroll overshoot bounce animation (#3836) 2022-11-07 10:50:47 +00:00
component Text rendering quality improvements. (#3855) 2022-11-08 19:15:05 +01:00
core Focus management (#3863) 2022-11-14 10:09:49 +01:00
doc EnsoGL context abstraction (#3293) 2022-03-04 15:13:23 +01:00
example Focus management (#3863) 2022-11-14 10:09:49 +01:00
src Build script merge (#3743) 2022-10-10 23:38:48 +02:00
build.rs Linting codebase 2022-03-10 05:32:33 +01:00
Cargo.toml Switch to 2021 edition (#3173) 2021-12-01 16:06:57 +01:00
README.md Refactored config crate + moving crates to lib/rust directory. (#3155) 2021-11-12 13:56:23 +01:00
webdriver.json Refactored config crate + moving crates to lib/rust directory. (#3155) 2021-11-12 13:56:23 +01:00

Enso App Framework

Overview

Enso App Framework is a fully featured framework for building modern, blazing fast web applications in the Rust programming language. It comes batteries included, containing:

  • [Enso Canvas], a WebGL-based vector shapes rendering engine
    It is blazing-fast, pixel-perfect, uses a high-quality computational anti-aliasing, allows almost zero-cost boolean operations on shapes, and uses sophisticated Lab CIECH color management system for unparalleled results.
  • **[Enso Signals], a [functional reactive programming] signal processing engine designed exclusively for the needs of efficient GUI programming and optimized for Rust semantics.
  • [Enso GUI], a rich set of modern GUI components, including iOS-like mouse cursor.

EnsoGL is a blazing fast vector rendering engine that comes batteries included. It was developed as part of the Enso project.

Demo

See the demo videos of Enso to see an example application based on EnsoGl

Features

High performance and small size

  • No garbage collector
    EnsoGL is written in Rust. All memory management is static, there is not garbage collection needed, and thus, you can be sure that your creations will run 60 frames per second without unexpected hiccups.
  • Small binary size
    EnsoGL is a very feature rich library, however, it includes all aspects needed to build fully featured, production ready applications, including rich set of GUI elements, animation engine, user events processing engine, keyboard shortcut management, mouse gesture management, and even dedicated theme resolution engine. For example, Enso, which naturally uses EnsoGl for all client-side logic weights less than 4Mb in production mode build.

Vector Shapes

  • Highest anti-aliasing quality possible
    The shapes are always smooth and crisp. They are described using mathematical equations and do not use triangle-based approximation nor are they interpolated in any way. For example, after subtracting two circles, no matter how much you scale the resulting shape, it will always render smooth, crisp, and without any visual glitches and imperfections. It's worth noting that EnsoGL uses [Signed Distance Functions][sdf] to describe shapes and perform anti-aliasing, and thus do not need

  • Pixel prefect
    Shapes align perfectly with the pixels on the screen. Rendering a rectangle with integer position will not produce any anti-aliased borders.

  • Rich set of primitive shapes
    Including a circle, a rectangle, a rectangle with rounded corners, a triangle, a line, a bezier curve, and many more. You can also define your own shapes by using [Signed Distance Functions][sdf].

  • Blazing fast boolean operations on shapes
    EnsoGL allows performing boolean operations on shapes, including subtracting shapes, finding common part of two shapes, and even merging shapes with rounded intersection areas (bevels). All these operations are very fast and do not depend on the shapes' complexity. Subtracting two circles is as fast as subtracting two shapes build out of 100 circles each.

  • Infinite amount of symbols instancing EnsoGL supports rendering of infinite amount of shapes instances at close-to-zero performance cost (a cost of a few GPU cycles for all instances altogether). The instancing is done by folding the used coordinate system into cyclic space.

  • Lab CIECH color space based color management EnsoGL uses Lab CIECH color blending in order to output color blending results. Unlike HTML and CSS implementations in all popular browsers nowadays, EnsoGL do not produce [visual artifacts when blending colors together][blending in browsers].

  • Various coordinate systems EnsoGL supports various coordinate systems including Cartesian and Polar ones. You can freely switch between in order to for example bend some parts of the shapes around a given point.

Signals

EnsoGL ships with a state of the art [Functional Reactive Programming (FRP)][frp] event processing system designed exclusively for the needs of GUI programming and optimized for Rust semantics. FRP systems allow designing even very complex event dependencies in a static, easy to debug way. Unlike old-school event-listener based approach, FRP does not cause [callback hell] nor leads to 'spaghetti' code, which is hard to read and extend.

Animation

EnsoGL delivers a set of lightweight animation engines in a form of a reactive FRP API. It allows attaching animations to every interface element simply by plugging an FRP event source to FRP animation node. For example, the Inertia Simulator enables physical-based animations of positions and colors, allowing at the same time changing the destination values with smooth interpolation between states. The Tween engine does not allow smooth destination value change, however, its so lightweight, that you can consider it non-existent from the performance point of view.

  • Mixing HTML elements

Modern GUI Components

Built-in performance statistics

Rendering Architecture

https://www.nomnoml.com :

    #zoom: 0.6
    #gutter:100
    #padding: 14
    #leading: 1.4
    #spacing: 60
    #edgeMargin:5
    #arrowSize: 0.8
    #fill: #FFFFFF; #fdf6e3

    #background: #FFFFFF
    #.usr: visual=roundrect title=bold stroke=rgb(237,80,80)
    #.dyn: visual=roundrect title=bold dashed
    #.cpu: visual=roundrect title=bold
    #.gpu: stroke=rgb(68,133,187) visual=roundrect

    [<gpu> Buffer]
    [<gpu> WebGL Context]
    [<cpu> AttributeScope]
    [<cpu> Attribute]
    [<cpu> Mesh]
    [<cpu> Material]
    [<cpu> Symbol]
    [<cpu> SymbolRegistry]
    [<cpu> World]
    [<cpu> Scene]
    [<cpu> View]
    [<cpu> SpriteSystem]
    [<cpu> Sprite]
    [<cpu> ShapeSystem]
    [<dyn> ShapeView]
    [<usr> *Shape]
    [<usr> *ShapeSystem]
    [<usr> *Component]
    [<cpu> Application]

    [AttributeScope] o- [Buffer]
    [Buffer] o-- [Attribute]
    [Mesh]* o- 4[AttributeScope]
    [Symbol]* o- [Mesh]
    [Symbol]* o- [Material]
    [SymbolRegistry] o- [Symbol]
    [Scene] - [SymbolRegistry]
    [Scene] o- [View]
    [Scene] - [WebGL Context]

    [SpriteSystem] o- [Symbol]
    [SpriteSystem] o-- [Sprite]
    [ShapeSystem] o- [SpriteSystem]
    [Sprite] o- [Symbol]
    [Sprite] o- [Attribute]
    [*Shape] o- [Sprite]
    [*ShapeSystem] o- [ShapeSystem]
    [*ShapeSystem] o-- [*Shape]
    [*Component] o- [ShapeView]
    [ShapeView] - [*Shape]
    [View] o- [Symbol]
    [View] o- [*ShapeSystem]
    [World] o- [Scene]
    [Application] - [World]
    [Application] o- [*Component]

Shapes Rendering

The Current Architecture

The current implementation uses instanced rendering to display shapes. First, a simple rectangular geometry is defined, and for each new instance, a new attribute is added to the list of attached attribute arrays. During rendering, we use the draw_arrays_instanced WebGL call to iterate over the arrays and draw each shape. The shape placement is done from within its vertex shader.

See the documentation of [crate::system::gpu::data::Buffer]. See the documentation of [crate::system::gpu::data::Attribute]. See the documentation of [crate::system::gpu::data::AttributeScope].

Known Issues / Ideas of Improvement

The current architecture is very efficient at shapes rendering, which comes with a few limitations. Below, there are many other architectures described with their own gains and problems and we should consider improving the current approach in the future. However, keep in mind that the listed limitations allow us for very fast rendering pipeline, so it's questionable whether we would like to ever change it.

The most significant limitations of the current approach are:

  • No possibility to depth-sort the shapes instances. The used draw_arrays_instanced WebGL draw call iterates over all attrib arrays and draws a new instance for each entry. There is no possibility to specify the iteration order, while re-ordering the attrib arrays can be CPU heavy (with big instance count) and would require re-sending big amount of data between CPU and GPU (e.g. moving the top-most instance to the bottom would require moving its attribs in all attached attrib arrays from the last position to the front, and thus, re sending ALL attrib arrays to the GPU (for ALL INSTANCES)).

  • No efficient memory management. In case an instance with a high ID exists and many instances with lower IDs are already destroyed, the memory of the destroyed instances cannot be freed. This is because currently the sprite instances remember the ID (wrapper over usize) of the instance, which is used as the attrib array index. Thus, it is impossible to update the number in all sprite instances in memory, and sort the instances to move the destroyed ones to the end of the buffer to free it. This could be easily solved by using Rc<Cell<ID>> instead, however, it is important to benchmark how big performance impact this will cause. Also, other architectures may provide alternative solutions.

  • No possibility to render shape instances using different cameras (in separate draw calls). Currently, the shape instances are drawn with the draw_arrays_instanced WebGL draw call. This API allows drawing all instances at once, so it is not possible to draw only some subset of them, and thus, it is not possible to update the view-matrix uniform between the calls. The OpenGL 4.2 introduced a specialized draw call that would solve this issue entirely, however, it is not accessible from within WebGL (glDrawArraysInstancedBaseInstance).

Depth-sorting, memory cleaning, and indexes re-using.

The current approach, however, doesn't allow us to depth-sort the shapes instances. Also, it does not allow for efficient memory management in case an instance with a high ID exists and many instances with lover IDs are already destroyed. This section describes possible alternative architectures and compares them from this perspective.

There are several possible implementation architectures for attribute management. The currently used architecture may not be the best one, but the choice is not obvious and would require complex benchmarking. However, lets compare the available architectures and lets list their good and bad sides:

A. Drawing instanced geometry (the current architecture).

  • Rendering. Very fast. May not be as fast as some of other methods, but that may not be the case with modern hardware, see: https://stackoverflow.com/a/65376034/889902, and also https://stackoverflow.com/questions/62537968/using-opengl-instancing-for-rendering-2d-scene-with-object-depths-and-alpha-blen#answer-62538277

  • Changing attribute & GPU memory consumption. Very fast and with low memory consumption. Requires only 1 WebGL call (attribute per instance).

  • Visual sorting of instances (depth management). Complex. Requires sorting of all attribute buffers connected with a particular instance. For big buffers (many instances) it may require significant CPU -> GPU data upload. For example, taking the last element to the front, would require shifting all attributes in all buffers, which basically would mean uploading all data to the GPU from scratch for that particular geometry. Also, this would require keeping instance IDs in some kind of Rc<Cell<usize>>, as during sorting, the instance IDs will change, so all sprites would need to be updated.

B. Drawing non-instanced, indexed geometry.

  • Rendering. Very fast. May be faster than architecture (A). See it's description to learn more.

  • Changing attribute & GPU memory consumption. 4 times slower and 4 times more memory hungry than architecture (A). Requires setting each attribute for each vertex (4 WebGL calls). During drawing, vertexes are re-used by using indexed geometry rendering.

  • Visual sorting of instances (depth management). The same issues as in architecture (A). Even more CPU -> GPU heavy, as the attribute count is bigger.

C. Drawing non-instanced, non-indexed geometry. Using indexing for sorting.

  • Rendering. Very fast. May be faster than architecture (A). See it's description to learn more.

  • Changing attribute & GPU memory consumption. 6 times slower and 6 times more memory hungry than architecture (A). Requires setting each attribute for each vertex (6 WebGL calls). During drawing, vertexes are not re-used, and thus we need to set attributes for each vertex of each triangle.

  • Visual sorting of instances (depth management). Simple. We can re-use index buffer to sort the geometry by telling GPU in what order it should render each of the vertexes. Unlike previous architectures, this would not require to create any more internally mutable state regarding attribute index management (the indexes will not change during sorting).

    However, sorting for the needs of memory compression (removing unused memory for sparse attrib arrays) would still require re-uploading sorted data to GPU, just as in architecture (A).

D. Keeping all attribute values in a texture and passing index buffer to the shader.

This is a very different architecture to what is currently implemented and might require very complex refactoring in order to be even tested and benchmarked properly. To learn more about the idea, follow the link: https://stackoverflow.com/a/65376034/889902.

  • Rendering. Fast. May be slower than architecture (A). Needs real benchmarks.

  • Changing attribute & GPU memory consumption. Changing attribute would require 2 WebGL calls: the bindTexture, and texParameterf (or similar). Performance of this solution is questionable, but in real life, it may be as fast as architecture (A). The memory consumption should be fine as well, as WebGL textures behave like C++ Vectors, so even if we allocate the texture of max size, it will occupy only the needed space. This will also limit the number of instances on the stage, but the limit will be big enough (assuming max texture od 2048px x 2048px and 20 float attributes per shader, this will allow us to render over 200 000 shapes). Also, this architecture would allow us to pass more attributes to shaders than it is currently possible, which on the other hand, would probably negatively affect the fragment shader performance.

  • Visual sorting of instances (depth management). Simple. Next to the attribute texture, we can pass index buffer to the shader, which will dictate what initial offset in the texture should be used. This would allow for the fastest sorting mechanism of all of the above architectures.

    However, sorting for the needs of memory compression (removing unused memory for sparse attrib arrays) would still require re-uploading sorted data to GPU, just as in architecture (A).

E. Using the depth-buffer for sorting.

As with architecture (C), this is a very different architecture to what is currently implemented and might require very complex refactoring in order to be even tested and benchmarked properly. This architecture, however, is the most common architecture among all WebGL / OpenGL applications, but it is not really well suitable for SDF-based shapes rendering, as it requires anti-aliasing to be done by multisampling, which is not needed with SDF-based rasterization. It lowers the quality and drastically increases the rendering time (in the case of 4x4 multisampling, the rendering time is 16x bigger than the time of architecture (A)).

There is one additional thread to consider here, namely, with some browsers, systems, and GPU combinations, the super-sampling anti-aliasing is not accessible in WebGL. In such situations we could use a post-processing anti-aliasing techniques, such as [FXAA][1] or [SMAA][2], however, the resulting image quality will be even worse. We could also use custom multi-sampled render buffers for implementing [multi-sampled depth buffers][3]. [1] https://github.com/mitsuhiko/webgl-meincraft/blob/master/assets/shaders/fxaa.glsl [2] http://www.iryoku.com/papers/SMAA-Enhanced-Subpixel-Morphological-Antialiasing.pdf [3] https://stackoverflow.com/questions/50613696/whats-the-purpose-of-multisample-renderbuffers

  • Rendering. May be 9x - 16x slower than architecture (A), depending on multi-sampling level. Also, the final image quality and edge sharpness will be lower. There is, however, an open question, whether there is an SDF-suitable depth-buffer sorting technique which would not cause such downsides (maybe involving SDF-based depth buffer). Currently, we don't know of any such technique.

  • Changing attribute & GPU memory consumption. Fast with low memory consumption. The same as with architecture (A), (B), or (C).

  • Visual sorting of instances (depth management). Simple and fast. Much faster than any other architecture listed before, as it does not require upfront CPU-side buffer sorting.

F. Using depth-peeling / dual depth-peeling algorithms.

As with architecture (C), this is a very different architecture to what is currently implemented and might require very complex refactoring in order to be even tested and benchmarked properly. The idea is to render the scene multiple times, as long as some objects do overlap, by "peeling" the top-most (and bottom-most) layers every time. See the [Interactive Order-Independent Transparency][1], the [Order Independent Transparency with Dual Depth Peeling][2], and the [sample WebGL implementation][3] to learn more.

[1] https://my.eng.utah.edu/~cs5610/handouts/order_independent_transparency.pdf [2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.193.3485&rep=rep1&type=pdf [3] https://medium.com/@shrekshao_71662/dual-depth-peeling-implementation-in-webgl-11baa061ba4b

  • Rendering. May be several times slower than architecture (A) due to the need to render the scene by peeling components. However, in contrast to the architecture (D), the final image quality should be as good as with architecture (A), (B), or (C).

  • Changing attribute & GPU memory consumption. Fast with low memory consumption. The same as with architecture (A), (B), or (C).

  • Visual sorting of instances (depth management). Simple and fast. As fast as architecture (E), as it does not require upfront CPU-side buffer sorting.