Previously the test determining whether to use texture magnification or
texture minification was reversed. This commit fixes the test and also
provides an early out of the sampler in case of texture magnification
since magnification does not make use of mipmaps.
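
As a rough illustration of the test and the early out, here is a minimal
sketch. It assumes the sampler has texel-space derivatives of the texture
coordinates available; `TexelDerivatives`, `is_magnification` and
`mipmap_level` are hypothetical names and not the actual sampler code.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical input: derivatives of the texel coordinates (u, v) with
// respect to window x and y.
struct TexelDerivatives {
    float du_dx, dv_dx, du_dy, dv_dy;
};

// Squared scale factor rho^2; comparing against 1 avoids a square root.
inline float scale_factor_squared(TexelDerivatives const& d)
{
    float const along_x = d.du_dx * d.du_dx + d.dv_dx * d.dv_dx;
    float const along_y = d.du_dy * d.du_dy + d.dv_dy * d.dv_dy;
    return std::max(along_x, along_y);
}

// rho <= 1 means a pixel step covers at most one texel: magnification.
inline bool is_magnification(TexelDerivatives const& d)
{
    return scale_factor_squared(d) <= 1.0f;
}

// Mipmap selection is only needed for minification.
inline float mipmap_level(TexelDerivatives const& d)
{
    if (is_magnification(d))
        return 0.0f; // early out: magnification samples the base level
    // log2(rho) == 0.5 * log2(rho^2)
    return 0.5f * std::log2(scale_factor_squared(d));
}
```

The sketch compares against a threshold of 1 (a level of detail of 0); the
exact threshold used depends on the configured filters.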
This replaces the fixed point subpixel precision logic.
GLQuake now effectively renders artifact-free. Previously white/gray
pixels would sometimes be visible at triangle edges, caused by slightly
misaligned triangle edges as a result of converting the vertex window
coordinates to `int`. These artifacts were reduced by the introduction
of subpixel precision, but not completely eliminated.
Some interesting changes in this commit:
* Applying the top-left rule for our counter-clockwise vertices is now
done with simpler conditions: every vertex that has a Y coordinate
lower than or equal to the previous vertex's Y coordinate is counted
as a top or left edge. A float epsilon is used to emulate a switch
between `> 0` and `>= 0` comparisons.
* Fog depth calculation into a `f32x4` is now done once per triangle
instead of once per fragment, and only if fog is enabled.
* The `one_over_area` value was previously calculated as `1.0f / area`,
where `area` was an `int`. This resulted in a lower quality
reciprocal value whereas we can now retain floating point precision.
The effect of this can be seen in Tux Racer, where the ice reflection
is noticeably smoother.
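
To illustrate the `one_over_area` point, here is a minimal sketch that
contrasts a reciprocal computed from a truncated `int` area with one that
stays in floating point. `Point` and the function names are hypothetical
and only mirror the idea described above.

```cpp
struct Point {
    float x, y;
};

// Twice the signed area of triangle (a, b, c); positive for
// counter-clockwise winding when Y points up.
inline float edge_function(Point a, Point b, Point c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Reciprocal that keeps the fractional part of the area.
inline float one_over_area_float(Point a, Point b, Point c)
{
    return 1.0f / edge_function(a, b, c);
}

// Reciprocal of the area after truncating it to int, which discards the
// fractional part (and yields infinity if the area truncates to 0).
inline float one_over_area_truncated(Point a, Point b, Point c)
{
    int const area = static_cast<int>(edge_function(a, b, c));
    return 1.0f / static_cast<float>(area);
}
```

For a triangle with an area value of 6.25, the truncated variant divides
by 6 instead, so every interpolated attribute is scaled slightly wrong.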
There were some issues with the old code:
* We were saving the length of the light vector but not actually using
it anywhere.
* If we were dealing with a zero-vector, this could potentially divide
by zero, resulting in a black fragment color.
* We were erroneously using P2's length instead of P1's length when
P1's W coordinate is zero in the SGI arrow operation.
This fixes some lighting bugs in Grim Fandango, and probably affects all
other lighting as well.
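
One possible way to guard against the zero-vector case is sketched below.
This is an assumption about the general approach, not the actual fix;
`Vec3` and `normalized_or_zero` are hypothetical names.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Normalizes v, but returns the zero vector unchanged instead of
// dividing by a length of zero.
inline Vec3 normalized_or_zero(Vec3 v)
{
    float const length = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (length == 0.0f)
        return { 0.0f, 0.0f, 0.0f };
    return { v.x / length, v.y / length, v.z / length };
}
```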
This fixes the issue where e.g. `299.97` would be cast to an integer
value of `299`, whereas the pixel's center would lie at `299.5` and
would then erroneously be excluded.
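
The rounding problem can be illustrated with a small sketch that derives
pixel bounds from pixel centers instead of from a plain integer cast. The
helper names are hypothetical, and the sketch assumes pixel `i` has its
center at `i + 0.5`.

```cpp
#include <cmath>

// Index of the first pixel whose center is at or after min_coordinate.
inline int first_pixel_with_center_at_or_after(float min_coordinate)
{
    return static_cast<int>(std::ceil(min_coordinate - 0.5f));
}

// Index of the last pixel whose center is at or before max_coordinate.
inline int last_pixel_with_center_at_or_before(float max_coordinate)
{
    return static_cast<int>(std::floor(max_coordinate - 0.5f));
}
```

With a maximum of `299.97`, pixel 299 (center `299.5`) is included, while
a maximum of `299.3` stops at pixel 298.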
Currently, LibSoftGPU is still OpenGL-minded in that it uses a
coordinate system with the origin of `(0, 0)` at the lower-left of
textures, buffers and window coordinates. Because we are blitting to a
`Gfx::Bitmap` that has the origin at the top-left, we need to flip the
Y-coordinates somewhere in the rasterization logic.
We used to do this during conversion of NDC-coordinates to window
coordinates. This resulted in some incorrect behavior when
rasterization did not pass through the vertex transformation logic,
e.g. when calling `glDrawPixels`.
This changes the coordinate system to OpenGL's throughout, only to blit
the final color buffer upside down to the target bitmap. This fixes
drawing to the depth buffer directly resulting in upside down images.
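
The upside-down blit can be sketched as follows. This is a simplified
illustration using a flat `std::vector` rather than `Gfx::Bitmap`;
`blit_flipped` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies a color buffer whose row 0 is the bottom row into a target
// whose row 0 is the top row.
inline std::vector<std::uint32_t> blit_flipped(
    std::vector<std::uint32_t> const& source, std::size_t width, std::size_t height)
{
    std::vector<std::uint32_t> target(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        std::size_t const target_y = height - 1 - y;
        for (std::size_t x = 0; x < width; ++x)
            target[target_y * width + x] = source[y * width + x];
    }
    return target;
}
```

Doing the flip only at this final step means every earlier stage can use
the lower-left origin consistently.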
Between the OpenGL client and server, a lot of data type and color
conversion needs to happen. We are performing these conversions both in
`LibSoftGPU` and `LibGL`, which is not ideal. Additionally, some
concepts like the color, depth and stencil buffers should share their
logic but have separate implementations.
This is the first step towards generalizing our `LibSoftGPU` frame
buffer: a generalized `Typed3DBuffer` is introduced for arbitrary 3D
value storage and retrieval, and `Typed2DBuffer` wraps around it to
provide an easy-to-use 2D pixel buffer. The color, depth and stencil
buffers are replaced by `Typed2DBuffer` and are now managed by the new
`FrameBuffer` class.
The `Image` class now uses multiple `Typed3DBuffer`s for layers and
mipmap levels. Additionally, the textures are now always stored as
BGRA8888, only converting between formats when reading or writing
pixels.
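
The relationship between the two buffer types can be sketched roughly like
this. The classes below are deliberately simplified stand-ins with a
`Sketch` suffix; the real `Typed3DBuffer` and `Typed2DBuffer` interfaces
may look different.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in: typed storage addressed by (x, y, z).
template<typename T>
class Typed3DBufferSketch {
public:
    Typed3DBufferSketch(std::size_t width, std::size_t height, std::size_t depth)
        : m_width(width)
        , m_height(height)
        , m_data(width * height * depth)
    {
    }

    T& at(std::size_t x, std::size_t y, std::size_t z)
    {
        return m_data[(z * m_height + y) * m_width + x];
    }

    void fill(T const& value) { m_data.assign(m_data.size(), value); }

private:
    std::size_t m_width;
    std::size_t m_height;
    std::vector<T> m_data;
};

// A 2D buffer is a 3D buffer with a depth of 1.
template<typename T>
class Typed2DBufferSketch {
public:
    Typed2DBufferSketch(std::size_t width, std::size_t height)
        : m_buffer(width, height, 1)
    {
    }

    T& at(std::size_t x, std::size_t y) { return m_buffer.at(x, y, 0); }
    void fill(T const& value) { m_buffer.fill(value); }

private:
    Typed3DBufferSketch<T> m_buffer;
};
```

A depth buffer would then be something like `Typed2DBufferSketch<float>`,
and a stencil buffer the same template with an 8-bit integer type.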
Ideally this refactor should have no functional changes, but some
graphical glitches in Grim Fandango seem to be fixed and most OpenGL
ports get an FPS boost on my machine. :^)
This function was added as a FIXME but was then arbitrarily invoked in
the rest of `Device`. We are better off removing this FIXME for now and
reevaluating the introduction of multithreading later on, so the code is not
littered with useless empty function calls.
The `ClipPlane` enum is being looped over at run-time performing
run-time dispatch to determine the comparison operation in
`point_within_clip_plane`.
Change this `for` loop to be linear code which dispatches using a
template parameter. This allows for the `point_within_clip_plane`
function to do compile-time dispatch.
Note: This linear code can become a compile-time loop when static
reflection lands in C++2[y|z] allowing looping over the reflected
`enum class`.
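
A minimal sketch of the template-based dispatch, assuming C++17 and the
usual clip-space inequalities. `Vec4` and the exact enumerator names are
assumptions made for this example.

```cpp
enum class ClipPlane {
    Left,
    Right,
    Top,
    Bottom,
    Near,
    Far
};

struct Vec4 {
    float x, y, z, w;
};

// The comparison is selected at compile time via the template parameter.
template<ClipPlane plane>
constexpr bool point_within_clip_plane(Vec4 const& v)
{
    if constexpr (plane == ClipPlane::Left)
        return v.x >= -v.w;
    else if constexpr (plane == ClipPlane::Right)
        return v.x <= v.w;
    else if constexpr (plane == ClipPlane::Top)
        return v.y <= v.w;
    else if constexpr (plane == ClipPlane::Bottom)
        return v.y >= -v.w;
    else if constexpr (plane == ClipPlane::Near)
        return v.z >= -v.w;
    else
        return v.z <= v.w;
}

// Linear code instead of a run-time loop over the enum.
constexpr bool point_within_all_clip_planes(Vec4 const& v)
{
    return point_within_clip_plane<ClipPlane::Left>(v)
        && point_within_clip_plane<ClipPlane::Right>(v)
        && point_within_clip_plane<ClipPlane::Top>(v)
        && point_within_clip_plane<ClipPlane::Bottom>(v)
        && point_within_clip_plane<ClipPlane::Near>(v)
        && point_within_clip_plane<ClipPlane::Far>(v);
}
```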
The clipping logic is not DRY (Don't Repeat Yourself). The same logic
is repeated in multiple parts of an `if-else` statement. This can be
simplified to contain fewer branches and eliminate the redundant code.
Much of the `Clipper` class can be made free functions and their scope
limited.
The purpose of this is to prepare the interface for a change to more
compile-time dispatch.
Clearing the `m_alpha_blend_factors` is performed manually and in
separate steps. This is error prone for future developers. The
behavior is to reset the entire `struct` to the same state as default
initialization, so this simplifies it to do just that.
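
The idea can be sketched as below. The member names inside the struct are
made up for the example; only `m_alpha_blend_factors` comes from the
description above.

```cpp
// Hypothetical stand-in for the blend factor struct.
struct BlendFactorsSketch {
    float src_constant { 0.0f };
    float src_factor_src_alpha { 0.0f };
    float dst_constant { 0.0f };
    float dst_factor_src_alpha { 0.0f };
};

struct DeviceSketch {
    BlendFactorsSketch m_alpha_blend_factors;

    // Assigning an empty initializer list restores every member to its
    // default value, including members added later.
    void reset_blend_factors() { m_alpha_blend_factors = {}; }
};
```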
Problem:
- The statistics overlay period is hardcoded to 500 ms. This time is
very short and can result in the values being very "jumpy".
Solution:
- Increasing this value can result in more steady values which is
useful when trying to evaluate the performance impact of a change. A
new config value is offered in `Config.h` to let the developer
change to any value desired.
OpenGL mandates at least 2 texture units when multitexturing is
supported. This keeps our vertices lean and gives a nice speed
improvement in glquake. Until we support shaders this should be enough.
We now have one set of texture coordinates per texture unit.
Texture coordinate generation and texture coordinate assignment are
currently only stubbed. This will be rectified in another commit.
This function is used quite a bit during the lighting calculations, so
it's a bit cleaner having it in a centralized spot instead of just
arbitrarily calling `dot()` with numerous `FloatVector3` conversions.
This implements an 8-bit front stencil buffer. Stencil operations are
SIMD optimized. LibGL changes include:
* New `glStencilMask` and `glStencilMaskSeparate` functions
* New context parameter `GL_STENCIL_CLEAR_VALUE`
Implements support for `glRasterPos` and updating the raster position's
window coordinates through `glBitmap`. The input for `glRasterPos` is
an object position that needs to go through the same vertex
transformations as our regular triangles.
When `GL_COLOR_MATERIAL` is enabled, specific material parameters can
be overwritten by the current color per-vertex during the lighting
calculations. Which parameter is overwritten is controlled by
`glColorMaterial`.
Also move the lighting calculations _before_ clipping, because the spec
says so. As a result, we interpolate the resulting vertex color instead
of the input color.
If there are fewer than 3 vertices, we cannot draw a triangle strip;
otherwise we would go out of bounds of the vertices vector.
Required for Half-Life, which sometimes submits 0 vertices for a triangle
strip when drawing the electric disks around the pillars in Xen.
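
A minimal sketch of such a guard, using a hypothetical helper name. With an
unsigned vertex count, computing `vertex_count - 2` for 0 vertices would
wrap around to a huge value, which is why the check comes first.

```cpp
#include <cstddef>

// Number of triangles a strip of vertex_count vertices produces.
inline std::size_t triangle_strip_triangle_count(std::size_t vertex_count)
{
    if (vertex_count < 3)
        return 0; // not enough vertices for even one triangle
    return vertex_count - 2;
}
```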
This was previously only set in the OpenGL context, as the previous
architecture did all of the transformation in LibGL before passing the
transformed triangles onto the rasterizer. As this has now changed, and
we require the vertex data to be in eye-space before we can apply
lighting, we need to pass this flag along as well via the GPU options.
Most of the T&L stuff is, like on an actual GPU, now done inside of
LibSoftGPU. As such, it no longer makes sense to have specific values
like the scene ambient color inside of LibGL as part of the GL context.
These have now been moved into LibSoftGPU and use the same pattern as
the render options to set/get.
These two functions have been turned from stubs into actually doing
something. They now set the corresponding material data member based on
the value passed into the `pname` argument.
Co-authored-by: Stephan Unverwerth <s.unverwerth@serenityos.org>
This implements the `glLightf{v}` family of functions used to set
lighting parameters per light in the GL. It also fixes an incorrect
prototype for the user exposed version of `glLightf{v}` in which
`params` was not marked as `const`.
This is required to allow lighting to work properly in the GL. We
currently have the maximum number of lights in the software GL context
set to 8, as this is the minimum that OpenGL mandates according to the
spec.
Previously, we were expecting triangles and quads to consist of
complete sets of vertices. However, a more common behavior is to ignore
all vertices that do not make up a full primitive. For example, OpenGL
specifies for `GL_QUADS`:
"The total number of vertices between Begin and End is 4n + k, where
0 ≤ k ≤ 3; if k is not zero, the final k vertices are ignored."
This changes the behavior of `Device::draw_primitives()` to both return
early if no full set of vertices was provided, and to ignore any
additional vertices that are not part of a full set.
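
The vertex trimming can be sketched as follows; `usable_vertex_count` is a
hypothetical helper and assumes `vertices_per_primitive` is non-zero.

```cpp
#include <cstddef>

// Number of leading vertices that form complete primitives; the final
// (vertex_count % vertices_per_primitive) vertices are ignored.
inline std::size_t usable_vertex_count(std::size_t vertex_count, std::size_t vertices_per_primitive)
{
    return vertex_count - (vertex_count % vertices_per_primitive);
}
```

A result of 0 corresponds to the early return: 3 vertices submitted for
`GL_QUADS` yield no usable vertices, and 7 vertices yield 4.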
The name `scissor_box_to_window_coordinates` was wildly inaccurate since
we are actually transforming window coordinates into whatever the
coordinate space of the backing bitmap is.
This adds a half pixel offset to the edge value calculation in order to
sample the triangle at pixel centers. This is in line with actual OpenGL
rasterization rules and generates correctly interpolated vertex
attributes including texture coordinates.
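
As an illustration, here is a sketch of an edge function evaluated at the
pixel center rather than at the pixel's corner. The types and function
names are hypothetical.

```cpp
struct Point {
    float x, y;
};

// Signed edge value of point p relative to the edge from a to b.
inline float edge_function(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Samples the edge at the center of pixel (pixel_x, pixel_y).
inline float edge_value_at_pixel_center(Point a, Point b, int pixel_x, int pixel_y)
{
    Point const center { static_cast<float>(pixel_x) + 0.5f, static_cast<float>(pixel_y) + 0.5f };
    return edge_function(a, b, center);
}
```

For a vertical edge at `x = 0.25`, the corner of pixel 0 and its center
lie on opposite sides of the edge, so the two sampling positions disagree
about whether the pixel is covered.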
With the RASTERIZER_BLOCK_SIZE gone we can now render to any size, even
odd ones. We have to be careful to not generate out of bounds accesses
when calculating the render target and depth buffer pointers. Thus we
check the coverage mask and generate nullptrs for pixels that will not
be updated. This also masks out pixels that would touch the triangle but
are outside the render target/scissor rect bounds.
Since the alpha blend configuration should not change between most calls
of draw_primitives it makes no sense to reinitialize the blend factors
for every rasterized triangle.
The alpha blend factors are now set up whenever the device config
changes. The blend factors are stored in struct AlphaBlendFactors.
This adds member functions Device::rasterize_triangle() and
Device::shade_fragments(). They were free standing functions/lambdas
previously which led to a lot of parameters being passed around.
This adds a counter to the debug overlay that displays the average
percentage of SIMD lane utilization.
This number represents the number of pixels that were output for each
quad. A utilization of 100% means that all 4 SIMD lanes were used and
no pixels were masked out before being written to the color buffer.
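
The counter can be thought of roughly like this. The struct below is a
hypothetical sketch that assumes a 4-bit coverage mask per quad.

```cpp
#include <cstdint>

struct LaneUtilizationCounter {
    std::uint64_t quads { 0 };
    std::uint64_t active_lanes { 0 };

    // coverage_mask: one bit per SIMD lane, lowest 4 bits used.
    void record_quad(unsigned coverage_mask)
    {
        ++quads;
        for (unsigned lane = 0; lane < 4; ++lane) {
            if (coverage_mask & (1u << lane))
                ++active_lanes;
        }
    }

    // Average percentage of lanes that produced a pixel.
    double utilization_percent() const
    {
        if (quads == 0)
            return 0.0;
        return 100.0 * static_cast<double>(active_lanes) / (4.0 * static_cast<double>(quads));
    }
};
```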
This snaps vertices to 1/32 of a pixel before rasterization resulting
in smoother movement and less floaty appearance of moving triangles.
This also reduces the severity of the artifacts in the glquake port.
5 bits should allow up to 1024x1024 render targets. Anything larger
needs a different implementation.
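
Snapping to 1/32 of a pixel corresponds to 5 fractional bits. A minimal
sketch of the snapping step on a float coordinate, with hypothetical
names:

```cpp
#include <cmath>

constexpr int subpixel_bits = 5;
constexpr float subpixel_factor = static_cast<float>(1 << subpixel_bits); // 32

// Rounds a window coordinate to the nearest multiple of 1/32.
inline float snap_to_subpixel_grid(float coordinate)
{
    return std::round(coordinate * subpixel_factor) / subpixel_factor;
}
```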
This displays statistics regarding frame timings and number of pixels
rendered.
Timings are based on the time between draw_debug_overlay() invocations.
This measures actual number of frames presented to the user vs. wall
clock time so this also includes everything the app might do besides
rendering.
Triangles are counted after clipping. This number might actually be
higher than the number of triangles coming from LibGL.
Pixels are counted after the initial scissor and coverage test. Pixels
rejected here are not counted. Shaded pixels is the percentage of all
pixels that made it to the shading stage. Blended pixels is the
percentage of shaded pixels that were alpha blended to the color buffer.
Overdraw measures how many pixels were shaded vs. how many pixels the
render target has. For example, a 640x480 render target has 307200
pixels. If exactly that many pixels are shaded, the overdraw number will
read 0%. 614400 shaded pixels will read as an overdraw of 100%.
Sampler calls is simply the number of times sampler.sample_2d() was
called.
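
The overdraw figure described above works out to the following
calculation. `overdraw_percent` is a hypothetical helper; the sketch does
not clamp values below 0%, and whether the overlay does so is not stated
here.

```cpp
#include <cstdint>

// Percentage of shaded pixels beyond one full render target's worth.
inline double overdraw_percent(std::uint64_t shaded_pixels, std::uint64_t render_target_pixels)
{
    if (render_target_pixels == 0)
        return 0.0;
    double const ratio = static_cast<double>(shaded_pixels) / static_cast<double>(render_target_pixels);
    return (ratio - 1.0) * 100.0;
}
```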
Texture coordinate generation is the concept of automatically
generating vertex texture coordinates instead of using the provided
coordinates (i.e. `glTexCoord`).
This commit implements support for:
* The `GL_TEXTURE_GEN_Q/R/S/T` capabilities
* The `GL_OBJECT_LINEAR`, `GL_EYE_LINEAR`, `GL_SPHERE_MAP`,
`GL_REFLECTION_MAP` and `GL_NORMAL_MAP` modes
* Object and eye plane coefficients (write-only at the moment)
This changeset allows Tux Racer to render its terrain :^)
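
For reference, two of these modes can be sketched as follows, based on the
formulas in the OpenGL specification. The types and function names are
hypothetical, and the sphere map sketch assumes unit-length inputs and
does not handle the degenerate case where the divisor becomes zero.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct Vec4 {
    float x, y, z, w;
};

// GL_OBJECT_LINEAR: dot product of the plane coefficients with the
// vertex position in object coordinates.
inline float object_linear(Vec4 const& plane, Vec4 const& object_position)
{
    return plane.x * object_position.x + plane.y * object_position.y
        + plane.z * object_position.z + plane.w * object_position.w;
}

struct SphereMapCoordinates {
    float s, t;
};

// GL_SPHERE_MAP: u is the unit vector from the origin to the vertex in
// eye coordinates, n is the unit normal in eye coordinates.
inline SphereMapCoordinates sphere_map(Vec3 u, Vec3 n)
{
    float const n_dot_u = n.x * u.x + n.y * u.y + n.z * u.z;
    // Reflection vector r = u - 2 * n * (n . u)
    Vec3 const r { u.x - 2.0f * n.x * n_dot_u, u.y - 2.0f * n.y * n_dot_u, u.z - 2.0f * n.z * n_dot_u };
    float const m = 2.0f * std::sqrt(r.x * r.x + r.y * r.y + (r.z + 1.0f) * (r.z + 1.0f));
    return { r.x / m + 0.5f, r.y / m + 0.5f };
}
```

A vertex straight ahead of the viewer with its normal facing the viewer
maps to the center of the sphere map, `(0.5, 0.5)`.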
Now that we calculate and store eye coordinates for each vertex, we
should use their `z` values for the fragment depth used in further fog
calculations.
This fixes the fog in Tux Racer :^)
This follows the OpenGL 1.5 spec much more closely. We need to store
the eye coordinates especially, since they are used in texture
coordinate generation and fog fragment depth calculation.
* LibGL now supports the `GL_NORMALIZE` capability
* LibSoftGPU transforms and normalizes the vertices' normals
Normals are heavily used in texture coordinate generation, to be
implemented in a future commit.
In the OpenGL fixed function pipeline, alpha testing should happen
before depth testing and writing. Since the tests are basically boolean
ANDs, we can reorder them however we like to improve performance and as
such, we perform early depth testing and delay the more expensive alpha
testing until we know which pixels to test.
However, we were already writing to the depth buffer during the depth
test, even if the alpha test fails later on. Depth writing should only
happen if depth testing _and_ writing is enabled.
This change introduces depth staging, deferring the depth write until
we are absolutely sure we should do so.
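
A scalar sketch of the staging idea, assuming a `GL_LESS` depth function
and a `GL_GREATER` alpha function for simplicity. All names are
hypothetical, and the real code operates on SIMD quads rather than single
fragments.

```cpp
struct FragmentSketch {
    float depth;
    float alpha;
};

struct PipelineStateSketch {
    bool depth_test_enabled;
    bool depth_write_enabled;
    bool alpha_test_enabled;
    float alpha_reference;
};

// Returns true if the fragment survives all tests. The depth buffer is
// only written once every test has passed.
inline bool process_fragment(FragmentSketch const& fragment, PipelineStateSketch const& state, float& depth_buffer_value)
{
    // Early depth test: cheap, so it runs first.
    if (state.depth_test_enabled && !(fragment.depth < depth_buffer_value))
        return false;

    // Stage the depth value instead of writing it immediately.
    float const staged_depth = fragment.depth;

    // The more expensive alpha test runs only for surviving fragments.
    if (state.alpha_test_enabled && !(fragment.alpha > state.alpha_reference))
        return false;

    // Commit the staged depth only if depth testing and writing are enabled.
    if (state.depth_test_enabled && state.depth_write_enabled)
        depth_buffer_value = staged_depth;
    return true;
}
```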
According to the Khronos FAQ on texture edge sampling, the `GL_CLAMP`
option was never implemented in hardware and as such, it was
deprecated. A lot of applications and games depend on `GL_CLAMP` not
really meaning `GL_CLAMP` but `GL_CLAMP_TO_EDGE`, so we introduce an
option to toggle this behavior at compile-time.
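
Such a compile-time toggle could look roughly like the sketch below,
assuming C++17. The constant `clamp_means_clamp_to_edge` and the function
names are hypothetical and do not reflect the actual option name.

```cpp
#include <algorithm>

// Hypothetical compile-time switch.
constexpr bool clamp_means_clamp_to_edge = true;

// GL_CLAMP_TO_EDGE: keep the coordinate at least half a texel away
// from the border so only edge texels are sampled.
inline float clamp_to_edge(float s, int texture_size)
{
    float const half_texel = 0.5f / static_cast<float>(texture_size);
    return std::clamp(s, half_texel, 1.0f - half_texel);
}

// GL_CLAMP as specified: clamp to [0, 1].
inline float clamp_plain(float s)
{
    return std::clamp(s, 0.0f, 1.0f);
}

inline float wrap_gl_clamp(float s, int texture_size)
{
    if constexpr (clamp_means_clamp_to_edge)
        return clamp_to_edge(s, texture_size);
    else
        return clamp_plain(s);
}
```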
According to the documentation, we should switch around vertices every
other triangle to prevent front-face culling from removing them.
This allows Tux in Tux Racer to render correctly.
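
The vertex swap can be sketched by generating the index triples for a
strip. `triangle_strip_to_triangles` is a hypothetical helper; which two
vertices get swapped is a choice made for this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using TriangleIndices = std::array<std::size_t, 3>;

// Expands a triangle strip into triangles, swapping the first two
// vertices of every other triangle so the winding stays consistent.
inline std::vector<TriangleIndices> triangle_strip_to_triangles(std::size_t vertex_count)
{
    std::vector<TriangleIndices> triangles;
    if (vertex_count < 3)
        return triangles;
    for (std::size_t i = 0; i + 2 < vertex_count; ++i) {
        if (i % 2 == 0)
            triangles.push_back({ i, i + 1, i + 2 });
        else
            triangles.push_back({ i + 1, i, i + 2 });
    }
    return triangles;
}
```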
This adds a method `info()` to SoftGPU that returns the name of the
hardware vendor and device name, as well as the number of texture units.
LibGL uses the returned texture unit count to initialize its internal
texture unit array.
Replaces the GLenum used in RasterizerConfig to select the draw buffer
with a simple boolean that disables color output when the draw buffer
is set to GL_NONE on the OpenGL side.
We now sample textures from the device owned image samplers.
Passing of enabled texture units has been simplified by only passing a
list of texture unit indices.
This adds two methods, write_texels and read_texels, to the Image class.
Conversion between image formats happens automatically. The layout of
the client image data is passed in via ImageDataLayout struct.
This serves as the storage for all image types: 1D, 2D, 3D, cube and
image arrays.
Upon construction a full mipmap chain is generated and the image is
immutable afterwards with respect to its layout.
This introduces a new library, LibSoftGPU, that incorporates all
rendering related features that formerly resided within LibGL itself.
Going forward we will make both libraries completely independent from
each other allowing LibGL to load different, possibly accelerated,
rendering backends.