On my machine, benchmarking 3DFileViewer revealed ~2.5% of CPU time
spent in `Vector<GPU::Vertex>::try_append`. By carefully managing list
capacities, we can remove this method from profiles altogether.
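A minimal sketch of the idea, assuming AK's `Vector::ensure_capacity`/`Vector::unchecked_append` API; the `float` element type stands in for `GPU::Vertex`:

```cpp
#include <AK/Vector.h>

// Reserve room for the whole batch once, then append without the
// per-element capacity check that try_append performs.
void collect_vertices(Vector<float>& vertices, size_t count)
{
    vertices.ensure_capacity(vertices.size() + count);
    for (size_t i = 0; i < count; ++i)
        vertices.unchecked_append(static_cast<float>(i));
}
```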
LibSoftGPU used to calculate the normal transformation based on the
model view transformation for every primitive, because that's when we
sent over the matrix. By making LibGL a bit smarter and only updating
the matrices when they could have changed, we only need to calculate
the normal transformation once on every matrix update.
When viewing `Tuba.obj` in 3DFileViewer, this brings the percentage of
time spent in `FloatMatrix4x4::inverse()` down from 15% to 0%. :^)
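A sketch of the dirty-flag caching, with hypothetical names; `FloatMatrix4x4` is assumed to provide `inverse()` and `transpose()` as in LibGfx:

```cpp
#include <LibGfx/Matrix4x4.h>

struct MatrixState {
    FloatMatrix4x4 model_view;
    FloatMatrix4x4 normal_transform;
    bool normal_transform_dirty { true };

    void set_model_view(FloatMatrix4x4 const& matrix)
    {
        model_view = matrix;
        normal_transform_dirty = true;
    }

    // Recomputed at most once per matrix update instead of once per primitive.
    FloatMatrix4x4 const& normal_transformation()
    {
        if (normal_transform_dirty) {
            normal_transform = model_view.inverse().transpose();
            normal_transform_dirty = false;
        }
        return normal_transform;
    }
};
```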
Previously, calling `.right()` on a `Gfx::Rect` would return the last
column's coordinate still inside the rectangle, or `left + width - 1`.
This is called 'endpoint inclusive' and does not make a lot of sense for
`Gfx::Rect<float>` where a rectangle of width 5 at position (0, 0) would
return 4 as its right side. This same problem exists for `.bottom()`.
This changes `Gfx::Rect` to be endpoint exclusive, which gives us the
nice property that `width = right - left` and `height = bottom - top`.
It enables us to treat `Gfx::Rect<int>` and `Gfx::Rect<float>` exactly
the same.
All users of `Gfx::Rect` have been updated accordingly.
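A sketch of the exclusive accessors, using a simplified stand-in for `Gfx::Rect`:

```cpp
// Endpoint exclusive: right()/bottom() name the first column/row
// *outside* the rectangle.
template<typename T>
struct Rect {
    T x, y, width, height;

    T left() const { return x; }
    T top() const { return y; }
    T right() const { return x + width; }
    T bottom() const { return y + height; }
};

// A float rect of width 5 at (0, 0) now reports 5 as its right side,
// and width == right() - left() holds for int and float alike.
```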
Copying over every texel (4x`f32x4`) for every texture unit is
relatively expensive. By checking if we even need to remember these
texel values, we reduce the time spent in `rasterize_triangle` by
around 2% as measured in Quake III.
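A sketch of the guard; the flag and storage names are hypothetical:

```cpp
#include <AK/Array.h>
#include <AK/SIMD.h>

constexpr size_t num_texture_units = 2;
// One RGBA texel per 2x2 quad: 4 channels of f32x4.
using QuadTexel = AK::Array<AK::SIMD::f32x4, 4>;

void maybe_remember_texels(
    AK::Array<QuadTexel, num_texture_units>& previous_texels,
    AK::Array<QuadTexel, num_texture_units> const& current_texels,
    bool texels_are_read_back)
{
    // Copying 4 f32x4 per unit is relatively expensive; skip the copy
    // entirely when no later stage reads the previous values.
    if (!texels_are_read_back)
        return;
    previous_texels = current_texels;
}
```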
Previously, we would precalculate "alpha blend factors" on every
configuration update and then calculate the source and destination
blending factors in one go using all these factors. The idea here was
probably that we would get better performance by avoiding branching.
However, by measuring blending performance in Quake III, it seems that
this simpler version that only calculates the required factors reduces
the CPU time spent in `rasterize_triangle` by 3%.
As a bonus, `GL_SRC_ALPHA_SATURATE` is now also implemented.
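A sketch of the on-demand calculation; `BlendFactor` and `Vec4` are stand-ins, and only a few of the GL factors are shown:

```cpp
#include <algorithm>

enum class BlendFactor { Zero, One, SrcAlpha, OneMinusDstAlpha, SrcAlphaSaturate };

struct Vec4 { float r, g, b, a; };

Vec4 source_factor(BlendFactor factor, Vec4 const& src, Vec4 const& dst)
{
    switch (factor) {
    case BlendFactor::Zero:
        return { 0.f, 0.f, 0.f, 0.f };
    case BlendFactor::One:
        return { 1.f, 1.f, 1.f, 1.f };
    case BlendFactor::SrcAlpha:
        return { src.a, src.a, src.a, src.a };
    case BlendFactor::OneMinusDstAlpha:
        return { 1.f - dst.a, 1.f - dst.a, 1.f - dst.a, 1.f - dst.a };
    case BlendFactor::SrcAlphaSaturate: {
        // GL_SRC_ALPHA_SATURATE: f = min(As, 1 - Ad) for RGB, 1 for alpha
        auto f = std::min(src.a, 1.f - dst.a);
        return { f, f, f, 1.f };
    }
    }
    return { 1.f, 1.f, 1.f, 1.f };
}
```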
We changed elapsed() to return i64 instead of int, since that's what
AK::Time::to_milliseconds() returns; this surfaced a bunch of implicit
lossy conversions in callers, which we clean up with a mix of type
changes and casts.
Same as with inputs, we define outputs as a generic array of floats.
This can later be expanded to accommodate multiple render targets or
vertex attributes in the case of a vertex shader.
Previously we would store vertex color and texture coordinates in
separate fields in PixelQuad. To make them accessible from shaders we
need to store them as a completely generic array of floats.
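A sketch of the generic layout, with illustrative sizes and hypothetical names:

```cpp
#include <AK/Array.h>
#include <AK/SIMD.h>

constexpr size_t num_inputs = 16;
constexpr size_t num_outputs = 8;

// Instead of named vertex_color / texture_coordinates fields, the quad
// carries flat arrays of float lanes that shaders can index.
struct PixelQuad {
    AK::Array<AK::SIMD::f32x4, num_inputs> inputs;   // e.g. color at 0-3, tex coords at 4-7
    AK::Array<AK::SIMD::f32x4, num_outputs> outputs; // room for multiple render targets later
};
```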
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
Up until now, we have only dealt with games that pass Q = 1 for their
texture coordinates. PrBoom+, however, relies on proper homogeneous
texture coordinates for its relatively complex sky rendering, which
means that we should perform this per-fragment division.
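A sketch of the per-fragment division; the names are stand-ins:

```cpp
struct TexCoord { float s, t, r, q; };

// Texture coordinates are interpolated first, then divided by q before
// sampling, so q != 1 (as used by PrBoom+'s sky) projects correctly.
TexCoord homogenize(TexCoord coord)
{
    auto one_over_q = 1.f / coord.q;
    return { coord.s * one_over_q, coord.t * one_over_q, coord.r * one_over_q, 1.f };
}
```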
Each texture unit now has its own texture transformation matrix stack.
Introduce a new texture unit configuration that is synced when changed.
Because we're no longer passing a silly `Vector` when drawing each
primitive, this results in a slight improvement in frames per second :^)
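A hypothetical shape of the per-unit state; the names and the `identity()` helper are assumptions, not the actual LibGL internals:

```cpp
#include <AK/Vector.h>
#include <LibGfx/Matrix4x4.h>

// Each unit carries its own matrix stack, plus a flag so the
// configuration is only synced to the GPU when something changed.
struct TextureUnitState {
    Vector<FloatMatrix4x4> texture_matrix_stack { FloatMatrix4x4::identity() };
    bool dirty { false };

    FloatMatrix4x4& texture_matrix() { return texture_matrix_stack.last(); }
};
```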
Looking at how Khronos defines layers:
https://www.khronos.org/opengl/wiki/Array_Texture
we see that both 3D textures and layers of 2D textures can be encoded
in our existing `Typed3DBuffer` as depth. Since the GPU API already
supports depth, remove the layer concept everywhere.
Also pass in `Texture2D::LOG2_MAX_TEXTURE_SIZE` as the maximum number
of mipmap levels, so we do not allocate 999 levels on each Image
instantiation.
This makes it consistent with our other `blit_from_color_buffer` and
paves the way for a third method that will be introduced in one of the
next commits.
`GL_COMBINE` is basically a fixed function calculator that performs
simple arithmetic on configurable fragment sources. This patch implements a
number of texture env parameters with support for the RGBA internal
format.
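A sketch of the `GL_COMBINE` idea with stand-in types; only two of the possible functions are shown, with the arithmetic as the GL spec defines it:

```cpp
struct Vec4 { float r, g, b, a; };

enum class CombineFunction { Replace, Modulate, Add };

Vec4 combine(CombineFunction function, Vec4 const& arg0, Vec4 const& arg1)
{
    switch (function) {
    case CombineFunction::Replace: // GL_REPLACE: Arg0
        return arg0;
    case CombineFunction::Modulate: // GL_MODULATE: Arg0 * Arg1
        return { arg0.r * arg1.r, arg0.g * arg1.g, arg0.b * arg1.b, arg0.a * arg1.a };
    case CombineFunction::Add: // GL_ADD: Arg0 + Arg1
        return { arg0.r + arg1.r, arg0.g + arg1.g, arg0.b + arg1.b, arg0.a + arg1.a };
    }
    return arg0;
}
```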
OpenGL allows GPUs to approximate a triangle's maximum depth slope,
which avoids a number of computationally expensive instructions. On my
machine, this gives me +6% FPS in Quake III.
We are able to reuse `render_bounds` here since it is the containing
rect of the (X, Y) window coordinates of the triangle, thus its width
and height are the maximum delta X and delta Y, respectively.
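One hedged way to sketch the estimate; the exact formula LibSoftGPU uses may differ, but the spec only asks for an approximation of m = max(|dz/dx|, |dz/dy|):

```cpp
#include <AK/StdLibExtras.h>

// render_bounds encloses the triangle's (X, Y) window coordinates, so
// its width and height bound delta X and delta Y; delta_z is the
// triangle's depth extent.
float approximate_max_depth_slope(float delta_z, float max_delta_x, float max_delta_y)
{
    return AK::max(delta_z / max_delta_x, delta_z / max_delta_y);
}
```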
The Quake 3 port makes use of this extension to determine a more
efficient multitexturing strategy. Since LibSoftGPU supports it, let's
report the extension in LibGL. :^)
In OpenGL this is called the (base) internal format, which expresses
the client's expectation of the minimum texel storage format the GPU
should support for a texture.
Since we store everything as RGBA in a `FloatVector4`, the only thing
we do in this patch is remember the expected internal format, and when
we write new texels, we fix the alpha channel to 1 for the two formats
that require it.
`PixelConverter` has learned how to transform pixels during transfer to
support this.
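A sketch with stand-in types (LibSoftGPU stores texels as RGBA in a `FloatVector4`; the enum values are illustrative):

```cpp
struct Texel { float r, g, b, a; };

enum class InternalFormat { RGBA, RGB, Luminance };

// Formats declared without an alpha channel get alpha fixed to 1 on write.
Texel adjust_for_internal_format(Texel texel, InternalFormat format)
{
    if (format == InternalFormat::RGB || format == InternalFormat::Luminance)
        texel.a = 1.f;
    return texel;
}
```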
A GPU (driver) is now responsible for reading and writing pixels from
and to user data. The client (LibGL) is responsible for specifying how
the user data must be interpreted or written to.
This allows us to centralize all pixel format conversion in one class,
`LibSoftGPU::PixelConverter`. For both the input and output image, it
takes a specification containing the image dimensions, the pixel type
and the selection (basically a clipping rect), and converts the pixels
from the input image to the output image.
Effectively this means we now support almost all OpenGL 1.5 formats,
and all custom logic has disappeared from:
- `glDrawPixels`
- `glReadPixels`
- `glTexImage2D`
- `glTexSubImage2D`
The new logic is still unoptimized, but on my machine I experienced no
noticeable slowdown. :^)
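A sketch of the specification's shape, with hypothetical names; the real `LibSoftGPU::PixelConverter` types differ:

```cpp
struct PixelType {
    // e.g. component format (RGBA, BGR, luminance, ...) and bit width
};

struct SelectionRect {
    int x { 0 }, y { 0 }, width { 0 }, height { 0 };
};

struct ImageSpecification {
    int width { 0 };
    int height { 0 };
    PixelType pixel_type;
    SelectionRect selection; // basically a clipping rect
};

// A convert(input_data, input_spec, output_data, output_spec) entry
// point then walks the selection, converting each pixel from the
// input type to the output type.
```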
Also skip the test for the `::Always` alpha test function in the hot
loop. This test function is very unlikely to be set, so leave that up
to `::test_alpha()`.
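A sketch of the early-out; the names are stand-ins for the LibSoftGPU internals:

```cpp
enum class AlphaTestFunction { Never, Always, Less, Greater };

bool alpha_test_enabled(AlphaTestFunction function)
{
    // GL_ALWAYS never rejects fragments, so treat it as "no alpha test"
    // up front instead of evaluating it for every quad in the hot loop.
    return function != AlphaTestFunction::Always;
}
```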
This commit implements glClipPlane and its supporting calls, backed
by new support for user-defined clip planes in the software GPU clipper.
This fixes some visual bugs seen in the Quake III port, in which mirrors
would only reflect correctly from close distances.
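A usage example with the standard OpenGL 1.x API: the plane equation is given as (A, B, C, D), and fragments where Ax + By + Cz + Dw >= 0 in eye space are kept.

```cpp
#include <GL/gl.h>

// Keep only geometry with eye-space z >= -5.
void enable_example_clip_plane()
{
    GLdouble const plane[4] = { 0.0, 0.0, 1.0, 5.0 };
    glClipPlane(GL_CLIP_PLANE0, plane);
    glEnable(GL_CLIP_PLANE0);
}
```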
According to the OpenGL spec, we're expected to clamp the fragment
depth values to the range `[0.f, 1.f]` for all polygons.
This fixes Z-fighting issues with the sky in Quake 3.
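The clamp itself is a one-liner; a sketch using AK's helper:

```cpp
#include <AK/StdLibExtras.h>

// Clamp interpolated fragment depth into the [0, 1] depth range.
float clamp_fragment_depth(float z)
{
    return AK::clamp(z, 0.f, 1.f);
}
```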
This implements the depth offset factor that you can set through e.g.
OpenGL's `glPolygonOffset`. Without it, triangles might start to Z-
fight amongst each other.
This fixes the floor decals in Quake 3.
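For reference, the spec's offset formula, where `r` is the smallest value guaranteed to produce a resolvable depth difference:

```cpp
// o = m * factor + r * units, applied to the fragment depth before the
// depth test; m is the triangle's maximum depth slope.
float depth_offset(float max_depth_slope, float factor, float units, float r)
{
    return max_depth_slope * factor + units * r;
}
```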
Implement (anti)aliased point drawing and anti-aliased line drawing.
Supported through LibGL's `GL_POINTS`, `GL_LINES`, `GL_LINE_LOOP` and
`GL_LINE_STRIP`.
In order to support this, `LibSoftGPU`'s rasterization logic was
reworked. Now, any primitive can be drawn by invoking `rasterize()`,
which takes care of the quad loop and fragment testing logic. Three
callbacks need to be passed (see the sketch after the lists below):
* `set_coverage_mask`: the primitive needs to provide initial coverage
mask information so fragments can be discarded early.
* `set_quad_depth`: fragments survived stencil testing, so depth values
need to be set so depth testing can take place.
* `set_quad_attributes`: fragments survived depth testing, so fragment
shading is going to take place. All attributes like color, tex coords
and fog depth need to be set so alpha testing and eventually,
fragment rasterization can take place.
As of this commit, there are four instantiations of this function:
* Triangle rasterization
* Points - aliased
* Points - anti-aliased
* Lines - anti-aliased
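A sketch of that shared entry point, with hypothetical signatures:

```cpp
#include <AK/Function.h>

struct PixelQuad {
    // per-quad coverage mask, depth values and fragment attributes (elided)
};

struct RenderBounds { int x, y, width, height; };

// rasterize() owns the quad loop and the stencil/depth/alpha testing
// order, while each primitive type supplies the three callbacks.
void rasterize(RenderBounds const& bounds,
    AK::Function<void(PixelQuad&)> set_coverage_mask,
    AK::Function<void(PixelQuad&)> set_quad_depth,
    AK::Function<void(PixelQuad&)> set_quad_attributes);

// A triangle rasterizer might then call:
//   rasterize(bounds,
//       [&](auto& quad) { /* edge functions -> coverage bits */ },
//       [&](auto& quad) { /* interpolate Z for surviving fragments */ },
//       [&](auto& quad) { /* interpolate color, tex coords, fog depth */ });
```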
In order to standardize vertex processing for all primitive types,
things like vertex transformation, lighting and tex coord generation
now take place before clipping.
Our move to floating point precision has eradicated the pixel artifacts
in Quake 1, but introduced new and not so subtle rendering glitches in
games like Tux Racer. This commit changes three things to get the best
of both worlds:
1. Subpixel logic based on `i32` types was reintroduced, with the
number of bits set to 6. This reintroduces the artifacts in Quake 1
but fixes rendering of Tux Racer.
2. Before triangle culling, subpixel coordinates are calculated and
stored in `Triangle`. These coordinates are rounded, which fixes the
Quake 1 artifacts. Tux Racer is unaffected.
3. The triangle area (actually parallelogram area) is also stored in
`Triangle` so we don't need to recalculate it later on. In our
previous subpixel code, there was a subtle disconnect between the
two calculations (one with and one without subpixel precision) which
resulted in triangles incorrectly being culled. This fixes some
remaining Quake 1 artifacts.
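A sketch of the fixed-point conversion and the shared area calculation; the names are stand-ins:

```cpp
#include <AK/Types.h>
#include <math.h>

static constexpr int subpixel_bits = 6;
static constexpr int subpixel_factor = 1 << subpixel_bits; // 64

// Round window coordinates to 6 fractional bits once, up front, so
// culling and rasterization see exactly the same values.
i32 to_subpixel(float coordinate)
{
    return static_cast<i32>(lroundf(coordinate * subpixel_factor));
}

// Parallelogram area (twice the triangle area) in subpixel units;
// computed once, stored alongside the triangle and reused during
// rasterization. i64 guards against overflow of the products.
i64 parallelogram_area(i32 x0, i32 y0, i32 x1, i32 y1, i32 x2, i32 y2)
{
    return static_cast<i64>(x1 - x0) * (y2 - y0) - static_cast<i64>(y1 - y0) * (x2 - x0);
}
```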