We can't decode any actual image data yet, but it shows that we can
read the basics of the container format. (...as long as there's an
Annex I container around the data, not just an Annex A codestream.
All files I've found so far have the container.)
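
To make the container/codestream distinction concrete, here is a minimal sketch of how the two could be told apart from the first few bytes. The names `JPEG2000Kind` and `sniff_jpeg2000` are made up for illustration and are not taken from the loader. It assumes only the well-known magic numbers: the 12-byte JP2 signature box for the Annex I container, and the SOC marker (0xFF4F) for a bare Annex A codestream.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical names, for illustration only.
enum class JPEG2000Kind {
    Container,     // Annex I: JP2 file format, a sequence of boxes
    RawCodestream, // Annex A: bare codestream, starts with the SOC marker
    Unknown,
};

JPEG2000Kind sniff_jpeg2000(std::vector<std::uint8_t> const& bytes)
{
    // JP2 signature box: length 12, type 'jP  ', contents 0D 0A 87 0A.
    static constexpr std::array<std::uint8_t, 12> jp2_signature = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A
    };
    if (bytes.size() >= jp2_signature.size()
        && std::equal(jp2_signature.begin(), jp2_signature.end(), bytes.begin()))
        return JPEG2000Kind::Container;

    // A bare codestream begins with the SOC marker, 0xFF4F.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0x4F)
        return JPEG2000Kind::RawCodestream;

    return JPEG2000Kind::Unknown;
}
```

With a check like this, a loader that only understands the container can reject a raw codestream early instead of misparsing it as boxes.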
I drew the test input in Acorn.app and used "Save as..." to save it as
JPEG2000. It's an RGBA image.
JPEG2000 is the last image format used in PDF filters that we
don't have a loader for. Let's change that.
This adds all the scaffolding, but no actual implementation yet.
This doesn't have to be a virtual method: it's only called from
the various create_from_stream() methods, and each of those knows
the static type of the object it creates. A virtual call buys
nothing here, and it makes it harder to add additional parameters
to read_from_stream() in some subclasses.
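
A rough sketch of the resulting shape, with invented class names (`Box`, `TagBox`, `DataBox`) rather than the real ones: each factory calls read_from_stream() on a statically known type, so the call is non-virtual and each subclass is free to pick its own parameter list.

```cpp
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical classes, for illustration only.
struct Box {
    virtual ~Box() = default;
    virtual char const* name() const = 0; // Still virtual: used through Box*.
};

class TagBox final : public Box {
public:
    static std::unique_ptr<TagBox> create_from_stream(std::istream& stream)
    {
        auto box = std::make_unique<TagBox>();
        // Static type is TagBox, so this is a plain non-virtual call.
        if (!box->read_from_stream(stream))
            return nullptr;
        return box;
    }
    char const* name() const override { return "TagBox"; }
    std::string const& tag() const { return m_tag; }

private:
    bool read_from_stream(std::istream& stream)
    {
        m_tag.resize(4);
        stream.read(&m_tag[0], 4);
        return stream.gcount() == 4;
    }
    std::string m_tag;
};

class DataBox final : public Box {
public:
    static std::unique_ptr<DataBox> create_from_stream(std::istream& stream, std::size_t size)
    {
        auto box = std::make_unique<DataBox>();
        if (!box->read_from_stream(stream, size))
            return nullptr;
        return box;
    }
    char const* name() const override { return "DataBox"; }
    std::string const& data() const { return m_data; }

private:
    // Extra parameter: no shared virtual signature has to change for this.
    bool read_from_stream(std::istream& stream, std::size_t size)
    {
        m_data.resize(size);
        stream.read(&m_data[0], static_cast<std::streamsize>(size));
        return static_cast<std::size_t>(stream.gcount()) == size;
    }
    std::string m_data;
};

// Example use: a 4-byte tag followed by a 3-byte payload.
bool example()
{
    std::istringstream stream("jP  abc");
    auto tag = TagBox::create_from_stream(stream);
    auto data = DataBox::create_from_stream(stream, 3);
    return tag && data && tag->tag() == "jP  " && data->data() == "abc";
}
```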
This adds a test for the code added in #23696.
I created this file using `jbig2` (see below for details), but as
usual it required a bunch of changes to make it actually produce
spec-compliant output. See the PR adding this image for my local diff.
I created the test image file by running this shell script with
`jbig2` tweaked as described above:
#!/bin/bash
set -eu
S=Tests/LibGfx/test-inputs/bmp/bitmap.bmp
# See make-symbol-jbig.sh (the script in #23659) for the general
# setup and some comments. Note that the symbol section here only
# has 3 symbols, instead of 4 over there.
#
# `-RefID` takes 6 arguments, in four groups:
# 1. The symbol ID of the base symbol (like after an `-ID`)
# 2. A bmp file that the base symbol gets refined to
# 3. y, x (like after an `-ID`)
# 4. dx, dy (note swapped order to previous item)
#
# We also explicitly set refinement adaptive pixels, because the
# default adaptive refinement pixels aren't the nominal pixels from
# the spec.
cat << EOF > jbig2-symbol-textrefine.ini
-sym -Seg 1
-sym -file -numClass -HeightClass 3 -WidthClass 1
-sym -file -numSymbol 3
-sym -file -Height 250
-sym -file -Width 120 -Simple 0 mouth-1bpp.bmp
-sym -file -EndOfHeightClass
-sym -file -Height 100
-sym -file -Width 100 -Simple 1 nose-1bpp.bmp
-sym -file -EndOfHeightClass
-sym -file -Height 30
-sym -file -Width 30 -Simple 2 top_eye-1bpp.bmp
-sym -file -EndOfHeightClass
-sym -Param -Huff_DH 0
-sym -Param -Huff_DW 0
-txt -Seg 2
-txt -Param -numInst 4
-ID 2 108 50 -RefID 2 bottom_eye-1bpp.bmp 265 60 0 0
-ID 1 100 135 -ID 0 70 232
-txt -Param -RefCorner 1
-txt -Param -Xlocation 0
-txt -Param -Ylocation 0
-txt -Param -W 399
-txt -Param -H 400
-txt -Param -rATX1 -1
-txt -Param -rATY1 -1
-txt -Param -rATX2 -1
-txt -Param -rATY2 -1
EOF
J=$HOME/Downloads/T-REC-T.88-201808-I\!\!SOFT-ZST-E/Software
J=$J/JBIG2_SampleSoftware-A20180829/source/jbig2
$J -i "${S%.bmp}" -f bmp -o symbol-textrefine -F jb2 -ini \
jbig2-symbol-textrefine.ini
...but only as long as REFAGGNINST == 1. That's enough for 0000337.pdf.
Except that it also needs GRTEMPLATE=1 support in the generic
refinement region decoding procedure, so no behavior change yet.
...instead of a lambda that checks the template on every call.
Doesn't make a performance difference locally, but seems maybe nicer?
No behavior change.
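
For illustration, the pattern looks roughly like this: pick the per-template function once, before the loop, and call through that. The context functions below are simplified placeholders (a few left neighbours on one row) and are not the actual JBIG2 templates; all names are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<std::uint8_t>;

// Out-of-range pixels read as 0.
static std::uint8_t pixel_at(Row const& row, long x)
{
    if (x < 0 || x >= static_cast<long>(row.size()))
        return 0;
    return row[static_cast<std::size_t>(x)] & 1;
}

// Placeholder "template 0": three pixels to the left.
static std::uint16_t context_for_template_0(Row const& row, long x)
{
    return static_cast<std::uint16_t>(
        (pixel_at(row, x - 3) << 2) | (pixel_at(row, x - 2) << 1) | pixel_at(row, x - 1));
}

// Placeholder "template 1": two pixels to the left.
static std::uint16_t context_for_template_1(Row const& row, long x)
{
    return static_cast<std::uint16_t>((pixel_at(row, x - 2) << 1) | pixel_at(row, x - 1));
}

std::vector<std::uint16_t> contexts_for_row(Row const& row, int template_id)
{
    // Chosen once here, instead of testing template_id for every pixel.
    using ContextFunction = std::uint16_t (*)(Row const&, long);
    ContextFunction compute_context = template_id == 0 ? context_for_template_0 : context_for_template_1;

    std::vector<std::uint16_t> contexts;
    contexts.reserve(row.size());
    for (long x = 0; x < static_cast<long>(row.size()); ++x)
        contexts.push_back(compute_context(row, x));
    return contexts;
}
```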
Template 2 is needed by some symbols in 0000372.pdf page 11 and
0000857.pdf pages 1-4. Implement the others too while here. (The
mentioned pages in those two PDFs also use the "end of stripe" segment,
so they still don't render yet.)
We still don't support EXTTEMPLATE.
Instead of fetching a generic set of metrics for each glyph, only fetch
the advance when that's all we need.
This is extremely hot in LibWeb text layout, where it makes a nice dent.
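
A small sketch of the idea, using an invented `Font` type rather than the real font API: the width loop asks only for the advance, and the counter shows that the full metrics path is never taken.

```cpp
#include <string>

// Hypothetical types, for illustration only.
struct GlyphMetrics {
    float left_bearing;
    float advance;
    float ascent;
    float descent;
};

class Font {
public:
    // Full lookup: fills in every field, even if the caller wants only one.
    GlyphMetrics glyph_metrics(char32_t code_point) const
    {
        ++m_full_metrics_lookups;
        return { 1.0f, advance_for(code_point), 8.0f, 2.0f };
    }

    // Narrow lookup: returns just the advance.
    float glyph_advance(char32_t code_point) const { return advance_for(code_point); }

    int full_metrics_lookups() const { return m_full_metrics_lookups; }

private:
    // Made-up advances: 'i' is narrow, everything else is wide.
    static float advance_for(char32_t code_point) { return code_point == U'i' ? 4.0f : 10.0f; }
    mutable int m_full_metrics_lookups = 0;
};

float text_width(Font const& font, std::u32string const& text)
{
    float width = 0;
    for (char32_t code_point : text)
        width += font.glyph_advance(code_point); // Advance only, no full metrics.
    return width;
}
```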
Although it has some interesting properties, SipHash is brutally slow
compared to our previous hash function. Since its introduction, it has
been highly visible in every profile of doing anything interesting with
LibJS or LibWeb.
By switching back, we gain a 10x speedup for 32-bit hashes, and "only"
a 3x speedup for 64-bit hashes.
This comes out to roughly 1.10x faster HashTable insertion, and roughly
2.25x faster HashTable lookup. Hashing is no longer at the top of
profiles and everything runs measurably faster.
For security-sensitive hash tables with user-controlled inputs, we can
opt into SipHash selectively on a case-by-case basis. The vast majority
of our uses don't fit that description though.
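
The opt-in could look something like the sketch below, shown with `std::unordered_set` instead of our own HashTable. Everything here is an assumption made for illustration: `FastIntHash` is a generic shift-add-xor integer mix, not necessarily the function this commit restores, and `KeyedHashPlaceholder` merely marks where a keyed hash such as SipHash would plug in. It is not SipHash and has none of its guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Cheap default: a plain shift-add-xor integer mix.
struct FastIntHash {
    std::size_t operator()(std::uint32_t key) const
    {
        key += ~(key << 15);
        key ^= (key >> 10);
        key += (key << 3);
        key ^= (key >> 6);
        key += ~(key << 11);
        key ^= (key >> 16);
        return key;
    }
};

// Stand-in for a keyed hash like SipHash. It only mixes in a seed and is
// NOT resistant to attacker-chosen keys; a real keyed hash would go here.
struct KeyedHashPlaceholder {
    std::uint32_t seed = 0x9E3779B9u;
    std::size_t operator()(std::uint32_t key) const { return FastIntHash {}(key ^ seed); }
};

// Tables get the fast hash unless they explicitly ask for something else.
template<typename Hasher = FastIntHash>
using IntSet = std::unordered_set<std::uint32_t, Hasher>;
```

Ordinary tables would use `IntSet<>`, and only a table fed by untrusted input would spell out `IntSet<KeyedHashPlaceholder>` (or the real keyed hasher).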
...and make text_region_decoding_procedure() call it.
generic_refinement_region_decoding_procedure() still just returns
"unimplemented", so no behavior change yet.
Text segments using refinement are still rejected later, by
text_region_decoding_procedure(). But we deserialize the input data now,
and the error when this feature is used is now slightly different.