It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.
With this, we can finally decode the test input from #23659.
See f391c7822d for a similar change for generic regions and
lossless generic regions.
Text segments conceptually store (x,y,id) triples. (x,y) is a
coordinate, and id refers to an id from a symbol segment.
A text segment has the effect of drawing some of the bitmaps stored
in a symbol segment to the output bitmap.
For example, the symbol segment might contain a small bitmap that
happens to look like the letter 'A', and the text segment might
draw that everywhere a scanned page has an 'A'. (The JBIG2 format
only treats it as an abstract bitmap. It doesn't know that this
small bitmap is an 'A'.)
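As a rough illustration of the (x,y,id) idea, here is a minimal sketch. Everything in it is hypothetical (`Bitmap` as rows of characters, `SymbolInstance`, `draw_text_region()`), it always anchors a symbol at its top-left corner, and it is not the decoder's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative bitmap: rows of characters, '#' is black and '.' is white.
using Bitmap = std::vector<std::string>;

// One conceptual text segment entry: draw symbol `id` with its top-left corner at (x, y).
struct SymbolInstance {
    int x;
    int y;
    size_t id;
};

// Draws every instance onto `page`. Black pixels are OR-ed in; pixels outside the page are skipped.
inline void draw_text_region(Bitmap& page, std::vector<Bitmap> const& symbols, std::vector<SymbolInstance> const& instances)
{
    for (auto const& instance : instances) {
        assert(instance.id < symbols.size());
        Bitmap const& symbol = symbols[instance.id];
        for (size_t row = 0; row < symbol.size(); ++row) {
            for (size_t column = 0; column < symbol[row].size(); ++column) {
                long page_y = instance.y + static_cast<long>(row);
                long page_x = instance.x + static_cast<long>(column);
                if (page_y < 0 || page_y >= static_cast<long>(page.size()))
                    continue;
                auto& page_row = page[static_cast<size_t>(page_y)];
                if (page_x < 0 || page_x >= static_cast<long>(page_row.size()))
                    continue;
                if (symbol[row][column] == '#')
                    page_row[static_cast<size_t>(page_x)] = '#';
            }
        }
    }
}
```

The same symbol bitmap is stored once and stamped at every position that refers to its id, which is where the compression win comes from.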
This is missing support for many things:
* Huffman-coded input (not used in practice)
* Symbol refinement
* Transposed symbols
* Colors (not used in practice)
Still, we now have basic symbol/text segment support. This is enough
to decode the downloadable PDF here:
https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ
(And possibly, many other PDFs from Google Books, but that's the
only one I've tried so far.)
It doesn't lead to any progression on my 1000 file test PDF set.
The 7 files in there that use JBIG2 with symbol and text segments
now fail to load for other reasons (4 need symbol refinement for
text segments, one needs end-of-stripe segment support, one needs
support for symbol segments referring to other segments).
This extracts the bitbuffer combining code we had into a new function
composite_bitbuffer() and adds the following features:
* Real support for combination operators (which also lets us allow black
as background color again, even if that's never used in practice)
* Clipping support (not used here yet, but will be needed elsewhere
soon)
We're going to need this for text segment handling.
No behavior change.
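To make the two features concrete, here is a minimal sketch of compositing with a combination operator and clipping. It is not the actual composite_bitbuffer(): the `Bitmap` type (one byte per pixel), the function names, and the choice to clip against the destination bounds are all assumptions made for illustration. The five operators are the ones JBIG2 defines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap with one byte per pixel (0 = white, 1 = black); a real decoder would pack bits.
struct Bitmap {
    int width { 0 };
    int height { 0 };
    std::vector<uint8_t> pixels;

    Bitmap(int w, int h)
        : width(w)
        , height(h)
        , pixels(static_cast<size_t>(w) * static_cast<size_t>(h), 0)
    {
        assert(w >= 0 && h >= 0);
    }
    uint8_t get(int x, int y) const { return pixels[static_cast<size_t>(y) * width + x]; }
    void set(int x, int y, uint8_t value) { pixels[static_cast<size_t>(y) * width + x] = value; }
};

enum class CombinationOperator { Or, And, Xor, XNor, Replace };

inline uint8_t combine(uint8_t dst, uint8_t src, CombinationOperator op)
{
    switch (op) {
    case CombinationOperator::Or:
        return static_cast<uint8_t>(dst | src);
    case CombinationOperator::And:
        return static_cast<uint8_t>(dst & src);
    case CombinationOperator::Xor:
        return static_cast<uint8_t>(dst ^ src);
    case CombinationOperator::XNor:
        return static_cast<uint8_t>((dst ^ src) == 0);
    case CombinationOperator::Replace:
        return src;
    }
    return src;
}

// Draws `src` onto `dst` with its top-left corner at (x, y); anything outside `dst` is clipped away.
inline void composite(Bitmap& dst, Bitmap const& src, int x, int y, CombinationOperator op)
{
    int x_start = std::max(0, x);
    int y_start = std::max(0, y);
    int x_end = std::min(dst.width, x + src.width);
    int y_end = std::min(dst.height, y + src.height);
    for (int dst_y = y_start; dst_y < y_end; ++dst_y) {
        for (int dst_x = x_start; dst_x < x_end; ++dst_x)
            dst.set(dst_x, dst_y, combine(dst.get(dst_x, dst_y), src.get(dst_x - x, dst_y - y), op));
    }
}
```

Computing the clipped rectangle once up front keeps the inner loop free of bounds checks.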
A symbol segment defines a bunch of small bitmaps and associates them
with numeric IDs.
This only implements reading symbols encoded with the arithmetic coder.
It does not support Huffman coding. (In practice, everything seems to
use arithmetic coding.)
Support for refinement or aggregate coding isn't implemented yet.
Support for retaining bitmap coding contexts isn't implemented yet.
Support for symbol segments referring to other symbol segments isn't
implemented yet.
But all produce diagnostics if encountered, so we won't forget about
them. (I haven't seen any of them being used in the wild.)
No visible behavior change yet, but with JBIG2_DEBUG turned on,
it produces all kinds of debug output.
The symbol segment decoding procedure will read generic regions
that aren't at a byte boundary, and that share contexts across
several regions.
No behavior change.
The existing ArithmeticDecoder (from Annex E) reads one bit at a
time.
ArithmeticIntegerDecoder (from Annex A) builds on top of that to
read integer values.
This will be used by both the symbol segment and the text segment
readers.
(This does not yet implement the IAID decoding procedure in A.3.
We only need that one in the text segment decoder at the moment,
and it's pretty small, so I'll put it inline there for now.)
Not used yet, so no behavior change yet.
The spec for each of these states:
-> EOF:
This is an eof-in-comment parse error. Emit the current comment
token. Emit an end-of-file token.
We were neglecting to emit the current comment token before emitting an
EOF token. Note the existing EMIT_CURRENT_TOKEN macro was unused.
If a box is sized as replaced, it could still be anything, not only SVG.
This fixes crashing on https://www.shopify.com/ that was caused by a
missing paintable for a box that has a layout node. This occurred
because the box was not laid out in dimension_box_on_line().
`Node::shadow_including_root()` was missing a null check, which caused
a crash when manipulating a select element, whose option elements were
initially detached.
The HTMLMediaElement, for example, contains spec text which states any
ongoing fetch process must be "stopped". The spec does not indicate how
to do this, so our implementation is rather ad-hoc.
Our current implementation may cause a crash in places that assume one
of the fetch algorithms that we set to null is *not* null. For example:
    if (fetch_params.process_response) {
        queue_fetch_task([]() {
            fetch_params.process_response();
        });
    }
If the fetch process is stopped after queuing the fetch task, but
before the fetch task is run, we will crash when running this fetch
algorithm.
We now track queued fetch tasks on the fetch controller. When the fetch
process is stopped, we cancel any such pending task.
It is a little bit awkward maintaining a fetch task ID. Ideally, we
could use the underlying task ID throughout. But we do not have access
to the underlying task nor its ID when the task is running, at which
point we need some ID to remove from the pending task list.
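The bookkeeping described above could look roughly like the sketch below. `TaskQueue`, `FetchController`, and their members are hypothetical stand-ins rather than the real LibWeb classes, and instead of cancelling the underlying event loop task this sketch simply makes a stopped task a no-op:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the event loop's task queue.
struct TaskQueue {
    std::vector<std::function<void()>> tasks;

    void run_all()
    {
        auto pending = std::move(tasks);
        tasks.clear();
        for (auto& task : pending)
            task();
    }
};

class FetchController {
public:
    // Wraps `algorithm` so that it only runs if the fetch has not been stopped in the meantime.
    void queue_fetch_task(TaskQueue& queue, std::function<void()> algorithm)
    {
        assert(algorithm);
        uint64_t id = m_next_fetch_task_id++;
        m_pending_fetch_task_ids.insert(id);
        queue.tasks.push_back([this, id, algorithm = std::move(algorithm)] {
            // erase() returns 0 if stop_fetch() already dropped this ID.
            if (m_pending_fetch_task_ids.erase(id) == 0)
                return;
            algorithm();
        });
    }

    // Forgets every task that was queued but has not run yet.
    void stop_fetch() { m_pending_fetch_task_ids.clear(); }

    size_t pending_task_count() const { return m_pending_fetch_task_ids.size(); }

private:
    uint64_t m_next_fetch_task_id { 0 };
    std::unordered_set<uint64_t> m_pending_fetch_task_ids;
};
```

The separate fetch task ID exists for the reason given above: the running task needs some handle it can remove from the pending list.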
...because the "change" event should be dispatched on a control even if
it has "display: none" style.
This change fixes selection in labels dropdown on GitHub's "new issue"
page.
Previously, ChunkID's from_big_endian_number() and
as_big_endian_number() weren't inverses of each other.
ChunkID::from_big_endian_number() used to take a u32 that contained
`('f' << 24) | ('t' << 16) | ('y' << 8) | 'p'`, that is
'f', 't', 'y', 'p' in memory on big-endian and 'p', 'y', 't', 'f'
on little-endian, and return a ChunkID for 'f', 't', 'y', 'p'.
ChunkID::as_big_endian_number() used to return a u32 that for a
ChunkID storing 'f', 't', 'y', 'p' was always 'f', 't', 'y', 'p'
in memory on both little-endian and big-endian, that is it stored
`('f' << 24) | ('t' << 16) | ('y' << 8) | 'p'` on big-endian and
`('p' << 24) | ('y' << 16) | ('t' << 8) | 'f'` on little-endian.
`ChunkID::from_big_endian_number(0x11223344).as_big_endian_number()`
returned 0x44332211 on little-endian.
This change makes the two methods self-consistent: they now take
and return a u32 that always has the first ChunkID part in the
highest bits of the u32 (`'f' << 24`), and so on. That also means
they return a u32 that in-memory looks different on big-endian
and little-endian. Since that's normal for numbers, this also
renames the two methods to just `from_number()` and `to_number()`.
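A minimal sketch of the new semantics, using a hypothetical stand-in `ChunkID` struct rather than the real class: the first ID byte always maps to the highest bits, so the conversion is pure arithmetic and does not depend on host endianness.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ChunkID: four bytes in file order.
struct ChunkID {
    std::array<uint8_t, 4> bytes {};

    // The first byte of the ID comes from the highest bits of `number`.
    static ChunkID from_number(uint32_t number)
    {
        ChunkID id;
        id.bytes = { static_cast<uint8_t>(number >> 24), static_cast<uint8_t>(number >> 16),
            static_cast<uint8_t>(number >> 8), static_cast<uint8_t>(number) };
        return id;
    }

    // Inverse of from_number().
    uint32_t to_number() const
    {
        return (uint32_t { bytes[0] } << 24) | (uint32_t { bytes[1] } << 16)
            | (uint32_t { bytes[2] } << 8) | uint32_t { bytes[3] };
    }
};

// Usage: the round trip is the identity on both little-endian and big-endian hosts.
inline void check_round_trip()
{
    assert(ChunkID::from_number(0x11223344).to_number() == 0x11223344);
}
```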
With the semantics cleared up, change the one use in ISOBMFF to read a
BigEndian<u32> for chunk headers and brand codes. This has the effect of
tags now being printed in the right order.
Before:
```sh
% Build/lagom/bin/isobmff ~/Downloads/sample1.jp2
Unknown Box (' Pj')
[ 4 bytes ]
('pytf') (version = 0, flags = 0x0)
- major_brand = ' 2pj'
- minor_version = 0
- compatible_brands = { ' 2pj' }
Unknown Box ('h2pj')
[ 37 bytes ]
Unknown Box ('fniu')
[ 92 bytes ]
Unknown Box (' lmx')
[ 2736 bytes ]
Unknown Box ('c2pj')
[ 667336 bytes ]
```
After:
```sh
% Build/lagom/bin/isobmff ~/Downloads/sample1.jp2
hmm 0x11223344 0x11223344
Unknown Box ('jP ')
[ 4 bytes ]
('ftyp' ) (version = 0, flags = 0x0)
- major_brand = 'jp2 '
- minor_version = 0
- compatible_brands = { 'jp2 ' }
Unknown Box ('jp2h')
[ 37 bytes ]
Unknown Box ('uinf')
[ 92 bytes ]
Unknown Box ('xml ')
[ 2736 bytes ]
Unknown Box ('jp2c')
[ 667336 bytes ]
```
This was causing some racy behaviour in LibHTTP, and just generally
led to really bad stack traces; avoid that by switching to
Core::Promise and using the existing event loop.
Possibly resolves #23524 and #23642.
Given a selector like `.foo .bar #baz`, we know that elements with
the class names `foo` and `bar` must be present in the ancestor chain of
the candidate element, or the selector cannot match.
By keeping track of the current ancestor chain during style computation,
and which strings are used in tag names and attribute names, we can do
a quick check before evaluating the selector itself, to see if all the
required ancestors are present.
The way this works:
1. CSS::Selector now has a cache of up to 8 strings that must be present
in the ancestor chain of a matching element. Note that we actually
store string *hashes*, not the strings themselves.
2. When Document performs a recursive style update, we now push and pop
elements to the ancestor chain stack as they are entered and exited.
3. When entering/exiting an ancestor, StyleComputer collects all the
relevant string hashes from that ancestor element and updates a
counting bloom filter.
4. Before evaluating a selector, we first check if any of the hashes
required by the selector are definitely missing from the ancestor
filter. If so, it cannot be a match, and we reject it immediately.
5. Otherwise, we carry on and evaluate the selector as usual.
I originally tried doing this with a HashMap, but we ended up losing
a huge chunk of the time saved to HashMap overhead instead. As it turns out,
a simple counting bloom filter is way better at handling this.
The cost is a flat 8KB per StyleComputer, and since it's a bloom filter,
false positives are a thing.
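For readers unfamiliar with the data structure, here is a minimal counting bloom filter sketch together with the early-reject check from step 4. The counter count, the 16-bit counters, the way two indices are derived from one hash, and all names are assumptions for illustration, not the values StyleComputer uses. Counter overflow is not handled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

template<size_t CounterCount>
class CountingBloomFilter {
    static_assert(CounterCount != 0 && (CounterCount & (CounterCount - 1)) == 0, "CounterCount must be a power of two");

public:
    // Called when an ancestor carrying this hash is entered.
    void increment(uint32_t hash)
    {
        ++m_counters[first_index(hash)];
        ++m_counters[second_index(hash)];
    }

    // Called when that ancestor is exited again.
    void decrement(uint32_t hash)
    {
        assert(may_contain(hash));
        --m_counters[first_index(hash)];
        --m_counters[second_index(hash)];
    }

    // false means "definitely absent"; true only means "possibly present".
    bool may_contain(uint32_t hash) const
    {
        return m_counters[first_index(hash)] != 0 && m_counters[second_index(hash)] != 0;
    }

private:
    static size_t first_index(uint32_t hash) { return hash & (CounterCount - 1); }
    static size_t second_index(uint32_t hash) { return (hash >> 16) & (CounterCount - 1); }

    std::array<uint16_t, CounterCount> m_counters {};
};

// Returns true if at least one hash the selector requires is definitely missing from the ancestor chain.
template<size_t CounterCount>
bool can_reject_early(CountingBloomFilter<CounterCount> const& ancestor_filter, std::vector<uint32_t> const& required_hashes)
{
    for (uint32_t hash : required_hashes) {
        if (!ancestor_filter.may_contain(hash))
            return true;
    }
    return false;
}
```

Counters, rather than single bits, are what make it possible to remove an ancestor's hashes again when it is popped off the chain.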
This is extremely efficient, and allows us to quickly reject the
majority of selectors on many huge websites.
Some example rejection rates:
- https://amazon.com: 77%
- https://github.com/SerenityOS/serenity: 61%
- https://nytimes.com: 57%
- https://store.steampowered.com: 55%
- https://en.wikipedia.org: 45%
- https://youtube.com: 32%
- https://shopify.com: 25%
This also yields a chunky 37% speedup on StyleBench. :^)
I've seen a crash when trying to verify_cast some block-level box to a
BlockContainer when it's actually something else.
This patch adds a debug log message so we can learn more about it next
time it happens somewhere.
Since we drive painting for SVG-as-image manually anyway, there's no
need for them to say they are "ready to paint", since that just causes
unnecessary extra processing in the HTML event loop.
We do the same thing with the gzip utility for performance.
This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
0.428s to 0.303s.
This reduces the runtime of `./bin/base64 -d enwik8.base64 >/dev/null`
from 0.632s to 0.469s.
(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
Instead of invalidating animated style properties whenever
`Document::update_style()` is called, now we only do that when
animations might have actually progressed. We still have to ensure
animated properties are up-to-date in `update_style()` to ensure that
JS methods can access updated style properties.
Before this change, we ran style and layout updates from both event
loop processing and update timers. This could have caused missed resize
observer updates and unnecessary updating of style or layout more than
once before repaint.
Also, we can now be sure unnecessary style or layout updates won't
happen in `EventLoop::spin_processing_tasks_with_source_until()`.
In our implementation of the "apply the history step" algorithm, we
have to spin-wait for the completion of tasks queued on the event loop.
Before this change, we allowed tasks from any source to be executed
while we were waiting. That should not be possible, because it allows
anything to interrupt history step application, including another
history step application.
Fixes https://github.com/SerenityOS/serenity/issues/23598
You can now run
image -o out.png Tests/LibGfx/test-inputs/bmp/bitmap.bmp \
--crop 130,86,108,114
and end up with the nose part of that image in out.png.
This isn't required as the StyleComputer will do this when animating,
but this allows the properties to be resolved once instead of on
every animation frame.
Note that we still pass AllowUnresolved::Yes because the properties will
not be resolved if there is no target.
When iterating through a @keyframes rule, it isn't possible to resolve
unresolved style properties since there are no elements. This change
allows those properties to simply pass through this helper function.
These will need to store unresolved styles as well, since they may be
built during parsing of a @keyframes rule. In that case there is no
target element or pseudo-element, and thus the value cannot be resolved.
structured_deserialize_internal() is added to support sub-deserialization
from the serialization steps of serializable interfaces, which need the
ability to pass on the current position in the deserialized data.
This value is at most 46, so a u8 is enough.
We have tens of thousands of these contexts.
(We could pack the is_mps bit into that u8 as well, but
then the I() and MPS() functions need to return helper objects
instead of a direct reference, so let's not do that part for now.)
If a lossless webp has 3 or 4 colors, it uses 2 bits per pixel to
store an offset into a "color index" (which the spec explicitly does
not call palette since it says the 'color cache' is more like that).
This way, it can pack 4 pixels into a single pixel.
If the width of the output image wasn't evenly divisible by 4,
we used to write out-of-bounds in the last few columns of each
row, since we used to always write all 4 pixels.
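The shape of the fix can be sketched as follows. This is a simplified illustration, not the loader's code: the function name is hypothetical, indices are packed into plain bytes rather than into a pixel channel, and the lowest-bits-first order is an assumption of this sketch. Iterating over output pixels rather than packed inputs means a partially used last byte can never write past `width`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks one row of 2-bit color indices; each packed byte holds 4 pixels, lowest bits first.
inline std::vector<uint8_t> unpack_row_2bpp(std::vector<uint8_t> const& packed, size_t width)
{
    constexpr size_t pixels_per_byte = 4;
    assert(packed.size() * pixels_per_byte >= width);

    std::vector<uint8_t> indices(width);
    // Loop over output pixels, so the last packed byte is only read as far as `width` allows.
    for (size_t x = 0; x < width; ++x) {
        uint8_t byte = packed[x / pixels_per_byte];
        size_t shift = (x % pixels_per_byte) * 2;
        indices[x] = static_cast<uint8_t>((byte >> shift) & 0x3);
    }
    return indices;
}
```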
Found by clusterfuzz. Probably fixes
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=66082
While here, spruce up the comments very slightly.
If an unexpected token is encountered when parsing an SVG attribute it
is now immediately propagated with ErrorOr. Previously, some situations
where an unexpected token was encountered could cause a crash.
If `Document::resolve()` was called during parsing, it'd change the
reader's current position, so the parsing code that called it would
then end up at an unexpected position in the file.
Parser.cpp already had special-case recovery when a stream's length
was stored in an indirect reference.
Commit ead02da98ac70c ("/JBIG2Globals") in #23503 added another case
where we could resolve an indirect reference during parsing, but wasn't
aware of having to save and restore the reader position for that.
Put the save/restore code in `DocumentParser::parse_object_with_index`
instead, right before the place that ultimately changes the reader's
position during `Document::resolve`. This fixes `/JBIG2Globals` and
lets us remove the special-case code for `/Length` handling.
Since this is kind of subtle, include a test.
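One way to express the save/restore pattern is a small scope guard, sketched below. `Reader`, `ReaderPositionRestorer`, and `parse_object_at()` are hypothetical names for illustration; the actual change may save and restore the offset differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal reader: only the current offset matters here.
struct Reader {
    size_t offset { 0 };
    void move_to(size_t new_offset) { offset = new_offset; }
};

// Remembers the reader's offset on construction and restores it on destruction.
class ReaderPositionRestorer {
public:
    explicit ReaderPositionRestorer(Reader& reader)
        : m_reader(reader)
        , m_saved_offset(reader.offset)
    {
    }
    ~ReaderPositionRestorer() { m_reader.move_to(m_saved_offset); }

    ReaderPositionRestorer(ReaderPositionRestorer const&) = delete;
    ReaderPositionRestorer& operator=(ReaderPositionRestorer const&) = delete;

private:
    Reader& m_reader;
    size_t m_saved_offset;
};

// Stand-in for parsing an object that lives elsewhere in the file.
// Returns the offset at which the (elided) parsing happened.
inline size_t parse_object_at(Reader& reader, size_t object_offset)
{
    ReaderPositionRestorer restorer(reader);
    reader.move_to(object_offset);
    assert(reader.offset == object_offset);
    // ... the real parser would read the object here ...
    return reader.offset;
}
```

Placing the guard in the one function that moves the reader means callers that resolve references mid-parse do not each have to remember to do it.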
Use the new DOM tree version mechanism to allow HTMLCollection to
remember its internal list of elements instead of rebuilding it on
every access.
This avoids thousands of full DOM walks while loading our GitHub repo.
~15% speed-up on jQuery subtests in Speedometer 3.0 :^)
This patch adds a u64 version counter to DOM::Document that increments
whenever the tree structure changes (via node insertion or removal),
or an element attribute is changed somehow.
This will be used as a crude invalidation mechanism for HTMLCollection
to cache its elements.
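A minimal sketch of version-based invalidation, with hypothetical stand-ins (`Document` as a flat list of tag names, `TagNameCollection`) instead of the real DOM classes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DOM::Document; the "tree" is just a flat list of tag names.
struct Document {
    uint64_t dom_tree_version { 0 };
    std::vector<std::string> tag_names;

    void append_element(std::string tag_name)
    {
        tag_names.push_back(std::move(tag_name));
        ++dom_tree_version; // every structural change bumps the version
    }
};

// Collects the indices of all elements with a given tag name, rebuilding only when the version moved.
class TagNameCollection {
public:
    TagNameCollection(Document const& document, std::string tag_name)
        : m_document(document)
        , m_tag_name(std::move(tag_name))
    {
    }

    std::vector<size_t> const& indices()
    {
        if (!m_has_cache || m_cached_version != m_document.dom_tree_version) {
            m_indices.clear();
            for (size_t i = 0; i < m_document.tag_names.size(); ++i) {
                if (m_document.tag_names[i] == m_tag_name)
                    m_indices.push_back(i);
            }
            assert(m_indices.size() <= m_document.tag_names.size());
            m_cached_version = m_document.dom_tree_version;
            m_has_cache = true;
            ++m_rebuild_count;
        }
        return m_indices;
    }

    size_t rebuild_count() const { return m_rebuild_count; }

private:
    Document const& m_document;
    std::string m_tag_name;
    std::vector<size_t> m_indices;
    uint64_t m_cached_version { 0 };
    bool m_has_cache { false };
    size_t m_rebuild_count { 0 };
};
```

The invalidation is crude in the sense that any change anywhere in the document discards the cache, even if the collection's contents would not have changed.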
- Compare only the animated properties
- Clone only the hash map containing animated properties, instead of
the entire StyleProperties.
Reduces `KeyframeEffect::update_style_properties()` from 10% to 3% in
GitHub profiles.
Non-recursive pseudo classes are easy to evaluate, so let's allow them
on the fast path.
Increases fast path coverage when loading our GitHub repo from 48% to
56% of all selectors evaluated.
If we determine that a selector is simple enough, we now run it using a
special matching loop that traverses up the DOM ancestor chain without
recursion.
The criteria for this fast path are:
- All combinators involved must be either descendant or child.
- Only tag name, class, ID and attribute selectors allowed.
It's definitely possible to increase the coverage of this fast path,
but this first version already provides a substantial reduction in time
spent evaluating selectors.
48% of the selectors evaluated when loading our GitHub repo are now
using this fast path.
18% speed-up on the "Descendant and child combinators" subtest of
StyleBench. :^)
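To illustrate what a non-recursive matching loop for descendant and child combinators can look like, here is a simplified sketch. It only handles class selectors, and every name in it (`Element`, `Compound`, `fast_matches()`) is hypothetical; it is not the engine's implementation. It walks up the ancestor chain and, when a child combinator fails, retries the most recent descendant match further up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Element {
    std::string class_name;
    Element const* parent { nullptr };
};

enum class Combinator { None, Descendant, Child };

// One compound selector (just a class name here) and the combinator linking it to the compound on its right.
struct Compound {
    std::string class_name;
    Combinator combinator_to_right { Combinator::None };
};

// `compounds` is stored left to right: ".a > .b .c" is { {a, Child}, {b, Descendant}, {c, None} }.
inline bool fast_matches(std::vector<Compound> const& compounds, Element const& candidate)
{
    assert(!compounds.empty());
    size_t index = compounds.size() - 1;
    if (candidate.class_name != compounds[index].class_name)
        return false;

    Element const* current = &candidate;
    // Where to resume if a child-combinator chain fails after a descendant match.
    Element const* retry_from = nullptr;
    size_t retry_index = 0;

    while (index > 0) {
        Compound const& next = compounds[index - 1];
        Element const* ancestor = current->parent;
        if (next.combinator_to_right == Combinator::Descendant) {
            while (ancestor && ancestor->class_name != next.class_name)
                ancestor = ancestor->parent;
            if (!ancestor)
                return false; // ran out of ancestors
            retry_from = ancestor;
            retry_index = index;
        } else if (!ancestor || ancestor->class_name != next.class_name) {
            if (!retry_from)
                return false;
            // Redo the most recent descendant search, starting above its previous match.
            current = retry_from;
            index = retry_index;
            retry_from = nullptr;
            continue;
        }
        current = ancestor;
        --index;
    }
    return true;
}
```

Remembering only the most recent descendant match is enough here, because moving an earlier match further up can only shrink the set of ancestors left for the compounds to its left.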
This is much more useful than the previous format, as you can now just
paste the path into a site like https://svg-path-visualizer.netlify.app/
to debug issues.
These two options are additions of the PDF specification. They are valid
for both 1D and 2D, but let's bail out if we encounter them in a 2D
image, as we don't have a test case yet.
There is no need to run full table layout if we are only interested in
calculating its width.
This change reduces compute_table_box_width_inside_table_wrapper()
from ~30% to ~15% in profiles of "File changed" pages on GitHub.
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.
This change has two main benefits:
* Moving AK back more towards being an agnostic library that can
be used between the kernel and userspace. URL has never really fit
that description - and is not used in the kernel.
* URL _should_ depend on LibUnicode, as it needs punycode support.
However, it's not really possible to do this inside of AK as it can't
depend on any external library. This change brings us a little closer
to being able to do that, but unfortunately we aren't there quite
yet, as the code generators depend on LibCore.
Instead of wrapping every entry in Optional, use the null state of the
style pointer for the same purpose.
This shrinks StyleProperties by 1752 bytes per instance.