Text can be rendered in various ways in PDFs: Filled, stroked,
both filled and stroked, set as clipping path, hidden, or
some combinations thereof.
We don't implement any of this at the moment except "filled".
Hidden text is used in scanned documents: The image of the scan is
drawn in the background, and then OCRd text is "drawn" as hidden
on top of the scanned bitmap. That way, the (hidden) text can be
selected and copied, and it looks like you're selecting text from
the scanned bitmap. Find-in-page also works similarly. (We currently
have neither text selection nor find-in-page, but one day we will.)
Now that we have pretty good support for CCITT and are growing some
support for JBIG2, we now draw both the scanned background image
as well as the foreground text. They're not always perfectly aligned.
This change makes it so that we don't render text that's marked as
hidden. (We still do most of the coordinate math, which will probably
come in handy at some point when we implement text selection.)
This makes these scanned documents appear as they're supposed to
appear (at least in documents where we manage to decode the background
bitmap).
This also adds a debug option to force rendering of hidden text.
Before this change, we would wake up on every event loop iteration to
drive animations in single-frame images. This was a complete waste of
time and caused 100% CPU usage on our main GitHub repo page.
With this change, CPU usage is ~1% when idle on the same page. :^)
This commit introduces a WEB_SET_PROTOTYPE_FOR_INTERFACE macro that
caches the interface name in a local static FlyString. This means that
we only pay for FlyString-from-literal lookup once per browser lifetime
instead of every time the interface is instantiated.
Document::navigable() can be unpleasantly slow, since we don't have a
direct link between documents and navigables at the moment. So let's not
call it twice when once is enough.
This allows us to skip evaluating selectors like "[foo=bar]" for any
element that doesn't have a "foo" attribute.
Note that the bucket is case-insensitively keyed on the attribute name
since case sensitivity is depending on evaluation context. This ensures
we may get some false positives but no false negatives.
Reduces the number of selectors evaluated by 36% when loading our GitHub
repo at https://github.com/SerenityOS/serenity
We already grow the "rules to run" vector before appending to it, so we
can actually use unchecked_append() here and avoid the "needs to grow"
checks every time we append to it.
This takes appending from 3% to <1% when loading our GitHub repo.
When matching a CSS attribute selector against an HTML element, the
attribute name is case-insensitive. Before this change, that meant we
had to call equals_ignoring_ascii_case() on all the attribute names.
We now cache the attribute name lowercased on each Attr node, which
allows us to do FlyString-to-FlyString comparison (simple pointer
comparison).
This brings attribute selector matching from 6% to <1% when loading our
GitHub repo at https://github.com/SerenityOS/serenity
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.
With this, we can finally decode
Tests/LibGfx/test-inputs/jbig2/bitmap.jbig2 and add a test for
decoding simple arithmetic-coded images.
This errors out on many special cases. None of those seem to be hit
in practice (with the exception of TPGDON, which is used in a handful
PDFs. I have an implementation of that locally, but I'll put that
in a separate PR. The code for it is straightforward, but adding a
test for it is a bit involved.)
With this, we can decode about half of the JBIG2 images in my PDF
test dataset.
In practice, everything uses white backgrounds and operators `or`
or `xor` to turn them black, at least for the simple images we're
about to be able to decode.
To make sure we don't forget implementing this for real once needed,
reject other ops, and also reject black backgrounds (because 1 | 0
is 1, not 0 like our overwrite implementation will produce).
This means we have to remove a test, but since this scenario doesn't
seem to happen in practice, that seems ok.
The context can vary for every bit we read.
This does not affect the one use in the test which reuses the same
context for all bits, but it is necessary for future changes.
The API value of a <textarea> element is its raw value with normalized
newlines. This should be used in a couple of places where we currently
use the raw value.
Patch up existing style properties instead of using the regular style
invalidation path, which requires rule matching for each element in the
invalidated subtree.
- !important properties: this change introduces a flag used to skip the
update of animated properties overridden by !important.
- inherited animated properties: for now, these are invalidated by
traversing animated element's subtree to propagate the update.
- StyleProperties has a separate array for animated properties that
allows the removal animated properties after animation has ended,
without requiring full style invalidation.
Rather than adding a bunch of `get_*_from_mime_type` functions, add just
one to get the Core::MimeType instance. We will need multiple fields at
once in Browser.
I think the context normally changes for every bit. But this here
is enough to correctly decode the test bitstream in Annex H.2 in
the spec, which seems like a good checkpoint.
The internals of the decoder use spec naming, to make the code
look virtually identical to what's in the spec. (Even so, I managed
to put in several typos that took a while to track down.)
The behavior of Crypto::UnsignedBigInt::export_data unexpectedly
does not actually remove leading zero bytes when the corresponding
parameter is passed. The caller must manually adjust for the location
of the zero bytes.