Commit Graph

125 Commits

Author SHA1 Message Date
MacDue
d55867e563 LibPDF: Fix paths with negatively sized re (rect) commands
Turns out the width/height in a `re` command can be negative. This
results in rectangles with different winding orders. For example, a
negative width results in a reversed winding order.

Previously, this was lost by passing the rect through an
`AffineTransform` before constructing the path. So instead, this
constructs the rect path, and then transforms the resulting path.
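
A minimal Python sketch of why the order of operations matters. It is not LibPDF code: the function names are hypothetical, and `normalized_rect` assumes the rect type ends up storing only positive sizes. The signed area of the `re` corner sequence flips when the width is negative, and that sign is gone once the rect has been normalized.

```python
def rect_path(x, y, width, height):
    # Corner order drawn by the PDF `re` operator:
    # (x, y) -> (x + w, y) -> (x + w, y + h) -> (x, y + h).
    return [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]


def signed_area(points):
    # Shoelace formula: positive for counter-clockwise, negative for
    # clockwise (in a y-up coordinate system).
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2


def normalized_rect(x, y, width, height):
    # Hypothetical stand-in for a rect that only keeps positive sizes.
    return (min(x, x + width), min(y, y + height), abs(width), abs(height))
```

Building the path from the raw operands keeps the winding; building it from the normalized rect does not.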
2024-01-16 21:31:20 +00:00
Nico Weber
9a93f677f4 LibPDF: Mark text rendering matrix as dirty after TJ numbers
Mostly because I audited all places that assigned to `m_text_matrix`
after #22760.

This one is very difficult to trigger in practice.

`show_text()` marks the text rendering matrix dirty already,
so this only has an effect if the `TJ` array starts with a
number, and the matrix isn't marked dirty going in.

`Tm` caches the text rendering matrix, so I changed text.pdf
to contain:

```
1 0 0 1 45 130 Tm
[ 200 (Hello) -2000 (World) ] TJ T*
```

This first sets an x offset of 5 (on top of the normal 40), and
then undoes it (`200` is multiplied by font size (25) / -1000,
and `200 * 25 / -1000` is -5). Before this change, the topmost
"Hello World" ended up slightly indented.

Likely no behavior change in practice, but makes the code easier
to understand, and maybe it helps in the wild somewhere.
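
The offset arithmetic quoted above can be checked with a short Python sketch. The function name is hypothetical, and the sketch ignores horizontal scaling and character/word spacing.

```python
def tj_adjustment(number, font_size):
    # Same arithmetic as in the text: number * font_size / -1000.
    return number * font_size / -1000
```

With a font size of 25, `tj_adjustment(200, 25)` gives the -5 that undoes the extra x offset set by `Tm`.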
2024-01-15 08:39:04 +00:00
Nico Weber
f23f5dcd62 LibPDF: Mark text rendering matrix dirty for Td operator
0000342.pdf page 5 contains this snippet:

```
/T1_1 10.976 Tf
0 -31.643 TD
(This)Tj

1 0 0 1 54 745.563 Tm
22.181 -31.643 Td
[(vehicle)-270.926(uses)...
```

The `Tm` marked the text rendering matrix as dirty at the start,
but it then calls calculate_text_rendering_matrix() almost in the
next line, which recalculates the text rendering matrix and caches
the new matrix. The `Td` used to not mark it as dirty, and we'd
draw "vehicle" with an incorrect matrix.
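
A rough Python sketch of the caching pattern involved, reduced to a translation-only "matrix". The class and method names are hypothetical and only loosely model `Tm`, `Td`, and the cached text rendering matrix.

```python
class TextState:
    def __init__(self):
        self.tx = 0.0
        self.ty = 0.0
        self._cached = None
        self._dirty = True
        self.recomputations = 0

    def set_matrix(self, tx, ty):
        # Loosely models `Tm` (translation part only).
        self.tx, self.ty = tx, ty
        self._dirty = True

    def move(self, dx, dy):
        # Loosely models `Td`.
        self.tx += dx
        self.ty += dy
        self._dirty = True  # the step this kind of fix adds

    def rendering_matrix(self):
        # Recompute only when something invalidated the cached value.
        if self._dirty:
            self._cached = (self.tx, self.ty)
            self._dirty = False
            self.recomputations += 1
        return self._cached
```

Without the `_dirty = True` in `move()`, a call to `rendering_matrix()` between `set_matrix()` and `move()` would leave a stale cached value in place.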
2024-01-15 08:37:55 +00:00
Nico Weber
f4ee9a2333 LibPDF: Support drawing images with 16 bits per channel
This uses the tried-and-true "throw away the lower 8 bits" technique
for now. This lets us render Tests/LibPDF/wide-gamut-only.pdf.
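
As an illustration only (hypothetical function name, assuming big-endian 16-bit samples with the most significant byte first), dropping the lower 8 bits amounts to keeping every other byte:

```python
def downsample_16_to_8(data: bytes) -> bytes:
    # Each sample is two bytes, high byte first. Keeping only the bytes at
    # even offsets keeps the high byte of every sample.
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    return data[0::2]
```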
2024-01-12 16:20:46 -07:00
Nico Weber
5f85aff036 LibPDF: Move ColorSpace::style() to take ReadonlySpan<float>
All ColorSpace subclasses converted to float anyways, and this
allows us to save lots of float->Value->float conversions during
image color space processing.

A bit faster:

```
    N           Min           Max        Median         Avg       Stddev
x  50    0.99054313     1.0412271    0.99933481   1.0052408  0.012931916
+  50    0.97073889     1.0075941    0.97849107  0.98184034 0.0090329046
Difference at 95.0% confidence
	-0.0234004 +/- 0.00442595
	-2.32785% +/- 0.440287%
	(Student's t, pooled s = 0.0111541)
```
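
The summary figures in the table can be reproduced from the two rows with a few lines of Python. This is only a sanity check of the arithmetic; the variable names are made up, and the pooled standard deviation uses the plain mean of the variances because both samples have N = 50.

```python
import math

before_avg, before_sd = 1.0052408, 0.012931916
after_avg, after_sd = 0.98184034, 0.0090329046

# Absolute and relative change of the average.
difference = after_avg - before_avg
percent = 100 * difference / before_avg

# Pooled standard deviation for two samples of equal size.
pooled_s = math.sqrt((before_sd ** 2 + after_sd ** 2) / 2)
```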
2024-01-12 12:37:56 +00:00
Nico Weber
f7fc2df8ac LibPDF: Simplify load_image() a tiny bit
Images can't use Pattern color spaces, so we'll always have a Color.

No behavior (or perf) change.
2024-01-10 23:26:57 +01:00
Nico Weber
df5451a889 LibPDF: Mark text rendering matrix dirty after changing it in text_begin
A certain PDF that was drawing some text used `9 0 0 9 474.54 700.6801 Tm`
to set the text matrix to a matrix that scaled by 9 in one text object.

Then, after ending that text object, it had the following new text
object which contained nothing that invalidated the text matrix:

```
BT
/F1 7 Tf
/DeviceRGB CS
0 0 0 SC
10 TL
86.37849 21.908 Td
(Authorized licensed use limited to: ...) Tj
ET
```

`BT` did reset it as required, but since we didn't mark the matrix
as dirty, we never recomputed it and drew the additional text scaled
up 9x.
2024-01-10 19:42:08 +01:00
Nico Weber
4fd5d450be LibPDF: Add support for image masks
An image mask is a 1-bit-per-pixel bitmap that's black where the
current color should be painted, and white where it should be
transparent (think: like ink).

load_image() already converts images like this into 8-bit-per-pixel
images that have 0xff, 0xff, 0xff in rgb for opaque (originally 0 bit)
pixels and 0, 0, 0 in rgb for transparent pixels.

So we just copy the image mask's image data into the alpha
channel and replace rgb with the current color, and then draw
it like a regular bitmap.
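
A small Python sketch of that step, using plain tuples instead of a bitmap. The function name and pixel representation are hypothetical.

```python
def apply_image_mask(mask_rgb, current_color):
    # mask_rgb: per-pixel (r, g, b) as loaded from the mask image:
    # (255, 255, 255) where paint should appear, (0, 0, 0) elsewhere.
    # current_color: (r, g, b) of the current color.
    result = []
    for r, _g, _b in mask_rgb:
        alpha = r  # all three channels hold the same value
        result.append((*current_color, alpha))
    return result
```

The output is an ordinary RGBA pixel list, so it can go through the same drawing path as any other image.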
2024-01-10 09:10:11 +00:00
Nico Weber
e770cf06b0 LibPDF: Send jpeg data down the same path as all other data
JPEG images now honor decode arrays and color spaces.
2024-01-10 09:39:00 +01:00
Nico Weber
b63eb4a4dd LibPDF: Implement /Mask support with stream object argument
2023-12-23 20:39:11 +01:00
Nico Weber
a3507ef65b LibPDF: Move error for /ImageMask out of load_image()
...and tweak load_image() to support loading mask images
(which don't have a color space and are always 1 bit per pixel).
2023-12-23 20:39:11 +01:00
Nico Weber
3ad9782e25 LibPDF: Extract a apply_alpha_channel() function
No behavior change.
2023-12-23 20:39:11 +01:00
Nico Weber
4bd11c8eb4 LibPDF: Show a 'rendering unsupported' error for images with /Mask key
2023-12-23 20:39:11 +01:00
Nico Weber
387fecea7f LibPDF: Fix typo in a variable name
No behavior change.
2023-12-23 10:10:24 +01:00
Nico Weber
6032c06f6b Revert "LibPDF: Add basic tiled, coloured pattern rendering"
This reverts commit 8ff87911a3.
2023-12-21 19:24:56 +01:00
Nico Weber
7cb216c95b Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..."
This reverts commit 8c7fc4fe6c.
2023-12-21 19:24:56 +01:00
Nico Weber
6de32e5359 LibPDF: Draw inline images
The idea is to massage the inline image data into something that
looks like a regular image, and then use the normal image drawing code:
We translate the inline image abbreviations to the expanded version at
rendering time, then unfilter (i.e. uncompress) the image data at
rendering time, and then go down the usual image drawing path.

Normal streams are unfiltered when they're first accessed, but
inline image streams live in a page's drawing operators, and this
fits the current approach of parsing a page's operators anew
every time the page is rendered.

(We also need to add some special-case handling for color spaces
of inline images: Inline images can use named color spaces, while
regular images always use direct color space objects.)
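
The key-expansion step might look like the following Python sketch. The table holds the inline image key abbreviations listed in the PDF 1.7 specification; the function name is hypothetical, and abbreviated values (filter and color space names) are left untouched here.

```python
# Abbreviated -> full key names for inline image dictionaries.
INLINE_IMAGE_KEYS = {
    "BPC": "BitsPerComponent",
    "CS": "ColorSpace",
    "D": "Decode",
    "DP": "DecodeParms",
    "F": "Filter",
    "H": "Height",
    "IM": "ImageMask",
    "I": "Interpolate",
    "W": "Width",
}


def expand_inline_image_dict(abbreviated):
    # Keys that are already spelled out pass through unchanged.
    return {INLINE_IMAGE_KEYS.get(key, key): value
            for key, value in abbreviated.items()}
```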
2023-12-20 12:45:16 -07:00
Nico Weber
022fce75a6 LibPDF: Get inline image data from parser to renderer
We create an inline_image_end operator that has all the relevant data
in a synthetic StreamObject.

inline_image_end is still a RENDERER_TODO(), so no real behavior
change. (Previously we'd call only inline_image_begin, so the string the
todo message is about is now a bit different. But no interesting
behavior change.)
2023-12-20 12:19:08 +01:00
Nico Weber
b21f867e88 LibPDF: Don't crash on images with empty filter arrays
0000967.pdf page 2 contains a bunch of inline images with empty
filter arrays.
2023-12-20 12:19:08 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Kyle Pereira
8c7fc4fe6c LibPDF: Offset PaintStyle when painting so pattern overlaps properly
2023-12-10 16:44:24 +01:00
Kyle Pereira
8ff87911a3 LibPDF: Add basic tiled, coloured pattern rendering
2023-12-10 16:44:24 +01:00
Kyle Pereira
8191f2b47a LibPDF: Add parameter for background color of render
2023-12-10 16:44:24 +01:00
Kyle Pereira
60c4803dd3 LibPDF: Pass Renderer to ColorSpace
2023-12-10 16:44:24 +01:00
Kyle Pereira
082a4197b6 LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces
This is in anticipation of Pattern color space support which does not
yield a simple color.
2023-12-10 16:44:24 +01:00
Nico Weber
832a065687 LibPDF: For low-bpp images, start scanlines on byte boundaries
Required per spec, and we get slanted images without it. Fixes e.g.
page 1 of 0000749.pdf.
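
A Python sketch of row-aligned unpacking, assuming 1, 2, or 4 bits per sample with high-order bits first. The function name is hypothetical.

```python
def unpack_samples(data: bytes, width: int, height: int, bits: int):
    # Each row occupies a whole number of bytes, so leftover bits at the
    # end of a row are padding and must be skipped.
    bytes_per_row = (width * bits + 7) // 8
    mask = (1 << bits) - 1
    rows = []
    for y in range(height):
        row_bytes = data[y * bytes_per_row:(y + 1) * bytes_per_row]
        row = []
        for x in range(width):
            bit_offset = x * bits
            byte = row_bytes[bit_offset // 8]
            shift = 8 - bits - (bit_offset % 8)  # high-order bits come first
            row.append((byte >> shift) & mask)
        rows.append(row)
    return rows
```

Reading the samples as one continuous bit stream instead would shift every row by the accumulated padding, which is what produces the slanted look.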
2023-12-07 08:10:40 +00:00
Nico Weber
06b9633da5 LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern
When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat
the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors
to 8-bit nicely, but is the wrong thing to do for palette indices.
Stop doing this for palette indices.

Fixes "Indexed color space index out of range" for 11 files in the
PDF/A 0000.zip test set now that we correctly handle palette indices
as of the previous commit:

    Malformed PDF file: Indexed color space lookup table doesn't match
                        size, in 4 files, on 8 pages, 73 times
      path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x)
      path/to/0000/0000364.pdf 5 6
      path/to/0000/0000918.pdf 5
      path/to/0000/0000683.pdf 8
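
The two upsampling behaviours can be contrasted in a few lines of Python. Function names are hypothetical; `bits` is assumed to be 1, 2, or 4.

```python
def upsample_color(value: int, bits: int) -> int:
    # Repeat the bit pattern to fill 8 bits, e.g. 0b1101 -> 0b11011101.
    # This maps the largest low-bpp value to 255.
    result = 0
    for _ in range(8 // bits):
        result = (result << bits) | value
    return result


def upsample_index(value: int, bits: int) -> int:
    # Palette indices keep their numeric value; only the storage widens.
    return value
```

For a 16-entry palette, `upsample_color(0b1101, 4)` would yield index 221, which is out of range, while `upsample_index(0b1101, 4)` stays at 13.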
2023-12-07 08:10:40 +00:00
Nico Weber
f34da6396f LibPDF: Update font size after getting font from cache
Page 1 of 0000277.pdf does:

    BT 22 0 0 22  59  28 Tm /TT2 1 Tf
        (Presented at Photonics West OPTO, February 17, 2016) Tj ET
    BT 32 0 0 32 269 426 Tm /TT1 1 Tf
        (Robert W. Boyd) Tj ET
    BT 22 0 0 22 253 357 Tm /TT2 1 Tf
        (Department of Physics and) Tj ET
    BT 22 0 0 22 105 326 Tm /TT2 1 Tf
        (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET

Every line begins a text operation, then updates the text matrix,
selects a font (TT2, TT1, TT2, TT2), draws some text and ends the text
operation.

`Tm` (which sets the text matrix) contains a scale, and uses that
to update the font size of the currently-active font (cf #20084).
But in this file, we `Tm` first and `Tf` (font selection) second,
so this updates the size of the old font. So when we pull it out
of the cache again on line 3, it would still have the old size
from the `Tm` on line 2.

(The whole text scaling logic in LibPDF imho needs a rethink; the
current approach also causes issues with zero-width glyphs which
currently lead to divisions by zero. But that's for another PR.)

Fixes another regression from c8510b58a3 (which I've accidentally
referred to by 2340e834cd in another commit).
2023-11-26 19:05:13 -05:00
Nico Weber
eb1c99bd72 LibPDF+LibGfx: Make SMasks on jpeg images work
SMasks are greyscale images that get used as alpha channel for a
different image.

JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.

So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
2023-11-23 12:13:03 +01:00
Nico Weber
4440452f92 LibPDF: Support images with 1, 2, 4 bits per pixel
They just get upsampled to 8 bits per pixel images.
2023-11-18 07:33:15 +00:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
14ddab5519 LibPDF: Stub out type3_font_set_glyph_width*
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.

Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
2023-11-17 19:47:53 +00:00
Nico Weber
5513f8bbe3 LibPDF: Move ScopedState from a function on Renderer into Renderer
No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
5eaa403ddf LibPDF: Use font dictionary object as cache key, not resource name
In the main page contents, /T0 might refer to a different font than
it might refer to in an XObject. So don't use the `Tf` argument as
font cache key. Instead, use the address of the font dictionary object.

Fixes false cache sharing, and also allows us to share cache entries
if the same font dict is referred to by two different names.

Fixes a regression from 2340e834cd (but keeps the speed-up intact).
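
A Python sketch of keying a cache on object identity instead of resource name. The class is hypothetical, and `id()` stands in for the address of the font dictionary object.

```python
class FontCache:
    def __init__(self):
        self._fonts = {}
        self.loads = 0

    def get(self, font_dict):
        # Key on the identity of the dictionary object, not on the
        # resource name that was used to reach it.
        key = id(font_dict)
        if key not in self._fonts:
            self.loads += 1
            # Keep a reference to font_dict so its id() cannot be reused
            # by another object while the entry is alive.
            loaded = {"base_font": font_dict["BaseFont"]}
            self._fonts[key] = (font_dict, loaded)
        return self._fonts[key][1]
```

Two names pointing at the same dictionary share one entry, and the same name pointing at different dictionaries in different resource scopes does not.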
2023-11-17 19:14:39 +01:00
Nico Weber
9b022239c3 LibPDF: Apply all offsets of TJ operator
TJ acts on a list of either strings or numbers.
The strings are drawn, and the numbers are treated as offsets.

Previously, we'd only apply the last-seen number as offset when
we saw a string. That had the effect of us ignoring all but the
last number in front of a string, and ignoring numbers at the
end of the list.

Now, we apply all numbers as offsets.
Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.
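
A simplified Python sketch of the new behaviour. The function name and the fixed `char_width` advance are hypothetical; numbers are scaled by font size / -1000, and horizontal scaling is ignored.

```python
def layout_tj(items, font_size, char_width):
    # items: a list mixing strings and numbers, like a `TJ` array.
    x = 0.0
    placed = []
    for item in items:
        if isinstance(item, str):
            placed.append((item, x))
            x += len(item) * char_width
        else:
            # Every number shifts the position, including several in a
            # row and any that trail the last string.
            x += item * font_size / -1000
    return placed, x
```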
2023-11-14 10:11:09 +01:00
Lucas CHOLLET
9e4d697d23 LibPDF: Detect DCT images correctly
Images can have multiple filters, each one of them is processed
sequentially. Only the last one will be relevant for the image format
(DCT or JPXDecode), so use the last filter instead of the first one to
detect that property.
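
In Python terms the check reduces to looking at the end of the filter list. The function name is hypothetical, and the sketch only covers the DCT case.

```python
def is_jpeg_stream(filters):
    # filters: filter names in the order they are applied when decoding.
    # Earlier filters only unwrap outer encodings; the last one produces
    # the image data, so it decides the format. Empty lists are not JPEG.
    return bool(filters) and filters[-1] == "DCTDecode"
```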
2023-11-13 10:30:34 -05:00
Nico Weber
3dca11c4e2 LibPDF: Move color space creation from name or array into ColorSpace
No behavior change.
2023-11-05 14:27:22 -07:00
Nico Weber
8b806183f6 LibPDF: Tolerate indirect objects in various image dict values
0000101.pdf from 0000.zip from the pdfa dataset has /Height set to
an indirect object that contains an int.

Make that work, and make sure various other similar places getting
values of the image dict also resolve indirect references.
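
A minimal Python sketch of resolving values before use. `Reference`, `resolve`, and `image_size` are hypothetical names, and the object table is a plain dict.

```python
class Reference:
    # Hypothetical stand-in for an indirect reference such as `12 0 R`.
    def __init__(self, object_number):
        self.object_number = object_number


def resolve(value, objects):
    # Follow references until a direct value is reached.
    while isinstance(value, Reference):
        value = objects[value.object_number]
    return value


def image_size(image_dict, objects):
    # Resolve every entry the same way, whether or not it is indirect.
    return (resolve(image_dict["Width"], objects),
            resolve(image_dict["Height"], objects))
```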
2023-10-26 10:58:45 +02:00
Nico Weber
1a58fee0fd LibPDF: Don't assert on named simple color space
If a PDF uses `/CustomName cs` and `/CustomName` then points at just a
name like `/DeviceGray` instead of an array, that's ok. Just using
`/DeviceGray cs` is simpler, so this extra level of indirection is
somewhat rare in practice, but it's valid and it does happen. So support
it.

We already have a helper that does the right thing that we just need to
call.

Together with #21524 and #21525, reduces number of crashes on 300 random
PDFs from the web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 29 (9%) to 25 (8%).
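
The extra level of indirection could be handled as in this Python sketch. The names are hypothetical, strings stand in for PDF names, and the `Pattern` case is ignored for brevity.

```python
DEVICE_COLOR_SPACES = {"DeviceGray", "DeviceRGB", "DeviceCMYK"}


def resolve_color_space(name, color_space_resources):
    if name in DEVICE_COLOR_SPACES:
        return name
    entry = color_space_resources[name]
    if isinstance(entry, str):
        # The resource is itself just a name,
        # e.g. /CustomName -> /DeviceGray.
        return resolve_color_space(entry, color_space_resources)
    # Otherwise it is an array describing the color space; return as-is.
    return entry
```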
2023-10-21 21:04:26 +02:00
Nico Weber
34cb506bad LibPDF: Replace another TODO with a message
Like ca1a98ba9f, but for stroke color.
2023-10-21 09:09:06 +02:00
Nico Weber
9442782881 LibPDF: Implement text_next_line_show_string_set_spacing
Not used terribly often, but e.g. used in 000333.pdf page 17 in
stillhq.com-pdfdb.
2023-10-20 14:24:31 -04:00
Nico Weber
78dea9500f LibPDF: Make operator parsing use ReadonlySpan instead of Vector
No behavior change.
2023-10-20 14:24:31 -04:00
Nico Weber
aea0e2f313 LibPDF: Rename ColorSpaceFamily function to may_be_specified_directly()
It used to be called ColorSpaceFamily::never_needs_parameters().

But in the cpp file, the macro arg was called ever_needs_parameters,
and the spec says

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

so let's use that language here.

No behavior change.
2023-10-20 10:35:54 -06:00
Nico Weber
f5d3f47af3 LibPDF: Add spec comment about color spaces on images
2023-10-20 08:58:52 +02:00
Nico Weber
7c24a89acf LibPDF: Add spec comment about valid bits_per_component values
2023-10-20 08:58:52 +02:00
Nico Weber
64bb9aa8c7 LibPDF: Fix comment typo
2023-10-20 08:58:52 +02:00
Nico Weber
ea6fed627a LibPDF: Get color rendering intent from image dict
Still not used for anything, so no behavior change.
2023-10-20 08:58:52 +02:00
Nico Weber
708d5e2fe6 LibPDF: Implement color_rendering_intent operator
Implements the `ri` operator, and the `RI` key in a graphics state
dictionary.

We don't do anything yet with the color rendering intent except
store it.

No behavior change except removing a few "not yet implemented"
messages.
2023-10-19 16:51:16 -04:00
Nico Weber
609e640530 LibPDF: Try harder to use a RAII object to restore state
Follow-up to #21489. There, I made us use a RAII object.

That's great, but if the embedded instruction stream pushes
its own graphics state, then an early return would cause us to
not process graphics state pop instructions in the embedded stream.

To fix this, remember the graphics stack depth before entering
the nested instruction stream, and explicitly shrink the stack back
to that size upon exit.

Enables us to render all pages of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
without crashing.
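
The stack-depth idea can be sketched in Python with `try`/`finally` playing the role of the RAII object. The `Renderer` class below is a hypothetical toy, not LibPDF's.

```python
class Renderer:
    def __init__(self):
        self.state_stack = [{"line_width": 1.0}]

    def push_state(self):
        # Models the `q` operator.
        self.state_stack.append(dict(self.state_stack[-1]))

    def pop_state(self):
        # Models the `Q` operator.
        self.state_stack.pop()

    def run_nested(self, operators):
        depth = len(self.state_stack)
        try:
            for op in operators:
                op(self)
        finally:
            # Drop anything the nested stream pushed but never popped,
            # even if an operator raised partway through.
            del self.state_stack[depth:]
```

If an operator fails after two pushes, the pops later in the stream never run, but the stack is still cut back to its depth from before the nested stream started.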
2023-10-19 16:49:00 -04:00