Commit Graph

97 Commits

Author SHA1 Message Date
Nico Weber
eb1c99bd72 LibPDF+LibGfx: Make SMasks on jpeg images work
SMasks are greyscale images that get used as alpha channel for a
different image.

JPEGs in PDFs are stored as streams with /DCTDecode filters, and
we have a separate code path for loading those in the PDF renderer.
That code path just calls our JPEG decoder, which creates bitmaps
with format BGRx8888.

So when we process an SMask for such a bitmap, we have to change
the bitmap's format to BGRA8888 in addition to setting alpha values
on all pixels.
2023-11-23 12:13:03 +01:00
Nico Weber
4440452f92 LibPDF: Support images with 1, 2, 4 bits per pixel
They just get upsampled to 8 bits per pixel images.
2023-11-18 07:33:15 +00:00
Nico Weber
29396415d5 LibPDF: Add an initial implementation of type 3 glyph rendering
This is a very inefficient implementation: Every time a type 3 font
glyph is drawn, we parse its operator stream and execute all the
operators therein.

We'll want to instead cache the glyphs in bitmaps (at least in most
cases), like we do for other fonts. But it's a good first step, and
all the coordinate math seems to work in the files I've tested.

Good test files from pdfa dataset 0000.zip:

- 0000559.pdf page 1 (and 2): Has a non-default font matrix;
  text appears mirrored if the font matrix isn't handled correctly

- 0000425.pdf, page 1: Draws several glyphs in a single run;
  glyphs overlap if Renderer::render_type3_glyph() ignores the
  passed-in point

- 0000211.pdf, any page: Uses type 3 glyphs for all text.
  Good perf test (already "reasonably fast")

- 0000521.pdf, page 5 (or 7 or or 16): The little red flag in the
  purple box is a type 3 font glyph, and it's colored (which in part
  means the first operator is `d0`, while all the other documents above
  use `d1`)
2023-11-17 19:47:53 +00:00
Nico Weber
14ddab5519 LibPDF: Stub out type3_font_set_glyph_width*
Type 3 font glyphs begin with either `d0` or `d1`. If we bail out
with an "unsupported" error on the very first operator in a glyph,
we'll never paint the glyph.

Just stub these out for now. We probably want to do more in here in
the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).
2023-11-17 19:47:53 +00:00
Nico Weber
5513f8bbe3 LibPDF: Move ScopedState from a function on Renderer into Renderer
No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
bcc6439b5f LibPDF: Pass Renderer to PDFFont::draw_string()
It's a bit unfortunate that fonts need to know about the renderer,
but type 3 fonts contain PDF drawing operators, so it's necessary.

On the bright side, it makes it possible to pass fewer parameters
around and compute things locally as needed.

(As we implement more fonts, we'll probably want to create some
functions to do these computations in a central place, eventually.)

No behavior change.
2023-11-17 19:47:53 +00:00
Nico Weber
5eaa403ddf LibPDF: Use font dictionary object as cache key, not resource name
In the main page contents, /T0 might refer to a different font than
it might refer to in an XObject. So don't use the `Tf` argument as
font cache key. Instead, use the address of the font dictionary object.

Fixes false cache sharing, and also allows us to share cache entries
if the same font dict is referred to by two different names.

Fixes a regression from 2340e834cd (but keeps the speed-up intact).
2023-11-17 19:14:39 +01:00
Nico Weber
9b022239c3 LibPDF: Apply all offsets of TJ operator
TJ acts on a list of either strings or numbers.
The strings are drawn, and the numbers are treated as offsets.

Previously, we'd only apply the last-seen number as offset when
we saw a string. That had the effect of us ignoring all but the
last number in front of a string, and ignoring numbers at the
end of the list.

Now, we apply all numbers as offsets.
Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.
2023-11-14 10:11:09 +01:00
Lucas CHOLLET
9e4d697d23 LibPDF: Detect DCT images correctly
Images can have multiple filters, each one of them is processed
sequentially. Only the last one will be relevant for the image format
(DCT or JPXDecode), so use the last filter instead of the first one to
detect that property.
2023-11-13 10:30:34 -05:00
Nico Weber
3dca11c4e2 LibPDF: Move color space creation from name or array into ColorSpace
No behavior change.
2023-11-05 14:27:22 -07:00
Nico Weber
8b806183f6 LibPDF: Tolerate indirect objects in various image dict values
0000101.pdf from 0000.zip from the pdfa dataset has /Height set to
an indirect object that contains an int.

Make that work, and make sure various other similar places getting
values of the image dict also resolve indirect references.
2023-10-26 10:58:45 +02:00
Nico Weber
1a58fee0fd LibPDF: Don't assert on named simple color space
If a PDF uses `/CustomName cs` and `/CustomName` then points at just a
name like `/DeviceGray` instead of an array, that's ok. Just using
`/DeviceGray cs` is simpler, so this extra level of indirection is
somewhat rare in practice, but it's valid and it does happen. So support
it.

We already have a helper that does the right thing that we just need to
call.

Together with #21524 and #21525, reduces number of crashes on 300 random
PDFs from the web (the first 300 from 0000.zip from
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/)
from 29 (9%) to 25 (8%).
2023-10-21 21:04:26 +02:00
Nico Weber
34cb506bad LibPDF: Replace another TODO with a message
Like ca1a98ba9f, but for stroke color.
2023-10-21 09:09:06 +02:00
Nico Weber
9442782881 LibPDF: Implement text_next_line_show_string_set_spacing
Not used terribly often, but e.g. used in 000333.pdf page 17 in
stillhq.com-pdfdb.
2023-10-20 14:24:31 -04:00
Nico Weber
78dea9500f LibPDF: Make operator parsing use ReadonlySpan instead of Vector
No behavior change.
2023-10-20 14:24:31 -04:00
Nico Weber
aea0e2f313 LibPDF: Rename ColorSpaceFamily function to may_be_specified_directly()
It used to be called ColorSpaceFamily::never_needs_parameters().

But in the cpp file, the macro arg was called ever_needs_parameters,
and the spec says

"If the color space is one that can be specified by a name and no
additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, and certain
cases of Pattern), the name may be specified directly."

so let's use that language here.

No behavior change.
2023-10-20 10:35:54 -06:00
Nico Weber
f5d3f47af3 LibPDF: Add spec comment about color spaces on images 2023-10-20 08:58:52 +02:00
Nico Weber
7c24a89acf LibPDF: Add spec comment about valid bits_per_component values 2023-10-20 08:58:52 +02:00
Nico Weber
64bb9aa8c7 LibPDF: Fix comment typo 2023-10-20 08:58:52 +02:00
Nico Weber
ea6fed627a LibPDF: Get color rendering intent from image dict
Still not used for anything, so no behavior change.
2023-10-20 08:58:52 +02:00
Nico Weber
708d5e2fe6 LibPDF: Implement color_rendering_intent operator
Implements the `ri` operator, and the `RI` key in a graphics state
dictionary.

We don't do anything yet with the color rendering intent except
store it.

No behavior change except removing a few "not yet implemented"
messages.
2023-10-19 16:51:16 -04:00
Nico Weber
609e640530 LibPDF: Try harder to use a RAII object to restore state
Follow-up to #21489. There, I made us use a RAII object.

That's great, but if the embedded instruction stream pushes
its own graphics state, then an early return would cause us to
not process graphics state pop instructions in the embedded stream.

To fix this, remember the graphics stack depth before entering
the nested instruction stream, and explicitly shrink the stack back
to that size upon exit.

Enables us to render all pages of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
without crashing.
2023-10-19 16:49:00 -04:00
Nico Weber
b835d2bd66 LibPDF: Use a RAII object to restore state in recursive render
Previously, if one operator returned an error, the TRY() would cause
us to return without restoring the outer graphics state, leading to
problems such as handing a 3-tuple to a grayscale color space
(because the inner object set up a grayscale color space that we
failed to dispose of).

Makes us crash later on page 43 of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
2023-10-18 19:43:31 -04:00
Nico Weber
3c2d820391 LibPDF: If softmask has different size than target bitmap, resize it
Size of smask and image aren't guaranteed to be equal by the spec
(...except for /Matte, see page 555 of the PDF 1.7 spec, but we
don't implement that), and in pratice they sometimes aren't.

Fixes an assert on page 4 of
https://devstreaming-cdn.apple.com/videos/wwdc/2017/821kjtggolzxsv/821/821_get_started_with_display_p3.pdf
We now make it all the way to page 43 of 64 before crashing.
2023-10-18 20:03:35 +01:00
Nico Weber
c8510b58a3 LibPDF: Cache fonts per page
Previously, every time a page switched fonts, we'd completely
re-parse the font.

Now, we cache fonts in Renderer, effectively caching them per page.

It'd be nice to have an LRU cache across pages too, but that's a
bigger change, and this already helps a lot.

Font size is part of the cache key, which means we re-parse the same
font at different font sizes. That could be better too, but again,
it's a big help as-is already.

Takes rendering the 1310 pages of the PDF 1.7 reference with

    Build/lagom/bin/pdf --debugging-stats \
        ~/Downloads/pdf_reference_1-7.pdf

from 71 s to 11s :^)

Going through pages especially in the index is noticeably snappier.

(On the PDF 2.0 spec, ISO_32000-2-2020_sponsored.pdf, it's less
dramatic: From 19s to 16s.)
2023-10-11 07:10:19 +02:00
MacDue
6088374ad2 LibPDF: Ensure all subpaths are closed before filling paths
This lets us correctly draw figure 3.4 in pdf_reference_1-7.pdf.
2023-07-25 13:42:40 +02:00
Nico Weber
ca1a98ba9f LibPDF: Replace two more crashes with messages 2023-07-23 23:05:32 -04:00
Nico Weber
29c3a9c5f0 LibPDF: Don't crash on images without /Filter
Fixes a crash rendering page 819 of ISO_32000-2-2020_sponsored.pdf
which contains an uncompressed 2x2 1bpp grayscale bitmap.
2023-07-23 23:04:55 -04:00
Nico Weber
18b86b1868 LibPDF: Apply text matrix scale to character and word spacing 2023-07-22 12:24:29 -04:00
Nico Weber
e3cc05b935 LibPDF: Don't ignore word_spacing 2023-07-22 12:24:29 -04:00
Nico Weber
6caaffa134 LibPDF: Add a few FIXMEs to set_graphics_state_from_dict 2023-07-21 08:17:12 +02:00
Matthew Olsson
5f8fd47214 LibPDF: Resize fonts when the text and line matrices change 2023-07-20 06:56:41 +01:00
Matthew Olsson
9a0e1dde42 LibPDF: Propogate errors from ColorSpace::color() 2023-07-20 06:56:41 +01:00
Nico Weber
c625ba34fe LibPDF: Implement set_flatness_tolerance
We now track it in the graphics state. It isn't used for anything yet.
Fixes the one thing that rendering the first 100 pages of
pdf_reference_1-7.pdf complains about.
2023-07-12 18:22:52 -04:00
Nico Weber
69c965b987 LibPDF: Move code to compute full page contents into Page
Pure code move, no behavior change.
2023-07-12 18:22:35 -04:00
Julian Offenhäuser
4ec01669fc LibPDF: Scale vector paths with the view
This ensures that lines have the correct size at every scale factor.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
731676c041 LibPDF: Accept floats as line dash pattern phases 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
b90a794d78 LibPDF: Allow pages with no specified contents
The contents object may be omitted as per spec, which will just leave
the page blank.
2023-03-25 16:27:30 -06:00
Andreas Kling
8a48246ed1 Everywhere: Stop using NonnullRefPtrVector
This class had slightly confusing semantics and the added weirdness
doesn't seem worth it just so we can say "." instead of "->" when
iterating over a vector of NNRPs.

This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.
2023-03-06 23:46:35 +01:00
Rodrigo Tobar
bf61f94413 LibPDF: Don't crash when a font hasn't been loaded yet
This could happen because there was a problem while loading the first
font in the document.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
79b4293687 LibPDF: Prevent crashes when loading XObject streams
These streams might need a Filter that isn't implemented yet, and thus
cannot be blindly MUST()-ed.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
cb04e4e9da LibPDF: Refactor *Font classes
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:

 * PDF fonts are first divided between "simple" and "composite" fonts.
   The latter is the Type0 font, while the rest are all simple.
 * PDF fonts yield a glyph per "character code". Simple fonts char codes
   are always 1 byte long, while Type0 char codes are of variable size.

To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:

 * Model string rendering differently from simple and composite fonts:
   PDFFont now offers a generic draw_string method that takes a whole
   string to be rendered instead of a single char code. SimpleFont
   implements this as a loop over individual bytes of the string, with
   T1 and TT implementing draw_glyph for drawing a single char code.
 * Some common fields between T1 and TT fonts now live under SimpleFont
   instead of under PDFfont, where they previously resided.
 * Some other interfaces specific to SimpleFont have been cleaned up,
   with u16/u32 not appearing on these classes (or in PDFFont) anymore.
 * Type0Font's rendering still remains unimplemented.

As part of this exercise I also took the chance to perform the following
cleanups and restructurings:

 * Refactored the creation and initialisation of fonts. They are all
   centrally created at PDFFont::create, with a virtual "initialize"
   method that allows them to initialise their inner members in the
   correct order (parent first, child later) after creation.
 * Removed duplicated code.
 * Cleaned up some public interfaces: receive const refs, removed
   unnecessary ctro/dtors, etc.
 * Slightly changed how Type1 and TrueType fonts are implemented: if
   there's an embedded font that takes priority, otherwise we always
   look for a replacement.
 * This means we don't do anything special for the standard fonts. The
   only behavior previously associated to standard fonts was choosing an
   encoding, and even that was under questioning.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
db9fa7ff07 LibPDF: Allow show_text to return errors
Errors can (and do) occur when trying to render text, and so far we've
silently ignored them, making us think that all is well when it isn't.
Letting show_text return errors will allow us to inform the user about
these errors instead of having to hiding them.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
82bac7e665 LibPDF: Fix clipping of painting operations
While the clipping logic was correct (current v/s new clipping path),
the clipping path contents weren't. This commit fixed that.

We calculate the clipping path in two places: when we set it to be the
whole page at graphics state creation time, and when we perform clipping
path intersection to calculate a new clipping path. The clipping path is
then used to limit painting by passing it to the painter (more
precisely, but passing its bounding box to the painter, as the latter
doesn't support arbitrary path clipping). For this last point the
clipping path must be in device coordinates.

There was however a mix of coordinate systems involved in the creation,
update and usage of the clipping path:

 * The initial values of the path (i.e., the whole page) were in user
   coordinates.
 * Clipping path intersection was performed against m_current_path,
   which is in device coordinates.
 * To perform the clipping operation, the current clipping path was
   assumed to be in user coordinates.

This mix resulted in the clipping not working correctly depending on the
zoom level at which one visualised a page.

This commit fixes the issue by always keeping track of the clipping path
in device coordinates. This means that the initial full-page contents
are now converted to device coordinates before putting them in the
graphics state, and that no mapping is performed when applied the
clipping to the painter.
2023-02-04 12:29:57 +01:00
Rodrigo Tobar
fb0c3a9e18 LibPDF: Stop calculating code points for glyphs
When rendering text, a sequence of bytes corresponds to a glyph, but not
necessarily to a character. This misunderstanding permeated through the
Encoding through to the Font classes, which were all trying to calculate
such values. Moreover, this was done only to identify "space"
characters/glyphs, which were getting a special treatment (e.g., avoid
rendering). Spaces are not special though -- there might be fonts that
render something for them -- and thus should not be skipped
2023-02-02 14:50:38 +01:00
Tim Schumacher
82a152b696 LibGfx: Remove try_ prefix from bitmap creation functions
Those don't have any non-try counterpart, so we might as well just omit
it.
2023-01-26 20:24:37 +00:00
Timothy Flynn
f3db548a3d AK+Everywhere: Rename FlyString to DeprecatedFlyString
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
2023-01-09 23:00:24 +00:00
MacDue
91db49f7b3 LibPDF: Use subpixel accurate text rendering
This just enables the new tricks from LibGfx with the same nice
improvements :^)
2023-01-05 12:09:35 +01:00
Rodrigo Tobar
c4bc27f274 LibPDF: Don't abort on unsupported drawing operations
Instead of calling TODO(), which will abort the program, we now return
an Error specifying that we haven't implemented the drawing operation
yet. This will now nicely trickle up all the way through to the
PDFViewer, which will then notify its clients about the problem.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
e87fecf710 LibPDF: Switch to best-effort PDF rendering
The current rendering routine aborts as soon as an error is found during
rendering, which potentially severely limits the contents we show on
screen. Moreover, whenever an error happens the PDFViewer widget shows
an error dialog, and doesn't display the bitmap that has been painted so
far.

This commit improves the situation in both fronts, implementing
rendering now with a best-effort approach. Firstly, execution of
operations isn't halted after an operand results in an error, but
instead execution of all operations is always attempted, and all
collected errors are returned in bulk. Secondly, PDFViewer now always
displays the resulting bitmap, regardless of error being produced or
not. To communicate errors, an on_render_errors callback has been added
so clients can subscribe to these events and handle them as appropriate.
2022-12-16 10:04:23 +01:00