Commit Graph

157 Commits

Author SHA1 Message Date
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Rodrigo Tobar
cb3e05f476 LibPDF: Add initial implementation of XObject rendering
This implementation currently handles Form XObjects only, skipping
image XObjects. When rendering an XObject, its resources are passed to
the underlying operations so they use those instead of the Page's.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
b3007c17bd LibPDF: Allow operators to receive optional resources
Operators usually assume that the resources its operations will require
will be the Page's. This assumption breaks however when XObjects with
their own resources come into the picture (and maybe other cases too).
In that case, the XObject's resources take precedence, but they should
also contain the Page's resources. Because of this, one can safely use
the XObject resources alone when given, and default to the Page's if
not.

This commit adds all operator calls an extra argument with optional
resources, which will be fed by XObjects as necessary.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e58165ed7a LibPDF: Render cubic bezier curves
The implementation of bezier curves already exists on Gfx, so
implementing the PDF rendering of this command is trivial.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
fe5c823989 LibPDF: Communicate resources to ColorSpace, not Page
Resources can come from other sources (e.g., XObjects), and since the
only attribute we are reading from Page are its resources it makes sense
to receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
164422f8d8 LibPDF: Add further common names 2022-11-30 14:51:14 +01:00
Rodrigo Tobar
5277ad1d6d LibPDF: Implement Run Length Decoding
This is a simple decoding process that is needed by some streams.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e776048309 LibPDF: Ignore whitespace on hex strings
The spec says that whitespaces should be ignored, but we weren't. PDFs
with whitespaces in their hex strings were thus crushing the parser.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
d04613d252 LibPDF: Fix path coordinates calculation
Paths rendering was buggy because the map() function that translates
points from user space to bitmap space applied the vertical flip
conversion that the current transformation matrix already considers;
Hence, all paths were upside down. The only exception was the "re"
instruction, which manually adjusted the Y coordinate of its points to
be flipped again (and had a FIXME saying that this should be
unnecessary).

This commit fixes the map() function that maps userspace points to
bitmap coordinates. The "re" operator implementation has also been
simplified creating a rectangle first and mapping *that* instead of
mapping each point individually.
2022-11-26 08:56:35 +01:00
Rodrigo Tobar
e92ec26771 LibPDF: Introduce rendering preferences and show clipping paths
A new struct allows users to specify specific rendering preferences that
the Renderer class might use to paint some Document elements onto the
target bitmap. The first toggle allows rendering (or not) the clipping
paths on a page, which is useful for debugging.
2022-11-25 23:03:24 +01:00
Rodrigo Tobar
a1e36e8f78 LibPDF: Improve path clipping support
The existing path clipping support was broken, as it performed the
clipping operation as soon as the path clipping commands (W/W*) were
received. The correct behavior is to keep a clipping path in the
graphic state, *intersect* that with the current path upon receiving
W/W*, and apply the clipping when performing painting operations. On top
of that, the intersection happening at W/W* time does not affect the
painting operation happening on the current on-build path, but takes
effect only after the current path is cleared; therefore a current and a
next clipping path need to be kept track of.

Path clipping is not yet supported on the Painter class, nor is path
intersection. We thus continue using the same simplified bounding box
approach to calculate clipping paths.

Since now we are dealing with more rectangles-as-path code, I've made
helper functions to build a rectangle path and reuse it as needed.
2022-11-25 23:03:24 +01:00
Julian Offenhäuser
d1bc89e30b LibPDF: Try to repair XRef tables with broken indices
An XRef table usually starts with an object number of zero. While it
could technically start at any other number, this is a tell-tale sign
of a broken table.

For the "broken" documents I encountered, this always meant that some
objects must have been removed from the start of the table, without
updating the following indices. When this is the case, the document is
not able to be read normally.

However, most other PDF parsers seem to know of this quirk and fix the
XRef table automatically.

Likewise, we now check for this exact case, and if it matches up with
what we expect, we update the XRef table such that all object numbers
match the actual objects found in the file again.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e06a065594 LibPDF: Override Type 1 character mappings by encoding in font dict
If the font dictionary includes an "Encoding" entry, it will be used
instead of the PS1FontProgram's built-in encoding.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
65ff80e8a5 LibPDF: Add alternative names to is_standard_latin_font() helper 2022-11-25 22:44:47 +01:00
Julian Offenhäuser
9cb3b23377 LibPDF: Move all font handling to Type1Font and TrueTypeFont classes
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts and to pass the font size back
to the PDFFont when asking for glyph widths.

Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible to decide whether or not it needs to use a replacement
font, which still is Liberation Serif for now.

This means that we can now render embedded TrueType fonts as well :^)

It also makes the renderer's job much more simple and leads to a much
cleaner API design.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e748a94f80 LibPDF: Introduce loading of common font data in PDFFont base class
This font data is shared between Type 1 and TrueType fonts, which is
why we can now load it in the base class that they both use.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
dd82a026f8 LibPDF: Pass PDFFont::draw_glyph() a char code instead of a code point
We would previously pass this function a unicode code point, which is
not actually what we want here.

Instead, we want the "raw" code point, with the font itself deciding
whether or not it needs to be re-mapped.

This same mistake in terminology applied to PS1FontProgram.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
8532ca1b57 LibPDF: Convert dash pattern array elements to integers if necessary
They may be floats instead.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
0bc3333740 LibPDF: Parse integer numbers with atoi() instead of strtof()
strtof() produces rounding errors for very large numbers, which we
don't want for integers, as they may have to be precise.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
c2ad29c85f LibPDF: Implement png predictor decoding for flate filter
For flate and lzw filters, the data can be transformed by this
predictor function to make it compress better. For us this means that
we have to undo this step in order to get the right result.

Although this feature is meant for images, I found at least a few
documents that use it all over the place, making this step very
important.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
4bd79338e8 LibPDF: Fix off-by-one error in Reader::remaining() 2022-11-19 15:42:08 +01:00
Julian Offenhäuser
4b1a72ff7a LibPDF: Fix loop condition in parse_xref_stream()
We previously compared two unrelated values to determine if we parsed
the xref table to completion. We now check if we added every subsection
instead, and double check to make sure we never read past the end.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
a17a23a3f0 LibPDF: Make some variable names in parse_xref_stream() more clear
I found these to be a bit misleading.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
f926dfe36b LibPDF: Implement the DCT filter
This filter basically tells us that we are dealing with a JPEG.
Note that by serializing the resulting image we assume that this filter
is the last one in the chain, everything else would be highly unlikely.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
baaf42360e LibPDF: Derive alternate ICC color space from the number of components
We currently don't support ICC color spaces and fall back to a "simple"
one instead.

If no alternative is specified however, we are allowed to pick the
closest match based on the number of color components.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
16ed407c01 LibPDF: Support cascading stream filters
You can specify multiple filters as an array, where each one is fed the
output of the one before it.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
becd648a78 LibPDF: Parse hexadecimal values in name objects correctly 2022-11-19 15:42:08 +01:00
Julian Offenhäuser
7c4f5b58be LibPDF: Use Gfx::PathRasterizer for Adobe Type 1 font rendering
This gives much better visual results than painting the path directly.
It also has the nice side effect that Type 1 fonts will now look much
more similar to TrueType fonts, which use the same class :^)

In addition, we can now cache glyph bitmaps for repeated use.
2022-11-19 11:04:34 +01:00
Tim Schumacher
ce2f1b845f Everywhere: Mark dependencies of most targets as PRIVATE
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.

Also included are changes to readd now missing dependencies to tools
that actually need them.
2022-11-01 14:49:09 +00:00
Tim Schumacher
7834e26ddb Everywhere: Explicitly link all binaries against the LibC target
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.

This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.

Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
2022-11-01 14:49:09 +00:00
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
e6f29302a7 LibPDF: Add glyph drawing and type info methods to PDFFont
A PDFFont can now be asked for its specific type and whether it is part
of the standard 14 fonts. It now also contains a method to draw a
glyph, which is stubbed-out for now.

This will be useful for the renderer to take into consideration when
drawing text, since we don't include replacements for the standard set
of fonts yet, but still want to make use of embedded fonts when
available.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
36f83cecab LibPDF: Allow page objects to inherit the MediaBox and Resources entries 2022-10-16 17:44:54 +02:00
Julian Offenhäuser
2f71e0f09a LibPDF: Allow text operator sequences to start with whitespace 2022-10-16 17:44:54 +02:00
Julian Offenhäuser
7ecd420b03 LibPDF: Parse floating point numbers that omit a leading zero correctly 2022-10-16 17:44:54 +02:00
Ben Wiederhake
a99cd09891 Libraries: Add missing includes, add namespace qualifiers
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:

    // file: compile_me.cpp
    #include <LibDNS/Question.h>
    // That's it, this was enough to cause a compilation error.

Likewise for most other files touched by this commit.
2022-09-18 13:27:24 -04:00
Julian Offenhäuser
77f5f7a6f4 LibPDF: Support parsing page tree nodes that are in object streams
conditionally_parse_page_tree_node used to assume that the xref table
contained a byte offset, even for compressed objects. It now uses the
common facilities for parsing objects, at the expense of some
performance.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
6225c03256 LibPDF: Rename argument for the latin character set enumeration macro
The previous name "V" collided with one of the entries.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
633e1632d0 LibPDF: Allow whitespace other than EOL after an object marker 2022-09-17 10:07:14 +01:00
Julian Offenhäuser
65e83bed53 LibPDF: Disallow parsing indirect values as operands
An operation like 0 0 0 RG would have been confused for [ 0, 0 0 R ] G
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
04cb00dc9a LibPDF: Fix handling of differences array in custom encodings
When looking up differences in the specified encoding, we previously
didn't recognize a lot of characters, namely those that are referred to
by a string in the PDF itself, like "/germandbls".

We now create a mapping of those characters to the code points they are
referring to, and correctly look them up when needed.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
36828f1385 LibPDF: Don't expect glyph width arrays to contain integers
They might also contain floats, in which case we convert them to int
before use.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
97ed4106e5 LibPDF: Fix text positioning with operator TJ
As per spec, the positioning (or kerning) parameter of this operator
should translate the text matrix before the next showing of text.
Previously, this calculation was slightly wrong and also only applied
after the text was already shown.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
563d91b6c4 LibPDF: Implement loading compressed objects from object streams
Now, whenever the xref table points to a compressed object,
parse_object_with_index will look it up in the corresponding object
stream as if it were a regular object.

With this, our parser gains the bare minimum support for xref streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
f9beff7b5e LibPDF: Initial work on parsing xref streams
Since PDF version 1.5, a document may omit the xref table in favor of
a new kind of xref stream object. This is used to reference so-called
"compressed" objects that are part of an object stream.

With this patch we are able to parse this new kind of xref object, but
we'll have to implement object streams to use them correctly.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
4887aacec7 LibPDF: Move document-specific parsing functionality into its own class
The Parser class is now a generic PDF object parser, of which the new
DocumentParser class derives. DocumentParser now takes over all
functions relating to linearization, pages, xref and trailer handling.

This allows the use of multiple parsers in the same document's
context, which will be needed in order to handle PDF object streams.
2022-09-17 10:07:14 +01:00
Julian Offenhäuser
9f4659cc63 LibPDF: Move consume and match helper functions to the Reader class 2022-09-17 10:07:14 +01:00
Ben Wiederhake
ff96747e17 LibPDF: Break inclusion cycle by removing unnecessary include 2022-09-17 04:00:54 +00:00
sin-ack
c8585b77d2 Everywhere: Replace single-char StringView op. arguments with chars
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00