This implementation currently handles Form XObjects only, skipping
image XObjects. When rendering an XObject, its resources are passed to
the underlying operations so they use those instead of the Page's.
Operators usually assume that the resources their operations require
will be the Page's. This assumption breaks, however, when XObjects with
their own resources come into the picture (and maybe in other cases
too). In that case, the XObject's resources take precedence, but they
should also contain the Page's resources. Because of this, one can
safely use the XObject's resources alone when given, and default to the
Page's if not.
This commit adds an extra argument with optional resources to all
operator calls, which XObjects will supply as necessary.
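A minimal sketch of the resulting call shape, with hypothetical names
(the actual operator handlers take more parameters):

#include <optional>

struct Resources { /* fonts, XObjects, color spaces, ... */ };

// Hypothetical operator handler: an explicit resource dictionary
// takes precedence, otherwise we fall back to the page's resources.
void handle_operator(Resources const& page_resources,
    std::optional<Resources> const& xobject_resources = {})
{
    Resources const& resources = xobject_resources.has_value()
        ? *xobject_resources
        : page_resources;
    (void)resources; // ... look up fonts, color spaces, etc. here ...
}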
Resources can come from other sources (e.g., XObjects), and since the
only attribute we read from Page is its resources, it makes sense to
receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
Path rendering was buggy because the map() function that translates
points from user space to bitmap space applied the vertical-flip
conversion that the current transformation matrix already accounts for;
hence, all paths were rendered upside down. The only exception was the
"re" instruction, which manually flipped the Y coordinate of its points
back again (and had a FIXME saying that this should be unnecessary).
This commit fixes the map() function that maps user-space points to
bitmap coordinates. The "re" operator implementation has also been
simplified by creating a rectangle first and mapping *that* instead of
mapping each point individually.
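A rough sketch of the fix, with simplified, hypothetical signatures
(the real code lives in the Renderer):

// map() now only applies the current transformation matrix (CTM),
// which already encodes the user-space-to-device-space vertical flip.
Gfx::FloatPoint map(Gfx::AffineTransform const& ctm, float x, float y)
{
    return ctm.map(Gfx::FloatPoint { x, y });
}

// "re" now builds the rectangle in user space and maps the whole
// thing, instead of hand-flipping each corner:
//     auto device_rect = map(Gfx::FloatRect { x, y, width, height });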
A new struct allows users to specify rendering preferences that the
Renderer class might use when painting some Document elements onto the
target bitmap. The first toggle controls whether the clipping paths on
a page are rendered, which is useful for debugging.
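A hedged sketch of what the struct might look like (field names
approximate):

struct RenderingPreferences {
    bool show_clipping_paths { false }; // debugging aid
};

A preferences instance would then be handed to the renderer alongside
the document and the target bitmap.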
The existing path clipping support was broken, as it performed the
clipping operation as soon as the path clipping commands (W/W*) were
received. The correct behavior is to keep a clipping path in the
graphics state, *intersect* it with the current path upon receiving
W/W*, and apply the clipping when performing painting operations. On
top of that, the intersection performed at W/W* time does not affect
the painting of the path currently being built, but takes effect only
after the current path is cleared; therefore both a current and a next
clipping path need to be tracked.
Path clipping is not yet supported on the Painter class, nor is path
intersection. We thus continue using the same simplified bounding box
approach to calculate clipping paths.
Since we are now dealing with more rectangle-as-path code, I've added
helper functions to build a rectangle path and reused them as needed.
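A hedged sketch of the bookkeeping and the helper described above
(names approximate):

// Two clipping paths live in the graphics state: `current` applies
// to painting operations now; `next` (intersected at W/W* time)
// takes over once the current path is cleared.
struct ClippingPaths {
    Gfx::Path current;
    Gfx::Path next;
};

// Helper to express a rectangle as a path, reused for the
// bounding-box based clipping.
static Gfx::Path rect_path(Gfx::FloatRect const& rect)
{
    Gfx::Path path;
    path.move_to(rect.top_left());
    path.line_to(rect.top_right());
    path.line_to(rect.bottom_right());
    path.line_to(rect.bottom_left());
    path.close();
    return path;
}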
An XRef table usually starts with an object number of zero. While it
could technically start at any other number, this is a tell-tale sign
of a broken table.
For the "broken" documents I encountered, this always meant that some
objects must have been removed from the start of the table, without
updating the following indices. When this is the case, the document is
not able to be read normally.
However, most other PDF parsers seem to know of this quirk and fix the
XRef table automatically.
Likewise, we now check for this exact case, and if it matches up with
what we expect, we update the XRef table such that all object numbers
match the actual objects found in the file again.
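A minimal sketch of the renumbering, with hypothetical names (the real
code first verifies that the entries actually line up with the objects
found in the file):

#include <cstddef>
#include <vector>

struct XRefEntry {
    size_t byte_offset { 0 };
    long object_number { 0 };
};

void repair_xref_numbering(std::vector<XRefEntry>& entries)
{
    if (entries.empty() || entries.front().object_number == 0)
        return; // table already starts at zero, nothing to repair
    auto shift = entries.front().object_number;
    for (auto& entry : entries)
        entry.object_number -= shift; // realign with the actual objects
}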
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts, and pass the font size back
to the PDFFont when asking for glyph widths.
Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible for deciding whether or not it needs to use a replacement
font, which is still Liberation Serif for now.
This means that we can now render embedded TrueType fonts as well :^)
It also makes the renderer's job much simpler and leads to a much
cleaner API design.
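Roughly, the new division of labor looks like this (hypothetical
names, shown as a sketch only):

// The size is handed over exactly once, at creation time...
auto font = PDFFont::create(document, font_dictionary, font_size);

// ...and the font resolves replacements internally, so callers no
// longer pass the size around when asking for widths.
auto width = font->get_char_width(char_code);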
We would previously pass this function a Unicode code point, which is
not actually what we want here.
Instead, we want the "raw" code point, with the font itself deciding
whether or not it needs to be re-mapped.
This same mistake in terminology applied to PS1FontProgram.
For the Flate and LZW filters, the data can be transformed by a
predictor function to make it compress better. For us this means that
we have to undo this step in order to get the right result.
Although this feature is meant for images, I found at least a few
documents that use it all over the place, making this step very
important.
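As an illustration, undoing the PNG "Up" predictor (one of the per-row
predictors PDF borrows from PNG; the real code also has to handle the
None, Sub, Average and Paeth filters plus the per-row filter-type
byte):

#include <cstddef>
#include <cstdint>
#include <vector>

// Each byte was stored as the difference from the byte directly
// above it, so we add the previous row back in.
void undo_up_predictor(std::vector<uint8_t>& row,
    std::vector<uint8_t> const& previous_row)
{
    for (size_t i = 0; i < row.size(); ++i)
        row[i] = static_cast<uint8_t>(row[i] + previous_row[i]);
}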
We previously compared two unrelated values to determine if we had
parsed the xref table to completion. We now check whether we added
every subsection instead, and double-check to make sure we never read
past the end.
This filter basically tells us that we are dealing with a JPEG.
Note that by serializing the resulting image we assume that this filter
is the last one in the chain; anything else would be highly unlikely.
We currently don't support ICC color spaces and fall back to a "simple"
one instead.
If no alternative is specified, however, we are allowed to pick the
closest match based on the number of color components.
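The component-count fallback itself comes straight from the spec; a
sketch (hypothetical helper name):

// Per the PDF spec, an ICCBased space with no /Alternate falls back
// to the device space matching its /N (component count).
static char const* fallback_color_space(int number_of_components)
{
    switch (number_of_components) {
    case 1: return "DeviceGray";
    case 3: return "DeviceRGB";
    case 4: return "DeviceCMYK";
    default: return nullptr; // invalid ICC color space
    }
}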
This gives much better visual results than painting the path directly.
It also has the nice side effect that Type 1 fonts will now look much
more similar to TrueType fonts, which use the same class :^)
In addition, we can now cache glyph bitmaps for repeated use.
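The cache can be as simple as a map keyed by glyph identity (a
hypothetical sketch; the real keying also has to account for the font
and size):

#include <cstdint>
#include <map>
#include <memory>

struct GlyphBitmap { /* rasterized coverage data */ };

std::map<uint32_t, std::shared_ptr<GlyphBitmap>> glyph_cache;

std::shared_ptr<GlyphBitmap> cached_glyph(uint32_t glyph_id)
{
    if (auto it = glyph_cache.find(glyph_id); it != glyph_cache.end())
        return it->second; // reuse the previously rasterized bitmap
    auto bitmap = std::make_shared<GlyphBitmap>(/* rasterize here */);
    glyph_cache.emplace(glyph_id, bitmap);
    return bitmap;
}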
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to re-add now-missing dependencies to tools
that actually need them.
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and missing some symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.
We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
A PDFFont can now be asked for its specific type and whether it is part
of the standard 14 fonts. It now also contains a method to draw a
glyph, which is stubbed out for now.
This will be useful for the renderer to take into consideration when
drawing text, since we don't include replacements for the standard set
of fonts yet, but still want to make use of embedded fonts when
available.
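Sketched out, the additions look roughly like this (names
approximate):

class PDFFont {
public:
    virtual ~PDFFont() = default;

    enum class Type { Type0, Type1, TrueType };
    virtual Type type() const = 0;

    bool is_standard_font() const { return m_is_standard_font; }

    // Stubbed out for now; will eventually paint the glyph.
    virtual void draw_glyph(/* painter, point, width, code point */) { }

protected:
    bool m_is_standard_font { false };
};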
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:
// file: compile_me.cpp
#include <LibDNS/Question.h>
// That's it, this was enough to cause a compilation error.
Likewise for most other files touched by this commit.
conditionally_parse_page_tree_node used to assume that the xref table
contained a byte offset, even for compressed objects. It now uses the
common facilities for parsing objects, at the expense of some
performance.
When looking up differences in the specified encoding, we previously
didn't recognize a lot of characters, namely those that are referred to
by name in the PDF itself, like "/germandbls".
We now create a mapping of those characters to the code points they
refer to, and correctly look them up when needed.
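For reference, a /Differences array alternates character codes with
glyph names, each name claiming the next code; the glyph names are
then resolved to code points via a name table (e.g. "germandbls" ->
U+00DF). A hedged sketch of the mapping step:

#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

using DifferencesItem = std::variant<int, std::string>; // code or name

std::map<uint8_t, std::string> parse_differences(
    std::vector<DifferencesItem> const& items)
{
    std::map<uint8_t, std::string> code_to_glyph_name;
    int current_code = 0;
    for (auto const& item : items) {
        if (auto const* code = std::get_if<int>(&item))
            current_code = *code; // a number resets the current code
        else
            code_to_glyph_name[static_cast<uint8_t>(current_code++)]
                = std::get<std::string>(item);
    }
    return code_to_glyph_name;
}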
As per spec, the positioning (or kerning) parameter of this operator
should translate the text matrix before the next showing of text.
Previously, this calculation was slightly wrong and also only applied
after the text was already shown.
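The spec's formula (PDF 32000-1, section 9.4.3) for a number t
appearing in a TJ array, applied before the next string is shown
(horizontal writing mode; hypothetical helper name):

float kerning_translation(float t, float font_size, float horizontal_scaling)
{
    return -t / 1000.0f * font_size * horizontal_scaling;
}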
Now, whenever the xref table points to a compressed object,
parse_object_with_index will look it up in the corresponding object
stream as if it were a regular object.
With this, our parser gains the bare minimum support for xref streams.
Since PDF version 1.5, a document may omit the xref table in favor of
a new kind of xref stream object. This is used to reference so-called
"compressed" objects that are part of an object stream.
With this patch we are able to parse this new kind of xref object, but
we'll have to implement object streams to use them correctly.
The Parser class is now a generic PDF object parser, from which the new
DocumentParser class derives. DocumentParser now takes over all
functions relating to linearization, pages, xref and trailer handling.
This allows the use of multiple parsers in the same document's
context, which will be needed in order to handle PDF object streams.
This prevents us from needing an sv suffix, and potentially reduces the
need to run generic code for a single character (contains, starts_with,
ends_with, etc. for a char will be just a length and equality check).
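For example (AK-style API; assuming the char overloads mirror the
existing StringView ones):

bool is_dir = path.ends_with('/');       // just a length + equality check
bool is_font = path.ends_with(".ttf"sv); // generic StringView version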
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
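For example:

StringView before = "Question";  // StringView(char const*), strlen at runtime
StringView after = "Question"sv; // length baked in by operator""sv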
No functional changes.