Commit Graph

271 Commits

Author SHA1 Message Date
Rodrigo Tobar
82bd854d6f LibPDF: Account for other endings of PS1 Encoding array 2023-02-08 19:47:15 +01:00
Rodrigo Tobar
a533ea7ae6 LibPDF: Improve stream parsing
When parsing streams we rely on a /Length item being defined in the
stream's dictionary to know how much data comprises the stream. Its
value is usually a direct value, but it can be indirect. There was
however a contradiction in the code: the condition that allowed it to
read and use the /Length value required it to be a direct value, but the
actual code using the value would have worked with indirect ones. This
meant that indirect /Length values triggered the fallback, "manual"
stream parsing code.

On the other hand, this latter code was also buggy, because it relied on
the "endstream" keyword to appear on a separate line, which isn't always
the case.

This commit both fixes the bug in the manual stream parsing scenario,
while also allowing for indirect /Length values to be used to parse
streams more directly and avoid the manual approach. The main caveat to
this second change is that for a brief period of time the Document is
not able to resolve references (i.e., before the xref table itself is
not parsed). Any parsing happening before that (e..g, the linearization
dictionary) must therefore use the manual stream parsing approach.
2023-02-08 19:47:15 +01:00
Tim Schumacher
220fbcaa7e AK: Remove the fallible constructor from FixedMemoryStream 2023-02-08 17:44:32 +00:00
Tim Schumacher
261d62438f AK: Remove the fallible constructor from LittleEndianInputBitStream 2023-02-08 17:44:32 +00:00
Rodrigo Tobar
82bac7e665 LibPDF: Fix clipping of painting operations
While the clipping logic was correct (current v/s new clipping path),
the clipping path contents weren't. This commit fixed that.

We calculate the clipping path in two places: when we set it to be the
whole page at graphics state creation time, and when we perform clipping
path intersection to calculate a new clipping path. The clipping path is
then used to limit painting by passing it to the painter (more
precisely, but passing its bounding box to the painter, as the latter
doesn't support arbitrary path clipping). For this last point the
clipping path must be in device coordinates.

There was however a mix of coordinate systems involved in the creation,
update and usage of the clipping path:

 * The initial values of the path (i.e., the whole page) were in user
   coordinates.
 * Clipping path intersection was performed against m_current_path,
   which is in device coordinates.
 * To perform the clipping operation, the current clipping path was
   assumed to be in user coordinates.

This mix resulted in the clipping not working correctly depending on the
zoom level at which one visualised a page.

This commit fixes the issue by always keeping track of the clipping path
in device coordinates. This means that the initial full-page contents
are now converted to device coordinates before putting them in the
graphics state, and that no mapping is performed when applied the
clipping to the painter.
2023-02-04 12:29:57 +01:00
Rodrigo Tobar
286e3e6872 LibPDF: Simplify Encoding to align with simple font requirements
All "Simple Fonts" in PDF (all but Type0 fonts) have the property that
glyphs are selected with single byte character codes. This means that
the Encoding objects should use u8 for representing these character
codes. Moreover, and as mentioned in a previous commit, there is no need
to store the unicode code point associated with a character (which was
in turn wrongly associated to a glyph).

This commit greatly simplifies the Encoding class. Namely it:

 * Removes the unnecessary CharDescriptor class.
 * Changes the internal maps to be u8 -> FlyString and vice-versa,
   effectively providing two-way lookups.
 * Adds a new method to set a two-way u8 -> FlyString mapping and uses
   it in all possible places.
 * Simplified the creation of Encoding objects.
 * Changes how the WinAnsi special treatment for bullet points is
   implemented.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
fb0c3a9e18 LibPDF: Stop calculating code points for glyphs
When rendering text, a sequence of bytes corresponds to a glyph, but not
necessarily to a character. This misunderstanding permeated through the
Encoding through to the Font classes, which were all trying to calculate
such values. Moreover, this was done only to identify "space"
characters/glyphs, which were getting a special treatment (e.g., avoid
rendering). Spaces are not special though -- there might be fonts that
render something for them -- and thus should not be skipped
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
7c42d6c737 LibPDF: Fix ZapfDingbat's char codes
The initial values were fine, but those starting at 100 were wrong: they
are all octal values, but since they were missing an initial 0 they were
interpreted as decimals.
2023-02-02 14:50:38 +01:00
Rodrigo Tobar
2f773b3c5c LibPDF: Stop storing unicode code points in Encoding
In PDF's fonts, encoding objects are used to translate bytes into fonts'
glyphs. Glyphs (in the fonts we currently support) organise their glyphs
in such a way that they are accessed by name, and thus encoding
translate between a byte sequence and a glyph name.

Note that an no point this translation includes a Unicode character, and
therefore assigning a character to a glyph in the Encoding object is the
wrong thing to do. Moreover, using the code point for this character
during the byte-sequence-to-glyph translation sequence is double-wrong.

This commit removes the characters associated to each translation in the
built-in Encoding objects. In order to keep commits short and sweet, I'm
currently simply removing the character from the enumeration, leaving
the old structure this information was held on intact. Instead, I'm
filling the "code_point" member with a zero, and filling both mappings
(which will be changed later on too) with the glyph name and the
associated char code.
2023-02-02 14:50:38 +01:00
Tim Schumacher
093cf428a3 AK: Move memory streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher
ae64b68717 AK: Deprecate the old AK::Stream
This also removes a few cases where the respective header wasn't
actually required to be included.
2023-01-29 19:16:44 -07:00
Sam Atkins
bc1504c794 LibPDF: Remove declarations for non-existent methods 2023-01-27 20:33:18 +00:00
Tim Schumacher
82a152b696 LibGfx: Remove try_ prefix from bitmap creation functions
Those don't have any non-try counterpart, so we might as well just omit
it.
2023-01-26 20:24:37 +00:00
Rodrigo Tobar
3588709986 LibPDF: Load Type1C fonts when found
Now that our CFF parser is working we can load Type1C fonts in PDF,
which are backed by a CFF stream.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
c4b45a82cd LibPDF: Add initial CFF parsing
The Compat Font Format specification (Adobe's Technical Note #5176) is
used by PDF's Type1C fonts to store their data. While being similar in
spirit to PS1 Type 1 Font Programs, it was designed for a more compact
representation and thus space reduction (but an increment on
complexity). It also shares most of the charstring encoding logic, which
is why the CFF class also inherits from Type1FontProgram.

This initial implementation is still lacking many details, e.g.:

 * It doesn't include all the built-in CFF SIDs
 * It doesn't support CFF-provided SIDs (defaults those glyphs to the
   space character)
 * More checks in general
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
1ec4ad5eb6 LibPDF: Add name -> char code conversion in Encoding
This is an operation that was already being done (sub-optimally) in
PS1FontProgram, so we are replacing that. We will use this during CFF
parsing too.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
c592b889bf LibPDF: Add Reader::try_read for easier error propagation
This will allow us to use TRY(reader.try_read) instead of having to
verify the result of reader.remaining() before calling read.read().
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
1b90ea7d3a LibPDF: Augment Type11FontProgram with Type2 capabilities
The Type1FontProgram logic was based on the Adobe Type 1 Font Format; in
particular, it implemented the CharStrings Dictionary section
(charstring decoding, and most commands). In the case of Type1, these
charstrings are read from a PS1 diciontary, with one entry per character
in the font's charset. This has served us well for Type1 font rendering.

When implementing Type1C font rendering, this wasn't enough. Type1C PDF
fonts are specified in embedded CFF (Compact Font File) streams, which
also contain a charstring dictionary with an entry for each character in
the font's charset. These entries can be slightly different from those
in a PS1 Font Program though: depending on a flag in the CFF, the
entries will be encoded either in the original charstring format from
the Adobe Type 1 Font Format, or in the "Type 2 Charstring Format"
(Adobe's Technical Note #1577). This new format is for the most part a
super-set of the original, with small differences, all in the name of
making the representation as compact as possible:

 * The glyph's width is not specified via a separate command; instead
   it's an optional additional argument to the first command of the
   charstring stream (and even then, it's only the *difference* to a
   nominal character width specified in the CFF).
 * The interpretation of a 4-byte number is different from Type 1: in
   Type 1 this is a 4-byte unsigned integer, whereas in Type 1 it's a
   fixed decimal with 16 bits of fractional part.
 * Many commands accept a variable set of arguments, so they can draw
   more than one line/curve on a single go. These are all
   retro-compatible with Type 1's commands.

All these changes are implemented in this patch in a
backwards-compatible way. To ensure Type 1/2 behavior is accessed, a new
parameter indicates which behavior is desired when decoding the
charstring stream.

I also took the chance to centralise some logic that was previously
duplicated across the parse_glyph function. Common lambdas capture the
logic for moving to, or drawing a line/curve to a given point and
updating the glyph state. Similarly, some command logic, including
reading parameters, are shared by several commands. Finally, I've
re-organised the cases in the main switch to group together related
commands.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
f06de0fa07 LibPDF: Remove unused member 2023-01-25 15:40:11 +01:00
Rodrigo Tobar
416585f75a LibPDF: Add new Type1FontProgram base class
We are planning to add support for CFF fonts to read Type1 fonts, and
therefore much of the logic already found in PS1FontProgram will be
useful for representing the Type1 fonts read from CFF.

This commit moves the PS1-independent bits of PS1FontProgram into a new
Type1FontProgram base class that can be used as the base for CFF-based
Type1 fonts in the future. The Type1Font class uses this new type now
instead of storing a PS1FontProgram pointer. While doing this
refactoring I also took care of making some minor adjustments to the
PS1FontProgram API, namely:

 * Its create() method is static and returns a
   NonnullRefPtr<Type1FontProgram>.
 * Many (all?) of the parse_* methods are now static.
 * Added const where possible.

Notably, the Type1FontProgram also contains at the moment the code that
parses the CharString data from the PS1 program. This logic is very
similar in CFF files, so after some minor adjustments later on it should
be possible to reuse most of it.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
e751ec2089 LibPDF: Avoid reading fields from moved-from Data object
This might not be an issue at the moment, but moved-from objects are
usually in a unspecifed but valid state, meaning that we shouldn't read
from them.
2023-01-25 15:40:11 +01:00
Rodrigo Tobar
bfeca4ebb3 LibPDF: Record base font name read from document
This will be useful for debugging, or if we later on want to show all
the fonts found in the document in an organised manner.
2023-01-25 15:40:11 +01:00
Tim Schumacher
b1bfeb391e LibPDF: Use Core::Stream to parse the page offset hint table 2023-01-21 00:45:33 +00:00
Liav A
57e19a7e56 LibGfx: Re-structure the whole initialization pattern for image decoders
When trying to figure out the correct implementation, we now have a very
strong distinction on plugins that are well suited for sniffing, and
plugins that need a MIME type to be chosen.

Instead of having multiple calls to non-static virtual sniff methods for
each Image decoding plugin, we have 2 static methods for each
implementation:
1. The sniff method, which in contrast to the old method, gets a
    ReadonlyBytes parameter and ensures we can figure out the result
    with zero heap allocations for most implementations.
2. The create method, which just creates a new instance so we don't
    expose the constructor to everyone anymore.

In addition to that, we have a new virtual method called initialize,
which has a per-implementation initialization pattern to actually ensure
each implementation can construct a decoder object, and then have a
correct context being applied to it for the actual decoding.
2023-01-20 15:13:31 +00:00
Timothy Flynn
f3db548a3d AK+Everywhere: Rename FlyString to DeprecatedFlyString
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
2023-01-09 23:00:24 +00:00
Julian Offenhäuser
2a70ea3ee7 LibPDF: Propagate errors in PDFFont::create() 2023-01-09 22:54:36 +00:00
Julian Offenhäuser
ac31b1bda3 LibPDF: Make glyphs from standard 14 fonts show up in Type1Font
Previously, we would assume that all standard 14 fonts use a
TrueTypeFont dictionary. Now we render them in Type1Font as well,
given that it doesn't contain a PostScript font program.
2023-01-09 22:54:36 +00:00
Julian Offenhäuser
a37f3390dc LibPDF: Allow numbers to start with whitespace 2023-01-09 22:54:36 +00:00
Rodrigo Tobar
a5620fd41f LibPDF: Load destinations from Catalogue -> Names -> Dests name tree
PDF allows for named destinations to be provided as string. These can be
either found in the Dests dictionary in the document catalogue (as
already implemented), or in the Name Tree specified by the Dests key in
the Names dictionary of the document catalogue (missing).

This commit adds this missing case. Once the named destination is found
in the name tree, its value is interpreted just like in the first case,
so a new utility method encapsulates the common behavior.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
5420261347 LibPDF: Implement name tree lookups
Name Trees are hierarchical, string-keyed, sorted-by-key dictionary
structures in PDF where each node (except the root) specifies the bounds
of the values it holds, and either its kids (more nodes) or the
key/value pairs it contains.

This commit implements a series of lookup calls for finding a key in
such name trees. This implementation follows the tree as needed on each
lookup, but if that becomes inefficient in the long run we can switch to
creating a HashMap with all the contents, which as a drawback will
require more memory.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
8c79f0e0cf LibPDF: Add more utility methods to {Dict,Array}Object
Being both of them containers, these classes already offered a set of
methods to retrieve an inner element by key or index, respectively, with
different methods for the different subtypes of the PDF::Object type
returning the element cast to the correct type pointer. On top of
that, DictObject offered an additional method to obtain an element as an
Object pointer.

While these methods were useful, they have some shortcomings:

 * They always take a Document pointer to first perform an object
   resolution, in case the element is a Reference. This is not always
   necessary though, as there are values that are always meant to be
   immediate, and hence the resolution lookup adds overhead.
 * There was no easy way to get an individual Object element from an
   ArrayObject like there is in DictObject. This makes it difficult to
   obtain such values, as one first needs to call dict.get() to get a
   Value, then cast it manually to a NonnullRefPtr<Object>.

This commit fixes these two issues by:

 * Adding a new method that returns an Object for a given index.
 * Adding overloads for this new method, and all the existing methods
   described above, that do *not* take a Document, and therefore do
   *not* perform an object resolution lookup.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
0e1c858f90 LibPDF: Move casting code to its own cast_to function
This functionality was previously part of the resolve_to() Document
method, and thus only available only when resolving objects through the
Document class. There are many use cases where this casting can be used,
but no resolution is needed.

This commit moves this functionality into a new cast_to function, and
makes the resolve_to function call it internally. With this new function
in place we can now offer new versions of DictObject::get_* and
ArrayObject::get_*_at that don't perform Document resolution
unnecessarily when not required.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
f510b2b180 LibPDF: Support null destination parameters
Destination arrays contain a page number, a mode name, and parameters
specific to that mode. In many cases these parameters can be set to
"null", which our code wasn't taking into consideration.

This commit parses these parameters taking into account whether they are
null or actual numbers, and stores them as Optional<float> instead of
plain floats. The parameters are not yet used anywhere else other than
when formatting a Destination object, so the change is fairly small.
2023-01-06 18:06:41 +01:00
Rodrigo Tobar
2485c500a3 LibPDF: Fix Destination formatting
This was not correctly written, and thus printed confusing output.
2023-01-06 18:06:41 +01:00
MacDue
eeb6072f15 LibGfx+LibPDF: Apply subpixel offset in affine transformation 2023-01-05 13:50:26 +01:00
MacDue
91db49f7b3 LibPDF: Use subpixel accurate text rendering
This just enables the new tricks from LibGfx with the same nice
improvements :^)
2023-01-05 12:09:35 +01:00
Simon Danner
5fa8068580 LibPDF: Fix calculation of encryption key
Before this patch, the generation of the encryption key was not working
correctly since the lifetime of the underlying data was too short,
same inputs would give random encryption keys.

Fixes #16668
2023-01-04 11:10:37 -05:00
Ben Wiederhake
c2a900b853 Everywhere: Remove unused includes of AK/StdLibExtras.h
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:

\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b

(Without the linebreaks.)

This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
Ben Wiederhake
b83cb09db1 Everywhere: Fix badly-formatted includes
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
2023-01-02 11:06:15 -05:00
Andreas Kling
f982400063 LibGfx: Rename TTF/TrueType to OpenType
OpenType is the backwards-compatible successor to TrueType, and the
format we're actually parsing in LibGfx. So let's call it that.
2022-12-21 08:44:22 +01:00
Rodrigo Tobar
bb48a67f84 LibPDF: Reset encryption key on failed user password attempt
When an attempt is made to provide the user password to a
SecurityHandler a user gets back a boolean result indicating success or
failure on the attempt. However, the SecurityHandler is left in a state
where it thinks it has a user password, regardless of the outcome of the
attempt. This confuses the rest of the system, which continues as if the
provided password is correct, resulting in garbled content.

This commit fixes the situation by resetting the internal fields holding
the encryption key (which is used to determine whether a user password
has been successfully provided) in case of a failed attempt.
2022-12-20 10:28:58 +01:00
Rodrigo Tobar
dc6a11cf6b LibPDF: Treat Encyption's Length item as optional
With the StandardSecurityHandler the Length item in the Encryption
dictionary is optional, and needs to be given only if the encryption
algorithm (V) is other than 1; otherwise we can assume a length of 40
bits for the encryption key.
2022-12-20 10:28:58 +01:00
Rodrigo Tobar
6df9aa8f2c LibPDF: Store page number, not Value, in OutlineItem
The Value previously stored corresponded to a Reference to a Page object
in the PDF document. This isn't useful information, since what we want
to display at the end of the day is the page an outline item refers to.

This commit changes the page member on OutlineItem to be a Optional<u32>
(some destinations don't necessarily refer to a Page), which we resolve
while building OutlineItems.
2022-12-17 19:40:52 +01:00
Rodrigo Tobar
3db6af6360 LibPDF: Keep track of OutlineItem parents
While OutlineItem had a parent field, it was never populated nor used.
This commit populates it when possible (no parent means the OutlineItem
is a top-level item).
2022-12-17 19:40:52 +01:00
Rodrigo Tobar
c4bc27f274 LibPDF: Don't abort on unsupported drawing operations
Instead of calling TODO(), which will abort the program, we now return
an Error specifying that we haven't implemented the drawing operation
yet. This will now nicely trickle up all the way through to the
PDFViewer, which will then notify its clients about the problem.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
e87fecf710 LibPDF: Switch to best-effort PDF rendering
The current rendering routine aborts as soon as an error is found during
rendering, which potentially severely limits the contents we show on
screen. Moreover, whenever an error happens the PDFViewer widget shows
an error dialog, and doesn't display the bitmap that has been painted so
far.

This commit improves the situation in both fronts, implementing
rendering now with a best-effort approach. Firstly, execution of
operations isn't halted after an operand results in an error, but
instead execution of all operations is always attempted, and all
collected errors are returned in bulk. Secondly, PDFViewer now always
displays the resulting bitmap, regardless of error being produced or
not. To communicate errors, an on_render_errors callback has been added
so clients can subscribe to these events and handle them as appropriate.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
96fb4b20f1 LibPDF: Add Errors class that accumulate multiple errors
This will be used to perform a best-effort rendering, where an error in
rendering won't abort the whole rendering operation, but instead will be
stored for later reference while rendering continues.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
d9718064d1 LibPDF: Add support for multi-line comments
The code parsing comments parsed only a single line of comments, but
callers assumed they parsed all comments that appeared contiguously in a
block. The latter is an easier to understand API, so this commit changes
the parse_comment function to parse entire blocks of comments instead of
single lines.
2022-12-16 10:04:23 +01:00
Rodrigo Tobar
a1af79dca6 LibPDF: Follow a FontFile's Length values
These can be references (at least from what I've found in some
documents), so we want to resolve them before using them.
2022-12-16 01:24:43 -07:00
Rodrigo Tobar
cb1a7cc721 LibPDF: Simplify outline construction
While the Outline Items making up the document's Outline have all sorts
of cross-references (parent, first/last chlid, next/previous sibling,
etc), not all documents out there have fully-consistent references. Our
implementation already discarded some of that information too (e.g.,
/Parent and /Prev were never read), and trusted that /First and /Next
were good enough to traverse the whole hierarchy.

Where the current implementation failed was in assuming that /Last was
also a good source of information. There are documents out there were
/Last also points to dead ends, and were therefore causing a crash when
we verified that the last child found on a chain was the /Last child
declared by the parent. To fix this I'm simply removing the check, and
simplifying the function call to remove any references to /Last. This
way we affirm our commitment to /First and /Next as the main sources of
information.
2022-12-16 01:24:43 -07:00
Rodrigo Tobar
41bd304a7f LibPDF: Ignore seac PS1 commands for now
This command is meant to print an Standard Encoding Accented Character.
It's not critical to implement it yet, but if we want to render more
documents we need to handle the instruction, even if simply ignore it.
2022-12-16 01:24:43 -07:00
Ali Mohammad Pur
f96a3c002a Everywhere: Stop shoving things into ::std and mentioning them as such
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.

std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
2022-12-14 11:44:32 +01:00
Rodrigo Tobar
adc45635e9 LibPDF: Add initial image display support
After adding support for XObject Form rendering, the next was to display
XObject images. This commit adds this initial support,

Images come in many shapes and forms: encodings: color spaces, bits per
component, width, height, etc. This initial support is constrained to
the color spaces we currently support, to images that use 8 bits per
component, to images that do *not* use the JPXDecode filter, and that
are not Masks. There are surely other constraints that aren't considered
in this initial support, so expect breakage here and there.

In addition to supporting images, we also support applying an alpha mask
(SMask) on them. Additionally, a new rendering preference allows to skip
image loading and rendering altogether, instead showing an empty
rectangle as a placeholder (useful for when actual images are not
supported). Since RenderingPreferences is becoming a bit more complex,
we add a hash option that will allow us to keep track of different
preferences (e.g., in a HashMap).
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
2331fe5e68 LibPDF: Add first interpolation methods
Interpolation is needed in more than one place, and I couldn't find a
central place where I could borrow a readily available interpolation
routine, so I've implemented the first simple interpolation object. More
will follow for more complex scenarios.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
17676705a5 LibPDF: Add facility to obtain Vector<float> from ArrayObject
Arrays of float numbers are common in many PDF objects, and thus to
avoid code repetition I'm introducing a new method to ArrayObject that
will return exactly that.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
a63b93f724 LibPDF: Add new Error::Type for unsupported rendering features 2022-12-10 10:49:03 +01:00
Rodrigo Tobar
26f8c0b76c LibPDF: Add more knowledge to ColorSpaces classes
ColorSpaces now can tell users how many components they expect, and the
default decode array that should be used when converting unit bit
sequences into color space component input values during image
rendering.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
ba16310739 LibPDF: Refactor parsing of ColorSpaces
ColorSpaces can be specified in two ways: with a stream as operands of
the color space operations (CS/cs), or as a separate PDF object, which
is then referred to by other means (e.g., from Image XObjects and other
entities). These two modes of addressing a ColorSpace are slightly
different and need to be addressed separately. However, the current
implementation embedded the full logic of the first case in the routine
that created ColorSpace objects.

This commit refactors the creation of ColorSpace to support both cases.
First, a new ColorSpaceFamily class encapsulates the static aspects of a
family, like its name or whether color space construction never requires
parameters. Then we define the supported ColorSpaceFamily objects.

On top of this also sit a breakage on how ColorSpaces are created. Two
methods are now offered: one only providing construction of no-argument
color spaces (and thus taking a simple name), and another taking an
ArrayObject, hence used to create ColorSpaces requiring arguments.

Finally, on top of *that* two ways to get a color space in the Renderer
are made available: the first creates a ColorSpace with a name and a
Resources dictionary, and another takes an Object. These model the two
addressing modes described above.
2022-12-10 10:49:03 +01:00
Rodrigo Tobar
287bb0feac LibPDF: Return results directly and avoid unpacking+packing 2022-12-10 10:49:03 +01:00
Andreas Kling
d6a3be1615 LibPDF: Add missing character quirk for WinAnsiEncoding fonts
Fonts with the encoding name "WinAnsiEncoding" should render missing
characters above character code 040 (octal) as a "bullet" character.

This patch adds Encoding::should_map_to_bullet(char_code) which is then
called by char_code_to_code_point() to check if the given char code
should be displayed as a bullet instead.

I didn't have a good way to test this, so I've only verified that it
works by manually overriding inputs to the function during the rendering
stage.

This takes care of a FIXME in the Annex D part of the PDF specification.
2022-12-08 09:54:20 +01:00
MacDue
7be0b27dd3 Meta+Userland: Pass Gfx::IntPoint by value
This is just two ints or 8 bytes or the size of the reference on
x86_64 or AArch64.
2022-12-07 11:48:27 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Rodrigo Tobar
cb3e05f476 LibPDF: Add initial implementation of XObject rendering
This implementation currently handles Form XObjects only, skipping
image XObjects. When rendering an XObject, its resources are passed to
the underlying operations so they use those instead of the Page's.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
b3007c17bd LibPDF: Allow operators to receive optional resources
Operators usually assume that the resources its operations will require
will be the Page's. This assumption breaks however when XObjects with
their own resources come into the picture (and maybe other cases too).
In that case, the XObject's resources take precedence, but they should
also contain the Page's resources. Because of this, one can safely use
the XObject resources alone when given, and default to the Page's if
not.

This commit adds all operator calls an extra argument with optional
resources, which will be fed by XObjects as necessary.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e58165ed7a LibPDF: Render cubic bezier curves
The implementation of bezier curves already exists on Gfx, so
implementing the PDF rendering of this command is trivial.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
fe5c823989 LibPDF: Communicate resources to ColorSpace, not Page
Resources can come from other sources (e.g., XObjects), and since the
only attribute we are reading from Page are its resources it makes sense
to receive resources instead. That way we'll be able to pass down
arbitrary resources that are not necessarily declared at the page level.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
164422f8d8 LibPDF: Add further common names 2022-11-30 14:51:14 +01:00
Rodrigo Tobar
5277ad1d6d LibPDF: Implement Run Length Decoding
This is a simple decoding process that is needed by some streams.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
e776048309 LibPDF: Ignore whitespace on hex strings
The spec says that whitespaces should be ignored, but we weren't. PDFs
with whitespaces in their hex strings were thus crushing the parser.
2022-11-30 14:51:14 +01:00
Rodrigo Tobar
d04613d252 LibPDF: Fix path coordinates calculation
Paths rendering was buggy because the map() function that translates
points from user space to bitmap space applied the vertical flip
conversion that the current transformation matrix already considers;
Hence, all paths were upside down. The only exception was the "re"
instruction, which manually adjusted the Y coordinate of its points to
be flipped again (and had a FIXME saying that this should be
unnecessary).

This commit fixes the map() function that maps userspace points to
bitmap coordinates. The "re" operator implementation has also been
simplified creating a rectangle first and mapping *that* instead of
mapping each point individually.
2022-11-26 08:56:35 +01:00
Rodrigo Tobar
e92ec26771 LibPDF: Introduce rendering preferences and show clipping paths
A new struct allows users to specify specific rendering preferences that
the Renderer class might use to paint some Document elements onto the
target bitmap. The first toggle allows rendering (or not) the clipping
paths on a page, which is useful for debugging.
2022-11-25 23:03:24 +01:00
Rodrigo Tobar
a1e36e8f78 LibPDF: Improve path clipping support
The existing path clipping support was broken, as it performed the
clipping operation as soon as the path clipping commands (W/W*) were
received. The correct behavior is to keep a clipping path in the
graphic state, *intersect* that with the current path upon receiving
W/W*, and apply the clipping when performing painting operations. On top
of that, the intersection happening at W/W* time does not affect the
painting operation happening on the current on-build path, but takes
effect only after the current path is cleared; therefore a current and a
next clipping path need to be kept track of.

Path clipping is not yet supported on the Painter class, nor is path
intersection. We thus continue using the same simplified bounding box
approach to calculate clipping paths.

Since now we are dealing with more rectangles-as-path code, I've made
helper functions to build a rectangle path and reuse it as needed.
2022-11-25 23:03:24 +01:00
Julian Offenhäuser
d1bc89e30b LibPDF: Try to repair XRef tables with broken indices
An XRef table usually starts with an object number of zero. While it
could technically start at any other number, this is a tell-tale sign
of a broken table.

For the "broken" documents I encountered, this always meant that some
objects must have been removed from the start of the table, without
updating the following indices. When this is the case, the document is
not able to be read normally.

However, most other PDF parsers seem to know of this quirk and fix the
XRef table automatically.

Likewise, we now check for this exact case, and if it matches up with
what we expect, we update the XRef table such that all object numbers
match the actual objects found in the file again.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e06a065594 LibPDF: Override Type 1 character mappings by encoding in font dict
If the font dictionary includes an "Encoding" entry, it will be used
instead of the PS1FontProgram's built-in encoding.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
65ff80e8a5 LibPDF: Add alternative names to is_standard_latin_font() helper 2022-11-25 22:44:47 +01:00
Julian Offenhäuser
9cb3b23377 LibPDF: Move all font handling to Type1Font and TrueTypeFont classes
It was previously the job of the renderer to create fonts, load
replacements for the standard 14 fonts and to pass the font size back
to the PDFFont when asking for glyph widths.

Now, the renderer tells the font its size at creation, as it doesn't
change throughout the life of the font. The PDFFont itself is now
responsible to decide whether or not it needs to use a replacement
font, which still is Liberation Serif for now.

This means that we can now render embedded TrueType fonts as well :^)

It also makes the renderer's job much more simple and leads to a much
cleaner API design.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
e748a94f80 LibPDF: Introduce loading of common font data in PDFFont base class
This font data is shared between Type 1 and TrueType fonts, which is
why we can now load it in the base class that they both use.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
dd82a026f8 LibPDF: Pass PDFFont::draw_glyph() a char code instead of a code point
We would previously pass this function a unicode code point, which is
not actually what we want here.

Instead, we want the "raw" code point, with the font itself deciding
whether or not it needs to be re-mapped.

This same mistake in terminology applied to PS1FontProgram.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
8532ca1b57 LibPDF: Convert dash pattern array elements to integers if necessary
They may be floats instead.
2022-11-25 22:44:47 +01:00
Julian Offenhäuser
0bc3333740 LibPDF: Parse integer numbers with atoi() instead of strtof()
strtof() produces rounding errors for very large numbers, which we
don't want for integers, as they may have to be precise.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
c2ad29c85f LibPDF: Implement png predictor decoding for flate filter
For flate and lzw filters, the data can be transformed by this
predictor function to make it compress better. For us this means that
we have to undo this step in order to get the right result.

Although this feature is meant for images, I found at least a few
documents that use it all over the place, making this step very
important.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
4bd79338e8 LibPDF: Fix off-by-one error in Reader::remaining() 2022-11-19 15:42:08 +01:00
Julian Offenhäuser
4b1a72ff7a LibPDF: Fix loop condition in parse_xref_stream()
We previously compared two unrelated values to determine if we parsed
the xref table to completion. We now check if we added every subsection
instead, and double check to make sure we never read past the end.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
a17a23a3f0 LibPDF: Make some variable names in parse_xref_stream() more clear
I found these to be a bit misleading.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
f926dfe36b LibPDF: Implement the DCT filter
This filter basically tells us that we are dealing with a JPEG.
Note that by serializing the resulting image we assume that this filter
is the last one in the chain, everything else would be highly unlikely.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
baaf42360e LibPDF: Derive alternate ICC color space from the number of components
We currently don't support ICC color spaces and fall back to a "simple"
one instead.

If no alternative is specified however, we are allowed to pick the
closest match based on the number of color components.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
16ed407c01 LibPDF: Support cascading stream filters
You can specify multiple filters as an array, where each one is fed the
output of the one before it.
2022-11-19 15:42:08 +01:00
Julian Offenhäuser
becd648a78 LibPDF: Parse hexadecimal values in name objects correctly 2022-11-19 15:42:08 +01:00
Julian Offenhäuser
7c4f5b58be LibPDF: Use Gfx::PathRasterizer for Adobe Type 1 font rendering
This gives much better visual results than painting the path directly.
It also has the nice side effect that Type 1 fonts will now look much
more similar to TrueType fonts, which use the same class :^)

In addition, we can now cache glyph bitmaps for repeated use.
2022-11-19 11:04:34 +01:00
Tim Schumacher
ce2f1b845f Everywhere: Mark dependencies of most targets as PRIVATE
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.

Also included are changes to readd now missing dependencies to tools
that actually need them.
2022-11-01 14:49:09 +00:00
Tim Schumacher
7834e26ddb Everywhere: Explicitly link all binaries against the LibC target
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.

This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.

Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
2022-11-01 14:49:09 +00:00
Julian Offenhäuser
b14f0950a5 LibPDF: Add very basic support for Adobe Type 1 font rendering
Previously we would draw all text, no matter what font type, as
Liberation Serif, which results in things like ugly character spacing.

We now have partial support for drawing Type 1 glyphs, which are part of
a PostScript font program. We completely ignore hinting for now, which
results in ugly looking characters at low resolutions, but gain support
for a large number of typefaces, including most of the default fonts
used in TeX.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
e6f29302a7 LibPDF: Add glyph drawing and type info methods to PDFFont
A PDFFont can now be asked for its specific type and whether it is part
of the standard 14 fonts. It now also contains a method to draw a
glyph, which is stubbed-out for now.

This will be useful for the renderer to take into consideration when
drawing text, since we don't include replacements for the standard set
of fonts yet, but still want to make use of embedded fonts when
available.
2022-10-16 17:44:54 +02:00
Julian Offenhäuser
36f83cecab LibPDF: Allow page objects to inherit the MediaBox and Resources entries 2022-10-16 17:44:54 +02:00
Julian Offenhäuser
2f71e0f09a LibPDF: Allow text operator sequences to start with whitespace 2022-10-16 17:44:54 +02:00
Julian Offenhäuser
7ecd420b03 LibPDF: Parse floating point numbers that omit a leading zero correctly 2022-10-16 17:44:54 +02:00
Ben Wiederhake
a99cd09891 Libraries: Add missing includes, add namespace qualifiers
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:

    // file: compile_me.cpp
    #include <LibDNS/Question.h>
    // That's it, this was enough to cause a compilation error.

Likewise for most other files touched by this commit.
2022-09-18 13:27:24 -04:00