Commit Graph

271 Commits

Author SHA1 Message Date
Ben Wiederhake
f866c80222 LibPDF: Avoid unnecessary HashMap copy, mark other copies 2023-05-19 22:33:57 +02:00
Ben Wiederhake
da394abe04 LibGfx+Fuzz: Convert ImageDecoder::initialize to ErrorOr
This prevents callers from accidentally discarding the result of
initialize(), which was the root cause of this OSS Fuzz bug:

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55896&q=label%3AProj-serenity&sort=summary
2023-05-12 09:40:24 +01:00
Nico Weber
f56b897622 Everywhere: Fix a few typos
Some even user-visible!
2023-04-12 19:37:35 +02:00
Ben Wiederhake
560133a0c6 Everywhere: Remove unused DeprecatedString includes 2023-04-09 22:00:54 +02:00
Julian Offenhäuser
bdd5f36121 LibPDF: Load replacements for TrueTypeFonts without an embedded font
This previously only happened for Type 1 fonts.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
5deac3a7f5 LibPDF: Actually return an error when failing to load replacement fonts 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fec7ccf020 LibPDF: Ask OpenType font programs for glyph widths if needed
If the font dictionary didn't specify custom glyph widths, we would fall
back to the specified "missing width" (or 0 in most cases!), which meant
that we would draw glyphs on top of each other in a lot of cases, namely
for TrueTypeFonts or standard Type1Fonts with an OpenType fallback.

What we actually want to do in this case is ask the OpenType font for
the correct width.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
2b3a41be74 LibPDF: Remove the subroutine length limit for PS1 font programs
A limit of 1024 subroutines seemed like a sensible choice, but some
fonts actually do exceed it. We will now only assert that the specified
amount is positive.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
4ec01669fc LibPDF: Scale vector paths with the view
This ensures that lines have the correct size at every scale factor.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
731676c041 LibPDF: Accept floats as line dash pattern phases 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
95a804bc4e LibPDF: Allow the page rotation to be inherited 2023-03-25 16:27:30 -06:00
Julian Offenhäuser
b90a794d78 LibPDF: Allow pages with no specified contents
The contents object may be omitted as per spec, which will just leave
the page blank.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
fde990ead8 LibPDF: Allow optional inheritable page attributes
Previously, get_inheritable_object would always try to find the object
and throw an error if it couldn't. The spec tells us that some page
attributes, like CropBox, are optional but also inheritable. Others,
like the media box and resources, are technically required by the spec,
but omitted by some documents.

In both cases, we are now able to search for inheritable objects and
find a suitable replacement if there wasn't one.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
320f5f91ab LibPDF: Ignore whitespace in the ASCII hex filter
The spec tells us that any amount of whitespace may appear between the
hex digits and that it should just be ignored.
2023-03-25 16:27:30 -06:00
Julian Offenhäuser
3400779047 LibPDF: Pass the right point width to the font loader in TrueTypeFont 2023-03-22 09:04:00 +01:00
Julian Offenhäuser
fd78875662 LibPDF: Fix navigate_to_before_eof_marker() for PDFs not ending in EOL
The way this was factored before, we would miss the %%EOF marker if it
didn't have a valid end-of-line sequence after it.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
fca9da4191 LibPDF: Don't consume anything other than EOL in Reader::consume_eol()
This was previously a slightly confusing API. Even when there was no EOL
marker at the current location, we would still consume one byte.
It will now consume either EOL or nothing at all.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
93062e2b78 LibPDF: Be more cautious of errors when looking for linearization dict
We would previously assume that, following the header, there must be a
valid PDF object that could be a linearization dict.

However, if the file is not linearized, this is not necessarily true.
We now try to detect if there even is an object, and don't treat
parsing errors as fatal.
2023-03-22 09:04:00 +01:00
Julian Offenhäuser
6c0f7d83bb LibPDF: Don't treat a broken document header as a fatal error
As the current goal is to make our best effort loading documents, we
might as well ignore a broken header and power through, giving the user
a warning.
2023-03-22 09:04:00 +01:00
Lucas CHOLLET
496b7ffb2b LibGfx: Move all image loaders and writers to a subdirectory 2023-03-21 22:39:25 +01:00
Andreas Kling
689ca370d4 Everywhere: Remove NonnullRefPtr.h includes 2023-03-06 23:46:35 +01:00
Andreas Kling
8a48246ed1 Everywhere: Stop using NonnullRefPtrVector
This class had slightly confusing semantics and the added weirdness
doesn't seem worth it just so we can say "." instead of "->" when
iterating over a vector of NNRPs.

This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.
2023-03-06 23:46:35 +01:00
Rodrigo Tobar
4a20751ff6 LibPDF: Detect CFF encodings with supplements
These are not yet actually parsed, but detecting them means we at least
don't fail to understand the *actual* format value, which was causing
some CFF fonts to fail to load.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
9bca62c5fa LibPDF: Increase argument stack for Type1FontPrograms
Type1 imposes a stack limit of 24 elements, but Type2 has a limit of 48.
We are better off relaxing the limit of the former in favour of properly
supporting the latter.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
de5e7b487c LibPDF: Improve Type2 hint counting
There were two issues with how we counted hints with Type2 CharString
commands: the first was that we assumed a single hint per command, even
though there are commands that accept multiple hints thanks to taking a
variable number of operands; and secondly, the hintmask/ctrlmask
commands can also take operands (i.e., hints) themselves in certain
situations.

This commit fixes these two issues by correctly counting hints in both
cases. This in turn fixes cases when there were more than 8 hints in
total, therefore a hintmask/ctrlmask command needed to read more than
one byte past the operator itself.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
bf61f94413 LibPDF: Don't crash when a font hasn't been loaded yet
This could happen because there was a problem while loading the first
font in the document.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
79b4293687 LibPDF: Prevent crashes when loading XObject streams
These streams might need a Filter that isn't implemented yet, and thus
cannot be blindly MUST()-ed.
2023-03-02 12:18:53 +01:00
Rodrigo Tobar
2a8e0da71c LibPDF: Improve error support for Filter class
The Filter class had a few TODO()s that resulted in crashes at runtime.
Since we now have a better way to report errors back to the user let's
use that instead.
2023-03-02 12:18:53 +01:00
MacDue
6cf8eeb7a4 LibGfx: Return bool not ErrorOr<bool> from ImageDecoderPlugin::sniff()
Nobody made use of the ErrorOr return value and it just added more
chance of confusion, since it was not clear if failing to sniff an
image should return an error or false. The answer was false, if you
returned Error you'd crash the ImageDecoder.
2023-02-26 19:43:17 +01:00
Rodrigo Tobar
cb04e4e9da LibPDF: Refactor *Font classes
The PDFFont class hierarchy was very simple (a top-level PDFFont class,
followed by all the children classes that derived directly from it).
While this design was good enough for some things, it didn't correctly
model the actual organization of font types:

 * PDF fonts are first divided between "simple" and "composite" fonts.
   The latter is the Type0 font, while the rest are all simple.
 * PDF fonts yield a glyph per "character code". Simple fonts char codes
   are always 1 byte long, while Type0 char codes are of variable size.

To this effect, this commit changes the hierarchy of Font classes,
introducing a new SimpleFont class, deriving from PDFFont, and acting as
the parent of Type1Font and TrueTypeFont, while Type0 still derives from
PDFFont directly. This distinction allows us now to:

 * Model string rendering differently from simple and composite fonts:
   PDFFont now offers a generic draw_string method that takes a whole
   string to be rendered instead of a single char code. SimpleFont
   implements this as a loop over individual bytes of the string, with
   T1 and TT implementing draw_glyph for drawing a single char code.
 * Some common fields between T1 and TT fonts now live under SimpleFont
   instead of under PDFfont, where they previously resided.
 * Some other interfaces specific to SimpleFont have been cleaned up,
   with u16/u32 not appearing on these classes (or in PDFFont) anymore.
 * Type0Font's rendering still remains unimplemented.

As part of this exercise I also took the chance to perform the following
cleanups and restructurings:

 * Refactored the creation and initialisation of fonts. They are all
   centrally created at PDFFont::create, with a virtual "initialize"
   method that allows them to initialise their inner members in the
   correct order (parent first, child later) after creation.
 * Removed duplicated code.
 * Cleaned up some public interfaces: receive const refs, removed
   unnecessary ctro/dtors, etc.
 * Slightly changed how Type1 and TrueType fonts are implemented: if
   there's an embedded font that takes priority, otherwise we always
   look for a replacement.
 * This means we don't do anything special for the standard fonts. The
   only behavior previously associated to standard fonts was choosing an
   encoding, and even that was under questioning.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
e8f1a2ef02 LibPDF: Add new error construction functions
These should make it easier to create specific errors, specially when
wanting to create a formatted message.
2023-02-24 20:16:50 +01:00
Rodrigo Tobar
db9fa7ff07 LibPDF: Allow show_text to return errors
Errors can (and do) occur when trying to render text, and so far we've
silently ignored them, making us think that all is well when it isn't.
Letting show_text return errors will allow us to inform the user about
these errors instead of having to hiding them.
2023-02-24 20:16:50 +01:00
Andreas Kling
39a1702c99 LibPDF: Make Object::cast<T>() non-const
This was only ever used to cast non-const objects to other non-const
object types.
2023-02-21 00:54:04 +01:00
Sam Atkins
2db168acc1 LibTextCodec+Everywhere: Port Decoders to new Strings 2023-02-19 17:15:47 +01:00
Lucas CHOLLET
856d0202f2 LibGfx: Rename JPGLoader to JPEGLoader
The patch also contains modifications on several classes, functions or
files that are related to the `JPGLoader`.

Renaming include:
 - JPGLoader{.h, .cpp}
 - JPGImageDecoderPlugin
 - JPGLoadingContext
 - JPG_DEBUG
 - decode_jpg
 - FuzzJPGLoader.cpp
 - Few string literals or texts
2023-02-18 23:56:24 +01:00
Sam Atkins
d6075ef5b5 LibTextCodec+Everywhere: Make TextCodec::decoder_for() take a StringView
We don't need a full String/DeprecatedString inside this function, so we
might as well not force users to create one.
2023-02-15 12:48:26 -05:00
Rodrigo Tobar
c4507bb56e LibPDF: Add more built-in SIDs
The first iteration has enough SIDs to display simple documents, but
when trying more and more documents we started to need more of these
SIDs to be properly defined. This is a copy/paste exercise from the CFF
document, which is tedious, so it will continue in small drops.

This commit fills all the gaps until SID 228, which covers all the
ISOAdobe space, and should be enough for most use cases. Since this is a
continuous space starting at 0, we now use an Array instead of a Map to
store these names, which should be more performant. Also to simplify
things I've moved the Array out of the CFF class, making it a simpler
static variable, which allows us to use template type deduction.
2023-02-13 00:23:17 +00:00
Julian Offenhäuser
1f27c47973 LibPDF: Check for end of stream in Reader::matches_regular_character()
The way this was set up before, this function would return "true" if
the underlying stream had ended, which would cause us to try to read
past the end in some edge cases.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
a2b57dd188 LibPDF: Return an error if we fail to load a replacement font 2023-02-12 10:55:37 +00:00
Julian Offenhäuser
96064ec5af LibPDF: Allow filter DecodeParms array entries to be null
Filters will use the default values in this case.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
34350ee9e7 LibPDF: Allow reading documents with incremental updates
The PDF spec allows incremental changes of a document by appending a
new XRef table and file trailer to it. These will only contain the
changed objects and will point back to the previous change, forming an
arbitrarily long chain of XRef sections and file trailers.

Every one of those XRef sections may be encoded as an XRef stream as
well, in which case the trailer is part of the stream dictionary as
usual. To make this easier, I made it so every XRef table may "own" a
trailer. This means that the main file trailer is now part of the main
XRef table.
2023-02-12 10:55:37 +00:00
Julian Offenhäuser
4f4bd3793f LibPDF: Fix glyph sizing bug that caused incorrect spacing
When loading OpenType fonts, either as a replacement for the standard
14 fonts or an embedded one, we previously passed the font size as the
_point_ size to the loader class. The difference is quite subtle, being
that Gfx::ScaledFont uses the optional dpi parameter to convert the
input from inches to pixels.

This meant that our glyphs were exactly 1.333% too large, causing them
to overlap in places.
2023-02-10 15:37:51 +01:00
Julian Offenhäuser
152a8c5c43 LibPDF: Use more appropriate standard 14 replacement fonts
The mapping of standard font to replacement now looks like this:

Times New Roman -> Liberation Serif
Courier -> Liberation Mono
Helvetica, Arial -> Liberation Sans
2023-02-10 15:37:51 +01:00
MacDue
63b11030f0 Everywhere: Use ReadonlySpan<T> instead of Span<T const> 2023-02-08 19:15:45 +00:00
Rodrigo Tobar
e4a7606b81 LibPDF: Construct accented characters with Type1 seac command
The seac command provides the base and accented character that are
needed to create an accented character glyph. Storing these values is
all that was left to properly support these composed glyphs.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
3eaa27f53a LibPDF: Add infrastructure for accented character glyphs
Type1 accented character glyphs are composed of two other glyphs in the
same font: a base glyph and an accent glyph, given as char codes in the
standard encoding. These two glyphs are then composed together to form
the accented character.

This commit adds the data structures to hold the information for
accented characters, and also the routine that composes the final glyph
path out of the two individual components. All glyphs must have been
loaded by the time this composition takes place, and thus a new
protected consolidate_glyphs() routine has been added to perform this
calculation.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
11a9bfd4b6 LibPDF: Turn Glyph into a class
Glyph was a simple structure, but even now it's become more complex that
it was initially. Turning it into a class hides some of that complexity,
and make sit easier to understand to external eyes.

While doing this I also decided to remove the float + bool combo for
keeping track of the glyph's width, and replaced it with an Optional
instead.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
c084943457 LibPDF: Index Type1 glyphs by name, not char code
Storing glyphs indexed by char code in a Type1 Font Program binds a Font
Program instance to the particular Encoding that was used at Font
Program construction time. This makes it difficult to reuse Font Program
instances against different Encodings, which would be otherwise
possible.

This commit changes how we store the glyphs on Type1 Font Programs.
Instead of storing them on a map indexed by char code, the map is now
indexed by glyph name. In turn, when rendering a glyph we use the
Encoding object to turn the char code into a glyph name, which in turn
is used to index into the map of glyphs.

This is the first step towards reusability of Type1 Font Programs. It
also unlocks the ability to render glyphs that are described via the
"seac" command (standard encoding accented character), which requires
accessing the base and accent glyphs by name.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
596119cf3e LibPDF: Add placeholders for *flex Type2 commands
These should be implemented properly in the future, but for now we are
adding the as placeholders to avoid crashes.
2023-02-08 19:47:15 +01:00
Rodrigo Tobar
64bbe431b5 LibPDF: Add char_code -> name mapping function
We already keep both mappings internally, now it's time to actually use
it.
2023-02-08 19:47:15 +01:00