Commit Graph

23 Commits

Author SHA1 Message Date
Nico Weber
d345c5b793 LibPDF: Add (automated!) test for info dict encoding
Manually added an info dict with the three text string encoding
methods to encoding.pdf.

(Preview.app apparently can't handle UTF-8 in info dicts!)
2023-11-22 09:08:06 -07:00
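
As an illustration of the three text string encodings mentioned above, here is a minimal Python sketch (not LibPDF's C++ code). The function name `decode_text_string` is hypothetical. The sketch assumes a string is marked as UTF-16BE by a leading `FE FF`, as UTF-8 by a leading `EF BB BF`, and is otherwise PDFDocEncoding, which is only approximated with Latin-1 here.

```python
def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string by inspecting its leading marker bytes."""
    if raw.startswith(b"\xfe\xff"):
        # UTF-16BE, marked by the byte order mark FE FF.
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xef\xbb\xbf"):
        # UTF-8, marked by the bytes EF BB BF.
        return raw[3:].decode("utf-8")
    # Unmarked: PDFDocEncoding.
    # ASSUMPTION: approximated with Latin-1, which differs from
    # PDFDocEncoding for a number of code points.
    return raw.decode("latin-1")
```

In page body text these markers would not be interpreted this way, since body text is decoded through the font's encoding instead.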
Nico Weber
65b895595a LibPDF: Add an encoding test file
For now, this uses UTF-16BE and UTF-8 marked strings in page body text.
These markings should be ignored in body text.

Hand-written, with `set fenc=latin1` and `set binary` in vim, and
xref etc fixed up by running

    mutool clean Tests/LibPDF/encoding.pdf Tests/LibPDF/encoding.pdf

as usual.
2023-11-22 09:08:06 -07:00
Nico Weber
4c6afd4763 Tests: Install recently added PDF test files
These aren't needed for any automated tests, but it's still nice to have
them in the OS for manual testing.
2023-11-22 09:08:06 -07:00
Nico Weber
eedd978974 Tests: Add a type3 test pdf
A manual test, but better than nothing.

I hand-wrote the file, and used mutool to fix up xref and stream
lengths:

    mutool clean Tests/LibPDF/type3.pdf Tests/LibPDF/type3.pdf

The file contains one `d1` character, which per spec shouldn't
contain color statements (and if it does, they should be ignored),
and one `d0` character, which can contain color.

The text then sets a color before rendering the text.

Per spec, the text color should affect the `d1` character but
not the `d0` one. We get this wrong, but so does Preview.app.
(PDFium gets it right.)

But independent of the colors, just rendering the glyphs at all
at the right position is already good :^)
2023-11-19 22:33:34 +01:00
Nico Weber
4f51ff456e LibPDF: Add a test for inter-word spacing with TJ
Hand-written, based on the text example in Appendix G.2 in
the PDF 1.7 spec, with the xref table fixed up by `mutool clean`:

    mutool clean -dggg Tests/LibPDF/text.pdf Tests/LibPDF/text.pdf
2023-11-14 10:11:09 +01:00
Nico Weber
f4a847894f LibPDF: Make SampledFunction::evaluate() work for n-dimensional input
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!

It can be much more efficient I think, but I think the overall
shape is maybe roughly fine.
2023-11-12 07:55:04 +01:00
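
One possible shape for n-dimensional evaluation is multilinear interpolation over the 2^n corners of the grid cell containing the input point. The Python sketch below is not the code from this commit. It assumes linear interpolation, a single output value, inputs already mapped to sample coordinates, and a flat sample array in which the first dimension varies fastest. The name `evaluate_sampled` and its parameters are hypothetical.

```python
import itertools
import math


def evaluate_sampled(samples, sizes, coords):
    """Multilinear interpolation over an n-dimensional sample grid.

    samples: flat list of values, first dimension varying fastest (assumption)
    sizes:   number of samples along each dimension (each must be >= 2)
    coords:  one coordinate per dimension, already in [0, size - 1]
    """
    lows, fractions = [], []
    for size, x in zip(sizes, coords):
        lo = min(math.floor(x), size - 2)  # keep lo + 1 inside the grid
        lows.append(lo)
        fractions.append(x - lo)

    result = 0.0
    # Visit the 2**n corners of the grid cell that contains the point.
    for corner in itertools.product((0, 1), repeat=len(sizes)):
        weight, index, stride = 1.0, 0, 1
        for lo, t, bit, size in zip(lows, fractions, corner, sizes):
            weight *= t if bit else 1.0 - t  # per-dimension linear weight
            index += (lo + bit) * stride     # flat index of this corner
            stride *= size
        result += weight * samples[index]
    return result
```

The cost grows as 2^n per evaluation, which is one place where a real implementation could be made more efficient.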
Nico Weber
a9ef65e64a LibPDF: For multi-output SampledFunctions, fix output colors
For N outputs, the outputs aren't stored in N independent planes.
Instead, N output values are stored right next to each other in
the stream data.
2023-11-11 08:55:37 +01:00
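
The difference between the two layouts can be sketched with plain index arithmetic. This Python sketch is illustrative only, and the function names and parameters are hypothetical rather than taken from LibPDF.

```python
def output_sample_interleaved(samples, index, output, output_count):
    # Layout described in the commit: all outputs of one sample point
    # are stored next to each other.
    return samples[index * output_count + output]


def output_sample_planar(samples, index, output, sample_count):
    # The layout the commit says is NOT used: one full plane per output.
    return samples[output * sample_count + index]
```

For two sample points with three outputs each, the stream `[10, 20, 30, 11, 21, 31]` holds the outputs of point 0 first, then those of point 1. Reading it as planar data would mix values from different sample points.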
Nico Weber
ec739460e0 LibPDF: Add test for SampledFunction and fix bugs found by it
* SampledFunction now keeps the StreamObject it gets data from alive
  (doesn't matter too much in practice, but does matter in the test,
  where nothing else keeps the stream alive).

* If a sample is an integer, we would previously sample that value
  twice and then divide by zero when interpolating. Make sure to
  sample 1 unit apart.
2023-11-11 08:55:37 +01:00
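
The second bug can be reproduced in a few lines. The Python sketch below is a simplified one-dimensional illustration, not the LibPDF code. The function names are hypothetical, and clamping the lower index to `len(samples) - 2` is an assumption about how to stay inside the table.

```python
import math


def interpolate(x, x0, x1, y0, y1):
    # Standard linear interpolation between (x0, y0) and (x1, y1).
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def sample_1d_buggy(samples, x):
    lo, hi = math.floor(x), math.ceil(x)  # lo == hi when x is an integer
    return interpolate(x, lo, hi, samples[lo], samples[hi])


def sample_1d_fixed(samples, x):
    # Always pick two sample positions exactly 1 unit apart.
    lo = min(math.floor(x), len(samples) - 2)
    hi = lo + 1
    return interpolate(x, lo, hi, samples[lo], samples[hi])
```

With an integer input, the buggy variant uses the same position for both ends and divides by `x1 - x0 == 0`. The fixed variant never does, because `hi - lo` is always 1.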
Nico Weber
80eec1e16b LibPDF: Implement PostScriptCalculatorFunction
Includes a tokenizer and interpreter for the subset of PostScript
supported in PDF type 4 functions.
2023-11-09 16:06:25 +01:00
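
To give a rough idea of what a tokenizer plus interpreter for such a function involves, here is a tiny Python sketch. It is not the LibPDF implementation: it handles only a handful of operators (`add`, `sub`, `mul`, `dup`, `exch`, `pop`) and numeric literals, and it does not support nested `{ }` blocks as used by `if` and `ifelse`. The names `tokenize` and `evaluate` are hypothetical.

```python
def tokenize(source: str) -> list[str]:
    # Braces are their own tokens; everything else is whitespace-separated.
    return source.replace("{", " { ").replace("}", " } ").split()


def evaluate(source: str, inputs: list[float]) -> list[float]:
    tokens = tokenize(source)
    if tokens[:1] != ["{"] or tokens[-1:] != ["}"]:
        raise ValueError("program must be wrapped in { }")
    stack = list(inputs)  # the function inputs start out on the stack
    for token in tokens[1:-1]:
        if token == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif token == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "dup":
            stack.append(stack[-1])
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token == "pop":
            stack.pop()
        else:
            # Anything else is treated as a numeric literal in this sketch.
            stack.append(float(token))
    return stack  # whatever remains on the stack is the output
```

For example, `evaluate("{ dup mul }", [3.0])` squares its single input.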
Nico Weber
00f1a6cf86 Tests: Add a pdf file for testing color spaces
Covers DeviceGray, CalRGB, DeviceRGB, DeviceCMYK, Lab, CalGray for now.

Does not yet cover Indexed, Pattern, Separation, DeviceN, ICCBased.

Lovingly hand-written, with the xref table fixed up by mutool.
2023-11-04 17:02:37 -04:00
Tim Ledbetter
b4296e1c9b LibPDF: Don't use unsanitized values in error messages
Previously, constructing error messages with unsanitized input could
fail because error message strings must be UTF-8.
2023-10-26 11:05:32 +02:00
Nico Weber
323d76fbb9 LibPDF: Make encrypted object streams work
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
   without going through Parser::parse_indirect_value(), so
   push_reference() / pop_reference() weren't called.
   Manually call them, both for the indirect object containing
   the object stream and for the indirect object within the
   object stream.
2. The indirect object within the object stream got decrypted
   twice: Once when the object stream data itself got decrypted,
   and then incorrectly a second time when the object data within
   the stream was read. To fix, disable encryption while parsing
   object stream data (since it's already decrypted).

The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.
2023-07-12 17:16:25 +02:00
Nico Weber
6200097bcc Tests/LibPDF: Make encrypted_with_aes test some metadata too
2023-07-12 17:16:25 +02:00
Nico Weber
2061ee2632 Tests/LibPDF: Add test for AES-encrypted PDF
I created this by typing "sup" into TextEdit.app on macOS 13.4,
hitting Cmd-P to bring up the print dialog, clicked the PDF button
at the bottom, changed Title and Author to "sup", clicked
"Security Options…", and checked "Require password to open document"
(with password "sup").

This file tests several things:
- It has a compressed stream as first object. This used to make the
  linearization dict detection logic assert.
- It uses AES for encryption, with version 4 of the encryption
  dict. This used to not be implemented.
2023-07-12 06:28:15 +02:00
Nico Weber
e0887dd045 Tests/LibPDF: Use MUST() more
No behavior change.
2023-07-12 06:28:15 +02:00
Ben Wiederhake
f890b70eae Tests: Prefer TRY_OR_FAIL() and MUST() over EXPECT(!.is_error())
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.

Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
2023-05-14 15:39:38 -06:00
Sam Atkins
1910dc8976 Tests: Move test PDF files into Tests/LibPDF
Let's put test files with the tests themselves, instead of a random user
directory. (But still copy them so they appear in the user directory
for convenience.)
2023-01-19 11:50:10 +00:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Matthew Olsson
5b316462b2 LibPDF: Add implementation of the Standard security handler
Security handlers manage encryption and decryption of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
2022-03-29 02:52:57 +02:00
Matthew Olsson
73cf8205b4 LibPDF: Propagate errors in Parser and Document
2022-03-07 10:53:57 +01:00
Simon Woertz
d8013f9c3a Tests: Add test cases for #10702 and #10717
Add test cases for parsing an empty file and a truncated file.
2022-01-08 18:57:55 +01:00
Simon Woertz
07a557194c Tests: Add base structure for LibPDF unit tests
Add a unit test for each sample pdf file that currently exists in the
anon user's `~/Document/pdf` directory.
- linear.pdf
- non-linearized.pdf
- complex.pdf

Each test ensures that the pdf document is parsed and that the page
count is the expected one.
2022-01-08 18:57:55 +01:00