452 Commits

Author SHA1 Message Date
Lucas CHOLLET
048ef11136 LibPDF: Factorize flate parameters handling to its own function
This part will be shared with the LZW filter, so let's factorize it.
2023-11-13 14:23:23 +01:00
Nico Weber
bbde3cbc90 LibPDF: Tolerate an indirect object as dict for CIE-based color spaces
Namely, for CalGrayColorSpace, CalRGBColorSpace, LabColorSpace.

Fixes a crash rendering any page of Adobe's 5014.CIDFont_Spec.pdf
(which uses CalRGBColorSpace with an indirect dict: The dict is
object `92 0`, and many color spaces are inline objects referring
to it).
2023-11-13 07:12:05 -05:00
Nico Weber
f4a847894f LibPDF: Make SampledFunction::evaluate() work for n-dimensional input
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!

It could be much more efficient, but I think the overall shape is
roughly fine.
2023-11-12 07:55:04 +01:00
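For readers who want the shape of the n-dimensional case in f4a847894f, here is a minimal sketch of multilinear interpolation over the 2^n neighboring samples. It is not LibPDF's code: `interpolate_nd`, its parameters, and the single-output `float` table are assumptions made for illustration, and it assumes the first dimension varies fastest in the sample data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not LibPDF's implementation.
// sizes[d] is the sample count along dimension d, x[d] is the input already
// mapped to [0, sizes[d] - 1], and dimension 0 is assumed to vary fastest.
float interpolate_nd(std::vector<float> const& samples,
                     std::vector<std::size_t> const& sizes,
                     std::vector<float> const& x)
{
    std::size_t const n = sizes.size();
    assert(x.size() == n && n < 32);

    // Per dimension: the lower neighbor and how far x lies past it.
    std::vector<std::size_t> lower(n);
    std::vector<float> fraction(n);
    for (std::size_t d = 0; d < n; ++d) {
        assert(sizes[d] >= 2);
        float const max_lower = static_cast<float>(sizes[d] - 2);
        float const clamped = std::max(0.0f, std::min(x[d], max_lower + 1.0f));
        float const low = std::min(std::floor(clamped), max_lower);
        lower[d] = static_cast<std::size_t>(low);
        fraction[d] = clamped - low;
    }

    // Visit all 2^n corners of the surrounding cell; bit d of `corner`
    // selects the lower (0) or upper (1) neighbor along dimension d.
    float result = 0.0f;
    for (std::size_t corner = 0; corner < (std::size_t { 1 } << n); ++corner) {
        float weight = 1.0f;
        std::size_t index = 0;
        std::size_t stride = 1;
        for (std::size_t d = 0; d < n; ++d) {
            std::size_t const bit = (corner >> d) & 1;
            weight *= bit ? fraction[d] : 1.0f - fraction[d];
            index += (lower[d] + bit) * stride;
            stride *= sizes[d];
        }
        result += weight * samples[index];
    }
    return result;
}
```

The cost grows as 2^n per evaluation, which is one reason a straightforward version like this leaves room for optimization.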
Nico Weber
a9ef65e64a LibPDF: For multi-output SampledFunctions, fix output colors
For N outputs, the outputs aren't stored in N independent planes.
Instead, N output values are stored right next to each other in
the stream data.
2023-11-11 08:55:37 +01:00
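The layout difference described in a9ef65e64a can be sketched as an index computation. The helper below is hypothetical (the name `sample_at` and the 8-bit sample assumption are not from the commit); it only shows interleaved addressing as opposed to planar addressing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not LibPDF's code: fetch output `k` of sample `i`
// from interleaved 8-bit data. With N outputs, sample i occupies bytes
// [i * N, i * N + N), so the N outputs sit right next to each other.
// A planar layout would instead use k * sample_count + i.
std::uint8_t sample_at(std::vector<std::uint8_t> const& data,
                       std::size_t output_count, std::size_t i, std::size_t k)
{
    assert(k < output_count);
    std::size_t const offset = i * output_count + k;
    assert(offset < data.size());
    return data[offset];
}
```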
Nico Weber
ec739460e0 LibPDF: Add test for SampledFunction and fix bugs found by it
* SampledFunction now keeps the StreamObject it gets data from alive
  (doesn't matter too much in practice, but does matter in the test,
  where nothing else keeps the stream alive).

* If a sample is an integer, we would previously sample that value
  twice and then divide by zero when interpolating. Make sure to
  sample 1 unit apart.
2023-11-11 08:55:37 +01:00
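The divide-by-zero fixed in ec739460e0 can be illustrated with a small 1-D sketch. This is not the actual LibPDF code: `evaluate_1d` and its parameters are hypothetical, and the input is assumed to be already mapped into sample coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: 1-D, single-output, 8-bit samples.
// `x` is in sample coordinates, i.e. [0, samples.size() - 1].
float evaluate_1d(std::vector<std::uint8_t> const& samples, float x)
{
    assert(samples.size() >= 2);
    float const max_index = static_cast<float>(samples.size() - 1);
    x = std::max(0.0f, std::min(x, max_index));

    // Pick two neighbors exactly one unit apart. Using floor(x) and ceil(x)
    // would pick the same sample twice when x is an integer, making the
    // interpolation denominator (x1 - x0) zero.
    float x0 = std::floor(x);
    if (x0 >= max_index)
        x0 = max_index - 1.0f;
    float const x1 = x0 + 1.0f;

    float const y0 = samples[static_cast<std::size_t>(x0)];
    float const y1 = samples[static_cast<std::size_t>(x1)];
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0);
}
```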
Nico Weber
323ba7404c LibPDF: Implement SampledFunction::evaluate() for some sampled functions
Things now work for functions that are all of:
* linear
* 1-D input
* 8 bits per sample
2023-11-10 15:03:30 +00:00
Nico Weber
fd1876441a LibPDF: Implement SampledFunction::create()
2023-11-10 15:03:30 +00:00
Nico Weber
cd9f4655ec LibPDF: Tweak implementation of postscript roll op
Since positive offsets roll to the right, it makes more sense
to do the big reverse first. Gets rid of an awkward minus sign.

No behavior change.
2023-11-10 14:45:38 +01:00
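The ordering described in cd9f4655ec can be sketched with the classic three-reversal rotation. This is an illustration under assumptions, not the LibPDF implementation: `roll` is a hypothetical free function, and the top of the stack is taken to be the back of the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: applies PostScript `n j roll` to the top n elements
// of `stack`. Positive j moves elements toward the top (to the right).
void roll(std::vector<int>& stack, std::size_t n, long j)
{
    assert(n <= stack.size());
    if (n == 0)
        return;

    // Normalize j into [0, n) so a negative offset becomes the equivalent
    // right rotation and no minus sign is needed below.
    long const count = static_cast<long>(n);
    auto const shift = static_cast<std::ptrdiff_t>(((j % count) + count) % count);

    auto const first = stack.end() - static_cast<std::ptrdiff_t>(n);
    std::reverse(first, stack.end());         // the big reverse first
    std::reverse(first, first + shift);       // then the leading `shift` elements
    std::reverse(first + shift, stack.end()); // then the remainder
}
```

For example, rolling the top three elements of `1 2 3` by 1 gives `3 1 2`, and by -1 gives `2 3 1`.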
Nico Weber
b23ed86889 LibPDF: Implement StitchingFunction::evaluate()
2023-11-10 14:45:16 +01:00
Nico Weber
ba34ddeb21 LibPDF: Implement StitchingFunction creation
2023-11-10 14:45:16 +01:00
Nico Weber
5af6e1c042 LibPDF: Implement DeviceNColorSpace
2023-11-09 23:33:49 +01:00
Nico Weber
0f07049935 LibPDF: Add ColorSpaceFamily::operator==
No behavior change.
2023-11-09 23:33:49 +01:00
Nico Weber
80eec1e16b LibPDF: Implement PostScriptCalculatorFunction
Includes a tokenizer and interpreter for the subset of PostScript
supported in PDF type 4 functions.
2023-11-09 16:06:25 +01:00
Tim Schumacher
a2f60911fe AK: Rename GenericTraits to DefaultTraits
This feels like a more fitting name for something that provides the
default values for Traits.
2023-11-09 10:05:51 -05:00
Nico Weber
bbd86ee4f3 LibPDF: Implement ExponentialInterpolationFunction
2023-11-06 10:01:05 +01:00
Nico Weber
1aed465efe LibPDF: Implement Function::create()
2023-11-06 10:01:05 +01:00
Nico Weber
b78ea81de5 LibPDF: Implement SeparationColorSpace
Requires PDF::Function, which isn't implemented yet, so this has
no visual effect yet.
2023-11-06 10:01:05 +01:00
Nico Weber
9204252d02 LibPDF: Add scaffolding for function objects
See PDF 1.7 Spec, "3.9 Functions".
2023-11-06 10:01:05 +01:00
Nico Weber
21894f1cde LibPDF: Fix typos in DeviceN colorspace scaffolding
* Compare array size to 3 and 4, not 4 and 5
* Fix literal typo in error message

Fixes crash processing 0000906.pdf from 0000.zip from the pdf/a dataset.
2023-11-06 09:54:01 +01:00
Nico Weber
30ea218e35 LibPDF: Implement IndexedColorSpace
2023-11-05 14:27:22 -07:00
Nico Weber
0b087c02a3 LibPDF: Add spec link to default_decode()
2023-11-05 14:27:22 -07:00
Nico Weber
3dca11c4e2 LibPDF: Move color space creation from name or array into ColorSpace
No behavior change.
2023-11-05 14:27:22 -07:00
Nico Weber
1dfd49ef99 LibPDF: Implement LabColorSpace
2023-11-05 14:27:22 -07:00
Nico Weber
4a5136fc8c LibPDF: Implement CalGrayColorSpace
I haven't seen this being used in the wild, but it's used in
Tests/LibPDF/colorspaces.pdf.
2023-11-04 17:02:37 -04:00
Nico Weber
a207ab709a LibPDF: In convert_to_srgb(), also apply sRGB curve (ish)
We did convert from the input space to linear space and then
to linear sRGB, but we forgot to re-apply gamma.

This uses the x^2.2 curve instead of the real sRGB curve for now.
2023-11-04 17:02:37 -04:00
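To make the difference mentioned in a207ab709a concrete, here is a sketch of both encodings. The function names are hypothetical and this is not the code in convert_to_srgb(); the piecewise constants are the standard sRGB ones.

```cpp
#include <cmath>

// Approximation: re-apply gamma with a pure 2.2 power curve.
float encode_gamma_22(float linear)
{
    return std::pow(linear, 1.0f / 2.2f);
}

// The piecewise sRGB encoding: a short linear segment near black, then a
// 2.4 power curve that is scaled and offset to join it.
float encode_srgb(float linear)
{
    if (linear <= 0.0031308f)
        return 12.92f * linear;
    return 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}
```

The two curves agree at 0 and 1 and stay close in the midtones; they differ most near black, where the sRGB curve is linear.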
Nico Weber
641365b235 LibPDF: Move colorspace conversion functions up a bit
No code change, no behavior change. Pure code move.
2023-11-04 17:02:37 -04:00
Nico Weber
f8799885de LibPDF: Clamp sRGB channels before converting to u8 in CalRGB code
Sometimes the numbers end up just slightly above 1.0f, which previously
caused an overflow.
2023-11-01 11:45:13 -04:00
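The overflow fixed in f8799885de comes from converting a value slightly above 1.0f straight to u8. A minimal sketch of the clamp-then-convert order, with a hypothetical helper name and with rounding chosen here as an assumption (the commit does not say whether LibPDF rounds or truncates):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a channel in [0, 1] to [0, 255].
// Clamping first keeps inputs such as 1.0001f from producing 256 or more,
// which does not fit in a u8.
std::uint8_t channel_to_u8(float channel)
{
    float const clamped = std::min(1.0f, std::max(0.0f, channel));
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}
```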
Nico Weber
bdd2404453 LibPDF: Ignore input whitepoint in convert_to_d65()
CalRGBColorSpace::color() converts into a flat xyz space,
which already takes input whitepoint into account.

It shouldn't be taken into account again when converting from
the flat color space to D65.
2023-11-01 11:45:13 -04:00
Nico Weber
e35a5da2fb LibPDF: Update dead link in a comment
2023-11-01 11:45:13 -04:00
Nico Weber
1fcf0142d2 LibPDF: Fix unfortunate typo in CalRGBColorSpace::create()
We always ignored the /Matrix key in /CalRGB dicts.
2023-11-01 11:45:13 -04:00
Nico Weber
d24289eef4 LibPDF: Always log unhandled type 1 and type 2 font program opcodes
This would've made it easy to see that we were missing flex opcodes for
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
2023-11-01 11:40:16 -04:00
Nico Weber
e1a743f286 LibPDF: Implement type 2 flex, hflex, hflex1, flex1 operators
This is the type 2 equivalent to type 1 othersubr, from what I can tell.

See "4.1 Path Construction Operators" in 5177.Type2.pdf,
"The Type 2 Charstring Format".

Makes text show up alright on
https://developer.apple.com/library/archive/documentation/mac/pdf/Text.pdf
2023-11-01 11:40:16 -04:00
Nico Weber
3e707efdfa LibPDF: Move type1 subr 0 handling into othersubr handler
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:

"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:

Subrs entry number 0:
3 0 callothersubr pop pop setcurrentpoint return
"""

othersubr handler 0 gets three arguments:
* The flex height (the distance after which the bezier splines
  are replaced with just straight lines)
* The x and y coordinates of the current position after the flex

It pushes that position on the postscript stack, where predefined subr
handler number 0 then pops it from. It then passes it to
setcurrentpoint.

In theory, we now correctly do that setcurrentpoint call, which we
previously weren't.

In practice, that setcurrentpoint call always receives the last point of
the flex -- and our path api apparently gets confused when move_to() is
called on it when the current point is already at that same location.

So tweak the SetCurrentPoint handler to not set the current point on
the path if it's already the path's current point, with a FIXME to
figure out what exactly is happening in Gfx::Path.

No big behavior change if flex is used, but this is more correct if it
isn't.

(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
2023-11-01 11:38:41 -04:00
Nico Weber
0bb8249780 LibPDF: Move type1 subr 1 and 2 handling into othersubr handler
https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf,
8.4 First Four Subrs Entries:

"""If Flex or hint replacement is used in a Type 1 font program, the
first four entries in the Subrs array in the Private dictionary must be
assigned charstrings that correspond to the following code sequences. If
neither Flex nor hint replacement is used in the font program, then this
requirement is removed, and the first Subrs entry may be a normal
charstring subroutine sequence. The first four Subrs entries contain:

[...]

Subrs entry number 1:
0 1 callothersubr return

Subrs entry number 2:
0 2 callothersubr return
"""

So subr entry numbers 1 and 2 just call othersubr 1 and 2, which
means we can just move the handling code over.

No behavior change if flex is used, but more correct if it isn't.

(This only works because our `return` handler is empty, else we would
have to make the callothersubr handler start a call frame.)
2023-11-01 11:38:41 -04:00
Ali Mohammad Pur
78c04cb8b2 AK+LibPDF: Make Format print floats in a roundtrip-safe way by default
Previously we assumed a default precision of 6, which made the printed
values quite odd in some cases.
This commit changes that default to print them with just enough
precision to produce the exact same float when roundtripped.

This commit adds some new tests that assert exact format outputs, which
have to be modified if we decide to change the default behaviour.
2023-10-31 09:12:35 +03:30
Nico Weber
4cc24548f6 LibPDF: Call dbgln() for unimplemented flex opcodes
2023-10-28 13:28:05 -04:00
Nico Weber
e484fae8e1 LibPDF: Don't do special subr processing for type 2 CFFs
This is a subset of #21484: Type 2 CFFs never use the special subrs,
so stop doing them for type 2 at least for now.

Fixes an assert in 0000064.pdf in 0000.zip in the pdfa dataset
(a stack underflow because a subr is supposed to push a bunch of
stuff, but we ran one of the built-in routines instead of the subr
from the font file).

As discussed in #21484, this isn't right for type 1 CFFs either,
but just removing the code there regresses Tests/LibPDF/type1.pdf.
A slightly more involved thing is needed there; I added a FIXME
for that here.
2023-10-28 13:28:05 -04:00
Tim Ledbetter
5c0c55d2c0 LibPDF: Ensure xref stream field widths are within expected range
Previously, an xref stream with a field width larger than 8 would
result in an undefined shift occurring. We now ensure that each field
width is a number and is less than or equal to 8.
2023-10-28 13:17:09 -04:00
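To show why the width check in 5c0c55d2c0 matters, here is a sketch of reading one big-endian xref stream field into a 64-bit value. The function `read_field` and its signature are hypothetical, not LibPDF's API; the point is only that the shift amount depends on the width, so a width above 8 must be rejected before shifting.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: read `width` bytes at `offset`, high-order byte first.
std::optional<std::uint64_t> read_field(std::vector<std::uint8_t> const& bytes,
                                        std::size_t offset, std::size_t width)
{
    // With width == 9 the first shift below would be by 64 bits, which is
    // undefined for a 64-bit type, so reject such widths up front.
    if (width > 8)
        return std::nullopt;
    if (offset > bytes.size() || width > bytes.size() - offset)
        return std::nullopt;

    std::uint64_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        std::size_t const shift = 8 * (width - 1 - i);
        value |= static_cast<std::uint64_t>(bytes[offset + i]) << shift;
    }
    return value;
}
```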
Nico Weber
6d47fca3bf LibPDF: Don't assert on outline destinations that use null as page
Nothing in PDF 1.7 spec 8.2.1 Destinations mentions the page being
`null`, but it happens in 0000372.pdf (for the root outline element)
and in 0000776.pdf (for every outline element, which looks like a
bug in the generator maybe) of 0000.zip from the pdfa dataset.
2023-10-27 06:38:25 -04:00
Tim Ledbetter
b4296e1c9b LibPDF: Don't use unsanitized values in error messages
Previously, constructing error messages with unsanitized input could
fail because error message strings must be UTF-8.
2023-10-26 11:05:32 +02:00
Nico Weber
f8bf9c6506 LibPDF: Sketch out DeviceN color spaces a bit
Documents using them now show render-time diagnostics instead
of asserting that the number of parameters passed to a color doesn't
match the number of channels of the previously-set color space.

Fixes two asserts on the `-n 500` 0000.zip test set.
2023-10-26 11:05:00 +02:00
Nico Weber
4549d6cf1b LibPDF: Add a FIXME comment to the inline image data skipping path
2023-10-26 10:59:45 +02:00
Nico Weber
2878af5968 LibPDF: Sketch out Lab color space
Same as other recent color spaces: Enough to make us not assert,
but not enough to actually produce color.

Fixes 2 asserts on the `-n 500` 0000.zip pdfa dataset.
2023-10-26 10:59:45 +02:00
Nico Weber
a65d8ff2ea LibPDF: Tolerate page rotation being an indirect object
Needed e.g. for 0000196.pdf in 0000.zip in the pdfa dataset.
2023-10-26 10:58:45 +02:00
Nico Weber
8b806183f6 LibPDF: Tolerate indirect objects in various image dict values
0000101.pdf from 0000.zip from the pdfa dataset has /Height set to
an indirect object that contains an int.

Make that work, and make sure various other similar places getting
values of the image dict also resolve indirect references.
2023-10-26 10:58:45 +02:00
Nico Weber
5dd7639386 LibPDF: Tolerate indirect references in Type0 /W array
Makes e.g. 0000236.pdf in 0000.zip in the pdfa dataset work.
2023-10-26 10:58:45 +02:00
Nico Weber
b928fadba7 LibPDF: Swap int and array branches in outline item reading
No intended behavior change.

It does have the effect that indirect object references now go down
the array path instead of the number path. They still fall over there,
but now that's easy to fix.
2023-10-26 10:58:45 +02:00
Nico Weber
208a058eab LibPDF: Tolerate integer outline item colors
0000296.pdf from 0000.zip from the pdfa dataset contains
`/C [0 0 0]` (as opposed to `/C [0.0 0.0 0.0]`). Make that work.
(It's fine per spec.)
2023-10-26 10:58:45 +02:00
Nico Weber
54cdcd0d06 LibPDF: Reject non-hexdigits in hex string with error
...instead of VERIFY()ing input data.

I haven't seen this in the wild, but since I'm here anyways,
might as well fix this.
2023-10-25 10:44:26 +02:00
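The change in 54cdcd0d06 amounts to returning an error for bad input instead of asserting on it. A minimal sketch of that idea, using a hypothetical `decode_hex_string` that takes the text between `<` and `>`; it is not the LibPDF parser, and `std::isspace` only approximates PDF's whitespace set.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: decode the body of a PDF hex string.
// Returns std::nullopt for any character that is neither whitespace nor a
// hex digit, rather than treating malformed input as a programming error.
std::optional<std::string> decode_hex_string(std::string const& body)
{
    std::string out;
    int pending = -1; // high nibble waiting for its partner, or -1 if none
    for (unsigned char c : body) {
        if (std::isspace(c))
            continue;
        if (!std::isxdigit(c))
            return std::nullopt;
        int const nibble = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        if (pending < 0) {
            pending = nibble;
        } else {
            out.push_back(static_cast<char>(pending * 16 + nibble));
            pending = -1;
        }
    }
    // An odd number of digits is completed with a trailing 0.
    if (pending >= 0)
        out.push_back(static_cast<char>(pending * 16));
    return out;
}
```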
Nico Weber
4675700057 LibPDF: Reject unterminated literal strings with an error
0000459.pdf in 0000.zip in the pdfa dataset contains this as the
very first object:

```
1 0 obj
<<
/Creator (Developer 2000)
/CreatorDate (
/Author (Oracle Reports)
/Producer (Oracle PDF driver)
/Title (2021_06_29 Tutoritzacions APTES.PDF)
>>
endobj
```

The `/CreatorDate` value string is unterminated.

Before, we'd assert when trying to check if the first object is
a linearization dict.

Now, we never read the first object (an error during the linearization
dict reading is treated as "file is not linearized") unless we try
to print the document's metadata -- and there we now show an error
instead of asserting.
2023-10-25 10:44:26 +02:00
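The failure mode in 4675700057 is easy to reproduce with a small sketch. Literal strings may contain balanced parentheses, so in the object above the `(` after `/CreatorDate` swallows the following lines and never finds its closing `)`. The helper below is hypothetical and simplified (escape sequences are kept as the raw escaped character); it is not the LibPDF parser.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: `input` starts at the opening '('. Returns the string
// contents, or std::nullopt if the input ends before the matching ')'.
std::optional<std::string> read_literal_string(std::string const& input)
{
    if (input.empty() || input[0] != '(')
        return std::nullopt;

    std::string out;
    int depth = 1;
    for (std::size_t i = 1; i < input.size(); ++i) {
        char const c = input[i];
        if (c == '\\') {
            // Simplification: copy the escaped character without decoding it.
            if (++i == input.size())
                break;
            out.push_back(input[i]);
            continue;
        }
        if (c == '(')
            ++depth;
        else if (c == ')' && --depth == 0)
            return out;
        out.push_back(c);
    }
    return std::nullopt; // unterminated: report an error instead of asserting
}
```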