ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-09-20 17:58:18 +03:00

Author	SHA1	Message	Date
Nico Weber	32f601f9a4	LibPDF: Fix small bug from #21452 I implemented CFF charset format 2 in `6f783929dd` with the note "I haven't seen this being used in the wild". Now that I have seen it (0000658.pdf), I can say that this has never worked, despite me claiming "it's easy to implement". But now it works!	2024-02-08 13:48:56 +00:00
Nico Weber	9fc47345ce	LibGfx+LibPDF: Make sample() functions take ReadonlySpan<> ...instead of Vector<>. No behavior (or performance) change.	2024-02-06 08:44:53 +01:00
Nico Weber	92a628c07c	LibPDF: Always treat `/Subtype /Image` as binary data when dumping Sometimes, the "is mostly text" heuristic fails for images. Before: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 25709 After: Build/lagom/bin/pdf --render out.png ~/Downloads/0000/0000521.pdf \ --page 10 --dump-contents 2>&1 | wc -l 11376	2024-02-05 21:18:19 -05:00
Nico Weber	f562c470e2	LibGfx+LibPDF: Simpler and faster N-D linear sampling Previously, if we wanted to e.g. do linear interpolation in 2-D, we'd get a sample point like (1.3, 4.4), then get 4 samples around it at (1, 4), (2, 4), (1, 5), (2, 5), then reduce the 4 samples to 2 samples by computing the combined samples `0.7 * f(1, 4) + 0.3 * f(2, 4)` and `0.7 * f(1, 5) + 0.3 * f(2, 5)`, and then 1-D linearly blending between these two samples with the factor 0.4. In the end we'd multiply the first value by 0.7 * 0.6, the second by 0.3 * 0.6, the third by 0.7 * 0.4, and the fourth by 0.3 * 0.4, and then sum them all up. This requires computing and storing 2**N samples, followed by another 2**N iterations to combine the 2**N samples to a single value. (N is in practice either 4 or 3, so 2**N isn't super huge.) Instead, for every sample we can compute the product of its weights and sum up the weighted samples directly. This lets us omit the second loop and storing 2**N values, in exchange for doing an additional O(N) work per sample to compute the product. Takes Build/lagom/bin/image --no-output --invert-cmyk \ --assign-color-profile \ Build/lagom/Root/res/icc/Adobe/CMYK/USWebCoatedSWOP.icc \ --convert-to-color-profile serenity-sRGB.icc \ cmyk.jpg from 3.42s to 3.08s on my machine, almost 10% faster (and less code). Here cmyk.jpg is a 2253x3080 cmyk jpeg, and USWebCoatedSWOP.icc is an mft2 profile with input tables with 256 samples and a 9x9x9x9 CLUT. The LibPDF change is covered by TEST_CASE(sampled) in LibPDF.cpp, and the LibGfx change is basically the same change as the one in LibPDF (where the test results don't change), and its output subjectively looks identical. So hopefully this indeed causes no behavior change :^)	2024-02-04 21:49:23 +01:00
Nico Weber	955d73657e	LibPDF: Make `pdf --dump-contents` dump less binary data For pages containing images or embedded fonts, --dump-contents used to dump a ton of binary data. That isn't very useful, so stop doing it. Before: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 937972 Now: % time Build/lagom/bin/pdf --render out.png \ ~/Downloads/0000/0000711.pdf --dump-contents | wc -l 6566 Printing 7k lines is also much faster than printing 940k, 0.15s instead of 2s.	2024-02-03 08:26:29 +00:00
Nico Weber	9c762b9650	LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB CMYK data describes which inks a printer should use to print a color. If a screen should display a color that's supposed to look similar to what the printer produces, that color is very different from what Color::from_cmyk() produces. (It's also printer-dependent.) There are many ICC profiles describing printing processes. It doesn't matter too much which one we use -- most of them look somewhat similar, and they all look dramatically better than Color::from_cmyk(). This patch adds a function to download a zip file that Adobe offers on their web site. They even have a page for redistribution: https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html (That one leads to a broken download though, so this downloads the end-user version.) In case we have to move off this download at some point, there are also a whole bunch of profiles at https://www.color.org/registry/index.xalter that "may be used, embedded, exchanged, and shared without restriction". The Adobe zip contains a whole bunch of other useful and fun profiles, so I went with it. For now, this only unzips the USWebCoatedSWOP.icc file though, and installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In Serenity builds, this will make it to /res/icc/Adobe/CMYK in the disk image. And in lagom builds, after #23016 this is the lagom res staging directory that tools can install via Core::ResourceImplementation. `pdf` and `MacPDF` already do that, `TestPDF` now does it too. The final piece is that LibPDF then loads the profile from there and uses it for DeviceCMYK color conversions. (Doing file access from the bowels of a library is a bit weird, especially in a system that has sandboxing built in. But LibGfx does that in FontDatabase too already, and LibPDF uses that, so it's not a new problem.)	2024-02-01 13:42:04 -07:00
Nico Weber	f840fb6b4e	LibPDF: Make DeviceCMYKColorSpace::the() fallible No behavior change.	2024-02-01 13:42:04 -07:00
Nico Weber	384c6cf0f9	LibPDF: Tweak vertical position of truetype fonts again See #22821 for a previous attempt. This attempt should settle things once and for all. The opentype render path adjusts by `-font_ascender * -y_scale` in Glyf::Glyph::append_simple_path(), so that's what we need to undo to draw at the font's baseline. (OpenType::Font::metrics() returns ascender scaled by y_scale already, so no need to have the scale here where we undo the shift.) Previously, we called `baseline()` which just returns the font's font size, which is pretty meaningless: https://tonsky.me/blog/font-size/ https://simoncozens.github.io/fonts-and-layout/opentype.html#vertical-metrics-hhea-and-os2 Also, conceptually it makes sense to translate up by the ascender to get from the upper edge of the glyph to the baseline.	2024-02-01 10:05:40 +01:00
Nico Weber	87112dcbdc	LibPDF: Return null for invalid refs, tolerate null objects as outline https://llvm.org/devmtg/2022-11/slides/TechTalk5-WhatDoesItTakeToRunLLVMBuildbots.pdf has an xref table that starts like so: ``` xref 0 214 0000000002 65535 f 0000924663 00000 n 0000000003 00000 f 0000000000 00000 f 0000000016 00000 n 0000000160 00000 n 0000000263 00000 n ``` This is a list of objects in the PDF file. The lines ending with 'f' mean that this object is "free", that is, it's not stored in the file. In this file, objects 0, 2, 3 are free. For free objects, the first number is the object number of the next free object: Object 0 refers to object 2, 2 to 3, and 3 back to 0 (since it's the last free object). The lines ending with "n" are actual objects; here the first number is a byte offset to where that object is stored in the file. Furthermore, the file contains ``` /Outlines 2 0 R ``` in its root object, meaning that object 2 stores the page outlines. Since object 2 is set as free, there is no object 2. But the spec says that an invalid object reference is just the null object. This patch makes us return null objects for references to free objects, and it also makes us treat a null object as /Outlines value the same as not having /Outlines in the first place. Fixes #23023 -- we can now open that file. (We don't render it super well, but only for already-known reasons.) Since I found it a bit confusing: XRefTable has two related methods here: 1. has_object() returns whether an object was explicitly listed in an xref table. The first number right after `xref` is the start index. So if an xref table were to start with `10`, we'd implicitly create 10 leading objects (0 through 9) for which has_object() would return false. 2. is_object_in_use() returns true if an object that was in a table (i.e. one where has_object() returns true) was listed with 'n' and false if it was listed with 'f'. DocumentParser::parse_object_with_index() should probably return a null object for the `!has_object()` case as well instead of VERIFY()ing that has_object() is true. But I haven't seen this in the wild yet, so keeping as-is for now.	2024-01-31 12:10:19 -05:00
Timothy Flynn	aa0a6d58b2	Userland: Remove LibCore dependency from libraries that do not use it	2024-01-22 08:48:34 -05:00
Nico Weber	a0462f495c	LibPDF+MacPDF: Clip text, and add a debug option for disabling it	2024-01-20 08:56:03 +01:00
Nico Weber	90fdf738a1	LibPDF: Alphabetize clip_ fields in RenderingPreferences No behavior change.	2024-01-20 08:56:03 +01:00
Nico Weber	66f8259a0b	LibPDF: Move ClipRAII to .h file No behavior change.	2024-01-20 08:56:03 +01:00
Tim Ledbetter	459fa8b840	LibPDF: Ensure that xref subsection numbers are u32 Previously, parsing an xref entry with a floating point subsection number would cause a crash.	2024-01-18 15:11:42 +01:00
Nico Weber	d2f3288666	LibPDF: Apply text matrix to each glyph's position We still don't apply it to the glyph itself, so they don't show up scaled or rotated, but they're at the right spot now. One big thing this here has going for it is that the final glyph position is now calculated with just `ext_rendering_matrix.map(glyph_position)`. Also, character_spacing and word_spacing are now used unmodified in the SimpleFont::draw_string() loop. This also means we no longer have to undo a scale when updating the position in `Renderer::show_text()`. Most of the rest stays pretty yucky though. The root cause of many problems is that ScaledFont has its rendering size baked into the object. We want to render fonts at size font_size times scale from text matrix times scale from current transformation matrix (but not size from horizontal_scaling). So we have to make that the font_size, but then we have to undo that in a bunch of places to get the actual font size. This will eventually get better when LibPDF moves off ScaledFont.	2024-01-18 14:01:30 +01:00
Nico Weber	f54b0e7c22	LibPDF: Don't accidentally put horizontal_scaling in places Fonts should have size font_size times total scaling. We tried to get that by computing text_rendering_matrix.x_scale() * font_size, but text_rendering_matrix.x_scale() also includes horizontal_scaling, which shouldn't be part of font size. Same for character_spacing and word_spacing. This is all a big mess that's caused by LibPDF using ScaledFont, which requires scaling to be part of the text type. I have an in-progress local branch that moves LibPDF to directly use VectorFont, which will hopefully make this (and other things) nicer. But first, let's get this right, and then make sure we don't regress it when things change :^)	2024-01-18 14:01:30 +01:00
Nico Weber	abda5e66f6	LibPDF: Scale delta_x by horizontal_scaling in Renderer::show_text() While PDFFont::draw_string() already returns a position scaled by horizontal_scaling, the division by text_rendering_matrix.x_scale() (which also contains the scaling factor) undid it. Reapply it. Fixes the horizontal layout of the line "should be the same on all lines: super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
Nico Weber	470d1d8dcf	LibPDF: Fix order of parameter, text, and current transform matrix PDF spec 1.7 5.3.3 Text Space Details gives the correct multiplication order: parameters * textmatrix * ctm. We used to do text * ctm * parameters (AffineTransform::multiply() does left-multiplication). This only matters if `text_state().rise` is non-zero. In practice, it's almost always zero, in which case the parameter matrix is a diagonal matrix that commutes. Fixes the horizontal offset of "super" in Tests/LibPDF/text.pdf.	2024-01-18 14:01:30 +01:00
Nico Weber	6c65c18c40	LibPDF: Add spec ref to Renderer::calculate_text_rendering_matrix()	2024-01-18 14:01:30 +01:00
Nico Weber	13f007aadb	LibPDF: Tweak vertical position of truetype fonts The vertical coordinates for truetype fonts are different somehow. We compensated a bit for that; now we compensate some more. This is still not 100% perfect, but much better than before.	2024-01-17 08:44:07 +00:00
Nico Weber	1845a406ea	LibPDF: Add debug settings for clipping paths and images	2024-01-17 08:42:56 +00:00
Nico Weber	2d8a22f4b4	LibPDF: Clip images too Since we can't clip against a general path yet, this clips images against the bounding box of the current clip path as well. Clips for images are often rectangular, so this works out well. (We wastefully still decode and color-convert the entire image. In a follow-up, we could consider only converting the unclipped part.)	2024-01-17 08:42:56 +00:00
Nico Weber	5615a2691a	LibPDF: Extract activate_clip() / deactivate_clip() functions No behavior change.	2024-01-17 08:42:56 +00:00
MacDue	d55867e563	LibPDF: Fix paths with negatively sized `re` (rect) commands Turns out the width/height in a `re` command can be negative. This results in rectangles with different winding orders. For example, a negative width results in a reversed winding order. Previously, this was lost by passing the rect through an `AffineTransform` before constructing the path. So instead, this constructs the rect path, and then transforms the resulting path.	2024-01-16 21:31:20 +00:00
Nico Weber	0e91682283	LibPDF: Be more forgiving about trailing image data The predictor code assumed that all stream data is image data (...which would make sense: trailing data there is wasted space). But some PDFs have trailing data there, e.g. 0000257.pdf, so be forgiving about it.	2024-01-16 09:55:11 -05:00
Nico Weber	b34509edd2	LibPDF: Make `pdf --dump-contents` handle \r line endings better Previously, all page contents ended up overprinting a single line over and over for PDFs that used only `\r` as line ending. This is for example useful for 0000364.pdf.	2024-01-15 23:16:45 -07:00
Nico Weber	9f9dbb325b	LibPDF: Make prediction filters error on user-controlled alloc OOM	2024-01-15 23:06:06 -07:00
Nico Weber	93f5420282	LibPDF: Start implementing the TIFF predictor This codepath is separate from the predictor in the TIFF decoder. The TIFF decoder currently does bits->Color conversion before processing the predictor. That doesn't fit the PDF model where filters are processed before converting streams into bitmaps. If this code here ever grows to handle all cases, maybe we can move it over to the TIFF decoder and then make it do predictions before decoding to colors, to share this code. (TIFF prediction is pretty messy since it's bits-per-pixel-dependent. PNG prediction is always byte-based, which makes things easier.)	2024-01-15 23:06:06 -07:00
Nico Weber	9a93f677f4	LibPDF: Mark text rendering matrix as dirty after TJ numbers Mostly because I audited all places that assigned to `m_text_matrix` after #22760. This one is very difficult to trigger in practice. `show_text()` marks the text rendering matrix dirty already, so this only has an effect if the `TJ` array starts with a number, and the matrix isn't marked dirty going in. `Tm` caches the text rendering matrix, so I changed text.pdf to contain: ``` 1 0 0 1 45 130 Tm [ 200 (Hello) -2000 (World) ] TJ T* ``` This first sets an x offset of 5 (on top of the normal 40), and then undoes it (`200` is multiplied by font size (25) / -1000, and `200 * 25 / -1000` is -5). Before this change, the topmost "Hello World" ended up slightly indented. Likely no behavior change in practice, but makes the code easier to understand, and maybe it helps in the wild somewhere.	2024-01-15 08:39:04 +00:00
Nico Weber	f23f5dcd62	LibPDF: Mark text rendering matrix dirty for Td operator 0000342.pdf page 5 contains this snippet: ``` /T1_1 10.976 Tf 0 -31.643 TD (This)Tj 1 0 0 1 54 745.563 Tm 22.181 -31.643 Td [(vehicle)-270.926(uses)... ``` The `Tm` marked the text rendering matrix as dirty at the start, but it then calls calculate_text_rendering_matrix() almost in the next line, which recalculates the text rendering matrix and caches the new matrix. The `Td` used to not mark it as dirty, and we'd draw "vehicle" with an incorrect matrix.	2024-01-15 08:37:55 +00:00
Nico Weber	f4ee9a2333	LibPDF: Support drawing images with 16 bits per channel This uses the tried-and-true "throw away the lower 8 bits" technique for now. This lets us render Tests/LibPDF/wide-gamut-only.pdf.	2024-01-12 16:20:46 -07:00
Nico Weber	5f85aff036	LibPDF: Move ColorSpace::style() to take ReadonlySpan<float> All ColorSpace subclasses converted to float anyways, and this allows us to save lots of float->Value->float conversions during image color space processing. A bit faster: ``` N Min Max Median Avg Stddev x 50 0.99054313 1.0412271 0.99933481 1.0052408 0.012931916 + 50 0.97073889 1.0075941 0.97849107 0.98184034 0.0090329046 Difference at 95.0% confidence -0.0234004 +/- 0.00442595 -2.32785% +/- 0.440287% (Student's t, pooled s = 0.0111541) ```	2024-01-12 12:37:56 +00:00
Nico Weber	56a4af8d03	LibPDF: Don't reallocate Vectors in ICCBasedColorSpace all the time Microoptimization; according to ministat a bit faster: ``` N Min Max Median Avg Stddev x 50 1.0179932 1.0561159 1.0315337 1.0333617 0.0094757426 + 50 1.000875 1.0427601 1.0208509 1.0201902 0.01066116 Difference at 95.0% confidence -0.0131715 +/- 0.00400208 -1.27463% +/- 0.387287% (Student's t, pooled s = 0.0100859) ```	2024-01-12 12:37:56 +00:00
Nico Weber	cfd05b1a55	LibPDF: Use MatrixMatrixConversion when possible Reduces time spent rendering page 3 of 0000849.pdf from 1.32s to 1.13s on my machine. Also reduces the time to run Meta/test_pdf.py on 0000.zip (without 0000849.pdf) from 56s to 54s.	2024-01-12 09:09:56 +01:00
Nico Weber	c161b2d2f9	LibPDF: Extract ICCBasedColorSpace::sRGB() helper	2024-01-12 09:09:56 +01:00
Nico Weber	f7fc2df8ac	LibPDF: Simplify load_image() a tiny bit Images can't use Pattern color spaces, so we'll always have a Color. No behavior (or perf) change.	2024-01-10 23:26:57 +01:00
Nico Weber	df5451a889	LibPDF: Mark text rendering matrix dirty after changing it in text_begin A certain PDF that was drawing some text used `9 0 0 9 474.54 700.6801 Tm` to set the text matrix to a matrix that scaled by 9 in one text object. Then, after ending that text object, it had the following new text object which contained nothing that invalidated the text matrix: ``` BT /F1 7 Tf /DeviceRGB CS 0 0 0 SC 10 TL 86.37849 21.908 Td (Authorized licensed use limited to: ...) Tj ET ``` `BT` did reset it as required, but since we didn't mark the matrix as dirty, we never recomputed it and drew the additional text scaled up 9x.	2024-01-10 19:42:08 +01:00
Nico Weber	4fd5d450be	LibPDF: Add support for image masks An image mask is a 1-bit-per-pixel bitmap that's black where the current color should be painted, and white where it should be transparent (think: like ink). load_image() already converts images like this into 8-bit-per-pixel images that have 0xff, 0xff, 0xff in rgb for opaque (originally 0 bit) pixels and 0, 0, 0 in rgb for transparent pixels. So we just copy the image mask's image data into the alpha channel and replace rgb with the current color, and then draw it like a regular bitmap.	2024-01-10 09:10:11 +00:00
Nico Weber	e770cf06b0	LibPDF: Send jpeg data down the same path as all other data JPEG images now honor decode arrays and color spaces.	2024-01-10 09:39:00 +01:00
Nico Weber	f157cd50a1	LibPDF: Use mix() in SampledFunction::evaluate() No behavior change.	2024-01-04 21:12:23 +01:00
Nico Weber	e16345555b	LibPDF: Port 59b50fa43f8c2 to xref and object streams 0000440.pdf contains an xref stream object (at offset 3643676) starting: ``` 294 0 obj << /Type /XRef /Index [0 295] /Size 295 ``` and an object stream object (at offset 3640121) starting: ``` 230 0 obj << /Type /ObjStm /N 73 /First 614 ``` In both cases, the `obj` and the `<<` are separated by non-newline whitespace. `633e1632d0` made parse_indirect_value() tolerate this, but it updated neither parse_xref_stream() (which parses xref streams) nor parse_compressed_object_with_index() (which parses object streams), despite all three changes being part of #14873. Make parse_xref_stream() and parse_compressed_object_with_index() call parse_indirect_value() to pick up the fix over there. It's a bit less code too. (0000440.pdf is the only PDF in my 1000 test PDFs that this helps, somewhat surprisingly.)	2024-01-04 11:27:24 +01:00
Nico Weber	9d69c5d434	LibPDF: Tolerate trailing whitespace after %%EOF marker At first I tried implementing the quirk from PDF 1.7 Appendix H, 3.4.4, "File Trailer": """Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.""" This would've been like #22548 but at end-of-file instead of at start-of-file. This helped a bunch of files, but also broke a bunch of files that had more than 1024 bytes of stuff at the end, and it wouldn't have helped 0000059.pdf, which has over 40kB of \0 bytes after the %%EOF. So just tolerate whitespace after the %%EOF line, and keep ignoring an arbitrary amount of other stuff after that like before. This helps: * 0000599.pdf One trailing \0 byte after %%EOF. Due to that byte, the is_linearized() check fails and we go down the non-linearized codepath. But with this fix, that code path succeeds. * 0000937.pdf Same. * 0000055.pdf Has one space followed by a \n after %%EOF * 0000059.pdf Has over 40kB of trailing \0 bytes The following files keep working with it: * 0000242.pdf 5586 bytes of trailing HTML * 0000336.pdf 5586 bytes of trailing HTML fragment * 0000136.pdf 2054 bytes of trailing space characters This one kind of only worked by accident before since it found the %%EOF block before the final %%EOF block. Maybe this is even an intentional XRefStm compat hack? Anyways, now it finds the final block instead. * 0000327.pdf 11044 bytes of trailing HTML	2024-01-04 11:19:15 +01:00
Nico Weber	2d12647e29	LibPDF: Add FIXME for "was linearized PDF incrementally updated" check It's pretty tricky to do, and also tricky with respect to skipping trailing bytes after %%EOF: The check requires knowing the full size of the PDF (which means web servers not sending content lengths are out), but that size has to be measured after stripping trailing bytes, which normal static file servers won't do. So PDF viewers would have to download the last couple bytes of the PDF unconditionally, then strip trailing bytes and use the count to figure out the final actual PDF size. Luckily, we don't incrementally download PDFs from the net but instead require all data to be available in one chunk, so it's not currently a problem.	2024-01-04 11:19:15 +01:00
Nico Weber	1b45c3e127	LibPDF: Tolerate whitespace after `xref` and `startxref` The spec isn't super clear on if this is allowed: """Each cross-reference section shall begin with a line containing the keyword xref. Following this line...""" """The two preceding lines shall contain, one per line and in order, the keyword startxref and...""" It kind of sounds like anything goes on both lines as long as they contain `xref` and `startxref`. In practice, both seem to always occur at the start of their line, but in 0000780.pdf (and nowhere else), there's one space after each keyword before the following linebreak, and this makes that file load.	2024-01-04 10:14:30 +01:00
Nico Weber	efb37f7252	LibPDF: Add Reader::consume_non_eol_whitespace()	2024-01-04 10:14:30 +01:00
Nico Weber	c59e08123b	LibPDF: Add a FIXME and a spec comment to Encoding::from_object()	2024-01-04 10:12:11 +01:00
Nico Weber	ad5fc0eda1	LibPDF: An Encoding's /Differences entry is optional Per "TABLE 5.11 Entries in an encoding dictionary", /Differences is optional. (Per "Encodings for TrueType Fonts" in 5.5.5 Character Encoding, nonsymbolic truetype fonts are even recommended to have "no Differences array." But in practice, most seem to have it.) Fixes crashes on: * 0000001.pdf * 0000574.pdf * 0000337.pdf All three don't render super great, but at least they no longer crash.	2024-01-04 10:12:11 +01:00
Nico Weber	0bb0c7dac2	LibPDF: Scan for PDF file start in first 1024 bytes Other readers do this too, and files depend on this. Fixes opening these four files from the PDFA 0000.zip dataset: * 0000015.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` before header * 0000408.pdf Starts with UTF-8 BOM * 0000524.pdf Starts with 867 bytes of HTML containing a PHP backtrace * 0000680.pdf Starts with `C:\web\webeuncet\_cat\_docs\_publics\` too	2024-01-03 10:12:35 +01:00
Nico Weber	9495f64f91	LibPDF: Improve hex string parsing A local (non-public) PDF I have lying around contains this in a page's operator stream: ``` [<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29 <00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31 <0057004b> 4 <00480003> -37 <0050 >] TJ ``` That is, there's a newline in a hexstring after a character. This led to `Parser error at offset 5184: Unexpected character`. The spec says in 3.2.3 String Objects, Hexadecimal Strings: """Each pair of hexadecimal digits defines one byte of the string. White-space characters (such as space, tab, carriage return, line feed, and form feed) are ignored.""" But we didn't ignore whitespace before or after a character, only in between the bytes. The spec also says: """If the final digit of a hexadecimal string is missing—that is, if there is an odd number of digits—the final digit is assumed to be 0.""" In that case, we were skipping the closing `>` twice -- or, more accurately, we ignored the character after it too. This has been wrong all the way back in #6974. Add a test that fails if either of the two changes isn't present.	2024-01-02 22:13:21 +01:00
Lucas CHOLLET	f389c1cdba	LibGfx+LibPDF: Use LibCompress' implementation of the PackBits decoder No need to have these three copies :^)	2023-12-27 17:40:11 +01:00
Shannon Booth	e2e7c4d574	Everywhere: Use to_number<T> instead of to_{int,uint,float,double} In a bunch of cases, this actually ends up simplifying the code as to_number will handle something such as: ``` Optional<I> opt; if constexpr (IsSigned<I>) opt = view.to_int<I>(); else opt = view.to_uint<I>(); ``` For us. The main goal here however is to have a single generic number conversion API between all of the String classes.	2023-12-23 20:41:07 +01:00
Nico Weber	b63eb4a4dd	LibPDF: Implement /Mask support with stream object argument	2023-12-23 20:39:11 +01:00
Nico Weber	a3507ef65b	LibPDF: Move error for /ImageMask out of load_image() ...and tweak load_image() to support loading mask images (which don't have a color space and are always 1 bit per pixel).	2023-12-23 20:39:11 +01:00
Nico Weber	3ad9782e25	LibPDF: Extract an apply_alpha_channel() function No behavior change.	2023-12-23 20:39:11 +01:00
Nico Weber	4bd11c8eb4	LibPDF: Show a 'rendering unsupported' error for images with /Mask key	2023-12-23 20:39:11 +01:00
Nico Weber	387fecea7f	LibPDF: Fix typo in a variable name No behavior change.	2023-12-23 10:10:24 +01:00
Nico Weber	6723552e95	LibPDF: Add a spec comment and remove a FIXME I think the ASCIIHexDecode / ASCII85Decode unfilter functions handle what this FIXME was about already.	2023-12-22 10:58:54 +01:00
Nico Weber	3d07684891	LibPDF: Extract Parser::parse_inline_image() Pure code move, no intended behavior change. The motivation is just to make Parser::parse_operators() less nested and more focused.	2023-12-22 10:58:54 +01:00
Nico Weber	6032c06f6b	Revert "LibPDF: Add basic tiled, coloured pattern rendering" This reverts commit `8ff87911a3`.	2023-12-21 19:24:56 +01:00
Nico Weber	7cb216c95b	Revert "LibPDF: Offset PaintStyle when painting so pattern overlaps..." This reverts commit `8c7fc4fe6c`.	2023-12-21 19:24:56 +01:00
Nico Weber	6de32e5359	LibPDF: Draw inline images The idea is to massage the inline image data into something that looks like a regular image, and then use the normal image drawing code: We translate the inline image abbreviations to the expanded version at rendering time, then unfilter (i.e. uncompress) the image data at rendering time, and then go down the usual image drawing path. Normal streams are unfiltered when they're first accessed, but inline image streams live in a page's drawing operators, and this fits the current approach of parsing a page's operators anew every time the page is rendered. (We also need to add some special-case handling for color spaces of inline images: Inline images can use named color spaces, while regular images always use direct color space objects.)	2023-12-20 12:45:16 -07:00
Nico Weber	d577d181e3	LibPDF: Clamp linear_srgb values in convert_to_srgb() This is very crude gamut mapping, but it's better than producing NaNs when passing negative values to powf(x, 1/2.2).	2023-12-20 12:45:07 +01:00
Nico Weber	022fce75a6	LibPDF: Get inline image data from parser to renderer We create an inline_image_end operator that has all the relevant data in a synthetic StreamObject. inline_image_end is still a RENDERER_TODO(), so no real behavior change. (Previously we'd call only inline_image_begin, so the string the TODO message is about is now a bit different. But no interesting behavior change.)	2023-12-20 12:19:08 +01:00
Nico Weber	3285502ec6	LibPDF: Extract a Parser::unfilter_stream() method No behavior change.	2023-12-20 12:19:08 +01:00
Nico Weber	b21f867e88	LibPDF: Don't crash on images with empty filter arrays 0000967.pdf page 2 contains a bunch of inline images with empty filter arrays.	2023-12-20 12:19:08 +01:00
Nico Weber	13641693cb	LibPDF: Use make_object<>() to make objects No behavior change.	2023-12-20 12:19:08 +01:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only | grep \.cpp\|\.h) $ gn format $(git ls-files '*.gn' '*.gni')	2023-12-17 18:25:10 +03:30
Nico Weber	f2f07c3a80	LibPDF: Replace `if (a) VERIFY(0)` with `VERIFY(!a)` No behavior change.	2023-12-16 12:39:56 +01:00
Nico Weber	ee74bc2538	LibPDF: Tolerate 0-sized Subrs in PS1 font subprograms This regressed in `2b3a41be74` in #18031. Fixes a crash rendering page 2 and onward of https://pyx-project.org/presentation_dantemv35_en.pdf	2023-12-16 12:39:56 +01:00
Nico Weber	11354dbf9e	LibPDF: Remember inline image stream bytes We still don't process inline images, but now we have the pieces we need for doing it (`map` and `stream_bytes`).	2023-12-11 10:50:39 +01:00
Nico Weber	cabc6a9d80	LibPDF: Add a comment that PDF 2.0 added a length key for inline images In practice, basically no file has it, since it was only added in 2.0, and 1.7 explicitly said "in particular, the Type, Subtype, and Length entries normally found in a stream or image dictionary are unnecessary."	2023-12-11 10:50:39 +01:00
Nico Weber	071f890847	LibPDF: Require whitespace in front of inline image marker EI Fixes a crash on page 3 of 0000450.pdf of 0000.zip, where we previously started interpreting the middle of an inline image content stream as operators, since it contained `EI` in its pixel data.	2023-12-11 10:50:39 +01:00
Nico Weber	27aae7e2b1	LibPDF: Parse inline image key-value pairs Not used for anything yet.	2023-12-11 10:50:39 +01:00
Nico Weber	0912896ae0	LibPDF: Extract Parser::parse_dict_contents_until() No behavior change.	2023-12-11 10:50:39 +01:00
Kyle Pereira	8c7fc4fe6c	LibPDF: Offset PaintStyle when painting so pattern overlaps properly	2023-12-10 16:44:24 +01:00
Kyle Pereira	8ff87911a3	LibPDF: Add basic tiled, coloured pattern rendering	2023-12-10 16:44:24 +01:00
Kyle Pereira	8191f2b47a	LibPDF: Add parameter for background color of render	2023-12-10 16:44:24 +01:00
Kyle Pereira	60c4803dd3	LibPDF: Pass Renderer to ColorSpace	2023-12-10 16:44:24 +01:00
Kyle Pereira	082a4197b6	LibPDF: Use Variant<Color, PaintStyle> instead of Color for ColorSpaces This is in anticipation of Pattern color space support which does not yield a simple color.	2023-12-10 16:44:24 +01:00
Kyle Pereira	e4b8d68039	LibPDF: Permit comments at the end of a stream	2023-12-10 16:44:24 +01:00
Nico Weber	8b50b689f9	LibPDF: Reject invalid "hival" values Doesn't fire on any of the PDFs I have, and seems like a good thing to check.	2023-12-07 08:10:40 +00:00
Nico Weber	43cd3d7dbd	LibPDF: Tolerate palettes that are one byte too long Fixes these errors from `Meta/test_pdf.py path/to/0000`, with 0000 being 0000.zip from the PDF/A corpus in unzipped: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
Nico Weber	832a065687	LibPDF: For low-bpp images, start scanlines on byte boundaries Required per spec, and we get slanted images without it. Fixes e.g. page 1 of 0000749.pdf.	2023-12-07 08:10:40 +00:00
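The following sketch illustrates 832a065687. With fewer than 8 bits per sample, each row is padded to a whole byte, so row `r` starts at byte `r * bytes_per_row`. The helpers are hypothetical, and they assume samples are packed most significant bit first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Each image row is padded so that the next row starts on a byte boundary.
std::size_t bytes_per_row(std::size_t width, unsigned bits_per_component, unsigned components)
{
    return (width * bits_per_component * components + 7) / 8;
}

// Hypothetical helper: reads the `index`-th component value of row `row`
// (counting every component of every pixel). bits_per_component must be 1, 2, 4 or 8.
unsigned read_sample(std::vector<uint8_t> const& data, std::size_t width, unsigned bits_per_component,
    unsigned components, std::size_t row, std::size_t index)
{
    std::size_t const row_start_bit = row * bytes_per_row(width, bits_per_component, components) * 8;
    std::size_t const bit_offset = row_start_bit + index * bits_per_component;
    uint8_t const byte = data[bit_offset / 8];
    // Samples are packed most significant bit first.
    unsigned const shift = 8 - bits_per_component - static_cast<unsigned>(bit_offset % 8);
    unsigned const mask = (1u << bits_per_component) - 1;
    return (byte >> shift) & mask;
}
```

Take a 3-pixel-wide, 1-bit image as an example. Row 1 starts at byte 1, not at bit 3. Reading the rows unaligned shifts every later row a little further, which matches the slanted output described in the commit.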
Nico Weber	06b9633da5	LibPDF: For indexed images with 1, 2 or 4 bpp, do not repeat bit pattern When upsampling e.g. the 4-bit value 0b1101 to 8-bit, we used to repeat the value to fill the full 8-bits, e.g. 0b11011101. This maps RGB colors to 8-bit nicely, but is the wrong thing to do for palette indices. Stop doing this for palette indices. Fixes "Indexed color space index out of range" for 11 files in the PDF/A 0000.zip test set now that we correctly handle palette indices as of the previous commit: Malformed PDF file: Indexed color space lookup table doesn't match size, in 4 files, on 8 pages, 73 times path/to/0000/0000206.pdf 2 4 (2x) 5 (3x) 6 (4x) path/to/0000/0000364.pdf 5 6 path/to/0000/0000918.pdf 5 path/to/0000/0000683.pdf 8	2023-12-07 08:10:40 +00:00
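The next sketch shows the bit repetition described in 06b9633da5, using a hypothetical `replicate_bits` helper. Repeating the bit pattern is a reasonable way to scale a color component to 8 bits. A palette index, however, must keep its raw value: 4-bit index 13 selects entry 13, not entry 221.

```cpp
#include <cstdint>

// Scales a 1-, 2- or 4-bit sample to 8 bits by repeating its bit pattern,
// e.g. 0b1101 -> 0b11011101. Fine for color components, wrong for palette indices.
uint8_t replicate_bits(uint8_t value, unsigned bits)
{
    uint8_t result = 0;
    for (int shift = 8 - static_cast<int>(bits); shift >= 0; shift -= static_cast<int>(bits))
        result = static_cast<uint8_t>(result | (value << shift));
    return result;
}
```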
Nico Weber	8733ba2734	LibPDF: Fix decoding of IndexedColorSpace for palette sizes != 255 Previously, we were scaling palette indices from 0..(palette_size - 1) to 0..255 before using them as index into the palette. Instead, do not scale palette indices before using them as indices. (Renderer::load_image() uses `component_value_decoders.empend( .0f, 255.0f, dmin, dmax)`, so to get an identity mapping, we have to return `0, 255` from IndexedColorSpace::default_decode()). Fixes rendering of the gradient on page 5 of 0000277.pdf.	2023-12-06 15:32:13 +01:00
Nico Weber	4cb0593daf	LibPDF: Convert LAB values to bytes differently Gfx::ICC::Profile's current API takes bytes, so we need to do some contortions for LAB values to go through. This will probably become nicer once we implement all the backward transforms in Gfx::ICC::Profile, but for now let's hack it in on the LibPDF side. Makes colors in 0000651.pdf look good, especially on pages 1 and 7-12.	2023-12-05 11:36:44 -05:00
Nico Weber	b2a1130556	LibGfx/ICC: Implement conversion between different connection spaces If one profile uses PCSXYZ and the other PCSLAB as connection space, we now do the necessary XYZ/LAB conversion. With this and the previous commits, we can now convert from profiles that use PCSLAB with mAB, such as stress.jpeg from https://littlecms.com/blog/2020/09/09/browser-check/ : % Build/lagom/icc --name sRGB --reencode-to serenity-sRGB.icc % Build/lagom/bin/image -o out.png \ --convert-to-color-profile serenity-sRGB.icc \ ~/src/jpegfiles/stress.jpeg	2023-12-04 08:02:36 +00:00
Nico Weber	1c88b82dfc	LibPDF: Do less work in SampledFunction::evaluate()'s inner loop Instead of recomputing the left index and the float amount in that interval for each coordinate all the time, do it once when we preprocess the input coordinates. One line less, faster, and arguably easier to read. No behavior change.	2023-12-02 22:26:13 +01:00
Nico Weber	54883b7d41	LibPDF: Remove get_bounds lambda in SampledFunction::evaluate() Using `min()` to guarantee the left index is never == `size() - 1`, even for an interpolation value of 1.0, is less code, and arguably easier to understand as well. No behavior change.	2023-12-02 22:26:13 +01:00
Nico Weber	d9fd72007e	LibPDF: Add a spec comment to SampledFunction::sample()	2023-12-02 22:26:13 +01:00
Idan Horowitz	aad5c58996	LibPDF: Eliminate reference cycle between OutlineItem parent/children Since all parents held a reference pointer to their children, and all children held reference pointers to their parents, both objects would never get free'd once the document was no longer being used. Fixes ossfuzz-63833.	2023-12-02 22:23:53 +01:00
Lucas CHOLLET	2a5cb5becb	LibCompress: Add `LZWDecoder::decode_all()` This method takes bytes as input and decompresses everything to a ByteBuffer. It uses two control codes (clear and end of data) as described in the GIF, TIFF and PDF specifications.	2023-12-01 12:58:14 +01:00
Nico Weber	f34da6396f	LibPDF: Update font size after getting font from cache Page 1 of 0000277.pdf does: BT 22 0 0 22 59 28 Tm /TT2 1 Tf (Presented at Photonics West OPTO, February 17, 2016) Tj ET BT 32 0 0 32 269 426 Tm /TT1 1 Tf (Robert W. Boyd) Tj ET BT 22 0 0 22 253 357 Tm /TT2 1 Tf (Department of Physics and) Tj ET BT 22 0 0 22 105 326 Tm /TT2 1 Tf (Max-Planck Centre for Extreme and Quantum Photonics) Tj ET Every line begins a text operation, then updates the font matrix, selects a font (TT2, TT1, TT2, TT1), draws some text and ends the text operation. `Tm` (which sets the font matrix) contains a scale, and uses that to update the font size of the currently-active font (cf #20084). But in this file, we `Tm` first and `Tf` (font selection) second, so this updates the size of the old font. So when we pull it out of the cache again on line 3, it would still have the old size from the `Tm` on line 2. (The whole text scaling logic in LibPDF imho needs a rethink; the current approach also causes issues with zero-width glyphs which currently lead to divisions by zero. But that's for another PR.) Fixes another regression from `c8510b58a3` (which I've accidentally referred to by 2340e834cd in another commit).	2023-11-26 19:05:13 -05:00
Nico Weber	eb1c99bd72	LibPDF+LibGfx: Make SMasks on jpeg images work SMasks are greyscale images that get used as alpha channel for a different image. JPEGs in PDFs are stored as streams with /DCTDecode filters, and we have a separate code path for loading those in the PDF renderer. That code path just calls our JPEG decoder, which creates bitmaps with format BGRx8888. So when we process an SMask for such a bitmap, we have to change the bitmap's format to BGRA8888 in addition to setting alpha values on all pixels.	2023-11-23 12:13:03 +01:00
Nico Weber	57e2b5ef59	LibPDF+Tests: Correctly decode text strings without explicit encoding	2023-11-22 09:08:06 -07:00
Nico Weber	e39a790c82	LibPDF: Stop converting encodings in object parser Per 1.7 spec 3.8.1, there are multiple logical text string types: * text strings * ASCII strings * byte strings Text strings can be in UTF-16BE, PDFDocEncoding, or (since PDF 2.0) UTF-8. But byte strings shouldn't be converted but treated as binary data. This makes us no longer convert strings used for drawing page text. TABLE 5.6 "Text-showing operators" lists the operands for text-showing operators as just "string", not "text string" (even though these strings confusingly are called "text strings" in the body text), so not doing this there is correct (and matches other viewers). We also no longer incorrectly convert strings used for crypto data (such as passwords), if they start with an UTF-16BE or UTF-8 marker. No behavior change for outlines and info dict entries. https://pdfa.org/understanding-utf-8-in-pdf-2-0/ has a good overview of this. (ASCII strings only contain ASCII characters and behave the same anyways.)	2023-11-22 09:08:06 -07:00
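The sketch below shows the byte-order-mark check that picks a text string's encoding; the names are hypothetical. Per the commit, only values that are text strings, such as outline titles and info dictionary entries, should go through it. Byte strings, such as text-showing operands and crypto data, are left as binary data even if they happen to start with one of these markers.

```cpp
#include <cstddef>
#include <string_view>

enum class TextStringEncoding {
    UTF16BE,
    UTF8,
    PDFDocEncoding,
};

// Picks the decoding for a PDF *text string* from its leading byte order mark.
TextStringEncoding detect_text_string_encoding(std::string_view bytes)
{
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    if (bytes.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return TextStringEncoding::UTF16BE;
    // The UTF-8 marker is only valid since PDF 2.0.
    if (bytes.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return TextStringEncoding::UTF8;
    return TextStringEncoding::PDFDocEncoding;
}
```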
Nico Weber	14bcb5219d	LibPDF: Tolerate comments before drawing operators Necessary to be able to render https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	9e8cf4fc1a	LibPDF: Tolerate comment after last dict item Necessary to be able to open https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf	2023-11-22 08:56:43 +00:00
Nico Weber	4440452f92	LibPDF: Support images with 1, 2, 4 bits per pixel They just get upsampled to 8 bits per pixel images.	2023-11-18 07:33:15 +00:00
Nico Weber	bfe27228a3	LibPDF+LibGfx: Don't invert CMYK channels in JPEG data in PDFs This is a hack: Ideally we'd have a CMYK Bitmap pixel format, and we'd convert to rgb at blit time. Then we could also apply color profiles (which for CMYK images are CMYK-based). Also, the colors for our CMYK->RGB conversion are off for PDFs, and we have distinct codepaths for this in Gfx::Color (for paths) and JPEGs. So when we fix that, we'll have to fix it in two places. But this doesn't require a lot of code and it's a huge visual progression, so let's go with it for now.	2023-11-17 22:32:40 +00:00
Nico Weber	bd7ae7f91e	LibPDF: Consistently asciibetize CommonNames.h The file wasn't quite decided if it wanted to sort by ascii value or by case folding. Now it uses ascii value, thanks to vim's `:'<,'>sort`. No behavior change.	2023-11-17 20:27:42 +00:00
Nico Weber	29396415d5	LibPDF: Add an initial implementation of type 3 glyph rendering This is a very inefficient implementation: Every time a type 3 font glyph is drawn, we parse its operator stream and execute all the operators therein. We'll want to instead cache the glyphs in bitmaps (at least in most cases), like we do for other fonts. But it's a good first step, and all the coordinate math seems to work in the files I've tested. Good test files from pdfa dataset 0000.zip: - 0000559.pdf page 1 (and 2): Has a non-default font matrix; text appears mirrored if the font matrix isn't handled correctly - 0000425.pdf, page 1: Draws several glyphs in a single run; glyphs overlap if Renderer::render_type3_glyph() ignores the passed-in point - 0000211.pdf, any page: Uses type 3 glyphs for all text. Good perf test (already "reasonably fast") - 0000521.pdf, page 5 (or 7 or 16): The little red flag in the purple box is a type 3 font glyph, and it's colored (which in part means the first operator is `d0`, while all the other documents above use `d1`)	2023-11-17 19:47:53 +00:00
Nico Weber	14ddab5519	LibPDF: Stub out type3_font_set_glyph_width* Type 3 font glyphs begin with either `d0` or `d1`. If we bail out with an "unsupported" error on the very first operator in a glyph, we'll never paint the glyph. Just stub these out for now. We probably want to do more in here in the future (see "TABLE 5.10 Type 3 font operators" in the 1.7 spec).	2023-11-17 19:47:53 +00:00
Nico Weber	54c98a46d8	LibPDF: Correctly parse the d0 and d1 operators They are the first operator in a type 3 charproc. Operator.h already knew about them, but we didn't manage to parse them, since they're the only two operators that contain a digit.	2023-11-17 19:47:53 +00:00
Nico Weber	5513f8bbe3	LibPDF: Move ScopedState from a function on Renderer into Renderer No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	126a0be595	LibPDF: Pass Renderer to SimpleFont::draw_glyph() This makes it available in Type3Font::draw_glyph(). No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	bcc6439b5f	LibPDF: Pass Renderer to PDFFont::draw_string() It's a bit unfortunate that fonts need to know about the renderer, but type 3 fonts contain PDF drawing operators, so it's necessary. On the bright side, it makes it possible to pass fewer parameters around and compute things locally as needed. (As we implement more fonts, we'll probably want to create some functions to do these computations in a central place, eventually.) No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	e0c0864ddf	LibPDF: Load a few values off a type 3 font dictionary	2023-11-17 19:47:53 +00:00
Nico Weber	9632d8ee49	LibPDF: Make SimpleFont font matrix configurable Type 3 fonts can set it to a custom value.	2023-11-17 19:47:53 +00:00
Nico Weber	4cd1a2d319	LibPDF: Add some scaffolding for type 3 fonts	2023-11-17 19:47:53 +00:00
Nico Weber	7f999b1ff5	LibPDF: Sink m_base_font_name from PDFFont into subclasses /BaseFont is a required key for type 0, type 1, and truetype font dictionaries, but not for type 3 font dictionaries. This is mechanical; type 0 fonts don't even use this yet (but probably should). PDFFont::initialize() is now empty and could be removed, but maybe we'll put stuff there again later, so I'm leaving it around for a bit longer.	2023-11-17 19:47:53 +00:00
Nico Weber	6c1da5db54	LibPDF: Make SimpleFont::draw_glyph() fallible	2023-11-17 19:47:53 +00:00
Nico Weber	843e9daa8c	LibPDF: Remove unused PDFFont::type() This got added in #15270, but its one use then got removed again in #16150. No behavior change.	2023-11-17 19:47:53 +00:00
Nico Weber	26fd29baf8	LibPDF: Give Type3 fonts a dedicated error message They're described in "5.5.4 Type 3 Fonts" in the PDF 1.7 spec, so we shouldn't `internal_error()` on them. They're just not implemented yet.	2023-11-17 19:47:53 +00:00
Nico Weber	5eaa403ddf	LibPDF: Use font dictionary object as cache key, not resource name In the main page contents, /T0 might refer to a different font than it might refer to in an XObject. So don't use the `Tf` argument as font cache key. Instead, use the address of the font dictionary object. Fixes false cache sharing, and also allows us to share cache entries if the same font dict is referred to by two different names. Fixes a regression from 2340e834cd (but keeps the speed-up intact).	2023-11-17 19:14:39 +01:00
Nico Weber	443b3eac77	LibPDF: Let decode_png_prediction() call LibGfx's unfilter_scanline() It's less code, but it also fixes a bug: The implementation in Filter.cpp used to use the previous byte as reference value, while we're supposed to use the value of the previous channel as reference (at least when a pixel is larger than one byte).	2023-11-17 19:09:50 +01:00
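This sketch shows the bug fixed in 443b3eac77, using the PNG Sub filter only. The reference for each byte is the same channel of the pixel to its left, not the previous byte. `unfilter_sub` is a hypothetical helper, not LibGfx's `unfilter_scanline()`, and it assumes the filter byte has already been removed from the row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Undoes the PNG "Sub" filter (type 1) on one row. Each byte is predicted from the
// byte `bytes_per_pixel` positions to the left, i.e. the same channel of the previous pixel.
void unfilter_sub(std::vector<uint8_t>& row, std::size_t bytes_per_pixel)
{
    for (std::size_t i = bytes_per_pixel; i < row.size(); ++i)
        row[i] = static_cast<uint8_t>(row[i] + row[i - bytes_per_pixel]); // wraps modulo 256
}
```

If `bytes_per_pixel` were 1 for this RGB data, the filter would add values across channels and decode the row incorrectly. This matches the bug described for the old Filter.cpp code.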
Nico Weber	145ade3a86	LibPDF: Remove a needless AK:: qualification No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	0416a07d56	LibPDF: Make filter byte not part of row in decode_png_prediction() No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	b763960fc2	LibPDF: Convert decode_png_prediction to use spans No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	588d6fab22	LibGfx+LibPDF: Create filter_type() for converting u8 to FilterType ...and use it in LibPDF. No behavior change.	2023-11-17 19:09:50 +01:00
Nico Weber	7e4fe8e610	LibPDF: Use PNG::paeth_predictor() in png decoding path No behavior change. Ideally, the PDF code would just call a function PNGLoader to do the PNG unfiltering, but let's first try to make the implementations look more similar.	2023-11-17 19:09:50 +01:00
Lucas CHOLLET	1e8004734f	LibPDF: Don't consider the End of Data code as normal ASCII85 input Data encoded with ASCII85 is terminated with the EOD code 0x7E3E. This should not be considered as normal input but rather discarded.	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	59a6d4b7bc	LibPDF: Factorize duplicated code in `Filter::decode_ascii85()`	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	2fe0647c68	LibPDF: Handle pdf-specific white spaces correctly in ASCII85 We were previously only looking for the space character, but PDF white space is a superset of ASCII spaces.	2023-11-14 10:15:15 +01:00
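A minimal decoder sketch can illustrate the ASCII85 fixes in 1e8004734f and 2fe0647c68. It skips all PDF white-space characters, and it stops at the `~>` end-of-data marker instead of decoding it. This is not LibPDF's `Filter::decode_ascii85()`; `decode_ascii85` and `is_pdf_whitespace` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// PDF white space is NUL, HT, LF, FF, CR and SP, a superset of just ' '.
static bool is_pdf_whitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Minimal ASCII85Decode sketch: the "~>" marker ends decoding and is not data.
std::optional<std::vector<uint8_t>> decode_ascii85(std::string_view input)
{
    std::vector<uint8_t> output;
    uint64_t group = 0; // 64 bits, since five base-85 digits can exceed 2^32
    int count = 0;

    auto emit = [&](int bytes) {
        for (int i = 0; i < bytes; ++i)
            output.push_back(static_cast<uint8_t>(group >> (24 - 8 * i)));
    };

    for (std::size_t i = 0; i < input.size(); ++i) {
        char const c = input[i];
        if (is_pdf_whitespace(c))
            continue;
        if (c == '~') {
            if (i + 1 >= input.size() || input[i + 1] != '>')
                return std::nullopt;
            break; // End of data.
        }
        if (c == 'z' && count == 0) { // Shorthand for four zero bytes.
            output.insert(output.end(), std::size_t { 4 }, uint8_t { 0 });
            continue;
        }
        if (c < '!' || c > 'u')
            return std::nullopt;
        group = group * 85 + static_cast<uint64_t>(c - '!');
        if (++count == 5) {
            if (group > 0xFFFFFFFF)
                return std::nullopt;
            emit(4);
            group = 0;
            count = 0;
        }
    }

    // A final partial group of n characters (2 <= n <= 4) is padded with 'u' and yields n - 1 bytes.
    if (count == 1)
        return std::nullopt;
    if (count > 1) {
        for (int i = count; i < 5; ++i)
            group = group * 85 + 84;
        emit(count - 1);
    }
    return output;
}
```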
Lucas CHOLLET	db08fe12ec	LibPDF: Implement `Reader::is_[eol, whitespace](char)` These two static members are now used to implement respective `matches_` methods but will also be useful to provide a global implementation of the specified concept of whitespace.	2023-11-14 10:15:15 +01:00
Lucas CHOLLET	dac703a0b8	LibPDF: Avoid an unnecessary copy in `Filter::decode_ascii85()`	2023-11-14 10:15:15 +01:00
Nico Weber	9b022239c3	LibPDF: Apply all offsets of TJ operator TJ acts on a list of either strings or numbers. The strings are drawn, and the numbers are treated as offsets. Previously, we'd only apply the last-seen number as offset when we saw a string. That had the effect of us ignoring all but the last number in front of a string, and ignoring numbers at the end of the list. Now, we apply all numbers as offsets. Our rendering of Tests/LibPDF/text.pdf now matches other PDF viewers.	2023-11-14 10:11:09 +01:00
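Here is a sketch of the TJ behavior after 9b022239c3: every number in the array shifts the text position, not just the last one before a string. For brevity it assumes a hypothetical fixed advance per glyph, 100% horizontal scaling, and no character or word spacing. `advance_after_tj` is a hypothetical name.

```cpp
#include <string>
#include <variant>
#include <vector>

using TJElement = std::variant<std::string, double>;

// Returns the horizontal text position after processing a TJ array.
double advance_after_tj(std::vector<TJElement> const& elements, double x, double font_size, double glyph_advance)
{
    for (auto const& element : elements) {
        if (auto const* text = std::get_if<std::string>(&element)) {
            // A real renderer would draw the glyphs here.
            x += static_cast<double>(text->size()) * glyph_advance;
        } else {
            // Every number is applied: it is in thousandths of a unit of text space
            // (scaled by the font size here) and is subtracted from the position.
            x -= std::get<double>(element) / 1000.0 * font_size;
        }
    }
    return x;
}
```

In the test array, the behavior the commit describes as previous would apply only one of the two `-500` entries and none of the trailing `1000`.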
Nico Weber	1c2b0feb7b	LibPDF: Change how CFF optional width prefix is stored Per 5177.Type2.pdf 3.1 "Type 2 Charstring Organization", a glyph's charstring looks like: w? {hs* vs* cm* hm* mt subpath}? {mt subpath}* endchar The `w?` is the width of the glyph, but it's optional. So all possible commands after it (hstem* vstem* cntrmask hintmask moveto endchar) check if there's an extra number at the start and interpret it as a width, for the very first command we read. This was done by having an `is_first_command` local bool that got set to false after the first command. That didn't work with subrs: If the first command was a call to a subr that just pushed a bunch of numbers, then the second command after it is the actual first command. Instead, move that bool into the state. Set it to false the first time we try to read a width, since that means we just read a command that could've been prefixed by a width.	2023-11-14 10:10:34 +01:00
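The following sketch follows the approach in 1c2b0feb7b; all names are hypothetical. The "width not yet checked" flag is part of the interpreter state rather than a per-command local. As a result, a first command that only calls a subr to push operands does not use up the width check.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical charstring interpreter state.
struct CharstringState {
    std::vector<float> stack;
    bool is_first_command = true; // survives across subr calls
    std::optional<float> width;
};

// Called by every operator that may be prefixed by the width (stem hints,
// hintmask/cntrmask, the moveto operators and endchar). `has_extra_operand`
// is true when the stack holds one more value than the operator consumes.
void maybe_read_width(CharstringState& state, bool has_extra_operand)
{
    if (!state.is_first_command)
        return;
    state.is_first_command = false;
    if (has_extra_operand && !state.stack.empty()) {
        state.width = state.stack.front();
        state.stack.erase(state.stack.begin());
    }
}

// Example operator: rmoveto takes two operands, so a third one is the width.
void rmoveto(CharstringState& state)
{
    maybe_read_width(state, state.stack.size() > 2);
    // ... move the current point by stack[0], stack[1] ...
    state.stack.clear();
}
```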
Lucas CHOLLET	9e4d697d23	LibPDF: Detect DCT images correctly Images can have multiple filters, each one of them is processed sequentially. Only the last one will be relevant for the image format (DCT or JPXDecode), so use the last filter instead of the first one to detect that property.	2023-11-13 10:30:34 -05:00
Nico Weber	f882a3ae37	LibPDF: In ColorSpace creation code, use resolve_to() more For valid PDFs, this makes no difference. For invalid PDFs, we now assert during the cast in resolve_to() instead of returning a PDFError. However, most PDFs are valid, and even for invalid PDFs, we'd previously keep the old color space around when getting the PDF error and then usually assert later when the old color space got passed a color with an unexpected number of components (since the components were for the new color space). Doesn't affect any of the > 2000 PDFs I use for testing locally, is less code, and should make for less surprising asserts when it does happen.	2023-11-13 10:29:26 -05:00
Lucas CHOLLET	9bc25db9a3	LibPDF: Add support for the LZW filter This allows us to decode the first page of ThinkingInPostScript.pdf :^)	2023-11-13 14:23:23 +01:00
Lucas CHOLLET	048ef11136	LibPDF: Factorize flate parameters handling to its own function This part will be shared with the LZW filter, so let's factorize it.	2023-11-13 14:23:23 +01:00
Nico Weber	bbde3cbc90	LibPDF: Tolerate an indirect object as dict for CIE-based color spaces Namely, for CalGrayColorSpace, CalRGBColorSpace, LabColorSpace. Fixes a crash rendering any page of Adobe's 5014.CIDFont_Spec.pdf (which uses CalRGBColorSpace with an indirect dict: The dict is object `92 0`, and many color spaces are inline objects referring to it).	2023-11-13 07:12:05 -05:00
Nico Weber	f4a847894f	LibPDF: Make SampledFunction::evaluate() work for n-dimensional input I didn't find example code for this and the AI assistant did very poorly on this as well. So I had to write it all by myself! It can be much more efficient I think, but I think the overall shape is maybe roughly fine.	2023-11-12 07:55:04 +01:00
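Below is a sketch of multilinear interpolation over an m-dimensional sample grid, in the spirit of this commit. It computes each of the 2^m corner weights directly as a product, which may not match how the commit structures the loop. Assumptions: one output value per sample, dimension 0 varying fastest as in PDF sampled functions, every size at least 2, and coordinates already mapped into sample space. The left index is clamped with `std::min()` as in 54883b7d41, so a coordinate equal to `size - 1` still has a right neighbor. `sample_multilinear` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

double sample_multilinear(std::vector<double> const& samples, std::vector<std::size_t> const& sizes,
    std::vector<double> const& x)
{
    std::size_t const m = sizes.size();
    std::vector<std::size_t> left(m);
    std::vector<double> t(m);
    for (std::size_t d = 0; d < m; ++d) {
        // Clamp so that left[d] + 1 is always a valid index.
        left[d] = std::min(static_cast<std::size_t>(std::floor(x[d])), sizes[d] - 2);
        t[d] = x[d] - static_cast<double>(left[d]);
    }

    double result = 0;
    // Visit all 2^m corners of the cell; bit d of `corner` picks left[d] or left[d] + 1.
    for (std::size_t corner = 0; corner < (std::size_t { 1 } << m); ++corner) {
        std::size_t index = 0;
        std::size_t stride = 1;
        double weight = 1;
        for (std::size_t d = 0; d < m; ++d) {
            bool const upper = (corner >> d) & 1;
            index += (left[d] + upper) * stride;
            weight *= upper ? t[d] : 1 - t[d];
            stride *= sizes[d];
        }
        result += weight * samples[index];
    }
    return result;
}
```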
Nico Weber	a9ef65e64a	LibPDF: For multi-output SampledFunctions, fix output colors For N outputs, the outputs aren't stored in N independent planes. Instead, N output values are stored right next to each other in the stream data.	2023-11-11 08:55:37 +01:00
Nico Weber	ec739460e0	LibPDF: Add test for SampledFunction and fix bugs found by it * SampledFunction now keeps the StreamObject it gets data from alive (doesn't matter too much in practice, but does matter in the test, where nothing else keeps the stream alive). * If a sample is an integer, we would previously sample that value twice and then divide by zero when interpolating. Make sure to sample 1 unit apart.	2023-11-11 08:55:37 +01:00
Nico Weber	323ba7404c	LibPDF: Implement SampledFunction::evaluate() for some sampled functions Things now work for functions that are all of: * linear * 1-D input * 8 bits per sample	2023-11-10 15:03:30 +00:00
Nico Weber	fd1876441a	LibPDF: Implement SampledFunction::create()	2023-11-10 15:03:30 +00:00
Nico Weber	cd9f4655ec	LibPDF: Tweak implementation of postscript `roll` op Since positive offsets roll to the right, it makes more sense to do the big reverse first. Gets rid of an awkward minus sign. No behavior change.	2023-11-10 14:45:38 +01:00
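This sketch shows the rotation described in this commit. Rolling the top n elements by j positions is a right rotation. It is done as one reverse of the whole range followed by two reverses of the parts. The `roll` function and its bool error return are hypothetical, and LibPDF's type 4 function interpreter may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// PostScript `n j roll` on a stack whose top is the vector's back.
// Positive j rolls toward the top: (a) (b) (c) 3 1 roll -> (c) (a) (b).
bool roll(std::vector<float>& stack, int n, int j)
{
    if (n < 0 || static_cast<std::size_t>(n) > stack.size())
        return false;
    if (n == 0)
        return true;
    j = ((j % n) + n) % n; // normalize, so negative rolls work too
    auto const first = stack.end() - n;
    std::reverse(first, stack.end()); // the big reverse first
    std::reverse(first, first + j);
    std::reverse(first + j, stack.end());
    return true;
}
```

Normalizing j into 0..n-1 also handles negative rolls: `(a) (b) (c) 3 -1 roll` yields `(b) (c) (a)`.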
Nico Weber	b23ed86889	LibPDF: Implement StitchingFunction::evaluate()	2023-11-10 14:45:16 +01:00
Nico Weber	ba34ddeb21	LibPDF: Implement StitchingFunction creation	2023-11-10 14:45:16 +01:00
Nico Weber	5af6e1c042	LibPDF: Implement DeviceNColorSpace	2023-11-09 23:33:49 +01:00
Nico Weber	0f07049935	LibPDF: Add ColorSpaceFamily::operator== No behavior change.	2023-11-09 23:33:49 +01:00
Nico Weber	80eec1e16b	LibPDF: Implement PostScriptCalculatorFunction Includes a tokenizer and interpreter for the subset of PostScript supported in PDF type 4 functions.	2023-11-09 16:06:25 +01:00
Tim Schumacher	a2f60911fe	AK: Rename GenericTraits to DefaultTraits This feels like a more fitting name for something that provides the default values for Traits.	2023-11-09 10:05:51 -05:00
Nico Weber	bbd86ee4f3	LibPDF: Implement ExponentialInterpolationFunction	2023-11-06 10:01:05 +01:00
Nico Weber	1aed465efe	LibPDF: Implement Fuction::create()	2023-11-06 10:01:05 +01:00
Nico Weber	b78ea81de5	LibPDF: Implement SeparationColorSpace Requires PDF::Function, which isn't implemented yet, so this has no visual effect yet.	2023-11-06 10:01:05 +01:00
Nico Weber	9204252d02	LibPDF: Add scaffolding for function objects See PDF 1.7 Spec, "3.9 Functions".	2023-11-06 10:01:05 +01:00
Nico Weber	21894f1cde	LibPDF: Fix typos in DeviceN colorspace scaffolding * Compare array size to 3 and 4, not 4 and 5 * Fix literal typo in error message Fixes crash processing 0000906.pdf from 0000.zip from the pdf/a dataset.	2023-11-06 09:54:01 +01:00


683 Commits