ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2024-11-13 11:42:38 +03:00

Author	SHA1	Message	Date
Julian Offenhäuser	65e83bed53	LibPDF: Disallow parsing indirect values as operands An operation like 0 0 0 RG would have been confused for [ 0, 0 0 R ] G	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	563d91b6c4	LibPDF: Implement loading compressed objects from object streams Now, whenever the xref table points to a compressed object, parse_object_with_index will look it up in the corresponding object stream as if it were a regular object. With this, our parser gains the bare minimum support for xref streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	4887aacec7	LibPDF: Move document-specific parsing functionality into its own class The Parser class is now a generic PDF object parser, from which the new DocumentParser class derives. DocumentParser now takes over all functions relating to linearization, pages, xref and trailer handling. This allows the use of multiple parsers in the same document's context, which will be needed in order to handle PDF object streams.	2022-09-17 10:07:14 +01:00
Julian Offenhäuser	9f4659cc63	LibPDF: Move consume and match helper functions to the Reader class	2022-09-17 10:07:14 +01:00
Matthew Olsson	468ceb1b48	LibPDF: Rename Command to Operator This is the correct name, according to the spec	2022-03-31 18:10:45 +02:00
Matthew Olsson	4e81663b31	LibPDF: Attempt to decrypt strings and streams	2022-03-29 02:52:57 +02:00
Matthew Olsson	60c3e786be	LibPDF: Require Document* in Parser constructor This makes it a bit easier to avoid calling parser->set_document, an issue which cost me ~30 minutes to find.	2022-03-29 02:52:57 +02:00
Matthew Olsson	a8de9cf541	LibPDF: Keep track of the current object index/generation while Parsing This information is required to decrypt encrypted strings/streams.	2022-03-29 02:52:57 +02:00
Matthew Olsson	73cf8205b4	LibPDF: Propagate errors in Parser and Document	2022-03-07 10:53:57 +01:00
Matthew Olsson	c1aa8c4a44	LibPDF: Remove unused function in Parser	2022-03-07 10:53:57 +01:00
Simon Woertz	c857b5d22f	LibPDF: Convert `PDF::Parser::m_document` from `RefPtr` to `WeakPtr` Otherwise both `PDF::Document` and `PDF::Parser` have a `RefPtr` pointing to each other which leads to a memory leak due to a circular dependency.	2022-01-08 18:57:55 +01:00
Andreas Kling	80d4e830a0	Everywhere: Pass AK::ReadonlyBytes by value	2021-11-11 01:27:46 +01:00
Ben Wiederhake	da170997d5	LibPDF: Move inline function definition This breaks the dependency cycle between Parser and Document.	2021-09-20 17:39:36 +04:30
Wesley Moret	1b8f73b6b3	LibPDF: Fix treating not finding the linearized dict as a fatal error We now try to parse the first indirect value and see if it's the `Linearization Parameter Dictionary`. If it's not, we fall back to reading the xref table from the end of the document.	2021-07-16 20:44:10 +02:00
Matthew Olsson	612b183703	LibPDF: Convert to east-const to comply with the recent style changes	2021-06-12 22:45:01 +04:30
Matthew Olsson	ea3abb14fe	LibPDF: Parse hint tables This code isn't _actually_ used as of right now, but I wrote it at the same time as all of the code in the previous commit. I realized after I wrote it that these hint tables aren't super useful if the parser already has access to the full file. However, this will be useful if we ever want to stream PDFs from the web (and possibly view them in the browser).	2021-06-12 22:45:01 +04:30
Matthew Olsson	e23bfd7252	LibPDF: Parse linearized PDF files This is a big step, as most PDFs which are downloaded online will be linearized. Pretty much the only difference is that the xref structure is slightly different.	2021-06-12 22:45:01 +04:30
Matthew Olsson	78bc9d1539	LibPDF: Refine the distinction between the Document and Parser The Parser should hold information relevant for parsing, whereas the Document should hold information relevant for displaying pages. With this in mind, there is no reason for the Document to hold the xref table and trailer. These objects have been moved to the Parser, which allows the Parser to expose fewer public methods (which will be even more evident once linearized PDFs are supported).	2021-06-12 22:45:01 +04:30
Matthew Olsson	1ef5071d1b	LibPDF: Harden the document/parser against errors	2021-06-12 22:45:01 +04:30
Matthew Olsson	101639e526	LibPDF: Parse graphics commands	2021-05-18 16:35:23 +02:00
Matthew Olsson	2f0a2865f2	LibPDF: Give Parser a reference to the Document The Parser will need to call resolve_to on certain values.	2021-05-18 16:35:23 +02:00
Matthew Olsson	3aeaceb727	LibPDF: Parse nested Page Tree structures We now follow nested page tree nodes to find all of the actual page dicts, whereas previously we just assumed the root level page tree node contained all of the page children directly.	2021-05-10 10:32:39 +02:00
Matthew Olsson	8c745ad0d9	LibPDF: Parse page structures This commit introduces the ability to parse the document catalog dict, as well as the page tree and individual pages. Pages obviously aren't fully parsed, as we won't care about most of the fields until we start actually rendering PDFs. One of the primary benefits of the PDF format is laziness. PDFs are not meant to be parsed all at once, and the same is true for pages. When a Document is constructed, it builds a map of page number to object index, but it does not fetch and parse any of the pages. A page is only parsed when a caller requests that particular page (and is cached going forwards). Additionally, this commit also adds an object_cast function which logs bad casts if DEBUG_PDF is set. Additionally, utility functions were added to ArrayObject and DictObject to get all types of objects from the collections to avoid having to manually cast.	2021-05-10 10:32:39 +02:00
Matthew Olsson	72f693e9ed	LibPDF: Add a basic parser and Document structure This commit adds a parser as well as the Reader class, which serves as a utility to aid in reading the PDF both forwards and in reverse. The parser currently is capable of reading xref tables, as well as all values. We don't really do anything with any of this information, however.	2021-05-10 10:32:39 +02:00
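
The lazy page loading described in commit 8c745ad0d9 (build a map of page number to object index up front, parse a page only when it is requested, then cache it) can be sketched roughly as follows. This is an illustrative Python sketch, not LibPDF's C++ code: the class `LazyDocument`, its methods, and the `parse_page` callback are hypothetical names chosen for the example.

```python
class LazyDocument:
    """Hypothetical sketch of lazy, cached page parsing."""

    def __init__(self, page_object_indices, parse_page):
        # Built at construction time: position = page number,
        # value = object index. No page is parsed here.
        self._page_object_indices = list(page_object_indices)
        # Assumed callback that parses the page object with the given index.
        self._parse_page = parse_page
        # Pages that have already been parsed, keyed by page number.
        self._cache = {}

    def page_count(self):
        return len(self._page_object_indices)

    def get_page(self, page_number):
        # Parse on first request only; later requests hit the cache.
        if page_number not in self._cache:
            object_index = self._page_object_indices[page_number]
            self._cache[page_number] = self._parse_page(object_index)
        return self._cache[page_number]
```

In this sketch, constructing the document costs only the page-number map; a caller that views a single page triggers exactly one call to the parse callback, and repeated requests for that page return the cached result.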

24 Commits
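
The ambiguity fixed in commit 65e83bed53 can be illustrated with a small sketch. It assumes the failure mode is that the `R` of an indirect reference (`0 0 R`) is matched as a bare prefix, so the operator `RG` gets split; that is an assumption for illustration, not a description of LibPDF's actual C++ parser. The function `parse_operation` and the regular expressions are hypothetical, and the number grammar is simplified.

```python
import re

# Simplified number syntax (no leading-dot reals such as ".5").
NUMBER = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)")
# Assumed failure mode: "R" is accepted without checking what follows it.
REFERENCE = re.compile(r"\s*(\d+)\s+(\d+)\s+R")
KEYWORD = re.compile(r"\s*([A-Za-z*']+)")


def parse_operation(text, allow_indirect):
    """Parse one content-stream operation into (operands, operator)."""
    operands = []
    pos = 0
    while True:
        if allow_indirect:
            # Try to read "N G R" as an indirect reference first.
            match = REFERENCE.match(text, pos)
            if match:
                operands.append(("ref", int(match.group(1)), int(match.group(2))))
                pos = match.end()
                continue
        match = NUMBER.match(text, pos)
        if match is None:
            break
        literal = match.group(1)
        operands.append(float(literal) if "." in literal else int(literal))
        pos = match.end()
    match = KEYWORD.match(text, pos)
    operator = match.group(1) if match else None
    return operands, operator


# Indirect references allowed: the misparse described in the commit message.
print(parse_operation("0 0 0 RG", allow_indirect=True))
# Indirect references disallowed for operands: three numbers, operator RG.
print(parse_operation("0 0 0 RG", allow_indirect=False))
```

With `allow_indirect=True` the sketch yields one number, one reference, and the operator `G`, which mirrors the `[ 0, 0 0 R ] G` reading from the commit message. With `allow_indirect=False` it yields the operands `0 0 0` and the operator `RG`.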