mirror of
https://github.com/Kozea/WeasyPrint.git
synced 2024-10-05 16:37:47 +03:00
WeasyPrint converts web documents (HTML, CSS, ...) to PDF.

See the documentation at http://weasyprint.org/


Dependencies
------------

Listed in setup.py; these are installed automatically if you use easy_install
or pip:

* html5lib
* lxml
* cssutils
* Attest

Not listed in setup.py, since they are either not on PyPI or tricky to compile.
You need to install these manually:

* PyCairo
* PyGTK
* python-rsvg
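
As a rough way to see which of the dependencies above can be imported on a
given machine, a small script along these lines may help. It is only a sketch:
the import names (attest for Attest; cairo, gtk, pango and rsvg for the
manually installed bindings) are assumptions about how each package exposes
itself, and find_missing is a hypothetical helper, not part of WeasyPrint.

```python
# Assumed import names for the dependencies listed above.
PIP_MODULES = ["html5lib", "lxml", "cssutils", "attest"]
MANUAL_MODULES = ["cairo", "gtk", "pango", "rsvg"]


def find_missing(names):
    """Return the entries of `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except Exception:
            # Any failure (not installed, broken build) counts as missing.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for label, names in [("pip", PIP_MODULES), ("manual", MANUAL_MODULES)]:
        missing = find_missing(names)
        print("%s dependencies missing: %s" % (label, ", ".join(missing) or "none"))
```

Anything reported in the second group has to be installed by hand, for example
through the operating system's package manager.
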

About the PyGTK dependency
--------------------------

WeasyPrint does not use GTK+, but it uses Pango for text rendering and rsvg for
SVG rendering. Both of them can work without GTK+, but their Python
bindings either are part of PyGTK (for Pango) or depend on PyGTK (for rsvg).

If someday we have GObject introspection for all of Pango, rsvg and cairo,
we can switch to those and drop the PyGTK dependency.

Standards conformance
---------------------

WeasyPrint strives for web standards conformance. For some standards, however,
conformance is just that of the libraries we use:

* HTML parsing (turning bytes into a DOM tree): we currently use lxml.html
  (see below)
* CSS parsing: cssutils
* CSS selectors: lxml.cssselect (conforms to CSS3 with some exceptions,
  see http://lxml.de/cssselect.html#limitations)
* SVG: rsvg

The biggest area where WeasyPrint has only itself to blame for conformance is
the graphical rendering and layout of documents (that is, all of CSS except
syntax and selectors).

Inline SVG
----------

SVG, even when inlined in the HTML document, is rendered by the rsvg library
independently of the rest of the document. In CSS speak, we consider it to be
a “replaced element”.

HTML parsing
------------

We use lxml to parse HTML into an object tree. lxml’s own parser is very fast,
but it can optionally use the html5lib parser. html5lib implements the HTML5
parsing algorithm, so it should give better results on broken HTML, though
“they all parse pretty-good HTML the same.” [1]

[1] http://stackoverflow.com/questions/2676872/how-to-parse-malformed-html-in-python-using-standard-libraries/2680724#2680724
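
The two parsing paths described above could be sketched roughly as follows.
parse_html and its use_html5lib flag are hypothetical names used only for
illustration, not WeasyPrint's API. The sketch assumes lxml is installed (and
html5lib for the second branch), and simply returns None when lxml is missing.

```python
def parse_html(source, use_html5lib=False):
    """Parse an HTML string into an lxml element.

    Returns None if lxml is not installed.
    """
    try:
        import lxml.html
    except ImportError:
        return None
    if use_html5lib:
        # lxml's optional bridge to the html5lib parser (requires html5lib).
        from lxml.html import html5parser
        return html5parser.fromstring(source)
    # lxml's own parser, built on libxml2.
    return lxml.html.fromstring(source)


if __name__ == "__main__":
    # Slightly broken input: the <b> element is never closed.
    element = parse_html("<p>Hello <b>world</p>")
    if element is not None:
        print(element.tag)
```

Keeping the choice behind a single flag makes it easy to compare both parsers
on the same broken input.
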

lxml vs ElementTree
-------------------

lxml uses the same API as ElementTree, so some programs can use either of them.
However, we need lxml.cssselect, which does not exist in ElementTree.
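
The shared API is what makes the common "try lxml, fall back to ElementTree"
pattern possible. The following is a minimal sketch of that pattern, not code
from WeasyPrint; paragraph_texts and HAVE_LXML are hypothetical names.

```python
try:
    from lxml import etree  # preferred when available
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree  # standard library fallback
    HAVE_LXML = False


def paragraph_texts(markup):
    """Collect the text of every <p> element using only the shared API."""
    root = etree.fromstring(markup)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    print(paragraph_texts("<div><p>one</p><p>two</p></div>"))
```

Code restricted to calls such as fromstring() and iter() runs with either
library. Selecting elements with CSS selectors, by contrast, needs
lxml.cssselect and has no ElementTree equivalent, which is why the fallback is
not an option here.
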