WeasyPrint/README


WeasyPrint converts web documents (HTML, CSS, ...) to PDF.

See the documentation at http://weasyprint.org/


Dependencies
------------

Listed in setup.py, will install automatically if you use easy_install or pip:

 * html5lib
 * lxml
 * cssutils
 * Attest

Not listed in setup.py since they are either not on PyPI or tricky to compile.
You need to install these manually:

 * PyCairo
 * PyGTK
 * python-rsvg

About the PyGTK dependency
--------------------------

WeasyPrint does not use GTK+, but it uses Pango for text rendering and rsvg for
SVG rendering. Both of them can work work without GTK+, but their Python
bindings either are part of PyGTK (for Pango) or depend on PyGTK (for rsvg).

If someday we have GObject introspection for all of Pango, rsvg and cairo
we can switch to those and drop the PyGTK dependency.

Standards conformance
---------------------

WeasyPrint strives for web standards conformance. For some standards however,
conformance is just that of the libraries we use:

 * HTML parsing: (turning bytes into a DOM tree), we currently use lxml.html
   (see below.)
 * CSS parsing: cssutils
 * CSS selectors: lxml.cssselect (conforms to CSS3 with some exceptions,
   see http://lxml.de/cssselect.html#limitations)
 * SVG: rsvg

The biggest part where WeasyPrint only has itself to blame about conformance is
the graphical rendering and layout of documents. (That is: all of CSS but syntax
and selectors.)

Inline SVG
----------

SVG, even when inlined in the HTML document, is rendered by the rsvg library
independently of the rest of the document. In CSS speak, we consider it to be
a “replaced element”.

HTML parsing
------------

We use lxml to parse HTML into an object tree. lmxl’s own parser is very fast,
but it can optionnaly use the html5lib parser. html5lib implements the HTML5
parsing algorithm so it should give better results on broken HTML, though
“they all parse pretty-good HTML the same.” [1]

[1] http://stackoverflow.com/questions/2676872/how-to-parse-malformed-html-in-python-using-standard-libraries/2680724#2680724

lxml vs ElementTree
-------------------

lxml uses the same API as ElementTree so that some programs can use any of them.
However we need lxml.cssselect, which does not exist in ElementTree.