Hacking WeasyPrint
==================

Assuming you already have the :doc:`dependencies </install>`,
install the `development version`_ of WeasyPrint:

.. _development version: https://github.com/Kozea/WeasyPrint

.. code-block:: sh

    git clone https://github.com/Kozea/WeasyPrint.git
    cd WeasyPrint
    virtualenv --system-site-packages env
    . env/bin/activate
    pip install Sphinx -e .[test]
    weasyprint --help

This will install WeasyPrint in “editable” mode (which means that you don’t
need to re-install it every time you make a change in the source code) as well
as `pytest <http://pytest.org/>`_ and `Sphinx <http://sphinx.pocoo.org/>`_.

Lastly, in order to pass unit tests, your system's default font must be one
with a condensed variant (e.g. DejaVu), which is typically installable via
your distro's packaging system.


Documentation changes
---------------------

The documentation lives in the ``docs`` directory,
but the API section references docstrings in the source code.
Run ``python setup.py build_sphinx`` to rebuild the documentation
and get the output in ``docs/_build/html``.
The website version is updated automatically when we push to master on GitHub.


Code changes
------------

Use the ``python setup.py test`` command from the ``WeasyPrint`` directory to
run the test suite.

Please report any bugs/feature requests and submit patches/pull requests
`on GitHub <https://github.com/Kozea/WeasyPrint>`_.


Dive into the source
--------------------

The rest of this document is a high-level overview of WeasyPrint’s source
code. For more details, see the various docstrings or even the code itself.
When in doubt, feel free to `ask <http://weasyprint.org/community>`_!

Much like `in web browsers
<http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#The_main_flow>`_,
the rendering of a document in WeasyPrint goes like this:

1. The HTML document is fetched and parsed into a tree of elements (like DOM)
2. CSS stylesheets (either found in the HTML or supplied by the user) are
   fetched and parsed
3. The stylesheets are applied to the DOM tree
4. The DOM tree with styles is transformed into a *formatting structure* made of rectangular boxes.
5. These boxes are *laid-out* with fixed dimensions and position onto pages.
6. For each page, the boxes:

   - are re-ordered to observe stacking rules, and
   - are drawn on a PDF page.

7. Cairo’s PDF is modified to add metadata such as bookmarks and hyperlinks.


HTML
....

Not much to see here. The :class:`weasyprint.HTML` class handles step 1 and
gives a tree of HTML *elements*. Although the actual API is different, this
tree is conceptually the same as what web browsers call *the DOM*.
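
As a rough illustration of what “a tree of elements” means, the sketch below
parses a small well-formed document with the standard library’s
``xml.etree.ElementTree`` and walks the result. This is only an analogy and
not how WeasyPrint parses real-world HTML; the ``outline`` helper and the
sample markup are made up for the example.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Well-formed sample markup (an assumption made for this example).
    SOURCE = "<html><body><h1>Title</h1><p>Some <em>text</em>.</p></body></html>"


    def outline(element, depth=0):
        """Return (depth, tag) pairs for an element and its descendants."""
        rows = [(depth, element.tag)]
        for child in element:
            rows.extend(outline(child, depth + 1))
        return rows


    root = ET.fromstring(SOURCE)
    tree_outline = outline(root)
    for depth, tag in tree_outline:
        # Indent each tag by its depth to make the nesting visible.
        print("  " * depth + tag)
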


CSS
...

As with HTML, CSS stylesheets are parsed in the :class:`weasyprint.CSS` class
with an external library, tinycss2_.
In addition to the actual parsing, the :mod:`weasyprint.css` and
:mod:`weasyprint.css.validation` modules do some pre-processing:

* Unknown and unsupported declarations are ignored with warnings.
  Remaining property values are parsed in a property-specific way
  from raw tinycss2 tokens into a higher-level form.
* Shorthand properties are expanded. For example, ``margin`` becomes
  ``margin-top``, ``margin-right``, ``margin-bottom`` and ``margin-left``.
* Hyphens in property names are replaced by underscores (``margin-top``
  becomes ``margin_top``) so that they can be used as Python attribute names
  later on. This transformation is safe since none of the known (not ignored)
  properties have an underscore character.
* Selectors are pre-compiled with cssselect2_.

.. _tinycss2: https://pypi.python.org/pypi/tinycss2
.. _cssselect2: https://pypi.python.org/pypi/cssselect2
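
The shorthand expansion and the hyphen-to-underscore renaming described above
can be sketched in a few lines. This is a simplified stand-in, not the code in
:mod:`weasyprint.css.validation`: ``expand_margin`` and ``pythonize`` are
hypothetical names, and values are kept as plain strings instead of parsed
tokens.

.. code-block:: python

    def expand_margin(values):
        """Expand the values of a ``margin`` shorthand into four longhands.

        Follows the usual CSS rule: 1 to 4 values in top, right, bottom,
        left order, with missing values copied from the opposite side.
        """
        if not 1 <= len(values) <= 4:
            raise ValueError("margin takes 1 to 4 values")
        top = values[0]
        right = values[1] if len(values) > 1 else top
        bottom = values[2] if len(values) > 2 else top
        left = values[3] if len(values) > 3 else right
        return {
            "margin-top": top,
            "margin-right": right,
            "margin-bottom": bottom,
            "margin-left": left,
        }


    def pythonize(declarations):
        """Replace hyphens with underscores in property names."""
        return {
            name.replace("-", "_"): value
            for name, value in declarations.items()
        }


    # ``margin: 1em 2em`` sets top/bottom to 1em and right/left to 2em.
    longhands = pythonize(expand_margin(["1em", "2em"]))
    print(longhands)
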


The cascade
...........

After that and still in the :mod:`weasyprint.css` package, the cascade_
(that’s the C in CSS!) applies the stylesheets to the element tree.
Selectors associate property declarations to elements. In case of conflicting
declarations (different values for the same property on the same element),
the one with the highest *weight* wins. Weights are based on the stylesheet’s
:ref:`origin <stylesheet-origins>`, ``!important`` markers, selector
specificity and source order. Missing values are filled in through
*inheritance* (from the parent element) or the property’s *initial value*,
so that every element has a *specified value* for every property.

.. _cascade: http://www.w3.org/TR/CSS21/cascade.html
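
One way to picture the *weight* comparison is as a sort key. The sketch below
is an illustration under simplifying assumptions, not WeasyPrint’s
implementation: a declaration is a hypothetical
``(origin, important, specificity, order, value)`` tuple, specificity is
reduced to ``(ids, classes, types)``, and only the origin and ``!important``
combinations listed in CSS 2.1 are handled.

.. code-block:: python

    # Precedence of (origin, important) pairs in CSS 2.1, lowest first.
    PRECEDENCE = [
        ("user agent", False),
        ("user", False),
        ("author", False),
        ("author", True),
        ("user", True),
    ]


    def weight(declaration):
        """Build a sort key: origin/importance, then specificity, then order."""
        origin, important, specificity, order, _value = declaration
        return (PRECEDENCE.index((origin, important)), specificity, order)


    def winner(declarations):
        """Return the value of the declaration with the highest weight."""
        return max(declarations, key=weight)[-1]


    # Conflicting declarations of one property on one element.
    color_declarations = [
        ("user agent", False, (0, 0, 1), 0, "black"),
        ("author", False, (1, 0, 0), 1, "red"),
        ("author", False, (0, 1, 0), 2, "blue"),
    ]
    # Among author rules, the ID selector beats the later class selector.
    author_wins = winner(color_declarations)
    # An ``!important`` user declaration beats every author declaration.
    user_wins = winner(
        color_declarations + [("user", True, (0, 0, 1), 3, "green")])
    print(author_wins, user_wins)
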

These *specified values* are turned into *computed values* in the
``weasyprint.css.computed_values`` module. Keywords and lengths in various
units are converted to pixels, etc. At this point the value for some
properties can be represented by a single number or string, but some require
more complex objects. For example, a :class:`Dimension` object can be either
an absolute length or a percentage.
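
For absolute units, the conversion to pixels is plain arithmetic based on the
CSS definitions (1in = 96px = 72pt = 6pc = 2.54cm). The helper below is a
minimal sketch with a hypothetical name, ``to_pixels``; it ignores
font-relative units such as ``em``, which need the element’s font size, and
does not reflect how ``weasyprint.css.computed_values`` is organized.

.. code-block:: python

    # CSS pixels per absolute unit, from 1in = 96px = 72pt = 6pc = 2.54cm.
    PIXELS_PER_UNIT = {
        "px": 1,
        "in": 96,
        "pt": 96 / 72,
        "pc": 16,
        "cm": 96 / 2.54,
        "mm": 96 / 25.4,
    }


    def to_pixels(value, unit):
        """Convert an absolute CSS length to pixels."""
        return value * PIXELS_PER_UNIT[unit]


    print(to_pixels(1, "in"))
    print(to_pixels(12, "pt"))
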

The final result of the :func:`~weasyprint.css.get_all_computed_styles`
function is a big dict where keys are ``(element, pseudo_element_type)``
tuples, and values are style dict objects. Elements are
ElementTree elements, while the type of pseudo-element is a string
for e.g. ``::first-line`` selectors, or :obj:`None` for “normal”
elements. Style dict objects are dicts with read-only attribute access
mapping property names to the computed values. (The return value is not the
dict itself, but a convenience :func:`style_for` function for accessing it.)
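
The shape of that result can be mimicked with the standard library. In this
sketch ``StyleDict`` and ``make_style_for`` are hypothetical stand-ins written
for illustration, and returning :obj:`None` for an unknown key is an
assumption, not documented WeasyPrint behaviour.

.. code-block:: python

    import xml.etree.ElementTree as ET


    class StyleDict(dict):
        """Dict whose items can also be read, but not set, as attributes."""

        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name) from None

        def __setattr__(self, name, value):
            raise AttributeError("style attributes are read-only")


    def make_style_for(computed_styles):
        """Wrap the big dict in a lookup function."""
        def style_for(element, pseudo_type=None):
            # Assumption: unknown (element, pseudo_type) pairs give None.
            return computed_styles.get((element, pseudo_type))
        return style_for


    paragraph = ET.Element("p")
    computed_styles = {
        (paragraph, None): StyleDict(margin_top=16, color="black"),
        (paragraph, "first-line"): StyleDict(color="red"),
    }
    style_for = make_style_for(computed_styles)
    print(style_for(paragraph).margin_top)
    print(style_for(paragraph, "first-line").color)
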


Formatting structure
....................

The `visual formatting model`_ explains how *elements* (from the ElementTree
tree) generate *boxes* (in the formatting structure). This is step 4 above.
Boxes may have children and thus form a tree, much like elements. This tree is
generally close but not identical to the ElementTree tree: some elements
generate more than one box or none.

.. _visual formatting model: http://www.w3.org/TR/CSS21/visuren.html

There are many different kinds of boxes. For example you should not confuse
*block-level boxes* and *block containers*, though *block boxes* are both.
The :mod:`weasyprint.formatting_structure.boxes` module has a whole hierarchy
of classes to represent all these boxes. We won’t go into the details here,
see the module and class docstrings.

The :mod:`weasyprint.formatting_structure.build` module takes an ElementTree
tree with associated computed styles, and builds a formatting structure. It
generates the right boxes for each element and ensures they conform to the
model’s rules. (E.g. an inline box cannot contain a block.) Each box has a
:attr:`.style` attribute containing the style dict of computed values.

The main logic is based on the ``display`` property, but it can be overridden
for some elements by adding a handler in the ``weasyprint.html`` module.
This is how ``<img>`` and ``<td colspan=3>`` are currently implemented,
for example.
This module is rather short as most of HTML is defined in CSS rather than
in Python, in the `user agent stylesheet`_.

The :func:`~weasyprint.formatting_structure.build.build_formatting_structure`
function returns the box for the root element (and, through its
:attr:`children` attribute, the whole tree).
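
To make the element-to-box step concrete, here is a deliberately tiny sketch
driven only by ``display``. The ``Box`` class and ``build_boxes`` function are
hypothetical; text, anonymous boxes, pseudo-elements and the fix-ups that the
real :mod:`weasyprint.formatting_structure.build` module performs are all left
out.

.. code-block:: python

    import xml.etree.ElementTree as ET
    from dataclasses import dataclass, field


    @dataclass
    class Box:
        """Hypothetical box: the element's tag, display value and children."""
        tag: str
        display: str
        children: list = field(default_factory=list)


    def build_boxes(element, display_of):
        """Return the list of boxes generated by an element (may be empty)."""
        display = display_of(element)
        if display == "none":
            # No box for the element or for any of its descendants.
            return []
        box = Box(element.tag, display)
        for child in element:
            box.children.extend(build_boxes(child, display_of))
        return [box]


    # Stand-in for computed styles: one display value per tag.
    DISPLAY = {"body": "block", "p": "block", "em": "inline", "script": "none"}
    root = ET.fromstring(
        "<body><p>Hi <em>there</em></p><script>x</script></body>")
    (root_box,) = build_boxes(root, lambda element: DISPLAY[element.tag])
    print([child.tag for child in root_box.children])

The ``<script>`` element generates no box, which is one way the box tree ends
up different from the element tree.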

.. _user agent stylesheet: https://github.com/Kozea/WeasyPrint/blob/master/weasyprint/css/html5_ua.css


Layout
......

Step 5 is the layout. You could say that everything else is glue code and
this is where the magic happens.

During the layout the document’s content is, well, laid out on pages.
This is when we decide where to do line breaks and page breaks. If a break
happens inside of a box, that box is split into two (or more) boxes in the
layout result.

According to the `box model`_, each box has rectangular margin, border,
padding and content areas:

.. _box model: http://www.w3.org/TR/CSS21/box.html

.. image:: _static/box_model.png
    :align: center

While :obj:`box.style` contains computed values, the `used values`_ are set
as attributes of the :class:`Box` object itself during the layout. This
includes resolving percentages and especially ``auto`` values into absolute,
pixel lengths. Once the layout is done, each box has used values for
margins, border width and padding on each of the four sides, as well as the
:attr:`width` and :attr:`height` of the content area. They also have
:attr:`position_x` and :attr:`position_y`, the absolute coordinates of the
top-left corner of the margin box (**not** the content box) from the top-left
corner of the page.\ [#]_

Boxes also have helper methods such as :meth:`content_box_y` and
:meth:`margin_width` that give other metrics that can be useful in various
parts of the code.
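
The relationship between these attributes follows from the box model. The
class below is a hypothetical stand-in, not WeasyPrint’s :class:`Box`: the
attribute names mirror the ones mentioned above, but the method bodies are an
assumption about what such helpers compute, and the bottom side is omitted
because the example does not need it.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Box:
        """Hypothetical laid-out box; every value is in pixels."""
        position_x: float  # left edge of the margin box
        position_y: float  # top edge of the margin box
        width: float       # width of the content area
        height: float      # height of the content area
        margin_top: float = 0
        margin_right: float = 0
        margin_left: float = 0
        border_top_width: float = 0
        border_right_width: float = 0
        border_left_width: float = 0
        padding_top: float = 0
        padding_right: float = 0
        padding_left: float = 0

        def content_box_x(self):
            """Absolute horizontal position of the content area."""
            return (self.position_x + self.margin_left
                    + self.border_left_width + self.padding_left)

        def content_box_y(self):
            """Absolute vertical position of the content area."""
            return (self.position_y + self.margin_top
                    + self.border_top_width + self.padding_top)

        def margin_width(self):
            """Width of the margin box: content plus both horizontal edges."""
            return (self.width
                    + self.padding_left + self.padding_right
                    + self.border_left_width + self.border_right_width
                    + self.margin_left + self.margin_right)


    box = Box(
        position_x=10, position_y=20, width=100, height=50,
        margin_top=5, margin_right=5, margin_left=5,
        border_top_width=1, border_right_width=1, border_left_width=1,
        padding_top=2, padding_right=2, padding_left=2)
    print(box.content_box_x(), box.content_box_y(), box.margin_width())
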

The final result of the layout is a list of :class:`PageBox` objects.

.. [#] These are the coordinates *if* no `CSS transform`_ applies.
       Transforms change the actual location of boxes, but they are applied
       later during drawing and do not affect layout.

.. _used values: http://www.w3.org/TR/CSS21/cascade.html#used-value
.. _CSS transform: http://www.w3.org/TR/css3-transforms/


Stacking & Drawing
..................

In step 6, the boxes are reordered by the :mod:`weasyprint.stacking` module
to observe `stacking rules`_ such as the ``z-index`` property.
The result is a tree of *stacking contexts*.
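
As a much-reduced illustration of reordering by ``z-index``, the sketch below
sorts hypothetical ``(name, z_index)`` pairs from back to front. It relies on
Python’s stable sort to keep tree order for equal values. The real stacking
rules also separate backgrounds, floats, inline content and nested stacking
contexts, none of which is modelled here.

.. code-block:: python

    def paint_order(boxes):
        """Sort (name, z_index) pairs from back to front.

        ``sorted`` is stable, so boxes with the same z-index keep their
        original (tree) order.
        """
        return sorted(boxes, key=lambda box: box[1])


    boxes = [("header", 0), ("popup", 2), ("backdrop", -1), ("footer", 0)]
    ordered = [name for name, _z_index in paint_order(boxes)]
    print(ordered)
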

Next, each laid-out page is *drawn* onto a cairo_ surface. Since each box has
absolute coordinates on the page from the layout step, the logic here should be
minimal. If you find yourself adding a lot of logic here, maybe it should go in
the layout or stacking instead.

The code lives in the :mod:`weasyprint.draw` module.

.. _stacking rules: http://www.w3.org/TR/CSS21/zindex.html
.. _cairo: http://cairographics.org/pycairo/


Metadata
........

Finally (step 7), the :mod:`weasyprint.pdf` module parses the PDF file
produced by cairo and appends to it to add metadata:
internal and external hyperlinks, as well as outlines / bookmarks.