Hacking WeasyPrint
==================

Assuming you already have the :doc:`dependencies </install>`,
install the `development version`_ of WeasyPrint:

.. _development version: https://github.com/Kozea/WeasyPrint

.. code-block:: sh

    git clone https://github.com/Kozea/WeasyPrint.git
    cd WeasyPrint
    virtualenv --system-site-packages env
    . env/bin/activate
    pip install Sphinx -e .[test]
    weasyprint --help

This will install WeasyPrint in “editable” mode (which means that you don’t
need to re-install it every time you make a change in the source code) as well
as `pytest <http://pytest.org/>`_ and `Sphinx <http://sphinx.pocoo.org/>`_.

Lastly, in order to pass unit tests, your system’s default font must have a
condensed variant (e.g. DejaVu). Such fonts are typically installable via your
distro's packaging system.

Documentation changes
---------------------
The documentation lives in the ``docs`` directory,
but the API section references docstrings in the source code.
Run ``python setup.py build_sphinx`` to rebuild the documentation
and get the output in ``docs/_build/html``.
The website version is updated automatically when we push to master on GitHub.

Code changes
------------

Use the ``python setup.py test`` command from the ``WeasyPrint`` directory to
run the test suite.

Please report any bugs/feature requests and submit patches/pull requests
`on GitHub <https://github.com/Kozea/WeasyPrint>`_.

Dive into the source
--------------------

The rest of this document is a high-level overview of WeasyPrint’s source
code. For more details, see the various docstrings or even the code itself.
When in doubt, feel free to `ask <http://weasyprint.org/community>`_!

Much like `in web browsers
<http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#The_main_flow>`_,
the rendering of a document in WeasyPrint goes like this:

1. The HTML document is fetched and parsed into a tree of elements (like DOM)
2. CSS stylesheets (either found in the HTML or supplied by the user) are
   fetched and parsed
3. The stylesheets are applied to the DOM tree
4. The DOM tree with styles is transformed into a *formatting structure* made
   of rectangular boxes.
5. These boxes are *laid-out* with fixed dimensions and position onto pages.
6. For each page, the boxes:

   - are re-ordered to observe stacking rules, and
   - are drawn on a PDF page.

7. Cairo’s PDF is modified to add metadata such as bookmarks and hyperlinks.

HTML
....

Not much to see here. The :class:`weasyprint.HTML` class handles step 1 and
gives a tree of HTML *elements*. Although the actual API is different, this
tree is conceptually the same as what web browsers call *the DOM*.

CSS
...

As with HTML, CSS stylesheets are parsed in the :class:`weasyprint.CSS` class
with an external library, tinycss2_.

In addition to the actual parsing, the :mod:`weasyprint.css` and
:mod:`weasyprint.css.validation` modules do some pre-processing:

* Unknown and unsupported declarations are ignored with warnings.
  Remaining property values are parsed in a property-specific way
  from raw tinycss2 tokens into a higher-level form.
* Shorthand properties are expanded. For example, ``margin`` becomes
  ``margin-top``, ``margin-right``, ``margin-bottom`` and ``margin-left``.
* Hyphens in property names are replaced by underscores (``margin-top``
  becomes ``margin_top``) so that they can be used as Python attribute names
  later on. This transformation is safe since none of the known (not ignored)
  properties has an underscore character.
* Selectors are pre-compiled with cssselect2_.

.. _tinycss2: https://pypi.python.org/pypi/tinycss2
.. _cssselect2: https://pypi.python.org/pypi/cssselect2
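
The shorthand expansion and name normalization described above can be
sketched in a few lines. This is only an illustration: ``expand_margin`` and
``preprocess`` are hypothetical names, values are kept as plain strings
instead of tinycss2 tokens, and only the ``margin`` shorthand is handled.

.. code-block:: python

    def expand_margin(values):
        """Expand the 1 to 4 values of ``margin`` into four longhand pairs."""
        if not 1 <= len(values) <= 4:
            raise ValueError('margin takes 1 to 4 values')
        top = values[0]
        right = values[1] if len(values) > 1 else top
        bottom = values[2] if len(values) > 2 else top
        left = values[3] if len(values) > 3 else right
        return [('margin-top', top), ('margin-right', right),
                ('margin-bottom', bottom), ('margin-left', left)]


    def preprocess(declarations):
        """Expand shorthands, then replace hyphens by underscores in names."""
        result = {}
        for name, value in declarations:
            if name == 'margin':
                expanded = expand_margin(value.split())
            else:
                expanded = [(name, value)]
            for long_name, long_value in expanded:
                result[long_name.replace('-', '_')] = long_value
        return result


    style = preprocess([('margin', '1em 2em'), ('font-size', '12px')])
    print(style['margin_top'], style['margin_left'], style['font_size'])
    # 1em 2em 12px
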

The cascade
...........

After that and still in the :mod:`weasyprint.css` package, the cascade_
(that’s the C in CSS!) applies the stylesheets to the element tree.
Selectors associate property declarations to elements. In case of conflicting
declarations (different values for the same property on the same element),
the one with the highest *weight* wins. Weights are based on the stylesheet’s
:ref:`origin <stylesheet-origins>`, ``!important`` markers, selector
specificity and source order. Missing values are filled in through
*inheritance* (from the parent element) or the property’s *initial value*,
so that every element has a *specified value* for every property.

.. _cascade: http://www.w3.org/TR/CSS21/cascade.html
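
One way to picture the weight is as a tuple that Python can compare
directly. The sketch below is not WeasyPrint’s implementation: the
``Declaration`` tuple and the ``cascade`` function are hypothetical, the
origin ordering follows CSS 2.1 section 6.4.1, and specificity is simplified
to an ``(ids, classes, types)`` triple.

.. code-block:: python

    from collections import namedtuple

    Declaration = namedtuple(
        'Declaration', 'value origin important specificity order')

    # (origin, important) pairs in ascending order of precedence.
    PRECEDENCE = [
        ('user agent', False),
        ('user', False),
        ('author', False),
        ('author', True),
        ('user', True),
    ]


    def weight(declaration):
        """Sort key: origin and importance, then specificity, then order."""
        rank = PRECEDENCE.index((declaration.origin, declaration.important))
        return (rank, declaration.specificity, declaration.order)


    def cascade(declarations):
        """Return the value of the declaration with the highest weight."""
        return max(declarations, key=weight).value


    candidates = [
        Declaration('black', 'user agent', False, (0, 0, 1), 0),
        Declaration('red', 'author', False, (0, 1, 0), 1),
        Declaration('blue', 'author', False, (0, 0, 1), 2),
        Declaration('green', 'user', True, (0, 0, 1), 3),
    ]
    print(cascade(candidates))      # green: user !important beats everything
    print(cascade(candidates[:3]))  # red: higher specificity beats later order
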

These *specified values* are turned into *computed values* in the
``weasyprint.css.computed_values`` module. Keywords and lengths in various
units are converted to pixels, etc. At this point the value for some
properties can be represented by a single number or string, but some require
more complex objects. For example, a :class:`Dimension` object can be either
an absolute length or a percentage.
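
As an illustration of the unit conversion, here is a minimal sketch assuming
the CSS ratios between absolute units (1in = 96px = 72pt = 6pc = 2.54cm).
The ``to_pixels`` helper and its ``font_size`` parameter are hypothetical,
not part of ``weasyprint.css.computed_values``.

.. code-block:: python

    # Pixels per absolute CSS unit.
    PIXELS_PER_UNIT = {
        'px': 1,
        'in': 96,
        'pt': 96 / 72,
        'pc': 96 / 6,
        'cm': 96 / 2.54,
        'mm': 96 / 25.4,
    }


    def to_pixels(value, unit, font_size=16):
        """Convert a length to pixels; ``em`` is relative to the font size."""
        if unit == 'em':
            return value * font_size
        return value * PIXELS_PER_UNIT[unit]


    print(to_pixels(1, 'in'))                   # 96
    print(to_pixels(1.5, 'em', font_size=16))   # 24.0
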

The final result of the :func:`~weasyprint.css.get_all_computed_styles`
function is a big dict where keys are ``(element, pseudo_element_type)``
tuples, and values are style dict objects. Elements are
ElementTree elements, while the type of pseudo-element is a string
for e.g. ``::first-line`` selectors, or :obj:`None` for “normal”
elements. Style dict objects are dicts with read-only attribute access,
mapping property names to the computed values. (The return value is not the
dict itself, but a convenience :func:`style_for` function for accessing it.)
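
The shape of that result can be mimicked with plain Python objects. In this
sketch ``StyleDict`` and ``make_style_for`` are hypothetical names, strings
stand in for ElementTree elements, and the ``'first-line'`` key is an
assumption about how the pseudo-element type is spelled.

.. code-block:: python

    class StyleDict(dict):
        """A dict whose keys can also be read as attributes."""

        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name) from None


    def make_style_for(computed_styles):
        """Wrap the big dict in a convenience accessor function."""
        def style_for(element, pseudo_type=None):
            return computed_styles.get((element, pseudo_type))
        return style_for


    computed_styles = {
        ('p', None): StyleDict(font_size=16, margin_top=8),
        ('p', 'first-line'): StyleDict(font_size=20, margin_top=0),
    }
    style_for = make_style_for(computed_styles)
    print(style_for('p').font_size)                # 16
    print(style_for('p', 'first-line').font_size)  # 20

Returning a function rather than the dict keeps callers from depending on
how the keys are built.
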

Formatting structure
....................

The `visual formatting model`_ explains how *elements* (from the ElementTree
tree) generate *boxes* (in the formatting structure). This is step 4 above.
Boxes may have children and thus form a tree, much like elements. This tree is
generally close but not identical to the ElementTree tree: some elements
generate more than one box or none.

.. _visual formatting model: http://www.w3.org/TR/CSS21/visuren.html

Boxes come in many different kinds. For example you should not confuse
*block-level boxes* and *block containers*, though *block boxes* are both.
The :mod:`weasyprint.formatting_structure.boxes` module has a whole hierarchy
of classes to represent all these boxes. We won’t go into the details here;
see the module and class docstrings.

The :mod:`weasyprint.formatting_structure.build` module takes an ElementTree
tree with associated computed styles, and builds a formatting structure. It
generates the right boxes for each element and ensures they conform to the
model’s rules. (E.g. an inline box cannot contain a block.) Each box has a
:attr:`.style` attribute containing the style dict of computed values.

The main logic is based on the ``display`` property, but it can be overridden
for some elements by adding a handler in the ``weasyprint.html`` module.
This is how ``<img>`` and ``<td colspan=3>`` are currently implemented,
for example.

The ``weasyprint.html`` module is rather short as most of HTML is defined in
CSS rather than in Python, in the `user agent stylesheet`_.

The :func:`~weasyprint.formatting_structure.build.build_formatting_structure`
function returns the box for the root element (and, through its
:attr:`children` attribute, the whole tree).

.. _user agent stylesheet: https://github.com/Kozea/WeasyPrint/blob/master/weasyprint/css/html5_ua.css
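
The idea that an element may generate one box or none, depending on
``display``, can be shown with a toy builder. Everything here is a
simplified assumption: elements are plain dicts, ``Box`` and ``build_boxes``
are hypothetical stand-ins for the real class hierarchy and build function,
and only ``display: none`` gets special treatment.

.. code-block:: python

    class Box:
        """A toy box that remembers its element, style and children."""

        def __init__(self, element_tag, style, children=()):
            self.element_tag = element_tag
            self.style = style
            self.children = list(children)

        def descendants(self):
            """Yield this box and every box below it, in tree order."""
            yield self
            for child in self.children:
                yield from child.descendants()


    def build_boxes(element, style_for):
        """Return the list of boxes generated by an element (may be empty)."""
        style = style_for(element)
        if style['display'] == 'none':
            return []  # no box for this element nor for its descendants
        children = []
        for child in element['children']:
            children.extend(build_boxes(child, style_for))
        return [Box(element['tag'], style, children)]


    DISPLAY = {'body': 'block', 'p': 'block', 'em': 'inline', 'script': 'none'}


    def style_for(element):
        return {'display': DISPLAY[element['tag']]}


    tree = {'tag': 'body', 'children': [
        {'tag': 'script', 'children': []},
        {'tag': 'p', 'children': [{'tag': 'em', 'children': []}]},
    ]}
    root, = build_boxes(tree, style_for)
    print([box.element_tag for box in root.descendants()])
    # ['body', 'p', 'em']
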

Layout
......

Step 5 is the layout. You could say that everything else is glue code and
this is where the magic happens.

During the layout the document’s content is, well, laid out on pages.
This is when we decide where to do line breaks and page breaks. If a break
happens inside of a box, that box is split into two (or more) boxes in the
layout result.

According to the `box model`_, each box has rectangular margin, border,
padding and content areas:

.. _box model: http://www.w3.org/TR/CSS21/box.html

.. image:: _static/box_model.png
   :align: center

While :obj:`box.style` contains computed values, the `used values`_ are set
as attributes of the :class:`Box` object itself during the layout. This
includes resolving percentages and especially ``auto`` values into absolute,
pixel lengths. Once the layout is done, each box has used values for
margins, border widths and padding on each of the four sides, as well as the
:attr:`width` and :attr:`height` of the content area. They also have
:attr:`position_x` and :attr:`position_y`, the absolute coordinates of the
top-left corner of the margin box (**not** the content box) from the top-left
corner of the page.\ [#]_

Boxes also have helper methods such as :meth:`content_box_y` and
:meth:`margin_width` that give other metrics that can be useful in various
parts of the code.
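
To make the relation between these attributes concrete, here is a sketch of
how such helpers could be derived from the used values. The ``Box`` class
below is a hypothetical stand-in with the same margin, border and padding on
all four sides, not the real class from
:mod:`weasyprint.formatting_structure.boxes`.

.. code-block:: python

    class Box:
        """A laid-out box; position_x/position_y locate the margin box."""

        def __init__(self, position_x, position_y, width, height,
                     margin=0, border=0, padding=0):
            self.position_x = position_x
            self.position_y = position_y
            self.width = width    # content area width
            self.height = height  # content area height
            for side in ('top', 'right', 'bottom', 'left'):
                setattr(self, 'margin_' + side, margin)
                setattr(self, 'border_%s_width' % side, border)
                setattr(self, 'padding_' + side, padding)

        def content_box_y(self):
            """Absolute vertical position of the content box."""
            return (self.position_y + self.margin_top +
                    self.border_top_width + self.padding_top)

        def margin_width(self):
            """Width of the margin box."""
            return (self.width + self.padding_left + self.padding_right +
                    self.border_left_width + self.border_right_width +
                    self.margin_left + self.margin_right)


    box = Box(position_x=0, position_y=100, width=200, height=50,
              margin=10, border=1, padding=5)
    print(box.content_box_y())  # 116
    print(box.margin_width())   # 232
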

The final result of the layout is a list of :class:`PageBox` objects.

.. [#] These are the coordinates *if* no `CSS transform`_ applies.
       Transforms change the actual location of boxes, but they are applied
       later during drawing and do not affect layout.

.. _used values: http://www.w3.org/TR/CSS21/cascade.html#used-value
.. _CSS transform: http://www.w3.org/TR/css3-transforms/

Stacking & Drawing
..................

In step 6, the boxes are reordered by the :mod:`weasyprint.stacking` module
to observe `stacking rules`_ such as the ``z-index`` property.
The result is a tree of *stacking contexts*.
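
A much reduced sketch of the ``z-index`` part of these rules: child stacking
contexts with a negative ``z-index`` are painted before the context’s own
content, the others after it, and ties keep tree order. The ``paint_order``
function and the ``(name, z_index)`` pairs are hypothetical, and the other
layers from the CSS 2.1 stacking rules (backgrounds, floats, inline content)
are left out.

.. code-block:: python

    def paint_order(own_content, child_contexts):
        """Return names in painting order, back to front.

        ``child_contexts`` is a list of ``(name, z_index)`` pairs in tree
        order. ``sorted`` is stable, so equal z-indexes keep that order.
        """
        by_z_index = sorted(child_contexts, key=lambda context: context[1])
        negative = [name for name, z_index in by_z_index if z_index < 0]
        others = [name for name, z_index in by_z_index if z_index >= 0]
        return negative + list(own_content) + others


    order = paint_order(
        ['text'],
        [('menu', 2), ('shadow', -1), ('badge', 0), ('tooltip', 2)])
    print(order)
    # ['shadow', 'text', 'badge', 'menu', 'tooltip']
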

Next, each laid-out page is *drawn* onto a cairo_ surface. Since each box has
absolute coordinates on the page from the layout step, the logic here should be
minimal. If you find yourself adding a lot of logic here, maybe it should go in
the layout or stacking instead.

The code lives in the :mod:`weasyprint.draw` module.

.. _stacking rules: http://www.w3.org/TR/CSS21/zindex.html
.. _cairo: http://cairographics.org/pycairo/

Metadata
........

Finally (step 7), the :mod:`weasyprint.pdf` module parses the PDF file
produced by cairo and appends to it to add metadata:
internal and external hyperlinks, as well as outlines / bookmarks.