Documentation about how to write code documentation (#891)

* add initial guidelines of code documentation
* fix math formula not displayed in Sphinx
* remove @name tags which cannot be extracted by exhale and cause function signature errors
* fix markdown ref warning and update markdown parser in sphinx
* more about doxygen: add Doxygen commands and math formulas
* move code doc guide to a new .rst file
* add formula image
* Set myst-parser version appropriate for the requested sphinx version
* Update documentation on how to write Doxygen comments
* Add new section to the documentation index
* Sphinx 2.4.4 requires myst-parser 0.14
* complete code doc guide and small fixes on reStructuredText formats
* More about reStructuredText
* Update badges on the documentation frontpage

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2024-11-03 20:13:47 +03:00 · 2021-12-07 15:10:46 +00:00 · 2021-12-07 15:10:46 +00:00 · cd9afea8d3
commit cd9afea8d3
parent c64cb2990e
10 changed files with 424 additions and 80 deletions
--- a/doc/README.md
+++ b/doc/README.md
@ -23,7 +23,7 @@ Then set up a Python environment and install modules:
    pip3 install virtualenv
    virtualenv venv -p python3
    source venv/bin/activate
-    pip install -r requirements.txt
+    pip3 install -r requirements.txt

 Documentation building should also work on Windows, but it has not been tested.

@ -48,4 +48,22 @@ Directories:

 ## Writing documentation

-To be documented...
+See [this section](src/doc_guide.rst) in the documentation for detailed recommendations on how to
+write code and user documentation in Marian.
+
+In a nutshell, each class, struct or function should have a Doxygen comment following the basic
+template of:
+
+    /**
+     * Brief summary.
+     * Detailed description. More detail.
+     * @see Some reference
+     * @param <name> Parameter description
+     * @return Return value description
+     */
+     std::string function(int param);
+
+And attributes should be documented with an inline comment, for example:
+
+    int var; ///< Brief description
+
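The README template above can be applied to a struct and a function as in the sketch below. This is a minimal illustration only: `SmoothingOptions` and `describeSmoothing` are hypothetical names invented for this example and are not part of Marian.

```cpp
#include <cassert>
#include <string>

/**
 * Options controlling label smoothing (hypothetical example type).
 * Groups the settings read by describeSmoothing(); not part of Marian.
 */
struct SmoothingOptions {
  float alpha = 0.f;      ///< Smoothing factor in [0, 1]; 0 disables smoothing
  bool perToken = false;  ///< Apply smoothing per token instead of per sentence
};

/**
 * Describe the smoothing configuration as text.
 * Produces a short human-readable summary that could be written to a log.
 * @see SmoothingOptions
 * @param opts Options to describe
 * @return "off" if smoothing is disabled, otherwise "per-token" or "per-sentence"
 */
inline std::string describeSmoothing(const SmoothingOptions& opts) {
  assert(opts.alpha >= 0.f && opts.alpha <= 1.f);  // documented range of alpha
  if(opts.alpha == 0.f)
    return "off";
  return opts.perToken ? "per-token" : "per-sentence";
}
```

The struct and the function each carry a block comment whose first sentence serves as the brief description, while the attributes use the trailing `///<` form.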
--- a/doc/conf.py
+++ b/doc/conf.py
@ -37,11 +37,11 @@ release = version + ' ' + str(datetime.date.today())
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    'sphinx.ext.imgmath',
+    'sphinx.ext.mathjax',
    'sphinx.ext.todo',
    'breathe',
    'exhale',
-    'recommonmark',
+    'myst_parser',
 ]

 # Add any paths that contain templates here, relative to this directory.
@ -57,6 +57,13 @@ exclude_patterns = [
    'README.md',
 ]

+# The file extensions of source files. Sphinx considers the files with
+# this suffix as sources. By default, Sphinx only supports 'restructuredtext'
+# file type. You can add a new file type using source parser extensions.
+source_suffix = {
+    '.rst': 'restructuredtext',
+    '.md': 'markdown',
+}

 # -- Options for HTML output -------------------------------------------------

@ -91,6 +98,7 @@ EXTENSION_MAPPING   += cu=C++ inc=C++
 ENABLE_PREPROCESSING = YES
 JAVADOC_AUTOBRIEF    = YES
 WARN_IF_UNDOCUMENTED = NO
+USE_MATHJAX          = YES
 """

 exhale_args = {
@ -100,6 +108,7 @@ exhale_args = {
    'doxygenStripFromPath'  : '..',
    'createTreeView'        : True,
    'exhaleExecutesDoxygen' : True,
+    # 'verboseBuild'          : True, # set True for debugging
    'exhaleDoxygenStdin'    : doxygen_config.strip(),
 }

--- a/doc/doc_guide.rst
+++ b/doc/doc_guide.rst
@ -0,0 +1,336 @@
+Writing documentation
+---------------------
+
+Marian’s documentation is generated using `Sphinx`_ + `Breathe`_ + `Doxygen`_ + `Exhale`_.
+`Doxygen`_ is used for documenting the source code and `Sphinx`_ (together with the extensions of
+`Breathe`_ and `Exhale`_) for managing handwritten documentation and generating library API
+reference.
+
+Whenever you add new code or propose changes to Marian, we would highly appreciate if you also add
+new Doxygen comments or update existing ones as needed along with your changes (see the `Doxygen
+guidelines`_ below). Your Doxygen comments will be integrated in the Marian’s documentation
+automatically.
+
+There is an ongoing and incremental effort with the goal of documenting essential Marian API in a
+consistent way. The existing code might not follow these guidelines, but new code should.
+
+
+Code documentation with Doxygen
+```````````````````````````````
+
+`Doxygen`_ is a powerful documentation system for C++ and many other languages that parses and
+extracts documentation comments included in the source code to generate a comprehensive
+documentation, for example, in HTML or LaTeX format.
+
+Doxygen basics
+**************
+
+Doxygen recognises several special comment blocks with some additional markings. In Marian, we
+follow the **Javadoc style**, which consists of a C-style comment block starting with two ``*``'s,
+like this:
+
+.. code:: cpp
+
+    /**
+     * ... text ...
+     */
+
+A documentation comment for all main entities in the code (e.g. classes, functions, methods, etc.)
+always includes two sections: a *brief* summary and a *detailed* description. In Marian, a
+Javadoc-style comment block automatically starts a brief description, which ends at the first dot
+followed by a space or new line (so there is no need for the ``@brief`` keyword). Here is an example:
+
+.. code:: cpp
+
+    /**
+     *  Brief description which ends at this dot. Details follow
+     *  here.
+     */
+
+If you want to put documentation after members (e.g., a variable or an enum), you have to put an
+additional ``<`` marker in the comment block.
+
+.. code:: cpp
+
+    int var; ///< Brief description after the member
+
+Doxygen commands
+****************
+
+More details in the documentation can be provided using Doxygen's special commands
+(keywords) which start with an at-sign (@).  See `Doxygen special commands`_ for the complete list
+of available commands. Here, we list the most common Doxygen commands, which we use to document
+Marian:
+
+-----------------------+-----------------------+-----------------------+
+| Doxygen Command       | Detailed Description  | Example               |
+=======================+=======================+=======================+
+| @param                | Add a parameter       | ``@param device a     |
+|                       | description for a     | pointer to the        |
+|                       | function parameter    | device``              |
+-----------------------+-----------------------+-----------------------+
+| @return               | Add a return value    | ``@return a pointer   |
+|                       | description for a     | to the constant       |
+|                       | function              | node``                |
+-----------------------+-----------------------+-----------------------+
+| @see                  | Add a cross-reference | ``@see reshape()``    |
+|                       | to classes,           |                       |
+|                       | functions, methods,   |                       |
+|                       | variables, files or   |                       |
+|                       | URL                   |                       |
+-----------------------+-----------------------+-----------------------+
+| @ref                  | Create a reference to | ``@ref IndexType``    |
+|                       | another item being    |                       |
+|                       | documented.           |                       |
+-----------------------+-----------------------+-----------------------+
+| @copybrief            | Copy the brief        | ``@copybrief slice``  |
+|                       | description from the  |                       |
+|                       | object specified      |                       |
+-----------------------+-----------------------+-----------------------+
+| @copydetails          | Copy the detailed     | ``@copydetails dot``  |
+|                       | documentation from    |                       |
+|                       | the object specified  |                       |
+-----------------------+-----------------------+-----------------------+
+| @note                 | Add a note message    | ``@note this is named |
+|                       | where the text will   | after an equivalent   |
+|                       | be highlighted        | function in PyTorch`` |
+-----------------------+-----------------------+-----------------------+
+| @warning              | Add a warning message | ``@warning            |
+|                       | where the text will   | not implemented``     |
+|                       | be highlighted        |                       |
+-----------------------+-----------------------+-----------------------+
+| @b                    | Display a single word | ``@b bold``           |
+|                       | using a bold font     |                       |
+-----------------------+-----------------------+-----------------------+
+| @c                    | Display a single word | ``@c void``           |
+|                       | using a typewriter    |                       |
+|                       | font                  |                       |
+-----------------------+-----------------------+-----------------------+
+| @p                    | Display a single word | ``@p transA``         |
+|                       | using a typewriter    |                       |
+|                       | font. Equivalent to   |                       |
+|                       | ``@c``                |                       |
+-----------------------+-----------------------+-----------------------+
+| @em                   | Display a single word | ``@em x``             |
+|                       | in italics.           |                       |
+-----------------------+-----------------------+-----------------------+
+
+.. warning::
+
+    Not all Doxygen special commands are supported in Exhale, e.g., `grouping`_
+    [`1 <https://exhale.readthedocs.io/en/latest/faq.html#my-documentation-is-setup-using-groups-how-can-i-use-exhale>`_].
+    Some commands like `@name`_ could lead to errors when parsing overloaded functions.
+    To avoid spending hours debugging Doxygen comments, we recommend using only the commands listed
+    above.
+
+Math formulas in Doxygen
+************************
+
+Doxygen supports LaTeX math formulas in the documentation. To include an inline formula that appears
+in the running text, we need to wrap it in a pair of ``@f$`` commands, for example:
+
+.. code:: none
+
+    Default is no smoothing, @f$\alpha = 0 @f$.
+
+This will result in: Default is no smoothing, |formula1|
+
+.. |formula1| image:: images/formula1.png
+
+For longer formulas displayed on separate lines, put the ``@f[`` and ``@f]`` commands around the
+formula, for instance:
+
+.. code:: none
+
+    @f[
+       \operatorname{gelu}(x) = x \cdot \Phi(x)
+         = x \cdot \frac{1}{2}\left[
+            1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)
+         \right]
+         \sim \operatorname{swish}(x, 1.702)
+    @f]
+
+This will result in:
+
+.. figure:: images/gelu_formula.png
+   :alt: Example of formula 2
+
+   Example of formula 2
+
+.. note::
+
+    Make sure the formula contains *valid* commands in `LaTeX’s math-mode`_.
+
+Recommendations
+***************
+
+First of all, add Doxygen comments in the header files. You can find examples of Doxygen
+comments in `src/graph/expression_graph.h`_.  A good practice is to keep Doxygen comments as
+intuitive and short as possible. Try not to introduce unnecessary vertical space (e.g., an empty
+line). A basic template of Doxygen comments is shown as follows:
+
+.. code:: cpp
+
+    /**
+     * Brief summary.
+     * Detailed description. More detail.
+     * @see Some reference
+     * @param <name> Parameter description.
+     * @return Return value description.
+     */
+
+
+User documentation with Sphinx
+``````````````````````````````
+
+Sphinx supports `reStructuredText`_ and `Markdown`_ documents. Marian's user documentation files are
+located in `doc`_.  The default format of Sphinx is `reStructuredText`_ and most of the framework's
+power comes from the richness of its default `reStructuredText`_ markup format.
+
+
+reStructuredText
+****************
+
+As Marian’s documentation is generated using `Sphinx`_ + `Breathe`_ + `Doxygen`_ + `Exhale`_,
+reStructuredText is the best language to use if you need to utilise many ``directives`` generated by
+Sphinx / Breathe / Exhale and are not satisfied with Markdown features as mentioned :ref:`below
+<md-section>`.
+
+There are many useful ``directives`` supported by Sphinx / Breathe / Exhale which you could use in
+your user documentation. Here we highlight the most useful directives when documenting Marian.
+For the complete reStructuredText syntax guide, please refer to the `mini guide`_ provided by
+`Exhale`_. The Sphinx docs also cover the most important aspects of reStructuredText. Read more in the
+`reStructuredText tutorials provided Sphinx`_.
+
+The first useful set of directives are `Breathe directives`_ which are used to include documentation
+for different constructs. The available commands are listed below:
+
+ .. code:: none
+
+    .. doxygenindex::
+    .. doxygenfunction::
+    .. doxygenstruct::
+    .. doxygenenum::
+    .. doxygentypedef::
+    .. doxygenclass::
+
+The second set is the `Exhale directives`_, which are used to link to different constructs. The available
+commands are listed below:
+
+ .. code:: none
+
+    :class:`namespace::ClassName`
+    :func:`namespace::ClassName::methodName`
+    :member:`namespace::ClassName::mMemberName`
+    :func:`namespace::funcName`
+
+.. tip::
+    1. reStructuredText is particularly sensitive to whitespace! If the rendered text does not turn
+       out as you expected, double-check the spaces and newlines.
+    2. It takes several minutes to build Marian's documentation (mostly due to Exhale). If you work
+       on user documentation and need to check the rendered result frequently, you can comment out
+       the exhale extension in ``conf.py`` file once :doc:`Marian code documentation
+       <api/library_index>` is generated (i.e., building the whole documentation once). This will
+       greatly speed up the documentation building.
+
+.. _md-section:
+
+Markdown
+********
+
+Although reStructuredText is more powerful than Markdown, it might feel less intuitive if you have
+never used it before. Sphinx docs now use `MyST-Parser`_ as a default extension for handling
+Markdown, which adds more Markdown-friendly syntax for the purpose of the documentation, in addition
+to the `CommonMark`_ features. Read more in the `MyST-Parser documentation`_.
+
+For instance, MyST-Parser supports `directives syntax`_, a generic block of explicit markup syntax
+available in reStructuredText, such as ``note admonitions``:
+
+ .. code:: none
+
+    ```{note} Notes require **no** arguments, so content can start here.
+    ```
+
+The above markdown text will be rendered as below:
+
+ .. note::
+
+    Notes require **no** arguments, so content can start here.
+
+Another useful feature is that you can include reStructuredText text/files into a Markdown file.
+This means you can take advantage of ``directives`` generated by Sphinx / Breathe / Exhale with
+ease, especially if you want to highlight/reference the functions or classes in :doc:`Marian code
+documentation <api/library_index>`.
+In general, Sphinx only supports reStructuredText commands (such as `sphinx.ext.autodoc`_ and
+`Breathe directives`_) to interact with the code documentation [`2 <https://myst-parser.readthedocs.io/en/latest/sphinx/use.html#>`_].
+
+For example, let's assume that you want to include the function documentation of
+``marian::inits::fromValue ( float )`` in the user documentation. You can use the following `Breathe
+doxygenfunction directive`_ for this:
+
+.. doxygenfunction:: marian::inits::fromValue(float)
+
+To display exactly the same content as above, MyST-Parser offers the special `eval-rst directive`_
+to wrap reStructuredText directives:
+
+ .. code:: none
+
+    ```{eval-rst}
+    .. doxygenfunction:: marian::inits::fromValue(float)
+    ```
+
+Also, you can link functions or classes in :doc:`Marian code documentation <api/library_index>` with
+`eval-rst directive`_. For example, to link ``marian::inits::fromValue(float)`` you can use the
+following markdown syntax:
+
+ .. code:: none
+
+    ```{eval-rst}
+     Link to :func:`marian::inits::fromValue`
+    ```
+
+Or you can directly link to the function in `markdown hyperlink syntax`_:
+
+ .. code:: none
+
+    Link to [`marian::inits::fromValue(float)`](api/function_namespacemarian_1_1inits_1a71bb6dee3704c85c5f63a97eead43a1e.html#_CPPv4N6marian5inits9fromValueEf)
+
+Both outputs will be rendered with a clickable hyperlink to ``marian::inits::fromValue(float)`` in
+the corresponding Library API page (as shown below):
+
+   Link to :func:`marian::inits::fromValue`
+
+.. note::
+
+    The reference link for ``marian::inits::fromValue(float)`` is generated by `Exhale`_. For more
+    information about how to cross-reference the code documentation, see `Exhale's linking
+    strategy`_.
+
+
+.. _Sphinx: https://www.sphinx-doc.org/en/master/usage/quickstart.html
+.. _Breathe: https://breathe.readthedocs.io/en/latest/directives.html
+.. _Doxygen: http://www.doxygen.nl/manual/docblocks.html
+.. _Exhale: https://exhale.readthedocs.io/en/latest/usage.html
+.. _Doxygen guidelines: #code-documentation-with-doxygen
+.. _JAVADOC_AUTOBRIEF: https://www.doxygen.nl/manual/config.html#cfg_javadoc_autobrief
+.. _Doxygen special commands: https://www.doxygen.nl/manual/commands.html
+.. _grouping: https://www.doxygen.nl/manual/grouping.html
+.. _@name: https://www.doxygen.nl/manual/commands.html#cmdname
+.. _LaTeX’s math-mode: https://en.wikibooks.org/wiki/LaTeX/Mathematics
+.. _src/graph/expression_graph.h: https://github.com/marian-nmt/marian-dev/blob/master/src/graph/expression_graph.h
+.. _Markdown: https://www.sphinx-doc.org/en/master/usage/markdown.html
+.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html
+.. _doc: https://github.com/marian-nmt/marian-dev/tree/master/doc
+.. _MyST-Parser: https://www.sphinx-doc.org/en/master/usage/markdown.html
+.. _MyST-Parser documentation: https://myst-parser.readthedocs.io/en/latest/syntax/optional.html
+.. _CommonMark: https://commonmark.org
+.. _directives syntax: https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#directives-a-block-level-extension-point
+.. _Breathe directives: https://breathe.readthedocs.io/en/latest/directives.html
+.. _Breathe doxygenfunction directive: https://breathe.readthedocs.io/en/latest/directives.html#doxygenfunction
+.. _sphinx.ext.autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc
+.. _eval-rst directive: https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#syntax-directives-parsing
+.. _Exhale's linking strategy: https://exhale.readthedocs.io/en/latest/usage.html#linking-to-a-generated-file
+.. _mini guide: https://exhale.readthedocs.io/en/latest/mastering_doxygen.html#features-available-by-using-sphinx-breathe-exhale-by-way-of-restructuredtext
+.. _reStructuredText tutorials provided Sphinx: https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html
+.. _markdown hyperlink syntax: https://www.markdownguide.org/basic-syntax/#links
+.. _Exhale directives: https://exhale.readthedocs.io/en/latest/usage.html#suggested-restructuredtext-linking-strategy
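To tie the guide's pieces together, here is a sketch of one function documented with the recommended commands and both formula forms. It is illustrative only: `geluScalar` is a hypothetical scalar helper written for this example, not Marian's `gelu()` operator, and it assumes Doxygen is run with `JAVADOC_AUTOBRIEF` and `USE_MATHJAX` enabled as in `conf.py` above.

```cpp
#include <cmath>

/**
 * Gaussian Error Linear Unit for a single value.
 * Computes @f$ x \cdot \Phi(x) @f$, where @f$ \Phi @f$ is the standard normal CDF:
 * @f[
 *    \operatorname{gelu}(x) = x \cdot \frac{1}{2}\left[
 *       1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)
 *    \right]
 * @f]
 * @param x Input value
 * @return The activation @f$ x \cdot \Phi(x) @f$
 * @note Scalar illustration only; hypothetical helper, not part of Marian.
 */
inline double geluScalar(double x) {
  // Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
  return x * 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));
}
```

The brief description ends at the first full stop, the inline formulas use paired `@f$` commands, and the displayed formula sits between `@f[` and `@f]`, matching the conventions described in the guide.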
--- a/doc/graph.md
+++ b/doc/graph.md
@ -6,7 +6,7 @@ The dynamic declaration, which means a new graph is created for each training in
 It allows handling of variably sized inputs, as well as the cases where the graph may change depending on the results of previous steps.
 Compared to static declaration, a dynamic computation graph could be expensive in terms of creating and optimising computation graphs.
 Marian uses careful memory management to remove overhead in computation graph construction, and supports efficient execution on both CPU and GPU.
-The main implementation of computation graph is in under [`src/graph`](api/dir_src_graph.html#dir-src-graph) directory.  
+The main implementation of the computation graph is under the [src/graph](dir_src_graph) directory.  

 Building blocks for graphs:

@ -59,7 +59,7 @@ The _workspace memory_ means the size of the memory available for the forward an
 This does not include model size and optimizer parameters that are allocated outsize workspace. 
 Hence you cannot allocate all device memory to the workspace.

-To create a graph, Marian offer a set of shortcut functions that implements the common expression operators for a neural network (see [`src/graph/expression_operators.h`](api/program_listing_file_src_graph_expression_operators.h.html)), such as `affine()`. 
+To create a graph, Marian offers a set of shortcut functions that implement the common expression operators for a neural network (see [src/graph/expression_operators.h](file_src_graph_expression_operators.h)), such as `affine()`. 
 These functions actually construct the corresponding operation nodes in the graph, make links with other nodes. 
 E.g., `affine()` construct a `AffineNodeOp` node in the graph. 
 Thus, building a graph turns into a simple task of defining expressions by using those functions. 
@ -142,7 +142,7 @@ auto x = graph->constant({N, NUM_FEATURES}, inits::fromVector(inputData));

 For the above example, the shape of the constant node is `{N, NUM_FEATURES}`, and the value of the constant node is initialised from a vector `inputData`.
 `inits::fromVector()` returns a `NodeInitializer` which is a functor used to initialise a tensor by copying from the given vector. 
-More functions used to initialise a node can be found in [`src/graph/node_initializers.h`](api/namespace_marian__inits.html#namespace-marian-inits) file. 
+More functions used to initialise a node can be found in the [src/graph/node_initializers.h](namespace_marian__inits) file. 
 Marian also provides some shortcut functions to construct special constant nodes, such as `ones()` and `zeros()`:

 ```cpp
@ -186,7 +186,7 @@ auto h = tanh(affine(x, W1, b1));
 ```

 In the above example, `affine()` and `tanh()` actually add `AffineNodeOp` and `TanhNodeOp` nodes to the graph. 
-For more shortcut functions used to add operations in the graph, you can find in [`src/graph/expression_operators.h`](api/program_listing_file_src_graph_expression_operators.h.html) file.
+More shortcut functions for adding operations to the graph can be found in the [src/graph/expression_operators.h](file_src_graph_expression_operators.h) file.

 ## Graph execution

@ -279,7 +279,7 @@ This comes to how we define the loss function and optimiser for the graph.
 A loss function is used to calculate the model error between the predicted value and the actual value. 
 The goal is to minimise this error during training. 
 In a graph, the loss function is also represented as a group of node(s). 
-You can also use the operators provided in [`expression_operators.h`](api/program_listing_file_src_graph_expression_operators.h.html) file to define the loss function. 
+You can also use the operators provided in the [src/graph/expression_operators.h](file_src_graph_expression_operators.h) file to define the loss function. 
 E.g., Marian offers `cross_entropy()` function to compute the cross-entropy loss between true labels and predicted labels.

 **Define a loss function for modified Example 1**
--- a/doc/images/formula1.png
+++ b/doc/images/formula1.png
--- a/doc/images/gelu_formula.png
+++ b/doc/images/gelu_formula.png
--- a/doc/index.rst
+++ b/doc/index.rst
@ -19,6 +19,8 @@ This is developer documentation. User documentation is available at https://mari

   contributing

+   doc_guide
+

 Indices and tables
 ------------------
@ -26,8 +28,8 @@ Indices and tables
 * :ref:`genindex`


-.. |buildgpu| image:: https://img.shields.io/jenkins/s/http/vali.inf.ed.ac.uk/jenkins/view/marian/job/marian-dev-cuda-10.1.svg?label=CUDAC%20Build
-   :target: http://vali.inf.ed.ac.uk/jenkins/job/marian-dev/
+.. |buildgpu| image:: https://img.shields.io/jenkins/s/http/vali.inf.ed.ac.uk/jenkins/view/marian/job/marian-dev-cuda-10.2.svg?label=CUDAC%20Build
+   :target: http://vali.inf.ed.ac.uk/jenkins/job/marian-dev-cuda-10.2/
   :alt: GPU build status

 .. |buildcpu| image:: https://img.shields.io/jenkins/s/http/vali.inf.ed.ac.uk/jenkins/view/marian/job/marian-dev-cpu.svg?label=CPU%20Build
--- a/doc/operators.md
+++ b/doc/operators.md
@ -25,7 +25,7 @@ Marian.
 The central component in the graph is the `Chainable<Tensor>` object. This
 object provides the abstract interface necessary to interact with elements in
 the computation graph. The details of this interface can be found in
-[/src/graph/chainable.h](api/file_src_graph_chainable.h.html). Note that the
+[/src/graph/chainable.h](file_src_graph_chainable.h). Note that the
 template parameter corresponds to the underlying data structure, which in Marian
 is the `Tensor`. Therefore, for convenience, the type `Expr` is defined:

@ -37,22 +37,22 @@ The implementation of the different operator components are divided across
 several files:

  - Expression Operator
-    - [/src/graph/expression_operators.h](api/file_src_graph_expression_operators.h.html)
-    - [/src/graph/expression_operators.cpp](api/file_src_graph_expression_operators.cpp.html)
+    - [/src/graph/expression_operators.h](file_src_graph_expression_operators.h)
+    - [/src/graph/expression_operators.cpp](file_src_graph_expression_operators.cpp)
  - Node Operator
-    - [/src/graph/node_operators_unary.h](api/file_src_graph_node_operators_unary.h.html)
-    - [/src/graph/node_operators_binary.h](api/file_src_graph_node_operators_binary.h.html)
-    - [/src/graph/node_operators_tuple.h](api/file_src_graph_node_operators_tuple.h.html)
+    - [/src/graph/node_operators_unary.h](file_src_graph_node_operators_unary.h)
+    - [/src/graph/node_operators_binary.h](file_src_graph_node_operators_binary.h)
+    - [/src/graph/node_operators_tuple.h](file_src_graph_node_operators_tuple.h)
  - Functional Operator
-    - [/src/functional/operators.h](api/file_src_functional_operators.h.html)
+    - [/src/functional/operators.h](file_src_functional_operators.h)
  - Tensor operation
-    - [/src/tensors/tensor_operators.h](api/file_src_tensors_tensor_operators.h.html)
-    - [/src/tensors/cpu/tensor_operators.cpp](api/file_src_tensors_cpu_tensor_operators.cpp.html)
-    - [/src/tensors/gpu/tensor_operators.cu](api/file_src_tensors_gpu_tensor_operators.cu.html)
+    - [/src/tensors/tensor_operators.h](file_src_tensors_tensor_operators.h)
+    - [/src/tensors/cpu/tensor_operators.cpp](file_src_tensors_cpu_tensor_operators.cpp)
+    - [/src/tensors/gpu/tensor_operators.cu](file_src_tensors_gpu_tensor_operators.cu)
  - Declared Specialization
-    - [/src/tensors/gpu/element.inc](api/program_listing_file_src_tensors_gpu_element.inc.html)
-    - [/src/tensors/gpu/add.inc](api/program_listing_file_src_tensors_gpu_add.inc.html)
-    - [/src/tensors/gpu/add_all.inc](api/program_listing_file_src_tensors_gpu_add_all.inc.html)
+    - [/src/tensors/gpu/element.inc](program_listing_file_src_tensors_gpu_element.inc)
+    - [/src/tensors/gpu/add.inc](program_listing_file_src_tensors_gpu_add.inc)
+    - [/src/tensors/gpu/add_all.inc](program_listing_file_src_tensors_gpu_add_all.inc)

 To understand how the different components are inter-linked, we'll look at each
 of them in turn.
@ -197,7 +197,7 @@ this example code, these are optional and, when omitted, calling
 `NaryNodeOp({a})` would result in a node with the same shape and type as `a`.
 The `type()` method returns the friendly name for the node. Note that the
 [ONNX](https://onnx.ai)
-[interface](api/program_listing_file_src_onnx_expression_graph_onnx_serialization.cpp.html)
+[interface](program_listing_file_src_onnx_expression_graph_onnx_serialization.cpp)
 maintains a mapping of these friendly names to their ONNX representation. In the
 absence of any member variables the `hash()` and `equal()` methods can be
 omitted, and defer to their `NaryNodeOp` definition. However, if such variables
@ -244,7 +244,7 @@ _1 = sin(_2)
 ```

 The placeholders `_1`, `_2` are enabled by code in
-[/src/functional](api/dir_src_functional.html) and interoperate with the
+[/src/functional](dir_src_functional) and interoperate with the
 functional operators. In the call to `Element`, `val_` is assigned to `_1` and
 `child(0)->val()` to `_2`. Therefore, this has the action of setting the
 elements of this node to the result obtained by applying `sin` to the elements
@ -328,7 +328,7 @@ specialization required for each type. The current required types are:
  - half (see `cuda_fp16.h` in the CUDA Math API)

 Further details are available in
-[/src/common/types.h](api/file_src_common_types.h.html).
+[/src/common/types.h](file_src_common_types.h).

 Returning to the example of `sin(x)`, the specialization for `float` and
 `double` requires
@ -355,12 +355,12 @@ struct Ops<double> {
 ```

 The remaining specializations can be seen in
-[/src/functional/operators.h](api/file_src_functional_operators.h.html). Note
+[/src/functional/operators.h](file_src_functional_operators.h). Note
 that the general template must produce a runtime abort.

 The final component of the functional operator is to call the macro that enables
 interoperability with the framework of
-[/src/functional](api/dir_src_functional.html). For a unary operator, this is
+[/src/functional](dir_src_functional). For a unary operator, this is
 the macro `UNARY`.

 ```cpp
@ -392,7 +392,7 @@ representation.

 Furthermore, the OpenMPI and OpenMP libraries are employed for parallelisation.
 While macros provided in
-[/src/common/definitions.h](api/file_src_common_definitions.h.html) locally
+[/src/common/definitions.h](file_src_common_definitions.h) locally
 enable faster floating-point math in supported compilers.

 ```cpp
@ -402,14 +402,14 @@ MARIAN_FFAST_MATH_END
 ```

 The usual caveats apply when enabling `fast_math`, and can be found in
-[/src/common/definitions.h](api/file_src_common_definitions.h.html)
+[/src/common/definitions.h](file_src_common_definitions.h)

 Tensor operators are declared in
-[/src/tensors/tensor_operators.h](api/file_src_tensors_tensor_operators.h.html),
+[/src/tensors/tensor_operators.h](file_src_tensors_tensor_operators.h),
 these are device-agnostic function that call the relevant device-specific
 implementation. The CPU- and GPU-specific implementation are defined in `cpu`
-namespace in [/src/tensors/cpu/](api/dir_src_tensors_cpu.html) and the `gpu`
-namespace [/src/tensors/gpu/](api/dir_src_tensors_gpu.html). Therefore a typical
+namespace in [/src/tensors/cpu/](dir_src_tensors_cpu) and the `gpu`
+namespace [/src/tensors/gpu/](dir_src_tensors_gpu). Therefore a typical
 operator defers to an implementation in the device-specific namespace.

 ```cpp
@ -461,16 +461,16 @@ compilation:
 ```

 To fix these undefined references, we must explicitly add the specialization to
-the `.inc` files of [/src/tensors/gpu/](api/dir_src_tensors_gpu.html). Each
+the `.inc` files of [/src/tensors/gpu/](dir_src_tensors_gpu). Each
 `.inc` file is included at the end of its corresponding `.cu` file, ensuring
 that the specialization is compiled.

 The undefined references should be added to the `.inc` file that corresponds to
 the header file in which contains the declaration of the missing functions.

-The file [element.inc](api/file_src_tensors_gpu_element.inc.html) contains the
+The file [element.inc](file_src_tensors_gpu_element.inc) contains the
 specializations of the function defined in
-[element.h](api/file_src_tensors_gpu_element.h.html):
+[element.h](file_src_tensors_gpu_element.h):

 ```cpp
 // src/tensors/gpu/element.h
@ -478,9 +478,9 @@ template <class Functor, class... Tensors>
 void Element(Functor functor, Tensor out, Tensors... tensors);
 ```

-Similarly, [add.inc](api/file_src_tensors_gpu_add.inc.html) contains the
+Similarly, [add.inc](file_src_tensors_gpu_add.inc) contains the
 specializations for functions matching either of the two signatures in
-[add.h](api/file_src_tensors_gpu_add.h.html):
+[add.h](file_src_tensors_gpu_add.h):

 ```cpp
 // src/tensors/gpu/add.h
@ -491,8 +491,8 @@ template <class Functor, class AggFunctor, class... Tensors>
 void Aggregate(Functor functor, float initAgg, AggFunctor aggFunctor, float scale, marian::Tensor out, Tensors... tensors);
 ```

-Finally [add_all.inc](api/file_src_tensors_gpu_add_all.inc.html) contains the
-specializations for [add_all.h](api/file_src_tensors_gpu_add_all.h.html), which
+Finally [add_all.inc](file_src_tensors_gpu_add_all.inc) contains the
+specializations for [add_all.h](file_src_tensors_gpu_add_all.h), which
 are several versions of:

 ```cpp
@ -507,7 +507,7 @@ void AggregateAll(Ptr<Allocator> allocator,
                  const Tensor in1);
 ```

-However, for [add_all.h](api/file_src_tensors_gpu_add_all.h.html), there is an
+However, for [add_all.h](file_src_tensors_gpu_add_all.h), there is an
 additional type dependence in the first template parameter, which requires two
 entries:

--- a/doc/requirements.txt
+++ b/doc/requirements.txt
@ -2,6 +2,7 @@ sphinx==2.4.4
 breathe==4.13.0
 exhale
 sphinx_rtd_theme
-recommonmark
+myst-parser==0.14.0a3
 mistune<2.0.0
 m2r
+sphinx-mathjax-offline
--- a/src/graph/expression_operators.h
+++ b/src/graph/expression_operators.h
@ -19,7 +19,7 @@ Expr checkpoint(Expr a);
 typedef Expr(ActivationFunction)(Expr);  ///< ActivationFunction has signature Expr(Expr)

 /**
- * Convience typedef for graph @ref lambda expressions.
+ * Convenience typedef for graph @ref lambda expressions.
 */
 typedef std::function<void(Expr out, const std::vector<Expr>& in)> LambdaNodeFunctor;

@ -114,7 +114,7 @@ Expr tanh(const std::vector<Expr>& nodes);

 /**
 * @copybrief tanh
- * Convience function to put parameter pack @p Args into a Expr vector
+ * Convenience function to put parameter pack @p Args into a Expr vector
 */
 template <typename... Args>
 Expr tanh(Args... args) {
@ -188,8 +188,7 @@ Expr prelu(const std::vector<Expr>&, float alpha = 0.01);
 * @{
 */

-///@name Exponentiation and Logarithmic functions
-///@{
+// Exponentiation and Logarithmic functions
 /**
 * Natural logarithm.
 * Computes the element-wise natural logarithm of the expression: @f$ \log(a) @f$
@ -203,10 +202,8 @@ Expr log(Expr a);
 * @see ExpNodeOp
 */
 Expr exp(Expr a);
-///@}

-///@name Trigonometric functions
-///@{
+// Trigonometric functions
 /**
 * Sine. Computes the element-wise sine of the expression: @f$ \sin(a) @f$.
 * @see SinNodeOp
@ -225,7 +222,6 @@ Expr cos(Expr a);
 */
 Expr tan(Expr a);
 ///@}
-///@}

 /**
 * @addtogroup graph_ops_arithmetic Arithmetic
@ -238,52 +234,42 @@ Expr tan(Expr a);
 * Returns @f$ -a @f$.
 * @see NegNodeOp for implementation.
 */
-///@{
 Expr operator-(Expr a);
-///@}

 /*********************************************************/

 /**
- * @name Addition
+ * Addition
 * Performs @f$ a + b @f$ in the expression graph.
 */
-///@{
 Expr operator+(Expr a, Expr b);   ///< @see Implementation in PlusNodeOp
 Expr operator+(float a, Expr b);  ///< @see Implementation in ScalarAddNodeOp
 Expr operator+(Expr a, float b);  ///< @see Implementation in ScalarAddNodeOp
-///@}

 /**
- * @name Subtraction
+ * Subtraction
 * Performs @f$ a - b @f$ in the expression graph.
 */
-///@{
 Expr operator-(Expr a, Expr b);   ///< @see Implementation in MinusNodeOp
 Expr operator-(float a, Expr b);  ///< @see Implementation in ScalarAddNodeOp
 Expr operator-(Expr a, float b);  ///< @see Implementation in ScalarAddNodeOp
-///@}

 /**
- * @name Multiplication
+ * Multiplication
 * Performs @f$ a * b @f$ in the expression graph.
 */
-///@{
 Expr operator*(Expr a, Expr b);   ///< @see Implementation in MultNodeOp
 Expr operator*(float a, Expr b);  ///< @see Implementation in ScalarMultNodeOp
 Expr operator*(Expr a, float b);  ///< @see Implementation in ScalarMultNodeOp
-///@}

 /**
- * @name Division
+ * Division
 * Performs @f$ a / b @f$ in the expression graph.
 */
-///@{
 Expr operator/(Expr a, Expr b);   ///< @see Implementation in DivNodeOp
 Expr operator/(float a, Expr b);  ///< Promotes @p a to Expression<ConstantNode> and uses operator/(Expr a, Expr b).
                                  ///< @todo efficient version of this without ExpressionGraph::constant
 Expr operator/(Expr a, float b);  ///< Implementation via @f$ a * \frac{1}{b} @f$.
-///@}

 ///@}

@ -324,14 +310,13 @@ Expr logaddexp(Expr a, Expr b);

 ///@addtogroup graph_ops_mathematical
 ///@{
-/**
- * @name Element-wise min/max
+/*
+ * Element-wise min/max
 * Performs an element-wise min/max comparison between expressions.
 * @see min, max for axis level operations
 * @see MinimumNodeOp, MaximumNodeOp
 * @todo implement version without ExpressionGraph::constant.
 */
-///@{

 /**
 * Computes the element-wise maximum of its inputs.
@ -367,7 +352,6 @@ Expr minimum(float a, Expr b);
 */
 Expr minimum(Expr a, float b);
 ///@}
-///@}

 /**
 * Pair of expressions.
@ -428,23 +412,20 @@ Expr2 argmin(Expr a, int axis);
 * @{
 */

-/**
- * @name Expr-Expr comparisons
+/*
+ * Expr-Expr comparisons
 */
-///@{
 Expr lt(Expr a, Expr b);  ///< @f$ a < b @f$
 Expr eq(Expr a, Expr b);  ///< @f$ a \equiv b @f$
 Expr gt(Expr a, Expr b);  ///< @f$ a > b @f$
 Expr ge(Expr a, Expr b);  ///< @f$ a \geq b @f$
 Expr ne(Expr a, Expr b);  ///< @f$ a \neq b @f$
 Expr le(Expr a, Expr b);  ///< @f$ a \leq b @f$
-///@}

-/**
- * @name Float-Expr comparisons
+/*
+ * Float-Expr comparisons
 * Floats are promoted to a @ref ExpressionGraph::constant and use the Expr-Expr methods
 */
-///@{
 Expr lt(float a, Expr b);  ///< @f$ a < b @f$
 Expr eq(float a, Expr b);  ///< @f$ a \equiv b @f$
 Expr gt(float a, Expr b);  ///< @f$ a > b @f$
@ -458,7 +439,6 @@ Expr gt(Expr a, float b);  ///< @f$ a > b @f$
 Expr ge(Expr a, float b);  ///< @f$ a \geq b @f$
 Expr ne(Expr a, float b);  ///< @f$ a \neq b @f$
 Expr le(Expr a, float b);  ///< @f$ a \leq b @f$
-///@}

 ///@}

@ -810,8 +790,7 @@ static inline Expr narrow(Expr a, int axis, size_t start, size_t length) {

 ///@addtogroup graph_ops_mathematical
 ///@{
-///@name Aggregations
-///@{
+// Aggregations

 /**
 * Compute the sum along the specified axis.
@ -862,7 +841,6 @@ Expr min(Expr a, int ax);
 */
 Expr prod(Expr a, int ax);

-///@}
 ///@}

 /**