compaREST/docs/Development_guide.md

150 lines
7.5 KiB
Markdown
Raw Permalink Normal View History

CompaREST
=========
The tool works by simultaneously looking at a pair of nodes in OpenAPI schema
trees. At every place we try to understand whether one side (the producer) can
produce something that the other side (the consumer) cannot consume. This often
involves recursively running the same check on child sub-nodes. The necessary
criteria are implemented manually for all types of nodes that appear in the
schema. The `Subtree` class in `Data.OpenApi.Compare.Subtree` designates types
of nodes for which this compatibility checking is defined. The implementations
for various types reside in `Data.OpenApi.Compare.Validate.*`.
Producer/Consumer
-----------------
Rather than looking at which schema is the server's and which is the client's,
in terms of compatibility checking it is more useful to track which side is the
"producer" and which is the "consumer". At the root of the schema the producer
is the client, but as we descend into an HTTP response, the direction flips and
the producer becomes the server.
We deal with a lot of things in pairs, where one thing belongs to the producer
and another belongs to the consumer. The `ProdCons` datatype (in
`Data.OpenApi.Compare.Subtree`) is just a pair type that provides an Applicative
abstraction for working with two things simultaneously.
Trees
-----
We directly use the datatypes defined in the `openapi3` library. For reasons of
identifiability and memoization we tag all nodes in the trees with a path from
the root. The tree is heterogeneous (nodes have different types), and every type
of node has a different arrangement of children. There is a `Step` data family
that defines all the possible children (of a specific type) of a node (of a
specific type).
A `Trace` is a sequence of steps, except in a path the types of adjacent steps
have to agree. The `Paths` datatype achieves this. Together all paths form a
`Category`, and we often concatenate them using `>>>`. This is defined in
`Data.OpenApi.Compare.Paths`.
We often keep a node and a path to it next to each other using an environment
comonad. `Traced` is a type alias around `Env` which ensures that the type of
the path matches the type of the node.
Issues
------
The output of the tool is a list of compatibility issues that came up during
checking. The `Issue` data family describes the kinds of issues that can occur
at each type of node. An issue is supposed to characterize a set of interactions
(requests or responses) that demonstrate that the schemas are incompatible.
Each issue is tagged with a `Behavior` that describes the path to reproducing
the issue. While a `Trace` describes a syntactic path down the schema tree, a
`Behavior` describes which part of an interaction needs to be focused on for the
issue to manifest itself.
In most places where we keep multiple issues (e.g. the output), to store them we
use a prefix tree map, indexed by both types and values of `Behavior`s. See
`Data.OpenApi.Compare.PathsPrefixTree`.
Report
------
The checker ultimately outputs a list of `Issue`s, together with `Behavior`s
that cause them. We run the checker twice: forwards and backwards -- to detect
non-breaking changes. All of this is then compiled into a report in
`Data.OpenApi.Compare.Report`. The tree structure of headings is computed from
`Behavior`s, using `Jet`s to collapse particular sequences of behavior steps
into a single human-readable header item. The text of each paragraph comes from
`describeIssue`.
We then use the `pandoc` library to render the report in a variety of formats.
Formulas
--------
The schema has references, so it can end up being recursive. Similarly the
process for establishing compatibility can end up being recursive. To keep track
of this, the checker actually manipulates *formulas* which can have variables in
them, defined in (`Data.OpenApi.Compare.Formula`).
A formula represents a set of issues, with the empty set corresponding to a
successful compatibility check. A formula can have conjunctions, where we only
report success if all parts succeeded, and if not we take the union of all
issues. A formula can also have disjunctions, but there it's impossible to
guarantee that the issues for the parts also make sense as issues for the whole
thing. Due to this if all disjuncted parts have issues we instead return a
different issue (usually non-descriptive).
The `FormulaF` datatype implements a functor that carries formulaic calculations
as well as a result value (a co-Yoneda extension of formulas for the most part).
The Applicative interface provides conjunction, and the Alternative interface
provides disjunction.
When we detect recursion we introduce a variable, compute the compatibility
check as a formula with that variable, and then we solve a fixed point equation
(`maxFixpoint`) with the variable to obtain the answer.
This `FomulaF` is further wrapped in a couple monad transformers
(`SemanticCompatFormula`, `StructuralCompatFormula`), and the whole checker is
then implemented in the resulting Applicative using ApplicativeDo.
Memoization
-----------
The `Data.OpenApi.Compare.Memo` module contains utilities for stateful
memoization and recursion detection. The entire checking process is memoized,
and can detect, propagate, and resolve recursion.
Structural/Semantic Compatibility
---------------------------------
When we encounter two nodes we first try to optimistically establish whether
they're structurally equal, i.e. the trees are exactly the same modulo reference
inlining. This check is also memoized and recursion-aware. If the check succeeds
we conclude that the schemas are compatible. Otherwise we fall back to the
"normal" semantic compatibility.
JSON Schema
-----------
The language that describes JSON payloads is probably the richest part of
OpenAPI schemas, and the checker for it also comes with a lot of moving parts.
JSON schema allows arbitrary set arithmetic using `allOf`, `anyOf`, `not` and
`oneOf`. To fully check `not` and `oneOf` we would need to compare set
subtraction, which involves negating a comparison result. If recursion is
involved we would need to be able to solve equations with negation, which we
cannot do. So `not` is unsupported altogether, and `oneOf` is only supported
when it has disjoint branches and behaves like `anyOf`.
With that in mind a JSON schema is first pre-processed into a Disjunctive Normal
Form (`DNF` in `Data.OpenApi.Compare.Validate.Schema.DNF`). This pre-processing
happens in `Data.OpenApi.Compare.Validate.Schema.Process`.
The DNF contains elementary clauses that match on a specific aspect of a JSON
object (`Condition`). Most clauses only affect objects of a single type, so JSON
values and clauses are segregated by type (`JsonType`, `TypedValue` in
`Data.OpenApi.Compare.Validate.Schema.JsonFormula`)
Then we reduce the problem of comparing two DNF's to the comparison of a
conjunction of clauses in the producer with a single clause in the consumer
(`checkFormulas` in `Data.OpenApi.Compare.Validate.Schema`).
When we do disjunction in the formula we lose information, so we try to avoid
having disjunctions in the consumer DNF. This is done by the means of
partitioning. We look for factors that would let us partition the set of all
objects into several bins, in a way that hopefully corresponds only a single
consumer to each producer. This is implemented in
`Data.OpenApi.Compare.Validate.Schema.Partition`. Currently we can only
partition by an `enum` value that is accessible via a chain of `required`
fields.
The same partitioning mechanism is used to test whether the branches of a
`oneOf` are disjoint, and emit a warning otherwise.