compaREST/docs/Development_guide.md
mniip a1ea91fc50
Add a dev and a user guide (#120)
* Add a dev and a user guide

* Changes
2021-09-07 22:33:51 +03:00

7.5 KiB

CompaREST

The tool works by simultaneously looking at a pair of nodes in OpenAPI schema trees. At every place we try to understand whether one side (the producer) can produce something that the other side (the consumer) cannot consume. This often involves recursively running the same check on child sub-nodes. The necessary criteria are implemented manually for all types of nodes that appear in the schema. The Subtree class in Data.OpenApi.Compare.Subtree designates types of nodes for which this compatibility checking is defined. The implementations for various types reside in Data.OpenApi.Compare.Validate.*.

Producer/Consumer

Rather than looking at which schema is the server's and which is the client's, in terms of compatibility checking it is more useful to track which side is the "producer" and which is the "consumer". At the root of the schema the producer is the client, but as we descend into an HTTP response, the direction flips and the producer becomes the server.

We deal with a lot of things in pairs, where one thing belongs to the producer and another belongs to the consumer. The ProdCons datatype (in Data.OpenApi.Compare.Subtree) is just a pair type that provides an Applicative abstraction for working with two things simultaneously.

Trees

We directly use the datatypes defined in the openapi3 library. For reasons of identifiability and memoization we tag all nodes in the trees with a path from the root. The tree is heterogeneous (nodes have different types), and every type of node has a different arrangement of children. There is a Step data family that defines all the possible children (of a specific type) of a node (of a specific type).

A Trace is a sequence of steps, except in a path the types of adjacent steps have to agree. The Paths datatype achieves this. Together all paths form a Category, and we often concatenate them using >>>. This is defined in Data.OpenApi.Compare.Paths.

We often keep a node and a path to it next to each other using an environment comonad. Traced is a type alias around Env which ensures that the type of the path matches the type of the node.

Issues

The output of the tool is a list of compatibility issues that came up during checking. The Issue data family describes the kinds of issues that can occur at each type of node. An issue is supposed to characterize a set of interactions (requests or responses) that demonstrate that the schemas are incompatible.

Each issue is tagged with a Behavior that describes the path to reproducing the issue. While a Trace describes a syntactic path down the schema tree, a Behavior describes which part of an interaction needs to be focused on for the issue to manifest itself.

In most places where we keep multiple issues (e.g. the output), to store them we use a prefix tree map, indexed by both types and values of Behaviors. See Data.OpenApi.Compare.PathsPrefixTree.

Report

The checker ultimately outputs a list of Issues, together with Behaviors that cause them. We run the checker twice: forwards and backwards -- to detect non-breaking changes. All of this is then compiled into a report in Data.OpenApi.Compare.Report. The tree structure of headings is computed from Behaviors, using Jets to collapse particular sequences of behavior steps into a single human-readable header item. The text of each paragraph comes from describeIssue.

We then use the pandoc library to render the report in a variety of formats.

Formulas

The schema has references, so it can end up being recursive. Similarly the process for establishing compatibility can end up being recursive. To keep track of this, the checker actually manipulates formulas which can have variables in them, defined in (Data.OpenApi.Compare.Formula).

A formula represents a set of issues, with the empty set corresponding to a successful compatibility check. A formula can have conjunctions, where we only report success if all parts succeeded, and if not we take the union of all issues. A formula can also have disjunctions, but there it's impossible to guarantee that the issues for the parts also make sense as issues for the whole thing. Due to this if all disjuncted parts have issues we instead return a different issue (usually non-descriptive).

The FormulaF datatype implements a functor that carries formulaic calculations as well as a result value (a co-Yoneda extension of formulas for the most part). The Applicative interface provides conjunction, and the Alternative interface provides disjunction.

When we detect recursion we introduce a variable, compute the compatibility check as a formula with that variable, and then we solve a fixed point equation (maxFixpoint) with the variable to obtain the answer.

This FomulaF is further wrapped in a couple monad transformers (SemanticCompatFormula, StructuralCompatFormula), and the whole checker is then implemented in the resulting Applicative using ApplicativeDo.

Memoization

The Data.OpenApi.Compare.Memo module contains utilities for stateful memoization and recursion detection. The entire checking process is memoized, and can detect, propagate, and resolve recursion.

Structural/Semantic Compatibility

When we encounter two nodes we first try to optimistically establish whether they're structurally equal, i.e. the trees are exactly the same modulo reference inlining. This check is also memoized and recursion-aware. If the check succeeds we conclude that the schemas are compatible. Otherwise we fall back to the "normal" semantic compatibility.

JSON Schema

The language that describes JSON payloads is probably the richest part of OpenAPI schemas, and the checker for it also comes with a lot of moving parts.

JSON schema allows arbitrary set arithmetic using allOf, anyOf, not and oneOf. To fully check not and oneOf we would need to compare set subtraction, which involves negating a comparison result. If recursion is involved we would need to be able to solve equations with negation, which we cannot do. So not is unsupported altogether, and oneOf is only supported when it has disjoint branches and behaves like anyOf.

With that in mind a JSON schema is first pre-processed into a Disjunctive Normal Form (DNF in Data.OpenApi.Compare.Validate.Schema.DNF). This pre-processing happens in Data.OpenApi.Compare.Validate.Schema.Process.

The DNF contains elementary clauses that match on a specific aspect of a JSON object (Condition). Most clauses only affect objects of a single type, so JSON values and clauses are segregated by type (JsonType, TypedValue in Data.OpenApi.Compare.Validate.Schema.JsonFormula)

Then we reduce the problem of comparing two DNF's to the comparison of a conjunction of clauses in the producer with a single clause in the consumer (checkFormulas in Data.OpenApi.Compare.Validate.Schema).

When we do disjunction in the formula we lose information, so we try to avoid having disjunctions in the consumer DNF. This is done by the means of partitioning. We look for factors that would let us partition the set of all objects into several bins, in a way that hopefully corresponds only a single consumer to each producer. This is implemented in Data.OpenApi.Compare.Validate.Schema.Partition. Currently we can only partition by an enum value that is accessible via a chain of required fields.

The same partitioning mechanism is used to test whether the branches of a oneOf are disjoint, and emit a warning otherwise.