7.5 KiB
CompaREST
The tool works by simultaneously looking at a pair of nodes in OpenAPI schema
trees. At every place we try to understand whether one side (the producer) can
produce something that the other side (the consumer) cannot consume. This often
involves recursively running the same check on child sub-nodes. The necessary
criteria are implemented manually for all types of nodes that appear in the
schema. The Subtree
class in Data.OpenApi.Compare.Subtree
designates types
of nodes for which this compatibility checking is defined. The implementations
for various types reside in Data.OpenApi.Compare.Validate.*
.
Producer/Consumer
Rather than looking at which schema is the server's and which is the client's, in terms of compatibility checking it is more useful to track which side is the "producer" and which is the "consumer". At the root of the schema the producer is the client, but as we descend into an HTTP response, the direction flips and the producer becomes the server.
We deal with a lot of things in pairs, where one thing belongs to the producer
and another belongs to the consumer. The ProdCons
datatype (in
Data.OpenApi.Compare.Subtree
) is just a pair type that provides an Applicative
abstraction for working with two things simultaneously.
Trees
We directly use the datatypes defined in the openapi3
library. For reasons of
identifiability and memoization we tag all nodes in the trees with a path from
the root. The tree is heterogeneous (nodes have different types), and every type
of node has a different arrangement of children. There is a Step
data family
that defines all the possible children (of a specific type) of a node (of a
specific type).
A Trace
is a sequence of steps, except in a path the types of adjacent steps
have to agree. The Paths
datatype achieves this. Together all paths form a
Category
, and we often concatenate them using >>>
. This is defined in
Data.OpenApi.Compare.Paths
.
We often keep a node and a path to it next to each other using an environment
comonad. Traced
is a type alias around Env
which ensures that the type of
the path matches the type of the node.
Issues
The output of the tool is a list of compatibility issues that came up during
checking. The Issue
data family describes the kinds of issues that can occur
at each type of node. An issue is supposed to characterize a set of interactions
(requests or responses) that demonstrate that the schemas are incompatible.
Each issue is tagged with a Behavior
that describes the path to reproducing
the issue. While a Trace
describes a syntactic path down the schema tree, a
Behavior
describes which part of an interaction needs to be focused on for the
issue to manifest itself.
In most places where we keep multiple issues (e.g. the output), to store them we
use a prefix tree map, indexed by both types and values of Behavior
s. See
Data.OpenApi.Compare.PathsPrefixTree
.
Report
The checker ultimately outputs a list of Issue
s, together with Behavior
s
that cause them. We run the checker twice: forwards and backwards -- to detect
non-breaking changes. All of this is then compiled into a report in
Data.OpenApi.Compare.Report
. The tree structure of headings is computed from
Behavior
s, using Jet
s to collapse particular sequences of behavior steps
into a single human-readable header item. The text of each paragraph comes from
describeIssue
.
We then use the pandoc
library to render the report in a variety of formats.
Formulas
The schema has references, so it can end up being recursive. Similarly the
process for establishing compatibility can end up being recursive. To keep track
of this, the checker actually manipulates formulas which can have variables in
them, defined in (Data.OpenApi.Compare.Formula
).
A formula represents a set of issues, with the empty set corresponding to a successful compatibility check. A formula can have conjunctions, where we only report success if all parts succeeded, and if not we take the union of all issues. A formula can also have disjunctions, but there it's impossible to guarantee that the issues for the parts also make sense as issues for the whole thing. Due to this if all disjuncted parts have issues we instead return a different issue (usually non-descriptive).
The FormulaF
datatype implements a functor that carries formulaic calculations
as well as a result value (a co-Yoneda extension of formulas for the most part).
The Applicative interface provides conjunction, and the Alternative interface
provides disjunction.
When we detect recursion we introduce a variable, compute the compatibility
check as a formula with that variable, and then we solve a fixed point equation
(maxFixpoint
) with the variable to obtain the answer.
This FomulaF
is further wrapped in a couple monad transformers
(SemanticCompatFormula
, StructuralCompatFormula
), and the whole checker is
then implemented in the resulting Applicative using ApplicativeDo.
Memoization
The Data.OpenApi.Compare.Memo
module contains utilities for stateful
memoization and recursion detection. The entire checking process is memoized,
and can detect, propagate, and resolve recursion.
Structural/Semantic Compatibility
When we encounter two nodes we first try to optimistically establish whether they're structurally equal, i.e. the trees are exactly the same modulo reference inlining. This check is also memoized and recursion-aware. If the check succeeds we conclude that the schemas are compatible. Otherwise we fall back to the "normal" semantic compatibility.
JSON Schema
The language that describes JSON payloads is probably the richest part of OpenAPI schemas, and the checker for it also comes with a lot of moving parts.
JSON schema allows arbitrary set arithmetic using allOf
, anyOf
, not
and
oneOf
. To fully check not
and oneOf
we would need to compare set
subtraction, which involves negating a comparison result. If recursion is
involved we would need to be able to solve equations with negation, which we
cannot do. So not
is unsupported altogether, and oneOf
is only supported
when it has disjoint branches and behaves like anyOf
.
With that in mind a JSON schema is first pre-processed into a Disjunctive Normal
Form (DNF
in Data.OpenApi.Compare.Validate.Schema.DNF
). This pre-processing
happens in Data.OpenApi.Compare.Validate.Schema.Process
.
The DNF contains elementary clauses that match on a specific aspect of a JSON
object (Condition
). Most clauses only affect objects of a single type, so JSON
values and clauses are segregated by type (JsonType
, TypedValue
in
Data.OpenApi.Compare.Validate.Schema.JsonFormula
)
Then we reduce the problem of comparing two DNF's to the comparison of a
conjunction of clauses in the producer with a single clause in the consumer
(checkFormulas
in Data.OpenApi.Compare.Validate.Schema
).
When we do disjunction in the formula we lose information, so we try to avoid
having disjunctions in the consumer DNF. This is done by the means of
partitioning. We look for factors that would let us partition the set of all
objects into several bins, in a way that hopefully corresponds only a single
consumer to each producer. This is implemented in
Data.OpenApi.Compare.Validate.Schema.Partition
. Currently we can only
partition by an enum
value that is accessible via a chain of required
fields.
The same partitioning mechanism is used to test whether the branches of a
oneOf
are disjoint, and emit a warning otherwise.