tix-papers/typing/records.tex

\input{common/header}

\title{Typing of records in nix}

\begin{document}

\maketitle{}

We assume a record language, whose grammar is defined as follows:

\begin{grammar}
  <e> ::= \{ \e/ = \e/ ; \} \| \{\}
    \| \e/ + \e/
    \| \e/ $\orthplus{}$ \e/
    \| \e/ \textbackslash{} \e/
    \| \e/.\e/
    \| s
\end{grammar}

Where:
\begin{itemize}
  \item $+$ denotes the merge of two records (with precedence on the right if
    the same field exists in both records),
  \item $\orthplus{}$ denotes the merge of two records, assuming no field is
    defined in both,
  \item \textbackslash{} denotes the removal of a field (\emph{ie} $e_1
    \backslash e_2$ is the record $e_1$ whose field $s$ is undefined, where $s$
    is the value of $e_2$).
  \item $e_1.e_2$ denots the access to the field $s$ of the record $e_2$,
    where $s$ is the value of $e_2$.
  \item $s$ range over the set of allowed value labels (which here will be \ty{string})
\end{itemize}

\section{Static labels}
\subsection{Typing in a CBV setting}
The typing of records in a CBV language is already given (in absence of
polymorphism) by the formalism given in Section 4.5 of~\cite{Cas15} (which itself
is the reformulation of the formalism presented by Alain Frisch in its Phd
Thesis − see~\cite{Fri04}).

In the original formalism, records (resp.\ record types) are interpreted as
\emph{quasi-constant} functions from \ty{string} to $\textbf{Values} \cup
\undef$ (resp.\ from \ty{string} to $\textbf{Types} \cup \undef$), where
\begin{itemize}
  \item A \emph{quasi-constant} function from \set{L} to \set{Z} is a
    function $r: \set{L} \rightarrow \set{Z}$ which is constant to an element
    $z \in \set{Z}$, except for a finite set $\dom(r)$. We note $\set{L}
    \quasiconst{z} \set{Z}$ for the set of functions from \set{L} to \set{Z}
    quasi-constant to $z$.
  \item \textbf{Values} denotes the set of values in the language.
  \item \textbf{Types} denotes the set of values in the language.
  \item $\undef$ is a distinguished constant that represents an undefined
    field.
\end{itemize}

(in this formalism, the constant $\undef$ was called $\bot$, we renamed it here
in order to avoid confusion with the type $\bot$ representing an undefined
computation).

\subsection{Adaptation to a call-by-name semantic}

As long as we do not allow anything but constant strings as labels, this
formalism requires only a few modifications in order to be useful in a CBN
setting. In fact it is sufficient (I think) to interpret a record as an element
of $\ty{string} \quasiconst{\undef} \left(\textbf{Expressions} \cup
\undef\right)$ (instead of $\ty{string} \quasiconst{\undef}
\left(\textbf{Values} \cup \undef\right)$).

\section{Dynamic labels}

In nix, the labels can be not only static strings, but also arbitrary
expressions (whose value cannott by consequence be statically known in the
general case), which makes the typing more complicated and way less accurate.

One problem raised by this is the semantic of labels evaluation (ie: when do we
evaluate the labels?). When the labels were static, this was not a problem at
all as evaluation of labels was a no-op

The most logical choice seems to evaluate every label eagerly as soon as we
need the value of one of them. This entails that record values will have the
form \texttt{\bfseries \{ $s$ = $e$; \ldots{}; $s$ = $e$; \}} where $s$ designs
an evaluated label (a string value). The semantic is then the one given in the
related \texttt{semantics} paper.

The consequence is that, for example, an ill-formed record (with a field
defined twice) will not be detected until it is used (one try to access or test
the existence of one of the fields), while with static fields the bad formation
of the record can be detected at the time of the definition, even in a lazy
setting.

Nix mixes both approaches by detecting a collision between static fields at
definition place and between dynamic fields (or between a static and a dynamic
one) at use-time. This approach is rather inconsistent (even more if we decide
to consider a static label $s$ as syntactic sugar for the dynamic label whose
expression is the constant string ``$s$''), so probably will not be reflected in
the type system\footnote{It could nonethless be implemented as a pre-treatment
of the AST in order to match nix's behaviour}.

\section{Orthogonal merge}

In CDuce, as in perl6 (the contexts into which respectivly~\cite{Fri04}
and~\cite{Cas15} have been written), a record can be defined with the same field
appearing twice, in which case the second appearance takes precedence over the
first. This led to the definition of the $\oplus_t$ (\emph{merge} with
respect to the value $t$) operator between two records.

In nix, such a definition would be invalid − the language requires all
fields to be distinct − so we cannot use this $\oplus_t$ operator to
define records and their typing. That's why we introduced the $\orthplus$
operator, so that the litteral expression

\begin{lstlisting}
{
    x1 = e1;
    ...;
    xn = en;
}
\end{lstlisting}

is syntactic sugar for

\begin{lstlisting}
{ x1 = e1; } //*$\orthplus$*// ... //*$\orthplus$*// { xn = en; }
\end{lstlisting}

We also define a $\orthsum_t$ (orthogonal merge) operator between record type
atoms defined as:

\[
  T_1 \orthsum_t T_2 = T_1 \oplus_t T_2
  \text{ if } \dom_t(T_1) \wedge \dom_t(T_2) = \varnothing
\]

where for a record type atom $T$, $\dom_t(T)$ is defined as the set
of all elements of \ty{string} whose image by $T$ is different from $t$.

We note $T_1 \orthplus T_2$ for $T_1 \orthsum_\undef T_2$

\section{Typing}
\subsection{Subtyping relation}

We first extend the $\subtype$ relation to a relation on $\ty{string}
\rightarrow \textbf{Types}$ (defined as the pointwise relation: $\forall r_1,
r_2 \in \ty{string} \rightarrow \textbf{Types}, r_1 \subtype r_2
\Leftrightarrow \left( \forall s \in \ty{string}, r_1(s) \subtype r_2(s)
\right)$).

This relation defines the subtyping for record types. For example, for two
closed record types:

if $r_1 = \{ l_1 = v_1; \ldots{}; l_n = v_n; \}$
and $r_2 = \{ k_1 = w_1; \ldots{}; k_m = w_m; \}$
then
$r_1 \subtype r_2$ if and only if $n = m$ and there exists a permutation $\σ$
of $\left\{1, \ldots{}, n\right\}$ such that for all $i \in \left\{1, \ldots{},
n\right\}$, $l_i = k_{\σ(i)}$ and $v_i \subtype w_{\σ(i)}$.

\subsection{Typing rules}
\subsubsection{Dynamic labels}
\begin{mathpar}
  \input{typing/recordTypingRules}
\end{mathpar}

\bibliographystyle{alpha}
\bibliography{../references}
\end{document}