\input{common/header}

\title{Typing of records in Nix}

\begin{document}

\maketitle{}

We assume a record language whose grammar is defined as follows:

\begin{grammar}
::= \{ \e/ = \e/ ; \} \| \{\} \| \e/ + \e/ \| \e/ $\orthplus{}$ \e/ \| \e/ \textbackslash{} \e/ \| \e/.\e/ \| s
\end{grammar}

Where:
\begin{itemize}
  \item $+$ denotes the merge of two records (the right operand takes precedence when the same field exists in both records),
  \item $\orthplus{}$ denotes the merge of two records, assuming that no field is defined in both,
  \item \textbackslash{} denotes the removal of a field (\emph{i.e.}, $e_1 \backslash e_2$ is the record $e_1$ in which the field $s$ is undefined, where $s$ is the value of $e_2$),
  \item $e_1.e_2$ denotes the access to the field $s$ of the record $e_1$, where $s$ is the value of $e_2$,
  \item $s$ ranges over the set of allowed label values (which here is \ty{string}).
\end{itemize}

\section{Static labels}

\subsection{Typing in a CBV setting}

The typing of records in a call-by-value (CBV) language is already given (in the absence of polymorphism) by the formalism of Section 4.5 of~\cite{Cas15} (itself a reformulation of the formalism presented by Alain Frisch in his PhD thesis; see~\cite{Fri04}).

In the original formalism, records (resp.\ record types) are interpreted as \emph{quasi-constant} functions from \ty{string} to $\textbf{Values} \cup \undef$ (resp.\ from \ty{string} to $\textbf{Types} \cup \undef$), where
\begin{itemize}
  \item A \emph{quasi-constant} function from \set{L} to \set{Z} is a function $r: \set{L} \rightarrow \set{Z}$ that is constant, equal to some element $z \in \set{Z}$, everywhere except on a finite set $\dom(r)$. We write $\set{L} \quasiconst{z} \set{Z}$ for the set of functions from \set{L} to \set{Z} that are quasi-constant to $z$.
  \item \textbf{Values} denotes the set of values in the language.
  \item \textbf{Types} denotes the set of types in the language.
  \item $\undef$ is a distinguished constant that represents an undefined field.
\end{itemize}

(In the original formalism, the constant $\undef$ is called $\bot$; we rename it here in order to avoid confusion with the type $\bot$ representing an undefined computation.)

\subsection{Adaptation to a call-by-name semantics}

As long as we allow only constant strings as labels, this formalism requires only a few modifications to be usable in a call-by-name (CBN) setting. In fact, it is sufficient (I think) to interpret a record as an element of $\ty{string} \quasiconst{\undef} \left(\textbf{Expressions} \cup \undef\right)$ (instead of $\ty{string} \quasiconst{\undef} \left(\textbf{Values} \cup \undef\right)$).

\section{Dynamic labels}

In Nix, labels can be not only static strings but also arbitrary expressions (whose value consequently cannot be statically known in the general case), which makes the typing more complicated and much less accurate.

One problem raised by this is the semantics of label evaluation (\emph{i.e.}, when do we evaluate the labels?). When the labels were static, this was not a problem at all, as the evaluation of a label was a no-op.

The most logical choice seems to be to evaluate every label eagerly as soon as we need the value of one of them. This entails that record values have the form \texttt{\bfseries \{ $s$ = $e$; \ldots{}; $s$ = $e$; \}} where $s$ denotes an evaluated label (a string value). The semantics is then the one given in the related \texttt{semantics} paper.

The consequence is that, for example, an ill-formed record (with a field defined twice) will not be detected until it is used (that is, until one tries to access or test the existence of one of its fields), whereas with static fields the ill-formedness of the record can be detected at definition time, even in a lazy setting. Nix mixes both approaches: it detects a collision between static fields at the definition site, and a collision between dynamic fields (or between a static and a dynamic one) at use time.
This approach is rather inconsistent (even more so if we decide to consider a static label $s$ as syntactic sugar for the dynamic label whose expression is the constant string ``$s$''), so it will probably not be reflected in the type system\footnote{It could nonetheless be implemented as a preprocessing pass on the AST in order to match Nix's behaviour.}.

\section{Orthogonal merge}

In CDuce, as in Perl 6 (the contexts in which~\cite{Fri04} and~\cite{Cas15}, respectively, were written), a record can be defined with the same field appearing twice, in which case the second occurrence takes precedence over the first. This led to the definition of the operator $\oplus_t$ (\emph{merge} with respect to the value $t$) between two records.

In Nix, such a definition would be invalid (the language requires all fields to be distinct), so we cannot use this $\oplus_t$ operator to define records and their typing. This is why we introduce the $\orthplus$ operator, so that the literal expression
\begin{lstlisting}
{ x1 = e1; ...; xn = en; }
\end{lstlisting}
is syntactic sugar for
\begin{lstlisting}
{ x1 = e1; } //*$\orthplus$*// ... //*$\orthplus$*// { xn = en; }
\end{lstlisting}

We also define an orthogonal merge operator $\orthsum_t$ between record type atoms as:
\[
  T_1 \orthsum_t T_2 = T_1 \oplus_t T_2 \text{ if } \dom_t(T_1) \cap \dom_t(T_2) = \varnothing
\]
where, for a record type atom $T$, $\dom_t(T)$ is defined as the set of all elements of \ty{string} whose image by $T$ is different from $t$. We write $T_1 \orthplus T_2$ for $T_1 \orthsum_\undef T_2$.

\section{Typing}

\subsection{Subtyping relation}

We first extend the $\subtype$ relation to a relation on $\ty{string} \rightarrow \textbf{Types}$, defined pointwise: $\forall r_1, r_2 \in \ty{string} \rightarrow \textbf{Types}, r_1 \subtype r_2 \Leftrightarrow \left( \forall s \in \ty{string}, r_1(s) \subtype r_2(s) \right)$. This relation defines the subtyping for record types.
For example, for two closed record types, if $r_1 = \{ l_1 = v_1; \ldots{}; l_n = v_n; \}$ and $r_2 = \{ k_1 = w_1; \ldots{}; k_m = w_m; \}$, then $r_1 \subtype r_2$ if and only if $n = m$ and there exists a permutation $\sigma$ of $\left\{1, \ldots{}, n\right\}$ such that for all $i \in \left\{1, \ldots{}, n\right\}$, $l_i = k_{\sigma(i)}$ and $v_i \subtype w_{\sigma(i)}$.

\subsection{Typing rules}

\subsubsection{Dynamic labels}

\begin{mathpar}
  \input{typing/recordTypingRules}
\end{mathpar}

\bibliographystyle{alpha}
\bibliography{../references}

\end{document}