18 The Language Specification
Sofia Rodrigues edited this page 2023-04-12 12:58:49 -03:00

This is a draft for the 0.4 version of the Kind language specification, it's made to be a "formal" informal specification that is the basis for a real world implementation but the compiler can make some slight modifications to it in order to make it more performant.

This document is influenced by the Ferrocene Specification, The Ada Reference Manual, the Rust language reference and the Haskell 2010 Reference guide.

1.1. Scope

  • This document specifies the syntax and the semantics of programs written in the programming language Kind and it's implemented as a whole with some modifications by the kindc compiler that is on hosted on GitHub under the HIgherOrderCO organization.
  • This document does not specifies how programs are compiled to any particular backend, how they're interpreted in this specific backend, how things are translated to any specific IR, just how they're equivalent to some other constructions of the language.
  • This document does not specifies how external effects interacts with the underlying machine in order to make it independent of a machine and just an abstract specification.

1.2. Structure

  • Introduction
  • Chapter 1. General
    • 1.1. Scope
    • 1.2. Structure
  • Chapter 2. Lexical Grammar
    • 2.1. Notation Conventions
    • 2.2. Character Set
    • 2.3. Grammar
  • Chapter 3. Items syntax and semantics
    • 3.1. Module
      • 3.1.1 Shebang
      • 3.1.2 Use aliases
    • 3.2. Declarations
      • 3.2.1. Arguments
        • 3.2.1.1 Strictness, Erasability and Hidability
      • 3.2.1. Functions Rules and Signatures
        • Docstrings
        • Public and private
        • Default parameters
        • Absurd Rule
      • 3.2.2. Types and Records
      • 3.2.3. Attributes
      • 3.2.4. Commands
        • 3.3.5. Eval and Print
        • 3.3.6. Widget
    • 3.3. Literals
      • 3.3.1 Type Literals
      • 3.3.2 Numeric Literals
    • 3.4. Expressions
      • 3.3.3 Function calls
        • 3.3.4 Named arguments
        • 3.3.5 Lambda Injection
      • 3.3.6 Lightweight lambdas
      • 3.3.7 Pi type
      • 3.3.8 Sigma type
      • 3.3.9 Let binding
      • 3.3.10 Type annotation
      • 3.3.11 Binary operations
      • 3.3.12 Substitutions
      • 3.3.13 Field accessor
      • 3.3.14 Syntax sugars
        • 3.3.14.1 Open expression
        • 3.3.14.2 Match expression
        • 3.3.14.3 If expression
        • 3.3.14.4 Do expression
          • 3.3.14.4.1 Return statement
          • 3.3.14.4.2 Ask statement
        • 3.3.14.5 List expression
        • 3.3.14.6 Pair expression
        • 3.3.14.7 Sigma expression
    • 3.5 Patterns
      • 3.5.1 Constructor pattern
      • 3.5.2 Variable pattern
      • 3.5.3 Absurd pattern
      • 3.5.4 List pattern
      • 3.5.5 Pair pattern
      • 3.5.6 Named patterns
      • 3.5.7 Literal Patterns
      • 3.5.8 Rest patterns
      • 3.5.9 Wildcards
  • Chapter 4. Types
    • 4.1. Built-in Types
    • 4.2. Function Types
    • 4.3. Inductive Types
    • 4.4. Records
    • 4.5. Type aliases
    • 4.6. Error Type
  • Chapter 5. Patterns
    • 5.1. Refutability
    • 5.2. Pattern Matching and Match statement
  • Chapter 6. Entities and Resolution
    • 6.1. Name resolution
    • 6.2. Crates and Packages
    • 6.3. Namespaces and Visibility
  • Chapter 8. Compile time constants
    • 8.1. LOC, FILE, COL, LINE, VERSION

2.0. Lexical Grammar

In this chapter, we describe the lexical structure of the Kind language.

2.1. Notation Conventions

The notation used in the lexical grammar is the following:

Syntax Description
pattern? Optional
pattern* Zero or more repetitions
pattern+ One or more repetitions
pat1 | pat2 Choice
pat1-pat2 Pat1 excluding Pat2
terminal Terminal
num..num2 Char range from num to num2

In the next section, we will ignore all of the whitespaces but we will handle semicolons and linebreaks explicitly due to how Kind expressions work in practice. All of the rules are expressed in a BNF-like language with rules in the form:

non_terminal -> alt1
              | alt2
              | ...

2.2 Character Set

The program text of a Kind program is written using the Unicode character set. A whitespace string matches the following rule:


hor_tab   -> `0x09` ; Horizontal Tabulation
ver_tab   -> `0x0B` ; Vertical Tabulation
new_line  -> `0x0A` ; New line
form_feed -> `0x0A` ; Form Feed
car_ret   -> `0x0D` ; Cariage Return
space     -> `0x20` ; Space
next_line -> `0x85` ; Next Line

white_space -> ( hor_tab
               | ver_tab
               | new_line
               | form_feed
               | car_ret
               | space
               | next_line
               )+

An ascii character matches the following rule.

char       -> 0x00..0x7F

2.3. Grammar

We have a distiction between lower and upper cased identifiers in the language. Upper cased identifiers are used for globally available functions and constructors and lower cased identifiers for variables.

The lexical analysis process should use the "maximum munch", the principle that the longest chain of characters should be consumed to determine a single token.

lower    -> 'a'-'z'
lower_id -> (lower | '_') id_tail

upper    -> 'A'-'Z'
upper_id -> upper id_tail

id_tail  -> (lower | upper | '_' | digit)

any_id   -> lower_id | upper_id


All of the numbers can be suffixed with some specific type because Kind does not have something like a Num type class so it's discovered sintatically.

digit       -> '0'..'9'
hex_digit   -> '0'..'9' | 'A' .. 'F' | 'a' .. 'f'
bin_digit   -> '0'..'1'
oct_digit   -> '0'..'7'

number      -> `0` ('x' | 'X') hex_digit+
             | `0` ('b' | 'B') bin_digit+
             | `0` ('o' | 'O') oct_digit+
             | digit+ `u60`?

number_u60  -> number `u60`?
number_u120 -> number `u120`
number_nat  -> number `n`

number_f64  -> '-' dec_float
dec_float   -> decimal+ ('.' decimal+ exp? | exp)
exp_float   -> ('e' | 'E') ('+' | '-')? digit+

escaped_end  -> `n` | `r` | `t` | `0` | `\` | `'` | `"`
              | (`x` | 'X') hex_digit hex_digit
              | (`u` | `U`) hex_digit hex_digit hex_digit hex_digit

escaped_char -> `\` escaped_end
              | char

string_lit   -> `"` escaped_char* `"`
char_lit     -> `'` escaped_char `'`

All of these operators are useful to make binary operations between elements of type U60.

operator    -> `+` | `-` | `*` | `/` | `%` | `&`
             | `|` | `^` | `>>` | `<<` | `>` | `<`
             | `<=` | `>=` | `!` | `!=`

Keywords and the help construction.

help          -> `?` lower_id

keywords      -> `return` | `ask` | `with`

soft_keywords -> `let` | `do` | `if` | `else` | `match`
               | `type` | `record` | `constructor` | `as`
               | `use`

Line and block comments.

block_part    -> (char - (`/*` | `*/`))*
block_comment -> `/*` (block_comment | block_part) `*/`

comment       -> `//` (char - (`\n` | EOF))

Separators.

ponctuation   -> `.` | `,` | `<` | `>` | `@` | `~` | `#`
delimiter     -> `(` | `)` | `[` | `]` | `{` | `}`

semi          -> (';' | '\n')+
_             -> semi?

Chapter 3. Items syntax and semantics

This chapter describes all of the available syntax in the Kind language and it's static and dynamic semantics.

A Kind program is defined as a set of modules that are ways to reuse declarations and to control the namespace.

Chapter 3.1. Module

A module is a container for zero or more declarations, it has a name that delimites which are the names that can be declared inside that module and how other modules can find that one.

The names inside of a module are limited because it enforces the transparency of dependency when trying to find some module by the name and avoid weird injections.

3.1.1 Shebang

The bash-shebang is always stripped from the top of the file if present.

3.2.2 Namespace aliases

A set of namespace aliases can be present at the top of a module and they are useful for local synonyms.

Syntax

alias_decl <- `use` _ qualified_id `as` _ qualified_id