## The Catala language tutorial Welcome to this tutorial, whose objective is to guide you through the features of the Catala language and teach you how to annotate a legislative text using the language. This document is addressed primarily to developers or people that have a programming background, though tech-savvy lawyers will probably figure things out. ### Literate programming To begin writing a Catala program, you must start from the text of the legislative source that will justify the code that you will write. Concretely, that means copy-pasting the text of the law into a Catala source file and formatting it according so that Catala can understand it. Catala source files have the ".catala_en" extension. If you were to write a Catala program for a French law, you would use the ".catala_fr" extension. You can write any kind of plain text in Catala, and it will be printed as is in PDF or HTML output. You can split your text into short lines, those will appear as a single paragraph in the output. If you want to create a new paragraph, you have to leave a blank line in the source. Catala allows you to declare section or subsection headers as it is done here, with the "at" symbol repeated twice. You can define heading of lower importance by adding increasing numbers of "+" after the title of the heading. The fundamental division unit is the article, introduced by a single "at". Let's analyze a fictional example that defines an income tax. #### Article 1 The income tax for an individual is defined as a fixed percentage of the individual's income over a year. ```catala # Welcome to the code mode of Catala. This is a comment, because the line is # prefixed by #. # We will soon learn what to write here in order to translate the meaning # of the article into Catala code. ``` To do that, we will intertwine short snippets of code between the sentences of the legislative text. Each snippet of code should be as short as possible and as close as possible to the actual sentence that justifies the code. This style is called literate programming, a programming paradigm invented by the famous computer scientist Donald Knuth in the 70s. ### Defining a fictional income tax The content of article 1 uses a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not verbatim in the law, we have to explicit it for programming purposes. Concretely, we need a "metadata" section that defines the shape and types of the data used inside the law. Let's start our metadata section by declaring the type information for the individual: ```catala-metadata declaration structure Individual: # The name of the structure "Individual", must start with an # uppercase letter: this is the CamelCase convention. data income content money # In this line, "income" is the name of the structure field and # "money" is the type of what is stored in that field. # Available types include: integer, decimal, money, date, duration, # and any other structure or enumeration that you declare data number_of_children content integer # "income" and "number_of_children" start by a lowercase letter, # they follow the snake_case convention ``` This structure contains two data fields, "income" and "number_of_children". Structures are useful to group together data that goes together. Usually, you get one structure per concrete object on which the law applies (like the individual). It is up to you to decide how to group the data together, but you should aim to optimize code readability. Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like: ```catala-metadata declaration enumeration TaxCredit: # The name "TaxCredit" is also written in CamelCase -- NoTaxCredit # This line says that "TaxCredit" can be a "NoTaxCredit" situation -- ChildrenTaxCredit content integer # This line says that alternatively, "TaxCredit" can be a # "ChildrenTaxCredit" situation. This situation carries a content # of type integer corresponding to the number of children concerned # by the tax credit. This means that if you're in the "ChildrenTaxCredit" # situation, you will also have access to this number of children ``` In computer science terms, such an enumeration is called a "sum type" or simply an enum. The combination of structures and enumerations allow the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of "algebraic data types". We've defined and typed the data that the program will manipulate. Now we have to define the logical context in which these data will evolve. This is done in Catala using "scopes". Scopes are close to functions in terms of traditional programming. Scopes also have to be declared in metadata, so here we go: ```catala-metadata declaration scope IncomeTaxComputation: # Scope names use CamelCase context individual content Individual # This line declares a context element of the scope, which is akin to # a function parameter in computer science term. This is the piece of # data on which the scope will operate context fixed_percentage content decimal context income_tax content money ``` We now have everything to annotate the contents of article 1, which is copied over below. #### Article 1 The income tax for an individual is defined as a fixed percentage of the individual's income over a year. ```catala scope IncomeTaxComputation: definition income_tax equals individual.income *$ fixed_percentage ``` In the code, we are defining inside our scope the amount of the income tax according to the formula described in the article. When defining formulas, you have access to all the usual arithmetic operators: addition "+", subtraction "-", multiplication "*" and division (slash). However, in the Catala code, you can see that we use "*$" to multiply the individual income by the fixed percentage. The $ suffix indicates that we are performing a multiplication on an amount of money. Indeed, in Catala, you have to keep track of what you are dealing with: is it money ? Is it an integer? Using just "+" or "*" can be ambiguous in terms of rounding, since money is usually rounded at the cent. So to disambiguate, we suffix these operations with something that indicates the type of what we manipulate. The suffixes are "$" for money "." for decimals, "at" (like in email addresses) for dates and the hat symbol for durations. If you forget the suffix, the Catala type checker will display an error message that will help you put it where it belongs. But inside article 1, one question remains unknown: what is the value of of the fixed percentage? Often, precise values are defined elsewhere in the legislative source. Here, let's suppose we have: #### Article 2 The fixed percentage mentioned at article 1 is equal to 20 %. ```catala scope IncomeTaxComputation: definition fixed_percentage equals 20 % # Writing 20% is just an abbreviation for 0.20 ``` You can see here that Catala allows definitions to be scattered throughout the annotation of the legislative text, so that each definition is as close as possible to its location in the text. ### Conditional definitions So far so good, but now the legislative text introduces some trickiness. Let us suppose the third article says: #### Article 3 If the individual is in charge of 2 or more children, then the fixed percentage mentioned at article 1 is equal to 15 %. ```catala # How to redefine fixed_percentage? ``` This article actually gives another definition for the fixed percentage, which was already defined in article 2. However, article 3 defines the percentage conditionally to the individual having more than 2 children. Catala allows you precisely to redefine a variable under a condition: ```catala scope IncomeTaxComputation: definition fixed_percentage under condition individual.number_of_children >= 2 consequence equals 15 % # Writing 15% is just an abbreviation for 0.15 ``` When the Catala program will execute, the right definition will be dynamically chosen by looking at which condition is true. A correctly drafted legislative source should always ensure that at most one condition is true at all times. However, if it is not the case, Catala will let you define a precedence on the conditions, which has to be justified by the law. ### Functions Catala lets you define functions anywhere in your data. Here's what it looks like in the metadata definition when we want to define a two-brackets tax computation: ```catala-metadata declaration structure TwoBrackets: data breakpoint content money data rate1 content decimal data rate2 content decimal declaration scope TwoBracketsTaxComputation : context brackets content TwoBrackets context tax_formula content money depends on money ``` And in the code: #### Article4 The tax amount for a two-brackets computation is equal to the amount of income in each bracket multiplied by the rate of each bracket. ```catala scope TwoBracketsTaxComputation : definition tax_formula of income equals if income <=$ brackets.breakpoint then income *$ brackets.rate1 else ( brackets.breakpoint *$ brackets.rate1 +$ (income -$ brackets.breakpoint) *$ brackets.rate2 ) ``` ### Scope inclusion Now that we've defined our helper scope for computing a two-brackets tax, we want to use it in our main tax computation scope. #### Article 5 For individuals whose income is greater than $100,000, the income tax of article 1 is 40% of the income above $100,000. Below $100,000, the income tax is 20% of the income. ```catala declaration scope NewIncomeTaxComputation: context two_brackets scope TwoBracketsTaxComputation # This line says that we add the item two_brackets to the context. # However, the "scope" keyword tells that this item is not a piece of data # but rather a subscope that we can use to compute things. context individual content Individual context income_tax content money scope NewIncomeTaxComputation : definition two_brackets.brackets equals TwoBrackets { -- breakpoint: $100,000 -- rate1: 20% -- rate2: 40% } definition income_tax equals two_brackets.tax_formula of individual.income ``` #### Article 6 Individuals earning less than $10,000 are exempted of the income tax mentioned at article 1. ```catala scope NewIncomeTaxComputation: definition income_tax under condition individual.income <=$ $10,000 consequence equals $0 ``` That's it! We've defined a two-brackets tax computation simply by annotating legislative article by snippets of Catala code. However, attentive readers may have caught something weird in articles 5 and 6. What happens when the income of the individual is between $10,000 and $100,000 ? The law leaves it unspecified ; our dummy articles are clearly badly drafted. But Catala can help you find this sort of errors via simple testing or even formal verification. Let's start with the testing. ### Testing Catala programs Testing Catala programs can be done directly into Catala. Indeed, writing test cases for each Catala scope that you define is a good practice called "unit testing" in the software engineering community. A test case is defined as another scope: #### Testing NewIncomeTaxComputation ```catala declaration scope Test1: context tax_computation scope NewIncomeTaxComputation context income_tax content money scope Test1: definition tax_computation.individual # We define the argument to the subscope equals # The four lines below define a whole structure by giving a value to # each of its fields Individual { -- income: $230,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax # Next, we retrieve the income tax value compute it by the subscope and # assert that it is equal to the expected value : # ($230,000-$100,00)*40%+$100,000*20% = $72,000 assertion income_tax = $72,000 ``` This test should pass. Let us now consider a failing test case: ```catala declaration scope Test2: context tax_computation scope NewIncomeTaxComputation context income_tax content money scope Test2: definition tax_computation.individual equals Individual { -- income: $4,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax assertion income_tax = $0 ``` This test case should compute a $0 income tax because of Article 6. But instead, execution will yield an error saying that there is a conflict between rules. ### Defining exceptions to rules Indeed, the definition of the income tax in article 6 conflicts with the definition of income tax in article 5. But actually, article 6 is just an exception of article 5. In the law, it is implicit that if article 6 is applicable, then it takes precedence over article 5. #### Fixing the computation This implicit precedence has to be explicitly declared in Catala. Here is a fixed version of the NewIncomeTaxComputation scope: ```catala declaration scope NewIncomeTaxComputationFixed: context two_brackets scope TwoBracketsTaxComputation context individual content Individual context income_tax content money scope NewIncomeTaxComputationFixed : definition two_brackets.brackets equals TwoBrackets { -- breakpoint: $100,000 -- rate1: 20% -- rate2: 40% } # To define an exception to a rule, you have to first label the rule that # you want to attach the exception to. You can put any snake_case identifier # for the label label article_5 definition income_tax equals two_brackets.tax_formula of individual.income # Then, you can declare the exception by referring back to the label exception article_5 definition income_tax under condition individual.income <=$ $10,000 consequence equals $0 ``` And the test that should now work: ```catala declaration scope Test3: context tax_computation scope NewIncomeTaxComputationFixed context income_tax content money scope Test3: definition tax_computation.individual equals Individual { -- income: $4,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax assertion income_tax = $0 ``` ### Conclusion This tutorial presents the basic concepts and syntax of the Catala language features. It is then up to you use them to annotate legislative texts with their algorithmic translation. There is no single way to write Catala programs, as the program style should be adapted to the legislation it annotates. However, Catala is a functional language at heart, so following standard functional programming design patterns should help achieve concise and readable code.