# The Catala language tutorial Welcome to this tutorial, whose objective is to guide you through the features of the Catala language and teach you how to annotate a legislative text using the language. This document is addressed primarily to developers or people that have a programming background, though tech-savvy lawyers will probably figure things out. ```catala # Welcome to the code mode of Catala. This is a comment, because the line is # prefixed by #. # This tutorial does not cover the installation of Catala. For more # information about this, please refer the readme at: # https://github.com/CatalaLang/catala#building-and-installation # If you want to get started with your own Catala program to follow this # tutorial, simply create an empty file with the extension .catala_en # and start typing your program. # For a complete reference of the Catala syntax, see: # https://github.com/CatalaLang/catala/raw/master/doc/syntax/syntax.pdf # This tutorial itself is written as a Catala program and can be accessed at: # https://github.com/CatalaLang/catala/blob/master/examples/tutorial_en ``` ## Literate programming To begin writing a Catala program, you must start from the text of the legislative source that will justify the code that you will write. Concretely, that means copy-pasting the text of the law into a Catala source file and formatting it according so that Catala can understand it. Catala source files have the ".catala_en" extension. If you were to write a Catala program for a French law, you would use the ".catala_fr" extension; for Polish law, "catala_pl", etc. You can write any kind of plain text in Catala, and it will be printed as is in PDF or HTML output. You can split your text into short lines of less than 80 characters, terminal-style, and those will appear as a single paragraph in the output. If you want to create a new paragraph, you have to leave a blank line in the source. The syntax for non-code text in a Catala program follows a subset of Markdown that supports titles and Catala block codes. Catala allows you to declare section or subsection headers as it is done here, with the "#" symbol. You can define heading of lower importance by adding increasing numbers of "#" after the title of the heading. Generally, the literate programming syntax of Catala follows the syntax of Markdown (though it does not have all of its features). In the rest of this tutorial, we will analyze a fictional example that defines an income tax. This income tax is defined through several articles of law, each of them introduced by a header. Here is the first one: ### Article 1 The income tax for an individual is defined as a fixed percentage of the individual's income over a year. ```catala # We will soon learn what to write here in order to translate the meaning # of the article into Catala code. # To create a block of Catala code in your file, bound it with Mardown-style # "```catala" and "```" delimiters. ``` To translate that fictional income tax definition into a Catala program, we will intertwine short snippets of code between the sentences of the legislative text. Each snippet of code should be as short as possible and as close as possible to the actual sentence that justifies the code. This style is called literate programming, a programming paradigm invented by the famous computer scientist Donald Knuth in the 70s. ## Defining a fictional income tax The content of article 1 uses a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not verbatim in the law, we have to explicit it for programming purposes. Concretely, we need a "metadata" section that defines the shape and types of the data used inside the law. Let's start our metadata section by declaring the type information for the individual, the taxpayer that will be the subject of the tax computation. This individual has an income and a number of children, both pieces of information which will be needed for tax purposes : ```catala-metadata declaration structure Individual: # The name of the structure "Individual", must start with an # uppercase letter: this is the CamelCase convention. data income content money # In this line, "income" is the name of the structure field and # "money" is the type of what is stored in that field. # Available types include: integer, decimal, money, date, duration, # and any other structure or enumeration that you declare. data number_of_children content integer # "income" and "number_of_children" start by a lowercase letter, # they follow the snake_case convention. ``` This structure contains two data fields, "income" and "number_of_children". Structures are useful to group together data that goes together. Usually, you get one structure per concrete object on which the law applies (like the individual). It is up to you to decide how to group the data together, but you should aim to optimize code readability. Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like: ```catala-metadata declaration enumeration TaxCredit: # The name "TaxCredit" is also written in CamelCase. -- NoTaxCredit # This line says that "TaxCredit" can be a "NoTaxCredit" situation. -- ChildrenTaxCredit content integer # This line says that alternatively, "TaxCredit" can be a # "ChildrenTaxCredit" situation. This situation carries a content # of type integer corresponding to the number of children concerned # by the tax credit. This means that if you're in the "ChildrenTaxCredit" # situation, you will also have access to this number of children. ``` In computer science terms, such an enumeration is called a "sum type" or simply an enum. The combination of structures and enumerations allow the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of "algebraic data types". We've defined and typed the data that the program will manipulate. Now we have to define the logical context in which this data will evolve. This is done in Catala using "scopes". Scopes are close to functions in terms of traditional programming. Scopes also have to be declared in metadata, so here we go: ```catala-metadata declaration scope IncomeTaxComputation: # Scope names use CamelCase. input individual content Individual # This line declares a scope variable of the scope, which is akin to # a function parameter in computer science term. This is the piece of # data on which the scope will operate. internal fixed_percentage content decimal output income_tax content money ``` The scope is the basic abstraction unit in Catala programs, and the declaration of the scope is akin to a function signature: it contains a list of all the arguments along with their types. But in Catala, scopes' variables stand for three things: input arguments, local variables and outputs. The difference between these three categories can be specified by the different input/output attributes preceding the variable names. "input" means that the variable has to be defined only when the scope IncomeTaxComputation is called. "internal" means that the variable cannot be seen from outside the scope: it is neither an input nor an output of the scope. "output" means that a caller scope can retrieve the computed value of the variable. Note that a variable can also be simulataneously an input and an output of the scope, in that case it should be annotated with "input output". We now have everything to annotate the contents of article 1, which is copied over below. ### Article 1 The income tax for an individual is defined as a fixed percentage of the individual's income over a year. ```catala scope IncomeTaxComputation: definition income_tax equals individual.income *$ fixed_percentage ``` In the code, we are defining inside our scope the amount of the income tax according to the formula described in the article. When defining formulas, you have access to all the usual arithmetic operators: addition "+", subtraction "-", multiplication "*" and division (slash). However, in the Catala code, you can see that we use "*$" ("*€" in French, etc.) to multiply the individual income by the fixed percentage. The $ suffix indicates that we are performing a multiplication on an amount of money. Indeed, in Catala, you have to keep track of what you are dealing with: is it money ? Is it an integer? Using just "+" or "*" can be ambiguous in terms of rounding, since money is usually rounded at the cent. So to disambiguate, we suffix these operations with something that indicates the type of what we manipulate. The suffixes are "$" for money, "." for decimals, "@" for dates and the "^" symbol for durations. If you forget the suffix, the Catala type checker will display an error message that will help you put it where it belongs. Coming back to article 1, one question remains unknown: what is the value of the fixed percentage? Often, precise values are defined elsewhere in the legislative source. Here, let's suppose we have: ### Article 2 The fixed percentage mentioned at article 1 is equal to 20 %. ```catala scope IncomeTaxComputation: definition fixed_percentage equals 20 % # Writing 20% is just an abbreviation for "0.20". ``` You can see here that Catala allows definitions to be scattered throughout the annotation of the legislative text, so that each definition is as close as possible to its location in the text. ## Conditional definitions So far so good, but now the legislative text introduces some trickiness. Let us suppose the third article says: ### Article 3 If the individual is in charge of 2 or more children, then the fixed percentage mentioned at article 1 is equal to 15 %. ```catala # How to redefine fixed_percentage? ``` This article actually gives another definition for the fixed percentage, which was already defined in article 2. However, article 3 defines the percentage conditionally to the individual having more than 2 children. Catala allows you precisely to redefine a variable under a condition: ```catala scope IncomeTaxComputation: definition fixed_percentage under condition individual.number_of_children >= 2 consequence equals 15 % # Writing 15% is just an abbreviation for "0.15". ``` When the Catala program will execute, the right definition will be dynamically chosen by looking at which condition is true. A correctly drafted legislative source should always ensure that at most one condition is true at all times. However, if it is not the case, Catala will let you define a precedence on the conditions, which has to be justified by the law. But we will see how to do that later. ## Rules So far, you've learnt how to declare a scope with some variables, and give definitions to these variables scattered accross the text of the law at the relevant locations. But there is a pattern very frequent in legislative texts: what about conditions? A condition is a value that can be either true or false, like a boolean in programming. However, the law implicitly assumes a condition is false unless specified otherwise. This pattern is so common in law that Catala gives it a special syntax. More precisely, it calls the definition of conditions "rules", which coincides with the usual meaning law people would give it. Here is an example of a condition that might arise in the law: ### Article 3 bis The children eligible for application of article 3 ```catala # To deal with children eligibility, we create a new scope. declaration scope Child: input age content integer # The age of the child can be declared as before. output is_eligible_article_3 condition # For we declare the eligibility using the special "condition" keyword # that stands for the content of the variable. scope Child: rule is_eligible_article_3 under condition age < 18 consequence fulfilled # When defining when a condition is true, we use the special "rule" syntax # instead of "definition". Rules set conditions to be "fulfilled" or # "not fulfilled" conditionally. ``` When interacting with other elements of the code, condition values behaves like boolean values. ## Functions Catala lets you define functions anywhere in your scope variable. Indeed, Catala is a functional language and encourages using functions to describe relationships between data. Here's what it looks like in the metadata definition when we want to define a two-brackets tax computation: ```catala-metadata declaration structure TwoBrackets: data breakpoint content money data rate1 content decimal data rate2 content decimal # This structure describes the parameters of a tax computation formula that # has two tax brackets, each with their own tax rate. declaration scope TwoBracketsTaxComputation : input brackets content TwoBrackets # This input variable contains the description of the # parameters of the tax formula. output tax_formula content money depends on money # But for declaring the tax_formula variable, we declare it as # a function: "content money depends on money" means a function that # returns money as output (the tax) and takes money as input (the income). ``` ### Article 4 The tax amount for a two-brackets computation is equal to the amount of income in each bracket multiplied by the rate of each bracket. ```catala scope TwoBracketsTaxComputation : definition tax_formula of income equals # The money parameter for the "tax_formula" function is introduced as # "income". The name of the parameter is your choice, and will not impact # things outside this part of the definition. You can choose another # name in another definition of "tax_formula". if income <=$ brackets.breakpoint then income *$ brackets.rate1 else ( brackets.breakpoint *$ brackets.rate1 +$ (income -$ brackets.breakpoint) *$ brackets.rate2 ) # This is the formula for implementing a two-bracketstax system. ``` ## Scope inclusion Now that we've defined our helper scope for computing a two-brackets tax, we want to use it in our main tax computation scope. As mentioned before, Catala's scope can also be thought of as big functions. And these big functions can call each other, which is what we'll see in the below article. ### Article 5 For individuals whose income is greater than $100,000, the income tax of article 1 is 40% of the income above $100,000. Below $100,000, the income tax is 20% of the income. ```catala declaration scope NewIncomeTaxComputation: two_brackets scope TwoBracketsTaxComputation # This line says that we add the item two_brackets to the context. # However, the "scope" keyword tells that this item is not a piece of data # but rather a subscope that we can use to compute things. input individual content Individual output income_tax content money scope NewIncomeTaxComputation : # Since the subscope "two_brackets" is like a big function we can call, # we need to define its arguments. This is done below: definition two_brackets.brackets equals TwoBrackets { -- breakpoint: $100,000 -- rate1: 20% -- rate2: 40% } # By defining the input variable "brackets" of the subscope "two_brackets", # we have changed how the subscope will execute. The subscope will execute # with all the values defined by the caller, then compute the value # of its other variables. definition income_tax equals two_brackets.tax_formula of individual.income # After the subscope has executed, you can retrieve results from it. Here, # we retrive the result of variable "tax_formula" of computed by the # subscope "two_brackets". It's up yo you to choose what is an input and # an output of your subscope; if you make an inconsistent choice, the # Catala compiler will warn you. ``` Now that we've successfully defined our income tax computation, the legislator inevitably comes to disturb our beautiful and regular formulas to add a special case! The article below is a really common pattern in statutes, and let's see how Catala handles it. ### Article 6 Individuals earning less than $10,000 are exempted of the income tax mentioned at article 1. ```catala scope NewIncomeTaxComputation: # Here, we simply define a new conditional definition for "income tax" # that handles the special case. definition income_tax under condition individual.income <=$ $10,000 consequence equals $0 # What, you think something might be wrong with this? Hmmm... We'll see # later! ``` That's it! We've defined a two-brackets tax computation with a special case simply by annotating legislative article by snippets of Catala code. However, attentive readers may have caught something weird in the Catala translation of articles 5 and 6. What happens when the income of the individual is lesser than $10,000? Right now, the two definitions at articles 5 and 6 for income_tax apply, and they're in conflict. This is a flaw in the Catala translation, but the language can help you find this sort of errors via simple testing or even formal verification. Let's start with the testing. ## Testing Catala programs Testing Catala programs can be done directly into Catala. Indeed, writing test cases for each Catala scope that you define is a good practice called "unit testing" in the software engineering community. A test case is defined as another scope: ### Testing NewIncomeTaxComputation ```catala declaration scope Test1: # We include the scope to tax as a subscope. tax_computation scope NewIncomeTaxComputation output income_tax content money # To execute that test, assuming that the Catala compiler can be called # with "catala", enter the following command: # catala Interpret --scope=Test1 tutorial_en.catala_en scope Test1: definition tax_computation.individual # We define the argument to the subscope equals # The four lines below define a whole structure by giving a value to # each of its fields Individual { -- income: $230,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax # Next, we retrieve the income tax value compute it by the subscope and # assert that it is equal to the expected value : # ($230,000-$100,00)*40%+$100,000*20% = $72,000 assertion income_tax = $72,000 ``` This test should pass. Let us now consider a failing test case: ```catala declaration scope Test2: tax_computation scope NewIncomeTaxComputation output income_tax content money scope Test2: definition tax_computation.individual equals Individual { -- income: $4,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax assertion income_tax = $0 ``` This test case should compute a $0 income tax because of Article 6. But instead, execution will yield an error saying that there is a conflict between rules. ## Defining exceptions to rules Indeed, the definition of the income tax in article 6 conflicts with the definition of income tax in article 5. But actually, article 6 is just an exception of article 5. In the law, it is implicit that if article 6 is applicable, then it takes precedence over article 5. ### Fixing the computation This implicit precedence has to be explicitly declared in Catala. Here is a fixed version of the NewIncomeTaxComputation scope: ```catala declaration scope NewIncomeTaxComputationFixed: two_brackets scope TwoBracketsTaxComputation input individual content Individual output tax_formula content money depends on money context output income_tax content money # This variable is tagged with "context", a new concept which we have not # introduced yet. For now, ignore it as we'll come back to it in the section # "Context scope variables". scope NewIncomeTaxComputationFixed : definition two_brackets.brackets equals TwoBrackets { -- breakpoint: $100,000 -- rate1: 20% -- rate2: 40% } definition tax_formula of income equals two_brackets.tax_formula of income ``` To define an exception to a rule, you have to first label the rule that you want to attach the exception to. You can put any snake_case identifier for the label: ```catala scope NewIncomeTaxComputationFixed: label article_5 definition income_tax equals two_brackets.tax_formula of individual.income # Then, you can declare the exception by referring back to the label exception article_5 definition income_tax under condition individual.income <=$ $10,000 consequence equals $0 ``` And the test that should now work: ```catala declaration scope Test3: tax_computation scope NewIncomeTaxComputationFixed output income_tax content money scope Test3: definition tax_computation.individual equals Individual { -- income: $4,000 -- number_of_children: 0 } definition income_tax equals tax_computation.income_tax assertion income_tax = $0 ``` ### Defining exceptions to groups of rules Note that the label system also lets you define more complicated exceptions patterns. Sometimes, you want to declare an exception to a group of piecewise definitions. To do that, simply use the same label for all the piecewise definitions. ### Cumulative exceptions As we have seen, two exceptions applying at the same time to a given rule are in conflict, and trigger an error. It happens, though, that these exceptions yield the same end result: for convenience, Catala tolerates this case and returns the common result, as long as there is a strict syntactic equality. #### Article 6 bis Individuals with 7 children or more are exempted of the income tax mentioned at article 1. ```catala scope NewIncomeTaxComputationFixed: exception article_5 definition income_tax under condition individual.number_of_children >= 7 consequence equals $0 ``` The same problem as above would be triggered for families with an income below `$10,000` and 7 children or more. But here Catala can detect that it won't matter since the result in both cases is an exemption. ```catala declaration scope Test4: tax_computation scope NewIncomeTaxComputationFixed output income_tax content money scope Test4: definition tax_computation.individual equals Individual { -- income: $7,000 -- number_of_children: 7 } definition income_tax equals tax_computation.income_tax assertion income_tax = $0 ``` ## Context scope variables With its "input","output" and "internal" variables, Catala's scope are close to regular functions with arguments, local variables and output. However, the law can sometimes be adversarial to good programming practices and define provisions that break the abstraction barrier normally associated to a function. This can be the case when an outside body of legislative text "reuses" a a legal concept but adding a twist on it. Consider the following fictionnal (but not quite pathological computational-wise) example. ### Article 7 The justice system delivers fines to individuals when they committed an offense. The fines are determined based on the amount of taxes paid by the individual. The more taxes the invidual pays, the higher the fine. However, the determination of the amount of taxes paid by an individual in this context shall include a flat tax fee of $500 for individuals earning less than $10,000. ```catala # To compute the basis determined for issuing the fines, we create a new scope. declaration scope BasisForFineDetermination: tax_computation scope NewIncomeTaxComputationFixed # This scope will call the NewIncomeTaxComputationFixed scope that defines # the proper tax computation. input individual content Individual output basis_for_fine content money scope BasisForFineDetermination : # First, we link the inputs and outputs of the two scopes. definition tax_computation.individual equals individual definition basis_for_fine equals tax_computation.income_tax # But then, how to account for the provision of the law that reverts the # mechanism canceling taxes for individuals earning less than $10,000 dollars? # This is where the "context" concept comes into play. Indeed, we had # annotated the "income_tax" variable of "NewIncomeTaxComputationFixed" with # the "context" attribute. "context" is a variant of "input" that exposes the # variable as an input of the scope. However, it is more permissive than # "input" because it lets you re-define the "context" variable inside its # own scope. Then, you're faced with a choice for the value of "income_tax": # do you take the value coming from its definition inside # "NewIncomeTaxComputationFixed" or do you take the value coming from the # input of the scope? This dilemma is resolved in two ways. First, by looking # at the conditions of the definitions: only definitions that have a condition # evaluating to "true" at runtime will be considered. If there's only one, # the pick is easy. But what if two definitions trigger at the same time, # one from the input and one from the "NewIncomeTaxComputationFixed" scope? # Well, second, in that case we always prioritize the input definition for # picking. In Catala, the caller scope has priority over the callee scope # for "context" variable. It's as if the caller provided an extra exception # for the definition of the scope variable. # Back to our little problem, the solution is here to provide from the outside # an exceptional definition for income tax for people earning less than # $10,0000. definition tax_computation.income_tax under condition individual.income <=$ $10,000 consequence equals $500 ``` ## One variable, multiple states When a quantity is mentioned it the law, it does not always maps exactly to a unique Catala variable. More precisely, it often happens that the law defines a unique quantity with multiple computation steps, each new one building on the previous one. Here is an example of such a setup and how to deal with it thanks to a dedicated Catala feature. Under the hood, the different states of a Catala variable are implemented by distinct variables inside the lower intermediate representations of the language. ### Article 8 For taxation purposes, the values of the building operated for charity purposes can be deducted from the wealth of the individual, which is then capped at $2,500,000. ```catala declaration scope WealthTax: input value_of_buildings_used_for_charity content money input total_wealth content money internal wealth content money # After the type of the variable, we can define the ordered list of states # that the variable shall take before computing its final value. In each # state, we'll be able to refer to the value of the previous state. state total state after_charity_deductions state after_capping scope WealthTax: definition wealth state total equals total_wealth definition wealth state after_charity_deductions equals wealth -$ # Here, "wealth" refers to the state "total" value_of_buildings_used_for_charity definition wealth state after_capping equals if wealth >=$ $2,500,000 then $2,500,000 else wealth # Here, "wealth" refers to the state "after_charity_decuctions" assertion wealth >$ $0 # Outside of the definition of "wealth", "wealth" always refer to the final # state of the variable, here "after_capping". ``` ## Catala values So far, this tutorial has introduced you to the basic structure of Catala programs with scope, definitions and exceptions. But to be able to handle most statutes, Catala comes with support for the usual types of values on which legislative computations operate. ### Booleans Booleans are the most basic type of value in Catala: they can be either true or false. Conditions are simply booleans with a default value of false, see the section about conditions above. ```catala declaration scope BooleanValues: internal value1 condition internal value2 content boolean scope BooleanValues: rule value1 under condition false and true consequence fulfilled # Sets the boolean of the value1 condition to "true" under condition # "false and true" definition value2 equals value1 xor (value1 and true) = false # Boolean operators include and, or, xor ``` ### Integers Integers in Catala are infinite precision: they behave like true mathematical integers, and not like computer integers that are bounded by a maximum value due to them being stored in 32 or 64 bits. Integers can be negative. ```catala declaration scope IntegerValues: internal value1 content integer internal value2 content integer scope IntegerValues: definition value1 under condition 12 - (5 * 3) < 65 consequence equals 45 / 9 # The / operators corresponds to an integer division that truncates towards 0. definition value2 equals value1 * value1 * 65 / 100 ``` ### Decimals Decimals in Catala also have infinite precisions, behaving like true mathematical rational numbers, and not like computer floating point numbers that perform approximate computations. Operators are suffixed with ".". ```catala declaration scope DecimalValues: internal value1 content decimal internal value2 content decimal scope DecimalValues: definition value1 under condition 12.655465446655426 -. 0.45265426541654 <. 12.3554654652 consequence equals (integer_to_decimal of 45) /. (integer_to_decimal of 9) # The /. operators corresponds to an exact division. definition value2 equals value1 *. value1 *. 65% # Percentages are decimal numbers (0.65 here) ``` ### Money In Catala, money is simply represented as an integer number of cents. You cannot in Catala have a more precise amount of money than a cent. However, you can muliply an amount of money by a decimal, and the result is rounded towards the nearest cent. This principled behaviour to keep track of where you need precision in your computations and select decimals for precision instead of relying on money figures where sub-cent precision matters. Two money amounts can be divided, yielding a decimal. ```catala declaration scope MoneyValues: internal value1 content decimal internal value2 content money scope MoneyValues: definition value1 under condition 12.655465446655426 -. 0.45265426541654 <. 12.3554654652 consequence equals (integer_to_decimal of 45) /. (integer_to_decimal of 9) definition value2 equals $1.00 *$ ((($6,520.23 -$ $320.45) *$ value1) /$ $45) ``` ### Dates and durations Catala has support for Gregorian calendar dates as well as duration computations in terms of days, months and years. A difference between dates is a duration measured in days, and the addition of a date and a duration yields a new date. Durations measured in days, months or years are not mixable with each other, as months and years do not always have the same number of days. This non-mixability is not captured by the type system of Catala but will yield errors at runtime. Date literals are specified using the ISO 8601 standard to avoid confusion between american and european notations. Date operators are prefixed by "@" while duration operators are prefixed by "^". ```catala declaration scope DateValues: internal value1 content date internal value2 content duration scope DateValues: definition value1 equals |2000-01-01| +@ 1 year # yields |2001-01-01| definition value2 equals (value1 -@ |1999-12-31|) +^ 45 day # 367 + 45 days (2000 is bissextile) ``` ### Collections Often, Catala programs need to speak about a collection of data because the law talks about the number of children, the maximum of a list, etc. Catala features first-class support for collections, which are basically fixed-size lists. You can create a list, filter its elements but also aggregate over its contents to compute all sorts of values. ```catala declaration scope CollectionValues: internal value1 content collection integer internal value2 content integer scope CollectionValues: definition value1 equals [45;-6;3;4;0;2155] definition value2 equals sum integer for i in value1 of (i * i) # sum of squares ``` ## Conclusion This tutorial presents the basic concepts and syntax of the Catala language features. It is then up to you use them to annotate legislative texts with their algorithmic translation. There is no single way to write Catala programs, as the program style should be adapted to the legislation it annotates. However, Catala is a functional language at heart, so following standard functional programming design patterns should help achieve concise and readable code.