catala/examples/tutorial_en/tutorial_en.catala_en
Denis Merigoux a6338c3a9c Typos
2021-01-16 18:06:22 +01:00

391 lines
15 KiB
Plaintext

@@The Catala language tutorial@@
Welcome to this tutorial, whose objective is to guide you through the features
of the Catala language and teach you how to annotate a legislative text using
the language. This document is addressed primarily to developers or people that
have a programming background, though tech-savvy lawyers will probably figure
things out.
@@Literate programming@@+
To begin writing a Catala program, you must start from the text of the
legislative source that will justify the code that you will write. Concretely,
that means copy-pasting the text of the law into a Catala source file and
formatting it according so that Catala can understand it. Catala source files
have the ".catala_en" extension. If you were to write a Catala program for a
French law, you would use the ".catala_fr" extension.
You can write any kind of plain text in Catala, and it will be printed as is
in PDF or HTML output. You can split your text into short lines, those
will appear as a single paragraph in the output. If you want to create a
new paragrah, you have to leave a blank line in the source.
Catala allows you to declare section or subsection headers as it is done
here, with the "at" symbol repeated twice. You can define heading of lower
importance by adding increasing numbers of "+" after the title of the heading.
The fundamental division unit is the article, introduced by a single "at".
Let's analyse a fictional example that defines an income tax.
@Article 1@
The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.
/*
# Welcome to the code mode of Catala. This is a comment, because the line is
# prefixed by #.
# We will soon learn what to write here in order to translate the meaning
# of the article into Catala code.
*/
To do that, we will intertwine short snippets of code between the sentences of
the legislative text. Each snippet of code should be as short as possible and
as close as possible to the actual sentence that justifies the code. This style
is called literate programming, a programming paradigm invented by the famous
computer scientist Donald Knuth in the 70s.
@@Defining a fictional income tax@@+
The content of article 1 uses a lot of implicit context: there exists an
individual with an income, as well as an income tax that the individual has
to pay each year. Even if this implicit context is not verbatim in the law,
we have to explicit it for programming purposes. Concretely, we need a
"metadata" section that defines the shape and types of the data used
inside the law.
Let's start our metadata section by declaring the type information for the
individual:
@@Begin metadata@@
/*
declaration structure Individual:
# The name of the structure "Individual", must start with an
# uppercase letter: this is the CamelCase convention.
data income content money
# In this line, "income" is the name of the structure field and
# "money" is the type of what is stored in that field.
# Available types include: integer, decimal, money, date, duration,
# and any other structure or enumeration that you declare
data number_of_children content integer
# "income" and "number_of_children" start by a lowercase letter,
# they follow the snake_case convention
*/
@@End metadata@@
This structure contains two data fields, "income" and "number_of_children". Structures are
useful to group together data that goes together. Usually, you
get one structure per concrete object on which the law applies (like the
individual). It is up to you to decide how to group the data together,
but you should aim to optimize code readability.
Sometimes, the law gives an enumeration of different situations. These
enumerations are modeled in Catala using an enumeration type, like:
@@Begin metadata@@
/*
declaration enumeration TaxCredit:
# The name "TaxCredit" is also written in CamelCase
-- NoTaxCredit
# This line says that "TaxCredit" can be a "NoTaxCredit" situation
-- ChildrenTaxCredit content integer
# This line says that alternatively, "TaxCredit" can be a
# "ChildrenTaxCredit" situation. This situation carries a content
# of type integer corresponding to the number of children concerned
# by the tax credit. This means that if you're in the "ChildrenTaxCredit"
# situation, you will also have access to this number of children
*/
@@End metadata@@
In computer science terms, such an enumeration is called a "sum type" or simply
an enum. The combination of structures and enumerations allow the Catala
programmer to declare all possible shapes of data, as they are equivalent to
the powerful notion of "algebraic datatypes".
We've defined and typed the data that the program will manipulate. Now we have
to define the logical context in which these data will evolve. This is done in
Catala using "scopes". Scopes are close to functions in terms of traditional
programming. Scopes also have to be declared in metadata, so here we go:
@@Begin metadata@@
/*
declaration scope IncomeTaxComputation:
# Scope names use CamelCase
context individual content Individual
# This line declares a context element of the scope, which is akin to
# a function parameter in computer science term. This is the piece of
# data on which the scope will operate
context fixed_percentage content decimal
context income_tax content money
*/
@@End metadata@@
We now have everything to annotate the contents of article 1, which is copied
over below.
@Article 1@
The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.
/*
scope IncomeTaxComputation:
definition income_tax equals
individual.income *$ fixed_percentage
*/
In the code, we are defining inside our scope the amount of the income tax
according to the formula described in the article. When defining formulae,
you have access to all the usual arithmetic operators: addition "+",
substraction "-", multiplication "*" and division (slash).
However, in the Catala code, you can see that we use "*$" to multiply the
individual income by the fixed percentage. The $ suffix indicates that we
are performing a multiplication on an amount of money. Indeed, in Catala,
you have to keep track of what you are dealing with: is it money ? Is it
an integer? Using just "+" or "*" can be ambiguous in terms of rounding,
since money is usually rounded at the cent. So to disambiguate, we suffix these
operations with something that indicates the type of what we manipulate.
The suffixes are "$" for money "." for decimals, "at" (like in email adresses)
for dates and the hat symbol for durations. If you forget the suffix, the Catala type
checker will display an error message that will help you put it where it
belongs.
But inside article 1, one question remains unknown: what is the value of
of the fixed percentage? Often, precise values are defined elsewhere in the
legislative source. Here, let's suppose we have:
@Article 2@
The fixed percentage mentionned at article 1 is equal to 20 %.
/*
scope IncomeTaxComputation:
definition fixed_percentage equals 20 %
# Writing 20% is just an abbreviation for 0.20
*/
You can see here that Catala allows definitions to be scattered throughout
the annotation of the legislative text, so that each
definition is as close as possible to its location in the text.
@@Conditional definitions@@+
So far so good, but now the legislative text introduces some trickiness. Let us
suppose the third article says:
@Article 3@ If the individual is in charge of 2 or more children, then the fixed
percentage mentionned at article 1 is equal to 15 %.
/*
# How to redefine fixed_percentage?
*/
This article actually gives another definition for the fixed percentage, which
was already defined in article 2. However, article 3 defines the percentage
conditionnally to the individual having more than 2 children. Catala allows
you precisely to redefine a variable under a condition:
/*
scope IncomeTaxComputation:
definition fixed_percentage under condition
individual.number_of_children >= 2
consequence equals 15 %
# Writing 15% is just an abbreviation for 0.15
*/
When the Catala program will execute, the right definition will be dynamically
chosen by looking at which condition is true. A correctly drafted legislative
source should always ensure that at most one condition is true at all times.
However, if it is not the case, Catala will let you define a precedence on the
conditions, which has to be justified by the law.
@@Functions@@+
Catala lets you define functions anywhere in your data. Here's what it looks
like in the metadata definition when we want to define a two-brackets tax
computation:
@@Begin metadata@@
/*
declaration structure TwoBrackets:
data breakpoint content money
data rate1 content decimal
data rate2 content decimal
declaration scope TwoBracketsTaxComputation :
context brackets content TwoBrackets
context tax_formula content money depends on money
*/
@@End metadata@@
And in the code:
@Article4@ The tax amount for a two-brackets computation is equal to the amount
of income in each bracket multiplied by the rate of each bracket.
/*
scope TwoBracketsTaxComputation :
definition tax_formula of income equals
if income <=$ brackets.breakpoint then
income *$ brackets.rate1
else (
brackets.breakpoint *$ brackets.rate1 +$
(income -$ brackets.breakpoint) *$ brackets.rate2
)
*/
@@Scope inclusion@@+
Now that we've defined our helper scope for computing a two-brackets tax, we
want to use it in our main tax computation scope.
@Article 5@ For individuals whose income is greater than $100,000, the income
tax of article 1 is 40% of the income above $100,000. Below $100,000, the
income tax is 20% of the income.
/*
declaration scope NewIncomeTaxComputation:
context two_brackets scope TwoBracketsTaxComputation
# This line says that we add the item two_brackets to the context.
# However, the "scope" keyword tells that this item is not a piece of data
# but rather a subscope that we can use to compute things.
context individual content Individual
context income_tax content money
scope NewIncomeTaxComputation :
definition two_brackets.brackets equals TwoBrackets {
-- breakpoint: $100,000
-- rate1: 20%
-- rate2: 40%
}
definition income_tax equals two_brackets.tax_formula of individual.income
*/
@Article 6@
Individuals earning less than $10,000 are exempted of the income tax mentionned
at article 1.
/*
scope NewIncomeTaxComputation:
definition income_tax under condition
individual.income <=$ $10,000
consequence equals $0
*/
That's it! We've defined a two-brackets tax computation simply by annotating
legislative article by snippets of Catala code. However, attentive readers
may have caught something weird in articles 5 and 6. What happens when the
income of the individual is between $10,000 and $100,000 ?
The law leaves it unspecified ; our dummy articles are clearly badly drafted.
But Catala can help you find this sort of errors via simple testing or
even formal verification. Let's start with the testing.
@@Testing Catala programs@@+
Testing Catala programs can be done directly into Catala. Indeed, writing test
cases for each Catala scope that you define is a good practice called
"unit testing" in the software engineering community. A test case is defined
as another scope:
@Testing NewIncomeTaxComputation@
/*
declaration scope Test1:
context tax_computation scope NewIncomeTaxComputation
context income_tax content money
scope Test1:
definition
tax_computation.individual
# We define the argument to the subscope
equals
# The four lines below define a whole structure by giving a value to
# each of its fields
Individual {
-- income: $230,000
-- number_of_children: 0
}
definition income_tax equals tax_computation.income_tax
# Next, we retrieve the income tax value compute it by the subscope and
# assert that it is equal to the expected value :
# ($230,000-$100,00)*40%+$100,000*20% = $72,000
assertion income_tax = $72,000
*/
This test should pass. Let us now consider a failing test case:
/*
declaration scope Test2:
context tax_computation scope NewIncomeTaxComputation
context income_tax content money
scope Test2:
definition tax_computation.individual equals Individual {
-- income: $4,000
-- number_of_children: 0
}
definition income_tax equals tax_computation.income_tax
assertion income_tax = $0
*/
This test case should compute a $0 income tax because of Article 6. But instead,
execution will yield an error saying that there is a conflict between rules.
@@Defining exceptions to rules@@+
Indeed, the definition of the income tax in article 6 conflicts with the
definition of income tax in article 5. But actually, article 6 is just an
exception of article 5. In the law, it is implicit that if article 6 is
applicable, then it takes precedence over article 5.
@Fixing the computation@
This implicit precedence has to be explicitely declared in Catala. Here is a
fixed version of the NewIncomeTaxComputation scope:
/*
declaration scope NewIncomeTaxComputationFixed:
context two_brackets scope TwoBracketsTaxComputation
context individual content Individual
context income_tax content money
scope NewIncomeTaxComputationFixed :
definition two_brackets.brackets equals TwoBrackets {
-- breakpoint: $100,000
-- rate1: 20%
-- rate2: 40%
}
# To define an exception to a rule, you have to first label the rule that
# you want to attach the exception to. You can put any snake_case identifier
# for the label
label article_5
definition income_tax equals two_brackets.tax_formula of individual.income
# Then, you can declare the exception by referring back to the label
exception article_5
definition income_tax under condition
individual.income <=$ $10,000
consequence equals $0
*/
And the test that should now work:
/*
declaration scope Test3:
context tax_computation scope NewIncomeTaxComputationFixed
context income_tax content money
scope Test3:
definition tax_computation.individual equals Individual {
-- income: $4,000
-- number_of_children: 0
}
definition income_tax equals tax_computation.income_tax
assertion income_tax = $0
*/
@@Conclusion@@+
This tutorial presents the basic concepts and syntax of the Catala language
features. It is then up to you use them to annotate legislative texts
with their algorithmic translation.
There is no single way to write Catala programs, as the program style should be
adapted to the legislation it annotates. However, Catala is a functional
language at heart, so following standard functional programming design patterns
should help achieve concise and readable code.