.\"t
.TH "hledger_csv" "5" "December 2020" "hledger 1.20.99" "hledger User Manuals"
.SH NAME
.PP
CSV - how hledger reads CSV data, and the CSV rules file format
.SH DESCRIPTION
.PP
hledger can read CSV files (Character Separated Value - usually comma,
semicolon, or tab) containing dated records as if they were journal
files, automatically converting each CSV record into a transaction.
.PP
(To learn about \f[I]writing\f[R] CSV, see CSV output.)
.PP
We describe each CSV file\[aq]s format with a corresponding \f[I]rules
file\f[R].
By default this is named like the CSV file with a \f[C].rules\f[R]
extension added.
Eg when reading \f[C]FILE.csv\f[R], hledger also looks for
\f[C]FILE.csv.rules\f[R] in the same directory as \f[C]FILE.csv\f[R].
You can specify a different rules file with the \f[C]--rules-file\f[R]
option.
If a rules file is not found, hledger will create a sample rules file,
which you\[aq]ll need to adjust.
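.PP
As a hedged illustration (the file names here are hypothetical), the two
ways of locating a rules file look roughly like this:
.IP
.nf
\f[C]
# uses bank.csv.rules from the same directory as bank.csv
$ hledger -f bank.csv print
# uses the named rules file instead
$ hledger -f bank.csv --rules-file my-bank.rules print
\f[R]
.fi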
.PP
This file contains rules describing the CSV data (header line, fields
layout, date format etc.), and how to construct hledger journal entries
(transactions) from it.
Often there will also be a list of conditional rules for categorising
transactions based on their descriptions.
Here\[aq]s an overview of the CSV rules; these are described more fully
below, after the examples:
.PP
.TS
tab(@);
lw(30.1n) lw(39.9n).
T{
\f[B]\f[CB]skip\f[B]\f[R]
T}@T{
skip one or more header lines or matched CSV records
T}
T{
\f[B]\f[CB]fields\f[B]\f[R]
T}@T{
name CSV fields, assign them to hledger fields
T}
T{
\f[B]field assignment\f[R]
T}@T{
assign a value to one hledger field, with interpolation
T}
T{
\f[B]\f[CB]separator\f[B]\f[R]
T}@T{
a custom field separator
T}
T{
\f[B]\f[CB]if\f[B] block\f[R]
T}@T{
apply some rules to CSV records matched by patterns
T}
T{
\f[B]\f[CB]if\f[B] table\f[R]
T}@T{
apply some rules to CSV records matched by patterns, alternate syntax
T}
T{
\f[B]\f[CB]end\f[B]\f[R]
T}@T{
skip the remaining CSV records
T}
T{
\f[B]\f[CB]date-format\f[B]\f[R]
T}@T{
how to parse dates in CSV records
T}
T{
\f[B]\f[CB]decimal-mark\f[B]\f[R]
T}@T{
the decimal mark used in CSV amounts, if ambiguous
T}
T{
\f[B]\f[CB]newest-first\f[B]\f[R]
T}@T{
disambiguate record order when there\[aq]s only one date
T}
T{
\f[B]\f[CB]include\f[B]\f[R]
T}@T{
inline another CSV rules file
T}
T{
\f[B]\f[CB]balance-type\f[B]\f[R]
T}@T{
choose which type of balance assignments to use
T}
.TE
.PP
Note, for best error messages when reading CSV files, use a
\f[C].csv\f[R], \f[C].tsv\f[R] or \f[C].ssv\f[R] file extension or file
prefix - see File Extension below.
.PP
There\[aq]s an introductory Convert CSV files tutorial on hledger.org.
.SH EXAMPLES
.PP
Here are some sample hledger CSV rules files.
See also the full collection at:
.PD 0
.P
.PD
https://github.com/simonmichael/hledger/tree/master/examples/csv
.SS Basic
.PP
At minimum, the rules file must identify the date and amount fields, and
often it also specifies the date format and how many header lines there
are.
Here\[aq]s a simple CSV file and a rules file for it:
.IP
.nf
\f[C]
Date, Description, Id, Amount
12/11/2019, Foo, 123, 10.23
\f[R]
.fi
.IP
.nf
\f[C]
# basic.csv.rules
skip 1
fields date, description, _, amount
date-format %d/%m/%Y
\f[R]
.fi
.IP
.nf
\f[C]
$ hledger print -f basic.csv
2019-11-12 Foo
    expenses:unknown           10.23
    income:unknown            -10.23
\f[R]
.fi
.PP
Default account names are chosen, since we didn\[aq]t set them.
.SS Bank of Ireland
.PP
Here\[aq]s a CSV with two amount fields (Debit and Credit), and a
balance field, which we can use to add balance assertions, which is not
necessary but provides extra error checking:
.IP
.nf
\f[C]
Date,Details,Debit,Credit,Balance
07/12/2012,LODGMENT 529898,,10.0,131.21
07/12/2012,PAYMENT,5,,126
\f[R]
.fi
.IP
.nf
\f[C]
# bankofireland-checking.csv.rules
# skip the header line
skip
# name the csv fields, and assign some of them as journal entry fields
fields date, description, amount-out, amount-in, balance
# We generate balance assertions by assigning to \[dq]balance\[dq]
# above, but you may sometimes need to remove these because:
#
# - the CSV balance differs from the true balance,
# by up to 0.0000000000005 in my experience
#
# - it is sometimes calculated based on non-chronological ordering,
# eg when multiple transactions clear on the same day
# date is in UK/Ireland format
date-format %d/%m/%Y
# set the currency
currency EUR
# set the base account for all txns
account1 assets:bank:boi:checking
\f[R]
.fi
.IP
.nf
\f[C]
$ hledger -f bankofireland-checking.csv print
2012-12-07 LODGMENT 529898
    assets:bank:boi:checking         EUR10.0 = EUR131.2
    income:unknown                  EUR-10.0
2012-12-07 PAYMENT
    assets:bank:boi:checking         EUR-5.0 = EUR126.0
    expenses:unknown                  EUR5.0
\f[R]
.fi
.PP
The balance assertions don\[aq]t raise an error above, because we\[aq]re
reading directly from CSV, but they will be checked if these entries are
imported into a journal file.
.SS Amazon
.PP
Here we convert amazon.com order history, and use an if block to
generate a third posting if there\[aq]s a fee.
(In practice you\[aq]d probably get this data from your bank instead,
but it\[aq]s an example.)
.IP
.nf
\f[C]
\[dq]Date\[dq],\[dq]Type\[dq],\[dq]To/From\[dq],\[dq]Name\[dq],\[dq]Status\[dq],\[dq]Amount\[dq],\[dq]Fees\[dq],\[dq]Transaction ID\[dq]
\[dq]Jul 29, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Foo.\[dq],\[dq]Completed\[dq],\[dq]$20.00\[dq],\[dq]$0.00\[dq],\[dq]16000000000000DGLNJPI1P9B8DKPVHL\[dq]
\[dq]Jul 30, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]Completed\[dq],\[dq]$25.00\[dq],\[dq]$1.00\[dq],\[dq]17LA58JSKRD4HDGLNJPI1P9B8DKPVHL\[dq]
\f[R]
.fi
.IP
.nf
\f[C]
# amazon-orders.csv.rules
# skip one header line
skip 1
# name the csv fields, and assign the transaction\[aq]s date, amount and code.
# Avoided the \[dq]status\[dq] and \[dq]amount\[dq] hledger field names to prevent confusion.
fields date, _, toorfrom, name, amzstatus, amzamount, fees, code
# how to parse the date
date-format %b %-d, %Y
# combine two fields to make the description
description %toorfrom %name
# save the status as a tag
comment status:%amzstatus
# set the base account for all transactions
account1 assets:amazon
# leave amount1 blank so it can balance the other(s).
# I\[aq]m assuming amzamount excludes the fees, don\[aq]t remember
# set a generic account2
account2 expenses:misc
amount2 %amzamount
# and maybe refine it further:
#include categorisation.rules
# add a third posting for fees, but only if they are non-zero.
if %fees [1-9]
 account3 expenses:fees
 amount3 %fees
\f[R]
.fi
.IP
.nf
\f[C]
$ hledger -f amazon-orders.csv print
2012-07-29 (16000000000000DGLNJPI1P9B8DKPVHL) To Foo. ; status:Completed
    assets:amazon
    expenses:misc          $20.00
2012-07-30 (17LA58JSKRD4HDGLNJPI1P9B8DKPVHL) To Adapteva, Inc. ; status:Completed
    assets:amazon
    expenses:misc          $25.00
    expenses:fees           $1.00
\f[R]
.fi
.SS Paypal
.PP
Here\[aq]s a real-world rules file for (customised) Paypal CSV, with
some Paypal-specific rules, and a second rules file included:
.IP
.nf
\f[C]
\[dq]Date\[dq],\[dq]Time\[dq],\[dq]TimeZone\[dq],\[dq]Name\[dq],\[dq]Type\[dq],\[dq]Status\[dq],\[dq]Currency\[dq],\[dq]Gross\[dq],\[dq]Fee\[dq],\[dq]Net\[dq],\[dq]From Email Address\[dq],\[dq]To Email Address\[dq],\[dq]Transaction ID\[dq],\[dq]Item Title\[dq],\[dq]Item ID\[dq],\[dq]Reference Txn ID\[dq],\[dq]Receipt ID\[dq],\[dq]Balance\[dq],\[dq]Note\[dq]
\[dq]10/01/2019\[dq],\[dq]03:46:20\[dq],\[dq]PDT\[dq],\[dq]Calm Radio\[dq],\[dq]Subscription Payment\[dq],\[dq]Completed\[dq],\[dq]USD\[dq],\[dq]-6.99\[dq],\[dq]0.00\[dq],\[dq]-6.99\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]memberships\[at]calmradio.com\[dq],\[dq]60P57143A8206782E\[dq],\[dq]MONTHLY - $1 for the first 2 Months: Me - Order 99309. Item total: $1.00 USD first 2 months, then $6.99 / Month\[dq],\[dq]\[dq],\[dq]I-R8YLY094FJYR\[dq],\[dq]\[dq],\[dq]-6.99\[dq],\[dq]\[dq]
\[dq]10/01/2019\[dq],\[dq]03:46:20\[dq],\[dq]PDT\[dq],\[dq]\[dq],\[dq]Bank Deposit to PP Account \[dq],\[dq]Pending\[dq],\[dq]USD\[dq],\[dq]6.99\[dq],\[dq]0.00\[dq],\[dq]6.99\[dq],\[dq]\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]0TU1544T080463733\[dq],\[dq]\[dq],\[dq]\[dq],\[dq]60P57143A8206782E\[dq],\[dq]\[dq],\[dq]0.00\[dq],\[dq]\[dq]
\[dq]10/01/2019\[dq],\[dq]08:57:01\[dq],\[dq]PDT\[dq],\[dq]Patreon\[dq],\[dq]PreApproved Payment Bill User Payment\[dq],\[dq]Completed\[dq],\[dq]USD\[dq],\[dq]-7.00\[dq],\[dq]0.00\[dq],\[dq]-7.00\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]support\[at]patreon.com\[dq],\[dq]2722394R5F586712G\[dq],\[dq]Patreon* Membership\[dq],\[dq]\[dq],\[dq]B-0PG93074E7M86381M\[dq],\[dq]\[dq],\[dq]-7.00\[dq],\[dq]\[dq]
\[dq]10/01/2019\[dq],\[dq]08:57:01\[dq],\[dq]PDT\[dq],\[dq]\[dq],\[dq]Bank Deposit to PP Account \[dq],\[dq]Pending\[dq],\[dq]USD\[dq],\[dq]7.00\[dq],\[dq]0.00\[dq],\[dq]7.00\[dq],\[dq]\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]71854087RG994194F\[dq],\[dq]Patreon* Membership\[dq],\[dq]\[dq],\[dq]2722394R5F586712G\[dq],\[dq]\[dq],\[dq]0.00\[dq],\[dq]\[dq]
\[dq]10/19/2019\[dq],\[dq]03:02:12\[dq],\[dq]PDT\[dq],\[dq]Wikimedia Foundation, Inc.\[dq],\[dq]Subscription Payment\[dq],\[dq]Completed\[dq],\[dq]USD\[dq],\[dq]-2.00\[dq],\[dq]0.00\[dq],\[dq]-2.00\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]tle\[at]wikimedia.org\[dq],\[dq]K9U43044RY432050M\[dq],\[dq]Monthly donation to the Wikimedia Foundation\[dq],\[dq]\[dq],\[dq]I-R5C3YUS3285L\[dq],\[dq]\[dq],\[dq]-2.00\[dq],\[dq]\[dq]
\[dq]10/19/2019\[dq],\[dq]03:02:12\[dq],\[dq]PDT\[dq],\[dq]\[dq],\[dq]Bank Deposit to PP Account \[dq],\[dq]Pending\[dq],\[dq]USD\[dq],\[dq]2.00\[dq],\[dq]0.00\[dq],\[dq]2.00\[dq],\[dq]\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]3XJ107139A851061F\[dq],\[dq]\[dq],\[dq]\[dq],\[dq]K9U43044RY432050M\[dq],\[dq]\[dq],\[dq]0.00\[dq],\[dq]\[dq]
\[dq]10/22/2019\[dq],\[dq]05:07:06\[dq],\[dq]PDT\[dq],\[dq]Noble Benefactor\[dq],\[dq]Subscription Payment\[dq],\[dq]Completed\[dq],\[dq]USD\[dq],\[dq]10.00\[dq],\[dq]-0.59\[dq],\[dq]9.41\[dq],\[dq]noble\[at]bene.fac.tor\[dq],\[dq]simon\[at]joyful.com\[dq],\[dq]6L8L1662YP1334033\[dq],\[dq]Joyful Systems\[dq],\[dq]\[dq],\[dq]I-KC9VBGY2GWDB\[dq],\[dq]\[dq],\[dq]9.41\[dq],\[dq]\[dq]
\f[R]
.fi
.IP
.nf
\f[C]
# paypal-custom.csv.rules
# Tips:
# Export from Activity -> Statements -> Custom -> Activity download
# Suggested transaction type: \[dq]Balance affecting\[dq]
# Paypal\[aq]s default fields in 2018 were:
# \[dq]Date\[dq],\[dq]Time\[dq],\[dq]TimeZone\[dq],\[dq]Name\[dq],\[dq]Type\[dq],\[dq]Status\[dq],\[dq]Currency\[dq],\[dq]Gross\[dq],\[dq]Fee\[dq],\[dq]Net\[dq],\[dq]From Email Address\[dq],\[dq]To Email Address\[dq],\[dq]Transaction ID\[dq],\[dq]Shipping Address\[dq],\[dq]Address Status\[dq],\[dq]Item Title\[dq],\[dq]Item ID\[dq],\[dq]Shipping and Handling Amount\[dq],\[dq]Insurance Amount\[dq],\[dq]Sales Tax\[dq],\[dq]Option 1 Name\[dq],\[dq]Option 1 Value\[dq],\[dq]Option 2 Name\[dq],\[dq]Option 2 Value\[dq],\[dq]Reference Txn ID\[dq],\[dq]Invoice Number\[dq],\[dq]Custom Number\[dq],\[dq]Quantity\[dq],\[dq]Receipt ID\[dq],\[dq]Balance\[dq],\[dq]Address Line 1\[dq],\[dq]Address Line 2/District/Neighborhood\[dq],\[dq]Town/City\[dq],\[dq]State/Province/Region/County/Territory/Prefecture/Republic\[dq],\[dq]Zip/Postal Code\[dq],\[dq]Country\[dq],\[dq]Contact Phone Number\[dq],\[dq]Subject\[dq],\[dq]Note\[dq],\[dq]Country Code\[dq],\[dq]Balance Impact\[dq]
# This rules file assumes the following more detailed fields, configured in \[dq]Customize report fields\[dq]:
# \[dq]Date\[dq],\[dq]Time\[dq],\[dq]TimeZone\[dq],\[dq]Name\[dq],\[dq]Type\[dq],\[dq]Status\[dq],\[dq]Currency\[dq],\[dq]Gross\[dq],\[dq]Fee\[dq],\[dq]Net\[dq],\[dq]From Email Address\[dq],\[dq]To Email Address\[dq],\[dq]Transaction ID\[dq],\[dq]Item Title\[dq],\[dq]Item ID\[dq],\[dq]Reference Txn ID\[dq],\[dq]Receipt ID\[dq],\[dq]Balance\[dq],\[dq]Note\[dq]
fields date, time, timezone, description_, type, status_, currency, grossamount, feeamount, netamount, fromemail, toemail, code, itemtitle, itemid, referencetxnid, receiptid, balance, note
skip 1
date-format %-m/%-d/%Y
# ignore some paypal events
if
In Progress
Temporary Hold
Update to
 skip
# add more fields to the description
description %description_ %itemtitle
# save some other fields as tags
comment itemid:%itemid, fromemail:%fromemail, toemail:%toemail, time:%time, type:%type, status:%status_
# convert to short currency symbols
if %currency USD
 currency $
if %currency EUR
 currency E
if %currency GBP
 currency P
# generate postings
# the first posting will be the money leaving/entering my paypal account
# (negative means leaving my account, in all amount fields)
account1 assets:online:paypal
amount1 %netamount
# the second posting will be money sent to/received from other party
# (account2 is set below)
amount2 -%grossamount
# if there\[aq]s a fee, add a third posting for the money taken by paypal.
if %feeamount [1-9]
 account3 expenses:banking:paypal
 amount3 -%feeamount
 comment3 business:
# choose an account for the second posting
# override the default account names:
# if the amount is positive, it\[aq]s income (a debit)
if %grossamount \[ha][\[ha]-]
 account2 income:unknown
# if negative, it\[aq]s an expense (a credit)
if %grossamount \[ha]-
 account2 expenses:unknown
# apply common rules for setting account2 & other tweaks
include common.rules
# apply some overrides specific to this csv
# Transfers from/to bank. These are usually marked Pending,
# which can be disregarded in this case.
if
Bank Account
Bank Deposit to PP Account
 description %type for %referencetxnid %itemtitle
 account2 assets:bank:wf:pchecking
 account1 assets:online:paypal
# Currency conversions
if Currency Conversion
 account2 equity:currency conversion
\f[R]
.fi
.IP
.nf
\f[C]
# common.rules
if
darcs
noble benefactor
 account2 revenues:foss donations:darcshub
 comment2 business:
if
Calm Radio
 account2 expenses:online:apps
if
electronic frontier foundation
Patreon
wikimedia
Advent of Code
 account2 expenses:dues
if Google
 account2 expenses:online:apps
 description google | music
\f[R]
.fi
.IP
.nf
\f[C]
$ hledger -f paypal-custom.csv print
2019-10-01 (60P57143A8206782E) Calm Radio MONTHLY - $1 for the first 2 Months: Me - Order 99309. Item total: $1.00 USD first 2 months, then $6.99 / Month ; itemid:, fromemail:simon\[at]joyful.com, toemail:memberships\[at]calmradio.com, time:03:46:20, type:Subscription Payment, status:Completed
    assets:online:paypal          $-6.99 = $-6.99
    expenses:online:apps           $6.99
2019-10-01 (0TU1544T080463733) Bank Deposit to PP Account for 60P57143A8206782E ; itemid:, fromemail:, toemail:simon\[at]joyful.com, time:03:46:20, type:Bank Deposit to PP Account, status:Pending
    assets:online:paypal           $6.99 = $0.00
    assets:bank:wf:pchecking      $-6.99
2019-10-01 (2722394R5F586712G) Patreon Patreon* Membership ; itemid:, fromemail:simon\[at]joyful.com, toemail:support\[at]patreon.com, time:08:57:01, type:PreApproved Payment Bill User Payment, status:Completed
    assets:online:paypal          $-7.00 = $-7.00
    expenses:dues                  $7.00
2019-10-01 (71854087RG994194F) Bank Deposit to PP Account for 2722394R5F586712G Patreon* Membership ; itemid:, fromemail:, toemail:simon\[at]joyful.com, time:08:57:01, type:Bank Deposit to PP Account, status:Pending
    assets:online:paypal           $7.00 = $0.00
    assets:bank:wf:pchecking      $-7.00
2019-10-19 (K9U43044RY432050M) Wikimedia Foundation, Inc. Monthly donation to the Wikimedia Foundation ; itemid:, fromemail:simon\[at]joyful.com, toemail:tle\[at]wikimedia.org, time:03:02:12, type:Subscription Payment, status:Completed
    assets:online:paypal          $-2.00 = $-2.00
    expenses:dues                  $2.00
    expenses:banking:paypal  ; business:
2019-10-19 (3XJ107139A851061F) Bank Deposit to PP Account for K9U43044RY432050M ; itemid:, fromemail:, toemail:simon\[at]joyful.com, time:03:02:12, type:Bank Deposit to PP Account, status:Pending
    assets:online:paypal           $2.00 = $0.00
    assets:bank:wf:pchecking      $-2.00
2019-10-22 (6L8L1662YP1334033) Noble Benefactor Joyful Systems ; itemid:, fromemail:noble\[at]bene.fac.tor, toemail:simon\[at]joyful.com, time:05:07:06, type:Subscription Payment, status:Completed
    assets:online:paypal                       $9.41 = $9.41
    revenues:foss donations:darcshub         $-10.00  ; business:
    expenses:banking:paypal                    $0.59  ; business:
\f[R]
.fi
.SH CSV RULES
.PP
The following kinds of rule can appear in the rules file, in any order.
Blank lines and lines beginning with \f[C]#\f[R] or \f[C];\f[R] are
ignored.
.SS \f[C]skip\f[R]
.IP
.nf
\f[C]
skip N
\f[R]
.fi
.PP
The word \[dq]skip\[dq] followed by a number (or no number, meaning 1)
tells hledger to ignore this many non-empty lines preceding the CSV
data.
(Empty/blank lines are skipped automatically.) You\[aq]ll need this
whenever your CSV data contains header lines.
.PP
It also has a second purpose: it can be used inside if blocks to ignore
certain CSV records (described below).
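.PP
A minimal sketch combining both uses, assuming a hypothetical CSV with two
header lines and some records containing the word \[dq]pending\[dq]:
.IP
.nf
\f[C]
# ignore the two header lines
skip 2
# also ignore any record mentioning \[dq]pending\[dq]
if pending
 skip
\f[R]
.fi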
.SS \f[C]fields\f[R]
.IP
.nf
\f[C]
fields FIELDNAME1, FIELDNAME2, ...
\f[R]
.fi
.PP
A fields list (the word \[dq]fields\[dq] followed by comma-separated
field names) is the quick way to assign CSV field values to hledger
fields.
It does two things:
.IP "1." 3
it names the CSV fields.
This is optional, but can be convenient later for interpolating them.
.IP "2." 3
when you use a standard hledger field name, it assigns the CSV value to
that part of the hledger transaction.
.PP
Here\[aq]s an example that says \[dq]use the 1st, 2nd and 4th fields as
the transaction\[aq]s date, description and amount; name the last two
fields for later reference; and ignore the others\[dq]:
.IP
.nf
\f[C]
fields date, description, , amount, , , somefield, anotherfield
\f[R]
.fi
.PP
Field names may not contain whitespace.
Fields you don\[aq]t care about can be left unnamed.
Currently there must be at least two items (there must be at least one
comma).
.PP
Note, always use comma in the fields list, even if your CSV uses another
separator character.
.PP
Here are the standard hledger field/pseudo-field names.
For more about the transaction parts they refer to, see the manual for
hledger\[aq]s journal format.
.SS Transaction field names
.PP
\f[C]date\f[R], \f[C]date2\f[R], \f[C]status\f[R], \f[C]code\f[R],
\f[C]description\f[R], \f[C]comment\f[R] can be used to form the
transaction\[aq]s first line.
.SS Posting field names
.SS account
.PP
\f[C]accountN\f[R], where N is 1 to 99, causes a posting to be
generated, with that account name.
.PP
Most often there are two postings, so you\[aq]ll want to set
\f[C]account1\f[R] and \f[C]account2\f[R].
Typically \f[C]account1\f[R] is associated with the CSV file, and is set
once with a top-level assignment, while \f[C]account2\f[R] is set based
on each transaction\[aq]s description, and in conditional blocks.
.PP
If a posting\[aq]s account name is left unset but its amount is set (see
below), a default account name will be chosen (like
\[dq]expenses:unknown\[dq] or \[dq]income:unknown\[dq]).
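.PP
For illustration, here is a sketch of that typical arrangement.
The account names and the pattern are made up, and this is only a
fragment (the date and amount would still need to be assigned elsewhere):
.IP
.nf
\f[C]
# every record in this CSV file belongs to one bank account
account1 assets:bank:checking
# choose the other account when the record mentions \[dq]coffee\[dq]
if coffee
 account2 expenses:coffee
\f[R]
.fi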
.SS amount
.PP
\f[C]amountN\f[R] sets posting N\[aq]s amount.
If the CSV uses separate fields for inflows and outflows, you can use
\f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] instead.
By assigning to \f[C]amount1\f[R], \f[C]amount2\f[R], etc.,
you can generate anywhere from 0 to 99 postings.
.PP
There is also an older, unnumbered form of these names, suitable for
2-posting transactions, which sets both posting 1\[aq]s and (negated)
posting 2\[aq]s amount: \f[C]amount\f[R], or \f[C]amount-in\f[R] and
\f[C]amount-out\f[R].
This is still supported because it keeps pre-hledger-1.17 csv rules
files working, and because it can be more succinct, and because it
converts posting 2\[aq]s amount to cost if there\[aq]s a transaction
price, which can be useful.
.PP
If you have an existing rules file using the unnumbered form, you might
want to use the numbered form in certain conditional blocks, without
having to update and retest all the old rules.
To facilitate this, posting 1 ignores
\f[C]amount\f[R]/\f[C]amount-in\f[R]/\f[C]amount-out\f[R] if any of
\f[C]amount1\f[R]/\f[C]amount1-in\f[R]/\f[C]amount1-out\f[R] are
assigned, and posting 2 ignores them if any of
\f[C]amount2\f[R]/\f[C]amount2-in\f[R]/\f[C]amount2-out\f[R] are
assigned, avoiding conflicts.
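.PP
As a hedged sketch of mixing the two forms (the field layout and the
\[dq]REVERSAL\[dq] pattern are hypothetical): the unnumbered
\f[C]amount\f[R] keeps working for ordinary records, while the conditional
block sets both numbered amounts explicitly, so postings 1 and 2 ignore
\f[C]amount\f[R] for the matched records only.
.IP
.nf
\f[C]
# existing rule, left unchanged
fields date, description, amount
# for matching records only, flip the signs of both postings
if REVERSAL
 amount1 -%amount
 amount2 %amount
\f[R]
.fi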
.SS currency
.PP
If the CSV has the currency symbol in a separate field (ie, not part of
the amount field), you can use \f[C]currencyN\f[R] to prepend it to
posting N\[aq]s amount.
Or, \f[C]currency\f[R] with no number affects all postings.
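.PP
A small sketch, assuming a hypothetical CSV whose third field holds the
currency code:
.IP
.nf
\f[C]
# assuming records like: 2020-01-01,Coffee,USD,3.50
# the currency field is prepended to the amount, giving USD3.50
fields date, description, currency, amount
\f[R]
.fi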
.SS balance
.PP
\f[C]balanceN\f[R] sets a balance assertion amount (or if the posting
amount is left empty, a balance assignment) on posting N.
.PP
Also, for compatibility with hledger <1.17: \f[C]balance\f[R] with no
number is equivalent to \f[C]balance1\f[R].
.PP
You can adjust the type of assertion/assignment with the
\f[C]balance-type\f[R] rule (see below).
.SS comment
.PP
Finally, \f[C]commentN\f[R] sets a comment on the Nth posting.
Comments can also contain tags, as usual.
.PP
See TIPS below for more about setting amounts and currency.
.SS field assignment
.IP
.nf
\f[C]
HLEDGERFIELDNAME FIELDVALUE
\f[R]
.fi
.PP
Instead of or in addition to a fields list, you can use a \[dq]field
assignment\[dq] rule to set the value of a single hledger field, by
writing its name (any of the standard hledger field names above)
followed by a text value.
The value may contain interpolated CSV fields, referenced by their
1-based position in the CSV record (\f[C]%N\f[R]), or by the name they
were given in the fields list (\f[C]%CSVFIELDNAME\f[R]).
Some examples:
.IP
.nf
\f[C]
# set the amount to the 4th CSV field, with \[dq] USD\[dq] appended
amount %4 USD
# combine three fields to make a comment, containing note: and date: tags
comment note: %somefield - %anotherfield, date: %1
\f[R]
.fi
.PP
Interpolation strips outer whitespace (so a CSV value like
\f[C]\[dq] 1 \[dq]\f[R] becomes \f[C]1\f[R] when interpolated) (#1051).
See TIPS below for more about referencing other fields.
.SS \f[C]separator\f[R]
.PP
You can use the \f[C]separator\f[R] rule to read other kinds of
character-separated data.
The argument is any single separator character, or the words
\f[C]tab\f[R] or \f[C]space\f[R] (case insensitive).
Eg, for comma-separated values (CSV):
.IP
.nf
\f[C]
separator ,
\f[R]
.fi
.PP
or for semicolon-separated values (SSV):
.IP
.nf
\f[C]
separator ;
\f[R]
.fi
.PP
or for tab-separated values (TSV):
.IP
.nf
\f[C]
separator TAB
\f[R]
.fi
.PP
If the input file has a \f[C].csv\f[R], \f[C].ssv\f[R] or \f[C].tsv\f[R]
file extension (or a \f[C]csv:\f[R], \f[C]ssv:\f[R], \f[C]tsv:\f[R]
prefix), the appropriate separator will be inferred automatically, and
you won\[aq]t need this rule.
.SS \f[C]if\f[R] block
.IP
.nf
\f[C]
if MATCHER
 RULE
if
MATCHER
MATCHER
MATCHER
 RULE
 RULE
\f[R]
.fi
.PP
Conditional blocks (\[dq]if blocks\[dq]) are blocks of rules that are
applied only to CSV records which match certain patterns.
They are often used for customising account names based on transaction
descriptions.
.SS Matching the whole record
.PP
Each MATCHER can be a record matcher, which looks like this:
.IP
.nf
\f[C]
REGEX
\f[R]
.fi
.PP
REGEX is a case-insensitive regular expression which tries to match
anywhere within the CSV record.
It is a POSIX ERE (extended regular expression) that also supports GNU
word boundaries (\f[C]\[rs]b\f[R], \f[C]\[rs]B\f[R], \f[C]\[rs]<\f[R],
\f[C]\[rs]>\f[R]), and nothing else.
If you have trouble, be sure to check our
https://hledger.org/hledger.html#regular-expressions doc.
.PP
Important note: the record that is matched is not the original record,
but a synthetic one, with any enclosing double quotes (but not enclosing
whitespace) removed, and always comma-separated (which means that a
field containing a comma will appear like two fields).
Eg, if the original record is
\f[C]2020-01-01; \[dq]Acme, Inc.\[dq]; 1,000\f[R], the REGEX will
actually see \f[C]2020-01-01,Acme, Inc., 1,000\f[R].
.SS Matching individual fields
.PP
Or, MATCHER can be a field matcher, like this:
.IP
.nf
\f[C]
%CSVFIELD REGEX
\f[R]
.fi
.PP
which matches just the content of a particular CSV field.
CSVFIELD is a percent sign followed by the field\[aq]s name or column
number, like \f[C]%date\f[R] or \f[C]%1\f[R].
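.PP
For example, these two sketches should behave the same way, assuming a
fields list has named the second CSV field \[dq]description\[dq] (the
pattern and account name are illustrative):
.IP
.nf
\f[C]
# match against the second CSV field, by position
if %2 groceries
 account2 expenses:groceries
# match against the same field, by name
if %description groceries
 account2 expenses:groceries
\f[R]
.fi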
.SS Combining matchers
.PP
A single matcher can be written on the same line as the \[dq]if\[dq]; or
multiple matchers can be written on the following lines, non-indented.
Multiple matchers are OR\[aq]d (any one of them can match), unless one
begins with an \f[C]&\f[R] symbol, in which case it is AND\[aq]ed with
the previous matcher.
.IP
.nf
\f[C]
if
MATCHER
& MATCHER
 RULE
\f[R]
.fi
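.PP
A concrete sketch of an AND\[aq]ed pair, assuming a fields list that names
\f[C]description\f[R] and \f[C]amount\f[R] (the pattern and account name
are made up):
.IP
.nf
\f[C]
# only records whose description mentions amazon
# AND whose amount begins with a minus sign
if
%description amazon
& %amount \[ha]-
 account2 expenses:shopping
\f[R]
.fi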
.SS Rules applied on successful match
.PP
After the patterns there should be one or more rules to apply, all
indented by at least one space.
Three kinds of rule are allowed in conditional blocks:
.IP \[bu] 2
field assignments (to set a hledger field)
.IP \[bu] 2
skip (to skip the matched CSV record)
.IP \[bu] 2
end (to skip all remaining CSV records).
.PP
Examples:
.IP
.nf
\f[C]
# if the CSV record contains \[dq]groceries\[dq], set account2 to \[dq]expenses:groceries\[dq]
if groceries
 account2 expenses:groceries
\f[R]
.fi
.IP
.nf
\f[C]
# if the CSV record contains any of these patterns, set account2 and comment as shown
if
monthly service fee
atm transaction fee
banking thru software
 account2 expenses:business:banking
 comment XXX deductible ? check it
\f[R]
.fi
.SS \f[C]if\f[R] table
.IP
.nf
\f[C]
if,CSVFIELDNAME1,CSVFIELDNAME2,...,CSVFIELDNAMEn
MATCHER1,VALUE11,VALUE12,...,VALUE1n
MATCHER2,VALUE21,VALUE22,...,VALUE2n
MATCHER3,VALUE31,VALUE32,...,VALUE3n
<empty line>
\f[R]
.fi
.PP
Conditional tables (\[dq]if tables\[dq]) are a different syntax to
specify field assignments that will be applied only to CSV records which
match certain patterns.
.PP
MATCHER can be either a field matcher or a record matcher, as described
above.
When MATCHER matches, the values from that row are assigned to the
fields named on the \f[C]if\f[R] line, in the same order.
.PP
Therefore an \f[C]if\f[R] table is exactly equivalent to a sequence of
\f[C]if\f[R] blocks:
.IP
.nf
\f[C]
if MATCHER1
 CSVFIELDNAME1 VALUE11
 CSVFIELDNAME2 VALUE12
 ...
 CSVFIELDNAMEn VALUE1n
if MATCHER2
 CSVFIELDNAME1 VALUE21
 CSVFIELDNAME2 VALUE22
 ...
 CSVFIELDNAMEn VALUE2n
if MATCHER3
 CSVFIELDNAME1 VALUE31
 CSVFIELDNAME2 VALUE32
 ...
 CSVFIELDNAMEn VALUE3n
\f[R]
.fi
.PP
Each line starting with MATCHER should contain enough (possibly empty)
values for all the listed fields.
.PP
Rules are checked and applied in the order they are listed in the
table and, as with \f[C]if\f[R] blocks, later rules (in the same or
another table) or \f[C]if\f[R] blocks can override the effect of any
rule.
.PP
Instead of \[aq],\[aq] you can use a variety of other non-alphanumeric
characters as a separator.
The first character after \f[C]if\f[R] is taken to be the separator for
the rest of the table.
It is the responsibility of the user to ensure that the separator does
not occur inside MATCHERs and values - there is no way to escape the
separator.
.PP
Example:
.IP
.nf
\f[C]
if,account2,comment
atm transaction fee,expenses:business:banking,deductible? check it
%description groceries,expenses:groceries,
2020/01/12.*Plumbing LLC,expenses:house:upkeep,emergency plumbing call-out
\f[R]
.fi
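.PP
As a hedged sketch of an alternative separator, part of the same table
could be written with \f[C]|\f[R], assuming none of the matchers or values
need to contain that character.
This frees up the comma for use inside a matcher (the \[dq]Plumbing,
LLC\[dq] pattern here is illustrative):
.IP
.nf
\f[C]
if|account2|comment
atm transaction fee|expenses:business:banking|deductible? check it
Plumbing, LLC|expenses:house:upkeep|emergency plumbing call-out
\f[R]
.fi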
.SS \f[C]end\f[R]
.PP
This rule can be used inside if blocks (only), to make hledger stop
reading this CSV file and move on to the next input file, or to command
execution.
Eg:
.IP
.nf
\f[C]
# ignore everything following the first empty record
if ,,,,
 end
\f[R]
.fi
.SS \f[C]date-format\f[R]
.IP
.nf
\f[C]
date-format DATEFMT
\f[R]
.fi
.PP
This is a helper for the \f[C]date\f[R] (and \f[C]date2\f[R]) fields.
If your CSV dates are not formatted like \f[C]YYYY-MM-DD\f[R],
\f[C]YYYY/MM/DD\f[R] or \f[C]YYYY.MM.DD\f[R], you\[aq]ll need to add a
date-format rule describing them with a strptime date parsing pattern,
which must parse the CSV date value completely.
Some examples:
.IP
.nf
\f[C]
# MM/DD/YY
date-format %m/%d/%y
\f[R]
.fi
.IP
.nf
\f[C]
# D/M/YYYY
# The - makes leading zeros optional.
date-format %-d/%-m/%Y
\f[R]
.fi
.IP
.nf
\f[C]
# YYYY-Mmm-DD
date-format %Y-%h-%d
\f[R]
.fi
.IP
.nf
\f[C]
# M/D/YYYY HH:MM AM some other junk
# Note the time and junk must be fully parsed, though only the date is used.
date-format %-m/%-d/%Y %l:%M %p some other junk
\f[R]
.fi
.PP
For the supported strptime syntax, see:
.PD 0
.P
.PD
https://hackage.haskell.org/package/time/docs/Data-Time-Format.html#v:formatTime
.SS \f[C]decimal-mark\f[R]
.IP
.nf
\f[C]
decimal-mark .
\f[R]
.fi
.PP
or:
.IP
.nf
\f[C]
decimal-mark ,
\f[R]
.fi
.PP
hledger automatically accepts either period or comma as a decimal mark
when parsing numbers (cf Amounts).
However if any numbers in the CSV contain digit group marks, such as
thousand-separating commas, you should declare the decimal mark
explicitly with this rule, to avoid misparsed numbers.
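.PP
For instance, a sketch for a hypothetical CSV whose amounts use a period
to group thousands and a comma as the decimal mark:
.IP
.nf
\f[C]
# amounts look like 1.234,56
decimal-mark ,
\f[R]
.fi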
.SS \f[C]newest-first\f[R]
.PP
hledger always sorts the generated transactions by date.
Transactions on the same date should appear in the same order as their
CSV records, as hledger can usually auto-detect whether the CSV\[aq]s
normal order is oldest first or newest first.
But if all of the following are true:
.IP \[bu] 2
the CSV might sometimes contain just one day of data (all records having
the same date)
.IP \[bu] 2
the CSV records are normally in reverse chronological order (newest at
the top)
.IP \[bu] 2
and you care about preserving the order of same-day transactions
.PP
then, you should add the \f[C]newest-first\f[R] rule as a hint.
Eg:
.IP
.nf
\f[C]
# tell hledger explicitly that the CSV is normally newest first
newest-first
\f[R]
.fi
.SS \f[C]include\f[R]
.IP
.nf
\f[C]
include RULESFILE
\f[R]
.fi
.PP
This includes the contents of another CSV rules file at this point.
\f[C]RULESFILE\f[R] is an absolute file path or a path relative to the
current file\[aq]s directory.
This can be useful for sharing common rules between several rules files,
eg:
.IP
.nf
\f[C]
# someaccount.csv.rules
## someaccount-specific rules
fields date,description,amount
account1 assets:someaccount
account2 expenses:misc
## common rules
include categorisation.rules
\f[R]
.fi
.SS \f[C]balance-type\f[R]
.PP
Balance assertions generated by assigning to balanceN are of the simple
\f[C]=\f[R] type by default, which is a single-commodity,
subaccount-excluding assertion.
You may find the subaccount-including variants more useful, eg if you
have created some virtual subaccounts of checking to help with
budgeting.
You can select a different type of assertion with the
\f[C]balance-type\f[R] rule:
.IP
.nf
\f[C]
# balance assertions will consider all commodities and all subaccounts
balance-type ==*
\f[R]
.fi
.PP
Here are the balance assertion types for quick reference:
.IP
.nf
\f[C]
= single commodity, exclude subaccounts
=* single commodity, include subaccounts
== multi commodity, exclude subaccounts
==* multi commodity, include subaccounts
\f[R]
.fi
.SH TIPS
.SS Rapid feedback
.PP
It\[aq]s a good idea to get rapid feedback while
creating/troubleshooting CSV rules.
Here\[aq]s a good way, using entr from http://eradman.com/entrproject :
.IP
.nf
\f[C]
$ ls foo.csv* | entr bash -c \[aq]echo ----; hledger -f foo.csv print desc:SOMEDESC\[aq]
\f[R]
.fi
.PP
A desc: query (eg) is used to select just one, or a few, transactions of
interest.
\[dq]bash -c\[dq] is used to run multiple commands, so we can echo a
separator each time the command re-runs, making it easier to read the
output.
.SS Valid CSV
.PP
hledger accepts CSV conforming to RFC 4180.
When CSV values are enclosed in quotes, note:
.IP \[bu] 2
they must be double quotes (not single quotes)
.IP \[bu] 2
spaces outside the quotes are not allowed
.SS File Extension
.PP
To help hledger identify the format and show the right error messages,
CSV/SSV/TSV files should normally be named with a \f[C].csv\f[R],
\f[C].ssv\f[R] or \f[C].tsv\f[R] filename extension.
Or, the file path should be prefixed with \f[C]csv:\f[R], \f[C]ssv:\f[R]
or \f[C]tsv:\f[R].
Eg:
.IP
.nf
\f[C]
$ hledger -f foo.ssv print
\f[R]
.fi
.PP
or:
.IP
.nf
\f[C]
$ cat foo | hledger -f ssv:- print
\f[R]
.fi
.PP
You can override the file extension with a separator rule if needed.
See also: Input files in the hledger manual.
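.PP
For instance, if a semicolon-separated file happens to be named
\f[C]foo.csv\f[R] (a hypothetical name), a rules file along these lines
should make hledger split fields on semicolons regardless of the
extension:
.IP
.nf
\f[C]
# foo.csv.rules
separator ;
\f[R]
.fi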
.SS Reading multiple CSV files
.PP
If you use multiple \f[C]-f\f[R] options to read multiple CSV files at
once, hledger will look for a correspondingly-named rules file for each
CSV file.
But if you use the \f[C]--rules-file\f[R] option, that rules file will
be used for all the CSV files.
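.PP
A sketch of the two cases, using hypothetical file names:
.IP
.nf
\f[C]
# uses a.csv.rules for a.csv, and b.csv.rules for b.csv
$ hledger -f a.csv -f b.csv print
# uses common.rules for both CSV files
$ hledger -f a.csv -f b.csv --rules-file common.rules print
\f[R]
.fi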
.SS Valid transactions
.PP
After reading a CSV file, hledger post-processes and validates the
generated journal entries as it would for a journal file - balancing
them, applying balance assignments, and canonicalising amount styles.
Any errors at this stage will be reported in the usual way, displaying
the problem entry.
.PP
There is one exception: balance assertions, if you have generated them,
will not be checked, since normally these will work only when the CSV
data is part of the main journal.
If you do need to check balance assertions generated from CSV right
away, pipe into another hledger:
.IP
.nf
\f[C]
$ hledger -f file.csv print | hledger -f- print
\f[R]
.fi
.SS Deduplicating, importing
.PP
When you download a CSV file periodically, eg to get your latest bank
transactions, the new file may overlap with the old one, containing some
of the same records.
.PP
The import command will (a) detect the new transactions, and (b) append
just those transactions to your main journal.
It is idempotent, so you don\[aq]t have to remember how many times you
ran it or with which version of the CSV.
(It keeps state in a hidden \f[C].latest.FILE.csv\f[R] file.) This is
the easiest way to import CSV data.
Eg:
.IP
.nf
\f[C]
# download the latest CSV files, then run this command.
# Note, no -f flags needed here.
$ hledger import *.csv [--dry]
\f[R]
.fi
.PP
This method works for most CSV files: those where records have a stable
chronological order, and new records appear only at the newest end.
.PP
A number of other tools and workflows, hledger-specific and otherwise,
exist for converting, deduplicating, classifying and managing CSV data.
See:
.IP \[bu] 2
https://hledger.org -> sidebar -> real world setups
.IP \[bu] 2
https://plaintextaccounting.org -> data import/conversion
.SS Setting amounts
.PP
A posting amount can be set in one of these ways:
.IP \[bu] 2
by assigning (with a fields list or field assignment) to
\f[C]amountN\f[R] (posting N\[aq]s amount) or \f[C]amount\f[R] (posting
1\[aq]s amount)
.IP \[bu] 2
by assigning to \f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] (or
\f[C]amount-in\f[R] and \f[C]amount-out\f[R]).
For each CSV record, whichever of these has a non-zero value will be
used, with appropriate sign.
If both contain a non-zero value, this may not work.
.IP \[bu] 2
by assigning to \f[C]balanceN\f[R] (or \f[C]balance\f[R]) instead of the
above, setting the amount indirectly via a balance assignment.
If you do this the default account name may be wrong, so you should set
that explicitly.
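.PP
As a sketch of the second method, suppose a bank export (made up for
this example) has separate deposit and withdrawal columns:
.IP
.nf
\f[C]
2020-01-01,paycheck,1000.00,
2020-01-02,groceries,,50.00
\f[R]
.fi
.PP
A fields list like the following assigns those columns to posting
1\[aq]s in and out amounts.
Assuming the out value is treated as an outflow, the first record would
be expected to give posting 1 an amount of 1000.00, and the second an
amount of -50.00:
.IP
.nf
\f[C]
fields date,description,amount-in,amount-out
\f[R]
.fi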
.PP
There is some special handling for an amount\[aq]s sign:
.IP \[bu] 2
If an amount value is parenthesised, it will be de-parenthesised and
sign-flipped.
.IP \[bu] 2
If an amount value begins with a double minus sign, those cancel out and
are removed.
.IP \[bu] 2
If an amount value begins with a plus sign, that will be removed.
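.PP
For example, applying the sign rules above to some sample CSV amount
values (chosen purely for illustration):
.IP
.nf
\f[C]
CSV value    resulting amount
(123.00)     -123.00
--123.00     123.00
+123.00      123.00
\f[R]
.fi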
.SS Setting currency/commodity
.PP
If the currency/commodity symbol is included in the CSV\[aq]s amount
field(s):
.IP
.nf
\f[C]
2020-01-01,foo,$123.00
\f[R]
.fi
.PP
you don\[aq]t have to do anything special for the commodity symbol; it
will be assigned as part of the amount.
Eg:
.IP
.nf
\f[C]
fields date,description,amount
\f[R]
.fi
.IP
.nf
\f[C]
2020-01-01 foo
expenses:unknown $123.00
income:unknown $-123.00
\f[R]
.fi
.PP
If the currency is provided as a separate CSV field:
.IP
.nf
\f[C]
2020-01-01,foo,USD,123.00
\f[R]
.fi
.PP
You can assign that to the \f[C]currency\f[R] pseudo-field, which has
the special effect of prepending itself to every amount in the
transaction (on the left, with no separating space):
.IP
.nf
\f[C]
fields date,description,currency,amount
\f[R]
.fi
.IP
.nf
\f[C]
2020-01-01 foo
expenses:unknown USD123.00
income:unknown USD-123.00
\f[R]
.fi
.PP
Or, you can use a field assignment to construct the amount yourself,
with more control.
Eg to put the symbol on the right, and separated by a space:
.IP
.nf
\f[C]
fields date,description,cur,amt
amount %amt %cur
\f[R]
.fi
.IP
.nf
\f[C]
2020-01-01 foo
expenses:unknown 123.00 USD
income:unknown -123.00 USD
\f[R]
.fi
.PP
Note we used a temporary field name (\f[C]cur\f[R]) that is not
\f[C]currency\f[R] - that would trigger the prepending effect, which we
don\[aq]t want here.
.SS Referencing other fields
.PP
In field assignments, you can interpolate only CSV fields, not hledger
fields.
In the example below, there\[aq]s both a CSV field and a hledger field
named amount1, but %amount1 always means the CSV field, not the hledger
field:
.IP
.nf
\f[C]
# Name the third CSV field \[dq]amount1\[dq]
fields date,description,amount1
# Set hledger\[aq]s amount1 to the CSV amount1 field followed by USD
amount1 %amount1 USD
# Set comment to the CSV amount1 (not the amount1 assigned above)
comment %amount1
\f[R]
.fi
.PP
Here, since there\[aq]s no CSV amount1 field, %amount1 will produce a
literal \[dq]amount1\[dq]:
.IP
.nf
\f[C]
fields date,description,csvamount
amount1 %csvamount USD
# Can\[aq]t interpolate amount1 here
comment %amount1
\f[R]
.fi
.PP
When there are multiple field assignments to the same hledger field,
only the last one takes effect.
Here, comment\[aq]s value will be B, or C if \[dq]something\[dq] is
matched, but never A:
.IP
.nf
\f[C]
comment A
comment B
if something
comment C
\f[R]
.fi
.SS How CSV rules are evaluated
.PP
Here\[aq]s how to think of CSV rules being evaluated (if you really need
to).
First,
.IP \[bu] 2
\f[C]include\f[R] - all includes are inlined, from top to bottom, depth
first.
(At each include point the file is inlined and scanned for further
includes, recursively, before proceeding.)
.PP
Then \[dq]global\[dq] rules are evaluated, top to bottom.
If a rule is repeated, the last one wins:
.IP \[bu] 2
\f[C]skip\f[R] (at top level)
.IP \[bu] 2
\f[C]date-format\f[R]
.IP \[bu] 2
\f[C]newest-first\f[R]
.IP \[bu] 2
\f[C]fields\f[R] - names the CSV fields, optionally sets up initial
assignments to hledger fields
.PP
Then for each CSV record in turn:
.IP \[bu] 2
test all \f[C]if\f[R] blocks.
If any of them contain an \f[C]end\f[R] rule, skip all remaining CSV
records.
Otherwise if any of them contain a \f[C]skip\f[R] rule, skip that many
CSV records.
If there are multiple matched \f[C]skip\f[R] rules, the first one wins.
.IP \[bu] 2
collect all field assignments at top level and in matched \f[C]if\f[R]
blocks.
When there are multiple assignments for a field, keep only the last one.
.IP \[bu] 2
compute a value for each hledger field - either the one that was
assigned to it (and interpolate the %CSVFIELDNAME references), or a
default
.IP \[bu] 2
generate a synthetic hledger transaction from these values.
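.PP
To make these phases concrete, here is a small hypothetical rules file,
annotated with the phase in which each rule takes part (the account
names and the \[dq]groceries\[dq] pattern are invented for
illustration):
.IP
.nf
\f[C]
# global rules, evaluated once, top to bottom
skip 1
date-format %Y-%m-%d
fields date,description,amount
# top-level field assignments, collected for every CSV record
account1 assets:checking
account2 expenses:misc
# if block, tested against each CSV record; when it matches,
# this later assignment to account2 replaces the one above
if groceries
 account2 expenses:food
\f[R]
.fi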
.PP
This is all part of the CSV reader, one of several readers hledger can
use to parse input files.
When all files have been read successfully, the transactions are passed
as input to whichever hledger command the user specified.
.SH "REPORTING BUGS"
Report bugs at http://bugs.hledger.org
(or on the #hledger IRC channel or hledger mail list)
.SH AUTHORS
Simon Michael <simon@joyful.com> and contributors
.SH COPYRIGHT
Copyright (C) 2007-2020 Simon Michael.
.br
Released under GNU GPL v3 or later.
.SH SEE ALSO
hledger(1), hledger\-ui(1), hledger\-web(1),
hledger_csv(5), hledger_journal(5), hledger_timeclock(5), hledger_timedot(5),
ledger(1)