.TH "hledger_csv" "5" "September 2019" "hledger 1.15.99" "hledger User Manuals"
.SH NAME
.PP
CSV - how hledger reads CSV data, and the CSV rules file format
.SH DESCRIPTION
.PP
hledger can read CSV (comma-separated value) files as if they were
journal files, automatically converting each CSV record into a
transaction.
(To learn about \f[I]writing\f[R] CSV, see CSV output.)
.PP
Converting CSV to transactions requires some special conversion rules.
These do several things:
.IP \[bu] 2
they describe the layout and format of the CSV data
.IP \[bu] 2
they can customize the generated journal entries (transactions) using a
simple templating language
.IP \[bu] 2
they can add refinements based on patterns in the CSV data, eg
categorizing transactions with more detailed account names.
.PP
When reading a CSV file named \f[C]FILE.csv\f[R], hledger looks for a
conversion rules file named \f[C]FILE.csv.rules\f[R] in the same
directory.
You can override this with the \f[C]--rules-file\f[R] option.
If the rules file does not exist, hledger will auto-create one with some
example rules, which you\[aq]ll need to adjust.
.PP
At minimum, the rules file must identify the date and amount fields.
It\[aq]s often necessary to specify the date format, and the number of
header lines to skip, also.
Eg:
.IP
.nf
\f[C]
fields date, _, _, amount
date-format %d/%m/%Y
skip 1
\f[R]
.fi
.PP
For more examples, see the EXAMPLES section below.
.SH CSV RULES
.PP
The following kinds of rule can appear in the rules file, in any order
(except for \f[C]end\f[R] which can appear only inside a conditional
block).
Blank lines and lines beginning with \f[C]#\f[R] or \f[C];\f[R] are
ignored.
.SS \f[C]skip\f[R]
.IP
.nf
\f[C]
skip N
\f[R]
.fi
.PP
The word \[dq]skip\[dq] followed by a number (or no number, meaning 1)
tells hledger to ignore this many non-empty lines preceding the CSV
data.
(Empty/blank lines are skipped automatically.) You\[aq]ll need this
whenever your CSV data contains header lines.
.PP
It also has a second purpose: it can be used to ignore certain CSV
records, see conditional blocks below.
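.PP
As a minimal sketch of this second use (the pattern \f[C]pending\f[R] is
only an illustrative assumption about the data), a \f[C]skip\f[R] inside
a conditional block drops just the matching records:
.IP
.nf
\f[C]
# skip one header line
skip 1
# also ignore any record containing \[dq]pending\[dq]
if pending
 skip
\f[R]
.fi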
.SS \f[C]fields\f[R]
.IP
.nf
\f[C]
fields FIELDNAME1, FIELDNAME2, ...
\f[R]
.fi
.PP
A fields list (\[dq]fields\[dq] followed by one or more comma-separated
field names) is the quick way to assign CSV field values to hledger
fields.
It (a) names the CSV fields, in order (names may not contain whitespace;
fields you don\[aq]t care about can be left unnamed), and (b) assigns
them to hledger fields if you use standard hledger field names.
Here\[aq]s an example:
.IP
.nf
\f[C]
# use the 1st, 2nd and 4th CSV fields as the transaction\[aq]s date, description and amount,
# ignore the 3rd, 5th and 6th fields,
# and name the 7th and 8th fields for later reference:
#      1     2            3 4        5 6 7          8
fields date, description, , amount1, , , somefield, anotherfield
\f[R]
.fi
.PP
Here are the standard hledger field names:
.SS Transaction fields
.PP
\f[C]date\f[R], \f[C]date2\f[R], \f[C]status\f[R], \f[C]code\f[R],
\f[C]description\f[R], \f[C]comment\f[R] can be used to form the
transaction\[aq]s first line.
Only \f[C]date\f[R] is required.
(See also date-format below.)
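.PP
For example, a fields list for a CSV with a date, a reference number, a
payee and a memo (a column layout assumed here purely for illustration)
could name them like this:
.IP
.nf
\f[C]
# hypothetical CSV columns: date, reference number, payee, memo, amount
fields date, code, description, comment, amount
\f[R]
.fi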
.SS Posting fields
.PP
\f[C]accountN\f[R], where N is 1 to 9, sets the Nth posting\[aq]s
account name.
Most often there are two postings, so you\[aq]ll want to set
\f[C]account1\f[R] and \f[C]account2\f[R].
.PP
A number of field/pseudo-field names are available for setting posting
amounts:
.IP \[bu] 2
\f[C]amountN\f[R] sets posting N\[aq]s amount
.IP \[bu] 2
\f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] can be used instead, if
the CSV has separate fields for debits and credits
.IP \[bu] 2
\f[C]currencyN\f[R] sets a currency symbol to be left-prefixed to the
amount, useful if the CSV provides that as a separate field
.IP \[bu] 2
\f[C]balanceN\f[R] sets a (separate) balance assertion amount (or when
no posting amount is set, a balance assignment)
.PP
If you write these with no number (\f[C]amount\f[R],
\f[C]amount-in\f[R], \f[C]amount-out\f[R], \f[C]currency\f[R],
\f[C]balance\f[R]), it means posting 1.
Also, if you set an amount for posting 1 only, a second posting that
balances the transaction will be generated automatically.
This helps support CSV rules created before hledger 1.16.
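.PP
For instance, a rules file for a CSV with separate debit and credit
columns might look like the sketch below.
The column layout and account names are assumptions chosen for
illustration:
.IP
.nf
\f[C]
# hypothetical CSV columns: date, description, money in, money out
fields date, description, amount-in, amount-out
# posting 1 is the bank account; amount-in/amount-out set its amount
account1 assets:bank:checking
# name the second posting, which balances the transaction
account2 expenses:unknown
\f[R]
.fi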
.PP
Finally, \f[C]commentN\f[R] sets a comment on the Nth posting.
Comments can of course contain tags.
.SS \f[C](field assignment)\f[R]
.IP
.nf
\f[C]
HLEDGERFIELDNAME FIELDVALUE
\f[R]
.fi
.PP
Instead of or in addition to a fields list, you can assign a value to a
hledger field by writing its name (any of the standard names above)
followed by a text value.
The value may contain interpolated CSV fields, referenced by their
1-based position in the CSV record (\f[C]%N\f[R]), or by the name they
were given in the fields list (\f[C]%CSVFIELDNAME\f[R]).
Eg:
.IP
.nf
\f[C]
# set the amount to the 4th CSV field, with \[dq] USD\[dq] appended
amount %4 USD
\f[R]
.fi
.IP
.nf
\f[C]
# combine three fields to make a comment, containing note: and date: tags
comment note: %somefield - %anotherfield, date: %1
\f[R]
.fi
.PP
Interpolation strips any outer whitespace, so a CSV value like
\f[C]\[dq] 1 \[dq]\f[R] becomes \f[C]1\f[R] when interpolated (#1051).
Note you can only interpolate CSV fields, not the hledger fields being
assigned to; for more on this, see TIPS.
.SS \f[C]date-format\f[R]
.IP
.nf
\f[C]
date-format DATEFMT
\f[R]
.fi
.PP
This is a helper for the \f[C]date\f[R] (and \f[C]date2\f[R]) fields.
If your CSV dates are not formatted like \f[C]YYYY-MM-DD\f[R],
\f[C]YYYY/MM/DD\f[R] or \f[C]YYYY.MM.DD\f[R], you\[aq]ll need to specify
the format by writing \[dq]date-format\[dq] followed by a strptime-like
date parsing pattern, which must parse the date field values completely.
Examples:
.IP
.nf
\f[C]
# for dates like \[dq]11/06/2013\[dq]:
date-format %m/%d/%Y
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]6/11/2013\[dq]. The - allows leading zeros to be optional.
date-format %-d/%-m/%Y
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]2013-Nov-06\[dq]:
date-format %Y-%h-%d
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]11/6/2013 11:32 PM\[dq]:
date-format %-m/%-d/%Y %l:%M %p
\f[R]
.fi
.SS \f[C]if\f[R]
.IP
.nf
\f[C]
if PATTERN
 RULE

if
PATTERN
PATTERN
PATTERN
 RULE
 RULE
\f[R]
.fi
.PP
Conditional blocks apply one or more rules to CSV records which are
matched by any of the PATTERNs.
This allows transactions to be customised or categorised based on
patterns in the data.
.PP
A single pattern can be written on the same line as the \[dq]if\[dq]; or
multiple patterns can be written on the following lines, non-indented.
.PP
Patterns are case-insensitive regular expressions which try to match any
part of the whole CSV record.
It\[aq]s not yet possible to match within a specific field.
Note the CSV record they see is close but not identical to the one in
the CSV file; eg double quotes are removed, and the separator character
becomes comma.
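.PP
Since a pattern is matched against the whole record, one possible
workaround (a sketch, not a documented feature) is to include the
separating commas in the pattern, so that it can only match where a
field starts and ends.
The record layout and account name here are assumed for illustration:
.IP
.nf
\f[C]
# for records like: 2019-11-01,ATM,withdrawal,-40
# matches \[dq]ATM\[dq] only when it is an entire field,
# not inside a longer value such as \[dq]ATMOSPHERE CAFE\[dq]
if ,ATM,
 account2 assets:cash
\f[R]
.fi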
.PP
After the patterns, there should be one or more rules to apply, all
indented by at least one space.
Three kinds of rule are allowed in conditional blocks:
.IP \[bu] 2
field assignments (to set a field\[aq]s value)
.IP \[bu] 2
skip (to skip the matched CSV record)
.IP \[bu] 2
end (to skip all remaining CSV records).
.PP
Examples:
.IP
.nf
\f[C]
# if the CSV record contains \[dq]groceries\[dq], set account2 to \[dq]expenses:groceries\[dq]
if groceries
 account2 expenses:groceries
\f[R]
.fi
.IP
.nf
\f[C]
# if the CSV record contains any of these patterns, set account2 and comment as shown
if
monthly service fee
atm transaction fee
banking thru software
 account2 expenses:business:banking
 comment XXX deductible ? check it
\f[R]
.fi
.SS \f[C]end\f[R]
.PP
As mentioned above, this rule can be used inside conditional blocks
(only) to cause hledger to stop reading CSV records and proceed with
command execution.
Eg:
.IP
.nf
\f[C]
# ignore everything following the first empty record
if ,,,,
 end
\f[R]
.fi
.SS \f[C]include\f[R]
.IP
.nf
\f[C]
include RULESFILE
\f[R]
.fi
.PP
Include another CSV rules file at this point, as if it were written
inline.
\f[C]RULESFILE\f[R] is an absolute file path or a path relative to the
current file\[aq]s directory.
.PP
This can be useful eg for reusing common rules in several rules files:
.IP
.nf
\f[C]
# someaccount.csv.rules
## someaccount-specific rules
fields date,description,amount
account1 some:account
account2 some:misc
## common rules
include categorisation.rules
\f[R]
.fi
.SS \f[C]newest-first\f[R]
.PP
hledger always sorts the generated transactions by date.
Transactions on the same date should appear in the same order as their
CSV records, as hledger can usually auto-detect whether the CSV\[aq]s
normal order is oldest first or newest first.
But if all of the following are true:
.IP \[bu] 2
the CSV might sometimes contain just one day of data (all records having
the same date)
.IP \[bu] 2
the CSV records are normally in reverse chronological order (newest
first)
.IP \[bu] 2
and you care about preserving the order of same-day transactions
.PP
you should add the \f[C]newest-first\f[R] rule as a hint.
Eg:
.IP
.nf
\f[C]
# tell hledger explicitly that the CSV is normally newest-first
newest-first
\f[R]
.fi
.SH EXAMPLES
.PP
A more complete example, generating three-posting transactions:
.IP
.nf
\f[C]
# hledger CSV rules for amazon.com order history
# sample:
# \[dq]Date\[dq],\[dq]Type\[dq],\[dq]To/From\[dq],\[dq]Name\[dq],\[dq]Status\[dq],\[dq]Amount\[dq],\[dq]Fees\[dq],\[dq]Transaction ID\[dq]
# \[dq]Jul 29, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]Completed\[dq],\[dq]$25.00\[dq],\[dq]$0.00\[dq],\[dq]17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL\[dq]
# skip one header line
skip 1
# name the csv fields (and assign the transaction\[aq]s date, amount and code)
fields date, _, toorfrom, name, amzstatus, amount1, fees, code
# how to parse the date
date-format %b %-d, %Y
# combine two fields to make the description
description %toorfrom %name
# save these fields as tags
comment status:%amzstatus
# set the base account for all transactions
account1 assets:amazon
# flip the sign on the amount
amount1 -%amount1
# Put fees in a separate posting
amount3 %fees
comment3 fees
\f[R]
.fi
.PP
For more examples, see Convert CSV files.
.SH TIPS
.SS Reading multiple CSV files
.PP
You can read multiple CSV files at once using multiple \f[C]-f\f[R]
arguments on the command line.
hledger will look for a correspondingly-named rules file for each CSV
file.
If you use the \f[C]--rules-file\f[R] option, that rules file will be
used for all the CSV files.
.SS Deduplicating, importing
.PP
When you download a CSV file repeatedly, eg to get your latest bank
transactions, the new file may contain some of the same records as the
old one.
The print --new command is one simple way to detect just the new
transactions.
Or better still, the import command appends those new transactions to
your main journal.
This is the easiest way to import CSV data.
Eg, after downloading your latest CSV files:
.IP
.nf
\f[C]
$ hledger import *.csv [--dry]
\f[R]
.fi
.SS Other import methods
.PP
A number of other tools and workflows, hledger-specific and otherwise,
exist for converting, deduplicating, classifying and managing CSV data.
See:
.IP \[bu] 2
https://hledger.org -> sidebar -> real world setups
.IP \[bu] 2
https://plaintextaccounting.org -> data import/conversion
.SS Valid CSV
.PP
hledger accepts CSV conforming to RFC 4180.
Some things to note when values are enclosed in quotes:
.IP \[bu] 2
you must use double quotes (not single quotes)
.IP \[bu] 2
spaces outside the quotes are not allowed
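.PP
A short illustration, with record contents invented for the example
(the records are shown as comments, as in the sample in EXAMPLES):
.IP
.nf
\f[C]
# fine: double quotes, with no spaces outside them
# \[dq]2019-11-01\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]25.00\[dq]
# avoid: single quotes
# \[aq]2019-11-01\[aq],\[aq]Adapteva, Inc.\[aq],\[aq]25.00\[aq]
# avoid: spaces outside the quotes
# \[dq]2019-11-01\[dq] , \[dq]Adapteva, Inc.\[dq] , \[dq]25.00\[dq]
\f[R]
.fi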
.SS Other separator characters
.PP
With the \f[C]--separator \[aq]CHAR\[aq]\f[R] option, hledger will
expect the separator to be CHAR instead of a comma.
Ie it will read other \[dq]Character Separated Values\[dq] formats, such
as TSV (Tab Separated Values).
Note: on the command line, use a real tab character in quotes, not
\[rs]t.
Eg:
.IP
.nf
\f[C]
$ hledger -f foo.tsv --separator \[aq] \[aq] print
\f[R]
.fi
.PP
(Experimental.)
.SS Setting amounts
.PP
A posting amount can be set in one of these ways:
.IP \[bu] 2
by assigning (with a fields list or field assignment) to
\f[C]amountN\f[R] (posting N\[aq]s amount) or \f[C]amount\f[R] (posting
1\[aq]s amount)
.IP \[bu] 2
by assigning to \f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] (or
\f[C]amount-in\f[R] and \f[C]amount-out\f[R]).
For each CSV record, whichever of these has a non-zero value will be
used, with appropriate sign.
If both contain a non-zero value, this may not work.
.IP \[bu] 2
by assigning to \f[C]balanceN\f[R] (or \f[C]balance\f[R]) instead of the
above, setting the amount indirectly via a balance assignment.
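.PP
As a sketch of how the first and third options combine (the column
layout and account name are assumptions for illustration): when both an
amount and a balance are assigned, the balance acts as a balance
assertion on that posting rather than setting its amount.
.IP
.nf
\f[C]
# hypothetical CSV columns: date, description, amount, running balance
fields date, description, amount, balance
account1 assets:bank:checking
\f[R]
.fi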
.PP
There is some special handling for sign in amounts:
.IP \[bu] 2
If an amount value is parenthesised, it will be de-parenthesised and
sign-flipped.
.IP \[bu] 2
If an amount value begins with a double minus sign, those cancel out and
are removed.
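.PP
To illustrate, assuming a hypothetical CSV whose third field holds the
amount:
.IP
.nf
\f[C]
# hypothetical CSV columns: date, description, amount
fields date, description, csvamount
# flip the sign of every amount
amount -%csvamount
# a CSV value of -5.00 is interpolated as --5.00; the double minus
# cancels out, leaving 5.00.
# (without the sign flip, ie \[dq]amount %csvamount\[dq], a CSV value
# of (10.00) would be read as -10.00)
\f[R]
.fi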
.PP
If the currency/commodity symbol is provided as a separate CSV field,
you can assign it to \f[C]currency\f[R] (affects all posting amounts) or
\f[C]currencyN\f[R] (affects just posting N\[aq]s amount).
The symbol will be prepended to the amount.
Or for more control, you can set both currency symbol and amount with a
field assignment, eg:
.IP
.nf
\f[C]
fields date,description,currency,amount
# add currency symbol on the right:
amount %amount %currency
\f[R]
.fi
.SS Referencing other fields
.PP
In field assignments, you can interpolate only CSV fields, not hledger
fields.
In the example below, there\[aq]s both a CSV field and a hledger field
named amount1, but %amount1 always means the CSV field, not the hledger
field:
.IP
.nf
\f[C]
# Name the third CSV field \[dq]amount1\[dq]
fields date,description,amount1
# Set hledger\[aq]s amount1 to the CSV amount1 field followed by USD
amount1 %amount1 USD
# Set comment to the CSV amount1 (not the amount1 assigned above)
comment %amount1
\f[R]
.fi
.PP
Here, since there\[aq]s no CSV amount1 field, %amount1 will produce a
literal \[dq]amount1\[dq]:
.IP
.nf
\f[C]
fields date,description,csvamount
amount1 %csvamount USD
# Can\[aq]t interpolate amount1 here
comment %amount1
\f[R]
.fi
.PP
When there are multiple field assignments to the same hledger field,
only the last one takes effect.
Here, comment\[aq]s value will be B, or C if \[dq]something\[dq] is
matched, but never A:
.IP
.nf
\f[C]
comment A
comment B
if something
 comment C
\f[R]
.fi
.SS How CSV rules are evaluated
.PP
Here\[aq]s how to think of CSV rules being evaluated (if you really need
to).
First,
.IP \[bu] 2
include - all includes are inlined, from top to bottom, depth first.
(At each include point the file is inlined and scanned for further
includes, before proceeding.)
.PP
Then \[dq]global\[dq] rules are evaluated, top to bottom.
If a rule is repeated, the last one wins:
.IP \[bu] 2
skip (at top level)
.IP \[bu] 2
date-format
.IP \[bu] 2
newest-first
.IP \[bu] 2
fields - names the CSV fields, optionally sets up initial assignments to
hledger fields
.PP
Then for each CSV record in turn:
.IP \[bu] 2
test all \f[C]if\f[R] blocks.
If any of them contain an \f[C]end\f[R] rule, skip all remaining CSV
records.
Otherwise if any of them contain a \f[C]skip\f[R] rule, skip that many
CSV records.
If there are multiple matched skip rules, the first one wins.
.IP \[bu] 2
collect all field assignments at top level and in matched if blocks.
When there are multiple assignments for a field, keep only the last one.
.IP \[bu] 2
compute a value for each hledger field - either the one that was
assigned to it (and interpolate the %CSVFIELDNAME references), or a
default
.IP \[bu] 2
generate a synthetic hledger transaction from these values, which
becomes part of the input to the hledger command that has been selected
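.PP
As a worked illustration of these steps, consider this small rules file
(the field names, patterns, records and account names are made up for
the example):
.IP
.nf
\f[C]
skip 1
fields date, description, amount
account2 expenses:unknown
if groceries
 account2 expenses:groceries
if transfer pending
 skip
# record \[dq]2019-11-01,groceries store,-20\[dq]:
#   only the first if block matches, and it has no end or skip;
#   account2 is assigned at top level and again in the matched block,
#   so the last one wins: account2 = expenses:groceries
# record \[dq]2019-11-02,transfer pending,-50\[dq]:
#   the second if block matches and contains skip, so the record is
#   skipped and no transaction is generated
# record \[dq]2019-11-03,bookshop,-15\[dq]:
#   no if block matches; account2 keeps the top-level value
#   expenses:unknown
\f[R]
.fi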
.SS Valid transactions
.PP
hledger currently does not post-process and validate transactions
generated from CSV as thoroughly as transactions read from a journal
file.
This means that if your rules are wrong, you can generate invalid
transactions.
Or, amounts may not be displayed with a canonical display style.
.PP
So when setting up or adjusting CSV rules, you should check your results
visually with the print command.
You can pipe print\[aq]s output through hledger once more to validate
and canonicalise fully.
Eg:
.IP
.nf
\f[C]
$ hledger -f some.csv print | hledger -f- print -I
\f[R]
.fi
.PP
(The -I/--ignore-assertions flag disables balance assertion checks,
usually needed when re-parsing print output.)
.SH "REPORTING BUGS"
Report bugs at http://bugs.hledger.org
(or on the #hledger IRC channel or hledger mail list)
.SH AUTHORS
Simon Michael <simon@joyful.com> and contributors
.SH COPYRIGHT
Copyright (C) 2007-2019 Simon Michael.
.br
Released under GNU GPL v3 or later.
.SH SEE ALSO
hledger(1), hledger\-ui(1), hledger\-web(1), hledger\-api(1),
hledger_csv(5), hledger_journal(5), hledger_timeclock(5), hledger_timedot(5),
ledger(1)
http://hledger.org