.TH "hledger_csv" "5" "September 2019" "hledger 1.15.99" "hledger User Manuals"
.SH NAME
.PP
CSV - how hledger reads CSV data, and the CSV rules file format
.SH DESCRIPTION
.PP
hledger can read CSV (comma-separated value) files as if they were
journal files, automatically converting each CSV record into a
transaction.
(To learn about \f[I]writing\f[R] CSV, see CSV output.)
.PP
Converting CSV to transactions requires some special conversion rules.
These do several things:
.IP \[bu] 2
they describe the layout and format of the CSV data
.IP \[bu] 2
they can customize the generated journal entries (transactions) using a
simple templating language
.IP \[bu] 2
they can add refinements based on patterns in the CSV data, eg
categorizing transactions with more detailed account names.
.PP
When reading a CSV file named \f[C]FILE.csv\f[R], hledger looks for a
conversion rules file named \f[C]FILE.csv.rules\f[R] in the same
directory.
You can override this with the \f[C]--rules-file\f[R] option.
If the rules file does not exist, hledger will auto-create one with some
example rules, which you\[aq]ll need to adjust.
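.PP
As a sketch of that lookup, with hypothetical file names: the first
command below would use \f[C]bank.csv.rules\f[R] from the same
directory as \f[C]bank.csv\f[R], while the second would use
\f[C]common.rules\f[R] instead:
.IP
.nf
\f[C]
$ hledger -f bank.csv print
$ hledger -f bank.csv --rules-file common.rules print
\f[R]
.fi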
.PP
At minimum, the rules file must identify the date and amount fields.
It\[aq]s often also necessary to specify the date format and the number
of header lines to skip.
Eg:
.IP
.nf
\f[C]
fields date, _, _, amount
date-format %d/%m/%Y
skip 1
\f[R]
.fi
.PP
There are more examples in the EXAMPLES section below.
.SH CSV RULES
.PP
The following kinds of rule can appear in the rules file, in any order
(except for \f[C]end\f[R] which can appear only inside a conditional
block).
Blank lines and lines beginning with \f[C]#\f[R] or \f[C];\f[R] are
ignored.
.SS \f[C]skip\f[R]
.IP
.nf
\f[C]
skip N
\f[R]
.fi
.PP
The word \[dq]skip\[dq] followed by a number (or no number, meaning 1)
tells hledger to ignore this many non-empty lines preceding the CSV
data.
(Empty/blank lines are skipped automatically.) You\[aq]ll need this
whenever your CSV data contains header lines.
.PP
It also has a second purpose: it can be used to ignore certain CSV
records (see conditional blocks below).
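.PP
A minimal sketch of both uses, assuming a CSV with two header lines
and a hypothetical \[dq]pending\[dq] marker on records that should be
ignored:
.IP
.nf
\f[C]
# ignore the two header lines preceding the CSV data
skip 2

# also ignore any CSV record containing \[dq]pending\[dq]
if pending
 skip
\f[R]
.fi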
.SS \f[C]fields\f[R]
.IP
.nf
\f[C]
fields FIELDNAME1, FIELDNAME2, ...
\f[R]
.fi
.PP
A fields list (\[dq]fields\[dq] followed by one or more comma-separated
field names) is the quick way to assign CSV field values to hledger
fields.
It (a) names the CSV fields, in order (names may not contain whitespace;
fields you don\[aq]t care about can be left unnamed), and (b) assigns
them to hledger fields if you use standard hledger field names.
Here\[aq]s an example:
.IP
.nf
\f[C]
# use the 1st, 2nd and 4th CSV fields as the transaction\[aq]s date, description and amount,
# ignore the 3rd, 5th and 6th fields,
# and name the 7th and 8th fields for later reference:
#      1     2            3 4        5 6 7          8
fields date, description, , amount1, , , somefield, anotherfield
\f[R]
.fi
.PP
Here are the standard hledger field names:
.SS Transaction fields
.PP
\f[C]date\f[R], \f[C]date2\f[R], \f[C]status\f[R], \f[C]code\f[R],
\f[C]description\f[R], \f[C]comment\f[R] can be used to form the
transaction\[aq]s first line.
Only \f[C]date\f[R] is required.
(See also date-format below.)
.SS Posting fields
.PP
\f[C]accountN\f[R], where N is 1 to 9, sets the Nth posting\[aq]s
account name.
Most often there are two postings, so you\[aq]ll want to set
\f[C]account1\f[R] and \f[C]account2\f[R].
.PP
A number of field/pseudo-field names are available for setting posting
amounts:
.IP \[bu] 2
\f[C]amountN\f[R] sets posting N\[aq]s amount
.IP \[bu] 2
\f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] can be used instead, if
the CSV has separate fields for debits and credits
.IP \[bu] 2
\f[C]currencyN\f[R] sets a currency symbol to be left-prefixed to the
amount, useful if the CSV provides that as a separate field
.IP \[bu] 2
\f[C]balanceN\f[R] sets a (separate) balance assertion amount (or when
no posting amount is set, a balance assignment)
.PP
If you write these with no number (\f[C]amount\f[R],
\f[C]amount-in\f[R], \f[C]amount-out\f[R], \f[C]currency\f[R],
\f[C]balance\f[R]), it means posting 1.
Also, if you set an amount for posting 1 only, a second posting that
balances the transaction will be generated automatically.
This helps support CSV rules created before hledger 1.16.
.PP
Finally, \f[C]commentN\f[R] sets a comment on the Nth posting.
Comments can of course contain tags.
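.PP
For instance, here is a sketch for a CSV with separate debit and
credit columns and a currency column.
The sample records and the account name are made up for illustration:
.IP
.nf
\f[C]
# sample:
# 2019-09-01,Salary,,1000.00,USD
# 2019-09-02,Rent,500.00,,USD

# name the debit and credit fields amount-out and amount-in,
# and prefix the currency field to the amount
fields date, description, amount-out, amount-in, currency
account1 assets:bank:checking
\f[R]
.fi
.PP
Only an amount for posting 1 is set here, so a balancing second
posting should be generated automatically, as described above.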
.SS \f[C](field assignment)\f[R]
.IP
.nf
\f[C]
HLEDGERFIELDNAME FIELDVALUE
\f[R]
.fi
.PP
Instead of or in addition to a fields list, you can assign a value to a
hledger field by writing its name (any of the standard names above)
followed by a text value.
The value may contain interpolated CSV fields, referenced by their
1-based position in the CSV record (\f[C]%N\f[R]), or by the name they
were given in the fields list (\f[C]%CSVFIELDNAME\f[R]).
Eg:
.IP
.nf
\f[C]
# set the amount to the 4th CSV field, with \[dq] USD\[dq] appended
amount %4 USD
\f[R]
.fi
.IP
.nf
\f[C]
# combine three fields to make a comment, containing note: and date: tags
comment note: %somefield - %anotherfield, date: %1
\f[R]
.fi
.PP
Interpolation strips any outer whitespace, so a CSV value like
\f[C]\[dq] 1 \[dq]\f[R] becomes \f[C]1\f[R] when interpolated (#1051).
Note you can only interpolate CSV fields, not the hledger fields being
assigned to; for more on this, see TIPS.
.SS \f[C]date-format\f[R]
.IP
.nf
\f[C]
date-format DATEFMT
\f[R]
.fi
.PP
This is a helper for the \f[C]date\f[R] (and \f[C]date2\f[R]) fields.
If your CSV dates are not formatted like \f[C]YYYY-MM-DD\f[R],
\f[C]YYYY/MM/DD\f[R] or \f[C]YYYY.MM.DD\f[R], you\[aq]ll need to specify
the format by writing \[dq]date-format\[dq] followed by a strptime-like
date parsing pattern, which must parse the date field values completely.
Examples:
.IP
.nf
\f[C]
# for dates like \[dq]11/06/2013\[dq]:
date-format %m/%d/%Y
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]6/11/2013\[dq]. The - allows leading zeros to be optional.
date-format %-d/%-m/%Y
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]2013-Nov-06\[dq]:
date-format %Y-%h-%d
\f[R]
.fi
.IP
.nf
\f[C]
# for dates like \[dq]11/6/2013 11:32 PM\[dq]:
date-format %-m/%-d/%Y %l:%M %p
\f[R]
.fi
.SS \f[C]if\f[R]
.IP
.nf
\f[C]
if PATTERN
 RULE

if
PATTERN
PATTERN
PATTERN
 RULE
 RULE
\f[R]
.fi
.PP
Conditional blocks apply one or more rules to CSV records which are
matched by any of the PATTERNs.
This allows transactions to be customised or categorised based on
patterns in the data.
.PP
A single pattern can be written on the same line as the \[dq]if\[dq]; or
multiple patterns can be written on the following lines, non-indented.
.PP
Patterns are case-insensitive regular expressions which try to match any
part of the whole CSV record.
It\[aq]s not yet possible to match within a specific field.
Note the CSV record they see is close but not identical to the one in
the CSV file; eg double quotes are removed, and the separator character
becomes comma.
.PP
After the patterns, there should be one or more rules to apply, all
indented by at least one space.
Three kinds of rule are allowed in conditional blocks:
.IP \[bu] 2
field assignments (to set a field\[aq]s value)
.IP \[bu] 2
skip (to skip the matched CSV record)
.IP \[bu] 2
end (to skip all remaining CSV records).
.PP
Examples:
.IP
.nf
\f[C]
# if the CSV record contains \[dq]groceries\[dq], set account2 to \[dq]expenses:groceries\[dq]
if groceries
 account2 expenses:groceries
\f[R]
.fi
.IP
.nf
\f[C]
# if the CSV record contains any of these patterns, set account2 and comment as shown
if
monthly service fee
atm transaction fee
banking thru software
 account2 expenses:business:banking
 comment XXX deductible ? check it
\f[R]
.fi
.SS \f[C]end\f[R]
.PP
As mentioned above, this rule can be used inside conditional blocks
(only) to cause hledger to stop reading CSV records and proceed with
command execution.
Eg:
.IP
.nf
\f[C]
# ignore everything following the first empty record
if ,,,,
 end
\f[R]
.fi
.SS \f[C]include\f[R]
.IP
.nf
\f[C]
include RULESFILE
\f[R]
.fi
.PP
Include another CSV rules file at this point, as if it were written
inline.
\f[C]RULESFILE\f[R] is an absolute file path or a path relative to the
current file\[aq]s directory.
.PP
This can be useful eg for reusing common rules in several rules files:
.IP
.nf
\f[C]
# someaccount.csv.rules

## someaccount-specific rules
fields date,description,amount
account1 some:account
account2 some:misc

## common rules
include categorisation.rules
\f[R]
.fi
.SS \f[C]newest-first\f[R]
.PP
hledger always sorts the generated transactions by date.
Transactions on the same date should appear in the same order as their
CSV records, as hledger can usually auto-detect whether the CSV\[aq]s
normal order is oldest first or newest first.
But if all of the following are true:
.IP \[bu] 2
the CSV might sometimes contain just one day of data (all records having
the same date)
.IP \[bu] 2
the CSV records are normally in reverse chronological order (newest
first)
.IP \[bu] 2
and you care about preserving the order of same-day transactions
.PP
you should add the \f[C]newest-first\f[R] rule as a hint.
Eg:
.IP
.nf
\f[C]
# tell hledger explicitly that the CSV is normally newest-first
newest-first
\f[R]
.fi
.SH EXAMPLES
.PP
A more complete example, generating three-posting transactions:
.IP
.nf
\f[C]
# hledger CSV rules for amazon.com order history

# sample:
# \[dq]Date\[dq],\[dq]Type\[dq],\[dq]To/From\[dq],\[dq]Name\[dq],\[dq]Status\[dq],\[dq]Amount\[dq],\[dq]Fees\[dq],\[dq]Transaction ID\[dq]
# \[dq]Jul 29, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]Completed\[dq],\[dq]$25.00\[dq],\[dq]$0.00\[dq],\[dq]17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL\[dq]

# skip one header line
skip 1

# name the csv fields (and assign the transaction\[aq]s date, amount and code)
fields date, _, toorfrom, name, amzstatus, amount, fees, code

# how to parse the date
date-format %b %-d, %Y

# combine two fields to make the description
description %toorfrom %name

# save these fields as tags
comment status:%amzstatus

# set the base account for all transactions
account1 assets:amazon

# flip the sign on the amount
amount -%amount

# Put fees in a separate posting
amount3 %fees
comment3 fees
\f[R]
.fi
.PP
For more examples, see Convert CSV files.
.SH TIPS
.SS Reading multiple CSV files
.PP
You can read multiple CSV files at once using multiple \f[C]-f\f[R]
arguments on the command line.
hledger will look for a correspondingly-named rules file for each CSV
file.
If you use the \f[C]--rules-file\f[R] option, that rules file will be
used for all the CSV files.
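.PP
For example, with hypothetical file names, the first command below
would read each file with its own rules
(\f[C]checking.csv.rules\f[R] and \f[C]savings.csv.rules\f[R]), and
the second would read both with \f[C]bank.rules\f[R]:
.IP
.nf
\f[C]
$ hledger -f checking.csv -f savings.csv print
$ hledger -f checking.csv -f savings.csv --rules-file bank.rules print
\f[R]
.fi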
.SS Deduplicating, importing
.PP
When you download a CSV file repeatedly, eg to get your latest bank
transactions, the new file may contain some of the same records as the
old one.
The print --new command is one simple way to detect just the new
transactions.
Or better still, the import command appends those new transactions to
your main journal.
This is the easiest way to import CSV data.
Eg, after downloading your latest CSV files:
.IP
.nf
\f[C]
$ hledger import *.csv [--dry]
\f[R]
.fi
.SS Other import methods
.PP
A number of other tools and workflows, hledger-specific and otherwise,
exist for converting, deduplicating, classifying and managing CSV data.
See:
.IP \[bu] 2
https://hledger.org -> sidebar -> real world setups
.IP \[bu] 2
https://plaintextaccounting.org -> data import/conversion
.SS Valid CSV
.PP
hledger accepts CSV conforming to RFC 4180.
Some things to note when values are enclosed in quotes:
.IP \[bu] 2
you must use double quotes (not single quotes)
.IP \[bu] 2
spaces outside the quotes are not allowed
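.PP
To illustrate with a made-up record, the first line below quotes a
value containing a comma in the accepted way; the other two show the
forms to avoid:
.IP
.nf
\f[C]
2019-09-01,\[dq]Adapteva, Inc.\[dq],25.00
2019-09-01,\[aq]Adapteva, Inc.\[aq],25.00
2019-09-01, \[dq]Adapteva, Inc.\[dq],25.00
\f[R]
.fi
.PP
The second line uses single quotes, and the third has a space before
the opening double quote.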
.SS Other separator characters
.PP
With the \f[C]--separator \[aq]CHAR\[aq]\f[R] option, hledger will
expect the separator to be CHAR instead of a comma.
Ie it will read other \[dq]Character Separated Values\[dq] formats, such
as TSV (Tab Separated Values).
Note: on the command line, use a real tab character in quotes, not
\f[C]\[rs]t\f[R].
Eg:
.IP
.nf
\f[C]
$ hledger -f foo.tsv --separator \[aq] \[aq] print
\f[R]
.fi
.PP
(Experimental.)
.SS Setting amounts
.PP
A posting amount can be set in one of these ways:
.IP \[bu] 2
by assigning (with a fields list or field assignment) to
\f[C]amountN\f[R] (posting N\[aq]s amount) or \f[C]amount\f[R] (posting
1\[aq]s amount)
.IP \[bu] 2
by assigning to \f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] (or
\f[C]amount-in\f[R] and \f[C]amount-out\f[R]).
For each CSV record, whichever of these has a non-zero value will be
used, with appropriate sign.
If both contain a non-zero value, this may not work.
.IP \[bu] 2
by assigning to \f[C]balanceN\f[R] (or \f[C]balance\f[R]) instead of the
above, setting the amount indirectly via a balance assignment.
.PP
There is some special handling for sign in amounts:
.IP \[bu] 2
If an amount value is parenthesised, it will be de-parenthesised and
sign-flipped.
.IP \[bu] 2
If an amount value begins with a double minus sign, those cancel out and
are removed.
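.PP
As an illustration of the second case, consider a rule that flips the
sign of a CSV field which is sometimes already negative (the field
layout here is an assumption):
.IP
.nf
\f[C]
fields date, description, amount
# a CSV value of -25.00 is interpolated as --25.00, and read as 25.00
amount -%amount
\f[R]
.fi
.PP
Likewise, with no sign-flipping rule, a CSV value of
\f[C](25.00)\f[R] assigned to an amount field would be read as
\f[C]-25.00\f[R].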
.PP
If the currency/commodity symbol is provided as a separate CSV field,
you can assign it to \f[C]currency\f[R] (affects all posting amounts) or
\f[C]currencyN\f[R] (affects just posting N\[aq]s amount).
The symbol will be prepended to the amount.
Or for more control, you can set both currency symbol and amount with a
field assignment, eg:
.IP
.nf
\f[C]
fields date,description,currency,amount
# add currency symbol on the right:
amount %amount %currency
\f[R]
.fi
.SS Referencing other fields
.PP
In field assignments, you can interpolate only CSV fields, not hledger
fields.
In the example below, there\[aq]s both a CSV field and a hledger field
named amount1, but %amount1 always means the CSV field, not the hledger
field:
.IP
.nf
\f[C]
# Name the third CSV field \[dq]amount1\[dq]
fields date,description,amount1

# Set hledger\[aq]s amount1 to the CSV amount1 field followed by USD
amount1 %amount1 USD

# Set comment to the CSV amount1 (not the amount1 assigned above)
comment %amount1
\f[R]
.fi
.PP
Here, since there\[aq]s no CSV amount1 field, %amount1 will produce a
literal \[dq]amount1\[dq]:
.IP
.nf
\f[C]
fields date,description,csvamount
amount1 %csvamount USD
# Can\[aq]t interpolate amount1 here
comment %amount1
\f[R]
.fi
.PP
When there are multiple field assignments to the same hledger field,
only the last one takes effect.
Here, comment\[aq]s value will be B, or C if \[dq]something\[dq] is
matched, but never A:
.IP
.nf
\f[C]
comment A
comment B
if something
 comment C
\f[R]
.fi
.SS How CSV rules are evaluated
.PP
Here\[aq]s how to think of CSV rules being evaluated (if you really need
to).
First,
.IP \[bu] 2
include - all includes are inlined, from top to bottom, depth first.
(At each include point the file is inlined and scanned for further
includes, before proceeding.)
.PP
Then \[dq]global\[dq] rules are evaluated, top to bottom.
If a rule is repeated, the last one wins:
.IP \[bu] 2
skip (at top level)
.IP \[bu] 2
date-format
.IP \[bu] 2
newest-first
.IP \[bu] 2
fields - names the CSV fields, optionally sets up initial assignments to
hledger fields
.PP
Then for each CSV record in turn:
.IP \[bu] 2
test all \f[C]if\f[R] blocks.
If any matched block contains an \f[C]end\f[R] rule, skip all remaining
CSV records.
Otherwise if any matched block contains a \f[C]skip\f[R] rule, skip that
many CSV records.
If there are multiple matched skip rules, the first one wins.
.IP \[bu] 2
collect all field assignments at top level and in matched if blocks.
When there are multiple assignments for a field, keep only the last one.
.IP \[bu] 2
compute a value for each hledger field - either the one that was
assigned to it (and interpolate the %CSVFIELDNAME references), or a
default
.IP \[bu] 2
generate a synthetic hledger transaction from these values, which
becomes part of the input to the hledger command that has been selected
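.PP
A small sketch of the per-record steps, using made-up patterns and
account names:
.IP
.nf
\f[C]
fields date, description, amount
account2 expenses:unknown

if coffee
 account2 expenses:cafe

if starbucks
 account2 expenses:cafe:starbucks

if pending
 skip
\f[R]
.fi
.PP
Following the steps above, a record containing \[dq]pending\[dq] is
skipped.
Otherwise, a record containing \[dq]starbucks coffee\[dq] matches both
of the first two blocks, and the last matching assignment wins, so
account2 becomes expenses:cafe:starbucks.
A record matching neither block keeps the top-level value,
expenses:unknown.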
.SS Valid transactions
.PP
hledger currently does not post-process and validate transactions
generated from CSV as thoroughly as transactions read from a journal
file.
This means that if your rules are wrong, you can generate invalid
transactions.
Or, amounts may not be displayed with a canonical display style.
.PP
So when setting up or adjusting CSV rules, you should check your results
visually with the print command.
You can pipe print\[aq]s output through hledger once more to validate
and canonicalise fully.
Eg:
.IP
.nf
\f[C]
$ hledger -f some.csv print | hledger -f- print -I
\f[R]
.fi
.PP
(The -I/--ignore-assertions flag disables balance assertion checks,
usually needed when re-parsing print output.)


.SH "REPORTING BUGS"
Report bugs at http://bugs.hledger.org
(or on the #hledger IRC channel or hledger mail list)

.SH AUTHORS
Simon Michael <simon@joyful.com> and contributors

.SH COPYRIGHT

Copyright (C) 2007-2019 Simon Michael.
.br
Released under GNU GPL v3 or later.

.SH SEE ALSO
hledger(1), hledger\-ui(1), hledger\-web(1), hledger\-api(1),
hledger_csv(5), hledger_journal(5), hledger_timeclock(5), hledger_timedot(5),
ledger(1)

http://hledger.org