imp:import: support -s/--strict properly (fix #2113)

hledger import -s now runs strict checks on an in-memory copy of the
updated journal, before updating the journal file; if strict checks
fail, nothing is written to disk.

And hledger import now does not update any .latest files until it has
run without error (no failing strict checks, no failure while writing
the journal file). This makes it more idempotent, so you can run it
again after fixing problems.
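
The ordering described above can be modelled in a few lines. This is only a minimal sketch, not the code in this commit: `importInOrder` and `demo` are hypothetical names, the two `IO ()` arguments stand in for the real journal write and `.latest` update, and a failure is returned as `Left` here rather than exiting with an error.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Hypothetical model of the import ordering described above.
-- The arguments stand in for the real steps: the outcome of the strict
-- checks on the in-memory journal, the journal file write, and the
-- .latest file update.
importInOrder :: Either String () -> IO () -> IO () -> IO (Either String ())
importInOrder strictResult writeJournal saveLatest =
  case strictResult of
    Left err -> pure (Left err)  -- checks failed: touch nothing on disk
    Right () -> do
      writeJournal               -- first append the new transactions
      saveLatest                 -- only then record the import as done
      pure (Right ())

-- | Run 'importInOrder' with logging actions and return the steps performed.
demo :: Either String () -> IO [String]
demo strictResult = do
  logRef <- newIORef []
  let step name = modifyIORef logRef (++ [name])
  _ <- importInOrder strictResult (step "write journal") (step "save .latest")
  readIORef logRef
```

With a failing check, `demo` records no steps at all, so a rerun after fixing the problem sees the same input files and the same state as before.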
Simon Michael 2023-11-16 21:59:42 -10:00
parent e92ab28cce
commit fba297f705
2 changed files with 57 additions and 30 deletions

@@ -1,5 +1,7 @@
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE NamedFieldPuns #-}
module Hledger.Cli.Commands.Import (
importmode
@@ -42,30 +44,52 @@ importcmd opts@CliOpts{rawopts_=rawopts,inputopts_=iopts} j = do
Nothing -> Just inferredStyles
Just inputStyles -> Just $ inputStyles <> inferredStyles
iopts' = iopts{new_=True, new_save_=not dryrun, balancingopts_=defbalancingopts{commodity_styles_= combinedStyles}}
iopts' = iopts{
new_=True, -- read only new transactions since last time
new_save_=False, -- defer saving .latest files until the end
strict_=False, -- defer strict checks until the end
balancingopts_=defbalancingopts{commodity_styles_= combinedStyles} -- use amount styles from both when balancing txns
}
case inputfiles of
[] -> error' "please provide one or more input files as arguments" -- PARTIAL:
fs -> do
enewj <- runExceptT $ readJournalFiles iopts' fs
case enewj of
Left e -> error' e
Right newj ->
enewjandlatestdates <- runExceptT $ readJournalFilesAndLatestDates iopts' fs
case enewjandlatestdates of
Left err -> error' err
Right (newj, latestdates) ->
case sortOn tdate $ jtxns newj of
-- with --dry-run the output should be valid journal format, so messages have ; prepended
[] -> do
-- in this case, we vary the output depending on --dry-run, which is a bit awkward
let semicolon = if dryrun then "; " else "" :: String
printf "%sno new transactions found in %s\n\n" semicolon inputstr
newts | dryrun -> do
printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
-- TODO how to force output here ?
-- length (jtxns newj) `seq` print' opts{rawopts_=("explicit",""):rawopts} newj
mapM_ (T.putStr . showTransaction) newts
newts | catchup -> do
printf "marked %s as caught up, skipping %d unimported transactions\n\n" inputstr (length newts)
newts -> do
-- XXX This writes unix line endings (\n), some at least,
-- even if the file uses dos line endings (\r\n), which could leave
-- mixed line endings in the file. See also writeFileWithBackupIfChanged.
foldM_ (`journalAddTransaction` opts) j newts -- gets forced somehow.. (how ?)
printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j)
if dryrun
then do
-- first show imported txns
printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
mapM_ (T.putStr . showTransaction) newts
-- then check the whole journal with them added, if in strict mode
when (strict_ iopts) $ strictChecks
else do
-- first check the whole journal with them added, if in strict mode
when (strict_ iopts) $ strictChecks
-- then add (append) the transactions to the main journal file
-- XXX This writes unix line endings (\n), some at least,
-- even if the file uses dos line endings (\r\n), which could leave
-- mixed line endings in the file. See also writeFileWithBackupIfChanged.
foldM_ (`journalAddTransaction` opts) j newts -- gets forced somehow.. (how ?)
printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j)
-- and finally update the .latest files
mapM_ (saveLatestDates latestdates . snd . splitReaderPrefix) fs
where
-- add the new transactions to the journal in memory and check the whole thing
strictChecks = either fail pure $ journalStrictChecks j'
where j' = foldl' (flip addTransaction) j newts

@@ -1,9 +1,10 @@
## import
Read new transactions added to each FILE since last run, and add them to
the journal. Or with --dry-run, just print the transactions
that would be added. Or with --catchup, just mark all of the FILEs'
transactions as imported, without actually importing any.
Read new transactions added to each FILE provided as arguments since
last run, and add them to the journal.
Or with --dry-run, just print the transactions that would be added.
Or with --catchup, just mark all of the FILEs' current transactions
as imported, without importing them.
_FLAGS
@@ -22,14 +23,14 @@ most common import source, and these docs focus on that case.
### Deduplication
As a convenience `import` does *deduplication* while reading transactions.
This does not mean "ignore transactions that look the same",
but rather "ignore transactions that have been seen before".
This is intended for when you are periodically importing foreign data
which may contain already-imported transactions.
So eg, if every day you download bank CSV files containing redundant data,
you can safely run `hledger import bank.csv` and only new transactions will be imported.
(`import` is idempotent.)
`import` does *time-based deduplication*, to detect only the new
transactions since the last successful import.
(This does not mean "ignore transactions that look the same",
but rather "ignore transactions that have been seen before".)
This is intended for when you are periodically importing downloaded data,
which may overlap with previous downloads.
Eg if every week (or every day) you download a bank's last three months of CSV data,
you can safely run `hledger import thebank.csv` each time and only new transactions will be imported.
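
As a rough illustration of "seen before" versus "looks the same", here is a minimal sketch of date-based filtering. It is not hledger's implementation: `Txn` and `newSince` are hypothetical names, dates are kept as ISO strings (which sort correctly as text), and the remembered state is assumed to be the latest date processed plus the number of transactions already imported on that date.

```haskell
import Data.List (sortOn)

-- | Simplified transaction: an ISO date (YYYY-MM-DD) and a description.
type Txn = (String, String)

-- | Keep only the transactions not yet seen, given the latest date
-- processed and how many transactions on that date were already imported.
-- With no remembered state, every transaction counts as new.
newSince :: Maybe (String, Int) -> [Txn] -> [Txn]
newSince Nothing txns = sortOn fst txns
newSince (Just (latest, seenOnLatest)) txns = drop seenOnLatest sameDay ++ later
  where
    -- drop everything older than the latest date; sortOn is stable, so
    -- transactions on the same date keep their order from the file
    candidates       = sortOn fst (filter ((>= latest) . fst) txns)
    (sameDay, later) = span ((== latest) . fst) candidates
```

Note that two identical-looking transactions after the latest date are both kept, while a transaction on an earlier date is skipped even if it was never imported.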
Since the items being read (CSV records, eg) often do not come with
unique identifiers, hledger detects new transactions by date, assuming
@@ -46,9 +47,11 @@ if you import often, the new transactions will be few, so less likely
to be the ones affected).
hledger remembers the latest date processed in each input file by
saving a hidden ".latest" state file in the same directory. Eg when
reading `finance/bank.csv`, it will look for and update the
`finance/.latest.bank.csv` state file.
saving a hidden ".latest.FILE" file in FILE's directory
(after a successful import).
Eg when reading `finance/bank.csv`, it will look for and update the
`finance/.latest.bank.csv` state file.
The format is simple: one or more lines containing the
same ISO-format date (YYYY-MM-DD), meaning "I have processed
transactions up to this date, and this many of them on that date."
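
To make the format concrete, here is a minimal sketch of reading and writing such a file's contents. The names `parseLatest`, `renderLatest` and `looksLikeIsoDate` are hypothetical, dates are kept as plain strings, and rejecting malformed or mismatched lines is an assumption of this sketch, not necessarily what hledger does.

```haskell
import Data.Char (isDigit)

-- | Rough shape check for YYYY-MM-DD (does not validate the calendar date).
looksLikeIsoDate :: String -> Bool
looksLikeIsoDate s = length s == 10 && and (zipWith ok [0 :: Int ..] s)
  where ok i c = if i == 4 || i == 7 then c == '-' else isDigit c

-- | Parse the contents of a .latest file into (latest date, number of
-- transactions already processed on that date). Empty contents mean
-- nothing has been imported yet.
parseLatest :: String -> Either String (Maybe (String, Int))
parseLatest contents =
  case filter (not . null) (lines contents) of
    [] -> Right Nothing
    ds@(d:rest)
      | not (looksLikeIsoDate d) -> Left ("not an ISO date: " ++ d)
      | any (/= d) rest          -> Left "expected the same date on every line"
      | otherwise                -> Right (Just (d, length ds))

-- | Render the state back into file contents: the date repeated once per
-- transaction processed on that date.
renderLatest :: String -> Int -> String
renderLatest date count = unlines (replicate count date)
```

For example, a file holding `2023-11-15` on two lines parses to `Right (Just ("2023-11-15", 2))`, meaning two transactions dated 2023-11-15 have already been processed.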