imp:import: support -s/--strict properly (fix #2113)

hledger import -s now runs strict checks on an in-memory copy of the updated journal, before updating the journal file; if strict checks fail, nothing is written to disk. And hledger import now does not update any .latest files until it has run without error (no failing strict checks, no failure while writing the journal file). This makes it more idempotent, so you can run it again after fixing problems.
2024-12-28 12:54:07 +03:00 · 2023-11-16 21:59:42 -10:00 · 2023-11-16 21:59:42 -10:00 · fba297f705
commit fba297f705
parent e92ab28cce
2 changed files with 57 additions and 30 deletions
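
The ordering described in the commit message (check an in-memory copy, then write the journal, then save the state files) can be sketched roughly as below. This is a minimal illustration, not hledger's code: `Txn`, `Journal`, `toyStrictCheck`, `checkedJournal` and `importWith` are hypothetical names, and the "strict check" is a toy rule invented for the example.

```haskell
import Control.Monad (when)

-- Hypothetical stand-ins for hledger's types, for illustration only.
type Txn     = (String, Int)   -- (description, amount)
type Journal = [Txn]

-- A toy "strict check" (an assumption, not one of hledger's real checks):
-- reject any transaction whose amount is zero.
toyStrictCheck :: Journal -> Either String ()
toyStrictCheck j =
  case [desc | (desc, amt) <- j, amt == 0] of
    []       -> Right ()
    (desc:_) -> Left ("strict check failed for: " ++ desc)

-- Build the updated journal in memory and, in strict mode, check it.
-- Nothing is written here, so a failure leaves all files untouched.
checkedJournal :: Bool -> Journal -> [Txn] -> Either String Journal
checkedJournal strict j newts = do
  let j' = j ++ newts
  when strict (toyStrictCheck j')
  pure j'

-- Run the side effects in the order the commit message describes:
-- check first, then write the journal, then save the state files.
importWith :: Bool -> Journal -> [Txn] -> (Journal -> IO ()) -> IO () -> IO (Either String ())
importWith strict j newts writeJournal saveState =
  case checkedJournal strict j newts of
    Left err -> pure (Left err)
    Right j' -> do
      writeJournal j'
      saveState
      pure (Right ())

main :: IO ()
main = do
  let write j' = putStrLn ("writing " ++ show (length j') ++ " transactions")
      save     = putStrLn "saving .latest state"
  -- The first run fails the toy check, so neither action runs.
  importWith True [("rent", 100)] [("oops", 0)]   write save >>= print
  -- The second run passes, so the journal is written and then the state is saved.
  importWith True [("rent", 100)] [("coffee", 3)] write save >>= print
```

Because the state-saving action runs last, a failed check (or an exception thrown while writing) leaves the state untouched, so the same import can simply be run again after the problem is fixed.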
--- a/hledger/Hledger/Cli/Commands/Import.hs
+++ b/hledger/Hledger/Cli/Commands/Import.hs
@@ -1,5 +1,7 @@
 {-# LANGUAGE OverloadedStrings #-}
 {-# LANGUAGE TemplateHaskell #-}
+{-# LANGUAGE MultiWayIf #-}
+{-# LANGUAGE NamedFieldPuns #-}

 module Hledger.Cli.Commands.Import (
  importmode
@@ -42,30 +44,52 @@ importcmd opts@CliOpts{rawopts_=rawopts,inputopts_=iopts} j = do
          Nothing -> Just inferredStyles
          Just inputStyles -> Just $ inputStyles <> inferredStyles

-    iopts' = iopts{new_=True, new_save_=not dryrun, balancingopts_=defbalancingopts{commodity_styles_= combinedStyles}}
+    iopts' = iopts{
+      new_=True,  -- read only new transactions since last time
+      new_save_=False,  -- defer saving .latest files until the end
+      strict_=False,  -- defer strict checks until the end
+      balancingopts_=defbalancingopts{commodity_styles_= combinedStyles}  -- use amount styles from both when balancing txns
+      }
+
  case inputfiles of
    [] -> error' "please provide one or more input files as arguments"  -- PARTIAL:
    fs -> do
-      enewj <- runExceptT $ readJournalFiles iopts' fs
-      case enewj of
-        Left e     -> error' e
-        Right newj ->
+      enewjandlatestdates <- runExceptT $ readJournalFilesAndLatestDates iopts' fs
+      case enewjandlatestdates of
+        Left err -> error' err
+        Right (newj, latestdates) ->
          case sortOn tdate $ jtxns newj of
            -- with --dry-run the output should be valid journal format, so messages have ; prepended
            [] -> do
              -- in this case, we vary the output depending on --dry-run, which is a bit awkward
              let semicolon = if dryrun then "; " else "" :: String
              printf "%sno new transactions found in %s\n\n" semicolon inputstr
-            newts | dryrun -> do
-              printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
-              -- TODO how to force output here ?
-              -- length (jtxns newj) `seq` print' opts{rawopts_=("explicit",""):rawopts} newj
-              mapM_ (T.putStr . showTransaction) newts
+
            newts | catchup -> do
              printf "marked %s as caught up, skipping %d unimported transactions\n\n" inputstr (length newts)
+
            newts -> do
-              -- XXX This writes unix line endings (\n), some at least,
-              -- even if the file uses dos line endings (\r\n), which could leave
-              -- mixed line endings in the file. See also writeFileWithBackupIfChanged.
-              foldM_ (`journalAddTransaction` opts) j newts  -- gets forced somehow.. (how ?)
-              printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j)
+              if dryrun
+              then do
+                -- first show imported txns
+                printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
+                mapM_ (T.putStr . showTransaction) newts
+                -- then check the whole journal with them added, if in strict mode
+                when (strict_ iopts) $ strictChecks
+
+              else do
+                -- first check the whole journal with them added, if in strict mode
+                when (strict_ iopts) $ strictChecks
+                -- then add (append) the transactions to the main journal file
+                -- XXX This writes unix line endings (\n), some at least,
+                -- even if the file uses dos line endings (\r\n), which could leave
+                -- mixed line endings in the file. See also writeFileWithBackupIfChanged.
+                foldM_ (`journalAddTransaction` opts) j newts  -- gets forced somehow.. (how ?)
+                printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j)
+                -- and finally update the .latest files
+                mapM_ (saveLatestDates latestdates . snd . splitReaderPrefix) fs
+
+              where
+                -- add the new transactions to the journal in memory and check the whole thing
+                strictChecks = either fail pure $ journalStrictChecks j'
+                  where j' = foldl' (flip addTransaction) j newts
--- a/hledger/Hledger/Cli/Commands/Import.md
+++ b/hledger/Hledger/Cli/Commands/Import.md
@@ -1,9 +1,10 @@
 ## import

-Read new transactions added to each FILE since last run, and add them to
-the journal. Or with --dry-run, just print the transactions 
-that would be added. Or with --catchup, just mark all of the FILEs'
-transactions as imported, without actually importing any.
+Read new transactions added to each FILE provided as arguments since
+last run, and add them to the journal.
+Or with --dry-run, just print the transactions that would be added.
+Or with --catchup, just mark all of the FILEs' current transactions 
+as imported, without importing them.

 _FLAGS

@@ -22,14 +23,14 @@ most common import source, and these docs focus on that case.

 ### Deduplication

-As a convenience `import` does *deduplication* while reading transactions.
-This does not mean "ignore transactions that look the same",
-but rather "ignore transactions that have been seen before".
-This is intended for when you are periodically importing foreign data
-which may contain already-imported transactions.
-So eg, if every day you download bank CSV files containing redundant data,
-you can safely run `hledger import bank.csv` and only new transactions will be imported.
-(`import` is idempotent.)
+`import` does *time-based deduplication*, to detect only the new
+transactions since the last successful import.
+(This does not mean "ignore transactions that look the same",
+but rather "ignore transactions that have been seen before".)
+This is intended for when you are periodically importing downloaded data,
+which may overlap with previous downloads.
+Eg if every week (or every day) you download a bank's last three months of CSV data,
+you can safely run `hledger import thebank.csv` each time and only new transactions will be imported.

 Since the items being read (CSV records, eg) often do not come with
 unique identifiers, hledger detects new transactions by date, assuming
@@ -46,9 +47,11 @@ if you import often, the new transactions will be few, so less likely
 to be the ones affected).

 hledger remembers the latest date processed in each input file by
-saving a hidden ".latest" state file in the same directory. Eg when
-reading `finance/bank.csv`, it will look for and update the
-`finance/.latest.bank.csv` state file. 
+saving a hidden ".latest.FILE" file in FILE's directory
+(after a successful import).
+
+Eg when reading `finance/bank.csv`, it will look for and update the
+`finance/.latest.bank.csv` state file.
 The format is simple: one or more lines containing the
 same ISO-format date (YYYY-MM-DD), meaning "I have processed
 transactions up to this date, and this many of them on that date."
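
The state file format described above (the same ISO date repeated, once per transaction already processed on that date) could be read with something like the following. This is only a sketch under stated assumptions, not hledger's parser: `Latest`, `looksLikeIsoDate` and `parseLatest` are hypothetical names, and the date is checked for shape only, not for calendar validity.

```haskell
import Data.Char (isDigit)

-- Hypothetical summary of a .latest file: the latest date seen, and how
-- many transactions on that date have already been processed.
data Latest = Latest { latestDate :: String, latestCount :: Int }
  deriving (Eq, Show)

-- True for strings shaped like YYYY-MM-DD (shape only, e.g. month 13 passes).
looksLikeIsoDate :: String -> Bool
looksLikeIsoDate s = case s of
  [y1, y2, y3, y4, '-', m1, m2, '-', d1, d2] ->
    all isDigit [y1, y2, y3, y4, m1, m2, d1, d2]
  _ -> False

-- Parse the file content. An empty file means nothing has been processed yet.
parseLatest :: String -> Either String (Maybe Latest)
parseLatest content =
  case filter (not . null) (map (filter (/= '\r')) (lines content)) of
    [] -> Right Nothing
    ds@(d : _)
      | not (all looksLikeIsoDate ds) -> Left "expected YYYY-MM-DD on every line"
      | any (/= d) ds                 -> Left "expected the same date on every line"
      | otherwise                     -> Right (Just (Latest d (length ds)))

main :: IO ()
main = do
  -- Two identical lines: two transactions on that date were already processed.
  print (parseLatest "2023-11-16\n2023-11-16\n")
  -- Differing dates do not match the described format and are rejected here.
  print (parseLatest "2023-11-16\n2023-11-17\n")
```

Counting the lines, rather than storing a number, matches the description "this many of them on that date": the first call above yields the date 2023-11-16 with a count of 2.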