lib: drop the file format auto-detection feature

For a long time hledger has auto-detected the file format when it's not known, eg when reading from a file with an unusual extension (like .dat or .txt), or from standard input (-f-), or when using the include directive (which currently ignores file extensions).

Auto-detection has been done by trying all readers until one succeeds. This could guess wrong in some cases, but that was rare enough that it has worked fine in practice.

Recently, more conveniences have been added to timedot format, increasing its overlap with journal format, which makes this kind of auto-detection unreliable.

Auto-detection and auto-detection failures are (probably) still pretty rare in practice. But when a failure does happen it's confusing, giving misleading errors or false successes (eg printing timedot entries instead of a journal error).

For predictability and to minimise confusion, hledger no longer tries to guess; when there's no file extension or reader prefix, it assumes journal format. To specify one of the other formats, you must use a standard file extension (.timeclock, .timedot, .csv, .ssv, .tsv), or a reader prefix (-f csv:foo.txt, -f timedot:-).

For now, the include directive still tries to autodetect (journal/timeclock/timedot), and this can't be overridden; it will be fixed later.

Experimental; testing and feedback welcome.
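The selection rule described above (reader prefix first, then file extension, otherwise journal) can be sketched roughly as follows. This is a simplified, self-contained illustration, not hledger's actual code: the `Format` type, `formatNames`, `splitPrefix`, `extension` and `chooseFormat` are hypothetical names, and mapping only the `journal` extension to the journal format is an assumption made for brevity.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for hledger's reader/format types.
data Format = Journal | Timeclock | Timedot | Csv | Ssv | Tsv
  deriving (Show, Eq)

-- Assumption: each name serves as both a reader prefix and a file extension.
formatNames :: [(String, Format)]
formatNames =
  [ ("journal", Journal), ("timeclock", Timeclock), ("timedot", Timedot)
  , ("csv", Csv), ("ssv", Ssv), ("tsv", Tsv) ]

-- Split "csv:foo.txt" into (Just Csv, "foo.txt"); leave other paths unchanged.
splitPrefix :: FilePath -> (Maybe Format, FilePath)
splitPrefix path =
  case break (== ':') path of
    (name, ':' : rest) | Just fmt <- lookup name formatNames -> (Just fmt, rest)
    _ -> (Nothing, path)

-- The text after the last '.' in the final path component, if there is one.
extension :: FilePath -> Maybe String
extension path =
  case break (== '.') (reverse basename) of
    (revExt, '.' : _) -> Just (reverse revExt)
    _                 -> Nothing
  where
    basename = reverse (takeWhile (/= '/') (reverse path))

-- Prefix wins, then a recognised extension; otherwise assume journal format.
chooseFormat :: FilePath -> Format
chooseFormat path = fromMaybe Journal $
  case splitPrefix path of
    (Just fmt, _) -> Just fmt
    (Nothing, p)  -> extension p >>= (`lookup` formatNames)

main :: IO ()
main = print (map chooseFormat ["-", "csv:foo.txt", "timedot:-", "times.timedot", "data.dat"])
-- prints [Journal,Csv,Timedot,Timedot,Journal]
```

Note that in this sketch nothing is ever inferred from the file's content: an unrecognised extension such as .dat, or standard input with no prefix, simply falls back to the journal format.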
2024-11-10 05:39:31 +03:00 · 2020-02-29 09:17:39 -08:00 · b1f3880c3d
commit b1f3880c3d
parent 32eb839eac
4 changed files with 49 additions and 64 deletions
--- a/hledger-lib/Hledger/Read.hs
+++ b/hledger-lib/Hledger/Read.hs
@@ -98,7 +98,8 @@ readers = [
 readerNames :: [String]
 readerNames = map rFormat readers
--- | Read a Journal from the given text trying all readers in turn, or throw an error.
+-- | Read a Journal from the given text, assuming journal format; or
 -- throw an error.
 readJournal' :: Text -> IO Journal
 readJournal' t = readJournal def Nothing t >>= either error' return
@@ -106,41 +107,41 @@ readJournal' t = readJournal def Nothing t >>= either error' return
 --
 -- Read a Journal from some text, or return an error message.
 --
--- The reader (data format) is chosen based on a recognised file name extension in @mfile@ (if provided).
+-- The reader (data format) is chosen based on, in this order:
 -- If it does not identify a known reader, all built-in readers are tried in turn
 -- (returning the first one's error message if none of them succeed).
 --
--- Input ioptions (@iopts@) specify CSV conversion rules file to help convert CSV data,
+-- - a reader name provided in @iopts@
 -- enable or disable balance assertion checking and automated posting generation.
 --
 -- - a reader prefix in the @mfile@ path
 --
 -- - a file extension in @mfile@
 --
 -- If none of these is available, or if the reader name is unrecognised,
 -- we use the journal reader. (We used to try all readers in this case;
 -- since hledger 1.17, we prefer predictability.)
 readJournal :: InputOpts -> Maybe FilePath -> Text -> IO (Either String Journal)
-readJournal iopts mfile txt =
+readJournal iopts mpath txt = do
-  tryReaders iopts mfile specifiedorallreaders txt
+  dbg1IO "trying reader" (rFormat r)
  ej <- (runExceptT . (rParser r) iopts (fromMaybe "(string)" mpath)) txt
  dbg1IO "reader result" (' ':show ej)
  return ej
  where
-    specifiedorallreaders = maybe stablereaders (:[]) $ findReader (mformat_ iopts) mfile
+    r = fromMaybe JournalReader.reader $ findReader (mformat_ iopts) mpath
    stablereaders = filter (not.rExperimental) readers
--- | Try to parse the given text to a Journal using each reader in turn,
+-- | @findReader mformat mpath@
 -- returning the first success, or if all of them fail, the first error message.
 --
--- Input options specify CSV conversion rules file to help convert CSV data,
+-- Find the reader named by @mformat@, if provided.
--- enable or disable balance assertion checking and automated posting generation.
+-- Or, if a file path is provided, find the first reader that handles
---
+-- its file extension, if any.
-tryReaders :: InputOpts -> Maybe FilePath -> [Reader] -> Text -> IO (Either String Journal)
+findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
-tryReaders iopts mpath readers txt = firstSuccessOrFirstError [] readers
+findReader Nothing Nothing     = Nothing
 findReader (Just fmt) _        = headMay [r | r <- readers, rFormat r == fmt]
 findReader Nothing (Just path) =
  case prefix of
    Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
    Nothing  -> headMay [r | r <- readers, ext `elem` rExtensions r]
  where
-    -- TODO: #1087 when parsing csv with -f -, if the csv (rules) parser fails, 
+    (prefix,path') = splitReaderPrefix path
-    -- we would rather see that error, not the one from the journal parser
+    ext            = drop 1 $ takeExtension path'
    firstSuccessOrFirstError :: [String] -> [Reader] -> IO (Either String Journal)
    firstSuccessOrFirstError [] []        = return $ Left "no readers found"
    firstSuccessOrFirstError errs (r:rs) = do
      dbg1IO "trying reader" (rFormat r)
      result <- (runExceptT . (rParser r) iopts path) txt
      dbg1IO "reader result" $ either id show result
      case result of Right j -> return $ Right j                        -- success!
                     Left e  -> firstSuccessOrFirstError (errs++[e]) rs -- keep trying
    firstSuccessOrFirstError (e:_) []    = return $ Left e              -- none left, return first error
    path = fromMaybe "(string)" mpath
 -- | Read the default journal file specified by the environment, or raise an error.
 defaultJournal :: IO Journal
@@ -193,7 +194,7 @@ readJournalFiles iopts =
 -- the @mformat_@ specified in the input options, if any;
 -- the file path's READER: prefix, if any;
 -- a recognised file name extension.
--- if none of these identify a known reader, all built-in readers are tried in turn.
+-- if none of these identify a known reader, the journal reader is used.
 --
 -- The input options can also configure balance assertion checking, automated posting
 -- generation, a rules file for converting CSV data, etc.
@@ -266,22 +267,6 @@ newJournalContent = do
  d <- getCurrentDay
  return $ printf "; journal created %s by hledger\n" (show d)
 -- | @findReader mformat mpath@
 --
 -- Find the reader named by @mformat@, if provided.
 -- Or, if a file path is provided, find the first reader that handles
 -- its file extension, if any.
 findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
 findReader Nothing Nothing     = Nothing
 findReader (Just fmt) _        = headMay [r | r <- readers, rFormat r == fmt]
 findReader Nothing (Just path) =
  case prefix of
    Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
    Nothing  -> headMay [r | r <- readers, ext `elem` rExtensions r]
  where
    (prefix,path') = splitReaderPrefix path
    ext            = drop 1 $ takeExtension path'
 -- A "LatestDates" is zero or more copies of the same date,
 -- representing the latest transaction date read from a file,
 -- and how many transactions there were on that date.
--- a/tests/journal/parse-errors.test
+++ b/tests/journal/parse-errors.test
@@ -22,15 +22,12 @@ expecting date separator or digit
 2018/1/1
  a  1
-# 2. When read from stdin, this example actually passes because hledger tries all readers.
+# 2. When read from stdin with no reader prefix, the journal reader is used,
-# If they all failed, it would show the error from the first (journal reader).
+# and fails here. (Before 1.17, all readers were tried and the timedot reader
-# But in this case the timedot reader can parse it.
+# would succeed.)
 $ hledger -f - print
->
+>2 /could not balance/
-2018-01-01 *
+>=1
    (a)            1.00
 >=
 # 3. So in these tests we must sometimes force the desired format, like so.
 # Now we see the error from the journal reader.
--- a/tests/timeclock.test
+++ b/tests/timeclock.test
@@ -1,4 +1,5 @@
-# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting
+# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting.
 # Note since 1.17 we need to specify stdin's format explicitly.
 <
 i 2009/1/1 08:00:00
 o 2009/1/1 09:00:00 stuff on checkout record  is ignored
@@ -7,7 +8,7 @@ i 2009/1/2 08:00:00 account name
 o 2009/1/2 09:00:00
 i 2009/1/3 08:00:00 some:account name  and a description
 o 2009/1/3 09:00:00
-$ hledger -f - print
+$ hledger -f timeclock:- print
 >
 2009-01-01 * 08:00-09:00
    ()           1.00h
@@ -24,14 +25,14 @@ $ hledger -f - print
 # For a missing clock-out, now is implied
 <
 i 2020/1/1 08:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 > /./
 >= 0
 # For a log not starting with clock-out, print error
 <
 o 2020/1/1 08:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 1: expected timeclock code i/
 >= !0
@@ -39,7 +40,7 @@ $ hledger -f - balance
 <
 o 2020/1/1 08:00
 o 2020/1/1 09:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 1: expected timeclock code i/
 >= !0
@@ -47,13 +48,13 @@ $ hledger -f - balance
 <
 i 2020/1/1 08:00
 i 2020/1/1 09:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 2: expected timeclock code o/
 >= !0
 ## TODO
 ## multi-day sessions get a new transaction for each day
-#hledger -f- print
+#hledger -ftimeclock:- print
 #<<<
 #i 2017/04/20 09:00:00 A
 #o 2017/04/20 17:00:00
@@ -67,7 +68,7 @@ $ hledger -f - balance
 #
 ## unclosed sessions are automatically closed at report time
 ## TODO this output looks wrong
-#hledger -f- print
+#hledger -ftimeclock:- print
 #<<<
 #i 2017/04/20 09:00:00 A
 #o 2017/04/20 17:00:00
--- a/tests/timedot.test
+++ b/tests/timedot.test
@@ -1,3 +1,5 @@
 # Note since 1.17 we need to specify stdin's format explicitly.
 # 1. basic timedot entry
 <
 # comment
@@ -7,7 +9,7 @@
 a:aa  1
 b:bb  2
-$ hledger -f- print
+$ hledger -ftimedot:- print
 2020-01-01 *
    (a:aa)            1.00
@@ -21,7 +23,7 @@ $ hledger -f- print
 * 2020-01-01
 ** a:aa  1
-$ hledger -f- print
+$ hledger -ftimedot:- print
 2020-01-01 *
    (a:aa)            1.00