lib: drop the file format auto-detection feature

For a long time hledger has auto-detected the file format when it's not known, eg when reading from a file with unusual extension (like .dat or .txt), or from standard input (-f-), or when using the include directive (which currently ignores file extensions). Auto-detecting has been done by trying all readers until one succeeds. This could guess wrong in some cases, but it was so rare that it has been working fine. Recently, more conveniences have been added to timedot format, increasing its overlap with journal format, which makes this kind of auto-detection unreliable. Auto-detection and auto-detection failures are (probably) still pretty rare in practice. But when it does happen it's confusing, giving misleading errors or false successes (eg printing timedot entries instead of a journal error). For predictability and to minimise confusion, hledger no longer tries to guess; when there's no file extension or reader prefix, it assumes journal format. To specify one of the other formats, you must use a standard file extension (.timeclock, .timedot, .csv, .ssv, .tsv), or a reader prefix (-f csv:foo.txt, -f timedot:-). For now, the include directive still tries to autodetect (journal/timeclock/timedot), and this can't be overridden; it will be fixed later. Experimental; testing and feedback welcome.
2024-09-19 18:29:36 +03:00 · 2020-02-29 09:17:39 -08:00 · 2020-02-29 09:17:39 -08:00 · b1f3880c3d
commit b1f3880c3d
parent 32eb839eac
4 changed files with 49 additions and 64 deletions
--- a/hledger-lib/Hledger/Read.hs
+++ b/hledger-lib/Hledger/Read.hs
@ -98,7 +98,8 @@ readers = [
 readerNames :: [String]
 readerNames = map rFormat readers

-- | Read a Journal from the given text trying all readers in turn, or throw an error.
+-- | Read a Journal from the given text, assuming journal format; or
+-- throw an error.
 readJournal' :: Text -> IO Journal
 readJournal' t = readJournal def Nothing t >>= either error' return

@ -106,41 +107,41 @@ readJournal' t = readJournal def Nothing t >>= either error' return
 --
 -- Read a Journal from some text, or return an error message.
 --
-- The reader (data format) is chosen based on a recognised file name extension in @mfile@ (if provided).
-- If it does not identify a known reader, all built-in readers are tried in turn
-- (returning the first one's error message if none of them succeed).
+-- The reader (data format) is chosen based on, in this order:
 --
-- Input ioptions (@iopts@) specify CSV conversion rules file to help convert CSV data,
-- enable or disable balance assertion checking and automated posting generation.
+-- - a reader name provided in @iopts@
 --
+-- - a reader prefix in the @mfile@ path
+--
+-- - a file extension in @mfile@
+--
+-- If none of these is available, or if the reader name is unrecognised,
+-- we use the journal reader. (We used to try all readers in this case;
+-- since hledger 1.17, we prefer predictability.)
 readJournal :: InputOpts -> Maybe FilePath -> Text -> IO (Either String Journal)
-readJournal iopts mfile txt =
-  tryReaders iopts mfile specifiedorallreaders txt
+readJournal iopts mpath txt = do
+  dbg1IO "trying reader" (rFormat r)
+  ej <- (runExceptT . (rParser r) iopts (fromMaybe "(string)" mpath)) txt
+  dbg1IO "reader result" (' ':show ej)
+  return ej
  where
-    specifiedorallreaders = maybe stablereaders (:[]) $ findReader (mformat_ iopts) mfile
-    stablereaders = filter (not.rExperimental) readers
+    r = fromMaybe JournalReader.reader $ findReader (mformat_ iopts) mpath

-- | Try to parse the given text to a Journal using each reader in turn,
-- returning the first success, or if all of them fail, the first error message.
+-- | @findReader mformat mpath@
 --
-- Input options specify CSV conversion rules file to help convert CSV data,
-- enable or disable balance assertion checking and automated posting generation.
--
-tryReaders :: InputOpts -> Maybe FilePath -> [Reader] -> Text -> IO (Either String Journal)
-tryReaders iopts mpath readers txt = firstSuccessOrFirstError [] readers
+-- Find the reader named by @mformat@, if provided.
+-- Or, if a file path is provided, find the first reader that handles
+-- its file extension, if any.
+findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
+findReader Nothing Nothing     = Nothing
+findReader (Just fmt) _        = headMay [r | r <- readers, rFormat r == fmt]
+findReader Nothing (Just path) =
+  case prefix of
+    Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
+    Nothing  -> headMay [r | r <- readers, ext `elem` rExtensions r]
  where
-    -- TODO: #1087 when parsing csv with -f -, if the csv (rules) parser fails, 
-    -- we would rather see that error, not the one from the journal parser
-    firstSuccessOrFirstError :: [String] -> [Reader] -> IO (Either String Journal)
-    firstSuccessOrFirstError [] []        = return $ Left "no readers found"
-    firstSuccessOrFirstError errs (r:rs) = do
-      dbg1IO "trying reader" (rFormat r)
-      result <- (runExceptT . (rParser r) iopts path) txt
-      dbg1IO "reader result" $ either id show result
-      case result of Right j -> return $ Right j                        -- success!
-                     Left e  -> firstSuccessOrFirstError (errs++[e]) rs -- keep trying
-    firstSuccessOrFirstError (e:_) []    = return $ Left e              -- none left, return first error
-    path = fromMaybe "(string)" mpath
+    (prefix,path') = splitReaderPrefix path
+    ext            = drop 1 $ takeExtension path'

 -- | Read the default journal file specified by the environment, or raise an error.
 defaultJournal :: IO Journal
@ -193,7 +194,7 @@ readJournalFiles iopts =
 -- the @mformat_@ specified in the input options, if any;
 -- the file path's READER: prefix, if any;
 -- a recognised file name extension.
-- if none of these identify a known reader, all built-in readers are tried in turn.
+-- if none of these identify a known reader, the journal reader is used.
 --
 -- The input options can also configure balance assertion checking, automated posting
 -- generation, a rules file for converting CSV data, etc.
@ -266,22 +267,6 @@ newJournalContent = do
  d <- getCurrentDay
  return $ printf "; journal created %s by hledger\n" (show d)

-- | @findReader mformat mpath@
--
-- Find the reader named by @mformat@, if provided.
-- Or, if a file path is provided, find the first reader that handles
-- its file extension, if any.
-findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
-findReader Nothing Nothing     = Nothing
-findReader (Just fmt) _        = headMay [r | r <- readers, rFormat r == fmt]
-findReader Nothing (Just path) =
-  case prefix of
-    Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
-    Nothing  -> headMay [r | r <- readers, ext `elem` rExtensions r]
-  where
-    (prefix,path') = splitReaderPrefix path
-    ext            = drop 1 $ takeExtension path'
-
 -- A "LatestDates" is zero or more copies of the same date,
 -- representing the latest transaction date read from a file,
 -- and how many transactions there were on that date.
--- a/tests/journal/parse-errors.test
+++ b/tests/journal/parse-errors.test
@ -22,15 +22,12 @@ expecting date separator or digit
 2018/1/1
  a  1

-# 2. When read from stdin, this example actually passes because hledger tries all readers.
-# If they all failed, it would show the error from the first (journal reader).
-# But in this case the timedot reader can parse it.
+# 2. When read from stdin with no reader prefix, the journal reader is used,
+# and fails here. (Before 1.17, all readers were tried and the timedot reader
+# would succeed.)
 $ hledger -f - print
->
-2018-01-01 *
-    (a)            1.00
-
->=
+>2 /could not balance/
+>=1

 # 3. So in these tests we must sometimes force the desired format, like so.
 # Now we see the error from the journal reader.
--- a/tests/timeclock.test
+++ b/tests/timeclock.test
@ -1,4 +1,5 @@
-# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting
+# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting.
+# Note since 1.17 we need to specify stdin's format explicitly.
 <
 i 2009/1/1 08:00:00
 o 2009/1/1 09:00:00 stuff on checkout record  is ignored
@ -7,7 +8,7 @@ i 2009/1/2 08:00:00 account name
 o 2009/1/2 09:00:00
 i 2009/1/3 08:00:00 some:account name  and a description
 o 2009/1/3 09:00:00
-$ hledger -f - print
+$ hledger -f timeclock:- print
 >
 2009-01-01 * 08:00-09:00
    ()           1.00h
@ -24,14 +25,14 @@ $ hledger -f - print
 # For a missing clock-out, now is implied
 <
 i 2020/1/1 08:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 > /./
 >= 0

 # For a log not starting with clock-out, print error
 <
 o 2020/1/1 08:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 1: expected timeclock code i/
 >= !0

@ -39,7 +40,7 @@ $ hledger -f - balance
 <
 o 2020/1/1 08:00
 o 2020/1/1 09:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 1: expected timeclock code i/
 >= !0

@ -47,13 +48,13 @@ $ hledger -f - balance
 <
 i 2020/1/1 08:00
 i 2020/1/1 09:00
-$ hledger -f - balance
+$ hledger -f timeclock:- balance
 >2 /line 2: expected timeclock code o/
 >= !0

 ## TODO
 ## multi-day sessions get a new transaction for each day
-#hledger -f- print
+#hledger -ftimeclock:- print
 #<<<
 #i 2017/04/20 09:00:00 A
 #o 2017/04/20 17:00:00
@ -67,7 +68,7 @@ $ hledger -f - balance
 #
 ## unclosed sessions are automatically closed at report time
 ## TODO this output looks wrong
-#hledger -f- print
+#hledger -ftimeclock:- print
 #<<<
 #i 2017/04/20 09:00:00 A
 #o 2017/04/20 17:00:00
--- a/tests/timedot.test
+++ b/tests/timedot.test
@ -1,3 +1,5 @@
+# Note since 1.17 we need to specify stdin's format explicitly.
+
 # 1. basic timedot entry
 <
 # comment
@ -7,7 +9,7 @@
 a:aa  1
 b:bb  2

-$ hledger -f- print
+$ hledger -ftimedot:- print
 2020-01-01 *
    (a:aa)            1.00

@ -21,7 +23,7 @@ $ hledger -f- print
 * 2020-01-01
 ** a:aa  1

-$ hledger -f- print
+$ hledger -ftimedot:- print
 2020-01-01 *
    (a:aa)            1.00