lib: drop the file format auto-detection feature

For a long time hledger has auto-detected the file format when it's
not known, eg when reading from a file with unusual extension (like
.dat or .txt), or from standard input (-f-), or when using the include
directive (which currently ignores file extensions).

Auto-detecting has been done by trying all readers until one succeeds.
This could guess wrong in some cases, but it was so rare that it has
been working fine.

Recently, more conveniences have been added to timedot format,
increasing its overlap with journal format, which makes this kind of
auto-detection unreliable.

Auto-detection and auto-detection failures are (probably) still pretty
rare in practice. But when it does happen it's confusing, giving
misleading errors or false successes (eg printing timedot entries
instead of a journal error).

For predictability and to minimise confusion, hledger no longer tries
to guess; when there's no file extension or reader prefix, it assumes
journal format. To specify one of the other formats, you must use a
standard file extension (.timeclock, .timedot, .csv, .ssv, .tsv), or a
reader prefix (-f csv:foo.txt, -f timedot:-).

For now, the include directive still tries to autodetect
(journal/timeclock/timedot), and this can't be overridden; it will be
fixed later.

Experimental; testing and feedback welcome.
This commit is contained in:
Simon Michael 2020-02-29 09:17:39 -08:00
parent 32eb839eac
commit b1f3880c3d
4 changed files with 49 additions and 64 deletions

View File

@ -98,7 +98,8 @@ readers = [
readerNames :: [String] readerNames :: [String]
readerNames = map rFormat readers readerNames = map rFormat readers
-- | Read a Journal from the given text trying all readers in turn, or throw an error. -- | Read a Journal from the given text, assuming journal format; or
-- throw an error.
readJournal' :: Text -> IO Journal readJournal' :: Text -> IO Journal
readJournal' t = readJournal def Nothing t >>= either error' return readJournal' t = readJournal def Nothing t >>= either error' return
@ -106,41 +107,41 @@ readJournal' t = readJournal def Nothing t >>= either error' return
-- --
-- Read a Journal from some text, or return an error message. -- Read a Journal from some text, or return an error message.
-- --
-- The reader (data format) is chosen based on a recognised file name extension in @mfile@ (if provided). -- The reader (data format) is chosen based on, in this order:
-- If it does not identify a known reader, all built-in readers are tried in turn
-- (returning the first one's error message if none of them succeed).
-- --
-- Input ioptions (@iopts@) specify CSV conversion rules file to help convert CSV data, -- - a reader name provided in @iopts@
-- enable or disable balance assertion checking and automated posting generation.
-- --
-- - a reader prefix in the @mfile@ path
--
-- - a file extension in @mfile@
--
-- If none of these is available, or if the reader name is unrecognised,
-- we use the journal reader. (We used to try all readers in this case;
-- since hledger 1.17, we prefer predictability.)
readJournal :: InputOpts -> Maybe FilePath -> Text -> IO (Either String Journal) readJournal :: InputOpts -> Maybe FilePath -> Text -> IO (Either String Journal)
readJournal iopts mfile txt = readJournal iopts mpath txt = do
tryReaders iopts mfile specifiedorallreaders txt dbg1IO "trying reader" (rFormat r)
ej <- (runExceptT . (rParser r) iopts (fromMaybe "(string)" mpath)) txt
dbg1IO "reader result" (' ':show ej)
return ej
where where
specifiedorallreaders = maybe stablereaders (:[]) $ findReader (mformat_ iopts) mfile r = fromMaybe JournalReader.reader $ findReader (mformat_ iopts) mpath
stablereaders = filter (not.rExperimental) readers
-- | Try to parse the given text to a Journal using each reader in turn, -- | @findReader mformat mpath@
-- returning the first success, or if all of them fail, the first error message.
-- --
-- Input options specify CSV conversion rules file to help convert CSV data, -- Find the reader named by @mformat@, if provided.
-- enable or disable balance assertion checking and automated posting generation. -- Or, if a file path is provided, find the first reader that handles
-- -- its file extension, if any.
tryReaders :: InputOpts -> Maybe FilePath -> [Reader] -> Text -> IO (Either String Journal) findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
tryReaders iopts mpath readers txt = firstSuccessOrFirstError [] readers findReader Nothing Nothing = Nothing
findReader (Just fmt) _ = headMay [r | r <- readers, rFormat r == fmt]
findReader Nothing (Just path) =
case prefix of
Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
Nothing -> headMay [r | r <- readers, ext `elem` rExtensions r]
where where
-- TODO: #1087 when parsing csv with -f -, if the csv (rules) parser fails, (prefix,path') = splitReaderPrefix path
-- we would rather see that error, not the one from the journal parser ext = drop 1 $ takeExtension path'
firstSuccessOrFirstError :: [String] -> [Reader] -> IO (Either String Journal)
firstSuccessOrFirstError [] [] = return $ Left "no readers found"
firstSuccessOrFirstError errs (r:rs) = do
dbg1IO "trying reader" (rFormat r)
result <- (runExceptT . (rParser r) iopts path) txt
dbg1IO "reader result" $ either id show result
case result of Right j -> return $ Right j -- success!
Left e -> firstSuccessOrFirstError (errs++[e]) rs -- keep trying
firstSuccessOrFirstError (e:_) [] = return $ Left e -- none left, return first error
path = fromMaybe "(string)" mpath
-- | Read the default journal file specified by the environment, or raise an error. -- | Read the default journal file specified by the environment, or raise an error.
defaultJournal :: IO Journal defaultJournal :: IO Journal
@ -193,7 +194,7 @@ readJournalFiles iopts =
-- the @mformat_@ specified in the input options, if any; -- the @mformat_@ specified in the input options, if any;
-- the file path's READER: prefix, if any; -- the file path's READER: prefix, if any;
-- a recognised file name extension. -- a recognised file name extension.
-- if none of these identify a known reader, all built-in readers are tried in turn. -- if none of these identify a known reader, the journal reader is used.
-- --
-- The input options can also configure balance assertion checking, automated posting -- The input options can also configure balance assertion checking, automated posting
-- generation, a rules file for converting CSV data, etc. -- generation, a rules file for converting CSV data, etc.
@ -266,22 +267,6 @@ newJournalContent = do
d <- getCurrentDay d <- getCurrentDay
return $ printf "; journal created %s by hledger\n" (show d) return $ printf "; journal created %s by hledger\n" (show d)
-- | @findReader mformat mpath@
--
-- Find the reader named by @mformat@, if provided.
-- Or, if a file path is provided, find the first reader that handles
-- its file extension, if any.
findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
findReader Nothing Nothing = Nothing
findReader (Just fmt) _ = headMay [r | r <- readers, rFormat r == fmt]
findReader Nothing (Just path) =
case prefix of
Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
Nothing -> headMay [r | r <- readers, ext `elem` rExtensions r]
where
(prefix,path') = splitReaderPrefix path
ext = drop 1 $ takeExtension path'
-- A "LatestDates" is zero or more copies of the same date, -- A "LatestDates" is zero or more copies of the same date,
-- representing the latest transaction date read from a file, -- representing the latest transaction date read from a file,
-- and how many transactions there were on that date. -- and how many transactions there were on that date.

View File

@ -22,15 +22,12 @@ expecting date separator or digit
2018/1/1 2018/1/1
a 1 a 1
# 2. When read from stdin, this example actually passes because hledger tries all readers. # 2. When read from stdin with no reader prefix, the journal reader is used,
# If they all failed, it would show the error from the first (journal reader). # and fails here. (Before 1.17, all readers were tried and the timedot reader
# But in this case the timedot reader can parse it. # would succeed.)
$ hledger -f - print $ hledger -f - print
> >2 /could not balance/
2018-01-01 * >=1
(a) 1.00
>=
# 3. So in these tests we must sometimes force the desired format, like so. # 3. So in these tests we must sometimes force the desired format, like so.
# Now we see the error from the journal reader. # Now we see the error from the journal reader.

View File

@ -1,4 +1,5 @@
# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting # 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting.
# Note since 1.17 we need to specify stdin's format explicitly.
< <
i 2009/1/1 08:00:00 i 2009/1/1 08:00:00
o 2009/1/1 09:00:00 stuff on checkout record is ignored o 2009/1/1 09:00:00 stuff on checkout record is ignored
@ -7,7 +8,7 @@ i 2009/1/2 08:00:00 account name
o 2009/1/2 09:00:00 o 2009/1/2 09:00:00
i 2009/1/3 08:00:00 some:account name and a description i 2009/1/3 08:00:00 some:account name and a description
o 2009/1/3 09:00:00 o 2009/1/3 09:00:00
$ hledger -f - print $ hledger -f timeclock:- print
> >
2009-01-01 * 08:00-09:00 2009-01-01 * 08:00-09:00
() 1.00h () 1.00h
@ -24,14 +25,14 @@ $ hledger -f - print
# For a missing clock-out, now is implied # For a missing clock-out, now is implied
< <
i 2020/1/1 08:00 i 2020/1/1 08:00
$ hledger -f - balance $ hledger -f timeclock:- balance
> /./ > /./
>= 0 >= 0
# For a log not starting with clock-out, print error # For a log not starting with clock-out, print error
< <
o 2020/1/1 08:00 o 2020/1/1 08:00
$ hledger -f - balance $ hledger -f timeclock:- balance
>2 /line 1: expected timeclock code i/ >2 /line 1: expected timeclock code i/
>= !0 >= !0
@ -39,7 +40,7 @@ $ hledger -f - balance
< <
o 2020/1/1 08:00 o 2020/1/1 08:00
o 2020/1/1 09:00 o 2020/1/1 09:00
$ hledger -f - balance $ hledger -f timeclock:- balance
>2 /line 1: expected timeclock code i/ >2 /line 1: expected timeclock code i/
>= !0 >= !0
@ -47,13 +48,13 @@ $ hledger -f - balance
< <
i 2020/1/1 08:00 i 2020/1/1 08:00
i 2020/1/1 09:00 i 2020/1/1 09:00
$ hledger -f - balance $ hledger -f timeclock:- balance
>2 /line 2: expected timeclock code o/ >2 /line 2: expected timeclock code o/
>= !0 >= !0
## TODO ## TODO
## multi-day sessions get a new transaction for each day ## multi-day sessions get a new transaction for each day
#hledger -f- print #hledger -ftimeclock:- print
#<<< #<<<
#i 2017/04/20 09:00:00 A #i 2017/04/20 09:00:00 A
#o 2017/04/20 17:00:00 #o 2017/04/20 17:00:00
@ -67,7 +68,7 @@ $ hledger -f - balance
# #
## unclosed sessions are automatically closed at report time ## unclosed sessions are automatically closed at report time
## TODO this output looks wrong ## TODO this output looks wrong
#hledger -f- print #hledger -ftimeclock:- print
#<<< #<<<
#i 2017/04/20 09:00:00 A #i 2017/04/20 09:00:00 A
#o 2017/04/20 17:00:00 #o 2017/04/20 17:00:00

View File

@ -1,3 +1,5 @@
# Note since 1.17 we need to specify stdin's format explicitly.
# 1. basic timedot entry # 1. basic timedot entry
< <
# comment # comment
@ -7,7 +9,7 @@
a:aa 1 a:aa 1
b:bb 2 b:bb 2
$ hledger -f- print $ hledger -ftimedot:- print
2020-01-01 * 2020-01-01 *
(a:aa) 1.00 (a:aa) 1.00
@ -21,7 +23,7 @@ $ hledger -f- print
* 2020-01-01 * 2020-01-01
** a:aa 1 ** a:aa 1
$ hledger -f- print $ hledger -ftimedot:- print
2020-01-01 * 2020-01-01 *
(a:aa) 1.00 (a:aa) 1.00