lib: drop the file format auto-detection feature

For a long time hledger has auto-detected the file format when it's
not known, eg when reading from a file with unusual extension (like
.dat or .txt), or from standard input (-f-), or when using the include
directive (which currently ignores file extensions).

Auto-detecting has been done by trying all readers until one succeeds.
This could guess wrong in some cases, but it was so rare that it has
been working fine.

Recently, more conveniences have been added to timedot format,
increasing its overlap with journal format, which makes this kind of
auto-detection unreliable.

Auto-detection and auto-detection failures are (probably) still pretty
rare in practice. But when it does happen it's confusing, giving
misleading errors or false successes (eg printing timedot entries
instead of a journal error).

For predictability and to minimise confusion, hledger no longer tries
to guess; when there's no file extension or reader prefix, it assumes
journal format. To specify one of the other formats, you must use a
standard file extension (.timeclock, .timedot, .csv, .ssv, .tsv), or a
reader prefix (-f csv:foo.txt, -f timedot:-).

For now, the include directive still tries to autodetect
(journal/timeclock/timedot), and this can't be overridden; it will be
fixed later.

Experimental; testing and feedback welcome.
This commit is contained in:
Simon Michael 2020-02-29 09:17:39 -08:00
parent 32eb839eac
commit b1f3880c3d
4 changed files with 49 additions and 64 deletions

View File

@ -98,7 +98,8 @@ readers = [
readerNames :: [String]
readerNames = map rFormat readers
-- | Read a Journal from the given text trying all readers in turn, or throw an error.
-- | Read a Journal from the given text, assuming journal format; or
-- throw an error.
readJournal' :: Text -> IO Journal
readJournal' t = readJournal def Nothing t >>= either error' return
@ -106,41 +107,41 @@ readJournal' t = readJournal def Nothing t >>= either error' return
--
-- Read a Journal from some text, or return an error message.
--
-- The reader (data format) is chosen based on a recognised file name extension in @mfile@ (if provided).
-- If it does not identify a known reader, all built-in readers are tried in turn
-- (returning the first one's error message if none of them succeed).
-- The reader (data format) is chosen based on, in this order:
--
-- Input ioptions (@iopts@) specify CSV conversion rules file to help convert CSV data,
-- enable or disable balance assertion checking and automated posting generation.
-- - a reader name provided in @iopts@
--
-- - a reader prefix in the @mfile@ path
--
-- - a file extension in @mfile@
--
-- If none of these is available, or if the reader name is unrecognised,
-- we use the journal reader. (We used to try all readers in this case;
-- since hledger 1.17, we prefer predictability.)
readJournal :: InputOpts -> Maybe FilePath -> Text -> IO (Either String Journal)
readJournal iopts mfile txt =
tryReaders iopts mfile specifiedorallreaders txt
readJournal iopts mpath txt = do
dbg1IO "trying reader" (rFormat r)
ej <- (runExceptT . (rParser r) iopts (fromMaybe "(string)" mpath)) txt
dbg1IO "reader result" (' ':show ej)
return ej
where
specifiedorallreaders = maybe stablereaders (:[]) $ findReader (mformat_ iopts) mfile
stablereaders = filter (not.rExperimental) readers
r = fromMaybe JournalReader.reader $ findReader (mformat_ iopts) mpath
-- | Try to parse the given text to a Journal using each reader in turn,
-- returning the first success, or if all of them fail, the first error message.
-- | @findReader mformat mpath@
--
-- Input options specify CSV conversion rules file to help convert CSV data,
-- enable or disable balance assertion checking and automated posting generation.
--
tryReaders :: InputOpts -> Maybe FilePath -> [Reader] -> Text -> IO (Either String Journal)
tryReaders iopts mpath readers txt = firstSuccessOrFirstError [] readers
-- Find the reader named by @mformat@, if provided.
-- Or, if a file path is provided, find the first reader that handles
-- its file extension, if any.
findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
findReader Nothing Nothing = Nothing
findReader (Just fmt) _ = headMay [r | r <- readers, rFormat r == fmt]
findReader Nothing (Just path) =
case prefix of
Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
Nothing -> headMay [r | r <- readers, ext `elem` rExtensions r]
where
-- TODO: #1087 when parsing csv with -f -, if the csv (rules) parser fails,
-- we would rather see that error, not the one from the journal parser
firstSuccessOrFirstError :: [String] -> [Reader] -> IO (Either String Journal)
firstSuccessOrFirstError [] [] = return $ Left "no readers found"
firstSuccessOrFirstError errs (r:rs) = do
dbg1IO "trying reader" (rFormat r)
result <- (runExceptT . (rParser r) iopts path) txt
dbg1IO "reader result" $ either id show result
case result of Right j -> return $ Right j -- success!
Left e -> firstSuccessOrFirstError (errs++[e]) rs -- keep trying
firstSuccessOrFirstError (e:_) [] = return $ Left e -- none left, return first error
path = fromMaybe "(string)" mpath
(prefix,path') = splitReaderPrefix path
ext = drop 1 $ takeExtension path'
-- | Read the default journal file specified by the environment, or raise an error.
defaultJournal :: IO Journal
@ -193,7 +194,7 @@ readJournalFiles iopts =
-- the @mformat_@ specified in the input options, if any;
-- the file path's READER: prefix, if any;
-- a recognised file name extension.
-- if none of these identify a known reader, all built-in readers are tried in turn.
-- if none of these identify a known reader, the journal reader is used.
--
-- The input options can also configure balance assertion checking, automated posting
-- generation, a rules file for converting CSV data, etc.
@ -266,22 +267,6 @@ newJournalContent = do
d <- getCurrentDay
return $ printf "; journal created %s by hledger\n" (show d)
-- | @findReader mformat mpath@
--
-- Find the reader named by @mformat@, if provided.
-- Or, if a file path is provided, find the first reader that handles
-- its file extension, if any.
findReader :: Maybe StorageFormat -> Maybe FilePath -> Maybe Reader
findReader Nothing Nothing = Nothing
findReader (Just fmt) _ = headMay [r | r <- readers, rFormat r == fmt]
findReader Nothing (Just path) =
case prefix of
Just fmt -> headMay [r | r <- readers, rFormat r == fmt]
Nothing -> headMay [r | r <- readers, ext `elem` rExtensions r]
where
(prefix,path') = splitReaderPrefix path
ext = drop 1 $ takeExtension path'
-- A "LatestDates" is zero or more copies of the same date,
-- representing the latest transaction date read from a file,
-- and how many transactions there were on that date.

View File

@ -22,15 +22,12 @@ expecting date separator or digit
2018/1/1
a 1
# 2. When read from stdin, this example actually passes because hledger tries all readers.
# If they all failed, it would show the error from the first (journal reader).
# But in this case the timedot reader can parse it.
# 2. When read from stdin with no reader prefix, the journal reader is used,
# and fails here. (Before 1.17, all readers were tried and the timedot reader
# would succeed.)
$ hledger -f - print
>
2018-01-01 *
(a) 1.00
>=
>2 /could not balance/
>=1
# 3. So in these tests we must sometimes force the desired format, like so.
# Now we see the error from the journal reader.

View File

@ -1,4 +1,5 @@
# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting
# 1. a timeclock session is parsed as a similarly-named transaction with one virtual posting.
# Note since 1.17 we need to specify stdin's format explicitly.
<
i 2009/1/1 08:00:00
o 2009/1/1 09:00:00 stuff on checkout record is ignored
@ -7,7 +8,7 @@ i 2009/1/2 08:00:00 account name
o 2009/1/2 09:00:00
i 2009/1/3 08:00:00 some:account name and a description
o 2009/1/3 09:00:00
$ hledger -f - print
$ hledger -f timeclock:- print
>
2009-01-01 * 08:00-09:00
() 1.00h
@ -24,14 +25,14 @@ $ hledger -f - print
# For a missing clock-out, now is implied
<
i 2020/1/1 08:00
$ hledger -f - balance
$ hledger -f timeclock:- balance
> /./
>= 0
# For a log not starting with clock-out, print error
<
o 2020/1/1 08:00
$ hledger -f - balance
$ hledger -f timeclock:- balance
>2 /line 1: expected timeclock code i/
>= !0
@ -39,7 +40,7 @@ $ hledger -f - balance
<
o 2020/1/1 08:00
o 2020/1/1 09:00
$ hledger -f - balance
$ hledger -f timeclock:- balance
>2 /line 1: expected timeclock code i/
>= !0
@ -47,13 +48,13 @@ $ hledger -f - balance
<
i 2020/1/1 08:00
i 2020/1/1 09:00
$ hledger -f - balance
$ hledger -f timeclock:- balance
>2 /line 2: expected timeclock code o/
>= !0
## TODO
## multi-day sessions get a new transaction for each day
#hledger -f- print
#hledger -ftimeclock:- print
#<<<
#i 2017/04/20 09:00:00 A
#o 2017/04/20 17:00:00
@ -67,7 +68,7 @@ $ hledger -f - balance
#
## unclosed sessions are automatically closed at report time
## TODO this output looks wrong
#hledger -f- print
#hledger -ftimeclock:- print
#<<<
#i 2017/04/20 09:00:00 A
#o 2017/04/20 17:00:00

View File

@ -1,3 +1,5 @@
# Note since 1.17 we need to specify stdin's format explicitly.
# 1. basic timedot entry
<
# comment
@ -7,7 +9,7 @@
a:aa 1
b:bb 2
$ hledger -f- print
$ hledger -ftimedot:- print
2020-01-01 *
(a:aa) 1.00
@ -21,7 +23,7 @@ $ hledger -f- print
* 2020-01-01
** a:aa 1
$ hledger -f- print
$ hledger -ftimedot:- print
2020-01-01 *
(a:aa) 1.00