* csv rules: Show prettier parsing errors
This goes from
hledger: user error ("ParseError {errorPos = SourcePos {sourceName = \"foo.csv.rules\",
sourceLine = Pos 20, sourceColumn = Pos 1} :| [], errorUnexpected =
fromList [Tokens (' ' :| \"\")], errorExpected = fromList [Label ('b' :| \"lank or comment
line\"),EndOfInput], errorCustom = fromList []}")
to
hledger: user error (foo.csv.rules:20:1:
unexpected space
expecting blank or comment line or end of input
)
* csv rules: Fix parsing of empty field values
A single line containing `account1 ` (note the space at the end) should
parse as assignment of the empty string to account1. At least it did
until commit 4141067.
The problem is that megaparsec's `space` parses multiple space
characters as opposed to parsec. So in the example above it would
incorrectly consume the newline.
This commit also adds a new test case for this bug.
Timeclock transaction ids now count up rather than down.
Also, remove old code for appending timeclock transactions to journal transactions,
a holdover from the days when both were allowed in one file.
Transactions are now numbered consistently during journal finalisation,
rather than just in the journal reader. Also transaction knot-tying has been
moved out of journalBalanceTransactions.
* Replace Parsec with Megaparsec (see #289)
This builds upon PR #289 by @rasendubi
* Revert renaming of parseWithState to parseWithCtx
* Fix doctests
* Update for Megaparsec 5
* Specialize parser to improve performance
* Pretty print errors
* Swap StateT and ParsecT
This is necessary to get the correct backtracking behavior, i.e. discard
state changes if the parsing fails.
The journal/timeclock/timedot parsers, instead of constructing (opaque)
journal update functions which are later applied to build the journal,
now construct the journal directly (by modifying the parser state). This
is easier to understand and debug. It also removes any possibility of
the journal updates being a space leak. (They weren't, in fact memory
usage is now slightly higher, but that will be addressed in other ways.)
Also:
Journal data and journal parse info have been merged into one type (for
now), and field names are more consistent.
The ParsedJournal type alias has been added to distinguish being-parsed
and finalised journals.
Journal is now a monoid.
stats: fixed an issue with ordering of include files
journal: fixed an issue with ordering of included same-date transactions
timeclock: sessions can no longer span file boundaries (unclocked-out
sessions will be auto-closed at the end of the file).
expandPath now throws a proper IO error (and requires the IO monad).
journal files can now include journal, timeclock or timedot files (but
not yet CSV files). Also timeclock/timedot files no longer support
default year directives.
The Hledger.Read.* modules have been reorganised for better reuse.
Hledger.Read.Utils has been renamed Hledger.Read.Common and holds
low-level parsers & utilities; high-level read utilities have moved to
Hledger.Read.
Bracketed posting dates were fragile; they worked only if you wrote full
10-character dates. Also some semantics were a bit unclear. Now they
should be robust, and have been documented more clearly. This is a
legacy undocumented Ledger syntax, but it improves compatibility and
might be preferable to the more verbose "date:" tags if you write
posting dates often (as I do).
Internally, bracketed posting dates are no longer considered to be tags.
Journal comment, tag, and posting date parsers have been reworked, all
with doctests. Also the journal parser types generally have been
tightened up and clarified, making it much easier to know how to combine
and run them. There's now
-- | A parser of strings with generic user state, monad and return type.
type StringParser u m a = ParsecT String u m a
-- | A string parser with journal-parsing state.
type JournalParser m a = StringParser JournalContext m a
-- | A journal parser that runs in IO and can throw an error mid-parse.
type ErroringJournalParser a = JournalParser (ExceptT String IO) a
and corresponding convenience functions (and short aliases) for running them.
We now parse account directives, like Ledger's. We don't do anything
with them yet. The default parent account feature must now be spelled
"apply account"/"end apply account".
The Journal, Timelog and Timedot readers' detectors now check
each line in the sample data, not just the first one. I think
the sample data is only about 30 chars right now, but even so
this fixed a format detection issue I was seeing.
Timedot is a plain text format for logging dated, categorised
quantities (eg time), supported by hledger. It is convenient for
approximate and retroactive time logging, eg when the real-time
clock-in/out required with a timeclock file is too precise or too
interruptive. It can be formatted like a bar chart, making clear at a
glance where time was spent.
I really don't see why that extra x parameter is needed or works..
rewrite it in simpler form.
I also might be introducing breakage for older GHC's by using
unconditionally <$>, but I'm not seeing that for some reason
(tested back to ghc 7.6).
This adds a accountNameApplyAliasesMemo, which memoises the result of
applying a set of aliases (simple and regex) to an account name. In
theory this should reduce more repetitive work, but in practice it
doesn't seem to make a difference, so it's unused for now.
Simpler and clearer. We now have "transaction prices" (recorded as part
of transaction amounts) and "market prices" (recorded with P
directives). Both are matters of historical record, also this avoids
confusion with the balance command's "historical balances".
We now parse, and also print, posting-less journal entries, as I
proposed on the lists.
These are not real General Journal entries/transactions, but here is my
rationale:
- Ledger and beancount parse them
- if we parse them, we should print them
- they provide a natural way to record and report non-transaction events
- most of all, they permit more gradual introduction and learning of the concepts.
Eg a beginner can keep a simple journal even before learning about accounts and postings.
A transaction/posting status of ! (pending) was effectively equivalent
to * (cleared). Now it's a separate state, not matched by --cleared.
The new Ledger-compatible --pending flag matches it, and so does
--uncleared. The equivalent search queries are now status:*, status:!
and status: (the old status:1 and status:0 spellings are deprecated).
Since we interpret --uncleared and status: as "any state except cleared",
it's not currently possible to match things which are neither cleared
nor pending.
The regex account aliases added in 0.24 trip up people switching between
hledger and Ledger. (Also they are currently slow).
This change makes the old non-regex aliases the default; they are
unsurprising, useful, and pretty close in functionality to Ledger's.
The new regex aliases are also available; they must be enclosed in
forward slashes. Ledger effectively ignores these, which is ok.
Also clarify docs, refactor, and use the same parser for alias
directives and alias options
NOTE: this is important to correctly build JournalContext
NOTE: currently a list reverse must done at the end,
maybe using a Data.Queue would be more efficient.
If the CSV records appear to have been in reverse date order,
we'll now reverse them all before also sorting by transaction date,
so that the original order of same-day transactions is preserved.
We detect this using a simple heuristic: if the first converted
transaction's date is later than the last's.
alias match patterns (the part left of the =) are now case-insensitive
regular expressions matching anywhere in the account name. The
replacement string (the part right of the =) can replace multiple
matches within the account name. The replacement string does not yet
support any of the usual syntax like backreferences.
Now that balance assertions are checking only a single commodity, it can
be confusing. Eg say all your amounts are in dollars, an assertion like
"= 0" checked the dollar balance in hledger 0.23 but always succeeds in
hledger 0.24. When an assertion fails, we now report which commodity was
checked to help troubleshooting.
Amount display styles have been reworked a bit; they are now calculated
after journal parsing, not during it. This allows the fix for #196:
we now search through the amounts until a decimal point is detected,
instead of just looking at the first one; likewise for digit groups.
Digit groups are now implemented with a better type.
Digit group size detection has been improved a little:
1000,000 now gives group sizes [3,4,4,...], not [3,3,...], and
10,000 gives groups sizes [3,3,...] not [3,2,2,..].
(To get [3,2,2,...] you'd use eg 00,00,000.)
There are still some old (or new ?) issues; I don't think we handle
inconsistent decimal points & digit groups too well. But for now all
tests pass.
Can be helpful when reading Ledger files, where assertions may have
different semantics; or for getting some answers from your journal
to help you fix your assertions.
Could be called --no-assertions, but this might create surprise when it
has an effect contrary to --no-new-accounts.
I had to add another flag throughout the parsers & journal read
functions, ok for now.
- The CSV reader no longer writes a "(stdin).rules" file when reading
from stdin.
- Selection of reader(s) is now smarter when input is coming from stdin.
Previously, all readers were considered applicable for stdin. This
meant that when reading a CSV file from stdin, the journal and timelog
readers were always tried first, and if the CSV file was unparseable,
you'd see the first (journal) reader's error instead of the CSV
reader's. Now, the readers do some basic content sniffing when
reading stdin, so it generally tries only the one right reader and
we'll see the right errors.
- The read system now has more debug output.