stripAnsi is called many times during rendering (by strWidth), so it
should be fast. It was originally a regex replacement, and more
recently a custom parser. The parser was slower, particularly the one
in 1.19.1. See #1350, and this rough test:
time118ish = timeIt $ print $ length $ concat $ map (fromRight undefined . regexReplace (toRegex' "\ESC\\[([0-9]+;)*([0-9]+)?[ABCDHJKfmsu]") "") testdata
time119 = timeparser (many (takeWhile1P Nothing (/='\ESC') <|> "" <$ ansi))
time1191 = timeparser (many ("" <$ try ansi <|> pure <$> anySingle))
timeparser p = timeIt $ print $ length $ concat $ map (concat . fromJust . parseMaybe p) testdata
testdata = concat $ replicate 10000
  [ "2008-01-01 income assets:bank:checking $1 $1"
  , "2008-06-01 gift assets:bank:checking $1 $2"
  , "2008-06-02 save assets:bank:saving $1 $3"
  , " assets:bank:checking ..m$-1\ESC[m\ESC[m $2"
  , "2008-06-03 eat & shop assets:cash ..m$-2\ESC[m\ESC[m 0"
  , "2008-12-31 pay off assets:bank:checking ..m$-1\ESC[m\ESC[m ..m$-1\ESC[m\ESC[m"
  ]
ghci> time118ish
4560000
CPU time: 0.17s
ghci> time119
4560000
CPU time: 0.91s
ghci> time1191
4560000
CPU time: 2.76s
Possibly a more careful parser could beat regexReplace. Note that the
latter does memoisation, which could make it faster but could also use
more resident memory in some situations.
Ideally we would calculate all widths before adding ANSI colour codes,
so we wouldn't have to wastefully strip them.
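As a rough illustration of what a "more careful" hand-written stripper might look like, here is a minimal sketch using plain list recursion. It is not hledger's stripAnsi, and stripAnsiSketch is a hypothetical name. It is also slightly looser than the regex above: it accepts any run of digits and semicolons as parameters. It has not been benchmarked against the timings shown.

```haskell
import Data.Char (isDigit)

-- | Hypothetical hand-written stripper (not hledger's stripAnsi).
-- Drops sequences of the form ESC [ params final, where params is any run
-- of digits and ';' (looser than the regex) and final is one of
-- ABCDHJKfmsu. Everything else is copied through unchanged.
stripAnsiSketch :: String -> String
stripAnsiSketch ('\ESC':'[':rest)
  | (c:rest') <- dropWhile isParam rest
  , c `elem` "ABCDHJKfmsu" = stripAnsiSketch rest'
  where
    isParam x = isDigit x || x == ';'
-- Not a recognised sequence (or no guard matched): keep the character.
stripAnsiSketch (c:cs) = c : stripAnsiSketch cs
stripAnsiSketch []     = []
```

For example, stripAnsiSketch "\ESC[31m-1\ESC[m" gives "-1", while an unrecognised sequence such as "\ESC[x" is left untouched.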
This is PR #1330, addressing #1312 (parseQuery is partial) and #1245
(internal server error).
User-visible changes:
- hledger-web now handles malformed regular expressions
(eg, a query consisting of the single character `?`) gracefully,
showing a tidy error message instead of "internal server error".
API/internal changes:
- The Regex type alias has been replaced by the Regexp ADT, which
contains both the compiled regular expression (so is guaranteed to
be usable at runtime) and the original string (so can be serialised,
printed, compared, etc.) A Regexp also knows whether it is case
sensitive or case insensitive. The Hledger.Utils.Regex api has changed.
- Typeable and Data instances are no longer derived for hledger's
data types; they were redundant/no longer needed
- NFData instances are no longer derived for hledger's data types.
This speeds up a full build by roughly 7%. But it means we can't
deep-evaluate hledger values, or time hledger code with Criterion.
https://github.com/simonmichael/hledger/pull/1330#issuecomment-684075129
has some ideas on this.
- Query no longer has a custom Show instance
- Some internal use of regexps was replaced by text replacement or
parsers.
- Hledger.Utils.String: quoteIfNeeded now actually escapes quotes in
strings; dropped escapeQuotes
- Hledger.Utils.Tree: dropped some old utilities
- dropped some obsolete code for the old --display option
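To make the Regexp ADT idea concrete, here is a minimal sketch of a value that carries its original string, a ready-to-use matcher, and its case sensitivity. The field name reMatch, the helpers, and the toy "compiler" are assumptions for illustration only: the matcher does literal substring matching and merely rejects patterns that start with a repetition operator (such as `?`), standing in for a real regex engine's compile step.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- | Sketch of a regexp value: the original text plus a "compiled" matcher.
-- The matcher is a stand-in (String -> Bool); a real implementation would
-- hold a compiled regular expression instead.
data Regexp
  = Regexp   { reString :: String, reMatch :: String -> Bool }  -- case sensitive
  | RegexpCI { reString :: String, reMatch :: String -> Bool }  -- case insensitive

isCI :: Regexp -> Bool
isCI RegexpCI{} = True
isCI Regexp{}   = False

-- Comparing and printing use only the original string and the case flag.
instance Eq Regexp where
  a == b = isCI a == isCI b && reString a == reString b

instance Show Regexp where
  show r = (if isCI r then "RegexpCI " else "Regexp ") ++ show (reString r)

-- | Toy compile step (assumption): literal substring matching, rejecting
-- patterns that begin with a repetition operator.
compileToy :: (String -> String) -> String -> Either String (String -> Bool)
compileToy norm s = case s of
  (c:_) | c `elem` "?*+" -> Left ("malformed regular expression: " ++ show s)
  _                      -> Right (\t -> norm s `isInfixOf` norm t)

-- | Total constructors: failure is reported as a Left, never an exception.
toRegexSketch, toRegexCISketch :: String -> Either String Regexp
toRegexSketch   s = Regexp   s <$> compileToy id s
toRegexCISketch s = RegexpCI s <$> compileToy (map toLower) s
```

Because construction returns an Either, a caller such as a web handler can turn the Left case into a tidy error message rather than crashing at match time.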
Merge branch 'regexp' into master
matchesAccount_
matchesAmount_
matchesCommodity_
matchesPosting_
matchesPriceDirective_
matchesTags_
matchesTransaction_
These don't yet have tests of their own, but were converted
mechanically from the originals, which should help.
;areg: debug output
;areg: show a title indicating which account was picked
This might be a bit of a pain for scripting, but otherwise it can be
quite confusing if your argument matches an account you didn't expect.
;areg: improve CSV headings
;areg: show at most two commodities per amount
accountTransactionsReport now filters transactions more thoroughly, so
eg transactions dated outside the report period will not be shown.
Previously the transaction would be shown if it had any posting dated
inside the report period. Possibly some other filter criteria now get
applied that didn't before. I think on balance this will give slightly
preferable results.
bal --budget was always showing the period as column heading,
as if for a change report. With --cumulative or --historical
it should show the end date, like other balance reports. Cf
https://hledger.org/hledger.html#multicolumn-balance-report.
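The heading rule described above can be sketched as follows. This is only an illustration: the type and function names are assumptions, dates are plain strings, and the heading formats are simplified compared with what hledger actually prints.

```haskell
-- | Which kind of balance a report column shows (illustrative names).
data BalanceType = PeriodChange | CumulativeChange | HistoricalBalance
  deriving (Eq, Show)

-- | A column's span as (start, end) date strings; the end date is treated
-- as inclusive here, which is a simplification.
type Span = (String, String)

-- | Change reports are headed by the period; cumulative and historical
-- reports are headed by the end date only.
columnHeading :: BalanceType -> Span -> String
columnHeading PeriodChange (start, end) = start ++ ".." ++ end
columnHeading _            (_, end)     = end
```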
This is an API change, but it seems better than having additional
colour-supporting variants and trying to avoid duplicated code.
I stopped short of changing showAmount, so cshowAmount still exists.
Multicolumn balance reports showing many commodities tend to become
unreadably wide, especially in tree mode. Now by default we show at
most two commodities, and a count of the rest if there are more than
two. This should help keep reports somewhat readable by default.
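A minimal sketch of this kind of elision, operating on already-rendered amounts. The function name and the exact "N more.." wording are assumptions, not necessarily what hledger prints.

```haskell
import Data.List (intercalate)

-- | Show at most n rendered amounts, followed by a count of the rest
-- when some were left out.
showAmountsElided :: Int -> [String] -> String
showAmountsElided n amts
  | null rest = intercalate ", " shown
  | otherwise = intercalate ", " (shown ++ [show (length rest) ++ " more.."])
  where
    (shown, rest) = splitAt n amts
```

With a limit of two, ["$1", "2 EUR", "3 GBP", "4 JPY"] renders as "$1, 2 EUR, 2 more..", and a list of two or fewer amounts is shown in full.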
SMorgan:
This PR aims to accomplish two major goals:
- Get boring parent elision working for multiBalanceReport
- Remove the special BalanceReport code, and just use multiBalanceReport
I believe it does both, with the following additional benefits:
- A refactor of multiBalanceReportWith, to make the structure easier to follow, and with a clearer division of responsibilities
- All decisions for how an account name is to be displayed are now made in multiBalanceReport, rather than scattered around the code base
- Some miscellaneous improvements in account name rendering, including --drop now working with MultiBalanceReports, and addressing some of #373
Algorithmic changes:
- Using HashMap AccountName (Map DateSpan Account) instead of [[MixedAmount]] is new. I admit I didn't profile this change (though given the nubs and lookups, I thought it was appropriate), so I'm glad it produces a speedup.
- Producing the starting balances no longer calls the whole balanceReport, just the first few functions to get what it needs.
- displayedAccounts is completely rewritten. Perhaps one subtle thing to note is that in tree mode it no longer excludes nodes with zero inclusive balance unless they also have zero exclusive balance.
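The map-of-maps representation mentioned above can be sketched roughly as below. This is a simplified illustration, not the PR's code: it uses Data.Map from containers for both levels (the PR uses a HashMap for the outer level), and Period and Amount are stand-ins for DateSpan and the real balance types.

```haskell
import qualified Data.Map.Strict as M

type AccountName = String
type Period      = String   -- stand-in for DateSpan
type Amount      = Integer  -- stand-in for the real balance type

-- | One row per account, one cell per period.
type Table = M.Map AccountName (M.Map Period Amount)

-- | Add an amount into the cell for (account, period), creating the row
-- or cell if needed.
addCell :: AccountName -> Period -> Amount -> Table -> Table
addCell a p x = M.insertWith (M.unionWith (+)) a (M.singleton p x)

-- | Build the table from (account, period, amount) triples.
buildTable :: [(AccountName, Period, Amount)] -> Table
buildTable = foldr (\(a, p, x) -> addCell a p x) M.empty

-- | Look up a cell, treating missing rows and cells as zero.
cell :: AccountName -> Period -> Table -> Amount
cell a p = M.findWithDefault 0 p . M.findWithDefault M.empty a
```

Each cell lookup is a pair of logarithmic map lookups, rather than a scan through a list of lists.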
SMichael:
I'll mark the passing of the old multiBalanceReport, into which I poured many an hour :). It is in a way the heart (brain?) of hledger - the key feature of ledgerlikes (balance report) and a key improvement introduced by hledger (tabular multiperiod balance reports). You have split that 300-line though well documented function into modular parts, which could be a little harder to understand in detail but are easier to understand in the large and more amenable to further refactoring. Then you fixed some old limitations (boring parent eliding in multi period balance reports, --drop with tree mode reports), allowing us to drop the old balanceReport and focus on just the new multiBalanceReport. And for representing the tabular data you replaced the semantically correct but inefficient list of lists with a map of maps, speeding up many-columned balance reports significantly (~40%). Last but not least you made it really easy to review. Thanks @Xitian9, great work.
This works with glob patterns too, applying the prefix to each path.
This can be useful when included files don't have the standard file
extension, eg:
include timedot:2020*.md
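A rough sketch of how a prefix might be split off and re-applied to each expanded path. The function names and the list of recognised prefixes are assumptions, and glob expansion itself is left out: the already-expanded paths are passed in.

```haskell
-- | Assumed list of recognised format prefixes (illustrative only).
knownFormats :: [String]
knownFormats = ["journal", "timeclock", "timedot", "csv"]

-- | Split an include argument into an optional format prefix and the
-- remaining path or glob pattern. Unknown prefixes (for example a
-- Windows drive letter) are left as part of the path.
splitPrefix :: String -> (Maybe String, String)
splitPrefix s = case break (== ':') s of
  (pre, ':' : path) | pre `elem` knownFormats -> (Just pre, path)
  _                                           -> (Nothing, s)

-- | Re-attach the prefix to every path the glob pattern expanded to.
applyPrefix :: Maybe String -> [FilePath] -> [FilePath]
applyPrefix Nothing    = id
applyPrefix (Just pre) = map ((pre ++ ":") ++)
```

So "timedot:2020*.md" splits into the prefix "timedot" and the pattern "2020*.md", and each file the pattern matches is then read as "timedot:<path>".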
As mentioned by netvor on IRC, the unbalanced transaction error was
not too clear when postings all have the same sign.
Some other wording has been clarified, and the main error message is
now shown on multiple lines for readability (at the cost of
predictability/grepability).
There's also a probably unnoticeable change: selecting which parts of
the error to show is now based on display precisions (reusing the
balanced check logic), rather than original precisions.
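To illustrate why the choice of precision matters, here is a minimal sketch of a balanced check done at a given number of decimal places, plus a same-sign test of the kind that could select a clearer error message. All names are hypothetical and amounts are plain Rationals in a single commodity. Summing first and then rounding is an assumption, not necessarily hledger's exact logic.

```haskell
import Data.Ratio ((%))

-- | Round to n decimal places. Note that 'round' rounds halves to even.
roundTo :: Int -> Rational -> Rational
roundTo n x = fromInteger (round (x * scale)) / scale
  where
    scale = 10 ^ n

-- | Do the amounts sum to zero when viewed at the given precision?
balancedAt :: Int -> [Rational] -> Bool
balancedAt prec amts = roundTo prec (sum amts) == 0

-- | Are all amounts strictly positive, or all strictly negative?
allSameSign :: [Rational] -> Bool
allSameSign amts = not (null amts) && (all (> 0) amts || all (< 0) amts)

-- | Example postings: 1.004 and -1, which differ by 0.004.
example :: [Rational]
example = [1004 % 1000, -1]
```

Here the example postings count as balanced at two decimal places but not at three, so a check based on display precision and one based on original precision can disagree.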