This is rather a sketch, we need to work on documentation, tests, and
perhaps on performance, but it should show the direction Megaparsec
5.0.0 is taking.
Close#95.
Here we introduce ‘scientific’ parser that can parse arbitrary big
numbers without error or memory overflow. ‘float’ still returns
‘Double’, but it's defined in terms of ‘scientific’ now. Since
‘Scientific’ type can reliably represent integer values as well as
floating point values, ‘number’ now returns ‘Scientific’ instead of
‘Either Integer Double’ (‘Integer’ or ‘Double’ can be extracted from
‘Scientific’ value anyway). This in turn makes ‘signed’ parser more
natural and general, because we do not need ad-hoc ‘Signed’ type class
anymore.
This should improve experience of users who use Megaparsec with Alex and
Happy. The commit also introduces some minor changes in
‘Text.Megaparsec.Pos’ module (improving argument order).
Removed ‘parseFromFile’ and ‘StorableStream’ type-class that was
necessary for it. The reason for removal is that reading from file and
then parsing its contents is trivial for every instance of ‘Stream’ and
this function provides no way to use newer methods for running a parser,
such as ‘runParser'’. So, simply put, it adds little value and was
included in 4.x versions for compatibility purposes.
Collection of constraints changed from ‘Alternative m, Monad m, Stream s
t’ to ‘MonadPlus m, Stream s t’. This is done to make it easier to write
more abstract code with older GHC where such primitives as ‘guard’ are
defined for instances of ‘MonadPlus’, not ‘Alternative’.
This was Parsec's legacy that we should eliminate now. ‘Message’ does
not constitute enumeration, ‘toEnum’ was never properly defined for
it. The idea to use ‘fromEnum’ to determine type of ‘Message’ is also
ugly, for this purpose new functions ‘isUnexpected’, ‘isExpected’, and
‘isMessage’ are defined in ‘Text.Megaparsec.Error’.
This also enables the respective warnings flags in dev mode
to help megaparsec remain forward compatible.
The dependencies on `semigroup` and `fail` are conditional on
`impl(ghc >= 8)` and avoid CPP and conditionally defined instances
(which would result in an conditional API).
Close#81.
This solution is mostly OK as it passes tests and almost all benchmarks
show that there is no performance degradation.
The only function that bothers me is ‘pPlus’ (or ‘mplus’, or
‘<|>’). Benchmarks ‘choice/match’, ‘choice/nomatch’, and ‘manyTill’ show
about 44 % worse performance with current implementation of the feature
— this is not acceptable. All these functions are defined via ‘mplus’,
so it's necessary to find a way to improve that function.
Also, ‘mplus’ is tricky in that it combines different branches of
parsing. Previously, all logic describing how to combine failing
branches into one ‘ParseError’ were in ‘mergeError’ function. Now we
have to have ‘longestMatch’ function to choose right state as well,
because it's natural to expect that state on failure would correspond to
‘ParseError’. This should be done elegantly.