Close#95.
Here we introduce ‘scientific’ parser that can parse arbitrary big
numbers without error or memory overflow. ‘float’ still returns
‘Double’, but it's defined in terms of ‘scientific’ now. Since
‘Scientific’ type can reliably represent integer values as well as
floating point values, ‘number’ now returns ‘Scientific’ instead of
‘Either Integer Double’ (‘Integer’ or ‘Double’ can be extracted from
‘Scientific’ value anyway). This in turn makes ‘signed’ parser more
natural and general, because we do not need ad-hoc ‘Signed’ type class
anymore.
Removed ‘parseFromFile’ and ‘StorableStream’ type-class that was
necessary for it. The reason for removal is that reading from file and
then parsing its contents is trivial for every instance of ‘Stream’ and
this function provides no way to use newer methods for running a parser,
such as ‘runParser'’. So, simply put, it adds little value and was
included in 4.x versions for compatibility purposes.
Collection of constraints changed from ‘Alternative m, Monad m, Stream s
t’ to ‘MonadPlus m, Stream s t’. This is done to make it easier to write
more abstract code with older GHC where such primitives as ‘guard’ are
defined for instances of ‘MonadPlus’, not ‘Alternative’.
This was Parsec's legacy that we should eliminate now. ‘Message’ does
not constitute enumeration, ‘toEnum’ was never properly defined for
it. The idea to use ‘fromEnum’ to determine type of ‘Message’ is also
ugly, for this purpose new functions ‘isUnexpected’, ‘isExpected’, and
‘isMessage’ are defined in ‘Text.Megaparsec.Error’.
After some thinking I decided that this may be not desirable in some
cases, so we should not enable it by default. I've edited documentation
of ‘makeExprParser’ to explain why this doesn't work by default and how
to make it work.
Close#64.
‘makeExprParser’ now generates parser that can handle several
occurrences of the same prefix or postfix operator in a row. This allows
to parse something like C pointers (for example ‘**i’) without resorting
to hacks.
The feature is experimental, I'm not entirely sure it's not
buggy. Upcoming additional tests for ‘Text.Megaparsec.Expr’ will show
whether it behaves correctly in all cases and doesn't have adverse
effects. For now, I've edited existing test to generate data with
repeating prefix negations and postfix factorials. Current code-base
passes the test.
Close#43.
The method allows to fail with arbitrary collection of
messages. ‘unexpected’ is not defined in terms of ‘failure’. One
consequence of this design decision is that ‘failure’ is now method of
‘MonadParsec’, while ‘unexpected’ is not.
Close#47, close#57.
This commit introduces ‘runParser'’ and ‘runParserT'’ functions that
take and return parser state. This makes it possible to partially parse
input, resume parsing, specify non-standard initial textual position,
etc.
Internal changes involve some refactoring to make ‘Reply’ more
readable and facilitate extraction of complete parser state on failure
as well as success.
The commit adds basic tests for the new functionality as well.
Close#65.
Previously we had 5 nearly identical definitions of the function,
varying only in type-specific ‘readFile’ function. Now the problem is
solved by introduction of ‘StorableStream’ type class. All supported
stream types are instances of the class out of box and thus we have
polymorphic version of ‘parseFromFile’.
Close#62.
Apart from some refactoring, the following important changes were
introduced:
* ‘ParseError’ is now a monoid.
* Added functions ‘addErrorMessages’ and ‘newErrorMessages’.
Now it's impossible to create ‘SourcePos’ with non-positive line number
or column number. Unfortunately we cannot use ‘Numeric.Natural’ because
we need to support older versions of ‘base’.
Closes#56.
In particular, file name and textual position are represented like this:
filename.hs:5:6:
error message
This format should be more conventional, so various tools will be able
to parse it and provide some support (for example, Emacs can work with
this format).
‘Text.Megaparsec’ and ‘Text.Megaparsec.Prim’ do not export these data
types and their constructors anymore. These data types are rather
low-level implementation detail that should not be visible to
end-user. They are also subject to certain changes in future.
Closes#38.
Now tab width can be manipulated with via the following functions:
* ‘getTabWidth’
* ‘setTabWidth’
Other auxiliary changes were performed, such as updating of
‘updatePosChar’.
This also corrects a bit obsolete descriptions of some functions.
These parsers are considered deprecated:
* ‘chainl’
* ‘chainl1’
* ‘chainr’
* ‘chainr1’
* ‘sepEndBy’
* ‘sepEndBy1’
Apart from this, the commit includes various cosmetic changes in
module ‘Text.Megaparsec.Combinator’.
The following functions and data types have been renamed:
* ‘permute’ → ‘makePermParser’
* ‘buildExpressionParser’ → ‘makeExprParser’
* ‘GenLanguageDef’ → ‘LanguageDef’
* ‘GenTokenParser’ → ‘Lexer’
* ‘makeTokenParser’ → ‘makeLexer’
* Removed ‘optionMaybe’ parser, because ‘optional’ from
‘Control.Applicative’ does the same thing.
* Renamed ‘tokenPrim’ → ‘token’, removed old ‘token’, because
‘tokenPrim’ is more general and ‘token’ is little used.
* Fixed bug with ‘notFollowedBy’ always succeeded with parsers that
don't consume input, see #6.
* Hint system introduced that greatly improved quality of error messages
and made code of ‘Text.Megaparsec.Prim’ a lot clearer.
The improvements affected other modules too:
* Some parsers from ‘Text.Megaparsec.Combinators’ now live in
‘Text.Megaparsec.Prim’.
* Hint system improved error messages, so I needed to rewrite test for
‘Text.Megaparsec.Char.eol’, since it's error messages are very
intelligent now and cannot be emulated by ‘newline’ and ‘crlf’ parsers
used separately.
* Test for Bug9 from old-tests is passed successfully again.
This parser can be told to parse from ‘m’ to ‘n’ occurrences of some
thing. Old parser ‘count’ is now named ‘count’' and defined in terms of
that more powerful one.
This commit also reorders functions in module
‘Text.Megaparsec.Combinator’ and everywhere where the functions are
listed. The same order is used everywhere.
Added new character parsers in ‘Text.Megaparsec.Char’:
* ‘controlChar’
* ‘printChar’
* ‘markChar’
* ‘numberChar’
* ‘punctuationChar’
* ‘symbolChar’
* ‘separatorChar’
* ‘asciiChar’
* ‘latin1Char’
* ‘charCategory’
Renamed some parsers:
‘spaces’ → ‘space’
‘space’ → ‘spaceChar’
‘lower’ → ‘lowerChar’
‘upper’ → ‘upperChar’
‘letter’ → ‘letterChar’
‘alphaNum’ → ‘alphaNumChar’
‘digit’ → ‘digitChar’
‘octDigit’ → ‘octDigitChar’
‘hexDigit’ → ‘hexDigitChar’
Descriptions of old parsers have been updated to accent some
Unicode-specific moments. For example, old description of ‘letter’
stated that it parses letters from “a” to “z” and from “A” to “Z”. This
is wrong, since it used ‘Data.Char.isAlpha’ predicate internally and
thus parsed many more characters.
These functions are now re-exported from ‘Control.Applicative’
module. ‘many’ and ‘some’ are now part of ‘Alternative’ instance of
‘ParsecT’.
Note that these functions are re-exported only in ‘Text.MegaParsec’
module, but not in ‘Text.MegaParsec.Prim’ to avoid duplication of
floating doc-strings. Others internal modules now just casually import
‘Control.Applicative’ for their needs.
Note that ‘many1’ was renamed to ‘some’, the same is done for other
parsers that had ‘many1’ part in their names (for consistency).