Close #75.
Accumulated hints are no longer used with ‘ParseError’ records that
contain only custom messages (created with the ‘Message’ constructor, as
opposed to ‘Unexpected’ or ‘Expected’). This strips the “expected” line
from custom error messages, where it's unlikely to be relevant anyway.
Arbitrary messages created with the ‘Message’ constructor should not be
rendered as an “or”-separated list. This commit displays every such
message on its own line.
After some thinking I decided that this may not be desirable in some
cases, so we should not enable it by default. I've edited the
documentation of ‘makeExprParser’ to explain why this doesn't work by
default and how to make it work.
Close #64.
‘makeExprParser’ now generates a parser that can handle several
occurrences of the same prefix or postfix operator in a row. This makes
it possible to parse something like C pointers (for example ‘**i’)
without resorting to hacks.
The feature is experimental; I'm not entirely sure it's free of bugs.
Upcoming additional tests for ‘Text.Megaparsec.Expr’ will show
whether it behaves correctly in all cases and doesn't have adverse
effects. For now, I've edited the existing test to generate data with
repeating prefix negations and postfix factorials. The current code base
passes the test.
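As a rough illustration of repeated prefix operators, the sketch below parses an integer preceded by any number of unary minus signs and folds them into a single function. It uses ‘ReadP’ from base rather than Megaparsec, and ‘term’ and ‘parseAll’ are hypothetical names; this is one possible approach, not necessarily how ‘makeExprParser’ does it.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, many, munch1, readP_to_S)

-- | Parse an integer preceded by any number of unary minus signs.
-- Each minus sign becomes a function; the functions are composed
-- and applied to the parsed number.
term :: ReadP Int
term = do
  prefixes <- many (negate <$ char '-')
  n <- read <$> munch1 isDigit
  return (foldr (.) id prefixes n)

-- | Run a parser and keep the result only if it is a single parse
-- that consumes the whole input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p input =
  case readP_to_S (p <* eof) input of
    [(x, "")] -> Just x
    _         -> Nothing
```

With these definitions, ‘parseAll term "--5"’ gives ‘Just 5’ and ‘parseAll term "---5"’ gives ‘Just (-5)’.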
Close #69.
Although the previously used syntax is valid Haskell syntax for
multi-line string literals, the CPP extension that we need for
compatibility reasons makes the ‘\’ symbol escape the following newline
character, which leads to ‘\t’ being interpreted as a tab character.
The proposed solution just concatenates the resulting error message from
a list of strings, which is the most lightweight and reliable solution
in our case.
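A minimal sketch of the list-of-strings approach: no source line ends in a backslash, so the C preprocessor has nothing to join. The name ‘usage’ and the message text are made up for illustration.

```haskell
-- | Build a multi-line message from a list of strings instead of a
-- string gap, so that no line of the source ends in a backslash.
usage :: String
usage = concat
  [ "usage: tool [options]\n"
  , "\t-h  show help\n"
  , "\t-v  verbose output\n"
  ]
```

Here the ‘\t’ escapes sit inside ordinary single-line literals, so they are read by the Haskell lexer as tab characters on purpose and not as an accident of preprocessing.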
What Parsec used is called “FreeBSD” or “BSD 2 clause”. Addition of the
third clause may require contacting all the authors. To hell with it,
let it be “FreeBSD” (which is anyway better than “BSD-like”), I'm a
hacker, not a lawyer (tm).
This commit clarifies the license of the software, replacing “BSD3” with
the more conventional “BSD 3 clause”.
Another change is the addition of the third clause originally missing in
the license of Parsec (which is licensed under the BSD 2 clause license).
The addition of the third clause in the form:
  * Neither the names of the copyright holders nor the names of
    contributors may be used to endorse or promote products derived from
    this software without specific prior written permission.
does not violate the original BSD 2 clause license and effectively makes
it a BSD 3 clause license (which I find preferable).
Close #43.
The ‘failure’ method allows failing with an arbitrary collection of
messages. ‘unexpected’ is now defined in terms of ‘failure’. One
consequence of this design decision is that ‘failure’ is now a method of
‘MonadParsec’, while ‘unexpected’ is not.
Close #47, close #57.
This commit introduces ‘runParser'’ and ‘runParserT'’ functions that
take and return parser state. This makes it possible to partially parse
input, resume parsing, specify non-standard initial textual position,
etc.
Internal changes involve some refactoring to make ‘Reply’ more
readable and facilitate extraction of complete parser state on failure
as well as success.
The commit adds basic tests for the new functionality as well.
Close #65.
Previously we had 5 nearly identical definitions of ‘parseFromFile’,
varying only in the type-specific ‘readFile’ function. Now the problem is
solved by the introduction of the ‘StorableStream’ type class. All
supported stream types are instances of the class out of the box, and
thus we have a polymorphic version of ‘parseFromFile’.
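The idea can be sketched with base alone. The method name ‘fromFile’, the single ‘String’ instance, and the modelling of a parser as a plain function are all assumptions made for this illustration; the real class and signatures may differ.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- | Stream types that can be loaded from a file (hypothetical sketch).
class StorableStream s where
  fromFile :: FilePath -> IO s

-- | The only instance shown here; other stream types would each
-- supply their own file-reading function.
instance StorableStream String where
  fromFile = readFile

-- | One polymorphic definition instead of one per stream type.
-- The parser is modelled as a plain function for brevity.
parseFromFile
  :: StorableStream s
  => (s -> Either String a)
  -> FilePath
  -> IO (Either String a)
parseFromFile parse path = parse <$> fromFile path
```

The type-specific part lives entirely in the instance, so ‘parseFromFile’ itself is written once.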
Close #62.
Apart from some refactoring, the following important changes were
introduced:
* ‘ParseError’ is now a monoid.
* Added functions ‘addErrorMessages’ and ‘newErrorMessages’.
Now it's impossible to create ‘SourcePos’ with non-positive line number
or column number. Unfortunately we cannot use ‘Numeric.Natural’ because
we need to support older versions of ‘base’.
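One common way to enforce such an invariant with plain ‘Int’ fields is a smart constructor. The sketch below is an assumption about the general technique, not the library's code: the field names and the ‘Maybe’ result are chosen for illustration only.

```haskell
-- | A source position; the constructor is meant to stay private to the
-- defining module so that values are only built through 'newPos'.
data SourcePos = SourcePos
  { sourceName   :: String
  , sourceLine   :: Int
  , sourceColumn :: Int
  } deriving (Eq, Show)

-- | Smart constructor: rejects non-positive line or column numbers.
newPos :: String -> Int -> Int -> Maybe SourcePos
newPos name line col
  | line >= 1 && col >= 1 = Just (SourcePos name line col)
  | otherwise             = Nothing
```

With this, ‘newPos "f.hs" 0 1’ is ‘Nothing’, while ‘newPos "f.hs" 5 6’ is ‘Just’ a valid position.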
Indented text returned by ‘showMessages’ may be undesirable, but we
cannot add indentation outside of the function (edge case: strings
including newline are displayed in the messages).
Closes #56.
In particular, file name and textual position are now represented like
this:
  filename.hs:5:6:
  error message
This format is more conventional, so various tools can parse it and
provide some support (for example, Emacs can work with this format).
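The layout shown above can be produced by a function along these lines; ‘renderError’ is a hypothetical name used only for this sketch.

```haskell
-- | Render a file name, line, column and message in the
-- "file:line:column:" layout, with the message on the next line.
renderError :: FilePath -> Int -> Int -> String -> String
renderError file line col msg =
  file ++ ":" ++ show line ++ ":" ++ show col ++ ":\n" ++ msg
```

For example, ‘renderError "filename.hs" 5 6 "error message"’ yields the two lines from the example.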
‘Text.Megaparsec’ and ‘Text.Megaparsec.Prim’ do not export these data
types and their constructors anymore. These data types are a rather
low-level implementation detail that should not be visible to the
end user. They are also subject to change in the future.
This patch introduces compatibility with base-4.7.0.x. It was tested
on Win 8.1 x86_64, using GHC 7.8.4. It mainly consists of a bunch
of #if !MIN_VERSION_base(4,8,0) ... #endif additions and a lower bound
on base in the cabal file, as well as a general introduction of the
CPP extension via default-extensions.
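A typical guard of this kind looks like the sketch below. It assumes the ‘MIN_VERSION_base’ macro is available (Cabal defines it, and so do recent GHC versions); ‘pairUp’ is a made-up function that exists only to use the conditionally imported operators.

```haskell
{-# LANGUAGE CPP #-}

-- With base older than 4.8 these names are not in the Prelude,
-- so they are imported explicitly.
#if !MIN_VERSION_base(4,8,0)
import Control.Applicative ((<$>), (<*>))
#endif

-- | Combine two optional values into an optional pair.
pairUp :: Maybe Int -> Maybe Int -> Maybe (Int, Int)
pairUp a b = (,) <$> a <*> b
```

On a newer base the import is skipped entirely, which avoids redundant-import warnings.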
It also removes a potential error source in tests/Util.hs, since
the backslash in /=\ can lead to strange quirks on certain systems
(backslash and newline only separated by whitespace).
Other, squashed commits:
- Remove 'recent version of base' from Readme
- Change necessary version of GHC
‘Text.Megaparsec.Prim’ cannot be considered portable since it uses
multi-parameter type classes and functional dependencies.
Other modules that depend on these non-portable features from
‘Text.Megaparsec.Prim’ should be considered non-portable too.
Closes #37.
Most of these changes were proposed by @neongreen. To apply precisely
what I deem acceptable, correct some of them differently, and add a few
other things, I've re-edited them manually.
Closes #38.
Now tab width can be manipulated via the following functions:
* ‘getTabWidth’
* ‘setTabWidth’
Other auxiliary changes were performed, such as updating
‘updatePosChar’.
This also corrects somewhat obsolete descriptions of some functions.
Closes #36.
We should try to preserve original information where possible. The user
can then convert the case of the parsed string if necessary. The previous
implementation discarded the actually parsed string and returned the
argument of the function, which can be considered data loss of a sort.
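To make the difference concrete, here is a small case-insensitive string parser that returns what was actually read from the input. It is written against ‘ReadP’ from base, and ‘stringCI’ is a hypothetical name; it only illustrates the behaviour described, not the library's implementation.

```haskell
import Data.Char (toLower)
import Text.ParserCombinators.ReadP (ReadP, readP_to_S, satisfy)

-- | Match the given string ignoring case, returning the characters
-- that were actually consumed rather than the pattern itself.
stringCI :: String -> ReadP String
stringCI = mapM (\c -> satisfy (\x -> toLower x == toLower c))

-- | Convenience runner returning the parsed prefixes.
parsedPrefixes :: String -> String -> [String]
parsedPrefixes pattern input = map fst (readP_to_S (stringCI pattern) input)
```

Here ‘parsedPrefixes "select" "SeLeCt *"’ returns ‘["SeLeCt"]’, preserving the original casing; the caller can normalize it with ‘map toLower’ if needed.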
Closes #35.
Since ‘many’ (and thus ‘some’) is the only combinator that can succeed
consuming input and produce hints at the same time, we can conclude that
the ‘cok'’ continuation in the ‘pLabel’ combinator is only called when
‘many’ is labelled. By prepending the phrase “rest of ” to the actual
label in this case, we can greatly improve the resulting error message.
Close #27.
Backtracking user state can be achieved via a combination of the
‘StateT’ monad transformer and ‘ParsecT’:
  StateT StateType (ParsecT s m) a
This kind of user state can be more flexible, which renders the current
built-in user state redundant.
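The backtracking behaviour can be seen in a small sketch. It layers ‘StateT’ (from the transformers package) over ‘ReadP’ from base instead of ‘ParsecT’, purely so that the example is self-contained; ‘counted’, ‘example’ and ‘runExample’ are hypothetical names.

```haskell
import Control.Applicative ((<|>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, modify, runStateT)
import Text.ParserCombinators.ReadP (ReadP, char, readP_to_S)

-- | A parser with an Int counter as user state.
type Parser = StateT Int ReadP

-- | Parse one character and count it.
counted :: Char -> Parser Char
counted c = lift (char c) <* modify (+ 1)

-- | The first branch counts two characters and then fails; the second
-- branch starts again from the original state.
example :: Parser String
example = sequence [counted 'a', counted 'b', counted 'x']
      <|> sequence [counted 'a', counted 'b', counted 'c']

-- | Run the example with the counter starting at zero.
runExample :: String -> [((String, Int), String)]
runExample = readP_to_S (runStateT example 0)
```

On input "abc" the final counter is 3, not 5: the increments made in the failed first branch are discarded along with the branch.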
To help work with this new approach (combining monad transformers more
freely) we introduce ‘MonadParsec’ MTL-style type class. All tools that
come with Megaparsec library were modified to work smoothly with any
instance of ‘MonadParsec’, not only ‘ParsecT’.
Now all the combinators in ‘Text.Megaparsec.Combinator’ are defined for
any instance of ‘Alternative’ (sometimes ‘Applicative’) from
‘Control.Applicative’. Some combinators are inlined.
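What such generality looks like can be sketched with two illustrative definitions. The names follow Parsec's traditional ‘option’ and ‘between’, but the bodies are written here for illustration and are not copied from the module.

```haskell
import Control.Applicative (Alternative, (<|>))

-- | Try the given action; fall back to a default value if it fails.
-- Needs only 'Alternative'.
option :: Alternative f => a -> f a -> f a
option x p = p <|> pure x

-- | Run an action between an opening and a closing one.
-- Needs only 'Applicative'.
between :: Applicative f => f open -> f close -> f a -> f a
between open close p = open *> p <* close
```

Because nothing here mentions a parser type, the same code works for ‘Maybe’ as well: ‘option 0 Nothing’ is ‘Just 0’ and ‘between (Just '(') (Just ')') (Just 'x')’ is ‘Just 'x'’.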
Closes #29.
Now the testing function can return ‘Either [Message] a’, so it can
construct a full list of error messages. This may be useful in cases
where tokens are more complex than simple characters.
Multi-character operators should use ‘try’ in order to be reported
correctly (as “operator”). I've mentioned it in doc-string of
‘makeExprParser’.
It's tempting to include ‘try’ directly in expression parsing code, but
following general spirit of Parsec toward ‘try’, I think current
solution is the best.
Various languages may vary in how hexadecimal and octal literals should
be prefixed. Following the spirit of the new lexer we leave this to
programmer to decide.
Eliminated ‘Text.Megaparsec.Language’ module because at this point it is
clear that already existing definitions are of little use in
Megaparsec. I started writing “default” language definition in
‘Text.Megaparsec.Lexer’.
At this point it should be possible to parse languages where indentation
matters, although we will need to provide more helpers to make it
easier.
Obviously order does matter here, since ‘Monoid’ instance for ‘Hints’ is
derived from [], so (<>) is the same as (++) and we should be careful
to keep things in the right order.
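The point about ordering can be checked with a tiny model. The representation of ‘Hints’ as a newtype over a list of lists is an assumption for this sketch, and the instances are spelled out by hand instead of being derived.

```haskell
-- | A stand-in for the hint collection: a list of groups of
-- expected items.
newtype Hints = Hints [[String]]
  deriving (Eq, Show)

-- | Combining hints is list concatenation, so it is not commutative.
instance Semigroup Hints where
  Hints a <> Hints b = Hints (a ++ b)

instance Monoid Hints where
  mempty = Hints []

-- | Hints from an earlier parser followed by hints from a later one.
earlierThenLater :: Hints
earlierThenLater = Hints [["'a'"]] <> Hints [["'b'"]]
```

Swapping the operands gives ‘Hints [["'b'"], ["'a'"]]’ instead, which would change the order in which expected items are reported.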
These parsers are considered deprecated:
* ‘chainl’
* ‘chainl1’
* ‘chainr’
* ‘chainr1’
* ‘sepEndBy’
* ‘sepEndBy1’
Apart from this, the commit includes various cosmetic changes in
module ‘Text.Megaparsec.Combinator’.
The following functions and data types have been renamed:
* ‘permute’ → ‘makePermParser’
* ‘buildExpressionParser’ → ‘makeExprParser’
* ‘GenLanguageDef’ → ‘LanguageDef’
* ‘GenTokenParser’ → ‘Lexer’
* ‘makeTokenParser’ → ‘makeLexer’
The improved error messages in Megaparsec are quite sensitive to how
parsers are written, which parts of parser are labeled, etc. Current
implementation of token parsers in ‘Text.Megaparsec.Token’ is written
without this in mind. We will improve the module later, for now let us
rewrite/simplify some parts to avoid failing tests.
If ‘x’ in ‘x >>= y’ consumes input but produces some hints, we should
accumulate them nonetheless. Why this is important can be demonstrated
by the following test:
  many (char 'a') >> many (char 'b') >> eof
This should fail on input "ac" with the following message:
  parse error at line 1, column 2:
  unexpected 'c'
  expecting 'a', 'b' or end of input
As you can see, even though the parser ‘many (char 'a')’ consumed input,
its hints may be useful later.
* Removed ‘optionMaybe’ parser, because ‘optional’ from
‘Control.Applicative’ does the same thing.
* Renamed ‘tokenPrim’ → ‘token’, removed old ‘token’, because
‘tokenPrim’ is more general and ‘token’ is little used.
* Fixed bug with ‘notFollowedBy’ always succeeding with parsers that
don't consume input, see #6.
* Hint system introduced that greatly improved quality of error messages
and made code of ‘Text.Megaparsec.Prim’ a lot clearer.
The improvements affected other modules too:
* Some parsers from ‘Text.Megaparsec.Combinator’ now live in
‘Text.Megaparsec.Prim’.
* Hint system improved error messages, so I needed to rewrite the test
for ‘Text.Megaparsec.Char.eol’, since its error messages are very
intelligent now and cannot be emulated by ‘newline’ and ‘crlf’ parsers
used separately.
* Test for Bug9 from old-tests passes again.
This parser can be told to parse from ‘m’ to ‘n’ occurrences of some
thing. The old parser ‘count’ is now named ‘count'’ and is defined in
terms of the more powerful one.
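A bounded-repetition combinator of this sort can be sketched for any ‘Alternative’. The name ‘countRange’ is hypothetical and chosen to avoid implying which of the two library names it corresponds to; the example runs it on ‘ReadP’ from base.

```haskell
import Control.Applicative (Alternative, (<|>))
import Text.ParserCombinators.ReadP (char, eof, readP_to_S)

-- | Apply an action at least m and at most n times, collecting results.
countRange :: Alternative f => Int -> Int -> f a -> f [a]
countRange m n p
  | n <= 0 || m > n = pure []
  | m > 0           = (:) <$> p <*> countRange (m - 1) (n - 1) p
  | otherwise       = ((:) <$> p <*> countRange 0 (n - 1) p) <|> pure []

-- | A fixed repetition is the special case where both bounds coincide.
countExactly :: Alternative f => Int -> f a -> f [a]
countExactly n = countRange n n

-- | Complete parses of between two and four 'a' characters.
twoToFourAs :: String -> [(String, String)]
twoToFourAs = readP_to_S (countRange 2 4 (char 'a') <* eof)
```

For instance, ‘twoToFourAs "aaa"’ succeeds with "aaa", while both "a" and "aaaaa" produce no complete parse.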
This commit also reorders functions in module
‘Text.Megaparsec.Combinator’ and everywhere where the functions are
listed. The same order is used everywhere.
Added new character parsers in ‘Text.Megaparsec.Char’:
* ‘controlChar’
* ‘printChar’
* ‘markChar’
* ‘numberChar’
* ‘punctuationChar’
* ‘symbolChar’
* ‘separatorChar’
* ‘asciiChar’
* ‘latin1Char’
* ‘charCategory’
Renamed some parsers:
* ‘spaces’ → ‘space’
* ‘space’ → ‘spaceChar’
* ‘lower’ → ‘lowerChar’
* ‘upper’ → ‘upperChar’
* ‘letter’ → ‘letterChar’
* ‘alphaNum’ → ‘alphaNumChar’
* ‘digit’ → ‘digitChar’
* ‘octDigit’ → ‘octDigitChar’
* ‘hexDigit’ → ‘hexDigitChar’
Descriptions of old parsers have been updated to emphasize some
Unicode-specific details. For example, the old description of ‘letter’
stated that it parses letters from “a” to “z” and from “A” to “Z”. This
is wrong, since it used the ‘Data.Char.isAlpha’ predicate internally and
thus parsed many more characters.
New tests show that I had a wrong assumption about the workings of this
particular function. This is not a problem, though; a complete test
suite will eliminate this sort of nuisance soon.
These functions are now re-exported from ‘Control.Applicative’
module. ‘many’ and ‘some’ are now part of ‘Alternative’ instance of
‘ParsecT’.
Note that these functions are re-exported only in the ‘Text.MegaParsec’
module, but not in ‘Text.MegaParsec.Prim’, to avoid duplication of
floating doc-strings. Other internal modules now just import
‘Control.Applicative’ for their needs.
Note that ‘many1’ was renamed to ‘some’, the same is done for other
parsers that had ‘many1’ part in their names (for consistency).
Changed how numbers are parsed because they were parsed in a naïve and
hairy way. Added tests for #2 and #3 (in the old Parsec project these
are numbers 35 and 39 respectively).
* Since the Haskell report doesn't say anything about sign, I've made
‘integer’ and ‘float’ parse numbers without sign.
* Removed ‘natural’ parser, it's equal to new ‘integer’ now.
* Renamed ‘naturalOrFloat’ → ‘number’; this doesn't parse sign either.
* Added new combinator ‘signed’ to parse all sorts of signed numbers.
* For the sake of convenience I've added ‘integer'’, ‘float'’, and
‘number'’ combinators that can also parse signed numbers out of the box.
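The split between unsigned parsers and a separate sign-handling combinator can be sketched as follows. This version is built on ‘ReadP’ from base, and the signatures are assumptions for illustration; they are not the library's actual types.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch1, option, readP_to_S, (+++))

-- | Parse an unsigned integer.
integer :: ReadP Integer
integer = read <$> munch1 isDigit

-- | Wrap a numeric parser so that it accepts an optional leading sign.
signed :: Num a => ReadP a -> ReadP a
signed p = option id sign <*> p
  where
    sign = (negate <$ char '-') +++ (id <$ char '+')

-- | Signed variant, mirroring the idea of the primed combinators.
integer' :: ReadP Integer
integer' = signed integer

-- | Complete parses of a signed integer.
parseSigned :: String -> [(Integer, String)]
parseSigned = readP_to_S (integer' <* eof)
```

With this, ‘parseSigned "-42"’ yields -42, ‘parseSigned "+7"’ yields 7, and unsigned input such as "13" still parses.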