mirror of
https://github.com/mrkkrp/megaparsec.git
synced 2024-11-23 10:23:39 +03:00
Improve the documentation
This commit is contained in:
parent
b5887b59d7
commit
ce7228346a
213
README.md
213
README.md
@ -36,50 +36,46 @@ written by Daan Leijen.
|
||||
|
||||
## Features
|
||||
|
||||
This project provides flexible solutions to satisfy common parsing
|
||||
needs. The section describes them shortly. If you're looking for
|
||||
comprehensive documentation, see the
|
||||
[section about documentation](#documentation).
|
||||
This project provides flexible solutions to satisfy common parsing needs.
|
||||
The section describes them shortly. If you're looking for comprehensive
|
||||
documentation, see the [section about documentation](#documentation).
|
||||
|
||||
### Core features
|
||||
|
||||
The package is built around `MonadParsec`, a MTL-style monad
|
||||
transformer. All tools and features work with any instance of
|
||||
`MonadParsec`. You can achieve various effects combining monad transformers,
|
||||
i.e. building monad stack. Since most common monad transformers like
|
||||
`WriterT`, `StateT`, `ReaderT` and others are instances of `MonadParsec`,
|
||||
you can wrap `ParsecT` *in* these monads, achieving, for example,
|
||||
The package is built around `MonadParsec`, an MTL-style monad transformer.
|
||||
All tools and features work with all instances of `MonadParsec`. You can
|
||||
achieve various effects combining monad transformers, i.e. building monad
|
||||
stack. Since the standard common monad transformers like `WriterT`,
|
||||
`StateT`, `ReaderT` and others are instances of the `MonadParsec` type
|
||||
class, you can wrap `ParsecT` *in* these monads, achieving, for example,
|
||||
backtracking state.
|
||||
|
||||
On the other hand `ParsecT` is instance of many type classes as well. The
|
||||
On the other hand `ParsecT` is an instance of many type classes as well. The
|
||||
most useful ones are `Monad`, `Applicative`, `Alternative`, and
|
||||
`MonadParsec`.
|
||||
|
||||
The module
|
||||
[`Text.Megaparsec.Combinator`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Combinator.html)
|
||||
(its functions are included in `Text.Megaparsec`) contains traditional,
|
||||
general combinators that work with any instance of `Alternative` and some
|
||||
even with instances of `Applicative`.
|
||||
[`Text.Megaparsec.Combinator`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Combinator.html) (its
|
||||
functions are included in `Text.Megaparsec`) contains traditional, general
|
||||
combinators that work with instances of `Applicative` and `Alternative`.
|
||||
|
||||
Role of `Monad`, `Applicative`, and `Alternative` should be obvious, so
|
||||
let's enumerate methods of `MonadParsec` type class. The class represents
|
||||
core, basic functions of Megaparsec parsing. The rest of library is built
|
||||
Let's enumerate methods of the `MonadParsec` type class. The class abstracts
|
||||
primitive functions of Megaparsec parsing. The rest of the library is built
|
||||
via combination of these primitives:
|
||||
|
||||
* `failure` allows to fail with arbitrary collection of messages.
|
||||
* `failure` allows to fail reporting an arbitrary parse error.
|
||||
|
||||
* `label` allows to add a “label” to any parser, so when it fails the user will
|
||||
see the label in the error message where “expected” items are enumerated.
|
||||
* `label` allows to add a “label” to a parser, so if it fails the user will
|
||||
see the label instead of an automatically deduced expected token.
|
||||
|
||||
* `hidden` hides any parser from error messages altogether, this is
|
||||
officially recommended way to hide things, prefer it to the `label ""`
|
||||
approach.
|
||||
* `hidden` hides a parser from error messages altogether. This is the
|
||||
recommended way to hide things, prefer it to the `label ""` approach.
|
||||
|
||||
* `try` enables backtracking in parsing.
|
||||
|
||||
* `lookAhead` allows to parse something without consuming input.
|
||||
* `lookAhead` allows to parse input without consuming it.
|
||||
|
||||
* `notFollowedBy` succeeds when its argument fails, it does not consume
|
||||
* `notFollowedBy` succeeds when its argument fails and does not consume
|
||||
input.
|
||||
|
||||
* `withRecovery` allows to recover from parse errors “on-the-fly” and
|
||||
@ -91,42 +87,43 @@ via combination of these primitives:
|
||||
|
||||
* `eof` only succeeds at the end of input.
|
||||
|
||||
* `token` is used to parse single token.
|
||||
* `token` is used to parse a single token.
|
||||
|
||||
* `tokens` makes it easy to parse several tokens in a row.
|
||||
|
||||
* `getParserState` returns full parser state.
|
||||
* `getParserState` returns the full parser state.
|
||||
|
||||
* `updateParserState` applies given function on parser state.
|
||||
* `updateParserState` applies a given function on the parser state.
|
||||
|
||||
This list of core functions is longer than in some other libraries. Our goal
|
||||
was efficient and readable implementation of functionality provided by every
|
||||
such primitive, not minimal number of them. You can read the comprehensive
|
||||
description of every primitive function in
|
||||
is efficient, readable implementations, and rich functionality, not minimal
|
||||
number of primitive combinators. You can read the comprehensive description
|
||||
of every primitive function in
|
||||
[Megaparsec documentation](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Prim.html).
|
||||
|
||||
Megaparsec can currently work with the following types of input stream
|
||||
out-of-box:
|
||||
out-of-the-box:
|
||||
|
||||
* `String` = `[Char]`
|
||||
|
||||
* `ByteString` (strict and lazy)
|
||||
|
||||
* `Text` (strict and lazy)
|
||||
|
||||
It's also simple to make it work with custom token streams, and Megaparsec
|
||||
users have done so many times with great success.
|
||||
|
||||
### Error messages
|
||||
|
||||
Megaparsec 5 introduces well-typed error messages and ability to use custom
|
||||
data types to adjust the library to your domain of interest. No need to keep
|
||||
your info as shapeless bunch of strings anymore.
|
||||
Megaparsec 5 introduces well-typed error messages and the ability to use
|
||||
custom data types to adjust the library to specific domain of interest. No
|
||||
need to use a shapeless bunch of strings anymore.
|
||||
|
||||
The default error component (`Dec`) has constructors corresponding to `fail`
|
||||
function and indentation-related error messages. It is a decent option that
|
||||
should work out-of-box for most parsing needs, while you are free to use
|
||||
your own custom error component when necessary with little effort.
|
||||
The default error component (`Dec`) has constructors corresponding to the
|
||||
`fail` function and indentation-related error messages. It is a decent
|
||||
option that should work out-of-box for most parsing needs, while you are
|
||||
free to use your own custom error component when necessary.
|
||||
|
||||
This new design allowed Megaparsec 5 to have much more helpful error
|
||||
messages for indentation-sensitive parsing instead of plain “incorrect
|
||||
messages for indentation-sensitive parsing instead of the plain “incorrect
|
||||
indentation” phrase.
|
||||
|
||||
### Alex and Happy support
|
||||
@ -135,16 +132,17 @@ Megaparsec works well with streams of tokens produced by tools like
|
||||
Alex/Happy. Megaparsec 5 adds `updatePos` method to `Stream` type class that
|
||||
gives you full control over textual positions that are used to report token
|
||||
positions in error messages. You can update current position on per
|
||||
character basis or extract it from token — all cases are covered.
|
||||
character basis or extract it from token.
|
||||
|
||||
### Character parsing
|
||||
|
||||
Megaparsec has decent support for Unicode-aware character parsing. Functions
|
||||
for character parsing live in [`Text.Megaparsec.Char`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char.html) (they all are
|
||||
included in `Text.Megaparsec`). The functions can be divided into several
|
||||
categories:
|
||||
for character parsing live in the
|
||||
[`Text.Megaparsec.Char`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char.html)
|
||||
module (they all are included in `Text.Megaparsec`). The functions can be
|
||||
divided into several categories:
|
||||
|
||||
* *Simple parsers* — parsers that parse certain character or several
|
||||
* *Simple parsers*—parsers that parse certain character or several
|
||||
characters of the same kind. This includes `newline`, `crlf`, `eol`,
|
||||
`tab`, and `space`.
|
||||
|
||||
@ -154,20 +152,19 @@ categories:
|
||||
`digitChar`, and others.
|
||||
|
||||
* *General parsers* that allow you to parse a single character you specify
|
||||
or one of given characters, or any character except for given ones, or
|
||||
character satisfying given predicate. Case-insensitive versions of the
|
||||
parsers are available.
|
||||
or one of the given characters, or any character except for the given
|
||||
ones, or character satisfying given predicate. Case-insensitive versions
|
||||
of the parsers are available.
|
||||
|
||||
* *Parsers for sequences of characters* parse strings. These are more
|
||||
efficient and provide better error messages than other approaches most
|
||||
programmers can come up with. Case-sensitive `string` parser is available
|
||||
as well as case-insensitive `string'`.
|
||||
* *Parsers for sequences of characters* parse strings. Case-sensitive
|
||||
`string` parser is available as well as case-insensitive `string'`.
|
||||
|
||||
### Permutation parsing
|
||||
|
||||
For those who are interested in parsing of permutation phrases, there is
|
||||
[`Text.Megaparsec.Perm`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Perm.html). You have to import the module explicitly, it's not
|
||||
included in the `Text.Megaparsec` module.
|
||||
For those who are interested in parsing of permutation phrases, there
|
||||
is [`Text.Megaparsec.Perm`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Perm.html).
|
||||
You have to import the module explicitly, it's not included in the
|
||||
`Text.Megaparsec` module.
|
||||
|
||||
### Expression parsing
|
||||
|
||||
@ -186,9 +183,9 @@ is a module that should help you write your lexer. If you have used `Parsec`
|
||||
in the past, this module “fixes” its particularly inflexible
|
||||
`Text.Parsec.Token`.
|
||||
|
||||
`Text.Megaparsec.Lexer` is intended to be imported qualified, it's not
|
||||
included in `Text.Megaparsec`. The module doesn't impose how you should
|
||||
write your parser, but certain approaches may be more elegant than
|
||||
`Text.Megaparsec.Lexer` is intended to be imported via a qualified import,
|
||||
it's not included in `Text.Megaparsec`. The module doesn't impose how you
|
||||
should write your parser, but certain approaches may be more elegant than
|
||||
others. An especially important theme is parsing of white space, comments,
|
||||
and indentation.
|
||||
|
||||
@ -196,7 +193,7 @@ The design of the module allows you quickly solve simple tasks and doesn't
|
||||
get in your way when you want to implement something less standard.
|
||||
|
||||
Since Megaparsec 5, all tools for indentation-sensitive parsing are
|
||||
available in `Text.Megaparsec.Lexer` module — no third party packages
|
||||
available in `Text.Megaparsec.Lexer` module—no third party packages
|
||||
required.
|
||||
|
||||
## Documentation
|
||||
@ -208,12 +205,12 @@ Hackage](https://hackage.haskell.org/package/megaparsec) for yourself.
|
||||
|
||||
## Tutorials
|
||||
|
||||
You can visit [site of the project](https://mrkkrp.github.io/megaparsec/)
|
||||
which
|
||||
You can visit
|
||||
the [site of the project](https://mrkkrp.github.io/megaparsec/) which
|
||||
has [several tutorials](https://mrkkrp.github.io/megaparsec/tutorials.html)
|
||||
that should help you to start with your parsing tasks. The site also has
|
||||
instructions and tips for Parsec users who decide to switch. If you want to
|
||||
improve an existing tutorial or add your own, open a PR
|
||||
instructions and tips for Parsec users who decide to migrate to Megaparsec.
|
||||
If you want to improve an existing tutorial or add your own, open a PR
|
||||
against [this repo](https://github.com/mrkkrp/megaparsec-site).
|
||||
|
||||
## Performance
|
||||
@ -252,13 +249,13 @@ better choice.
|
||||
|
||||
### Megaparsec vs Parsec
|
||||
|
||||
Since Megaparsec is a fork of Parsec, it's necessary to list main
|
||||
Since Megaparsec is a fork of Parsec, we are bound to list the main
|
||||
differences between the two libraries:
|
||||
|
||||
* Better error messages. We test our error messages using dense QuickCheck
|
||||
tests. Good error messages are just as important for us as correct return
|
||||
values of our parsers. Megaparsec will be especially useful if you write
|
||||
compiler or interpreter for some language.
|
||||
values of our parsers. Megaparsec will be especially useful if you write a
|
||||
compiler or an interpreter for some language.
|
||||
|
||||
* Some quirks and “buggy features” (as well as plain bugs) of original
|
||||
Parsec are fixed. There is no undocumented surprising stuff in Megaparsec.
|
||||
@ -284,9 +281,15 @@ differences between the two libraries:
|
||||
* Megaparsec can recover from parse errors “on the fly” and continue
|
||||
parsing.
|
||||
|
||||
* Megaparsec allows to conditionally process parse errors *inside your
|
||||
parser* before parsing is finished. In particular, it's possible to define
|
||||
regions in which parse errors, should they happen, will get a “context
|
||||
tag”, e.g. we could build a context stack like “in function definition
|
||||
foo”, “in expression x”, etc. This is not possible with Parsec.
|
||||
|
||||
* Megaparsec is faster.
|
||||
|
||||
* Megaparsec is better supported.
|
||||
* Megaparsec is ~~better~~ supported.
|
||||
|
||||
If you want to see a detailed change log, `CHANGELOG.md` may be helpful.
|
||||
Also see [this original announcement](https://notehub.org/w7037) for another
|
||||
@ -295,17 +298,17 @@ comparison.
|
||||
To be honest Parsec's development has seemingly stagnated. It has no test
|
||||
suite (only three per-bug tests), and all its releases beginning from
|
||||
version 3.1.2 (according or its change log) were about introducing and
|
||||
fixing regressions. Parsec is old and somewhat famous in Haskell community,
|
||||
so we understand there will be some kind of inertia, but we advise you use
|
||||
Megaparsec from now on because it solves many problems of original Parsec
|
||||
project. If you think you still have a reason to use original Parsec, open
|
||||
an issue.
|
||||
fixing regressions. Parsec is old and somewhat famous in the Haskell
|
||||
community, so we understand there will be some kind of inertia, but we
|
||||
advise you use Megaparsec from now on because it solves many problems of the
|
||||
original Parsec project. If you think you still have a reason to use
|
||||
original Parsec, open an issue.
|
||||
|
||||
### Megaparsec vs Trifecta
|
||||
|
||||
[Trifecta](https://hackage.haskell.org/package/trifecta) is another Haskell
|
||||
library featuring good error messages. Like some other projects of Edward
|
||||
Kmett, it's probably good, but also poorly documented, arcane, and has
|
||||
Kmett, it's probably good, but also under-documented, and has
|
||||
unfixed [bugs and flaws](https://github.com/ekmett/trifecta/issues) that
|
||||
Edward is too busy to fix (simply a fact, no offense intended). Other
|
||||
reasons one may question choice of Trifecta is his/her parsing library:
|
||||
@ -332,7 +335,7 @@ Earley, but there are still enough reasons to choose it over Earley:
|
||||
|
||||
* Megaparsec is faster.
|
||||
|
||||
* Your grammar may be not context free or you may want introduce some sort
|
||||
* Your grammar may be not context-free or you may want introduce some sort
|
||||
of state to the parsing process. Almost all non-trivial parsers require
|
||||
something of this sort. Even if your grammar is context-free, state may
|
||||
allow to add some additional niceties. Earley does not support that.
|
||||
@ -341,12 +344,12 @@ Earley, but there are still enough reasons to choose it over Earley:
|
||||
arbitrary data in them, return multiple error messages, mark regions that
|
||||
affect any error that happens in those regions, etc.
|
||||
|
||||
* The approach Earley uses differs from conventional monadic parsing. If you
|
||||
work not alone, chances people you work with, especially beginners will be
|
||||
much more productive with libraries taking more traditional path to
|
||||
parsing like Megaparsec.
|
||||
* The approach Earley uses differs from the conventional monadic parsing. If
|
||||
you work not alone, chances people you work with, especially beginners
|
||||
will be much more productive with libraries taking more traditional path
|
||||
to parsing like Megaparsec.
|
||||
|
||||
IOW, Megaparsec is less safe but also much more powerful.
|
||||
IOW, Megaparsec is less safe but also more powerful.
|
||||
|
||||
### Megaparsec vs Parsers
|
||||
|
||||
@ -360,49 +363,53 @@ following:
|
||||
parser builders, so they can write basic core functionality and get the
|
||||
rest “for free”.
|
||||
|
||||
* It currently has a bug in definition of `lookAhead` for various monad
|
||||
transformers like `StateT`, etc. which is visible when you create
|
||||
backtracking state via monad stack, not via built-in features.
|
||||
* It currently has a ~~bug~~ feature in definition of `lookAhead` for
|
||||
various monad transformers like `StateT`, etc. which is visible when you
|
||||
create backtracking state via monad stack, not via built-in features. The
|
||||
feature makes it so `lookAhead` will backtrack your parser state but not
|
||||
your custom state added via `StateT`. Kmett thinks this behavior is
|
||||
better.
|
||||
|
||||
We intended to use Parsers library in Megaparsec at some point, but aside
|
||||
from already mentioned flaws the library has different conventions for
|
||||
naming of things, different set of “core” functions, etc., different
|
||||
approach to lexer. So it didn't happen, Megaparsec has minimal dependencies,
|
||||
it is feature-rich and self-contained.
|
||||
approach to lexing. So it didn't happen, Megaparsec has minimal
|
||||
dependencies, it is feature-rich and self-contained.
|
||||
|
||||
## Related packages
|
||||
|
||||
The following packages are designed to be used with Megaparsec:
|
||||
|
||||
* [`hspec-megaparsec`](https://hackage.haskell.org/package/hspec-megaparsec)
|
||||
— utilities for testing Megaparsec parsers with with
|
||||
[Hspec](https://hackage.haskell.org/package/hspec).
|
||||
* [`hspec-megaparsec`](https://hackage.haskell.org/package/hspec-megaparsec)—utilities
|
||||
for testing Megaparsec parsers with
|
||||
with [Hspec](https://hackage.haskell.org/package/hspec).
|
||||
|
||||
* [`cassava-megaparsec`](https://hackage.haskell.org/package/cassava-megaparsec) —
|
||||
Megaparsec parser of CSV files that plays nicely
|
||||
* [`cassava-megaparsec`](https://hackage.haskell.org/package/cassava-megaparsec)—Megaparsec
|
||||
parser of CSV files that plays nicely
|
||||
with [Cassava](https://hackage.haskell.org/package/cassava).
|
||||
|
||||
* [`tagsoup-megaparsec`](https://hackage.haskell.org/package/tagsoup-megaparsec)
|
||||
— a library for easily using [TagSoup](https://hackage.haskell.org/package/tagsoup)
|
||||
as a token type in Megaparsec.
|
||||
* [`tagsoup-megaparsec`](https://hackage.haskell.org/package/tagsoup-megaparsec)—a
|
||||
library for easily
|
||||
using [TagSoup](https://hackage.haskell.org/package/tagsoup) as a token
|
||||
type in Megaparsec.
|
||||
|
||||
## Links to announcements
|
||||
|
||||
Here are some blog posts mainly announcing new features of the project and
|
||||
describing what sort of things are now possible:
|
||||
|
||||
* [The original Megaparsec 4.0.0 announcement](https://notehub.org/w7037)
|
||||
* [Megaparsec 4 and 5](https://mrkkrp.github.io/posts/megaparsec-4-and-5.html)
|
||||
* [Announcing Megaparsec 5](https://mrkkrp.github.io/posts/announcing-megaparsec-5.html)
|
||||
* [Latest additions to Megaparsec](https://mrkkrp.github.io/posts/latest-additions-to-megaparsec.html)
|
||||
* [Announcing Megaparsec 5](https://mrkkrp.github.io/posts/announcing-megaparsec-5.html)
|
||||
* [Megaparsec 4 and 5](https://mrkkrp.github.io/posts/megaparsec-4-and-5.html)
|
||||
* [The original Megaparsec 4.0.0 announcement](https://notehub.org/w7037)
|
||||
|
||||
## Authors
|
||||
|
||||
The project was started and is currently maintained by Mark Karpov. You can
|
||||
find complete list of contributors in `AUTHORS.md` file in official
|
||||
repository of the project. Thanks to all the people who propose features and
|
||||
ideas, although they are not in `AUTHORS.md`, without them Megaparsec would
|
||||
not be that good.
|
||||
find the complete list of contributors in the `AUTHORS.md` file in the
|
||||
official repository of the project. Thanks to all the people who propose
|
||||
features and ideas, although they are not in `AUTHORS.md`, without them
|
||||
Megaparsec would not be that good.
|
||||
|
||||
## Contribution
|
||||
|
||||
@ -410,7 +417,7 @@ Issues (bugs, feature requests or otherwise feedback) may be reported in
|
||||
[the GitHub issue tracker for this project](https://github.com/mrkkrp/megaparsec/issues).
|
||||
|
||||
Pull requests are also welcome (and yes, they will get attention and will be
|
||||
merged quickly if they are good, we are progressive folks).
|
||||
merged quickly if they are good).
|
||||
|
||||
If you want to write a tutorial to be hosted on Megaparsec's site, open an
|
||||
issue or pull request [here](https://github.com/mrkkrp/megaparsec-site).
|
||||
|
@ -13,17 +13,17 @@
|
||||
-- If you are new to Megaparsec and don't know where to begin, take a look
|
||||
-- at our tutorials <https://mrkkrp.github.io/megaparsec/tutorials.html>.
|
||||
--
|
||||
-- By default this module is set up to parse character data. If you'd like to
|
||||
-- parse the result of your own tokenizer you should start with the following
|
||||
-- imports:
|
||||
-- By default this module is set up to parse character data. If you'd like
|
||||
-- to parse the result of your own tokenizer you should start with the
|
||||
-- following imports:
|
||||
--
|
||||
-- > import Text.Megaparsec.Prim
|
||||
-- > import Text.Megaparsec.Combinator
|
||||
--
|
||||
-- Then you can implement your own version of 'satisfy' on top of the
|
||||
-- 'token' primitive.
|
||||
-- 'token' primitive, etc.
|
||||
--
|
||||
-- Typical import section looks like this:
|
||||
-- The typical import section looks like this:
|
||||
--
|
||||
-- > import Text.Megaparsec
|
||||
-- > import Text.Megaparsec.String
|
||||
@ -205,13 +205,13 @@ import Text.Megaparsec.Prim
|
||||
--
|
||||
-- @some p@ applies the parser @p@ /one/ or more times and returns a list of
|
||||
-- the returned values of @p@. The note about behavior of the combinator in
|
||||
-- case when @p@ fails consuming input (see 'A.many') applies to 'some' as
|
||||
-- well.
|
||||
-- the case when @p@ fails consuming input (see 'A.many') applies to 'some'
|
||||
-- as well.
|
||||
--
|
||||
-- > word = some letter
|
||||
|
||||
-- $optional
|
||||
--
|
||||
-- @optional p@ tries to apply parser @p@. It will parse @p@ or nothing. It
|
||||
-- only fails if @p@ fails after consuming input. On success result of @p@
|
||||
-- is returned inside of 'Just', on failure 'Nothing' is returned.
|
||||
-- @optional p@ tries to apply the parser @p@. It will parse @p@ or nothing.
|
||||
-- It only fails if @p@ fails after consuming input. On success result of
|
||||
-- @p@ is returned inside of 'Just', on failure 'Nothing' is returned.
|
||||
|
@ -56,7 +56,7 @@ module Text.Megaparsec.Char
|
||||
, string' )
|
||||
where
|
||||
|
||||
import Control.Applicative ((<|>))
|
||||
import Control.Applicative
|
||||
import Data.Char
|
||||
import Data.List.NonEmpty (NonEmpty (..))
|
||||
import Data.Maybe (fromJust)
|
||||
@ -67,7 +67,6 @@ import Text.Megaparsec.Error
|
||||
import Text.Megaparsec.Prim
|
||||
|
||||
#if !MIN_VERSION_base(4,8,0)
|
||||
import Control.Applicative ((<$>), pure)
|
||||
import Data.Foldable (Foldable (), any, elem, notElem)
|
||||
import Prelude hiding (any, elem, notElem)
|
||||
#endif
|
||||
@ -82,7 +81,7 @@ newline = char '\n'
|
||||
{-# INLINE newline #-}
|
||||
|
||||
-- | Parse a carriage return character followed by a newline character.
|
||||
-- Return sequence of characters parsed.
|
||||
-- Return the sequence of characters parsed.
|
||||
|
||||
crlf :: (MonadParsec e s m, Token s ~ Char) => m String
|
||||
crlf = string "\r\n"
|
||||
@ -91,7 +90,7 @@ crlf = string "\r\n"
|
||||
-- | Parse a CRLF (see 'crlf') or LF (see 'newline') end of line. Return the
|
||||
-- sequence of characters parsed.
|
||||
--
|
||||
-- > eol = (pure <$> newline) <|> crlf
|
||||
-- > eol = (pure <$> newline) <|> crlf <?> "end of line"
|
||||
|
||||
eol :: (MonadParsec e s m, Token s ~ Char) => m String
|
||||
eol = (pure <$> newline) <|> crlf <?> "end of line"
|
||||
@ -240,7 +239,7 @@ charCategory :: (MonadParsec e s m, Token s ~ Char) => GeneralCategory -> m Char
|
||||
charCategory cat = satisfy ((== cat) . generalCategory) <?> categoryName cat
|
||||
{-# INLINE charCategory #-}
|
||||
|
||||
-- | Return human-readable name of Unicode General Category.
|
||||
-- | Return the human-readable name of Unicode General Category.
|
||||
|
||||
categoryName :: GeneralCategory -> String
|
||||
categoryName cat =
|
||||
@ -293,8 +292,8 @@ char c = token testChar (Just c)
|
||||
else Left (f x, f c, E.empty)
|
||||
{-# INLINE char #-}
|
||||
|
||||
-- | The same as 'char' but case-insensitive. This parser returns actually
|
||||
-- parsed character preserving its case.
|
||||
-- | The same as 'char' but case-insensitive. This parser returns the
|
||||
-- actually parsed character preserving its case.
|
||||
--
|
||||
-- >>> parseTest (char' 'e') "E"
|
||||
-- 'E'
|
||||
@ -320,8 +319,9 @@ anyChar = satisfy (const True) <?> "character"
|
||||
|
||||
-- | @oneOf cs@ succeeds if the current character is in the supplied
|
||||
-- collection of characters @cs@. Returns the parsed character. Note that
|
||||
-- this parser doesn't automatically generate “expected” component of error
|
||||
-- message, so usually you should label it manually with 'label' or ('<?>').
|
||||
-- this parser cannot automatically generate the “expected” component of
|
||||
-- error message, so usually you should label it manually with 'label' or
|
||||
-- ('<?>').
|
||||
--
|
||||
-- See also: 'satisfy'.
|
||||
--
|
||||
@ -375,8 +375,8 @@ satisfy f = token testChar Nothing
|
||||
----------------------------------------------------------------------------
|
||||
-- Sequence of characters
|
||||
|
||||
-- | @string s@ parses a sequence of characters given by @s@. Returns
|
||||
-- the parsed string (i.e. @s@).
|
||||
-- | @string s@ parses a sequence of characters given by @s@. Returns the
|
||||
-- parsed string (i.e. @s@).
|
||||
--
|
||||
-- > divOrMod = string "div" <|> string "mod"
|
||||
|
||||
|
@ -10,7 +10,7 @@
|
||||
-- Portability : portable
|
||||
--
|
||||
-- Commonly used generic combinators. Note that all the combinators work
|
||||
-- with any 'Alternative' instance.
|
||||
-- with 'Applicative' and 'Alternative' instances.
|
||||
|
||||
{-# LANGUAGE BangPatterns #-}
|
||||
{-# LANGUAGE CPP #-}
|
||||
@ -66,9 +66,9 @@ count :: Applicative m => Int -> m a -> m [a]
|
||||
count n p = sequenceA (replicate n p)
|
||||
{-# INLINE count #-}
|
||||
|
||||
-- | @count\' m n p@ parses from @m@ to @n@ occurrences of @p@. If @n@ is
|
||||
-- not positive or @m > n@, the parser equals to @return []@. Returns a list
|
||||
-- of parsed values.
|
||||
-- | @count' m n p@ parses from @m@ to @n@ occurrences of @p@. If @n@ is not
|
||||
-- positive or @m > n@, the parser equals to @return []@. Returns a list of
|
||||
-- parsed values.
|
||||
--
|
||||
-- Please note that @m@ /may/ be negative, in this case effect is the same
|
||||
-- as if it were equal to zero.
|
||||
@ -125,8 +125,9 @@ someTill :: Alternative m => m a -> m end -> m [a]
|
||||
someTill p end = (:) <$> p <*> manyTill p end
|
||||
{-# INLINE someTill #-}
|
||||
|
||||
-- | @option x p@ tries to apply parser @p@. If @p@ fails without consuming
|
||||
-- input, it returns the value @x@, otherwise the value returned by @p@.
|
||||
-- | @option x p@ tries to apply the parser @p@. If @p@ fails without
|
||||
-- consuming input, it returns the value @x@, otherwise the value returned
|
||||
-- by @p@.
|
||||
--
|
||||
-- > priority = option 0 (digitToInt <$> digitChar)
|
||||
|
||||
|
@ -52,7 +52,7 @@ import Control.Applicative
|
||||
#endif
|
||||
|
||||
-- | Data type that is used to represent “unexpected\/expected” items in
|
||||
-- parse error. The data type is parametrized over token type @t@.
|
||||
-- 'ParseError'. The data type is parametrized over the token type @t@.
|
||||
--
|
||||
-- @since 5.0.0
|
||||
|
||||
@ -132,18 +132,18 @@ instance ErrorComponent Dec where
|
||||
representFail = DecFail
|
||||
representIndentation = DecIndentation
|
||||
|
||||
-- | The data type @ParseError@ represents parse errors. It provides the
|
||||
-- stack of source positions, set of expected and unexpected tokens as well
|
||||
-- as set of custom associated data. The data type is parametrized over
|
||||
-- token type @t@ and custom data @e@.
|
||||
-- | 'ParseError' represents… parse errors. It provides the stack of source
|
||||
-- positions, a set of expected and unexpected tokens as well as a set of
|
||||
-- custom associated data. The data type is parametrized over the token type
|
||||
-- @t@ and the custom data @e@.
|
||||
--
|
||||
-- Note that the stack of source positions contains current position as its
|
||||
-- head, and the rest of positions allows to track full sequence of include
|
||||
-- files with topmost source file at the end of the list.
|
||||
--
|
||||
-- 'Semigroup' (or 'Monoid') instance of the data type allows to merge parse
|
||||
-- errors from different branches of parsing. When merging two
|
||||
-- 'ParseError's, longest match is preferred; if positions are the same,
|
||||
-- 'Semigroup' (and 'Monoid') instance of the data type allows to merge
|
||||
-- parse errors from different branches of parsing. When merging two
|
||||
-- 'ParseError's, the longest match is preferred; if positions are the same,
|
||||
-- custom data sets and collections of message items are combined.
|
||||
|
||||
data ParseError t e = ParseError
|
||||
@ -195,10 +195,10 @@ instance (Arbitrary t, Ord t, Arbitrary e, Ord e)
|
||||
#endif
|
||||
|
||||
-- | Merge two error data structures into one joining their collections of
|
||||
-- message items and preferring longest match. In other words, earlier error
|
||||
-- message is discarded. This may seem counter-intuitive, but 'mergeError'
|
||||
-- is only used to merge error messages of alternative branches of parsing
|
||||
-- and in this case longest match should be preferred.
|
||||
-- message items and preferring the longest match. In other words, earlier
|
||||
-- error message is discarded. This may seem counter-intuitive, but
|
||||
-- 'mergeError' is only used to merge error messages of alternative branches
|
||||
-- of parsing and in this case longest match should be preferred.
|
||||
|
||||
mergeError :: (Ord t, Ord e)
|
||||
=> ParseError t e
|
||||
@ -302,7 +302,7 @@ instance ShowErrorComponent Dec where
|
||||
EQ -> "equal to "
|
||||
GT -> "greater than "
|
||||
|
||||
-- | Pretty-print 'ParseError'. The rendered 'String' always ends with a
|
||||
-- | Pretty-print a 'ParseError'. The rendered 'String' always ends with a
|
||||
-- newline.
|
||||
--
|
||||
-- The function is defined as:
|
||||
@ -320,7 +320,7 @@ parseErrorPretty :: ( Ord t
|
||||
parseErrorPretty e =
|
||||
sourcePosStackPretty (errorPos e) ++ ":\n" ++ parseErrorTextPretty e
|
||||
|
||||
-- | Pretty-print stack of source positions.
|
||||
-- | Pretty-print a stack of source positions.
|
||||
--
|
||||
-- @since 5.0.0
|
||||
|
||||
@ -330,7 +330,7 @@ sourcePosStackPretty ms = concatMap f rest ++ sourcePosPretty pos
|
||||
rest = reverse rest'
|
||||
f p = "in file included from " ++ sourcePosPretty p ++ ",\n"
|
||||
|
||||
-- | Transforms list of error messages into their textual representation.
|
||||
-- | Transforms a list of error messages into their textual representation.
|
||||
|
||||
messageItemsPretty :: ShowErrorComponent a
|
||||
=> String -- ^ Prefix to prepend
|
||||
@ -343,16 +343,16 @@ messageItemsPretty prefix ts
|
||||
in prefix ++ f ts ++ "\n"
|
||||
|
||||
-- | Print a pretty list where items are separated with commas and the word
|
||||
-- “or” according to rules of English punctuation.
|
||||
-- “or” according to the rules of English punctuation.
|
||||
|
||||
orList :: NonEmpty String -> String
|
||||
orList (x:|[]) = x
|
||||
orList (x:|[y]) = x ++ " or " ++ y
|
||||
orList xs = intercalate ", " (NE.init xs) ++ ", or " ++ NE.last xs
|
||||
|
||||
-- | Pretty-print textual part of a 'ParseError', that is, everything except
|
||||
-- stack of source positions. The rendered staring always ends with a new
|
||||
-- line.
|
||||
-- | Pretty-print a textual part of a 'ParseError', that is, everything
|
||||
-- except stack of source positions. The rendered staring always ends with a
|
||||
-- new line.
|
||||
--
|
||||
-- @since 5.1.0
|
||||
|
||||
|
@ -35,7 +35,7 @@ data Operator m a
|
||||
|
||||
-- | @makeExprParser term table@ builds an expression parser for terms
|
||||
-- @term@ with operators from @table@, taking the associativity and
|
||||
-- precedence specified in @table@ into account.
|
||||
-- precedence specified in the @table@ into account.
|
||||
--
|
||||
-- @table@ is a list of @[Operator m a]@ lists. The list is ordered in
|
||||
-- descending precedence. All operators in one list have the same precedence
|
||||
@ -46,13 +46,13 @@ data Operator m a
|
||||
--
|
||||
-- Unary operators of the same precedence can only occur once (i.e. @--2@ is
|
||||
-- not allowed if @-@ is prefix negate). If you need to parse several prefix
|
||||
-- or postfix operators in a row, (like C pointers — @**i@) you can use this
|
||||
-- or postfix operators in a row, (like C pointers—@**i@) you can use this
|
||||
-- approach:
|
||||
--
|
||||
-- > manyUnaryOp = foldr1 (.) <$> some singleUnaryOp
|
||||
--
|
||||
-- This is not done by default because in some cases you don't want to allow
|
||||
-- repeating prefix or postfix operators.
|
||||
-- This is not done by default because in some cases allowing repeating
|
||||
-- prefix or postfix operators is not desirable.
|
||||
--
|
||||
-- If you want to have an operator that is a prefix of another operator in
|
||||
-- the table, use the following (or similar) wrapper instead of plain
|
||||
@ -98,9 +98,9 @@ addPrecLevel term ops =
|
||||
las' = pInfixL (choice las) term'
|
||||
nas' = pInfixN (choice nas) term'
|
||||
|
||||
-- | @pTerm prefix term postfix@ parses a term with @term@ surrounded by
|
||||
-- optional prefix and postfix unary operators. Parsers @prefix@ and
|
||||
-- @postfix@ are allowed to fail, in this case 'id' is used.
|
||||
-- | @pTerm prefix term postfix@ parses a @term@ surrounded by optional
|
||||
-- prefix and postfix unary operators. Parsers @prefix@ and @postfix@ are
|
||||
-- allowed to fail, in this case 'id' is used.
|
||||
|
||||
pTerm :: MonadParsec e s m => m (a -> a) -> m a -> m (a -> a) -> m a
|
||||
pTerm prefix term postfix = do
|
||||
|
@ -53,7 +53,7 @@ module Text.Megaparsec.Lexer
|
||||
, signed )
|
||||
where
|
||||
|
||||
import Control.Applicative ((<|>), some, optional)
|
||||
import Control.Applicative
|
||||
import Control.Monad (void)
|
||||
import Data.Char (readLitChar)
|
||||
import Data.List.NonEmpty (NonEmpty (..))
|
||||
@ -67,10 +67,6 @@ import Text.Megaparsec.Pos
|
||||
import Text.Megaparsec.Prim
|
||||
import qualified Text.Megaparsec.Char as C
|
||||
|
||||
#if !MIN_VERSION_base(4,8,0)
|
||||
import Control.Applicative ((<$>), (<*), (*>), (<*>), pure)
|
||||
#endif
|
||||
|
||||
----------------------------------------------------------------------------
|
||||
-- White space
|
||||
|
||||
@ -150,9 +146,9 @@ symbol' :: (MonadParsec e s m, Token s ~ Char)
|
||||
symbol' spc = lexeme spc . C.string'
|
||||
|
||||
-- | Given comment prefix this function returns a parser that skips line
|
||||
-- comments. Note that it stops just before newline character but doesn't
|
||||
-- consume the newline. Newline is either supposed to be consumed by 'space'
|
||||
-- parser or picked up manually.
|
||||
-- comments. Note that it stops just before the newline character but
|
||||
-- doesn't consume the newline. Newline is either supposed to be consumed by
|
||||
-- 'space' parser or picked up manually.
|
||||
|
||||
skipLineComment :: (MonadParsec e s m, Token s ~ Char)
|
||||
=> String -- ^ Line comment prefix
|
||||
@ -189,7 +185,7 @@ skipBlockCommentNested start end = p >> void (manyTill e n)
|
||||
----------------------------------------------------------------------------
|
||||
-- Indentation
|
||||
|
||||
-- | Return current indentation level.
|
||||
-- | Return the current indentation level.
|
||||
--
|
||||
-- The function is a simple shortcut defined as:
|
||||
--
|
||||
@ -218,13 +214,13 @@ incorrectIndent ord ref actual = failure E.empty E.empty (E.singleton x)
|
||||
where x = representIndentation ord ref actual
|
||||
|
||||
-- | @indentGuard spaceConsumer ord ref@ first consumes all white space
|
||||
-- (indentation) with @spaceConsumer@ parser, then it checks column
|
||||
-- (indentation) with @spaceConsumer@ parser, then it checks the column
|
||||
-- position. Ordering between current indentation level and the reference
|
||||
-- indentation level @ref@ should be @ord@, otherwise the parser fails. On
|
||||
-- success the current column position is returned.
|
||||
--
|
||||
-- When you want to parse a block of indentation, first run this parser with
|
||||
-- arguments like @indentGuard spaceConsumer GT (unsafePos 1)@ — this will
|
||||
-- arguments like @indentGuard spaceConsumer GT (unsafePos 1)@—this will
|
||||
-- make sure you have some indentation. Use returned value to check
|
||||
-- indentation on every subsequent line according to syntax of your
|
||||
-- language.
|
||||
@ -305,7 +301,7 @@ indentBlock sc r = do
|
||||
f (x:xs)
|
||||
|
||||
-- | Grab indented items. This is a helper for 'indentBlock', it's not a
|
||||
-- part of public API.
|
||||
-- part of the public API.
|
||||
|
||||
indentedItems :: MonadParsec e s m
|
||||
=> Pos -- ^ Reference indentation level
|
||||
@ -328,7 +324,7 @@ indentedItems ref lvl sc p = go
|
||||
-- | Create a parser that supports line-folding. The first argument is used
|
||||
-- to consume white space between components of line fold, thus it /must/
|
||||
-- consume newlines in order to work properly. The second argument is a
|
||||
-- callback that receives custom space-consuming parser as argument. This
|
||||
-- callback that receives a custom space-consuming parser as argument. This
|
||||
-- parser should be used after separate components of line fold that can be
|
||||
-- put on different lines.
|
||||
--
|
||||
@ -353,8 +349,8 @@ lineFold sc action =
|
||||
----------------------------------------------------------------------------
|
||||
-- Character and string literals
|
||||
|
||||
-- | The lexeme parser parses a single literal character without quotes.
|
||||
-- Purpose of this parser is to help with parsing of conventional escape
|
||||
-- | The lexeme parser parses a single literal character without quotes. The
|
||||
-- purpose of this parser is to help with parsing of conventional escape
|
||||
-- sequences. It's your responsibility to take care of character literal
|
||||
-- syntax in your language (by surrounding it with single quotes or
|
||||
-- similar).
|
||||
@ -404,7 +400,7 @@ decimal = nump "" C.digitChar <?> "decimal integer"
|
||||
-- | Parse an integer in hexadecimal representation. Representation of
|
||||
-- hexadecimal number is expected to be according to the Haskell report
|
||||
-- except for the fact that this parser doesn't parse “0x” or “0X” prefix.
|
||||
-- It is responsibility of the programmer to parse correct prefix before
|
||||
-- It is a responsibility of the programmer to parse correct prefix before
|
||||
-- parsing the number itself.
|
||||
--
|
||||
-- For example you can make it conform to Haskell report like this:
|
||||
@ -416,7 +412,7 @@ hexadecimal = nump "0x" C.hexDigitChar <?> "hexadecimal integer"
|
||||
|
||||
-- | Parse an integer in octal representation. Representation of octal
|
||||
-- number is expected to be according to the Haskell report except for the
|
||||
-- fact that this parser doesn't parse “0o” or “0O” prefix. It is
|
||||
-- fact that this parser doesn't parse “0o” or “0O” prefix. It is a
|
||||
-- responsibility of the programmer to parse correct prefix before parsing
|
||||
-- the number itself.
|
||||
|
||||
@ -430,7 +426,7 @@ octal = nump "0o" C.octDigitChar <?> "octal integer"
|
||||
nump :: MonadParsec e s m => String -> m Char -> m Integer
|
||||
nump prefix baseDigit = read . (prefix ++) <$> some baseDigit
|
||||
|
||||
-- | Parse a floating point value as 'Scientific' number. 'Scientific' is
|
||||
-- | Parse a floating point value as a 'Scientific' number. 'Scientific' is
|
||||
-- great for parsing of arbitrary precision numbers coming from an untrusted
|
||||
-- source. See documentation in "Data.Scientific" for more information.
|
||||
-- Representation of the floating point value is expected to be according to
|
||||
|
@ -71,7 +71,7 @@ makePermParser (Perm def xs) = choice (fmap branch xs ++ empty)
|
||||
-- | The expression @f \<$$> p@ creates a fresh permutation parser
|
||||
-- consisting of parser @p@. The the final result of the permutation parser
|
||||
-- is the function @f@ applied to the return value of @p@. The parser @p@ is
|
||||
-- not allowed to accept empty input — use the optional combinator ('<$?>')
|
||||
-- not allowed to accept empty input—use the optional combinator ('<$?>')
|
||||
-- instead.
|
||||
--
|
||||
-- If the function @f@ takes more than one parameter, the type variable @b@
|
||||
@ -90,7 +90,7 @@ f <$$> p = newperm f <||> p
|
||||
-- | The expression @f \<$?> (x, p)@ creates a fresh permutation parser
|
||||
-- consisting of parser @p@. The final result of the permutation parser is
|
||||
-- the function @f@ applied to the return value of @p@. The parser @p@ is
|
||||
-- optional — if it cannot be applied, the default value @x@ will be used
|
||||
-- optional—if it cannot be applied, the default value @x@ will be used
|
||||
-- instead.
|
||||
|
||||
(<$?>) :: MonadParsec e s m
|
||||
@ -100,7 +100,7 @@ f <$$> p = newperm f <||> p
|
||||
f <$?> xp = newperm f <|?> xp
|
||||
|
||||
-- | The expression @perm \<||> p@ adds parser @p@ to the permutation parser
|
||||
-- @perm@. The parser @p@ is not allowed to accept empty input — use the
|
||||
-- @perm@. The parser @p@ is not allowed to accept empty input—use the
|
||||
-- optional combinator ('<|?>') instead. Returns a new permutation parser
|
||||
-- that includes @p@.
|
||||
|
||||
@ -111,7 +111,7 @@ f <$?> xp = newperm f <|?> xp
|
||||
(<||>) = add
|
||||
|
||||
-- | The expression @perm \<||> (x, p)@ adds parser @p@ to the permutation
|
||||
-- parser @perm@. The parser @p@ is optional — if it cannot be applied, the
|
||||
-- parser @perm@. The parser @p@ is optional—if it cannot be applied, the
|
||||
-- default value @x@ will be used instead. Returns a new permutation parser
|
||||
-- that includes the optional parser @p@.
|
||||
|
||||
|
@ -8,7 +8,7 @@
|
||||
-- Portability : portable
|
||||
--
|
||||
-- Textual source position. The position includes name of file, line number,
|
||||
-- and column number. List of such positions can be used to model stack of
|
||||
-- and column number. List of such positions can be used to model a stack of
|
||||
-- include files.
|
||||
|
||||
{-# LANGUAGE CPP #-}
|
||||
@ -163,7 +163,7 @@ sourcePosPretty (SourcePos n l c)
|
||||
-- Helpers implementing default behaviors
|
||||
|
||||
-- | Update a source position given a character. The first argument
|
||||
-- specifies tab width. If the character is a newline (\'\\n\') the line
|
||||
-- specifies the tab width. If the character is a newline (\'\\n\') the line
|
||||
-- number is incremented by 1. If the character is a tab (\'\\t\') the
|
||||
-- column number is incremented to the nearest tab position. In all other
|
||||
-- cases, the column is incremented by 1.
|
||||
@ -187,8 +187,8 @@ defaultUpdatePos width apos@(SourcePos n l c) ch = (apos, npos)
|
||||
_ -> SourcePos n l (c <> u)
|
||||
|
||||
-- | Value of tab width used by default. Always prefer this constant when
|
||||
-- you want to refer to default tab width because actual value /may/ change
|
||||
-- in future. Current value is @8@.
|
||||
-- you want to refer to the default tab width because actual value /may/
|
||||
-- change in future.
|
||||
|
||||
defaultTabWidth :: Pos
|
||||
defaultTabWidth = unsafePos 8
|
||||
|
@ -226,8 +226,9 @@ accHints
|
||||
accHints hs1 c x s hs2 = c x s (hs1 <> hs2)
|
||||
{-# INLINE accHints #-}
|
||||
|
||||
-- | Replace the most recent group of hints (if any) with given 'ErrorItem'
|
||||
-- (or delete it if 'Nothing' is given). This is used in 'label' primitive.
|
||||
-- | Replace the most recent group of hints (if any) with the given
|
||||
-- 'ErrorItem' (or delete it if 'Nothing' is given). This is used in 'label'
|
||||
-- primitive.
|
||||
|
||||
refreshLastHint :: Hints t -> Maybe (ErrorItem t) -> Hints t
|
||||
refreshLastHint (Hints []) _ = Hints []
|
||||
@ -254,24 +255,24 @@ class Ord (Token s) => Stream s where
|
||||
-- | Update position in stream given tab width, current position, and
|
||||
-- current token. The result is a tuple where the first element will be
|
||||
-- used to report parse errors for current token, while the second element
|
||||
-- is the incremented position that will be stored in parser's state. The
|
||||
-- stored (incremented) position is used whenever position can't
|
||||
-- is the incremented position that will be stored in the parser's state.
|
||||
-- The stored (incremented) position is used whenever position can't
|
||||
-- be\/shouldn't be updated by consuming a token. For example, when using
|
||||
-- 'failure', we don't grab a new token (we need to fail right were we are
|
||||
-- now), so error position will be taken from parser's state.
|
||||
--
|
||||
-- When you work with streams where elements do not contain information
|
||||
-- about their position in input, result is usually consists of the third
|
||||
-- argument unchanged and incremented position calculated with respect to
|
||||
-- current token. This is how default instances of 'Stream' work (they use
|
||||
-- 'defaultUpdatePos', which may be a good starting point for your own
|
||||
-- position-advancing function).
|
||||
-- about their position in input, the result is usually consists of the
|
||||
-- third argument unchanged and incremented position calculated with
|
||||
-- respect to current token. This is how default instances of 'Stream'
|
||||
-- work (they use 'defaultUpdatePos', which may be a good starting point
|
||||
-- for your own position-advancing function).
|
||||
--
|
||||
-- When you wish to deal with stream of tokens where every token “knows”
|
||||
-- When you wish to deal with a stream of tokens where every token “knows”
|
||||
-- its start and end position in input (for example, you have produced the
|
||||
-- stream with Happy\/Alex), then the best strategy is to use the start
|
||||
-- position as actual element position and provide the end position of the
|
||||
-- token as incremented one.
|
||||
-- position as the actual element position and provide the end position of
|
||||
-- the token as the incremented one.
|
||||
--
|
||||
-- @since 5.0.0
|
||||
|
||||
@ -318,38 +319,8 @@ instance Stream TL.Text where
|
||||
updatePos = const defaultUpdatePos
|
||||
{-# INLINE updatePos #-}
|
||||
|
||||
-- If you're reading this, you may be interested in how Megaparsec works on
|
||||
-- lower level. That's quite simple. 'ParsecT' is a wrapper around function
|
||||
-- that takes five arguments:
|
||||
--
|
||||
-- * State. It includes input stream, position in input stream and
|
||||
-- current value of tab width.
|
||||
--
|
||||
-- * “Consumed-OK” continuation (cok). This is a function that takes
|
||||
-- three arguments: result of parsing, state after parsing, and hints
|
||||
-- (see their description above). This continuation is called when
|
||||
-- something has been consumed during parsing and result is OK (no error
|
||||
-- occurred).
|
||||
--
|
||||
-- * “Consumed-error” continuation (cerr). This function is called when
|
||||
-- some part of input stream has been consumed and parsing resulted in
|
||||
-- an error. This continuation takes 'ParseError' and state information
|
||||
-- at the time error occurred.
|
||||
--
|
||||
-- * “Empty-OK” continuation (eok). The function takes the same
|
||||
-- arguments as “consumed-OK” continuation. “Empty-OK” is called when no
|
||||
-- input has been consumed and no error occurred.
|
||||
--
|
||||
-- * “Empty-error” continuation (eerr). The function is called when no
|
||||
-- input has been consumed, but nonetheless parsing resulted in an
|
||||
-- error. Just like “consumed-error”, the continuation takes
|
||||
-- 'ParseError' record and state information.
|
||||
--
|
||||
-- You call specific continuation when you want to proceed in that specific
|
||||
-- branch of control flow.
|
||||
|
||||
-- | @Parsec@ is non-transformer variant of more general 'ParsecT' monad
|
||||
-- transformer.
|
||||
-- | @Parsec@ is a non-transformer variant of the more general 'ParsecT'
|
||||
-- monad transformer.
|
||||
|
||||
type Parsec e s = ParsecT e s Identity
|
||||
|
||||
@ -439,8 +410,6 @@ pFail msg = ParsecT $ \s@(State _ pos _ _) _ _ _ eerr ->
|
||||
where d = E.singleton (representFail msg)
|
||||
{-# INLINE pFail #-}
|
||||
|
||||
-- | Low-level creation of the 'ParsecT' type.
|
||||
|
||||
mkPT :: Monad m => (State s -> m (Reply e s a)) -> ParsecT e s m a
|
||||
mkPT k = ParsecT $ \s cok cerr eok eerr -> do
|
||||
(Reply s' consumption result) <- k s
|
||||
@ -505,8 +474,8 @@ pPlus m n = ParsecT $ \s cok cerr eok eerr ->
|
||||
in unParser m s cok cerr eok meerr
|
||||
{-# INLINE pPlus #-}
|
||||
|
||||
-- | From two states, return the one with greater number of processed
|
||||
-- tokens. If the numbers of processed tokens are equal, prefer the latter
|
||||
-- | From two states, return the one with the greater number of processed
|
||||
-- tokens. If the numbers of processed tokens are equal, prefer the second
|
||||
-- state.
|
||||
|
||||
longestMatch :: State s -> State s -> State s
|
||||
@ -529,7 +498,7 @@ instance MonadTrans (ParsecT e s) where
|
||||
class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
=> MonadParsec e s m | m -> e s where
|
||||
|
||||
-- | The most general way to stop parsing and report 'ParseError'.
|
||||
-- | The most general way to stop parsing and report a 'ParseError'.
|
||||
--
|
||||
-- 'unexpected' is defined in terms of this function:
|
||||
--
|
||||
@ -556,22 +525,22 @@ class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
hidden = label ""
|
||||
|
||||
-- | The parser @try p@ behaves like parser @p@, except that it backtracks
|
||||
-- parser state when @p@ fails (either consuming input or not).
|
||||
-- the parser state when @p@ fails (either consuming input or not).
|
||||
--
|
||||
-- This combinator is used whenever arbitrary look ahead is needed. Since
|
||||
-- it pretends that it hasn't consumed any input when @p@ fails, the
|
||||
-- ('A.<|>') combinator will try its second alternative even when the
|
||||
-- first parser failed while consuming input.
|
||||
--
|
||||
-- For example, here is a parser that is supposed to parse word “let” or
|
||||
-- “lexical”:
|
||||
-- For example, here is a parser that is supposed to parse the word “let”
|
||||
-- or the word “lexical”:
|
||||
--
|
||||
-- >>> parseTest (string "let" <|> string "lexical") "lexical"
|
||||
-- 1:1:
|
||||
-- unexpected "lex"
|
||||
-- expecting "let"
|
||||
--
|
||||
-- What happens here? First parser consumes “le” and fails (because it
|
||||
-- What happens here? The first parser consumes “le” and fails (because it
|
||||
-- doesn't see a “t”). The second parser, however, isn't tried, since the
|
||||
-- first parser has already consumed some input! 'try' fixes this behavior
|
||||
-- and allows backtracking to work:
|
||||
@ -603,17 +572,17 @@ class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
|
||||
lookAhead :: m a -> m a
|
||||
|
||||
-- | @notFollowedBy p@ only succeeds when parser @p@ fails. This parser
|
||||
-- /never consumes/ any input and /never modifies/ parser state. It can be
|
||||
-- used to implement the “longest match” rule.
|
||||
-- | @notFollowedBy p@ only succeeds when the parser @p@ fails. This
|
||||
-- parser /never consumes/ any input and /never modifies/ parser state. It
|
||||
-- can be used to implement the “longest match” rule.
|
||||
|
||||
notFollowedBy :: m a -> m ()
|
||||
|
||||
-- | @withRecovery r p@ allows continue parsing even if parser @p@ fails.
|
||||
-- In this case @r@ is called with actual 'ParseError' as its argument.
|
||||
-- Typical usage is to return value signifying failure to parse this
|
||||
-- particular object and to consume some part of input up to start of next
|
||||
-- object.
|
||||
-- In this case @r@ is called with the actual 'ParseError' as its
|
||||
-- argument. Typical usage is to return a value signifying failure to
|
||||
-- parse this particular object and to consume some part of the input up
|
||||
-- to the point where the next object starts.
|
||||
--
|
||||
-- Note that if @r@ fails, original error message is reported as if
|
||||
-- without 'withRecovery'. In no way recovering parser @r@ can influence
|
||||
@ -626,8 +595,8 @@ class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
-> m a -- ^ Original parser
|
||||
-> m a -- ^ Parser that can recover from failures
|
||||
|
||||
-- | @observing p@ allows to “observe” failure of @p@ parser, should it
|
||||
-- happen, without actually ending parsing, but instead getting the
|
||||
-- | @observing p@ allows to “observe” failure of the @p@ parser, should
|
||||
-- it happen, without actually ending parsing, but instead getting the
|
||||
-- 'ParseError' in 'Left'. On success parsed value is returned in 'Right'
|
||||
-- as usual. Note that this primitive just allows you to observe parse
|
||||
-- errors as they happen, it does not backtrack or change how the @p@
|
||||
@ -668,7 +637,7 @@ class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
-> Maybe (Token s) -- ^ Token to report when input stream is empty
|
||||
-> m a
|
||||
|
||||
-- | The parser @tokens test@ parses list of tokens and returns it.
|
||||
-- | The parser @tokens test@ parses a list of tokens and returns it.
|
||||
-- Supplied predicate @test@ is used to check equality of given and parsed
|
||||
-- tokens.
|
||||
--
|
||||
@ -699,11 +668,11 @@ class (ErrorComponent e, Stream s, A.Alternative m, MonadPlus m)
|
||||
-- ^ List of tokens to parse
|
||||
-> m [Token s]
|
||||
|
||||
-- | Returns the full parser state as a 'State' record.
|
||||
-- | Return the full parser state as a 'State' record.
|
||||
|
||||
getParserState :: m (State s)
|
||||
|
||||
-- | @updateParserState f@ applies function @f@ to the parser state.
|
||||
-- | @updateParserState f@ applies the function @f@ to the parser state.
|
||||
|
||||
updateParserState :: (State s -> State s) -> m ()
|
||||
|
||||
@ -892,8 +861,8 @@ infix 0 <?>
|
||||
(<?>) :: MonadParsec e s m => m a -> String -> m a
|
||||
(<?>) = flip label
|
||||
|
||||
-- | The parser @unexpected item@ always fails with an error message telling
|
||||
-- about unexpected item @item@ without consuming any input.
|
||||
-- | The parser @unexpected item@ fails with an error message telling about
|
||||
-- unexpected item @item@ without consuming any input.
|
||||
|
||||
unexpected :: MonadParsec e s m => ErrorItem (Token s) -> m a
|
||||
unexpected item = failure (E.singleton item) E.empty E.empty
|
||||
@ -914,9 +883,9 @@ match p = do
|
||||
return (streamTake (tp' - tp) s, r)
|
||||
|
||||
-- | Specify how to process 'ParseError's that happen inside of this
|
||||
-- wrapper. As a side effect of current implementation changing 'errorPos'
|
||||
-- with this combinator will also change the final 'statePos' in parser
|
||||
-- state.
|
||||
-- wrapper. As a side effect of the current implementation changing
|
||||
-- 'errorPos' with this combinator will also change the final 'statePos' in
|
||||
-- the parser state.
|
||||
--
|
||||
-- @since 5.3.0
|
||||
|
||||
@ -961,8 +930,8 @@ setInput s = updateParserState (\(State _ pos tp w) -> State s pos tp w)
|
||||
getPosition :: MonadParsec e s m => m SourcePos
|
||||
getPosition = NE.head . statePos <$> getParserState
|
||||
|
||||
-- | Get position where the next token in the stream begins. If the stream
|
||||
-- is empty, return 'Nothing'.
|
||||
-- | Get the position where the next token in the stream begins. If the
|
||||
-- stream is empty, return 'Nothing'.
|
||||
--
|
||||
-- @since 5.3.0
|
||||
|
||||
@ -980,9 +949,8 @@ setPosition :: MonadParsec e s m => SourcePos -> m ()
|
||||
setPosition pos = updateParserState $ \(State s (_:|z) tp w) ->
|
||||
State s (pos:|z) tp w
|
||||
|
||||
-- | Push given position into stack of positions and continue parsing
|
||||
-- working with this position. Useful for working with include files and the
|
||||
-- like.
|
||||
-- | Push a position into stack of positions and continue parsing working
|
||||
-- with this position. Useful for working with include files and the like.
|
||||
--
|
||||
-- See also: 'getPosition', 'setPosition', 'popPosition', and 'SourcePos'.
|
||||
--
|
||||
@ -992,9 +960,9 @@ pushPosition :: MonadParsec e s m => SourcePos -> m ()
|
||||
pushPosition pos = updateParserState $ \(State s z tp w) ->
|
||||
State s (NE.cons pos z) tp w
|
||||
|
||||
-- | Pop a position from stack of positions unless it only contains one
|
||||
-- element (in that case stack of positions remains the same). This is how
|
||||
-- to return to previous source file after 'pushPosition'.
|
||||
-- | Pop a position from the stack of positions unless it only contains one
|
||||
-- element (in that case the stack of positions remains the same). This is
|
||||
-- how to return to previous source file after 'pushPosition'.
|
||||
--
|
||||
-- See also: 'getPosition', 'setPosition', 'pushPosition', and 'SourcePos'.
|
||||
--
|
||||
@ -1006,14 +974,14 @@ popPosition = updateParserState $ \(State s z tp w) ->
|
||||
Nothing -> State s z tp w
|
||||
Just z' -> State s z' tp w
|
||||
|
||||
-- | Get number of tokens processed so far.
|
||||
-- | Get the number of tokens processed so far.
|
||||
--
|
||||
-- @since 5.2.0
|
||||
|
||||
getTokensProcessed :: MonadParsec e s m => m Word
|
||||
getTokensProcessed = stateTokensProcessed <$> getParserState
|
||||
|
||||
-- | Set number of tokens processed so far.
|
||||
-- | Set the number of tokens processed so far.
|
||||
--
|
||||
-- @since 5.2.0
|
||||
|
||||
@ -1021,20 +989,21 @@ setTokensProcessed :: MonadParsec e s m => Word -> m ()
|
||||
setTokensProcessed tp = updateParserState $ \(State s pos _ w) ->
|
||||
State s pos tp w
|
||||
|
||||
-- | Return tab width. Default tab width is equal to 'defaultTabWidth'. You
|
||||
-- can set different tab width with help of 'setTabWidth'.
|
||||
-- | Return the tab width. The default tab width is equal to
|
||||
-- 'defaultTabWidth'. You can set a different tab width with the help of
|
||||
-- 'setTabWidth'.
|
||||
|
||||
getTabWidth :: MonadParsec e s m => m Pos
|
||||
getTabWidth = stateTabWidth <$> getParserState
|
||||
|
||||
-- | Set tab width. If argument of the function is not positive number,
|
||||
-- 'defaultTabWidth' will be used.
|
||||
-- | Set tab width. If the argument of the function is not a positive
|
||||
-- number, 'defaultTabWidth' will be used.
|
||||
|
||||
setTabWidth :: MonadParsec e s m => Pos -> m ()
|
||||
setTabWidth w = updateParserState $ \(State s pos tp _) ->
|
||||
State s pos tp w
|
||||
|
||||
-- | @setParserState st@ set the full parser state to @st@.
|
||||
-- | @setParserState st@ sets the parser state to @st@.
|
||||
|
||||
setParserState :: MonadParsec e s m => State s -> m ()
|
||||
setParserState st = updateParserState (const st)
|
||||
@ -1062,14 +1031,14 @@ parse
|
||||
-> Either (ParseError (Token s) e) a
|
||||
parse = runParser
|
||||
|
||||
-- | @parseMaybe p input@ runs parser @p@ on @input@ and returns result
|
||||
-- inside 'Just' on success and 'Nothing' on failure. This function also
|
||||
-- parses 'eof', so if the parser doesn't consume all of its input, it will
|
||||
-- fail.
|
||||
-- | @parseMaybe p input@ runs the parser @p@ on @input@ and returns the
|
||||
-- result inside 'Just' on success and 'Nothing' on failure. This function
|
||||
-- also parses 'eof', so if the parser doesn't consume all of its input, it
|
||||
-- will fail.
|
||||
--
|
||||
-- The function is supposed to be useful for lightweight parsing, where
|
||||
-- error messages (and thus file name) are not important and entire input
|
||||
-- should be parsed. For example it can be used when parsing of single
|
||||
-- should be parsed. For example it can be used when parsing of a single
|
||||
-- number according to specification of its format is desired.
|
||||
|
||||
parseMaybe :: (ErrorComponent e, Stream s) => Parsec e s a -> s -> Maybe a
|
||||
@ -1078,7 +1047,7 @@ parseMaybe p s =
|
||||
Left _ -> Nothing
|
||||
Right x -> Just x
|
||||
|
||||
-- | The expression @parseTest p input@ applies a parser @p@ against input
|
||||
-- | The expression @parseTest p input@ applies the parser @p@ against input
|
||||
-- @input@ and prints the result to stdout. Useful for testing.
|
||||
|
||||
parseTest :: ( ShowErrorComponent e
|
||||
@ -1093,7 +1062,7 @@ parseTest p input =
|
||||
Left e -> putStr (parseErrorPretty e)
|
||||
Right x -> print x
|
||||
|
||||
-- | @runParser p file input@ runs parser @p@ on the input list of tokens
|
||||
-- | @runParser p file input@ runs parser @p@ on the input stream of tokens
|
||||
-- @input@, obtained from source @file@. The @file@ is only used in error
|
||||
-- messages and may be the empty string. Returns either a 'ParseError'
|
||||
-- ('Left') or a value of type @a@ ('Right').
|
||||
@ -1330,7 +1299,7 @@ fixs' _ (Right (b,s,w)) = (Right b, s, w)
|
||||
-- Debugging
|
||||
|
||||
-- | @dbg label p@ parser works exactly like @p@, but when it's evaluated it
|
||||
-- prints information useful for debugging. The @label@ is only used to
|
||||
-- also prints information useful for debugging. The @label@ is only used to
|
||||
-- refer to this parser in the debugging output. This combinator uses the
|
||||
-- 'trace' function from "Debug.Trace" under the hood.
|
||||
--
|
||||
@ -1339,17 +1308,17 @@ fixs' _ (Right (b,s,w)) = (Right b, s, w)
|
||||
-- print-out. As of current version, this combinator prints all available
|
||||
-- information except for /hints/, which are probably only interesting to
|
||||
-- the maintainer of Megaparsec itself and may be quite verbose to output in
|
||||
-- general. Let me know if you would like to be able to see hints as part of
|
||||
-- general. Let me know if you would like to be able to see hints in the
|
||||
-- debugging output.
|
||||
--
|
||||
-- The output itself is pretty self-explanatory, although the following
|
||||
-- abbreviations should be clarified (they are derived from low-level source
|
||||
-- code):
|
||||
-- abbreviations should be clarified (they are derived from the low-level
|
||||
-- source code):
|
||||
--
|
||||
-- * @COK@ — “consumed OK”. The parser consumed input and succeeded.
|
||||
-- * @CERR@ — “consumed error”. The parser consumed input and failed.
|
||||
-- * @EOK@ — “empty OK”. The parser succeeded without consuming input.
|
||||
-- * @EERR@ — “empty error”. The parser failed without consuming input.
|
||||
-- * @COK@—“consumed OK”. The parser consumed input and succeeded.
|
||||
-- * @CERR@—“consumed error”. The parser consumed input and failed.
|
||||
-- * @EOK@—“empty OK”. The parser succeeded without consuming input.
|
||||
-- * @EERR@—“empty error”. The parser failed without consuming input.
|
||||
--
|
||||
-- Finally, it's not possible to lift this function into some monad
|
||||
-- transformers without introducing surprising behavior (e.g. unexpected
|
||||
@ -1383,7 +1352,7 @@ dbg lbl p = ParsecT $ \s cok cerr eok eerr ->
|
||||
l (DbgEERR (streamTake (streamDelta s s') (stateInput s)) err)
|
||||
in unParser p s cok' cerr' eok' eerr'
|
||||
|
||||
-- | Single piece of info to be rendered with 'dbgLog'.
|
||||
-- | A single piece of info to be rendered with 'dbgLog'.
|
||||
|
||||
data DbgItem s e a
|
||||
= DbgIn [Token s]
|
||||
@ -1432,13 +1401,13 @@ streamDelta
|
||||
-> Word -- ^ Number of consumed tokens
|
||||
streamDelta s0 s1 = stateTokensProcessed s1 - stateTokensProcessed s0
|
||||
|
||||
-- | Extract given number of tokens from the stream.
|
||||
-- | Extract a given number of tokens from the stream.
|
||||
|
||||
streamTake :: Stream s => Word -> s -> [Token s]
|
||||
streamTake n s = genericTake n (unfold s)
|
||||
|
||||
-- | Custom version of 'unfold' that matches signature of 'uncons' method in
|
||||
-- 'Stream' type class we use.
|
||||
-- | A custom version of 'unfold' that matches signature of the 'uncons'
|
||||
-- method in the 'Stream' type class we use.
|
||||
|
||||
unfold :: Stream s => s -> [Token s]
|
||||
unfold s = case uncons s of
|
||||
|
Loading…
Reference in New Issue
Block a user