elm-html-parser
Note: Not currently published to Elm packages.
A lenient html5 parser implemented in Elm.
A lenient alternative to hecrj/elm-html-parser.
Usage
- run to parse an html string into a list of html nodes.
- runDocument to parse <!doctype html>[...] into a root node.
    import Html.Parser

    Html.Parser.run "<p class=greeting>hello <strong>world</strong></p>"
    -- Ok
    --     [ Element "p" [ ( "class", "greeting" ) ]
    --         [ Text "hello "
    --         , Element "strong" [] [ Text "world" ]
    --         ]
    --     ]
Rendering:
- nodeToHtml or nodesToHtml to render parsed nodes into virtual dom nodes that Elm can render.
- nodeToString and nodesToString to render parsed nodes into a string.
- nodeToPrettyString and nodesToPrettyString to render parsed nodes into indented strings.
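A minimal sketch of parsing and rendering together. It assumes the signatures implied by the descriptions above: that run takes a String and returns a Result whose Ok value is a list of nodes, and that nodesToString takes that list and returns a String. The module name Example and the helper normalize are hypothetical and not part of the package.

```elm
module Example exposing (normalize)

import Html.Parser


{-| Parse an html string and render the parsed nodes back into a string.

Hypothetical helper for illustration only. The exact error type returned by
`Html.Parser.run` is not relied on here.
-}
normalize : String -> String
normalize input =
    case Html.Parser.run input of
        Ok nodes ->
            -- Assumed signature: nodesToString takes a list of nodes.
            Html.Parser.nodesToString nodes

        Err _ ->
            -- Assumption: on a parse error, hand the input back unchanged.
            input
```

Because the parser prefers falling back to text nodes over failing, the Err branch should rarely be reached, but the Result still has to be handled before rendering.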
Goals
- Leniency
  - Avoids validating while parsing
  - Prefers to imitate browser parsing behavior rather than the html5 spec
  - Prefers to use the html5 spec only to handle ambiguous cases rather than to prohibit invalid html5
  - Prefers to fall back to text nodes rather than short-circuit with parse errors
- Handle user-written html
  - Users don't write character entities like &amp; and &lt;. This parser should strive to handle cases like <p><:</p> -> Element "p" [] [ Text "<:" ].
Features / Quirks
- Characters don't need to be escaped into entities.
  e.g. <div><:</div> will parse correctly and doesn't need to be rewritten into <div>&lt;:</div>.
- Tags that should not nest are autoclosed.
  e.g. <p>a<p>b -> <p>a</p><p>b</p>
- Closing tags that have no matching open tags are ignored.
  e.g. </a><div></div></div></b> -> <div></div>
- Ignores comments in whitespace positions.
  e.g. <div <!--comment-->/> -> <div/>
- Parses comments in text node positions.
  e.g. <div><!--comment--></div> -> Element "div" [] [ Comment "comment" ]
Differences from existing packages
Currently, there is only one html parser published to Elm packages: hecrj/elm-html-parser.
@hecrj has said that following the html5 spec is a goal of their parser, so their parser is stricter by design and rejects invalid html5.
Development
git clone and npm install.
- npm test to run tests
- npm docs to preview docs locally
Special thanks
- @hecrj and their contributors.
- @ymtszw for their work on the JavaScript <script> parser.