Update readme

2024-11-25 08:19:17 +03:00 · 2022-05-16 21:24:02 -05:00 · 2022-05-16 21:24:02 -05:00 · 980e8c27cd
commit 980e8c27cd
parent 09bf0f643e
1 changed files with 48 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -71,6 +71,54 @@ Currently, there is only one html parser published to Elm packages: [hecrj/elm-h
 - `npm test` to run tests
 - `npm docs` to preview docs locally

+## Technical notes
+
+### Parsing text
+
+One source of parser complexity is text.
+
+Text in lenient HTML is basically "anything that wasn't parsed by the other parsers."
+
+This means that you can't have a simple parser like:
+
+```elm
+parser : Parser Node
+parser =
+    oneOf
+        [ element
+        , comment
+        , text
+        ]
+```
+
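+The difficulty is defining a `text` parser that neither underconsumes (for example, "parse anything until `'<'`" stops at a stray `<` that lenient HTML should treat as text) nor overconsumes (swallowing the start of a real element or comment).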
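+
+To make the underconsumption case concrete, here is a minimal sketch of the naive "anything until `'<'`" approach. The name `naiveText` is hypothetical and is not part of this package; it only uses `chompWhile` and `getChompedString` from `elm/parser`.
+
+```elm
+import Parser exposing (Parser, chompWhile, getChompedString)
+
+
+-- Hypothetical naive text parser: chomp everything up to the next '<'.
+naiveText : Parser String
+naiveText =
+    chompWhile (\c -> c /= '<')
+        |> getChompedString
+
+
+-- Parser.run naiveText "1 < 2 is true" == Ok "1 "
+-- The '<' here is plain text, but the parser stops in front of it,
+-- so " 2 is true" is left for some other parser to deal with.
+```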
+
+The best way I can think of accomplishing this with `elm/parser` is to, inside a loop, try all of your other parsers and then, if they all fail, consume a single character before looping again.
+
+Something like this:
+
+```elm
+parser : Parser (List Node)
+parser =
+    loop [] <|
+        \acc ->
+            oneOf
+                [ element |> map (\node -> Loop (node :: acc))
+                , comment |> map (\node -> Loop (node :: acc))
+                -- No other parser matched: keep one character as text.
+                , chompIf (\_ -> True)
+                    |> getChompedString
+                    |> map (\str -> Loop (Text str :: acc))
+                , succeed (Done (List.reverse acc))
+                ]
+```
+
+It's not nice and simple anymore. And since it's not possible to make an exhaustive `text` parser, I've had to repeat this kind of logic in various places.
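+
+One consequence of a loop like the one above is that every fallback character becomes its own `Text` node. A possible follow-up step, sketched here under assumptions, is to merge adjacent text nodes after parsing. The `Node` type below is a hypothetical stand-in (only `Text String` is implied by the loop above), and `mergeText` is an illustrative helper, not necessarily how this package does it.
+
+```elm
+-- Hypothetical node type, for illustration only.
+type Node
+    = Text String
+    | Comment String
+
+
+-- Join runs of adjacent Text nodes into a single Text node.
+mergeText : List Node -> List Node
+mergeText nodes =
+    List.foldr
+        (\node acc ->
+            case ( node, acc ) of
+                ( Text a, (Text b) :: rest ) ->
+                    Text (a ++ b) :: rest
+
+                _ ->
+                    node :: acc
+        )
+        []
+        nodes
+
+
+-- mergeText [ Text "a", Text "b", Comment "c", Text "d" ]
+--     == [ Text "ab", Comment "c", Text "d" ]
+```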
+
+### The `LookAhead` parser
+
+TODO
+
 ## Special thanks

 - @hecrj and their contributors.