basic syntax highlighting
added more syntax highlighting coverage
add example of a markdown table with styling
move FIXED_TOKEN logic into highlight
refactor highlight, add support for backpassing
escape html from source code
fix bug with <pre> tag ordering
refactor out html from roc_parse
remove test, put highlight functionality into separate file
fix typo
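
For the "escape html from source code" step, a minimal sketch of what such escaping could look like is below. The name `escape_html` is hypothetical and is not taken from the repository; the actual implementation may cover a different set of characters.

```rust
/// Hypothetical helper (name and character set are assumptions, not the
/// repo's actual code): escape characters that are significant in HTML so
/// that source code can be embedded safely inside <pre>/<code> tags.
fn escape_html(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    for ch in src.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}
```

Escaping has to happen before the highlighter wraps tokens in its own tags, otherwise the generated markup would be escaped as well.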
* Unify parsing of string literals and scalar literals, to (e.g.) ensure escapes are handled uniformly. Notably, this makes unicode escapes valid in scalar literals.
* Add a variety of custom error messages about specific failure cases of parsing string/scalar literals. For example, if we're expecting a string (e.g. a package name in the header) and the user tried using single quotes, give a clear message about that.
* Fix formatting of unicode escapes (they previously used {}, now correctly use () to match Roc strings)
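
To illustrate the `()` versus `{}` point, here is a small sketch of formatting and parsing a Roc-style unicode escape. Both function names are hypothetical, and the uppercase hex output is an assumption rather than something the commit specifies.

```rust
/// Hypothetical: render a char as a Roc-style escape, e.g. \u(1F600).
/// Roc uses parentheses here, unlike Rust's own \u{1F600} syntax.
/// Uppercase hex is an assumption made for this sketch.
fn format_unicode_escape(c: char) -> String {
    format!("\\u({:X})", c as u32)
}

/// Hypothetical: parse a Roc-style escape back into a char.
/// Returns None for the wrong delimiters, bad hex, or an invalid code point.
fn parse_unicode_escape(s: &str) -> Option<char> {
    let hex = s.strip_prefix("\\u(")?.strip_suffix(')')?;
    let code = u32::from_str_radix(hex, 16).ok()?;
    char::from_u32(code)
}
```

Sharing one routine like this between string and scalar literals is the kind of uniformity the first bullet describes.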
... and extract a shared helper with collection_trailing_sep_e, thereby resolving some inconsistencies with normal collection parsing.
... and delete some dead code while I'm at it
* test_fmt moves out of fmt crate
* test_parse _mostly_ moves out of parse crate and into `test_snapshots.rs` (some simple tests remain)
* now there are only two fuzz targets, fuzz_expr and fuzz_module, that cover both parsing and formatting
* added a system to auto-add new snapshot entries for new test files
* took some commented-out tests in `test_parse` and converted them to snapshot tests
* moved test_fmt's verification of formatting consistency into test_snapshots
* fixed a huge derp on my part where the fmt fuzzer in #4758 was completely useless (broken by refactoring just prior to submitting the PR)
* fixed a formatting bug found by fuzzing (bound_variable.expr.roc) - that I missed earlier due to ^^^ that derp
* no longer have roc_test_utils as a dependency in fmt - which was causing problems for the wasm build
This commit adds fuzzing for the (expr) formatter, with the same invariants that we use for fmt tests:
* We start with text, which we parse
* We format the AST, which must succeed
* We parse the formatted output back into an AST and make sure it's identical to the first one, ignoring whitespace+comments
* We format the new AST and assert it's equal to the first formatted version ("idempotency")
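
The invariants above can be sketched as a generic check. This is an illustration under stated assumptions, not the actual fuzz target: `check_format_invariants`, `ToyAst`, `toy_parse`, and `toy_format` are all hypothetical names, and the toy AST stores no whitespace or comments, so a plain equality check stands in for the "ignoring whitespace+comments" comparison.

```rust
/// Hypothetical toy AST: a list of words, with no whitespace or comments stored.
#[derive(Debug, PartialEq)]
struct ToyAst {
    words: Vec<String>,
}

/// Toy parser: split on whitespace. A real parser could fail and return None.
fn toy_parse(text: &str) -> Option<ToyAst> {
    Some(ToyAst {
        words: text.split_whitespace().map(str::to_string).collect(),
    })
}

/// Toy formatter: join words with single spaces.
fn toy_format(ast: &ToyAst) -> String {
    ast.words.join(" ")
}

/// Check the parse -> format -> reparse -> reformat invariants for one input.
fn check_format_invariants<Ast, P, F>(text: &str, parse: P, format: F) -> Result<(), String>
where
    Ast: PartialEq + std::fmt::Debug,
    P: Fn(&str) -> Option<Ast>,
    F: Fn(&Ast) -> String,
{
    // Inputs that do not parse are not interesting for the formatter.
    let Some(ast1) = parse(text) else { return Ok(()) };

    let out1 = format(&ast1);

    // The formatted output must itself parse.
    let ast2 = parse(&out1).ok_or_else(|| format!("formatted output failed to parse: {out1:?}"))?;

    // Formatting must not change the meaning of the code.
    if ast1 != ast2 {
        return Err(format!("AST changed: {ast1:?} vs {ast2:?}"));
    }

    // Formatting a second time must be a no-op (idempotency).
    let out2 = format(&ast2);
    if out1 != out2 {
        return Err(format!("not idempotent: {out1:?} vs {out2:?}"));
    }
    Ok(())
}
```

A fuzz target would call a check like this on arbitrary input and panic on `Err`, which is how both formatter and parser bugs can surface from the same harness.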
Interestingly, while a lot of bugs this found were in the formatter, it also found some parsing bugs.
It then fixes a bunch of bugs that fell out:
* Some small oversights in RemoveSpaces
* Make sure `_a` doesn't parse as an inferred type (`_`) followed by an identifier (parsing bug!)
* Call `extract_spaces` on a parsed expr before matching on it, lest it be Expr::SpaceBefore - when parsing aliases
* A few cases where the formatter generated invalid/different code
* Numerous formatting bugs that caused the formatting to not be idempotent
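
As an illustration of the `extract_spaces` item in the list above, the sketch below shows why matching directly on an expression can miss a case when it is wrapped in a space node. The `Expr` enum here is a heavily simplified, hypothetical stand-in for the parser's real AST, and this `extract_spaces` only returns the inner expression, which is an assumption about its shape rather than its actual signature.

```rust
/// Hypothetical, simplified stand-in for the parser's expression type.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    /// An expression preceded by whitespace/comments (stored as a String here).
    SpaceBefore(Box<Expr>, String),
    /// An expression followed by whitespace/comments.
    SpaceAfter(Box<Expr>, String),
}

/// Simplified sketch: peel off any space wrappers and return the inner expr.
fn extract_spaces(expr: &Expr) -> &Expr {
    let mut cur = expr;
    loop {
        match cur {
            Expr::SpaceBefore(inner, _) | Expr::SpaceAfter(inner, _) => cur = &**inner,
            _ => return cur,
        }
    }
}

/// Buggy pattern: matches on the expression as-is, so a wrapped Var is missed.
fn naive_is_var(expr: &Expr) -> bool {
    matches!(expr, Expr::Var(_))
}

/// Fixed pattern: strip space wrappers before matching.
fn is_var(expr: &Expr) -> bool {
    matches!(extract_spaces(expr), Expr::Var(_))
}
```

With an input such as a `Var` preceded by a comment, `naive_is_var` returns false while `is_var` returns true, which is the class of bug the alias-parsing fix addresses.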
The last point there is worth talking further about. There were several cases where the old code was trying to enforce strong
opinions about how to insert newlines in function types and defs. In both of those cases, it looked like the goals of
(1) idempotency, (2) giving the user some say in the output, and (3) these strong opinions - were often in conflict.
For these cases, I erred on the side of following the user's existing choices about where to put newlines.
We can go back and re-add these strong opinions later - but this seemed like the right approach for now.
On my M1 Mac this shows as ~25% faster at parsing Num.roc than the old implementation, probably because nobody wrote any NEON code for the old one.
Even on my x86_64 Linux box (Ryzen 2700x), this shows as 10% faster than the current SSE implementation (running with RUSTFLAGS="-C target-cpu=native").