Translate syntax warnings and attach to IR when translating operator applications.
We should ensure that all Trees are checked for warnings and every warning is attached to some IR. That would require a bit of refactoring: In TreeToIr, we could define helpers wrapping every IR constructor and accepting a `Tree` parameter. The `Tree` could be used to populate the `IdentifiedLocation` when constructing the IR type, and then to attach all warnings after constructing the IR object.
# Important Notes
- Update JNI dependency.
- Introduces a `cargo bench` runner for parser.
In a sequence of value-level operators, whitespace does not affect relative precedence. Functional operators still follow the space-precedence rules.
The "functional" operators are: `>> << |> |>> <| <<| : .`, application, and any operator containing `<-` or `->`. All other operators are considered value-level operators.
Asymmetric whitespace can still be used to form *operator sections* of value-level operators, e.g. `+2 * 3` is still equivalent to `x -> (x+2) * 3`.
Precedence of application is unchanged, so `f x+y` is still equivalent to `f (x + y)` and `f x+y * z` is still equivalent to `(f (x + y)) * z`.
Any attempt to use spacing to override value-level operator precedence will be caught by the new enso linter. Mixed spacing (for clarity) in value-operator expressions is allowed, as long as it is consistent with the precedences of the operators.
Closes#10366.
# Important Notes
Precedence warnings:
- The parser emits a warning if the whitespace in an expression is inconsistent with its effective precedence.
- A new enso linter can be run with `./run libraries lint`. It parses all `.enso` files in `distribution/lib` and `test`, and reports any errors or warnings. It can also be run on individual files: `cargo run --release --bin check_syntax -- file1 file2...` (the result may be easier to read than the `./run` output).
- The linter is also run as part of `./run lint`, so it is checked in CI.
Additional language change:
- The exponentiation operator (`^`) now has higher precedence than the multiplication class (`*`, `/`, `%`). This change did not affect any current enso files.
Library changes:
- The libraries have been updated. The new warnings were used to identify all affected code; the changes themselves have not been programmatically verified (in many cases their equivalence relies on the commutativity of string concatenation).
Single-phase whitespace-aware precedence resolution.
#### Performance
![newplot(4)](https://github.com/user-attachments/assets/9822b0dc-17c3-4d2d-adf7-eb8b1c240522)
Since this is a major refactor of the core of the parser, I benchmarked it; it's about 3% faster.
# Important Notes
- Move operator-identifier recognition to lexer.
- Move compound-token assembly out of precedence resolver
Add support for private methods. Most of the changes are in parser and compiler. The runtime checking of private functions was already present since #9692
# Important Notes
- Only top-level methods can be declared `private`.
- private method cannot be called from different project
- private method cannot be accessed from polyglot code (private method does not exist for polyglot code)
Closes#8836.
Atom constructors can be declared as private (project-private). project-private constructors can be called only from the same project. See the encapsulation.md docs for more info.
---------
Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Co-authored-by: Hubert Plociniczak <hubert.plociniczak@gmail.com>
Co-authored-by: Kaz Wesley <kaz@lambdaverse.org>
* Text literals: Accept unpaired-surrogate escape codes.
Unpaired surrogates are not allowed by Unicode, but they occur in practice
because many systems accept them; for example, they may be present in filenames
on Windows (which are otherwise constrained to UTF-16).
Programs written in Enso should be able to work with them, if only because they
represent edge cases that should be tested when converting encodings and at
other system boundaries.
- Generalize the representation of interpreted-text-escapes in the lexer, so
that we are not tied to the strict Unicode of Rust's `str`.
- Move some doc-comment code from the parser to test utilities.
- Simplify token serialization.
* Reduce parser dependencies
- `enso-parser-syntax-tree-visitor` is now only used when building tests and debug tools.
- Remove `enso-logging` crate and its macros.
- The main bin for `enso-parser` has been moved to a `check_syntax` tool in `enso-parser-debug`.
This PR updates the Rust toolchain to recent nightly.
Most of the changes are related to fixing newly added warnings and adjusting the feature flags. Also the formatter changed its behavior slightly, causing some whitespace changes.
Other points:
* Changed debug level of the `buildscript` profile to `lint-tables-only` — this should improve the build times and space usage somewhat.
* Moved lint configuration to the worksppace `Cargo.toml` definition. Adjusted the formatter appropriately.
* Removed auto-generated IntelliJ run configurations, as they are not useful anymore.
* Added a few trivial stdlib nightly functions that were removed to our codebase.
* Bumped many dependencies but still not all:
* `clap` bump encountered https://github.com/clap-rs/clap/issues/5407 — for now the warnings were silenced by the lint config.
* `octocrab` — our forked diverged to far with the original, needs more refactoring.
* `derivative` — is unmaintained and has no updated version, despite introducing warnings in the generated code. There is no direct replacement.
* Test describing the current behavior of chained if then else application
* Chained block should behave just like Group around if_then_else
* Finishing line on BlockStart fixes if_then_else_chained_block
* Only finish the line when there was not start of a macro segment
* Fix tests
* Refine else-body with macro patterns.
* Update test syntax to maintain original semantics
* Few additional tests
---------
Co-authored-by: Kaz Wesley <kaz@lambdaverse.org>
Implements #6166.
# Important Notes
- More consistent handling of `default` arguments. `default` is a valid identifier, and only has special meaning when it isn't bound in scope. Since distinguishing the builtin `default` from an identifier called `default` cannot be done until alias analysis has been performed, `default` is now represented in the AST as a regular identifier.
- `TreeToIr`: Remove `insideTypeAscription`. It was only used for bug-for-bug compatibility with the old parser during the transition.
Add `line:column` information to source code references produced by the parser. This information will be used by GUI2 as part of the solution to #8134.
# Important Notes
- `parse_all_enso_files.sh` has been used to ensure this doesn't affect tree structures.
- `parse_all_enso_files.sh` now checks emitted locations for consistency, and has been used to verify that all line:col references match the values found by an independent scan of the source up to the given UTF8 position.
- Validate spans during existing lexer and parser unit tests, and in `enso_parser_debug`.
- Fix lost span info causing failures of updated tests.
# Important Notes
- [x] Output of `parse_all_enso_files.sh` is unchanged since before #7881 (modulo libs changes since then).
- When the parser encounters an input with the first line indented, it now creates a sub-block for lines at than indent level, and emits a syntax error (every indented block must have a parent).
- When the parser encounters a number with a base but no digits (e.g. `0x`), it now emits a `Number` with `None` in the digits field rather than a 0-length digits token.
Generate TS bindings and lazy deserialization for the parser types.
# Important Notes
- The new API is imported into `ffi.ts`, but not yet used.
- I have tested the generated code in isolation, but cannot commit tests as we are not currently able to load WASM modules when running in `vitest`.
Adds the ability to declare a module as *private*. Modifies the parser to add the `private` keyword as a reserved keyword. All the checks for private modules are implemented as an independent *Compiler pass*. No checks are done at runtime.
# Important Notes
- Introduces new keyword - `private` - a reserved keyword.
- Modules that have `private` keyword as the first statement are declared as *private* (Project private)
- Public module cannot have private submodules and vice versa.
- This would require runtime access checks
- See #7088 for the specification.
Resolve macros eagerly. Improves performance; allows parser to handle arbitrarily-long lines (fixes#7691).
# Important Notes
- A new utility, `lib/rust/parser/debug/tools/parse_all_enso_files.sh`, supports comparing ASTs parsed with different versions of the parser. This tool has been used to verify that this refactor doesn't change the result of parsing any standard library or test file.
Fixes#5826.
# Important Notes
- Change frontend representation of negation.
- Fix a precedence issue: The `.` operators in -1.x and -1.2 must have different precedences.
- Remove a no-longer-needed special case from backend translation.
- Add tests for this case after all translations.
Implement new Enso documentation parser; remove old Scala Enso parser.
Performance: Total time parsing documentation is now ~2ms.
# Important Notes
- Doc parsing is now done only in the frontend.
- Some engine tests had never been switched to the new parser. We should investigate tests that don't pass after the switch: #5894.
- The option to run the old searcher has been removed, as it is obsolete and was already broken before this (see #5909).
- Some interfaces used only by the old searcher have been removed.
`@` should not be legal to use as a binary operator. I accepted it in the parser because it occurred in the .enso sources, but it was actually used to create a syntax error to test error recovery.
See: https://www.pivotaltracker.com/story/show/184054024
Libraries: Revert changes that were necessitated by a new rule we have decided not to introduce.
Parser:
- Support mixed constructors/bindings in types.
- Disallow zero-length hex sequences in character escapes: `\x`, `\u`, `\u{}`, `\U`, `\U{}` are no longer legal synonyms for `\0` (matches old parser behavior).
Ensure all tokens from the input are represented in trees resulting from invalid inputs--tests now cover every reachable code line that creates an `Invalid` node. (Also implemented stricter validation, mainly of `import`/`export` statements.)
See: https://www.pivotaltracker.com/story/show/183405907
Fix bugs in `TreeToIr` (rewrite) and parser. Implement more undocumented features in parser. Emulate some old parser bugs and quirks for compatibility.
Changes in libs:
- Fix some bugs.
- Clean up some odd syntaxes that the old parser translates idiosyncratically.
- Constructors are now required to precede methods.
# Important Notes
Out of 221 files:
- 215 match the old parser
- 6 contain complex types the old parser is known not to handle correctly
So, compared to the old parser, the new parser parses 103% of files correctly.
- Special precedence rules for case-of so that `:` operator works without parens or nospace-grouping.
- Support an old-lambda syntax: `x->x-> x`. According to the usual rules, the first nospace group would be parsed as an operator section. The expression now parses as a lambda that contains a lambda.
- Match old parser treatment of # in doc comments.
- Tweak precedence so (a : B = c) works.
- Documented constructors.
- Implement macro-contexts-lite (`from` is now only a keyword at the beginning of a line)
- Support special nospace-group handling for old lambdas (so expressions like this work: `x-> y-> x + y`)
- Fix a text-escape incompatibility
# Important Notes
- There is now an `OperatorFunction`, which is like a `Function` but has an operator for a name, and likewise an `OperatorTypeSignature`.