This is the latest `master` — it works fine now that we build our own version
of `web-tree-sitter`.
It's twice as big as the last stable release. I don't know why, but I've been
using it for about a week without any trouble.
Predicates were getting pretty chaotic. With the namespace, it's now clearer
where a predicate may be defined and what its purpose is. It also helps resolve
ambiguity — e.g., both folds and highlights can have an `endAt` predicate, so
now they're disambiguated as `fold.endAt` and `adjust.endAt`, respectively.
Namespaces:
* `highlight` for highlight-specific settings (of which there is presently only
one)
* `test` for scope tests (currently used by highlights and indents)
* `adjust` for scope adjustments (highlights)
* `indent` for indent settings
* `fold` for fold settings
Right now, only the `test` namespace will be used by more than one kind of query
file, but I could imagine that changing in the future.
For now, tests and adjustments still work without the prepended namespace, but
I imagine I'll take that out even before we ship this feature experimentally.
Much easier to make big changes like this before anyone depends on them.
This also draws a much clearer line between `#set!` predicates with special
meaning in Pulsar… and those which are being used to set arbitrary data for
later use. For instance, if you see `(#set! isOnLeftSideOfAssignment true)`, you
know it must just be arbitrary data.
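Here's a minimal sketch of how a `#set!` key can be sorted into either a Pulsar setting or arbitrary data. `classifySetKey` and the returned object shape are hypothetical, not Pulsar's actual code. The sketch also ignores the temporary fallback for unprefixed test and adjustment predicates.

```javascript
// Hypothetical sketch: sort a `#set!` key into a Pulsar-specific setting
// (recognized namespace) or arbitrary data for later use.
const SPECIAL_NAMESPACES = new Set(['highlight', 'test', 'adjust', 'indent', 'fold']);

function classifySetKey(key) {
  const dot = key.indexOf('.');
  if (dot > 0) {
    const namespace = key.slice(0, dot);
    if (SPECIAL_NAMESPACES.has(namespace)) {
      return { kind: 'special', namespace, name: key.slice(dot + 1) };
    }
  }
  // No recognized namespace: this is just data some later code may read.
  return { kind: 'data', name: key };
}
```

For example, `classifySetKey('fold.endAt')` and `classifySetKey('adjust.endAt')` resolve to different namespaces, while `classifySetKey('isOnLeftSideOfAssignment')` is treated as plain data.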
…along with a temporary `core.useExperimentalModernTreeSitter` setting.
If we truly planned to keep three different language modes around indefinitely,
changing `core.useTreeSitterParsers` to an enum would make sense. But we don't,
so it'd actually just be a gigantic pain in the ass to migrate one setting to
another of a different type.
When we ship modern-tree-sitter experimentally, we'll make it opt-in via the
temporary setting. When we make it the official tree-sitter implementation and
remove the legacy node-tree-sitter version, we'll remove the temporary setting
and just change the semantics around `core.useTreeSitterParsers`.
Reverting the addition of the `core.languageParser` setting is a chore, but it
prevents a _gigantic_ future headache.
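Here's a minimal sketch of how two boolean settings can select one of three language modes without migrating `core.useTreeSitterParsers` to an enum. It assumes the experimental setting only matters when `core.useTreeSitterParsers` is on. `chooseLanguageMode` and the mode names are hypothetical, and a plain object stands in for the values Pulsar would read via `atom.config.get`.

```javascript
// Hypothetical sketch: pick a language mode from two boolean settings.
function chooseLanguageMode(config) {
  if (!config['core.useTreeSitterParsers']) return 'textmate';
  // The temporary setting opts in to the experimental implementation.
  return config['core.useExperimentalModernTreeSitter']
    ? 'modern-tree-sitter'
    : 'legacy-tree-sitter';
}
```

Once the legacy implementation goes away, the temporary key drops out and the same boolean simply chooses between TextMate and modern tree-sitter. No setting of a different type has to be migrated.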
It's possible to assign a node different scopes based on its contents, but we
need a way to detect these sorts of nodes so that we know how much of the buffer
to invalidate when that node's contents change.
Previously, we were simply finding the range of the deepest node in the tree for
a given edit and invalidating its entire region, in the hopes that this
heuristic would let us hide this complexity from users. Sadly, we can't get away
with that in terms of performance; the most specific node could end up being a
_very large_ node, and it's not worth slowing everything else down just to
handle these silly edge cases.
Instead, a grammar author can mark certain queries in `highlights.scm` with
an `invalidateOnChange` property:
```scm
((comment) @comment.block.documentation.javadoc.java
(#match? @comment.block.documentation.javadoc.java "^/\\*\\*")
(#set! final true)
(#set! invalidateOnChange true))
((comment) @comment.block.java
(#match? @comment.block.java "^/\\*")
(#set! invalidateOnChange true))
```
Here we've got one scope for a JavaDoc (`/**`) block comment and one for a
regular (`/*`) block comment. Since a change in either kind can flip the scope
of the entire comment, both should be marked with `invalidateOnChange`.
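Here's a rough sketch of what the property buys us. When an edit lands inside a capture marked `invalidateOnChange`, we widen the invalidated range to cover that capture. Otherwise we invalidate only the edited rows. `rangeToInvalidate` and the `{ startRow, endRow }` shapes are hypothetical, and the sketch assumes the caller has already collected the marked captures.

```javascript
// Hypothetical sketch: widen an edit's row range to cover any capture marked
// `invalidateOnChange` that overlaps it. Rows are zero-based and inclusive.
function rangeToInvalidate(edit, markedCaptures) {
  let startRow = edit.startRow;
  let endRow = edit.endRow;
  for (const capture of markedCaptures) {
    const overlaps = capture.startRow <= edit.endRow && capture.endRow >= edit.startRow;
    if (!overlaps) continue;
    // The capture's scope may flip, so its whole range must be redone.
    startRow = Math.min(startRow, capture.startRow);
    endRow = Math.max(endRow, capture.endRow);
  }
  return { startRow, endRow };
}
```

The cost scales with the size of the marked comment, not with whatever the deepest enclosing node happens to be. That deepest node was the performance problem with the old heuristic.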
If the world were perfect — if the web-tree-sitter bindings allowed it, or if
we could pre-process query files — we could set this property automatically
whenever a node both (a) defined a `#match?` predicate and (b) began and ended
on different rows. We'd be able to use that information to keep track of these
nodes on our own, as well as detect whether an arbitrary edit will actually
trigger a scope change. Until then, this need is unusual enough that we'll put
the onus on the grammar author.
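For reference, the automatic check described above would look roughly like this. The capture shape is an assumption. web-tree-sitter nodes do expose `startPosition.row` and `endPosition.row`. The `predicates` list with an `operator` field is hypothetical, however, because that is the information the bindings don't currently give us.

```javascript
// Hypothetical sketch of the "perfect world" heuristic: a capture needs
// invalidateOnChange if its pattern uses #match? and its node spans rows.
function needsInvalidateOnChange(capture) {
  const usesMatch = capture.predicates.some((p) => p.operator === 'match?');
  const multiline =
    capture.node.startPosition.row !== capture.node.endPosition.row;
  return usesMatch && multiline;
}
```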
The comment in `src/web-tree-sitter.js` and the documentation in
`vendor/web-tree-sitter` explain why this change is needed. I hope it's
temporary.
I haven't removed our `web-tree-sitter` dependency in `package.json` because
it's a useful way of declaring (and remembering) which version of tree-sitter
we've built against, and because it lets us flip a single config flag in
`src/web-tree-sitter.js` to compare the behavior of our custom version with that
of the stock version. The cost of keeping the stock version around is under 300
kilobytes.
The custom builds will come from specialty branches on our tree-sitter fork.
When we upgrade to a new version, someone should follow the directions in
`vendor/web-tree-sitter/README.md` to create a new specialty branch.
When someone wants to build a `language-foo` package and finds that
`tree-sitter-foo` needs a function we don't yet export, they should be able to
see the warning we generate in the console and file a ticket. If we're on our
game, we should be able to generate a new web-tree-sitter build and get it into
a rolling release within a week or so.
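To illustrate the console warning mentioned above, here's a minimal sketch that compares the functions a parser needs against the functions our custom build exports. `findMissingSymbols`, its arguments, and the message text are hypothetical. How the two lists are obtained is not shown. A WebAssembly module's imports can be listed with `WebAssembly.Module.imports()`, but that step is assumed here.

```javascript
// Hypothetical sketch: warn when a parser needs functions that this
// web-tree-sitter build doesn't export.
function findMissingSymbols(requiredSymbols, exportedSymbols, warn = console.warn) {
  const exported = new Set(exportedSymbols);
  const missing = requiredSymbols.filter((name) => !exported.has(name));
  if (missing.length > 0) {
    warn(`Parser needs functions not exported by this web-tree-sitter build: ${missing.join(', ')}`);
  }
  return missing;
}
```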
I hope web-tree-sitter someday finds a way to fix this so that we no longer need
to do this chore. See https://github.com/tree-sitter/tree-sitter/issues/949.
The `syntaxQuery` key in a grammar definition file is the only query whose name
differs from its canonical filename, and there's no reason why that should be
the case. Might as well change it now before we ship instead of going through
the pain of changing it later.
…by not assuming a certain tree structure.
The injection-point callbacks should not make any assumptions about the tree
because there's no guarantee that they'll run when the tree is clean.
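Here's a minimal sketch of a `language` callback written in that defensive style. It assumes a tree-sitter-javascript-style tagged template, in which the template string's parent is a `call_expression` whose `function` field is an identifier such as `html`. `parent`, `type`, `text`, and `childForFieldName` are real web-tree-sitter `Node` members. The function name is hypothetical. In Pulsar, a callback like this would be part of an injection point registered with `atom.grammars.addInjectionPoint`.

```javascript
// Hypothetical sketch: every step tolerates a missing or unexpected node
// and returns undefined instead of throwing.
function languageForTaggedTemplate(node) {
  const parent = node?.parent;
  if (!parent || parent.type !== 'call_expression') return undefined;
  const fn = parent.childForFieldName?.('function');
  if (!fn || fn.type !== 'identifier') return undefined;
  return fn.text; // e.g. "html" or "css"
}
```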
`tags.scm` is a query file that comes standard in a tree-sitter repo, and we
understand it well enough to use it to identify symbols in the
file.
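By convention, `tags.scm` captures a definition as `@definition.<kind>` and its identifier as `@name`. Here's a minimal sketch that turns those matches into symbols. It assumes match objects shaped like web-tree-sitter's `Query#matches` output (`{ captures: [{ name, node }] }`). `symbolsFromTagMatches` and the symbol fields are hypothetical.

```javascript
// Hypothetical sketch: convert tags.scm matches into symbol entries.
function symbolsFromTagMatches(matches) {
  const symbols = [];
  for (const { captures } of matches) {
    const nameCapture = captures.find((c) => c.name === 'name');
    const definition = captures.find((c) => c.name.startsWith('definition.'));
    // Skip references (e.g. @reference.call) and incomplete matches.
    if (!nameCapture || !definition) continue;
    symbols.push({
      name: nameCapture.node.text,
      kind: definition.name.slice('definition.'.length),
      row: nameCapture.node.startPosition.row,
    });
  }
  return symbols;
}
```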
This is a big deal because it gives us an alternative to the `ctags`-based
symbol navigation that `symbols-view` has been using for ages.
The plan is to refactor `symbols-view` to do nothing except consume various
providers of symbols. We'd bundle a `ctags` provider and a tree-sitter-based
provider by default. LSP-based packages could also register as providers.
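Here's a minimal sketch of the consuming side. It uses a made-up provider interface (`canProvideSymbols`, `getSymbols`), not whatever `symbols-view` ends up defining. It's synchronous for brevity; real providers, especially LSP-based ones, would almost certainly be asynchronous.

```javascript
// Hypothetical sketch: symbols-view as a pure consumer of symbol providers.
class SymbolProviderRegistry {
  constructor() {
    this.providers = [];
  }

  // Returns a function that unregisters the provider again.
  addProvider(provider) {
    this.providers.push(provider);
    return () => {
      this.providers = this.providers.filter((p) => p !== provider);
    };
  }

  // Ask every provider that can handle this file and merge their results.
  getSymbols(file) {
    return this.providers
      .filter((provider) => provider.canProvideSymbols(file))
      .flatMap((provider) => provider.getSymbols(file));
  }
}
```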
…and schedule them to be redone when the tree is clean.
Async parsing means we have to account for this. We tell the display layer when
ranges need to be re-highlighted, but we can't be sure when those tasks will be
scheduled, or if the tree will be clean when we do the required `highlights.scm`
captures.
We have several options:
1. When we run the query and notice that the tree is dirty, do an off-schedule
synchronous parse.
2. When `HighlightIterator#seek` notices the tree is dirty, cancel the highlight
job and reschedule it for whenever the tree is clean.
3. When `HighlightIterator#seek` notices the tree is dirty, proceed with the
highlight job but _also_ schedule the same job to run again when the tree is
clean.
#1 is tempting, but ultimately irresponsible if we don't know how long that
synchronous parse will take.
#2 is not technically possible, because the highlighting job will proceed no
matter what we do, so if we return early from `seek`, the display layer will
interpret that to mean the entire buffer range is plain text.
#3 is what I've implemented here. Captures against a dirty tree are likely to
be at least partially inaccurate, but since we have to go through with the job
anyway, we might as well allow it to proceed, thereby minimizing the flash of
unhighlighted text.
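Here's a minimal sketch of option #3. The hypothetical `LanguageModeSketch` stands in for the real language mode. `seek` proceeds against the dirty tree but queues a re-invalidation of the same range. `markClean` is called when an async parse finishes. It flushes that queue so the display layer runs the highlight job again. `captures` is a placeholder, not a real query.

```javascript
// Hypothetical sketch of option #3: highlight now, redo once the tree is clean.
class LanguageModeSketch {
  constructor() {
    this.treeIsDirty = false;
    this.whenCleanCallbacks = [];
    this.invalidatedRanges = [];
  }

  // Called from the (hypothetical) highlight iterator's seek().
  seek(range) {
    if (this.treeIsDirty) {
      // Results may be partly wrong, so re-highlight this range later.
      this.whenCleanCallbacks.push(() => this.invalidate(range));
    }
    // Proceed anyway to minimize the flash of unhighlighted text.
    return this.captures(range);
  }

  // Placeholder for running highlights.scm against the current tree.
  captures(range) {
    return [`captures for rows ${range.start}-${range.end}`];
  }

  // Stands in for telling the display layer to re-highlight a range.
  invalidate(range) {
    this.invalidatedRanges.push(range);
  }

  // Called when an async parse finishes and the tree is clean again.
  markClean() {
    this.treeIsDirty = false;
    const callbacks = this.whenCleanCallbacks;
    this.whenCleanCallbacks = [];
    for (const callback of callbacks) callback();
  }
}
```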
I can't think of a spec that will catch this situation reliably, but I'll give
it a shot later on.