…to each of `onlyIfDescendantOfType` and `onlyIfAncestorOfType`.
Each of these can now accept a space-separated list of node types. If any of the
node's ancestors or descendants matches any of the given types, the test passes.
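A hedged sketch of what such a test might look like (the capture name and node
types here are hypothetical, and the exact predicate spelling may differ):

```scm
; Apply this scope only when the identifier sits inside either kind of
; function node — the space-separated list means "any of these types."
((identifier) @variable.other.readwrite
  (#set! onlyIfDescendantOfType "function_declaration function_expression"))
```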
…not to expect empty tokens.
I made a change in 29cfcad that neglected to apply any scopes for a range that
was zero characters long. My instincts tell me that this is a safe change to
make, but it does affect any tests that used `tokensForScreenRow` and expected
it to report information about zero-length tokens. So the expected results
needed to be updated.
…when straddling an injection boundary.
When we do this one row at a time, the controlling layer for an indent query is
the comparison (previous) row, because that's where the query starts. The batch
version should be no different.
The new spec describes the exact scenario that revealed this bug.
Predicates were getting pretty chaotic. With the namespace, it's now clearer
where a predicate may be defined and what its purpose is. It also helps resolve
ambiguity — e.g., both folds and highlights can have an `endAt` predicate, so
now they're disambiguated as `fold.endAt` and `adjust.endAt`, respectively.
Namespaces:
* `highlight` for highlight-specific settings (of which there is presently only
one)
* `test` for scope tests (currently used by highlights and indents)
* `adjust` for scope adjustments (highlights)
* `indent` for indent settings
* `fold` for fold settings
Right now, only the `test` namespace will be used by more than one kind of query
file, but I could imagine that changing in the future.
For now, tests and adjustments still work without the prepended namespace, but
I imagine I'll take that out even before we ship this feature experimentally.
Much easier to make big changes like this before anyone depends on them.
This also draws a much clearer line between `#set!` predicates with special
meaning in Pulsar… and those which are being used to set arbitrary data for
later use. For instance, if you see `(#set! isOnLeftSideOfAssignment true)`, you
know it must just be arbitrary data.
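As an illustration, a hedged sketch of how the namespaced forms might read in
practice (captures, predicate names, and values here are hypothetical):

```scm
; In folds.scm: a fold-specific setting.
((block) @fold
  (#set! fold.endAt lastChild.startPosition))

; In highlights.scm: a scope test plus a scope adjustment.
((comment) @comment.block
  (#set! test.onlyIfFirst true)
  (#set! adjust.endAt lastChild.startPosition))

; No namespace, so no special meaning to Pulsar: just arbitrary data.
((identifier) @_lhs
  (#set! isOnLeftSideOfAssignment true))
```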
…along with a temporary `core.useExperimentalModernTreeSitter` setting.
If we truly planned to keep three different language modes around indefinitely,
changing `core.useTreeSitterParsers` to an enum would make sense. But we don't,
so it'd actually just be a gigantic pain in the ass to migrate one setting to
another of a different type.
When we ship modern-tree-sitter experimentally, we'll make it opt-in via the
temporary setting. When we make it the official tree-sitter implementation and
remove the legacy node-tree-sitter version, we'll remove the temporary setting
and just change the semantics around `core.useTreeSitterParsers`.
Reverting the addition of the `core.languageParser` setting is a chore, but it
prevents a _gigantic_ future headache.
It's possible to assign a node different scopes based on its contents, but we
need a way to detect these sorts of nodes so that we know how much of the buffer
to invalidate when that node's contents change.
Previously, we were simply finding the range of the deepest node in the tree for
a given edit and invalidating its entire region, in the hopes that this
heuristic would let us hide this complexity from users. Sadly, we can't get away
with that in terms of performance; the most specific node could end up being a
_very large_ node, and it's not worth slowing everything else down just to
handle these silly edge cases.
Instead, a grammar author can mark certain queries in `highlights.scm` with
an `invalidateOnChange` property:
```scm
((comment) @comment.block.documentation.javadoc.java
  (#match? @comment.block.documentation.javadoc.java "^/\\*\\*")
  (#set! final true)
  (#set! invalidateOnChange true))

((comment) @comment.block.java
  (#match? @comment.block.java "^/\\*")
  (#set! invalidateOnChange true))
```
Here we've got one scope for a JavaDoc (`/**`) block comment and one for a
regular (`/*`) block comment. Since a change in either kind can flip the scope
of the entire comment, both should be marked with `invalidateOnChange`.
If the world were perfect — if the web-tree-sitter bindings allowed it, or if
we could pre-process query files — we could set this property automatically
whenever a capture both (a) used a `#match?` predicate and (b) matched a node
spanning multiple rows. We'd be able to use that information to keep track of these
nodes on our own, as well as detect whether an arbitrary edit will actually
trigger a scope change. In the meantime, this is an unusual enough need that
we'll put the onus on the grammar author for now.
The `syntaxQuery` key in a grammar definition file is the only query whose name
differs from its canonical filename, and there's no reason why that should be
the case. Might as well change it now before we ship instead of going through
the pain of changing it later.
…by not assuming a certain tree structure.
The injection-point callbacks should not make any assumptions about the tree
because there's no guarantee that they'll run when the tree is clean.
* A paste that ended in a newline would incorrectly adjust the row after the
end of the pasted text.
* A paste that needed no after-paste adjustment would incorrectly alter the
undo history.
This is a big one. All the tests pass and I haven't been able to make it
explode in ages, so it's ready for testing by others. But if you manage to make
it break, you can flip the `FEATURE_ASYNC_PARSE` flag back to `false`.
This sets a time limit (currently 3ms) for parsing; if the job needs more time
than that, we return a promise and keep working on the parsing in a series of
jobs that each take no more than 3ms, yielding in between. The editor stays
responsive and the highlighting will catch up when it's able.
The system is able to account for edits that are made during an async parse,
and will keep re-parsing to handle those edits until the tree is clean.
In practice, this has been surprisingly pleasant to use, even on very large
files; and smaller files are unlikely to exceed that 3ms threshold very often.
We can play around with the exact threshold. If we make it smaller, then
theoretically the UI gets even snappier, but there's more overhead when the
parsing is divided into smaller tasks. To keep the UI snappy, we'd want to limit
the threshold to some fraction of the 16.67ms that we're given with each
UI frame, keeping in mind that there's plenty of work to do in that allotted
time _after_ the tree gets parsed.
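The shape of that loop, sketched generically (the names and the yielding
mechanism here are illustrative, not Pulsar's actual implementation):

```javascript
const BUDGET_MS = 3;

// Run a queue of small work units, yielding to the event loop whenever a
// slice exhausts its time budget so the UI can stay responsive in between.
async function runInSlices(tasks) {
  const results = [];
  let sliceStart = Date.now();
  for (const task of tasks) {
    results.push(task());
    if (Date.now() - sliceStart >= BUDGET_MS) {
      // Yield; the remaining work resumes on a later turn of the event loop.
      await new Promise((resolve) => setTimeout(resolve, 0));
      sliceStart = Date.now();
    }
  }
  return results;
}
```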
The goal is to support as much as possible of what can be done synchronously
in TextMate grammars (using a perplexing regex-based system) by going async when
necessary in modern Tree-sitter grammars (using an `indents.scm` file that is
far easier to understand).
The one true concession we have to make here is to accept that certain
auto-indent behaviors can't be replicated atomically if they happen inside of a
transaction alongside other changes. In those cases, we must wait until the end
of the transaction, then auto-indent the entire range affected by the
transaction instead.
If you're interested in knowing more about this… well, I'd be surprised. But the
entire rationale is documented here:
https://gist.github.com/savetheclocktower/4fd4e3edce76ee5820990db0f012062e
…no matter where `HighlightIterator` starts.
If `HighlightIterator` tried to start a highlighting job at a point where more
than one layer reported an already-open scope, those scopes were combined such
that those from shallower layers went first. But that's not necessarily right,
and it's important to get the ordering correct because it affects
`scopeDescriptorForPosition` in particular.
Luckily, all the information we had to compile to find out if already-open
scopes should be covered by deeper injection layers… is exactly what we need to
solve this problem. When we assemble the list of open scopes, we can sort them
based on buffer position, only using layer depth to break ties.
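In other words, something like this comparator (the object shape is
hypothetical, for illustration only):

```javascript
// Sort already-open scopes by where they opened in the buffer; use layer
// depth (shallower first) only to break exact position ties.
function compareOpenScopes(a, b) {
  if (a.position.row !== b.position.row) {
    return a.position.row - b.position.row;
  }
  if (a.position.column !== b.position.column) {
    return a.position.column - b.position.column;
  }
  return a.depth - b.depth;
}
```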
All the logic that we had employed to decide whether an injection ought to
prevent a shallower layer from scoping a particular region… was completely
ignored when determining which scopes should already be open as part of
`HighlightIterator#seek`. Hence a scope boundary might be suppressed if it's in
the middle of a highlighting job's range, but still might be assumed to be open
for a highlighting job that starts a few lines later.
To fix this, we've got to keep track of when those already-open scopes would've
opened, and if we find that one injection layer has enabled
`coverShallowerScopes`, we must re-assess those open scopes and remove the ones
that _would've_ been suppressed if we were highlighting the area where the scope
would've opened.
With me so far?
There's another thing that needed fixing: an injection layer can stop its parent
from applying a scope to an arbitrary token, but it should not be able to stop
its parent from applying its _base_ scope, like `source.js`. That's also been
fixed.
If I had to do it all over again, I would've dropped the ability of one layer to
“cover” the scopes of another, but it was already half-implemented in
`TreeSitterLanguageMode` and I felt compelled to implement it _properly_,
for whatever definition of “properly” I had in my head at the time. Now it's an
opt-in behavior, and anyone who opts into it deserves what they get.
All the existing specs still pass, and I added a new one for this new behavior.
Luckily, the current implementation of `scopeDescriptorForPosition` is the
perfect way to test this, because it ends up calling `HighlightIterator#seek` at
the precise position we tell it to.
And a new option for injection points has been added:
`includeAdjacentWhitespace`. During my windmill-tilting efforts to take an
otherwise great `tree-sitter-markdown` parser and try to inject a YAML
front-matter section — despite its lack of understanding of that syntax — I
discovered that a certain section of YAML wasn't parsing right: the
`tree-sitter-markdown` parser was ignoring a bit of leading whitespace that the
`tree-sitter-yaml` parser needed to see, since that whitespace was syntactically
meaningful. That made me realize how to handle this in the injection point API.
I wouldn't dare try to land that YAML/Markdown hack into the Pulsar core, but
the `includeAdjacentWhitespace` option will be useful for other injections, most
notably PHP. When a `content` callback returns an array of nodes, often that
gets converted into a series of disjoint ranges that _aren't_ actually disjoint;
they're just consecutive syntax nodes separated by whitespace. So the new option
takes this into account; when it's enabled, if two would-be injection ranges are
separated only by whitespace, then those two ranges are consolidated into a
single range _along with_ that whitespace.
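A hedged sketch of that consolidation step (the range shape and function name
are hypothetical; real injection ranges carry more information than this):

```javascript
// Merge would-be injection ranges that are separated only by whitespace,
// folding the whitespace into the resulting single range. `start`/`end` are
// character offsets into `text`, with `end` exclusive.
function consolidateRanges(ranges, text) {
  const merged = [];
  for (const range of ranges) {
    const last = merged[merged.length - 1];
    if (last && /^\s*$/.test(text.slice(last.end, range.start))) {
      last.end = range.end; // absorb the whitespace gap
    } else {
      merged.push({ ...range });
    }
  }
  return merged;
}
```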
`onlyIfRangeWithData` and `onlyIfDescendantOfNodeWithData` (and their
negations).
Unlike `onlyIfConfig`, we're comparing these values to earlier values from
`#set!` predicates, so we shouldn't coerce them at all.
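A sketch of the idea (capture names, data keys, and the exact argument syntax
here are all hypothetical):

```scm
; First, tag a node with arbitrary data…
((assignment_expression left: (identifier) @variable.other.assignment)
  (#set! isAssignment true))

; …then, elsewhere, test against that stored value. Since the value was
; stored verbatim, the comparison shouldn't coerce either side.
((identifier) @some.other.scope
  (#set! onlyIfDescendantOfNodeWithData "isAssignment true"))
```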
Fix a few scope resolution bugs.
Added and renamed a few specs, most notably the new `onlyIfConfig`.
Wrote specs for all the positive versions of tests, omitting specs for the
negations; therefore I've aimed to make the negations as simple as possible so
that they are implicitly proven to work as long as their counterparts work.
…in injected layers.
The last straw was the realization that in the following Markdown…
```js
let foo = "bar";
```
…the entire code fence, including the backticks, would carry the `source.js`
scope name. Instead, each injection range should begin `source.js` at its start
and end `source.js` at its end.
This could leave spaces inside of injections unscoped — because whitespace is
often excluded from injection ranges — but that's just something we have to live
with for now.
…by making its implementation identical to that of `TreeSitterLanguageMode`.
They had it right all along — to ensure fidelity, use the exact same approach
that the display layer itself uses. This should prevent
`scopeDescriptorForPosition`'s information from straying too far from reality.
Also fixed `bufferRangeForScopeAtPosition`, which had no knowledge of covered
scopes or scope adjustments. Added specs.
…because we had no way of realizing how much of the buffer was affected by a change.
This was a worst-case scenario: consider a multiline JavaDoc-style comment — i.e., starting with `/**`. Imagine a `highlights.scm` that uses `#match?` to scope it as either `comment.block` or `comment.block.documentation` based on the presence or absence of that second asterisk. Now imagine you remove that second asterisk to make it an ordinary block comment.
If the tree-sitter parser understands the difference between these things, and gives them different node types, there's no problem, because it'll tell you which ranges need new highlighting based on syntactic changes. But if the change merely means a different scope should be applied — like if you highlight documentation comments differently from block comments — the whole comment won't get re-scoped and re-highlighted, because the highlights query isn't consulted until after we decide how much to invalidate. In a Java file, only the line you edited would get invalidated, so the rest of the block comment would remain the wrong color.
An ideal solution here would be to
(a) recognize when a buffer edit will cause a capture to `#match?` differently from how it does now; then
(b) invalidate that range so as to get it re-highlighted.
The “problem” is that we don't run the highlights query until the display layer asks us what to re-highlight, and we retain no data structure that would let us compare old versus new and see which things are different. But that's less of a problem and more of a design decision made in order to deliver acceptable performance and to keep that costly work in WASM-land where it belongs.
The solution I arrived at is to
(a) detect the range of the deepest single node that has been affected by an edit, and
(b) ensure we invalidate at least that much of the buffer when we make a change.
In most cases, this will not even cause more syntax highlighting to take place; Pulsar always seems to re-highlight at least the entire row that we're on when we insert as little as a single character.
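The detection in step (a) can be sketched against a toy tree (a stand-in for a
real Tree-sitter tree; `startIndex`/`endIndex` mirror Tree-sitter's node
properties, but the function itself is illustrative, not Pulsar's code):

```javascript
// Walk down to the deepest single node whose range still contains the whole
// edit; that node's range is the minimum region to invalidate.
function deepestNodeContaining(node, start, end) {
  for (const child of node.children || []) {
    if (child.startIndex <= start && child.endIndex >= end) {
      return deepestNodeContaining(child, start, end);
    }
  }
  return node;
}
```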
This does put some limits on how people can use `#match?`, but those are prudent limits anyway. Imagine deciding to scope things differently based on the text contained in a distant descendant, or even in a _sibling_. These are footguns.
If we really didn't like this and needed to solve it in a more general way, we could invent a new kind of specialized SCM file called `invalidations.scm`, and define captures for any kind of node that needs to be invalidated if an arbitrary change happens within it. We could run that query as part of the process of evaluating which ranges to re-highlight. Folks would want to do this sparingly, but if there were no other way to get the kind of highlighting needed in a particular grammar, then it would be a necessary evil.