…for suggesting the indent of a number of buffer rows at once.
The way `TextEditor` wants to auto-indent is by asking each row what its
indent level should be, then indenting each one in turn. This is because it
knows the language mode will probably determine row X's correct indentation by
using row X-1's indentation as a baseline, so there's no point in even calling
`suggestedIndentForBufferRow` for line X until X-1 is correctly indented.
This isn't ideal for us, because each indentation change is a buffer change, and each buffer change would prompt a tree re-parse before the next row could be examined.
Instead, `suggestedIndentForBufferRows` will return the correct indentation
level for each row in a range without making any buffer changes.
It can do this because it tells `suggestedIndentForBufferRow` what the previous
row's indentation level is (hypothetically) so that it doesn't try to look it
up. Only the first row in the range compares its indentation to the _actual_
indentation of its preceding row; the rest of the rows figure out their
indentations relative to the hypothetical value we computed in the previous trip
through the loop.
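The loop can be sketched roughly as follows. This is a minimal illustration under assumptions, not Pulsar's implementation. `suggestIndentForRow(row, { previousIndent })` is a hypothetical stand-in for `suggestedIndentForBufferRow`, and the results are returned as a `Map` from row to indent level.

```javascript
// Sketch: compute indent levels for rows [startRow, endRow] without touching the buffer.
// `suggestIndentForRow(row, { previousIndent })` is a hypothetical stand-in for
// suggestedIndentForBufferRow. When previousIndent is given, it uses that value
// instead of reading row - 1's actual indentation from the buffer.
function suggestIndentsForRows(startRow, endRow, suggestIndentForRow) {
  const results = new Map();
  let previousIndent = null; // null means "use the real previous row"
  for (let row = startRow; row <= endRow; row++) {
    // Only the first row in the range looks at actual buffer indentation.
    const options = previousIndent === null ? {} : { previousIndent };
    const indent = suggestIndentForRow(row, options);
    results.set(row, indent);
    // The next row measures itself against this hypothetical value.
    previousIndent = indent;
  }
  return results;
}
```

Because each suggestion is threaded into the next call, the buffer stays untouched until every level is known.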
This is paired with special logic in `TextEditor` to use this method instead of
its typical code path when asked to auto-indent multiple rows at once. Thus
we'll reduce the number of unscheduled tree parses from N (where N is the total
number of rows wanting to be auto-indented) to either zero or one.
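Here is a hypothetical sketch of how an editor might choose between the two code paths. It is not the actual `text-editor.js` change. The assumption here is that `suggestedIndentForBufferRows(start, end)` returns a `Map` of row to level. The real return shape may differ.

```javascript
// Hypothetical sketch: prefer the batched method when several rows are involved.
function autoIndentBufferRows(editor, languageMode, startRow, endRow) {
  const batched = typeof languageMode.suggestedIndentForBufferRows === "function";
  if (endRow > startRow && batched) {
    // All levels are computed before any buffer change, so no re-parse is
    // forced in between rows.
    const indents = languageMode.suggestedIndentForBufferRows(startRow, endRow);
    for (const [row, level] of indents) {
      editor.setIndentationForBufferRow(row, level);
    }
    return;
  }
  // Typical path: ask row by row, applying each result before the next question.
  for (let row = startRow; row <= endRow; row++) {
    const level = languageMode.suggestedIndentForBufferRow(row);
    editor.setIndentationForBufferRow(row, level);
  }
}
```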
Amazingly, I think this is the first time I've had to touch `text-editor.js` for
any reason.
This hadn't occurred to me until I tried to mark illegal optional-chaining operators in certain contexts, much like how it's done in https://github.com/pulsar-edit/pulsar/pull/79. But it's a powerful way to match things regardless of their exact ancestry.
I spent a lot of time pursuing various ways to cut down on the amount of tree re-parsing we'd need.
I haven't committed any of it, but those experiments did inform some changes that are present here. When wanting to perform an operation that would require a tree re-parse, we can now theoretically choose between (a) re-parsing and staying synchronous or (b) waiting until the next scheduled tree parse and going async.
The implications for how indentation works are too big to plow ahead with this approach right now, but it's worth experimenting with. The vast majority of our indentation decisions are prompted by a single keystroke, and in those cases there's no particular reason why we can't wait a few milliseconds until the next re-parse happens anyway.
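Options (a) and (b) could be modeled as two methods on a scheduler. One method re-parses synchronously. The other returns a promise that resolves after the next scheduled parse. Everything below (`ParseScheduler`, `parseNow`, `whenNextParsed`, `runScheduledParse`) is hypothetical, just as the experiments were uncommitted.

```javascript
// Hypothetical sketch of the two strategies; not part of Pulsar.
class ParseScheduler {
  constructor(parse) {
    this.parse = parse; // function that produces a fresh tree
    this.tree = null;
    this.waiters = []; // resolvers waiting on the next scheduled parse
  }

  // (a) Re-parse right now and stay synchronous.
  parseNow() {
    this.tree = this.parse();
    return this.tree;
  }

  // (b) Don't parse; wait for the next scheduled parse and go async.
  whenNextParsed() {
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Called by whatever already schedules parses (e.g. after a transaction).
  runScheduledParse() {
    const tree = this.parseNow();
    const waiters = this.waiters;
    this.waiters = [];
    for (const resolve of waiters) resolve(tree);
  }
}
```

Under option (b), a keystroke-driven indent would `await scheduler.whenNextParsed()` and piggyback on a parse that was going to happen anyway.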
Spent a lot of time on indents today. Now that `nvim-treesitter` has renamed its indent captures, I figured I should switch to a system whose names make more sense to me.
`@indent_end` and `@branch` were too similar to keep as separate things, and have been consolidated into `@dedent`.
This system is pretty easy to write queries for, but it's awfully hard to _explain_, so I wonder if I have to go back to the drawing board here. I was convinced of its theoretical elegance, but then found a situation where I needed a different sort of capture, so I also invented `@dedent.next`. So now I'm not so sure.
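A rough sketch of how these captures could combine into an indent level follows. The semantics here are an assumption, not taken from the text. `@indent` on row X-1 deepens row X. `@dedent.next` on row X-1 makes row X shallower. `@dedent` on row X makes row X itself shallower. `capturesByRow` is a hypothetical map from row number to capture names.

```javascript
// Hypothetical sketch; assumed semantics:
//   @indent      on row X-1 -> row X is one level deeper
//   @dedent.next on row X-1 -> row X is one level shallower
//   @dedent      on row X   -> row X is one level shallower
// `capturesByRow` maps a row number to an array of capture names on that row.
function suggestIndentFromCaptures(row, previousIndent, capturesByRow) {
  const prev = capturesByRow.get(row - 1) ?? [];
  const current = capturesByRow.get(row) ?? [];
  let delta = 0;
  if (prev.includes("indent")) delta += 1;
  if (prev.includes("dedent.next")) delta -= 1;
  if (current.includes("dedent")) delta -= 1;
  return Math.max(0, previousIndent + delta);
}
```

This deliberately ignores rows that both open and close a block on the same line (e.g. `{}`). A real implementation would need to balance those.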
Folds also got a look because they really did need some optimization. When we perform a transaction in the editor, we invalidate the folds cache, and since we know we'll end up querying against each of those lines individually, we can pre-capture each range with a folds query and save some time.
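The pre-capture idea can be sketched as running one query over the whole affected range and bucketing the captures by start row. Per-row lookups then hit the cache. `runFoldsQuery(startRow, endRow)` and the `{ name, startRow, endRow }` capture shape are hypothetical stand-ins for a real tree-sitter folds query.

```javascript
// Hypothetical sketch: one folds query per range instead of one per row.
class FoldsCache {
  constructor(runFoldsQuery) {
    this.runFoldsQuery = runFoldsQuery; // (startRow, endRow) => captures
    this.byRow = new Map();
    this.queryCount = 0;
  }

  // Called when a transaction changes the buffer.
  invalidate() {
    this.byRow.clear();
  }

  // Pre-capture every row in [startRow, endRow] with a single query.
  prime(startRow, endRow) {
    this.queryCount++;
    for (let row = startRow; row <= endRow; row++) this.byRow.set(row, []);
    for (const capture of this.runFoldsQuery(startRow, endRow)) {
      const bucket = this.byRow.get(capture.startRow);
      if (bucket) bucket.push(capture);
    }
  }

  // Per-row lookup; falls back to a single-row query on a cache miss.
  capturesForRow(row) {
    if (!this.byRow.has(row)) this.prime(row, row);
    return this.byRow.get(row);
  }
}
```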
The worst thing about indents is the prospect of needing to re-parse the tree separately from the ordinary update process, since the editor calls `suggestedIndentForBufferRow` before the editor transaction finishes. It's not a bottleneck yet, but it could become one with a particularly slow tree-sitter grammar. We can at least re-use the same tree when one of the `suggestedIndent` methods calls another.
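Tree re-use can be sketched as memoizing the parse per buffer version. With this approach, nested `suggestedIndent*` calls made against an unchanged buffer share one tree. `getVersion` and `parse` are hypothetical hooks, not Pulsar APIs.

```javascript
// Hypothetical sketch: parse at most once per buffer version.
// `getVersion()` returns something that changes whenever the buffer changes;
// `parse()` produces a fresh tree.
function makeTreeProvider(getVersion, parse) {
  let cachedVersion = null;
  let cachedTree = null;
  return function treeForCurrentBuffer() {
    const version = getVersion();
    if (version !== cachedVersion) {
      cachedTree = parse();
      cachedVersion = version;
    }
    return cachedTree;
  };
}
```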
…because we had no way of realizing how much of the buffer was affected by a change.
This was a worst-case scenario: consider a multiline JavaDoc-style comment (i.e., one starting with `/**`). Imagine a `highlights.scm` that uses `#match?` to scope it as either `comment.block` or `comment.block.documentation` based on the presence or absence of that second asterisk. Now imagine you remove that second asterisk to make it an ordinary block comment.
If the tree-sitter parser understands the difference between these things, and gives them different node types, there's no problem, because it'll tell you which ranges need new highlighting based on syntactic changes. But if the change merely means a different scope should be applied — like if you highlight documentation comments differently from block comments — the whole comment won't get re-scoped and re-highlighted, because the highlights query isn't consulted until after we decide how much to invalidate. In a Java file, only the line you edited would get invalidated, so the rest of the block comment would remain the wrong color.
An ideal solution here would be to
(a) recognize when a buffer edit will cause a capture to `#match?` differently from how it does now; then
(b) invalidate that range so as to get it re-highlighted.
The “problem” is that we don't run the highlights query until the display layer asks us what to re-highlight, and we retain no data structure that would let us compare old versus new and see what changed. But that's less a problem than a design decision, made in order to deliver acceptable performance and to keep that costly work in WASM-land where it belongs.
The solution I arrived at is to
(a) detect the range of the deepest single node that has been affected by an edit, and
(b) ensure we invalidate at least that much of the buffer when we make a change.
In most cases, this will not even cause more syntax highlighting to take place; Pulsar always seems to re-highlight at least the entire row that we're on when we insert as little as a single character.
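Steps (a) and (b) can be illustrated with a sketch. Start from the root and descend to the deepest node whose range fully contains the edit. Then widen the invalidated range to cover that node. In the JavaDoc example, that node would be the whole comment. Nodes here are plain `{ startIndex, endIndex, children }` objects loosely modeled on tree-sitter nodes. This is an illustration, not Pulsar's code.

```javascript
// Sketch: find the deepest single node whose range fully contains the edit.
function deepestNodeContaining(node, start, end) {
  if (node.startIndex > start || node.endIndex < end) return null;
  for (const child of node.children ?? []) {
    const match = deepestNodeContaining(child, start, end);
    if (match) return match;
  }
  return node; // no child contains the whole edit, so this node is the deepest
}

// Invalidate at least the edit itself, widened to that node's extent.
function rangeToInvalidate(root, editStart, editEnd) {
  const node = deepestNodeContaining(root, editStart, editEnd);
  if (!node) return { start: editStart, end: editEnd };
  return {
    start: Math.min(editStart, node.startIndex),
    end: Math.max(editEnd, node.endIndex),
  };
}
```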
This does put some limits on how people can use `#match?`, but those are prudent limits anyway. Imagine deciding to scope things differently based on the text contained in a distant descendant, or even in a _sibling_. These are footguns.
If we really didn't like this and needed to solve it in a more general way, we could invent a new kind of specialized SCM file called `invalidations.scm`, and define captures for any kind of node that needs to be invalidated if an arbitrary change happens within it. We could run that query as part of the process of evaluating which ranges to re-highlight. Folks would want to do this sparingly, but if there were no other way to get the kind of highlighting needed in a particular grammar, then it would be a necessary evil.
An injection layer will include its root scope over its entire extent — i.e., whatever node is the root of its injection point — by default.
You can change the injected language's scope name with the new `languageScope` option on `addInjectionPoint`, but more useful is the ability to set it to `null` and bypass this behavior.
That's the right thing to do for the built-in injections `text.todo` and `text.hyperlink`, or else their scopes would always be present.
(This is a pain in the ass and I'm not 100% sure we've found the right solution. Pretend there's an injection for an embedded language like ERB; it'd inject into a bunch of disparate ranges, but still apply a scope over the entire extent, even the ranges where it's not active. The alternative — defining start/end scope boundaries for each of the disjoint ranges — is pretty chaotic in its own right, since those ranges tend to ignore things like whitespace. I would want such a language to set `languageScope: null`, then set its root scope via capture at whatever level of granularity makes sense for that language.)
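The sketch below shows the behavior described above. By default, the injected language's root scope spans the whole injection extent, and `languageScope: null` suppresses it. Only the `languageScope` option name comes from the text. The `rootScopeBoundaries` function and the layer shape are illustrative assumptions, not Pulsar's implementation.

```javascript
// Illustrative sketch: which root-scope boundaries an injection layer emits.
// `layer` is an assumed shape: { grammarScopeName, extent: { start, end }, options }.
function rootScopeBoundaries(layer) {
  const { languageScope } = layer.options;
  if (languageScope === null) return []; // opted out (e.g. text.todo, text.hyperlink)
  // undefined means "use the default", i.e. the injected grammar's own scope.
  const scope = languageScope ?? layer.grammarScopeName;
  return [
    { index: layer.extent.start, open: scope },
    { index: layer.extent.end, close: scope },
  ];
}
```

A language like the hypothetical ERB case would opt out this way and then apply its root scope through captures at whatever granularity suits it.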
…for defining an indentation level relative to that of another line.
In rare cases where the indentation level of row X isn't just that of X-1 _plus or minus one_, the `@match` capture lets us specify when a line should match up exactly with another line.
Needs `(#set! matchIndentOf foo)`, where `foo` is a node descriptor. Will use the indentation level at the start of the specified node. (Might change this to accept a position descriptor.)
Optionally accepts `(#set! offsetIndent X)`, where X is any integer; allows us to say (e.g.) “indent one level deeper than X.”
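A hypothetical sketch of resolving an `@match` capture follows. It assumes that a node descriptor is a dot-separated chain of node properties (e.g. `parent.firstChild`) walked from the captured node. It also assumes that `indentLevelAtRow` is a helper returning a row's indent level. `#set!` values arrive as strings, so `offsetIndent` is parsed as an integer.

```javascript
// Hypothetical: walk a dot-separated chain of properties from the captured node.
function resolveNodeDescriptor(node, descriptor) {
  let current = node;
  for (const key of descriptor.split(".")) {
    if (current == null) return null;
    current = current[key];
  }
  return current ?? null;
}

// Indent to match the row where the target node starts, plus an optional offset.
function indentForMatchCapture(capturedNode, props, indentLevelAtRow) {
  const target = resolveNodeDescriptor(capturedNode, props.matchIndentOf);
  if (!target) return null; // no usable anchor; caller falls back to normal rules
  const offset = Number.parseInt(props.offsetIndent ?? "0", 10);
  return indentLevelAtRow(target.startPosition.row) + offset;
}
```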
The `bracket-matcher` package was assuming that `getSyntaxNodeForPosition`'s second argument — the test predicate — would stop running as soon as we found our match, and was using it to assign `startTag` and `endTag` variables. This is a fair assumption to make. So we shouldn't run the predicate until we're prepared to return immediately as soon as we find a match. That means putting _all_ candidates into a list, sorting that list from smallest to largest, and only then running the predicate.
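The ordering can be sketched as follows. Gather every candidate, sort by range size, then run the predicate and return on the first hit. This way, the predicate never runs against candidates that come after the winner. `firstMatchingSmallest` and the `{ startIndex, endIndex }` candidate shape are illustrative assumptions, not the real `getSyntaxNodeForPosition` code.

```javascript
// Sketch: the predicate only runs once all candidates are known and sorted,
// and runs no further after the first match, so side effects in the
// predicate (like bracket-matcher's tag assignments) stay correct.
function firstMatchingSmallest(candidates, predicate) {
  const size = (c) => c.endIndex - c.startIndex;
  const sorted = [...candidates].sort((a, b) => size(a) - size(b));
  for (const candidate of sorted) {
    if (predicate(candidate)) return candidate; // return immediately
  }
  return null;
}
```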
Originally tried to use `tree-sitter-comment` (https://github.com/stsewd/tree-sitter-comment), but the syntax used there is stricter; eventually used it as a reference to write something simpler. Works basically identically to the existing `language-todo` TM-style grammar.
The tree-sitter-hyperlink parser is of my own invention and is quite rudimentary at this point, but it would be nice if it supported more of the stuff that the TM-style hyperlink grammar supports.
Effectively: install ppm's dependencies (and run its postinstall
scripts) with Yarn, not npm, on Windows.
This makes the installation of ppm's dependencies consistently use
Yarn, not npm, in Cirrus CI. Consistent* for all OSes/arches we build
for in Cirrus.
(* Although, at some level the postinstall scripts are run via npm as
a subprocess of node as a subprocess of Yarn in a shell/console
subprocess... Yes, in my opinion, ppm's postinstall scripts are way
too complicated...)
ALSO: This fixes an oversight in PR 239 where the build script of the
wrong project (ppm instead of core) was being run at one point, again
on Windows.
So, this should make it so Pulsar's core dependencies are built for
the correct Electron version instead of built for Node... Oops!