[language-php] Add meta.embedded.block/meta.embedded.line

…to each PHP section in order to match the functionality of the TextMate grammar.

This was a gigantic endeavor and involved doing several things that were on my “ugh, I’ll get around to it” list:

* To make some of this stuff work, we were using two different PHP parser layers. The root one did the syntax highlighting and separated the HTML from the rest, and the deeper injection was there just so we could scope some ranges as `source.php`.

  That doesn’t work because of rules we have about shallower layers acting before deeper layers when iterating through syntax highlighting boundaries. I was trying to be too clever. Now the _root_ layer is the one that does no highlighting, and the _deeper_ layer is the one that exists to apply `source.php` _and then_ perform highlighting and indentation hinting and folding and whatnot.

  If we need to, we can move some of the queries to the root layer for whatever reason; maybe we come across a bug six months from now that could be fixed if we made the root layer in charge of folding. We have options.

* All boundaries for `HighlightIterator`s now _either_ open one or more scopes _or_ close one or more scopes. If one boundary can close some scopes and open others, then boundaries on different layers that share a buffer position cannot possibly close/open in the correct order because they can't intermingle. This needed to happen and this was as good of an excuse as any to do it.
* We needed to present sane PHP opening/closing ranges to the injection. Ironically, `tree-sitter-php` itself makes this nearly impossible because of how the tree is structured; it tries to hide the `<?php` and `?>` tags from the PHP layer and treat them as text, despite the fact that PHP is the thing that knows how to parse that.

  The best way to do this — at least until I encounter a reason why it can’t work — is to find all the `<?php`s and `?>`s, sort them, group them, and build fake node ranges. This is fine! I mean, it’s ridiculous, but it’s fine! The ranges that we hand off to another parser are allowed to be completely arbitrary!

  This lets us do what we were only approximating before: have _one_ PHP injection layer with _any number of_ content ranges, each of which begins with `<?php` and ends with `?>` without trying to claim any surrounding whitespace. This would be worth it even if we didn’t have to do any of this other stuff.

* Each content range of the injection needs _either_ `meta.embedded.line.php` _or_ `meta.embedded.block.php` depending on whether the range begins and ends on the same line. This is not something I was willing to regress on because it's _hard_ to distinguish between PHP and non-PHP unless your editor helps you out, and because I wasn't going to go into a bunch of themes and tell them to treat `source.php` like `meta.embedded`.

  This also meant that we needed to be able to add _multiple_ “root” scope names per content range. But we already had a mode in which the `languageScope` injection option could be a callback, so it wasn't hard to make it a callback that could take extra parameters for the buffer and range.

  This isn't a feature that I'm eager for other grammars to use, or even necessarily know about, but it was what we needed to deliver feature parity here. And none of it would have been necessary if `tree-sitter-php` made more sensible choices. (Whatever. Maybe what I want isn't possible for some strange reason.)

All existing tests pass. Tests have been written for the new `languageScope` features. Tests need to be written for PHP just like for all other language grammars, but that was on the to-do list anyway. If any of this turns out to be catastrophic, it’s easy to roll back, but tests are passing on my machine and I expect them to pass in CI.
This commit is contained in:
Andrew Dupont 2024-01-20 20:49:17 -08:00
parent 2a0ebacf62
commit 46877b05ff
5 changed files with 329 additions and 162 deletions

View File

@ -8,10 +8,6 @@ injectionRegex: 'php|PHP'
treeSitter:
parserSource: 'github:tree-sitter/tree-sitter-php#d5e7cacb6c27e0e131c7f76c0dbfee56dfcc61e3'
grammar: 'tree-sitter/tree-sitter-php.wasm'
highlightsQuery: 'tree-sitter/queries/highlights.scm'
tagsQuery: 'tree-sitter/queries/tags.scm'
foldsQuery: 'tree-sitter/queries/folds.scm'
indentsQuery: 'tree-sitter/queries/indents.scm'
fileTypes: [
'aw'

View File

@ -3,11 +3,13 @@ type: 'modern-tree-sitter'
parser: 'tree-sitter-php'
# Give it a precise injectionRegex that won't get accidentally matched with
# anything. This grammar only exists as a way to apply the `source.php` scope.
# anything.
injectionRegex: '^(internal-php)$'
treeSitter:
parserSource: 'github:tree-sitter/tree-sitter-php#594b8bad093abe739c3d2a2cae5abae33c5fb23d'
# TODO: See if we can make an “empty” Tree-sitter parser. We don't need this
# layer to do any of its own parsing work.
grammar: 'tree-sitter/tree-sitter-php.wasm'
highlightsQuery: 'tree-sitter/queries/highlights.scm'
tagsQuery: 'tree-sitter/queries/tags.scm'
foldsQuery: 'tree-sitter/queries/folds.scm'
indentsQuery: 'tree-sitter/queries/indents.scm'

View File

@ -1,8 +1,83 @@
const { Point, Range } = require('atom');
function isPhpDoc(node) {
let { text } = node;
return text.startsWith('/**') && !text.startsWith('/***')
}
function comparePoints(a, b) {
const rows = a.row - b.row;
if (rows === 0) {
return a.column - b.column;
} else {
return rows;
}
}
// Given a series of opening and closing PHP tags, pairs and groups them as a
// series of node specs suitable for defining the bounds of an injection.
function interpret(nodes) {
let sorted = [...nodes].sort((a, b) => {
return comparePoints(a.startPosition, b.startPosition);
});
let ranges = [];
let currentStart = null;
let lastIndex = nodes.length - 1;
for (let [index, node] of sorted.entries()) {
let isStart = node.type === 'php_tag';
let isEnd = node.type === '?>';
let isLast = index === lastIndex;
if (isStart) {
if (currentStart) {
throw new Error('Unbalanced content!');
}
currentStart = node;
if (isLast) {
// There's no ending tag to match this starting tag. This is valid and
// simply signifies that the rest of the file is PHP. We can return a
// range from here to `Infinity` and let the language mode clip it to
// the edge of the buffer.
let spec = {
startIndex: currentStart.startIndex,
startPosition: currentStart.startPosition,
endIndex: Infinity,
endPosition: Point.INFINITY,
range: new Range(
currentStart.range.start,
Point.INFINITY
)
};
ranges.push(spec);
currentStart = null;
break;
}
}
if (isEnd) {
if (!currentStart) {
throw new Error('Unbalanced content!');
}
let spec = {
startIndex: currentStart.startIndex,
startPosition: currentStart.startPosition,
endIndex: node.endIndex,
endPosition: node.endPosition,
range: new Range(
currentStart.range.start,
node.range.end
)
};
ranges.push(spec);
currentStart = null;
}
}
return ranges;
}
exports.activate = function () {
// Here's how we handle the mixing of PHP and HTML:
@ -50,36 +125,48 @@ exports.activate = function () {
return 'internal-php';
},
content(node) {
let results = [];
// At the top level we should ignore `text` nodes, since they're just
// HTML. We should also ignore the middle children of
// `text_interpolation` nodes (also `text`), but we need to include their
// first and last children, which correspond to `?>` and `<?php`.
// The actual structure of the tree is utter chaos for us. The best way
// to make sense of it is to grab all the delimiters of the ranges we're
// interested in, sort them, pair them off, and turn them into fake
// ranges. As long as we return objects with the range properties that
// actual nodes have, _and_ we opt into `includeChildren` (so that it
// doesn't try to read the `children` property to subtract child nodes'
// ranges), this works just fine.
//
// In practice, it seems that `text` is always a child of the root except
// inside of `text_interpolation`, and `text_interpolation` is always a
// child of the root. The only exceptions I've noticed are when the tree
// is in an error state, so they may not be worth worrying about.
for (let child of node.children) {
if (child.type === 'text') { continue; }
if (child.type === 'text_interpolation') {
for (let grandchild of child.children) {
if (grandchild.type === 'text') { continue; }
results.push(grandchild);
}
continue;
}
results.push(child);
}
return results;
// If you're ever skeptical about whether this is really the easiest way
// to do this, fire up `tree-sitter-tools` and take a look at the
// structure of a PHP file in `tree-sitter-php`. If you know of something
// simpler, I'm all ears.
//
// TODO: This method should be allowed to return actual ordinary `Range`
// instances, in which case we'd understand that no futher processing
// need take place by the language mode.
let boundaries = node.descendantsOfType(['php_tag', '?>']);
return interpret(boundaries);
},
includeChildren: true,
newlinesBetween: true,
includeAdjacentWhitespace: true
});
newlinesBetween: false,
// includeAdjacentWhitespace: true,
// TODOs and URLs
// ==============
// For parity with the TextMate PHP grammar, we need to be able to scope
// this region with not just `source.php` but also `meta.embedded.X.php`,
// where X is one of `line` or `block` depending on whether the range spans
// multiple lines.
//
// There is no way to do this via queries because there is no discrete node
// against which we could conditionally add `meta.embedded.block` or
// `meta.embedded.line`… because of the aforementioned lunacy of the tree
// structure.
//
// So we had to invent a feature for it. When `languageScope` is a function,
// it allows the injection to decide on a range-by-range basis what the
// scope name is… _and_ it can return more than one scope name.
languageScope(grammar, _buffer, range) {
let extraScope = range.start.row !== range.end.row ?
'meta.embedded.block.php' : 'meta.embedded.line.php';
return [grammar.scopeName, extraScope];
}
});
// HEREDOCS and NOWDOCS
@ -129,6 +216,9 @@ exports.activate = function () {
};
// TODOs and URLs
// ==============
exports.consumeHyperlinkInjection = (hyperlink) => {
hyperlink.addInjectionPoint('text.html.php', {
types: ['comment', 'string_value'],

View File

@ -1428,6 +1428,73 @@ describe('WASMTreeSitterLanguageMode', () => {
).toBe(true);
});
it('allows multiple base scopes on the injected layer when `languageScope` is a function', async () => {
let customJsConfig = { ...jsConfig };
let customJsGrammar = new WASMTreeSitterGrammar(atom.grammars, jsGrammarPath, customJsConfig);
await jsGrammar.setQueryForTest('highlightsQuery', `
(comment) @comment
(property_identifier) @property
(call_expression (identifier) @function)
(template_string) @string
(template_substitution
["\${" "}"] @interpolation)
`);
let customHtmlConfig = { ...htmlConfig };
let customHtmlGrammar = new WASMTreeSitterGrammar(atom.grammars, htmlGrammarPath, customHtmlConfig);
await htmlGrammar.setQueryForTest('highlightsQuery', `
(fragment) @html
(tag_name) @tag
(attribute_name) @attr
`);
customHtmlGrammar.addInjectionPoint({
...SCRIPT_TAG_INJECTION_POINT,
languageScope: (grammar, _buffer, range) => {
return [grammar.scopeName, `meta.line${range.start.row}`];
}
});
jasmine.useRealClock();
atom.grammars.addGrammar(customJsGrammar);
atom.grammars.addGrammar(customHtmlGrammar);
buffer.setText('<script>\nhello();\n</script>\n<div>\n</div>\n<script>\ngoodbye();</script>');
const languageMode = new WASMTreeSitterLanguageMode({
grammar: customHtmlGrammar,
buffer,
config: atom.config,
grammars: atom.grammars
});
buffer.setLanguageMode(languageMode);
await languageMode.ready;
let descriptor = languageMode.scopeDescriptorForPosition([1, 1]);
expect(
descriptor.getScopesArray().includes('source.js')
).toBe(true);
expect(
descriptor.getScopesArray().includes(`meta.line0`)
).toBe(true);
expect(
descriptor.getScopesArray().includes(`meta.line5`)
).toBe(false);
descriptor = languageMode.scopeDescriptorForPosition([6, 1]);
expect(
descriptor.getScopesArray().includes('source.js')
).toBe(true);
expect(
descriptor.getScopesArray().includes(`meta.line5`)
).toBe(true);
expect(
descriptor.getScopesArray().includes(`meta.line0`)
).toBe(false);
});
it('notifies onDidTokenize listeners the first time all syntax highlighting is done', async () => {
const promise = new Promise(resolve => {
editor.onDidTokenize(event => {

View File

@ -2582,29 +2582,29 @@ class HighlightIterator {
getCloseScopeIds() {
let iterator = last(this.iterators);
if (this.currentScopeIsCovered) {
// console.log(
// iterator.name,
// iterator.depth,
// 'would close',
// iterator._inspectScopes(
// iterator.getCloseScopeIds()
// ),
// 'at',
// iterator.getPosition().toString(),
// 'but scope is covered!'
// );
} else {
// console.log(
// iterator.name,
// iterator.depth,
// 'CLOSING',
// iterator.getPosition().toString(),
// iterator._inspectScopes(
// iterator.getCloseScopeIds()
// )
// );
}
// if (this.currentScopeIsCovered) {
// console.log(
// iterator.name,
// iterator.depth,
// 'would close',
// iterator._inspectScopes(
// iterator.getCloseScopeIds()
// ),
// 'at',
// iterator.getPosition().toString(),
// 'but scope is covered!'
// );
// } else {
// console.log(
// iterator.name,
// iterator.depth,
// 'CLOSING',
// iterator.getPosition().toString(),
// iterator._inspectScopes(
// iterator.getCloseScopeIds()
// )
// );
// }
if (iterator) {
if (this.currentScopeIsCovered) {
return iterator.getOpenScopeIds().filter(id => {
@ -2620,29 +2620,29 @@ class HighlightIterator {
getOpenScopeIds() {
let iterator = last(this.iterators);
// let ids = iterator.getOpenScopeIds();
if (this.currentScopeIsCovered) {
// console.log(
// iterator.name,
// iterator.depth,
// 'would open',
// iterator._inspectScopes(
// iterator.getOpenScopeIds()
// ),
// 'at',
// iterator.getPosition().toString(),
// 'but scope is covered!'
// );
} else {
// console.log(
// iterator.name,
// iterator.depth,
// 'OPENING',
// iterator.getPosition().toString(),
// iterator._inspectScopes(
// iterator.getOpenScopeIds()
// )
// );
}
// if (this.currentScopeIsCovered) {
// console.log(
// iterator.name,
// iterator.depth,
// 'would open',
// iterator._inspectScopes(
// iterator.getOpenScopeIds()
// ),
// 'at',
// iterator.getPosition().toString(),
// 'but scope is covered!'
// );
// } else {
// console.log(
// iterator.name,
// iterator.depth,
// 'OPENING',
// iterator.getPosition().toString(),
// iterator._inspectScopes(
// iterator.getOpenScopeIds()
// )
// );
// }
if (iterator) {
if (this.currentScopeIsCovered) {
return iterator.getOpenScopeIds().filter(id => {
@ -2684,6 +2684,8 @@ class HighlightIterator {
}
}
const EMPTY_SCOPES = Object.freeze([]);
// Iterates through everything that a `LanguageLayer` is responsible for,
// marking boundaries for scope insertion.
class LayerHighlightIterator {
@ -2777,7 +2779,7 @@ class LayerHighlightIterator {
}
isAtInjectionBoundary() {
let position = Point.fromObject(this.iterator.key);
let position = Point.fromObject(this.iterator.key.position);
return position.isEqual(this.start) || position.isEqual(this.end);
}
@ -2789,61 +2791,38 @@ class LayerHighlightIterator {
}
getOpenScopeIds() {
let openScopeIds = this.iterator.value.openScopeIds;
// if (openScopeIds.length > 0) {
// console.log(
// this.name,
// this.depth,
// 'OPENING',
// this.getPosition().toString(),
// this._inspectScopes(
// this.iterator.value.openScopeIds
// )
// );
// }
return [...openScopeIds];
let { key, value } = this.iterator;
return key.boundary === 'end' ? EMPTY_SCOPES : [...value.scopeIds];
}
getCloseScopeIds() {
let closeScopeIds = this.iterator.value.closeScopeIds;
// if (closeScopeIds.length > 0) {
// console.log(
// this.name,
// 'CLOSING',
// this.getPosition().toString(),
// this._inspectScopes(
// this.iterator.value.closeScopeIds
// )
// );
// }
return [...closeScopeIds];
let { key, value } = this.iterator;
return key.boundary === 'start' ? EMPTY_SCOPES : [...value.scopeIds];
}
opensScopes() {
let scopes = this.getOpenScopeIds();
return scopes.length > 0;
return this.iterator?.key?.boundary === 'start';
}
closesScopes() {
let scopes = this.getCloseScopeIds();
return scopes.length > 0;
return this.iterator?.key?.boundary === 'end';
}
getPosition() {
return this.iterator.key || Point.INFINITY;
return this.iterator?.key?.position ?? Point.INFINITY;
}
logPosition() {
let pos = this.getPosition();
let { key, value } = this.iterator;
let { languageMode } = this.languageLayer;
let verb = key.boundary === 'end' ? 'close' : 'open';
console.log(
`[highlight] (${pos.row}, ${pos.column})`,
'close',
this.iterator.value.closeScopeIds.map(id => languageMode.scopeNameForScopeId(id)),
'open',
this.iterator.value.openScopeIds.map(id => languageMode.scopeNameForScopeId(id)),
verb,
value.scopeIds.map(id => languageMode.scopeNameForScopeId(id)),
'next?',
this.iterator.hasNext
);
@ -2851,35 +2830,33 @@ class LayerHighlightIterator {
compare(other) {
// First, favor the one whose current position is earlier.
const result = comparePoints(this.iterator.key, other.iterator.key);
const result = comparePoints(
this.iterator.key.position,
other.iterator.key.position
);
if (result !== 0) { return result; }
// Failing that, favor iterators that need to close scopes over those that
// don't.
if (this.closesScopes() && !other.closesScopes()) {
let ourBoundary = this.iterator.key.boundary;
let theirBoundary = other.iterator.key.boundary;
let bothClosing = ourBoundary === 'end' && theirBoundary === 'end';
if (ourBoundary === 'end' && !bothClosing) {
return -1;
} else if (other.closesScopes() && !this.closesScopes()) {
} else if (theirBoundary === 'end' && !bothClosing) {
return 1;
}
let bothOpening = this.opensScopes() && other.opensScopes();
let bothClosing = this.closesScopes() && other.closesScopes();
if (bothClosing) {
// When both iterators are closing scopes, the deeper layer should act
// first.
return other.languageLayer.depth - this.languageLayer.depth;
} else {
// When both iterators are opening scopes — or if there's a mix of
// opening and closing — the shallower layer should act first.
// When both iterators are opening scopes, the shallower layer should act
// first.
return this.languageLayer.depth - other.languageLayer.depth;
}
// TODO: We need to move to a system where every point in the iterator
// _either_ closes scopes _or_ opens them, with the former visited before
// the latter. Otherwise there's no correct way to sort them when two
// different layers have the same position and both want to close _and_
// open scopes.
}
moveToSuccessor() {
@ -2904,7 +2881,7 @@ class LayerHighlightIterator {
if (!this.end) { return false; }
let next = this.peekAtSuccessor();
return comparePoints(next, this.end) > 0;
return comparePoints(next.position, this.end) > 0;
}
}
@ -2982,6 +2959,9 @@ class LanguageLayer {
if (err.name === 'GrammarLoadError') {
console.warn(err.message);
if (err.queryType === 'highlightsQuery') {
// Recover by setting an empty `highlightsQuery` so that we don't
// propagate errors.
//
// TODO: Warning?
grammar.highlightsQuery = grammar.setQueryForTest(
'highlightsQuery',
@ -3015,10 +2995,8 @@ class LanguageLayer {
// injected.
languageScope = injectionPoint.languageScope;
// The `languageScope` parameter can be a function.
if (typeof languageScope === 'function') {
languageScope = languageScope(this.grammar);
}
// The `languageScope` parameter can be a function. That means we won't
// decide now; we'll decide later on a range-by-range basis.
// Honor an explicit `null`, but fall back to the default scope name
// otherwise.
@ -3028,7 +3006,10 @@ class LanguageLayer {
}
this.languageScope = languageScope;
if (languageScope === null) {
if (languageScope === null || typeof languageScope === 'function') {
// If `languageScope` is a function, we'll still often end up with a
// `languageScopeId` (or several); we just won't be able to compute it
// ahead of time.
this.languageScopeId = null;
} else {
this.languageScopeId = this.languageMode.idForScope(languageScope);
@ -3120,15 +3101,10 @@ class LanguageLayer {
from = buffer.clipPosition(Point.fromObject(from, true));
to = buffer.clipPosition(Point.fromObject(to, true));
let boundaries = createTree(comparePoints);
let boundaries = createTree(compareBoundaries);
let extent = this.getExtent();
let captures;
if (this.highlightsQuery) {
captures = this.highlightsQuery.captures(this.tree.rootNode, from, to);
} else {
captures = [];
}
let captures = this.highlightsQuery?.captures(this.tree.rootNode, from, to) ?? [];
this.scopeResolver.reset();
for (let capture of captures) {
@ -3213,37 +3189,65 @@ class LanguageLayer {
//
let includedRanges = this.depth === 0 ? [extent] : this.getCurrentRanges();
if (this.languageScopeId) {
let languageScopeIdForRange = () => this.languageScopeId;
if (typeof this.languageScope === 'function') {
languageScopeIdForRange = (range) => {
let scopeName = this.languageScope(this.grammar, this.languageMode.buffer, range);
if (Array.isArray(scopeName)) {
return scopeName.map(s => this.languageMode.idForScope(s));
} else {
return this.languageMode.idForScope(scopeName);
}
};
}
if (this.languageScopeId || typeof this.languageScope === 'function') {
for (let range of includedRanges) {
// Filter out ranges that have no intersection with ours.
if (range.end.isLessThanOrEqual(from)) { continue; }
if (range.start.isGreaterThanOrEqual(to)) { continue; }
let languageScopeIds = languageScopeIdForRange(range);
if (!languageScopeIds) continue;
if (!Array.isArray(languageScopeIds)) {
languageScopeIds = [languageScopeIds];
}
if (range.start.isLessThan(from)) {
// If we get this far, we know that the base language scope was open
// when our range began.
alreadyOpenScopes.set(range.start, [this.languageScopeId]);
alreadyOpenScopes.set(
range.start,
languageScopeIds
);
} else {
// Range start must be between `from` and `to`, or else equal `from`
// exactly.
this.scopeResolver.setBoundary(
range.start,
this.languageScopeId,
'open',
{ root: true, length: Infinity }
);
for (let id of languageScopeIds) {
this.scopeResolver.setBoundary(
range.start,
id,
'open',
{ root: true, length: Infinity }
);
}
}
if (range.end.isGreaterThan(to)) {
// Do nothing; we don't need to set this boundary.
} else {
// The range must end somewhere within our range.
this.scopeResolver.setBoundary(
range.end,
this.languageScopeId,
'close',
{ root: true, length: Infinity }
);
//
// Close the boundaries in the opposite order of how we opened them.
for (let i = languageScopeIds.length - 1; i >= 0; i--) {
this.scopeResolver.setBoundary(
range.end,
languageScopeIds[i],
'close',
{ root: true, length: Infinity }
);
}
}
}
}
@ -3263,12 +3267,20 @@ class LanguageLayer {
continue;
}
let bundle = {
closeScopeIds: [...data.close],
openScopeIds: [...data.open]
};
let OPEN_KEY = { position: point, boundary: 'start' };
let CLOSE_KEY = { position: point, boundary: 'end' };
boundaries = boundaries.insert(point, bundle);
if (data.close.length > 0) {
boundaries = boundaries.insert(CLOSE_KEY, {
scopeIds: Object.freeze(data.close)
});
}
if (data.open.length > 0) {
boundaries = boundaries.insert(OPEN_KEY, {
scopeIds: Object.freeze(data.open)
});
}
}
return [boundaries, alreadyOpenScopes];
@ -3829,11 +3841,11 @@ class LanguageLayer {
// If the cursor is resting before column X, we want all scopes that cover
// the character in column X.
let captures = this.highlightsQuery.captures(
let captures = this.highlightsQuery?.captures(
this.tree.rootNode,
point,
{ row: point.row, column: point.column + 1 }
);
) ?? [];
let results = [];
for (let capture of captures) {