pulsar/vendor/web-tree-sitter
Andrew Dupont a2f3c9e25a Fix typo
2023-05-04 21:26:17 -07:00

Building a custom web-tree-sitter

Tree-sitter parsers often use external C scanners, and those scanners sometimes use functions in the C standard library. For this to work in a WASM environment, web-tree-sitter needs to have anticipated which stdlib functions will need to be available. If a tree-sitter parser uses stdlib function X, but X is not included in web-tree-sitter's list of exports, the parser will fail to work and will throw an error whenever it hits a code path that uses the missing function.

For this reason, Pulsar builds a custom web-tree-sitter. Whenever someone tries to integrate a new tree-sitter parser into a Pulsar grammar, they might find that the parser relies on some stdlib function we haven't included yet. In that case they can let us know, and we'll update our web-tree-sitter build so that it exports that function.

Pulsar will need to do this until tree-sitter#949 is addressed in some way.
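One way to find out ahead of time which functions a parser expects is to read the import section of its compiled .wasm file and compare it against the export list. The following is a minimal sketch, not part of Pulsar's tooling: the helper names are hypothetical, and it assumes that exports.json is a JSON array of symbol names and that C functions appear there with a leading underscore (as in the "_iswupper" example later in this document). Some imports may be supplied by the Emscripten runtime rather than by exports.json, so treat the result as a list of candidates to investigate.

```javascript
// List the names of all functions that a compiled parser imports from "env".
// `wasmBytes` is a Buffer or Uint8Array holding the parser's .wasm file.
function listEnvFunctionImports(wasmBytes) {
  const module = new WebAssembly.Module(wasmBytes);
  return WebAssembly.Module.imports(module)
    .filter((entry) => entry.kind === "function" && entry.module === "env")
    .map((entry) => entry.name);
}

// Return the imported functions that are not covered by the export list.
// Assumption: the list may name a symbol either as "name" or "_name".
function findMissingExports(wasmBytes, exportedNames) {
  const exported = new Set(exportedNames);
  return listEnvFunctionImports(wasmBytes).filter(
    (name) => !exported.has(name) && !exported.has(`_${name}`)
  );
}
```

In Node, this could be called with the contents of a parser's .wasm file (read with fs.readFileSync) and the parsed contents of exports.json; the paths to both depend on your checkout.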

Check out the modified branch for the version we're targeting

At time of writing, Pulsar was targeting web-tree-sitter version 0.20.7, so a branch exists on our fork called v0-20-7-modified. That branch contains a modified exports.json file and a modified script for building web-tree-sitter.

When we target a newer version of web-tree-sitter, a similar branch should be created against the corresponding upstream tag. The commits applied on the previous modified branch should cherry-pick onto the new one fairly easily.

Add whatever methods are needed to exports.json

For instance, tree-sitter-ruby introduced a new dependency on the C stdlib function iswupper a while back, and web-tree-sitter doesn't export that one by default. So we can add the line

  "_iswupper",

in an appropriate place in exports.json, then rebuild web-tree-sitter so that the WASM-built version of the tree-sitter-ruby parser has that function available to it.

If a third-party tree-sitter grammar needs something more esoteric, our default position should be to add it to the build. If the export results in a major change in file size or — somehow — performance, then the change can be discussed.
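If you would rather script the exports.json edit than make it by hand, something like the following could work. It is only a sketch: addExport is a hypothetical helper, and it assumes exports.json parses to a flat array of strings in which C functions carry a leading underscore. It inserts the new entry before the first existing entry that sorts after it, so the existing order is left alone.

```javascript
// Return a copy of the export list that includes `name`.
// Assumption: entries are plain strings such as "_iswupper".
function addExport(exportedNames, name) {
  const entry = name.startsWith("_") ? name : `_${name}`;
  const copy = exportedNames.slice();
  if (copy.includes(entry)) return copy; // already exported, nothing to do

  // Find the first existing entry that sorts after the new one.
  const index = copy.findIndex((existing) => existing > entry);
  if (index === -1) {
    copy.push(entry);
  } else {
    copy.splice(index, 0, entry);
  }
  return copy;
}
```

The result can be written back with JSON.stringify(list, null, 2); check the diff before committing, since the real file's formatting may differ.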

Run script/build-wasm from the root

To build web-tree-sitter for a particular version, make sure you're using the appropriate version of Emscripten. This document is useful for matching up tree-sitter versions with Emscripten versions.

The default build-wasm script performs minification with terser. That's easy enough to turn off, and we do, but even without minification, Emscripten generates a JS file that doesn't have line breaks or indentation. We fix this by running js-beautify as a final step.

Pulsar, as a desktop app, doesn't gain a lot from minification, and ultimately it's better to have a source file that the user can more easily debug if necessary. It also makes the next change a bit easier:

Add a warning message

When a parser tries to use a stdlib function that isn't exported by web-tree-sitter, the error that's thrown is not very useful. So we try to detect when that scenario is going to happen and insert a warning in the console to help users who might otherwise be befuddled.

This may be automated in the future, but for now you can modify tree-sitter.js to include the checkForAsmVersion function:

var Module = typeof Module !== "undefined" ? Module : {};
var TreeSitter = function() {

  function checkForAsmVersion(prop) {
    if (!(prop in Module['asm'])) {
      console.warn(`Warning: parser wants to call function ${prop}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);
    }
  }

  var initPromise;
  var document = typeof window == "object" ? {
    currentScript: window.document.currentScript
  } : null;

You can then search for this line

if (!resolved) resolved = resolveSymbol(prop, true);

and add the following line right below it:

checkForAsmVersion(prop);

The line in question is generated by Emscripten, so if it changes in the future, you should be able to look up its equivalent in the correct version of Emscripten.
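Until this step is automated, the two manual edits above could be scripted roughly as follows. This is a sketch under stated assumptions, not an existing Pulsar script: patchTreeSitterSource and the constant names are hypothetical, and the anchor strings are simply the lines quoted in this section, so they will need updating if Emscripten's output changes.

```javascript
// Anchor lines quoted from the generated tree-sitter.js.
const OPENING_LINE = "var TreeSitter = function() {";
const RESOLVE_LINE = "if (!resolved) resolved = resolveSymbol(prop, true);";

// The helper to insert, kept as plain strings so `${prop}` is not evaluated here.
const CHECK_FUNCTION = [
  "  function checkForAsmVersion(prop) {",
  "    if (!(prop in Module['asm'])) {",
  "      console.warn(`Warning: parser wants to call function ${prop}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);",
  "    }",
  "  }",
].join("\n");

function patchTreeSitterSource(source) {
  // Leave an already-patched file untouched so the script can be re-run.
  if (source.includes("function checkForAsmVersion(")) return source;

  const output = [];
  let addedFunction = false;
  let addedCall = false;

  for (const line of source.split("\n")) {
    output.push(line);
    const trimmed = line.trim();
    if (!addedFunction && trimmed === OPENING_LINE) {
      // Define the helper at the top of the TreeSitter function body.
      output.push("", CHECK_FUNCTION, "");
      addedFunction = true;
    } else if (trimmed === RESOLVE_LINE) {
      // Add the call right below the resolveSymbol line, matching its indent.
      const indent = line.slice(0, line.length - line.trimStart().length);
      output.push(`${indent}checkForAsmVersion(prop);`);
      addedCall = true;
    }
  }

  if (!addedFunction || !addedCall) {
    throw new Error("Expected anchor lines not found; check the Emscripten output.");
  }
  return output.join("\n");
}
```

Throwing when either anchor is missing is deliberate: a silently unpatched build would bring back the unhelpful error this warning is meant to explain.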

Copy it to vendor

Under lib/binding_web you'll find the built files tree-sitter.js and tree-sitter.wasm. Copy both to Pulsar's vendor/web-tree-sitter directory. Relaunch Pulsar and do a smoke test with a couple of existing grammars to make sure you didn't break anything.