mirror of https://github.com/github/semantic.git synced 2024-11-27 12:57:49 +03:00

Update docs now that alacarte ASTs are gone.

This is pretty straightforward.

Fixes #602. Fixes #107.
Patrick Thomson 2020-07-13 12:34:52 -04:00
parent ba27f8e456
commit 4e5012835f
3 changed files with 11 additions and 126 deletions


@@ -58,7 +58,7 @@ Available options:
## Development ## Development
`semantic` requires at least GHC 8.8.1 and Cabal 3.0. We strongly recommend using [`ghcup`][ghcup] to sandbox GHC versions, as GHC packages installed through your OS's package manager may not install statically-linked versions of the GHC boot libraries. `semantic` currently builds only on Unix systems; users of other operating systems may wish to use the [Docker images](https://github.com/github/semantic/packages/11609). `semantic` requires at least GHC 8.8.3 and Cabal 3.0. We strongly recommend using [`ghcup`][ghcup] to sandbox GHC versions, as GHC packages installed through your OS's package manager may not install statically-linked versions of the GHC boot libraries. `semantic` currently builds only on Unix systems; users of other operating systems may wish to use the [Docker images](https://github.com/github/semantic/packages/11609).
We use `cabal's` [Nix-style local builds][nix] for development. To get started quickly: We use `cabal's` [Nix-style local builds][nix] for development. To get started quickly:
@@ -71,7 +71,7 @@ cabal v2-test
cabal v2-run semantic -- --help cabal v2-run semantic -- --help
``` ```
You can also use the [Bazel](https://bazel.build) build system for development. To learn more about Bazel and why it might give you a better development experience, check the documentation at `docs/build.md`. You can also use the [Bazel](https://bazel.build) build system for development. To learn more about Bazel and why it might give you a better development experience, check the [build documentation](docs/build.md).
``` bash ``` bash
git clone git@github.com:github/semantic.git git clone git@github.com:github/semantic.git
@@ -89,13 +89,12 @@ bazel build //...
## Technology and architecture ## Technology and architecture
Architecturally, `semantic`: Architecturally, `semantic`:
1. Reads blobs. 1. Generates per-language Haskell syntax types based on [tree-sitter](https://github.com/tree-sitter/tree-sitter) grammar definitions.
2. Generates parse trees for those blobs with [tree-sitter][tree-sitter] (an incremental parsing system for programming tools). 2. Reads blobs from a filesystem or provided via a protocol buffer request.
3. Assigns those trees into a generalized representation of syntax. 3. Returns blobs or performs analysis.
4. Performs analysis, computes diffs, or just returns parse trees. 4. Renders output in one of many supported formats.
5. Renders output in one of many supported formats.
Semantic leverages a number of interesting algorithms and techniques: Throughout its lifecycle, `semantic` has leveraged a number of interesting algorithms and techniques, including:
- Myers' algorithm (SES) as described in the paper [*An O(ND) Difference Algorithm and Its Variations*][SES] - Myers' algorithm (SES) as described in the paper [*An O(ND) Difference Algorithm and Its Variations*][SES]
- RWS as described in the paper [*RWS-Diff: Flexible and Efficient Change Detection in Hierarchical Data*][RWS]. - RWS as described in the paper [*RWS-Diff: Flexible and Efficient Change Detection in Hierarchical Data*][RWS].


@@ -1,66 +0,0 @@
# Program analysis
Program analysis allows us to ask questions about and analyze the behavior of computer programs. Analyzing this behavior allows us to (eventually) answer subtle but powerful questions such as, will this use more than 8 GB of RAM? Does this present a user interface? We perform program analysis statically—that is, without executing the program.
We're able to compute the following end results using evaluation:
1. **Import graph:** graph representing all dependencies (`import`s, `require`s, etc.)
2. **Call graph:** a control flow graph that represents calling relationships (i.e., how one particular function calls other functions). This information is often vital for debugging purposes and determining where code is failing.
3. **Control flow graph:** representation of _all_ paths that might be traversed through a program during its execution.
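To make the call graph idea concrete, here is a minimal JavaScript sketch. It is not how `semantic` computes or stores graphs; the `callGraph` data and the `reachableFrom` helper are hypothetical and exist only for illustration. The graph is a plain map from each function to the functions it calls, and a depth-first walk finds everything reachable from an entry point.

```javascript
// Hypothetical input: each function name mapped to the functions it calls.
const callGraph = new Map([
  ["main", ["parse", "render"]],
  ["parse", ["tokenize"]],
  ["tokenize", []],
  ["render", []],
  ["legacyExport", ["render"]],
]);

// Depth-first walk collecting every function reachable from `entry`.
function reachableFrom(graph, entry) {
  const visited = new Set();
  const stack = [entry];
  while (stack.length > 0) {
    const name = stack.pop();
    if (visited.has(name)) continue;
    visited.add(name);
    for (const callee of graph.get(name) || []) stack.push(callee);
  }
  return visited;
}

const live = reachableFrom(callGraph, "main");
// Functions that are defined but never reached from `main`.
const unreachable = [...callGraph.keys()].filter((name) => !live.has(name));
// unreachable is ["legacyExport"]
```

In this toy graph nothing reachable from `main` calls `legacyExport`, so it is the only function reported as unreachable.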
### Abstract interpretation
To do program analysis, we implement an approach based on the paper [Abstracting Definitional Interpreters](https://plum-umd.github.io/abstracting-definitional-interpreters/), which we've extended to work with our [à la carte representation of syntaxes](http://www.cs.ru.nl/~W.Swierstra/Publications/DataTypesALaCarte.pdf). This allows us to build a library of interpreters that do different things, but are written with the _same_ evaluation semantics. This approach offers several advantages; we can define _one_ evaluator and get different behaviors out of it (via type-directed polymorphism).
We employ three types of interpretation: *concrete semantics*, *abstract semantics* and *type-checking*.
1. Under **concrete semantics**, we are precise; we only compute the result of code that is called. This allows us to see exactly what happens when we run our program. For example, if we expect to return a boolean value and our results differ, we'll throw an error (which is sub-optimal because in a language like Ruby, a lot of objects that are not booleans could be used as booleans).
2. Under **abstract semantics**, we are exhaustive; we compute the result of every possible path. This is how we compute call graphs. Under abstract semantics, we don't know if something is going to be `true` or `false`, so we take both branches, producing both non-deterministically with the `<|>` operator, which represents choice and builds a union of possibilities.
3. Under **type-checking semantics**, we verify that the type of a syntactic construct (e.g., an object of type `Int`) matches what is expected when it is used. This helps us check type errors, emulating compile-time static type checking.
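As a rough illustration of how one evaluator can yield these three behaviors, here is a minimal JavaScript sketch. It is not `semantic`'s implementation (which is written in Haskell and selects behavior through types); the toy term language, the `evaluate` function, and the `concreteDomain`, `abstractDomain`, and `typeDomain` objects are all hypothetical names invented for this example.

```javascript
// Hypothetical toy language: literals, an unknown boolean, addition, if/else.
const lit = (value) => ({ tag: "lit", value });
const unknownBool = () => ({ tag: "unknownBool" }); // e.g. user input
const add = (left, right) => ({ tag: "add", left, right });
const ifElse = (cond, thenTerm, elseTerm) => ({ tag: "if", cond, thenTerm, elseTerm });

// A single evaluator. Everything that varies lives in `domain`.
function evaluate(term, domain) {
  switch (term.tag) {
    case "lit":
      return domain.literal(term.value);
    case "unknownBool":
      return domain.unknownBool();
    case "add":
      return domain.add(evaluate(term.left, domain), evaluate(term.right, domain));
    case "if":
      // Branches are passed as thunks so the domain decides which ones run.
      return domain.branch(
        evaluate(term.cond, domain),
        () => evaluate(term.thenTerm, domain),
        () => evaluate(term.elseTerm, domain)
      );
    default:
      throw new Error("unknown term: " + term.tag);
  }
}

// Concrete semantics: ordinary values; only the taken branch is evaluated.
const concreteDomain = {
  literal: (v) => v,
  unknownBool: () => { throw new Error("value is not known concretely"); },
  add: (a, b) => a + b,
  branch: (c, t, e) => {
    if (typeof c !== "boolean") throw new TypeError("expected a boolean condition");
    return c ? t() : e();
  },
};

// Abstract semantics: a value is an array of possibilities, and both
// branches are explored when the condition may be true or false.
const abstractDomain = {
  literal: (v) => [v],
  unknownBool: () => [true, false],
  add: (as, bs) => as.flatMap((a) => bs.map((b) => a + b)),
  branch: (cs, t, e) => cs.flatMap((c) => (c ? t() : e())),
};

// Type-checking semantics: a value is the name of its type.
const typeDomain = {
  literal: (v) => typeof v,
  unknownBool: () => "boolean",
  add: (a, b) => {
    if (a !== "number" || b !== "number") throw new TypeError("add expects numbers");
    return "number";
  },
  branch: (c, t, e) => {
    if (c !== "boolean") throw new TypeError("condition must be a boolean");
    const thenType = t();
    const elseType = e();
    if (thenType !== elseType) throw new TypeError("branches disagree");
    return thenType;
  },
};

const knownProgram = ifElse(lit(true), add(lit(1), lit(2)), lit(10));
const program = ifElse(unknownBool(), add(lit(1), lit(2)), lit(10));
```

Under these assumptions, `evaluate(knownProgram, concreteDomain)` yields `3`, `evaluate(program, abstractDomain)` yields both possible results (`3` and `10`), and `evaluate(program, typeDomain)` yields `"number"`. The array returned by the abstract domain plays the role that the union built with `<|>` plays in the description above.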
### Evaluation
The [`Evaluatable`](https://github.com/github/semantic/blob/master/src/Data/Abstract/Evaluatable.hs) class defines the necessary interface for a term to be evaluated. While a default definition of `eval` is given, instances with computational content must implement `eval` to perform their small-step operational semantics. Evaluation gives us a way to capture what it means to interpret the syntax data types we create using the [Assignment](https://github.com/github/semantic/blob/master/docs/assignment.md) stage. The evaluation algebra also handles each syntax without caring about any language-specific implementation. We do this by cascading polymorphic functions using the `Evaluatable` type class.
We have yet to finish implementing `Evaluatable` instances for the various à la carte syntaxes. Doing so requires knowledge of the type and value evaluation semantics of a particular syntax and familiarity with the functions for interacting with the environment and store.
#### Implementing `Evaluatable` instances
The following is a brief guide to working with the definitional interpreters and implementing instances of `Evaluatable` for the various pieces of syntax. `Semantic.Util` defines a series of language-specific wrapper functions for working in ghci to do evaluation.
_Helpers:_
- `parseFile`: parses one file.
- `evaluateLanguageProject`: takes a list of files and evaluates them, usually under concrete semantics.
- `callGraphLanguageProject`: uses the same mechanism for evaluating, but uses abstract semantics.
- `typeCheckLanguageFile`: allows us to evaluate under type checking semantics.
#### Creating good abstractions
When adding `Evaluatable` instances, we may notice that certain language-specific syntaxes share enough semantics to be consolidated into a language-agnostic data type (and, as a result, have one `Evaluatable` instance). Other times the opposite holds: there is not enough overlap in evaluation semantics, and the syntaxes need to be decoupled. Reasoning through the right abstractions is a big part of determining how to write these `Evaluatable` instances.
### Effects
To perform these computations, we need effects. An effect is something a piece of code does which isn't strictly encapsulated in its return value. Beyond taking inputs and returning outputs, programs may read or write state in memory, throw exceptions, fail to terminate, or terminate non-deterministically. These outcomes are known as _effects_. As an example, consider the JS function:
```
function square(x) {
  return x * x;
}
```
This is _pure_ because it performs no effects, whereas the similar function:
```
function square(x) {
  console.log("squaring x: " + x);
  return x * x;
}
```
computes the same result value but additionally performs an effect (logging). Effects provide convenient access to powerful and efficient capabilities of the machine such as interrupts, stateful memory, the file system, and the monitor.
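One way to see what "not encapsulated in the return value" means is to rewrite the logging version so that the log message becomes part of the result, leaving the decision to actually print it to the caller. The sketch below is illustrative only; `squareLogged` and `runLogged` are hypothetical names and are not part of `semantic`.

```javascript
// Pure variant: instead of printing, describe the log line in the result.
function squareLogged(x) {
  return { value: x * x, log: ["squaring x: " + x] };
}

// A small handler that performs the described effect.
// `write` defaults to console.log but can be swapped out, e.g. in tests.
function runLogged(result, write = console.log) {
  for (const line of result.log) write(line);
  return result.value;
}

// Collect the messages instead of printing them.
const captured = [];
const value = runLogged(squareLogged(4), (line) => captured.push(line));
// value is 16; captured is ["squaring x: 4"]
```

Separating the description of an effect from the code that performs it is what lets an analysis choose how, or whether, each effect is carried out.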
We compute effects non-deterministically.
<!--- WIP: come back and briefly talk about why this is useful for program analysis --->
<!--- WIP: come back and briefly talk about why this is useful for runEvaluator --->
### Potential use-cases
- *Dead code* analysis: reduce potential surface area (security vulnerabilities). Less code to maintain is always a good thing. Good examples in most IDEs.
- *Symbolic* - allows us to do symbolic execution. https://prepack.io/
- *Caching* - a way to guarantee that an analysis will terminate, which allows us to write a type checker by abstracting variables to their types. Instead of a potentially infinite series of integers, values can be represented as `Int` (finitization of values).
- *Collecting* - allows us to have greater precision in other analyses (more useful internally)
- *Tracing and reachable* state - useful for debugging (it's verbose).
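The caching and finitization idea from the list above can be sketched in a few lines of JavaScript. The example assumes a hypothetical loop of the form `while (...) x = x + 1` whose condition is unknown; `abstractValue`, `stepAbstract`, and `analyzeLoop` are invented for illustration and do not reflect `semantic`'s code.

```javascript
// Abstract a concrete value to the name of its type (finitization).
const abstractValue = (v) => (Number.isInteger(v) ? "Int" : typeof v);

// Abstract effect of the loop body `x = x + 1`: Int + Int is still Int.
function stepAbstract(state) {
  if (state.x !== "Int") throw new TypeError("x must be an Int");
  return { x: "Int" };
}

// Iterate until a state repeats. The cache of seen states guarantees
// termination whenever the set of abstract states is finite.
function analyzeLoop(initial, step) {
  const seen = new Set();
  let state = initial;
  let iterations = 0;
  while (!seen.has(JSON.stringify(state))) {
    seen.add(JSON.stringify(state));
    state = step(state);
    iterations += 1;
  }
  return { state, iterations };
}

const result = analyzeLoop({ x: abstractValue(0) }, stepAbstract);
// result.state is { x: "Int" } and result.iterations is 1
```

Concretely, `x` would take the values 0, 1, 2, and so on, and no state would ever repeat. Abstractly, the state `{ x: "Int" }` recurs after one step, so the analysis stops.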


@@ -18,62 +18,20 @@ Space leaks can be detected by running `semantic` with a restricted heap size. N
## Building ## Building
`stack build --fast` is a nice way to speed up local development (builds without optimizations). Before building with `cabal`, be sure to run `cabal configure --disable-optimizations --enable-tests`. GHC defaults to `-O1`, which can significantly slow recompiles.
## Testing ## Testing
`stack build --fast semantic:test` builds and runs the unit tests. `stack build --fast semantic:test` builds and runs the unit tests.
- Find out what all the possible test arguments are with `stack test --help`. - Find out what all the possible test arguments are with `cabal run semantic:spec -- --help`.
- Focus in on a particular test or set of tests with `-m`/`--match`: - Focus in on a particular test or set of tests with `-m`/`--match`:
stack test --test-arguments="-m ruby" cabal run semantic:spec -- -p ruby
- Use `--skip` to run everything but matching tests:
stack test --test-arguments="--skip ruby"
- It can take a while to run them over the whole project. Focus in on a particular module with `--test-arguments`: - It can take a while to run them over the whole project. Focus in on a particular module with `--test-arguments`:
stack test --test-arguments=src/Data/Range.hs cabal run semantic:spec -- -p Data.Language
## Difftool
`git` can be configured to open diffs in `semantic` using `git-difftool`:
1. Install semantic to the local bin path: `stack install :semantic`
2. Configure `semantic` as a difftool:
git config difftool.semantic.cmd 'semantic diff --patch "$LOCAL" "$REMOTE"'
3. Optionally, configure `semantic` as the default difftool:
git config diff.tool semantic
4. Perform git diffs using semantic by invoking `git-difftool`:
# if configured as default
git difftool
# otherwise
git difftool -t semantic
5. _Bonus round!_ Optionally, configure `git-difftool` to never prompt:
git config difftool.prompt false
## Editing
- 1. Install ghc-mod from the semantic directory by running:
`stack install ghc-mod`
- 2. You'll need the `ide-haskell` plugins for atom. You can install through apm:
       `apm install haskell-ghc-mod ide-haskell ide-haskell-cabal linter linter-ui-default`
# Ctags Support # Ctags Support
@@ -99,12 +57,6 @@ Alternatively, you can replace `symbols-view` with `joshvera/tags-view` in your
Then disable the `symbols-view` package. Then disable the `symbols-view` package.
## Semantic documentation in Dash
You can generate a `semantic` docset and import it into Dash locally. To do so run the `script/haddock` first to ensure Haddock documentation is generated. Then run `script/docset`. This should generate `.docset/semantic.docset` in the `semantic` repo. The last step is to import the `semantic.docset` into Dash. Open dash, open preferences, select the 'Docsets' tab, click the `+` icon to add a new docset, and direct the file browser to `semantic/.docsets/semantic.docset`.
## Working with grammar datatypes ## Working with grammar datatypes
`haskell-tree-sitter` includes some TemplateHaskell machinery to generate a datatype from a tree-sitter parser's symbol table. You can generally guess the constructors of that type by turning the snake_case production names from the tree-sitter grammar into UpperCamelCase names, but you can also have the compiler dump the datatype out in full in the repl: `haskell-tree-sitter` includes some TemplateHaskell machinery to generate a datatype from a tree-sitter parser's symbol table. You can generally guess the constructors of that type by turning the snake_case production names from the tree-sitter grammar into UpperCamelCase names, but you can also have the compiler dump the datatype out in full in the repl: