Update docs now that alacarte ASTs are gone.
This is pretty straightforward. Fixes #602. Fixes #107.
commit 4e5012835f (parent ba27f8e456)

README.md (15 changed lines)
@@ -58,7 +58,7 @@ Available options:

## Development

`semantic` requires at least GHC 8.8.1 and Cabal 3.0. We strongly recommend using [`ghcup`][ghcup] to sandbox GHC versions, as GHC packages installed through your OS's package manager may not install statically-linked versions of the GHC boot libraries. `semantic` currently builds only on Unix systems; users of other operating systems may wish to use the [Docker images](https://github.com/github/semantic/packages/11609).
`semantic` requires at least GHC 8.8.3 and Cabal 3.0. We strongly recommend using [`ghcup`][ghcup] to sandbox GHC versions, as GHC packages installed through your OS's package manager may not install statically-linked versions of the GHC boot libraries. `semantic` currently builds only on Unix systems; users of other operating systems may wish to use the [Docker images](https://github.com/github/semantic/packages/11609).

We use `cabal`'s [Nix-style local builds][nix] for development. To get started quickly:

@@ -71,7 +71,7 @@ cabal v2-test
cabal v2-run semantic -- --help
```

You can also use the [Bazel](https://bazel.build) build system for development. To learn more about Bazel and why it might give you a better development experience, check the documentation at `docs/build.md`.
You can also use the [Bazel](https://bazel.build) build system for development. To learn more about Bazel and why it might give you a better development experience, check the [build documentation](docs/build.md).

``` bash
git clone git@github.com:github/semantic.git
@@ -89,13 +89,12 @@ bazel build //...
## Technology and architecture

Architecturally, `semantic`:
1. Reads blobs.
2. Generates parse trees for those blobs with [tree-sitter][tree-sitter] (an incremental parsing system for programming tools).
3. Assigns those trees into a generalized representation of syntax.
4. Performs analysis, computes diffs, or just returns parse trees.
5. Renders output in one of many supported formats.
1. Generates per-language Haskell syntax types based on [tree-sitter](https://github.com/tree-sitter/tree-sitter) grammar definitions.
2. Reads blobs from a filesystem or via a protocol buffer request.
3. Returns blobs or performs analysis.
4. Renders output in one of many supported formats.
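
To make the shape of the current pipeline concrete, here is a toy, self-contained Haskell sketch of those four stages. None of these names come from `semantic`'s actual API; the generated syntax types, protocol buffer handling, and real renderers are all elided.

``` haskell
-- Toy stand-ins for the four stages (illustrative only, not semantic's API).
data Blob = Blob { blobPath :: FilePath, blobSource :: String }

-- A placeholder for a generated, per-language syntax tree (stage 1).
data Term = Leaf String | Node [Term]

-- Stages 2 and 3: read a blob and "analyse" it (here, one Leaf per word).
parse :: Blob -> Term
parse = Node . map Leaf . words . blobSource

-- Stage 4: render in one supported format (here, an s-expression-like string).
render :: Term -> String
render (Leaf s)  = s
render (Node ts) = "(" ++ unwords (map render ts) ++ ")"

main :: IO ()
main = putStrLn (render (parse (Blob "example.py" "def f ( x ): return x")))
```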

Semantic leverages a number of interesting algorithms and techniques:
Throughout its lifetime, `semantic` has leveraged a number of interesting algorithms and techniques, including:

- Myers' algorithm (SES) as described in the paper [*An O(ND) Difference Algorithm and Its Variations*][SES]
- RWS as described in the paper [*RWS-Diff: Flexible and Efficient Change Detection in Hierarchical Data*][RWS].
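
For intuition about what an edit script is, here is a deliberately naive Haskell sketch over flat lists (not `semantic`'s implementation): it is exponential in the worst case, whereas Myers' algorithm produces the same kind of result in O(ND), and RWS extends the idea to trees.

``` haskell
-- Naive shortest-edit-script sketch: at each mismatch, try both a deletion and
-- an insertion and keep the shorter script.
data Edit a = Keep a | Insert a | Delete a
  deriving Show

ses :: Eq a => [a] -> [a] -> [Edit a]
ses []     ys     = map Insert ys
ses xs     []     = map Delete xs
ses (x:xs) (y:ys)
  | x == y    = Keep x : ses xs ys
  | otherwise =
      let viaDelete = Delete x : ses xs (y:ys)
          viaInsert = Insert y : ses (x:xs) ys
      in if length viaDelete <= length viaInsert then viaDelete else viaInsert

main :: IO ()
main = mapM_ print (ses "kitten" "sitting")
```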

@@ -1,66 +0,0 @@
# Program analysis

Program analysis allows us to ask questions about and analyze the behavior of computer programs. Analyzing this behavior allows us to (eventually) answer subtle but powerful questions such as: will this use more than 8 GB of RAM? Does this present a user interface? We perform program analysis statically—that is, without executing the program.

We’re able to compute the following end results using evaluation:
1. **Import graph:** a graph representing all dependencies (`import`s, `require`s, etc.)
2. **Call graph:** a control flow graph that represents calling relationships (i.e., how one particular function calls other functions). This information is often vital for debugging purposes and determining where code is failing.
3. **Control flow graph:** a representation of _all_ paths that might be traversed through a program during its execution.
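
As a minimal picture of what such graphs look like, the sketch below models a toy call graph as an adjacency map and asks a reachability question; the types and the example graph are illustrative, not `semantic`'s graph machinery.

``` haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A minimal adjacency-map graph; nodes are just names here.
type Graph = Map.Map String (Set.Set String)

-- A toy call graph: main calls parse and render; parse calls tokenize.
callGraph :: Graph
callGraph = Map.fromList
  [ ("main",     Set.fromList ["parse", "render"])
  , ("parse",    Set.fromList ["tokenize"])
  , ("tokenize", Set.empty)
  , ("render",   Set.empty)
  ]

-- Everything reachable from a starting node, e.g. for dead-code detection
-- ("defined but not reachable from any entry point").
reachable :: String -> Graph -> Set.Set String
reachable start g = go (Set.singleton start) [start]
  where
    go seen []       = seen
    go seen (n:rest) =
      let next = Set.toList (Map.findWithDefault Set.empty n g `Set.difference` seen)
      in go (seen `Set.union` Set.fromList next) (next ++ rest)

main :: IO ()
main = print (reachable "main" callGraph)
```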

### Abstract interpretation
To do program analysis, we implement an approach based on the paper [Abstracting Definitional Interpreters](https://plum-umd.github.io/abstracting-definitional-interpreters/), which we've extended to work with our [à la carte representation of syntaxes](http://www.cs.ru.nl/~W.Swierstra/Publications/DataTypesALaCarte.pdf). This allows us to build a library of interpreters that do different things, but are written with the _same_ evaluation semantics. This approach offers several advantages; we can define _one_ evaluator and get different behaviors out of it (via type-directed polymorphism).
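
Below is a minimal sketch of the "one evaluator, different behaviors" idea, assuming nothing from `semantic`'s code: the evaluator is written once against an `Alternative` interface, and picking a concrete carrier type changes the analysis.

``` haskell
import Control.Applicative (Alternative (..))

-- A tiny expression language with one source of uncertainty.
data Expr = Lit Int | Add Expr Expr | UnknownBit   -- UnknownBit: a value we can't know statically

-- The evaluator is written once and never names a particular carrier type.
eval :: Alternative f => Expr -> f Int
eval (Lit n)    = pure n
eval (Add a b)  = (+) <$> eval a <*> eval b
eval UnknownBit = pure 0 <|> pure 1   -- consider both possibilities

-- "Concrete-ish" run: Maybe keeps only the first alternative.
concrete :: Expr -> Maybe Int
concrete = eval

-- Abstract run: the list carrier explores every branch exhaustively.
abstract :: Expr -> [Int]
abstract = eval

main :: IO ()
main = do
  print (concrete (Add (Lit 1) UnknownBit))  -- Just 1
  print (abstract (Add (Lit 1) UnknownBit))  -- [1,2]
```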

We employ three types of interpretation: *concrete semantics*, *abstract semantics* and *type-checking*.

1. Under **concrete semantics**, we are precise; we only compute the result of code that is called. This allows us to see exactly what happens when we run our program. For example, if we expect to return a boolean value and our results differ, we’ll throw an error (which is sub-optimal because in a language like Ruby, a lot of objects that are not booleans could be used as booleans).
2. Under **abstract semantics**, we are exhaustive; we compute the result of all possible permutations. This is how we compute call graphs. Under abstract semantics, we don’t know if something is going to be `true` or `false`, so we take both branches—non-deterministically producing both using the `<|>` operator, which represents choice, building a union of possibilities.
3. Under **type-checking semantics**, we verify that the type of a syntactic construct (e.g., an object of type `Int`) matches what is expected when it is used. This helps us check type errors, emulating compile-time static type checking.
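
Continuing in the same toy style, item 3 can be pictured as evaluating into a domain of types instead of values. This is only an illustration, not `semantic`'s type checker.

``` haskell
data Ty = TInt | TBool
  deriving (Eq, Show)

data Expr = LitI Int | LitB Bool | Add Expr Expr | If Expr Expr Expr

-- "Evaluate" an expression to its type, reporting a mismatch as an error.
typeOf :: Expr -> Either String Ty
typeOf (LitI _)  = Right TInt
typeOf (LitB _)  = Right TBool
typeOf (Add a b) = do
  ta <- typeOf a
  tb <- typeOf b
  if ta == TInt && tb == TInt
    then Right TInt
    else Left "Add expects Int operands"
typeOf (If c t e) = do
  tc <- typeOf c
  if tc /= TBool
    then Left "If condition must be Bool"
    else do
      tt <- typeOf t
      te <- typeOf e
      if tt == te then Right tt else Left "If branches must have the same type"

main :: IO ()
main = do
  print (typeOf (If (LitB True) (LitI 1) (LitI 2)))  -- Right TInt
  print (typeOf (Add (LitI 1) (LitB False)))         -- Left "Add expects Int operands"
```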

### Evaluation
The [`Evaluatable`](https://github.com/github/semantic/blob/master/src/Data/Abstract/Evaluatable.hs) class defines the necessary interface for a term to be evaluated. While a default definition of `eval` is given, instances with computational content must implement `eval` to perform their small-step operational semantics. Evaluation gives us a way to capture what it means to interpret the syntax data types we create using the [Assignment](https://github.com/github/semantic/blob/master/docs/assignment.md) stage. The evaluation algebra also handles each syntax without caring about any language-specific implementation. We do this by cascading polymorphic functions using the `Evaluatable` type class.
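
The shape of that interface can be sketched as follows. This is a drastically simplified stand-in: the real `Evaluatable` class threads effects, addresses, and open-union syntax types that are omitted here.

``` haskell
-- A simplified stand-in for the Evaluatable idea: every syntax node can be
-- evaluated, and a default is provided for nodes with no computational content.
class Evaluatable syntax where
  eval :: syntax -> Int
  eval _ = 0   -- default: contributes nothing to the computation

newtype Lit     = Lit Int
data    Plus    = Plus Lit Lit
newtype Comment = Comment String

instance Evaluatable Lit  where eval (Lit n)    = n
instance Evaluatable Plus where eval (Plus a b) = eval a + eval b
instance Evaluatable Comment   -- relies on the default eval

main :: IO ()
main = print (eval (Plus (Lit 2) (Lit 3)))   -- 5
```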

We have yet to finish implementing `Evaluatable` instances for the various à la carte syntaxes. Doing so requires knowledge of the type and value evaluation semantics of a particular syntax and familiarity with the functions for interacting with the environment and store.

#### Implementing `Evaluatable` instances
The following is a brief guide to working with the definitional interpreters and implementing instances of `Evaluatable` for the various pieces of syntax. `Semantic.Util` defines a series of language-specific wrapper functions for doing evaluation in ghci.

_Helpers:_
- `parseFile`: parses one file.
- `evaluateLanguageProject`: takes a list of files and evaluates them, usually under concrete semantics.
- `callGraphLanguageProject`: uses the same evaluation mechanism, but under abstract semantics.
- `typeCheckLanguageFile`: allows us to evaluate under type-checking semantics.
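
An illustrative `ghci` session using these helpers might look like the sketch below; substitute a concrete language for `Language` (Ruby here), and treat the module name, helper names, and paths as assumptions that can differ across versions of the codebase.

``` haskell
λ> :load Semantic.Util
λ> evaluateRubyProject ["examples/a.rb", "examples/b.rb"]  -- concrete semantics
λ> callGraphRubyProject ["examples/a.rb"]                  -- abstract semantics
λ> typeCheckRubyFile "examples/a.rb"                       -- type-checking semantics
```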

#### Creating good abstractions
When adding `Evaluatable` instances, we may notice that certain language-specific syntaxes share enough semantics to be consolidated into a language-agnostic data type (and, as a result, share one `Evaluatable` instance). Other times, there is not enough overlap in evaluation semantics, and the syntaxes need to remain decoupled. Reasoning through the right abstractions is a big part of determining how to write these `Evaluatable` instances.

### Effects
To perform these computations, we need effects. An effect is something a piece of code does which isn’t strictly encapsulated in its return value. Beyond taking inputs and returning outputs, programs may read or write state in memory, throw exceptions, fail to terminate, or terminate non-deterministically, and so on; these outcomes are known as _effects_. As an example, consider the JS function:

```
function square(x) {
  return x * x;
}
```

This is _pure_ because it performs no effects, whereas the similar function:

```
function square(x) {
  console.log("squaring x: " + x);
  return x * x;
}
```
computes the same result value but additionally performs an effect (logging). Effects provide convenient access to powerful and efficient capabilities of the machine such as interrupts, stateful memory, the file system, and the monitor.
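
The same distinction is visible in Haskell types: the pure version is an ordinary function, while the logging version's effect shows up in its result type. The sketch below uses `mtl`'s `Writer` purely as an illustration; it is not how `semantic` models effects internally.

``` haskell
import Control.Monad.Writer (Writer, runWriter, tell)

-- Pure: the result is fully described by the return value.
square :: Int -> Int
square x = x * x

-- Effectful: the log it produces is tracked by the Writer type, not hidden.
squareLogged :: Int -> Writer [String] Int
squareLogged x = do
  tell ["squaring x: " ++ show x]
  pure (x * x)

main :: IO ()
main = print (runWriter (squareLogged 5))   -- (25,["squaring x: 5"])
```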

We compute effects non-deterministically.

<!--- WIP: come back and briefly talk about why this is useful for program analysis --->

<!--- WIP: come back and briefly talk about why this is useful for runEvaluator --->

### Potential use-cases

- *Dead code* analysis: reduces potential surface area for security vulnerabilities, and less code to maintain is always a good thing. Most IDEs offer good examples of this.
- *Symbolic* - allows us to do symbolic execution (see https://prepack.io/).
- *Caching* - a way to guarantee that an analysis will terminate, which lets us write a type checker by abstracting variables to their types: instead of a potentially infinite series of integers, values can be represented as `Int` (a finitization of values).
- *Collecting* - allows us to have greater precision in other analyses (more useful internally).
- *Tracing and reachable state* - useful for debugging (though verbose).

@@ -18,62 +18,20 @@ Space leaks can be detected by running `semantic` with a restricted heap size. N

## Building

`stack build --fast` is a nice way to speed up local development (builds without optimizations).

Before building with `cabal`, be sure to run `cabal configure --disable-optimizations --enable-tests`. GHC defaults to `-O1`, which can significantly slow recompiles.

## Testing

`stack build --fast semantic:test` builds and runs the unit tests.

- Find out what all the possible test arguments are with `stack test --help`.
- Find out what all the possible test arguments are with `cabal run semantic:spec -- --help`.
- Focus in on a particular test or set of tests with `-m`/`--match`:

    stack test --test-arguments="-m ruby"

- Use `--skip` to run everything but matching tests:

    stack test --test-arguments="--skip ruby"
    cabal run semantic:spec -- -p ruby

- It can take a while to run them over the whole project. Focus in on a particular module with `--test-arguments`:

    stack test --test-arguments=src/Data/Range.hs

## Difftool

`git` can be configured to open diffs in `semantic` using `git-difftool`:

1. Install semantic to the local bin path: `stack install :semantic`

2. Configure `semantic` as a difftool:

    git config difftool.semantic.cmd 'semantic diff --patch "$LOCAL" "$REMOTE"'

3. Optionally, configure `semantic` as the default difftool:

    git config diff.tool semantic

4. Perform git diffs using semantic by invoking `git-difftool`:

    # if configured as default
    git difftool
    # otherwise
    git difftool -t semantic

5. _Bonus round!_ Optionally, configure `git-difftool` to never prompt:

    git config difftool.prompt false

## Editing

- 1. Install ghc-mod from the semantic directory by running:

`stack install ghc-mod`

- 2. You'll need the `ide-haskell` plugins for atom. You can install through apm:

`apm install haskell-ghc-mod ide-haskell ide-haskell-cabal linter linter-ui-default`
cabal run semantic:spec -- -p Data.Language

# Ctags Support

@@ -99,12 +57,6 @@ Alternatively, you can replace `symbols-view` with `joshvera/tags-view` in your

Then disable the `symbols-view` package.

## Semantic documentation in Dash

You can generate a `semantic` docset and import it into Dash locally. To do so, run `script/haddock` first to ensure Haddock documentation is generated, then run `script/docset`. This should generate `.docset/semantic.docset` in the `semantic` repo. The last step is to import the `semantic.docset` into Dash: open Dash, open Preferences, select the 'Docsets' tab, click the `+` icon to add a new docset, and direct the file browser to `semantic/.docsets/semantic.docset`.

## Working with grammar datatypes

`haskell-tree-sitter` includes some TemplateHaskell machinery to generate a datatype from a tree-sitter parser’s symbol table. You can generally guess the constructors of that type by turning the snake_case production names from the tree-sitter grammar into UpperCamelCase names, but you can also have the compiler dump the datatype out in full in the repl:
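
One way to have the compiler show the generated datatype is to dump Template Haskell splices, or to query it with `:info` once the relevant module is loaded. In the sketch below the module name is hypothetical and depends on which language package you load; treat it as an illustration rather than exact commands.

``` haskell
λ> :set -ddump-splices        -- dump Template Haskell splices as modules compile
λ> :load TreeSitter.Python    -- hypothetical module that runs the grammar-generating splice
λ> :info Grammar              -- show the generated constructors in full
```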