mirror of
https://github.com/github/semantic.git
Add mini summit problem notes (commit 25f8b922ab, parent f1293a9887)
weekly/2016-06-21.md
# Semantic Diff Problems (Mini-Summit)
### Performance (most significant problem)
- SES and alignment are the biggest consumers of time and space
- Profiling small subsets of code paths rather than the full context.
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries)
#### Alignment Performance
- Alignment has to visit each child of each remaining line
#### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
- O(n^3) in the size of the tree
- Can try a bounded SES (one that looks ahead only a fixed number of nodes)
- Identify more comparisons we can skip (e.g. don't compare functions with array literals)
- There do not seem to be more easy wins here (the algorithm is already implemented to avoid unnecessary comparisons).
- In some cases diffing is expensive because we don't have finer-grained identifiers for certain diffs (e.g. a test file with 100 statement expressions).
- Diffing against identifiers (use the edit distance between identifiers to decide whether to compare two terms with SES at all)
  - This could cause us to miss a function rename
  - Not a catch-all, but it can improve performance in many cases
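
The SES bullets above can be made concrete with a minimal Python sketch. It is not the project's implementation. `ses` is the greedy O(ND) algorithm from the linked Myers paper, run on flat sequences rather than trees, and it returns an edit script of `=`/`-`/`+` steps. The optional `max_d` cap returns `None` once more than `max_d` insertions and deletions would be needed. That cap is our assumption of one way to bound the search; the fixed-lookahead bound mentioned above may work differently. `levenshtein` and `worth_recursing` (with its hypothetical `max_ratio` threshold) illustrate the identifier edit-distance gate.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Edit = Tuple[str, Hashable]  # ("=", kept item), ("-", item deleted from a), ("+", item inserted from b)


def ses(a: Sequence, b: Sequence, max_d: Optional[int] = None) -> Optional[List[Edit]]:
    """Myers' greedy O(ND) shortest edit script between sequences a and b.

    Assumed bound: if max_d is given, return None once the script would need
    more than max_d insertions/deletions.
    """
    n, m = len(a), len(b)
    limit = n + m if max_d is None else min(max_d, n + m)
    v = {1: 0}   # v[k] = furthest x reached on diagonal k = x - y
    trace = []   # snapshot of v at the start of each round d, used for backtracking
    for d in range(limit + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # step down: an insertion from b
            else:
                x = v[k - 1] + 1      # step right: a deletion from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:  # follow the diagonal "snake" of matches
                x, y = x + 1, y + 1
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b)
    return None


def _backtrack(trace, a, b) -> List[Edit]:
    """Walk the saved rounds backwards from (len(a), len(b)) to rebuild the script."""
    x, y = len(a), len(b)
    edits: List[Edit] = []
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:  # undo the snake: these items matched
            edits.append(("=", a[x - 1]))
            x, y = x - 1, y - 1
        if d > 0:
            if x == prev_x:
                edits.append(("+", b[y - 1]))
            else:
                edits.append(("-", a[x - 1]))
        x, y = prev_x, prev_y
    edits.reverse()
    return edits


def levenshtein(s: str, t: str) -> int:
    """Dynamic-programming edit distance between two identifiers."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


def worth_recursing(name_a: str, name_b: str, max_ratio: float = 0.5) -> bool:
    """Hypothetical gate: only run the full recursive SES on terms whose identifiers are close."""
    longest = max(len(name_a), len(name_b), 1)
    return levenshtein(name_a, name_b) / longest <= max_ratio
```

With this gate, a rename such as `foo` to `bar` would be treated as a replacement instead of being diffed recursively. That is the missed-rename trade-off noted above.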
#### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
- Random Walk Similarity
- Computes an approximation of the minimal edit script
- O(n log n) rather than O(n^3)
- RWS does not rely on identifiers
- RWS solves our performance problem in the general case
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES)
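
The following heavily simplified sketch illustrates the intuition behind RWS. Each subtree is summarized as a vector built from random walks, and subtrees are then paired by vector distance instead of through a cubic comparison. It does not reproduce the algorithm from the linked paper. The walk scheme, the `walks`, `max_len`, `dims` and `seed` parameters, the crc32 bucketing, and the brute-force L1 nearest-neighbour `match` are all assumptions chosen for illustration.

```python
from __future__ import annotations

import random
import zlib
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str
    children: List[Node] = field(default_factory=list)


def walk_features(node: Node, walks: int = 32, max_len: int = 3,
                  dims: int = 64, seed: int = 0) -> List[int]:
    """Hypothetical subtree fingerprint: hash the label paths of `walks` random
    downward walks (at most `max_len` labels each) into `dims` buckets."""
    rng = random.Random(seed)  # same seed for every subtree: identical subtrees get identical vectors
    vec = [0] * dims
    for _ in range(walks):
        cur, path = node, [node.label]
        while cur.children and len(path) < max_len:
            cur = rng.choice(cur.children)
            path.append(cur.label)
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        vec[zlib.crc32("/".join(path).encode()) % dims] += 1
    return vec


def nodes(tree: Node) -> Iterator[Node]:
    """Pre-order traversal of all nodes in a tree."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def l1(u: List[int], v: List[int]) -> int:
    return sum(abs(p - q) for p, q in zip(u, v))


def match(a: Node, b: Node) -> List[Tuple[Node, Node, int]]:
    """Pair each node of `a` with its nearest node of `b` by L1 distance.
    Brute force O(|a| * |b|); a spatial index would avoid the quadratic scan."""
    b_feats = [(n, walk_features(n)) for n in nodes(b)]
    result = []
    for x in nodes(a):
        fx = walk_features(x)
        best, dist = min(((y, l1(fx, fy)) for y, fy in b_feats), key=lambda p: p[1])
        result.append((x, best, dist))
    return result
```

Because a vector depends only on the subtree itself, a subtree that moved (for example a `function` node swapped with an `array` sibling) still pairs with its counterpart at distance 0, and no identifiers are involved.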
#### Diff Summaries Performance
- Diff Summaries performance depends on diffing (Diff Terms, the Interpreter, cost functions)
### Failing too hard when we fail (request is not completing if Semantic Diff fails)
- How can we fail better on dotcom?
- How can we fail better when parsing? (both in Semantic Diff and dotcom)
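
One possible answer to the first question is sketched here under assumptions, not as a decided design. The idea is to run the semantic diff with a deadline and fall back to a plain line diff when it raises or times out, so the request still completes. `semantic_diff` stands for a hypothetical async call into Semantic Diff, the `timeout` default is arbitrary, and Python's `difflib` stands in for whatever line diff is already available.

```python
import asyncio
import difflib
from typing import Awaitable, Callable, Tuple

SemanticDiff = Callable[[str, str], Awaitable[str]]  # hypothetical async diff service call


async def render_diff(before: str, after: str, semantic_diff: SemanticDiff,
                      timeout: float = 1.0) -> Tuple[str, str]:
    """Return ("semantic", diff) when the semantic diff succeeds in time,
    otherwise ("line", unified_diff) so the request never fails outright."""
    try:
        return "semantic", await asyncio.wait_for(semantic_diff(before, after), timeout)
    except Exception:  # parse failures, crashes, and asyncio.TimeoutError all land here
        lines = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
        return "line", "\n".join(lines)
```

Tagging the result with its kind lets the caller show users that they are seeing a degraded diff.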
### Responsiveness
- Fetch diff summaries and diffs asynchronously, or load diffs and diff summaries progressively
### Improving grammars (getting Ruby parser fixed, testing C parser)
### Measuring effectiveness of grammars
### Tooling
- Why isn't parallelization of SES having the expected effect?
- Should focus on low-hanging fruit, but we're not going to write a debugger.
### Time limitations with respect to solutions and team
### Ramp-up time is extremely variable
### Onboarding
- The SES algorithm requires some context and background to understand the code at a macro level.
- Plan a bit before pairing to gain context
### Pre-launch Ideas
- Test on a couple of file server nodes and run semantic diff on JavaScript repos.
- Collect the repos, files, and SHAs that contain error nodes, to measure error rates as a percentage and expose errors in the tree-sitter grammars.
- If a source produces error nodes, can we use a separate parser to validate whether the source is actually correct?
- Set up a script, as language-independent as possible, that automates the error collection process but lets us specify an independent validating parser for each language.
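
The error-collection script described above might be structured like the following minimal sketch. It assumes a hypothetical `has_error_nodes(path, source)` hook that reports whether the tree-sitter parse contained error nodes. The per-extension `VALIDATORS` registry is the only language-specific part. It is shown here with Python's `ast` and `json` modules as stand-in validating parsers; real entries for JavaScript, Ruby, and so on would be assumptions to fill in.

```python
import ast
import json
import os
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def _valid_python(src: str) -> bool:
    try:
        ast.parse(src)
        return True
    except (SyntaxError, ValueError):  # ValueError covers e.g. null bytes on older Pythons
        return False


def _valid_json(src: str) -> bool:
    try:
        json.loads(src)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


# Hypothetical registry: one independent validating parser per file extension.
VALIDATORS: Dict[str, Callable[[str], bool]] = {".py": _valid_python, ".json": _valid_json}

FileRecord = Tuple[str, str, str, str]  # (repo, sha, path, source)


def collect_errors(files: Iterable[FileRecord],
                   has_error_nodes: Callable[[str, str], bool]):
    """Return (error_rate, counts, grammar_suspects) for a batch of files."""
    counts: Counter = Counter()
    suspects: List[Tuple[str, str, str]] = []
    for repo, sha, path, src in files:
        counts["files"] += 1
        if not has_error_nodes(path, src):
            continue
        counts["error_files"] += 1
        validate = VALIDATORS.get(os.path.splitext(path)[1])
        if validate is None:
            counts["unvalidated"] += 1      # no validating parser for this language yet
        elif validate(src):
            counts["grammar_suspect"] += 1  # valid source, yet error nodes: likely a grammar bug
            suspects.append((repo, sha, path))
        else:
            counts["invalid_source"] += 1   # the source itself is broken
    rate = counts["error_files"] / counts["files"] if counts["files"] else 0.0
    return rate, counts, suspects
```

Files that validate but still produce error nodes are the interesting output, because they point at grammar bugs rather than broken source.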