1
1
mirror of https://github.com/github/semantic.git synced 2024-12-24 15:35:14 +03:00

Merge remote-tracking branch 'origin/master' into syntax-redux

This commit is contained in:
joshvera 2016-06-27 14:02:41 -04:00
commit d209abda02
2 changed files with 110 additions and 0 deletions

74
weekly/2016-06-21.md Normal file
View File

@ -0,0 +1,74 @@
# Semantic Diff Problems (Mini-Summit)
### Performance (most significant problem)
- SES / Alignment are biggest time / space consumers.
- Profiling small subsets of code paths rather than the full context.
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).
##### Alignment performance
- Has to visit each child of each remaining line.
##### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
- n^3 the size of the tree.
- Can try bounded SES (looks ahead by a fixed size of nodes).
- Identify more comparisons we can skip (i.e. don't compare functions with array literals).
- Does not look like there are more easy wins here (algorithm is already implemented to prevent unnecessary comparisions).
- In some cases, the diffing is expensive because we don't have more fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions).
- Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not).
- This could result in us missing a function rename though.
- Not a catchall, but it can help increase performance in a larger number of cases.
##### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
- Random Walk Similarity.
- computes approximation to the minimal edit script.
- O(log N) rather than O(n^3).
- RWS does not rely on identifiers.
- RWS solves our performance problem in the general form.
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES).
##### Diff summaries performance
- Performance of DS is dependent on diffing (Diff Terms, Interpreter, cost functions)
### Failing too hard
- Request is not completing if Semantic Diff fails.
- How can we fail better on dotcom?
- How can we fail better when parsing? (both in Semantic Diff and dotcom)
### Responsiveness
- Async fetch diff summaries / diffs / progressive diffs or diff summaries
### Improving grammars
- Fix Ruby parser.
- Testing and verifying other grammars.
### Measure effectiveness of grammars
### Tooling
- Why isn't parallelization of SES having the expected effect?
- Should focus on low hanging fruit but we're not going to write a debugger.
### Time limitations with respect to solutions and team
### Ramp up time is extremely variable.
### Onboarding
- Pairing has been fantastic.
- SES algorithm requires some context and background to understand the code at the general / macro level.
- Plan a bit before pairing to gain context.
### Pre-launch Ideas
- Test on a couple file server nodes and run semantic diff on javascript repos.
- Collect repos, files, shas that contain error nodes to gain a % of error rates and expose errors in tree sitter grammars.
- If sources have errors, can we use a parser that validates the source is correct?
- Configure a script that is as language independent as possible that can automate the error collection process but allows us to specify an independent validating parser for each language.

36
weekly/2016-06-27.md Normal file
View File

@ -0,0 +1,36 @@
# June 27th, 2016 weekly
## What went well?
@joshvera: Pairing, minisummitting, RWS discussions.
@rewinfrey: Pairing, context on recursion schemes, started independent work on the project, minisummitting. Defined what to work on next
@robrix: Minisummit: got to know both of you better & really enjoyed that. Before that I was on vacation but you both did a great job!
## What went less well?
@joshvera: Lots more problems turned up. Lots of stuff that has taken on more importance as weve thought about it more. Feel like I couldve made more progress on diff summaries by now. Some of that has been minisummit, some of that has been every time we do more work on it there seems to be new layers peeling off exposing other issues & more work needing to be done.
@rewinfrey: Maybe Im overly optimistic but I dont have anything to point to that I didnt think went well. The challenges we identified during minisummit felt like a good sign of the project moving forward.
@robrix: Ditto. May end up being a bit distracted over the next couple of weeks figuring out some stuff re: summit & my attendance of it.
## What did you learn?
@joshvera: Type ∋ Type isnt as easy as it sounds. Learned about you both too!
@rewinfrey: Learned about you both. Recursion schemes! Relationships between algebras & projections, coalgebras & embeddings, and recursion-schemes `Base` type family. Further explored some other morphisms. RWS (albeit misreading some of it).
@robrix: Learned about you both! RWS. Some stuff about derivative-parsing. Learned a lot about communication too.
## Anything else?
@joshvera: Out Thursday/Friday.
@rewinfrey: Josh, how did the blue suit fit? (“Really well.”)
@robrix: Canada Day on Friday. Youre both invited to celebrate it as well, by being as Canadian as possible.