From 25f8b922aba725fdad08fa93051b8540aa7338b0 Mon Sep 17 00:00:00 2001
From: Rick Winfrey
Date: Tue, 21 Jun 2016 12:26:48 -0400
Subject: [PATCH 1/3] Add mini summit problem notes

---
 weekly/2016-06-21.md | 70 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 weekly/2016-06-21.md

diff --git a/weekly/2016-06-21.md b/weekly/2016-06-21.md
new file mode 100644
index 000000000..6b2ed2c08
--- /dev/null
+++ b/weekly/2016-06-21.md
@@ -0,0 +1,70 @@
+# Semantic Diff Problems (Mini-Summit)
+
+### Performance (most significant problem)
+
+ - SES / Alignment are biggest time / space consumers
+ - Profiling small subsets of code paths rather than the full context.
+ - Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries)
+
+#### Alignment Performance
+
+ - Has to visit each child of each remaining line
+
+#### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
+
+ - n^3 the size of the tree
+ - Can try bounded SES (looks ahead by a fixed size of nodes)
+ - Identify more comparisons we can skip (e.g. don't compare functions with array literals)
+ - Does not look like there are more easy wins here (algorithm is already implemented to prevent unnecessary comparisons).
+ - In some cases, the diffing is expensive because we don't have more
+   fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions)
+ - Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not)
+ - This could result in us missing a function rename though
+ - Not a catch-all, but it can help increase performance in a larger number of cases
+
+#### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
+
+ - Random Walk Similarity
+ - computes approximation to the minimal edit script
+ - O(n log n) rather than O(n^3)
+ - RWS does not rely on identifiers
+ - RWS solves our performance problem in the general form
+ - Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES)
+
+#### Diff Summaries Performance
+
+ - Performance of DS is dependent on diffing (Diff Terms, Interpreter, cost functions)
+
+### Failing too hard when we fail (request is not completing if Semantic Diff fails)
+
+ - How can we fail better on dotcom?
+ - How can we fail better when parsing? (both in Semantic Diff and dotcom)
+
+### Responsiveness
+
+ - Async fetch diff summaries / diffs / progressive diffs or diff summaries
+
+### Improving grammars (getting Ruby parser fixed, testing C parser)
+
+### Measuring effectiveness of grammars
+
+### Tooling
+
+ - Why isn't parallelization of SES having the expected effect?
+ - Should focus on low-hanging fruit but we're not going to write a debugger.
+
+### Time limitations with respect to solutions and team
+
+### Ramp-up time is extremely variable.
+
+### Onboarding
+
+ - SES algorithm requires some context and background to understand the code at a macro.
+ - Plan a bit before pairing to gain context
+
+### Pre-launch Ideas
+
+ - Test on a couple file server nodes and run semantic diff on JavaScript repos.
+ - Collect repos, files, shas that contain error nodes to gain a % of error rates and expose errors in tree-sitter grammars.
+ - If sources have errors, can we use a parser that validates the source is correct?
+ - Configure a script that is as language-independent as possible that can automate the error collection process but allows us to specify an independent validating parser for each language.

From 5c25ae4719e479570cd62471cd24d556a0699de2 Mon Sep 17 00:00:00 2001
From: Rick Winfrey
Date: Tue, 21 Jun 2016 12:34:19 -0400
Subject: [PATCH 2/3] Small changes / typos

---
 weekly/2016-06-21.md | 56 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/weekly/2016-06-21.md b/weekly/2016-06-21.md
index 6b2ed2c08..14242a5fa 100644
--- a/weekly/2016-06-21.md
+++ b/weekly/2016-06-21.md
@@ -2,41 +2,41 @@
 
 ### Performance (most significant problem)
 
- - SES / Alignment are biggest time / space consumers
+ - SES / Alignment are biggest time / space consumers.
  - Profiling small subsets of code paths rather than the full context.
- - Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries)
+ - Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).
 
-#### Alignment Performance
+##### Alignment performance
 
- - Has to visit each child of each remaining line
+ - Has to visit each child of each remaining line.
 
-#### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
+##### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
 
- - n^3 the size of the tree
- - Can try bounded SES (looks ahead by a fixed size of nodes)
- - Identify more comparisons we can skip (e.g. don't compare functions with array literals)
+ - n^3 the size of the tree.
+ - Can try bounded SES (looks ahead by a fixed size of nodes).
+ - Identify more comparisons we can skip (e.g. don't compare functions with array literals).
  - Does not look like there are more easy wins here (algorithm is already implemented to prevent unnecessary comparisons).
- - In some cases, the diffing is expensive because we don't have more
-   fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions)
- - Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not)
- - This could result in us missing a function rename though
- - Not a catch-all, but it can help increase performance in a larger number of cases
+ - In some cases, the diffing is expensive because we don't have more fine-grain identifiers for certain diffs. (e.g. a test file with 100 statement expressions).
+ - Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not).
+ - This could result in us missing a function rename though.
+ - Not a catch-all, but it can help increase performance in a larger number of cases.
 
-#### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
+##### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
 
- - Random Walk Similarity
- - computes approximation to the minimal edit script
- - O(n log n) rather than O(n^3)
- - RWS does not rely on identifiers
- - RWS solves our performance problem in the general form
- - Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES)
+ - Random Walk Similarity.
+ - computes approximation to the minimal edit script.
+ - O(n log n) rather than O(n^3).
+ - RWS does not rely on identifiers.
+ - RWS solves our performance problem in the general form.
+ - Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES).
 
-#### Diff Summaries Performance
+##### Diff summaries performance
 
  - Performance of DS is dependent on diffing (Diff Terms, Interpreter, cost functions)
 
-### Failing too hard when we fail (request is not completing if Semantic Diff fails)
+### Failing too hard
 
+ - Request is not completing if Semantic Diff fails.
  - How can we fail better on dotcom?
  - How can we fail better when parsing? (both in Semantic Diff and dotcom)
 
@@ -44,9 +44,12 @@
 
  - Async fetch diff summaries / diffs / progressive diffs or diff summaries
 
-### Improving grammars (getting Ruby parser fixed, testing C parser)
+### Improving grammars
 
-### Measuring effectiveness of grammars
+ - Fix Ruby parser.
+ - Testing and verifying other grammars.
+
+### Measure effectiveness of grammars
 
 ### Tooling
 
@@ -59,8 +62,9 @@
 
 ### Onboarding
 
- - SES algorithm requires some context and background to understand the code at a macro.
- - Plan a bit before pairing to gain context
+ - Pairing has been fantastic.
+ - SES algorithm requires some context and background to understand the code at the general / macro level.
+ - Plan a bit before pairing to gain context.
 
 ### Pre-launch Ideas
 

From bb78a5d003b02d92b9620a6c572878025b596ff6 Mon Sep 17 00:00:00 2001
From: Rob Rix
Date: Mon, 27 Jun 2016 11:19:22 -0400
Subject: [PATCH 3/3] June 27th, 2016 weekly

---
 weekly/2016-06-27.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 weekly/2016-06-27.md

diff --git a/weekly/2016-06-27.md b/weekly/2016-06-27.md
new file mode 100644
index 000000000..de844a09e
--- /dev/null
+++ b/weekly/2016-06-27.md
@@ -0,0 +1,36 @@
+# June 27th, 2016 weekly
+
+## What went well?
+
+@joshvera: Pairing, minisummitting, RWS discussions.
+
+@rewinfrey: Pairing, context on recursion schemes, started independent work on the project, minisummitting. Defined what to work on next
+
+@robrix: Minisummit: got to know both of you better & really enjoyed that. Before that I was on vacation but you both did a great job!
+
+
+## What went less well?
+
+@joshvera: Lots more problems turned up. Lots of stuff that has taken on more importance as we’ve thought about it more. Feel like I could’ve made more progress on diff summaries by now. Some of that has been minisummit, some of that has been every time we do more work on it there seems to be new layers peeling off exposing other issues & more work needing to be done.
+
+@rewinfrey: Maybe I’m overly optimistic but I don’t have anything to point to that I didn’t think went well. The challenges we identified during minisummit felt like a good sign of the project moving forward.
+
+@robrix: Ditto. May end up being a bit distracted over the next couple of weeks figuring out some stuff re: summit & my attendance of it.
+
+
+## What did you learn?
+
+@joshvera: Type ∋ Type isn’t as easy as it sounds. Learned about you both too!
+
+@rewinfrey: Learned about you both. Recursion schemes! Relationships between algebras & projections, coalgebras & embeddings, and recursion-scheme’s `Base` type family. Further explored some other morphisms. RWS (albeit misreading some of it).
+
+@robrix: Learned about you both! RWS. Some stuff about derivative-parsing. Learned a lot about communication too.
+
+
+## Anything else?
+
+@joshvera: Out Thursday/Friday.
+
+@rewinfrey: Josh, how did the blue suit fit? (“Really well.”)
+
+@robrix: Canada Day on Friday. You’re both invited to celebrate it as well, by being as Canadian as possible.