Add mini summit problem notes

This commit is contained in:
Rick Winfrey 2016-06-21 12:26:48 -04:00
parent f1293a9887
commit 25f8b922ab

weekly/2016-06-21.md (new file, 70 lines added)

@@ -0,0 +1,70 @@
# Semantic Diff Problems (Mini-Summit)
### Performance (most significant problem)
- SES / alignment are the biggest time / space consumers
- Profiling small subsets of code paths rather than the full context.
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries)
#### Alignment Performance
- Has to visit each child of each remaining line
#### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
- O(n^3) in the size of the tree
- Can try bounded SES (looks ahead by a fixed number of nodes)
- Identify more comparisons we can skip (i.e. don't compare functions with array literals)
- Does not look like there are more easy wins here (the algorithm is already implemented to prevent unnecessary comparisons).
- In some cases, the diffing is expensive because we don't have fine-grained identifiers for certain diffs (e.g. a test file with 100 statement expressions).
- Diffing against identifiers (use the edit distance between identifiers to decide whether to compare terms with SES at all; an illustrative gate is sketched after this list)
- This could result in us missing a function rename though
- Not a catch-all, but it can help increase performance in a large number of cases
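
A minimal sketch of the identifier gate mentioned above, under assumptions of my own: `identifierOf` is a hypothetical stand-in for however terms expose their identifiers, and the Levenshtein implementation is naive but adequate for short names. This is illustrative, not the project's actual code.

```haskell
module IdentifierGate where

-- Naive memoized Levenshtein distance; fine for short identifier strings.
editDistance :: String -> String -> Int
editDistance a b = dist (length a) (length b)
  where
    dist i 0 = i
    dist 0 j = j
    dist i j
      | a !! (i - 1) == b !! (j - 1) = memo (i - 1) (j - 1)
      | otherwise = 1 + minimum [ memo (i - 1) j        -- deletion
                                , memo i (j - 1)        -- insertion
                                , memo (i - 1) (j - 1)  -- substitution
                                ]
    memo i j = table !! i !! j
    table = [ [ dist i j | j <- [0 .. length b] ] | i <- [0 .. length a] ]

-- Only fall through to the expensive SES comparison when the identifiers
-- are already close. `identifierOf` is a hypothetical accessor.
shouldRunSES :: (term -> String) -> Int -> term -> term -> Bool
shouldRunSES identifierOf threshold a b =
  editDistance (identifierOf a) (identifierOf b) <= threshold
```

As the notes point out, this trades recall for speed: a renamed function whose identifier moves past the threshold would be skipped.
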
#### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
- Random Walk Similarity
- Computes an approximation to the minimal edit script (a toy sketch of the idea follows this list)
- O(log N) rather than O(n^3)
- RWS does not rely on identifiers
- RWS solves our performance problem in the general form
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES)
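
A toy illustration of the feature-vector idea behind RWS-Diff, not the algorithm from the paper: it buckets node labels into a fixed-width vector, sums over each subtree, and matches terms by nearest neighbour on those vectors instead of comparing every pair with SES. The `Tree` type and the hashing scheme are stand-ins of my own.

```haskell
module FeatureSketch where

import Data.List (minimumBy)
import Data.Ord (comparing)

data Tree = Node String [Tree]

dimensions :: Int
dimensions = 16

-- Bag-of-labels feature vector for a subtree: each label lands in one of
-- `dimensions` buckets, and children are summed into the parent's vector.
featureVector :: Tree -> [Double]
featureVector (Node label children) =
  foldr (zipWith (+)) (bucket label) (map featureVector children)
  where
    bucket l = [ if i == hash l `mod` dimensions then 1 else 0
               | i <- [0 .. dimensions - 1] ]
    hash = sum . map fromEnum

euclidean :: [Double] -> [Double] -> Double
euclidean xs ys = sqrt (sum (zipWith (\x y -> (x - y) ^ 2) xs ys))

-- Match a term against the closest candidate on the other side by vector
-- distance; identifiers never enter into it.
nearest :: Tree -> [Tree] -> Tree
nearest t = minimumBy (comparing (euclidean (featureVector t) . featureVector))
```

The real RWS-Diff builds its vectors from random walks and uses an approximate nearest-neighbour index, which is where the better asymptotics come from.
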
#### Diff Summaries Performance
- Performance of Diff Summaries is dependent on diffing (Diff Terms, Interpreter, cost functions)
### Failing too hard when we fail (the request does not complete if Semantic Diff fails)
- How can we fail better on dotcom?
- How can we fail better when parsing? (both in Semantic Diff and dotcom; one way to degrade gracefully is sketched below)
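
One way to degrade gracefully, sketched under assumptions of my own (this is not the project's actual API): bound the diff with a timeout, force the result so lazily deferred errors are caught, and hand back `Nothing` so the caller can fall back to a plain textual diff instead of failing the request.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module SafeDiff where

import Control.DeepSeq (NFData, force)
import Control.Exception (SomeException, evaluate, try)
import System.Timeout (timeout)

-- Run a diff computation with a time budget, forcing the result to normal
-- form so exceptions surface here rather than later in the request cycle.
-- On timeout or exception the caller gets Nothing and can fall back.
safeDiff :: forall a. NFData a => Int -> IO a -> IO (Maybe a)
safeDiff micros action = do
  result <- timeout micros (try (evaluate . force =<< action)
                              :: IO (Either SomeException a))
  pure $ case result of
    Just (Right a) -> Just a
    _              -> Nothing
```
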
### Responsiveness
- Async fetching of diff summaries / diffs; progressive diffs or diff summaries
### Improving grammars (getting Ruby parser fixed, testing C parser)
### Measuring effectiveness of grammars
### Tooling
- Why isn't parallelization of SES having the expected effect? (one common culprit is sketched below)
- Should focus on low-hanging fruit, but we're not going to write a debugger.
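
On the parallelization question, one common culprit worth ruling out (a guess, not a diagnosis): sparked results left as thunks get forced later, sequentially, so the parallelism never materializes. Forcing each result to normal form on the spark avoids that:

```haskell
module ParCheck where

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Map a comparison over candidates in parallel, forcing every result to
-- normal form so the spark does the real work instead of leaving a thunk.
parCompare :: NFData b => (a -> b) -> [a] -> [b]
parCompare = parMap rdeepseq
```

Running with `+RTS -N -s` reports how many sparks were converted versus fizzled or GC'd, which is a cheap way to see whether the parallelism is taking effect at all.
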
### Time limitations with respect to solutions and team
### Ramp-up time is extremely variable
### Onboarding
- The SES algorithm requires some context and background to understand the code at a macro level.
- Plan a bit before pairing to gain context
### Pre-launch Ideas
- Test on a couple of file server nodes and run Semantic Diff on JavaScript repos.
- Collect repos, files, and SHAs that contain error nodes to get an error-rate percentage and expose errors in tree-sitter grammars (a minimal counting sketch follows this list).
- If sources have errors, can we use a parser that validates that the source is correct?
- Configure a script that is as language-independent as possible, automating the error collection process while allowing us to specify an independent validating parser for each language.
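
A minimal counting sketch for the error-collection idea, under the assumption that parse results have already been reduced to a labelled tree; `Tree` stands in for whatever the tree-sitter binding actually produces, and `"ERROR"` follows tree-sitter's convention for error nodes.

```haskell
module ErrorRate where

data Tree = Node String [Tree]

-- One parsed file; the driving script would fill in repo, path, and SHA.
data FileReport = FileReport
  { repo   :: String
  , path   :: FilePath
  , sha    :: String
  , errors :: Int
  }

-- Count tree-sitter error nodes in a parse tree.
errorNodes :: Tree -> Int
errorNodes (Node label children) =
  (if label == "ERROR" then 1 else 0) + sum (map errorNodes children)

-- Fraction of files whose parse tree contains at least one error node.
errorRate :: [FileReport] -> Double
errorRate reports
  | null reports = 0
  | otherwise    = fromIntegral bad / fromIntegral (length reports)
  where bad = length (filter ((> 0) . errors) reports)
```

The validating-parser step would slot in per language on top of this: files that tree-sitter flags but a reference parser accepts point at grammar bugs rather than bad source.
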