# Semantic Diff Problems (Mini-Summit)

### Performance (most significant problem)
- SES / Alignment are the biggest time / space consumers.
- Profiling small subsets of code paths rather than the full context.
- Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).

##### Alignment performance
- Has to visit each child of each remaining line.

##### [SES](https://github.com/github/semantic-diff/files/22485/An.O.ND.Difference.Algorithm.and.its.Variations.pdf) Performance
- O(n^3) in the size of the tree.
- Can try bounded SES (looks ahead by a fixed number of nodes).
- Identify more comparisons we can skip (e.g. don't compare functions with array literals).
- Does not look like there are more easy wins here (the algorithm is already implemented to prevent unnecessary comparisons).
- In some cases, diffing is expensive because we don't have fine-grained identifiers for certain diffs (e.g. a test file with 100 statement expressions).
- Diffing against identifiers (use the edit distance to determine whether to compare terms with SES or not); a rough sketch appears at the end of these notes.
  - This could result in us missing a function rename, though.
  - Not a catch-all, but it can help increase performance in a larger number of cases.

##### [RWS](https://github.com/github/semantic-diff/files/325837/RWS-Diff.Flexible.and.Efficient.Change.Detection.in.Hierarchical.Data.pdf) Performance
- Random Walk Similarity.
- Computes an approximation to the minimal edit script.
- O(log N) rather than O(n^3).
- RWS does not rely on identifiers.
- RWS solves our performance problem in the general case.
- Can allow us to diff patches of patches (something we cannot do currently with our implementation of SES).

##### Diff summaries performance
- Performance of diff summaries is dependent on diffing (Diff Terms, Interpreter, cost functions).

### Failing too hard
- The request does not complete if Semantic Diff fails.
- How can we fail better on dotcom?
- How can we fail better when parsing? (Both in Semantic Diff and dotcom.)

### Responsiveness
- Async fetch of diff summaries / diffs; progressive diffs or diff summaries.

### Improving grammars
- Fix the Ruby parser.
- Testing and verifying other grammars.

### Measure effectiveness of grammars

### Tooling
- Why isn't parallelization of SES having the expected effect?
- Should focus on low-hanging fruit, but we're not going to write a debugger.

### Time limitations with respect to solutions and team

### Ramp-up time is extremely variable

### Onboarding
- Pairing has been fantastic.
- The SES algorithm requires some context and background to understand the code at the general / macro level.
- Plan a bit before pairing to gain context.

### Pre-launch Ideas
- Test on a couple of file server nodes and run Semantic Diff on JavaScript repos.
- Collect repos, files, and SHAs that contain error nodes to estimate error rates and expose errors in tree-sitter grammars.
- If sources have errors, can we use a parser that validates that the source is correct?
- Configure a script that is as language-independent as possible to automate the error-collection process, while allowing us to specify an independent validating parser for each language (a sketch follows below).
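As a starting point for the error-collection idea above, here is a minimal sketch of what the language-independent script could look like. It only covers the "independent validating parser" half of the idea; the `VALIDATORS` table, the choice of `ruby -c` and `node --check` as validators, and the `collect` entry point are illustrative assumptions, and the tree-sitter error-node counting step is left as a placeholder comment.

```python
#!/usr/bin/env python3
"""Sketch: walk a checked-out repo and record which source files an
independent parser accepts. Files that pass validation but still produce
tree-sitter ERROR nodes would point at grammar bugs rather than broken source.
The validator table and CLI shape below are assumptions, not an existing tool."""

import subprocess
import sys
from pathlib import Path

# Hypothetical mapping from file extension to an independent validating parser.
# `ruby -c` and `node --check` are real syntax-check invocations; the point of
# keeping this table configurable is that each language can plug in its own.
VALIDATORS = {
    ".rb": ["ruby", "-c"],
    ".js": ["node", "--check"],
}

def validate(path: Path) -> bool:
    """Return True if the independent parser accepts the source file."""
    cmd = VALIDATORS.get(path.suffix)
    if cmd is None:
        return True  # no validator configured for this language
    result = subprocess.run(cmd + [str(path)],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

def collect(repo_root: str) -> None:
    """Walk the repo and report how many files pass independent validation."""
    total = valid = 0
    for path in Path(repo_root).rglob("*"):
        if not path.is_file() or path.suffix not in VALIDATORS:
            continue
        total += 1
        if validate(path):
            valid += 1
            # Placeholder: parse this file with the tree-sitter grammar and
            # record any ERROR nodes; a valid file with ERROR nodes suggests
            # a grammar error worth reporting.
        else:
            print(f"skipping (source itself is invalid): {path}")
    print(f"{valid}/{total} files are valid per the independent parser")

if __name__ == "__main__":
    collect(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Keeping the validator table separate from the walking logic is what keeps the script language-independent: supporting another language is just a matter of registering another validator command.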
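For the identifier-based heuristic raised under SES performance ("use the edit distance to determine whether to compare terms with SES or not"), the following is a minimal sketch of one plausible shape. The `should_compare` name, the string identifiers, and the `max_distance` threshold are illustrative assumptions, not the actual semantic-diff implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two identifiers."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def should_compare(identifier_a: str, identifier_b: str, max_distance: int = 3) -> bool:
    """Only hand two terms to the expensive SES comparison when their
    identifiers are close. Cheap, but as noted above it can miss a function
    rename whose new name is further than `max_distance` from the old one."""
    return edit_distance(identifier_a, identifier_b) <= max_distance

# e.g. should_compare("parseFile", "parseFiles") -> True  (compare with SES)
#      should_compare("parseFile", "renderHTML") -> False (skip the comparison)
```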