Semantic Diff Problems (Mini-Summit)

Performance (most significant problem)

  • SES and alignment are the biggest time and space consumers.
  • Profiling small subsets of code paths rather than the full context.
  • Adding more criterion benchmarks for code paths not currently profiled (like Diff Summaries).
Alignment performance
  • Has to visit each child of each remaining line.
SES performance
  • O(n^3) in the size of the tree.
  • Can try bounded SES (looks ahead by only a fixed number of nodes).
  • Identify more comparisons we can skip (e.g. don't compare functions with array literals).
  • Does not look like there are more easy wins here (the algorithm already avoids unnecessary comparisons).
  • In some cases diffing is expensive because we lack fine-grained identifiers for certain terms (e.g. a test file with 100 statement expressions).
  • Diff against identifiers: use the edit distance between identifiers to decide whether to compare two terms with SES at all (see the sketch below).
  • This could cause us to miss a function rename, though.
  • Not a catch-all, but it can improve performance in a large number of cases.
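
A minimal sketch of that identifier gate, in Haskell. The `Term`, `Category`, `shouldCompare`, and `bound` names are illustrative placeholders rather than semantic's actual types; the point is only that cheap checks (same syntactic category, identifiers within a small edit distance) can screen term pairs before the expensive SES comparison runs.

```haskell
import qualified Data.Array as A

-- Illustrative stand-ins for semantic's term representation.
data Category = Function | ArrayLiteral | Assignment deriving (Eq, Show)
data Term = Term { category :: Category, identifier :: Maybe String }

-- Standard Levenshtein edit distance between two identifiers.
levenshtein :: String -> String -> Int
levenshtein a b = table A.! (m, n)
  where
    (m, n)   = (length a, length b)
    (as, bs) = (A.listArray (1, m) a, A.listArray (1, n) b)
    table = A.array ((0, 0), (m, n))
      [ ((i, j), dist i j) | i <- [0 .. m], j <- [0 .. n] ]
    dist 0 j = j
    dist i 0 = i
    dist i j = minimum
      [ table A.! (i - 1, j) + 1      -- deletion
      , table A.! (i, j - 1) + 1      -- insertion
      , table A.! (i - 1, j - 1)      -- substitution or match
          + (if as A.! i == bs A.! j then 0 else 1)
      ]

-- Gate the expensive SES comparison: only compare terms of the same
-- category whose identifiers are within a small edit distance.
shouldCompare :: Int -> Term -> Term -> Bool
shouldCompare bound t1 t2 =
  category t1 == category t2 && identifiersClose
  where
    identifiersClose = case (identifier t1, identifier t2) of
      (Just a, Just b) -> levenshtein a b <= bound
      _                -> True  -- no identifiers: fall through to SES
```

Any pair the gate rejects is treated as a delete plus an insert rather than compared, which is where the missed-rename risk above comes from: `foo` renamed to `render` fails the identifier check even if the bodies match.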
RWS performance
  • Random Walk Similarity (RWS).
  • Computes an approximation to the minimal edit script (sketched below).
  • O(n log n) rather than O(n^3).
  • RWS does not rely on identifiers.
  • RWS solves our performance problem in the general case.
  • Could allow us to diff patches of patches (something our current SES implementation cannot do).
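
A toy Haskell sketch of the feature-vector idea behind RWS. Real RWS derives characteristic vectors from random walks over subtrees and matches them with approximate nearest-neighbour search; the bag-of-labels vector below is a deliberate simplification, shown only to illustrate how subtree similarity can be estimated without computing an edit script for every pair of terms.

```haskell
import qualified Data.Map.Strict as Map

data Tree = Node String [Tree]

type FeatureVector = Map.Map String Double

-- Summarize a subtree as a vector (here: counts of node labels; real RWS
-- uses random-walk-derived features instead).
featureVector :: Tree -> FeatureVector
featureVector (Node label children) =
  Map.unionsWith (+) (Map.singleton label 1 : map featureVector children)

-- Euclidean distance between two sparse feature vectors.
distance :: FeatureVector -> FeatureVector -> Double
distance u v =
  sqrt . sum . map (^ 2) . Map.elems $ Map.unionWith (+) u (Map.map negate v)

-- Two subtrees that differ by one node sit at distance 1.0.
example :: Double
example = distance (featureVector t1) (featureVector t2)
  where
    t1 = Node "function" [Node "identifier" [], Node "statements" []]
    t2 = Node "function" [Node "identifier" [], Node "statements" [Node "return" []]]
```

Matching subtrees by nearest vector instead of by pairwise comparison is what removes both the reliance on identifiers and the cubic blow-up.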
Diff summaries performance
  • Performance of diff summaries depends on the diffing pipeline (Diff Terms, Interpreter, cost functions).

Failing too hard

  • The request does not complete if Semantic Diff fails.
  • How can we fail better on dotcom?
  • How can we fail better when parsing? (both in Semantic Diff and dotcom)

Responsiveness

  • Fetch diff summaries and diffs asynchronously, or render diffs / diff summaries progressively.

Improving grammars

  • Fix Ruby parser.
  • Testing and verifying other grammars.

Measure effectiveness of grammars

Tooling

  • Why isn't parallelization of SES having the expected effect?
  • Should focus on low-hanging fruit, but we're not going to write a debugger.

Time limitations with respect to solutions and team

Ramp-up time is extremely variable.

Onboarding

  • Pairing has been fantastic.
  • The SES algorithm requires some context and background to understand the code at the general / macro level.
  • Plan a bit before pairing to gain context.

Pre-launch Ideas

  • Test on a couple of fileserver nodes and run Semantic Diff on JavaScript repos.
  • Collect the repos, files, and SHAs that contain error nodes, to estimate error rates and expose errors in tree-sitter grammars.
  • If sources have errors, can we use a separate parser to validate whether the source itself is correct?
  • Write a script, as language-independent as possible, that automates the error-collection process while letting us specify an independent validating parser for each language (see the sketch below).
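
A Haskell sketch of what the core of that script might look like. The `parseFile` stub and `hasErrorNode` field are placeholders for whichever tree-sitter binding (or per-language validating parser) we wire in; only the collection and reporting logic is shown.

```haskell
import Control.Monad (forM)
import Text.Printf (printf)

-- Placeholder result: did the parse tree for this file contain error nodes?
data ParseResult = ParseResult { path :: FilePath, hasErrorNode :: Bool }

-- Stub standing in for the real parser invocation.
parseFile :: FilePath -> IO ParseResult
parseFile p = pure (ParseResult p False)

-- Parse every file, list the ones that trip the grammar, and report a rate.
collectErrorRate :: [FilePath] -> IO ()
collectErrorRate files = do
  results <- forM files parseFile
  let failing = filter hasErrorNode results
      rate = fromIntegral (length failing) / fromIntegral (length files) :: Double
  mapM_ (putStrLn . path) failing
  printf "error rate: %.2f%% (%d of %d files)\n"
         (rate * 100) (length failing) (length files)
```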