Implements import from the [callgrind format](https://www.valgrind.org/docs/manual/cl-format.html).
This comes with a big caveat: the call graph information contained in callgrind-formatted files doesn't uniquely define a flamegraph, so the generated flamegraph is a best-effort guess. Here's the comment from the top of the main file for the callgrind importer with an explanation:
```
// https://www.valgrind.org/docs/manual/cl-format.html
//
// Larger example files can be found by searching on github:
// https://github.com/search?q=cfn%3D&type=code
//
// Converting callgrind files into flamegraphs is challenging because callgrind
// formatted profiles contain call graphs with weighted nodes and edges, and
// such a weighted call graph does not uniquely define a flamegraph.
//
// Consider a program that looks like this:
//
// // example.js
// function backup(read) {
// if (read) {
// read()
// } else {
// write()
// }
// }
//
// function start() {
// backup(true)
// }
//
// function end() {
// backup(false)
// }
//
// start()
// end()
//
// Profiling this program might result in a profile that looks like the
// following flame graph defined in Brendan Gregg's plaintext format:
//
// start;backup;read 4
// end;backup;write 4
//
// When we convert this execution into a call-graph, we get the following:
//
//  +------------------+     +---------------+
//  | start (self: 0)  |     | end (self: 0) |
//  +------------------+     +---------------+
//              \                  /
//   (total: 4)  \                /  (total: 4)
//                v              v
//            +------------------+
//            | backup (self: 0) |
//            +------------------+
//               /            \
//   (total: 4) /              \  (total: 4)
//             v                v
//  +----------------+     +-----------------+
//  | read (self: 4) |     | write (self: 4) |
//  +----------------+     +-----------------+
//
// In the process of the conversion, we've lost information about the ratio of
// time spent in read vs. write in the start call vs. the end call. The
// following flame graph would yield the exact same call-graph, and therefore
// the exact same callgrind-formatted profile:
//
// start;backup;read 3
// start;backup;write 1
// end;backup;read 1
// end;backup;write 3
//
// This is unfortunate, since it means we can't produce a flamegraph that isn't
// potentially lying about what the actual execution behavior was. To produce a
// flamegraph at all from the call graph representation, we have to decide how
// much weight each sub-call should have. Given that we know the total weight
// of each node, we'll make the incorrect assumption that every invocation of a
// function has the average distribution of costs among the sub-function
// invocations. In the example given, this means we assume that every
// invocation of backup() spends half its time in read() and half its time in
// write().
//
// So the flamegraph we'll produce from the given call-graph will actually be:
//
// start;backup;read 2
// start;backup;write 2
// end;backup;read 2
// end;backup;write 2
//
// A particularly bad consequence is that the resulting flamegraph will suggest
// that there was at some point a call stack that looked like
// start;backup;write, even though that never happened in the real program
// execution.
```
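To make the averaging heuristic concrete, here's a minimal sketch in TypeScript of how weights might be distributed down the call graph. The types and function are hypothetical illustrations rather than the importer's actual data structures, and the sketch ignores recursion and cycles:
```
interface CallGraphNode {
  name: string
  selfWeight: number
  // Aggregate weight attributed to each callee across all invocations
  calleeEdges: Map<CallGraphNode, number>
}

// Emit Brendan Gregg-style "a;b;c weight" lines, assuming every invocation of
// a node splits its weight between self time and callees in proportion to the
// aggregate weights. Ignores recursion and cycles for simplicity.
function emitStacks(node: CallGraphNode, prefix: string[], weight: number, out: string[]) {
  const stack = [...prefix, node.name]
  let edgeTotal = 0
  for (const w of node.calleeEdges.values()) edgeTotal += w
  const total = node.selfWeight + edgeTotal
  if (total === 0) return
  if (node.selfWeight > 0) {
    out.push(`${stack.join(';')} ${(weight * node.selfWeight) / total}`)
  }
  for (const [callee, edgeWeight] of node.calleeEdges) {
    emitStacks(callee, stack, (weight * edgeWeight) / total, out)
  }
}
```
Running this over the example graph above, starting from start and end with weight 4 each, produces exactly the four weight-2 stacks shown above.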
Fixes #18
The trace event format has a very unfortunate combination of requirements in order to give a best-effort interpretation of a given trace file:
1. Events may be recorded out-of-order by timestamp
2. Events with the *same* timestamp should be processed in the order they were provided in the file. Mostly.
The first requirement is written explicitly [in the spec](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview).
> The events do not have to be in timestamp-sorted order.
The second one isn't explicitly written, but it's implicitly true because otherwise the interpretation of a file is ambiguous. For example, the following file has all events with the same `ts` field, but re-ordering the events changes the interpretation.
```
[
{ "pid": 0, "tid": 0, "ph": "X", "ts": 0, "dur": 20, "name": "alpha" },
{ "pid": 0, "tid": 0, "ph": "X", "ts": 0, "dur": 20, "name": "beta" }
]
```
If we allowed arbitrary reordering, it would be ambiguous whether the alpha frame should be nested inside of the beta frame or vice versa. Since traces are interpreted as call trees, it's not okay to just arbitrarily choose.
So you might next guess that a reasonable approach would be to do a [stable sort](https://wiki.c2.com/?StableSort) by "ts", then process the events one-by-one. This almost works, except for two additional problems. The first problem is that in some situations this would still yield invalid results.
```
[
{"pid": 0, "tid": 0, "ph": "B", "name": "alpha", "ts": 0},
{"pid": 0, "tid": 0, "ph": "B", "name": "beta", "ts": 0},
{"pid": 0, "tid": 0, "ph": "E", "name": "alpha", "ts": 1},
{"pid": 0, "tid": 0, "ph": "E", "name": "beta", "ts": 1}
]
```
If we were to follow this rule, we would try to execute the `"E"` for alpha before the `"E"` for beta, even though beta is on the top of the stack. So in *that* case, we actually need to execute the `"E"` for beta first, otherwise the resulting profile is incorrect.
The other problem with this approach of using the stable sort order is the question of how to deal with `"X"` events. speedscope translates `"X"` events into a `"B"` and `"E"` event pair. But where should it put the `"E"` event? Your first guess might be "at the index where the `"X"` event occurs in the file". This runs into trouble in cases like this:
```
[
{ "pid": 0, "tid": 0, "ph": "X", "ts": 9, "dur": 1, "name": "beta" },
{ "pid": 0, "tid": 0, "ph": "X", "ts": 9, "dur": 2, "name": "gamma" },
]
```
The most natural translation of this would be to convert it into the following `"B"` and `"E"` events:
```
[
{ "pid": 0, "tid": 0, "ph": "B", "ts": 9, "name": "beta" },
{ "pid": 0, "tid": 0, "ph": "E", "ts": 10, "name": "beta" },
{ "pid": 0, "tid": 0, "ph": "B", "ts": 9, "name": "gamma" },
{ "pid": 0, "tid": 0, "ph": "E", "ts": 11, "name": "gamma" },
]
```
Which, after a stable sort, turns into this:
```
[
{ "pid": 0, "tid": 0, "ph": "B", "ts": 9, "name": "beta" },
{ "pid": 0, "tid": 0, "ph": "B", "ts": 9, "name": "gamma" },
{ "pid": 0, "tid": 0, "ph": "E", "ts": 10, "name": "beta" },
{ "pid": 0, "tid": 0, "ph": "E", "ts": 11, "name": "gamma" },
]
```
Notice that we again have a problem where we open "beta" before "gamma", but we need to close "beta" first because it ends first!
Ultimately, I couldn't figure out any sort order that would allow me to predict ahead-of-time what order to process the events in. So instead, I create two event queues: one for `"B"` events, and one for `"E"` events, and then try to be clever about how I merge them together.
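As a rough illustration of the two-queue idea (a simplified toy, not the actual importer logic), assume the `"B"` queue is pre-sorted by `ts` with ties broken so that frames which end later open first, and the `"E"` queue is pre-sorted by `ts`:
```
interface TraceEvent { ph: 'B' | 'E'; ts: number; name: string }

// Toy merge of the two queues. At each step, prefer an "E" event that is due
// (at or before the next "B" opens) and that closes the frame currently on
// top of the stack; otherwise open the next "B" frame.
function mergeQueues(bQueue: TraceEvent[], eQueue: TraceEvent[]): TraceEvent[] {
  bQueue = [...bQueue]
  eQueue = [...eQueue]
  const out: TraceEvent[] = []
  const stack: string[] = []
  while (bQueue.length > 0 || eQueue.length > 0) {
    const b = bQueue[0]
    let matched = -1
    for (let i = 0; i < eQueue.length && (b === undefined || eQueue[i].ts <= b.ts); i++) {
      if (eQueue[i].name === stack[stack.length - 1]) {
        matched = i
        break
      }
    }
    if (matched >= 0) {
      out.push(...eQueue.splice(matched, 1))
      stack.pop()
    } else if (b !== undefined) {
      out.push(bQueue.shift()!)
      stack.push(b.name)
    } else {
      out.push(eQueue.shift()!) // unmatched close; handled leniently elsewhere
    }
  }
  return out
}
```
On the alpha/beta example above, this closes beta before alpha even though alpha's `"E"` event sorts first, and on the beta/gamma example it produces a valid nesting.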
AFAICT, chrome://tracing does not sort events before processing them, which is kind of baffling. But chrome://tracing also has really bizarre behaviour for inputs like this, where the resulting flamegraph isn't even a valid tree (it contains overlapping ranges):
```
[
{ "pid": 0, "tid": 0, "ph": "X", "ts": 0, "dur": 10, "name": "alpha" },
{ "pid": 0, "tid": 0, "ph": "X", "ts": 5, "dur": 10, "name": "beta" }
]
```
So I'm going to call this "good enough" for now.
Fixes #223. Fixes #320.
In #273, I changed `CallTreeProfileBuilder.leaveFrame` to fail hard when you request to leave a frame different from the one at the top of the stack. It turns out we were intentionally doing this for trace event imports, because `args` are part of the frame key, and we want to allow profiles to be imported where the `"B"` and `"E"` events have differing `args` fields.
This PR fixes the import code to permissively allow the `"args"` field to not match between the `"B"` and `"E"` events.
**A note on intentional differences between speedscope and chrome://tracing**
`chrome://tracing` will close whichever frame is at the top when it gets an `"E"` event, regardless of whether the name or the args match. speedscope will ignore the event entirely if the `"name"` field doesn't match, but will warn but still close the frame if the `"name"`s match but the `"args"` don't.
```
[
{"pid": 0, "tid": 0, "ph": "B", "name": "alpha", "ts": 0},
{"pid": 0, "tid": 0, "ph": "B", "name": "beta", "ts": 1},
{"pid": 0, "tid": 0, "ph": "E", "name": "gamma", "ts": 2},
{"pid": 0, "tid": 0, "ph": "E", "name": "beta", "ts": 9},
{"pid": 0, "tid": 0, "ph": "E", "name": "alpha", "ts": 10}
]
```
### speedscope
![image](https://user-images.githubusercontent.com/150329/97098205-7365dd00-1637-11eb-9869-4e81ebebcee1.png)
```
warning: ts=2: Request to end "gamma" when "beta" was on the top of the stack. Doing nothing instead.
```
### chrome://tracing
![image](https://user-images.githubusercontent.com/150329/97098215-87114380-1637-11eb-909c-b2e70c7291a4.png)
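In code, the rule speedscope applies when an `"E"` event arrives might be sketched like this (hypothetical helper types and warning messages, not the actual importer source):
```
interface OpenFrame { name: string; args?: unknown }
interface EEvent { ph: 'E'; ts: number; name: string; args?: unknown }

function handleEEvent(stack: OpenFrame[], e: EEvent) {
  const top = stack[stack.length - 1]
  if (top === undefined) return
  if (top.name !== e.name) {
    // Unlike chrome://tracing, don't close whatever happens to be on top.
    console.warn(
      `ts=${e.ts}: Request to end "${e.name}" when "${top.name}" was on the top of the stack. Doing nothing instead.`
    )
    return
  }
  if (JSON.stringify(top.args) !== JSON.stringify(e.args)) {
    // Names match but args differ: warn, but still close the frame.
    console.warn(`ts=${e.ts}: closing "${e.name}" despite mismatched args`)
  }
  stack.pop()
}
```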
This PR adds the ability to remap an already-loaded profile using a JavaScript source map. This is useful for e.g. recording profiles of minified code in production, and then remapping their symbols afterwards when the source map isn't made directly available to the browser.
This is a bit of a hidden feature. The way it works is to drop a profile into speedscope, then drop the sourcemap file on top of it.
To test this, I used a small project @cricklet made (https://gist.github.com/cricklet/0deaaa7dd63657adb6818f0a52362651), and also tested against speedscope itself.
To test against speedscope itself, I profiled loading a file in speedscope in Chrome, then dropped the resulting Chrome timeline profile into speedscope, and dropped speedscope's own sourcemap on top. Before dropping the source map, the symbols look like this:
![image](https://user-images.githubusercontent.com/150329/94977230-b2878f00-04cc-11eb-8907-02a1f1485653.png)
After dropping the source map, they look like this:
![image](https://user-images.githubusercontent.com/150329/94977253-d4811180-04cc-11eb-9f88-1e7a02149331.png)
I also added automated tests using a small JS project bundled with several different JS bundlers to make sure it was doing a sensible thing in each case.
# Background
Remapping symbols in profiles using source-maps proved to be more complex than I originally thought because of an idiosyncrasy of which line & column are referenced for stack frames in browsers. Rather than the line & column referencing the first character of the symbol, they instead reference the opening paren for the function definition.
Here's an example file where it's not immediately apparent which line & column is going to be referenced by each stack frame:
```
class Kludge {
  constructor() {
    alpha()
  }

  zap() {
    alpha()
  }
}

function alpha() {
  for (let i = 0; i < 1000; i++) {
    beta()
    delta()
  }
}

function beta() {
  for (let i = 0; i < 10; i++) {
    gamma()
  }
}

const delta = function () {
  for (let i = 0; i < 10; i++) {
    gamma()
  }
}

const gamma =
() => {
  let prod = 1
  for (let i = 1; i < 1000; i++) {
    prod *= i
  }
  return prod
}

const k = new Kludge()
k.zap()
```
The resulting profile looks like this:
![image](https://user-images.githubusercontent.com/150329/94976830-0db88200-04cb-11eb-86d7-934365a17c53.png)
The relevant line & column for each function are...
```
// Kludge: line 2, column 14
class Kludge {
  constructor() {
             ^
...
// zap: line 6, column 6
  zap() {
     ^
...
// alpha: line 11, column 15
function alpha() {
              ^
...
// delta: line 24, column 24
const delta = function () {
                       ^
...
// gamma: line 31, column 1
const gamma =
() => {
^
```
If we look up the source map entry that corresponds to the opening paren, we'll nearly always get nothing. Instead, we'll look at the entry *preceding* the one which contains the opening paren, and hope that has our symbol name. It seems this works at least some of the time.
Another complication is that some, but not all, source maps include the original names of functions. For ones that don't, but do include the original source code, we try to deduce the names ourselves with varying amounts of success.
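As an illustration of the lookup strategy using Mozilla's `source-map` library (a hedged sketch using the 0.6.x-style synchronous API and hypothetical frame fields, not speedscope's actual implementation):
```
import { SourceMapConsumer } from 'source-map'

// Look up the mapping just *before* the opening paren referenced by the
// stack frame, since that's the entry that tends to carry the original name.
// Assumes 1-based line & column on the frame, as reported by the browser.
function remappedName(
  consumer: SourceMapConsumer,
  frame: { name: string; line: number; column: number }
): string {
  const pos = consumer.originalPositionFor({
    line: frame.line,
    // Convert to the library's 0-based columns, then step one column left of
    // the opening paren to land on the preceding entry.
    column: Math.max(0, frame.column - 2),
  })
  return pos.name ?? frame.name
}
```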
Supersedes #306. Fixes #139.
The Austin format conversion tools have been moved to the dedicated austin-python module. The README has been updated to point to the new instructions.
Before this PR, we blindly assumed that all text imported into speedscope was UTF-8 encoded. This, unsurprisingly, is not always true. After this PR, we support text that's UTF-16 encoded, with either the little-endian or big-endian byte-order-mark.
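The detection boils down to sniffing the first two bytes of the file. A minimal sketch of the idea (not the exact speedscope code):
```
function decodeProfileText(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer)
  // UTF-16 little-endian byte-order-mark: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return new TextDecoder('utf-16le').decode(buffer)
  }
  // UTF-16 big-endian byte-order-mark: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return new TextDecoder('utf-16be').decode(buffer)
  }
  // Otherwise, fall back to assuming UTF-8 as before
  return new TextDecoder('utf-8').decode(buffer)
}
```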
Fixed #291
Closes #294
This adds import for the Safari/WebKit profiler. Well, for Safari 13.1 for sure; I haven't done any work to check whether there have been changes to the syntax.
It seems to work OK, and is already a huge improvement over profiling in Safari (which doesn't even have a flame graph, let alone something like left heavy). Sadly, the sampler resolution is only 1kHz, which is not super useful for a lot of profiling work. I made a ticket on webkit bug tracker to ask for 10kHz/configurable sampling rate: https://bugs.webkit.org/show_bug.cgi?id=214866
Another thing that's missing is that I cut out all the idle time. We could also insert layout/paint samples into the timeline by parsing `events`. But I'll leave that for another time.
<img width="1280" alt="Captura de pantalla 2020-07-28 a las 11 02 06" src="https://user-images.githubusercontent.com/183747/88643560-20c16700-d0c2-11ea-9c73-d9159e68fab9.png">
## Context
Hi! I'm working on an experimental React [concurrent mode profiler](https://react-scheduling-profiler.vercel.app) in partnership with the React core team, and we're using a [custom build of Speedscope](https://github.com/taneliang/speedscope/compare/master...taneliang:fork-for-scheduling-profiler) that exposes Speedscope's internals to support our custom flamechart rendering. Specifically, Speedscope is used to import and process Chrome profiles, which are then fed to our rendering code that draws everything to a canvas.
Here's a screenshot of our app for context. The stuff above the thick gray bar is React data (some React Fiber lanes, React events, and other user timing marks), and a flamechart is drawn below.
![image](https://user-images.githubusercontent.com/12784593/89261576-e2e3b600-d660-11ea-9b90-6c6991d061d6.png)
## Problem
Early on, we had [an issue](https://github.com/MLH-Fellowship/scheduling-profiler-prototype/issues/42) where our flamechart was not aligned with the React data. The discrepancy between the flamechart frames and our React data grew over the time of the profile.
We tracked down the cause to https://github.com/jlfwong/speedscope/pull/80, which resolves https://github.com/jlfwong/speedscope/issues/70. It seems like zeroing out those negative time deltas resulted in the accumulation of errors over the time of these profiles, which resulted in the very visible misalignment in our profiler.
I am confident that the React data's timestamps are correct because they are obtained from User Timing marks, which have absolute timestamps and are thus independent of any `timeDelta` stuff. This would mean that Speedscope is likely displaying incorrect timestamps for Chrome profiles.
## Solution
This PR takes a different approach to solving the negative `timeDelta` problem: we add a `lastElapsed` variable as a sort of backstop, preventing `elapsed` from traveling backwards in time, while still ensuring that `elapsed` is always accurate.
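A minimal sketch of the backstop idea, with hypothetical names (`timeDeltas` comes from the Chrome profile, and `emitSample` stands in for the sample-appending logic):
```
function accumulateSampleTimes(timeDeltas: number[], emitSample: (ts: number) => void) {
  let elapsed = 0     // true accumulated time, including negative deltas
  let lastElapsed = 0 // backstop: the last timestamp actually emitted
  for (const delta of timeDeltas) {
    elapsed += delta  // stays accurate, so errors don't accumulate
    const ts = Math.max(elapsed, lastElapsed) // never travel backwards
    emitSample(ts)
    lastElapsed = ts
  }
}
```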
We've been using this patch in our custom build for about a month now and it seems to work well.
This implements the next step towards full featured search in speedscope: visual highlighting of matching search results in the time ordered & left heavy views. This doesn't yet add the ability to click prev/next to select the next matching element in the editor, but I'm still planning on doing something like that. I haven't figured out yet what I want the user experience to be like for that.
![speedscope-flamegraph-search](https://user-images.githubusercontent.com/150329/87898991-9ebba900-ca04-11ea-9bd9-31ad8d4c6d2a.gif)
This works towards fixing #38
This fixes two unrelated problems which together caused performance issues in the sandwich view & made hover tooltips appear to be broken.
The first issue was caused by continuously priming the `requestAnimationFrame` loop when it should be a no-op, and the second issue was caused by using different cache keys when trying to access a memoized value in the caller & callee flamegraph components. This resulted in thrash, and especially bad performance because the cache miss was resulting in us re-allocating the WebGL framebuffer on every frame, which is unsurprisingly quite slow.
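For the first issue, the general shape of the fix is to only schedule a new animation frame when one isn't already pending and there's actually something to redraw. A hypothetical sketch (`needsRedraw` and `redraw` are stand-ins, not speedscope's actual functions):
```
declare function needsRedraw(): boolean
declare function redraw(): void

let animationFrameRequest: number | null = null

function onAnimationFrame() {
  animationFrameRequest = null
  if (!needsRedraw()) return // a no-op frame must not prime another frame
  redraw()
}

function enqueueRedraw() {
  if (animationFrameRequest === null) {
    animationFrameRequest = requestAnimationFrame(onAnimationFrame)
  }
}
```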
Fixes #212. Fixes #155. Fixes #74 (though this was maybe already fixed).
This is the first step towards fixing #38.
I started with the easiest part from a UI-paradigm perspective, and also the place where it's most confusing that search doesn't work. Before this PR, browsers' Cmd+F/Ctrl+F would *look* like it worked in the Sandwich view, but it wouldn't work fully because the table in the sandwich view is virtualized, meaning that it doesn't put all of the rows in the DOM. Instead, it only renders enough rows to fill the viewport, which makes rendering much faster.
Here's what the changes from this PR look like in action:
![Kapture 2020-07-12 at 23 17 33](https://user-images.githubusercontent.com/150329/87276802-ef2b8780-c495-11ea-9856-9c834ea7f028.gif)
Before closing #38, I'll be adding search functionality to the flamechart views too.
This adds much better UI for selecting different profiles within a single import.
![Kapture 2020-05-30 at 21 34 06](https://user-images.githubusercontent.com/150329/83344564-595ce400-a2bd-11ea-8306-e5d8f647b65e.gif)
You can now hover over the middle of the toolbar or hit `t` on your keyboard to bring up the profile selector. From there, you can use fuzzy-find to switch to the profile you want, and hit "enter" to select it. The up and down arrow keys can be used while the profile selector filter input is focused to move through the list of profiles.
I think the "next" and "prev" buttons are now totally useless, so I removed them.
Fixes #167
Profile switching was subtly broken because action creators weren't being correctly re-bound due to a missing dependency in a `useCallback` call.
I also tried to reduce boilerplate in this PR by adding additional exhaustive-deps protection via eslint for `useSelector`, `useAppSelector`, and `useActionCreator`. This removes the need to use `useCallback` with each of those.
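Assuming the custom hooks take a dependency array the way `useCallback` does, that protection can be expressed with the `additionalHooks` option of `react-hooks/exhaustive-deps`; the exact config fragment below is illustrative, not copied from the PR:
```
{
  "rules": {
    "react-hooks/exhaustive-deps": [
      "error",
      { "additionalHooks": "(useSelector|useAppSelector|useActionCreator)" }
    ]
  }
}
```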
Fixes #280
To test this, load a profile, then save a `.tsx` file locally. Before this change, it would bring you back to the welcome screen after hot reload. After this change, application state is still displayed. This is because before the change, the `setGLCanvas` action wasn't resulting in a re-render because it occurred between the initial render and the `useLayoutEffect` callback.
Fixes #276
I'd like to try writing new components using hooks, and to do that I need to upgrade from preact 8 to preact X.
For reasons that are... complicated, in order to upgrade without breaking part of my build process, I had to remove the dependency on `preact-redux` altogether. This led me to write my own implementation, and as part of that I realized I could remove `createContainer` in favour of some simple hooks that use redux.
Before landing:
- [x] Investigate performance issues in the sandwich views
- [x] Investigate es-lint checks for exhaustive hook dependencies
Fixes #268
I fixed it by dropping the dependency on quicktype entirely, and using its dependency directly. I still don't understand why the version of typescript used in this repository affects what quicktype is doing, but it seems like the issue is in quicktype, not its dependency.
I validated this change was correct by diffing the output of `node scripts/generate-file-format-schema-json.js` with what's currently on http://speedscope.app/file-format-schema.json. There's no difference.
This PR also includes changes to the CI script to ensure that we can catch this before hitting master next time.
@JustinBeckwith pointed out in #262 that `npm install` was broken in node 13.x, and @DanielRuf pointed out in #254 that tests fail for node 11+ because of a change to the stability of sorting.
This PR seeks to address both of those.
The installation issue was fixed by just regenerating `package-lock.json` without needing to bump any of the direct dependency versions. The test failure issue requires manual intervention.
To fix the sort stability issue, I updated the tests to expect the stable-sort values (these are the correct values; some of the previous expected values were incorrect).
To make the suite still pass for node 10, I added a hack where I override `Array.prototype.sort` with a stable implementation that's *only* used in tests (see comments in the code for a justification of why).
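A simplified, hypothetical version of that hack, which makes `Array.prototype.sort` stable by tagging elements with their original index and using it as a tiebreaker:
```
const nativeSort = Array.prototype.sort

;(Array.prototype as any).sort = function (this: any[], compare?: (a: any, b: any) => number) {
  // Approximate the default string comparison when no comparator is given.
  const cmp =
    compare ?? ((a: any, b: any) => (String(a) < String(b) ? -1 : String(a) > String(b) ? 1 : 0))
  // Tag each element with its index, sort, then break ties on that index.
  const tagged = this.map((value, index) => ({ value, index }))
  nativeSort.call(tagged, (a: any, b: any) => cmp(a.value, b.value) || a.index - b.index)
  for (let i = 0; i < tagged.length; i++) this[i] = tagged[i].value
  return this
}
```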
## Test Plan
Before this PR: `npm install` on node 13.x fails & `npm run jest` results in test failures
After this PR: `npm install` on node 13.x passes & `npm run jest` passes for node 10, 12, and 13.
I ended up in a horrible peer dependency hell and apparently needed to bump the versions of quicktype, typescript, ts-jest, *and* jest to get out of it. But I think I got out of it!
Local builds and deployment builds both seem to work after these changes.
The code to import trace formatted events intentionally re-orders events in order to make it easier at flamegraph construction time to order the pushes and pops of frames.
It turns out that this re-ordering results in incorrect flamegraphs being generated as shown in #251.
This PR fixes this by avoiding re-ordering in situations where it isn't necessary.