This attempts to improve the quality of the on-CPU profiles stackprof provides. Rather than weighting samples by their timestamp deltas, which, in our opinion, are only valid in wall-clock mode, this weights each callchain by `W`:
```
S = number of samples
P = sample period in nanoseconds
W = S * P
```
The difference after this change is quite substantial, especially in profiles that previously showed heavy IO frames:
* Total profile weight is down by almost 90%, which makes sense for an on-CPU profile if the app is relatively idle.
* Certain callchains that blocked in syscalls / IO now have a much lower weight. This is what I was expecting to find.
Here is an example of the latter point.
In delta mode, we see an IO select taking a long time; it is a significant portion of the profile:
<img width="1100" alt="236936508-709bee01-d616-4246-ba74-ab004331dcd3" src="https://github.com/dalehamel/speedscope/assets/4398256/39140f1e-50a9-4f33-8a61-ec98b6273fd4">
But in period scaling mode, it ultimately amounts to only a couple of sample periods:
<img width="206" alt="236936693-9d44304e-a1c2-4906-b3c8-50e19e6f9f27" src="https://github.com/dalehamel/speedscope/assets/4398256/7d19077f-ef25-4d79-980b-cfa1775d928d">
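As a rough illustration of the period-scaling formula above (`W = S * P`), here is a minimal TypeScript sketch. The `CallchainCount` shape, the function names, and the 1 ms period are assumptions made up for this example; they are not taken from the importer itself.

```typescript
// Hypothetical shape: one entry per distinct callchain, with its sample count (S).
interface CallchainCount {
  frames: string[]
  samples: number
}

// W = S * P, where P is the sample period in nanoseconds.
function periodScaledWeight(samples: number, periodNs: number): number {
  return samples * periodNs
}

// Total profile weight is the sum of the per-callchain weights.
function totalWeight(chains: CallchainCount[], periodNs: number): number {
  return chains.reduce((sum, c) => sum + periodScaledWeight(c.samples, periodNs), 0)
}

// Assumed example data: a busy callchain and one that mostly blocked in IO.
const chains: CallchainCount[] = [
  {frames: ['main', 'work'], samples: 40},
  {frames: ['main', 'IO.select'], samples: 2},
]
const periodNs = 1000000 // assumed 1 ms sampling period

console.log(totalWeight(chains, periodNs))
```

Under this scheme a callchain that blocks in IO only contributes as many periods as it was actually sampled on-CPU, no matter how much wall-clock time elapsed between those samples.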
This PR attempts to support stackprof's object mode, which tracks the number of allocated objects. This differs from the other modes (cpu and wall) by taking a sample every time a Ruby object is allocated, using Ruby's [`NEWOBJ` tracepoint](df24b85953/ext/stackprof/stackprof.c (L198-L199)).
When importing an object mode profile into speedscope today, it still works, but what you see is a profile using time units. The profile only has samples for when an object was allocated, so even though time is reported, the profile is not really meaningful when read as time.
To address this I've done three things when `mode` is `object`:
+ adjusted the total size of the `StackListProfileBuilder` to use the number of samples (since each sample is one allocation)
+ adjusted the weight of each sample to be `nSamples` (which I believe is always `1` but I'm not positive)
+ stopped setting the value formatter to a time formatter
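The mode-dependent weighting described in the list above could look roughly like the following sketch. The `StackprofMode` type, `weighSample`, and the `'time'`/`'none'` unit labels are hypothetical names for this example and are not speedscope's actual API.

```typescript
// Hypothetical names throughout; this only sketches the decision, not the real importer.
type StackprofMode = 'cpu' | 'wall' | 'object'

interface WeightedSample {
  weight: number
  unit: 'time' | 'none'
}

function weighSample(mode: StackprofMode, nSamples: number, timeDelta: number): WeightedSample {
  if (mode === 'object') {
    // Each sample stands for an allocation, so count samples and use no time unit.
    return {weight: nSamples, unit: 'none'}
  }
  // cpu and wall modes keep a time-based weight.
  return {weight: timeDelta, unit: 'time'}
}

console.log(weighSample('object', 1, 5000))
console.log(weighSample('wall', 1, 5000))
```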
Here's what it looks like before and after my changes (note the units and weight of samples):
wall (before) | object (before) | object (after)
-- | -- | --
<img width="1624" alt="Screen Shot 2022-05-11 at 4 51 31 PM" src="https://user-images.githubusercontent.com/898172/167945635-2401ca73-4de7-4559-b884-cf8947ca9738.png"> | <img width="1624" alt="Screen Shot 2022-05-11 at 4 51 34 PM" src="https://user-images.githubusercontent.com/898172/167945641-ef302a60-730b-4afd-8e44-5f02e54b3cb7.png"> | <img width="1624" alt="Screen Shot 2022-05-11 at 4 51 42 PM" src="https://user-images.githubusercontent.com/898172/167945643-5611b267-f8b2-4227-a2bf-7145c4030aa2.png">
<details>
<summary>Test code</summary>
```ruby
require 'stackprof'
require 'json'

def do_test
  5.times do
    make_a_word
  end
end

def make_a_word
  ('a'..'z').to_a.shuffle.map(&:upcase).join
end

# Profile allocations: one sample per allocated object.
StackProf.start(mode: :object, interval: 1, raw: true)
do_test
StackProf.stop
File.write('tmp/object_profile.json', JSON.generate(StackProf.results))

# Profile the same workload in wall-clock mode for comparison.
StackProf.start(mode: :wall, interval: 1, raw: true)
do_test
StackProf.stop
File.write('tmp/wall_profile.json', JSON.generate(StackProf.results))
```
</details>
This PR adds the ability to remap an already-loaded profile using a JavaScript source map. This is useful for e.g. recording minified profiles in production, and then remapping their symbols when the source map isn't made directly available to the browser in production.
This is a bit of a hidden feature. To use it, drop a profile into speedscope, then drop the source map file on top of it.
To test this, I used a small project @cricklet made (https://gist.github.com/cricklet/0deaaa7dd63657adb6818f0a52362651), and also tested against speedscope itself.
To test against speedscope itself, I profiled loading a file in speedscope in Chrome, then dropped the resulting Chrome timeline profile into speedscope, and dropped speedscope's own sourcemap on top. Before dropping the source map, the symbols look like this:
![image](https://user-images.githubusercontent.com/150329/94977230-b2878f00-04cc-11eb-8907-02a1f1485653.png)
After dropping the source map, they look like this:
![image](https://user-images.githubusercontent.com/150329/94977253-d4811180-04cc-11eb-9f88-1e7a02149331.png)
I also added automated tests using a small JS bundle built with several different JS bundlers to make sure the remapping does something sensible in each case.
# Background
Remapping symbols in profiles using source-maps proved to be more complex than I originally thought because of an idiosyncrasy of which line & column are referenced for stack frames in browsers. Rather than the line & column referencing the first character of the symbol, they instead reference the opening paren for the function definition.
Here's an example file where it's not immediately apparent which line & column is going to be referenced by each stack frame:
```
class Kludge {
  constructor() {
    alpha()
  }

  zap() {
    alpha()
  }
}

function alpha() {
  for (let i = 0; i < 1000; i++) {
    beta()
    delta()
  }
}

function beta() {
  for (let i = 0; i < 10; i++) {
    gamma()
  }
}

const delta = function () {
  for (let i = 0; i < 10; i++) {
    gamma()
  }
}

const gamma =
() => {
  let prod = 1
  for (let i = 1; i < 1000; i++) {
    prod *= i
  }
  return prod
}

const k = new Kludge()
k.zap()
```
The resulting profile looks like this:
![image](https://user-images.githubusercontent.com/150329/94976830-0db88200-04cb-11eb-86d7-934365a17c53.png)
The relevant line & column for each function are...
```
// Kludge: line 2, column 14
class Kludge {
  constructor() {
             ^
...
// zap: line 6, column 6
  zap() {
     ^
...
// alpha: line 11, column 15
function alpha() {
              ^
...
// delta: line 24, column 24
const delta = function () {
                       ^
...
// gamma: line 31, column 1
const gamma =
() => {
^
```
If we look up the source map entry that corresponds to the opening paren, we'll nearly always get nothing. Instead, we'll look at the entry *preceding* the one which contains the opening paren, and hope that has our symbol name. It seems this works at least some of the time.
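The "look at the preceding entry" heuristic can be sketched as follows. This is a simplified illustration, not the code in this PR: the `Mapping` shape and the function names are hypothetical, and it assumes the source map has already been decoded into a list of entries sorted by generated line and column.

```typescript
// Hypothetical, simplified mapping entry: a position in the generated file
// plus the original symbol name, if the source map recorded one.
interface Mapping {
  line: number
  column: number
  name: string | null
}

// Index of the last mapping at or before (line, column), or -1 if there is none.
// Assumes `mappings` is sorted by line, then column.
function findContaining(mappings: Mapping[], line: number, column: number): number {
  let lo = 0
  let hi = mappings.length - 1
  let found = -1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    const m = mappings[mid]
    const cmp = m.line !== line ? m.line - line : m.column - column
    if (cmp <= 0) {
      found = mid
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  return found
}

// The entry containing the opening paren rarely has a name, so use the entry before it.
function nameForFrame(mappings: Mapping[], line: number, column: number): string | null {
  const idx = findContaining(mappings, line, column)
  if (idx <= 0) return null
  return mappings[idx - 1].name
}

// Assumed entries for `function alpha() {` on line 11 of the example file.
const alphaLine: Mapping[] = [
  {line: 11, column: 1, name: null}, // `function` keyword
  {line: 11, column: 10, name: 'alpha'}, // the identifier
  {line: 11, column: 15, name: null}, // the opening paren
]

console.log(nameForFrame(alphaLine, 11, 15))
```

Whether the preceding entry actually carries the symbol name depends on how the bundler emitted its mappings, which is why this only works some of the time.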
Another complication is that some, but not all, source maps include the original names of functions. For ones that don't, but do include the original source code, we try to deduce the names ourselves, with varying amounts of success.
Supersedes #306

Fixes #139
In #160, I wrote code which incorrectly assumed that at most one profile would be active at a time. It turns out this assumption is incorrect because of webworkers! This PR introduces a fix which correctly separates samples taken on the main thread from samples taken on worker threads, and allows viewing both in speedscope.
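One way to picture the separation is to bucket events by process and thread before building profiles. The sketch below assumes each event carries `pid` and `tid` fields as in the Trace Event Format; the `TraceEvent` interface and `groupByThread` are hypothetical names for this example rather than the importer's real code.

```typescript
// Minimal event shape, assuming Trace Event Format style `pid` and `tid` fields.
interface TraceEvent {
  pid: number
  tid: number
  name: string
  ts: number
}

// Buckets events per (pid, tid) so that each thread can become its own profile.
function groupByThread(events: TraceEvent[]): Map<string, TraceEvent[]> {
  const groups = new Map<string, TraceEvent[]>()
  for (const ev of events) {
    const key = `${ev.pid}:${ev.tid}`
    let group = groups.get(key)
    if (!group) {
      group = []
      groups.set(key, group)
    }
    group.push(ev)
  }
  return groups
}

// Assumed example: two events from the main thread and one from a worker.
const byThread = groupByThread([
  {pid: 1, tid: 1, name: 'ProfileChunk', ts: 10},
  {pid: 1, tid: 2, name: 'ProfileChunk', ts: 11},
  {pid: 1, tid: 1, name: 'ProfileChunk', ts: 12},
])

console.log(byThread.size)
```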
Fixes #171
This PR adds support for importing from Google's pprof format, which is a gzipped, protobuf-encoded file format (that's incredibly well documented!). The [pprof http library](https://golang.org/pkg/net/http/pprof/) also offers an output of the trace file format, which is still not supported in speedscope (see #77). This will allow importing profiles generated by the standard library Go profiler, for analysis of profiles containing heap allocation information, CPU profile information, and a few other things like coroutine creation information.
To support that, a number of dependent bits of functionality were added, each of which should provide an easier path for future binary input sources:
- A protobuf decoding library was included ([protobufjs](https://www.npmjs.com/package/protobufjs)) which includes both a protobuf parser generator based on a .proto file & TypeScript definition generation from the resulting generated JavaScript file
- More generic binary file import. Before this PR, all supported sources were plaintext, with the exception of Instruments 10 support, which takes a totally different codepath. Now binary file import should work when files are dropped, opened via file browsing, or opened via invocation of the speedscope CLI.
- Transparent gzip decoding of imported files (this means that if you were to gzip compress another JSON file, then importing it should still work fine)
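Transparent gzip handling usually starts with sniffing the gzip magic bytes (`0x1f 0x8b`) before deciding how to parse the file. The sketch below shows only that dispatch step; `looksGzipped`, `maybeDecompress`, and the injected `gunzip` callback are hypothetical names, and the actual decompression is assumed to be delegated to whichever gzip library is in use.

```typescript
// A gzip stream starts with the two magic bytes 0x1f 0x8b.
function looksGzipped(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b
}

// The decompressor is injected so this sketch stays library-agnostic.
type Gunzip = (compressed: Uint8Array) => Uint8Array

// Decompress only when the input looks gzipped; otherwise pass it through untouched.
function maybeDecompress(bytes: Uint8Array, gunzip: Gunzip): Uint8Array {
  return looksGzipped(bytes) ? gunzip(bytes) : bytes
}

// Plain JSON starts with `{` (0x7b), so it is passed through unchanged.
const plainJson = new Uint8Array([0x7b, 0x7d])
console.log(looksGzipped(plainJson))
```

Checking the leading bytes rather than the file extension is what lets a gzipped JSON profile import the same way as an uncompressed one.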
Fixes #60.
--
This is a [donation-motivated](https://github.com/jlfwong/speedscope/issues/60#issuecomment-419660710) PR, prompted by donations from @davecheney & @jmoiron to [/dev/color](https://www.devcolor.org/welcome) 🎉
This PR fixes #159, and also fixes various small things about how profiles were imported for previous versions of Chrome & for Firefox.
The Chrome 69 format splits profiles across several [Trace Event Format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview) events. There are two relevant events: "Profile" and "ProfileChunk". On a first read through a profile, it looks like profiles are incorrectly terminated, but the cause seems to be that, for whatever reason, events in the event log are not always sorted in chronological order. Once sorted chronologically, the event sequence can be parsed sensibly.
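A minimal sketch of the reordering step, assuming each event carries a numeric `ts` timestamp as in the Trace Event Format. `sortChronologically` is a hypothetical helper name, not the importer's actual function.

```typescript
// Returns a copy of the events ordered by timestamp, leaving the input untouched.
function sortChronologically<T extends {ts: number}>(events: T[]): T[] {
  return events.slice().sort((a, b) => a.ts - b.ts)
}

// Assumed example: a chunk that appears in the log before the event that starts the profile.
const log = [
  {name: 'ProfileChunk', ts: 300},
  {name: 'Profile', ts: 100},
  {name: 'ProfileChunk', ts: 200},
]

console.log(sortChronologically(log).map(e => e.name))
```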
In the process of looking at this information, I also discovered that speedscope's Chrome importer was incorrectly interpreting the value of the first element in the `timeDeltas` array. It's intended to be the elapsed time since the start of the profile, not the time between the first pair of samples. This changes the weight attributed to the first sample.
#33 added support for importing from Instruments indirectly, by opening Instruments and using the deep copy command. This PR adds support for importing `.trace` files directly, though only for time profiles, and only for the thread with the highest sample count in the profile.
This PR adds example `.trace` files from Instruments 9, and adds support for importing from either Instruments 8 or 9. The only major difference in the file format seems to be that Instruments 9 applies raw `zlib` compression generously throughout the file.
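To make the corrected interpretation concrete, here is a small sketch that turns a start time and a `timeDeltas` array into absolute sample timestamps, treating the first delta as the elapsed time since the start of the profile. The function name and the numbers are assumptions for illustration only.

```typescript
// Each delta is added to a running clock that begins at the profile's start time,
// so the first sample lands at startTime + timeDeltas[0].
function sampleTimestamps(startTime: number, timeDeltas: number[]): number[] {
  const timestamps: number[] = []
  let t = startTime
  for (const delta of timeDeltas) {
    t += delta
    timestamps.push(t)
  }
  return timestamps
}

// Assumed example values.
console.log(sampleTimestamps(1000, [50, 10, 10]))
```

Under the old reading, the leading `50` would have been treated as the gap between the first two samples, shifting every later sample's weight by one position.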
This PR also adds example `.trace` files for memory allocations, which are not supported for direct import. They use a totally different storage format for recording memory allocations, and I haven't yet figured out how that list of allocations references their corresponding callstack.
Lastly, this PR also adds examples from Instruments 7 since I happen to have a machine with an old version of Instruments. Import from Instruments 7 probably wouldn't be hard to add, but I haven't done that in this PR.
This currently only works in Chrome, and only via drag-and-drop of the files.
To test, drag the decompressed `simple-time-profile.trace` from 6016d970b9/sample/profiles/Instruments/9.3.1/simple-time-profile.trace.zip onto speedscope.
The result should be this:
![image](https://user-images.githubusercontent.com/150329/40162338-8fa13502-5968-11e8-8fb3-40626e41884a.png)
Fixes #15
This should help keep things organized as speedscope supports more languages & more formats.
Test Plan: Try importing from every single file type, and check that the link to load the example profile still works.