traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
# distutils: language = c++
|
|
|
|
|
|
|
|
# traceprof.pyx - C++ to Python bridge for the traceprof Mercurial extension
|
|
|
|
#
|
|
|
|
# Copyright 2017 Facebook, Inc.
|
|
|
|
#
|
|
|
|
# This software may be used and distributed according to the terms of the
|
|
|
|
# GNU General Public License version 2 or any later version.
|
|
|
|
|
|
|
|
"""accurate callgraph profiling
|
|
|
|
|
|
|
|
lsprof's high precision, plus statprof's intuitive output format.
|
|
|
|
|
|
|
|
Config::
|
|
|
|
|
|
|
|
[traceprof]
|
|
|
|
# whether to disable Python GC before profiling
|
|
|
|
disablegc = no
|
|
|
|
|
|
|
|
# minimal microseconds to show a function
|
|
|
|
timethreshold = 2000
|
|
|
|
|
|
|
|
# minimal call count to show "(N times)"
|
|
|
|
countthreshold = 2
|
2017-05-23 21:50:33 +03:00
|
|
|
|
|
|
|
# frame de-duplication (slower to print outputs)
|
|
|
|
framededup = yes
|
traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
"""
|
|
|
|
|
|
|
|
from libc.stdio cimport FILE
|
|
|
|
from cpython.object cimport PyObject
|
|
|
|
|
|
|
|
import contextlib
|
|
|
|
import gc
|
|
|
|
|
2018-01-09 04:45:51 +03:00
|
|
|
cdef extern from "extlib/traceprofimpl.cpp":
|
traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
void enable()
|
|
|
|
void disable()
|
2018-01-10 00:26:14 +03:00
|
|
|
IF UNAME_SYSNAME == "Windows":
|
|
|
|
void report()
|
|
|
|
ELSE:
|
|
|
|
void report(FILE *)
|
traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
void settimethreshold(double)
|
|
|
|
void setcountthreshold(size_t)
|
2017-05-23 21:50:33 +03:00
|
|
|
void setdedup(int)
|
|
|
|
void clear()
|
traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
|
|
|
|
cdef extern from "Python.h":
|
|
|
|
FILE* PyFile_AsFile(PyObject *p)
|
|
|
|
|
|
|
|
@contextlib.contextmanager
|
|
|
|
def profile(ui, fp):
|
|
|
|
if ui is not None:
|
|
|
|
if ui.configbool('traceprof', 'disablegc'):
|
|
|
|
gc.disable() # slightly more predictable
|
|
|
|
microseconds = ui.configint('traceprof', 'timethreshold')
|
|
|
|
if microseconds is not None:
|
|
|
|
settimethreshold((<double>microseconds) / 1000.0)
|
|
|
|
count = ui.configint('traceprof', 'countthreshold')
|
|
|
|
if count is not None:
|
|
|
|
setcountthreshold(count)
|
2017-05-23 21:50:33 +03:00
|
|
|
dedup = ui.configbool('traceprof', 'framededup', True)
|
|
|
|
setdedup(<int>dedup)
|
traceprof: new extension to perform accurate profiling
Summary:
Instead of sampling periodically like the default profiler `statprof`, trace
every function call so the report is accurate. The output is designed to be
similar to `statprof`.
Written in C++ and inline assembly to minimize overhead. Cython is used as a
glue layer.
Comparing to `lsprof`, the overhead is similarly small. The major difference is
this profiler cares about "frame object" and records `frame->f_back` (parent
frame) so it could draw a statprof-like tree of the callgraph. `lsprof` records
"code object" instead and could have difficulty to deal with recursive calls.
The following code could demostrate the difference:
```
def fib(x):
if x < 2:
return x
return fib(x - 1) + fib(x - 2)
# lsprof - recorded 1 code object for fib and no way to figure out callgraph
import _lsprof
p = _lsprof.Profiler()
p.enable(subcalls=True)
fib(5)
p.disable()
for s in p.getstats():
print(s)
'''output:
_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None)
_lsprof.profiler_entry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=15, reccallcount=14, totaltime=8e-06, inlinetime=8e-06, calls=[_lsprof.profiler_subentry(code=<code object fib at 0x7f27ca587530, file "a.py", line 1>, callcount=14, reccallcount=12, totaltime=6e-06, inlinetime=6e-06)])
'''
# traceprof (with timethreshold set to 0) - callgraph drawn correctly
import sys
from hgext3rd import traceprof
with traceprof.profile(None, sys.stderr):
fib(5)
'''output:
| <module> a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
0 \ fib a.py:1
0 | fib a.py:1
\ __exit__ contextlib.py:21
'''
```
There are some performance / correctness tradeoffs. The current implementation
takes `frame` address and several (but not theoretically complete) fields to
hash a frame. That could in theory lead to inaccurate frame information being
recorded. But hashing all fields recursively incurs significant overhead. The
related logic is the `hashframe` function, which could be tweaked later.
This is an initial version. It probably does not cover all corner cases like
exception handling well. And it may duplicate a same frame multiple times
because of the way it hashes a frame. The duplicated frame issue could be
solved by adding post-profiling filtering though.
Here is a real-world example of `hg log -r .` on `hg-committed`:
```
| <module> hg:10
| run dispatch.py:81
| dispatch dispatch.py:132
| _runcatch dispatch.py:202
| _callcatch dispatch.py:309
| callcatch scmutil.py:133
| _runcatchfunc dispatch.py:215
| _dispatch dispatch.py:744
2 \ repository hg.py:166
2 | _peerorrepo hg.py:151
2 | instance localrepo.py:2025
2 | __init__ localrepo.py:275
863 \ runcommand dispatch.py:656
863 | _runcommand dispatch.py:926
863 | <lambda> dispatch.py:918
863 | check util.py:1059
863 | log commands.py:3262
480 \ getlogrevs cmdutil.py:2236
479 | _logrevs cmdutil.py:2190
479 | revrange scmutil.py:429
479 | anyrevs localrepo.py:622
478 | mfunc revset.py:2275
478 | __init__ (4 times) smartset.py:1033
478 | __init__ (4 times) smartset.py:919
478 | __len__ (4 times) localrepo.py:587
478 | changelog repoview.py:309
477 | filterrevs (11 times) repoview.py:260
477 | computehidden repoview.py:171
455 \ hideablerevs repoview.py:24
455 | getrevs obsolete.py:1114
455 | _computeobsoleteset obsolete.py:1143
3 \ __get__ (2 times) localrepo.py:76
7 \ node (5760 times) changelog.py:359
4 | node (8938 times) revlog.py:482
9 \ __get__ (5760 times) localrepo.py:76
428 \ __get__ util.py:798
428 | successors obsolete.py:672
225 \ __get__ util.py:798
224 | _all obsolete.py:662
6 \ tryread vfs.py:32
6 | read vfs.py:60
200 \ _readmarkers obsolete.py:442
* 199 | _fm1readmarkers obsolete.py:430
15 \ _checkinvalidmarkers obsolete.py:523
* 203 \ _addsuccessors obsolete.py:504
3 \ tryreadcache repoview.py:166
3 | cachehash repoview.py:98
3 | heads localrepo.py:1884
17 \ <genexpr> (5225 times) repoview.py:191
15 | __contains__ (5226 times) ancestor.py:334
5 | parentrevs (4010 times) changelog.py:371
2 | parentrevs (4332 times) revlog.py:479
9 \ pager ui.py:843
9 | _runpager ui.py:906
8 | __init__ subprocess.py:330
8 | _execute_child subprocess.py:880
5 | _eintr_retry_call subprocess.py:118
374 \ show cmdutil.py:1308
374 | _show cmdutil.py:1316
365 \ _changesetlabels cmdutil.py:1266
365 | troubled context.py:225
12 \ unstable (2 times) context.py:207
12 | getrevs obsolete.py:1114
12 | _computeunstableset obsolete.py:1154
3 \ set (322 times) localrepo.py:610
3 \ revs (3 times) localrepo.py:593
8 \ __getitem__ (963 times) localrepo.py:566
4 \ parents (321 times) context.py:246
4 | __get__ (321 times) util.py:798
4 | _parents (321 times) context.py:562
3 \ <genexpr> (642 times) obsolete.py:1164
3 | obsolete (321 times) context.py:199
2 | getrevs (321 times) obsolete.py:1114
319 \ bumped (2 times) context.py:211
319 | getrevs obsolete.py:1114
319 | _computebumpedset obsolete.py:1181
4 \ set (322 times) localrepo.py:610
312 \ allprecursors (2021 times) obsolete.py:850
308 | __get__ util.py:798
308 | precursors obsolete.py:678
* 308 | _addprecursors obsolete.py:509
34 \ divergent (2 times) context.py:218
34 | getrevs obsolete.py:1114
33 | _computedivergentset obsolete.py:1204
4 \ set (322 times) localrepo.py:610
24 \ successorssets (1368 times) obsolete.py:899
3 \ __get__ (1368 times) localrepo.py:76
4 \ __contains__ (200 times) localrepo.py:575
9 \ names (2 times) namespaces.py:184
9 | <lambda> namespaces.py:43
9 | nodetags localrepo.py:741
8 | __get__ util.py:798
8 | _tagscache localrepo.py:646
8 | _findtags localrepo.py:685
7 | findglobaltags tags.py:170
7 | _readtagcache tags.py:370
6 | filteredhash scmutil.py:307
\ __exit__ contextlib.py:21
| maybeprofile profiling.py:199
| __exit__ contextlib.py:21
| profile profiling.py:148
| __exit__ contextlib.py:21
Total time: 867 ms
```
Some example conclusions from reading the above output include:
1. `_fm1readmarkers`, `_addsuccessors`, `_addprecursors` are slow
2. `changelog.node` and `changelog.parentrevs` are called thousands of times,
but they are just a few milliseconds.
3. Reading the `obsstore` file (`vfs.read`) takes 5ms exactly
While `statprof` may get 1 right, `lsprof` may get 2 right, neither of them is
helpful for 3. `statprof` is not accurate enough to be confident about 5ms,
`lsprof` will include other `vfs.read` calls so it's impossible to see just the
read for obsstore.
Test Plan:
With the upstream [patch](https://patchwork.mercurial-scm.org/patch/20822/), run:
```
HGPROF=traceprof hg id --config extensions.traceprof= --profile
```
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters, jiangyuan
Differential Revision: https://phabricator.intern.facebook.com/D5104851
Signature: t1:5104851:1495480386:666847ea08bb6a94658bc10d7f0e91c225d56836
2017-05-23 01:49:45 +03:00
|
|
|
enable()
|
|
|
|
try:
|
|
|
|
yield
|
|
|
|
finally:
|
|
|
|
disable()
|
2018-01-10 00:26:14 +03:00
|
|
|
IF UNAME_SYSNAME == "Windows":
|
|
|
|
# On Windows, "FILE*" could be incompatible between Python.exe and
|
|
|
|
# traceimpl.exp. Use stderr from traceimpl.exp conservatively.
|
|
|
|
report()
|
|
|
|
ELSE:
|
|
|
|
report(PyFile_AsFile(<PyObject *>fp))
|
2017-08-15 04:12:29 +03:00
|
|
|
clear()
|